Data Mining Lab 4
Practice of Bayesian Networks
Two tasks
• Infer the structure of the network from the data (in practice, the structure of the network is identified by domain experts, not by machine)
• Fill in the conditional probability tables
The elementary inference types in Bayes nets
• Causal (A -> B): increased probability of A makes B more likely; A can cause B.
• Diagnostic (A -> B): increased probability of B makes A more likely; B is evidence for A.
• Intercausal (A -> C <- B): A and B can each cause C; B explains C and so is evidence against A.
Bayesian Net for Weather Data
(Figure: the network for the weather data, read diagnostically: increased probability of B makes A more likely, so B is evidence for A.)
Naïve Bayes
Assuming the attributes are independent of each other, we have a Naïve Bayesian network:
P(play=yes) = 9/14
With Laplace correction:
P(play=yes) = (9+1)/(14+2) = 0.625
In general, to make the Laplace correction, we add an initial count (1) to the count of instances with a given attribute value, and we add the number of distinct values of that attribute to the total number of instances in the group.
Naïve Bayes
To fill the conditional probability tables, we compute conditional probabilities for each node in the form Pr(attribute=value | parent values), for each combination of attribute values in the parent nodes.

The entries for one parent value sum to 1:
P(outlook=sunny|play=yes) = (2+1)/(9+3) = 3/12
P(outlook=rainy|play=yes) = (3+1)/(9+3) = 4/12
P(outlook=overcast|play=yes) = (4+1)/(9+3) = 5/12

The entries for the same attribute value across different parent values do NOT sum to 1:
P(outlook=sunny|play=yes) = (2+1)/(9+3) = 3/12
P(outlook=sunny|play=no) = (3+1)/(5+3) = 4/8
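These counts are easy to reproduce in a few lines of Python. A minimal sketch, not WEKA's code: the dataset is rebuilt here from the (outlook, play) counts of the standard 14-instance weather data, and laplace_prob is our own helper name.

```python
from collections import Counter

# The 14-instance weather data, reduced to (outlook, play) pairs
# and rebuilt from the counts used above.
data = ([("sunny", "yes")] * 2 + [("sunny", "no")] * 3 +
        [("overcast", "yes")] * 4 +
        [("rainy", "yes")] * 3 + [("rainy", "no")] * 2)

def laplace_prob(count, group_total, n_values, alpha=1.0):
    """Laplace-corrected relative frequency: add the initial count alpha
    to the numerator, and alpha times the number of distinct attribute
    values to the denominator."""
    return (count + alpha) / (group_total + alpha * n_values)

play = Counter(p for _, p in data)
print(laplace_prob(play["yes"], len(data), n_values=2))  # (9+1)/(14+2) = 0.625

pairs = Counter(data)
print(laplace_prob(pairs[("sunny", "yes")], play["yes"], n_values=3))
# P(outlook=sunny | play=yes) = (2+1)/(9+3) = 0.25
```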
WEKA Exercise 1. Bayesian network for weather data with default parameters
• Preprocess tab
  – Open file weather.nominal.arff
  – Perform filters (if needed): discretize or replace missing values
• Classify tab
  – Classifier -> Choose -> classifiers -> bayes -> BayesNet
  – Click on the row with the selected classifier and, in the option row for the estimator, change the Laplace correction (initial count) to 1 (instead of 0.5)
  – Change cross-validation to 3 folds (since we have only 14 instances, with 10-fold cross-validation the test groups would have fewer than 2 instances, which makes the evaluation less reliable). Press Start.
WEKA Exercise 2. Examining the output
• In the history box, right-click and choose Visualize graph. Check that the probabilities in the CPTs correspond to what we calculated before. Clicking on a graph node brings up its table of conditional probabilities: the rows correspond to the possible values of the parent node's attribute, the columns to the possible values of the outlook attribute, and the cells hold the conditional probabilities.
• Naïve Bayes? Study the parameters of the program: click on the Choose line again.
• Save this model in the file weather.xml for later use.
• Click on the searchAlgorithm row. The default parameters are initAsNaiveBayes=true, maxNrOfParents=1.
• Change maxNrOfParents=2. Run. Visualize the graph.
• Change to initAsNaiveBayes=false. Run. Visualize the graph. Change back to true.
Conditional Probability Tables in WEKA
• After the structure is learned, the CPTs for each node are computed.
• The simple estimator computes the relative frequencies of the associated combinations of the attribute values in the training data (just like we do in our exercises).

How it was computed:
P(outlook=sunny|play=yes) = (2+1)/(9+3) = 3/12 = 1/4 = 0.25
where 2 is the number of instances with outlook=sunny and play=yes, 1 is the initial count for attribute value sunny, 9 is the number of instances with play=yes, and 3 is the total number of different values for outlook.
More complex Bayesian Network for weather data (with maxNrOfParents=2)
Each node's CPT now has one row for every combination of values of its two parent nodes (play and outlook) and one column for each possible value of the attribute (e.g. temperature).

How it was computed:
P(temperature=hot|play=yes, outlook=sunny) = (0+1)/(2+3) = 1/5 = 0.2
where 0 is the number of instances with temperature=hot, outlook=sunny and play=yes, and 2 is the number of instances with play=yes and outlook=sunny.
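The same computation generalizes to any number of parents. A hedged sketch (cpt_entry is our own helper name, not a WEKA function), assuming each row of data is a dict mapping attribute names to values:

```python
def cpt_entry(data, child, child_value, parent_values, n_child_values, alpha=1.0):
    """Laplace-corrected P(child = child_value | parents = parent_values),
    where parent_values is a dict such as {"play": "yes", "outlook": "sunny"}."""
    # Instances matching the parent value combination.
    group = [row for row in data
             if all(row[p] == v for p, v in parent_values.items())]
    # Of those, the instances that also match the child value.
    hits = sum(1 for row in group if row[child] == child_value)
    return (hits + alpha) / (len(group) + alpha * n_child_values)

# With the full weather data this reproduces the table entry:
# cpt_entry(rows, "temperature", "hot", {"play": "yes", "outlook": "sunny"},
#           n_child_values=3)  ->  (0+1)/(2+3) = 0.2
```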
How WEKA infers a structure of the network
• The nodes correspond to the attributes.
• Learning the structure means finding the edges:
  – Search through the possible edge sets.
  – For each set, estimate the conditional probability tables from the data.
  – Estimate the quality of the network as the probability of obtaining the data given this network.
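A minimal sketch of that scoring step (the names are ours, not WEKA's), assuming each row of data is a dict and the candidate structure is a map from node to its tuple of parents. The score is log P(data | network), with each CPT estimated from the same data by Laplace-corrected relative frequencies:

```python
import math
from collections import Counter

def log_likelihood(data, parents, alpha=1.0):
    """log P(data | network) for a structure given as {node: (parent, ...)},
    with Laplace-smoothed CPTs estimated from the data itself."""
    total = 0.0
    for node, pars in parents.items():
        n_values = len({row[node] for row in data})
        # Counts for (child value, parent values) and for parent values alone.
        joint = Counter((row[node],) + tuple(row[p] for p in pars) for row in data)
        group = Counter(tuple(row[p] for p in pars) for row in data)
        for row in data:
            pv = tuple(row[p] for p in pars)
            p = (joint[(row[node],) + pv] + alpha) / (group[pv] + alpha * n_values)
            total += math.log(p)
    return total
```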
WEKA Search Algorithms. Example
• By default: K2.
• K2 starts with a given ordering of the attributes.
• It adds one node at a time, in order, and considers adding edges from each previously added node to the new node.
• It then adds the edge which maximizes the network score.
• The number of parents is restricted to a predefined maximum.
• The Markov blanket of a node includes all its parents, children and children's parents. It is proven that, given its Markov blanket, a node is conditionally independent of all other nodes. So an edge is added from the class node to any node which is not in the class node's Markov blanket; otherwise the value of that attribute would be irrelevant for the class.
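A sketch of the K2 loop under those rules, reusing the log_likelihood scorer from the previous sketch. This is a simplification: real K2 uses a Bayesian score that also penalizes extra parents, whereas here only the maximum-parents cap stops the greedy growth.

```python
def k2(data, order, max_parents=1, score=log_likelihood):
    """Greedy K2 search: visit nodes in the given attribute ordering and,
    for each node, repeatedly add the edge from an earlier node that most
    improves the network score, up to max_parents parents per node."""
    parents = {node: () for node in order}
    for i, node in enumerate(order):
        while len(parents[node]) < max_parents:
            current = score(data, parents)
            best_gain, best = 0.0, None
            # Candidate parents are only the nodes added before this one.
            for cand in order[:i]:
                if cand in parents[node]:
                    continue
                trial = {**parents, node: parents[node] + (cand,)}
                gain = score(data, trial) - current
                if gain > best_gain:
                    best_gain, best = gain, cand
            if best is None:  # no edge improves the score
                break
            parents[node] = parents[node] + (best,)
    return parents
```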
WEKA Exercise 3. Improving the network supplied as a file
• Bring up the window of the classifier's options
  – Type in the BIFFile box: weather.xml
• Run
• In the output window, find the comparison between the two networks: the one supplied and the one inferred by machine learning.
WEKA Exercise 4. Structure supplied by the user
• Bring up the window of the classifier's options
• In the searchAlgorithm row, press the Choose button
• Choose search -> fixed -> FromFile. OK
• Press the searchAlgorithm row to define the parameters
• Type in the BIFFile box: weather.xml (do NOT use the Open… button)
• Run
• Check that WEKA has produced the Naïve Bayes network, as it was supplied in your file
Tutorial Exercise 1. Build a Bayesian Network
• Suppose you are working for a financial institution and you are asked to build a fraud detection system. You plan to use the following information:
  – When the card holder is traveling abroad, fraudulent transactions are more likely, since tourists are prime targets for thieves. More precisely, 1% of transactions are fraudulent when the card holder is traveling, whereas only 0.2% of the transactions are fraudulent when he is not traveling. On average, 5% of all transactions happen while the card holder is traveling. If a transaction is fraudulent, then the likelihood of a foreign purchase increases, unless the card holder happens to be traveling. More precisely, when the card holder is not traveling, 10% of the fraudulent transactions are foreign purchases, whereas only 1% of the legitimate transactions are foreign purchases. On the other hand, when the card holder is traveling, 90% of the transactions are foreign purchases, regardless of the legitimacy of the transactions.
Network Structure
Nodes: Travel, Fraud, Foreign purchase. Edges: Travel -> Fraud, Travel -> Foreign purchase, Fraud -> Foreign purchase.
• Causal inference: increased probability of travel makes fraud more likely; travel can cause fraud.
• Diagnostic inference: increased probability of a foreign purchase makes fraud more likely; a foreign purchase is evidence for fraud.
• Intercausal inference: travel and fraud can each cause a foreign purchase; travel explains the foreign purchase and so is evidence against fraud.
Conditional probabilities

P(Travel):
  Travel=true   0.05
  Travel=false  0.95

P(Fraud | Travel):
  Travel=true:   Fraud=true 0.01,  Fraud=false 0.99
  Travel=false:  Fraud=true 0.002, Fraud=false 0.998

P(Foreign purchase | Travel, Fraud):
  Travel=true,  Fraud=true:   FP=true 0.90, FP=false 0.10
  Travel=false, Fraud=true:   FP=true 0.10, FP=false 0.90
  Travel=true,  Fraud=false:  FP=true 0.90, FP=false 0.10
  Travel=false, Fraud=false:  FP=true 0.01, FP=false 0.99
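For the tutorial exercises below, these tables are small enough to hold in plain Python dicts. A sketch (the variable names are ours):

```python
# P(Travel = true/false)
P_travel = {True: 0.05, False: 0.95}

# P(Fraud | Travel):  P_fraud[travel][fraud]
P_fraud = {True:  {True: 0.01,  False: 0.99},
           False: {True: 0.002, False: 0.998}}

# P(Foreign purchase = true | Travel, Fraud):  P_fp[(travel, fraud)]
P_fp = {(True,  True):  0.90, (True,  False): 0.90,
        (False, True):  0.10, (False, False): 0.01}
```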
Tutorial Exercise 2. Classify with hidden variables
• The system has detected a foreign purchase. What is the probability of fraud if we don't know whether the card holder is traveling or not?
• This is equivalent to classifying with hidden variables: [travel=?, foreign_purchase=true, fraud=?]

P(fraud=true|foreign-purchase=true) =
  α * Σ_travel [ P(fraud=true|travel) * P(foreign-purchase=true|travel, fraud=true) * P(travel) ]
= α * [ P(fraud=true|travel=true) * P(foreign-purchase=true|travel=true, fraud=true) * P(travel=true)
      + P(fraud=true|travel=false) * P(foreign-purchase=true|travel=false, fraud=true) * P(travel=false) ]
Tutorial Exercise 2 (continued)
Using the conditional probability tables above:

P(fraud=true|foreign-purchase=true) =
= α * [ P(fraud=true|travel=true) * P(foreign-purchase=true|travel=true, fraud=true) * P(travel=true)
      + P(fraud=true|travel=false) * P(foreign-purchase=true|travel=false, fraud=true) * P(travel=false) ]
= α * [ 0.01 * 0.90 * 0.05 + 0.002 * 0.10 * 0.95 ] = α * [ 0.00045 + 0.00019 ] = 0.00064 α

P(fraud=false|foreign-purchase=true) =
= α * [ P(fraud=false|travel=true) * P(foreign-purchase=true|travel=true, fraud=false) * P(travel=true)
      + P(fraud=false|travel=false) * P(foreign-purchase=true|travel=false, fraud=false) * P(travel=false) ]
= α * [ 0.99 * 0.90 * 0.05 + 0.998 * 0.01 * 0.95 ] = α * [ 0.04455 + 0.009481 ] = 0.054031 α

α = 1/(0.00064 + 0.054031)
P(fraud=true|foreign-purchase=true) = 0.00064/0.054671 ≈ 0.0117, i.e. about 1.2%
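The enumeration above can be checked with a short function over the CPT dicts from the earlier sketch (the function name is ours). Passing travel_known also answers Exercise 3 below, where travel is observed instead of hidden:

```python
def p_fraud(travel_known=None):
    """P(fraud=true | foreign-purchase=true), summing out travel when it
    is hidden, or conditioning on it when travel_known is True/False."""
    travels = [True, False] if travel_known is None else [travel_known]
    unnormalized = {}
    for fraud in (True, False):
        unnormalized[fraud] = sum(P_travel[t] * P_fraud[t][fraud] * P_fp[(t, fraud)]
                                  for t in travels)
    # Normalizing plays the role of α in the slides.
    return unnormalized[True] / (unnormalized[True] + unnormalized[False])

print(p_fraud())      # ~0.0117: about 1.2% with travel hidden (Exercise 2)
print(p_fraud(True))  # 0.0100: exactly 1.0% once travel=true is known (Exercise 3)
```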
Tutorial Exercise 3. Classify without the hidden variables
Suppose that a fraud probability of more than 1% causes an agent to call the client to confirm the transaction. An agent calls, but the card holder is not at home. Her spouse confirms that she is out of town on a business trip. How does the probability of fraud change based on this new piece of information?

P(fraud=true|foreign-purchase=true, travel=true) = α * 0.00045
P(fraud=false|foreign-purchase=true, travel=true) = α * 0.04455
α = 1/(0.00045 + 0.04455)
P(fraud=true|foreign-purchase=true, travel=true) = 1.0%
Tutorial Exercise 4 (Try it)
• We need to add more information to the network of Exercise 1. Purchases made over the internet are more likely to be fraudulent. This is especially true for card holders who do not own a computer. Currently, 60% of the population owns a computer, and for those card holders 1% of their legitimate transactions are done over the internet, but this number increases to 1.1% for fraudulent transactions. Unfortunately, the credit company doesn't know whether a card holder owns a computer; however, it can usually guess by verifying whether any of the recent transactions involve the purchase of computer-related accessories. In any given week, 10% of those who own a computer purchase at least one computer-related item with their credit card, as opposed to just 0.01% of those who don't own any computer. Incorporate this information into your system.
Expanded Bayes net for fraud detection
Nodes: Travel, Fraud, Foreign purchase, Own Computer, Internet Purchase, Computer Related Purchase.
New edges (in addition to those of the original network): Own Computer -> Internet Purchase, Fraud -> Internet Purchase, Own Computer -> Computer Related Purchase.
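The quantities that Exercise 4 does specify can be added to the dict-based sketch above; entries the text leaves open are marked None rather than invented, and filling them in is part of the exercise:

```python
# P(Own Computer = true/false)
P_own = {True: 0.60, False: 0.40}

# P(Computer Related Purchase = true | Own Computer), per week
P_crp = {True: 0.10, False: 0.0001}

# P(Internet Purchase = true | Own Computer, Fraud):  P_ip[(own, fraud)]
P_ip = {(True,  False): 0.010,  # owner, legitimate transaction
        (True,  True):  0.011,  # owner, fraudulent transaction
        (False, False): None,   # not specified in the exercise text
        (False, True):  None}   # not specified in the exercise text
```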
Tutorial Exercise 5
(Try it)
• Suppose the thief just stole a credit card. He knows how the fraud
detection system was set up, but he still wants to make an important
purchase over the internet. What can he do prior to his internet
purchase to reduce the risk that the transaction will be rejected as a
possible fraud?