Midterm Exam Answers
Test Form C.
· Clean data: someone who knows the data is very valuable here, and the actual cleaning usually requires human action.
· Data preparation: the learning method may benefit from new attributes that a person derives from existing ones. Human intelligence is needed to decide which derived attributes would be valuable.
· Determine what experiments to do: people decide which algorithms are appropriate for the current data and what to try.
· Evaluate results: people must decide whether the accuracy found in tests is good enough.
· Evaluate results: people must decide whether the information learned makes sense.
· Use results: people must decide what to do with what is learned.
· A total of ____9____ predictions were incorrect.
· Of the ____12____ times Cancer was predicted, this prediction was correct ___5_____ times.
· Of the ___6___ times Not Cancer was predicted, this prediction was correct ___4_____ times.
· Of the ___7_____ times that Cancer occurred, the prediction was correct ____5____ times and incorrect ___2____ times.
· Of the ___11_____ times that Not Cancer occurred, the prediction was correct ____4____ times and incorrect ___7____ times.
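The counts in these answers can be cross-checked with a short sketch. It assumes the four cells of the confusion matrix are the ones implied by the answers (5 and 2 for actual Cancer, 7 and 4 for actual Not Cancer); the dictionary layout and the `summarize` helper are illustrative names, not part of the exam.

```python
# Confusion-matrix cells implied by the answers above.
# Keys are (actual, predicted) pairs.
counts = {
    ("Cancer", "Cancer"): 5,          # Cancer occurred and was predicted
    ("Cancer", "Not Cancer"): 2,      # Cancer occurred but was missed
    ("Not Cancer", "Cancer"): 7,      # Cancer predicted but did not occur
    ("Not Cancer", "Not Cancer"): 4,  # Not Cancer occurred and was predicted
}


def summarize(counts):
    """Return totals per predicted label, per actual label, and errors."""
    labels = sorted({actual for actual, _ in counts})
    summary = {
        "total": sum(counts.values()),
        "incorrect": sum(n for (a, p), n in counts.items() if a != p),
    }
    for label in labels:
        summary[("predicted", label)] = sum(
            n for (a, p), n in counts.items() if p == label
        )
        summary[("occurred", label)] = sum(
            n for (a, p), n in counts.items() if a == label
        )
    return summary


summary = summarize(counts)
for key, value in summary.items():
    print(key, value)
```

Row sums give how often each class occurred and column sums give how often each class was predicted, so every blank in the answers is either a single cell or a sum of two cells.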
Probabilities with Laplace Estimator
Area | Purchase = Yes | Purchase = No
Mt Airy | 4/10 | 4/13
Germantown | 5/10 | 3/13
Manyunk | 1/10 | 6/13

Home | Purchase = Yes | Purchase = No
Own | 5/9 | 5/12
Rent | 4/9 | 7/12

Age | Purchase = Yes | Purchase = No
Young | 6/11 | 2/14
Established | 3/11 | 5/14
Middle Aged | 1/11 | 4/14
Old | 1/11 | 3/14

To Predict | Yes | No
Purchase | 8/19 | 11/19
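As a rough illustration of where these fractions come from, the Laplace estimator adds 1 to each raw count and adds the number of attribute values to the denominator. The raw counts below are not given in the answers; they are an assumption, back-derived by subtracting 1 from each numerator in the Area column for Purchase = Yes. The `laplace` function name is also illustrative.

```python
from fractions import Fraction


def laplace(raw_counts):
    """Laplace estimate for each value: (count + 1) / (total + number of values)."""
    denominator = sum(raw_counts.values()) + len(raw_counts)
    return {value: Fraction(n + 1, denominator) for value, n in raw_counts.items()}


# ASSUMPTION: raw counts inferred from the table (numerator minus 1),
# not stated in the original answers.
area_yes_raw = {"Mt Airy": 3, "Germantown": 4, "Manyunk": 0}

area_yes = laplace(area_yes_raw)
for value, prob in area_yes.items():
    # Fraction reduces automatically, so 4/10 is shown as 2/5.
    print(value, prob, float(prob))
```

A value that never occurs with a class (Manyunk here, under the assumed counts) still gets a nonzero probability, which keeps the product in the prediction step from collapsing to zero.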
Test Instance: Mt Airy, Rent, Established
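Prob(Yes | Evidence) is proportional to 4/10 * 4/9 * 3/11 * 8/19 = .0204
Prob(No | Evidence) is proportional to 4/13 * 7/12 * 5/14 * 11/19 = .0371
Predict No, since its value is higher. (These are unnormalized scores; dividing each by their sum, .0575, gives roughly .355 for Yes and .645 for No.)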
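The arithmetic for this test instance can be reproduced with a minimal sketch. The probabilities are copied from the Laplace tables for the three observed values only; the `score` helper and the variable names are illustrative, and exact fractions are used to avoid rounding surprises.

```python
from fractions import Fraction as F

# Laplace-smoothed probabilities for the values in the test instance,
# copied from the tables in this answer.
likelihoods = {
    "Yes": {"Mt Airy": F(4, 10), "Rent": F(4, 9), "Established": F(3, 11)},
    "No": {"Mt Airy": F(4, 13), "Rent": F(7, 12), "Established": F(5, 14)},
}
priors = {"Yes": F(8, 19), "No": F(11, 19)}


def score(label, evidence):
    """Unnormalized naive Bayes score: prior times each attribute likelihood."""
    result = priors[label]
    for value in evidence:
        result *= likelihoods[label][value]
    return result


evidence = ["Mt Airy", "Rent", "Established"]
scores = {label: score(label, evidence) for label in priors}

# Normalizing turns the scores into probabilities that sum to 1.
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}

prediction = max(scores, key=scores.get)
print({k: round(float(v), 4) for k, v in scores.items()})
print({k: round(float(v), 3) for k, v in posteriors.items()})
print("Predict", prediction)
```

Normalizing does not change which class wins, so comparing the raw products is enough to make the prediction.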
Instances sorted by Rating, showing the value of Buy. Tally the Yeses and Nos until one of them reaches at least 3, then continue until the Buy answer switches; the tally restarts at that point.
Rating | Buy | Num Yes | Num No
21 | No | 0 | 1
25 | Yes | 1 | 1
27 | No | 1 | 2
28 | Yes | 2 | 2
29 | No | 2 | 3
30 | No | 2 | 4
30 | No | 2 | 5
33 | No | 2 | 6
35 | No | 2 | 7
38 | No | 2 | 8
40 | Yes | 1 | 0
41 | Yes | 2 | 0
41 | No | 2 | 1
45 | No | 2 | 2
48 | No | 2 | 3
49 | No | 2 | 4
51 | No | 2 | 5
52 | Yes | 1 | 0
53 | Yes | 2 | 0
The dividing lines fall halfway between the neighboring Rating values, hence 39 (between 38 and 40) and 51.5 (between 51 and 52).
Technically, the first two categories could be collapsed into one category, since they have the same answer (No).
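The tallying procedure can be sketched in a few lines. This is one reading of the rule, under two assumptions: "the Buy answer switches" is taken to mean the next instance's Buy value differs from the current majority, and a cut is never placed between two equal Rating values. The function names are illustrative.

```python
# (Rating, Buy) pairs from the table, already sorted by Rating.
rows = [
    (21, "No"), (25, "Yes"), (27, "No"), (28, "Yes"), (29, "No"),
    (30, "No"), (30, "No"), (33, "No"), (35, "No"), (38, "No"),
    (40, "Yes"), (41, "Yes"), (41, "No"), (45, "No"), (48, "No"),
    (49, "No"), (51, "No"), (52, "Yes"), (53, "Yes"),
]


def discretize(rows, min_count=3):
    """Return cut points found by tallying classes over the sorted rows."""
    cuts, tally = [], {}
    for i, (value, label) in enumerate(rows):
        tally[label] = tally.get(label, 0) + 1
        if i + 1 == len(rows):
            break
        next_value, next_label = rows[i + 1]
        majority = max(tally, key=tally.get)
        # Cut once the majority has enough instances and the answer switches.
        if (tally[majority] >= min_count
                and next_label != majority
                and next_value != value):
            cuts.append((value + next_value) / 2)  # halfway in between
            tally = {}  # restart the tally for the next category
    return cuts


def majority_per_bin(rows, cuts):
    """Majority Buy value inside each category defined by the cut points."""
    bins = [{} for _ in range(len(cuts) + 1)]
    for value, label in rows:
        index = sum(value > cut for cut in cuts)
        bins[index][label] = bins[index].get(label, 0) + 1
    return [max(b, key=b.get) for b in bins]


def collapse(cuts, majorities):
    """Drop cut points that separate two categories with the same majority."""
    return [c for c, left, right in zip(cuts, majorities, majorities[1:])
            if left != right]


cuts = discretize(rows)
majorities = majority_per_bin(rows, cuts)
merged_cuts = collapse(cuts, majorities)
print(cuts, majorities, merged_cuts)
```

On this data the sketch yields the same two dividing lines, and collapsing the two adjacent No categories leaves only the cut at 51.5.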