Learning Problem

Given the structure and a data set (examples of settings of the observable variables), want to find probabilities to put in the CPT to maximize the probability of the data.

Use a gradient-descent-type of approach: write probability of data as a function of the CPT entries, compute how data probability changes as CPTs are moved, move CPT entries in the best direction. (Adaptive Probabilistic Networks: APNs).

Derivative can be computed nicely based on the log likelihood (monotonically related). To keep CPT entries properly normalized, we can project them (this is actually correct).

Next: Comparison to Neural Nets Up: LEARNING Previous: Known Structure, Hidden Variables