Given the structure and a data set (examples of settings of the observable variables), want to find probabilities to put in the CPT to maximize the probability of the data.
Use a gradient-descent-type of approach: write probability of data as a function of the CPT entries, compute how data probability changes as CPTs are moved, move CPT entries in the best direction. (Adaptive Probabilistic Networks: APNs).
Derivative can be computed nicely based on the log likelihood (monotonically related). To keep CPT entries properly normalized, we can project them (this is actually correct).