\[ P(A,B,C) = \sum_{D}P(A,B,C,D)=P(A,B,C,d^0)+P(A,B,C,d^1) \]
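As a quick numerical sanity check of what this marginalization does, here is a minimal sketch (assuming numpy and a randomly generated joint table over four binary variables) that sums \(D\) out of \(P(A,B,C,D)\):

```python
import numpy as np

# A hypothetical joint table P(A, B, C, D) over four binary variables,
# stored as a 2x2x2x2 array indexed as joint[a, b, c, d].
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()  # normalize so the entries sum to 1

# Marginalizing D out is just a sum over its axis:
# P(A, B, C) = sum_d P(A, B, C, d)
marginal_abc = joint.sum(axis=3)

print(marginal_abc.shape)  # (2, 2, 2)
print(marginal_abc.sum())  # ~1.0: still a valid distribution
```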
Maybe we can use a graphical representation that helps us simplify this computation?
Several options: Hidden Markov Models, Bayesian networks…
Bayesian networks take advantage of the conditional independences among the variables…
\[ P(D,I,G,S,L) = P(D)P(I)P(G|I,D)P(S|I)P(L|G) \]
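To make the factorization concrete, here is a sketch of the CPTs as numpy arrays. Only the \(P(I)\) and \(P(S|I)\) entries are the ones used later for \(P(s^0)\); the remaining numbers are placeholder values for this classic example and can be swapped for whatever tables you actually have:

```python
import numpy as np

# CPTs for P(D, I, G, S, L) = P(D) P(I) P(G|I,D) P(S|I) P(L|G).
# Only the P(I) and P(S|I) entries appear in the text; the other numbers
# are illustrative placeholders.
P_D = np.array([0.6, 0.4])            # P(D), indexed [d]
P_I = np.array([0.7, 0.3])            # P(I), indexed [i]
P_S_I = np.array([[0.95, 0.2],        # P(S|I), indexed [s, i]
                  [0.05, 0.8]])
P_L_G = np.array([[0.1, 0.4, 0.99],   # P(L|G), indexed [l, g]
                  [0.9, 0.6, 0.01]])
P_G_ID = np.array([                   # P(G|I,D), indexed [g, i, d]
    [[0.3, 0.05], [0.9, 0.5]],
    [[0.4, 0.25], [0.08, 0.3]],
    [[0.3, 0.7],  [0.02, 0.2]],
])

def joint(d, i, g, s, l):
    """Evaluate the factorized joint at one assignment."""
    return P_D[d] * P_I[i] * P_G_ID[g, i, d] * P_S_I[s, i] * P_L_G[l, g]

# Sanity check: the factors define a proper joint distribution.
total = sum(joint(d, i, g, s, l)
            for d in range(2) for i in range(2) for g in range(3)
            for s in range(2) for l in range(2))
print(total)  # ~1.0
```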
\[ P(L|s^0) = \frac{P(L,s^0)}{P(s^0)} = \sum_{D,I,G}\frac{P(D,I,G,L,s^0)}{P(s^0)} \] Applying the BN decomposition:
\[ P(L|s^0)= \frac{1}{P(s^0)}\sum_{D,I,G}P(D)P(I)P(G|I,D)P(s^0|I)P(L|G) \]
Calculating \(P(s^0)\):
\[ P(s^0) = \sum_{I}P(s^0,I) = \sum_{I}P(s^0|I)P(I) = P(s^0|i^0)P(i^0) + P(s^0|i^1)P(i^1) = 0.95\times0.7 + 0.2\times0.3 = 0.725 \]
Now, back to \(P(L|s^0)\), let’s eliminate variable \(D\):
\[ P(L|s^0)=\frac{1}{P(s^0)}\sum_{I,G}P(I)P(s^0|I)P(L|G)\sum_{D}P(D)P(G|I,D) \]
This gives us a table \(\tau_1(G,I)=\sum_{D}P(D)P(G|I,D)\)
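Continuing the numpy sketch from above (same placeholder CPTs), this step is a single contraction over \(D\):

```python
# Eliminate D: tau1[g, i] = sum_d P(d) * P(g | i, d)
tau1 = np.einsum('d,gid->gi', P_D, P_G_ID)
print(tau1.shape)  # (3, 2): one entry per (G, I) combination
```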
Now, we eliminate \(I\):
\[ P(L|s^0) = \frac{1}{P(s^0)}\sum_{G}P(L|G)\sum_{I}P(s^0|I)P(I)\tau_1(G,I) \]
which gives us a table \(\tau_2(G)=\sum_{I}P(s^0|I)P(I)\tau_1(G,I)\)
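Again continuing the sketch, eliminating \(I\) contracts \(\tau_1\) with \(P(s^0|I)P(I)\):

```python
# Eliminate I: tau2[g] = sum_i P(s0 | i) * P(i) * tau1[g, i]
# P_S_I[0] is the row for S = s0.
tau2 = np.einsum('i,i,gi->g', P_S_I[0], P_I, tau1)
print(tau2.shape)  # (3,): one entry per value of G
```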
Finally, we eliminate \(G\):
\[ P(L|s^0) = \frac{1}{P(s^0)}\sum_{G}P(L|G)\tau_2(G) \]
We thus get two numbers, \(P(L=l^0|s^0)\) and \(P(L=l^1|s^0)\), which give the probability of each outcome.
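Wrapping up the same sketch: the last contraction eliminates \(G\), and the normalizer that falls out is exactly \(P(s^0)\), which should match the 0.725 computed earlier (the placeholder CPTs only differ in factors that do not affect it):

```python
# Eliminate G: unnorm[l] = sum_g P(l | g) * tau2[g]   (this is P(L, s0))
unnorm = np.einsum('lg,g->l', P_L_G, tau2)
p_s0 = unnorm.sum()             # should be ~0.725, as computed above
p_L_given_s0 = unnorm / p_s0    # the two numbers P(l0 | s0) and P(l1 | s0)
print(p_s0, p_L_given_s0)
```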
How can we use data to fit a Bayesian network? There are two tasks: learning the structure of the graph and learning the parameters (the CPTs).
Sometimes we mix expert knowledge with data, using a Bayesian learning approach.
For example: what is the most probable graph \(G\) given that we have a dataset \(D\)? By Bayes’ theorem:
\[ P(G|D)=\frac{P(D|G)P(G)}{P(D)} \propto P(D|G)P(G) \]

* \(P(D|G)\) is the likelihood of the data given a structure and can be calculated by exploring different network structures; \(P(G)\) is called the prior and encodes expert knowledge (for example, certain structures may be assigned a higher probability based on what we expect).
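As a purely illustrative sketch, not a real structure search: the candidate graphs, log-likelihood values, and prior below are made up, only to show how \(P(D|G)\) and \(P(G)\) combine into a posterior over structures:

```python
import numpy as np

# Hypothetical scores for three candidate structures.
candidate_graphs = ['G1', 'G2', 'G3']
log_likelihood = np.array([-1520.3, -1498.7, -1501.2])  # made-up log P(D | G)
log_prior = np.log(np.array([0.5, 0.3, 0.2]))           # expert-chosen prior P(G)

# Posterior over the candidates: P(G | D) ∝ P(D | G) P(G),
# normalized in log space for numerical stability.
log_post_unnorm = log_likelihood + log_prior
log_post = log_post_unnorm - np.logaddexp.reduce(log_post_unnorm)
posterior = np.exp(log_post)

best = candidate_graphs[int(np.argmax(posterior))]
print(dict(zip(candidate_graphs, posterior.round(4))), 'best:', best)
```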