
Law of Total Probability (Discrete, From First Principles)

A short walkthrough of the law of total probability in the discrete setting, using notation common in TCS: a finite domain $\mathcal{X}$, distributions $p(x)$ on $\mathcal{X}$, and events as subsets of $\mathcal{X}$.


1. Discrete distributions on a finite domain

Let $\mathcal{X}$ be a finite domain of size $k$, say $\mathcal{X} = [k] = \{1,2,\dots,k\}$.

A discrete distribution on $\mathcal{X}$ is a function

$$p : \mathcal{X} \to [0,1]$$

such that

$$\sum_{x \in \mathcal{X}} p(x) = 1.$$

For an event $A \subseteq \mathcal{X}$, define

$$p(A) = \sum_{x \in A} p(x).$$
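These definitions map directly onto code. Here is a minimal Python sketch, with an illustrative three-point distribution (the numbers are placeholders, not from the text):

```python
# A discrete distribution on a finite domain, as a dict x -> p(x).
# The values are illustrative placeholders.
p = {1: 0.2, 2: 0.5, 3: 0.3}

# Sanity check: p must sum to 1 over the domain.
assert abs(sum(p.values()) - 1.0) < 1e-12

def prob(event, p):
    """p(A) = sum of p(x) over x in the event A (a subset of the domain)."""
    return sum(p[x] for x in event)

print(prob({1, 3}, p))  # 0.2 + 0.3, i.e. approximately 0.5
```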

Two basic rules:

  1. Additivity for disjoint events:
     if $A_1,\dots,A_m \subseteq \mathcal{X}$ are pairwise disjoint, then
     $$p\Big(\bigcup_{i=1}^m A_i\Big) = \sum_{i=1}^m p(A_i).$$
  2. Conditional probability (with $p(B) > 0$):
     $$p(A \mid B) = \frac{p(A \cap B)}{p(B)}.$$

Rewriting:

$$p(A \cap B) = p(A \mid B)\,p(B).$$
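Both the definition and the product-rule rewriting can be checked mechanically. A small Python sketch; the distribution and the events $A$, $B$ below are hypothetical, chosen only for illustration:

```python
def prob(event, p):
    """p(A) = sum of p(x) over x in A."""
    return sum(p[x] for x in event)

def cond_prob(a, b, p):
    """p(A | B) = p(A ∩ B) / p(B), defined only when p(B) > 0."""
    pb = prob(b, p)
    if pb == 0:
        raise ValueError("p(B) must be positive")
    return prob(a & b, p) / pb

# Illustrative distribution and events (placeholder numbers).
p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
A, B = {1, 2}, {2, 3}

# Product rule: p(A ∩ B) = p(A | B) p(B).
assert abs(prob(A & B, p) - cond_prob(A, B, p) * prob(B, p)) < 1e-12
```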

2. Partitions and the law of total probability

A family of events $B_1, \dots, B_m \subseteq \mathcal{X}$ is a partition of $\mathcal{X}$ if

  1. $B_i \cap B_j = \varnothing$ for all $i \neq j$ (disjoint),
  2. $\bigcup_{i=1}^m B_i = \mathcal{X}$ (they cover the domain).

Equivalently: exactly one $B_i$ holds for each $x \in \mathcal{X}$.
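Both conditions are easy to verify programmatically. A small helper, applied to an illustrative partition of $\{1,\dots,6\}$ (not an example from the text):

```python
from itertools import combinations

def is_partition(blocks, domain):
    """True iff the blocks are pairwise disjoint and cover the domain."""
    disjoint = all(b1.isdisjoint(b2) for b1, b2 in combinations(blocks, 2))
    covers = set().union(*blocks) == set(domain)
    return disjoint and covers

domain = {1, 2, 3, 4, 5, 6}
print(is_partition([{1, 2}, {3, 4}, {5, 6}], domain))  # True
print(is_partition([{1, 2}, {2, 3}, {4, 5, 6}], domain))  # False: 2 appears twice
```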

Take any event $A \subseteq \mathcal{X}$. Because the $B_i$ cover $\mathcal{X}$,

$$A = \bigcup_{i=1}^m (A \cap B_i),$$

and the sets $A \cap B_i$ are pairwise disjoint, because the $B_i$ are.

By additivity,

$$p(A) = \sum_{i=1}^m p(A \cap B_i).$$

Using $p(A \cap B_i) = p(A \mid B_i)\,p(B_i)$ (valid whenever $p(B_i) > 0$; terms with $p(B_i) = 0$ contribute nothing), we get

$$p(A) = \sum_{i=1}^m p(A \mid B_i)\,p(B_i).$$

This is the law of total probability in this notation:

$$\boxed{p(A) = \sum_{i=1}^m p(A \mid B_i)\,p(B_i).}$$
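The boxed identity can be sanity-checked numerically. A short sketch with an illustrative distribution and partition (placeholder numbers, not from the text):

```python
def prob(event, p):
    """p(A) = sum of p(x) over x in A."""
    return sum(p[x] for x in event)

# Illustrative distribution on {1,2,3,4} and a partition of that domain.
p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
blocks = [{1, 2}, {3, 4}]
A = {2, 3}

# Right-hand side: sum_i p(A | B_i) p(B_i).
rhs = sum((prob(A & b, p) / prob(b, p)) * prob(b, p) for b in blocks)

# Left-hand side computed directly: p(A).
assert abs(prob(A, p) - rhs) < 1e-12
```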

3. Coin example in this notation

We define the randomness of the process as a distribution $p$ over a finite domain $\mathcal{X}$.

Setup

There are two coins:

  • Coin $F$ (fair): $p(\text{H} \mid F) = 0.5$
  • Coin $B$ (biased): $p(\text{H} \mid B) = 0.9$

Step 1: pick a coin.

  • $p(F) = 0.7$
  • $p(B) = 0.3$

Step 2: flip the chosen coin once.

Define the domain

$$\mathcal{X} = \{(F,H), (F,T), (B,H), (B,T)\}.$$

The distribution $p$ on $\mathcal{X}$ is:

  • $p(F,H) = p(F)\,p(H \mid F) = 0.7 \cdot 0.5 = 0.35$
  • $p(F,T) = p(F)\,p(T \mid F) = 0.7 \cdot 0.5 = 0.35$
  • $p(B,H) = p(B)\,p(H \mid B) = 0.3 \cdot 0.9 = 0.27$
  • $p(B,T) = p(B)\,p(T \mid B) = 0.3 \cdot 0.1 = 0.03$

Check:

$$\sum_{x \in \mathcal{X}} p(x) = 0.35 + 0.35 + 0.27 + 0.03 = 1.$$

We care about the event

$$A = \{(F,H), (B,H)\} \quad \text{(the flip is Heads)}.$$

Then

$$p(A) = p(F,H) + p(B,H) = 0.35 + 0.27 = 0.62.$$
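The coin example translates directly into code. This sketch uses exactly the numbers from the setup above:

```python
# The two-coin process: p(coin, outcome) = p(coin) * p(outcome | coin).
p = {
    ("F", "H"): 0.7 * 0.5,  # 0.35
    ("F", "T"): 0.7 * 0.5,  # 0.35
    ("B", "H"): 0.3 * 0.9,  # 0.27
    ("B", "T"): 0.3 * 0.1,  # 0.03
}

# Check that p is a distribution.
assert abs(sum(p.values()) - 1.0) < 1e-12

# Event A: the flip is Heads.
A = {("F", "H"), ("B", "H")}
p_A = sum(p[x] for x in A)
print(round(p_A, 2))  # 0.62
```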

Now let’s express the same calculation using the law of total probability.

Using a partition

Define events in $\mathcal{X}$:

  • $B_1$: “coin is $F$”, i.e. $B_1 = \{(F,H), (F,T)\}$
  • $B_2$: “coin is $B$”, i.e. $B_2 = \{(B,H), (B,T)\}$

Then $\{B_1, B_2\}$ is a partition of $\mathcal{X}$.

We have:

  • $p(B_1) = p(F) = 0.7$
  • $p(B_2) = p(B) = 0.3$
  • $p(A \mid B_1) = p(\text{H} \mid F) = 0.5$
  • $p(A \mid B_2) = p(\text{H} \mid B) = 0.9$

Apply the law:

$$p(A) = p(A \mid B_1)\,p(B_1) + p(A \mid B_2)\,p(B_2) = (0.5)(0.7) + (0.9)(0.3) = 0.35 + 0.27 = 0.62.$$

Same result, but now written as a sum of conditional probabilities over a partition.
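The same 0.62 falls out of the partition form, keeping only $p(B_i)$ and $p(A \mid B_i)$ and never enumerating the domain:

```python
# The law of total probability on the coin example: only the case
# probabilities p(B_i) and the conditionals p(A | B_i) are needed.
p_coin = {"F": 0.7, "B": 0.3}          # p(B_1), p(B_2)
p_heads_given = {"F": 0.5, "B": 0.9}   # p(A | B_1), p(A | B_2)

p_A = sum(p_heads_given[c] * p_coin[c] for c in p_coin)
print(round(p_A, 2))  # 0.62, matching the direct computation
```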


4. Randomized algorithms: how to use this

View the randomness of an algorithm as a distribution $p$ over a finite domain $\mathcal{X}$ (all possible sequences of random choices, random bits, etc.).

Let:

  • B1,,BmXB_1, \dots, B_m \subseteq \mathcal{X} be a partition encoding cases:
    e.g. “pivot is good”, “pivot is bad”, or “hash function is ii”.
  • AXA \subseteq \mathcal{X} be the event that the algorithm succeeds (or runs in time T\le T, etc.).

Then

$$p(A) = \sum_{i=1}^m p(A \mid B_i)\,p(B_i).$$

This decomposes a probability that is hard to compute directly into smaller conditional pieces:

  1. Partition the randomness into cases $B_1,\dots,B_m$: e.g. “Case 1: pivot is good”, “Case 2: pivot is bad”.
  2. For each case $i$, compute or bound $p(A \mid B_i)$.
  3. Compute or bound $p(B_i)$.
  4. Plug into $p(A) = \sum_{i=1}^m p(A \mid B_i)\,p(B_i)$.
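The four-step recipe might look like this in code. The pivot cases and all probabilities below are hypothetical, chosen only to illustrate the bookkeeping:

```python
# Hypothetical case analysis for a randomized algorithm.
# Each case carries p(B_i) and a bound on p(success | B_i).
cases = {
    "good pivot": {"p_case": 0.5, "p_success_given_case": 0.9},
    "bad pivot":  {"p_case": 0.5, "p_success_given_case": 0.4},
}

# The cases must form a partition of the randomness: p(B_i) sums to 1.
assert abs(sum(c["p_case"] for c in cases.values()) - 1.0) < 1e-12

# Law of total probability: p(A) = sum_i p(A | B_i) p(B_i).
p_success = sum(c["p_success_given_case"] * c["p_case"] for c in cases.values())
print(round(p_success, 2))  # 0.45 + 0.20 = 0.65
```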

This is exactly the law of total probability applied to the distribution pp induced by the algorithm’s randomness.


5. Summary

  1. Fix a finite domain $\mathcal{X}$ and a distribution $p$ on it.
  2. Choose a partition $B_1,\dots,B_m$ of $\mathcal{X}$.
  3. For any event $A \subseteq \mathcal{X}$, write $A = \bigcup_{i=1}^m (A \cap B_i)$ with the sets $A \cap B_i$ disjoint.
  4. Use additivity: $p(A) = \sum_{i=1}^m p(A \cap B_i)$.
  5. Use conditional probability: $p(A \cap B_i) = p(A \mid B_i)\,p(B_i)$.
  6. Combine: $p(A) = \sum_{i=1}^m p(A \mid B_i)\,p(B_i)$.