
Probability notes from set theory to more advanced

Set theory

  • Universal set (sample space) $\Omega$: all possible outcomes.
  • Event: any subset $A \subseteq \Omega$.
  • Empty event $\emptyset$: never occurs.
    Certain event $\Omega$: always occurs.
  • Set operations:
    • Union: $A \cup B$ — “$A$ or $B$ happens.”
    • Intersection: $A \cap B$ — “$A$ and $B$ happen.”
    • Complement: $A^c = \Omega \setminus A$ — “$A$ does not happen.”
    • Difference: $A \setminus B = A \cap B^c$.
  • Disjoint (mutually exclusive): $A \cap B = \emptyset$.

These operations satisfy Boolean algebra rules:

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C), \quad (A^c)^c = A.$$
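These operations and identities can be checked directly with Python's built-in sets. A minimal sketch, using a hypothetical single-die sample space (the specific events are illustrative, not from the notes):

```python
# Sample space for one roll of a fair die (hypothetical example).
omega = set(range(1, 7))
A = {2, 4, 6}          # "roll is even"
B = {4, 5, 6}          # "roll is at least 4"
C = {1, 2}             # a third event, for the distributive law

union = A | B                  # A or B happens
intersection = A & B           # A and B happen
complement_A = omega - A       # A does not happen
difference = A - B             # A \ B

assert difference == A & (omega - B)        # A \ B = A ∩ B^c
assert A | (B & C) == (A | B) & (A | C)     # distributivity
assert omega - (omega - A) == A             # (A^c)^c = A
```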

Sigma-algebra

To define probabilities consistently, we use a σ-algebra $\mathcal{F}$ over $\Omega$ — the collection of measurable events:

  1. $\Omega \in \mathcal{F}$.
  2. If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
  3. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_i A_i \in \mathcal{F}$.

This ensures we can take complements and countable unions safely.
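For a finite collection of events the axioms can be verified mechanically. A sketch with a hypothetical helper `is_sigma_algebra` (for a finite collection, closure under countable unions reduces to closure under pairwise unions):

```python
def is_sigma_algebra(omega, F):
    """Check the three sigma-algebra axioms on a finite collection F."""
    omega = frozenset(omega)
    F = {frozenset(s) for s in F}
    if omega not in F:                            # axiom 1: contains Omega
        return False
    if any(omega - A not in F for A in F):        # axiom 2: closed under complement
        return False
    return all(A | B in F for A in F for B in F)  # axiom 3 (finite case: pairwise unions)

omega = {1, 2, 3, 4}
trivial = [set(), omega]                  # the smallest sigma-algebra
F = [set(), {1, 2}, {3, 4}, omega]        # generated by the event {1, 2}
not_closed = [set(), {1}, omega]          # {1}^c = {2, 3, 4} is missing
```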

Basic probability spaces

  • Sample space: $\Omega$ (all outcomes). An event is a subset $A \subseteq \Omega$.
  • Probability measure $\Pr[\cdot]$ satisfies $0 \le \Pr[A] \le 1$, $\Pr[\Omega] = 1$, and countable additivity.
  • Example: fair coin $\Omega = \{H, T\}$, $\Pr[H] = \Pr[T] = 1/2$.
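A sketch of such a finite probability space in Python, using exact rationals to avoid floating-point error (`pr` is a hypothetical helper, not a standard API):

```python
from fractions import Fraction

# A finite probability space: a dict from outcome to probability.
fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def pr(space, event):
    """Pr[A]: sum the probabilities of the outcomes in A."""
    return sum((space[w] for w in event), Fraction(0))

assert Fraction(0) <= pr(fair_coin, {"H"}) <= 1    # 0 <= Pr[A] <= 1
assert pr(fair_coin, set(fair_coin)) == 1          # Pr[Omega] = 1
```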

Conditional probability & product rule

  • Conditional probability of $A$ given $B$ (with $\Pr[B] > 0$): $\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}$.
  • Product rule (equivalent): $\Pr[A \cap B] = \Pr[A \mid B]\Pr[B] = \Pr[B \mid A]\Pr[A]$.
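Both definitions can be checked exactly on a small example. A sketch using a hypothetical two-fair-dice space:

```python
from fractions import Fraction

# Two fair dice: uniform over 36 outcomes (hypothetical example).
space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum(space[w] for w in event)

A = {w for w in space if w[0] + w[1] == 7}   # "the sum is 7"
B = {w for w in space if w[0] == 3}          # "the first die shows 3"

cond_A_given_B = pr(A & B) / pr(B)           # Pr[A | B]

# Product rule, both ways:
assert pr(A & B) == cond_A_given_B * pr(B)
assert pr(A & B) == (pr(A & B) / pr(A)) * pr(A)
```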

Independence

  • $A$ and $B$ are independent iff $\Pr[A \cap B] = \Pr[A]\Pr[B]$, equivalently $\Pr[A \mid B] = \Pr[A]$ (when $\Pr[B] > 0$).
  • For random variables $X, Y$: independence means the events $\{X \in I\}$ and $\{Y \in J\}$ are independent for all measurable $I, J$.
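A quick check of the first definition on the same hypothetical two-dice space (the events and the `independent` helper are illustrative assumptions):

```python
from fractions import Fraction

space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum((space[w] for w in event), Fraction(0))

def independent(A, B):
    return pr(A & B) == pr(A) * pr(B)

seven = {w for w in space if w[0] + w[1] == 7}    # "sum is 7"
first3 = {w for w in space if w[0] == 3}          # "first die is 3"
high = {w for w in space if w[0] + w[1] >= 10}    # "sum is at least 10"

assert independent(seven, first3)       # 1/36 == (1/6) * (1/6)
assert not independent(high, first3)    # 0 != (1/6) * (1/6)
```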

Law of total probability

  • If $B_1, \dots, B_n$ partition $\Omega$ (mutually disjoint, $\bigcup_i B_i = \Omega$), then for any event $A$: $\Pr[A] = \sum_{i=1}^n \Pr[A \mid B_i]\Pr[B_i]$.
  • Useful for expanding $\Pr[A]$ by conditioning on cases.
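The partition identity can be verified exactly. A sketch, partitioning the hypothetical two-dice space by the value of the first die:

```python
from fractions import Fraction

space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum(space[w] for w in event)

A = {w for w in space if w[0] + w[1] == 7}                   # "sum is 7"
# B_1, ..., B_6: partition by the value of the first die.
partition = [{w for w in space if w[0] == i} for i in range(1, 7)]

# Pr[A] = sum_i Pr[A | B_i] * Pr[B_i]
total = sum(pr(A & B) / pr(B) * pr(B) for B in partition)
assert total == pr(A)
```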

Expectation and indicator variables

  • Indicator of event $A$: $1_A(\omega) = 1$ if $\omega \in A$, else $0$.
  • Expectation of an indicator: $\mathbb{E}[1_A] = \Pr[A]$.
  • Linearity: $\mathbb{E}[\sum_i X_i] = \sum_i \mathbb{E}[X_i]$ (no independence required).
  • For a discrete r.v. $X$: $\mathbb{E}[X] = \sum_x x \Pr[X = x]$.
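Both the indicator identity and linearity are easy to confirm numerically. A sketch on the same hypothetical two-dice space (`expect` is an assumed helper):

```python
from fractions import Fraction

space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum(space[w] for w in event)

def expect(X):
    """E[X] for a random variable X given as a function on outcomes."""
    return sum(prob * X(w) for w, prob in space.items())

sum7 = {w for w in space if w[0] + w[1] == 7}
indicator = lambda w: 1 if w in sum7 else 0

assert expect(indicator) == pr(sum7)        # E[1_A] = Pr[A]

# Linearity of expectation (no independence needed):
first = lambda w: w[0]
total = lambda w: w[0] + w[1]
assert expect(total) == expect(first) + expect(lambda w: w[1])
```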

Useful identities

  • $\Pr[A] = \mathbb{E}[1_A]$.
  • $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$.
  • If $B$ is a Bernoulli($p$) bit independent of other randomness: $\Pr[A] = \Pr[A \mid B=0]\Pr[B=0] + \Pr[A \mid B=1]\Pr[B=1]$ (a special case of the law of total probability).
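Inclusion–exclusion can likewise be checked directly; a sketch on the hypothetical two-dice space:

```python
from fractions import Fraction

space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum((space[w] for w in event), Fraction(0))

A = {w for w in space if w[0] + w[1] == 7}     # "sum is 7"
B = {w for w in space if w[0] == w[1]}         # "doubles"

# Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[A ∩ B]  (here A and B happen to be disjoint)
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
```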

Short worked example (the paper’s setup)

Arthur chooses $b \in \{0,1\}$ uniformly. If $b = 0$ he samples $x_0 \sim u_k$; if $b = 1$ he samples $x_1 \sim p$. He sends $x_b$ to Merlin, who outputs $\hat b = 1_{S_p}(x_b)$ where $S_p = \{x : p(x) \ge 1/k\}$. Arthur accepts iff $\hat b = b$.

Compute $\Pr[\hat b = b]$:

  1. Condition on $b$ (law of total probability): $\Pr[\hat b = b] = \Pr[\hat b = b \mid b = 0]\Pr[b = 0] + \Pr[\hat b = b \mid b = 1]\Pr[b = 1]$.
  2. Evaluate each term:
    • If $b = 0$: Merlin is correct iff $x_0 \notin S_p$, so $\Pr[\hat b = b \mid b = 0] = u_k(S_p^c) = 1 - u_k(S_p)$.
    • If $b = 1$: Merlin is correct iff $x_1 \in S_p$, so $\Pr[\hat b = b \mid b = 1] = p(S_p)$.
  3. Since $\Pr[b = 0] = \Pr[b = 1] = 1/2$, $\Pr[\hat b = b] = \tfrac12\big(1 - u_k(S_p) + p(S_p)\big) = \tfrac12\big(1 + (p(S_p) - u_k(S_p))\big)$.
  4. Using $p(S_p) - u_k(S_p) = d_{TV}(p, u_k)$ (the Scheffé set), if $d_{TV}(p, u_k) \ge \varepsilon_2$ then $\Pr[\hat b = b] \ge \tfrac12(1 + \varepsilon_2)$.
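The whole calculation can be replayed on a concrete instance. A sketch with a hypothetical distribution $p$ over a domain of size $k = 4$ (the numbers are invented for illustration):

```python
from fractions import Fraction

# Hypothetical instance: domain {0, ..., k-1} with k = 4.
k = 4
u = {x: Fraction(1, k) for x in range(k)}                       # uniform u_k
p = {0: Fraction(1, 2), 1: Fraction(1, 4),
     2: Fraction(1, 8), 3: Fraction(1, 8)}                      # some p != u_k

S = {x for x in p if p[x] >= Fraction(1, k)}                    # Scheffé set S_p

def mass(q, E):
    return sum((q[x] for x in E), Fraction(0))

adv = mass(p, S) - mass(u, S)
d_tv = Fraction(1, 2) * sum(abs(p[x] - u[x]) for x in p)
assert adv == d_tv                          # the Scheffé set achieves d_TV

# Pr[b_hat = b] = 1/2 * (1 - u(S)) + 1/2 * p(S) = 1/2 * (1 + d_TV)
success = Fraction(1, 2) * (1 - mass(u, S)) + Fraction(1, 2) * mass(p, S)
assert success == Fraction(1, 2) * (1 + d_tv)
```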

The proof of Claim 4.4 is a classic averaging argument, a technique that is fundamental in complexity theory and cryptography. Let’s walk through it.


1. Summary of the Soundness Proof (Claim 4.4)

The high-level goal of the soundness proof is to show that if the distributions $p$ and $u_k$ are close (i.e., $d_{TV}(p, u_k) \le \epsilon_1$), then no prover $M^*$, no matter how cleverly designed, can succeed with probability much better than random guessing.

The proof proceeds in two logical steps:

  1. First, we prove this bound for all deterministic provers.
  2. Second, we show that this implies the bound for all randomized provers.

1.1 Bounding any Deterministic Prover ($M^*_S$)

A deterministic prover $M^*_d$ is just a fixed function $f : \mathcal{X} \to \{0, 1\}$. This is mathematically equivalent to the prover picking a fixed subset $S \subseteq \mathcal{X}$ and using the strategy: “Guess 1 if $x_b \in S$, and guess 0 otherwise.” We call this strategy $M^*_S$.

The proof follows this path:

  1. Write the Success Probability: For any such $M^*_S$, we calculated its exact success probability: $$P(\text{success} \mid M^*_S) = \frac{1}{2} \cdot P(x_0 \notin S) + \frac{1}{2} \cdot P(x_1 \in S) = \frac{1}{2}\big(1 - u_k(S) + p(S)\big) = \frac{1}{2}\big(1 + (p(S) - u_k(S))\big)$$

  2. Apply the Soundness Promise: We are promised that the distributions are close: $d_{TV}(p, u_k) \le \epsilon_1$.

  3. Use the Definition of TV Distance: The total variation distance is defined as the maximum possible “advantage” over all possible sets: $$d_{TV}(p, u_k) = \sup_{S' \subseteq \mathcal{X}} \big(p(S') - u_k(S')\big)$$

    This means that for the specific set $S$ our prover $M^*_S$ chose, its advantage $(p(S) - u_k(S))$ must be less than or equal to the maximum possible advantage.

    Therefore, $(p(S) - u_k(S)) \le d_{TV}(p, u_k)$.

  4. Chain the Inequalities: We can now chain steps 2 and 3 together to get our key bound: $$(p(S) - u_k(S)) \le d_{TV}(p, u_k) \le \epsilon_1$$

  5. Conclude for Deterministic Provers: We substitute this bound back into the success probability equation from step 1: $$P(\text{success} \mid M^*_S) = \frac{1}{2}\big(1 + (p(S) - u_k(S))\big) \le \frac{1}{2}(1 + \epsilon_1)$$

    This proves that no deterministic strategy can do better than $\frac{1}{2}(1 + \epsilon_1)$.
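This bound can be confirmed by brute force on a small instance: enumerate every subset $S$ and check that none beats $\tfrac12(1 + d_{TV})$. A sketch (the distribution $p$ is a hypothetical example):

```python
from fractions import Fraction
from itertools import chain, combinations

k = 4
u = {x: Fraction(1, k) for x in range(k)}
p = {0: Fraction(1, 2), 1: Fraction(1, 4),
     2: Fraction(1, 8), 3: Fraction(1, 8)}

def mass(q, E):
    return sum((q[x] for x in E), Fraction(0))

def subsets(xs):
    """All 2^k subsets of xs, as tuples."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# Success probability of the deterministic prover M*_S, for every possible S:
best = max(Fraction(1, 2) * (1 + mass(p, S) - mass(u, S))
           for S in subsets(range(k)))
d_tv = Fraction(1, 2) * sum(abs(p[x] - u[x]) for x in p)
assert best == Fraction(1, 2) * (1 + d_tv)    # no set S does better
```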

1.2 Extending to any Randomized Prover ($M^*_r$)

This is the “convexity” or “averaging” part of the argument.

  1. Define a Randomized Prover: A randomized prover $M^*_r$ is simply an algorithm that randomly chooses which deterministic strategy $M^*_S$ to run. Formally, it is a probability distribution $\mathcal{D}$ over all possible sets $S$.

  2. Write the Success Probability: The overall success of $M^*_r$ is the average (expected) success of the deterministic strategies it might pick, weighted by the probability under $\mathcal{D}$ of picking them: $$P(\text{success} \mid M^*_r) = E_{M^*_S \sim \mathcal{D}}\big[P(\text{success} \mid M^*_S)\big]$$

  3. Apply the Deterministic Bound: We proved in part 1.1 that every single term inside that expectation is bounded: $P(\text{success} \mid M^*_S) \le \frac{1}{2}(1 + \epsilon_1)$ for all $S$.

  4. Conclude for Randomized Provers: If every value in an average is less than or equal to some number $c$, the average itself must also be less than or equal to $c$.

    Therefore, $E_{M^*_S \sim \mathcal{D}}[\dots] \le \frac{1}{2}(1 + \epsilon_1)$.

    This completes the proof: no strategy, deterministic or randomized, can break the soundness bound.
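The averaging step can also be checked concretely: pick any distribution $\mathcal{D}$ over subsets and verify that the expected success never exceeds the best deterministic success. A sketch with a hypothetical uniform $\mathcal{D}$:

```python
from fractions import Fraction
from itertools import chain, combinations

k = 4
u = {x: Fraction(1, k) for x in range(k)}
p = {0: Fraction(1, 2), 1: Fraction(1, 4),
     2: Fraction(1, 8), 3: Fraction(1, 8)}

def mass(q, E):
    return sum((q[x] for x in E), Fraction(0))

def success(S):
    """Success probability of the deterministic prover M*_S."""
    return Fraction(1, 2) * (1 + mass(p, S) - mass(u, S))

all_S = list(chain.from_iterable(combinations(range(k), r)
                                 for r in range(k + 1)))
# An arbitrary randomized prover: a distribution D over the 2^k subsets
# (uniform here, but any D gives the same conclusion).
D = {S: Fraction(1, len(all_S)) for S in all_S}

expected = sum(D[S] * success(S) for S in all_S)
best = max(success(S) for S in all_S)
assert expected <= best       # an average can never beat the maximum
```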


2. Glossary of Our Formal Concepts

Here are the remaining definitions, which form the conceptual toolkit for understanding the proof.

  • Honest Prover ($M$) vs. Adversarial Prover ($M^*$):

    • $M$ (no star) is the honest prover who follows the protocol as written (e.g., in Algorithm 1, $M$ uses the Scheffé set for its strategy).
    • $M^*$ (with star) is an adversarial or arbitrary prover. This represents any possible algorithm (even a malicious, cheating one) that the prover might use. The soundness proof must hold “for all $M^*$.”
  • Deterministic Strategy ($M^*_d$):

    • A strategy that involves no randomness. Given the same input sample $x_b$, it always produces the same guess $\hat{b}$.
    • It is just a fixed function $f : \mathcal{X} \to \{0, 1\}$.
  • Randomized Strategy ($M^*_r$):

    • A strategy that uses internal randomness. It can be seen as a probability distribution $\mathcal{D}$ over all possible deterministic strategies.
    • It first “flips its coins” (picks a random seed $r$) to select a deterministic strategy $f_r$, and then executes $f_r$.
  • Set-Based Strategy ($M^*_S$):

    • This is just more descriptive notation for $M^*_d$.
    • It makes explicit that every deterministic strategy $f$ is uniquely defined by a single subset $S \subseteq \mathcal{X}$, where $S = \{x \mid f(x) = 1\}$.
    • Saying “Merlin chooses $M^*_d$” is identical to saying “Merlin chooses $S \subseteq \mathcal{X}$.”