
Probability notes from set theory to more advanced

Set theory

  • Universal set (sample space) $\Omega$: all possible outcomes.
  • Event: any subset $A \subseteq \Omega$.
  • Empty event $\emptyset$: never occurs.
    Certain event $\Omega$: always occurs.
  • Set operations:
    • Union: $A \cup B$ — “$A$ or $B$ happens.”
    • Intersection: $A \cap B$ — “$A$ and $B$ happen.”
    • Complement: $A^c = \Omega \setminus A$ — “$A$ does not happen.”
    • Difference: $A \setminus B = A \cap B^c$.
  • Disjoint (mutually exclusive): $A \cap B = \emptyset$.

These operations satisfy Boolean algebra rules:

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C), \quad (A^c)^c = A.$$
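These operations and identities can be checked directly with Python's built-in sets. A minimal sketch, using a hypothetical single-die sample space (the specific events are illustrative, not from the notes):

```python
# Sample space for one roll of a fair die (hypothetical example).
omega = set(range(1, 7))
A = {2, 4, 6}          # "roll is even"
B = {4, 5, 6}          # "roll is at least 4"
C = {1, 2}             # a third event, for the distributive law

union = A | B                  # A or B happens
intersection = A & B           # A and B happen
complement_A = omega - A       # A does not happen
difference = A - B             # A \ B

assert difference == A & (omega - B)        # A \ B = A ∩ B^c
assert A | (B & C) == (A | B) & (A | C)     # distributivity
assert omega - (omega - A) == A             # (A^c)^c = A
```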

Sigma-algebra

To define probabilities consistently, we use a σ-algebra $\mathcal{F}$ over $\Omega$ — the collection of measurable events:

  1. $\Omega \in \mathcal{F}$.
  2. If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
  3. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_i A_i \in \mathcal{F}$.

This ensures we can take complements and countable unions safely.
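For a finite collection of events the axioms can be verified mechanically. A sketch with a hypothetical helper `is_sigma_algebra` (for a finite collection, closure under countable unions reduces to closure under pairwise unions):

```python
def is_sigma_algebra(omega, F):
    """Check the three sigma-algebra axioms on a finite collection F."""
    omega = frozenset(omega)
    F = {frozenset(s) for s in F}
    if omega not in F:                            # axiom 1: contains Omega
        return False
    if any(omega - A not in F for A in F):        # axiom 2: closed under complement
        return False
    return all(A | B in F for A in F for B in F)  # axiom 3 (finite case: pairwise unions)

omega = {1, 2, 3, 4}
trivial = [set(), omega]                  # the smallest sigma-algebra
F = [set(), {1, 2}, {3, 4}, omega]        # generated by the event {1, 2}
not_closed = [set(), {1}, omega]          # {1}^c = {2, 3, 4} is missing
```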

Basic probability spaces

  • Sample space: $\Omega$ (all outcomes). An event is a subset $A \subseteq \Omega$.
  • Probability measure $\Pr[\cdot]$ satisfies $0 \le \Pr[A] \le 1$, $\Pr[\Omega] = 1$, and countable additivity.
  • Example: fair coin $\Omega = \{H, T\}$, $\Pr[H] = \Pr[T] = 1/2$.
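A sketch of such a finite probability space in Python, using exact rationals to avoid floating-point error (`pr` is a hypothetical helper, not a standard API):

```python
from fractions import Fraction

# A finite probability space: a dict from outcome to probability.
fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def pr(space, event):
    """Pr[A]: sum the probabilities of the outcomes in A."""
    return sum((space[w] for w in event), Fraction(0))

assert Fraction(0) <= pr(fair_coin, {"H"}) <= 1    # 0 <= Pr[A] <= 1
assert pr(fair_coin, set(fair_coin)) == 1          # Pr[Omega] = 1
```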

Conditional probability & product rule

  • Conditional probability of $A$ given $B$ (with $\Pr[B] > 0$): $\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}$.
  • Product rule (equivalent): $\Pr[A \cap B] = \Pr[A \mid B]\Pr[B] = \Pr[B \mid A]\Pr[A]$.
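Both definitions can be checked exactly on a small example. A sketch using a hypothetical two-fair-dice space:

```python
from fractions import Fraction

# Two fair dice: uniform over 36 outcomes (hypothetical example).
space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum(space[w] for w in event)

A = {w for w in space if w[0] + w[1] == 7}   # "the sum is 7"
B = {w for w in space if w[0] == 3}          # "the first die shows 3"

cond_A_given_B = pr(A & B) / pr(B)           # Pr[A | B]

# Product rule, both ways:
assert pr(A & B) == cond_A_given_B * pr(B)
assert pr(A & B) == (pr(A & B) / pr(A)) * pr(A)
```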

Independence

  • $A$ and $B$ are independent iff $\Pr[A \cap B] = \Pr[A]\Pr[B]$, equivalently $\Pr[A \mid B] = \Pr[A]$ (when $\Pr[B] > 0$).
  • For random variables $X, Y$: independence means the events $\{X \in I\}$ and $\{Y \in J\}$ are independent for all measurable $I, J$.
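A quick check of the first definition on the same hypothetical two-dice space (the events and the `independent` helper are illustrative assumptions):

```python
from fractions import Fraction

space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum((space[w] for w in event), Fraction(0))

def independent(A, B):
    return pr(A & B) == pr(A) * pr(B)

seven = {w for w in space if w[0] + w[1] == 7}    # "sum is 7"
first3 = {w for w in space if w[0] == 3}          # "first die is 3"
high = {w for w in space if w[0] + w[1] >= 10}    # "sum is at least 10"

assert independent(seven, first3)       # 1/36 == (1/6) * (1/6)
assert not independent(high, first3)    # 0 != (1/6) * (1/6)
```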

Law of total probability

  • If $B_1, \dots, B_n$ partition $\Omega$ (mutually disjoint, $\bigcup_i B_i = \Omega$), then for any event $A$: $\Pr[A] = \sum_{i=1}^n \Pr[A \mid B_i]\Pr[B_i]$.
  • Useful for expanding $\Pr[A]$ by conditioning on cases.
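The partition identity can be verified exactly. A sketch, partitioning the hypothetical two-dice space by the value of the first die:

```python
from fractions import Fraction

space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum(space[w] for w in event)

A = {w for w in space if w[0] + w[1] == 7}                   # "sum is 7"
# B_1, ..., B_6: partition by the value of the first die.
partition = [{w for w in space if w[0] == i} for i in range(1, 7)]

# Pr[A] = sum_i Pr[A | B_i] * Pr[B_i]
total = sum(pr(A & B) / pr(B) * pr(B) for B in partition)
assert total == pr(A)
```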

Expectation and indicator variables

  • Indicator of event $A$: $1_A(\omega) = 1$ if $\omega \in A$, else $0$.
  • Expectation of an indicator: $\mathbb{E}[1_A] = \Pr[A]$.
  • Linearity: $\mathbb{E}[\sum_i X_i] = \sum_i \mathbb{E}[X_i]$ (no independence required).
  • For a discrete r.v. $X$: $\mathbb{E}[X] = \sum_x x \Pr[X = x]$.
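Both the indicator identity and linearity are easy to confirm numerically. A sketch on the same hypothetical two-dice space (`expect` is an assumed helper):

```python
from fractions import Fraction

space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum(space[w] for w in event)

def expect(X):
    """E[X] for a random variable X given as a function on outcomes."""
    return sum(prob * X(w) for w, prob in space.items())

sum7 = {w for w in space if w[0] + w[1] == 7}
indicator = lambda w: 1 if w in sum7 else 0

assert expect(indicator) == pr(sum7)        # E[1_A] = Pr[A]

# Linearity of expectation (no independence needed):
first = lambda w: w[0]
total = lambda w: w[0] + w[1]
assert expect(total) == expect(first) + expect(lambda w: w[1])
```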

Useful identities

  • $\Pr[A] = \mathbb{E}[1_A]$.
  • $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$.
  • If $B$ is a Bernoulli($p$) bit independent of other randomness: $\Pr[A] = \Pr[A \mid B=0]\Pr[B=0] + \Pr[A \mid B=1]\Pr[B=1]$ (a special case of the law of total probability).
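Inclusion–exclusion can likewise be checked directly; a sketch on the hypothetical two-dice space:

```python
from fractions import Fraction

space = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    return sum((space[w] for w in event), Fraction(0))

A = {w for w in space if w[0] + w[1] == 7}     # "sum is 7"
B = {w for w in space if w[0] == w[1]}         # "doubles"

# Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[A ∩ B]  (here A and B happen to be disjoint)
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
```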

Short worked example (the paper’s setup)

Arthur chooses $b \in \{0,1\}$ uniformly. If $b = 0$ he samples $x_0 \sim u_k$; if $b = 1$ he samples $x_1 \sim p$. He sends $x_b$ to Merlin, who outputs $\hat b = 1_{S_p}(x_b)$ where $S_p = \{x : p(x) \ge 1/k\}$. Arthur accepts iff $\hat b = b$.

Compute $\Pr[\hat b = b]$:

  1. Condition on $b$ (law of total probability): $\Pr[\hat b = b] = \Pr[\hat b = b \mid b = 0]\Pr[b = 0] + \Pr[\hat b = b \mid b = 1]\Pr[b = 1]$.
  2. Evaluate each term:
    • If $b = 0$: Merlin is correct iff $x_0 \notin S_p$, so $\Pr[\hat b = b \mid b = 0] = u_k(S_p^c) = 1 - u_k(S_p)$.
    • If $b = 1$: Merlin is correct iff $x_1 \in S_p$, so $\Pr[\hat b = b \mid b = 1] = p(S_p)$.
  3. Since $\Pr[b = 0] = \Pr[b = 1] = 1/2$, $\Pr[\hat b = b] = \tfrac12\big(1 - u_k(S_p) + p(S_p)\big) = \tfrac12\big(1 + (p(S_p) - u_k(S_p))\big)$.
  4. Using $p(S_p) - u_k(S_p) = d_{TV}(p, u_k)$ (the Scheffé set), if $d_{TV}(p, u_k) \ge \varepsilon_2$ then $\Pr[\hat b = b] \ge \tfrac12(1 + \varepsilon_2)$.
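The whole calculation can be replayed on a concrete instance. A sketch with a hypothetical distribution $p$ over a domain of size $k = 4$ (the numbers are invented for illustration):

```python
from fractions import Fraction

# Hypothetical instance: domain {0, ..., k-1} with k = 4.
k = 4
u = {x: Fraction(1, k) for x in range(k)}                       # uniform u_k
p = {0: Fraction(1, 2), 1: Fraction(1, 4),
     2: Fraction(1, 8), 3: Fraction(1, 8)}                      # some p != u_k

S = {x for x in p if p[x] >= Fraction(1, k)}                    # Scheffé set S_p

def mass(q, E):
    return sum((q[x] for x in E), Fraction(0))

adv = mass(p, S) - mass(u, S)
d_tv = Fraction(1, 2) * sum(abs(p[x] - u[x]) for x in p)
assert adv == d_tv                          # the Scheffé set achieves d_TV

# Pr[b_hat = b] = 1/2 * (1 - u(S)) + 1/2 * p(S) = 1/2 * (1 + d_TV)
success = Fraction(1, 2) * (1 - mass(u, S)) + Fraction(1, 2) * mass(p, S)
assert success == Fraction(1, 2) * (1 + d_tv)
```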

The proof of Claim 4.4 is a classic averaging argument, a technique that is fundamental in complexity theory and cryptography. Let’s walk through it.


1. Summary of the Soundness Proof (Claim 4.4)

The high-level goal of the soundness proof is to show that if the distributions $p$ and $u_k$ are close (i.e., $d_{TV}(p, u_k) \le \epsilon_1$), then no prover $M^*$, no matter how cleverly designed, can succeed with probability much better than random guessing.

The proof proceeds in two logical steps:

  1. First, we prove this bound for all deterministic provers.
  2. Second, we show that this implies the bound for all randomized provers.

1.1 Bounding any Deterministic Prover ($M^*_S$)

A deterministic prover $M^*_d$ is just a fixed function $f : \mathcal{X} \to \{0, 1\}$. This is mathematically equivalent to the prover picking a fixed subset $S \subseteq \mathcal{X}$ and using the strategy: “Guess 1 if $x_b \in S$, and guess 0 otherwise.” We call this strategy $M^*_S$.

The proof follows this path:

  1. Write the Success Probability: For any such $M^*_S$, we calculated its exact success probability: $$P(\text{success} \mid M^*_S) = \frac{1}{2} \cdot P(x_0 \notin S) + \frac{1}{2} \cdot P(x_1 \in S) = \frac{1}{2}\big(1 - u_k(S) + p(S)\big) = \frac{1}{2}\big(1 + (p(S) - u_k(S))\big)$$

  2. Apply the Soundness Promise: We are promised that the distributions are close: $d_{TV}(p, u_k) \le \epsilon_1$.

  3. Use the Definition of TV Distance: The total variation distance is defined as the maximum possible “advantage” over all possible sets: $$d_{TV}(p, u_k) = \sup_{S' \subseteq \mathcal{X}} \big(p(S') - u_k(S')\big)$$

    This means that for the specific set $S$ our prover $M^*_S$ chose, its advantage $(p(S) - u_k(S))$ must be less than or equal to the maximum possible advantage.

    Therefore, $(p(S) - u_k(S)) \le d_{TV}(p, u_k)$.

  4. Chain the Inequalities: We can now chain steps 2 and 3 together to get our key bound: $$(p(S) - u_k(S)) \le d_{TV}(p, u_k) \le \epsilon_1$$

  5. Conclude for Deterministic Provers: We substitute this bound back into the success probability equation from step 1: $$P(\text{success} \mid M^*_S) = \frac{1}{2}\big(1 + (p(S) - u_k(S))\big) \le \frac{1}{2}(1 + \epsilon_1)$$

    This proves that no deterministic strategy can do better than $\frac{1}{2}(1 + \epsilon_1)$.
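This bound can be confirmed by brute force on a small instance: enumerate every subset $S$ and check that none beats $\tfrac12(1 + d_{TV})$. A sketch (the distribution $p$ is a hypothetical example):

```python
from fractions import Fraction
from itertools import chain, combinations

k = 4
u = {x: Fraction(1, k) for x in range(k)}
p = {0: Fraction(1, 2), 1: Fraction(1, 4),
     2: Fraction(1, 8), 3: Fraction(1, 8)}

def mass(q, E):
    return sum((q[x] for x in E), Fraction(0))

def subsets(xs):
    """All 2^k subsets of xs, as tuples."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# Success probability of the deterministic prover M*_S, for every possible S:
best = max(Fraction(1, 2) * (1 + mass(p, S) - mass(u, S))
           for S in subsets(range(k)))
d_tv = Fraction(1, 2) * sum(abs(p[x] - u[x]) for x in p)
assert best == Fraction(1, 2) * (1 + d_tv)    # no set S does better
```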

1.2 Extending to any Randomized Prover ($M^*_r$)

This is the “convexity” or “averaging” part of the argument.

  1. Define a Randomized Prover: A randomized prover $M^*_r$ is simply an algorithm that randomly chooses which deterministic strategy $M^*_S$ to run. Formally, it is a probability distribution $\mathcal{D}$ over all possible sets $S$.

  2. Write the Success Probability: The overall success of $M^*_r$ is the average (expected) success of the deterministic strategies it might pick, weighted by the probability under $\mathcal{D}$ of picking them: $$P(\text{success} \mid M^*_r) = E_{M^*_S \sim \mathcal{D}}\big[P(\text{success} \mid M^*_S)\big]$$

  3. Apply the Deterministic Bound: We proved in part 1.1 that every single term inside that expectation is bounded: $P(\text{success} \mid M^*_S) \le \frac{1}{2}(1 + \epsilon_1)$ for all $S$.

  4. Conclude for Randomized Provers: If every value in an average is less than or equal to some number $c$, the average itself must also be less than or equal to $c$.

    Therefore, $E_{M^*_S \sim \mathcal{D}}[\dots] \le \frac{1}{2}(1 + \epsilon_1)$.

    This completes the proof: no strategy, deterministic or randomized, can break the soundness bound.
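The averaging step can also be checked concretely: pick any distribution $\mathcal{D}$ over subsets and verify that the expected success never exceeds the best deterministic success. A sketch with a hypothetical uniform $\mathcal{D}$:

```python
from fractions import Fraction
from itertools import chain, combinations

k = 4
u = {x: Fraction(1, k) for x in range(k)}
p = {0: Fraction(1, 2), 1: Fraction(1, 4),
     2: Fraction(1, 8), 3: Fraction(1, 8)}

def mass(q, E):
    return sum((q[x] for x in E), Fraction(0))

def success(S):
    """Success probability of the deterministic prover M*_S."""
    return Fraction(1, 2) * (1 + mass(p, S) - mass(u, S))

all_S = list(chain.from_iterable(combinations(range(k), r)
                                 for r in range(k + 1)))
# An arbitrary randomized prover: a distribution D over the 2^k subsets
# (uniform here, but any D gives the same conclusion).
D = {S: Fraction(1, len(all_S)) for S in all_S}

expected = sum(D[S] * success(S) for S in all_S)
best = max(success(S) for S in all_S)
assert expected <= best       # an average can never beat the maximum
```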


2. Glossary of Our Formal Concepts

Here are the remaining definitions, which form the conceptual toolkit for understanding the proof.

  • Honest Prover ($M$) vs. Adversarial Prover ($M^*$):

    • $M$ (no star) is the honest prover who follows the protocol as written (e.g., in Algorithm 1, $M$ uses the Scheffé set for its strategy).
    • $M^*$ (with star) is an adversarial or arbitrary prover. This represents any possible algorithm (even a malicious, cheating one) that the prover might use. The soundness proof must hold “for all $M^*$.”
  • Deterministic Strategy ($M^*_d$):

    • A strategy that involves no randomness. Given the same input sample $x_b$, it always produces the same guess $\hat{b}$.
    • It is just a fixed function $f : \mathcal{X} \to \{0, 1\}$.
  • Randomized Strategy ($M^*_r$):

    • A strategy that uses internal randomness. It can be seen as a probability distribution $\mathcal{D}$ over all possible deterministic strategies.
    • It first “flips its coins” (picks a random seed $r$) to select a deterministic strategy $f_r$, and then executes $f_r$.
  • Set-Based Strategy ($M^*_S$):

    • This is just more descriptive notation for $M^*_d$.
    • It makes explicit that every deterministic strategy $f$ is uniquely defined by a single subset $S \subseteq \mathcal{X}$, where $S = \{x \mid f(x) = 1\}$.
    • Saying “Merlin chooses $M^*_d$” is identical to saying “Merlin chooses $S \subseteq \mathcal{X}$.”