Thomas M. Kehrenberg

Exploring Finite Factored Sets with some toy examples

Seeing as there is little secondary literature on the Finite Factored Set formalism, I thought I’d write up my experience of exploring it through some toy models that are classic examples in the Pearlian paradigm. My goal was to see how these models, which I understood very well in the Pearlian paradigm, would work in Finite Factored Sets.

As a warning, this doesn’t make any use of the more exciting properties of Finite Factored Sets; it’s just an exploration of how this formalism handles the mundane stuff. This also means that I’m using the factored set directly, without the abstractions of the orthogonality database – which I think is fine here, because these are tiny toy examples whose structure is fully known. (However, it’s possible that I’ve missed the entire point of Finite Factored Sets.)


The first example is the 3-variable collider that is very central to Pearl’s formalism. It is given by the causal diagram $A \rightarrow C \leftarrow B$.

$A$, $B$, and $C$ are all binary variables (0=false, 1=true).

The intended meaning of a Causal Diagram (or rather, functional causal model) is that the value of a node $x_i$ is given by a deterministic function that takes as input the parents, $pa(x_i)$, (indicated by the arrows) and an “error”, or “noise”, variable $u_i$ that is governed by a probability distribution independent of the other error/noise variables: $x_i = f_i(pa(x_i), u_i)$. Thus, the value of $C$ is given by $c=f_C(a, b, u_C)$, where $u_C$ is noise or uncertainty that is not explicitly modeled, which we can visualize like this:

We could also split up $A$ and $B$ into a deterministic and a random part, but as they are root nodes, there is little point. It would just be $a=f_A(u_A)$.
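To make this concrete, here’s a minimal forward-sampling sketch in Python. This is my own illustration rather than anything from the formalism; the uniform distributions, and the specific $f_C$ (XOR unless a die shows 6 – a concrete choice I only properly introduce further down), are arbitrary:

```python
import random

def sample_collider():
    """Forward-sample the collider: each node is a deterministic
    function of its parents plus an independent noise term."""
    a = random.randint(0, 1)        # a = f_A(u_A): a root node is pure noise
    b = random.randint(0, 1)        # b = f_B(u_B)
    u_C = random.randint(1, 6)      # the noise input of C, here a die roll
    c = (a ^ b) if u_C != 6 else 0  # c = f_C(a, b, u_C)
    return a, b, c
```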

The Pearlian formalism runs on graphs, but Finite Factored Sets run on the set $S$ of all possible outcomes – the sample space. So, the goal is now to construct a sample space that is consistent with the above graph. After that, we’ll find a factorization of that sample space.

I think it should be clear that to cover the whole sample space $S$, it is sufficient to consider all possible combinations of the outcomes of $A$, $B$, and $U_C$ (but not $C$), because if we know the value of these three, then we also know the value of $C$, via $f_C$.

So, we simply define $S$ as the Cartesian product of the sets of possible values of $A$ and $B$: $\mathcal{A}=\{0,1\}$, $\mathcal{B}=\{0,1\}$, and the possible values of $U_C$: $\mathcal{U}_C$, which I’ll leave undefined for the moment (except to note that it must be a finite set):

$$S=\mathcal{A}\times\mathcal{B}\times\mathcal{U}_C,\quad (a, b, u_C) \in S$$

(We use the lowercase letters to represent elements of the sets that make up the Cartesian product: $a\in \mathcal{A}$, $b\in\mathcal{B}$, and $u_C\in\mathcal{U}_C$.)

Then – as is customary in the formalism of Finite Factored Sets – the variables $A$, $B$, and $U_C$ are defined as partitions of $S$. For example, $A$ is a partition consisting of two parts: 1) the set of elements of $S$ where $a=0$, and 2) those where $a=1$:

$$A = \{\{(a, b, u_C)\in S|a=0\}, \{(a, b, u_C)\in S|a=1\}\}$$

This captures exactly what the variable $A$ is supposed to represent: the question of whether the first element of the Cartesian product is 0 or 1. $B$ is defined analogously:

$$B = \{\{(a, b, u_C)\in S|b=0\}, \{(a, b, u_C)\in S|b=1\}\}$$

And $U_C$ as well:

$$U_C = \{\{(a, b, u_C)\in S|u_C=0\}, \dots\}$$

with an as-yet-undefined number of parts in the partition.

Now, given the way we constructed $S$, the set of partitions $G=\{A, B, U_C\}$ is a factorization of $S$, giving us the finite factored set $F=(S, G)$: because $S$ is just the Cartesian product of $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{U}_C$, any element of $S$ can be uniquely identified by knowing which part of $A$, $B$, and $U_C$ it lies in.
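Since the example is tiny, we can check the factorization property directly. Here’s a sketch (mine; $\mathcal{U}_C$ is already set to the die values I fix further down):

```python
from itertools import product

A_VALS, B_VALS = (0, 1), (0, 1)
U_C_VALS = (1, 2, 3, 4, 5, 6)  # a 6-sided die; fixed properly further down

S = list(product(A_VALS, B_VALS, U_C_VALS))

# Represent each partition by a labelling function: element -> part.
A = lambda s: s[0]
B = lambda s: s[1]
U_C = lambda s: s[2]
G = [A, B, U_C]

# G is a factorization of S iff the combination of parts pins down
# each element of S uniquely.
signatures = {tuple(f(s) for f in G) for s in S}
assert len(signatures) == len(S)
```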

Now that we have the set, let’s look at the probability distribution over $S$. Let $P(A=a')$ be shorthand for $P(\{(a, b, u_C)\in S|a=a'\})$, i.e., the probability that the outcome lands in the subset $\{(a, b, u_C)\in S|a=a'\}$ of the sample space $S$, where the first element of the Cartesian product is equal to $a'$. We define $P(B=b)$ and $P(U_C=u_C)$ analogously. Finally, let $P(a, b, u_C)$ refer to the probability of the individual outcome $(a, b, u_C)\in S$.

The fact that we want our finite factored set model to be consistent with the Causal Diagram above implies some things about $P$. In particular, the diagram implies that $A$, $B$, and $U_C$ should be independent, which means that the joint probability should factorize like this:

$$P(a, b, u_C)=P(A=a)P(B=b)P(U_C=u_C)$$

But this is exactly how our (finite) set $S$ factorizes! Thus, $P$ factorizes the same way that $S$ does.
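Continuing the sketch from above, $P$ is then just a product of per-factor marginals (the uniform choices here are arbitrary, purely for illustration):

```python
# Marginal distributions for the three factors (arbitrary choices).
P_A = {0: 0.5, 1: 0.5}
P_B = {0: 0.5, 1: 0.5}
P_UC = {u: 1 / 6 for u in U_C_VALS}

# The joint distribution is the product, mirroring how S factorizes.
P = {s: P_A[A(s)] * P_B[B(s)] * P_UC[U_C(s)] for s in S}
assert abs(sum(P.values()) - 1.0) < 1e-9
```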

This concludes the translation of the graphical model into a semantically equivalent finite factored set. To recap what we have accomplished: we turned the original model with the variables $A$, $B$, and $C$ into one with three independent variables – $A$, $B$, and $U_C$ – and defined $C$ as a deterministic function of these. We then constructed a finite factored set with the factors $A$, $B$, and $U_C$.

Now, let’s define $C$ on $S$ as well. We again define it as a partition of $S$:

$$C=\{\{(a, b, u_C)\in S|f_C(a, b, u_C)=0\}, \{(a, b, u_C)\in S|f_C(a, b, u_C)=1\}\}$$

We simply partition $S$ depending on the output of the function $f_C$. This also allows us to define $P(C=c)$ as $P(\{(a, b, u_C)\in S|f_C(a, b, u_C)=c\})$.
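Continuing the code sketch (and already using the concrete $f_C$ that I fix below), this reads:

```python
def f_C(a, b, u_C):
    """XOR of a and b, except that a die roll of 6 forces C to 0
    (the concrete choice made further down)."""
    return (a ^ b) if u_C != 6 else 0

def P_C(c):
    """P(C=c): the probability mass of the corresponding subset of S."""
    return sum(p for s, p in P.items() if f_C(*s) == c)

assert abs(P_C(0) + P_C(1) - 1.0) < 1e-9  # the two parts of C partition S
```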


In the Pearlian formalism, we can read off the fact that $A$ and $B$ are independent from the graph structure. With Finite Factored Sets, we have to look at histories. The history of a partition (a.k.a. a variable) is, roughly speaking, the minimal set of factors in the factorization $G$ that is sufficient to fully specify that partition. $A$ and $B$ are factors in their own right, so their histories consist just of themselves: $h(A)=\{A\}$ and $h(B)=\{B\}$, because surely knowing $A$ is enough to know $A$. As $h(A) \cap h(B)=\emptyset$, we can conclude that $A$ and $B$ are orthogonal – but this is just because we defined the factorization that way, so it is no new information. Still, it’s a good consistency check.

The history of $C$ is more interesting, but still easy to determine. As long as $f_C$ makes use of all its arguments and is generally non-pathological, all the factors are needed to determine $C$. So: $h(C)=\{A, B, U_C\}$. This implies $h(A)\subset h(C)$ and $h(B)\subset h(C)$, which means that $A$ and $B$ are both strictly before $C$ in the time defined by our factorization. This is a non-trivial result that matches what we would have expected from the graphical model I drew at the beginning. So, that’s good.
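Because everything here is small and finite, histories can also be found by brute force: enumerate the subsets $H\subseteq G$ from small to large and return the first one that pins down the partition. A naive sketch of my own, continuing the code from above:

```python
from itertools import combinations

def history(X, G, S):
    """Smallest H subset of G such that agreeing on every factor in H
    forces agreement on the partition X."""
    for size in range(len(G) + 1):
        for H in combinations(G, size):
            if all(X(s) == X(t)
                   for s in S for t in S
                   if all(f(s) == f(t) for f in H)):
                return set(H)

C = lambda s: f_C(*s)  # the partition C from above
assert history(A, G, S) == {A}
assert history(C, G, S) == {A, B, U_C}  # h(C) = {A, B, U_C}
```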

But what if we condition on $C$? Colliders are special because they have $A\perp B$ but $A\not\perp B|C$. Can we recover this result from our finite factored set model?

To compute conditional orthogonality, we have to compute conditional histories. In order to condition on $C$, we need the histories conditioned on the subsets of $S$ where $C=0$ and where $C=1$. We’ll call these subsets $C_0$ and $C_1$:

$$C_0 := \{(a, b, u_C)\in S|f_C(a, b, u_C)=0\}$$
$$C_1 := \{(a, b, u_C)\in S|f_C(a, b, u_C)=1\}$$

Let’s start with the latter and determine $h(A|C_1)$ – the conditional history of $A$ given $C_1$. However, the definition of conditional history is not as straightforward as the definition of history. I’m reproducing it here (and will transcribe it into code right after):

Let $F=(S, G)$ be a finite factored set, let $X$, $Y$, $Z\in \text{Part}(S)$, and let $E\subseteq S$. We use $s\sim_X t$ to say that two elements $s$ and $t$ are in the same part in $X$. The conditional history of $X$ given $E$, written $h^F(X|E)$, is the smallest set of factors $H\subseteq G$ satisfying the following two conditions:

  1. For all $s, t\in E$, if $s\sim_b t$ for all $b\in H$, then $s\sim_X t$.
  2. For all $s, t\in E$ and $r\in S$, if $r\sim_{b_0} s$ for all $b_0\in H$ and $r\sim_{b_1} t$ for all $b_1\in G\backslash H$, then $r\in E$.
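Here is that transcription, in the brute-force style of the earlier sketches (again a naive search of my own, not an efficient algorithm):

```python
def conditional_history(X, E, G, S):
    """Smallest H subset of G satisfying conditions 1 and 2 above,
    where E is the subset of S being conditioned on."""
    E = set(E)
    for size in range(len(G) + 1):
        for H in combinations(G, size):
            Hc = [f for f in G if f not in H]
            # Condition 1: within E, the factors in H pin down X.
            cond1 = all(X(s) == X(t)
                        for s in E for t in E
                        if all(f(s) == f(t) for f in H))
            # Condition 2: if r matches some s in E on H and some t in E
            # on G \ H, then r must itself be in E.
            cond2 = all(r in E
                        for s in E for t in E for r in S
                        if all(f(r) == f(s) for f in H)
                        and all(f(r) == f(t) for f in Hc))
            if cond1 and cond2:
                return set(H)
```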

The first condition is easy. It just says that the factors in $h(A|C_1)$ should be sufficient to pin down $A$ within $C_1$. We can safely assume that $A$ itself will be enough. So, is $H:=h(A|C_1)$ just $\{A\}$ again? Well, there is that second condition. To show its effect, let’s assume that indeed $H=\{A\}$. The condition then specifies something that has to be true for all $s,t\in C_1$ and $r\in S$. However, given the assumption we just made, I can construct a counterexample to the stated condition. (Thus showing by contradiction that $h(A|C_1)$ is not just $\{A\}$.)

To make things concrete, I’ll fix how $C$ is computed. First, let’s say that $U_C$ is the result of a 6-sided die; so: $\mathcal{U}_C=\{1, 2, 3, 4, 5, 6\}$. We’ll then say that $C$ is the XOR of $A$ and $B$, except when the die rolls a 6, in which case $C$ is just 0:

$$f_C(a, b, u_C)=\begin{cases} a\oplus b &\text{if } u_C \neq 6\\ 0 &\text{else.} \end{cases}$$

The counterexample is then:

$$s=(a_s, b_s, u_{Cs})=(1, 0, 1)$$
$$t=(a_t, b_t, u_{Ct})=(0, 1, 2)$$
$$r=(a_r, b_r, u_{Cr})=(1, 1, 2)$$

Let’s first confirm that $s$ and $t$ are indeed in $C_1$. To be in $C_1$, we need $f_C(a, b, u_C)=1$. This is true for both, because $1\oplus 0=1$ and $0\oplus 1=1$, and in neither case is $u_C=6$.

Now, for the if-statement in the second condition, we first ask whether $r$ and $s$ can be distinguished by the factors in $H$. We’re assuming for now that $H$ consists only of $A$, so the question becomes whether $r$ and $s$ can be distinguished by looking at $A$. The answer is no, because in both cases the first entry is 1 (so, $a=1$). Thus, as far as the factors in $H$ are concerned, $r$ and $s$ are the same.

Next, we must investigate $r$ and $t$ in light of the factors that are not in $H$. In our case, those are $B$ and $U_C$. As we can see, $r$ and $t$ are indeed indistinguishable under $B$ and $U_C$, because they only differ in the first entry of the tuple (which corresponds to $A$). The if-statement is thus true, and so, according to the condition, we should have $r\in C_1$. However, this is not the case, because $1\oplus 1 = 0$, and so $r$ is in $C_0$. In other words, we have a contradiction, and $h(A|C_1)$ can’t consist of only $A$.
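Continuing the code sketches, the counterexample can also be checked mechanically:

```python
s, t, r = (1, 0, 1), (0, 1, 2), (1, 1, 2)
C_1 = {x for x in S if f_C(*x) == 1}
C_0 = {x for x in S if f_C(*x) == 0}

assert s in C_1 and t in C_1              # s and t lie in C_1
assert A(r) == A(s)                       # r ~ s on H = {A}
assert B(r) == B(t) and U_C(r) == U_C(t)  # r ~ t on G \ H = {B, U_C}
assert r not in C_1                       # ...and yet r is not in C_1
```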

So, what does the second condition in the definition of conditional history look out for? It polices the boundary of the set we are conditioning on: if both the factors within $H$ and the factors outside of it seem to indicate that an element $r$ is in $C_1$ (i.e., $r$ matches some member of $C_1$ on the factors in $H$, and some member on the factors in $G\backslash H$, where $G$ is the set of all factors), then – the definition says – $r$ should really be in $C_1$. In other words, membership in the conditioned-on set must factor cleanly across the split into $H$ and $G\backslash H$; otherwise, the boundary itself leaks information between the two sides.

With our concrete $f_C$, membership in $C_1$ does factor cleanly once we add $B$ to $H$: being in $C_1$ means exactly that $a\oplus b=1$ (which the factors $\{A, B\}$ can see) and that $u_C\neq 6$ (which the complement $\{U_C\}$ can see). Both conditions are then satisfied, and we get $h(A|C_1)=\{A, B\}$. For $C_0$, there is no such clean split – $u_C=6$ forces $C=0$ regardless of $a$ and $b$ – so there, all the factors are needed: $h(A|C_0)=\{A, B, U_C\}$. Since $f_C$ is symmetric in $a$ and $b$, the same results hold for $h(B|C_{0/1})$. We thus definitely don’t have $h(A|C_{0/1})\cap h(B|C_{0/1})=\emptyset$ anymore. And so, $A$ and $B$ are not orthogonal given $C$.
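The brute-force search from above confirms all of this:

```python
assert conditional_history(A, C_1, G, S) == {A, B}
assert conditional_history(B, C_1, G, S) == {A, B}
assert conditional_history(A, C_0, G, S) == {A, B, U_C}
assert conditional_history(B, C_0, G, S) == {A, B, U_C}
# In both parts of C, the conditional histories of A and B overlap,
# so A and B are not orthogonal given C.
```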

This means we have recovered the two most important independence statements about the collider: $A\perp B$ and $A\not\perp B|C$, as well as $C\not\perp A$ and $C\not\perp B$ (which follow just from the fact that $A$ and $B$ are strictly before $C$ in time). What remains to be confirmed are $C\not\perp A|B$ and $C\not\perp B|A$. I’ll leave that as an exercise for the reader.


As our next example, we’ll have a look at the 3-variable chain: $A \rightarrow B \rightarrow C$.

Separating out the noise/uncertainty as before gives each non-root node its own noise input: $U_B$ feeding into $B$, and $U_C$ feeding into $C$.

I won’t repeat all the steps and will skip straight to constructing the sample space $S$ as the Cartesian product of the possible values of $A$, $U_B$, and $U_C$. The elements of $S$ are then tuples like this one:

$$(a, u_B, u_C) \in S$$

We define the partitions $A$, $U_B$, and $U_C$ on this in the obvious way, and so they form our factorization of $S$. The variables $B$ and $C$ are defined as partitions of $S$ according to some deterministic function of $(a, u_B)$ and $(a, u_B, u_C)$, respectively. For the histories, it follows that $h(A)=\{A\}$, $h(B)=\{A, U_B\}$, and $h(C)=\{A, U_B, U_C\}$, which implies $A<B<C$, as one would expect.

(It might seem remarkable here that we can reconstruct the exact order of $A$, $B$, and $C$, but that is only because we defined $S$ that way. Nothing interesting has happened yet; this is just a self-consistency check.)
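Translating the chain into the running code sketch – with hypothetical concrete functions, since $f_B$ and $g_C$ are left abstract here; I use “copy the parent, but flip the value when the die shows 6”:

```python
# Elements of the chain's sample space are tuples (a, u_B, u_C);
# both noise terms are again die rolls.
S_chain = list(product((0, 1), U_C_VALS, U_C_VALS))
A_ch = lambda s: s[0]
U_B_ch = lambda s: s[1]
U_C_ch = lambda s: s[2]
G_chain = [A_ch, U_B_ch, U_C_ch]

B_ch = lambda s: s[0] ^ (s[1] == 6)     # hypothetical f_B(a, u_B)
C_ch = lambda s: B_ch(s) ^ (s[2] == 6)  # hypothetical g_C(b, u_C)

assert history(A_ch, G_chain, S_chain) == {A_ch}
assert history(B_ch, G_chain, S_chain) == {A_ch, U_B_ch}
assert history(C_ch, G_chain, S_chain) == {A_ch, U_B_ch, U_C_ch}
```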

I tried visualizing these histories, but had limited success.


The interesting independence statement in this model is $A\perp C|B$. So, to investigate this, let’s look at $h(C|B_1)$ – the conditional history of $C$ on the subset of $S$ where $B=1$. The first step is to make explicit that $C$ is computed via $B$:

$$f_C(a, u_B, u_C)=g_C(f_B(a, u_B), u_C)$$

That is, $a$ and $u_B$ are only used via $f_B$. But if we already know the output of $f_B$ (because we’re in the subset $B_1\subset S$ where $f_B=1$), then we only need $u_C$ to compute the value of $C$. Thus, it would be consistent with condition 1 of the conditional-history definition if we had $h(C|B_1)=\{U_C\}$. But is it also consistent with condition 2?

In condition 2, we first have this part: “if $r\sim_{U_C} s$ …”. However, $U_C$ has no bearing on whether $r$ is in $B_1$ or not, because $f_B$ doesn’t take $U_C$ as an input. So, we can ignore that part. (Dropping a clause from the if-part only makes the statement stronger, so if the strengthened condition holds, the original one does too.) We are then left with: if $r\sim_{X}t$ for all $X\in \{A, U_B\}$, then $r\in B_1$. Is this true for all $t\in B_1$ and $r\in S$? Yes, because what it says is: if $r$ seems to be in $B_1$ according to $A$ and $U_B$, then it really is in $B_1$. And that is true, because $A$ and $U_B$ already fully determine $B$, so if they say you are in $B_1$, then you are.

So, we avoided the trap from above, where neither $H$ nor its complement could reconstruct the boundary of the set we were conditioning on. Here, $A$ and $U_B$ (which are both not in $H$) can reconstruct the boundary of $B_1$ perfectly, so condition 2 is fulfilled. Thus, $h(C|B_1)=\{U_C\}$ (and analogously $h(C|B_0)=\{U_C\}$), which implies $h(A|B_{0/1})\cap h(C|B_{0/1})=\emptyset$. (I didn’t check that $h(A|B_{0/1})$ doesn’t contain $U_C$, but a quick argument shows that it doesn’t: $U_C$ is needed neither to pin down $A$ nor to pin down the boundary of $B_{0/1}$.) So, $A\perp C|B$, as expected.
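The brute force agrees (continuing the chain code from above):

```python
B_1_chain = {s for s in S_chain if B_ch(s) == 1}
assert conditional_history(C_ch, B_1_chain, G_chain, S_chain) == {U_C_ch}
assert conditional_history(A_ch, B_1_chain, G_chain, S_chain) == {A_ch, U_B_ch}
# Disjoint conditional histories: A ⊥ C | B (for B = 1; B = 0 is analogous).
```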


There is one interesting 3-variable model left – the common cause: $B \leftarrow A \rightarrow C$.

The reader may want to try this one out for themselves before reading on.

Splitting off the randomness, we get root nodes $U_B$, $A$, and $U_C$, with $B$ computed from $(u_B, a)$ and $C$ computed from $(a, u_C)$.

As before, we can construct a sample space that is factorized by $G=\{U_B, A, U_C\}$, giving us the finite factored set $F=(S, G)$. The histories for $A$, $B$, and $C$ are then again obvious: $h(A) =\{A\}$, $h(B)=\{U_B, A\}$, and $h(C)=\{A, U_C\}$. We can see that $B$ and $C$ are not independent, because their histories both include $A$. But they also don’t have a definite temporal order; we can say neither $B<^FC$ nor $C<^FB$.

From Pearl’s theory, we expect that $B$ and $C$ become independent when conditioned on $A$. So let’s look at those conditional histories. As we know all the tricks by now, this will be a very high-level analysis.

We recall the first rule of conditional histories: the history should contain those factors that are needed to pin down the variable, given that we already know the value of the variable we condition on. If we know the value of $A$, then it suffices to know $U_B$ in order to know $B$. So, the conditional history $H$ of $B$, given that we know the value of $A$, contains at least $U_B$.

The second rule of conditional histories is satisfied if either $H$ or its complement $G\backslash H$ (or both) is on its own sufficient to determine the value of the variable we condition on ($A$ in our case). Assuming $H=\{U_B\}$, the complement of $H$, $\{A, U_C\}$, contains $A$ itself, and so is certainly sufficient to determine $A$. Thus, when conditioning on values of $A$, $H=\{U_B\}$ is permitted by the second rule.

By symmetry, we get $\{U_C\}$ as the conditional history for $C$. All of this then implies $B\perp C|A$, as expected.
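And one last mechanical check for the common cause, again with hypothetical “flip on a 6” functions:

```python
# Elements are tuples (u_B, a, u_C).
S_cc = list(product(U_C_VALS, (0, 1), U_C_VALS))
U_B_cc = lambda s: s[0]
A_cc = lambda s: s[1]
U_C_cc = lambda s: s[2]
G_cc = [U_B_cc, A_cc, U_C_cc]

B_cc = lambda s: s[1] ^ (s[0] == 6)  # hypothetical f_B(u_B, a)
C_cc = lambda s: s[1] ^ (s[2] == 6)  # hypothetical f_C(a, u_C)

A_1 = {s for s in S_cc if A_cc(s) == 1}
assert conditional_history(B_cc, A_1, G_cc, S_cc) == {U_B_cc}
assert conditional_history(C_cc, A_1, G_cc, S_cc) == {U_C_cc}
# Disjoint conditional histories: B ⊥ C | A, as expected.
```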

Conclusions

I hope this was useful to someone. And I hope I didn’t completely mess up the intended use of this formalism.

One thing I appreciate about this formalism is that I find it easy to drop to the base level (the sample space with the factorization) to explicitly check my higher-level thoughts when I get confused. It’s nice to have that solid ground level available whenever I need it.

The rule about conditional histories is not exactly easy, but it feels closer to a fundamental law than the d-separation rules of the Pearlian paradigm, which always felt a bit arbitrary.

Finally, I still kind of think that a DAG is a really nice way to visualize dependencies and independencies of random variables. I wonder if there is a visualization that feels more native to Finite Factored Sets while still looking nice.