A Tutorial in Data Science: Lecture 5 – Generating Distributions by Spectral Analysis

When two states, i.e. possibly measured outcomes, of the stochastic sampling process of the underlying statistical object communicate, there is a probability of one occurring after the other, perhaps within the internal time (i.e. indexical ordering) of the measurement process, indexed by \(\pi=(1, \cdots, n)\) for the sample space \((\hat{X_1}, \cdots , \hat{X_n})\). Arranging the resulting values as a list, there is some chance of one value occurring after the other. Such is a direction of communication between the states those values represent. When two states inter-communicate, there is a positive probability that each state will occur after the other (within the ordering \(\pi\)). Inter-communicating states have the same \(period\), defined as the GCD of the distances between occurrences. The complex variable of functional communicativity can be described as the real probability of conditioned occurrence and the imaginary period of its intercommunications.
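As a minimal sketch of the period defined above, assuming the sampled values are held in a plain Python list, the GCD of the distances between occurrences of a value could be computed as follows. The helper name `period` and the sample list are hypothetical, introduced only for illustration.

```python
from functools import reduce
from math import gcd

def period(samples, state):
    """GCD of the index distances between occurrences of `state` in `samples`."""
    positions = [i for i, value in enumerate(samples) if value == state]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    # Starting from 0 works because gcd(0, g) == g; with fewer than two
    # occurrences there are no gaps and the result is 0 (period undefined).
    return reduce(gcd, gaps, 0)

# Hypothetical sampled sequence: "a" and "b" alternate, "c" appears once.
samples = ["a", "b", "a", "b", "a", "c", "a"]
print(period(samples, "a"), period(samples, "b"), period(samples, "c"))
```

Here the two inter-communicating values "a" and "b" share the same period, while a value seen only once has no measurable period in the sample.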

To describe our model by communicative functionals is to follow the Laplacian method of generating the distribution by finite difference equations. A single state, within state-space rather than time-space, is described as a complex variable \(s=\sigma + \omega i\), where \(\sigma\) is the real \(functional\) relation between state and system (or part and whole), while \(\omega\) is its imaginary \(communicative\) relationship. If we view the branching evolution of the possible states measured in a system under sampling, then the actual sampled values are a path along this decision-tree. The total system, in its Laplacian, or \(characteristic\), representation, is the (tensorial) sum of the underlying sub-systems, to which each state belongs as a possible value. A continuum (real-distributional) system can only result as the infinite branching process, each value thus being a limit to an infinite path-sequence of rationalities in the state-system sub-dividing into state-systems until the limiting (stationary) systems-as-states are reached that are non-dynamic or static in the inner spatio-temporality of self-differentiation, i.e. non-dividing. Any node of this possibility-tree can be represented as a state of the higher-order system by a complex-value or as a system of the lower-order states by a complex-function. The real part of this value is the probability of the lower state occurring given the higher-system occurring (uni-directional communicativity), while the imaginary part is its relative period. Similarly, the real function of a higher system is the probability of lower states occurring given its occurrence, and the imaginary part is the relative periods of the state-values.

A Tutorial in Data Science: Lecture 3 – Reconstruction of Laplacian Analytical Theory of Probabilities

This blog serves as a philosophically informed mathematical introduction to the ideas and notation of probability theory from its most important historical theorist. It is part of an ongoing contemporary formal reconstruction of Laplace’s Calculus of Probability from his English-translated introductory essay, “A Philosophical Essay on Probabilities” (cite: PEP), which can be read along with these notes; the notes are divided into the same sections as Laplace’s essay. I have included deeper supplements from the untranslated treatise Théorie Analytique des Probabilités (cite: TA), made through personal and online translation tools, in section 1.10 and the Appendix (3).

The General Principles of the Calculus of Probabilities

\(\Omega\) is the state of all possible events.\
\(\omega \in \Omega\) is an event as element of the state.

1st Principle: The probability of the occurrence of an event \(\omega\) is the number of favorable cases divided by the total number of causal cases, assuming all cases are equally likely

\(\Omega' =\{\omega_1', \cdots , \omega_n' \}\) is the derivational system of the state \(\Omega\) as the space of cases that will cause different events in the state. \(\Omega_{\omega}'= \{\omega_{i_1}', \cdots , \omega_{i_m}': \omega_{i_j}' \rightarrow \omega\}\) is the derivational system of the state favoring the event \(\omega\). The order of a particular state (or derivational state-system) is given by the measure (\(| \dots |\)) evaluated as the number of elements in it.

\[P(\omega)=P(\Omega=\omega)=\frac{|\Omega_{\omega}'|}{|\Omega'|}=\frac{m}{n}\]

If we introduce time as the attribute of case-based favorability, i.e. causality, the event \(\omega\) is to occur at a future time \(t_1\), as represented by the formal statement \(\Omega(t_1)=\omega\). The conditioning cases, equally likely, which will deterministically cause the event at \(T=t_1\) are the possible events at the previous conditioning states of the system \(T=t<t_1\), given as \(\Omega(t_0<t<t_1) \in \Omega'(t_1 | t_0)=\{\omega_1', \cdots , \omega_n' \}\), a superposition of possible states-as-cases since they are unknown at the present time \(t_0\), where \(\Omega'\) is a derivational state-system, or set of possible causal states, here evaluated at \(t_1\) given \(t_0\), i.e. \(t_1 | t_0\). This set of possible cases can be partitioned into those that are favorable to \(\Omega(t_1)=\omega\) and those that are not. The set of favorable cases is \(\Omega_{\omega}'(t_1 | t_0)=\{\omega_{i_1}', \cdots , \omega_{i_m}': \omega_{i_j}' \xrightarrow{t} \omega\}\).

\[P(\omega)=P\bigg(\Omega(t_1)=\omega \bigg| \Omega(t_0)\bigg)=\frac{|\Omega_{\omega}'(t_1 | t_0)|}{|\Omega'(t_1 | t_0)|}=\frac{m}{n}\]
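The 1st Principle amounts to counting. As a hedged illustration, assuming the equally likely cases can be enumerated as a finite list, the ratio of favorable to total cases can be sketched in Python; the function name `classical_probability` and the two-dice example are hypothetical choices, not part of Laplace's text.

```python
from fractions import Fraction
from itertools import product

def classical_probability(cases, is_favorable):
    """Favorable cases over total cases, all cases assumed equally likely."""
    cases = list(cases)
    favorable = [c for c in cases if is_favorable(c)]
    return Fraction(len(favorable), len(cases))

# Hypothetical example: the cases are the 36 ordered outcomes of two dice,
# and the event is "the two faces sum to 7" (6 favorable cases).
two_dice = list(product(range(1, 7), repeat=2))
p_seven = classical_probability(two_dice, lambda c: sum(c) == 7)
print(p_seven)  # 1/6
```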

2nd Principle: Assuming the conditioning cases are not equal in probability, the probability of the occurrence of an event \(\omega\) is the sum of the probability of the favorable cases

\[P(\omega)=\sum_j P(\omega_{i_j}')\]

3rd Principle: The probability of the combined event (\(\omega\)) of independent events \(\{\omega_1, \cdots,\omega_n\}\) is the product of the probabilities of the component events.

\[P(\omega_1 \cap \cdots \cap \omega_n) = \prod_i P(\omega_i)\]

4th Principle: The probability of a compound event (\(\omega\)) of two events dependent upon each other, \(\omega_1 \& \omega_2\), where \(\omega_2\) is after \(\omega_1\), is the probability of the first times the probability of the second conditioned on the first having occurred:

\[P(\omega_1 \cap \omega_2)= P(\omega_1) * P(\omega_2 | \omega_1)\]

5th Principle, p.15: The probability of an expected event \(\omega_1\) conditioned on an occurred event \(\omega_0\) is the probability of the composite event \(\omega=\omega_0 \cap \omega_1\) divided by the \(a \ priori\) probability of occurred event.

\[P(\omega_1|\omega_0)=\frac{P(\omega_0 \cap \omega_1)}{P(\omega_0)}\]

Always, \(a \ priori\) is from a prior state, as can be given by a previous event \(\omega_{-1}\). Thus, if we assume the present to be \(t_0\), the prior time to have been \(t_{-1}\), and the future time to be \(t_1\), then the \(a \ priori\) probability of the presently occurred event is made from \(\Omega(t_{-1})=\omega_{-1}\) as

\[P(\omega_0)=P(\omega_0 | \omega_{-1})=P\bigg( \Omega(t_0)=\omega_0 \bigg | \Omega(t_{-1})=\omega_{-1} \bigg)\]

The probability of the combined event \(\omega_0 \cap \omega_1\) occurring can also be measured partially from the \(a \ priori\) perspective as

\[P(\omega_0 \cap \omega_1)=P(\omega_0 \cap \omega_1 | \omega_{-1})=P\bigg(\Omega(t_0)=\omega_0 \bigcap \Omega(t_1)=\omega_1 \bigg| \Omega(t_{-1})=\omega_{-1} \bigg)\]

Thus,


\[P(\omega_1|\omega_0)=P\bigg( (\omega_1|\omega_0) \bigg | \omega_{-1} \bigg)=\frac{P(\omega_0 \cap \omega_1 | \omega_{-1})}{P(\omega_0 | \omega_{-1})}\]
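Principles 4 and 5 can be checked against each other by direct enumeration. The sketch below assumes a hypothetical urn of 3 white and 2 black balls drawn twice without replacement (so the two draws are dependent); the names `prob`, `first_white` and `both_white` are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

balls = ["w", "w", "w", "b", "b"]  # hypothetical urn: 3 white, 2 black
# All equally likely ordered pairs of distinct balls (no replacement).
draws = list(permutations(range(len(balls)), 2))

def prob(event):
    """Probability of `event` by counting favorable ordered draws."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

def first_white(d):
    return balls[d[0]] == "w"

def both_white(d):
    return balls[d[0]] == "w" and balls[d[1]] == "w"

p_first = prob(first_white)              # P(w1) = 3/5
p_both = prob(both_white)                # P(w1 and w2) = 3/10
p_second_given_first = p_both / p_first  # Principle 5: 1/2

# Principle 4 recovers the compound probability from the two factors.
print(p_first * p_second_given_first == p_both)
```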

6th Principle: 1. For a constant event, the likelihood of a cause to an event is the same as the probability that the event will occur. 2. The probability of the existence of any one of those causes is the probability of the event (resulting from this cause) divided by the sum of the probabilities of similar events from all causes. 3. For causes, considered \(a \ priori\), which are unequally probable, the probability of the existence of a cause is the probability of the caused event divided by the sum of the product of probability of the events and the possibility (\(a \ priori\) probability) of their cause.

For event \(\omega_i\), let \(\omega_i'\) be its cause. While \(P\) is the probability of an actual existence, \(\mu\) is the measure of the \(a \ priori \ likelihood\) of a cause since its existence is unknown. These two measurements may be used interchangeably where the existential nature of the measurement is known or substitutions as approximations are permissible. In Principle 5 they are conflated since the probability of an occurred event always implies an \(a \ priori\) likelihood.

  1. for \( \omega \) constant (i.e. only one cause, \( \omega'\)), \(P(\omega')=P(\omega)\)
  2. for \( \omega_i' \) equally likely, \(P(\omega_i')=P(\omega_i'|\omega)=\frac{P(\omega | \omega_i')}{\sum_j P(\omega|\omega_j')}\)
  3. \(P(\omega_i')=P(\omega_i'|\omega)=\frac{P(\omega | \omega_i')\mu(\omega_i')}{\sum_j P(\omega | \omega_j')\mu(\omega_j')}\)

7th principle, p.17: The probability of a future event, \(\omega_1\), is the sum of the products of the probability of each cause, drawn from the event observed, by the probability that, this cause existing, the future event will occur.

The present is \(t_0\) while the future time is \(t_1\). Thus, the future event expected is \(\Omega(t_1)=\omega_1\). Given that \(\Omega(t_0)=\omega_0\) has been observed, we ask about the probability of a future event \(\omega_1\) from the set of causes \(\Omega'(t_1)=\{\omega_1^{(i)}:\omega_1^{(i)} \rightarrow \Omega(t_1)\}\) (change of notation for causes).

\[P(\omega_1|\omega_0)=\sum_i P(\omega_1^{(i)} | \omega_0)*P(\omega_1 | \omega_1^{(i)})\]

How are we to consider causes? They can be historical events with a causal-deterministic relationship to the future or they can be considered event-conditions, as a spatiality (possibly true over a temporal duration) rather than a temporality (true at one time). Generally, we can consider causes to be hypotheses \(H=\{H_1, \cdots, H_n\}\), with \(P(H_i)\) the \(prior\) probability (single term) and \(P(\omega | H_i)\) the \(likelihood\) (the conditional probability of the event given the hypothesis); the \(posterior\) probability is then \(P(H_i | \omega)\). The observed event (\(\omega_0\)) is \(\omega_{obs}\) and the future event (\(\omega_1\)) is the expected event \(\omega_{exp}\). Thus, we can restate principles 6 and 7 as:

6. \( P(H_i|\omega_{obs})=\frac{P(\omega_{obs} | H_i)P(H_i)}{\sum_j P(\omega_{obs} | H_j)P(H_j)}\)

7. \( P(\omega_{exp}|\omega_{obs})=\sum_i P(H_i | \omega_{obs})P(\omega_{exp} | H_i)\ = \frac{\sum_i P(\omega_{obs} | H_i)P(H_i)P(\omega_{exp} | H_i)}{\sum_j P(\omega_{obs} | H_j)P(H_j)}\)

Clearly, Principle 6 is the same as Bayes' Theorem (Wasserman, Thm. 2.16), which articulates the hypotheses \(H\) as a partition of \(\Omega\), in that \(\Omega=\cup_i H_i\) with \(H_i \cap H_j = \emptyset\) for \(i\neq j\), so that each hypothesis is a limitation of the domain of possible events. The observed event is also considered a set of events rather than a single ‘point.’ Therefore, Principle 6 says that “the probability that the possibility of the event is comprised within given limits is the sum of the fractions comprised within these limits” (Laplace, p.18).
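The restated Principles 6 and 7 translate directly into two short functions. This is a minimal sketch, assuming a finite list of hypotheses with known priors and likelihoods; the function names `posterior` and `predictive` and the two-urn example are hypothetical illustrations.

```python
from fractions import Fraction

def posterior(prior, likelihood_obs):
    """Principle 6: P(H_i | obs) from P(H_i) and P(obs | H_i)."""
    joint = [p * l for p, l in zip(prior, likelihood_obs)]
    total = sum(joint)  # P(obs), the normalizing sum over all hypotheses
    return [j / total for j in joint]

def predictive(prior, likelihood_obs, likelihood_exp):
    """Principle 7: P(exp | obs) = sum_i P(H_i | obs) * P(exp | H_i)."""
    post = posterior(prior, likelihood_obs)
    return sum(p * l for p, l in zip(post, likelihood_exp))

# Hypothetical example: one of two urns is chosen with equal prior.
# Urn 1 holds 3 white balls out of 4, urn 2 holds 1 white out of 4.
# A white ball is observed and replaced; a second white draw is expected.
prior = [Fraction(1, 2), Fraction(1, 2)]
p_white = [Fraction(3, 4), Fraction(1, 4)]

post = posterior(prior, p_white)            # [3/4, 1/4]
pred = predictive(prior, p_white, p_white)  # 9/16 + 1/16 = 5/8
print(post, pred)
```

Observing one white ball shifts belief toward the urn richer in white balls, and the predictive probability of another white draw rises from the prior value of 1/2 to 5/8.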

8th principle, p.20: The Advantage of Mathematical Hope, \(A\), depending on several events, is the sum of the products of the probability of each event by the benefit to its occurrence

Let \(\omega=\{\omega_1, \cdots ,\omega_n: \omega_i \in \Omega\}\) be the set of events under consideration. Let \(B\) be the benefit function giving a value to each event. The advantage hoped for from these events is:

\[A(\omega)=\sum_i B(\omega_i)*P(\omega_i)\]

A fair game is one whose cost of playing is equal to the advantage gained through it.

9th principle, p.21: The Advantage \(A\), depending on a series of events (\(\omega\)), is the sum of the products of the probability of each favorable event by the benefit to its occurrence minus the sum of the products of the probability of each unfavorable event by the cost to its occurrence.

Let \(\omega=\{\omega_1, \cdots ,\omega_n: \omega_i \in \Omega\}\) be the series of events under consideration, partitioned into \(\omega=(\omega^+,\omega^-)\) for favorable and unfavorable events. Let \(B\) be the benefit function for \(\omega_i \in \omega^+\) and \(L\) the loss function for \(\omega_i \in \omega^-\), each giving the value to each event. The advantage of playing the game is:

\[A(\omega)=\sum_{i: \omega_i \in \omega^+} B(\omega_i)P(\omega_i) - \sum_{j: \omega_j \in \omega^-} L(\omega_j)P(\omega_j)\]

\(Mathematical \ Hope\) is the positivity of \(A\). Thus, if \(A\) is positive, one has hope for the game, while if \(A\) is negative one has fear.

In generality, \(X\) is the random variable function, \(X:\omega_i \rightarrow \mathbb{R}\), that gives a value to each event, either a benefit (\(>0\)) or cost (\(<0\)). The absolute expectation (\(E\)) of value for the game from these events is:

\[E(\omega)=\sum_i X(\omega_i)*P(\omega_i)\]
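Principles 8 and 9 and the general expectation can be compared on a small game. The sketch below assumes a hypothetical wager (win 5 when a die shows six, lose a stake of 1 otherwise); the names `advantage` and `expectation` are illustrative.

```python
from fractions import Fraction

def advantage(benefits, p_favorable, losses, p_unfavorable):
    """Principle 9: expected benefit minus expected loss."""
    gain = sum(b * p for b, p in zip(benefits, p_favorable))
    cost = sum(l * p for l, p in zip(losses, p_unfavorable))
    return gain - cost

def expectation(values, probs):
    """E = sum_i X(w_i) * P(w_i), with costs entered as negative values."""
    return sum(x * p for x, p in zip(values, probs))

# Hypothetical game: win 5 with probability 1/6, lose 1 with probability 5/6.
A = advantage([5], [Fraction(1, 6)], [1], [Fraction(5, 6)])
E = expectation([5, -1], [Fraction(1, 6), Fraction(5, 6)])
print(A, E)  # both are 0, so this game is fair
```

Entering losses as negative values of the random variable makes the two formulations agree term by term.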

10th principle, p.23: The relative value of an infinitely small sum is equal to its absolute value divided by the total benefit of the person interested.

This section can be explicated by examining Laplace’s corresponding section in Théorie Analytique (S.41-42, p.432-445) as a development of Bernoulli’s work on the subject.

(432) For a physical fortune \(x\), an increase by \(dx\) produces a moral good inversely proportional to the fortune, \(\frac{k \cdot dx}{x}\) for a constant \(k\). \(k\) is the unit of moral goodness (i.e. utility), in that \(\frac{dx}{x}=\frac{1}{k}\) yields 1 moral good. So, \(k\) is the quantity of physical fortune whereby a marginal increase by unity of physical fortune is equivalent to unity of moral fortune. For a \(moral \ fortune\) \(y\),

\[y=k\ln(x)+\ln(h)\]

A moral good is the proportion of an increase in part of a fortune by the whole fortune. Moral fortune is the sum of all moral goods. If we consider this summation continuously for all infinitesimally small increases in physical fortune, moral fortune is the integral of the proportional reciprocal of the physical fortune by the changes in that physical fortune. Deriving this from principle 10,

\[dy=\frac{k\cdot dx}{x}\]

\[\int dy = y = \int \frac{k \, dx}{x} = k \int \frac{1}{x} dx = k\ln(x) + C\]

\(C=\ln(h)\) is the constant of minimum moral good when the physical fortune is unity. We can put this in terms of a physical fortune, \(x_0\), the minimum physical fortune for surviving one’s existence – the cost of reproducing the conditions of one’s own existence. With \(h=\frac{1}{{x_0}^k}\),

\[y=\int_{x_0}^x dy =\int_{x_0}^x \frac{k \, dx}{x}=k\ln(x) - k \ln(x_0)=k\ln(x) - k \ln\bigg(\frac{1}{\sqrt[k]{h}}\bigg)=k\ln(x) + \ln(h) \]

\(h\) is a constant given by an empirical observation of \(y\) as never positive or negative but always at least what is necessary, as even someone without any physical fortune will still have a moral fortune in their existence – it is thus the unpriced (say physical fortune) of laboring existence.

(433) Suppose an individual with a physical fortune of \(a\) expects to receive a variety of changes in fortunes \(\alpha, \zeta, \gamma, \cdots\), as increments or diminishings, with probabilities of \(p, q, r, \cdots \) summing to unity. The corresponding moral fortunes would be,

\[ k \ln(a+\alpha) + \ln(h), \ k \ln(a+\zeta) + \ln(h), \ k \ln(a+\gamma) + \ln(h), \cdots \]

Thus, the expected moral fortune \(Y\) is

\[Y=kp \ln(a+\alpha)+ kq \ln(a+\zeta) + kr \ln(a+\gamma) + \cdots + \ln(h)\]

Let \(X\) be the physical fortune corresponding to this moral fortune, as

\[Y=k \ln(X) + \ln(h)\]

with,


\[X=(a+\alpha)^p(a+\zeta)^q(a+\gamma)^r \cdots\]

Taking away the primitive fortune \(a\) from this value of \(X\), the difference will be the increase in the physical fortune that would procure the individual the same moral advantage resulting from his expectation. This difference, \(X-a\), is therefore the expression of that moral advantage, whereas the mathematical advantage has the expression,

\[p\alpha + q\zeta + r\gamma + \cdots\]


This results in several important consequences. One of them is that the mathematically most equal game is always disadvantageous. Indeed, if we denote by \(a\) the physical fortune of the player before starting the game, by \(p\) his probability of winning, (434) \(\cdots\)
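A numerical sketch can compare the two quantities for a mathematically equal game. It assumes a hypothetical player with fortune 100 staking 10 on an even bet; the function names are illustrative, and the constants \(k\) and \(h\) drop out because only \(X\) is needed.

```python
from math import isclose, prod

def equivalent_fortune(a, changes, probs):
    """X = (a + alpha)^p * (a + zeta)^q * ... for fortune a."""
    return prod((a + c) ** p for c, p in zip(changes, probs))

def mathematical_advantage(changes, probs):
    """p*alpha + q*zeta + ... (the ordinary expected change)."""
    return sum(c * p for c, p in zip(changes, probs))

# Hypothetical even bet: gain 10 or lose 10, each with probability 1/2.
a = 100.0
changes = [10.0, -10.0]
probs = [0.5, 0.5]

X = equivalent_fortune(a, changes, probs)          # sqrt(110 * 90)
moral_gain = X - a                                 # change valued morally
math_adv = mathematical_advantage(changes, probs)  # 0.0 for an even bet
print(moral_gain, math_adv)
```

Under these assumptions the mathematical advantage is zero while \(X-a\) is slightly negative, since the geometric mean of the possible fortunes cannot exceed their arithmetic mean.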

Concerning the Analytical Methods of the Calculus of Probabilities

The Binomial Theorem:


\[(x+y)^n=\sum_{k=0}^n {n \choose k}x^{n-k}y^k\]

Letting \(x=1\),

\[(1+y)^n=\sum_{k=0}^n {n \choose k}y^k=\sum_{k=1}^n {n \choose k}y^k + 1\]

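For \(n\) letters \(a_1, \cdots, a_n\), the product of the binomials \((1+a_i)\), less unity, is the sum of the products of the letters taken \(k\) at a time, for \(k=1, \cdots, n\):

\[\prod_{i=1}^n (1+a_i) \ -1 = \sum_{k=1}^n \ \sum_{1 \leq i_1 < \cdots < i_k \leq n} \ \prod_{l=1}^{k}a_{i_l}\]

If we suppose these letters are equal, \(a_i=y\), each of the \({n \choose k}\) products of \(k\) letters becomes \(y^k\), and the product reduces to a binomial power; in the general homogeneous form,

\[\prod_{i=1}^n (x+y)=(x+y)^n\]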
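The expansion of the product of binomials into combinations of letters can be verified numerically. This is a small sketch with hypothetical helper names, using `itertools.combinations` to enumerate the products of \(k\) distinct letters.

```python
from itertools import combinations
from math import comb, prod

def product_minus_one(letters):
    """(1 + a_1)(1 + a_2)...(1 + a_n) - 1."""
    return prod(1 + a for a in letters) - 1

def sum_of_combinations(letters):
    """Sum, over k = 1..n, of every product of k distinct letters."""
    n = len(letters)
    return sum(prod(c) for k in range(1, n + 1)
               for c in combinations(letters, k))

letters = [2, 3, 5]                  # hypothetical values for the letters
lhs = product_minus_one(letters)     # 3 * 4 * 6 - 1 = 71
rhs = sum_of_combinations(letters)   # 10 + 31 + 30 = 71

# With equal letters a_i = y, the k-fold products collapse to C(n, k) * y^k.
y, n = 2, 4
equal_case = sum(comb(n, k) * y ** k for k in range(1, n + 1))
print(lhs, rhs, equal_case, product_minus_one([y] * n))
```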

How many ways can \(s\) letters drawn from \(n\) be arranged?

\[s!{n \choose s}\]

Consider the lottery composed of \(n\) numbers, of which \(r\) are drawn at each draw:\
What is the probability of drawing \(s\) given numbers \(Y=(y_1, \cdots, y_s)\) in one draw \(X=(x_1, \cdots, x_r)\)?

\[P(Y \subseteq X)=\frac{{n-s \choose r-s}}{{n \choose r}}=\frac{{r \choose s}}{{n \choose s}}\]
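The lottery probability can be checked by brute force for small parameters. The sketch below assumes a hypothetical lottery of 10 numbers with 5 drawn and 2 given numbers; it compares the closed form \({r \choose s}/{n \choose s}\) with a direct count of the draws that contain all the given numbers.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def lottery_probability(n, r, s):
    """Closed form: C(r, s) / C(n, s)."""
    return Fraction(comb(r, s), comb(n, s))

def lottery_by_enumeration(n, r, s):
    """Count the draws of r numbers from n that contain s fixed numbers."""
    given = set(range(s))  # by symmetry, any s fixed numbers will do
    draws = list(combinations(range(n), r))
    hits = sum(1 for d in draws if given <= set(d))
    return Fraction(hits, len(draws))

n, r, s = 10, 5, 2  # hypothetical lottery parameters
closed = lottery_probability(n, r, s)
counted = lottery_by_enumeration(n, r, s)
# The favorable draws are those completing the s given numbers with
# r - s of the remaining n - s numbers.
direct = Fraction(comb(n - s, r - s), comb(n, r))
print(closed, counted, direct)
```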

Consider the Urn \(\Omega\) with \(a\) white balls and \(b\) black balls, drawn with replacement. Let \(A_n = \{\omega_1, \cdots, \omega_n\}\) be \(n\) draws. Let \(\mu_w(A)\) be the number of white balls and \(\mu_b (A)\) be the number of black balls. What is the probability of \(m\) white balls and \(n-m\) black balls being drawn?

\[P\bigg(\mu_w(A_n) = m \ \& \ \mu_b(A_n)=n-m\bigg)=P^n_m=?\]

\((a+b)^n\) is the number of all the cases possible in \(n\) draws. In the expansion of this binomial, \({n \choose m}b^{n-m}a^m\) expresses the number of cases in which \(m\) white balls and \(n-m\) black balls may be drawn. Thus,

\[P^n_m=\frac{{n \choose m}b^{n-m}a^m}{(a+b)^n} \]

Letting \(p=P(\mu_w(A_1)=1)=\frac{a}{a+b}\) be the probability of drawing a white ball in a single draw and \(q=P(\mu_b(A_1)=1)=\frac{b}{a+b}\) be the probability of drawing a black ball in a single draw,

\[P^n_m={n \choose m}q^{n-m}p^m\]

\[\Delta P^n_{m}=\frac{P^n_{m+1}}{P^n_{m}}=\frac{(n-m)p}{(m+1)q}\]

This is an ordinary finite difference equation; iterating it over \(r\) steps gives:

\[{\Delta}^r P^n_{m}= \frac{P^n_{m+r}}{P^n_{m}}=\frac{p^{r}}{q^{r}}\prod_{i=0}^{r-1}\frac{n-m-i}{m+i+1}\]
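The ratio relation gives a way to build the whole distribution from its first term. The following sketch, with a hypothetical urn of 3 white and 2 black balls and 6 draws, generates \(P^n_0, \cdots, P^n_n\) by repeated multiplication and compares the result with the closed binomial form; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, m, p):
    """P^n_m = C(n, m) * q^(n - m) * p^m with q = 1 - p."""
    q = 1 - p
    return comb(n, m) * q ** (n - m) * p ** m

def step_ratio(n, m, p):
    """P^n_{m+1} / P^n_m = (n - m) p / ((m + 1) q)."""
    q = 1 - p
    return Fraction(n - m, m + 1) * p / q

def distribution_by_recurrence(n, p):
    """Start from P^n_0 = q^n and apply the ratio n times."""
    probs = [(1 - p) ** n]
    for m in range(n):
        probs.append(probs[-1] * step_ratio(n, m, p))
    return probs

a, b, n = 3, 2, 6  # hypothetical urn contents and number of draws
p = Fraction(a, a + b)
probs = distribution_by_recurrence(n, p)
print(probs == [binomial_probability(n, m, p) for m in range(n + 1)])
```

Exact fractions are used so that the comparison with the closed form is an equality rather than a floating-point approximation.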

“Three players of supposed equal ability play together on the following conditions: that one of the first two players who beats his adversary plays the third, and if he beats him the game is finished. If he is beaten, the victor plays against the second until one of the players has defeated consecutively the two others, which ends the game. The probability is demanded that the game will be finished in a certain number \(n\) of plays. Let us find the probability that it will end precisely at the \(n\)th play. For that the player who wins ought to enter the game at the play \(n-1\) and win it thus at the following play. But, if in place of winning the play \(n-1\) he should be beaten by his adversary who had just beaten the other player, the game would end at this play. Thus the probability that one of the players will enter the game at the play \(n-1\) and will win it is equal to the probability that the game will end precisely with this play; and as this player ought to win the following play in order that the game may be finished at the \(n\)th play, the probability of this last case will be only one half of the preceding one.” (p.29-30)

Let \(E\) be the random variable of the number of plays it takes for the game to finish.

\[\mathbb{P}(E=n)=?\]

Let \(G_k=(p_1,p_2)\) be the random variable of the two players \((p_1,p_2)\) playing in game \(k\). Let \(W_k=p_0\) be the random variable of the winning player, \(p_0\), of game \(k\).

\[\mathbb{P}(E=n-1)=\mathbb{P}(G_{n-1}=W_{n-1}=p)\]

\[\mathbb{P}(E=n)=\frac{1}{2}\mathbb{P}(E=n-1)\]

This is an ordinary finite difference equation for a recurrent process. To solve for this probability, we notice that the game cannot end sooner than the 2nd play and extend the iterative expression recursively,

\[\mathbb{P}(E=n)=\bigg(\frac{1}{2}\bigg)^{n-2}\mathbb{P}(E=2)\]

\(\mathbb{P}(E=2)\) is the probability that one of the first two players who has beaten his adversary should beat at the second play the third player, which is \(\frac{1}{2}\). Thus,

\[\mathbb{P}(E=n)=\bigg(\frac{1}{2}\bigg)^{n-1}\]

The probability the game will end at latest the \(n\)th play is the sum of these,

\[\mathbb{P}(E\leq n)=\sum_{k=2}^n \bigg(\frac{1}{2}\bigg)^{k-1} = 1 - \bigg(\frac{1}{2}\bigg)^{n-1}\]

(p.31)
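The three-player result can be confirmed by tracking the game exactly rather than by simulation. The sketch below is one possible encoding, not Laplace's: players are labeled 0, 1, 2 (hypothetical labels), a state records who holds the table and who challenges, and every play is assumed to be won by either side with probability 1/2.

```python
from fractions import Fraction

def end_time_distribution(max_plays):
    """Exact P(E = n) for n = 2..max_plays in the three-player game."""
    half = Fraction(1, 2)
    # After play 1, player 0 or 1 is the champion and player 2 challenges.
    # State (champion, challenger) -> probability; the third player waits.
    states = {(0, 2): half, (1, 2): half}
    dist = {}
    for n in range(2, max_plays + 1):
        ended = Fraction(0)
        next_states = {}
        for (champion, challenger), pr in states.items():
            # Champion wins a second consecutive play: the game ends here.
            ended += pr * half
            # Otherwise the challenger becomes champion and meets the waiter.
            waiting = 3 - champion - challenger
            key = (challenger, waiting)
            next_states[key] = next_states.get(key, Fraction(0)) + pr * half
        dist[n] = ended
        states = next_states
    return dist

dist = end_time_distribution(8)
cumulative = sum(dist[k] for k in range(2, 9))
print(dist[2], dist[3], cumulative)
```

The exact values follow the geometric law found above, and the cumulative sum up to the 8th play matches the closed form for the probability of finishing at the latest by that play.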

Appendix: Calculus of Generating Functions

In general, we can define the ordinary finite difference polynomial equation. For a particular event, \(E\), its probability mass function over internal time-steps \(n\) is given by the distribution \(f(n)=\mathbb{P}(E=n)\). The base case (\(I_0\)) of the inductive definition is known for the lowest time-step, \(n_0\), as \(f(n_0)=c\), while the iterative step (\(I^+\)) is constructed as a polynomial function \(\mathcal{P}(x)=\sum_i a_i x^i\) on the difference step of one time-unit:

\[I^+: f(n)=\mathcal{P}(f(n-1))\]

\[\rightarrow f(n)=\underbrace{\mathcal{P}(\cdots\mathcal{P}(}_{n-n_0}f(n_0))\cdots)\]

\[\mathcal{D}(f(n))=f'(n)=\mathcal{D}(\mathcal{P})(f(n-1))f'(n-1)=\prod_{k=n_0}^{n-1}\mathcal{D}(\mathcal{P})(f(k)) \]
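The inductive definition can be run directly as an iteration. The sketch below assumes the polynomial is given by its coefficient list \(a_0, a_1, \cdots\); the function names are hypothetical, and the three-player game is used as the instance, with \(\mathcal{P}(x)=x/2\), \(n_0=2\) and \(c=1/2\).

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate sum_i coeffs[i] * x**i by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def iterate(coeffs, n0, c, n):
    """f(n0) = c and f(k) = P(f(k - 1)) for k = n0 + 1, ..., n."""
    value = c
    for _ in range(n - n0):
        value = poly_eval(coeffs, value)
    return value

# Instance from the three-player game: P(x) = x / 2 with f(2) = 1/2.
half = Fraction(1, 2)
coeffs = [0, half]  # a_0 = 0, a_1 = 1/2
f5 = iterate(coeffs, 2, half, 5)
print(f5)  # 1/16, i.e. (1/2)**(5 - 1)
```

The polynomial is applied \(n-n_0\) times to the base value, which for this linear instance reproduces the geometric solution obtained earlier.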