A Tutorial in Data Science: Lecture 3 – Laplace’s Analytical Theory of Probability

by | Jan 12, 2021 | Math Lecture

This lecture serves as a philosophically informed mathematical introduction to the ideas and notation of probability theory from its most important historical theorist. It is part of an ongoing formal reconstruction of Laplace’s Calculus of Probability from his introductory essay in English translation, “A Philosophical Essay on Probabilities” (cite: PEP). The essay can be read alongside these notes, which are divided into the same sections as Laplace’s text. I have included deeper supplements from the untranslated treatise Théorie Analytique des Probabilités (cite: TA), rendered with personal and online translation tools, in section 1.10 and the Appendix (3).

The General Principles of the Calculus of Probabilities

\Omega is the state of all possible events.
\omega \in \Omega is an event, an element of the state.

1st Principle: The probability of the occurrence of an event \omega is the number of favorable cases divided by the total number of causal cases, assuming all cases are equally likely

\Omega' =\{\omega_1', \cdots , \omega_n' \} is the derivational system of the state \Omega, the space of cases that will cause different events in the state. \Omega_{\omega}'= \{\omega_{i_1}', \cdots , \omega_{i_m}': \omega_{i_j}' \rightarrow \omega\} is the derivational system of the state favoring the event \omega. The order of a particular state (or derivational state-system) is given by the measure (| \dots |), evaluated as the number of elements in it.

    \[P(\omega)=P(\Omega=\omega)=\frac{|\Omega_{\omega}'|}{|\Omega'|}=\frac{m}{n}\]
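As a minimal sketch of the 1st Principle, the ratio m/n can be computed by enumerating equally likely cases and counting the favorable ones. The two-dice setup and the helper name `classical_probability` are illustrative assumptions, not taken from Laplace's text.

```python
from fractions import Fraction
from itertools import product


def classical_probability(cases, is_favorable):
    """Return |favorable cases| / |all cases|, assuming equally likely cases."""
    cases = list(cases)
    favorable = [c for c in cases if is_favorable(c)]
    return Fraction(len(favorable), len(cases))


# Hypothetical example: the cases are the 36 ordered outcomes of two fair dice,
# and the event is "the faces sum to 7" (6 favorable cases).
two_dice = product(range(1, 7), repeat=2)
p_seven = classical_probability(two_dice, lambda c: sum(c) == 7)
print(p_seven)  # 1/6
```

Exact fractions are used so that the result stays in the m/n form of the principle rather than a rounded decimal.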


If we introduce time as the attribute of case-based favorability, i.e. causality, then the event \omega is to occur at a future time t_1, as represented by the formal statement \Omega(t_1)=\omega. The equally likely conditioning cases that will deterministically cause the event at T=t_1 are the possible events at the previous conditioning states of the system, T=t<t_1, given as \Omega(t_0<t<t_1) \in \Omega'(t_1 | t_0)=\{\omega_1', \cdots , \omega_n' \}. This is a superposition of possible states-as-cases, since they are unknown at the present time t_0. Here \Omega' is a derivational state-system, or set of possible causal states, evaluated at t_1 given t_0, i.e. t_1 | t_0. This set of possible cases can be partitioned into those that are favorable to \Omega(t_1)=\omega and those that are not. The set of cases favorable to \omega is \Omega_{\omega}'(t_1 | t_0)=\{\omega_{i_1}', \cdots , \omega_{i_m}': \omega_{i_j}' \xrightarrow[t]{} \omega\}.

    \[P(\omega)=P\bigg(\Omega(t_1)=\omega \bigg| \Omega(t_0)\bigg)=\frac{|\Omega_{\omega}'(t_1 | t_0)|}{|\Omega'(t_1 | t_0)|}=\frac{m}{n}\]

2nd Principle: Assuming the conditioning cases are not equal in probability, the probability of the occurrence of an event \omega is the sum of the probabilities of the favorable cases.

    \[P(\omega)=\sum_j P(\omega_{i_j}')\]

3rd Principle: The probability of the combined event (\omega) of independent events \{\omega_1, \cdots,\omega_n\} is the product of the probabilities of the component events.

    \[P(\omega_1 \cap \cdots \cap \omega_n) = \prod_i P(\omega_i)\]

4th Principle: The probability of a compound event (\omega) of two events dependent upon each other, \omega_1 \& \omega_2, where \omega_2 is after \omega_1, is the probability of the first times the probability of the second conditioned on the first having occurred.

    \[P(\omega_1 \cap \omega_2)= P(\omega_1) * P(\omega_2 | \omega_1)\]
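The 3rd and 4th Principles can be checked by direct counting on a small urn. The following is a sketch under assumed values (3 white and 2 black balls, two draws); the names `urn`, `prob`, `first_white` and `both_white` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical urn: 3 white ("W") and 2 black ("B") balls, indexed 0..4.
urn = ["W", "W", "W", "B", "B"]


def prob(cases, event):
    """Classical probability: favorable cases over all equally likely cases."""
    cases = list(cases)
    return Fraction(sum(1 for c in cases if event(c)), len(cases))


def first_white(c):
    return urn[c[0]] == "W"


def both_white(c):
    return urn[c[0]] == "W" and urn[c[1]] == "W"


# 3rd Principle: with replacement the two draws are independent,
# so P(white, white) should equal P(white) * P(white).
with_replacement = list(product(range(5), repeat=2))
p_white = Fraction(3, 5)
p_both_independent = prob(with_replacement, both_white)

# 4th Principle: without replacement the second draw depends on the first,
# so P(white, white) should equal P(first white) * P(second white | first white).
without_replacement = list(permutations(range(5), 2))
p_first = prob(without_replacement, first_white)
given_first = [c for c in without_replacement if first_white(c)]
p_second_given_first = prob(given_first, lambda c: urn[c[1]] == "W")
p_both_dependent = prob(without_replacement, both_white)

print(p_both_independent, p_white * p_white)             # 9/25 9/25
print(p_both_dependent, p_first * p_second_given_first)  # 3/10 3/10
```

The conditional probability is obtained here by restricting the count to the cases in which the first draw is white, which is the counting form of conditioning.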

5th Principle (p.15): The probability of an expected event \omega_1 conditioned on an occurred event \omega_0 is the probability of the composite event \omega=\omega_0 \cap \omega_1 divided by the a priori probability of the occurred event.

    \[P(\omega_1|\omega_0)=\frac{P(\omega_0 \cap \omega_1)}{P(\omega_0)}\]

An a priori probability is always taken from a prior state, as can be given by a previous event \omega_{-1}. Thus, if we assume the present to be t_0, the prior time to have been t_{-1}, and the future time to be t_1, then the a priori probability of the presently occurred event is made from \Omega(t_{-1})=\omega_{-1} as

    \[P(\omega_0)=P(\omega_0 | \omega_{-1})=P\bigg( \Omega(t_0)=\omega_0 \bigg | \Omega(t_{-1})=\omega_{-1} \bigg)\]


The probability of the combined event \omega_0 \cap \omega_1 occurring can also be measured partially from the a priori perspective as

    \[P(\omega_0 \cap \omega_1)=P(\omega_0 \cap \omega_1 | \omega_{-1})=P\bigg(\Omega(t_0)=\omega_0 \bigcap \Omega(t_1)=\omega_1 \bigg| \Omega(t_{-1})=\omega_{-1} \bigg)\]

Thus,

    \[P(\omega_1|\omega_0)=P\bigg( (\omega_1|\omega_0) \bigg | \omega_{-1} \bigg)=\frac{P(\omega_0 \cap \omega_1 | \omega_{-1})}{P(\omega_0 | \omega_{-1})}\]

6th Principle: 1. For a constant event, the likelihood of a cause of the event is the same as the probability that the event will occur. 2. The probability of the existence of any one of those causes is the probability of the event (resulting from this cause) divided by the sum of the probabilities of similar events from all causes. 3. For causes, considered a priori, which are unequally probable, the probability of the existence of a cause is the probability of the caused event, multiplied by the possibility (a priori probability) of that cause, divided by the sum of the products of the probability of the events and the possibility of their cause.

For event \omega_i, let \omega_i' be its cause. While P is the probability of an actual existence, \mu is the measure of the a priori likelihood of a cause since its existence is unknown. These two measurements may be used interchangeably where the existential nature of the measurement is known or substitutions as approximations are permissible. In Principle 5 they are conflated since the probability of an occurred event always implies an a priori likelihood.

  1. for \omega constant (i.e. only 1 cause, \omega'), P(\omega')=P(\omega)
  2. for \omega_i' equally likely, P(\omega_i')=P(\omega_i'|\omega)=\frac{P(\omega | \omega_i')}{\sum_j P(\omega|\omega_j')}
  3. P(\omega_i')=P(\omega_i'|\omega)=\frac{P(\omega | \omega_i')*\mu(\omega_i')}{\sum_j P(\omega | \omega_j')*\mu(\omega_j')}

7th Principle (p. 17): The probability of a future event, \omega_1, is the sum of the products of the probability of each cause, drawn from the event observed, by the probability that, this cause existing, the future event will occur.

The present is t_0 while the future time is t_1. Thus, the future event expected is \Omega(t_1)=\omega_1. Given that \Omega(t_0)=\omega_0 has been observed, we ask about the probability of a future event \omega_1 from the set of causes \Omega'(t_1)=\{\omega_1^{(i)}:\omega_1^{(i)} \rightarrow \Omega(t_1)\} (change of notation for causes).

    \[P(\omega_1|\omega_0)=\sum_i P(\omega_1^{(i)} | \omega_0)*P(\omega_1 | \omega_1^{(i)})\]


How are we to consider causes? They can be historical events with a causal-deterministic relationship to the future, or they can be considered event-conditions, as a spatiality (possibly true over a temporal duration) rather than a temporality (true at one time). Generally, we can consider causes to be hypotheses H=\{H_1, \cdots, H_n\}, with P(H_i) the prior probability (single term) and P(\omega | H_i) the likelihood (the conditional probability of the event under the hypothesis). The observed event (\omega_0) is \omega_{obs} and the future event (\omega_1) is the expected event \omega_{exp}. Thus, we can restate principles 6 & 7 as:

  1. P(H_i|\omega_{obs})=\frac{P(\omega_{obs} | H_i)P(H_i)}{\sum_j P(\omega_{obs} | H_j)P(H_j)}
  2. P(\omega_{exp}|\omega_{obs})=\sum_i P(H_i | \omega_{obs})P(\omega_{exp} | H_i)\ = \frac{\sum_i P(\omega_{obs} | H_i)P(H_i)P(\omega_{exp} | H_i)}{\sum_j P(\omega_{obs} | H_j)P(H_j)}

Clearly, Principle 6 is the same as Bayes’ Theorem (Wassermann, Thm. 2.16), which articulates the hypotheses H as a partition of \Omega, in that \Omega=\cup_i H_i with H_i \cap H_j = \varnothing for i\neq j, so that each hypothesis is a limitation of the domain of possible events. The observed event is also considered a set of events rather than a single ‘point.’ Therefore, Principle 6 says that “the probability that the possibility of the event is comprised within given limits is the sum of the fractions comprised within these limits” (Laplace, p.18).
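The two restated formulas can be sketched for a finite set of hypotheses. The functions `posterior` and `predictive` and the two-urn numbers below are assumptions chosen for illustration. As in the formula for Principle 7, the expected event is assumed to depend on the observed event only through the hypothesis.

```python
from fractions import Fraction


def posterior(prior, likelihood_obs):
    """Principle 6: P(H_i | obs) from priors P(H_i) and likelihoods P(obs | H_i)."""
    joint = [p * l for p, l in zip(prior, likelihood_obs)]
    total = sum(joint)  # the normalizing sum over all hypotheses
    return [j / total for j in joint]


def predictive(post, likelihood_exp):
    """Principle 7: P(exp | obs) = sum_i P(H_i | obs) * P(exp | H_i)."""
    return sum(p * l for p, l in zip(post, likelihood_exp))


# Hypothetical setup: two urns, equally likely a priori.
# Under H_1 a draw is white with probability 3/4, under H_2 with probability 1/4.
prior = [Fraction(1, 2), Fraction(1, 2)]
p_white = [Fraction(3, 4), Fraction(1, 4)]

# Observed event: one white ball drawn (and replaced).
post = posterior(prior, p_white)
# Expected event: the next draw from the same urn is also white.
p_next_white = predictive(post, p_white)

print(post)          # [Fraction(3, 4), Fraction(1, 4)]
print(p_next_white)  # 5/8
```

With equal priors the posterior is proportional to the likelihoods, which is case 2 of Principle 6; unequal priors give case 3 with no change to the code.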

8th Principle (p.20): The Advantage of Mathematical Hope, A, depending on several events, is the sum of the products of the probability of each event by the benefit to its occurrence

Let \omega=\{\omega_1, \cdots ,\omega_n: \omega_i \in \Omega\} be the set of events under consideration. Let B be the benefit function giving a value to each event. The advantage hoped for from these events is:

    \[A(\omega)=\sum_i B(\omega_i)*P(\omega_i)\]

A fair game is one whose cost of playing is equal to the advantage gained through it.

9th Principle (p.21): The Advantage A, depending on a series of events (\omega), is the sum of the products of the probability of each favorable event by the benefit to its occurrence minus the sum of the products of the probability of each unfavorable event by the cost to its occurrence.

Let \omega=\{\omega_1, \cdots ,\omega_n: \omega_i \in \Omega\} be the series of events under consideration, partitioned into \omega=(\omega^+,\omega^-) for favorable and unfavorable events. Let B be the benefit function for \omega_i \in \omega^+ and L the loss function for \omega_i \in \omega^-, each giving the value to each event. The advantage of playing the game is:

    \[A(\omega)=\sum_{i: \omega_i \in \omega^+} B(\omega_i)P(\omega_i) - \sum_{j: \omega_j \in \omega^-} L(\omega_j)P(\omega_j)\]

Mathematical Hope is the positivity of A. Thus, if A is positive, one has hope for the game, while if A is negative one has fear.

In generality, X is the random variable function, X:\omega_i \rightarrow \mathbb{R}, that gives a value to each event, either a benefit (>0) or cost (<0). The absolute expectation (E) of value for the game from these events is:

    \[E(\omega)=\sum_i X(\omega_i)*P(\omega_i)\]
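The agreement between the 9th Principle's advantage A and the expectation E can be seen on a small game. This is a sketch with assumed values (win 10 with probability 1/4, lose 2 with probability 3/4); the function names are hypothetical.

```python
from fractions import Fraction


def expectation(values, probs):
    """E = sum_i X(omega_i) * P(omega_i), with signed values X."""
    return sum(x * p for x, p in zip(values, probs))


def advantage(benefits, p_favorable, losses, p_unfavorable):
    """A = sum of benefit * probability minus sum of loss * probability."""
    gain = sum(b * p for b, p in zip(benefits, p_favorable))
    cost = sum(l * p for l, p in zip(losses, p_unfavorable))
    return gain - cost


# Hypothetical game: win 10 with probability 1/4, lose 2 with probability 3/4.
A = advantage([10], [Fraction(1, 4)], [2], [Fraction(3, 4)])
E = expectation([10, -2], [Fraction(1, 4), Fraction(3, 4)])

print(A, E)  # 1 1
```

Here A is positive, so in the terms above the player has hope for the game; a cost of entry equal to A would make the game fair.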

10th Principle (p.23): The relative value of an infinitely small sum is equal to its absolute value divided by the total benefit of the person interested.

This section can be explicated by examining Laplace’s corresponding section in Théorie Analytique (S.41-42, p.432-445) as a development of Bernoulli’s work on the subject.

(432) For a physical fortune x, an increase by dx produces a moral good reciprocal to the fortune, \frac{kdx}{x} for a constant k. k is the “units” of moral goodness (i.e. utility) in that \frac{dx}{x}=\frac{1}{k}\rightarrow 1 moral good. So, k is the quantity of physical fortune whereby a marginal increase by unity of physical fortune is equivalent to unity of moral fortune. For a moral fortune y,

    \[y=k ln(x) + ln(h)\]

A moral good is the ratio of an increase in a fortune to the whole fortune. Moral fortune is the sum of all moral goods. If we consider this summation continuously for all infinitesimally small increases in physical fortune, moral fortune is the integral of the reciprocal of the physical fortune, scaled by k, over the changes in that physical fortune. Deriving this from principle 10,

    \[dy=\frac{kdx}{x}\]

    \[\int dy = y = \int \frac{kdx}{x} = k \int \frac{1}{x} dx = kln(x) + C\]

C=ln(h) is the constant of minimum moral good when the physical fortune is unity. We can put this in terms of a physical fortune, x_0, the minimum physical fortune for surviving one’s existence – the cost of reproducing the conditions of one’s own existence. With h=\frac{1}{{x_0}^k},

    \[y=\int_{x_0}^x dy =\int_{x_0}^x \frac{kdx}{x}=kln(x) - k ln(x_0)=kln(x) - k ln\bigg(\frac{1}{\sqrt[k]{h}}\bigg)=kln(x) + ln(h) \]


h is a constant given by an empirical observation of y as never positive or negative but always at least what is necessary, as even someone without any physical fortune will still have a moral fortune in their existence; it is thus the unpriced “physical fortune” of laboring existence.

(433) Suppose an individual with a physical fortune of a expects to receive a variety of changes in fortunes \alpha, \zeta, \gamma, \cdots, as increments or diminishings, with probabilities of p, q, r, \cdots summing to unity. The corresponding moral fortunes would be,

    \[ k ln(a+\alpha) + ln(h), \ k ln(a+\zeta) + ln(h), k ln(a+\gamma) + ln(h), \cdots \]


Thus, the expected moral fortune Y is

    \[Y=kp ln(a+\alpha)+ kq ln(a+\zeta) + kr ln(a+\gamma) + \cdots + ln(h)\]


Let X be the physical fortune corresponding to this moral fortune, as

    \[Y=k ln(X) + ln(h)\]


with,

    \[X=(a+\alpha)^p(a+\zeta)^q(a+\gamma)^r \cdots\]


Taking away the primitive fortune a from this value of X, the difference will be the increase in the physical fortune that would procure the individual the same moral advantage resulting from his expectation. This difference, X-a, is therefore the expression of the moral advantage, whereas the mathematical advantage is expressed by

    \[p\alpha + q\zeta + r\gamma + \cdots\]


This results in several important consequences. One of them is that the mathematically most equal game is always disadvantageous. Indeed, if we denote by a the physical fortune of the player before starting the game, by p his probability of winning, (434) \cdots
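A numerical sketch can make the comparison between X - a and the sum p\alpha + q\zeta + \cdots concrete. The fortune a = 100 and the even-odds stake of 50 are assumed values, and `moral_equivalent_fortune` is a hypothetical helper. Since Y = k ln(X) + ln(h), the constants k and h cancel and do not appear in X.

```python
import math


def moral_equivalent_fortune(a, changes, probs):
    """X = (a + alpha)^p * (a + zeta)^q * ..., for probabilities summing to 1."""
    return math.prod((a + c) ** p for c, p in zip(changes, probs))


# Hypothetical mathematically equal game: win or lose 50 with probability 1/2,
# starting from a physical fortune of 100.
a = 100.0
changes = [50.0, -50.0]
probs = [0.5, 0.5]

X = moral_equivalent_fortune(a, changes, probs)  # sqrt(150 * 50), about 86.60
mathematical_advantage = sum(c * p for c, p in zip(changes, probs))
moral_advantage = X - a

print(mathematical_advantage)     # 0.0
print(round(moral_advantage, 2))  # -13.4
```

X is a weighted geometric mean of the possible fortunes, so it can never exceed the weighted arithmetic mean a + p\alpha + q\zeta + \cdots. In this example the sum p\alpha + q\zeta is zero while X - a is negative.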

Concerning the Analytical Methods of the Calculus of Probabilities

The Binomial Theorem:

    \[(x+y)^n=\sum_{k=0}^n \binom{n}{k}x^{n-k}y^k\]


Letting x=1,

    \[(1+y)^n=\sum_{k=0}^n \binom{n}{k}y^k=\sum_{k=1}^n \binom{n}{k}y^k + 1\]


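More generally, for a product of binomials in distinct letters a_1, \cdots, a_n,

    \[\prod_{i=1}^n (1+a_i) \ -1 = \sum_{k=1}^n \ \sum_{1 \leq i_1 < \cdots < i_k \leq n} \ \prod_{l=1}^{k}a_{i_l}\]

If we suppose these letters are equal, a_i=y, each inner sum consists of \binom{n}{k} equal terms y^k and the expansion of (1+y)^n above is recovered.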
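The expansion of the product \prod_i (1+a_i) - 1 into sums of products of the letters, taken k at a time, can be checked numerically. This is a sketch with arbitrary assumed letters; `expand_product_minus_one` is a hypothetical name.

```python
from itertools import combinations
from math import comb, prod


def expand_product_minus_one(letters):
    """Sum, for k = 1..n, of the products of all k-element combinations of letters."""
    n = len(letters)
    return sum(prod(c) for k in range(1, n + 1) for c in combinations(letters, k))


# Distinct letters: both sides of the product expansion agree.
letters = [2, 3, 5]
lhs = prod(1 + a for a in letters) - 1
rhs = expand_product_minus_one(letters)
print(lhs, rhs)  # 71 71

# Equal letters a_i = y: each k contributes comb(n, k) copies of y**k,
# which is the binomial expansion of (1 + y)**n without its constant term.
y, n = 2, 4
equal_case = expand_product_minus_one([y] * n)
binomial = sum(comb(n, k) * y**k for k in range(1, n + 1))
print(equal_case, binomial, (1 + y) ** n - 1)  # 80 80 80
```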

Consider the lottery composed of n numbers, of which r are drawn at each draw:
What is the probability of drawing s given numbers Y=(y_1, \cdots y_s) in one draw X=(x_1, \cdots, x_r)?

    \[P(Y \subseteq X)=\frac{\binom{n-s}{r-s}}{\binom{n}{r}}=\frac{\binom{r}{s}}{\binom{n}{s}}\]
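The lottery probability can be cross-checked by brute-force enumeration on a small case. The values n = 10, r = 4, s = 2 are assumptions for illustration, as are the function names. The closed form used is \binom{r}{s} / \binom{n}{s}, and the enumeration counts the draws of r numbers that contain all s given numbers.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def lottery_probability(n, r, s):
    """Probability that s given numbers all appear among r numbers drawn from n."""
    return Fraction(comb(r, s), comb(n, s))


def lottery_by_enumeration(n, r, s):
    """Same probability, by counting draws that contain the given numbers 0..s-1."""
    given = set(range(s))
    draws = list(combinations(range(n), r))
    hits = sum(1 for d in draws if given <= set(d))
    return Fraction(hits, len(draws))


print(lottery_probability(10, 4, 2))     # 2/15
print(lottery_by_enumeration(10, 4, 2))  # 2/15

# The count of favorable draws is comb(n - s, r - s): the remaining r - s
# numbers are chosen freely among the other n - s.
print(Fraction(comb(10 - 2, 4 - 2), comb(10, 4)))  # 2/15
```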

Consider the Urn \Omega with a white balls and b black balls, drawn with replacement. Let A_n = \{\omega_1, \cdots \omega_n\} be n draws. Let \mu_w(A) be the number of white balls and \mu_b (A) be the number of black balls. What is the probability of m white balls and n-m black balls being drawn?

    \[P\bigg(\mu_w(A_n) = m \ \& \ \mu_b(A_n)=n-m\bigg)=P^n_m=?\]


(a+b)^n is the number of all the cases possible in n draws. In the expansion of this binomial, \binom{n}{m}b^{n-m}a^m expresses the number of cases in which m white balls and n-m black balls may be drawn. Thus,

    \[P^n_m=\frac{\binom{n}{m}b^{n-m}a^m}{(a+b)^n} \]

Letting p=P(\mu_w(A_1)=1)=\frac{a}{a+b} be the probability of drawing a white ball in a single draw and q=P(\mu_b(A_1)=1)=\frac{b}{a+b} be the probability of drawing a black ball in a single draw,

    \[P^n_m=\binom{n}{m}q^{n-m}p^m\]

    \[\Delta P^n_{m}=\frac{P^n_{m+1}}{P^n_{m}}=\frac{(n-m)p}{(m+1)q}\]

This is an ordinary finite difference equation in m; applying it r times in succession gives:

    \[{\Delta}^r P^n_{m}= \frac{P^n_{m+r}}{P^n_{m}}=\frac{p^{r}}{q^{r}}\prod_{i=0}^{r-1}\frac{n-m-i}{m+i+1}\]
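The closed form for P^n_m and the ratio between successive terms can be checked against each other. This is a sketch with assumed values a = 2, b = 3, n = 5; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def urn_probability(n, m, a, b):
    """P^n_m: probability of m white balls in n draws with replacement."""
    p = Fraction(a, a + b)
    q = Fraction(b, a + b)
    return comb(n, m) * p**m * q ** (n - m)


def ratio(n, m, a, b):
    """P^n_{m+1} / P^n_m = (n - m) p / ((m + 1) q)."""
    p = Fraction(a, a + b)
    q = Fraction(b, a + b)
    return Fraction(n - m, m + 1) * p / q


a, b, n = 2, 3, 5
closed_form = [urn_probability(n, m, a, b) for m in range(n + 1)]

# Rebuild the same distribution from P^n_0 = q^n by repeated use of the ratio.
recurrence = [Fraction(b, a + b) ** n]
for m in range(n):
    recurrence.append(recurrence[-1] * ratio(n, m, a, b))

print(closed_form == recurrence)  # True
print(sum(closed_form))           # 1
```

Starting from P^n_0 = q^n, the first-order ratio is enough to generate every term, which is what makes the relation a recurrence rather than only an identity.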

Three players of supposed equal ability play together on the following conditions: that one of the first two players who beats his adversary plays the third, and if he beats him the game is finished. If he is beaten, the victor plays against the second until one of the players has defeated consecutively the two others, which ends the game. The probability is demanded that the game will be finished in a certain number n of plays. Let us find the probability that it will end precisely at the nth play. For that the player who wins ought to enter the game at the play n-1 and win it thus at the following play. But, if in place of winning the play n-1 he should be beaten by his adversary who had just beaten the other player, the game would end at this play. Thus the probability that one of the players will enter the game at the play n-1 and will win it is equal to the probability that the game will end precisely with this play; and as this player ought to win the following play in order that the game may be finished at the nth play, the probability of this last case will be only one half of the preceding one. (p.29-30)

Let E be the random variable of the number of plays it takes for the game to finish.

    \[\mathbb{P}(E=n)=?\]


Let G_k=(p_1,p_2) be the random variable of the two players (p_1,p_2) playing in game k. Let W_k=p_0 be the random variable of the winning player, p_0, of game k.

    \[\mathbb{P}(E=n-1)=\mathbb{P}(G_{n-1}=W_{n-1}=p)\]


    \[\mathbb{P}(E=n)=\frac{1}{2}\mathbb{P}(E=n-1)\]


This is an ordinary finite difference equation for a recurrent process. To solve for this probability, we notice that the game cannot end sooner than the 2nd play and extend the iterative expression recursively,

    \[\mathbb{P}(E=n)=\bigg(\frac{1}{2}\bigg)^{n-2}\mathbb{P}(E=2)\]


\mathbb{P}(E=2) is the probability that one of the first two players who has beaten his adversary should beat at the second play the third player, which is \frac{1}{2}. Thus,

    \[\mathbb{P}(E=n)=\bigg(\frac{1}{2}\bigg)^{n-1}\]


The probability the game will end at latest the nth play is the sum of these,

    \[\mathbb{P}(E\leq n)=\sum_{k=2}^n \bigg(\frac{1}{2}\bigg)^{k-1} = 1 - \bigg(\frac{1}{2}\bigg)^{n-1}\]
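The result can be verified by exhaustive enumeration of equally likely play outcomes. This is a sketch in which the player labels 0, 1, 2 and the encoding of each play as a bit are assumptions made for the illustration. The game is taken to end as soon as the same player wins two plays in a row, since the second win is against the one opponent he had not just beaten.

```python
from fractions import Fraction
from itertools import product


def plays_until_finished(outcomes):
    """Play the three-player game on a fixed sequence of outcomes.

    outcomes[k] is 0 if the first-listed player of that play wins, 1 otherwise.
    Returns the play number at which the game ends, or None if it is unfinished.
    """
    current, waiting = (0, 1), 2  # players 0 and 1 start, player 2 waits
    previous_winner = None
    for play, bit in enumerate(outcomes, start=1):
        winner, loser = current[bit], current[1 - bit]
        if winner == previous_winner:
            return play  # two consecutive wins: both others beaten
        previous_winner = winner
        current, waiting = (winner, waiting), loser
    return None


def prob_ends_at(n):
    """P(E = n), counting over all 2^n equally likely outcome sequences."""
    sequences = list(product((0, 1), repeat=n))
    hits = sum(1 for s in sequences if plays_until_finished(s) == n)
    return Fraction(hits, len(sequences))


for n in range(2, 7):
    print(n, prob_ends_at(n), Fraction(1, 2) ** (n - 1))

cumulative = sum(prob_ends_at(k) for k in range(2, 7))
print(cumulative)  # 31/32, i.e. 1 - (1/2)^5
```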

Appendix: The Calculus of Generating Functions

In general, we can define the ordinary finite difference polynomial equation. For a particular event, E, its probability mass function over internal-time steps n is given by the distribution f(n)=\mathbb{P}(E=n). The base case (I_0) of the inductive definition is known for the lowest time-step, n_0, as f(n_0)=c, while the iterative step (I^+) is constructed as a polynomial function \mathcal{P}(x)=\sum_i a_i x^i on the difference step of one time-unit:

    \[I^+: f(n)=\mathcal{P}(f(n-1))\]


    \[\rightarrow f(n)=\underbrace{\mathcal{P}(\cdots\mathcal{P}(}_{n-n_0}f(n_0))\cdots)\]

   

    \[\rightarrow \mathcal{D}(f(n))=f'(n)=\mathcal{D}(\mathcal{P})(f(n-1))f'(n-1)=f'(n_0)\prod_{k=n_0}^{n-1}\mathcal{D}(\mathcal{P})(f(k))\]
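The iteration and its chain-rule product can be sketched as follows. One loudly stated assumption: the derivative is read here as the derivative of f(n) with respect to the base value c = f(n_0), so that f'(n_0) = 1; the source does not say which variable \mathcal{D} acts on. The coefficient-list representation of \mathcal{P} and all function names are hypothetical.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """P(x) = sum_i coeffs[i] * x**i."""
    return sum(a * x**i for i, a in enumerate(coeffs))


def derivative(coeffs):
    """Coefficients of D(P), the derivative polynomial of P."""
    return [i * a for i, a in enumerate(coeffs)][1:]


def iterate(coeffs, c, n0, n):
    """Return f(n) and d f(n) / d c, for f(n0) = c and f(k) = P(f(k - 1)).

    Assumption: the derivative is taken with respect to the base value c.
    """
    f, df = c, Fraction(1)
    d_coeffs = derivative(coeffs)
    for _ in range(n0, n):
        df *= evaluate(d_coeffs, f)  # chain-rule factor D(P)(f(k))
        f = evaluate(coeffs, f)
    return f, df


# The three-player game: P(x) = x / 2 with base case f(2) = 1/2.
half = [Fraction(0), Fraction(1, 2)]
f5, df5 = iterate(half, Fraction(1, 2), 2, 5)
print(f5, df5)  # 1/16 1/8

# A nonlinear check: P(x) = x^2 from c = 3 gives f(2) = c^4 and df/dc = 4 c^3.
square = [0, 0, 1]
f2, df2 = iterate(square, Fraction(3), 0, 2)
print(f2, df2)  # 81 108
```

For the linear recurrence of the game every chain-rule factor equals 1/2, so the product collapses to (1/2)^{n-n_0}.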
