A Tutorial in Data Science: Lecture 3 – Laplace’s Analytical Theory of Probability

by Justin Petrillo | Jan 12, 2021 | Math Lecture


This lecture serves as a philosophically informed mathematical introduction to the ideas and notation of probability theory, drawn from its most important historical theorist. It is part of an ongoing formal reconstruction of Laplace’s Calculus of Probability from his English-translated introductory essay, “A Philosophical Essay on Probabilities” (cite: PEP), which can be read alongside these notes; the notes are divided into the same sections as Laplace’s essay. I have included deeper supplements from the untranslated treatise Théorie Analytique des Probabilités (cite: TA), translated personally with the help of online translation tools, in section 1.10 and the Appendix (3).

The General Principles of the Calculus of Probabilities

$\Omega$ is the state of all possible events.
$\omega \in \Omega$ is an event as element of the state.

1st Principle: The probability of the occurrence of an event $\omega$ is the number of favorable cases divided by the total number of causal cases, assuming all cases are equally likely.

$\Omega' =\{\omega_1', \cdots , \omega_n' \}$ is the derivational system of the state $\Omega$, the space of cases that will cause different events in the state. $\Omega_{\omega}'= \{\omega_{i_1}', \cdots , \omega_{i_m}': \omega_{i_j}' \rightarrow \omega\}$ is the derivational system of the state favoring the event $\omega$. The order of a particular state (or derivational state-system) is given by the measure ( $| \dots |$ ), evaluated as the number of elements in it.

$P(\omega)=P(\Omega=\omega)=\frac{|\Omega_{\omega}'|}{|\Omega'|}=\frac{m}{n}$

If we introduce time as the attribute of case-based favorability, i.e. causality, the event $\omega$ is to occur at a future time $t_1$, as represented by the formal statement $\Omega(t_1)=\omega$. The equally likely conditioning cases that will deterministically cause the event at $T=t_1$ are the possible events at the previous conditioning states of the system, $T=t<t_1$, given as $\Omega(t_0<t<t_1) \in \Omega'(t_1 | t_0)=\{\omega_1', \cdots , \omega_n' \}$. This is a superposition of possible states-as-cases, since they are unknown at the present time $t_0$; here $\Omega'$ is a derivational state-system, or set of possible causal states, evaluated at $t_1$ given $t_0$, i.e. $t_1 | t_0$. This set of possible cases can be partitioned into those that are favorable to $\Omega(t_1)=\omega$ and those that are not. The set of cases favorable to $\omega$ is $\Omega_{\omega}'(t_1 | t_0)=\{\omega_{i_1}', \cdots , \omega_{i_m}': \omega_{i_j}' \xrightarrow[t]{} \omega\}$.

$P(\omega)=P\bigg(\Omega(t_1)=\omega \bigg| \Omega(t_0)\bigg)=\frac{|\Omega_{\omega}'(t_1 | t_0)|}{|\Omega'(t_1 | t_0)|}=\frac{m}{n}$
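As a minimal sketch of the first principle, the following Python snippet counts the favorable cases among a set of equally likely cases. The function name `classical_probability` and the two-dice example are illustrative assumptions of mine, not part of Laplace's text.

```python
from fractions import Fraction
from itertools import product

def classical_probability(cases, is_favorable):
    """Favorable cases divided by all cases, assuming equally likely cases."""
    cases = list(cases)
    favorable = [c for c in cases if is_favorable(c)]
    return Fraction(len(favorable), len(cases))

# Hypothetical example: the cases are the ordered pairs of faces of two fair
# dice, and the event is "the faces sum to 7".
dice_cases = list(product(range(1, 7), repeat=2))
p_seven = classical_probability(dice_cases, lambda c: sum(c) == 7)
print(p_seven)  # 1/6
```

Here the 36 ordered pairs play the role of $\Omega'$ and the 6 pairs summing to 7 play the role of $\Omega_{\omega}'$.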

2nd Principle: Assuming the conditioning cases are not equal in probability, the probability of the occurrence of an event $\omega$ is the sum of the probabilities of the favorable cases.

$P(\omega)=\sum_j P(\omega_{i_j}')$
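The second principle can be sketched in the same way, replacing the count of cases by a sum of their probabilities. The loaded die below is a hypothetical example with probabilities I have chosen for illustration; `event_probability` is likewise an assumed helper name.

```python
from fractions import Fraction

def event_probability(case_probs, is_favorable):
    """Sum the probabilities of the cases favorable to the event."""
    if sum(case_probs.values()) != 1:
        raise ValueError("case probabilities must sum to 1")
    return sum(p for case, p in case_probs.items() if is_favorable(case))

# Hypothetical loaded die: face 6 has probability 1/4, the other faces 3/20 each.
loaded_die = {face: Fraction(3, 20) for face in range(1, 6)}
loaded_die[6] = Fraction(1, 4)

p_even = event_probability(loaded_die, lambda face: face % 2 == 0)
print(p_even)  # 11/20
```

When every case has the same probability $\frac{1}{n}$, this sum reduces to the ratio $\frac{m}{n}$ of the first principle.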

3rd Principle: The probability of the combined event ( $\omega$ ) of independent events $\{\omega_1, \cdots,\omega_n\}$ is the product of the probabilities of the component events.

$P(\omega_1 \cap \cdots \cap \omega_n) = \prod_i P(\omega_i)$

4th Principle: The probability of a compound event ( $\omega$ ) of two events dependent upon each other, $\omega_1 \& \omega_2$, where $\omega_2$ is after $\omega_1$, is the probability of the first times the probability of the second conditioned on the first having occurred.

$P(\omega_1 \cap \omega_2)= P(\omega_1) * P(\omega_2 | \omega_1)$
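To illustrate the product rule for dependent events, here is a small sketch that enumerates two draws without replacement from a hypothetical urn and checks that the joint probability equals the probability of the first event times the conditional probability of the second. The urn contents and the helper `prob` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn: 3 white and 2 black balls, drawn without replacement.
urn = ["w", "w", "w", "b", "b"]

# Every ordered pair of distinct balls is an equally likely case.
cases = list(permutations(range(len(urn)), 2))

def prob(condition, within=None):
    """Probability of `condition` among the cases satisfying `within`."""
    pool = [c for c in cases if within is None or within(c)]
    return Fraction(sum(1 for c in pool if condition(c)), len(pool))

first_white = lambda c: urn[c[0]] == "w"
second_white = lambda c: urn[c[1]] == "w"

p_first = prob(first_white)                                    # P(w1)
p_second_given_first = prob(second_white, within=first_white)  # P(w2 | w1)
p_both = prob(lambda c: first_white(c) and second_white(c))    # P(w1 and w2)

print(p_first, p_second_given_first, p_both)  # 3/5 1/2 3/10
```

The unconditional probability of a white second ball is `prob(second_white)`, which differs from the conditional one here, so the two draws are dependent. For independent events the conditional factor equals the unconditional one and the 3rd Principle is recovered.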

5th Principle (p.15): The probability of an expected event $\omega_1$ conditioned on an occurred event $\omega_0$ is the probability of the composite event $\omega=\omega_0 \cap \omega_1$ divided by the a priori probability of the occurred event.

$P(\omega_1|\omega_0)=\frac{P(\omega_0 \cap \omega_1)}{P(\omega_0)}$

The $a\ priori$ probability is always taken from a prior state, as can be given by a previous event $\omega_{-1}$. Thus, if we assume the present to be $t_0$, the prior time to have been $t_{-1}$, and the future time to be $t_1$, then the $a\ priori$ probability of the presently occurred event is made from $\Omega(t_{-1})=\omega_{-1}$ as

$P(\omega_0)=P(\omega_0 | \omega_{-1})=P\bigg( \Omega(t_0)=\omega_0 \bigg | \Omega(t_{-1})=\omega_{-1} \bigg)$

The probability of the combined event $\omega_0 \cap \omega_1$ occurring can also be measured partially from the $a\ priori$ perspective as

$P(\omega_0 \cap \omega_1)=P(\omega_0 \cap \omega_1 | \omega_{-1})=P\bigg(\Omega(t_0)=\omega_0 \bigcap \Omega(t_1)=\omega_1 \bigg| \Omega(t_{-1})=\omega_{-1} \bigg)$

Thus,

$P(\omega_1|\omega_0)=P\bigg( (\omega_1|\omega_0) \bigg | \omega_{-1} \bigg)=\frac{P(\omega_0 \cap \omega_1 | \omega_{-1})}{P(\omega_0 | \omega_{-1})}$

6th Principle: 1. For a constant event, the likelihood of a cause of the event is the same as the probability that the event will occur. 2. The probability of the existence of any one of those causes is the probability of the event (resulting from this cause) divided by the sum of the probabilities of similar events from all causes. 3. For causes, considered a priori, which are unequally probable, the probability of the existence of a cause is the probability of the caused event multiplied by the possibility (a priori probability) of its cause, divided by the sum of the products of the probability of the events and the possibility of their cause.

For event $\omega_i$ , let $\omega_i'$ be its cause. While $P$ is the probability of an actual existence, $\mu$ is the measure of the a priori likelihood of a cause since its existence is unknown. These two measurements may be used interchangeably where the existential nature of the measurement is known or substitutions as approximations are permissible. In Principle 5 they are conflated since the probability of an occurred event always implies an a priori likelihood.

for $\omega$ constant (i.e. only 1 cause, $\omega'$ ), $P(\omega')=P(\omega)$
for $\omega_i'$ equally likely, $P(\omega_i')=P(\omega_i'|\omega)=\frac{P(\omega | \omega_i')}{\sum_j P(\omega|\omega_j')}$
for $\omega_i'$ unequally likely, $P(\omega_i')=P(\omega_i'|\omega)=\frac{P(\omega | \omega_i')*\mu(\omega_i')}{\sum_j P(\omega | \omega_j')*\mu(\omega_j')}$

7th Principle (p. 17): The probability of a future event, $\omega_1$, is the sum of the products of the probability of each cause, drawn from the event observed, by the probability that, this cause existing, the future event will occur.

The present is $t_0$ while the future time is $t_1$. Thus, the future event expected is $\Omega(t_1)=\omega_1$. Given that $\Omega(t_0)=\omega_0$ has been observed, we ask about the probability of a future event $\omega_1$ from the set of causes $\Omega'(t_1)=\{\omega_1^{(i)}:\omega_1^{(i)} \rightarrow \Omega(t_1)\}$ (change of notation for causes).

$P(\omega_1|\omega_0)=\sum_i P(\omega_1^{(i)} | \omega_0)*P(\omega_1 | \omega_1^{(i)})$

How are we to consider causes? They can be historical events with a causal-deterministic relationship to the future, or they can be considered event-conditions, as a spatiality (possibly true over a temporal duration) rather than a temporality (true at one time). Generally, we can consider causes to be hypotheses $H=\{H_1, \cdots, H_n\}$, with $P(H_i)$ the $prior$ probability (single term), $P(\omega | H_i)$ the likelihood (the conditional probability of the event given the hypothesis), and $P(H_i | \omega)$ the $posterior$ probability. The observed event ( $\omega_0$ ) is $\omega_{obs}$ and the future event ( $\omega_1$ ) is the expected event $\omega_{exp}$. Thus, we can restate principles 6 & 7 as:

$P(H_i|\omega_{obs})=\frac{P(\omega_{obs} | H_i)P(H_i)}{\sum_j P(\omega_{obs} | H_j)P(H_j)}$
$P(\omega_{exp}|\omega_{obs})=\sum_i P(H_i | \omega_{obs})P(\omega_{exp} | H_i)\ = \frac{\sum_i P(\omega_{obs} | H_i)P(H_i)P(\omega_{exp} | H_i)}{\sum_j P(\omega_{obs} | H_j)P(H_j)}$
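The two restated formulas can be sketched numerically. The example below assumes two hypothetical urns as causes, equally probable a priori, with assumed proportions of white balls; it also assumes, as the formula for Principle 7 does, that the expected event depends on the observed one only through the cause. The function names are mine.

```python
from fractions import Fraction

def posterior(prior, likelihood_obs):
    """Principle 6: P(H_i | obs) for every hypothesis H_i."""
    evidence = sum(likelihood_obs[h] * prior[h] for h in prior)
    return {h: likelihood_obs[h] * prior[h] / evidence for h in prior}

def predictive(post, likelihood_exp):
    """Principle 7: P(exp | obs) = sum_i P(H_i | obs) P(exp | H_i)."""
    return sum(post[h] * likelihood_exp[h] for h in post)

# Hypothetical causes: two urns, equally probable a priori.
prior = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}
# Probability of drawing a white ball from each urn (assumed values).
p_white = {"H1": Fraction(3, 4), "H2": Fraction(1, 4)}

post = posterior(prior, p_white)          # a white ball has been observed
p_next_white = predictive(post, p_white)  # next draw, with replacement

print(post["H1"], post["H2"], p_next_white)  # 3/4 1/4 5/8
```

Observing one white ball shifts the weight toward the urn richer in white balls, and the probability of a second white ball is the posterior-weighted average of the two urn proportions.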

Clearly, Principle 6 is the same as Bayes’ Theorem (Wassermann, Thm. 2.16), which articulates the hypotheses $H$ as a partition of $\Omega$, in that $\Omega=\cup_i H_i$ with $H_i \cap H_j = \varnothing$ for $i\neq j$, so that each hypothesis is a limitation of the domain of possible events. The observed event is also considered a set of events rather than a single ‘point.’ Therefore, Principle 6 says that “the probability that the possibility of the event is comprised within given limits is the sum of the fractions comprised within these limits” (Laplace, p.18).

8th Principle (p.20): The Advantage of Mathematical Hope, $A$, depending on several events, is the sum of the products of the probability of each event by the benefit to its occurrence.

Let $\omega=\{\omega_1, \cdots ,\omega_n: \omega_i \in \Omega\}$ be the set of events under consideration. Let $B$ be the benefit function giving a value to each event. The advantage hoped for from these events is:

$A(\omega)=\sum_i B(\omega_i)*P(\omega_i)$

A fair game is one whose cost of playing is equal to the advantage gained through it.

9th Principle (p.21): The Advantage $A$, depending on a series of events ( $\omega$ ), is the sum of the products of the probability of each favorable event by the benefit to its occurrence minus the sum of the products of the probability of each unfavorable event by the cost to its occurrence.

Let $\omega=\{\omega_1, \cdots ,\omega_n: \omega_i \in \Omega\}$ be the series of events under consideration, partitioned into $\omega=(\omega^+,\omega^-)$ for favorable and unfavorable events. Let $B$ be the benefit function for $\omega_i \in \omega^+$ and $L$ the loss function for $\omega_i \in \omega^-$, each giving the value to each event. The advantage of playing the game is:

$A(\omega)=\sum_{i: \omega_i \in \omega^+} B(\omega_i)P(\omega_i) - \sum_{j: \omega_j \in \omega^-} L(\omega_j)P(\omega_j)$

Mathematical Hope is the positivity of A. Thus, if A is positive, one has hope for the game, while if A is negative one has fear.

In generality, $X$ is the random variable function, $X:\omega_i \rightarrow \mathbb{R}$ , that gives a value to each event, either a benefit ( $>0$ ) or cost ( $<0$ ). The absolute expectation ( $E$ ) of value for the game from these events is:

$E(\omega)=\sum_i X(\omega_i)*P(\omega_i)$
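A short sketch may help show that the advantage of Principle 9 and the expectation of a signed random variable are the same quantity. The die game and both function names are hypothetical choices for illustration.

```python
from fractions import Fraction

def advantage(events):
    """Principle 9: expected benefits minus expected losses.

    `events` is a list of (probability, signed value) pairs; positive values
    are benefits B, negative values are losses L taken in absolute value.
    """
    gains = sum(p * v for p, v in events if v > 0)
    losses = sum(p * -v for p, v in events if v < 0)
    return gains - losses

def expectation(events):
    """E = sum_i X(w_i) P(w_i) with a signed random variable X."""
    return sum(p * v for p, v in events)

# Hypothetical game on one fair die: win 4 on a six, lose 1 otherwise.
game = [(Fraction(1, 6), 4), (Fraction(5, 6), -1)]

print(advantage(game), expectation(game))  # -1/6 -1/6
```

Since the advantage is negative, this game is one of fear rather than hope in the sense above; it would become fair if the prize on a six were raised to 5.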

10th Principle (p.23): The relative value of an infinitely small sum is equal to its absolute value divided by the total benefit of the person interested.

This section can be explicated by examining Laplace’s corresponding section in Théorie Analytique (S.41-42, p.432-445) as a development of Bernoulli’s work on the subject.

(432) For a physical fortune $x$, an increase by $dx$ produces a moral good reciprocal to the fortune, $\frac{k\,dx}{x}$, for a constant $k$. $k$ is the “units” of moral goodness (i.e. utility), in that $\frac{dx}{x}=\frac{1}{k}$ corresponds to 1 moral good. So, $k$ is the quantity of physical fortune whereby a marginal increase by unity of physical fortune is equivalent to unity of moral fortune. For a moral fortune $y$, $y=k\ln x + \ln h$.
A moral good is the proportion of an increase in part of a fortune by the whole fortune. Moral fortune is the sum of all moral goods. If we consider this summation continuously for all infinitesimally small increases in physical fortune, moral fortune is the integral of the proportional reciprocal of the physical fortune by the changes in that physical fortune. Deriving this from principle 10,

$dy=\frac{kdx}{x}$

$\int dy = y = \int \frac{k\,dx}{x} = k \int \frac{1}{x} dx = k\ln(x) + C$

$C=\ln(h)$ is the constant of minimum moral good when the physical fortune is unity. We can put this in terms of a physical fortune, $x_0$, the minimum physical fortune for surviving one’s existence – the cost of reproducing the conditions of one’s own existence. With $h=\frac{1}{{x_0}^k}$,

$y=\int_{x_0}^x dy =\int_{x_0}^x \frac{k\,dx}{x}=k\ln(x) - k \ln(x_0)=k\ln(x) - k \ln\bigg(\frac{1}{\sqrt[k]{h}}\bigg)=k\ln(x) + \ln(h)$
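The logarithmic form of the moral fortune can be checked numerically. The sketch below assumes arbitrary values for $k$ and $x_0$ and verifies two things: that the moral fortune vanishes at $x_0$ when $h = x_0^{-k}$, and that a small increase $dx$ adds approximately $\frac{k\,dx}{x}$. The function name `moral_fortune` is my own.

```python
import math

def moral_fortune(x, k, x0):
    """y = k ln(x) + ln(h) with h = x0**(-k), so that y(x0) = 0."""
    h = x0 ** (-k)
    return k * math.log(x) + math.log(h)

k, x0 = 2.0, 10.0  # hypothetical constants

# The moral fortune vanishes (up to rounding) at the minimum fortune x0.
print(moral_fortune(x0, k, x0))

# A small increase dx adds approximately k*dx/x to the moral fortune.
x, dx = 50.0, 1e-6
increment = moral_fortune(x + dx, k, x0) - moral_fortune(x, k, x0)
print(increment, k * dx / x)
```

The two printed increments agree to many digits, which is the finite-step counterpart of $dy=\frac{k\,dx}{x}$.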

$h$ is a constant given by an empirical observation of $y$ as never positive or negative but always at least what is necessary, as even someone without any physical fortune will still have a moral fortune in their existence – it is thus the unpriced “physical fortune” of laboring existence.

(433) Suppose an individual with a physical fortune of $a$ expects to receive a variety of changes in fortunes $\alpha, \zeta, \gamma, \cdots$ , as increments or diminishings, with probabilities of $p, q, r, \cdots$ summing to unity. The corresponding moral fortunes would be,

$k \ln(a+\alpha) + \ln(h), \ k \ln(a+\zeta) + \ln(h), \ k \ln(a+\gamma) + \ln(h), \cdots$

Thus, the expected moral fortune $Y$ is

$Y=kp \ln(a+\alpha)+ kq \ln(a+\zeta) + kr \ln(a+\gamma) + \cdots + \ln(h)$

Let $X$ be the physical fortune corresponding to this moral fortune, as

$Y=k \ln(X) + \ln(h)$

with,

$X=(a+\alpha)^p(a+\zeta)^q(a+\gamma)^r \cdots$

Taking away the primitive fortune $a$ from this value of $X$, the difference $X-a$ will be the increase in the physical fortune that would procure the individual the same moral advantage resulting from his expectation. This difference is therefore the expression of the moral advantage, whereas the mathematical advantage has the expression

$p\alpha + q\zeta + r\gamma + \cdots$

This results in several important consequences. One of them is that the mathematically most equal game is always disadvantageous: when $p\alpha + q\zeta + r\gamma + \cdots = 0$, the weighted geometric mean $X$ cannot exceed the weighted arithmetic mean $a + p\alpha + q\zeta + r\gamma + \cdots = a$, so $X - a \leq 0$. Indeed, if we denote by $a$ the physical fortune of the player before starting the game, by $p$ his probability of winning, (434) $\cdots$
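To make the comparison between the moral and the mathematical advantage concrete, the following sketch computes the equivalent fortune $X=(a+\alpha)^p(a+\zeta)^q\cdots$ for a hypothetical even bet and subtracts the primitive fortune $a$. The stake, the fortune and the function names are assumed for illustration, and every $a+\alpha$ must stay positive for the logarithm to be defined.

```python
import math

def equivalent_fortune(a, changes):
    """X = prod_i (a + delta_i)^(p_i): the physical fortune whose moral
    fortune equals the expected moral fortune."""
    return math.prod((a + delta) ** p for p, delta in changes)

def mathematical_advantage(changes):
    """p*alpha + q*zeta + ...: the probability-weighted sum of the changes."""
    return sum(p * delta for p, delta in changes)

a = 100.0                                # hypothetical primitive fortune
fair_bet = [(0.5, 50.0), (0.5, -50.0)]   # win or lose 50 with equal chances

moral_advantage = equivalent_fortune(a, fair_bet) - a
print(mathematical_advantage(fair_bet))  # 0.0
print(moral_advantage)                   # negative, about -13.4
```

In this example the bet is mathematically even, yet $X-a$ is negative, because the weighted geometric mean of the outcomes lies below their weighted arithmetic mean.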

Concerning the Analytical Methods of the Calculus of Probabilities

The Binomial Theorem:

$(x+y)^n=\sum_{k=0}^n \binom{n}{k}x^{n-k}y^k$

Letting $x=1$

$(1+y)^n=\sum_{k=0}^n \binom{n}{k}y^k=\sum_{k=1}^n \binom{n}{k}y^k + 1$

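More generally, for distinct letters $a_1, \cdots, a_n$, the product expands into the sum of the products of the letters taken $k$ at a time:

$\prod_{i=1}^n (1+a_i) \ -1 = \sum_{k=1}^n \sum_{1 \leq i_1 < \cdots < i_k \leq n} a_{i_1} \cdots a_{i_k}$

If we suppose these letters are equal, $a_i=y$, each of the $\binom{n}{k}$ products of $k$ letters becomes $y^k$ and the preceding formula is recovered.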

Consider the lottery composed of $n$ numbers, of which $r$ are drawn at each draw.
What is the probability of drawing $s$ given numbers $Y=(y_1, \cdots, y_s)$ in one draw $X=(x_1, \cdots, x_r)$? The $s$ given numbers are fixed, so the remaining $r-s$ drawn numbers are chosen from the other $n-s$:

$P(Y \subseteq X)=\frac{\binom{n-s}{r-s}}{\binom{n}{r}}=\frac{\binom{r}{s}}{\binom{n}{s}}$
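The lottery probability can be checked by brute force on a small case. The sketch below counts the draws that contain the $s$ given numbers as $\binom{n-s}{r-s}$ out of $\binom{n}{r}$, compares this with the form $\binom{r}{s}/\binom{n}{s}$, and confirms both by enumerating every draw. The parameter values and function names are assumptions chosen so that the enumeration stays small.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def p_given_numbers_drawn(n, r, s):
    """Draws containing the s given numbers, over all draws of r from n."""
    return Fraction(comb(n - s, r - s), comb(n, r))

def p_by_enumeration(n, r, s):
    """Same probability, by listing every possible draw."""
    given = set(range(1, s + 1))  # take the given numbers to be 1..s
    draws = list(combinations(range(1, n + 1), r))
    hits = sum(1 for draw in draws if given <= set(draw))
    return Fraction(hits, len(draws))

n, r, s = 10, 4, 2  # hypothetical small lottery
print(p_given_numbers_drawn(n, r, s))  # 2/15
print(p_by_enumeration(n, r, s) == Fraction(comb(r, s), comb(n, s)))  # True
```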

Consider the urn $\Omega$ with $a$ white balls and $b$ black balls, drawn with replacement. Let $A_n = \{\omega_1, \cdots, \omega_n\}$ be $n$ draws. Let $\mu_w(A)$ be the number of white balls and $\mu_b (A)$ be the number of black balls. What is the probability of $m$ white balls and $n-m$ black balls being drawn?

$P\bigg(\mu_w(A_n) = m \ \& \ \mu_b(A_n)=n-m\bigg)=P^n_m=?$

$(a+b)^n$ is the number of all the cases possible in $n$ draws. In the expansion of this binomial, $\binom{n}{m}b^{n-m}a^m$ expresses the number of cases in which $m$ white balls and $n-m$ black balls may be drawn. Thus,

$P^n_m=\frac{\binom{n}{m}b^{n-m}a^m}{(a+b)^n}$

Letting $p=P(\mu_w(A_1)=1)=\frac{a}{a+b}$ be the probability of drawing a white ball in a single draw and $q=P(\mu_b(A_1)=1)=\frac{b}{a+b}$ be the probability of drawing a black ball in a single draw,

$P^n_m=\binom{n}{m}q^{n-m}p^m$

$\Delta P^n_{m}=\frac{P^n_{m+1}}{P^n_{m}}=\frac{(n-m)p}{(m+1)q}$

This is an ordinary finite difference equation (a first-order recurrence in $m$); iterating it $r$ times gives:

${\Delta}^r P^n_{m}= \frac{P^n_{m+r}}{P^n_{m}}=\frac{p^{r}}{q^{r}}\prod_{i=0}^{r-1}\frac{n-m-i}{m+i+1}$
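As a sketch of how the ratio relation generates the whole distribution, the snippet below builds $P^n_0, \cdots, P^n_n$ from $P^n_0=q^n$ by repeated multiplication and compares the result with the direct formula. The urn contents and function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, m, p):
    """P^n_m = C(n, m) q^(n-m) p^m with q = 1 - p."""
    q = 1 - p
    return comb(n, m) * q ** (n - m) * p ** m

def probabilities_by_ratio(n, p):
    """Build P^n_0, ..., P^n_n from P^n_0 = q^n and the ratio
    P^n_{m+1} / P^n_m = (n - m) p / ((m + 1) q)."""
    q = 1 - p
    probs = [q ** n]
    for m in range(n):
        probs.append(probs[-1] * (n - m) * p / ((m + 1) * q))
    return probs

a, b, n = 2, 3, 4        # hypothetical urn: 2 white, 3 black, 4 draws
p = Fraction(a, a + b)   # probability of white in a single draw
direct = [binomial_probability(n, m, p) for m in range(n + 1)]
print(direct == probabilities_by_ratio(n, p), sum(direct))  # True 1
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.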

Three players of supposed equal ability play together on the following conditions: that one of the first two players who beats his adversary plays the third, and if he beats him the game is finished. If he is beaten, the victor plays against the second until one of the players has defeated consecutively the two others, which ends the game. The probability is demanded that the game will be finished in a certain number $n$ of plays. Let us find the probability that it will end precisely at the $n$th play. For that the player who wins ought to enter the game at the play $n-1$ and win it thus at the following play. But, if in place of winning the play $n-1$ he should be beaten by his adversary who had just beaten the other player, the game would end at this play. Thus the probability that one of the players will enter the game at the play $n-1$ and will win it is equal to the probability that the game will end precisely with this play; and as this player ought to win the following play in order that the game may be finished at the $n$th play, the probability of this last case will be only one half of the preceding one. (p.29-30)

Let $E$ be the random variable of the number of plays it takes for the game to finish.

$\mathbb{P}(E=n)=?$

Let $G_k=(p_1,p_2)$ be the random variable of the two players $(p_1,p_2)$ playing in game $k$. Let $W_k=p_0$ be the random variable of the winning player, $p_0$, of game $k$.

$\mathbb{P}(E=n-1)=\mathbb{P}(G_{n-1}=W_{n-1}=p)$

$\mathbb{P}(E=n)=\frac{1}{2}\mathbb{P}(E=n-1)$

This is an ordinary finite difference equation for a recurrent process. To solve it, we notice that the game cannot end sooner than the 2nd play, and extend the iterative expression recursively,

$\mathbb{P}(E=n)=\bigg(\frac{1}{2}\bigg)^{n-2}\mathbb{P}(E=2)$

$\mathbb{P}(E=2)$ is the probability that one of the first two players who has beaten his adversary should beat at the second play the third player, which is $\frac{1}{2}$. Thus,

$\mathbb{P}(E=n)=\bigg(\frac{1}{2}\bigg)^{n-1}$

The probability that the game will end at the latest by the $n$th play is the sum of these,

$\mathbb{P}(E\leq n)=\sum_{k=2}^n \bigg(\frac{1}{2}\bigg)^{k-1} = 1 - \bigg(\frac{1}{2}\bigg)^{n-1}$
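The result can be checked by exhaustive enumeration rather than by the recurrence. The sketch below assumes each play is an even coin flip, labels the players 0, 1 and 2 (my own convention), follows who is champion, challenger and sitting out, and records the play at which a champion wins a second consecutive play.

```python
from fractions import Fraction
from itertools import product

def ending_play(flips):
    """Play at which the game ends, or None if it is still running.

    flips[0] decides play 1 between players 0 and 1; every later flip is
    True when the current champion beats the challenger.
    """
    champion, sitting_out = (0, 1) if flips[0] else (1, 0)
    challenger = 2
    for play, champion_wins in enumerate(flips[1:], start=2):
        if champion_wins:
            return play  # the champion has beaten both others in a row
        # The challenger becomes champion and meets the player who sat out.
        champion, challenger, sitting_out = challenger, sitting_out, champion
    return None

def end_distribution(max_plays):
    """Exact P(E = n) for n <= max_plays over all equally likely flips."""
    sequences = list(product([True, False], repeat=max_plays))
    counts = {}
    for flips in sequences:
        n = ending_play(flips)
        if n is not None:
            counts[n] = counts.get(n, 0) + 1
    return {n: Fraction(c, len(sequences)) for n, c in sorted(counts.items())}

dist = end_distribution(6)
for n, prob in dist.items():
    print(n, prob, prob == Fraction(1, 2 ** (n - 1)))
```

Each printed line compares the enumerated probability with $\big(\frac{1}{2}\big)^{n-1}$, and the probabilities up to play 6 sum to $1-\big(\frac{1}{2}\big)^{5}$.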

Appendix: The Calculus of Generating Functions

In general, we can define the ordinary finite difference polynomial equation. For a particular event, $E$, its probability mass function over internal-time steps $n$ is given by the distribution $f(n)=\mathbb{P}(E=n)$. The base case ( $I_0$ ) of the inductive definition is known for the lowest time-step, $n_0$, as $f(n_0)=c$, while the iterative step ( $I^+$ ) is constructed as a polynomial function $\mathcal{P}(x)=\sum_i a_i x^i$ on the difference step of one time-unit:

$I^+: f(n)=\mathcal{P}(f(n-1))$

$\rightarrow f(n)=\underbrace{\mathcal{P}(\cdots\mathcal{P}(}_{n-n_0}f(n_0))\cdots)$

$\rightarrow \mathcal{D}(f(n))=f'(n)=\mathcal{D}(\mathcal{P})(f(n-1))f'(n-1)=f'(n_0)\prod_{k=n_0}^{n-1}\mathcal{D}(\mathcal{P})(f(k))$
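The inductive definition can be sketched as a loop that starts from the base case $f(n_0)=c$ and applies the polynomial $\mathcal{P}$ once per time step. As an assumed example, the polynomial $\mathcal{P}(x)=\frac{x}{2}$ with $f(2)=\frac{1}{2}$ reproduces the three-player game, where $\mathbb{P}(E=n)=\frac{1}{2}\mathbb{P}(E=n-1)$. The function names and the coefficient-list representation are my own.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate P(x) = sum_i a_i x^i, where coeffs[i] is a_i."""
    return sum(a * x ** i for i, a in enumerate(coeffs))

def iterate(coeffs, n0, c, n):
    """f(n) obtained from f(n0) = c by applying P exactly n - n0 times."""
    value = c
    for _ in range(n - n0):
        value = poly_eval(coeffs, value)
    return value

half = [0, Fraction(1, 2)]  # P(x) = x / 2
print(iterate(half, 2, Fraction(1, 2), 5))  # 1/16
```

The printed value agrees with the closed form $\big(\frac{1}{2}\big)^{n-1}$ at $n=5$.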
