A Tutorial in Data Science: Lecture 6 – Exploratory Data Analysis

The Issue of The Datum

Data, as finite, can never be merely fit without presupposition. The theory of the data, as what it is, is the presupposition that discloses the data in the first place through the act of measurement. As independent and identically distributed (i.i.d.) measurements, there is no temporality to the measurement activities in series, so the ordering of the samples is not relevant. But this means that there is no temporality to the disclosure of the object at hand, preventing one measurement from being distinguished from another – they are either all simultaneous or noncomparable. Thus, i.i.d. random variables (measurement resultants) as a whole describe the different time-invariant superpositions of the system in question, since at the single \(null\) time of the serial measurements all the sample-values were found together at once or in a unity. To ‘force’ an order on the data by some random indexing is an unnecessary addition of data to our data-sampling process, and thus an analysis that requires an ordering will be extraneous to the matter at hand. Thus, data as ‘unhypothesized’ describes a state-system broken in its spatio-temporality, unable to reveal itself as itself in unity, but rather as several (many) different states \(\omega_i\) all occurring with a minor existence (\(0<\mathbb{P}(\omega_i \in \Omega)<1\)). In the schemata of Heidegger (Being & Time), such things are present-at-hand in that they have been removed from their interlocking chains of signification from-which & for-which they exist through participation in the World as ready-at-hand – their existence is in question.

The Mimetics of Meaning

The meaning of a datum is its intelligibility within an interpretive context of signification. An Interpretation gives a specificity to its underlying distribution. A rational Interpretation to data gives a rational structure to its conditionality. In the circular process of interpretation, an assumption is made from which to understand the datum, while in the branching process, these assumptions are hierarchically decomposed.

A Tutorial in Data Science: Lecture 5 – Generating Distributions by Spectral Analysis

When two states, i.e. possibly measured outcomes, of the stochastic sampling process of the underlying statistical object communicate, there is a probability of one occurring after the other, perhaps within the internal time (i.e. indexical ordering) of the measurement process, \(t \in \pi=(1, \cdots, n)\) for the sample space \((\hat{X_1}, \cdots , \hat{X_n})\). Arranging the resulting values as a list, there is some chance of one value occurring after the other. Such is a direction of communication between the states those values represent. When two states inter-communicate, there is a positive probability that each state will occur after the other (within the ordering \(\pi\)). Such inter-communicating states have the same \(period\), defined as the GCD of the distances between occurrences. The complex variable of functional communicativity can be described as the real probability of conditioned occurrence and the imaginary period of its intercommunications.
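The definition of the period as a GCD of distances between occurrences can be illustrated on a single observed list of values. This is only a minimal sketch: the function name `period` and the sample list are hypothetical, and it computes the empirical period of one sampled sequence, which is an assumption standing in for the period of the underlying process.

```python
from functools import reduce
from math import gcd


def period(sequence, state):
    """GCD of the index distances between successive occurrences of `state`."""
    positions = [i for i, value in enumerate(sequence) if value == state]
    gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
    # With fewer than two occurrences there are no gaps, and the result is 0.
    return reduce(gcd, gaps, 0)


# Hypothetical sample in its indexical ordering: two states that alternate.
samples = ["a", "b", "a", "b", "a", "b", "a"]
print(period(samples, "a"))  # 2
print(period(samples, "b"))  # 2
```

In this toy list the two inter-communicating states share the same period, as the paragraph above states.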

To describe our model by communicative functionals is to follow the Laplacian method of generating the distribution by finite difference equations. A single state, within state-space rather than time-space, is described as a complex variable \(s=\sigma + \omega i\), where \(\sigma\) is the real \(functional\) relation between state & system (or part & whole), while \(\omega\) is its imaginary \(communicative\) relationship. If we view the branching evolution of the possible states measured in a system under sampling, then the actual sampled values are a path along this decision-tree. The total system, in its Laplacian, or \(characteristic\), representation, is the (tensorial) sum of the underlying sub-systems, to which each state belongs as a possible value. A continuum (real-distributional) system can only result as the infinite branching process, with each value thus a limit to an infinite path-sequence of rationalities in the state-system sub-dividing into state-systems until the limiting (stationary) systems-as-states are reached that are non-dynamic or static in the inner spatio-temporality of self-differentiation, i.e. non-dividing. Any node of this possibility-tree can be represented as a state of the higher-order system by a complex-value or as a system of the lower-order states by a complex-function. The real part of this value is the probability of the lower state occurring given the higher-system occurring (uni-directional communicativity), while the imaginary part is its relative period. Similarly, the real function of a higher system is the probability of lower states occurring given its occurrence and the imaginary part is the relative periods of the state-values.

A Tutorial in Data Science: Lecture 4 – Statistical Inference via Systems of Hypothesis-Trees

As in Lecture 1, let \(X^n\) be a random variable representing the n qualities that can be measured for the thing under investigation, \(\Omega\), itself the collected gathering of all its possible appearances, \(\omega \in \Omega\) such that \(X^n:\omega \rightarrow {\mathbb{R}}^n\). Each sampled measurement of \(X^n\) through interaction with \(\omega\) is given as an \(\hat{X}^n(t_i)\), each one constituting a unit of indexable time in the catalogable measurement process. Thus, the set of sampled measurements, a sample space, is a partition of ‘internally orderable’ test times within the measurement action, \(\{ \hat{X}^n(t): t \in \pi \}\).

\(\Omega\) is a state-system, i.e. the spatio-temporality of the thing in question, in that it has specific space-states \(\omega\) at different times \(\Omega(t)=\omega\). \(X\) is the function that measures \(\omega\). What if the measurement is not Real, but Complex: \(X: \Omega \rightarrow \mathbb{C}\)? While a real number results from a finite, approximate, or open-ended process of objective empirical measurement, an imaginary number results from a subjective intuition or presupposition to measurement. Every interaction with \(\Omega\) lets it appear as \(\omega\), which is quantified by \(X\). From these interactions, we seek to establish truths about \(\Omega\) as quantifying the probability that the Claim \(C\) is correct, which is itself a quantifiable statement about \(\Omega\).

Ultimately, we seek the nature of how \(\Omega\) appears differently depending on one’s interactions with it (i.e. samplings), and thus the actual distribution (\(\mathcal{D}\)) of the observed measurements using our measurement apparatus \(X\); that is, we ask about \(\mathcal{D}_X(\Omega)=f_{X(\Omega)}\). The assumptions will describe the class \(\mathcal{C}\) of the family \(\mathcal{F}\) of distribution functions to which \(f_X\) belongs, i.e. \(f_X \in \mathcal{F}_{\mathcal{C}}\), for the \(\hat{X}\) measurements of the appearances of \(\Omega\), while the sampling will give the parameter \(\theta\), such that \(f_X =f_{\mathcal{C}}(\theta)\). The hypothesis distribution-parameter (\(\theta^*\)) may be established either by prior knowledge (\(\theta_0\)) or by the present n-sampling of the state-system (\(\theta_1\)). Thus, the parameter obtained from the present sampling, \(\hat{\theta}=\Theta(\hat{X_1}, \cdots, \hat{X_n})\), is either used to judge the validity of a prior parameter estimation (\(\theta^*=\theta_0\)) or is assessed in its own right (i.e. \(\theta^*=\theta_1=\hat{\theta}\)) as representative of the actual object’s state-system distribution. The difference between the two hypothesis set-ups, a priori vs. a posteriori, is whether the present experiment is seen as having a bias or not. In either the prior or posterior case, \(H_{-}:\theta_0=\theta|\hat{\theta}\) or \(H_{+}:\hat{\theta}=\theta\), one uses the present sampling to establish the validity of a certain parameter value. If \(\hat{\Delta} \theta =\theta_0-\hat{\theta}\) is the expected bias of the experiment, then \(H_{-}:\hat{\theta}+\hat{\Delta}\theta=\theta|\hat{\theta}\) & \(H_{+}:\hat{\theta}=\theta|\hat{\theta}\). Thus, in all experiments, the statistical question is primarily that of the bias of the experiment that samples a parameter, whether it is 0 or not, i.e. \(H_{-}:|\hat{\Delta}\theta|>0\) or \(H_{+}:\hat{\Delta}\theta=0\).

The truth of the bias of the experiment, i.e. how representative it is, can only be given by our prior assumptions, \(A\), such as to know the validity of our claim about the state-system’s distributional parameter, \(P(C|A)=P(\theta=\theta^*|\hat{\theta})=P(\Delta \theta=\hat{\Delta}\theta)\), as the probability our expectation of bias is correct. Our prior assumption, \(A: f_X \in \mathcal{F}_{\mathcal{C}}\), is about the distribution of the \(k\)-parameters in the class-family of distributions, where \(\mathcal{F}_{\mathcal{C}}=\{f(k)\}, \ s.t. \ f_X=f(\theta)\), that is about \(\mathcal{D}_K(\mathcal{F}_{\mathcal{C}})\). Here, \(K\) is a random variable that samples state-systems in the wider class of generally known objects, or equivalently their distributions (i.e. functional representations), measuring the \(k\)-parameter of their distribution, such that \(f_K(\mathcal{F}_{\mathcal{C}})=\mathcal{D}_K(\mathcal{F}_{\mathcal{C}})\). The distributed-objects in \(\mathcal{F}_{\mathcal{C}}\) are themselves relative to the measurement system \(X\) although they may be transformed into other measurement units, in that this distribution class is of all possible state-systems which \(X\) might measure sample-wise, for which we seek to know specifically about the \(\Omega\) in question to obtain its distributional \(k\)-parameter value of \(\theta\). Essentially, the assumption \(A\) is about a meta-state-system as the set of all objects \(X\) can measure, and thus has more to do with \(X\), the subject’s method of measurement, and \(\Theta\), the parametrical aggregation of interest, than with \(\Omega\), the specific object of measurement.

\(\theta \in \Theta\), the set of all the parameters to the family \(\mathcal{F}\) of relevant distributions, in that \(\Theta\) uniquely determines \(f\), in that \(\exists M: \Theta \rightarrow f \in \mathcal{F}\), or \(f=\mathcal{F}(\Theta)\).

A Tutorial in Data Science: Lecture 3 – Reconstruction of Laplacian Analytical Theory of Probabilities

This blog serves as a philosophically informed mathematical introduction to the ideas and notation of probability theory from its most important historical theorist. It is part of an ongoing contemporary formal reconstruction of Laplace’s Calculus of Probability from his English-translated introductory essay, “A Philosophical Essay on Probabilities” (cite: PEP), which can be read along with these notes, which are divided into the same sections as Laplace. I have included deeper supplements from the untranslated treatise Théorie Analytique des Probabilités (cite: TA) through personal and online translation tools in section 1.10 and the Appendix (3).

The General Principles of the Calculus of Probabilities

\(\Omega\) is the state of all possible events.

\(\omega \in \Omega\) is an event as element of the state.

1st Principle: The probability of the occurrence of an event \(\omega\) is the number of favorable cases divided by the total number of causal cases, assuming all cases are equally likely

\(\Omega' =\{\omega_1', \cdots , \omega_n' \}\) is the derivational system of the state \(\Omega\) as the space of cases that will cause different events in the state. \(\Omega_{\omega}'= \{\omega_{i_1}', \cdots , \omega_{i_m}': \omega_{i_j}' \rightarrow \omega\}\) is the derivational system of the state favoring the event \(\omega\). The order of a particular state (or derivational state-system) is given by the measure (\(| \dots |\)) evaluated as the number of elements in it.


If we introduce time as the attribute of case-based favorability, i.e. causality, the event \(\omega\) is to occur at a future time \(t_1\), such as would be represented by the formal statement \(\Omega(t_1)=\omega\). The conditioning cases, equally likely, which will deterministically cause the event at \(T=t_1\) are the possible events at the previous conditioning states of the system \(T=t<t_1\), given as \(\Omega(t_0<t<t_1) \in \Omega'(t_1 | t_0)=\{\omega_1', \cdots , \omega_n' \}\), a superposition of possible states-as-cases since they are unknown at the present time \(t_0\), where \(\Omega'\) is a derivational state-system, or set of possible causal states, here evaluated at \(t_1\) given \(t_0\), i.e. \(t_1 | t_0\). This set of possible cases can be partitioned into those that are favorable to \(\Omega(t_1)=\omega\) and those that aren’t favorable. The set of cases favorable to \(\omega\) is \(\Omega_{\omega}'(t_1 | t_0)=\{\omega_{i_1}', \cdots , \omega_{i_m}': \omega_{i_j}' \xrightarrow[t]{} \omega\}\).

\[P(\omega)=P\bigg(\Omega(t_1)=\omega \bigg| \Omega(t_0)\bigg)=\frac{|\Omega_{\omega}'(t_1 | t_0)|}{|\Omega'(t_1 | t_0)|}=\frac{m}{n}\]
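The ratio \(m/n\) of favorable to possible cases can be computed by direct enumeration. The following is a minimal sketch under the 1st Principle's assumption of equally likely cases; the two-dice example and the function name `classical_probability` are illustrative assumptions, not taken from Laplace's text.

```python
from fractions import Fraction
from itertools import product


def classical_probability(cases, favors):
    """Number of favorable cases divided by the number of all (equally likely) cases."""
    cases = list(cases)
    favorable = [case for case in cases if favors(case)]
    return Fraction(len(favorable), len(cases))


# Hypothetical example: the 36 equally likely cases of two dice, event "sum is 7".
two_dice = product(range(1, 7), repeat=2)
p = classical_probability(two_dice, lambda case: sum(case) == 7)
print(p)  # 1/6
```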

2nd Principle: Assuming the conditioning cases are not equal in probability, the probability of the occurrence of an event \(\omega\) is the sum of the probability of the favorable cases

\[P(\omega)=\sum_j P(\omega_{i_j}')\]

3rd Principle: The probability of the combined event (\(\omega\)) of independent events \(\{\omega_1, \cdots,\omega_n\}\) is the product of the probabilities of the component events.

\[P(\omega_1 \cap \cdots \cap \omega_n) = \prod_i P(\omega_i)\]

4th Principle: The probability of a compound event (\(\omega\)) of two events dependent upon each other, \(\omega_1 \& \omega_2\), where \(\omega_2\) is after \(\omega_1\), is the probability of the first times the probability of the second conditioned on the first having occurred:

\[P(\omega_1 \cap \omega_2)= P(\omega_1) * P(\omega_2 | \omega_1)\]

5th Principle, p.15: The probability of an expected event \(\omega_1\) conditioned on an occurred event \(\omega_0\) is the probability of the composite event \(\omega=\omega_0 \cap \omega_1\) divided by the \(a \ priori\) probability of occurred event.

\[P(\omega_1|\omega_0)=\frac{P(\omega_0 \cap \omega_1)}{P(\omega_0)}\]

Always, \(a \ priori\) is from a prior state, as can be given by a previous event \(\omega_{-1}\). Thus, if we assume the present to be \(t_0\), the prior time to have been \(t_{-1}\), and the future time to be \(t_1\), then the \(a \ priori\) probability of the presently occurred event is made from \(\Omega(t_{-1})=\omega_{-1}\) as

\[P(\omega_0)=P(\omega_0 | \omega_{-1})=P\bigg( \Omega(t_0)=\omega_0 \bigg | \Omega(t_{-1})=\omega_{-1} \bigg)\]

The probability of the combined event \(\omega_0 \cap \omega_1\) occurring can also be measured partially from the \(a \ priori\) perspective as

\[P(\omega_0 \cap \omega_1)=P(\omega_0 \cap \omega_1 | \omega_{-1})=P\bigg(\Omega(t_0)=\omega_0 \bigcap \Omega(t_1)=\omega_1 \bigg| \Omega(t_{-1})=\omega_{-1} \bigg)\]


\[P(\omega_1|\omega_0)=P\bigg( (\omega_1|\omega_0) \bigg | \omega_{-1} \bigg)=\frac{P(\omega_0 \cap \omega_1 | \omega_{-1})}{P(\omega_0 | \omega_{-1})}\]

6th Principle: 1. For a constant event, the likelihood of a cause to an event is the same as the probability that the event will occur. 2. The probability of the existence of any one of those causes is the probability of the event (resulting from this cause) divided by the sum of the probabilities of similar events from all causes. 3. For causes, considered \(a \ priori\), which are unequally probable, the probability of the existence of a cause is the probability of the caused event divided by the sum of the product of probability of the events and the possibility (\(a \ priori\) probability) of their cause.

For event \(\omega_i\), let \(\omega_i'\) be its cause. While \(P\) is the probability of an actual existence, \(\mu\) is the measure of the \(a \ priori \ likelihood\) of a cause since its existence is unknown. These two measurements may be used interchangeably where the existential nature of the measurement is known or substitutions as approximations are permissible. In Principle 5 they are conflated since the probability of an occurred event always implies an \(a \ priori\) likelihood.

  1. for \( \omega \) constant (i.e. only 1 cause, \( \omega'\)), \(P(\omega')=P(\omega)\)
  2. for \( \omega_i' \) equally likely, \(P(\omega_i')=P(\omega_i'|\omega)=\frac{P(\omega | \omega_i')}{\sum_j P(\omega|\omega_j')}\)
  3. \(P(\omega_i')=P(\omega_i'|\omega)=\frac{P(\omega | \omega_i')\mu(\omega_i')}{\sum_j P(\omega | \omega_j')\mu(\omega_j')}\)

7th principle, p.17: The probability of a future event, \(\omega_1\), is the sum of the products of the probability of each cause, drawn from the event observed, by the probability that, this cause existing, the future event will occur.

The present is \(t_0\) while the future time is \(t_1\). Thus, the future event expected is \(\Omega(t_1)=\omega_1\). Given that \(\Omega(t_0)=\omega_0\) has been observed, we ask about the probability of a future event \(\omega_1\) from the set of causes \(\Omega'(t_1)=\{\omega_1^{(i)}:\omega_1^{(i)} \rightarrow \Omega(t_1)\}\) (change of notation for causes).

\[P(\omega_1|\omega_0)=\sum_i P(\omega_1^{(i)} | \omega_0)*P(\omega_1 | \omega_1^{(i)})\]

How are we to consider causes? They can be historical events with a causal-deterministic relationship to the future, or they can be considered event-conditions, as a spatiality (possibly true over a temporal duration) rather than a temporality (true at one time). Generally, we can consider causes to be hypotheses \(H=\{H_1, \cdots, H_n\}\), with \(P(H_i)\) the \(prior\) probability (single term) and \(P(\omega | H_i)\) the \(likelihood\) (conditional probability) of the event under that hypothesis; the \(posterior\) probability is then \(P(H_i | \omega)\). The observed event (\(\omega_0\)) is \(\omega_{obs}\) and the future event (\(\omega_1\)) is the expected event \(\omega_{exp}\). Thus, we can restate principles 6 & 7 as:

6. \( P(H_i|\omega_{obs})=\frac{P(\omega_{obs} | H_i)P(H_i)}{\sum_j P(\omega_{obs} | H_j)P(H_j)}\)

7. \( P(\omega_{exp}|\omega_{obs})=\sum_i P(H_i | \omega_{obs})P(\omega_{exp} | H_i)\ = \frac{\sum_i P(\omega_{obs} | H_i)P(H_i)P(\omega_{exp} | H_i)}{\sum_j P(\omega_{obs} | H_j)P(H_j)}\)

Clearly, Principle 6 is the same as Bayes’ Theorem (Wasserman, Thm. 2.16), which articulates the Hypotheses \(H\) as a partition of \(\Omega\) in that \(\Omega=\cup_i H_i\) (\(H_i \cap H_j = \emptyset \ for \ i\neq j\)), in that each hypothesis is a limitation of the domain of possible events. The observed event is also considered a set of events rather than a single ‘point.’ Therefore, Principle 6 says that “the probability that the possibility of the event is comprised within given limits is the sum of the fractions comprised within these limits” (Laplace, p.18).
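The restated Principles 6 and 7 can be worked through on a small example. This is a sketch under stated assumptions: two hypothetical urn hypotheses with equal priors, draws that are conditionally independent given the hypothesis, and illustrative function names `posterior` and `predictive`.

```python
from fractions import Fraction as F


def posterior(priors, likelihoods):
    """Principle 6: P(H_i | obs) = P(obs | H_i) P(H_i) / sum_j P(obs | H_j) P(H_j)."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def predictive(post, future_likelihoods):
    """Principle 7: P(exp | obs) = sum_i P(H_i | obs) P(exp | H_i)."""
    return sum(p * like for p, like in zip(post, future_likelihoods))


# Hypothetical hypotheses: H_1 = urn with 3/4 white balls, H_2 = urn with 1/4 white balls.
priors = [F(1, 2), F(1, 2)]
white_given_h = [F(3, 4), F(1, 4)]

post = posterior(priors, white_given_h)   # observed event: one white ball drawn
print(post)                               # [Fraction(3, 4), Fraction(1, 4)]
print(predictive(post, white_given_h))    # expected event: next ball white -> 5/8
```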

8th principle, p.20: The Advantage of Mathematical Hope, \(A\), depending on several events, is the sum of the products of the probability of each event by the benefit to its occurrence

Let \(\omega=\{\omega_1, \cdots ,\omega_n: \omega_i \in \Omega\}\) be the set of events under consideration. Let \(B\) be the benefit function giving a value to each event. The advantage hoped for from these events is:

\[A(\omega)=\sum_i B(\omega_i)*P(\omega_i)\]

A fair game is one whose cost of playing is equal to the advantage gained through it.

9th principle, p.21: The Advantage \(A\), depending on a series of events (\(\omega\)), is the sum of the products of the probability of each favorable event by the benefit to its occurrence minus the sum of the products of the probability of each unfavorable event by the cost to its occurrence.

Let \(\omega=\{\omega_1, \cdots ,\omega_n: \omega_i \in \Omega\}\) be the series of events under consideration, partitioned into \(\omega=(\omega^+,\omega^-)\) for favorable and unfavorable events. Let \(B\) be the benefit function for \(\omega_i \in \omega^+\) and \(L\) the loss function for \(\omega_i \in \omega^-\), each giving the value to each event. The advantage of playing the game is:

\[A(\omega)=\sum_{i: \omega_i \in \omega^+} B(\omega_i)P(\omega_i) - \sum_{j: \omega_j \in \omega^-} L(\omega_j)P(\omega_j)\]

\(Mathematical \ Hope\) is the positivity of A. Thus, if A is positive, one has hope for the game, while if A is negative one has fear.

In generality, \(X\) is the random variable function, \(X:\omega_i \rightarrow \mathbb{R}\), that gives a value to each event, either a benefit (\(>0\)) or cost (\(<0\)). The absolute expectation (\(E\)) of value for the game from these events is:

\[E(\omega)=\sum_i X(\omega_i)*P(\omega_i)\]
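The advantage of Principle 9 and the signed expectation agree when losses are entered as negative values of \(X\). A minimal sketch follows; the game (win 10 with probability 1/4, lose 2 with probability 3/4) and the function names are hypothetical.

```python
from fractions import Fraction as F


def advantage(favorable, unfavorable):
    """Sum of benefit * probability over favorable events minus loss * probability over unfavorable ones."""
    gain = sum(benefit * p for benefit, p in favorable)
    loss = sum(cost * p for cost, p in unfavorable)
    return gain - loss


def expectation(outcomes):
    """Sum of X(omega_i) * P(omega_i), with costs entered as negative values."""
    return sum(x * p for x, p in outcomes)


a = advantage([(10, F(1, 4))], [(2, F(3, 4))])
e = expectation([(10, F(1, 4)), (-2, F(3, 4))])
print(a, e)  # 1 1
```

Under the definition of a fair game given above, a cost of 1 to play this hypothetical game would make it fair.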

10th principle, p.23: The relative value of an infinitely small sum is equal to its absolute value divided by the total benefit of the person interested.

This section can be explicated by examining Laplace’s corresponding section in Théorie Analytique (S.41-42, p.432-445) as a development of Bernoulli’s work on the subject.

(432) For a physical fortune \(x\), an increase by \(dx\) produces a moral good reciprocal to the fortune, \(\frac{k*dx}{x}\) for a constant \(k\). \(k\) is the unit of moral goodness (i.e. utility), in that \(\frac{dx}{x}=\frac{1}{k}\rightarrow\) 1 moral good. So, \(k\) is the quantity of physical fortune whereby a marginal increase by unity of physical fortune is equivalent to unity of moral fortune. Write \(dy\) for this increment of the \(moral \ fortune\) \(y\).


A moral good is the proportion of an increase in part of a fortune by the whole fortune. Moral fortune is the sum of all moral goods. If we consider this summation continuously for all infinitesimally small increases in physical fortune, moral fortune is the integral of the proportional reciprocal of the physical fortune by the changes in that physical fortune. Deriving this from principle 10,

\[dy=\frac{k\cdot dx}{x}\]

\[\int dy = y = \int \frac{k \, dx}{x} = k \int \frac{1}{x} dx = k\ln(x) + C\]

\(C=\ln(h)\) is the constant of minimum moral good when the physical fortune is unity. We can put this in terms of a physical fortune, \(x_0\), the minimum physical fortune for surviving one’s existence – the cost of reproducing the conditions of one’s own existence. With \(h=\frac{1}{{x_0}^k}\),

\[y=\int_{x_0}^x dy =\int_{x_0}^x \frac{k*dx}{x}=k\ln(x) - k \ln(x_0)=k\ln(x) - k \ln\bigg(\frac{1}{\sqrt[k]{h}}\bigg)=k\ln(x) + \ln(h) \]

h is a constant given by an empirical observation of \(y\) as never positive or negative but always at least what is necessary, as even someone without any physical fortune will still have a moral fortune in their existence – it is thus the unpriced (say physical fortune) of laboring existence.

(433) Suppose an individual with a physical fortune of \(a\) expects to receive a variety of changes in fortunes \(\alpha, \zeta, \gamma, \cdots\), as increments or diminishings, with probabilities of \(p, q, r, \cdots \) summing to unity. The corresponding moral fortunes would be,

\[ k \ln(a+\alpha) + \ln(h), \ k \ln(a+\zeta) + \ln(h), \ k \ln(a+\gamma) + \ln(h), \cdots \]

Thus, the expected moral fortune \(Y\) is

\[Y=kp \ln(a+\alpha)+ kq \ln(a+\zeta) + kr \ln(a+\gamma) + \cdots + \ln(h)\]

Let \(X\) be the physical fortune corresponding to this moral fortune, as

\[Y=k \ln(X) + \ln(h)\]

Equating the two expressions for \(Y\), cancelling \(\ln(h)\), and dividing by \(k\) gives

\[X=(a+\alpha)^p(a+\zeta)^q(a+\gamma)^r \cdots\]

Taking away the primitive fortune \(a\) from this value of \(X\), the difference will be the increase in the physical fortune that would procure the individual the same moral advantage resulting from his expectation. This difference, \(X-a\), is therefore the expression of that moral advantage, whereas the mathematical advantage is expressed by

\[p\alpha + q\zeta + r\gamma + \cdots\]

This results in several important consequences. One of them is that the mathematically most equal game is always disadvantageous. Indeed, if we denote by \(a\) the physical fortune of the player before starting the game, by \(p\) his probability of winning, (434) \(\cdots\)
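The two quantities compared in this passage, the difference \(X-a\) and the sum \(p\alpha + q\zeta + r\gamma + \cdots\), can be evaluated numerically. The sketch below uses hypothetical values (a fortune of 100 and an even bet of 50) and hypothetical function names; it is an illustration, not Laplace's own example.

```python
import math


def equivalent_fortune(a, changes):
    """X = (a + alpha)^p (a + zeta)^q ... for (change, probability) pairs."""
    return math.prod((a + change) ** p for change, p in changes)


def mathematical_advantage(changes):
    """p*alpha + q*zeta + ... for (change, probability) pairs."""
    return sum(change * p for change, p in changes)


a = 100.0
fair_bet = [(50.0, 0.5), (-50.0, 0.5)]    # win or lose 50 with equal probability

X = equivalent_fortune(a, fair_bet)
print(round(X - a, 2))                    # about -13.4
print(mathematical_advantage(fair_bet))   # 0.0
```

For this mathematically equal bet the sum \(p\alpha + q\zeta\) is zero while \(X-a\) is negative. This is consistent with the weighted arithmetic-geometric mean inequality, by which \(X\) never exceeds \(a + p\alpha + q\zeta + \cdots\).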

Concerning the Analytical Methods of the Calculus of Probabilities

The Binomial Theorem:

\[(x+y)^n=\sum_{k=0}^n {n \choose k}x^{n-k}y^k\]

Letting \(x=1\),

\[(1+y)^n=\sum_{k=0}^n {n \choose k}y^k=\sum_{k=1}^n {n \choose k}y^k + 1\]

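For distinct letters \(a_1, \cdots, a_n\), the product expands into the sum of all products of the letters taken \(k\) at a time:

\[\prod_{i=1}^n (1+a_i) \ -1 = \sum_{k=1}^n \sum_{1 \leq i_1 < \cdots < i_k \leq n} \prod_{l=1}^{k} a_{i_l}\]

If we suppose these letters are equal, \(a_i=y\), each of the \({n \choose k}\) products of \(k\) letters becomes \(y^k\), which recovers the binomial expansion:

\[\prod_{i=1}^n (1+y) \ -1 = \sum_{k=1}^n {n \choose k}y^k\]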

How many ways can s letters drawn from n be arranged?

\[s!{n \choose s}\]

Consider the lottery composed of \(n\) numbers, of which \(r\) are drawn at each draw.

What is the probability of drawing \(s\) given numbers \(Y=(y_1, \cdots, y_s)\) in one draw \(X=(x_1, \cdots, x_r)\)?

\[P(Y \subseteq X)=\frac{{n-s \choose r-s}}{{n \choose r}}=\frac{{r \choose s}}{{n \choose s}}\]
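The lottery probability can be checked by counting: of the \({n \choose r}\) possible draws, those containing the \(s\) given numbers are fixed in \(s\) places and free in the remaining \(r-s\), giving \({n-s \choose r-s}\) favorable draws. The sketch below compares this count with the form \({r \choose s}/{n \choose s}\); the values \(n=90\), \(r=5\) and the function name are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def prob_given_numbers_drawn(n, r, s):
    """Probability that s given numbers all appear among r numbers drawn from n."""
    return Fraction(comb(n - s, r - s), comb(n, r))


# Illustrative lottery: 90 numbers, 5 drawn at each draw.
n, r = 90, 5
for s in range(1, r + 1):
    p = prob_given_numbers_drawn(n, r, s)
    # Both expressions for the probability agree for every s.
    assert p == Fraction(comb(r, s), comb(n, s))
    print(s, p)
```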

Consider the Urn \(\Omega\) with \(a\) white balls and \(b\) black balls with replacement. Let \(A_n = \{\omega_1, \cdots, \omega_n\}\) be \(n\) draws. Let \(\mu_w(A)\) be the number of white balls and \(\mu_b (A)\) be the number of black balls. What is the probability of \(m\) white balls and \(n-m\) black balls being drawn?

\[P\bigg(\mu_w(A_n) = m \ \& \ \mu_b(A_n)=n-m\bigg)=P^n_m=?\]

\((a+b)^n\) is the number of all the cases possible in \(n\) draws. In the expansion of this binomial, \({n \choose m}b^{n-m}a^m\) expresses the number of cases in which \(m\) white balls and \(n-m\) black balls may be drawn. Thus,

\[P^n_m=\frac{{n \choose m}b^{n-m}a^m}{(a+b)^n} \]

Letting \(p=P(\mu_w(A_1)=1)=\frac{a}{a+b}\) be the probability of drawing a white ball out of single draw and \(q=P(\mu_b(A_1)=1)=\frac{b}{a+b}\) be the probability of a drawing a black ball in a single draw,

\[P^n_m={n \choose m}q^{n-m}p^m\]

\[\Delta P^n_{m}=\frac{P^n_{m+1}}{P^n_{m}}=\frac{(n-m)p}{(m+1)q}\]

This is an ordinary finite difference equation; applying it \(r\) times in succession gives:

\[{\Delta}^r P^n_{m}= \frac{P^n_{m+r}}{P^n_{m}}=\frac{p^{r}}{q^{r}}\prod_{i=0}^{r-1}\frac{n-m-i}{m+i+1}\]
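The urn probability \(P^n_m\) and its successive ratios can be verified with exact arithmetic. This is a minimal sketch; the urn contents (2 white, 3 black), the number of draws, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def urn_probability(a, b, n, m):
    """P^n_m: probability of m white and n - m black balls in n draws with replacement."""
    p = Fraction(a, a + b)   # probability of white in a single draw
    q = 1 - p                # probability of black in a single draw
    return comb(n, m) * p ** m * q ** (n - m)


def ratio_formula(a, b, n, m, r):
    """Closed form for P^n_{m+r} / P^n_m as a product of r successive ratios."""
    p = Fraction(a, a + b)
    q = 1 - p
    result = (p / q) ** r
    for i in range(r):
        result *= Fraction(n - m - i, m + i + 1)
    return result


a, b, n = 2, 3, 4
print(urn_probability(a, b, n, 1))  # 216/625
for m in range(n):
    for r in range(1, n - m + 1):
        direct = urn_probability(a, b, n, m + r) / urn_probability(a, b, n, m)
        assert direct == ratio_formula(a, b, n, m, r)
```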

“Three players of supposed equal ability play together on the following conditions: that one of the first two players who beats his adversary plays the third, and if he beats him the game is finished. If he is beaten, the victor plays against the second until one of the players has defeated consecutively the two others, which ends the game. The probability is demanded that the game will be finished in a certain number \(n\) of plays. Let us find the probability that it will end precisely at the \(n\)th play. For that the player who wins ought to enter the game at the play \(n-1\) and win it thus at the following play. But, if in place of winning the play \(n-1\) he should be beaten by his adversary who had just beaten the other player, the game would end at this play. Thus the probability that one of the players will enter the game at the play \(n-1\) and will win it is equal to the probability that the game will end precisely with this play; and as this player ought to win the following play in order that the game may be finished at the \(n\)th play, the probability of this last case will be only one half of the preceding one.” (p.29-30)

Let \(E\) be the random variable of the number of plays it takes for the game to finish. Let \(G_k=(p_1,p_2)\) be the random variable of the two players \((p_1,p_2)\) playing in game \(k\). Let \(W_k=p_0\) be the random variable of the winning player, \(p_0\), of game \(k\).

By the argument quoted above, the probability of ending precisely at the \(n\)th play is one half of the probability of ending precisely at the preceding play:

\[\mathbb{P}(E=n)=\frac{1}{2}\mathbb{P}(E=n-1)\]

This is an ordinary finite difference equation for a recurrent process. To solve this probability, we notice the game cannot end sooner than the 2nd play and extend the iterative expression recursively,

\[\mathbb{P}(E=n)=\bigg(\frac{1}{2}\bigg)^{n-2}\mathbb{P}(E=2)\]

\(\mathbb{P}(E=2)\) is the probability that one of the first two players who has beaten his adversary should beat at the second play the third player, which is \(\frac{1}{2}\). Thus,

\[\mathbb{P}(E=n)=\bigg(\frac{1}{2}\bigg)^{n-1}\]
The probability the game will end at latest the \(n\)th play is the sum of these,

\[\mathbb{P}(E\leq n)=\sum_{k=2}^n \bigg(\frac{1}{2}\bigg)^{k-1} = 1 - \bigg(\frac{1}{2}\bigg)^{n-1}\]
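The closed form \(\mathbb{P}(E\leq n)=1-(\frac{1}{2})^{n-1}\) can be checked by brute-force enumeration of the three-player game. The sketch below assumes each play is an independent fair coin flip between the two players at the table; the player labels 0, 1, 2 and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def plays_until_end(outcomes):
    """Return the play at which one player has beaten the two others consecutively.

    `outcomes` is a sequence of booleans; True means the first-listed player of
    the current play wins. Returns None if the game has not ended by then.
    """
    current, waiting = (0, 1), 2           # players 0 and 1 start, player 2 waits
    champion, streak = None, 0
    for play, first_wins in enumerate(outcomes, start=1):
        winner, loser = current if first_wins else current[::-1]
        streak = streak + 1 if winner == champion else 1
        champion = winner
        if streak == 2:                    # two consecutive wins: both others beaten
            return play
        current, waiting = (winner, waiting), loser
    return None


def prob_end_at(n):
    """Exact P(E = n), counting equally likely outcome sequences of length n."""
    hits = sum(1 for seq in product([True, False], repeat=n)
               if plays_until_end(seq) == n)
    return Fraction(hits, 2 ** n)


for n in range(2, 9):
    cumulative = sum(prob_end_at(k) for k in range(2, n + 1))
    assert cumulative == 1 - Fraction(1, 2) ** (n - 1)
print(prob_end_at(2), prob_end_at(3), prob_end_at(4))  # 1/2 1/4 1/8
```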


Appendix: Calculus of Generating Functions

In general, we can define the ordinary finite difference polynomial equation. For a particular Event, \(E\), its probability mass function over internal-time steps \(n\) is given by the distribution \(f(n)=\mathbb{P}(E=n)\). The base case (\(I_0\)) of the inductive definition is known for the lowest time-step, \(n_0\), as \(f(n_0)=c\), while the iterative step (\(I^+\)) is constructed as a polynomial function \(\mathcal{P}(x)=\sum_i a_i x^i\) on the difference step of one time-unit:

\[I^+: f(n)=\mathcal{P}(f(n-1))\]

\[\rightarrow f(n)=\underbrace{\mathcal{P}(\cdots\mathcal{P}(}_{n-n_0}f(n_0))\cdots)\]

\[\mathcal{D}(f(n))=f'(n)=\mathcal{D}(\mathcal{P})(f(n-1))f'(n-1)=\prod_{k=n_0}^{n-1}\mathcal{D}(\mathcal{P})(f(k)) \]
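The inductive definition can be sketched as a loop that applies the polynomial \(\mathcal{P}\) once per time-step, starting from the base case. The function name and the choice of example are assumptions for illustration: the polynomial \(\mathcal{P}(x)=\frac{x}{2}\) with base case \(f(2)=\frac{1}{2}\) corresponds to the three-player game, where each step halves the probability.

```python
from fractions import Fraction


def iterate_polynomial(coeffs, base_value, base_step, n):
    """Compute f(n) from f(base_step) = base_value and f(k) = P(f(k - 1)).

    `coeffs[i]` is the coefficient a_i of x^i in P(x) = sum_i a_i x^i.
    """
    value = base_value
    for _ in range(base_step, n):          # n - base_step applications of P
        value = sum(a * value ** i for i, a in enumerate(coeffs))
    return value


halving = [0, Fraction(1, 2)]              # P(x) = x / 2
print(iterate_polynomial(halving, Fraction(1, 2), 2, 5))  # 1/16
```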

A Tutorial in Data Science: Lecture 2 – The Hermeneutic Nature of Scientific Data

The question of science itself has never been its particular object of inquiry but the existential nature, in its possibility and thereby the nature of its actuality. Science is power, and thus abstracts itself as the desired meta-good, although it is always itself about particularities as an ever-finer branching process. Although a philosophic question, the “question of science” is inherently a political one, as it is the highest good desired by society, its population, and its government. To make sense of science mathematically-numerically, as statistics claims, it is the scientific process itself that must be understood through probability theory as The Logic of Science, a book on the subject by E.T. Jaynes in the context of a scientific investigation into the equivalency between statistics and quantum physics.

Linguistic Analysis of the Invariants of Science: The Laws of Nature

The theory of science, as the proof of its validity in universality, must consider the practice of science, as the negating particularity. The symbolic language of science, within which its practice and results are embedded, necessarily negates its own particularity as well, as thus to represent a structure universally. Science, in the strict sense of having achieved already the goal of universality, is de-linguistified. While mathematics, in its extra-linguistic nature, often has the illusion of universal de-linguistification, such is only a semblance and often an illusion. The numbers of mathematics always can refer to things, and in the particular basis of their conceptual context always do. The non-numeric symbols of mathematics too represented words before short-hand gave them a distilled symbolic life. The de-linguistified nature of the extra-linguistic property of mathematics is that to count as mathematics, the symbols must themselves represent universal things. Thus, all true mathematical statements may represent scientific phenomena, but the context and work of this referencing is not trivial and sometimes the entirety of the scientific labor. The tense of science, as the time-space of the activity of its being, is the tensor, which is the extra-linguistic meta-grammar of null-time, and thus any and all times.

The Event Horizon of Discovery: The Dynamics between an Observer & a Black Hole

The consciousness who writes or reads science, and thereby reports or performs the described tensor as an action of experimentation or validation, is the transcendental consciousness. Although science is real, it is only a horizon. The question is thus of its nature and existence at this horizon. What is knowable of science is thereby known as “the event horizon,” as that which has appeared already, beyond which is merely a black hole as what has not yet revealed itself – always there is a not-yet to temporality and so such a black hole can always be at least found as all that of science that has not and cannot be revealed since within the very notion of science is a negation of withdrawal (non-appearance) as the condition of its own universality (negating its particularity). Beginning here with the null-space of black-holes, the physical universe – at least the negative gravitational entities – has a natural extra-language – at least for the negative linguistic operation of signification whereby what is not known is the object of reference. In this cosmological interpretation of subjectivity within the objectivity of physical space-time, we thus come to the result of General Relativity that the existence of a black-hole is not independent of the observer, and in fact is only an element in the Null-Set, or negation, of the observer. To ‘observe’ a black-hole is to point to and outline something of which one does not know. If one ‘knew’ what it was positively then it would be not ‘black’ in the sense of not-emitting light within the reference frame (space-time curvature) of the observer. That one cannot see something, as to receive photons reflecting space-time measurements, is not a property of the object but rather of the observer in his or her subjective activity of observation since to be at all must mean there is some perspective from which it can be seen.
As the Negation of the objectivity of an observer, subjectivity is the negatively-curved gravitational anti-matter of black holes. Subjectivity, as what is not known by consciousness, observes the boundaries of an aspect (a negative element) of itself in the physical measurement of an ‘event horizon.’

These invariants of nature, as the conditions of its space-time, are the laws of dynamics in natural science. At the limit of observation we find the basis of the conditionality of the observation and thus its existence as an observer. From the perspective of absolute science, within the horizon of universality (i.e. the itself as not-itself of the black hole or Pure Subjectivity), the space-time of the activity of observation (i.e. the labor of science) is a time-space as the hyperbolic negative geometry of conditioning (the itself of an unconditionality). What is a positive element of the bio-physical contextual condition of life, from which science takes place, for the observer is a negative aspect from the perspective of transcendental consciousness (i.e. science) as the limitation of the observation. Within Husserlian Phenomenology and Hilbertian Geometry of the early 20th century in Germany, from which Einstein’s theory arose, a Black-Hole is therefore a Transcendental Ego as the absolute measurement point. Our Solar System is conditioned in its space-time geometry by the Milky Way galaxy it is within, which is conditioned by the black hole Sagittarius A* (SgrA). Therefore, the unconditionality of our solar space-time (hence the bio-kinetic features) is an unknown of space-time possibilities, enveloped in the event horizon of SgrA. What is the inverse to our place (i.e. space-time) of observation will naturally only exist as a negativity, as what cannot be seen.

Classical Origins of The Random Variable as The Unknown: Levels of Analysis

Strictly speaking, within Chinese Cosmological Algebra of 4 variables (\(\mu, X,Y,Z\)), this first variable of primary Unknowing is represented by \(X\), or Tiān (天), for ‘sky’ as that which conditions the arc of the sky, i.e. “the heavens” or the space of our temporal dwelling ‘in the earth.’ \(X=\mathrm{SgrA}^{*}\) is the closest supermassive black hole and the most relevant primary unknown for our contextualized solar system, since it structures the Milky Way Galaxy and thus the space-time within which our solar system is embedded. In the galactic ecosystem, black holes interact with distant neighbors, and signals such as cosmic rays can reach the Earth from other galaxies. It can be said thus that all unknowns (\(x\)) in our space-time of observation are within “the great unknown” (\(X\)) of SgrA, so that \(x \in X\) or \(x \mathcal{A} X\) for the negative aspectual (\(\mathcal{A}\)) relationship “x is an aspect of X.” These are the relevant, and most general (i.e. universal), invariants to our existence of observation. They are the relative absolutes of, from, and for science. Within more practical scientific judgments from a cosmological perspective, the relevant aspects of variable unknowns are the planets within our solar system as conditioning the solar life of Earth. The Earthly unknowns are the second variable \(Y\), or Di (地) for “earth.” They are the unknowns that condition the Earth, or life, as determining the changes in climate through their cyclical dynamics. Finally, the last unknown of conditionals, \(Z\), refers to people, Ren (人) for ‘men,’ as what conditions their actions. \(X\) is the macro unknown (conditionality) of the gravity of ‘the heavens,’ \(Y\) the meso unknown of biological life in and on Earth, and \(Z\) the micro unknown of psychology as quantum phenomena. These unknowns are the subjective conditions of observation. Finally, the 4th variable is the “object”, or Wu (物), \(\mu\) of measurement.
This last quality is the only real value in the sense of an objective measurement of reality, while the others are imaginary in the sense that their real values aren’t known, and can’t be within the reference of observation since they are its own conditions of measurement within “the heavens, the earth, and the person” (Bréard, p.82).

In the quaternion tradition of Hamilton, (\(\mu, X,Y,Z\)) are the quaternions, (\(\mu, i,j,k\)). Since the real-values of X,Y,Z in the scientific sense can’t be known truly and thus must always be themselves unknowns, they are treated as imaginary numbers (\(i=\sqrt{-1}\)) with their ‘values’ merely coefficients to the quaternions \(i,j,k\). These quaternions are derived as quotients of vectors, as thus the unit orientations of measurement’s subjectivity, themselves representing the space-time. We often approximate this with the Cartesian X,Y,Z of 3 independent directions as vectors, yet such is to assume Euclidean Geometry as independent.
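The algebra behind this identification can be made concrete with a minimal sketch, assuming the standard Hamilton relations \(i^2=j^2=k^2=ijk=-1\) and reading a quaternion as \(\mu + Xi + Yj + Zk\). The class name `Quaternion` and its field names are illustrative choices made here, not notation from the sources cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    """Hypothetical container for mu + x*i + y*j + z*k."""
    mu: float  # real part: the measured "object"
    x: float   # coefficient of the imaginary unit i
    y: float   # coefficient of the imaginary unit j
    z: float   # coefficient of the imaginary unit k

    def __mul__(self, other: "Quaternion") -> "Quaternion":
        # Hamilton product, expanded from i^2 = j^2 = k^2 = ijk = -1.
        a1, b1, c1, d1 = self.mu, self.x, self.y, self.z
        a2, b2, c2, d2 = other.mu, other.x, other.y, other.z
        return Quaternion(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )


# The three imaginary units as pure quaternions.
I = Quaternion(0, 1, 0, 0)
J = Quaternion(0, 0, 1, 0)
K = Quaternion(0, 0, 0, 1)

print(I * I)  # each unit squares to -1 (real part -1, no imaginary part)
print(I * J)  # equals K
print(J * I)  # equals -K: the order of multiplication matters
```

Because \(ij = k\) while \(ji = -k\), the three units are not interchangeable independent axes in the way Cartesian X, Y, Z are usually treated; each product of two units yields the third, with a sign that depends on order.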


[1] E. T. Jaynes, “Probability in Quantum Theory,” in Complexity, Entropy, and the Physics of Information, ed. W. H. Zurek, Addison-Wesley Publishing Co., 1990.

[2] Andrea Bréard, Nine Chapters on Mathematical Modernity: Essays on the Global Historical Entanglements of the Science of Numbers in China, Springer, 2019.