# A Tutorial in Data Science: Lecture 1 – The Foundations of Data Science

by | Jan 12, 2021 | Math Lecture

## The Scientific Process: Question-Asking & Temporality

The scientific process is the temporality of question-asking, and as such the basis for temporality, which ontologically answers the epistemological questions in terms of genesis or the ‘coming forth from’ that underlies notions of causality.

The business of science is the procedure of knowledge accumulation. It begins with a qualitative hermeneutic experience of a phenomenon from being-in-the-world, which is articulated into its parts through the process of rationalization, and ends with the quantitative formalization of claim-making, whereby empirical quantities are measured and the confidence in the validity of propositional claims is also measured. Statistics is this last, secondary act of measurement: the method of determining the validity of a hypothetical propositional claim about empirical nature. A claim that is not particularly valid will likely be true only some of the time, or only under specific conditions that are uncommon. Ultimately, then, within a domain of consideration, statistics answers the question of the universality of the claims made about nature through empirical methods of observation. Two opposing claims may both be true, in the sense that each holds in half of random observations, or within half the space of contextual conditionalities. The scientific process, as progress, relies on methods that, over a linear time of repeated experimental cycles, increase the validity of claims as the knowledge of nature approaches universality, itself always merely a horizon within the phenomenology of empiricism. This progressive scientific process is called ‘discovery,’ or merely research, although it is highly non-linear.

The scientific process is a branching process, as the truth of a claim is found to depend upon its conditions, and those conditions are found to depend on further conditionals. This structure of rationality is that of a tree. A single claim (C) has a relative validity (V) due to the truth of an underlying, or conditioning, claim, $$C_i$$, given as $$V_{C_i}(C)=V(C,C_i)$$. We may understand the validity of claims through probability theory, in that the relative validity of a claim based on a conditioning claim is the probability that the claim is true conditioned on $$C_i$$, $$V(C,C_i)=P(C|C_i)$$. In general, we will refer to the object under investigation, about which C is a claim, as the primary variable X, and to the subject performing the investigation, by whom $$C_i$$ is hypothesized (as a cognitive action), as the secondary variable Y. Thus, the orientation of observation, i.e. the time-arrow, is given as $$\sigma: Y \rightarrow X$$.

The question of inference is thus how to answer $$P(C|C_i)$$. Writing our assumptions as $$A=C_i$$ and the hypothesis as $$H=C$$, we seek the probability of validity of the hypothesis, $$P(H|A)$$.
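As a minimal sketch of reading $$V(C,C_i)=P(C|C_i)$$ as a relative frequency, the snippet below counts, over a hypothetical record of observations, how often the hypothesis H held among the cases where the assumption A held. The observation list and the function name `conditional_validity` are illustrative assumptions, not part of the lecture.

```python
from collections import Counter


def conditional_validity(observations):
    """Estimate P(H | A) as a relative frequency.

    `observations` is an iterable of (a_holds, h_holds) boolean pairs,
    one pair per observation.
    """
    counts = Counter(observations)
    n_a = counts[(True, True)] + counts[(True, False)]  # observations where A held
    if n_a == 0:
        raise ValueError("assumption A never held, so P(H | A) is undefined")
    return counts[(True, True)] / n_a


# Hypothetical record of six observations: (A held?, H held?)
obs = [(True, True), (True, True), (True, False),
       (False, True), (True, True), (False, False)]
print(conditional_validity(obs))  # 0.75
```

The function refuses to answer when A never held, since the conditional probability is then undefined; observations where A failed carry no information about $$P(H|A)$$.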

An observer (Y) makes an observation from a particular position of an event (X) with its own place, forming a space-time of the action of measurement. An observation-as-information is a complex quantum-bit, which within a space of investigation is a complex variable, representing a tree of observation-conditioning rationality that results from the branching process of hypothesis formation, with each node a conditional hypothesis and each edge length a conditional probability. The gravitation of the system of measurement is the space-time tensor of its world-manifold, stable or chaotic over the time of interaction. We thus understand the positions of observers within a place of investigation, itself given, at least in its real-part component, by the object of investigation.
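Reading each edge length as the conditional probability of a child hypothesis given its parent, the probability of a whole branch of the tree follows from the chain rule, as the product of the edge probabilities along it. The sketch below assumes this reading; the class `Hypothesis`, the function `branch_probabilities`, and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A node of the tree: a conditional hypothesis."""
    name: str
    # Each entry is (P(child | this node), child node).
    children: list = field(default_factory=list)


def branch_probabilities(node, prob=1.0, path=()):
    """Yield (path, probability) for every root-to-leaf branch.

    The probability of a branch is the product of the conditional
    probabilities on its edges (the chain rule).
    """
    path = path + (node.name,)
    if not node.children:
        yield path, prob
        return
    for p, child in node.children:
        yield from branch_probabilities(child, prob * p, path)


# Illustrative tree: C is conditioned by C1 or C2, each further conditioned.
root = Hypothesis("C", [
    (0.6, Hypothesis("C1", [(0.5, Hypothesis("C11")), (0.5, Hypothesis("C12"))])),
    (0.4, Hypothesis("C2", [(0.9, Hypothesis("C21")), (0.1, Hypothesis("C22"))])),
])

for path, p in branch_probabilities(root):
    print(" -> ".join(path), round(p, 2))
```

When the children of every node carry probabilities summing to 1, the branch probabilities also sum to 1, so the leaves partition the space of conditional outcomes.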

## The Experimental Set-up: Breaking the Flow of Nature

Nature is explained by a parameterized model. Each parameter, as a functional aggregation of measurement samples, itself has a corresponding distribution as it *occurs in nature* along the infinite, universal horizon of measurement.

Let $$X^n$$ be a random variable representing the n qualities that can be measured for the thing under investigation, $$\Omega$$, itself the collected gathering of all its possible appearances, $$\omega \in \Omega$$, such that $$X^n:\Omega \rightarrow {\mathbb{R}}^n$$. Each sampled measurement of $$X^n$$ through an interaction with $$\omega$$ is denoted $$\hat{X}^n(t_i)$$, each one constituting a unit of indexable time in the catalogable measurement process. Thus, the set of sampled measurements, a sample space, is indexed by a partition $$\pi$$ of ‘internally orderable’ test times within the measurement action, $$\{ \hat{X}^n(t): t \in \pi \}$$.
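A minimal simulation of this set-up, under loudly stated assumptions: the object $$\Omega$$ is modeled by three hypothetical qualities with fixed true values, each appearance $$\omega$$ perturbs them with Gaussian noise, and $$X^n$$ reads the qualities off as a point in $$\mathbb{R}^3$$. The names `TRUE_QUALITIES`, `appear`, `measure`, and `sample` are illustrative only.

```python
import random

# Hypothetical object Omega: three qualities with fixed "true" values.
TRUE_QUALITIES = {"length": 10.0, "mass": 2.5, "temperature": 293.0}


def appear(rng):
    """One appearance omega of Omega: true qualities plus assumed Gaussian noise."""
    return {q: v + rng.gauss(0.0, 0.1) for q, v in TRUE_QUALITIES.items()}


def measure(omega):
    """X^n: map an appearance omega to a point in R^n (a tuple, n = 3 here)."""
    return tuple(omega[q] for q in sorted(omega))


def sample(times, seed=0):
    """Return {t: X^n(t)}, one measurement per indexable test time t in pi."""
    rng = random.Random(seed)
    return {t: measure(appear(rng)) for t in times}


pi = range(5)  # five internally orderable test times t_0 .. t_4
samples = sample(pi)
for t, x in samples.items():
    print(t, [round(v, 2) for v in x])
```

Fixing the seed makes the catalogue of test times reproducible, which stands in for the internal ordering of the measurement action.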

$$\Omega$$ is a state-system, i.e. the spatio-temporality of the thing in question, in that it has specific space-states $$\omega$$ at different times, $$\Omega(t)=\omega$$. $$X$$ is the function that measures $$\omega$$. What if measurement is not real, but complex: $$X: \Omega \rightarrow \mathbb{C}$$? While a real number results from a finite, approximate, or open-ended process of objective empirical measurement, an imaginary number results from a subjective intuition or presupposition prior to measurement. Every interaction with $$\Omega$$ lets it appear as $$\omega$$, which is quantified by $$X$$. From these interactions, we seek to establish truths about $$\Omega$$ by quantifying the probability that the claim $$C$$, itself a quantifiable statement about $$\Omega$$, is correct.

In this set-up of statistical sampling, one will notice that the step-wise process-timing of a single actor performing n sequential measurements can be represented in the same way as n indexed actors performing simultaneous measurements, at least with regard to internal time accounting. To adopt the latter interpretational context while preserving the common-sense notion of time as distinct from social space, one would represent all n simultaneous measurements as n dimensions of X, assumed to be generally the same in quality, such that all n actors sample the same object in the same way, yet are distinct in some orderable indexical quality. Thus, in each turn of the round time (i.e. one unit), all actors perform independent and similar measurements. It may be, as in progressive action processes, that future actions depend on previous ones, so that independence is found only within the sample space of a single time round. Alternatively, the actors may perform different actions, or depend upon each other in their interactions. Thus, the notion of actor(s) may be embedded in the space-time of the action of measurement.

The embedding of a coordinated plurality of actors, in the most mundane sense of ‘collective progress’, can be represented as follows: the group action of all independent and similar measurers completes itself in each round of time, with the inter-temporalities of the measurement process being similar to, but dependent on, the previous round. The progressive interaction may be represented as the inducer $$I^+:X(t_i) \rightarrow X(t_{i+1})$$, with the assumptions of similarity and independence as $$\hat{x}_i(t) \sim \hat{x}_j(t) \ \& \ I(\hat{x}_i(t),\hat{x}_j(t))=0$$. We take $$\hat{X}(t)$$ to be a group of measurement actors/actions $$\{ \hat{x}_i(t): i \in \pi \}$$ that acts on $$\Omega$$ together, or simultaneously, to produce a singular measurement of one round of time.
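The round structure can be simulated as below, as a sketch under hypothetical choices: within each round, n actors measure the same state independently and with the same Gaussian error (similarity and independence), and between rounds a stand-in for the inducer $$I^+$$ advances the state to this round's mean plus a fixed drift, so each round depends on the one before. `run_rounds`, `drift`, and `noise` are illustrative names, and the drift rule is an assumption, not part of the lecture.

```python
import random
import statistics


def run_rounds(n_actors, n_rounds, drift=0.5, noise=0.2, seed=1):
    """Simulate rounds of collective measurement; return one list per round."""
    rng = random.Random(seed)
    state = 0.0
    history = []
    for _ in range(n_rounds):
        # X-hat(t): n independent, similar measurements of the same state
        round_measurements = [state + rng.gauss(0.0, noise) for _ in range(n_actors)]
        history.append(round_measurements)
        # Hypothetical inducer I+: the next state depends on this round's result
        state = statistics.fmean(round_measurements) + drift
    return history


rounds = run_rounds(n_actors=4, n_rounds=3)
for t, measurements in enumerate(rounds):
    print(t, round(statistics.fmean(measurements), 3))
```

Independence holds only inside a round: the four draws of one round share a state but not an error, while the state of each round is computed from the previous one.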

## Deriving the Distribution for Normalcy: The Unquestioned State of Nature

The question with measurement is not “what is the true distribution of the object in question in nature?”, for this uncountable nature cannot be known within the countable horizon of empirical science, but rather “what is the distribution of the parameter I am using to measure?”, and thus the mimetic relationship between subject and object in the activity of measurement. The underlying metric of the quality under investigation, itself arising from an interaction of measurement as the distance function within the investigatory space-time, is $$\mu$$. As the central limit theorem states, averages of such measurements, each carrying an independent error of finite variance, converge in distribution to normalcy. The reflective view of this convergence process as time-conditioning is backwards, in that all measurements, as answers to the question, come from this primitive state of normalcy, before distinction has been made by inquiry. We can describe analytically the distribution of that which has not been questioned, and is thus atemporal, through the averaged measurements: the rate of change of the frequency $$f$$ of our sample measurements $$x,x_0 \in X$$ with respect to the space of measuring is proportional, by the constant $$-k$$, to the product of the distance from the true measurement ($$\mu$$) and the frequency:

$$\forall \epsilon > 0, \exists \delta(\epsilon)>0 \ s.t. \ \forall x, \ 0<|x_0-x|<\delta \rightarrow \bigg| -k(x_0-\mu)f(x_0) - \frac{f(x_0)-f(x)}{x_0-x}\bigg|<\epsilon$$

or in the differential form

$$\frac{df}{dx}=-k(x-\mu)f(x)$$

$$\frac{1}{f(x)}\frac{df}{dx}=-k(x-\mu) \ \Rightarrow \ \ln f(x)=-\frac{k}{2}{(x-\mu)}^2+\ln C$$

so the solution is determined up to a multiplicative constant of integration, C,

$$f(x)=Ce^{-\frac{k}{2}{(x-\mu)}^2}$$

given that the total probability of the universe of events is normalized to 1

$$\int_{-\infty}^{\infty} f dx =1$$

thus,

$$C=\sqrt{\frac{k}{2\pi}}$$

so the total distribution is,

$$f(x)=\sqrt{\frac{k}{2\pi}}e^{-\frac{k}{2}{(x-\mu)}^2}$$

$$E(X-\mu)=\int (x-\mu)f(x)dx=0 \ \Rightarrow \ E(X)=\mu$$

$$\sigma^2=E\big({(X-\mu)}^2\big)=\int {(x-\mu)}^2 f(x)dx=\frac{1}{k}$$

so, $$f(x)=N\bigg(\mu,\sigma=\frac{1}{\sqrt{k}}\bigg)$$
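The closed form can be checked numerically. The sketch below, with arbitrary illustrative values $$\mu=3$$ and $$k=4$$, (1) shows the difference quotient from the $$\epsilon$$-$$\delta$$ statement approaching $$-k(x_0-\mu)f(x_0)$$ as the step shrinks, (2) confirms that $$C=\sqrt{k/2\pi}$$ makes the density integrate to 1, and (3) recovers $$E(X)=\mu$$ and $$\sigma^2=1/k$$. The midpoint-rule integrator and the truncation of the infinite range at 12 standard deviations are choices of this sketch, not part of the derivation.

```python
import math


def density(x, mu, k):
    """Closed form f(x) = sqrt(k / 2pi) * exp(-k/2 * (x - mu)^2)."""
    return math.sqrt(k / (2 * math.pi)) * math.exp(-0.5 * k * (x - mu) ** 2)


def ode_residual(x0, mu, k, h):
    """|(f(x0) - f(x)) / (x0 - x) + k (x0 - mu) f(x0)| for x = x0 - h.

    This is the gap the epsilon-delta statement bounds; it should shrink
    as the step h (playing the role of delta) shrinks.
    """
    x = x0 - h
    quotient = (density(x0, mu, k) - density(x, mu, k)) / (x0 - x)
    return abs(quotient + k * (x0 - mu) * density(x0, mu, k))


def integrate(g, mu, k, n=20000, width=12.0):
    """Midpoint-rule integral of g over mu +/- width standard deviations.

    The infinite range is truncated; past 12 standard deviations the
    Gaussian tails are far below double precision.
    """
    sigma = 1 / math.sqrt(k)
    a = mu - width * sigma
    h = 2 * width * sigma / n
    return h * math.fsum(g(a + (i + 0.5) * h) for i in range(n))


mu, k = 3.0, 4.0


def f(x):
    return density(x, mu, k)


# (1) difference-quotient gap at x0 = mu + 1 for shrinking steps
residuals = [ode_residual(mu + 1.0, mu, k, h) for h in (1e-1, 1e-2, 1e-3, 1e-4)]
# (2) normalization: should be 1 with C = sqrt(k / 2pi)
total = integrate(f, mu, k)
# (3) first moment and central second moment: should be mu and 1/k
mean = integrate(lambda x: x * f(x), mu, k)
variance = integrate(lambda x: (x - mu) ** 2 * f(x), mu, k)

print(residuals)
print(total, mean, variance, 1 / k)
```

The residuals shrink roughly in proportion to the step, as expected for a one-sided difference quotient, while the three integrals should agree with 1, $$\mu$$, and $$1/k$$ to many decimal places.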
