As explained in the first Lecture of A Tutorial in Data Science – a 10-Lecture series I wrote to supplement our Data Science course at Mathematics Academy and cover the theoretical foundations of this new science – Statistics is the study of measuring the validity of claims, and complexes of claims, under uncertain (i.e. random) conditions. Whenever a metric or model is applied to empirical conditions, statistical method is required to determine how well the representative sample used to estimate the underlying reality generalizes. Statistics is thus necessary for estimating the validity of summary metrics and predictive models in science and institutional operations: it clarifies the generality of quantitative estimates that have been calculated only for a representative sample.

Empirical conditions are random in the sense that they change over time and space beyond what a deterministic model can reduce or a measurement instrument can resolve. In Lecture 9, I describe how non-reducible stochastic processes – time-evolving statistical processes – are equivalent to chaotic dynamical systems. Randomness, from the perspective of measurement, is therefore a limitation of knowledge: we cannot know all the circumstances of the continuum of reality. To hold these perspectives together, we may view the observer and the observed object as constituting a single communication system, in which the observer asks a question via a measurement apparatus and the object answers by revealing itself as a datum within the experimental setup. This paradigm of communication, or information, science originated with Shannon's information theory in the late 1940s and enabled the evolution of computer systems into communication systems and, eventually, the internet. Today, then, we understand randomness as either an incompleteness in the message data or noise in the channel carrying the message as a signal. Statistics is necessary to estimate this error in order to know the confidence we can place in our interpretive decipherment of the message data, whether it comes from other people, from nature, or from our intermediary complex systems.
Ever since computer data storage became cheap in the 1990s, data has been collected faster than interpretive schemas can be developed to analyze it into useful decision-making results, creating a surplus of data relative to interpretations. "Data Science" thus evolved as a science, and a role, that applies statistical methods to these large unstructured databases, where the hypothesis to test is often ambiguous, using techniques and technologies built for such datasets, such as Hadoop.
20 Professional Roles requiring Statistics:
1. Statistician
- For the traditional role of Statistician, a Bachelor's, and often a Master's, in Statistics or Mathematics is clearly preferred, but a few courses (6 course credit hours) in Statistics or its applications can suffice as a formal qualification. A Statistician is often required on a scientific or government project to perform the official statistical measurements of the hypotheses, certifying the validity of the claims made by the researchers in a scientific experiment or program evaluation.
2. SAS Clinical Programmer
- This role specializes in SAS software for programming automated statistical analyses, such as evaluating clinical drug trials for regulatory approval.
A. Analyst
- An Analyst applies techniques of data analysis to different subject matters and so requires an additional specialty in the area of application, although this can sometimes be learned on the job. Statistics is often used, but without the strict standards of control and freedom from bias demanded of a scientific experiment, since the data is typically gathered and analyzed ad hoc for the purpose of better institutional decision-making, i.e. giving some empirical basis beyond mere intuition, rather than proving facts about nature. A p-value of 15% is often as good as you will get, but at least it tells you that one hypothesis is better than the alternative, i.e. better than mere chance. The emphasis in this role is on using data technologies to gather quantitative insights, interpreting them, and communicating them to non-technical decision-makers.
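To make the "better than mere chance" idea concrete, here is a minimal sketch of how such a p-value can be computed directly, using a two-sample permutation test on entirely hypothetical sales figures (the numbers, and the two "strategies" they represent, are invented for illustration):

```python
# Two-sample permutation test: is strategy A's mean really higher than B's,
# or could a difference this large arise by chance from shuffled labels?
import random
import statistics

random.seed(0)

# Hypothetical daily sales under two pricing strategies.
a = [102, 98, 110, 105, 99, 107, 111, 103]
b = [95, 101, 97, 96, 104, 93, 99, 98]

observed = statistics.mean(a) - statistics.mean(b)

# Pool the data, reshuffle the labels many times, and count how often a
# shuffled split produces a difference at least as large as the observed one.
pooled = a + b
n_perm = 10_000
count = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    perm_diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
    if perm_diff >= observed:
        count += 1

p_value = count / n_perm
print(f"observed difference: {observed:.2f}, p-value: {p_value:.3f}")
```

The p-value here is just the fraction of label-shuffles that beat the observed difference, so a value of 0.15 means a 15% chance that pure randomness would look this good.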
3. Data Analyst
- A Data Analyst applies techniques of analysis to a wide array of different subject matters and thus should have a specialty in general-purpose data technologies rather than in the subjects of application. A Data Analyst should know how to build and use pipelines for gathering data from different sources, methods for cleaning and transforming data in preparation for analysis, and basic statistical analysis such as summary statistics, hypothesis testing, and regression modeling. Today, Python is commonly used for general programming, SQL is required to query and organize databases, and R is used for statistical analyses. I have written a blog post to teach you the basics of integrating these different programming languages and tools to do a simple visual analysis of the economic impact of COVID-19.
4. Intelligence Analyst
- An Intelligence Analyst applies data analysis techniques to sensitive government information, either directly within the intelligence agencies or for their contractors. This is a great role for someone with a liberal arts education, able to apply critical thinking across disparate subjects, who also likes to view problems quantitatively in terms of data and analysis, i.e. from a neutral perspective, blind to the interpretive conclusions.
5. Business Analyst
- A Business Analyst translates business problems into data and modeling problems so they can be approached systematically and answered quantitatively.
6. Business Intelligence Analyst
- The emphasis here is on gathering intelligence about competitors in order to make more strategic decisions.
7. Business Systems Analyst
- This role translates business systems into functional systems amenable to quantitative analysis. One must view the human-to-human operational processes as input-output processes oriented towards specific objectives via coordination mechanisms.
8. Program Analyst
- Program Analysts use data analysis in the research aspects of understanding the empirical landscape on which a program operates and in evaluating the success of a program through key metrics; additional experience in policy or the subject of the program is a plus.
9. Management Analyst
- Management Analysts analyze the performance of members in a team or institutional workforce in order to identify where human resource allocation and efficiency can be improved; additional expertise in management principles and the psychology of motivation is a plus.
10. Operations Research Analyst
- This subject is specialized in general organizational decision-making by translating such organizational operations problems into mathematical models in order to minimize risk and maximize efficiency.
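As a toy illustration of translating an operations problem into a mathematical model, consider choosing which projects to fund under a fixed budget so that total expected return is maximized. This is a 0/1 knapsack problem; the sketch below (with invented costs and returns) solves it by brute force over all subsets, which is fine at this scale, though real operations research would use linear programming or dynamic programming:

```python
# Budget allocation as a 0/1 knapsack: pick the subset of projects whose
# total cost fits the budget and whose total expected return is largest.
from itertools import combinations

budget = 100
# Hypothetical projects: name -> (cost, expected return)
projects = {"A": (40, 60), "B": (30, 40), "C": (50, 75), "D": (20, 25)}

best_value, best_plan = 0, ()
for r in range(len(projects) + 1):
    for plan in combinations(projects, r):
        cost = sum(projects[p][0] for p in plan)
        value = sum(projects[p][1] for p in plan)
        if cost <= budget and value > best_value:
            best_value, best_plan = value, plan

print("fund:", best_plan, "expected return:", best_value)
```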
11. Market Research Analyst
- This is the empirical analysis of markets for the purpose of scouting investment opportunities and sometimes requires analyzing surveys from consumers.
12. Financial/Budget Analyst
- This involves accounting, speculative projections, and valuation estimations by analyzing market potentials and business success.
B. Scientist
- These roles conduct scientific research, which holds a high standard of precision (statistical significance at a p-value < 5%), and so require a high degree of expertise, sometimes a Ph.D.
13. Data Scientist
- A Data Scientist is similar to a Data Analyst but must be able to use statistical methods on large unstructured datasets, thus requiring big data technologies like Hadoop and techniques of machine learning.
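The core pattern that big-data tools like Hadoop distribute across many machines is map-reduce. Here is a toy in-process sketch of that pattern on a hypothetical server log: mappers emit (key, 1) pairs, a shuffle groups the pairs by key, and reducers sum each group:

```python
# Map-reduce word count, run locally to show the three stages Hadoop scales out.
from collections import defaultdict

lines = ["error timeout", "ok", "error disk", "ok", "ok"]

# Map: emit one (word, 1) pair per word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group pairs by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate each group.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)
```

On a real cluster the map and reduce stages run in parallel on different nodes and the shuffle moves data between them, but the logic of each stage is exactly this.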
14. Research & Development Scientist
- This role conducts scientific research, sometimes for experimental purposes and sometimes for technology development, though both always involve hypothesis testing. A Bachelor's can be sufficient for literature reviews and background research within a larger team. One must at least be able to learn the scientific standards and techniques for the problems at hand.
C. Engineer
- Instead of performing empirical analysis, the Engineer builds the infrastructural operating information systems, which often include automated analytical systems for empirical or testing purposes.
15. Data Engineer
- The Data Engineer is tasked with engineering the information systems for data collection, cleaning, transformation, and setup for analysis, often involving many dynamic parts from different sources.
16. Analytics Engineer
- The Analytics Engineer must create all the analytic metrics for a system to operate and evaluate its success in testing, whether this is for a business operation, a social program, or an information system.
C.i. Software Engineer
17. Research Software Engineer
- The Research Software Engineer role was created around 2010 to help produce more usable and standardized research software, since such software is often written for single-purpose, highly specialized research questions and scenarios. While this is predominantly a coding job of packaging existing research software, one must understand the statistical methods used within that software in order to test its usefulness and the generality of its results.
D. Manager
- Once you have held one of the roles in the previous categories, you can become a manager overseeing such operations.
18. Analytics Manager
- An Analytics Manager oversees the work of data analysts, often setting their agenda based upon the priorities of a business and translating these results to stakeholders.
19. Project Manager
- A Project Manager oversees the completion of projects, often allocating resources, ensuring communication between team members, and creating a viable work process. But a Project Manager, especially where the projects are scientific studies, must also be able to rigorously evaluate their success and the generality of their results.
D.i. Product Manager
- A Product Manager oversees the development of a product: performing background data-analysis research to formulate the product's requirements, managing a product (i.e. technology) team through the development process as a project until completion, and then testing with statistical rigor that the product performs its expected function.
20. Data Product Manager
- Data and Statistical skills are especially useful when the Product itself under development is a Data Product. As a Data Product Manager, you determine the data needs of your organization or clients and develop the data-technology systems to deliver those data products in the timely and specialized manner needed, thus requiring familiarity with both the underlying data information systems and the statistical techniques to summarize, aggregate, or abstract data.