Viewpoint: Statistical Data Science, The Data Analysis Side
There is no way a term like ‘Data Science’ can be interpreted without including data analysis and statistics. The statistics toolbox is a large irreplaceable part of the lingua franca of science.
“I believe statistics has many cultures.”
— Emanuel Parzen, Professor, Texas A&M University
By Randy Bartlett, Feb 2014.
With so much LARGE talk about the role of Quants/statisticians and the utility of statistics, we need to restate our view with clarity.
The data world is split into two skill sets: one for managing data and another for analyzing it.
Inside the corporation, we will describe those who perform data analysis professionally as Business Quants: the heavies, the go-to guys for data analysis, who use statistical software. I will use this less tainted term to denote those econometricians, industrial engineers, operations researchers, statisticians, et al., who apply three toolboxes: mathematics, statistics, and algorithms. Mathematical tools, which are "coated" in logic and wrapped around algorithms, address complete numbers.
Statistical tools, coated in logic and mathematics and wrapped around algorithms, address incomplete data. The coatings provide rigor. For complete numbers, we can deduce and obtain unique results (gotta like that); for incomplete data, we must infer. Other algorithmic tools (logic, heuristics, and optimization) comprise a third toolbox that works in both the complete and incomplete domains. These three toolboxes have strong interdependencies, and we refer to quants as those professionals who employ them to analyze data.
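To make the deduce-versus-infer distinction concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from a real business problem: the simulated ledger of values, the sample size of 200, and the normal-approximation confidence interval are all hypothetical choices. With the complete numbers in hand, the mean is simple arithmetic and the answer is unique; with only a sample, we get an estimate plus a statement of uncertainty.

```python
import random
import statistics

# Hypothetical "complete" numbers: every value in a simulated ledger.
random.seed(0)
population = [random.gauss(100, 15) for _ in range(10_000)]

# Complete numbers: the mean is deduced by arithmetic, and the result is unique.
population_mean = statistics.fmean(population)

# Incomplete data: only a sample is observed, so the mean must be inferred.
sample = random.sample(population, 200)
sample_mean = statistics.fmean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval using the normal critical value.
z = statistics.NormalDist().inv_cdf(0.975)
interval = (sample_mean - z * standard_error, sample_mean + z * standard_error)

print(f"deduced from complete numbers: {population_mean:.2f}")
print(f"inferred from a sample:        {sample_mean:.2f} "
      f"(95% CI {interval[0]:.2f} to {interval[1]:.2f})")
```

Rerunning with a different sample changes the inferred estimate and its interval, while the deduced mean of the complete numbers stays fixed. That difference is the reason the statistical toolbox exists.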
(Business) Quants must combine their knowledge about the (business) problem with a mastery of techniques from all three toolboxes. This involves knowing when each tool does and does not apply.
During the most recent hoopla, we have heard restrictive definitions of statistics and narrow perceptions of what Quants do, sometimes coming from people with statistics training! Ignore them. Applied statistics includes more than the table of contents from a Stat 201 book; more than the research interests of statistics professors; and more topics than there is time to cover in graduate school. A realistic definition of applied statistics includes everything we can do with statistical thinking and statistical assumptions.
That written, there are those who want to play down statistics in Data Science and there are those who want to play down Data Science by not including statistics. To the first, I write that there is no new vision of how to approach data without knowing statistics. To both, I write that there is no way a term like ‘Data Science’ will not be interpreted to include data analysis and statistics. The statistics toolbox is a large irreplaceable part of the lingua franca of science.