50 years of Data Science


Drawing on work by Tukey, Chambers, Breiman and Cleveland, Stanford statistics professor David Donoho presents a vision of data science based on the activities of people who are ‘learning from data’.

John Tukey’s The Future of Data Analysis asserts that statistics must become concerned with the handling and processing of data, its size, and its visualization.
John Chambers’s S language, the predecessor of R, is the forerunner of the “notebook” concept, in which an academic paper is made reproducible, scripted, and shareable (e.g. Jupyter Notebook).
Leo Breiman’s Two Cultures notes that a concern strictly with prediction accuracy differs from inference about models, and that the former is under-represented in academia but prevalent in industry, where it has turned into “machine learning.”
William S. Cleveland’s 2001 paper Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics addressed academic statistics departments and proposed a plan to reorient their work.

Donoho’s paper reviews the recent attention data science has received in the popular media, and asks how, or whether, data science really differs from statistics.

He also describes an academic field dedicated to improving the activity of learning from data in an evidence-based manner. His premise is that this new field is a better academic enlargement of statistics and machine learning than today’s Data Science Initiatives, while still accommodating the same short-term goals.

He proposes calling the would-be field made up of the following activities “Greater Data Science”:

Data Exploration and Preparation
Data Representation and Transformation
Computing with Data
Data Modeling
Data Visualization and Presentation
Science about Data Science

He contends that information technology skills are at a premium, but that scientific understanding and statistical insight should be firmly in the driver’s seat.

Check out the thoughtful essay by Stanford statistics professor David Donoho, titled “50 Years of Data Science”.
