Purdue University Libraries

Data Science

Books and resources about Data Science

Table of Contents

Contents of Data Science Essentials:

What Is Data Science?

  • Generalized definition: 
  • Data science is an interdisciplinary field that combines mathematical and statistical foundations with advanced computational power and the capacity to extract knowledge from data in specific domains. 

  • Definition at Purdue (from Integrative Data Science Initiative)
  • "Data science – the grand interdisciplinary challenge to extract new knowledge from big data through advanced analytics – presents a transformational opportunity for Purdue."  by Jay Akridge, Provost and Executive Vice President for Academic Affairs and Diversity

As an emerging interdisciplinary field, data science resists any single definition. For a detailed explanation of what data science is, see the post Defining Data Science: The What, Where and How of Data Science.

Data Science Principles

The three classic cores of data science (Drew Conway, 2010):

  • Computational / Programming Skills
  • Math & Statistics Knowledge
  • Domain Expertise

 The classic three cores of data science

The ten data science foundations defined by the National Academies of Sciences, Engineering, and Medicine:

  • Mathematical Foundations
  • Computational Foundations
  • Statistical Foundations
  • Data Management and Curation
  • Data Description and Visualization
  • Data Modeling and Assessment
  • Workflow and Reproducibility 
  • Communication and Teamwork
  • Domain-specific Considerations
  • Ethical Problem Solving

The six data science principles in the data science curriculum (merged from the ten data science foundations by Shao et al., 2021):

'Data' principles:

  • Data Management
  • Communication and Visualization
  • Ethics and Privacy

'Science' principles:

  • Statistics
  • Computer Science
  • Domain Expertise

The ten data science foundations defined

Data Lifecycle


DataONE Data Life Cycle

  • Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime
  • Collect: observations are made either by hand or with sensors or other instruments and the data are placed into digital form
  • Assure: the quality of the data is assured through checks and inspections
  • Describe: data are accurately and thoroughly described using the appropriate metadata standards
  • Preserve: data are submitted to an appropriate long-term archive (i.e. data center)
  • Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)
  • Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed
  • Analyze: data are analyzed
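
To make a few of these stages concrete, the following is a minimal sketch in Python of the Assure, Describe, Integrate, and Analyze steps. It is only an illustration under stated assumptions: the sensor readings, the field names (`site`, `temp_c`), the plausibility range, and the `assure` helper are all hypothetical and are not part of the DataONE model.

```python
from statistics import mean

# Collect: hypothetical readings from two sources (values invented for illustration).
site_a = [{"site": "A", "temp_c": 21.5}, {"site": "A", "temp_c": None}]
site_b = [{"site": "B", "temp_c": 19.0}, {"site": "B", "temp_c": 250.0}]


def assure(records, low=-50.0, high=60.0):
    """Assure: keep only records whose temperature is present and plausible.

    The low/high bounds are assumed values chosen for this example.
    """
    return [
        r for r in records
        if r["temp_c"] is not None and low <= r["temp_c"] <= high
    ]


# Describe: minimal metadata kept alongside the data (field names are assumptions).
metadata = {"variable": "temp_c", "units": "degrees Celsius"}

# Integrate: combine the checked records from both sources into one set.
combined = assure(site_a) + assure(site_b)

# Analyze: a simple summary statistic over the integrated data.
average = mean(r["temp_c"] for r in combined)
print(f"{len(combined)} valid records, mean {average} {metadata['units']}")
```

In this sketch the missing value and the implausible reading are dropped during the Assure step, so only the checked records reach the Integrate and Analyze steps.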