Purdue University, Purdue Libraries

Data Science

Books and resources about Data Science

Table of Contents

Contents of Data Science Essentials:

What Is Data Science?

  • Generalized definition: 
  • Data science is an interdisciplinary field that combines mathematical and statistical foundations with advanced computational power and the capacity to extract knowledge from data in specific domains. 

  • Definition at Purdue (from Integrative Data Science Initiative)
  • "Data science – the grand interdisciplinary challenge to extract new knowledge from big data through advanced analytics – presents a transformational opportunity for Purdue." (Jay Akridge, Provost and Executive Vice President for Academic Affairs and Diversity)

As an emerging interdisciplinary field, data science resists any single definition. For a detailed explanation of what data science is, see the post "Defining Data Science: The What, Where and How of Data Science".

Data Science Principles

The classic three cores of data science (Drew Conway, 2010):

  • Computational / Programming Skills
  • Math & Statistics Knowledge
  • Domain Expertise

 The classic three cores of data science

The ten data science foundations defined by the National Academies of Sciences, Engineering, and Medicine:

  • Mathematical Foundations
  • Computational Foundations
  • Statistical Foundations
  • Data Management and Curation
  • Data Description and Visualization
  • Data Modeling and Assessment
  • Workflow and Reproducibility 
  • Communication and Teamwork
  • Domain-specific Considerations
  • Ethical Problem Solving

The six data science principles in the data science curriculum (merged from the ten data science foundations by Shao et al., 2021):

'Data' principles:

  • Data Management
  • Communication and Visualization
  • Ethics and Privacy

'Science' principles:

  • Statistics
  • Computer Science
  • Domain Expertise

The ten data science foundations defined

Data Life Cycle

DataONE Data Life Cycle diagram

DataONE Data Life Cycle

  • Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime
  • Collect: observations are made either by hand or with sensors or other instruments and the data are placed into digital form
  • Assure: the quality of the data is assured through checks and inspections
  • Describe: data are accurately and thoroughly described using the appropriate metadata standards
  • Preserve: data are submitted to an appropriate long-term archive (i.e., a data center)
  • Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)
  • Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed
  • Analyze: data are analyzed
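
The Assure, Describe, Integrate, and Analyze stages above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the site names, the temperature readings, the plausibility ranges, and the helper functions `assure` and `integrate` are hypothetical and do not come from DataONE. It is one possible reading of the stages, not a prescribed implementation.

```python
from statistics import mean

# Collect: hypothetical readings from two sources (values invented for illustration).
sensor_a = [
    {"site": "A1", "temp_c": 21.5},
    {"site": "A2", "temp_c": None},    # missing value
    {"site": "A3", "temp_c": 19.0},
]
sensor_b = [
    {"site": "B1", "temp_f": 68.0},
    {"site": "B2", "temp_f": 500.0},   # implausible reading
]


def assure(records, field, low, high):
    """Assure: keep only records whose value is present and within [low, high]."""
    return [r for r in records if r[field] is not None and low <= r[field] <= high]


def integrate(celsius_records, fahrenheit_records):
    """Integrate: combine both sources into one homogeneous set in Celsius."""
    combined = [{"site": r["site"], "temp_c": r["temp_c"]} for r in celsius_records]
    for r in fahrenheit_records:
        celsius = round((r["temp_f"] - 32) * 5 / 9, 2)
        combined.append({"site": r["site"], "temp_c": celsius})
    return combined


# Describe: minimal metadata kept alongside the data (hypothetical fields).
metadata = {"variable": "air temperature", "unit": "degrees Celsius"}

# Assumed plausibility ranges; real checks depend on the domain.
clean_a = assure(sensor_a, "temp_c", -50.0, 60.0)
clean_b = assure(sensor_b, "temp_f", -58.0, 140.0)
dataset = integrate(clean_a, clean_b)

# Analyze: a single summary statistic over the integrated data.
average = mean(r["temp_c"] for r in dataset)
print(f"{len(dataset)} records, mean {average:.2f} {metadata['unit']}")
```

In this sketch the quality check runs before integration, so the missing and implausible readings never reach the combined data set, and the unit conversion is what makes the two sources homogeneous enough to analyze together.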