
Data Management for Undergraduate Researchers

How to collect, save, document and preserve research

Data Quality Checking Guide

Data Credibility

Timeliness

Information Completeness

Data Accuracy

Data Consistency

Data Deduplication

Who created the data?

Are data outdated?

Are there missing elements in data records?

Are there typos in the data?

Are data formats consistent?

Are there repeated data records?

Who published the data?

When are the data captured and updated? 

Are there missing data records?

Are data formats correct?

Are units of measurement consistent?

Are data entered more than once?

Who contributed to the data?

Is version control implemented to track revisions of a data set?

Is all information captured for its intended uses?

Are there outliers that may have been recorded inaccurately?

Are types of data consistent?

Is contact information available?

Are data described to be findable and reusable?

Do data represent the information we intend to capture?

Are data synchronized within and across platforms?

This checklist is available under a CC BY 4.0 license, with attribution to Wei Zakharov, Purdue Libraries and School of Information Studies.
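
Several of the checklist questions (missing elements, repeated records, consistent formats) can be checked programmatically. The following is a minimal sketch in Python, not part of the original checklist; the sample records, field names, and the assumed `YYYY-MM-DD` date format are all invented for illustration.

```python
from datetime import datetime

# Hypothetical sample records, invented purely for illustration.
records = [
    {"id": "1", "date": "2021-03-01", "age_group": "21-30", "count": "12"},
    {"id": "2", "date": "2021-03-02", "age_group": "", "count": "15"},
    {"id": "3", "date": "03/03/2021", "age_group": "31-40", "count": "9"},
    {"id": "1", "date": "2021-03-01", "age_group": "21-30", "count": "12"},
]

def missing_fields(rows):
    """Completeness: map row index -> list of fields that are empty or None."""
    report = {}
    for i, row in enumerate(rows):
        empty = [k for k, v in row.items() if v is None or str(v).strip() == ""]
        if empty:
            report[i] = empty
    return report

def duplicate_rows(rows):
    """Deduplication: indices of rows that exactly repeat an earlier row."""
    seen = set()
    dupes = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes

def bad_dates(rows, field="date", fmt="%Y-%m-%d"):
    """Consistency: indices of rows whose date does not match the assumed format."""
    bad = []
    for i, row in enumerate(rows):
        try:
            datetime.strptime(row[field], fmt)
        except (ValueError, TypeError, KeyError):
            bad.append(i)
    return bad

print(missing_fields(records))  # {1: ['age_group']}
print(duplicate_rows(records))  # [3]
print(bad_dates(records))       # [2]
```

Checks like these only flag candidates. Whether a flagged row is a true error still needs a human decision, for example whether an empty age group is a data-entry gap or a legitimate "not reported".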

Example 1

Research Questions:

• What is the impact of COVID vaccines on Tennessee's new case numbers?
• Are people of a particular age more vulnerable to coronavirus disease?

Two data sources:

• TN Department of Health. (2021). Daily case information [Data file]. Retrieved from https://www.tn.gov/health/cedep/ncov/data/downloadable-datasets.html
• TN Department of Health. (2021). COVID vaccine state summary [Data file]. Retrieved from https://www.tn.gov/health/cedep/ncov/data/downloadable-datasets.html

Data Quality Checking:

[Figure: Data Quality Checking Guide Example 1]
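
For a daily series like the case data in this example, the timeliness and "missing data records" questions can be approached by scanning for gaps between the first and last report dates. This is a rough sketch under that assumption; the dates are invented and are not taken from the TN Department of Health files.

```python
from datetime import date, timedelta

def missing_days(dates):
    """Return every calendar day between the first and last date that is absent."""
    present = set(dates)
    day, last = min(present), max(present)
    gaps = []
    while day <= last:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps

# Hypothetical report dates, invented for illustration (not real TN data).
reported = [date(2021, 3, 1), date(2021, 3, 2), date(2021, 3, 4), date(2021, 3, 7)]

gaps = missing_days(reported)
print([d.isoformat() for d in gaps])  # ['2021-03-03', '2021-03-05', '2021-03-06']
print(max(reported).isoformat())      # most recent record: 2021-03-07
```

The most recent date answers "when were the data last updated?", and the gap list shows which days would need to be explained or filled before comparing case counts with vaccination counts.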
 

Example 2

Research question:

How do different driving scenarios affect a driver's choice to accept paper coupons from stores? The factors investigated include age group, weather, passenger, and income.

Data source:

UC Irvine Machine Learning Repository. Data were collected via a survey on Amazon Mechanical Turk.

Data quality checking:

[Figure: Data quality checking example 2]
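
Survey data with categorical factors such as weather or age group often need a consistency check on the labels themselves. The sketch below counts labels after trimming and lower-casing, then lists any that fall outside an expected set. The allowed labels and the sample responses are assumptions made for illustration; the actual coding scheme of the survey may differ.

```python
from collections import Counter

# Hypothetical allowed labels; the real survey's coding scheme may differ.
ALLOWED_WEATHER = {"sunny", "rainy", "snowy"}

def label_report(values, allowed):
    """Count labels after trimming/lower-casing and list those outside `allowed`."""
    cleaned = [v.strip().lower() for v in values]
    counts = Counter(cleaned)
    unexpected = sorted(label for label in counts if label not in allowed)
    return counts, unexpected

# Invented responses showing case, whitespace, spelling, and blank-value issues.
responses = ["Sunny", "sunny ", "Rainy", "snow", "Snowy", ""]

counts, unexpected = label_report(responses, ALLOWED_WEATHER)
print(counts["sunny"])  # 2
print(unexpected)       # ['', 'snow']
```

Normalizing before counting separates harmless formatting differences ("Sunny" vs "sunny ") from labels that need a decision, such as a blank response or a variant spelling.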