Purdue University, Purdue Libraries

Data Management for Undergraduate Researchers

How to collect, save, document and preserve research

Data Quality Checking Guide

Data Credibility

  • Who created the data?
  • Who published the data?
  • Who contributed to the data?
  • Is contact information available?

Timeliness

  • Are data outdated?
  • When are the data captured and updated?
  • Is version control implemented to track revisions of a data set?

Information Completeness

  • Are there missing elements in data records?
  • Are there missing data records?
  • Is all information captured for its intended uses?
  • Are data described to be findable and reusable?

Data Accuracy

  • Are there data typos?
  • Are data formats correct?
  • Are there data outliers which may not be recorded accurately?
  • Do data represent the information we intend to capture?

Data Consistency

  • Are data formats consistent?
  • Are units of measurement consistent?
  • Are types of data consistent?
  • Are data synched within and across platforms?

Data Deduplication

  • Are there repeated data records?
  • Are data entered more than once?

This checklist is available under a CC BY 4.0 license, with attribution to Wei Zakharov, Purdue Libraries and School of Information Studies.
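
Several of the checklist questions can be checked with a short script. The following is a minimal Python sketch, not part of the original guide, covering missing elements (Information Completeness), repeated records (Data Deduplication), and format consistency (Data Consistency). The sample records, the field names `id`, `date`, and `cases`, and the three helper functions are hypothetical and exist only for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"id": "1", "date": "2021-03-01", "cases": "120"},
    {"id": "2", "date": "2021-03-02", "cases": ""},
    {"id": "2", "date": "2021-03-02", "cases": ""},
    {"id": "3", "date": "03/03/2021", "cases": "95"},
]


def missing_elements(rows):
    """Count empty or absent values per field (Information Completeness)."""
    fields = {key for row in rows for key in row}
    counts = Counter()
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                counts[field] += 1
    return dict(counts)


def repeated_records(rows):
    """Return each record that appears more than once (Data Deduplication)."""
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, n in seen.items() if n > 1]


def inconsistent_dates(rows, field="date", fmt="%Y-%m-%d"):
    """Return records whose date does not match the expected format (Data Consistency)."""
    bad = []
    for row in rows:
        try:
            datetime.strptime(row[field], fmt)
        except (KeyError, ValueError):
            bad.append(row)
    return bad
```

With the sample above, `missing_elements(records)` reports two empty `cases` values, `repeated_records(records)` returns the duplicated record with `id` 2, and `inconsistent_dates(records)` returns the record dated `03/03/2021`. Questions about credibility and timeliness still need a human reading of the data source's documentation.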

Example 1

Research Questions

  • What is the impact of COVID-19 vaccines on Tennessee’s new case numbers?
  • Are people of a particular age more vulnerable to coronavirus disease?

Two data sources:

Data quality checking:
[Figure: questions from the data quality checking guide answered in relation to this example question]

Example 2

Research question:

  • How do different driving scenarios affect a driver’s decision to accept paper coupons from stores? The factors investigated include age group, weather, passengers, and income.

Data source:

  • UC Irvine Machine Learning Repository. Data were collected via a survey on Amazon Mechanical Turk.

Data quality checking:

Data Credibility

  • Who created the data? Tong Wang, Cynthia Rudin
  • Who published the data? The Journal of Machine Learning Research
  • Who contributed to the data? Finale Doshi-Velez, Yimin Liu, Erica Klampfl
  • Is contact information available? Yes

Timeliness

  • Are data outdated? Created on 9/15/2020
  • When are the data captured and updated? A survey on Amazon Mechanical Turk
  • Is version control implemented to track revisions of a data set? No. One-time survey

Information Completeness

  • Are there missing elements in data records? Yes
  • Are there missing data records? Yes
  • Is all information captured for its intended uses? Yes
  • Are data described to be findable and reusable? Yes

Data Accuracy

  • Are there data typos? No
  • Are data formats correct? Yes
  • Are there data outliers which may not be recorded accurately? No
  • Do data represent the information we intend to capture? Yes

Data Consistency

  • Are data formats consistent? Yes
  • Are units of measurement consistent? Yes
  • Are types of data consistent? Yes
  • Are data synched within and across platforms? Yes

Data Deduplication

  • Are there repeated data records? No
  • Are data entered more than once? No
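
Answers such as "Are types of data consistent?" and "Are there data outliers?" can be backed by a quick scripted check. The sketch below is one possible approach, not the method used for this example. It reads a small CSV extract, lists values that fail to parse as numbers, and flags outliers with the interquartile-range rule. The inline sample, the column names `age`, `weather`, and `temperature`, and the helper functions are hypothetical and are not taken from the actual data set.

```python
import csv
import io
import statistics

# Hypothetical CSV extract; column names and values are invented for illustration.
SAMPLE = """age,weather,temperature
21,Sunny,80
35,Rainy,55
forty,Sunny,80
50,Snowy,30
28,Sunny,55
44,Rainy,80
31,Snowy,30
26,Sunny,55
39,Sunny,800
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def non_numeric(rows, column):
    """Return values in `column` that cannot be parsed as numbers (type consistency)."""
    bad = []
    for row in rows:
        try:
            float(row[column])
        except ValueError:
            bad.append(row[column])
    return bad


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as possible recording errors."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


rows = load_rows(SAMPLE)
temperatures = [float(row["temperature"]) for row in rows]
```

On this sample, `non_numeric(rows, "age")` returns `['forty']` and `iqr_outliers(temperatures)` returns `[800.0]`. A flagged value is only a candidate for review; whether it is a recording error depends on what the column is meant to capture.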