Purdue University, Purdue Libraries

Applied Big Data Workshop: Variety

Variety in the CAM2 Project

Variety in big data projects is not always immediately evident. The data used in a project may arrive at different rates, sizes, and frequencies, and each source may attach different policies to its data.

What is Variety?

Data in many forms - structured, unstructured, text, multimedia

Variety Activity

Identifying Variety

Policy - Data providers dictate the terms of use for their datasets. Each provider's policy specifies its own download rates, acceptable uses, and technical specifications such as frame rates. Providers may also differ in their security requirements (who may or may not access the frames) and in their access/sharing requirements (watermarks or restrictions on how an image may be shared or reused). Images may have multiple owners or rights holders, which leads to unclear provenance for future reuse, and the resulting data varies in quality because providers use a variety of equipment (cameras, servers, etc.).

Variety in the data affects coding decisions in several areas, including:

  • Storage
  • Metadata
  • Security
  • Access to the data
  • Quality Control
  • Analytical Methods
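
As a rough illustration of how provider policies might shape metadata, access, and download code, the sketch below records each provider's terms in a small Python structure and derives decisions from it. It is not taken from the CAM2 codebase: the class, field names, helper functions, and the two example providers are all hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderPolicy:
    """Hypothetical record of one provider's terms (field names are illustrative)."""
    provider: str
    max_frames_per_minute: int        # download rate allowed by the provider
    allowed_uses: frozenset           # acceptable uses, e.g. {"research"}
    authorized_roles: frozenset       # who may access the frames
    requires_watermark: bool = False  # sharing restriction
    rights_holders: tuple = ()        # more than one owner complicates reuse


def min_seconds_between_frames(policy: ProviderPolicy) -> float:
    """Shortest polling interval that stays within the provider's rate limit."""
    if policy.max_frames_per_minute <= 0:
        raise ValueError("max_frames_per_minute must be positive")
    return 60.0 / policy.max_frames_per_minute


def may_access(policy: ProviderPolicy, role: str, intended_use: str) -> bool:
    """True only if both the role and the intended use are permitted."""
    return role in policy.authorized_roles and intended_use in policy.allowed_uses


def provenance_is_clear(policy: ProviderPolicy) -> bool:
    """Treat provenance as clear only when exactly one rights holder is recorded."""
    return len(policy.rights_holders) == 1


# Two made-up providers with different terms (assumed values, not real policies).
city_cam = ProviderPolicy(
    provider="example-city",
    max_frames_per_minute=60,
    allowed_uses=frozenset({"research"}),
    authorized_roles=frozenset({"researcher", "admin"}),
    rights_holders=("Example City",),
)
park_cam = ProviderPolicy(
    provider="example-park",
    max_frames_per_minute=4,
    allowed_uses=frozenset({"research", "teaching"}),
    authorized_roles=frozenset({"admin"}),
    requires_watermark=True,
    rights_holders=("Example Park Service", "Example Vendor"),
)

for p in (city_cam, park_cam):
    print(p.provider, min_seconds_between_frames(p), provenance_is_clear(p))
```

Keeping the policy alongside the data as metadata means the same download and sharing code can serve every provider, with the differences expressed as data rather than as provider-specific branches.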

 

Quick Tutorials on Data Management