Data Wrangling Tools
The tool list below offers resources with at least some free functionality which you can use to move data from one format to another as necessary to answer your research questions. This is not an exclusive list, nor does the presence of the tool on this list indicate a requirement that your team use it. This is a reference list only.
If a tool requires an account, it will be noted with an A.
If a tool requires a subscription for some or all functionality, it will be noted with an S.
A tool for mining data locked in PDF files. Tabula lets you extract that data into a CSV or Microsoft Excel spreadsheet using a simple interface. Tabula works on Mac, Windows, and Linux.
A powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
R is a powerful data management tool. Multiple libraries have been created for wrangling messy data, including dplyr and tidyr. The link above includes a basic introduction to data wrangling with R.
With Trifacta Wrangler, any user with a Mac or Windows machine and an Internet connection can download, install, and start using Trifacta immediately. Trifacta Wrangler lets analysts wrangle diverse data sources on their desktop in preparation for use in analytics or visualization tools such as Tableau. It does not require an underlying storage and processing environment beyond what is already available on modern Mac and Windows machines. Because it is a connected desktop application (the same hybrid architecture used by Spotify, Slack, and others), users can work with data locally on their machine while still receiving product updates and metadata access over an Internet connection.
csvkit is a suite of command-line tools for converting to and working with CSV.
Python and Pandas
Python is an excellent language for data manipulation. Add the Pandas library and its DataFrame object, and data scientists can quickly perform even more complex operations, such as merging, joining, and transforming large volumes of data with a single Python statement.
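As a rough illustration of the kind of single-statement merge described above, here is a minimal sketch using Pandas. The two tables, their column names (sample_id, site, value), and the numbers in them are made up for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data: one table of samples, one table of readings.
samples = pd.DataFrame({"sample_id": [1, 2, 3], "site": ["A", "B", "A"]})
readings = pd.DataFrame({"sample_id": [1, 1, 2, 4], "value": [0.5, 0.7, 1.2, 9.9]})

# Join the two tables on their shared key in one statement.
# how="inner" keeps only sample_id values present in both tables.
merged = samples.merge(readings, on="sample_id", how="inner")

# Transform the joined data: average reading per site.
site_means = merged.groupby("site")["value"].mean()

print(merged)
print(site_means)
```

Changing how="inner" to "left" or "outer" would instead keep unmatched rows and fill the missing columns with NaN, which can be useful for spotting records that failed to join.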
Mr. Data Converter
Mr. Data Converter is straightforward: it takes Excel data and transforms it into web-friendly formats like HTML, JSON, and XML.
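For readers who want to script a similar conversion, the following is a minimal sketch in Python using only the standard library. It is not how Mr. Data Converter itself works; it assumes the spreadsheet cells have been copied as tab-separated text with a header row, and the function name tsv_to_json and the sample data are hypothetical.

```python
import csv
import io
import json


def tsv_to_json(text):
    """Convert tab-separated text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Each row becomes a dict keyed by the header names; values stay as strings.
    return json.dumps(list(reader), indent=2)


# Hypothetical cells as they might look after being copied from a spreadsheet.
pasted = "name\tcount\nalpha\t1\nbeta\t2\n"
print(tsv_to_json(pasted))
```

Note that every value is emitted as a string; converting numeric columns would be an extra step.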