Purdue University Libraries

Computational Linguistics: Linguistic Data Consortium

A guide to library and information resources in computational linguistics

LDC Data Catalog

The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories. It was formed in 1992 to address the critical data shortage then facing language technology research and development. LDC provides a repository and distribution point for language resources, in particular, regular releases of datasets that can be ordered through its Data Catalog.

Anyone can create a guest account at LDC, which lets you sign up for the LDC newsletter, browse the Data Catalog, and access limited sections of LDC Online.

Only organizational members can order and access datasets from the Data Catalog, and only a limited number of datasets can be ordered each year. To create an organizational account, register with LDC, accept its use agreement, and complete the registration form, selecting "Purdue University" as the organization. Account verification may take up to a week.

Purdue's organizational membership in LDC, along with the associated user accounts, is managed by Purdue University Libraries. For questions, or to request that a particular dataset from the Data Catalog be ordered for you, contact Professor Michael Witt.