Purdue University Libraries

BCHM 60501 Macromolecules: Databases

This guide contains information to support the course BCHM 60501 Macromolecules.


The National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources, including GenBank, Entrez, MyNCBI, PubMed, BLAST, Electronic PCR, and Cancer Chromosomes, among many others. The links given here go through the Purdue Libraries proxy so that you can access articles from off campus with your Purdue ID.

PubMed is the NCBI interface to MEDLINE, a database of over 20 million journal articles.

The Gene Expression Omnibus (GEO) is a public functional genomics data repository containing both raw and processed microarray and sequencing data. GEO provides tools to search, analyze, and acquire these data. GEO can be searched either by experiment (dataset) or by gene (GEO Profiles).
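The two ways of searching GEO can also be reached programmatically through NCBI's E-utilities. The sketch below only builds the search URLs and does not contact NCBI. It assumes the Entrez database names "gds" (GEO DataSets) and "geoprofiles" (GEO Profiles); the function name geo_search_url and the example search terms are illustrative and are not part of this guide.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (ESearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, by="experiment", retmax=20):
    """Build an ESearch URL for GEO, searching by experiment or by gene.

    by="experiment" targets GEO DataSets (Entrez db "gds");
    by="gene" targets GEO Profiles (Entrez db "geoprofiles").
    """
    databases = {"experiment": "gds", "gene": "geoprofiles"}
    if by not in databases:
        raise ValueError("by must be 'experiment' or 'gene'")
    params = {
        "db": databases[by],
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Illustrative search terms (assumptions, not taken from the guide).
print(geo_search_url("heat shock AND Arabidopsis thaliana[Organism]"))
print(geo_search_url("HSP70", by="gene"))
```

Opening either printed URL in a browser, or fetching it with any HTTP client, should return a list of matching record IDs. For off-campus article access, the proxied links in this guide are still the route to use.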

The Sequence Read Archive (SRA) is a repository for raw sequencing data from a variety of platforms. Depositing and retrieving data both require special tools provided by NCBI.
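For retrieval, the NCBI tools in question are usually the command-line programs of the SRA Toolkit. The sketch below assembles, but does not run, a typical two-step download: prefetch to fetch the archive, then fasterq-dump to convert it to FASTQ. It assumes the SRA Toolkit is installed separately. The helper name sra_download_commands, the output directory, and the example run accession are illustrative.

```python
import shlex


def sra_download_commands(accession, outdir="fastq"):
    """Return the SRA Toolkit commands for downloading one sequencing run.

    The commands are returned as argument lists and are not executed here.
    """
    # Run accessions start with SRR (NCBI), ERR (EBI) or DRR (DDBJ).
    if not accession.startswith(("SRR", "ERR", "DRR")):
        raise ValueError("expected a run accession such as SRR000001")
    return [
        ["prefetch", accession],                          # download the .sra archive
        ["fasterq-dump", accession, "--outdir", outdir],  # convert it to FASTQ
    ]


# Example accession used only for illustration.
for command in sra_download_commands("SRR000001"):
    print(shlex.join(command))
```

Each list can be passed to subprocess.run once the toolkit is on the PATH. Keeping the commands as lists rather than strings avoids shell-quoting problems.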

The Gene database integrates data for many species.  This is a great place to start to find structural and functional information about a gene and the encoded protein or RNA.


The European Bioinformatics Institute (EMBL-EBI) describes itself as follows: "EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry."


Ensembl is a database of vertebrate and other select eukaryotic genomes. Its content is similar to that of the NCBI databases, but Ensembl has a more modern interface.

The ArrayExpress archive is a repository for functional genomics data, including microarray and sequencing data. It is similar to GEO, and some data are available from both sites.

UCSC Genome Bioinformatics Site contains the reference sequence and working draft assemblies for a large collection of genomes.

The UCSC Genome Browser provides access to a wealth of functional human genomics data. This tool allows researchers to browse through the human genome, viewing a wide range of data types. Researchers can also upload their own data tracks or download data from UCSC. Free OpenHelix tutorials are available for the UCSC Genome Browser.

The UCSC Table Browser is a tool to find and download functional genomics data for a wide range of genomes. This is a complex tool that requires some training and experience to use effectively. There is a good user's guide and an OpenHelix tutorial.

Gene Ontology Consortium

The Gene Ontology (GO) project is a collaborative bioinformatics project with the goal of providing complete and consistent descriptions of gene products across all organisms. The descriptions fall into three large categories (ontologies): Biological Process, Cellular Component, and Molecular Function. Accordingly, each gene product is likely to have at least three descriptions, one per ontology. However, some gene products have no descriptions, whereas others can have a dozen or more.
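To make the three-ontology structure concrete, here is a minimal sketch that groups GO annotations by ontology. It assumes the one-letter aspect codes used in GO annotation files (P, C, F), where C stands for Cellular Component, GO's official name for the cellular-location ontology. The gene product names geneA and geneB are hypothetical; the GO term IDs are real terms chosen only for illustration.

```python
from collections import defaultdict

# One-letter aspect codes as used in GO annotation (GAF) files.
ASPECTS = {
    "P": "Biological Process",
    "C": "Cellular Component",
    "F": "Molecular Function",
}

# (gene product, GO term ID, aspect code); gene product names are hypothetical.
annotations = [
    ("geneA", "GO:0006355", "P"),  # regulation of DNA-templated transcription
    ("geneA", "GO:0005634", "C"),  # nucleus
    ("geneA", "GO:0003677", "F"),  # DNA binding
    ("geneA", "GO:0046983", "F"),  # protein dimerization activity
    ("geneB", "GO:0005634", "C"),  # nucleus
]


def group_by_aspect(records):
    """Group GO term IDs by gene product and then by ontology name."""
    grouped = defaultdict(lambda: defaultdict(list))
    for product, term_id, aspect in records:
        grouped[product][ASPECTS[aspect]].append(term_id)
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {product: dict(by_aspect) for product, by_aspect in grouped.items()}


for product, by_aspect in group_by_aspect(annotations).items():
    print(product, by_aspect)
```

In this toy data, geneA has at least one term in each ontology and two under Molecular Function, while geneB is annotated in only one ontology. This mirrors the uneven coverage described above.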

"The BioMart project provides free software and data services to the international scientific community in order to foster scientific collaboration and facilitate the scientific discovery process. The project adheres to the open source philosophy that promotes collaboration and code reuse."

Multiple databases use BioMart software, and one of the most generally useful is available at Ensembl.
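BioMart services such as the one at Ensembl accept queries written as a small XML document that names a dataset, optional filters, and the attributes (columns) to return. The sketch below builds such a query and the matching request URL without sending it. The dataset, filter, and attribute names (hsapiens_gene_ensembl, chromosome_name, ensembl_gene_id, external_gene_name) and the martservice URL are assumptions based on Ensembl's public BioMart and should be checked against the live service. The function name build_biomart_query is illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed location of the Ensembl BioMart web service.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tab-separated output
        "header": "1",            # include column names
        "uniqueRows": "1",        # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    dataset_node = ET.SubElement(
        query, "Dataset", {"name": dataset, "interface": "default"}
    )
    for name, value in (filters or {}).items():
        ET.SubElement(dataset_node, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(dataset_node, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Example: gene IDs and names for human chromosome 21 (names are assumptions).
xml_query = build_biomart_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"chromosome_name": "21"},
)
request_url = MARTSERVICE_URL + "?" + urlencode({"query": xml_query})
print(xml_query)
print(request_url)
```

The same XML can be generated interactively: the BioMart web interface lets you build a query with menus and then export it as XML, which is a convenient way to discover valid dataset and attribute names.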


InterMine is open source software designed specifically for the creation of complex biological databases. InterMine also provides tools to query these databases. Like BioMart, InterMine has been adopted by multiple research communities. Generally, a research community will adopt either BioMart or InterMine; for example, Gramene maintains a BioMart of biological data for plants, whereas Araport provides ThaleMine, an InterMine database for Arabidopsis research.


The Worldwide Protein Data Bank (wwPDB) is a global collaboration that ensures the PDB archive of 3D structural data for proteins and nucleic acids is uniform and available globally. The RCSB PDB website is maintained by the Research Collaboratory for Structural Bioinformatics, located in the United States. This website provides resources to access and analyze 3D structural data.
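Structure files can also be fetched directly by PDB ID. The sketch below builds a download URL and does not fetch anything. It assumes the RCSB file server pattern files.rcsb.org/download/ID.ext and the classic four-character PDB ID format. The function name pdb_download_url is illustrative, and 4HHB (a hemoglobin entry) is used only as an example.

```python
import re


def pdb_download_url(pdb_id, fmt="cif"):
    """Build a download URL for one PDB entry in mmCIF or legacy PDB format."""
    # Classic PDB IDs have four characters: a digit followed by three alphanumerics.
    if not re.fullmatch(r"[0-9][A-Za-z0-9]{3}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError("fmt must be 'cif' or 'pdb'")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"


print(pdb_download_url("4hhb"))
print(pdb_download_url("4hhb", fmt="pdb"))
```

Validating the ID before building the URL catches common mistakes, such as passing a protein name instead of an entry ID.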

"The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), aims to generate comprehensive, multi-dimensional maps of the key genomic changes in major types and subtypes of cancer."

"The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets."

"CellMiner™ is a web application generated by the Genomics & Bioinformatics Group, LMP, CCR, NCI that facilitates systems biology through the retrieval and integration of the molecular and pharmacological data sets for the NCI-60 cell lines."

Plant-related Databases

"Araport is a one-stop-shop for Arabidopsis thaliana genomics. Araport offers gene and protein reports with orthology, expression, interactions and the latest annotation, plus analysis tools, community apps, and web services. Araport is 100% free and open-source. Registered members can save their analysis, publish science apps, and post announcements."

SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean.

SALAD is a motif-based database of protein annotations for plant comparative genomics. It contains information on proteome data sets of rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, algae, and yeast.

The Plant Transcription Factor Database (PlnTFDB) provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in plant species whose genomes have been completely sequenced and annotated.

The Plant microRNA Database (PMRD) integrates available plant miRNA data deposited in public databases, collected from the literature, and generated in-house.

BioSharing is "a curated, informative and educational resource on inter-related data standards, databases, and policies in the life, environmental and biomedical sciences".

The Online Bioinformatics Resources Collection (OBRC) contains annotations and links for thousands of bioinformatics databases and software tools. It was developed by the Health Sciences Library at the University of Pittsburgh. This website has not been updated recently, so use it with caution.