The National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources, including GenBank, Entrez, MyNCBI, PubMed, BLAST, Electronic PCR, and Cancer Chromosomes, among many others. The links given here go through the Purdue Libraries proxy so that you can access articles from off campus with your Purdue ID.
PubMed is the NCBI interface to MEDLINE, a database of over 20 million journal articles.
The Gene Expression Omnibus (GEO) is a public functional genomics data repository containing both raw and processed microarray and sequencing data. GEO provides tools to search, analyze and acquire microarray and sequencing data. GEO can be searched either by experiment (GEO DataSets) or by gene (GEO Profiles).
The Sequence Read Archive (SRA) is a repository for raw sequencing data from a variety of platforms. Data deposit and acquisition require special tools provided by NCBI.
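The two GEO search modes can also be reached programmatically. The following is a minimal sketch, assuming NCBI's E-utilities `esearch` endpoint and the Entrez database names `gds` (GEO DataSets) and `geoprofiles` (GEO Profiles); E-utilities are not described in this guide, and the helper name `geo_search_url` is hypothetical. The function only builds the query URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities "esearch" service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez database names for the two GEO search modes.
GEO_DATABASES = {
    "experiment": "gds",      # GEO DataSets: search by experiment
    "gene": "geoprofiles",    # GEO Profiles: search by gene
}


def geo_search_url(term, mode="experiment", max_results=20):
    """Build (but do not send) an esearch URL for a GEO query.

    geo_search_url is a hypothetical helper written for this guide.
    """
    if mode not in GEO_DATABASES:
        raise ValueError(f"mode must be one of {sorted(GEO_DATABASES)}")
    params = {
        "db": GEO_DATABASES[mode],
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(geo_search_url("breast cancer", mode="experiment"))
    print(geo_search_url("TP53", mode="gene"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client. Check NCBI's usage guidelines before sending automated requests.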
The Gene database integrates data for many species. This is a great place to start to find structural and functional information about a gene and the encoded protein or RNA.
The European Bioinformatics Institute: "EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry."
Ensembl is a database of vertebrate and other select eukaryotic genomes. The content of the databases is similar to that of NCBI, but Ensembl has a more modern interface.
The ArrayExpress archive is a repository for functional genomics data, including microarray and sequencing data. It is similar to GEO, and some data are available on both sites.
The UCSC Genome Bioinformatics site contains the reference sequence and working draft assemblies for a large collection of genomes.
The UCSC Genome Browser provides access to a wealth of functional human genomics data. This tool allows researchers to browse through the human genome, viewing a wide range of data types. Researchers can also upload their own data tracks or download data from UCSC. There are free OpenHelix tutorials for the UCSC Genome Browser.
The Gene Ontology (GO) project is a collaborative bioinformatics project with the goal of providing complete and consistent descriptions of gene products across all organisms. The descriptions fall into three large categories (ontologies): Biological Process, Cellular Component and Molecular Function. Accordingly, each gene product is likely to have at least three descriptions. However, some gene products have no descriptions, whereas others can have a dozen or more.
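To illustrate how descriptions are spread across the three ontologies, here is a minimal sketch that groups GO annotations per gene product. It assumes the single-letter aspect codes used in GO annotation files (P, C, F) and GO's standard name Cellular Component for the cellular ontology. The gene product names `geneA` and `geneB` and the function `group_by_aspect` are hypothetical; the GO identifiers are real terms chosen only as examples.

```python
from collections import defaultdict

# Single-letter aspect codes mapped to the three GO ontologies.
ASPECTS = {
    "P": "Biological Process",
    "C": "Cellular Component",
    "F": "Molecular Function",
}


def group_by_aspect(annotations):
    """Group (gene_product, go_id, aspect_code) tuples by product and ontology.

    Every gene product seen gets an entry for all three ontologies,
    so missing descriptions show up as empty lists.
    """
    grouped = defaultdict(lambda: {name: [] for name in ASPECTS.values()})
    for product, go_id, code in annotations:
        grouped[product][ASPECTS[code]].append(go_id)
    return dict(grouped)


# Hypothetical gene products annotated with real GO terms.
example = [
    ("geneA", "GO:0006915", "P"),  # apoptotic process
    ("geneA", "GO:0005634", "C"),  # nucleus
    ("geneA", "GO:0003677", "F"),  # DNA binding
    ("geneB", "GO:0005634", "C"),  # nucleus only: other ontologies stay empty
]

if __name__ == "__main__":
    for product, by_ontology in group_by_aspect(example).items():
        print(product, by_ontology)
```

Here `geneA` has one description in each ontology, while `geneB` has a description in only one, mirroring the uneven annotation coverage noted above.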
"The BioMart project provides free software and data services to the international scientific community in order to foster scientific collaboration and facilitate the scientific discovery process. The project adheres to the open source philosophy that promotes collaboration and code reuse."
Multiple databases use BioMart software, and one of the most generally useful is available at Ensembl.
InterMine is open source software designed specifically for the creation of complex biological databases. InterMine also provides tools to query these databases. Like BioMart, InterMine has been adopted by multiple research communities. Generally, a research community will adopt either BioMart or InterMine; for example, Gramene maintains a BioMart of biological data for plants, whereas Araport provides ThaleMine, an InterMine database for Arabidopsis research.
The Worldwide Protein Data Bank (wwPDB) is a global collaboration that ensures the PDB archive of 3D structural data for proteins and nucleic acids is uniform and available globally. The RCSB PDB website is maintained by the Research Collaboratory for Structural Bioinformatics, located in the United States. This website provides resources to access and analyze 3D structural data.
"The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), aims to generate comprehensive, multi-dimensional maps of the key genomic changes in major types and subtypes of cancer."
"The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets."
"CellMiner™ is a web application generated by the Genomics & Bioinformatics Group, LMP, CCR, NCI that facilitates systems biology through the retrieval and integration of the molecular and pharmacological data sets for the NCI-60 cell lines."
SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean.
SALAD is a motif-based database of protein annotations for plant comparative genomics. It contains information on proteome data sets of rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, algae, and yeast.
The Plant Transcription Factor Database (PlnTFDB) provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in plant species whose genomes have been completely sequenced and annotated.
The Plant microRNA Database (PMRD) integrates available plant miRNA data deposited in public databases, collected from the literature, and data generated in-house.
The Online Bioinformatics Resources Collection (OBRC), developed by the Health Sciences Library at the University of Pittsburgh, contains annotations and links for thousands of bioinformatics databases and software tools. This website has not been updated recently, so use it with caution.