Research Guides: Bioinformatics: Web Sites

Web Sites

Ensembl
Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation.
InterMine Registry
InterMine is an open source data warehouse system for the integration and analysis of complex biological data. Websites for many model organisms are built using InterMine. This registry contains links to these sites.
National Center for Biotechnology Information
The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
UCSC Genome Browser
The UCSC Genome Browser is a web-based tool serving as a multi-powered microscope that allows researchers to view all 23 chromosomes of the human genome at any scale from a full chromosome down to an individual nucleotide.

Repbase
Repbase is a database of prototypic sequences representing repetitive DNA from different eukaryotic species. Repbase is used in genome sequencing projects as a reference collection for masking and annotation of repetitive DNA (e.g. by RepeatMasker or CENSOR).

GeneCards
GeneCards is a searchable, integrative database that provides comprehensive, user-friendly information on all annotated and predicted human genes.
GTEx Portal
The Genotype-Tissue Expression project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation.
HGNC
The HGNC is responsible for approving unique symbols and names for human loci, including protein coding genes, ncRNA genes and pseudogenes, to allow unambiguous scientific communication.
PharmGKB
PharmGKB is an NIH-funded resource that provides information about how human genetic variation affects response to medications.

Cancer Cell Line Encyclopedia
Data and information on more than 1000 cell lines used in cancer research.
The Cancer Genome Atlas
The Cancer Genome Atlas (TCGA), a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types.
cBioPortal
The cBioPortal for Cancer Genomics is an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets.
COSMIC
COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
DepMap
The goal of the Dependency Map (DepMap) portal is to empower the research community to make discoveries related to cancer vulnerabilities by providing open access to key cancer dependencies and to analytical and visualization tools.
National Cancer Institute GDC Data Portal
The GDC Data Portal is a robust data-driven platform that allows cancer researchers and bioinformaticians to search and download cancer data for analysis.

FlyBase
Database and tools for Drosophila research. The site asks each user for $150/year, though payment is currently on the honor system.
InterMine Registry
InterMine is an open source data warehouse system for the integration and analysis of complex biological data. Websites for many model organisms are built using InterMine. This registry contains links to these sites.
Mouse Genome Informatics
MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
Saccharomyces Genome Database
The Saccharomyces Genome Database (SGD) provides comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms.
WormBase
WormBase is an international consortium of biologists and computer scientists providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes.
ZFIN
The Zebrafish Information Network (ZFIN) is the database of genetic and genomic data for the zebrafish (Danio rerio) as a model organism.

DAVID
The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.
The Gene Ontology Resource
The mission of the GO Consortium is to develop a comprehensive, computational model of biological systems, ranging from the molecular to the organism level, across the multiplicity of species in the tree of life.
GOrilla
GOrilla is a tool for identifying and visualizing enriched GO terms in ranked lists of genes.
PANTHER
The PANTHER Classification System was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis.
Reactome
Reactome is an open-source, open-access, manually curated and peer-reviewed pathway database. Its goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education.

BioCyc Genome Database Collection
A collection of Pathway/Genome Databases (PGDBs) for model eukaryotes and for thousands of microbes, plus software tools for exploring them. BioCyc is an encyclopedic reference of curated data from publications.

TAIR - The Arabidopsis Information Resource
Genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR includes the complete genome sequence along with gene structure, gene product information, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community. Gene product function data is updated every week from the latest published research literature and community data submissions.

TAIR also provides extensive linkouts to other Arabidopsis resources.

The BAR
The Bio-Analytic Resource for Plant Biology.
Gramene
Gramene is a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species.
Phytozome
Phytozome is the Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute. It provides the plant science community a hub for accessing, visualizing and analyzing JGI-sequenced plant genomes, as well as selected genomes and datasets that have been sequenced elsewhere.
PlantRegMap
PlantRegMap aims to provide a comprehensive, high-quality resource of plant transcription factors (TFs), regulatory elements and the interactions between them, advancing the understanding of plant transcriptional regulatory systems.
ThaleMine
ThaleMine enables you to analyze Arabidopsis thaliana genes, proteins, gene expression, protein-protein interactions, orthologs, and more. This site is similar to TAIR but uses a different interface.

PRIDE
The Proteomics Identifications Database.
The Worldwide Protein Data Bank
The Worldwide PDB (wwPDB) organization manages the PDB archive and ensures that the PDB is freely and publicly available to the global community.
UniProt
The world’s leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information.