Protein Structure and Function Analysis

Over the years the three-dimensional structures of a large number of proteins have been obtained, primarily through X-ray crystallographic techniques.  By relating this known 3D-structural information to the primary structure of proteins, patterns have emerged that can be used for the prediction of 3D-structure based on the primary structure of the protein.  From these predicted structures, insight into the function of the protein can be obtained.

Why are such predictive abilities useful?  With the sequencing of the human and other genomes progressing rapidly, an important next step is to determine the functions of the proteins encoded by genes.  In some cases genes are discovered by mapping mutations that cause a discernible phenotype, perhaps cancer or a developmental defect.  By analyzing the structure of the proteins encoded by these genes, insight into the mechanism of the genetic defect can be obtained and, perhaps, insight into a treatment for it.

The web-based programs linked below allow the user to search for matches to a query protein sequence in sequence databases, databases of known 3D structures (PDB files), and databases of recognized domains, folds and motifs.  Other programs predict secondary structure and subcellular location.  Still others attempt to predict overall 3D structure even when homology to known PDB structures is low.  This is not an exhaustive list; it simply reflects links I have used for teaching and research.  For a more comprehensive list of options, try the listings at The CMS Molecular Biology Resource, ExPASy Proteomics Research Tools, Amos' WWW links page, and assorted tools at NCBI.
 
 

3D-Structure Visualization

3D structures are determined primarily by X-ray crystallography and NMR spectroscopy.  Known 3D structures are usually cataloged as PDB files (*.pdb), which can be read as text: a header containing background information is followed by coordinate data.  They can also be opened by several programs that display an image of the protein that can be rotated and manipulated.  My favorites are the standalone program Rasmol and the related (and probably preferable) Netscape 4.7x plug-in Chime, both freeware and both easily downloaded and installed.  Support and links for both are available at http://www.umass.edu/microbio/rasmol/.
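To make the text layout of a PDB file concrete, here is a minimal Python sketch that separates the header lines from the atom coordinates.  The three-atom sample record is made up for illustration, and the parser assumes the usual fixed-width column layout for ATOM/HETATM records; it is not a complete PDB reader.

```python
# Made-up sample in PDB text format (not a real database entry).
SAMPLE_PDB = """\
HEADER    HYPOTHETICAL EXAMPLE
TITLE     MADE-UP THREE-ATOM FRAGMENT FOR ILLUSTRATION
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38
ATOM      3  C   MET A   1      26.913  26.639   3.531  1.00  9.62
END
"""


def parse_pdb(text):
    """Split PDB text into header lines and a list of atom dictionaries."""
    header, atoms = [], []
    for line in text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            # Fixed-width columns (0-based slices): atom name 12-16,
            # residue name 17-20, chain 21, residue number 22-26,
            # x 30-38, y 38-46, z 46-54.
            atoms.append({
                "name": line[12:16].strip(),
                "residue": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]),
                        float(line[38:46]),
                        float(line[46:54])),
            })
        elif not atoms and record != "END":
            # Everything before the first coordinate record is header text.
            header.append(line)
    return header, atoms


header, atoms = parse_pdb(SAMPLE_PDB)
for atom in atoms:
    print(atom["name"], atom["residue"], atom["resseq"], atom["xyz"])
```

Viewers such as Rasmol read the same coordinate columns to draw the structure; the sketch only shows where the numbers live in the file.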
 

PDBLite
Search the PDB database for a specific macromolecule structure.
 

PDB to MultiGIF Page
This site will convert a PDB file to a rotating, animated GIF picture that can be incorporated into a web page or PowerPoint presentation.

Sequence Searches and Alignments

BLAST
BLAST (Basic Local Alignment Search Tool) is a program that allows similarity searches through the various nucleotide and protein databases.  There are several programs available for protein-protein, nucleotide-nucleotide, and protein-nucleotide searches.
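The idea of a local alignment can be illustrated with a small Python sketch.  Note that this is not BLAST itself: BLAST uses a faster heuristic search, whereas the code below is the exact dynamic-programming approach (Smith-Waterman) that finds the best-scoring local match between two sequences.  The scoring values (match +2, mismatch -1, gap -2) are arbitrary choices for illustration, not a real substitution matrix.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared stretch ACDEF (5 matches) dominates the score.
print(local_alignment_score("XXXACDEFXXX", "YYACDEFYY"))
```

Because the score is reset to zero whenever it would go negative, unrelated flanking regions do not penalize a well-conserved internal segment, which is the property that makes local alignment suitable for database searches.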
 
 

Secondary Structure Prediction

NNPredict
 
 

Domain Databases and Sequence Analysis

Simple Modular Architecture Research Tool (SMART)
Maintained by EMBL.  Allows text searches for proteins containing combinations of domains.

ScanProsite
Hosted on the ExPASy Molecular Biology Server of the Swiss Institute of Bioinformatics.  Scans a protein sequence for matches to PROSITE motifs.
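PROSITE patterns are written in a compact syntax that maps closely onto regular expressions.  The Python sketch below converts a pattern to a regular expression and reports where it occurs in a sequence.  The function names and the test sequence are made up for illustration, the conversion handles only the basic pattern syntax, and it is not how ScanProsite itself is implemented.

```python
import re


def prosite_to_regex(pattern):
    """Translate basic PROSITE pattern syntax into a Python regex string."""
    regex = pattern.rstrip(".")                          # patterns may end with a period
    regex = regex.replace("-", "")                       # '-' only separates positions
    regex = regex.replace("{", "[^").replace("}", "]")   # {P}   -> any residue but P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) -> repeat counts
    regex = regex.replace("x", ".")                      # x     -> any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex


def find_motif(pattern, sequence):
    """Return (1-based position, matched text) pairs, including overlapping hits."""
    regex = prosite_to_regex(pattern)
    return [(m.start() + 1, m.group(1))
            for m in re.finditer("(?=(%s))" % regex, sequence)]


# N-glycosylation site pattern, applied to a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_motif("N-{P}-[ST]-{P}.", "MKNGSAANPTLNVTQ"))
```

Short patterns like this one match by chance fairly often, so a hit is a suggestion to be checked against other evidence rather than proof of a functional site.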

InterPro
European Bioinformatics Institute site.  Allows text and sequence searches of multiple databases, including TIGR, Swiss-Prot and SMART.

Conserved Domain Database (CDD)
Site maintained by NCBI.  Text and sequence searches of SMART and Pfam databases.  Links to CDART (Conserved Domain Architecture Retrieval Tool) output listing the proteins that contain a particular domain, with GenBank links.
 
 

Prediction of 3-Dimensional Structure

3D-PSSM
This multifaceted program may give you some insights into the structure and function of your protein of interest that other programs don't.  In this program you enter the sequence of your protein as well as keywords describing your protein.  The program then:

Performs a BLAST search of protein structure databases for PDB structure files
Performs a protein database search (BLAST) to find sequence alignments.  Results are in a nonlinked FASTA format, so you may prefer the NCBI version of protein BLAST search.
Searches for PROSITE motifs
Performs a secondary structure analysis
Performs a modified BLAST for PDB structure files that also includes structure/function keywords

If there is an existing PDB structure file for your protein, it will find that in the initial search and present that information in the results table showing structural alignments.  If a PDB structure does not exist, it will utilize a BLAST search to find PDB files that have some homology to it.  It will then create a model based on that structure.  Tabulated results have links to SCOP (Structural Classification of Proteins), descriptions of the relevant protein family and fold class, and a link to a PDB file that can be opened using Rasmol or Chime that superimposes your sequence on the 3D model.
 
 
 

Subcellular Localization

PSORT
The subcellular or extracellular location of your protein of interest can provide insight into the function of the protein.  For example, the presence of a nuclear localization sequence (NLS) might support a hypothesized function as a transcription factor.

This program analyzes the sequence of your protein for signal sequences and sequences characteristic of the targeting of proteins to the nucleus, chloroplast, mitochondria, peroxisomes, etc.  Different programs are available for plant/bacterial proteins and for animal/yeast proteins.
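As a rough illustration of sequence-based localization signals, the Python sketch below looks for runs of basic residues (K or R), a feature of classical nuclear localization sequences such as the PKKKRKV signal of SV40 large T antigen.  This is a deliberately simplified rule with a made-up function name and test sequence; it is not PSORT's actual algorithm, which combines many more features.

```python
import re


def find_basic_clusters(sequence, min_len=4):
    """Return (1-based position, run) for each run of at least min_len K/R residues."""
    pattern = r"[KR]{%d,}" % min_len
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]


# Made-up sequence containing the SV40 large T antigen signal PKKKRKV.
print(find_basic_clusters("MAPKKKRKVED"))
```

A hit from a rule this simple only flags a candidate region; experimental confirmation or a dedicated predictor is needed before drawing conclusions about localization.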
 
 

Physical Characteristics of Polypeptides

ExPASy ProtParam Tool
Calculates the pI, extinction coefficient, and amino acid composition from a polypeptide sequence
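Two of these quantities are simple enough to sketch in Python.  The code below counts the amino acid composition and estimates the molar extinction coefficient at 280 nm from the numbers of Trp, Tyr and cystine residues, using the commonly cited per-residue values of 5500, 1490 and 125 M^-1 cm^-1.  That these are exactly the values the web tool uses is an assumption, and the function names and test sequence are made up for illustration.

```python
from collections import Counter


def composition(sequence):
    """Return {residue: (count, percent)} for a one-letter-code sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


def extinction_coefficient(sequence, cystines=True):
    """Estimate the extinction coefficient at 280 nm in M^-1 cm^-1.

    Assumed per-residue contributions: Trp 5500, Tyr 1490, cystine 125.
    With cystines=True, every pair of Cys is assumed to form a disulfide.
    """
    counts = Counter(sequence.upper())
    eps = 5500 * counts["W"] + 1490 * counts["Y"]
    if cystines:
        eps += 125 * (counts["C"] // 2)
    return eps


seq = "MWYYCCK"  # made-up peptide
print(composition(seq))
print(extinction_coefficient(seq), extinction_coefficient(seq, cystines=False))
```

The estimate applies to a protein in water measured at 280 nm and is least reliable for proteins without tryptophan.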

ExPASy PeptideMass
Calculates the masses of peptides resulting from proteolytic digests
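The calculation behind a digest can be sketched in Python for the common case of trypsin, which is usually described as cleaving after K or R unless the next residue is P.  The sketch below applies that rule and sums standard monoisotopic residue masses plus one water per peptide to give neutral peptide masses.  The function names and test sequence are made up, and the web tool offers more enzymes and options than this covers.

```python
# Monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])   # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in tryptic_digest("MAKRPGKLLR"):   # made-up sequence
    print(pep, round(peptide_mass(pep), 4))
```

This assumes complete digestion with no missed cleavages and no modified residues, so real mass spectra will usually contain additional peaks.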


Free-Standing Programs

ANTHEPROT (Analyze The Proteins)
This is a nice, free, downloadable program that does secondary structure predictions, helical wheels, and multiple alignments.
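A helical wheel places successive residues around a circle at the angular spacing of an alpha helix, about 100 degrees per residue (3.6 residues per turn).  The Python sketch below computes those angles for a made-up peptide; it only illustrates the geometry and is not taken from ANTHEPROT.

```python
def helical_wheel_angles(sequence, step=100):
    """Return (residue, angle in degrees) pairs for a helical wheel projection."""
    return [(aa, (i * step) % 360) for i, aa in enumerate(sequence)]


for aa, angle in helical_wheel_angles("LKELLKKL"):   # made-up peptide
    print(aa, angle)
```

Plotting residues at these angles shows whether hydrophobic residues cluster on one face of the helix, which is the usual reason for drawing a wheel.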