Protein statistics

Every protein has specific features that are unique to that particular protein. Features such as the isoelectric point or the amino acid composition can reveal important information about a novel protein. Many of the features described below are calculated in a simple way.


Molecular weight

The molecular weight is the mass of a protein or other molecule. It is simply calculated as the sum of the atomic masses of all atoms in the molecule.

The weight of a protein is usually represented in Daltons (Da).

The molecular weight of a protein is usually calculated without including post-translational modifications. For native and unknown proteins it is often difficult to assess whether post-translational modifications such as glycosylation are present, so a calculation based solely on the amino acid sequence may be inaccurate. The molecular weight can be determined very accurately in the laboratory by mass spectrometry.
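A minimal Python sketch of the sequence-based calculation, assuming average residue masses (the mass of each amino acid minus one water molecule) and no post-translational modifications. The mass table and the names `AVERAGE_RESIDUE_MASS` and `molecular_weight` are illustrative assumptions rather than the values or API of any particular tool; other tools may use slightly different masses.

```python
# Assumed average residue masses in daltons (Da): each value is the mass of the
# free amino acid minus one water molecule lost when the peptide bond forms.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # one water accounts for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide chain."""
    seq = sequence.upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


print(round(molecular_weight("G"), 2))  # free glycine → 75.07
```

Summing residue masses and adding a single water is equivalent to summing the free amino acids and subtracting one water per peptide bond.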

Atomic composition

Amino acids are very simple compounds: all 20 amino acids are built from only five different elements, namely carbon, hydrogen, nitrogen, oxygen and sulfur. The atomic composition of a protein can, for example, be used to calculate the precise molecular weight of the entire protein.
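The sketch below counts the atoms of each element in an unmodified chain and then derives a mass from that composition. It assumes residue formulas (free amino acid minus H2O) plus one terminal water, and approximate standard atomic weights; `RESIDUE_FORMULA`, `ATOMIC_MASS`, `atomic_composition` and `mass_from_composition` are hypothetical names chosen for illustration.

```python
from collections import Counter

# Elemental formula of each amino acid residue (free amino acid minus H2O).
RESIDUE_FORMULA = {
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "R": {"C": 6, "H": 12, "N": 4, "O": 1},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "C": {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "H": {"C": 6, "H": 7, "N": 3, "O": 1},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "W": {"C": 11, "H": 10, "N": 2, "O": 1},
    "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},
}
# Approximate standard atomic weights (assumed values).
ATOMIC_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}


def atomic_composition(sequence: str) -> dict:
    """Number of C, H, N, O and S atoms in an unmodified polypeptide."""
    total = Counter({"H": 2, "O": 1})  # the water that closes the two termini
    for aa in sequence.upper():
        total.update(RESIDUE_FORMULA[aa])
    return {element: total[element] for element in "CHNOS"}


def mass_from_composition(composition: dict) -> float:
    """Molecular weight (Da) computed from an elemental composition."""
    return sum(ATOMIC_MASS[el] * n for el, n in composition.items())


print(atomic_composition("GC"))  # Gly-Cys dipeptide, C5H10N2O3S
```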

Total number of negatively charged residues (Asp + Glu)

At neutral pH, the fraction of negatively charged residues provides information about the location of the protein: intracellular proteins tend to have a higher fraction of negatively charged residues than extracellular proteins.

Total number of positively charged residues (Arg + Lys)

At neutral pH, nuclear proteins have a high relative percentage of positively charged amino acids. Nuclear proteins often bind to the negatively charged DNA, for example to regulate gene expression or to help fold the DNA. Nuclear proteins often have a low percentage of aromatic residues [Andrade et al., 1998].
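Both counts are simple tallies over the sequence. The Python sketch below follows the two headings exactly (Asp + Glu as negative, Arg + Lys as positive) and also reports each count as a fraction of the sequence length, since the text compares fractions between protein classes. The function name `charged_residue_counts` is hypothetical.

```python
def charged_residue_counts(sequence: str) -> dict:
    """Counts and fractions of negatively (D, E) and positively (R, K) charged residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return {
        "negative": negative,
        "positive": positive,
        "negative_fraction": negative / len(seq),
        "positive_fraction": positive / len(seq),
    }


print(charged_residue_counts("DEKRRA"))
```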


Isoelectric point

The isoelectric point (pI) of a protein is the pH at which the protein has no net charge. The pI is calculated from the pKa values of the ionizable groups in the protein, that is, the charged side chains and the N- and C-termini. At a pH below the pI the protein carries a net positive charge, whereas at a pH above the pI it carries a net negative charge. In other words, the pI is high for basic proteins and low for acidic proteins. This information can be used in the laboratory when running electrophoretic gels, for example in isoelectric focusing, where proteins are separated according to their isoelectric point.
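One common way to compute the pI is to write the net charge as a function of pH with the Henderson-Hasselbalch equation and then search for the pH where it is zero. The Python sketch below does this by bisection. The pKa values are an illustrative assumption; published pKa sets differ, so different tools report slightly different pI values. The names `net_charge` and `isoelectric_point` are hypothetical.

```python
# Assumed pKa values for the ionizable groups (illustrative only).
PKA_POSITIVE = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of the chain at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    # Fraction of each basic group that is protonated (charge +1).
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["N_term"]))
    for aa in "KRH":
        positive += seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
    # Fraction of each acidic group that is deprotonated (charge -1).
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["C_term"] - ph))
    for aa in "DECY":
        negative += seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid  # negative (or zero): the pI lies at a lower pH
    return (low + high) / 2


print(round(isoelectric_point("G"), 2))  # → 6.1 (midway between the two terminal pKa values)
```

Bisection works here because the net charge decreases monotonically as the pH rises.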


Aliphatic index

The aliphatic index of a protein is a measure of the relative volume occupied by the aliphatic side chains of alanine, valine, isoleucine and leucine. A higher aliphatic index is associated with increased thermostability of globular proteins. The index is calculated with the following formula:


\begin{displaymath}
Aliphatic~index = X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu))
\end{displaymath} (5.1)

X(Ala), X(Val), X(Ile) and X(Leu) are the mole percents (100 × mole fraction) of the respective amino acids. The constants a and b are the relative volumes of the valine (a = 2.9) and leucine/isoleucine (b = 3.9) side chains compared to the side chain of alanine [Ikai, 1980].
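Equation 5.1 translates directly into code. The Python sketch below computes each mole percent from the sequence and applies the constants a = 2.9 and b = 3.9 given above; the function name `aliphatic_index` is hypothetical.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index (Equation 5.1) of a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        # 100 x mole fraction of one amino acid in the sequence
        return 100 * seq.count(aa) / len(seq)

    a, b = 2.9, 3.9  # relative side-chain volumes of Val and Ile/Leu vs. Ala
    return mole_percent("A") + a * mole_percent("V") + b * (mole_percent("I") + mole_percent("L"))


print(aliphatic_index("AVIL"))  # 25 + 2.9*25 + 3.9*(25 + 25)
```

A sequence of pure alanine gives an index of 100, which is a convenient sanity check for the mole-percent scaling.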


Extinction coefficient

This measure indicates how much light a protein absorbs at a particular wavelength. The extinction coefficient can be measured by UV spectrophotometry, but it can also be calculated from the amino acid composition. The calculation uses the absorbance contributions of tryptophan, tyrosine and cystine (two cysteines joined by a disulfide bond), as in the following equation:


\begin{displaymath}
Ext(Protein) = count(Cystine)*Ext(Cystine) + count(Tyr)*Ext(Tyr) +
count(Trp)*Ext(Trp),
\end{displaymath} (5.2)

where Ext is the molar extinction coefficient of that particular residue type. At 280 nm the values are 120 for cystine, 1280 for tyrosine and 5690 for tryptophan (in M^-1 cm^-1).

This equation is only valid under the following conditions:

  • pH 6.5
  • 6.0 M guanidinium hydrochloride
  • 0.02 M phosphate buffer

The extinction coefficients of these three residue types at other wavelengths are given in [Gill and von Hippel, 1989].

Knowing the extinction coefficient, the absorbance (optical density) of a 1 g/l (1 mg/ml) protein solution in a 1 cm cuvette can be calculated using the following formula:


\begin{displaymath}
Absorbance(Protein) = \frac{Ext(Protein)}{Molecular ~ weight}
\end{displaymath} (5.3)

Two values are reported. The first value is computed assuming that all cysteine residues appear as half cystines, meaning that they form disulfide bridges with other cysteines. The second value assumes that no disulfide bonds are formed.
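The Python sketch below computes both reported values with Equation 5.2 and the 280 nm coefficients quoted above, and then applies Equation 5.3. It assumes that every pair of cysteines forms one cystine, so an odd leftover cysteine contributes nothing (integer division). The function names are hypothetical, and the molecular weight is passed in rather than computed here.

```python
from typing import Tuple

# Molar extinction coefficients at 280 nm (M^-1 cm^-1) [Gill and von Hippel, 1989].
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5690, 1280, 120


def extinction_coefficients(sequence: str) -> Tuple[int, int]:
    """(all cysteines paired as cystines, no disulfide bonds) at 280 nm."""
    seq = sequence.upper()
    n_trp, n_tyr, n_cys = seq.count("W"), seq.count("Y"), seq.count("C")
    base = n_trp * EXT_TRP + n_tyr * EXT_TYR
    # First value: each pair of cysteines forms one cystine (disulfide bridge).
    all_cystines = base + (n_cys // 2) * EXT_CYSTINE
    # Second value: no disulfide bonds, so cysteines contribute nothing.
    no_cystines = base
    return all_cystines, no_cystines


def absorbance(ext: float, molecular_weight: float) -> float:
    """Equation 5.3: absorbance of a 1 g/l solution in a 1 cm light path."""
    return ext / molecular_weight


print(extinction_coefficients("WYCC"))  # one Trp, one Tyr, one cystine
```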


Amino acid distribution

Amino acids are the basic building blocks of proteins. The amino acid distribution of a protein is simply the percentage of each amino acid in the particular protein of interest. Amino acid composition is generally conserved within protein families across organisms, which can be useful when studying a particular protein or enzyme across species. Another interesting observation is that amino acid composition differs slightly between proteins from different subcellular localizations. This fact has been used in several computational methods for predicting subcellular localization.
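The distribution is a percentage per amino acid, which the Python sketch below computes with `collections.Counter`. Only amino acids that actually occur in the sequence are listed; the function name `amino_acid_distribution` is hypothetical.

```python
from collections import Counter


def amino_acid_distribution(sequence: str) -> dict:
    """Percentage of each amino acid present in the sequence, sorted by one-letter code."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100 * counts[aa] / len(seq) for aa in sorted(counts)}


print(amino_acid_distribution("AAGG"))
```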


Estimated half-life

The half-life of a protein is the time it takes until only half of the pool of that particular protein is left. The half-life of a protein is highly dependent on the identity of its N-terminal amino acid, which thereby influences overall protein stability [,,]. The importance of the N-terminal residue is generally known as the 'N-end rule': simply put, the N-terminal amino acid determines the half-life of a protein. Estimated half-lives of proteins have been investigated in mammals, yeast and E. coli (see Table 5.1.8). For example, if leucine is found N-terminally in a mammalian protein, the estimated half-life is 5.5 hours.

  Amino acid   Mammalian   Yeast    E. coli
  Ala (A)      4.4 h       >20 h    >10 h
  Cys (C)      1.2 h       >20 h    >10 h
  Asp (D)      1.1 h       3 min    >10 h
  Glu (E)      1 h         30 min   >10 h
  Phe (F)      1.1 h       3 min    2 min
  Gly (G)      30 h        >20 h    >10 h
  His (H)      3.5 h       10 min   >10 h
  Ile (I)      20 h        30 min   >10 h
  Lys (K)      1.3 h       3 min    2 min
  Leu (L)      5.5 h       3 min    2 min
  Met (M)      30 h        >20 h    >10 h
  Asn (N)      1.4 h       3 min    >10 h
  Pro (P)      >20 h       >20 h    ?
  Gln (Q)      0.8 h       10 min   >10 h
  Arg (R)      1 h         2 min    2 min
  Ser (S)      1.9 h       >20 h    >10 h
  Thr (T)      7.2 h       >20 h    >10 h
  Val (V)      100 h       >20 h    >10 h
  Trp (W)      2.8 h       3 min    2 min
  Tyr (Y)      2.8 h       10 min   2 min
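Because the estimate depends only on the N-terminal residue, it reduces to a table lookup. The Python sketch below transcribes the table above into a dictionary and returns the entry for the first residue of the sequence. It assumes the given sequence starts at the actual N-terminus of the protein; the names `HALF_LIFE`, `ORGANISMS` and `estimated_half_life` are hypothetical.

```python
# Estimated half-lives by N-terminal residue: (mammalian, yeast, E. coli).
HALF_LIFE = {
    "A": ("4.4 h", ">20 h", ">10 h"), "C": ("1.2 h", ">20 h", ">10 h"),
    "D": ("1.1 h", "3 min", ">10 h"), "E": ("1 h", "30 min", ">10 h"),
    "F": ("1.1 h", "3 min", "2 min"), "G": ("30 h", ">20 h", ">10 h"),
    "H": ("3.5 h", "10 min", ">10 h"), "I": ("20 h", "30 min", ">10 h"),
    "K": ("1.3 h", "3 min", "2 min"), "L": ("5.5 h", "3 min", "2 min"),
    "M": ("30 h", ">20 h", ">10 h"), "N": ("1.4 h", "3 min", ">10 h"),
    "P": (">20 h", ">20 h", "?"), "Q": ("0.8 h", "10 min", ">10 h"),
    "R": ("1 h", "2 min", "2 min"), "S": ("1.9 h", ">20 h", ">10 h"),
    "T": ("7.2 h", ">20 h", ">10 h"), "V": ("100 h", ">20 h", ">10 h"),
    "W": ("2.8 h", "3 min", "2 min"), "Y": ("2.8 h", "10 min", "2 min"),
}
ORGANISMS = ("mammalian", "yeast", "e_coli")


def estimated_half_life(sequence: str, organism: str = "mammalian") -> str:
    """Look up the estimated half-life from the N-terminal residue (N-end rule)."""
    if not sequence:
        raise ValueError("empty sequence")
    n_terminal = sequence[0].upper()
    return HALF_LIFE[n_terminal][ORGANISMS.index(organism)]


print(estimated_half_life("LAKT"))  # leucine N-terminus in a mammalian protein
```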