Protein statistics
Every protein has specific features that are unique to that particular protein. Features such as the isoelectric point or amino acid composition can reveal important information about a novel protein. Many of the features described below are simple to calculate.
Molecular weight
The molecular weight is the mass of a
protein or molecule. The molecular weight is simply calculated as
the sum of all the atomic masses of all the atoms in the molecule.
The weight of a protein is usually represented in Daltons (Da).
The molecular weight of a protein is usually calculated without including posttranslational modifications. For native and unknown proteins it is usually difficult to assess whether posttranslational modifications such as glycosylations are present, so a calculation based solely on the amino acid sequence may be inaccurate. The molecular weight can be determined very accurately by mass spectrometry in the laboratory.
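As a rough illustration of the sequence-based calculation, the sketch below sums average residue masses and adds one water molecule for the free termini. The function name `molecular_weight` and the table `RESIDUE_MASS` are names chosen for this example, and the masses are standard average values; posttranslational modifications are deliberately ignored, as described above.

```python
# Average residue masses in Da (free amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Return the approximate average molecular weight (Da) of an unmodified chain."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"Unsupported residue codes: {sorted(unknown)}")
    if not seq:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    print(round(molecular_weight("GG"), 2))
```

The result is only an estimate for the unmodified polypeptide; a glycosylated protein, for example, would be heavier than this value.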
Atomic composition
Amino acids are very simple compounds. All 20 amino acids consist of combinations of only five different kinds of atoms: carbon, nitrogen, hydrogen, sulfur and oxygen. The atomic composition of a protein can, for example, be used to calculate the precise molecular weight of the entire protein.
Total number of negatively charged residues (Asp + Glu)
At neutral pH, the fraction of negatively charged residues gives an indication of the location of the protein. Intracellular proteins tend to have a higher fraction of negatively charged residues than extracellular proteins.
Total number of positively charged residues (Arg + Lys)
At neutral pH, nuclear proteins have a high relative percentage of positively charged amino acids. Nuclear proteins often bind to the negatively charged DNA, which may regulate gene expression or help to fold the DNA. Nuclear proteins often have a low percentage of aromatic residues [Andrade et al., 1998].
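The two totals (Asp + Glu and Arg + Lys) are plain residue counts. A minimal sketch follows; the function name `charged_residue_counts` and the keys of the returned dictionary are assumptions made for this example.

```python
def charged_residue_counts(sequence: str) -> dict:
    """Count negatively (Asp, Glu) and positively (Arg, Lys) charged residues."""
    seq = sequence.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    length = len(seq)
    return {
        "negative": negative,
        "positive": positive,
        # Fractions are relative to the full sequence length.
        "negative_fraction": negative / length if length else 0.0,
        "positive_fraction": positive / length if length else 0.0,
    }


if __name__ == "__main__":
    print(charged_residue_counts("MDEKRA"))
```

Reporting the fraction alongside the raw count makes proteins of different lengths comparable.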
Isoelectric point
The isoelectric point (pI) of a protein is the pH at which the protein has no net charge. The pI is calculated from the pKa values of the 20 different amino acids. At a pH below the pI, the protein carries a net positive charge, whereas at a pH above the pI it carries a net negative charge. In other words, pI is high for basic proteins and low for acidic proteins. This information can be used in the laboratory when running electrophoretic gels, where proteins can be separated based on their isoelectric point.
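One common way to estimate the pI is to compute the net charge as a function of pH and search for the pH where it crosses zero. The sketch below does this by bisection. The pKa values are illustrative assumptions only; published pKa tables differ between sources, and the text above does not say which set is used. The names `net_charge` and `isoelectric_point` are likewise chosen for this example.

```python
# Illustrative pKa values (assumed for this sketch; sources differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}             # protonated form is +1
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}    # deprotonated form is -1
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of the chain at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # N-terminal amino group
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # C-terminal carboxyl group
    for aa, pka in PKA_POSITIVE.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in PKA_NEGATIVE.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH where the net charge is zero by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    print(round(isoelectric_point("ACDKKR"), 2))
```

Bisection works here because the net charge decreases monotonically as the pH rises.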
Aliphatic index
The aliphatic index of a protein is a measure of the relative volume occupied by the aliphatic side chains of the following amino acids: alanine, valine, leucine and isoleucine. An increase in the aliphatic index increases the thermostability of globular proteins. The index is calculated by the following formula:
\[ \text{Aliphatic index} = X(\text{Ala}) + a \cdot X(\text{Val}) + b \cdot \left( X(\text{Ile}) + X(\text{Leu}) \right) \]
X(Ala), X(Val), X(Ile) and X(Leu) are the amino acid compositional fractions. The constants a and b are the relative volumes of the valine (a=2.9) and leucine/isoleucine (b=3.9) side chains compared to the side chain of alanine [Ikai, 1980].
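A minimal Python sketch of the aliphatic index, using the constants a and b given above. The function name and the `as_percent` flag are assumptions of this example; the flag exists because Ikai's index is commonly quoted on a mole-percent scale (fractions multiplied by 100).

```python
A_VAL = 2.9       # relative side-chain volume of valine vs. alanine
B_LEU_ILE = 3.9   # relative side-chain volume of leucine/isoleucine vs. alanine


def aliphatic_index(sequence: str, as_percent: bool = False) -> float:
    """Aliphatic index = X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu))."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    length = len(seq)
    # Compositional fraction of each aliphatic residue type.
    x = {aa: seq.count(aa) / length for aa in "AVIL"}
    index = x["A"] + A_VAL * x["V"] + B_LEU_ILE * (x["I"] + x["L"])
    return 100.0 * index if as_percent else index


if __name__ == "__main__":
    print(aliphatic_index("MAVLIKG", as_percent=True))
```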
Extinction coefficient
This measure indicates how much light is absorbed by a protein at a
particular wavelength. The extinction coefficient is measured by UV
spectrophotometry but can also be calculated. The amino acid
composition is important when calculating the extinction
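coefficient. The extinction coefficient is calculated from the absorbance of cystine, tyrosine and tryptophan using the following equation:
\[ \mathrm{Ext(Protein)} = \mathrm{count(Cystine)} \cdot \mathrm{Ext(Cystine)} + \mathrm{count(Tyr)} \cdot \mathrm{Ext(Tyr)} + \mathrm{count(Trp)} \cdot \mathrm{Ext(Trp)} \]
where Ext is the extinction coefficient of that particular amino acid and count(Cystine) is the number of cystines, that is, disulfide-bonded pairs of cysteines. At 280 nm the extinction coefficients are Cystine = 120, Tyr = 1280 and Trp = 5690.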
This equation is only valid under the following conditions:
- pH 6.5
- 6.0 M guanidinium hydrochloride
- 0.02 M phosphate buffer
The extinction coefficient values of the three important amino acids at different wavelengths are found in [Gill and von Hippel, 1989].
Knowing the extinction coefficient, the absorbance (optical density) can be calculated using the following formula:
\[ \mathrm{Absorbance(Protein)} = \frac{\mathrm{Ext(Protein)}}{\text{Molecular weight}} \]
Two values are reported. The first value is computed assuming that all cysteine residues appear as half cystines, meaning they form disulfide bridges to other cysteines. The second value assumes that no disulfide bonds are formed.
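The two reported values can be sketched as follows, using the 280 nm constants quoted above. The sketch assumes that the value 120 applies per cystine (one disulfide-bonded pair of cysteines) and that an odd, unpaired cysteine contributes nothing. The function names are chosen for this example, and the absorbance is taken as the extinction coefficient divided by the molecular weight.

```python
EXT_CYSTINE = 120   # per disulfide-bonded cysteine pair, at 280 nm
EXT_TYR = 1280
EXT_TRP = 5690


def extinction_coefficients(sequence: str) -> tuple:
    """Return (all cysteines paired as cystines, no disulfide bonds)."""
    seq = sequence.upper()
    base = seq.count("Y") * EXT_TYR + seq.count("W") * EXT_TRP
    cystines = seq.count("C") // 2   # an unpaired cysteine is not counted
    return base + cystines * EXT_CYSTINE, base


def absorbance(extinction_coefficient: float, molecular_weight: float) -> float:
    """Absorbance (optical density) = Ext(Protein) / molecular weight."""
    return extinction_coefficient / molecular_weight


if __name__ == "__main__":
    with_bridges, without_bridges = extinction_coefficients("WYYCC")
    print(with_bridges, without_bridges)
```

As stated above, these constants apply only under the listed buffer conditions, so values computed this way are estimates for other conditions.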
Amino acid distribution
Amino acids are the basic components of proteins. The amino acid distribution in a protein is simply the percentage of each of the different amino acids in a particular protein of interest. Amino acid composition is generally conserved within protein families across different organisms, which can be useful when studying a particular protein or enzyme across species boundaries. Another interesting observation is that amino acid composition deviates slightly between proteins from different subcellular localizations. This fact has been used in several computational methods for predicting subcellular localization.
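The distribution is a straightforward percentage calculation over the sequence. A minimal sketch, with the function name `amino_acid_distribution` chosen for this example:

```python
from collections import Counter


def amino_acid_distribution(sequence: str) -> dict:
    """Return the percentage of each residue type present in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    total = len(seq)
    # Sorted by one-letter code so the output order is stable.
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


if __name__ == "__main__":
    for residue, percent in amino_acid_distribution("MKVLAAGL").items():
        print(f"{residue}: {percent:.1f}%")
```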
Estimated half-life
The half-life of a protein is the time it takes before only half of the protein pool for that particular protein is left. The half-life of a protein, and thus its overall stability, is highly dependent on the N-terminal amino acid. The importance of the N-terminal residue is generally known as the 'N-end rule': the N-terminal amino acid determines the half-life of the protein. The estimated half-life of proteins has been investigated in mammals, yeast and E. coli (see Table 5.1.8). If leucine is found N-terminally in mammalian proteins, the estimated half-life is 5.5 hours.
| Amino acid | Mammalian | Yeast | E. coli |
| Ala (A) | 4.4 hour | >20 hour | >10 hour |
| Cys (C) | 1.2 hour | >20 hour | >10 hour |
| Asp (D) | 1.1 hour | 3 min | >10 hour |
| Glu (E) | 1 hour | 30 min | >10 hour |
| Phe (F) | 1.1 hour | 3 min | 2 min |
| Gly (G) | 30 hour | >20 hour | >10 hour |
| His (H) | 3.5 hour | 10 min | >10 hour |
| Ile (I) | 20 hour | 30 min | >10 hour |
| Lys (K) | 1.3 hour | 3 min | 2 min |
| Leu (L) | 5.5 hour | 3 min | 2 min |
| Met (M) | 30 hour | >20 hour | >10 hour |
| Asn (N) | 1.4 hour | 3 min | >10 hour |
| Pro (P) | >20 hour | >20 hour | ? |
| Gln (Q) | 0.8 hour | 10 min | >10 hour |
| Arg (R) | 1 hour | 2 min | 2 min |
| Ser (S) | 1.9 hour | >20 hour | >10 hour |
| Thr (T) | 7.2 hour | >20 hour | >10 hour |
| Val (V) | 100 hour | >20 hour | >10 hour |
| Trp (W) | 2.8 hour | 3 min | 2 min |
| Tyr (Y) | 2.8 hour | 10 min | 2 min |
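Applying the N-end rule amounts to a table lookup on the first residue of the sequence. The sketch below transcribes only the mammalian column of the table above; the dictionary and function names are assumptions made for this example, and the values are kept as text because some entries are lower bounds.

```python
# Mammalian half-life estimates keyed by N-terminal residue (from the table above).
MAMMALIAN_HALF_LIFE = {
    "A": "4.4 hour", "C": "1.2 hour", "D": "1.1 hour", "E": "1 hour",
    "F": "1.1 hour", "G": "30 hour", "H": "3.5 hour", "I": "20 hour",
    "K": "1.3 hour", "L": "5.5 hour", "M": "30 hour", "N": "1.4 hour",
    "P": ">20 hour", "Q": "0.8 hour", "R": "1 hour", "S": "1.9 hour",
    "T": "7.2 hour", "V": "100 hour", "W": "2.8 hour", "Y": "2.8 hour",
}


def estimated_half_life(sequence: str) -> str:
    """Look up the estimated mammalian half-life from the N-terminal residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n_terminal = sequence[0].upper()
    if n_terminal not in MAMMALIAN_HALF_LIFE:
        raise ValueError(f"No half-life estimate for residue {n_terminal!r}")
    return MAMMALIAN_HALF_LIFE[n_terminal]


if __name__ == "__main__":
    print(estimated_half_life("LKVAG"))
```

The yeast and E. coli columns could be handled the same way with one dictionary per organism.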