Sequence Logo Graphs


In CLC's workbenches there are a number of alignment-specific view options in the Alignment info preference group. One option is displaying a "sequence logo". The sequence logo displays the information content of all positions in the alignment as residues or nucleotides stacked on top of each other.

The sequence logo provides a far more detailed view of the alignment than the conservation view.

Sequence logos can help identify protein binding sites on DNA sequences, and they can also help identify conserved residues in aligned domains of protein sequences, among a wide range of other applications.

Each position of the alignment, and consequently of the sequence logo, shows the sequence information as a score computed from Shannon entropy [Schneider and Stephens, 1990]. The total height of the letter stack at a position represents the sequence information content at that position of the alignment, and the height of each individual letter is proportional to its frequency there.
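The entropy-based score described above can be sketched in a few lines of Python. This is a minimal illustration of the measure from Schneider and Stephens, not the workbench's own implementation: the function names `column_information` and `logo_heights` are hypothetical, the alignment is assumed to be ungapped, and the small-sample correction used in the original paper is left out for brevity.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Return (information_bits, {letter: height_bits}) for one alignment column.

    Assumptions: the column has no gaps, and no small-sample correction
    is applied. alphabet_size is 4 for nucleotides and 20 for proteins.
    """
    counts = Counter(column)
    total = len(column)
    freqs = {letter: n / total for letter, n in counts.items()}
    # Shannon entropy of the observed letter frequencies, in bits.
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Information content: maximum possible entropy minus observed entropy.
    information = math.log2(alphabet_size) - entropy
    # Each letter's height is its frequency times the column's information.
    heights = {letter: p * information for letter, p in freqs.items()}
    return information, heights


def logo_heights(alignment, alphabet_size=4):
    """Apply column_information to every column of an ungapped alignment."""
    return [
        column_information("".join(col), alphabet_size)
        for col in zip(*alignment)
    ]
```

Under these assumptions, a fully conserved nucleotide column scores 2 bits, and a column in which all four nucleotides are equally frequent scores 0 bits, so its letters would have no height in the logo.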

A sequence logo is also a much better visualization tool than a simple consensus sequence. Consider, for instance, an alignment in which a particular residue is found at one position in 70% of the sequences.

If a consensus sequence were defined, it would typically display only the single residue with 70% coverage, and the residues found in the remaining 30% of the sequences would be lost. In the figure above, an ungapped alignment of 11 E. coli start codons, including flanking regions, is shown.

In this example, a consensus sequence would display only ATG as the start codon in position 1, but the sequence logo shows that GTG is also allowed as a start codon.
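The difference between the two representations can be illustrated with a short Python sketch. The codon list below is made up for illustration and is not the E. coli data from the figure; `consensus` and `column_frequencies` are hypothetical helper names, and a simple majority vote is assumed for the consensus.

```python
from collections import Counter


def consensus(alignment):
    """Majority-vote consensus: the most frequent letter in each column."""
    return "".join(
        Counter(col).most_common(1)[0][0] for col in zip(*alignment)
    )


def column_frequencies(alignment):
    """Per-column relative letter frequencies, which a logo keeps visible."""
    result = []
    for col in zip(*alignment):
        counts = Counter(col)
        result.append({letter: n / len(col) for letter, n in counts.items()})
    return result


# Made-up start codons for illustration only (not the data in the figure).
codons = ["ATG", "ATG", "GTG", "ATG", "ATG", "GTG", "ATG", "ATG", "ATG", "ATG"]

print(consensus(codons))              # ATG
print(column_frequencies(codons)[0])  # {'A': 0.8, 'G': 0.2}
```

The consensus reports only ATG, while the per-column frequencies still show that G occurs in the first position of 20% of these sequences, which is the kind of minority signal a sequence logo keeps visible.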

These options are available:

  • Foreground color. Colors the letters using a gradient according to the information content of the alignment column.
  • Background color. Sets the background color of the residues using a gradient, in the same way as described above.
  • Graph on/off. Displays the sequence logo at the bottom of the alignment.
  • Height.
  • Color. The sequence logo can be displayed in black or in Rasmol colors. For protein alignments, a polarity color scheme is also available, in which hydrophobic residues are shown in black, hydrophilic residues in green, acidic residues in red and basic residues in blue.