Restriction site detection

Restriction enzymes are a group of enzymes that can cut double-stranded DNA molecules into smaller restriction fragments. They do so by cleaving phosphodiester bonds in the sugar-phosphate backbone of the DNA molecule. These bonds can be re-formed by DNA ligase, so restriction fragments of different origin can be joined together in novel combinations. This cut-and-join process forms the basis of many procedures in molecular biology and genetic engineering.

A given restriction enzyme cuts only DNA molecules that contain its particular recognition sequence. If the recognition sequence is present, the enzyme cuts the molecule at a cleavage site near or overlapping the recognition sequence. If the two strands are cleaved at sites directly opposite each other, the resulting restriction fragments have blunt ends. Alternatively, the cleavage sites on the two strands can be offset by a few bases, producing fragments with so-called sticky ends, i.e. ends where one strand is longer and thus overhangs the other. The example in figure 3.1 shows the widely used enzyme EcoRI. Its recognition sequence, GAATTC, is termed a palindrome because it is equal to its own reverse complement. EcoRI cleaves the bond between G and A on both strands (G^AATTC) and thus creates two restriction fragments, each with a sticky end carrying a four-base 5' AATT overhang.
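
The two ideas in the paragraph above, palindromic recognition sequences and blunt versus sticky ends, can be illustrated with a minimal Python sketch. The function names and the convention of giving both cut positions in top-strand coordinates are assumptions made for illustration; only the EcoRI recognition sequence and its G^AATTC cut come from the text. The sketch also assumes sequences written in uppercase A, C, G and T only.

```python
# Map each base to its Watson-Crick partner (assumes uppercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(site: str) -> bool:
    """A recognition sequence is palindromic if it equals its own reverse complement."""
    return site == reverse_complement(site)


def overhang(site: str, top_cut: int, bottom_cut: int) -> str:
    """Describe the ends left when both strands of `site` are cut.

    Hypothetical convention: both cut positions are gaps between bases,
    numbered 0..len(site) from the 5' end of the TOP strand. For EcoRI
    (G^AATTC) the top strand is cut at 1 and the bottom strand, read in
    top-strand coordinates, at 5.
    """
    if top_cut == bottom_cut:
        return "blunt"
    if top_cut < bottom_cut:
        # Top strand cut first: the single-stranded tail is a 5' end.
        return "5' overhang " + site[top_cut:bottom_cut]
    # Bottom strand cut first: the single-stranded tail is a 3' end.
    return "3' overhang " + site[bottom_cut:top_cut]
```

For example, `is_palindrome("GAATTC")` is `True`, and `overhang("GAATTC", 1, 5)` returns `"5' overhang AATT"`, matching the sticky ends described for EcoRI. Cuts at the same position on both strands, such as `overhang("CCCGGG", 3, 3)`, are reported as `"blunt"`.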

Figure 3.1: A stretch of double-stranded DNA cleaved by the restriction enzyme EcoRI. The enzyme recognizes the palindromic sequence in the hatched box, which is cleaved to produce the green and the blue fragments, both of which now have overhanging ends.

Bioinformatics algorithms can detect the sites in a DNA sequence where a given enzyme will cut. They do so by searching the input sequence for regions that match the recognition sequence of the enzyme in question. An enzyme with a palindromic recognition sequence matches the same regions no matter which DNA strand is examined, so only one strand needs to be searched for this group of enzymes. For an enzyme with a non-palindromic recognition sequence, the site reads differently on the two strands, so a region can match on one strand without matching on the other; both strands must therefore be searched.
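
The search strategy described above can be sketched in a few lines of Python. This is an illustrative sketch, not the method used by any particular program: the function names are hypothetical, and it assumes plain A/C/G/T recognition sequences without ambiguity codes. Instead of building the bottom strand explicitly, it scans the top strand for the reverse complement of the site, and it skips that second scan when the site is palindromic.

```python
from typing import Iterator, List, Tuple

# Map each base to its Watson-Crick partner (assumes uppercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def _match_starts(sequence: str, pattern: str) -> Iterator[int]:
    """Yield every 0-based start of `pattern` in `sequence`, overlaps included."""
    start = sequence.find(pattern)
    while start != -1:
        yield start
        start = sequence.find(pattern, start + 1)


def find_sites(sequence: str, site: str) -> List[Tuple[int, str]]:
    """Return sorted (start, strand) pairs for every occurrence of `site`.

    Only the top strand is scanned. A bottom-strand match is found by
    looking for the reverse complement of the site on the top strand; its
    start is reported in top-strand coordinates with strand "-".
    """
    sequence, site = sequence.upper(), site.upper()
    hits = [(i, "+") for i in _match_starts(sequence, site)]
    rc = reverse_complement(site)
    # A palindromic site equals its reverse complement, so a second scan
    # would only report the same regions again; skip it in that case.
    if rc != site:
        hits += [(i, "-") for i in _match_starts(sequence, rc)]
    return sorted(hits)
```

With the palindromic EcoRI site, `find_sites("TTGAATTCAAGAATTC", "GAATTC")` returns `[(2, "+"), (10, "+")]` after a single scan. With GGATG, used here simply as an example of a non-palindromic sequence, `find_sites("AGGATGTTCATCCA", "GGATG")` returns `[(1, "+"), (8, "-")]`; a search of the top strand alone would have missed the second site.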