Pattern discovery

Pattern discovery identifies previously unknown sequence patterns in one or more DNA or protein sequences. The discovery method is based on hidden Markov models. When several sequences are analyzed at once, the method searches for patterns that are common to all of them.

To search for patterns, select one or more sequences and run the analysis. Annotations are added to all of the sequences, and a view is opened for each sequence.

Various parameters can be set prior to the pattern discovery search:

  • Minimum pattern length
  • Maximum pattern length
  • The model’s noise percentage. A low accepted noise level means that pattern occurrences must be very similar to one another before they are accepted as a sequence pattern.
  • Number of different kinds of patterns to predict. This number sets how many iterations the algorithm goes through. Patterns marked 'Pattern1' in the output have the highest level of confidence.
  • Show result of pattern discovery in a table. Generates a tabular output that lists the patterns found.
  • Include Background Distribution of Amino Acids. For protein sequences, information on the background distribution of amino acids from a range of organisms can be included.
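The length and noise parameters can be illustrated with a deliberately simplified sketch. It does not use a hidden Markov model. It is a brute-force stand-in that assumes the noise percentage can be read as the largest fraction of mismatched positions allowed between a pattern and its occurrences. The names `discover_common_patterns`, `min_len`, `max_len` and `noise` are hypothetical and are not part of the tool. Candidates are taken from the first sequence, and a candidate is kept only if every sequence contains a close enough match, mirroring the search for patterns common to all sequences.

```python
from math import floor


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(pattern: str, seq: str) -> tuple[int, int]:
    """Return (start, mismatches) of the first window in seq closest to pattern.

    Returns (-1, len(pattern) + 1) if seq is shorter than the pattern.
    """
    k = len(pattern)
    best = (-1, k + 1)
    for start in range(len(seq) - k + 1):
        d = hamming(pattern, seq[start:start + k])
        if d < best[1]:
            best = (start, d)
    return best


def discover_common_patterns(seqs, min_len, max_len, noise):
    """Hypothetical stand-in for pattern discovery (NOT the tool's HMM method).

    A candidate is any substring of the first sequence whose length lies in
    [min_len, max_len]. It is kept if every sequence contains a window that
    differs from it in at most floor(noise * length) positions.
    Returns a list of (pattern, [start position in each sequence]).
    """
    found = []
    seen = set()
    for k in range(min_len, max_len + 1):
        allowed = floor(noise * k)  # assumed meaning of the noise setting
        for i in range(len(seqs[0]) - k + 1):
            cand = seqs[0][i:i + k]
            if cand in seen:
                continue
            seen.add(cand)
            hits = [best_hit(cand, s) for s in seqs]
            if all(start >= 0 and d <= allowed for start, d in hits):
                found.append((cand, [start for start, _ in hits]))
    return found


seqs = ["ACGTTGCAAC", "TTACGTAGG", "GGACGTCC"]
print(discover_common_patterns(seqs, 4, 4, 0.0))  # [('ACGT', [0, 2, 2])]
```

Raising `noise` to 0.25 allows one mismatch in a four-position window, so more and looser patterns are accepted. This is the direction of the trade-off described for the noise setting.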

Output

Each new pattern is represented as an annotation of type Region on the sequences. More information on each pattern found is available through the tooltip view, including detailed information on the position of the pattern and its quality scores.
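A background distribution matters for scoring because a pattern built from common residues is less surprising than one built from rare residues. One standard way to express this is the relative entropy (information content) of each pattern column with respect to a background distribution, summed over the columns. The sketch below assumes the occurrences of a pattern are already aligned as equal-length strings. The function name is hypothetical, and the quality scores shown in the tooltip are not documented here and may be computed differently.

```python
from collections import Counter
from math import log2


def relative_entropy_score(occurrences, background):
    """Hypothetical pattern score: sum over columns of the relative entropy
    (in bits) of observed residue frequencies versus a background distribution.

    occurrences: equal-length strings, one aligned occurrence per sequence.
    background: dict mapping every residue that can occur to its frequency.
    """
    n = len(occurrences)
    score = 0.0
    for col in range(len(occurrences[0])):
        counts = Counter(occ[col] for occ in occurrences)
        for residue, count in counts.items():
            f = count / n  # observed frequency of this residue in the column
            score += f * log2(f / background[residue])
    return score


uniform_dna = {b: 0.25 for b in "ACGT"}
print(relative_entropy_score(["ACGT", "ACGT"], uniform_dna))  # 8.0
```

With a skewed background such as `{"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}`, a fully conserved run of C scores higher than a fully conserved run of A, because C is rarer in the background. The same function works for amino acid alphabets; only the background dictionary changes.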

It is also possible to view all found patterns in one combined table. In this view, each pattern is listed with its scores, pattern quality, and position in the sequence.