CLC Genomics Workbench

Paired end reads - insertions and deletions

CLC Genomics Workbench includes a number of graphical options for identifying genomic insertions and deletions when the sequencing produces paired-end reads.

Finding insertions

One option is to show a graph of Single paired-ends reads (paired-end reads where only one of the reads matches).

When such a Single paired-ends reads graph suddenly rises and then falls again, an insertion might have occurred.

Screenshot 1
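
Outside the Workbench, the same rise-and-fall pattern can be looked for in exported data. The following is a minimal sketch, assuming the Single paired-ends reads graph is available as a plain list of per-position counts; the function name find_single_read_peaks and the threshold parameter are hypothetical and not part of CLC Genomics Workbench.

```python
def find_single_read_peaks(single_counts, threshold):
    """Return half-open (start, end) intervals where the count is >= threshold.

    single_counts: per-position number of single reads (assumed input format).
    threshold: hypothetical cutoff separating a "rise" from background noise.
    """
    regions = []
    start = None
    for pos, count in enumerate(single_counts):
        if count >= threshold and start is None:
            start = pos                      # the graph rises here
        elif count < threshold and start is not None:
            regions.append((start, pos))     # the graph falls again here
            start = None
    if start is not None:                    # the rise runs to the end of the data
        regions.append((start, len(single_counts)))
    return regions


peaks = find_single_read_peaks([0, 0, 1, 5, 6, 5, 1, 0], threshold=4)
```

Each reported interval is only a candidate region; it still needs to be inspected at read level, as described next.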

Zooming in on the reads shows how their color changes from blue (paired-ends) to green (single), meaning that from this point on, the reverse part of each paired-ends read no longer matches the reference sequence.

Screenshot 2: Zooming where the single reads kick in.

Since their reverse partners do not match the reference, this indicates an insertion in the sequenced data. Looking further down the contig, the color changes from green to a combination of red (only reverse reads match) and blue (paired-ends).

Screenshot 3: Zooming where the paired-ends reads kick in again.

The reverse reads colored in red have forward counterparts that do not match the reference sequence, for the same reason that we see the lonely forward reads before the insertion. Among the reverse reads, the "ordinary" paired-ends reads start again, marking the end of the insertion.
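
The color scheme described above can be summarized as a small lookup. This is only an illustrative sketch of the logic in the text, not how the Workbench is implemented; the function name read_color and its boolean arguments are assumptions made for the example.

```python
def read_color(forward_matches, reverse_matches):
    """Map the match status of a read pair to the color described in the text.

    Returns None when neither read matches, since such a pair would not be
    placed on the reference at all (an assumption of this sketch).
    """
    if forward_matches and reverse_matches:
        return "blue"    # ordinary paired-ends read
    if forward_matches:
        return "green"   # single: only the forward read matches
    if reverse_matches:
        return "red"     # single: only the reverse read matches
    return None


# Walking across an insertion: paired, forward-only, reverse-only, paired again.
colors = [read_color(f, r) for f, r in [(True, True), (True, False),
                                        (False, True), (True, True)]]
```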

Finding deletions

Deletions are easy to detect in CLC Genomics Workbench: they appear as areas with no coverage.

Screenshot 4: A deletion in the sequenced data results in coverage of 0.
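
The same check can be run on exported coverage values. The sketch below assumes the coverage is available as a list of per-position read depths; find_zero_coverage and min_length are hypothetical names introduced for illustration.

```python
from itertools import groupby


def find_zero_coverage(coverage, min_length=1):
    """Return half-open (start, end) intervals where coverage is 0.

    coverage: per-position read depth (assumed input format).
    min_length: hypothetical filter to ignore very short zero-coverage runs.
    """
    regions = []
    pos = 0
    # Group consecutive positions by whether their depth is zero.
    for is_zero, run in groupby(coverage, key=lambda depth: depth == 0):
        length = sum(1 for _ in run)
        if is_zero and length >= min_length:
            regions.append((pos, pos + length))
        pos += length
    return regions


gaps = find_zero_coverage([3, 4, 0, 0, 0, 5, 0, 2], min_length=2)
```

Note that zero coverage can have other causes than a deletion, so each interval is a candidate to confirm with the graphs described below.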

Depending on the size of the deletion, you will see a rise in other graphs as well:

  • A small deletion will result in an increase of the Paired-ends distance, because the gap between the forward and the reverse read, measured on the reference, is extended by the length of the deletion.
  • A larger deletion will result in an increase of Single paired-ends reads when the deletion is larger than the maximum distance allowed between paired-ends reads (because the "other" read of the pair has a match which is too far away). This maximum value can be changed when performing the assembly in CLC Genomics Workbench.
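
The two cases can be illustrated with a little arithmetic. The sketch below assumes that the distance observed on the reference is the distance in the sequenced sample plus the length of the deletion, and that a pair is reported as single reads once that observed distance exceeds the maximum. The function name and all numbers are hypothetical.

```python
def expected_signal(sample_distance, deletion_length, max_distance):
    """Predict which graph rises for a read pair spanning a deletion.

    sample_distance: distance between the two reads in the sequenced sample.
    deletion_length: number of reference bases missing from the sample.
    max_distance: maximum paired distance allowed during assembly.
    """
    observed = sample_distance + deletion_length  # distance as seen on the reference
    if observed <= max_distance:
        return "Paired-ends distance", observed
    return "Single paired-ends reads", observed


small = expected_signal(sample_distance=300, deletion_length=100, max_distance=500)
large = expected_signal(sample_distance=300, deletion_length=400, max_distance=500)
```

With these example values, the first pair still counts as paired at an observed distance of 400, while the second would span 700 and so exceeds the allowed maximum.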

When you zoom in on the deletion, you can see how the distance between the reads increases.

Screenshot 5: Each part of the pair still matches because the deletion is smaller than the maximum distance between the reads.