CLC Genomics Workbench

Multiplex Sequencing by Name

When you sequence different samples in a batch, you can use multiplexing techniques to run several samples in the same run. The data analysis challenge is then to separate the sequencing reads so that the reads from each sample are assembled together.

The CLC Genomics Workbench supports automatic grouping of samples for two multiplexing techniques:

  • By name: This supports grouping of reads based on their name.
  • By sequence tag: This supports grouping of reads based on information within the sequence (tagged sequences).

Sorting sequences by name

This functionality lets you group sequencing reads based on their file names. A typical example is a list of files named like this:
...
A02__Asp_F_016_2007-01-10
A02__Asp_R_016_2007-01-10
A02__Gln_F_016_2007-01-11
A02__Gln_R_016_2007-01-11
A03__Asp_F_031_2007-01-10
A03__Asp_R_031_2007-01-10
A03__Gln_F_031_2007-01-11
A03__Gln_R_031_2007-01-11
...

In this example, the names have five distinct parts (we take the first name as an example):

  • A02 which is the position on the 96-well plate
  • Asp which is the name of the gene being sequenced
  • F which describes the orientation of the read (forward/reverse)
  • 016 which is an ID identifying the sample
  • 2007-01-10 which is the date of the sequencing run

CLC Genomics Workbench allows you to separate the data and sort it into different sequence lists of your choice.

Three different types of separation are allowed:

  • Simple: This will simply use designated characters like Underscore, Dash, Tilde, etc. to split up the name.
  • Positions: You define a part of the name by entering the start and end positions, e.g. from character number 6 to 14.
  • Java regular expression: This is an option for advanced users where you can use a special syntax to have total control over the splitting.
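The three separation types above can be illustrated outside the Workbench with a minimal Python sketch. This is not the Workbench's own implementation: the function names are hypothetical, the empty part produced by the double underscore is dropped by assumption, positions are assumed to be 1-based and inclusive, and the regular expression is one possible pattern for the example names (it uses only syntax that Python and Java regular expressions share).

```python
import re

name = "A02__Asp_F_016_2007-01-10"


def split_simple(name, separators="_"):
    """Simple: split at every designated separator character.

    Assumption: empty parts (e.g. between the two underscores in
    "A02__Asp") are dropped.
    """
    pattern = "[" + re.escape(separators) + "]"
    return [part for part in re.split(pattern, name) if part]


def split_positions(name, start, end):
    """Positions: take the characters from `start` to `end`.

    Assumption: positions are 1-based and inclusive.
    """
    return name[start - 1:end]


# Java regular expression style: one capture group per part of the name.
# The pattern is a hypothetical example written for the names shown above.
NAME_PATTERN = re.compile(
    r"^([A-H]\d{2})__([A-Za-z]+)_([FR])_(\d+)_(\d{4}-\d{2}-\d{2})$"
)


def split_regex(name):
    """Return the captured parts, or an empty list if the name does not match."""
    match = NAME_PATTERN.match(name)
    return list(match.groups()) if match else []


print(split_simple(name))           # well, gene, orientation, sample ID, date
print(split_positions(name, 6, 14)) # gene, orientation and sample ID as one part
print(split_regex(name))
```

Splitting only at underscores keeps the date intact, because its parts are joined by dashes; adding the dash to `separators` would break the date into three parts.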

Once the name is split up into separate parts, you are free to choose which parts of the sequence name should be used for grouping the reads into separate sequence lists.

After this, each sequence list can be assembled separately.
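The grouping step can be sketched in the same way. The example below is a hedged illustration, not the Workbench's own code: it splits each of the example names at every underscore (dropping the empty part from the double underscore, by assumption) and groups the reads by sample ID and gene name. The function `group_reads` and its `key_indices` parameter are hypothetical names introduced for this illustration.

```python
from collections import defaultdict

READ_NAMES = [
    "A02__Asp_F_016_2007-01-10",
    "A02__Asp_R_016_2007-01-10",
    "A02__Gln_F_016_2007-01-11",
    "A02__Gln_R_016_2007-01-11",
    "A03__Asp_F_031_2007-01-10",
    "A03__Asp_R_031_2007-01-10",
    "A03__Gln_F_031_2007-01-11",
    "A03__Gln_R_031_2007-01-11",
]


def group_reads(names, key_indices, separator="_"):
    """Group read names by the selected parts of each name.

    `key_indices` lists the 0-based indices of the parts that form the
    group key. Assumption: empty parts are dropped before indexing.
    """
    groups = defaultdict(list)
    for name in names:
        parts = [part for part in name.split(separator) if part]
        key = tuple(parts[i] for i in key_indices)
        groups[key].append(name)
    return dict(groups)


# Parts after splitting: 0 = well, 1 = gene, 2 = orientation,
# 3 = sample ID, 4 = date. Group by sample ID and gene name.
groups = group_reads(READ_NAMES, key_indices=(3, 1))
for key, members in groups.items():
    print(key, members)
```

With this choice of key, the forward and reverse reads of the same gene and sample end up in the same group, which corresponds to one sequence list per sample and gene.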

Screenshot: Splitting up the name at every underscore (_) and using the sample ID and gene name for grouping.