The read mapper "clc_mapper" has been replaced by a new memory efficient mapper. The quality and speed of the read mapping matches the previous mapper.
A beta version of the memory efficient read mapper that supports affine gap cost is available under the name "clc_mapper_beta".
The clc_mapper has an updated interface with improved feedback to the user and colors.
The read mapper now accepts paired reads with a distance of up to 100kbp.
For machines with many cores the performance of the k-mer counting step in the de novo assembler has been improved.
FASTQ quality score offset is now auto-detected in the tools "clc_quality_trim" and "clc_overlap_reads".
Added support for gzipped reference sequences.
Improved the speed of the legacy read mapper on gzipped input files.
castosam now reports quality scores that are compatible with samtools.
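As an illustration of what auto-detecting a FASTQ quality score offset involves, here is a minimal sketch in Python. It is not the method used by clc_quality_trim or clc_overlap_reads, which these notes do not describe; the function name and the cutoff of ASCII 59 are assumptions based on the commonly used Phred+33 and Phred+64 (including Solexa+64) encodings.

```python
def detect_fastq_offset(quality_lines):
    """Guess the quality score offset (33 or 64) from FASTQ quality strings.

    Assumption: characters below ASCII 59 cannot occur in any +64 encoding,
    so seeing one implies offset 33. Otherwise offset 64 is guessed, which
    can be wrong for offset-33 data containing only very high qualities.
    """
    lowest = min((ord(ch) for line in quality_lines for ch in line), default=None)
    if lowest is None:
        raise ValueError("no quality characters to inspect")
    return 33 if lowest < 59 else 64


# Hypothetical quality strings, one per read.
print(detect_fastq_offset(["IIIIHH#", "IIGGFF5"]))
print(detect_fastq_offset(["hhhhgg", "ffeehh"]))
```

A real detector would typically inspect many reads before deciding, since a small sample may not contain any low quality scores.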
- The progress indicator for the de novo assembler now behaves correctly in the interval 90-92%.
- Fixed a collision of temporary file names that could cause the de novo assembler and read mapper to crash.
- The default gap cost in clc_overlap_reads is now set to 3 as stated in the documentation. It used to be 2.
- The SCARF read format is now deprecated and will be removed in future versions of the CLC Assembly Cell.
The clc_find_variations tool is now deprecated. Functionality for computing consensus sequences will be made available in a separate tool, while the variant detection functionality will be removed in future versions of the CLC Assembly Cell.
Support for 32 bit versions of the Apple OS X operating system is deprecated and a distribution of the CLC Assembly Cell will not be available for this platform in the future.
- The clc_mapping_viewer tool has been removed.
- Fixed a bug in the de novo assembler which caused circular contigs to be output with the wrong sequence.
- In the next major release of the Assembly Cell the program clc_mapping_viewer will be removed.
- Fixed a bug in the de novo assembler that caused circular contigs containing scaffolds to be output with a negative scaffold size in AGP format.
- Fixed a bug in the read mapper that could result in a crash for reads longer than 20Kbp.
- Fixed a bug that caused the mapper to report wrong paired read distances when mapping to circular genomes.
- From version 4.3 of the Assembly Cell the program clc_mapping_viewer will be removed.
- clc_agp_join: a tool for joining contigs with N's using AGP-formatted scaffolding information. Useful if you want both the contigs and scaffolds from a single run of the de novo assembler.
- The de novo assembler is now able to output contigs with scaffolding information in a format that can be validated by the NCBI AGP validator.
- Added an option to keep paired reads together in clc_convert_sequences.
The read mapper will now place ambiguous gaps to the left, as opposed to the right, to ensure better concordance with common variant databases.
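To illustrate what placing an ambiguous gap to the left means, here is a minimal sketch in Python for the deletion case. It is not the read mapper's implementation; the function name and the coordinate convention (0-based start, deletion covering ref[start:start + length]) are assumptions made for this example.

```python
def left_align_deletion(ref, start, length):
    """Shift a deletion as far left as possible without changing the result.

    The deletion covers ref[start:start + length]. It can move one base to
    the left whenever the base just before the gap equals the last deleted
    base, because the sequence left after deletion is then identical.
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def apply_deletion(ref, start, length):
    """Return the reference with ref[start:start + length] removed."""
    return ref[:start] + ref[start + length:]


ref = "GCACACAT"
original_start = 5                      # deletes "CA" near the right end of the repeat
aligned_start = left_align_deletion(ref, original_start, 2)
# Both placements produce the same sequence; only the reported position differs.
print(aligned_start, apply_deletion(ref, aligned_start, 2))
```

In a repeat such as "CACACA", every placement of the two-base deletion yields the same sequence, so reporting the leftmost one gives a single consistent position to compare against variant databases that follow the same convention.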
- Fixed bugs which caused missing or misplaced scaffold annotations in circular contigs.
- Fixed a bug that caused contigs with a length smaller than the minimum contig length threshold to be output from the de novo assembler.
- Fixed a bug which caused some scaffolds to have a gap size of 0.
- Fixed a bug in clc_cas_to_sam which caused inconsistent output when the "-t" parameter was used.
- Added missing paired option to clc_convert_sequences.
- Various small bug fixes.
- Binaries for Linux are now compiled with glibc 2.5. This means that the system requirements for Linux have changed. From this release, SuSE is supported from version 10.2. This was previously version 10.0.
- A new tool for extracting a random subset of reads called clc_sample_reads
- clc_remove_duplicates, a tool for identifying and removing duplicated reads in datasets, is now out of beta and a part of the Assembly Cell
- The number of N's output by the de novo assembler has been further reduced and now N's primarily occur in assemblies when the scaffolder is used.
- Improved the performance of the Duplicate Removal tool to make it scale to larger datasets.
- clc_cas_to_sam now outputs information on the number and types of mismatches.
- Fixed crashes when outputting contigs in the de novo assembler when paired reads were used as input.
- Fixed an issue where circular contigs were extended too much by the de novo assembler.
- Fixed a crash in clc_cas_to_sam.
- Various small bug fixes.
- Fixed a bug causing clc_cas_to_sam to crash.
- Fixed bug which caused the de novo assembler to crash or go into an infinite loop when outputting contigs generated from paired reads.
- Fixed compatibility issues for the clc_mapping_viewer on 64-bit Windows platforms.
- Fixed read mapper errors.
- Fixed de novo assembly error.
New de novo assembler
- Scaffolding is integrated into the assembly. This means better resolution of contigs and insertion of Ns when two contigs cannot be joined in sequence but there is pair information that connects them.
- Automatic paired distance estimation: Using the -e option, the de novo assembler will estimate the fragment size of your paired data.
- Improved use of unpaired reads for resolving ambiguities in the de Bruijn Graph.
- Various improvements of the assembly quality.
- New parameter for specifying the maximum bubble size. There is a default value which is automatically calculated based on the input data.
- New white paper with benchmarks and results from quality control.
- Bug fix: Fixed a bug in the de novo assembler which caused an increased number of N's in the results, because the sequence of the read that spanned contigs was not looked up correctly. The de novo assembler now produces far fewer N's for low coverage assemblies.
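The automatic paired distance estimation listed above is not specified in detail in these notes. As a rough illustration of the general idea only, the following Python sketch derives a distance interval from observed pair distances using the median and the median absolute deviation. The function name, the spread factor, and the sample values are assumptions, and the assembler's actual method may differ.

```python
import statistics


def estimate_distance_range(distances, spread=3.0):
    """Estimate a (low, high) fragment size interval from pair distances.

    Uses median +/- spread * sigma, where sigma is derived from the median
    absolute deviation (MAD). This keeps a few wildly wrong pairs from
    widening the interval the way a mean and standard deviation would.
    """
    if not distances:
        raise ValueError("no pair distances given")
    center = statistics.median(distances)
    mad = statistics.median([abs(d - center) for d in distances])
    sigma = 1.4826 * mad  # scales MAD to a standard deviation for normal data
    low = max(0.0, center - spread * sigma)
    high = center + spread * sigma
    return low, high


# Hypothetical observed distances; the 5000 is a chimeric or mismapped pair.
observed = [290, 300, 310, 295, 305, 5000]
low, high = estimate_distance_range(observed)
print(f"estimated paired distance interval: {low:.0f}-{high:.0f}")
```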
New read mapper
- Great improvement of speed for mapping (see whitepaper for more details on speed and quality)
- Support for complex genomes with many repeats
- The previous read mapper is still included as a legacy version to allow color space mapping which is not supported in the new mapper.
- The forward only mode of the clc_mapper now also works for paired reads.
Updated naming of tools
We have updated the names of the tools to be more consistent, and to reflect the use of "mapping" rather than "assembly" throughout the software. We have provided a helper script to assist updating existing scripts based on the old naming scheme. Read more here.
Licensing
A new license tool is included that will make it very easy to:
- Download a license based on a license order ID. This would previously require some email exchange with CLC bio but can now be done in one go without involvement of CLC bio.
- Request and download an evaluation license directly.
Furthermore, it is now checked if the license is valid for the particular version of the CLC Assembly Cell.
A new restriction has been added for running the CLC Assembly Cell on large computers: if the system has more than 64 cores (hyper threaded cores), it will not be able to run with a static license. In this case, a network license is needed.
You can now trim adapters from sequencing reads prior to assembly or mapping. Read more here.
- Added support for read group information in castosam.
- Added support for non-specific reads in castosam
- Added progress on castosam and samtocas
- sort_pairs auto-detects input files. It now supports SOLiD paired-end and Ion Torrent files.
- Proper out of memory error messages are shown if a tool runs out of memory
- Various bug fixes
- Fixed problem with read mapping on computers with Japanese Windows.
For a complete list of older releases, visit the CLC Assembly release archive.