- Fixed a bug in the clc_correct_pacbio_reads tool that caused the tool to crash when run on a machine without SSE3 support.
- Fixed a very rare bug in the clc_mapper tool that sometimes caused mapping positions to be reported multiple times.
- Fixed a bug in the clc_cell_licutil licensing tool that caused the evaluation licenses generated by the tool to be invalid for the current version of CLC Assembly Cell.
New tools and features
- clc_correct_pacbio_reads: correct PacBio reads as a preprocessing step before assembling them.
- clc_assembler_long: de novo assemble PacBio reads that have already been corrected using, for example, clc_correct_pacbio_reads.
- clc_mapper: This update
  - Greatly improves read mapping quality with long reads (PacBio or Oxford Nanopore).
  - Returns the same read mappings as before with short reads (Illumina and Ion Torrent), apart from a few minor bugs that have been fixed.
  - Slightly improves read mappings with longer reads (454).
  - Allows the user to set the match score with the new --matchscore parameter.
- clc_find_variants: has been deprecated and replaced by clc_extract_consensus since CLC Assembly Cell version 4.2.
- Fixed a bug that caused adapters to only be searched in the forward direction when specified for only odd-numbered or only even-numbered reads.
- Fixed a bug that messed up quality scores when reading from two interleaving paired files.
- Two consecutive mismatches in the MD field of SAM/BAM files produced by clc_cas_to_sam are now separated by a "0", which they should be according to the SAM format specification.
- Fixed a bug in clc_cas_to_sam that caused soft clipped ends to be included in the count for the NM field in SAM/BAM files produced by the tool.
- Fixed a bug that caused the license check to fail on Mac OS X 10.11.3 and higher.
- Fixed a bug that made the clc_assembler tool fail when the "--interleave" option was used.
- Fixed a bug in the clc_extract_consensus that caused the tool to not recognize the option "-r unknown".
- Fixed a bug in clc_cas_to_sam that caused soft clipped ends to be included in the count for the NM field in the SAM/BAM files produced by the tool.
- Two consecutive mismatches in SAM/BAM files produced by clc_cas_to_sam are now separated by a "0", which they should be according to the SAM format specification.
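To illustrate the two clc_cas_to_sam fixes above, the following Python sketch derives MD and NM values from a gapped pairwise alignment. It is not the tool's implementation: the function name md_and_nm and the gapped-string input are assumptions made for this example. It shows the "0" that separates adjacent mismatches in MD, and why soft-clipped bases, which are not part of the alignment, do not count toward NM.

```python
from typing import Tuple


def md_and_nm(ref_aln: str, read_aln: str) -> Tuple[str, int]:
    """Return (MD string, NM count) for the aligned part of a read.

    Both arguments are gapped alignment rows of equal length, with '-'
    marking a gap. Soft-clipped bases must be left out by the caller:
    they are not aligned, so they affect neither MD nor NM.
    (Hypothetical helper, written for illustration only.)
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("alignment rows must have the same length")

    parts = []           # pieces of the MD string
    run = 0              # matching bases since the last MD item
    nm = 0               # mismatches + inserted bases + deleted bases
    in_deletion = False  # True while extending a '^' deletion item

    for ref_base, read_base in zip(ref_aln.upper(), read_aln.upper()):
        if ref_base == "-":
            # Insertion in the read: counts toward NM, invisible in MD.
            nm += 1
            continue
        if read_base == "-":
            # Deletion from the reference: '^' followed by the deleted bases.
            nm += 1
            if in_deletion:
                parts.append(ref_base)
            else:
                parts.append(f"{run}^{ref_base}")
                run = 0
                in_deletion = True
            continue
        in_deletion = False
        if ref_base == read_base:
            run += 1
        else:
            # Mismatch: always preceded by a match count, even if it is 0.
            nm += 1
            parts.append(f"{run}{ref_base}")
            run = 0

    parts.append(str(run))
    return "".join(parts), nm


if __name__ == "__main__":
    # Two adjacent mismatches (G>T, T>A) are separated by "0".
    print(md_and_nm("ACGTACGT", "ACTAACGT"))  # ('2G0T4', 2)
```

Because the explicit match count is always written, a parser can tell two adjacent mismatches apart from a multi-base item without guessing.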
Fixed a read mapper bug that caused some reads to be incorrectly reported as unmapped when global alignment was selected.
Fixed an issue that could occur when mapping paired-end reads, where pairs were erroneously reported as broken when the fragment size derived from the alignments of the two ends of the pair was longer than the reference sequence.
Fixed a bug that caused the mapper to enter an infinite loop if a reference of length 0 was used.
- Fixed a rare bug that sometimes made the read mapper halt prematurely when several seeds were identified at the same reference position.
- Fixed a bug that caused an out of memory error when very long reference sequences were used in the read mapper.
- Fixed a bug that made the built-in license check fail when executing jobs with very long commands.
- The upper limit for the length of each reference sequence in the read mapper was increased to 2 billion bases.
- The clc_find_variations tool has now been moved to the deprecated folder and will not be part of future releases.
- A new tool named clc_extract_consensus has been introduced which can extract consensus sequences from a cas file.
- The default read mapper now supports affine gap cost.
- Maximal length of reference sequences has been increased from 1Gbp to 2Gbp.
- Introduced an option in clc_overlap_reads for automatic removal of adapter sequences in overlapping reads.
- Introduced an option in clc_filter_matches for removing both reads of a read pair if just one of the reads fails to pass the filter criteria.
- When performing a read mapping with affine gap cost, alternating gaps in a read and the reference are now prevented by default.
- Progress reporting in the read mapper and de novo assembler is now smoother.
- clc_cas_to_sam now supports all integer types for the NH tag which fixes the erroneous error message "Wrong type for CS tag" in most cases.
- Fixed a rare crash in clc_mapping_table and clc_remove_duplicates.
- Fixed a bug which could cause the read mapper to report sub-optimal alignments when using affine gap cost in rare cases.
- Fixed a crash for the read mapper which could occur when the input read file contained several reads shorter than 15bp.
- Fixed a crash that occurred on older CPUs when using very long reference sequences.
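For readers unfamiliar with the affine gap cost mentioned in the notes above, here is a minimal Python sketch of the general idea. The cost values and the convention used (an opening cost plus a per-base extension cost) are assumptions chosen for illustration; they are not the read mapper's defaults or its exact scoring formula.

```python
def linear_gap_cost(length: int, per_base: int = 3) -> int:
    """Linear model: every gapped base costs the same (illustrative value)."""
    return per_base * length


def affine_gap_cost(length: int, gap_open: int = 5, gap_extend: int = 1) -> int:
    """Affine model: pay once to open a gap, then a smaller cost per base.

    Conventions differ between tools; this sketch charges the extension
    cost for every base of the gap, including the first one.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


if __name__ == "__main__":
    # One 4-base gap versus four separate 1-base gaps.
    print(linear_gap_cost(4), 4 * linear_gap_cost(1))  # 12 12
    print(affine_gap_cost(4), 4 * affine_gap_cost(1))  # 9 24
```

Under the linear model the two layouts cost the same, while the affine model favors a single longer gap over several scattered ones.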
The read mapper "clc_mapper" has been replaced by a new memory efficient mapper. The quality and speed of the read mapping matches the previous mapper.
A beta version of the memory efficient read mapper that supports affine gap cost is available under the name "clc_mapper_beta".
The clc_mapper has an updated interface with colors and improved feedback to the user.
The read mapper now accepts paired reads with a distance of up to 100kbp.
For machines with many cores the performance of the k-mer counting step in the de novo assembler has been improved.
FASTQ quality score offset is now auto-detected in the tools "clc_quality_trim" and "clc_overlap_reads".
Added support for gzipped reference sequences.
Improved the speed of the legacy read mapper on gzipped input files.
castosam now reports quality scores that are compatible with samtools.
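The FASTQ quality score offset auto-detection mentioned above for clc_quality_trim and clc_overlap_reads can be pictured with a common heuristic based on the range of quality characters. The sketch below is an assumption about how such detection may work in general, not a description of what the CLC tools do; the function name guess_phred_offset and the thresholds are illustrative.

```python
from typing import Iterable, Optional


def guess_phred_offset(quality_lines: Iterable[str]) -> Optional[int]:
    """Guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Returns None when the observed characters fit both encodings.
    (Hypothetical helper, written for illustration only.)
    """
    codes = [ord(c) for line in quality_lines for c in line.strip()]
    if not codes:
        raise ValueError("no quality characters supplied")

    lowest, highest = min(codes), max(codes)
    # ';' (59) is the lowest character in the offset-64 family
    # (Solexa score -5), so anything below it implies offset 33.
    if lowest < 59:
        return 33
    # 'J' (74) is Phred 41 at offset 33, a usual upper bound for
    # Illumina 1.8+ data. Higher characters suggest offset 64, although
    # some platforms emit higher offset-33 scores, so this is a heuristic.
    if highest > 74:
        return 64
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIII#!5"]))      # 33
    print(guess_phred_offset(["hhhhdddd"]))     # 64
    print(guess_phred_offset(["@ABCDEFGHIJ"]))  # None
```

A real detector would typically scan only the first few thousand reads and fall back to a user-supplied offset when the result is ambiguous.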
- The progress indicator for the de novo assembler now behaves correctly in the interval 90-92%.
- Fixed a collision of temporary file names that could cause the de novo assembler and read mapper to crash.
- The default gap cost in clc_overlap_reads is now set to 3 as stated in the documentation. It used to be 2.
- The SCARF read format is now deprecated and will be removed in future versions of the CLC Assembly Cell.
The clc_find_variations tool is now deprecated. Functionality for computing consensus sequences will be made available in a separate tool, while the variant detection functionality will be removed in future versions of the CLC Assembly Cell.
Support for 32 bit versions of the Apple OS X operating system is deprecated and a distribution of the CLC Assembly Cell will not be available for this platform in the future.
- Fixed a bug in the de novo assembler which caused circular contigs to be output with the wrong sequence.
- In the next major release of the Assembly Cell the program clc_mapping_viewer will be removed.
- Fixed a bug in the de novo assembler that caused circular contigs containing scaffolds to be output with a negative scaffold size in AGP format.
- Fixed a bug in the read mapper that could result in a crash for reads longer than 20Kbp.
- Fixed a bug that caused the mapper to report wrong paired read distances when mapping to circular genomes.
- In the next major release of the Assembly Cell the program clc_mapping_viewer will be removed.
- A new tool, clc_sample_reads, for extracting a random subset of reads.
- clc_remove_duplicates, a tool for identifying and removing duplicated reads in datasets, is now out of beta and a part of the Assembly Cell.
- The number of N's output by the de novo assembler has been further reduced and now N's primarily occur in assemblies when the scaffolder is used.
- Improved the performance of the Duplicate Removal tool to make it scale to larger datasets.
- clc_cas_to_sam now outputs information on the number and types of mismatches.
- Fixed crashes when outputting contigs in the de novo assembler when paired reads were used as input.
- Fixed an issue where circular contigs were extended too much by the de novo assembler.
- Fixed a crash in clc_cas_to_sam.
- Various small bug fixes.
- Fixed a bug causing clc_cas_to_sam to crash.
- Fixed a bug that caused the de novo assembler to crash or go into an infinite loop when outputting contigs generated from paired reads.
- Fixed compatibility issues for the clc mapping viewer on 64-bit Windows platforms.
- Fixed read mapper errors.
- Fixed de novo assembly error.
New de novo assembler
- Scaffolding is integrated into the assembly. This means better resolution of contigs and insertion of Ns when two contigs cannot be joined in sequence but there is pair information that connects them.
- Automatic paired distance estimation: Using the -e option, the de novo assembler will estimate the fragment size of your paired data.
- Improved use of unpaired reads for resolving ambiguities in the de Bruijn Graph.
- Various improvements of the assembly quality.
- New parameter for specifying the maximum bubble size. There is a default value which is automatically calculated based on the input data.
- New white paper with benchmarks and results from quality control.
- Bug fix: Fixed a bug in the de novo assembler that increased the number of N's in the results because the sequence of a read spanning contigs was not looked up correctly. The de novo assembler now produces far fewer N's for low coverage assemblies.
New read mapper
- Greatly improved mapping speed (see the white paper for more details on speed and quality).
- Support for complex genomes with many repeats
- The previous read mapper is still included as a legacy version to allow color space mapping which is not supported in the new mapper.
- The forward only mode of the clc_mapper now also works for paired reads.
Updated naming of tools
We have updated the names of the tools to be more consistent, and to reflect the use of "mapping" rather than "assembly" throughout the software. We have provided a helper script to assist updating existing scripts based on the old naming scheme. Read more here.
Licensing
A new license tool is included that will make it very easy to:
- Download a license based on a license order ID. This would previously require some email exchange with CLC bio but can now be done in one go without involvement of CLC bio.
- Request and download an evaluation license directly.
Furthermore, it is now checked if the license is valid for the particular version of the CLC Assembly Cell.
A new restriction has been added for running the CLC Assembly Cell on large computers: if the system has more than 64 cores (counting hyper-threaded cores), it will not be able to run with a static license. In this case, a network license is needed.
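To check ahead of time whether a machine is likely to fall under this restriction, a small Python sketch such as the following may help. It assumes that the logical core count reported by the operating system matches what the license check counts, which the release note does not spell out; the function name needs_network_license is illustrative.

```python
import os

# Limit stated in the release note above.
STATIC_LICENSE_MAX_CORES = 64


def needs_network_license(logical_cores: int) -> bool:
    """Return True if the core count exceeds the static license limit."""
    return logical_cores > STATIC_LICENSE_MAX_CORES


if __name__ == "__main__":
    # os.cpu_count() reports logical cores, so hyper-threaded cores are
    # included; it can return None if the count cannot be determined.
    cores = os.cpu_count() or 1
    if needs_network_license(cores):
        print(f"{cores} logical cores: a network license is likely needed.")
    else:
        print(f"{cores} logical cores: a static license should be usable.")
```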
You can now trim adapters from sequencing reads prior to assembly or mapping. Read more here.
- Added support for read group information in castosam.
- Added support for non-specific reads in castosam.
- Added progress reporting to castosam and samtocas.
- sort_pairs auto-detects input files and now supports SOLiD paired-end and Ion Torrent files.
- Proper out of memory error messages are shown if a tool runs out of memory.
- Various bug fixes.
- Fixed problem with read mapping on computers with Japanese Windows.
For a complete list of older releases, visit the CLC Assembly release archive.