CLC Bioinformatics Database
CLC Bioinformatics Database is a smart and efficient solution for managing centralized bioinformatics data in a 3-tier client/server architecture. The server contains one or more databases, and the clients are CLC Workbenches or could even be your existing applications.
The base components of the solution
- A database management system of choice: Microsoft SQL Server, Oracle, PostgreSQL, or MySQL.
- CLC Workbenches (clients) for interacting with the database.
- Thin client for administration purposes, including upload and download of data.
- CLC Database Middleware to ensure scalability and performance.
The most important benefits
- Seamless sequence data management
- Powerful data mining
- Flexible access control system and privilege management
- Custom metadata management
- Advanced data support
- Cross platform support on the client side (Windows, Mac OS X, and Linux)
- Mature API for customization and integration
Shared file system vs. CLC Bioinformatics Database
|Shared File System (standard in CLC Workbenches)||CLC Bioinformatics Database|
|Centralized storage of data|
|Flexible access control system and privilege management|
|Advanced custom metadata management|
|Mature data mining based on metadata, including custom metadata|
|Web-based access to data, including data upload and download|
|3-tier data management to promote scalability and performance|
|Facilitates DBMS features such as database clusters|
|Support for existing IT infrastructure|
The CLC Bioinformatics Database includes the following features:
- Web Client Access
- CLC Workbench Access
- LDAP/Active Directory Support
- Database API
- Multi Session Support (250 concurrent sessions)
CLC Bioinformatics Database lets you attach an extra layer of data to any data object in the database, such as primers, reports, alignments, or raw sequence data. This extra layer, the metadata, can be defined freely by the database users as needed. Examples include:
- Physical location of primers and vectors
- Freezer position
- Number of samples at a given location
- “Has the report or result been quality checked?” (yes/no)
- LIMS identifiers
- Research project ID
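As a rough illustration of the idea, the sketch below models data objects that carry free-form metadata and filters them on a single field. It is not the product's data model: the `DataObject` class, the `find_by_metadata` helper, and all field names and values are hypothetical and chosen only to mirror the examples listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

MetadataValue = Union[str, int, bool]


@dataclass
class DataObject:
    """Hypothetical stored object (primer, report, ...) with free-form metadata."""
    name: str
    kind: str
    metadata: Dict[str, MetadataValue] = field(default_factory=dict)


def find_by_metadata(objects: List[DataObject], key: str,
                     value: MetadataValue) -> List[DataObject]:
    """Return the objects whose metadata field `key` equals `value`."""
    return [obj for obj in objects if obj.metadata.get(key) == value]


# Made-up example records using the kinds of fields listed above.
objects = [
    DataObject("primer_fw_01", "primer",
               {"freezer_position": "F2-shelf3", "project_id": "PRJ-7",
                "sample_count": 12}),
    DataObject("assembly_report", "report",
               {"quality_checked": False, "project_id": "PRJ-7"}),
    DataObject("vector_map", "vector",
               {"freezer_position": "F1-shelf1", "quality_checked": True}),
]

# Which results have not been quality checked yet?
unchecked = find_by_metadata(objects, "quality_checked", False)
print([obj.name for obj in unchecked])
```

Objects that lack a given field are simply skipped, which matches the idea that metadata fields are optional and defined by the users.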
Powerful data mining
The metadata are searchable by any user who has access to the underlying data. This makes it easy to get an overview of, for example, the content of a freezer, the data related to a given research project, or research results that have not yet been quality checked.
A few examples of data mining:
- Simple search, where you just type a word or a phrase (like Google)
- Simple search with AND/OR operations and wildcards (“*”). You can even search for “something like” or “almost” hits. For example, “Numan~” would include objects starting with “Human” in the results. Users can use the built-in “guide” system for more advanced queries.
- Using a GUI approach to searching, you can build an arbitrarily complicated search expression. The concept is easy to use since the user does not need to know about technical details like “*” or “~”. GUI-based searches can be stored in the database for later use or for sharing with other users.
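To make the wildcard and “almost” matching described above concrete, here is a minimal sketch using only the Python standard library. It is not how CLC Bioinformatics Database implements its search. The function names, the similarity threshold of 0.75, and the word-by-word comparison are assumptions made for illustration, and AND/OR operators are left out.

```python
import fnmatch
from difflib import SequenceMatcher


def wildcard_match(pattern: str, text: str) -> bool:
    """Case-insensitive match where '*' stands for any run of characters.

    Note: fnmatch also treats '?' and '[...]' as special characters.
    """
    return fnmatch.fnmatchcase(text.lower(), pattern.lower())


def fuzzy_match(term: str, text: str, threshold: float = 0.75) -> bool:
    """True if any word in `text` is similar enough to `term` (assumed threshold)."""
    term = term.lower()
    return any(
        SequenceMatcher(None, term, word.lower()).ratio() >= threshold
        for word in text.split()
    )


def matches(query: str, text: str) -> bool:
    """Trailing '~' means fuzzy, '*' means wildcard, otherwise substring search."""
    if query.endswith("~"):
        return fuzzy_match(query[:-1], text)
    if "*" in query:
        return wildcard_match(query, text)
    return query.lower() in text.lower()


title = "Human insulin gene"
print(matches("Numan~", title))   # fuzzy: "Numan" is one letter away from "Human"
print(matches("hum*", title))     # wildcard: title starts with "hum"
print(matches("insulin", title))  # simple search: plain substring
```

In this sketch a misspelled query such as “Numan~” still finds “Human” because the two words share four of five letters in sequence, while unrelated words fall well below the threshold.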
CLC Bioinformatics Database includes the option of defining metadata like freezer position etc. These data are searchable like any other data.
Data mining in multiple environments
- Using a CLC Workbench
- Using a web browser
- Using other applications integrated with the database through our Application Programming Interface (API).
For each data object, all metadata values can be viewed and edited from within the Workbench or from a web client.
Seamless sequence data management
Centralized data is vital to increasing the knowledge level and efficiency of work in almost any organization.
In many organizations, however, essential research data is stored on several computers in various formats. Even employees’ personal computers often contain essential and confidential data.
With CLC Bioinformatics Database, data are easily shared between all users of CLC bio’s Workbenches across departments, across organizations, and across geographical boundaries.
This cross platform multi-user environment facilitates new levels of collaboration between researchers, and it eliminates tedious data communication using different kinds of data media like USB sticks, CDs, e-mails, and the like.
The result is a much more efficient work-flow and realization of even better research quality than before.
Organized for a clear overview
In the CLC Bioinformatics Database solution, data is organized using the “Folder/File” metaphor known from Microsoft Windows, Mac OS, Unix, and Linux. This has proven to be a very good storage and browsing concept for all kinds of data, and it is easy to use and easy to learn for new users.
This concept of data structuring enables smooth transfer (drag-and-drop) of data from database to file-system and vice versa.
Upload and Download
Uploading and downloading data in nearly any format – including all major high-throughput sequencing formats – is one of the most important functionalities of any database.
With the CLC approach of importing data to the database, it has never been easier – in the Workbench you simply drag a folder of data of almost any format directly from your desktop to a database “folder”. And that’s it. Exporting data in almost any format is just as easy.
Multiple interfaces for uploading and downloading data
Another option is to use a web browser to upload and download data from any location, whether or not there is a Workbench installed on the computer.
Other, more automated import/export actions can be performed using either the CLC Workbenches or other applications integrated with the database through our Application Programming Interface (API).
Any kind of data may be stored in CLC Bioinformatics Database. It has native application support for the following data formats.
Supported data formats
- Phylip Alignments (.phy)
- Macromolecular Crystallographic Info File (.cif)
- Clustal Alignment (.aln)
- Embl (.emb)
- FASTA (.fsa)
- Vector NTI (.ma4, .pa4, .oa4)
- Gene Construction Kit File (.gcc)
- Blast Db (.phr, .pal, .nhr, .nal)
- GCG Sequence (.gcg)
- GenBank (.gbk)
- Lasergene sequence (.pro, .seq)
- GCG Alignment (.msf)
- Newick (.nwk, .newick)
- Protein Data Bank (.pdb)
- PIR (.pir)
- CLC (.clc)
- Staden Sequence (.sdn)
- DNA Strider files (.str)
- SwissProt (.swp)
- Plain Text (.txt)
- Trace files (.abi, .ab1, .scf, .phd)
- Zip files (.zip)
- CT File
High-throughput sequencing formats
- Roche 454
- SAM/BAM mapping files
- Tabular mapping
Virtually any format can be supported using our Application Programming Interface.
Flexible access control and privilege management
The database solution includes traditional support for access permissions. Each user has a unique username and password. Basically, users can have “no access”, “read access”, and/or “write access” to specified areas of data.
Access rights and permission settings
The access rights are defined at the folder level of the data structure. Different access rights can be given to each folder, resulting in a very flexible data security architecture.
From an administrative point of view this is very important since it gives even more control of the data and who can edit it. The access privilege model is based on users, user-groups, and privileges attached to the “directories” in the database.
The directory of users can be either the CLC Authentication Directory (running in the same DB instance as the Bioinformatics Database) or an LDAP or Microsoft Active Directory service.
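The model of users, user groups, and privileges attached to folders can be sketched as follows. This is a simplified illustration, not the product's implementation: the `Access` flags, the `GROUPS` and `FOLDER_PRIVILEGES` tables, and the rule that a user's effective rights are the union of the rights of all their groups are assumptions. Privileges are looked up on the exact folder only, and no inheritance between folders is modelled.

```python
from enum import Flag
from typing import Dict, Set


class Access(Flag):
    """Read and write are modelled as independent flags (an assumption)."""
    NONE = 0
    READ = 1
    WRITE = 2


# Hypothetical directory of users and the groups they belong to.
GROUPS: Dict[str, Set[str]] = {
    "alice": {"sequencing", "admins"},
    "bob": {"sequencing"},
}

# Hypothetical privileges attached to folders, keyed by group name.
FOLDER_PRIVILEGES: Dict[str, Dict[str, Access]] = {
    "/projects/prj7": {"sequencing": Access.READ, "admins": Access.WRITE},
    "/projects/confidential": {"admins": Access.READ | Access.WRITE},
}


def effective_access(user: str, folder: str) -> Access:
    """Union of the privileges that the user's groups hold on this folder."""
    privileges = FOLDER_PRIVILEGES.get(folder, {})
    result = Access.NONE
    for group in GROUPS.get(user, set()):
        result |= privileges.get(group, Access.NONE)
    return result


print(effective_access("alice", "/projects/prj7"))
print(effective_access("bob", "/projects/confidential"))
print(Access.WRITE in effective_access("bob", "/projects/prj7"))
```

Unknown users and folders without any attached privileges fall back to “no access”, which is the conservative default for a sketch like this.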
Customization and Integration
Integrating custom database schemes can be done by implementing a plugin in the CLC Database Middleware.
Customizations are developed by the customer, by CLC bio, or by a joint team of CLC bio and customer employees.
The SOAP API
CLC Bioinformatics Database Middleware exposes an Application Programming Interface (SOAP API) that makes it one of the most flexible database solutions on the market. It enables the following features, all of which can be implemented outside a CLC Workbench:
- Import/export of data to/from the database by creating your own client scripts or applications.
- Access and modification of any data structure and data content in the database
- Powerful data mining
- Migration of data from existing databases to CLC Bioinformatics Database
- Integration of third party technologies/tools with the database
- Metadata management
The CLC Workbenches also come with an API (the Software Developer Kit). This ensures the option of full integration of the CLC Workbenches with existing databases.
These features can be used from the command line, from a script, or directly from a program, making it possible to carry out batch operations and to integrate the database with almost any other system.
Example: CLC Database Integration Point and custom Integration Point (marked in red).
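For readers unfamiliar with SOAP, the sketch below shows what building a request from a script could look like using only the Python standard library. The operation name `searchObjects`, its `query` and `folder` parameters, and the namespace `urn:example:clc-database` are hypothetical placeholders, not the actual CLC API. Only the SOAP 1.1 envelope namespace is standard. The real operations and endpoint would come from the service description supplied with the Middleware.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
CLC_NS = "urn:example:clc-database"  # hypothetical namespace, for illustration only


def build_search_request(query: str, folder: str) -> bytes:
    """Build a SOAP 1.1 envelope for a hypothetical 'searchObjects' operation."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Hypothetical operation and parameter names.
    operation = ET.SubElement(body, f"{{{CLC_NS}}}searchObjects")
    ET.SubElement(operation, f"{{{CLC_NS}}}query").text = query
    ET.SubElement(operation, f"{{{CLC_NS}}}folder").text = folder
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


request = build_search_request("project_id:PRJ-7", "/projects")
print(request.decode("utf-8"))
```

A batch script would typically send such a request over HTTP POST to the Middleware endpoint and parse the XML response in the same way. In practice a SOAP client library that reads the service description is usually more convenient than hand-built envelopes.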