Genetic characterization of highly pathogenic H5N1 avian influenza viruses

The control of emerging and re-emerging highly contagious and economically important animal diseases, e.g. foot-and-mouth disease, classical swine fever, and in particular those that might pose a threat to public health such as highly pathogenic avian influenza (HPAI), requires the use of a broad range of tools by both the competent authorities and the laboratories involved.

Preparedness in diagnostic laboratories means recognizing the particular agent in good time, but obtaining information on the characteristics of the detected pathogen is also of increasing importance. By identifying virulence markers, receptor binding motifs, drug resistance traits, reassortant variants, etc., this information can help to trace the source of the infection, predict its spread and potential epidemiological consequences, and develop strategies to control it.

The data regarding the above considerations is most accurately generated by in-depth nucleotide sequence analysis of the pathogen of concern.

Today the notification of HPAI is based on a standardized series of tests performed in the national reference laboratories of all member states of the EU. In case of a suspected HPAI outbreak, the competent authorities order these laboratories to confirm or exclude HPAI and to discriminate it from Newcastle disease, since the latter has a highly similar epidemiology and a similarly devastating effect on the poultry industry. Cultivation of the clinical specimens in eggs and identification of the cultured agent with antisera is still the gold standard technique for influenza virus isolation [Office International des Epizooties (OIE), 2004].

Image h5n1_web

Figure 1: Colorized transmission electron micrograph of avian influenza A H5N1 viruses (seen in gold) grown in MDCK cells (seen in green) [Wikipedia, 2007].

Full-length sequencing is gaining ground

Recently, molecular diagnostic methods have also gained ground, and conventional as well as real-time PCR methods targeting the conserved region of the matrix protein gene are widely used for the detection of influenza A viruses. In positive cases the subtype of the detected virus is determined with other PCRs targeting the haemagglutinin (HA) and neuraminidase (NA) genes. In case of H5 or H7 subtypes this is followed by pathotyping, i.e. characterization of the nucleotide sequence of the region flanking the HA cleavage site.

If the detected virus turns out to be an HPAI virus (which is subsequently confirmed by biological assays as well), strictly regulated measures must be applied as detailed in the respective National Contingency Plans. Although the above steps carried out in National Reference Laboratories accomplish the compulsory laboratory investigations, thorough analysis of every isolate has become routine in the more sophisticated laboratories in order to be better prepared for a possible outbreak.

In the case of the highly pathogenic H5N1 avian influenza, full-length sequencing protocols are being used in the diagnostic laboratories. These protocols can quickly provide comprehensive data on a detected H5N1 avian influenza virus (AIV) to animal and public health authorities facilitating their actions.

Part of the analysis is the handling of the obtained raw nucleotide sequence data. For this purpose, we use the CLC Main Workbench software that provides fast, versatile, and flexible ways of handling and analyzing sequences. For instance, the accuracy of the assembly of several raw nucleotide sequences is supported by showing the translation of the particular sequence into amino acids, which aids in spotting conflicts among sequence reads.

Work flow

Figure 2 gives an overview of the work following a suspicion and confirmation of an avian influenza outbreak.

Image workflow

Figure 2: Work flow following a suspicion of an avian influenza outbreak.

The process is initiated in the event of a suspected avian influenza outbreak, and the national authorities take action according to a pre-defined procedure. At the reference laboratories, the subtype is determined by sequencing the Haemagglutinin and Neuraminidase genes (the "H" and "N" in the subtype H5N1 refer to Haemagglutinin and Neuraminidase, respectively).

Haemagglutinin is a precursor protein which is cleaved into two subunits. Adjacent to the cleavage site of the highly pathogenic variants there are several basic amino acids which are recognized by ubiquitous proteases. Low pathogenic variants, on the other hand, have a single Arginine at the cleavage site, which means the site is only recognized by extracellular proteases secreted by cells in the respiratory and intestinal tract [Neumann and Kawaoka, 2006].

Thus, sequencing of the Haemagglutinin cleavage site is used to determine pathogenicity, and in the next step of the work flow, this assessment of pathogenicity guides further actions to be taken by the authorities.
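The cleavage-site reasoning above can be illustrated with a minimal Python sketch. It counts the basic residues (Arginine, R, and Lysine, K) immediately upstream of the cleavage site in a translated Haemagglutinin fragment. The function names, the six-residue window, the threshold of four basic residues, the use of "GLF" as the start of the second subunit, and the two example motifs are all assumptions made for illustration. This is not a validated pathotyping rule, and a sequence-based call is in any case confirmed by biological assays.

```python
BASIC_RESIDUES = {"R", "K"}  # Arginine and Lysine


def count_basic_upstream(ha_fragment, ha2_start="GLF", window=6):
    """Count basic residues in the `window` residues before the cleavage site.

    Assumption: the cleavage site lies directly before the first
    occurrence of `ha2_start` in the translated fragment.
    """
    site = ha_fragment.find(ha2_start)
    if site < 0:
        raise ValueError("HA2 start motif not found in fragment")
    upstream = ha_fragment[max(0, site - window):site]
    return sum(residue in BASIC_RESIDUES for residue in upstream)


def looks_multibasic(ha_fragment, threshold=4):
    """Heuristic flag only: True if the site is flanked by several basic residues."""
    return count_basic_upstream(ha_fragment) >= threshold


# Illustrative fragments, not taken from the isolates in this case study.
multibasic_example = "PQRERRRKKRGLF"
monobasic_example = "PQRETRGLF"

print(looks_multibasic(multibasic_example))  # True
print(looks_multibasic(monobasic_example))   # False
```

A fragment that is flagged here would still go through the full work flow described in this section; the sketch only shows why the residues next to the cleavage site are the part of the sequence that matters.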

If a variant of the H5N1 avian influenza is detected, further PCR amplification and sequencing are performed to get the full-length sequence of the genome. The un-assembled chromatogram trace files are imported into the CLC Main Workbench where they are assembled against a reference sequence.

Subsequently, GenBank is searched for sequences of other, potentially epidemiologically related H5N1 influenza viruses. The sequences are aligned and a phylogenetic tree is created to show relationships between the sequences. As the sequences are obtained from geographically dispersed influenza outbreaks, the tree highlights the relationships between the influenza variants found in different regions around the world.

This knowledge is then used by authorities to adjust and harmonize procedures for handling and preventing outbreaks. Furthermore, the sequences are submitted to online databases like the online genotyping tool for influenza A viruses [Lu et al., 2007] to further strengthen the efforts exerted towards the recognition, control, and prevention of avian influenza outbreaks.

Zooming in on sequence assembly

For assembly, the Assemble Sequences to Reference function is preferred since it produces the longest stretches of contiguous assemblies when the input sequences are short or do not overlap properly.

As an initial step, a BLAST search is performed using one of the longest and best raw sequences. The highest-ranking hit is then downloaded and used as a reference for the assembly (see figure 3).

Image blast_web

Figure 3: Result of a BLAST search using one of the longest reads from the sequencing as query.
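In the Workbench this selection is made interactively. Outside the Workbench, the same idea of picking the highest-ranking hit could be sketched in Python as follows, assuming the BLAST results are available as tab-separated text in the standard 12-column tabular layout (subject identifier in column 2, bit score in column 12). The function name and the example rows, including the subject identifiers, are hypothetical.

```python
def best_hit(tabular_text):
    """Return (subject_id, bit_score) of the hit with the highest bit score.

    Assumes 12 tab-separated columns per line, with the subject
    identifier in column 2 and the bit score in column 12.
    """
    best = None
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError("expected 12 tab-separated columns")
        subject, bit_score = fields[1], float(fields[11])
        if best is None or bit_score > best[1]:
            best = (subject, bit_score)
    return best


# Hypothetical rows for illustration only; not real search results.
rows = [
    ["read_1", "hypothetical_ref_A", "98.5", "800", "12", "0",
     "1", "800", "30", "829", "0.0", "1400"],
    ["read_1", "hypothetical_ref_B", "99.6", "800", "3", "0",
     "1", "800", "30", "829", "0.0", "1460"],
]
example = "\n".join("\t".join(row) for row in rows)

print(best_hit(example))  # ('hypothetical_ref_B', 1460.0)
```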

When this sequence is used as reference for the assembly, an additional benefit is that the coding region is annotated, making it easy to translate the contig in the correct reading frame.

As shown in figure 4, the dynamic translation of both reads and consensus sequence is a great help when analyzing the sequence.

Image contig

Figure 4: A Haemagglutinin contig showing three reads (one reverse read and two forward reads). The reverse read has a T in position 848, whereas the other reads have an A.

This example shows the sequencing data of the Haemagglutinin gene assembled into a contig. At position 848 there is a T in the read at the top. The other reads plus the reference sequence show an A. Looking at the translation makes it easy to see that the T would result in a stop codon at this position. Since we are in the middle of a coding sequence (indicated by the yellow CDS annotation), a stop codon is very unlikely. This is also supported by the trace data where the red peak (indicating a T) is topped by the green peak (indicating an A).
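The check that the translation makes visible can be sketched in a few lines of Python: translate a read in the annotated reading frame and report any stop codons. This is a minimal sketch that assumes the standard genetic code and a known frame offset; the function names and the two short sequences are made up for illustration and are not the reads shown in figure 4.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame_start=0):
    """Translate complete codons starting at `frame_start` ('X' for unknown codons)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame_start, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stop_codon_positions(dna, frame_start=0):
    """Return the 0-based codon indices that translate to a stop ('*')."""
    protein = translate(dna, frame_start)
    return [index for index, aa in enumerate(protein) if aa == "*"]


# Made-up fragments from the middle of a coding region.
reference_fragment = "ATGAAAGGA"  # translates to M K G
suspect_read = "ATGTAAGGA"        # A to T change gives M * G

print(translate(reference_fragment), stop_codon_positions(reference_fragment))
print(translate(suspect_read), stop_codon_positions(suspect_read))
```

A stop codon reported inside an annotated coding region is a hint to re-inspect the trace data at that position, as described above, rather than proof of a sequencing error.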

The dynamic translation is also an aid in determining whether variations are synonymous or non-synonymous. In figure 5, the two reads display an A whereas the reference sequence displays a G. In the translation, the aspartic acid (D) of the reference is replaced by Asparagine (N) in the reads. This is shown in red to indicate a non-synonymous substitution (a synonymous substitution would be colored yellow).

Image variation

Figure 5: A variation at position 536. Notice the translation where the Ns are colored red to symbolize a non-synonymous substitution.
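The distinction between synonymous and non-synonymous variation comes down to comparing the amino acids encoded by the reference codon and the read codon. The following Python sketch shows this comparison under the standard genetic code. The function name and the example codons are assumptions chosen only to reproduce a D to N change; they are not read off figure 5.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(reference_codon, read_codon):
    """Return (kind, reference_aa, read_aa) for a pair of codons.

    kind is 'identical', 'synonymous' or 'non-synonymous'.
    """
    reference_codon = reference_codon.upper()
    read_codon = read_codon.upper()
    reference_aa = CODON_TABLE[reference_codon]
    read_aa = CODON_TABLE[read_codon]
    if reference_codon == read_codon:
        kind = "identical"
    elif reference_aa == read_aa:
        kind = "synonymous"
    else:
        kind = "non-synonymous"
    return kind, reference_aa, read_aa


# Illustrative codons: G to A in the first codon position.
print(classify_substitution("GAT", "AAT"))  # ('non-synonymous', 'D', 'N')
# Illustrative codons: T to C in the third codon position.
print(classify_substitution("GAT", "GAC"))  # ('synonymous', 'D', 'D')
```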

Zooming in on alignments and trees

The further analysis consists of comparing the sequences obtained with other variants of H5N1. First, more sequences are downloaded using the integrated GenBank search in the Workbench. With a few clicks, more than 50 sequences are downloaded.

Next, an alignment is created as a first step in comparing the sequences. The alignment is then used to create a phylogenetic tree as shown in figure 6.

Image tree

Figure 6: Phylogenetic tree comparing the Haemagglutinin gene of geographically dispersed occurrences of avian influenza.

Because the sequences were downloaded from GenBank, they include all the information about where the virus was found, and this is displayed in the tree. You can see that the H5N1 variants are grouped together.
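To make the step from alignment to tree more concrete, the following Python sketch computes pairwise distances from already aligned sequences and clusters them with UPGMA, returning the topology as a Newick string without branch lengths. This case study does not state which tree-building algorithm was used in the Workbench, so UPGMA on simple p-distances is a stand-in chosen for brevity, and the sequence names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def upgma(aligned):
    """Cluster aligned sequences with UPGMA; return a Newick topology string."""
    sizes = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)  # pair of clusters with smallest distance
        a, b = sorted(closest)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del dist[closest]
        for other in sizes:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            # Size-weighted average distance to the merged cluster.
            dist[frozenset((merged, other))] = (
                (size_a * d_a + size_b * d_b) / (size_a + size_b)
            )
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Toy aligned sequences with hypothetical names.
toy_alignment = {
    "sample": "ATGCATGC",
    "neighbour": "ATGCATGA",
    "outgroup": "TTGAATCA",
}

print(upgma(toy_alignment))  # ((neighbour,sample),outgroup);
```

The two most similar sequences are joined first, which is the same kind of grouping that figure 6 shows for the H5N1 variants.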

Conclusion

This case study shows how the CLC Main Workbench has been used in Highly Pathogenic Avian Influenza research. The Workbench has been used to assemble sequencing data, search databases, and create alignments and phylogenetic trees. Based on full-length sequencing, the relationships with other variants of H5N1 have been analyzed.