Summary:
Rapid progress in DNA sequencing technology is driving exponential growth in the volume of genome and metagenome sequences. In many cases, standard bioinformatics methods can assign genes to a family, but the exact function of the encoded protein remains unknown. Moreover, generating high-quality DNA sequence is disproportionately time-consuming and expensive. It is therefore important to develop methods for automatic structural and functional annotation that can analyse lower-quality sequence, starting from DNA reads. We have developed the MEGGASENSE platform for the automatic functional and structural annotation of metagenomes and genomes. The initial functional analysis is carried out on reads, before assembly, using in silico translation. Unlike most other platforms, MEGGASENSE scans the resulting protein sequences using hidden Markov models (HMMs), which results in more effective detection of genes. Depending on the aims of the analysis, it is possible to use a generic library of profiles or a custom database of choice. The reads are subsequently assembled, and the assembled sequences can be used for ‘gene of interest’ analysis by BLAST within the platform. Finally, the user can browse the sequences with the Solr search engine, which is integrated into the graphical user interface. The utility of MEGGASENSE will be illustrated with examples, including metagenomes composed of novel anaerobic or microaerophilic heterotrophic species from thermophilic habitats that cover a range of physicochemical conditions. Using MEGGASENSE, over 350 reads/genes potentially encoding carbohydrate-modifying enzymes were discovered, showing identities ranging from 26 to 100% to genes present in GenBank.
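
To make the read-level in silico translation step more concrete, the following is a minimal Python sketch of a six-frame translation of a single read. It is an illustration only, not the MEGGASENSE implementation: the summary does not state how reads are translated, so the use of all six reading frames, the standard genetic code, and the function names (`reverse_complement`, `translate`, `six_frame_translation`) are assumptions made here. Scanning the resulting protein sequences against a profile library is not shown.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumption:
# the sketch uses the standard code; other codes may apply to some organisms).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon, starting at position 0.

    Codons containing ambiguous bases (e.g. N) become 'X'; stop codons
    become '*'. A trailing partial codon is ignored.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(read):
    """Translate a read in three forward and three reverse frames.

    Returns a dict keyed by hypothetical frame labels '+1'..'+3', '-1'..'-3'.
    """
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Translating every frame of every read avoids having to know the strand or reading frame in advance, which is one plausible reason to work on reads before assembly; each translated frame could then be passed to an HMM search against the chosen profile library.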
