Peer-reviewed articles

MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses

Authors: Gacesa R., Zucko J., Pétursdóttir SK, Gudmundsdottir EE, Fridjonsson OH, Diminic J., Long PF, Cullum J., Hranueli D., Hreggvidsson GO, Starcevic A.

Published in: Food Technology and Biotechnology

Publication year: 2017

Summary:

The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14,106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries, and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialized metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which reads are screened with the HMM profiles before assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes, and the article illustrates how combining HMM and BLAST analyses helps identify interesting genes. A variety of proteome and metagenome databases have been generated with MEGGASENSE.
