Computational Methods and Applications for Real-Time Identification of Species and Pathogens from Raw Read Sequencing Data

Karimi, Ramin
Only a very small proportion (often cited as <1%) of the total microbial diversity in nature can be cultivated in the laboratory. The vast majority of microorganisms cannot be isolated or are extremely difficult to grow. In recent years, advances in sequencing technology have driven impressive progress in bioinformatics, making it possible to perform whole-genome sequencing on thousands of individuals. The emerging field of metagenomics provides a series of technical innovations for the culture-independent study of environmental microbial communities: large-scale sequencing of an entire community, sampled directly from its natural environment. This approach opens access to the previously hidden phylogenetic, functional, metabolic, and ecological diversity of organisms and to the structure of their communities. As these technologies continue to increase in throughput, the time and cost of DNA sequencing continue to fall. As a result, sequencing is becoming practical as a routine tool in diagnostic and public health microbiology.

However, the complexity of the analysis and the high cost of computational resources remain major obstacles to achieving this goal. One of the major challenges in metagenomic studies is the accurate identification of the organisms present in complex environments. Although a wide variety of assembly- and alignment-based algorithms, software tools, and analysis workflows have been developed, alignment-based identification of complex communities remains inadequate, even for the most abundant members, unless sequencing coverage is very extensive. In this research, we propose an alignment-free method, together with accompanying pipelines and software, for real-time identification of species and strains from raw sequencing reads.
The method aims to make identification in environmental and clinical sequencing samples fast and accurate by using parallel and distributed computing on commodity hardware, so that the analysis can be applied as a routine process across the research community.
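The abstract does not define the DNA signatures it relies on. As a minimal sketch of the general idea behind alignment-free identification, assuming a hypothetical signature set made of k-mers that occur in exactly one reference genome (not necessarily the author's definition), each read can be assigned to a species by counting signature hits instead of aligning it. All function names and the toy sequences below are hypothetical illustrations.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_signatures(references, k):
    """Hypothetical signature set: map each k-mer found in exactly one
    reference genome to the name of that reference."""
    owners = defaultdict(set)
    for name, genome in references.items():
        for kmer in kmers(genome, k):
            owners[kmer].add(name)
    return {kmer: next(iter(names))
            for kmer, names in owners.items() if len(names) == 1}


def classify_read(read, signatures, k):
    """Return the reference whose signature k-mers the read hits most often,
    or None if the read contains no signature k-mer."""
    hits = Counter(signatures[kmer] for kmer in kmers(read, k) if kmer in signatures)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


def profile(reads, signatures, k):
    """Count how many reads are assigned to each reference."""
    counts = Counter()
    for read in reads:
        label = classify_read(read, signatures, k)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    # Toy, made-up reference sequences for illustration only.
    refs = {"species_A": "ACGTACGTTTGA", "species_B": "GGGCCCAAATTT"}
    sigs = build_signatures(refs, k=4)
    print(classify_read("TACGTTT", sigs, k=4))  # species_A
    print(dict(profile(["TACGTTT", "CCAAAT", "NNNN"], sigs, k=4)))
```

Because every read is classified independently, the loop in `profile` can be split across worker processes (for example with `multiprocessing.Pool`) or across machines, which is where parallel and distributed computing pays off. The sketch ignores reverse complements and sequencing errors, which a practical tool would need to handle.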
Big Data, Bioinformatics, Metagenomics, DNA Signature, Short Reads, Parallel Computing, Distributed Systems, Genome Databases