Applying Machine Learning for VCF file filtering
Abstract
This study presents an in silico experiment that used chromosome 1 sequencing data from many S. cerevisiae strains. The data, covering numerous strains and their separate sequencing runs (replicates), came from the compendium of yeasts created by the Department of Molecular Biotechnology and Microbiology. Variants from same-strain replicates were called and combined into VCF files, which were then filtered in two ways: by hard filtering and by a convolutional neural network. The study describes a pipeline that uses the Genome Analysis Toolkit (GATK) for variant calling and filtration. Finally, the efficacy of the two methods is compared, and their strengths and weaknesses are highlighted.
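To make the idea of hard filtering concrete, the following is a minimal Python sketch that marks a single VCF record as PASS or failed based on fixed annotation thresholds. It is not the study's pipeline, which relies on GATK. The function names (`parse_info`, `hard_filter`), the example record, and the threshold values are assumptions chosen for illustration; the cut-offs actually used in the study are not given in the abstract.

```python
# Illustrative thresholds only (assumption): each entry maps an INFO
# annotation to a predicate that returns True when the record FAILS.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,    # quality by depth too low
    "FS": lambda v: v > 60.0,   # strand bias (Fisher) too high
    "MQ": lambda v: v < 40.0,   # mapping quality too low
    "SOR": lambda v: v > 3.0,   # strand odds ratio too high
}


def parse_info(info):
    """Parse a VCF INFO column into a dict of numeric annotations."""
    out = {}
    for item in info.split(";"):
        if "=" not in item:
            continue  # flag-type entries carry no value
        key, value = item.split("=", 1)
        try:
            out[key] = float(value)
        except ValueError:
            pass  # skip non-numeric or multi-valued annotations
    return out


def hard_filter(vcf_line):
    """Return the record with FILTER set to PASS or to the failed filter names."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # column 8 of a VCF record is INFO
    failed = [name for name, fails in SNP_FILTERS.items()
              if name in info and fails(info[name])]
    fields[6] = ";".join(failed) if failed else "PASS"  # column 7 is FILTER
    return "\t".join(fields)


# Hypothetical record: low QD, all other annotations within limits.
record = "chrI\t1000\t.\tA\tG\t50\t.\tQD=1.5;FS=10.0;MQ=60.0;SOR=1.0"
print(hard_filter(record).split("\t")[6])  # prints "QD"
```

In this sketch, annotations missing from a record are simply skipped, so they never cause a failure. A learned filter such as a convolutional neural network replaces these fixed cut-offs with a score computed by a trained model, which is the contrast the study evaluates.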
Keywords
Bioinformatics, Saccharomyces cerevisiae, Next-generation sequencing, GATK