Investigating bioinformatic pipelines for De novo variant identification in New Zealand dairy cattle
Permanent link to Research Commons version: https://hdl.handle.net/10289/16193
New Zealand is the world's largest single exporter of milk and milk products, and dairy is the country's main industry. Keeping track of bovine mutations therefore makes it possible to influence the health of the national herd and to breed cattle better suited to current farming and climate needs. With advances in next-generation whole-genome sequencing, research into understanding and tracking germline de novo mutations (DNMs) is now attainable. However, the techniques currently used to identify and study candidate DNMs are unvetted and experimental. These include bioinformatic software packages such as DenovoCNN, DenovoGear, PedFilter, and RUFUS. This thesis aimed to investigate and test two of these identification pipelines, DenovoGear and PedFilter, using next-generation sequencing data from bovine trios. Output data were generated by writing scripts and extracting data from six trios into Excel workbooks, or through manual validation of selected candidate DNMs in the Integrative Genomics Viewer (IGV). Of the total 714,393 candidates identified across PedFilter and DenovoGear, only 161 were identified by both programs. After IGV validation, 50% of these candidates were deemed true positives, while only 12% were deemed false positives. Data analysis shows that neither technique is 100% accurate and that both should be used with secondary filtering and confirmation of the identified DNM candidates. It was also apparent that the different techniques are better suited to different study situations: (1) for smaller studies (e.g., 3–15 trios) with limited data available, using DenovoGear with a secondary software program for filtering would be best; (2) if sufficient samples are available to provide a simulated population data set, PedFilter has higher accuracy than DenovoGear on its own; (3) for fast analysis without heavy filtering of the output data, a combination of both PedFilter and DenovoGear should be used.
Lastly, it is clear that further development and more testing are required in this area of research before a set standard of practice or guidelines can be recommended and implemented. Suggested next steps are to test DenovoCNN and RUFUS, or to trial other packages such as SynthDNM, DNMFilter_INDELs, or HAPDeNovo.
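The comparison of candidates called by both programs can be illustrated with a minimal sketch. It is not the thesis's own scripts: the `Candidate` record, its fields, and the function names are hypothetical, and it assumes each program's output has already been parsed into (trio, chromosome, position, reference allele, alternate allele) records.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Candidate(NamedTuple):
    """A candidate DNM site. The field layout is an assumption for illustration."""
    trio: str
    chrom: str
    pos: int
    ref: str
    alt: str


def shared_candidates(a: Iterable[Candidate], b: Iterable[Candidate]) -> Set[Candidate]:
    """Return the candidates reported by both callers (exact match on all fields)."""
    return set(a) & set(b)


def summarise(a: Iterable[Candidate], b: Iterable[Candidate]) -> Dict[str, int]:
    """Count candidates unique to each caller, shared by both, and in their union."""
    set_a, set_b = set(a), set(b)
    return {
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        "both": len(set_a & set_b),
        "union": len(set_a | set_b),
    }


if __name__ == "__main__":
    # Made-up toy records, not data from the thesis.
    pedfilter_calls = [
        Candidate("trio1", "1", 100, "A", "G"),
        Candidate("trio1", "2", 200, "C", "T"),
    ]
    denovogear_calls = [
        Candidate("trio1", "1", 100, "A", "G"),
        Candidate("trio1", "3", 300, "G", "A"),
    ]
    print(summarise(pedfilter_calls, denovogear_calls))
```

Matching on the full record is a deliberately strict choice. A real comparison might need to normalise indel representation or chromosome naming before intersecting, and the shared candidates would still need secondary confirmation such as the IGV validation described above.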
The University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
- Masters Degree Theses