
Effects of sample age on data quality from targeted sequencing of museum specimens: what are we capturing in time?

Background: Next-generation sequencing (NGS) can recover DNA data from valuable museum specimens of extant and extinct species. However, archived or preserved DNA is difficult to sequence because it is fragmented and damaged, and even the most successful NGS methods for preserved specimens remain sub-optimal. Improving wet lab protocols and comprehensively determining the effects of sample age on NGS library quality are therefore of vital importance. Here, I examine the relationship between sample age and several indicators of library quality following targeted NGS of ~1300 loci using 271 samples of pinned moth specimens (Helicoverpa armigera) ranging in age from 5 to 117 years.

Results: I find that older samples have lower DNA concentrations following extraction and thus require more indexing PCR cycles during library preparation. Whether sequenced reads are aligned to a reference genome or to only the targeted region, older samples yield fewer sequenced and mapped reads, lower mean coverage, and lower estimated library sizes, while the percentage of adapters in sequenced reads increases significantly with sample age. Older samples also show the poorest capture success, with lower enrichment and a larger improvement in coverage anticipated from further sequencing.

Conclusions: Sample age has a significant, measurable impact on the quality of NGS data following targeted enrichment. However, incorporating a uracil-removing enzyme into the blunt-end repair step during library preparation could help to repair DNA damage, and using a method that prevents adapter-dimer formation may result in improved data yields.
Journal Article
©2020 The Author. This article is licensed under a Creative Commons Attribution 4.0 International License.