Approximately 13% of the human genome, located at certain sequence motifs, has the potential to form non-canonical (non-B) DNA structures (e.g. G-quadruplexes (G4s), cruciforms, and Z-DNA). These structures regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies rely on these enzymes, they may have elevated error rates at non-B structures. To evaluate this, we analyzed error rates of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. Single-nucleotide error rates were generally increased at G4 and Z-DNA motifs in all three technologies. In Illumina and PacBio HiFi, deletion error rates were also increased for all non-B types except Z-DNA motifs, while in ONT they were increased substantially only for G4 motifs. Insertion error rates at non-B motifs were highly elevated in Illumina, moderately elevated in PacBio HiFi, and only slightly elevated in ONT. Thus, elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and when scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.
Learning Objectives:
1. Describe alternative (non-B) DNA structures, their prevalence, and their biological implications.
2. Explain why non-B DNA structures might be relevant for sequencing technologies.
3. Identify examples where non-B DNA-induced sequencing errors might be relevant.