High-throughput short-read DNA sequencing has revolutionized our ability to measure genetic variation in the form of single-nucleotide polymorphisms (SNPs) in human genomes. However, ~75% of all variant bases are contained in larger, structural genome changes; this non-SNP DNA variation accounts for ~20-25% of all genetic variation events. Such variation is harder to resolve with short-read sequencing because of its limited read length. Structural genomic variation plays important roles in numerous diseases, including repeat expansion disorders such as fragile X syndrome (the most common heritable form of cognitive impairment), variable number tandem repeat (VNTR) disorders, and structural breakpoints in cancer.
In my talk, I will highlight how multi-kilobase reads from PacBio sequencing can resolve many of these genomic regions previously considered 'difficult to sequence'. The long reads also allow sequence information to be phased into maternal and paternal alleles, which I will illustrate with full-length, fully phased HLA class I & II gene sequencing. In addition, characterizing the complex landscape of alternative gene products is currently very difficult with short-read sequencing technologies, and I will describe how long-read, full-length mRNA sequencing can capture the diversity of transcript isoforms, with no assembly required. Lastly, in the exciting area of gene therapy, I will highlight how long PacBio reads can more accurately and efficiently measure outcomes of genome editing studies.