What is bioinformatics?
The discipline of bioinformatics integrates biology and computer science to improve our understanding of complex biological systems. Bioinformatic analysis describes the processing of biological data from DNA, RNA, proteins, and other biological molecules using computational tools.
Where does bioinformatics fit into NGS workflows?
Bioinformatics plays many key roles in NGS workflows, starting at the very beginning (Figure 1). During project planning, insights from a bioinformatician can help researchers decide how many replicates (biological and/or technical) they need for each sample type, and also the required sequencing depth for a particular analysis. This is to ensure that robust analysis is possible, including the determination of statistically significant differences between samples.
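As a rough illustration of the sequencing-depth planning described above, the number of reads needed for a given mean depth can be estimated from the size of the region, the read length, and the fraction of reads expected to land on target. This is a minimal sketch, not a planning tool: the function name `reads_needed` and its parameters are assumptions made for this example, and the estimate ignores duplicates, uneven coverage, and mapping losses.

```python
import math


def reads_needed(target_size_bp, mean_depth, read_length_bp, on_target_fraction=1.0):
    """Estimate reads required so that reads * read_length / target_size ~= mean_depth.

    All names here are illustrative assumptions, not part of any specific tool.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    # Total bases that must land on the target to reach the requested mean depth.
    bases_required = target_size_bp * mean_depth
    # Each read contributes read_length_bp bases, but only a fraction is on target.
    usable_bases_per_read = read_length_bp * on_target_fraction
    return math.ceil(bases_required / usable_bases_per_read)


# Example: a 3.1 Gb genome at 30x mean depth with 150 bp reads.
whole_genome_reads = reads_needed(3_100_000_000, 30, 150)
```

For paired-end sequencing, each read pair contributes two reads, so the number of pairs would be roughly half of this estimate.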
Many NGS workflows include target enrichment (the use of a probe or primer panel to selectively capture genomic regions of interest) to focus sequencing reads on specific areas. The creation of these panels requires computer algorithms. This is true whether an expert designer helps to create the panel or the researcher uses an online panel design tool built for non-bioinformaticians, in which case the algorithms operate “behind the scenes.”
Finally, and this is where the bulk of the “bioinformatics magic” happens, bioinformatics experts and algorithms take the raw data that comes off the sequencer (millions of sequencing reads) and convert it into meaningful biological information, including statistical analysis where relevant. For DNA sequencing, these results can include uniformity of coverage, the presence and frequency of genetic variants, on-target rates, or the de novo assembly of genomes. For RNA sequencing, these results can include differences in gene expression, the relative activity of different pathways, and the detection of RNA fusions and splice variants.
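Several of the DNA sequencing metrics named above reduce to simple ratios once reads have been aligned and counted. The sketch below shows one way to compute them from precomputed counts. The function names are assumptions for this example, and the uniformity definition used here (fraction of positions covered at or above 0.2 times the mean depth) is only one of several conventions in use.

```python
def on_target_rate(on_target_reads, total_mapped_reads):
    """Fraction of mapped reads that overlap the targeted regions."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return on_target_reads / total_mapped_reads


def variant_allele_frequency(alt_reads, total_reads_at_site):
    """Fraction of reads at a position that support the variant allele."""
    if total_reads_at_site <= 0:
        raise ValueError("total_reads_at_site must be positive")
    return alt_reads / total_reads_at_site


def coverage_uniformity(depths, fraction_of_mean=0.2):
    """Fraction of positions with depth >= fraction_of_mean * mean depth.

    The 0.2 default is an assumed convention for illustration only.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    threshold = fraction_of_mean * mean_depth
    return sum(1 for d in depths if d >= threshold) / len(depths)


# Toy numbers, invented purely to exercise the functions.
rate = on_target_rate(800, 1000)
vaf = variant_allele_frequency(25, 100)
uniformity = coverage_uniformity([100, 100, 100, 10])
```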
What are the three main stages of NGS data analysis?
For RNA sequencing, relative levels of gene expression are identified (often shown as heat maps to make patterns easier to see). This can be followed by pathway analysis (by examining which genes are most highly and lowly expressed in a given sample) and the identification of alternate transcripts such as RNA fusions and splice variants.
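To make the idea of relative gene expression concrete, the sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change between two samples. It is a simplified illustration under stated assumptions: the function names and the pseudocount of 1 are choices made for this example, and dedicated packages for differential expression apply more careful normalization and statistical testing.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts so each sample sums to one million."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {gene: count / total * 1_000_000 for gene, count in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Toy counts with hypothetical gene names, invented for illustration.
sample = {"geneA": 1, "geneB": 3}
cpm = counts_per_million(sample)
lfc = log2_fold_change(7, 1)
```

Values such as these log2 fold changes are what typically populate the cells of an expression heat map.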
Tertiary analysis includes interpretation steps that describe the biological relevance of the NGS data. For example, this could include the identification of disease-associated variants or mutations that suggest a susceptibility to cancer or neurological conditions, or abnormally high or low expression of disease-associated genes. This stage can also include variant annotation in curated online databases.
What are “container solutions” in NGS bioinformatics?
A “bioinformatics container” refers to a group of software components that are linked together in a specific sequence within an analysis pipeline. This “containerization” of multiple algorithms enables scalable, reproducible analysis and documentation, and can increase analysis efficiency by automating the movement of data through each stage. It can also simplify the maintenance and sharing of complex bioinformatics resources, and can make it easier to compare results across research groups.
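The idea of linking components in a fixed sequence and moving data through each stage automatically can be sketched in a few lines. This is a toy illustration only: `run_pipeline` and the three steps are hypothetical names invented for this example, and production container tooling additionally packages each tool together with its dependencies, which this sketch does not attempt.

```python
def run_pipeline(data, steps):
    """Run named steps in order, passing each output to the next step.

    Returns the final result plus a log of executed step names, which
    stands in for the documentation a real pipeline would record.
    """
    log = []
    for name, step in steps:
        data = step(data)
        log.append(name)
    return data, log


# Hypothetical steps operating on toy read strings.
steps = [
    ("uppercase", lambda reads: [r.upper() for r in reads]),
    ("drop_ambiguous", lambda reads: [r for r in reads if "N" not in r]),
    ("count_reads", len),
]

result, executed = run_pipeline(["acgt", "acnt", "ggcc"], steps)
```

Because the step list is plain data, the same ordered definition can be shared between groups and rerun on new input, which is the reproducibility benefit described above.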
What roles can AI play in NGS bioinformatics workflows?
The implementation of artificial intelligence (AI) can further enhance and streamline the analysis of NGS data, especially as AI can automate and optimize numerous aspects of the process.
Want to learn more? For a discussion of how bioinformatics—including container solutions—can be used in the analysis of data from whole-exome sequencing (WES), watch the recorded webinar on Labroots linked here.
For Research Use Only. Not for use in diagnostic procedures.