Development and evaluation of Total RNA sequencing quality controls for blood-based biomarker discovery
November 2023 – Poster Presentation
Cheryl L. Sesler1, Guzel I. Shaginurova1, Lukasz S. Wylezinski1,2, Jamieson D. Gray1, Elena V. Grigorenko1, Franklin R. Cockerill, III1,3, Julia A. Larsen4, Michael K. Racke4, Charles F. Spurlock, III1,2,5
1 Decode Health, Nashville, TN, USA
2 Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN 37203
3 Department of Medicine, Rush University Medical Center, Chicago, IL 60612
4 Quest Diagnostics, Secaucus, NJ 07094
5 Wagner School of Public Health, New York University, New York, NY 10012
Introduction. Next-generation RNA sequencing enables quantitative and functional characterization of gene expression to discover candidate biomarkers for diagnostic and therapeutic applications. Reproducibility is a key challenge for reliable biomarker discovery, as many biomarkers identified in preliminary studies do not perform consistently in larger subsequent studies. Gene expression is dynamic and affected by many biological, environmental, and technical factors, including variations in laboratory processes and analysis tools. The lack of standardized procedures and evolving analysis methods present challenges for reliable biomarker discovery, especially for cell-free applications. Quality controls (QCs) must be considered through all stages of biomarker discovery efforts. Here, we describe a comprehensive QC framework, with practical considerations and limitations, for effective biomarker discovery using total RNA sequencing, with the aim of improving reliability and downstream translation.
Methods. Metrics were developed to establish a QC framework from sample collection through wet laboratory processes (RNA isolation, RNA cleanup, DNase treatment, library preparation), sequencing, and sequence analysis (alignment and quantification). Total RNA sequencing was performed using RNA isolated from whole blood (PAXgene Blood RNA tubes) or plasma (Streck RNA Complete BCT). Control RNA samples were created to monitor variations across sequencing batches. Differential gene expression and machine learning (ML) algorithms were leveraged to identify candidate biomarkers in samples meeting QC criteria.
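To illustrate how samples "meeting QC criteria" might be selected before differential expression or ML analysis, the following is a minimal sketch of a per-sample QC gate. The field names, the `SampleQC` class, and every cutoff value are hypothetical and chosen only for illustration; the abstract does not report the actual metrics or thresholds used.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; not values from this study.
MIN_RIN = 6.0                    # RNA integrity number, 1-10 scale
MIN_MAPPING_RATE = 0.80          # fraction of reads aligned to the reference
MAX_INTERGENIC_FRACTION = 0.15   # high values can indicate genomic DNA carryover


@dataclass
class SampleQC:
    """QC metrics collected for one sample (hypothetical schema)."""
    sample_id: str
    rin: float
    mapping_rate: float
    intergenic_fraction: float


def failed_checks(sample):
    """Return the names of the QC checks that this sample fails."""
    failures = []
    if sample.rin < MIN_RIN:
        failures.append("rin")
    if sample.mapping_rate < MIN_MAPPING_RATE:
        failures.append("mapping_rate")
    if sample.intergenic_fraction > MAX_INTERGENIC_FRACTION:
        failures.append("intergenic_fraction")
    return failures


def passing_samples(samples):
    """Keep only samples that pass every check, for downstream analysis."""
    return [s for s in samples if not failed_checks(s)]
```

Returning the list of failed checks, rather than a single pass/fail flag, makes it easier to see which stage of the workflow (for example RNA isolation versus DNase treatment) a failure is most likely tied to.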
Results. Incorporating both internal and previously described best practices, we developed and integrated a QC framework for total RNA sequencing. This framework was applied to a catalog of 511 clinical specimens spanning multiple medical disciplines. RNA integrity and genomic DNA contamination had the greatest impact on downstream results. Including an additional DNase treatment reduced genomic DNA contamination, increased alignment of sequenced reads to exons, and decreased mapping to intergenic regions. Sequence alignment metrics, including mapping rate and the numbers of unique, multi-mapped, and intronic reads, exhibited the highest correlation with RNA integrity number (RIN).
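As a sketch of how the relationship between RIN and an alignment metric could be quantified, the code below computes a Spearman rank correlation using only the Python standard library. The abstract does not state which correlation statistic was used, so the choice of Spearman is an assumption, and the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Calling `spearman(rin_values, mapping_rates)` on per-sample lists would give a value between -1 and 1. A rank-based statistic is used here because it does not assume a linear relationship between RIN and the alignment metric.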
Conclusions. A robust QC framework was developed spanning sample processing through sequencing analyses and implemented across multiple sample types for biomarker discovery with RNA sequencing. Integrating multi-faceted QC processes can improve design decisions and enhance the confidence and reliability of results. These steps are necessary to translate early discoveries into actionable biomarkers, leading to more efficient and effective development efforts.