© 2021 American Chemical Society. We propose a novel approach for building a classification/identification framework based on the full complement of RNA post-transcriptional modifications (rPTMs) expressed by an organism under basal conditions. The approach relies on advanced mass spectrometry techniques to characterize the products of exonuclease digestion of total RNA extracts. Sample profiles comprising the identities and relative abundances of all detected rPTMs were used to train and test different machine learning (ML) algorithms. Each algorithm proved capable of identifying rigorous decision rules for differentiating closely related classes and correctly assigning unlabeled samples. The ML classifiers resolved different members of the Enterobacteriaceae family, alternative Escherichia coli serotypes, a series of Saccharomyces cerevisiae knockout mutants, and primary cells of the Homo sapiens central nervous system, which shared very similar genetic backgrounds. The excellent levels of accuracy and resolving power achieved by training on a limited number of classes were successfully replicated when the number of classes was significantly increased to escalate complexity. A dendrogram generated from ML-curated data exhibited a hierarchical organization that closely resembled those afforded by established taxonomic systems. Finer clustering patterns revealed the extensive effects induced by the deletion of a single pivotal gene. This information provided a putative roadmap for exploring the roles of rPTMs in their respective regulatory networks, which will be essential to decipher the epitranscriptomics code. The ubiquitous presence of RNA in virtually all living organisms promises to enable the broadest possible range of applications, with significant implications for the diagnosis of RNA-related diseases.
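To make the idea of classifying samples from rPTM profiles concrete, the following minimal sketch assigns an unlabeled profile to the class with the nearest mean profile. It is not the authors' method: the abstract does not name the ML algorithms used, so a nearest-centroid rule is swapped in here purely as a simple stand-in. The class labels, the three modifications chosen (m6A, m5C, and pseudouridine, written `Psi`), and every abundance value are invented for illustration, and all function names are hypothetical.

```python
from math import sqrt

# ASSUMPTION: invented rPTM profiles (relative abundance per modification).
# Class labels and all numbers are illustrative, not data from the paper.
TRAINING = {
    "class_A": [
        {"m6A": 0.30, "m5C": 0.20, "Psi": 0.50},
        {"m6A": 0.28, "m5C": 0.22, "Psi": 0.50},
    ],
    "class_B": [
        {"m6A": 0.10, "m5C": 0.45, "Psi": 0.45},
        {"m6A": 0.12, "m5C": 0.43, "Psi": 0.45},
    ],
}


def normalize(profile):
    """Scale raw abundances so they sum to 1 (relative abundances)."""
    total = sum(profile.values())
    return {name: value / total for name, value in profile.items()}


def centroid(profiles):
    """Mean profile of a class; a modification absent from a sample counts as 0."""
    names = sorted({name for p in profiles for name in p})
    return {
        name: sum(p.get(name, 0.0) for p in profiles) / len(profiles)
        for name in names
    }


def distance(a, b):
    """Euclidean distance between two profiles over the union of modifications."""
    names = set(a) | set(b)
    return sqrt(sum((a.get(n, 0.0) - b.get(n, 0.0)) ** 2 for n in names))


def train(training):
    """Build one centroid per class from normalized training profiles."""
    return {
        label: centroid([normalize(p) for p in profiles])
        for label, profiles in training.items()
    }


def classify(model, profile):
    """Return the label whose centroid is closest to the normalized profile."""
    normalized = normalize(profile)
    return min(model, key=lambda label: distance(model[label], normalized))


model = train(TRAINING)
unlabeled = {"m6A": 0.11, "m5C": 0.44, "Psi": 0.45}
predicted = classify(model, unlabeled)
print(predicted)
```

Because each profile is normalized before comparison, raw signal intensities on any scale can be passed to `classify`; only the relative proportions of the modifications influence the assignment.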