Revealing the Wide-Ranging Functions of Rare CRISPR-Cas Systems Through Deep Terascale Clustering

The editor's summary highlights the vast biochemical diversity of microbial systems. Computational tools for analyzing sequence data are crucial for uncovering novel components that can advance biotechnology. Using a new method called deep terascale clustering, Altae-Tran et al. identified over 200 new functional systems associated with CRISPR, the microbial immune system whose components underpin widely used DNA-manipulation technologies. Among these findings are genes for precise DNA-editing functions that could make therapeutic genome editing safer. The authors also identified Cas14, a CRISPR-Cas enzyme that precisely cuts RNA. These discoveries could help refine DNA- and RNA-editing technologies and broaden their applications in medicine and biotechnology. Editor's summary by Di Jiang.

Structured Abstract

INTRODUCTION

Systematic exploration of sequencing databases is a powerful approach for discovering new protein families and functional systems. It has revealed a diverse array of CRISPR-Cas systems, the RNA-guided adaptive immune systems of microbes that form the basis of numerous molecular technologies, most notably programmable genome editing. However, existing sequence-mining methods cannot keep pace with the exponential growth of databases that now contain billions of proteins, and this limitation hampers the discovery of rare protein families and their associations.

RATIONALE

The aim was to comprehensively catalog CRISPR-linked gene modules across all publicly available sequencing data. Recent studies have uncovered previously unknown biochemical activities coupled to programmable nucleic acid recognition by CRISPR systems, including transposition and protease activity. The authors hypothesized that many more diverse enzymatic activities are associated with CRISPR systems, and that many of these are poorly represented in existing sequence databases.

RESULTS

The team developed fast locality-sensitive hashing–based clustering (FLSHclust), a parallelized deep clustering algorithm that scales linearithmically (O(n log n)) with the number of sequences. In clustering performance, FLSHclust is comparable to MMseqs2, a well-established algorithm with quadratic scaling. Applying FLSHclust in a sensitive CRISPR discovery pipeline, the team identified 188 previously unreported CRISPR-associated systems, including several rare ones.
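To make the idea behind locality-sensitive hashing concrete, the following is a minimal Python sketch of generic MinHash-based LSH clustering of protein sequences. It is not FLSHclust: the k-mer length, the number of bands and rows, the hash family, and all function names (`kmers`, `minhash_signature`, `lsh_cluster`) are illustrative assumptions. Each sequence is reduced to a MinHash signature, the signature is split into bands, and sequences that share any band bucket are merged with a union-find structure, so no all-against-all comparison is performed.

```python
import hashlib
import random
from collections import defaultdict

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the universal hash family (assumed choice)


def kmers(seq: str, k: int = 3) -> set[str]:
    """Overlapping k-mers; a sequence shorter than k becomes a single token."""
    return {seq[i:i + k] for i in range(max(1, len(seq) - k + 1))}


def base_hash(token: str) -> int:
    """Deterministic 64-bit hash, independent of Python's per-process hash seed."""
    digest = hashlib.blake2b(token.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens: set[str], params: list[tuple[int, int]]) -> tuple[int, ...]:
    """One minimum per hash function h(x) = (a*x + b) mod p over the k-mer set."""
    hashed = [base_hash(t) for t in tokens]
    return tuple(min((a * x + b) % MERSENNE_PRIME for x in hashed) for a, b in params)


class UnionFind:
    """Disjoint-set forest used to merge sequences that share an LSH bucket."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def lsh_cluster(seqs: list[str], k: int = 3, bands: int = 16, rows: int = 4,
                seed: int = 0) -> list[list[int]]:
    """Group sequence indices whose MinHash signatures agree on at least one band."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(MERSENNE_PRIME))
              for _ in range(bands * rows)]
    signatures = [minhash_signature(kmers(s, k), params) for s in seqs]

    # Bucket key = (band index, the `rows` signature values in that band).
    buckets: dict[tuple, list[int]] = defaultdict(list)
    for idx, sig in enumerate(signatures):
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)

    # Link every bucket member to the first one: no all-vs-all comparison.
    uf = UnionFind(len(seqs))
    for members in buckets.values():
        for other in members[1:]:
            uf.union(members[0], other)

    clusters: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[uf.find(idx)].append(idx)
    return sorted(clusters.values())
```

Because each sequence is hashed a fixed number of times and bucket members are linked only to the first member of their bucket, the work in this sketch grows roughly linearly with the number of sequences. Sequences that share no k-mers practically never collide, while near-identical sequences are grouped with high probability. A production tool would also verify candidate pairs (for example, by alignment) before merging, a step this sketch omits.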

Four of the newly discovered systems were characterized experimentally. A type IV system in which an HNH nuclease domain is inserted into the CRISPR-associated DNA damage-inducible gene G (DinG)–like helicase showed RNA-guided, protospacer-adjacent motif (PAM)–dependent, directional degradation of double-stranded DNA (dsDNA). Degradation required both adenosine triphosphate (ATP) hydrolysis and the HNH nuclease activity of the DinG-HNH protein, making this the first type IV system shown to have a defined interference mechanism. Two type I systems with HNH nuclease domains inserted into different subunits of Cascade (Cas8-HNH and Cas5-HNH) showed precise cleavage of both dsDNA and single-stranded DNA (ssDNA), and the Cas5-HNH system also showed collateral ssDNA cleavage. Both systems showed potential for genome editing in human cells, with the Cas8-HNH system being highly specific. The study also examined candidate type VII systems, which comprise a minimal Cas7-Cas5 effector complex and a unique interference protein containing a β-CASP domain. These systems likely evolved from type III-E CRISPR systems and target RNA.
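PAM dependence means that a DNA site matching the guide is targeted only when the protospacer is flanked by the correct motif. The Python sketch below illustrates that gating logic computationally. The PAM used in the example (`TTC`, placed 5' of the protospacer), the spacer sequence, and the function names are hypothetical and are not the motifs reported for the systems described here. The sketch scans both strands for matches to the spacer (optionally tolerating mismatches) and keeps only those adjacent to the PAM.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGTN."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_pam(site: str, pam: str) -> bool:
    """True if `site` matches the PAM pattern; 'N' in the pattern matches any base."""
    return len(site) == len(pam) and all(p in ("N", s) for p, s in zip(pam, site))


def find_targets(dna: str, spacer: str, pam: str,
                 pam_5prime: bool = True, max_mismatches: int = 0) -> list[tuple[str, int]]:
    """Return (strand, start) of PAM-flanked protospacers matching `spacer`.

    Starts are 0-based on the scanned strand read 5'->3' (the reverse
    complement of `dna` for '-'). A spacer match without an adjacent PAM
    is skipped, which is what makes recognition PAM-dependent.
    """
    hits = []
    n, p = len(spacer), len(pam)
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for i in range(len(seq) - n + 1):
            mismatches = sum(a != b for a, b in zip(seq[i:i + n], spacer))
            if mismatches > max_mismatches:
                continue
            # Motif immediately 5' or 3' of the protospacer on the same strand.
            if pam_5prime:
                site = seq[i - p:i] if i >= p else ""
            else:
                site = seq[i + n:i + n + p]
            if matches_pam(site, pam):
                hits.append((strand, i))
    return hits


dna = "AAAA" + "TTC" + "GATTACAGATTACA" + "GGGG" + "CCC" + "GATTACAGATTACA" + "AAAA"
print(find_targets(dna, "GATTACAGATTACA", "TTC"))  # [('+', 7)]
```

In this toy sequence the spacer occurs twice, but only the copy preceded by the hypothetical `TTC` motif is reported; relaxing the PAM to `NNN` would return both copies.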

Other CRISPR-linked findings included putative effector and adaptation components, new associations between Mu transposons and CRISPR systems, and numerous previously unidentified proteins and domains associated with type V systems. The study also found a possible instance of Cas9 co-opted as an anti-CRISPR mechanism, as well as several non-CRISPR hypervariable regularly interspersed repeat arrays.

CONCLUSION

This study establishes FLSHclust as an efficient tool for rapidly clustering millions of sequences, with broad applications in mining large sequence databases. The CRISPR-associated systems uncovered here represent an untapped reservoir of diverse biochemical activities linked to RNA-guided mechanisms, with substantial potential for the development of new biotechnologies.