Prokaryotic reverse transcriptases: from retroelements to specialized defense systems

Alejandro González-Delgado

Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, C/ Profesor Albareda 1, 18008 Granada, Spain

Mario Rodríguez Mestre

Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, C/ Profesor Albareda 1, 18008 Granada, Spain

Department of Biochemistry, Universidad Autónoma de Madrid and Instituto de Investigaciones Biomédicas “Alberto Sols”, CSIC-UAM, Madrid, Spain

Francisco Martínez-Abarca

Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, C/ Profesor Albareda 1, 18008 Granada, Spain

Nicolás Toro

Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, C/ Profesor Albareda 1, 18008 Granada, Spain

Corresponding author: Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, C/ Profesor Albareda 1, 18008 Granada, Spain. Tel: +34 958 181600; Fax: +34 958 129600; E-mail: nicolas.toro@eez.csic.es

Received 2021 Jan 15; Accepted 2021 May 7. Copyright © The Author(s) 2021. Published by Oxford University Press on behalf of FEMS.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

ABSTRACT

Reverse transcriptases (RTs) catalyze the polymerization of DNA from an RNA template. These enzymes were first discovered in RNA tumor viruses in 1970, but it was not until 1989 that they were found in prokaryotes as a key component of retrons. Apart from RTs encoded by the ‘selfish’ mobile retroelements known as group II introns, prokaryotic RTs are extraordinarily diverse, but their function has remained elusive. However, recent studies have revealed that different lineages of prokaryotic RTs, including retrons, those associated with CRISPR-Cas systems, Abi-like RTs and other yet uncharacterized RTs, are key components of different lines of defense against phages and other mobile genetic elements. Prokaryotic RTs participate in various antiviral strategies, including abortive infection (Abi), in which the infected cell is induced to commit suicide to protect the host population, adaptive immunity, in which a memory of previous infection is used to build an efficient defense, and other as yet unidentified mechanisms. These prokaryotic enzymes are attracting considerable attention, both for use in cutting-edge technologies, such as genome editing, and as an emerging research topic. In this review, we discuss what is known about prokaryotic RTs, and the exciting evidence for their domestication from retroelements to create specialized defense systems.

Keywords: reverse transcriptases, group II introns, retrons, DGRs, CRISPR-Cas systems, Abi systems

Prokaryotic reverse transcriptases (RTs) are attracting considerable attention, both for use in cutting-edge technologies, such as genome editing, and as an emerging research topic. In this review, the authors discuss what is known about prokaryotic RTs, and the exciting evidence for their domestication from retroelements to create specialized defense systems.

INTRODUCTION

According to the central dogma of molecular biology proposed by Francis Crick in 1958, once genetic information was converted to proteins the flow could not be reversed. However, it left open the possibility that information could be transferred from RNA to DNA (Crick 1958). The discovery of enzymes capable of synthesizing DNA from an RNA template confirmed this possibility (Crick 1970). RNA-dependent DNA polymerases, also known as reverse transcriptases (RTs), were discovered in 1970 in RNA tumor viruses (Baltimore 1970; Temin and Mizutani 1970), and this finding reshaped existing views on the functioning of all forms of life. In prokaryotes, RTs were discovered almost 20 years later as a key component of retrons (Lampson et al. 1989; Lim and Maas 1989). Prokaryotic RTs are now known to display extraordinary diversity, with most of them (80%) classified into three main groups: those encoded by group II introns, retron/retron-like sequences and diversity-generating retroelements (DGRs). The remaining RT sequences form distinct lineages, including those associated with CRISPR-Cas systems, abortive phage infection systems (Abi-like) and other uncharacterized RTs clustering in the so-called G2L (group-II-like) and unknown groups (UG; Toro et al. 2019a). While the RTs of eukaryotes and viruses have been characterized in detail, it has taken over 30 years for us to begin to understand the function and mechanisms of the highly diverse RTs of prokaryotic organisms (Fig. 1).

Figure 1. Timeline of research on prokaryotic RTs. Research in this field has expanded spectacularly in the almost 40 years since the discovery, in 1984, of a small DNA satellite that turned out to be synthesized by an RT. Here, we show the main breakthroughs concerning the various functional groups of prokaryotic RTs.

Recent investigations have shown that several prokaryotic RT lineages have evolved to provide new strategies for dealing with phages. Thus, together with Abi-RTs (Fortier, Bouchard and Moineau 2005; Odegrip, Nilsson and Haggård-Ljungquist 2006; Durmaz and Klaenhammer 2007) and CRISPR-RTs (Kojima and Kanehisa 2008; Toro and Nisa-Martínez 2014), searches for new antiviral immunity systems in defense islands have led to the discovery of anti-phage properties in a majority of retron types, and several RTs from uncharacterized groups (Gao et al. 2020; Millman et al. 2020). The relevance of these enzymes also lies in the use of different RT lineages as promising tools for diverse applications, including genome editing, recombination-mediated genetic engineering applications and the recording of biological information in bacterial genomes (Schmidt, Cherepkova and Platt 2018; Belfort and Lambowitz 2019; Simon, Ellington and Finkelstein 2019). These potential uses have boosted interest in these prokaryotic enzymes and their associated systems.

Group II intron-RTs are the best-characterized in prokaryotes (Ferat and Michel 1993; Lambowitz and Zimmerly 2011), but very little is known about their ecological role. These mobile retroelements are generally considered to be selfish units, but a few studies have provided evidence to suggest that they can disrupt other mobile genetic elements (MGEs), participating in host defense against these potentially harmful elements (Chillón, Martínez-Abarca and Toro 2011; Qu et al. 2018). DGRs provide their hosts with an adaptive advantage due to the great sequence variation that they cause in a target gene through a mutagenic reverse transcription reaction (Liu et al. 2002; Wu et al. 2018). The wide range of functions performed by DGRs has yet to be elucidated, but they are known to participate in tropism switching and signaling pathways potentially involved in virus–host interactions. Thus, our current knowledge of prokaryotic RTs suggests that they tend to be domesticated to perform immune functions in the host cell through a wide range of biological mechanisms that remain incompletely understood. The overall purpose of this review is to highlight the role of prokaryotic RTs in immunity to phages and MGEs, mediated by systems ranging from selfish genetic elements to more specialized defense systems, and to summarize the use of these enzymes in cutting-edge technologies.

Groups of prokaryotic RTs

Group II introns

Group II introns were first identified in the mitochondrial and chloroplast genomes of lower eukaryotes and plants. They were not found in prokaryotes until 1993, and their mobility has since been characterized in detail (Ferat and Michel 1993; Michel and Ferat 1995; Dai and Zimmerly 2003; Toro 2003; Lambowitz and Zimmerly 2004). Group II introns are self-splicing RNAs that require an ancient form of RT to act as mobile retroelements. This type of RT is the most abundant form in bacteria, accounting for almost 50% of total RT diversity (Toro et al. 2019a). Bacterial group II introns generally insert into sequences outside of essential genes or into non-essential genes, and show extinction–recolonization dynamics, suggesting that they are deleterious to the host (Leclercq and Cordaux 2012). With few exceptions, these introns are typically found as only one or two copies per genome (Dai and Zimmerly 2002). This tendency for group II introns to be deleterious may drive the domestication of RT-containing genes, and their conversion into non-mobile RTs, presumably with novel functions (Fig. 2). Indeed, group II introns have not only given rise to more complex retroelements, such as eukaryotic non-LTR retrotransposons, but are also considered the evolutionary ancestors of the spliceosome complex and telomerase, which represent great examples of how group II introns could generate new functions by domestication (Box 1).

Figure 2. RT domestication from an ancestral retroelement. The scheme depicts a hypothetical scenario of the domestication of the different RT lineages from an autonomous mobile retroelement, probably an ancestral group II intron. The RNA domains (I–VI), the exon regions (E1 and E2) and the different domains of the intron-encoded protein (RT: Reverse Transcriptase; X: Maturase; D: DNA-binding and En: Endonuclease) are indicated. At some point, the intron RNA component was lost, and the remaining RTs coevolved with their genomic context, resulting in the recruitment of RTs to various specialized systems. The G2L RTs may represent an intermediate state between a mobile group II intron and a domesticated RT with nascent functional associations. Here, an example of a putative association between G2L-RTs and a multicomponent system comprising the QueA, QueC and QueE (queuosine biosynthesis pathway), FGE-s (FGE sulfatase), YaaA (DNA-binding protein) and Gly (glycosyltransferase) proteins is shown. The RTs associated with CRISPR-Cas systems present different stages of association: first, a group II intron was inserted into the genomic context of a cas1 gene. Following a loss of mobility, the remaining RT coevolved with cas1, resulting in a functional association. Subsequently, RT and Cas1 were fused and, later, a Cas6 domain was acquired independently. Alternatively, the RT and AEP primase domain (Prim_S) may have fused to form a particular group of RT-CRISPR systems. DGR RTs have evolved to hypermutate the variable region (VR) of several target genes with a specific fold domain, through mutagenic reverse transcription of a template repeat (TR). DGRs are typically assisted by various ancillary proteins, including Avd (Accessory variability determinant), HRDC (Helicase and RNaseD C-terminal domain), MSL (MutS-like) and CH1 (Conserved Hypothetical Gene 1). In the case of retrons, RTs have become associated with small ncRNAs and an effector module, some of them forming tripartite toxin/antitoxin systems with antiviral properties. The wide variety of effectors suggests that the retron unit is highly modular and, in some cases, the RT and other domains have fused (TOPRIM: topoisomerase-primase; DUF3800; peptidase and TIR: toll-interleukin receptor). The RTs from the Abi-like/UG lineage are highly divergent and phylogenetically distant from those of group II introns, suggesting that this lineage may represent an old domestication event, creating a new mechanism of defense against phages. The Abi RTs are fused to unknown domains, except for AbiA, in which the RT is fused to a HEPN domain. Most RTs from the UG lineage remain uncharacterized, but some have been shown to confer resistance to phages and are now known as defense-associated RTs (DRTs). DRTs may consist of the RT itself, but the RT is also often fused to a nitrilase domain or associated with other RT proteins. The arrows denote events that have been inferred during the domestication of the different RT lineages.

BOX 1. Role of group II introns in the origin of eukaryotic life

RTs may have played a key role in the transition from the simplest RNA molecules to the current DNA world (Iyer, Koonin and Aravind 2003; Mustafin and Khusnutdinova 2019). RNA-dependent RNA polymerases (RdRP) are thought to be the evolutionary ancestors of the RT protein family (Ellefson et al. 2016; Koonin et al. 2020). There has been considerable speculation about the evolution of RTs since the most recent common ancestor (Eickbush 1997; Curcio and Belfort 2007). In this scenario, group II introns are believed to be one of the drivers of eukaryotic evolution, as they are considered to be the ancestors of nuclear spliceosomal introns, non-LTR retrotransposons and telomerase after colonizing eukaryotes through bacterial endosymbionts that evolved into mitochondria and chloroplasts (Lambowitz and Belfort 2015; Novikova and Belfort 2017; Haack and Toor 2020). It has also been suggested that group II intron proliferation in primitive eukaryotic cells may have promoted the formation of a nuclear envelope to separate splicing from translation (Martin and Koonin 2006).

The spliceosome is an RNP complex, formed by five small nuclear RNAs (snRNAs) and about 80 proteins, which eliminates introns from pre-mRNA molecules (Wahl, Will and Lührmann 2009). Parallels between group II introns and spliceosomal introns suggest that the latter evolved from group II introns by fragmentation and reassembly (Chalamcharla, Curcio and Belfort 2010; Qu et al. 2014). Functionally, both elements require two transesterification reactions for splicing, with a bulged adenosine involved in the first step (Lambowitz and Belfort 2015). Structurally, the organization of the active site of the group II intron RNP is similar to that of the yeast spliceosome (Agrawal, Wang and Belfort 2016; Qu et al. 2016), and the architecture of the maturase domain of various group II introns is similar to that of Prp8 in the spliceosome complex (Zhao and Pyle 2016). All these similarities suggest a strong relationship between group II introns and the spliceosome. On the other hand, the RTs of group II introns, non-LTR retrotransposons and telomerase act through analogous mechanisms, suggesting that ancestral group II introns also gave rise to these other elements (Eickbush and Malik 2002). Once non-LTR retrotransposons had evolved from group II introns, they may have given rise to other more complex retroelements, such as LTR retrotransposons, through the incorporation of an integrase. Finally, retroviruses are believed to have evolved from LTR elements by incorporating envelope genes from other viruses (Eickbush and Jamburuthugoda 2008; Finnegan 2012).

Group II introns consist of a catalytic RNA, with characteristic conserved 5′- and 3′-end sequences, GUGYG and AY, respectively, resembling those of spliceosomal eukaryotic introns, and an intron-encoded protein (IEP; Lambowitz and Zimmerly 2011). Catalytic intron RNAs are 400–800 nt in length and have a conserved secondary structure organized into six domains, DI–VI, radiating from a central ‘wheel’ (Michel, Costa and Westhof 2009). The IEPs, which are encoded in DIV, have several different domains involved in group II intron retromobility (Lambowitz and Zimmerly 2004): an N-terminal RT domain, an X domain or maturase involved in facilitating RNA splicing, a D domain involved in DNA binding and a metal-dependent DNA endonuclease domain of the HNH family that cleaves a target DNA strand to generate the primer for reverse transcription (San Filippo and Lambowitz 2002). However, a large number of bacterial group II introns encode IEPs without an endonuclease domain. The best-studied intron lacking this domain is RmInt1 from Sinorhizobium meliloti, which uses a mechanism associated with DNA replication to prime reverse transcription (Martínez-Abarca et al. 2004; García-Rodríguez et al. 2019).
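The conserved boundary sequences mentioned above lend themselves to a simple illustration. The following minimal Python sketch checks whether a candidate RNA begins with GUGYG and ends with AY, where Y is the IUPAC code for a pyrimidine (C or U). The function name is hypothetical and not taken from any published tool, and such short motifs occur frequently by chance, so a match is a necessary first filter rather than evidence of a bona fide intron.

```python
import re

# Boundary motifs quoted in the text: 5' GUGYG and 3' AY (Y = C or U).
FIVE_PRIME = re.compile(r"^GUG[CU]G")
THREE_PRIME = re.compile(r"A[CU]$")


def has_group_ii_boundaries(seq: str) -> bool:
    """Return True if seq starts with GUGYG and ends with AY.

    Hypothetical helper for illustration only. DNA-style input is
    accepted by converting T to U before matching.
    """
    rna = seq.upper().replace("T", "U")
    return bool(FIVE_PRIME.search(rna)) and bool(THREE_PRIME.search(rna))


if __name__ == "__main__":
    # Toy sequences, not real introns.
    print(has_group_ii_boundaries("GUGCGACGAGAAU"))
    print(has_group_ii_boundaries("GUGAGACGAGAAU"))
```

In practice, intron identification relies on the conserved secondary structure of the six RNA domains and on the IEP sequence; the boundary check shown here would only complement such analyses.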

The mobility of group II introns requires RNA splicing via two transesterification reactions, the first of which starts with a nucleophilic attack by the 2′-OH of a bulged adenosine in DVI on the 5′ splice site, with the second reaction resulting in the generation of an excised intron lariat and exon ligation (Fig. 3A). The IEP facilitates this stage, by acting as a maturase, with the RT and X domains functioning together to bind the intron RNA specifically, thereby promoting the formation of a stable ribonucleoprotein (RNP) complex (Belfort and Lambowitz 2019). This active ribozyme can migrate to new DNA targets in the host genome by a process involving a target-primed reverse transcription (TPRT) mechanism, in which the RT domain synthesizes the intron cDNA of the reverse-spliced intron RNA in one strand of a double-stranded DNA target site (Lambowitz and Zimmerly 2011). Intron mobility can occur via retrotransposition, wherein the intron is introduced into ectopic sites, but the principal mobility pathway of group II introns is retrohoming, in which the intron is inserted into its specific target site in the host genome (Belfort and Lambowitz 2019).

Figure 3. Role of RTs in prokaryotic immunity. (A) Group II intron life cycle. The RNA domains (I–VI), the exon regions (E1 and E2) and the RT and homing nuclease domains of the intron-encoded protein (IEP) are indicated. Splicing generates a lariat RNA, which forms a complex with the IEP. Once the target DNA is found, the intron reverse-splices into it by a process termed retrohoming; the target is frequently a mobile genetic element, whose mobility is thereby abolished. (B) Retrons are tripartite modular systems, some of which have been shown to act as toxin/antitoxin systems. The retron unit comprises a non-coding RNA (ncRNA) and the RT gene. The RT produces an msDNA at high copy number, using the ncRNA as a template. This msDNA remains bound to the RT, forming the antitoxin unit. The effector module, formed by one or more genes with different enzymatic functions, constitutes the toxin and is inhibited by direct contact with the antitoxin. By an as yet uncharacterized mechanism, phage infection triggers msDNA processing/degradation, allowing the toxin to cause cell death and thereby protecting the cell population. (C) Diversity-generating retroelements (DGRs) introduce random mutations during reverse transcription of the template repeat (TR), in a reaction known as mutagenic reverse transcription, which is performed by the RT with the help of an ancillary protein (typically Avd: Accessory variability determinant). The mutated cDNA is then integrated into the variable region (VR) of a target gene (TG), generating considerable sequence variability. The TG generally encodes a protein involved in protein–protein interactions or with surface activities, enabling the host cell to adapt to different conditions, such as host–phage interactions. (D) RTs associated with RNA-targeting CRISPR–Cas systems (types III and VI) form an integrase complex together with Cas1 and Cas2, facilitating the acquisition of RNA molecules, which are integrated into the CRISPR array as new spacers. (E) RTs involved in abortive infection (Abi) systems are part of AbiA, AbiK and Abi-P2, but their mechanisms of action remain unknown. In AbiA, the RT is fused to a HEPN domain and is thought to degrade host or phage RNA to confer resistance. In AbiK, a random-sequence DNA molecule remains covalently attached to an OH group on a tyrosine residue in the RT domain, which is fused to an unknown domain. AbiK uses an as yet incompletely deciphered mechanism to block phage Sak proteins, conferring immunity. Abi-P2, which has demonstrated reverse transcriptase activity, is formed by an RT and a domain of unknown function, and excludes phages by an as yet undetermined mechanism. (F) Defense-associated RTs (DRTs) are novel antiviral systems in which RT activity is required for resistance, through an unknown mechanism. These systems consist of RTs from different unknown groups (UGs) that act alone (DRT2, DRT4 and DRT5) or together with small membrane proteins (TM; DRT1); in the case of DRT3, two RTs from two different UGs and an ncRNA are required for anti-phage properties.

Despite detailed characterization of the mobility pathways of group II introns, the ecological implications of these retroelements in host cells are poorly understood, with only a few studies published to date. Group II introns, which are considered to be selfish elements, tend to localize at higher densities on plasmids than on chromosomes and frequently hide in other MGEs, such as other group II introns, a broad range of transposases and some phage-related proteins, disrupting these elements and their functions (Waldern et al. 2020). In one study, following the acquisition of the RmInt1 group II intron by conjugative transfer, the colonization of homing sites, typically insertion sequences (ISRm2011-2 and close homologs), was found to occur at high frequency via the preferred retrohoming pathway, with sites located on the template for lagging-strand synthesis invaded first, followed by those on the leading strand template (Nisa-Martínez et al. 2007). The splicing of RmInt1 naturally inserted into an IS interrupting a transposase gene is almost completely abolished, but this intron retains its invasion capacity, suggesting that group II introns may, to some extent, control the spread of other MGEs (Chillón, Martínez-Abarca and Toro 2011). These findings suggest that group II introns may have an evolutionary role in circumventing splicing, preventing the mobility of harmful elements in the bacterial cell and forming a particular defense system.

Consistent with this hypothesis, another group II intron integrated into a relaxase gene on a conjugative plasmid has been shown to inhibit host gene expression and to restrain the naturally cohabiting mobile element, preventing its conjugative horizontal transfer by decreasing the levels of spliced mRNA (Qu et al. 2018). This process seems to function as a defensive barrier, with introns acting as general inhibitors of gene expression that limit the spread of other mobile elements. However, the relaxase stimulates intron dispersal by nicking the conjugative plasmid and the chromosome (Novikova et al. 2014). Thus, the relaxase facilitates plasmid dispersal and retrotransposition events, whereas the group II intron regulates relaxase expression, maintaining a balance that may be positive for the host.

Contrasting with this inhibitory function, one recent study showed that group II introns could increase genetic diversity by creating chimeric relaxase variants through the shuffling of coding sequences in RNA and DNA, thereby showing that these retroelements can be beneficial to the conjugative elements that harbor them and to their bacterial host (LaRoche-Johnston, Bosan and Cousineau 2020). Moreover, the existence of host factors acting as global regulators of intron mobility has been demonstrated. Some of these factors, such as RNase E, act as repressors, whereas others, such as the alarmones ppGpp and cAMP, act as stimulators, demonstrating the role of nutritional stress in the activation of these retroelements (Coros et al. 2008, 2009; Nisa-Martínez, Molina-Sánchez and Toro 2016). Thus, group II introns may act by preventing the damage caused by other MGEs activated by stress conditions. Although these findings support the idea that group II introns occasionally facilitate host adaptation, they are also consistent with the reported selfish behavior of these retroelements, which allows them to spread and survive within their bacterial host.

Retrons

In 1984, an unusual single-stranded DNA, known as multicopy single-stranded DNA (msDNA), was found to accumulate to high levels in the bacterium Myxococcus xanthus (Yee et al. 1984). It was subsequently shown that msDNA was produced by an RT using a two-region (msr and msd) non-coding RNA (ncRNA) as a template, forming a unit called a retron (Inouye et al. 1989; Lampson et al. 1989; Lim and Maas 1989). This was the first demonstration of RT activity in prokaryotic organisms (Inouye 2017). Retrons have been studied in detail biochemically (Lampson, Inouye and Inouye 2005), but their biological role remained unknown for more than 30 years after their discovery (Simon, Ellington and Finkelstein 2019). However, several independent studies have shed light on the function of retrons, proposing that they act as a novel prokaryotic defense system against phages (Gao et al. 2020; Millman et al. 2020). Interestingly, these studies show that the msDNA molecule is crucial for the antiviral activity of retron systems (Gao et al. 2020; Millman et al. 2020; Bobonis et al. 2020a).

During msDNA synthesis, the RT protein is bound to the transcribed ncRNA just downstream from the msd region, where it initiates a reverse transcription reaction using a 2′-OH group present in a conserved branching G residue in the msr region as a primer. The resulting msDNA remains covalently attached to the msr RNA as a single branched molecule through a 2′-5′ phosphodiester bond (Shimamoto, Inouye and Inouye 1995). Despite the considerable divergence of msr/msd sequences in the small number of experimentally validated retrons, all these sequences have a number of structural properties in common, including complementary 5′ and 3′ ends of the ncRNA, facilitating formation of the secondary structure of the RNA. The msr region presents a variable number of short stem-loops and the msd region folds into a single hairpin with a long stem, all of these features being indispensable for msDNA production (Lampson, Inouye and Inouye 2005; Simon, Ellington and Finkelstein 2019).
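One of the shared structural properties described above, the complementarity between the 5′ and 3′ ends of the ncRNA, can be illustrated with a short Python sketch. It is only a toy check under stated assumptions: it considers Watson-Crick pairs only (G-U wobble pairs are ignored), requires perfect pairing starting at the very termini, and uses hypothetical function names and an arbitrary `max_len` cutoff. It is not the covariance-model approach used in the studies cited in this review.

```python
# Watson-Crick complement table for RNA (assumption: no wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def terminal_pairing_length(ncrna: str, max_len: int = 20) -> int:
    """Length of the longest perfectly paired stretch between the ends.

    Returns the largest k (up to max_len) such that the first k
    nucleotides can pair, in antiparallel orientation, with the last
    k nucleotides. Returns 0 if the terminal bases do not pair.
    """
    seq = ncrna.upper().replace("T", "U")
    limit = min(max_len, len(seq) // 2)
    for k in range(limit, 0, -1):
        if seq[:k] == reverse_complement(seq[-k:]):
            return k
    return 0


if __name__ == "__main__":
    # Toy sequence: 6-nt 5' end, 4-nt spacer, complementary 6-nt 3' end.
    toy = "GACUCA" + "AAAA" + "UGAGUC"
    print(terminal_pairing_length(toy))
```

A real analysis would fold the whole ncRNA and allow mismatches and bulges; the sketch merely makes the notion of complementary termini concrete.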

The recent expansion of the range and diversity of known retrons based on genome survey analyses has made it possible to increase the number of putative retrons from tens to thousands, with most of these elements containing the characteristic ‘VTG’ signature in RT motif 7 (Toro et al. 2019a). About a third of annotated retrons were thought to encode an ancillary gene (Simon, Ellington and Finkelstein 2019), but a computational pipeline for the systematic prediction of genes specifically associated with retrons has shown that approximately 75% of retrons include an additional component, as an independent gene or an RT-fused domain (Mestre et al. 2020); retrons should, therefore, be considered tripartite systems (Fig. 2). Interestingly, this report showed that the use of covariance models and consensus RNA structure detection makes it possible to identify putative retron ncRNA consensus structures, even in groups with no experimentally validated representatives. A comparison of the phylogenies of the three retron components indicates that retrons are not only highly modular, with the same type of RT associated with different domains and vice versa, but that their components have also co-evolved, suggesting that they may act as a functional unit. Moreover, based on the high diversity of putative enzymatic activities encoded by the associated genes or domains, retrons have been classified into 13 types and 25 subtypes, revealing a tremendous diversity of possible mechanisms and biological functions, not only related to defense (Mestre et al. 2020).

Interest in prokaryotic defense mechanisms against phages has led to several strategies for searching for novel immunity systems. The most successful has involved searching for clusters of antiviral system genes in defense islands (Doron et al. 2018). This approach has led to some retron types being identified as abundant in these islands, and two independent reports (Gao et al. 2020; Millman et al. 2020) have recently shown that some retron systems confer resistance to a wide range of phages. In addition, mutations of the three components of the system have been shown to abolish immunity, indicating that all three are required for correct activity. The isolation of phages able to overcome the resistance conferred by Retron-Eco6 (Ec48), a type IV retron system according to the recent classification (Mestre et al. 2020), led to the detection of genetic mutations inactivating a phage RecBCD inhibitor. This suggests that Retron-Eco6 acts as a RecBCD guardian, sensing the presence of phage-encoded RecBCD inhibitors and somehow activating the associated protein, in this case, a two-transmembrane domain protein, resulting in cell death (Millman et al. 2020). However, Retron-Eco8, a type I-B2 retron system (Mestre et al. 2020), has been shown to act independently of RecBCD (Millman et al. 2020), highlighting the diverse possible modes of action underlying the antiviral activity of retrons, potentially due to highly diverse enzyme activities.

In parallel, Retron-Sen2 (St85), a type I-B1 retron system (Mestre et al. 2020), has been shown to act as a novel type of tripartite toxin/antitoxin (TA) system, in which the RT and msDNA form the antitoxin directly blocking the toxin unit called RcaT. This protein has been predicted to present an N-terminal nucleoside deoxyribosyltransferase-like (NDT) domain and a C-terminal DNA-binding domain (Mestre et al. 2020). Furthermore, the toxicity of RcaT increases at low temperatures or under anaerobic conditions (Elfenbein et al. 2015; Bobonis et al. 2020a) and its binding to the RT protein does not interfere with its toxicity (Bobonis et al. 2020a). However, the presence of the RT-msDNA complex binding the effector protein determines antitoxin specificity (Fig. 3B). Phage-derived triggers and prophage-encoded blockers of this novel TA system have also been identified, suggesting an extensive arms race between retron systems and phages (Bobonis et al. 2020b). Some of the triggers identified (Dam and RecE) have been shown to interact directly with the RT-msDNA complex, methylating, cleaving and/or binding the msDNA component and, therefore, triggering the activation of the toxin. Moreover, Dam and RecE have anti-restriction properties and could lead to Abi mediated by RcaT through the inactivation of RT-msDNA antitoxins, suggesting crosstalk between the innate/adaptive immunity systems and this tripartite TA system. One of the identified blockers is RacC, a small protein encoded by the Rac prophage, which acts directly against the toxin activity, raising the possibility of the existence of phage-encoded anti-retron systems.

Despite the great progress in our understanding of retrons based on these recent discoveries, many biological questions remain unanswered, concerning the ways in which different types of retron systems sense phages, how the antitoxin is inactivated, or the role of the toxin in the final step in this form of immunity. Moreover, new types of retrons with new ncRNA structures have recently been identified but have not yet been experimentally characterized (Mestre et al. 2020). Further studies are therefore required to improve our understanding of the biological role of retrons as an anti-phage system and of the other putative functions of these elements derived from their potential for sensing other cell signals.

Diversity-generating retroelements

DGRs are a unique type of domesticated RT-containing system that has evolved to provide benefits to the host through a reverse transcription reaction generating broad sequence variability in a specific target gene (Zimmerly and Wu 2015). It has been suggested that they originated from a loss of movement capacity in another type of retroelement followed by diversification, and they are widespread in phages, plasmids, bacterial and archaeal genomes (Paul et al. 2017; Yan et al. 2019; Roux et al. 2020). The functional unit of DGRs is highly diverse and formed by a variable gene cassette, but all DGRs have at least three essential components: a reverse transcriptase (RT), a template repeat (TR) and a target gene (TG) with a variable region (VR) displaying ≈90% sequence identity to the TR (Fig. 3C). DGRs increase the ability of the host to adapt to changing environmental conditions through a reaction called mutagenic retrohoming, during which the TR RNA is randomly modified by a mutagenic reverse transcription process, and the resulting cDNA, typically with random A-to-N mutations, is inserted into the VR region, replacing the native sequence and creating multiple novel versions of the TG (Medhekar and Miller 2007; Guo et al. 2014).
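The mutagenic retrohoming reaction described above can be illustrated with a toy simulation. The following Python sketch is only a minimal illustration and not a model of the enzymology: the function names, the per-adenine substitution probability (`a_mutation_rate`) and the example sequences are hypothetical, and the TR and VR are assumed to have the same length. It copies the TR, replaces adenines with random bases (A-to-N), and overwrites the VR of the target gene while leaving the TR itself unchanged.

```python
import random

BASES = "ACGT"


def mutagenic_copy(tr, rng, a_mutation_rate=0.5):
    """Return a copy of the template repeat (TR) in which each adenine is
    replaced by a random base (A-to-N) with probability a_mutation_rate.
    Non-adenine positions are copied unchanged. The rate is an assumption."""
    out = []
    for base in tr:
        if base == "A" and rng.random() < a_mutation_rate:
            out.append(rng.choice(BASES))  # N may be A again (silent event)
        else:
            out.append(base)
    return "".join(out)


def retrohome(target_gene, vr_start, tr, rng, a_mutation_rate=0.5):
    """Overwrite the variable region (VR), assumed to start at vr_start and
    to be as long as the TR, with a mutagenized TR copy. The TR is untouched."""
    new_vr = mutagenic_copy(tr, rng, a_mutation_rate)
    return target_gene[:vr_start] + new_vr + target_gene[vr_start + len(tr):]


# Hypothetical example: a 6-nt scaffold followed by a 10-nt VR.
rng = random.Random(0)
tr = "GACAATGCAA"
target_gene = "ATGCCC" + "GACTATGCTA"
variant = retrohome(target_gene, 6, tr, rng)
print(variant)
```

Because only adenine positions of the TR can change, repeated rounds of this process produce new VR variants while the conserved scaffold and the TR remain constant, which is the feature the sketch is meant to convey.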

Several DGRs have been characterized in bacterial genomes (Le Coq and Ghosh 2011; Arambula et al. 2013), but the best-known and understood example of mutagenic retrohoming is that of the DGR of the Bordetella phage BPP-1 (Liu et al. 2002). DGR activity controls phage tropism switching, by generating new variants of the major tropism determinant (Mtd) protein of the tail. This protein is responsible for binding to pertactin, an adhesin on the cell surface of Bordetella species that is expressed only during the virulent Bvg+ phase. DGR hypermutation in Mtd therefore facilitates the adaptation of phage tropism to surface modifications in the host bacterium (Liu et al. 2002; Doulatov et al. 2004). The VR is found at the 3′ end of the mtd gene, corresponding to a CLec (C-type lectin) fold consisting of a structural scaffold and a final region in which massive mutations can occur, resulting in functional protein variants (McMahon et al. 2005). Furthermore, the mutagenic retrohoming performed by the BPP-1 phage DGR requires several ancillary elements for efficacy. An accessory variability determinant (avd) gene is involved in the mutagenic reverse transcription reaction, binding both the RT protein and the RNA of the TR (Alayyoubi et al. 2013). Following cDNA synthesis, recognition between TR and VR requires a GC-rich sequence called the initiation of mutagenic homing (IMH) sequence at the 3′ end of the VR, together with a slightly different IMH sequence (IMH*) in TR. A DNA stem-loop structure just downstream from the IMH sequence facilitates IMH-IMH* recognition, ensuring directional retrohoming and, therefore, resulting in the correct insertion of a novel VR in the target gene (Guo et al. 2011; Naorem et al. 2017).

Over the years, research has greatly expanded the number of putative DGRs identified, with the prediction of these retroelements in genomic (Park et al. 2012; Schillinger and Zingler 2012; Nimkulrat et al. 2016; Wu et al. 2018) and metagenomic data (Yan et al. 2019; Roux et al. 2020). Furthermore, bioinformatics tools have been developed to identify and characterize DGRs. These tools include DiGReF (Schillinger et al. 2012), DGRscan (Ye 2014) and MyDGR (Sharifi and Ye 2019). All these reports have progressively revealed the widespread presence of DGRs in prokaryotes and phages and have demonstrated the great variability of their genetic components and their functional diversity. Indeed, many bacterial DGRs have been shown to be encoded by temperate phages inserted into bacterial chromosomes as prophages (Benler et al. 2018). Based on RT sequences, the largest DGR dataset available compiles 32 321 sequences, grouped into 1318 clusters (≥50% identity), including DGRs from phages and prokaryotic organisms, in both genomes and metagenomes (Roux et al. 2020). This survey revealed that DGRs predominate in continually changing environments, in which hypermutation is highly beneficial to the host. In these ecological conditions, continual attempts at the horizontal gene transfer of DGR cassettes are made between phylogenetically distant organisms, enhancing the adaptation of a broad range of biological entities. Furthermore, analysis of non-synonymous single nucleotide variants (SNVs) has shown that most of the DGRs analyzed (50–75%) present signs of recent activity, with higher activity levels in phage-associated than in cellular DGRs, in which hypermutation may be induced under stress conditions (Roux et al. 2020).
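The A-to-N signature of mutagenic retrohoming, together with the high TR/VR identity, could in principle be checked computationally for a candidate TR/VR pair. As a hedged illustration only (this is not the algorithm of DiGReF, DGRscan or MyDGR; the function name, the requirement for ungapped sequences of equal length and the example sequences are assumptions), the Python sketch below reports the percent identity of a pair and the fraction of mismatches that fall on TR adenines.

```python
def tr_vr_mismatch_profile(tr, vr):
    """Compare an ungapped, equal-length TR/VR pair.

    Returns (percent_identity, adenine_fraction), where adenine_fraction is
    the share of mismatched positions at which the TR base is 'A'
    (None when the two sequences are identical)."""
    if len(tr) != len(vr) or not tr:
        raise ValueError("this sketch needs non-empty sequences of equal length")
    # Collect the TR base at every position where TR and VR disagree.
    mismatched_tr_bases = [t for t, v in zip(tr, vr) if t != v]
    identity = 100.0 * (len(tr) - len(mismatched_tr_bases)) / len(tr)
    if not mismatched_tr_bases:
        return identity, None
    adenine_fraction = mismatched_tr_bases.count("A") / len(mismatched_tr_bases)
    return identity, adenine_fraction


# Hypothetical pair: both mismatches sit on TR adenines.
identity, adenine_fraction = tr_vr_mismatch_profile("GACAATGCAA", "GACTATGCTA")
print(identity, adenine_fraction)
```

A pair with high identity and an adenine fraction close to 1 would be compatible with the pattern described in the text; real detection tools apply additional criteria that are not reproduced here.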

DGRs present a broad range of cassette architectures, based on the order, number and orientation of their components (Fig. 2). For example, there are four classes of accessory genes, with avd the most common (over 70%), but some DGR loci lack this ancillary ORF. Furthermore, DGRs can present multiple target genes (from 2 to 8) and can act in trans (Wu et al. 2018). Despite this modular organization, the target genes typically encode multidomain proteins with the VR located at the C-terminus. These regions are associated only with the C-type lectin fold and with an uncharacterized domain next to Ig-like fold protein sequences (Roux et al. 2020). In addition, VRs share a highly conserved sequence on either side of the fold sequence, whereas the internal sequence is subject to mutagenesis (Wu et al. 2018). The crystal structure of the C-type lectin fold has revealed an unusually large capacity to accommodate massive sequence variation (Handa et al. 2016). This and the conserved bias towards adenine mutation indicate that DGRs are mechanistically limited in terms of how and where they can produce diversity (Wu et al. 2018; Roux et al. 2020). However, the target proteins have also been shown to be highly modular, suggesting that genetic recombination occurs between independent folding domains and a C-terminal C-type lectin fold to generate chimeric targets (Roux et al. 2020). This process may be the evolutionary source of the involvement of DGRs in various functions beneficial for the host. Target genes are currently classified on the basis of the putative functions of the domains outside the VR sequence, mostly involved in protein–protein binding, ligand binding or surface display activities, suggesting a broad range of potential biological functions, including virulence, virus-host or cell-cell interactions (Fig. 3C). Most phage target genes encode structural proteins, the variability of which enables phages to overcome bacterial defenses based on cell wall modifications, whereas cellular target genes encode proteins involved in binding extracellular substrates (Roux et al. 2020). A new target function has recently been described, with a group of specific cyanobacterial DGRs able to hypermutate a small pocket in binding domains of multidomain proteins broadly involved in regulatory pathways (Vallota-Eastman et al. 2020). Given the broad array of possible scenarios, further characterization of novel types of DGRs is required to develop a full understanding of the role of these elements in their biological niches.

RTs associated with CRISPR–Cas systems

CRISPR–Cas (Clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) systems are an adaptive immune system harbored by archaea and bacteria for defense against foreign nucleic acids, mostly MGEs, such as phages, transposons and plasmids (Barrangou et al. 2007). CRISPR-mediated immunity occurs in three stages: adaptation, expression and interference (Van der Oost et al. 2014). However, the mechanism of the immune response is highly variable, thanks to the considerable diversity of CRISPRs, comprising two classes, six types and 33 subtypes (Makarova et al. 2020). The relationship between RTs and these very diverse defense systems was described for the first time in 2008 (Kojima and Kanehisa 2008; Simon and Zimmerly 2008).

Both these studies showed that the reverse transcriptase-coding region was adjacent or fused to the cas1 gene, which is present in most CRISPR–Cas systems. This observation was endorsed by broader studies of RTs linked to these systems (Toro and Nisa-Martínez 2014; Toro, Martínez-Abarca and González-Delgado 2017). Simultaneously, it was shown that Cas1 and Cas2 form an integrase complex responsible for performing the adaptive step of CRISPR-Cas immunity, which involves the insertion of a genetic fragment of the invading agent, known as the protospacer, into the CRISPR array as a new spacer between two repeat sequences, the direct repeats (DRs; Yosef, Goren and Qimron 2012; Nuñez et al. 2014; Jackson et al. 2017). This particular group of RTs may, therefore, be involved in acquiring new spacers from RNA sources.

This hypothesis was recently validated by three independent studies in different systems: the Cas6RTCas1 adaptive operon from Marinomonas mediterranea MMB-1 (Silas et al. 2016), the RTCas1 system from Fusicatenibacter saccharivorans (Schmidt, Cherepkova and Platt 2018) and the RTCas1 system from Vibrio vulnificus YJ016 (González-Delgado et al. 2019). All three studies showed that RT-CRISPR systems can acquire new spacers directly from RNA in vivo, in an RT-dependent manner, via a mechanism with features common to group II intron retrohoming (Fig. 3D). It has also been demonstrated that the additional Cas6 domain in the M. mediterranea system participates in CRISPR–RNA (crRNA) biogenesis and is required for RT activity and its regulation (Mohr et al. 2018). This crosstalk between the different components of RT–CRISPR systems has been validated by the cryo-EM structure of a Cas6RTCas1–Cas2 complex from Thiomicrospira (Wang et al. 2021), in which the linking RT helix regulates both RT and Cas1 activities.
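The bookkeeping behind RNA-derived spacer acquisition can be sketched as a toy data-structure model. The Python example below is only an illustration under stated assumptions, not a description of the molecular mechanism: the array is reduced to a list of single-stranded spacer strings, the new spacer is stored as the cDNA of the RNA fragment, it is added at index 0 (an assumed leader-proximal position), and all names and sequences are hypothetical.

```python
# Base-pairing table for copying RNA into complementary DNA.
RNA_TO_CDNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_transcribe(rna):
    """Return the cDNA (written 5'->3') complementary to an RNA written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


def acquire_spacer(spacers, rna_fragment):
    """Return a new spacer list with the cDNA of rna_fragment added at
    index 0 (assumed leader-proximal end). The input list is not modified."""
    return [reverse_transcribe(rna_fragment)] + list(spacers)


def render_array(repeat, spacers):
    """Lay out the array as repeat-spacer-repeat-...-repeat."""
    parts = [repeat]
    for spacer in spacers:
        parts.extend([spacer, repeat])
    return "-".join(parts)


# Hypothetical array with one existing spacer and one newly acquired spacer.
array = acquire_spacer(["TTGACC"], "AUGC")
print(render_array("DR", array))
```

Each acquisition adds one spacer and one repeat, so every spacer remains flanked by two direct repeats, as described for the Cas1–Cas2 integration step.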

In the conditions analyzed, the spacers acquired by the RT-containing adaptive operon in these systems were found to be biased towards highly transcribed regions (Silas et al. 2016; Schmidt, Cherepkova and Platt 2018) or highly abundant rRNAs (González-Delgado et al. 2019), reflecting a correlation between spacer acquisition events and RNA abundance. The systems studied displayed a bias towards the antisense strand of coding sequences in the newly acquired spacers of the CRISPR array when the adaptive unit was present alone. Most RTs are associated with type III CRISPR-Cas systems (Silas et al. 2017; Toro, Martínez-Abarca and González-Delgado 2017; Toro et al. 2019a), which recognize both DNA and RNA targets for target degradation. This slight preference for the antisense strand during acquisition therefore improves the performance of the interference step of type III systems, thanks to the complementarity between the crRNA and the target transcript (Pyenson and Marraffini 2017). However, the natural sources of the spacers found in CRISPR arrays associated with RT-containing CRISPR–Cas loci remain to be determined. Due to the limited abundance and distribution of RNA phages, only one natural example of CRISPR immunity to an RNA phage has been described (Wolf et al. 2020), and only a few matches to DNA phage-like sequences have been found (Silas et al. 2017). Different approaches, such as exploring the larger number of metagenomes available from databases, as well as the recent expansion of known ssRNA phage genomes from tens to 15 611 near-complete genomes (Callanan et al. 2020), might help to shed light on this topic and improve our understanding of the biological role of RT–CRISPR–Cas systems in their environmental niche.

The origin and evolutionary relationships of RTs functionally associated with CRISPR-Cas systems have been a matter of debate since the proposal of two models, one suggesting a ‘single-point origin’, in which RTs are domesticated by the CRISPR–Cas locus via the retrotransposition of a group II intron (Silas et al. 2017), and the other postulating ‘multiple origins’, in which RTs related to group II introns have been recruited by CRISPR–Cas adaptation modules several times during evolution (Toro et al. 2018). However, new insight into the ‘multiple origins’ model was provided by a recent analysis of the putative RT sequences in bacterial genomes (Toro et al. 2019a). This study identified a total of 15 clades of CRISPR-RTs: 13 clades closely related to group II introns already reported in the previous analysis (Toro et al. 2018), and two new clades, one evolved from retron/retron-like sequences (clade 14) and the other from RTs related to the RT domain present in Abi-P2 systems (clade 15). Furthermore, novel RT sequences adjacent or fused to an Archaeo-eukaryotic primase (AEP) domain and belonging to clade 12 have been described (Fig. 2), suggesting the possibility of cDNA production without the use of a primer to facilitate the acquisition of spacers in these particular systems. All the RTs described in these studies were found to be functionally linked to type III CRISPR–Cas systems, with the exception of those AbiP2-like RTs linked to a type I-C system, but a recent study reported the recruitment of RTCas1 fusion proteins by type VI-A CRISPR–Cas systems (Toro et al. 2019b). Type VI systems are class 2 CRISPR–Cas systems in which a unique multidomain protein is responsible for performing the interference step specifically targeting RNA (East-Seletsky et al. 2016; O'Connell 2019). The presence of novel RTs in some type III and VI CRISPR–Cas systems suggests an adaptive advantage of these systems in the RNA world that it would be worth characterizing further in the search for unusual mechanisms of spacer acquisition that would help us to understand CRISPR biology.

RTs involved in Abortive bacteriophage infection (Abi) systems

Abi systems are a prokaryotic defense mechanism against bacteriophages in which the infection cycle of the virus is blocked, by halting host metabolism or driving the cell to its death. In this way, Abi systems prevent phage multiplication, thereby protecting the rest of the population (Bernheim and Sorek 2020). A tremendous variety of Abi genes has been described, with about 20 different Abi systems present in Lactococcus spp. alone. Another indicator of the heterogeneity of Abi systems is that three such systems contain an RT domain: AbiA, AbiK and Abi-P2 (Fortier, Bouchard and Moineau 2005; Odegrip, Nilsson and Haggård-Ljungquist 2006; Durmaz and Klaenhammer 2007). However, the role of the putative reverse transcriptase activity in these systems remains unclear. Only in the case of AbiK has it been shown that the RT domain per se is sufficient to confer phage resistance (Fortier, Bouchard and Moineau 2005). For AbiA and Abi-P2, it remains unclear whether other genetic elements present at the same locus are required for defense activity.

Each of these systems has a different mode of operation, but the N-terminal RT domains of Abi proteins have several features in common, including the conservation of a potentially active site with a Y(R/V)DD sequence and the absence of the RT motif 7 (Simon and Zimmerly 2008; Toro and Nisa-Martínez 2014). A C-terminal domain of variable length and uncharacterized function fused to this domain is also present in all types of RT-based Abi systems (Fig. 2). It has been suggested that the AbiA C-terminal domain is a novel version of the higher eukaryote and prokaryote nucleotide-binding (HEPN) domain (Anantharaman et al. 2013). In many defense systems, including restriction-modification (R-M), toxin-antitoxin (TA), CRISPR–Cas and other Abi systems, HEPN domains play a key role due to their RNase activity, which may directly attack viral RNAs or induce host suicide or dormancy by attacking self-RNAs. The presence of RT and HEPN domains in AbiA suggests that phage multiplication may be inhibited through interaction between phage-encoded proteins and a DNA molecule covalently linked to the RT, with the HEPN domain degrading host RNA, driving the cell to suicide and protecting the surrounding community (Fig. 3E). Indeed, the best-characterized AbiA protein, which is found in a lactococcal plasmid (Hill, Pierce and Klaenhammer 1989), is able to stop phage replication. Furthermore, a mutant in a phage recombinase has been shown to be insensitive to AbiA-mediated immunity (Dinsmore and Klaenhammer 1997), suggesting that this recombinase could trigger AbiA. Loci containing AbiA sequences have been shown to confer immunity to a wide set of lactococcal phages (Hill, Miller and Klaenhammer 1990; Tangney and Fitzgerald 2002a). The lactococcal AbiA also displays versatile heterologous activity, conferring resistance to several phages in a Streptococcus thermophilus strain (Tangney and Fitzgerald 2002b).

The best-known RT-based Abi system is AbiK, which has also been found in a plasmid native to a lactococcal strain (Emond et al. 1997). Like AbiA, this system provides resistance to a broad range of lactococcal phages (936, c2 and P335), reducing infectivity by six orders of magnitude (Fortier, Bouchard and Moineau 2005). The AbiK protein has a DNA polymerase activity divergent from that of canonical RTs and acts like a terminal transferase, polymerizing DNAs of random sequence (Wang et al. 2011). Moreover, the synthesized product remains covalently attached to the enzyme, possibly via a hydroxyl group on a tyrosine located in the C-terminal domain, which could act as the primer for a reaction that would be similar to that observed in hepadnavirus self-priming (Wang and Seeger 1992). In addition, studies of phage mutants able to escape AbiK have identified different viral proteins sensitive to AbiK-mediated immunity. All these proteins, denoted Sak (sensitivity to AbiK), participate in the phage replication process (Ploquin et al. 2008; Lopes et al. 2010; Scaltriti et al. 2010; Scaltriti et al. 2011). In this way, the AbiK system works like AbiA, preventing phage maturation by a functional interaction with Sak proteins and provoking cell death by an unknown mechanism (Fig. 3E).

Unlike the previously described systems, Abi-P2 is encoded in a highly variable region of several P2 prophages present in different E. coli strains, with a higher AT content than the host genome, suggesting that this region may have been acquired by HGT (Odegrip, Nilsson and Haggård-Ljungquist 2006). Furthermore, Abi-P2 displays reverse transcriptase activity and confers resistance to the T5 phage by decreasing the plating efficiency of this phage by more than seven orders of magnitude (Fig. 3E). Moreover, deletion of the region of the gene containing the sequence encoding the putative active site of Abi-P2 (YRDD) abolishes resistance to the phage. A recent genome survey analysis enlarged the number of known RT-based Abi systems by an order of magnitude, with Abi-P2 accounting for 75% of these systems (Toro et al. 2019a). Non-redundant homologs of AbiA, AbiK and Abi-P2 have been found in many bacterial phyla and are particularly diverse in the phylum Firmicutes. By contrast, no RT-containing Abi system has ever been found in Archaea. With the increasing number of examples identified, research in this field merits a new impetus, to shed light on the role of these Abi-like RTs in phage resistance.

RTs from unknown groups (UG)

Over the years, novel uncharacterized RTs more distantly related to group II intron RTs and clustering in the so-called ‘unknown groups’ have emerged in analyses of increasing numbers of RT sequences (Kojima and Kanehisa 2008; Simon and Zimmerly 2008; Toro and Nisa-Martínez 2014). There are now 28 different UGs covering 10% of prokaryotic RT diversity (Toro et al. 2019a) showing considerable sequence diversity. For example, UG1 and UG5 RTs have a C-terminal nitrilase domain (Simon and Zimmerly 2008), whereas, in UG6, this domain is located further downstream in an adjacent ORF (Gao et al. 2020). UG3 and UG8 represent a unique case in which RT sequences are always located next to each other, suggesting that they act as a functional unit (Fig. 2; Kojima and Kanehisa 2008).

RTs from UG groups have recently been found in defense islands and have been experimentally validated as new anti-phage systems known as Defense-associated RTs (DRTs), in which immunity is dependent on the RT domain (Gao et al. 2020). In DRT type 1 (UG1) both the C-terminal nitrilase domain and a small membrane protein are required for defense (Fig. 3F). In DRT type 3, corresponding to UG3 and UG8, a structured ncRNA downstream from the UG8 gene is required for immunity. By contrast, in DRT type 2 (UG2), type 4 (UG15) and type 5 (UG16), the RT alone is sufficient for resistance against phages. However, the mechanism of action of these novel systems remains undeciphered, and additional experimental data are required to determine how UG-RTs provide protection against different types of phages. Moreover, not all UG groups have been tested, and only some of those investigated display antiviral properties (Gao et al. 2020). The characterization of additional types of UG-RTs is therefore required, for the discovery of new DRT systems.

Group II-like (G2L) uncharacterized RTs

In addition to the RT lineages described above, novel phylogenetic groups may arise through different types of domestication of ancient retroelements, the function of which remains to be elucidated. The RTs of one of these groups are most closely related phylogenetically to group II introns and were therefore named ‘group II-like’ or ‘G2L’ RTs. The main difference between the RTs of this group and those classified as group II introns is the absence of the characteristic intronic RNA structure in the G2L group (Simon and Zimmerly 2008). A previous analysis of the prokaryotic reverse transcriptase landscape (Toro et al. 2019a) led to an increase in the number of identified G2L-type sequences, with these sequences being found to branch off from a node common to CRISPR-associated RTs. There is a close relationship between G2L and CRISPR-RTs, and it should be pointed out that two groups previously classified as G2L (G2L1 and G2L2) have since been reclassified as new clades of RT-CRISPR systems (clade4/6 and clade 9), suggesting that both RT lineages may have evolved from a common ancestral group II intron RT. Moreover, the previously named G2L3 (Simon and Zimmerly 2008) was noticed to be phylogenetically separated from the G2L lineage (Toro and Nisa-Martínez 2014) and more recently, this monophyletic group has been reclassified as a novel UG group named as UG17 (Toro et al. 2019a). The G2L lineage has been greatly expanded and comprises RT sequences clustered within the G2L4 and G2L5 groups (Simon and Zimmerly 2008) and four additional clades (Toro et al. 2019a).

In this way, G2L may constitute an evolutionary record of an intermediate state between the autonomous mobilization of group II intron RTs and the domestication of these RTs for the performance of useful cellular functions. This hypothesis was tested by analyzing the genomic neighborhood (Mestre et al. 2020) of G2L sequences, to search for possible evidence of emerging functional associations (see Supplementary Methods). Interestingly, a particular subgroup of G2L RTs (cluster 4; Toro et al. 2019a) appears to be embedded in a multicomponent system composed of the products of the queACDE genes involved in the queuosine biosynthesis pathway (Thiaville et al. 2016), two different glycosyltransferases, the 5′-3′ exonuclease domain of a family A DNA polymerase, a GlcNAc-PI de-N-acetylase (LmbE), a phosphoribosyltransferase, YaaA (DUF328), which is a DNA-binding protein with a novel fold (Prahlad et al. 2020), and an FGE sulfatase with a C-lec fold (Fig. 4 and Figure S1, Supporting Information). QueC was recently reported to be a component of novel defense systems (Gao et al. 2020), which suggests that this genomic cassette may also be involved in immunity functions. Moreover, 25% of known G2L-RT sequences were located close to known defense system sequences (Fig. 4). Thus, functional associations between G2L and neighboring genes could generate specialized systems with antiviral activities worthy of further investigation.

Figure 4. Discovery of putative functional associations of G2L-RTs. Representative instances of RTs from the G2L group found in defense islands. Some of these RTs are embedded in a multicomponent system (colored in purple), comprising QueC, QueD, QueE (queuosine biosynthesis pathway), FGE-s (FGE sulfatase), YaaA (DNA-binding protein), GTF (Glycosyltransferase), RSE (Radical SAM Enzyme), PRT (phosphoribosyltransferase), Nuclease (Flap Endonuclease), PIG-L (GlcNAc-PI de-N-acetylase) and GTP-c (GTP cyclohydrolase). Genes known to be involved in defense are shown in yellow: CRISPR–Cas system type I-B, type I RM (Restriction-Modification), TA (Toxin-Antitoxin) and the recently described defense systems Wadjet, Septu and Gabija (Doron et al. 2018). The genes encoded by a conjugative transposon are shown in dark grey. The encoding strain, the accession number, the genomic coordinates and the protein ID in the NCBI or PATRIC databases are indicated to the left.

Differential distribution of RTs in bacteria and archaea

The most exhaustive phylogenetic analysis of bacterial RTs used 198 760 predicted RT proteins and addressed the distribution of RTs, mostly in bacteria (Toro et al. 2019a). Analysis of the final dataset of 9141 non-redundant RTs supported the classification of most RTs into three main groups: group II introns, the largest group, accounting for 47% of RTs, retron/retron-like sequences (25%) and DGRs (12%; Fig. 5A and B). The remaining 16% of RTs clustered into distinct groups including RTs previously reported to be linked to CRISPR–Cas systems, group II-like (G2L), Abi-like or UG (unknown) groups. These data are consistent with previously reported findings (Kojima and Kanehisa 2008; Simon and Zimmerly 2008; Toro and Nisa-Martínez 2014).
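As a quick consistency check on these figures, the short Python sketch below derives the residual share and converts the rounded percentages into approximate sequence counts. The counts are back-calculated from the rounded percentages quoted above and are therefore approximations, not values reported by Toro et al. (2019a); the variable and function names are illustrative.

```python
TOTAL_RTS = 9141  # non-redundant RTs in the final dataset

# Rounded percentages quoted in the text for the three main groups.
MAIN_GROUP_SHARES = {
    "group II introns": 47,
    "retron/retron-like": 25,
    "DGRs": 12,
}


def residual_share(shares):
    """Percentage left over once the listed groups are accounted for."""
    return 100 - sum(shares.values())


def approximate_counts(total, shares):
    """Convert percentage shares into rounded, approximate sequence counts."""
    return {group: round(total * pct / 100) for group, pct in shares.items()}


other = residual_share(MAIN_GROUP_SHARES)
all_shares = {**MAIN_GROUP_SHARES, "other groups": other}
counts = approximate_counts(TOTAL_RTS, all_shares)
for group, n in counts.items():
    print(f"{group}: {all_shares[group]}% (about {n} sequences)")
```

The residual computed this way matches the 16% stated for the remaining groups (CRISPR-associated, G2L, Abi-like and UG RTs).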

Figure 5. Distribution of prokaryotic RTs. (A) Phylogeny of bacterial RTs. The unrooted trees were constructed from an alignment of 9141 sequences. The tree Newick files are provided in Toro et al. (2019) and the redrawn tree is reprinted by permission of the publisher (Taylor & Francis Ltd, http://www.tandfonline.com). The branches of each RT type are indicated and highlighted with distinct colors. RTs associated with CRISPR–Cas systems with a retron (clade 14) or Abi-P2 (clade 15) origin are indicated by green points and black arrows. (B) Relative abundance of the different RT groups in bacterial and archaeal genomes. (C) Phylogeny of archaeal RTs. The unrooted trees were constructed from an alignment of 411 sequences. The tree Newick file is provided in Supplementary File 1. (D) RT distribution in the main bacterial phyla. (E) RT distribution in the main archaeal phyla. Charts showing the proportions of RTs corresponding to group II introns, retrons, DGRs, RTs associated with CRISPR–Cas systems (RT–CRISPR), group II-like RTs (G2L), RT-based Abi systems (Abi), unknown groups (UG) and unclassified RTs. In panels (B, D and E), the number of unique RT sequences is indicated after the name of each domain/phylum.

Far fewer RT sequences have been identified in archaea than in bacteria, because of the smaller number of complete genome sequences available in databases (i.e. more than 3 × 10⁵ bacterial genome sequences versus just under 5 × 10³ archaeal genome sequences in the PATRIC database; Supplementary Methods). A more detailed analysis of archaeal RTs revealed that DGRs accounted for the largest proportion (30% of total archaeal RTs), contrasting with the situation in bacteria (with only 12%). The opposite pattern was observed for retrons, which account for only 4% of total archaeal RTs (Fig. 5B and C; Supplementary File 1). No RTs from the G2L or Abi-like groups have been found in archaea. The remaining RT groups account for similar proportions of the RTs in archaea and bacteria, with group II introns the most abundant (52%), which are probably proliferating in archaea even though they were originally acquired from bacteria (Fig. 5B; Toro 2003).

In bacteria, the Flavobacteria–Chlorobi–Bacteroidetes group (FCB), candidate phyla radiation (CPR) and Cyanobacteria have high rates of non-redundant RTs relative to the number of sequenced genomes for these phyla (Figure S2A, Supporting Information). Group II introns are, predictably, prevalent in most bacterial phyla, but retrons predominate in Proteobacteria, whereas DGRs account for about 80% of total RTs in CPR, the strongest bias towards a specific RT group in bacteria (Fig. 5D). Viewed the other way (distribution of bacterial phyla for particular RT groups), Proteobacteria is the predominant phylum in most RT lineages, whereas Firmicutes account for a large proportion of group II introns and Abi-like RTs. Similarly, the CPR group contains about 40% of all singular DGRs (Figure S2B, Supporting Information). These data support the notion that some bacterial phyla have recruited specific groups of RTs to improve their responses to the ecological conditions in their natural environments.

As in bacteria, in most phyla and other taxonomic groups of archaea there is a correlation between the total number of genomes sequenced and the total number of RTs found in the group concerned, except for the TACK group (the Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota phyla), which accounts for 25% of total archaeal genomes but only 7% of non-redundant RT sequences (Figure S2C, Supporting Information). Group II introns predominate in Euryarchaeota, the TACK group and the Asgard group (55–75% of RTs). However, a different scenario emerges for the DPANN group (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota and Nanohaloarchaeota) and uncultured/unclassified archaea, in which DGRs account for about 80% of RTs (Fig. 5E). Regarding the dominance of phyla for every RT group, it is worth highlighting that 90% of all DGRs were found in the DPANN group and unclassified archaea (Figure S2D, Supporting Information).

The abundance of DGRs in the DPANN group and in the bacterial CPR group has been described (Paul et al. 2017; Roux et al. 2020), but we show here that DGRs are, by far, the most abundant type of RT in these phylogenetic groups. Curiously, the DPANN group is the most divergent of all archaeal groups, resembling the situation of the CPR group in bacteria (Castelle et al. 2018). Indeed, these groups have several characteristics in common and differ from other prokaryotic organisms in having small cell and genome sizes, in their episymbiotic relationships with other organisms and their limited metabolic capacities. In this context, the high abundance of DGRs in the DPANN and CPR groups may reflect a role of RTs of this type in the adaptation of these organisms to their biological niches through changes in membrane proteins facilitating interactions with the host (Castelle et al. 2018).

Biotechnological applications of prokaryotic RTs

RTs, particularly those from eukaryotes and viruses, have been exploited as biotechnological tools in a wide range of fields (Martín-Alonso, Frutos-Beltrán and Menéndez-Arias 2020). The unique properties of several prokaryotic RT groups have enabled their use in a wide variety of applications, including gene targeting, high-throughput site-directed mutagenesis, the generation of libraries of highly mutagenized genes and high-throughput transcriptional characterization (Table 1).

Table 1.

Biotechnological applications of different prokaryotic RTs.

| Applications | Technology | RT type | General description | References |
| --- | --- | --- | --- | --- |
| Specific gene knockout-knockdown-deletion | Targetron | Group II introns | The specificity of the IBS–EBS interactions was harnessed to target introns to preprogrammed positions in a wide variety of bacterial genomes. | Zhuang et al. (2009); García-Rodríguez et al. (2014); Gwee et al. (2019); Wen et al. (2020) |
| | Targetron plus CRISPR-Cas9 counterselection | Group II introns | CRISPR-Cas9 counterselection increased the chances of finding clones that integrated the intron into the target lacZ sequence in a recombination-independent fashion. | Velázquez, Lorenzo and Al-Ramahi (2019) |
| | GETR | Group II introns | Group II introns deliver new lox sites, allowing the recombinase Cre to produce insertions, inversions and deletions and one-step cut-and-paste operations. | Enyeart et al. (2013) |
| | Antisense cDNA gene regulation | Retrons | Retron engineered to produce a msDNA that contains an antisense cDNA to knock down a target gene. | Mao et al. (1995) |
| High-throughput site-directed mutagenesis | SCRIBE | Retrons | Retron engineered to produce a msDNA with a desired sequence to modify a target region after recombination, which acts as a genomic ‘tape recorder’ of the presence of regulatory signals at population level. | Farzadfard and Lu (2014) |
| | Continuous gene evolution | Retrons | Expression of the retron under an error-prone RNA polymerase that generates random mutations in the msd region, which are later introduced into the desired loci. | Simon, Morrow and Ellington (2018) |
| | HiSCRIBE | Retrons | Efficient retron recombineering for recording transient spatial information and continuous genome evolution. | Farzadfard et al. (2020) |
| | RLR | Retrons | Optimization of retron-based genome editing to prepare pooled, barcoded mutant libraries for high-throughput screens of allelic variants; uniquely suited to utilizing non-designed sources of variation. | Schubert et al. (2021) |
| | Multiplex gene editing | Retrons | Combination of retron and CRISPR-Cas9 to enable multiplex gene editing in bacteria. | Lim et al. (2020) |
| | CRISPEY | Retrons | Retron-mediated homology-directed repair of CRISPR-Cas9 double-strand breaks in yeast. | Sharon et al. (2018) |
| Phage therapy | Variants of tropism proteins | DGRs | Continuous target evolution through the mutagenic retrohoming reaction carried out by the Bordetella phage BPP-1 DGR to create variants of tropism proteins that bind T4 lysozyme. | Yuan et al. (2013) |
| High-throughput transcriptional technologies | TGIRT | Group II introns | Thermostable properties of RTs used in different high-throughput RNA characterization purposes. | Mohr et al. (2013) |
| | Marathon RT | Group II introns | Ultraprocessive and accurate properties of the E. rectale group II intron RT used in different high-throughput RNA characterization purposes. | Zhao, Liu and Pyle (2018) |
| | Record-Seq | CRISPR-RTs | Use of the RTCas1–Cas2 integrase complex to store transcriptional information in the CRISPR array as DNA, describing specific and complex cellular behaviours and assessing cumulative gene expression. | Schmidt, Cherepkova and Platt (2018); Tanna et al. (2020) |

For gene targeting, the specific interactions between intron-binding and exon-binding sites (IBS–EBS) required for group II intron retromobility were harnessed to target introns to preprogrammed positions (Guo et al. 2000; Mohr et al. 2000; Karberg et al. 2001; Enyeart et al. 2014). Thus, group II introns have been exploited as the first RNA-guided gene targeting tools in a wide variety of bacterial genomes in the so-called targetron knockout technology (Table 1). With an algorithm designed to identify optimal matches, targetrons are programmed to insert into desired genes by modifying the sequence elements in the intron RNA to base-pair to the IBS in the DNA target site (Zhuang et al. 2009; García-Rodríguez et al. 2014; Gwee et al. 2019; Wen et al. 2020). The coupling of this technique with CRISPR-Cas9 counterselection has recently increased the chances of finding clones in which the intron is integrated into the target sequence in a recombination-independent fashion (Velázquez, Lorenzo and Al-Ramahi 2019). Group II introns have also been programmed to deliver lox sites, which are used by Cre recombinase to generate insertions, inversions, deletions and one-step cut-and-paste operations (Table 1; Enyeart et al. 2013). On the other hand, the ability of retrons to produce msDNA at high copy number has allowed the use of these elements for gene silencing (Table 1; Mao et al. 1995; Simon, Ellington and Finkelstein 2019). This approach is based on the fact that the msd region can be modified with a random sequence without disturbing the production of msDNA. Thus, retrons can be engineered to synthesize specific DNA molecules that work as antisense cDNAs to knock down the mRNA levels of a target gene (Mao et al. 1995).
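The antisense strategy described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the sequence placed in the msd region is simply the reverse complement of a chosen window of the target mRNA; the function names, the window coordinates and the toy sequence are hypothetical and are not taken from Mao et al. (1995). Real designs would also have to respect the structural constraints of the retron ncRNA, which this sketch ignores.

```python
# Hypothetical helpers for designing an antisense DNA insert.
# Assumption: the insert is the reverse complement of an mRNA window.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_insert(mrna: str, start: int, length: int) -> str:
    """Return DNA (5'->3') complementary to mrna[start:start + length]."""
    if start < 0 or length <= 0 or start + length > len(mrna):
        raise ValueError("target window falls outside the mRNA")
    # Convert the RNA window to its DNA alphabet before complementing.
    window = mrna[start:start + length].upper().replace("U", "T")
    return reverse_complement(window)


if __name__ == "__main__":
    toy_mrna = "AUGGCUAAC"  # toy sequence, not a real transcript
    print(antisense_insert(toy_mrna, 0, 6))  # AGCCAT
```

The printed insert pairs base by base with the first six nucleotides of the toy mRNA, which is the property an antisense cDNA needs in order to hybridize with its target.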

The particular features of retrons have also made them an interesting alternative for high-throughput directed mutagenesis applications (Simon, Ellington and Finkelstein 2019). The first approach, referred to as Synthetic Cellular Recorders Integrating Biological Events (SCRIBE), was based on engineered variants of Retron-Eco1 with a modified msd region (Table 1; Farzadfard and Lu 2014). The production of the modified msDNA, which is inducible by arbitrary transcriptional signals (light and chemical inducers), acts as a genomic ‘tape recorder’ of the presence of these signals at population level. To increase efficiency, the retron unit is co-expressed with the Beta recombinase of bacteriophage λ, leading to the recombination of the modified msDNA into a specific DNA locus in a manner that depends on the magnitude and duration of the signals (Farzadfard and Lu 2014). However, due to the moderate recombination level (∼10⁻⁴ recombination events per generation) observed in this system, several studies have examined the factors that limit retron recombineering (Simon, Morrow and Ellington 2018; Farzadfard et al. 2020; Schubert et al. 2021). The first improvements in efficiency were achieved by optimizing the promoter and by knocking out genes that reduce the accumulation of msDNA, specifically those encoding the mismatch repair protein MutS and the single-stranded exonuclease ExoX (Simon, Morrow and Ellington 2018). This system yields recombination efficiencies two orders of magnitude higher and allows continuous evolution of target loci by expressing the retron unit under the control of an error-prone RNA polymerase, which generates random mutations in the msd region that are introduced into the target DNA (Simon, Morrow and Ellington 2018). Furthermore, the deletion or silencing of other host factors, mainly exonucleases such as recJ and sbcB (xonA), significantly increased the recombination frequency, achieving >99% editing efficiency (Table 1; Farzadfard et al. 2020; Schubert et al. 2021). Both studies circumvent some limitations of previous genome-editing technologies in bacteria, including the need to electroporate with mutagenic oligonucleotides or the requirement for specific cis-encoded elements (i.e. the PAM motif in CRISPR–Cas9 counterselection). Thus, Farzadfard and colleagues (2020) introduce HiSCRIBE (High-efficiency SCRIBE) and demonstrate its use in recording transient spatial information and continuous genome evolution. On the other hand, Schubert and collaborators (2021) create RLR (Retron Library Recombineering), which is useful for preparing pooled, barcoded mutant libraries for high-throughput screens of allelic variants and is uniquely suited to utilizing non-designed sources of variation. Interestingly, a recent study has also shown that modifications in the ncRNA structure and in the architecture of the retron operon lead to a more efficient reverse transcription reaction and, thus, to a higher retron recombineering efficiency in bacteria, yeast and human cells (López et al. 2021). Moreover, retron recombineering technologies have also been used in combination with CRISPR–Cas systems for multiplex gene editing in bacteria (Lim et al. 2020) and eukaryotes (Sharon et al. 2018). This last study uses a chimeric RNA comprising a modified retron ncRNA joined to a CRISPR guide in a technology called Cas9 Retron precISE Parallel Editing via homologY (CRISPEY), which is well suited to identifying causal variants for polygenic traits (Table 1).
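To give a feeling for what a per-generation recombination rate means at population level, the following sketch computes the expected fraction of edited cells after a given number of generations. It rests on simplifying assumptions that are not part of the cited studies: each unedited cell is converted independently with a constant probability per generation while the inducing signal is present, edits never revert, and edited and unedited cells grow equally well. The function name and the example values are illustrative only.

```python
def edited_fraction(rate_per_generation: float, generations: int) -> float:
    """Expected fraction of edited cells after `generations` generations.

    Assumes a constant, independent per-generation conversion probability,
    no reversion and no fitness difference between edited and unedited cells.
    """
    if not 0.0 <= rate_per_generation <= 1.0:
        raise ValueError("rate_per_generation must lie between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # A cell remains unedited only if it escapes conversion every generation.
    return 1.0 - (1.0 - rate_per_generation) ** generations


if __name__ == "__main__":
    # 1e-4 is the order of magnitude quoted for the original SCRIBE system;
    # 1e-2 stands for a hypothetical 100-fold improvement.
    for rate in (1e-4, 1e-2):
        for n in (10, 100, 1000):
            print(f"rate={rate:g} generations={n} "
                  f"edited={edited_fraction(rate, n):.4f}")
```

Under these assumptions the edited fraction grows with both the rate (signal magnitude) and the number of generations (signal duration), which is the qualitative behaviour expected of a population-level recorder.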

Continuous target evolution could also be performed using DGRs, thanks to their ability to generate multiple variants of a protein domain, which can be useful in biotechnological applications such as phage therapy. However, only one preliminary study has been able to demonstrate the ability of the phage BPP-1 DGR to create variants of tropism proteins likely to bind the T4 lysozyme (Table 1; Yuan et al. 2013). The exploration of novel DGRs with advantageous properties, such as thermostability (Handa, Shaw and Ghosh 2019), could provide promising new tools. In addition, characterization of the mutagenic retrohoming mechanism has revealed that the RT and Avd proteins of the BPP-1 DGR work as a complex and that both are required for cDNA synthesis from both DGR and non-DGR templates with an oligodeoxynucleotide (ODN) primer (Handa et al. 2018). cDNAs synthesized from non-DGR templates also present adenine mutations, revealing this to be an intrinsic feature of the RT–Avd complex. This capacity can be used to create libraries of hypermutated cDNAs, to address the issue of sequence variability when searching for protein variants with significantly higher levels of activity than the native protein. Theoretically, mutagenic retrohoming is the biological process capable of creating the highest levels of sequence variability, potentially about 10³⁰ protein variants, several orders of magnitude more than is possible with eukaryotic systems (Wu et al. 2018). These observations show that DGRs are a powerful potential tool for protein engineering, for which detailed studies are required to determine their usefulness.
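The adenine-specific nature of this mutagenesis can be sketched in a few lines. The toy model below assumes that every adenine in a template is replaced independently, with a hypothetical probability `p_mut`, by one of the other three bases chosen uniformly, and that all other positions are copied faithfully. These parameters are assumptions made for illustration and do not reflect the measured behaviour of the RT–Avd complex. The second function gives the corresponding upper bound on the number of distinct DNA sequences, four possible outcomes per adenine.

```python
import random


def mutagenize_adenines(template: str, rng: random.Random,
                        p_mut: float = 0.5) -> str:
    """Return a copy of `template` in which only adenines may change.

    Toy model: each A is replaced with probability `p_mut` by C, G or T
    (uniformly); every other position is copied unchanged.
    """
    out = []
    for base in template.upper():
        if base == "A" and rng.random() < p_mut:
            out.append(rng.choice("CGT"))
        else:
            out.append(base)
    return "".join(out)


def max_dna_variants(template: str) -> int:
    """Upper bound on distinct DNA outputs: 4 outcomes per adenine."""
    return 4 ** template.upper().count("A")


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed for reproducibility
    toy_template = "AACGTAACT"  # toy sequence, not a real template repeat
    print([mutagenize_adenines(toy_template, rng) for _ in range(3)])
    print(max_dna_variants(toy_template))  # 4 adenines -> 256
```

Because the bound grows as a power of the number of adenines, even templates with a few tens of adenines correspond to very large sequence spaces in this model; the number of distinct proteins is smaller, since different codons can encode the same amino acid.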

Prokaryotic RTs have also been exploited for different high-throughput RNA characterization purposes (Martín-Alonso, Frutos-Beltrán and Menéndez-Arias 2020). Specific group II intron RTs, such as the thermostable group II intron RTs (TGIRTs; Mohr et al. 2013) and the ultraprocessive and accurate RT from Eubacterium rectale (Marathon RT; Zhao, Liu and Pyle 2018), have been used in RNA-seq and epitranscriptomics technologies. Interestingly, RT-containing CRISPR loci have also been used in similar applications to record transcriptional events (Schmidt, Cherepkova and Platt 2018), further expanding the CRISPR–Cas toolbox (Jinek et al. 2012; Komor, Badran and Liu 2017; Pickar-Oliver and Gersbach 2019; Vigouroux and Bikard 2020). The adaptive operon of F. saccharivorans, a complex formed by the RTCas1 fusion protein and Cas2, has been used to develop Record-seq technology, which records transcriptional events in a CRISPR array as DNA, describing specific and complex cellular behaviors and assessing cumulative gene expression (Table 1; Schmidt, Cherepkova and Platt 2018; Tanna et al. 2020). However, the use of the F. saccharivorans system as an RNA-recording tool presents the disadvantage of apparent skewing towards AT-rich regions at the ends of transcripts. To overcome this limitation, it will be important to analyze more CRISPR–Cas systems harboring RTs, such as the V. vulnificus RTCas1–Cas2A–Cas2B system, which can acquire spacers regardless of their GC content and from any point in the coding sequence (González-Delgado et al. 2019). Record-seq-derived methods could also be used to improve current technologies for archiving real data for populations of living cells (Shipman et al. 2017), thanks to the ability of systems of this type to store highly transcribed regions potentially containing the information of interest. Moreover, these systems could be used as highly scalable bacterial biosensors to report on gut function (Tanna, Ramachanderan and Platt 2021). In addition, the characterization of singular RTs linked to CRISPR–Cas systems could provide useful tools, such as groups of RTs fused to a putative primase (AEP) domain that might not require priming for cDNA reactions.
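A compositional bias of the kind mentioned above can be checked with a simple calculation. The sketch below compares the mean GC content of a set of acquired spacers with that of the transcripts they derive from. The function names and the toy sequences are hypothetical, and a real analysis would require appropriate statistical testing and a matched background, which are not shown here.

```python
from typing import Iterable


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a non-empty nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_gc(seqs: Iterable[str]) -> float:
    """Unweighted mean GC content over a collection of sequences."""
    values = [gc_content(s) for s in seqs]
    if not values:
        raise ValueError("at least one sequence is required")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy data only: spacers markedly less GC-rich than their source
    # transcripts would be consistent with an AT-rich acquisition bias.
    spacers = ["ATTATAAT", "TTAACGAT"]
    transcripts = ["ATGCGCGCTTAA", "GGCCATATGCGC"]
    print(f"spacers: {mean_gc(spacers):.2f}")
    print(f"transcripts: {mean_gc(transcripts):.2f}")
```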

On the other hand, the recent finding that some retron systems can constitute novel tripartite TA systems (Gao et al. 2020; Mestre et al. 2020; Millman et al. 2020; Bobonis et al. 2020a, b) containing an effector-associated protein endowed with anti-phage activity could also lead to new biotechnological applications. For example, retrons could be engineered for use in phage therapy or as riboswitches, with specific conditions triggering the desired response. Thus, the potential of these elements as biotechnological tools is only just beginning to be understood. Furthermore, the large RT proteins involved in Abi systems could also provide novel biotechnological tools, thanks to the enzymatic diversity provided by the domains fused to the C-terminus of the RT domain. These natural fusions are worth characterizing, particularly for AbiA, in which polymerase and putative RNase activities are combined in a single protein that may function differently from the RTs of retroviruses. Similarly, the properties of RTs from other groups, such as the UG and G2L groups, some of which are involved in antiviral defenses (Gao et al. 2020), and those found in Archaea could supply novel functions for use in existing or new fields. We are currently witnessing the first uses of prokaryotic RTs in a broad range of biotechnology applications.

CONCLUDING REMARKS

This review describes how prokaryotic organisms have domesticated the different types of RTs to perform defense functions against phages and other MGEs. The outstanding research performed in the last few years has not only resolved the 30-year-old enigma of the biological role of retrons (Gao et al. 2020; Millman et al. 2020; Bobonis et al. 2020a, b), but also revealed why the presence of an RT is advantageous for some types of CRISPR–Cas systems (Silas et al. 2016; Toro et al. 2019b), and has shown that some uncharacterized groups of RTs confer resistance to a wide range of phages (Gao et al. 2020). However, several key questions remain unanswered. From an evolutionary point of view, it would be worthwhile improving our understanding of how group II introns evolved in their host genome and gave rise to other types of prokaryotic RTs. Interestingly, defense systems and MGEs have several characteristics in common, including being the only two functional categories of genes with a negative selection coefficient (Iranzo et al. 2017), implying that they can be deleterious to the host. These parallel evolutionary histories could provide the necessary substrate for mobile genetic elements and defense systems to become associated on multiple occasions throughout evolution, as demonstrated for CRISPR–Cas systems (Koonin and Makarova 2019). It would be reasonable to hypothesize that group II introns may have been the source of other domesticated RTs for various defense systems. Research on RTs phylogenetically related to group II introns, such as the G2L lineage, could shed light on this issue.

With our current knowledge of RTs, we are only just beginning to understand the impact of these enzymes in prokaryotes. Exploring new groups of recently expanded group II introns (Toro et al. 2019a) may shed light on how these selfish elements achieve a balance between their inherent need to spread and their ability to control MGEs, resulting in benefit to the host. The recently discovered anti-phage activity of retrons has opened up a whole new avenue of research, extending from clarification of the stages of retron-mediated immunity to the way in which phages overcome this defensive barrier. Given the modularity and diversity of retron systems (Mestre et al. 2020), a number of completely different biological mechanisms may be involved. A parallel track could be followed for UGs, in which computational analysis may also reveal the existence of more diversity, potentially associated with other proteins or ncRNAs and constituting novel DRTs. Interestingly, phylogenetic analysis (Toro and Nisa-Martínez 2014; Toro et al. 2019a) has shown that the Abi-RTs lie among UG-RTs, suggesting that DRTs may confer resistance similarly to Abi-RTs, leading the host cell to commit suicide. The presence of an RT domain provides an adaptive advantage to RNA-targeting CRISPR–Cas systems (types III and VI; Silas et al. 2016; González-Delgado et al. 2019), but the origin of the spacers in the CRISPR arrays of systems carrying RTs and the structural basis of the mechanism of RNA spacer integration into the array remain a mystery. Further studies are therefore required to improve our understanding of type III and VI CRISPR biology. Finally, in addition to their known functions, the other possible roles of DGRs should be studied, to shed light on the ways in which mutagenic reverse transcription helps the host to adapt to changing environmental conditions, particularly for CPR bacteria and DPANN archaea, in which DGRs are particularly abundant (Paul et al. 2017; Roux et al. 2020).

The recently proposed ‘pan-immune system’ model suggests that, even if a particular strain does not have genes encoding all types of defense systems, the pan-genome of a mixed population of strains potentially encodes a battery of defense systems that protects the whole population (Bernheim and Sorek 2019). According to the implications of this model, it would be worthwhile analyzing the possible co-occurrence of RT-containing defense systems with other immunity systems, to check for potential crosstalk between antiviral mechanisms. Such studies would help to unravel the complexity of prokaryotic immunity.

The answers to all these questions would have implications for the biotechnological applications of RT-containing systems in new fields, such as phage therapy. It might become possible to design phages or molecules triggering cell death mediated by a retron toxin or by DRT/Abi immunity. Furthermore, natural or engineered phages could be used to overcome RT-mediated immunity to kill a particular bacterial strain with precision. Moreover, the biochemical characterization of novel RTs may lead to the discovery of new functions useful in high-throughput RNA technologies, genome editing or continuous gene evolution, among others. This survey should facilitate the experimental characterization of novel RT groups and promote additional lines of research leading to a better understanding of the roles of these prokaryotic RTs and future applications.