Origin and Diversity of Plant Receptor-Like Kinases

Because of their high level of diversity and complex evolutionary histories, most studies on plant receptor-like kinase subfamilies have focused on their kinase domains. With the large amount of genome sequence data available today, particularly on basal land plants and Charophyta, more attention should be paid to primary events that shaped the diversity of the RLK gene family. We thus focus on the motifs and domains found in association with kinase domains to illustrate their origin, organization, and evolutionary dynamics. We discuss when these different domain associations first occurred and how they evolved, based on a literature review complemented by some of our unpublished results.


INTRODUCTION
Protein phosphorylation is a biochemical posttranslational modification involved in all signaling pathways and cellular activities in living organisms. It influences cell signaling networks in response to cellular or environmental stimulation, e.g., by the reversible regulation of protein functions through their activation or inactivation, the formation of protein complexes, and the arrangement of subcellular protein locations (31). The phosphorylation process, involving the transfer of a phosphoryl group from adenosine triphosphate (ATP) (or other nucleoside phosphates) to an acceptor hydroxyl residue of the protein substrate, is one of the roles ascribed to well-conserved protein kinase domains (48). Note that serine/threonine (Ser/Thr) phosphorylation is the most common phosphorylation event. Besides their role in this catalytic function, kinase domains also facilitate the binding and orientation of the ATP phosphate donor and protein substrate.
Networks of protein kinases involved in substrate phosphorylation are highly complex and consist of hundreds of proteins, making them one of the largest protein families (95). In 2002, Manning and coauthors (97) proposed the term kinome to encompass the complete set of protein kinases encoded in a genome. The kinome of the first plant genome sequenced in 2000 (Arabidopsis thaliana) included more than 1,000 proteins (2). Several years later, with more genomes available, the same analysis revealed that the plant kinome superfamily represented ∼1-4% of the protein-coding genes in 16 genomes (77). In plants, among the variety of protein kinases, the largest subfamily, besides those with functional homologs in other kingdoms (∼400 proteins), consists of singular proteins with structural organization resembling that of receptor tyrosine kinases (RTKs) in animals. In 1993, Walker (153) introduced the term receptor-like kinases (RLKs) to account for some of the new genes he had cloned. This terminology is now used widely to define the whole subfamily of these plant receptors. Recent publications have also begun to use the term receptor kinases (RKs) for those that have a well-defined receptor function.
The first to promote the concept of receptor in the late nineteenth and early twentieth centuries were the immunologist Ehrlich and the physiologist Langley (94). Working on toxins and ferments, Ehrlich suggested the existence of molecules in body cells (initially referred to as side-chain and then receptor molecules) that need to fix substances in order to be biologically active (126). In parallel, another receptor theory introduced by Langley to explain antagonistic drug action in the body proposed the existence of receptive substances with chemical affinities for specific compounds (93). The questioning and controversy about the receptor theory ended in 1948 when the pharmacologist Ahlquist (1) published a paper describing two types of adrenergic receptors. The fundamental nature and structure of these receptors remained hypothetical until the 1980s. At this time, the first structurally related receptors from the RTK family were cloned. They are specific cell surface receptors for several growth factors, such as epidermal growth factor (EGF) or insulin, and their protein kinase activity was found to be intrinsic to receptor polypeptides and not part of a separate effector system (39,145,146). All of these proteins contained a large glycosylated extracellular ligand-binding domain (ECD), a hydrophobic domain that appeared to constitute a typical transmembrane (TM) domain, and an internal kinase catalytic domain (118). In plants, the first RLK, which is structurally similar to these RTKs, was discovered only 5 years later in 1990 (154).

A SHORT HISTORY OF THE DISCOVERY AND DIVERSITY OF RECEPTOR-LIKE KINASE SUBFAMILIES
It was assumed for decades that most cell-to-cell communication in plants would occur via cytoplasmic bridges called plasmodesmata because of the presence of cell walls separating plant cells from one another. Publication of the paper by Walker & Zhang (154) was thus a landmark event since it introduced a new paradigm for the mechanism by which cells communicate in plants. Although structurally resembling animal RTKs, plant RLK subfamilies underwent highly complex evolutionary histories due to considerable gene expansions (an increased number of genes per genome) coupled with diversifications (an increased number of specific combinations of domains) (57,77,79). The large number of domains found associated with kinase domains in these RLKs reflects the wide array of all cell communications that plants must handle within themselves and with the outside world, e.g., as pattern recognition receptors during immunity (26,51,139).

Receptor-Like Kinases with Diverse Extracellular Domains
In this section, we pay tribute to the pioneering researchers who discovered the first members of each of the new structural organizations observed in these receptors. Through them, the golden age for the discovery of hundreds of other RLKs in plants and the elucidation of their functions began (Figure 1).
[Figure 1. Domain labels recovered from the figure: Kinase, Thaumatin, LRR, PAN, TM.]
[...] Ser/Thr kinase. This kinase domain would be intracellular and located downstream of the TM domain. Moreover, in the Ser/Thr kinase subfamily, the closest relative to ZmPK1 would be the human c-raf1 kinase (11), a proto-oncogene related to rapidly accelerated fibrosarcoma (RAF) Ser/Thr kinases, which were initially encountered in viruses (114,165). The S-domain, in the extracellular part of the protein, has been subdivided into three subdomains: The first, the bulb-type lectin (B-LEC) domain, contains a highly conserved 14-residue stretch, and the third, the cysteine-rich (or PAN) domain, comprises 10-13 cysteine residues clustered near the TM domain (112,142). Unexpectedly, besides these remarkable residues, the whole ECD showed 52% identity with the ECD of an S-locus-specific glycoprotein (SLG) of Brassica (104). These SLG genes were known to be among the products of the S-locus that controls the pollen-stigma interaction of self-incompatibility in the Brassicaceae (mustard) family. Moreover, even though the maize and Brassica genes shared some sequence similarities, data showed that they were expressed in different tissues, suggesting that ZmPK1 would not play a role in pollen-stigma recognition. Screening cDNA libraries by using protein kinase genes as hybridization probes under low-stringency conditions was widely used in the 1990s and serendipitously revealed many new RLKs in several plant species. In 1991, Nasrallah's group (134) described the cloning of a new S-domain receptor kinase. The ECD displayed genotype-specific sequence polymorphisms that paralleled those of SLGs, and S receptor kinase (SRK) transcripts were detected only in reproductive organs, suggesting that self-recognition between pollen and stigma during pollination in Brassica oleracea is mediated by receptor-ligand interactions between pollen and pistil components, leading to pollen acceptance or rejection. In 1992, the same team described the receptor ARK1 (141). Just like ZmPK1, ARK1 was expressed mainly in leaves, ruling out a role in self-incompatibility. Since then, several S-domain receptor kinases have been characterized, and besides their role in reproduction, some play a role in innate immunity, such as mediating low-complexity bacterial metabolite sensing in Arabidopsis (71), in plant-mycorrhizal interaction (73), or in environmental and developmental processes (8,109,110,117). These G-type lectin receptors [for Galanthus nivalis agglutinin (GNA)] belong to the large lectin (carbohydrate-binding) domain RLK family with other subfamilies, such as the L-type (L-LEC), C-type (C-LEC), malectin-like (also known as CrRLK1L), cysteine-rich repeat (CRR), and lysin motif (LysM) RLKs described below.

Leucine-rich repeat domain.
In 1991, Bleecker (9) reported the cloning of an A. thaliana putative TM protein kinase. One year later, the TRANSMEMBRANE KINASE 1 (TMK1) gene was described in detail by Chang and coauthors (21). In contrast with the previously cloned receptors, the ECD of TMK1 was composed of 11 copies of a 22-AA leucine-rich repeat (LRR) unit, a protein structural motif rich in leucine arranged in tandem repeats. In animals, the Toll receptor, involved in the establishment of dorsoventral polarity in Drosophila embryos, most resembled this new plant LRR receptor (50). The structural difference between these two receptors was the presence of the kinase domain in the cytoplasmic region of the plant receptor, which is absent in the Toll receptor [as it contains instead a Toll/interleukin-1 receptor (TIR) domain]. Since the 1990s, there has been enormous progress in determining the functions and mechanisms of activation of some of these receptors. Many functions have been attributed to these receptors, from plant growth and development to symbiosis and immunity. Activation of these receptors is generally based on peptide or hormonal ligand-dependent interactions of LRR receptor heterodimers containing one LRR-RLK with a few LRR units playing the role of coreceptor, such as BRASSINOSTEROID INSENSITIVE1-ASSOCIATED KINASE1 (BAK1) or SUPPRESSOR OF BIR1-1 (SOBIR) (46,57,58,131,160). However, because the LRR-containing receptor kinase subfamily contains the largest number of genes in all plant genomes studied so far [...] (53). Recent discoveries have shown that these receptors are involved in cell wall integrity maintenance mechanisms through the perception of pectin or pectin-derived molecules not only during developmental cell wall extensions but also during stress- or pathogen-induced cell wall damage (5,64).

Legume lectin domain.
In 1996, Herve, Lescure, and coauthors (55) described a protein, Arabidopsis thaliana lectin-receptor kinase 1 (Ath.lecRK1), containing an ECD homologous to the carbohydrate-binding proteins of the legume lectin family. Since 1889, extracts of castor beans or many other leguminous seeds were considered capable of agglutinating animal red blood cells via some remarkable proteins called lectins (from the Latin word legere for choose) (87,135). However, despite this homology, all AA residues forming the putative monosaccharide-binding site of the lectin domain of Ath.lecRK1 were different from those usually found in legume lectins, suggesting that these receptors might not possess true lectin activity (14). To date, most studied L-LEC-RLKs have roles in plant immunity, with recent data suggesting that these receptors could bind metabolites such as extracellular adenosine triphosphate (eATP) (24) or extracellular nicotinamide adenine dinucleotide (eNAD) (155). This latter molecule could be a key component of systemic acquired resistance in Arabidopsis, and the binding of eNAD to the L-LEC-RLK requires that this receptor forms a complex with the LRR-RLK BAK1 coreceptor (155, 157).

CRINKLY4 (and tumor necrosis factor receptor-like) domain.
The 1995 Cold Spring Harbor Symposium review of Dangl, Preuss, and Schroeder (29) reports prime scientific discoveries including the cloning and functional analyses by Becraft, Stinard, and McCarty (7) of the maize receptor CRINKLY4. The ECD of this RLK was described as containing a novel 37-AA domain repeated seven times (the crinkly domain related to a GTPase-activating protein found in all eukaryotes) and a 26-AA region with similarities to the tumor necrosis factor receptor (TNFR) cysteine-rich region. The fact that this TNFR motif was known to contact the TNF suggests that the ligand for the maize protein might be a peptide. Although a peptide of the CLE family has been suggested to be a ligand for the ARABIDOPSIS CRINKLY4 (ACR4) receptor in complex with the CLAVATA1 receptor belonging to the LRR-RLK subfamily, no biochemical data supporting this suggestion are available at the moment (28). It should be noted that, in contrast to the previously described receptors, which are present at several tens of copies per genome, the CRINKLY4 subfamily is represented by only one or two copies on average per genome (Table 1).
[...] (119) cloned the first member of the Catharanthus roseus receptor-like kinase 1-like (CrRLK1L) subfamily in 1996 during a screen of a library from a Madagascar periwinkle cell culture. None of the previously described ECDs showed similarity to the 450-AA residues composing the N-terminal part of this receptor. In 2011, Boisson-Dernier, Kessler, and Grossniklaus (10) showed that the ECD of these CrRLK1L receptors is homologous to a carbohydrate-binding domain, the malectin-like domain, which seems, however, to lack residues important for carbohydrate-rich ligand binding (37,103). Various members of this RLK subfamily, in complex with other CrRLK1L receptors and glycosylphosphatidylinositol-anchored protein chaperones/coreceptors, are involved in developmental or immune cell-wall-sensing responses induced by RAPID ALKALINIZATION FACTOR (RALF) peptides (43,107,161).

Malectin-like leucine-rich repeat domain.
A malectin-like domain has also been found in the extracellular part of the receptors in association with LRRs. The light-repressible receptor protein kinase (LRRPK) is the first receptor of this type cloned in 1997 by Deeken & Kaldenhoff (33). The presence of a malectin-like domain followed by a few LRRs in the ECD of an RLK was described for the first time for the IMPAIRED OOMYCETE SUSCEPTIBILITY1 (IOS1) receptor in 2011 by Hok, Keller, and coauthors (59). It has recently been established that the CrRLK1L FERONIA and IOS1 function as scaffolds to regulate LRR-RLK immune receptor complexes (133,164). Another well-studied receptor with this organization is the SYMBIOSIS RECEPTOR-LIKE KINASE (SYMRK), which was cloned in 2002 by Stracke, Parniske, and coauthors (136). For this receptor, named DOES NOT MAKE INFECTIONS2 (DMI2) in Medicago truncatula, the protein level paced by protection from proteasome-mediated degradation could be a regulator of bacteria-plant and fungi-plant symbioses (108,151).

Leucine-rich repeat malectin domain.
In the LRR malectin domain, LRRs are located upstream in the extracellular part of the receptor, and the first gene of this type was described in 1998 by Takahashi, Chua, and coauthors (137). The extracellular part of receptor-like kinase in flowers 1 (RKF1) possesses 13 LRRs. The presence of the malectin domain, located between LRRs and the TM domain in this receptor, was discovered later (38). The primary sequence difference between the malectin-like and malectin domains suggests that these two domains could have distinct ligand affinities. To our knowledge, scarce data are available on these LRR malectin RLKs in the literature, except that one of these receptors conferred enhanced resistance to the fungal pathogen Magnaporthe oryzae when overexpressed in rice (84).

Cysteine-rich repeat domain.
Takahashi et al. (137) cloned a second gene, RKF2. In this 617-AA protein, the ECD has two copies of a C-X8-C-X2-C motif, distinct from the cysteine-rich region of SLGs and SRKs but highly similar to fungal lectins, suggesting either a common origin or convergent evolution of these domains (147). A few years later, the receptors possessing these CRR motifs were named CRKs, for CRR RLKs (22). These motifs are also named GINKBILOBIN2 (GNK2) [e.g., in the InterPro database [...]
[...] This gene possesses a proline-rich ECD with sequence similarity to the extensin family of proteins, and PERK1 gene expression was rapidly induced by wounding, suggesting a role in sensing cell wall modifications following cell wall damage or pathogen responses, similar to the WAKs. Even though characterized PERK genes suggest that they have functions in development and virus infection, ligands for these receptors have not yet been described (12).

Lysin motif domain.
In the search for receptors involved in the perception of Nod factors during the early steps of nodulation following rhizobial infection of legumes, a hypothesis on the presence of two receptors was put forward in 1994 (3). However, it took almost 10 years before two different teams, led by Stougaard (92,113) and Geurts (88), published their results on the cloning of a yet-undescribed type of receptor containing LysMs in their ECD. The receptors analyzed in these papers, i.e., NOD FACTOR RECEPTOR1 (NFR1), NFR5, and SYM2 [now known as LysM domain-containing receptor-like kinases (LYKs)], contained two or three LysM modules that occur in bacterial peptidoglycan-binding proteins or in chitinases from yeast and algae (20). Based on studies conducted over the last decade, it is now clear that all LysM-RLK receptors characterized to date, in complex with LysM-RLP or with a kinase domain devoid of kinase activity [called non-RD kinases (30) and representing 20% of all kinases (72)], bind to chitooligosaccharide ligands (89). Their dual role in plant immunity and in the establishment of the arbuscular mycorrhizal and rhizobium-legume symbioses is also well defined (19).

Calcium-dependent lectin domain.
The association between the calcium-dependent (or C-type) lectin (C-LEC) motif and a kinase domain in plants was noted for the first time in Arabidopsis by Shiu & Bleecker (121) in 2001. Interestingly, this association was observed in a single receptor. Contrary to the abundance of other types of lectin receptor kinase in plants, the very low number of receptors with this C-LEC kinase structural organization has now been confirmed in several genomes (one or two copies) (8,124,148,163). However, we still do not know their functions. In animals, C-LEC receptors are numerous and have been extensively studied because of their decisive role in pathogen recognition and immunity (16).

Other Extracellular Motifs with Few Representatives or Uncommon Associations
Several new domain associations have been discovered in the search for Resistance genes (R genes). The RLKs described in this section are representative of domain associations that are sometimes species-specific and present in only a few copies in plant genomes. Consequently, for many of them, more extensive investigations are needed to uncover their real functions and activation mechanisms.

Chitinase (glycoside hydrolase)-type domain.
In 2000, Kim, Pai, and coauthors (62) cloned the CHITINASE-RELATED RLK1 (CHRK1) gene. This new type of receptor contains a domain closely related to chitinase in its extracellular part, but lacking the essential residue required for chitinase activity. Co-suppression of the endogenous tobacco gene showed pleiotropic developmental phenotypes, and mRNA accumulation was strongly stimulated by fungi and viruses (76).

Thaumatin domain.
In 1996, Wang, Lawton, and coauthors (156) published a paper stating that "The PR5K Receptor Protein Kinase from Arabidopsis thaliana Is Structurally Related to a Family of Plant Defense Proteins." Indeed, the extracellular part of this pathogenesis-related (PR)-5-like RLK contains a thaumatin-like motif, which shares sequence similarities with thaumatin, a sweet-tasting protein originally found in an African berry (150). Thaumatin RLKs are transcriptionally induced by pathogenic and environmental stress, and some of these receptors play a specific role in abscisic acid-dependent drought-stress signaling (6,144).
[...] (18) cloned the stem rust-resistance gene Rpg1 in barley. This gene had homology to receptor kinases, but the domain organization was unique: Two kinase domains were fused into one protein, with the second one being a pseudokinase. Because this structure has been found in a wide range of plant taxa, it has recently been termed tandem kinase-pseudokinase (TKP) (63).

Nucleotide-binding site leucine-rich repeat kinase.
In 2008, Brueggeman and coauthors (17) described the Rpg5 gene, which is also involved in stem rust resistance. This time, this new R gene was found to code for a protein with a nucleotide-binding site (NBS), LRRs, and protein kinase domains. This organization is now defined as integrated domain (ID) nucleotide-binding leucine-rich repeat (NLR) proteins (NLR-IDs). NLR-IDs have been found in many plant species, and up to 10% of NLRs contain IDs (68,115). The kinase association, as for other IDs, could be a strategy to trap kinase-targeting pathogen effectors (66; see also 138).

Pathogenesis-related protein 1 domain.
In 2004, a new structural organization was described by Shiu and colleagues (123) in a paper comparing Arabidopsis and rice RLKs. The domain pathogenesis-related protein 1 (PR-1) was found to be fused to a kinase domain related to the CRR subfamily. Several years later, in 2013, the same ECD configuration was detected in two genes in the Theobroma cacao genome (140). The PR-1 domain was found to be fused to the TM and kinase domains related to the L-LEC subfamily, suggesting that these domain associations had occurred at least twice. The origin of this fusion in cacao seemed to involve retrotransposition since one of the genes was surrounded by retrotransposition marks and repetitive elements.

Cytoplasmic or Membrane-Associated Receptor Kinases
Besides these receptor structural configurations, some kinase proteins lacking an apparent TM domain but closely related to these RLKs were also discovered.

Kinase only.
For example, the Arabidopsis APK1 protein, which was described as phosphorylating tyrosine, serine, and threonine, was cloned by Hirayama & Oka (56) [...] (4). To our knowledge, none of these proteins have been studied so far, but, with the ID model in mind, these proteins could also play the role of a trap in the many RLK signaling pathways in which PUBs and ubiquitination have been involved (27,40). Alternatively, this could simply be a way to fuse two proteins that normally work sequentially.
This Prévert-style inventory of researchers and articles describing the discoveries of new RLKs in the last decade of the twentieth century has laid the foundations for showcasing that (a) these receptors are involved in a myriad of different functions such as developmental processes, disease resistance, symbiosis, or self-incompatibility, (b) they are present in many plant species, and (c) they are all phylogenetically related.

THE ORIGIN OF THE RECEPTOR-LIKE KINASE DOMAIN
The high diversity of the domain combinations in the RLKs we have described, especially in their ECD, raises several questions regarding their origin: How are RLK domains related to kinase domains in other kingdoms? When did each of these combinations appear? Did they appear around the same time or at very distinct periods? Did each of them appear only once, or did some of them appear multiple times over the course of evolution? A journey into the past is warranted to answer these questions. Hanks, Quinn, and Hunter (48) were the first to initiate, with the data set of protein kinases available in 1988, a classification based on similarities in their AA sequences. They thus aligned the 250-300 AAs of 65 protein kinase catalytic domains from vertebrates and invertebrates. This allowed them to precisely describe, for the first time, the 11 major conserved subdomains of the catalytic domains. Additionally, their phylogenetic analysis revealed five major clades that pooled protein kinases with similar modes of regulation or substrate specificities. One of the clades grouped protein tyrosine kinase sequences closely related to the Ser/Thr kinase Raf protein. [...] (56)]. The resulting tree presented in that paper revealed that the plant RLK sequences clustered together, suggesting a monophyletic origin. They were classified as "other protein kinase families (not falling in major groups)." Moreover, the closest animal kinase to this group was the Drosophila Pelle kinase, indicative of a possible common origin of plant RLKs and animal Pelle kinases. The fruit fly kinase was also classified in the "other protein kinase families (not falling in major groups)" category, but in a different subgroup. Another important result from this analysis is the fact that, in the S-domain group, ZmPK1 and SRKs were not grouped together in the inferred tree, even though they both contain an S-domain associated with the kinase domain.
Dievart et al.
The second attempt to build a phylogenetic tree combining RLK and animal kinase domains was published in 1997 by Clark, Williams, and Meyerowitz (25). In their phylogenetic tree, based on the minimum evolution principle, all plant RLKs analyzed [S-domain: ZmPK1, RLK1 and RLK4 (153), and SRK; LRR domain: CLAVATA1 (CLV1) (25), RLK5 (153), TMK1, ERECTA (ER) (143), and Xa21 (132); CRINKLY4 domain: CR4 (7); and no associated domain: PTO (99) and FEN (91)] formed separate lineages distinct from animal kinases. Moreover, the four S-domain receptors were again not clustered together, with SRK being outside of the RLK1, RLK4, and ZmPK1 clade, thus confirming previous results (47). Another observation was that the no associated domain receptors PTO and FEN clustered together and were close relatives to an LRR domain receptor clade, including CLV1, RLK5, and ER. However, some other LRR domain sequences, i.e., Xa21 and TMK1, were not included in this group, and formed two disjointed clades, also distinct from the CRINKLY4 domain CR4 clade.
In 1999, Hardie (49) published a review in which the phylogenetic relationships of 89 kinase domains from A. thaliana were considered, including 18 RLKs. This analysis showed that these protein kinases clustered into 12 major subfamilies, one being RLKs. However, as noted in the article, the kinase domain sequences of RLKs were more variable than those of the other subfamilies. Moreover, these sequences did not cluster in exactly the same way, depending on whether the analyses were done using their kinase or ECD sequences. Four subclasses emerged in the phylogenetic tree inferred using the ECDs of RLKs. The first subclass is the S-domain containing RLK1, RLK4, and ARK1 (141). The second is the LRR domain containing CLV1, ER, RLK5, TMK1, TMKL1 (149), RKF1 (137), and the BRASSINOSTEROID-INSENSITIVE1 (BRI1) receptor (83). The third is the WAK domain containing Pro25 (WAK1) and WAK4 (52). The fourth is the L-LEC domain containing LecRK1 (55). [...] (101), and the recent release of the complete sequence of the Arabidopsis genome, revealed that the Arabidopsis genome contains nearly 1,000 genes encoding Ser/Thr protein kinases, several hundred of which are RLKs (2). Comparative genome analysis between Arabidopsis and other sequenced genomes again highlighted that RLKs were highly similar to the Drosophila Pelle protein kinase and the mammalian INTERLEUKIN1 RECEPTOR-ASSOCIATED KINASE (IRAK). Overall, these results provided a first indication that (a) RLKs have a monophyletic origin and are close relatives to animal Pelle kinases and IRAKs, (b) RLK kinase domains evolve more rapidly than kinase domains of other protein kinases, (c) some ECDs found in plants are unfamiliar in animals, and (d) although the ECDs are similar, kinase domains of some of these RLK receptors do not cluster together in the phylogenetic trees, suggesting that several associations between these ECDs (e.g., S-domain or LRR) and different kinase domains could have occurred independently.
These results paved the way for Shiu & Bleecker (121) to publish a first remarkable paper in 2001. All of this evidence supports the monophyletic origin of the RLK/Pelle subfamily [renamed by Shiu & Bleecker (121)], and the large-scale expansion of this gene family in plants (few genes in animals versus hundreds in plants) (123). However, the time of divergence of this clade from other kinases has yet to be precisely established. It seems clear from these data that the origin of the RLK/Pelle clade could predate the divergence of plants and animals (121), i.e., around 1,580 ± 90 million years ago (Mya) (54), but the presence of RLK/Pelle in the Alveolata phylum [e.g., Plasmodium (122), Perkinsus, and Toxoplasma (79)] suggests that it may be even older, i.e., between 1,580 and 1,840 Mya (54) (Figure 2). Analyses of new genome sequences revealed, however, that RLK/Pelle orthologs are not found in the excavates Leishmania major and Giardia intestinalis (96), in Dictyostelium, the model organism of Amoebozoa (44), in fungal genomes, or in Monosiga brevicollis, a choanoflagellate close relative to animals (79,121,122), therefore suggesting massive losses in these lineages under this hypothesis. A search for RLK/Pelle sequences in many other species in these phyla should thus help to complete the evolutionary history of these kinase domains, which could be more complex than anticipated [e.g., involving horizontal transfers (79) or convergent evolution].

AT THE ORIGIN OF EXTRACELLULAR LIGAND-BINDING DOMAIN-KINASE DOMAIN ASSOCIATION
Two approaches are found in the RLK-related literature that are focused on either a specific RLK subfamily or a specific taxon. The first approach analyzes a given RLK subfamily, and it is studied among several species in order to reconstruct its evolutionary history. This approach and the resulting conclusions depend on the genomes available at the time of the analyses. The second approach focuses on a given species, and it aims to inventory all of its RLKs, generating a very precise picture of a given step in the evolution of the RLK family. Below we summarize findings on RLK evolution derived from the literature as well as our studies using these two approaches.
The seminal extensive genome-wide analyses conducted by Shiu & Bleecker (121,122) provide an evolutionary framework that is highly useful for the whole scientific community working on RLKs. Moreover, additional later studies by Shiu and collaborators (77,78,79,123) have also greatly helped gain further insight into the evolutionary dynamics of this gene family in plants. Indeed, these were the first reported large-scale analyses of RLK gene families comparing several species. Phylogenetic analyses based on kinase domains combined with annotations of ECD structural domains led to the first attempt to classify this large family of plant receptors. The RLK family was in turn subdivided into more than 50 different subfamilies, around 20 of which are LRR receptor kinase subfamilies and 20 are receptor-like cytoplasmic kinases (RLCKs) that have been defined as receptor kinases with no apparent SP or TM domain (grouping the kinase-only and the UspA kinase U-box) (121). These different subfamilies tend to have similar structural organization in their ECDs (Supplemental Figure 1). This large body of literature has shed considerable light on the extraordinary gene expansion, i.e., the increasing number of genes in some subfamilies, as the consequence of duplication events, including whole genome, segmental, or tandem duplications.
In the streptophytes, the number of proteins within RLK subfamilies expanded, but also the number of these different subfamilies increased (resulting in diversification) (70). An assessment of the presence and structural architecture of RLK ECDs in the different lineages is required to trace the origins of RLK diversification, i.e., the increasing number of specific domain combinations. In the early twenty-first century, whole plant genome sequences started to be released, and a wealth of data is now available (100). [...] classification proposed in 2012 (77). Moreover, the recent availability of many new genome sequences in Charophyta and basal land plant lineages has led to a more precise description of the fusion events that have shaped RLK subfamilies in angiosperms. These data from the literature and our unpublished studies are described below and summarized in Figure 3.
Figure 3. The presence and frequencies of RLK domains within major green lineages. The color code is the same as in Figure 1; a filled circle depicts the presence of the concerned structure, and its size is proportional to the number of sequences found; an open circle means that no gene has been found. For a brief description of the current knowledge on green plant lineage (e.g., Viridiplantae) phylogeny, readers are referred to Supplemental Data 1. There are two areas of discrepancy between published results and ours. The first, shown by an asterisk (*), is the appearance of the CRINKLY4 subfamily in Marchantia (15). Our data suggest that the fusion between the CRINKLY4 domain and a kinase appeared in the Physcomitrella patens genome (74) and not in Marchantia. Previous phylogenetic analysis of the CRINKLY4 subfamily showed that only one protein was a true CRINKLY4 family member containing all of the characteristic domains in the P. patens genome (105). This discrepancy could be due to the different levels of importance given to the presence of structural domains in the CRINKLY4 RLK. These results should therefore be taken with caution. The second discrepancy is shown by a dagger (†). Our data suggest that S-domain RLKs could be a new putative receptor configuration that appeared before the emergence of the Phragmoplastophyta clade (that includes three charophyte lineages: Charophyceae, Coleochaetophyceae, and Zygnematophyceae, together with land plants). However, these observations refer to a sole sequence that, in our study, would have been lost in Marchantia. Based on published data, the S-domain RLK subfamily only appeared in land plants (15,162). Gray text indicates a species that is included in this article, but for which a complete proteome is not available. Abbreviations: ANK, ankyrin repeat; C-LEC, calcium-dependent lectin; CRR, cysteine-rich repeat; L-LEC, L-type lectin; LRR, leucine-rich repeat; LysM, lysin motif; RLK, receptor-like kinase; UspA, universal protein A; WAK, wall-associated kinase.

Receptor-Like Kinases in Angiosperms
To supplement the data harvested in the literature, we selected 176,020 protein kinases classified as RLK/Pelle in the iTAK database (166) and scanned them to annotate their structural domains (Supplemental Table 1). On average, there are approximately 900 RLKs per genome in the 127 dicot and 53 monocot species analyzed, in accordance with previous results (123,167). One third consist only of a kinase domain (62,642 proteins). Some of these proteins are RLCKs. In 12% of the analyzed receptor configurations, only a TM and kinase domain were detected without any other motifs in the ECD (13,591 proteins). However, in many of these proteins, the ECD was very long (several hundred amino acids) (Supplemental Figure 2). New motifs or domains may ultimately be discovered in these receptors. More than 2,000 unique domain organizations were found in the protein data set. Among them, 1,730 were found in fewer than 10 proteins across all 180 genomes (Supplemental Table 1). Moreover, many variations per domain have been noted in multidomain ECDs (162) (see also Supplemental Figure 3). These variations shed light on the multiple domain organizations found in plant species and illustrate the trial-and-error dynamics of genome evolution. These diverse organizations also reveal that these domains have often been reused throughout evolution, e.g., the association of RLPs with kinase domains. Of course, many of these configurations are probably annotation errors (e.g., gene boundary delimitation leading to protein fusion prediction errors or a lack of domain recognition by the software) since these architectures are based on automatic genome annotations. Nevertheless, the 12 types of domain described in Section 2.1 are the most abundant predicted structures observed in RLKs in this data set (Table 1).
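The tally of domain organizations described above can be illustrated with a minimal sketch. It assumes that the domain annotations have already been reduced to an ordered list of domain names per protein; the function names, the toy protein identifiers, and the `|` separator are hypothetical and do not reflect the actual iTAK or InterProScan output formats or the pipeline used for Supplemental Table 1.

```python
from collections import Counter

def tally_architectures(domains_by_protein):
    """Count how many proteins share each ordered domain organization.

    domains_by_protein maps a protein identifier to its domains listed
    from the N-terminal to the C-terminal end (hypothetical input format).
    """
    return Counter("|".join(domains) for domains in domains_by_protein.values())

def rare_architectures(counts, threshold=10):
    """Return the organizations found in fewer than `threshold` proteins."""
    return {arch: n for arch, n in counts.items() if n < threshold}

# Toy data for illustration only; these are not real annotations.
toy = {
    "prot1": ["SP", "LRR", "TM", "kinase"],
    "prot2": ["SP", "LRR", "TM", "kinase"],
    "prot3": ["kinase"],
    "prot4": ["TM", "kinase"],
    "prot5": ["SP", "LysM", "TM", "kinase"],
}

counts = tally_architectures(toy)
rare = rare_architectures(counts, threshold=2)
print(len(counts), "unique domain organizations;", len(rare), "below threshold")
```

Keeping the domain order in the key distinguishes, for instance, a kinase located N-terminal to an LRR domain from the canonical arrangement, which matters for the unusual configurations discussed in this section.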
The largest subfamilies are LRR and putative carbohydrate-binding [containing the S-domain, L-LEC, C-LEC, malectin, malectin-like (CrRLK1L), LysM (8) and CRR (147) kinases], with over 200 LRR and 150 putative carbohydrate-binding receptors per genome on average. Note that only one C-LEC RLK was present per genome, as mentioned above. Could the uniqueness of the C-LEC receptor in plant genomes be an indication of its functional importance? Among these ECD kinase associations, only one was probably not present in the angiosperm ancestors, i.e., the LRR malectin receptor kinase that is not found in the Amborella genome and whose expansion seems to have been monocot- and dicot-specific (38,42) (Figure 3). Note, however, that the evolutionary history of this subfamily is perhaps more complex since some LRR malectin receptors, despite also being absent in gymnosperms, are found in basal vascular plants and in the Physcomitrella genome (one or two copies).
For the structures described in Section 2.2 (Table 1) and for many others (Supplemental Table 1), a sporadic distribution in monocot and dicot species has been observed without any obvious lineage-specific expansion. The rate of such associations is assumed to have been very low (78). If the events we noticed were not due to annotation errors, these domain swapping events could represent new randomly occurring domain associations. The putative ones that caught our attention are (a) the integration of a WAK domain in the S-domain organization, (b) the gain of a kinase domain in NBS-LRR proteins (but unlike Rpg5 (17), the kinase domain here is located at the N-terminal end with respect to the NB-ARC and LRR domains), and (c) receptor kinases containing only a malectin domain but no detected LRR repeats (Supplemental Figure 4). However, as these proteins have only been characterized by sequence analysis, further functional studies of these genes should provide more insight into the functions of these putative receptors.

Land Plant Clade
Prior to the publication of complete genomes of species belonging to the land plant group but outside of angiosperms, 29 cDNAs of RLKs (belonging to the LRR, C-LEC, Pro-rich/extensin, LysM, malectin-like (CrRLK1L), malectin-like LRR, and L-LEC subfamilies) were identified in Marchantia polymorpha (116). In the recent publication of the complete M. polymorpha genome (15), a whole section is devoted to the study of RLK genes, revealing the presence of many of them in this early diverging lineage of land plants. Although some subclasses were not represented (e.g., the BRI1 clade in the LRR subfamily), all domain RLK subfamilies were present except the CRR RLKs that appeared in vascular plants (Selaginella) (147) (see the caption for Figure 3 for the discrepancy observed for the CRINKLY4 and S-domain subfamilies).

Charophyta Lineages
Before the complete genomes of charophyte algae were sequenced, a few studies explored the presence of RLKs in cDNA libraries, and some were found in two Charophycean algae, Closterium ehrenbergii (Zygnematophyceae, 14 RLKs) and Nitella axillaris (Charophyceae, 13 RLKs) (116). In another study, a search for symbiotic gene homologs across the entire green lineage revealed that homologous sequences of LysM, malectin-like (CrRLK1L) and malectin-like LRR-domain RLKs were present in charophyte lineages, so these associations appeared before the emergence of land plants (34,152) (Figure 3). The availability of the complete genome of Chara braunii revealed that the LysM RLK gene family was not only present but also expanded in this species (seven copies) (106). Since L-LEC, CRINKLY4, WAK, and putatively S-domain subfamilies are found in basal land plants but not in the C. braunii genome, the absence of these receptor configurations in Zygnematophyceae (23) and Coleochaetophyceae genomes would be highly indicative of their land plant specificity.
The earliest-diverging streptophyte with a sequenced genome available today is Klebsormidium flaccidum (60). Although a detailed list of RLKs was not specifically established, RLKs were clearly identified in Klebsormidium as a gene family whose number of genes significantly increased in land plant genomes (60). We found 94 RLKs in this genome, and the first canonical receptor configuration with an acquisition of a TM domain was observed in it. The only kinase-associated ECDs found in this genome are the LRR, C-LEC, and Pro-rich/extensin domains, thus suggesting that these fusions were the first that occurred and arose before the Klebsormidiophyceae split.
these very early diverging lineages would be essential to delve deeper into the origin of the associations between LRR, C-LEC, and Pro-rich/extensin domains, which may have emerged early in the streptophyte lineage.

Chlorophyta Clade
In the Chlorophyta clade, there have been several attempts to screen for the presence of RLKs. A search in the Ostreococcus tauri (35) and Chlamydomonas reinhardtii genomes revealed the presence of two RLCK genes in C. reinhardtii but none in O. tauri (78). When using the standalone iTAK program (166) to assess several chlorophyte genomes, we found a few kinase domains classified as RLK/Pelle, some with various associated domains [e.g., ankyrin or WD40 repeats annotated by the InterProScan program (102) but no TM domain detected by the TMHMM program (67)]. One domain that was found repeatedly in several species from the Chlorophyceae (three copies), Ulvophyceae (one copy), or Trebouxiophyceae (one copy) was a C-terminal U-box domain, associated or not with other N-terminal domains, resembling the UspA kinase U-box RLCKs described above (Figure 3). The presence of this particular assembly in some RLCK genes, also detected throughout the streptophyte lineage, could be a strong clue that this subfamily may have been present before the streptophyte-chlorophyte split, but after the Rhodophyceae divergence. Moreover, they could represent the ancestors from which RLKs arose. Some LRR kinase domain associations have been observed in Chlorella variabilis (Trebouxiophyceae, Chlorellales) but not in any other chlorophytes analyzed previously [C. reinhardtii and Volvox carteri (Chlorophyceae), Micromonas pusilla, Ostreococcus lucimarinus, and O. tauri (prasinophytes)] (36,90,130). In the 14 species analyzed in this study, an LRR domain kinase association was again only detected in C. variabilis, and it was not found even in its close relative Chlorella sorokiniana. Previous phylogenetic analysis showed that this kinase domain did not cluster with the monophyletic RLK/Pelle subfamily (36).
Overall, these results suggest that the LRR kinase association observed in this sole Chlorella genome could be a unique event that occurred independently of the successful association that spread in the streptophyte genomes.

CONCLUDING REMARKS
In conclusion, the RLK diversity observed in various plant genomes, in terms of domain combinations, did not occur in a single step over the course of plant evolution but instead took place gradually. Clearly, each new sequenced genome, particularly at the basal branches of streptophyte species, has made it possible to describe an increasingly precise history of the origin and evolution of these receptors. Although the picture is still only partial, especially with a lack of data on charophytes, the emergence of certain subfamilies in C. braunii, for example, led to the hypothesis that the high morphological complexity of Chara could result from the advent and/or expansion of certain gene subfamilies, including LysM, malectin-like (CrRLK1L), and malectin-like LRR RLKs (Figure 4) (106). Similarly, the appearance of RLK subfamilies that are found throughout land plants but missing in charophytes (WAK, L-LEC, CRINKLY4, and S-domain) is correlated with the diversification of both the developmental and defense-signaling mechanisms necessary for out-of-water adaptation (∼450 Mya). The appearance of the LRR, C-LEC, and Pro-rich/extensin subfamilies in basal charophytes is still a mystery. How did they emerge? What was their function? Some leads are beginning to emerge, but there is still a long way to go (158). Gaining in-depth understanding of the evolutionary history of these large gene families is a gargantuan task because many events, such as ancestral and recent lineage-specific duplications or domain rearrangements (e.g., fusions or exchanges), must be considered, and the evolutionary dynamics of ECDs and kinase domains must be studied in parallel. This huge work has been undertaken recently for the CRR kinase subfamily (147) and should open the path to new extensive analyses on other subfamilies.
Figure 4. Relationships between the major stages of plant evolution and appearances of new structural organizations of domains. Dotted lines indicate that the exact starting point is unknown. Dates are from References 80 and 120. Abbreviations: C-LEC, calcium-dependent lectin; CRR, cysteine-rich repeat; L-LEC, L-type lectin; LRR, leucine-rich repeat; LysM, lysin motif; NBS, nucleotide-binding site; PR-1, pathogenesis-related protein 1; TM, transmembrane; WAK, wall-associated kinase.

SUMMARY POINTS
1. Plant receptor-like kinases (RLKs) are structurally related to animal receptor tyrosine kinases.
2. Some extracellular domains found in plants are unknown in animals.
3. Domains contained in extracellular regions of plant RLKs are highly diversified.
4. RLKs have a monophyletic origin and are close relatives of animal Pelle and interleukin-1 receptor-associated kinases.
5. RLK diversity, in terms of domain combinations, did not occur in a single step over the course of plant evolution but took place gradually.

NOTE ADDED IN PROOF
While this review was in press, a new receptor configuration was described by Wu, Pei, and coauthors (159). This receptor is the first known cell-surface hydrogen peroxide (HP) sensor in plants.