Cloning and Characterization of a New Site-Specific Methyl-Directed ElmI Endonuclease Recognizing and Cleaving C5-methylated DNA Sequence 5'-G(5mC)^NG(5mC)-3'.

Putative open reading frames of MD-endonucleases have been identified in Enterobacteria genomes by searching for amino acid sequences homologous to MD-endonuclease BisI. The highly conserved DNA primary structure of these open reading frames in different genera of Enterobacteria (Escherichia, Klebsiella and Cronobacter) made it possible to design primers for PCR screening, which was carried out on Enterobacteria DNA collected from natural sources. A DNA fragment about 440 bp in length was amplified using the genomic DNA of the wild-type E. coli LM N17 strain as a template and was inserted into the pMTL22 vector. Endonuclease activity was detected in the E. coli ER2267 strain transformed with the resulting construct. A new enzyme, named ElmI, was purified from the biomass of the recombinant strain by chromatographic techniques. Like BisI, this enzyme specifically cleaves the methylated DNA sequence 5'-GCNGC-3' before the central nucleotide "N" if the sequence contains two 5-methylcytosines. However, unlike BisI, ElmI cleaves this sequence more efficiently if more than two cytosine residues are methylated.


INTRODUCTION
Methyl-directed site-specific endonucleases (MD-endonucleases) recognize and cleave DNA at specific methylated sequences, leaving unmethylated DNA untouched. In the last nine years, more than 10 prototypes of these enzymes have been described. They have different recognition sites, which are cleaved only when the cytosine residues within them are C5-methylated.
In contrast to restriction endonucleases, MD-endonucleases recognize not only a specific nucleotide sequence and the relative hydrolysis position in this sequence, but also a specific pattern of methylation. Therefore, different MD-endonucleases, even those with similar recognition sites, may cleave DNA differently depending on its methylation pattern. Enzymes that recognize the methylated 5'-GCNGC-3' sequence are a good example. Among the enzymes that recognize this site and cleave DNA after the central nucleotide "N," BlsI [1] cleaves the sequence if it contains at least two 5-methylcytosine residues, and PkrI [2] if it contains at least three. Among the enzymes that cleave the sequence before the central nucleotide "N," MD-endonuclease BisI [3] cleaves the 5'-GCNGC-3' sequence if it contains two 5-methylcytosine residues (and much less efficiently if there is only one), whereas GluI [4] needs four 5-methylcytosine residues.
We have described a new representative of this group of enzymes, MD-endonuclease ElmI, that recognizes the methylated 5'-GCNGC-3' sequence and cleaves it before the central nucleotide "N" (to form 5'-overhanging single-nucleotide ends) if the sequence contains at least two 5-methylcytosine residues; the enzyme activity increases by an order of magnitude if three or four cytosines in the recognition site are methylated.
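The methylation requirements listed above can be summarized in a small lookup. The following Python sketch is only an illustration: the table and function names are hypothetical, each threshold is treated as a simple minimum, and differences in cleavage efficiency (for example, the weak BisI activity on singly methylated sites, or the higher ElmI activity on sites with three or four 5-methylcytosines) are not modelled.

```python
# Minimum number of 5-methylcytosines required in the double-stranded
# 5'-GCNGC-3' site (four cytosines in total) and the cut position,
# as stated in the text. Efficiency differences are not modelled.
ENZYMES = {
    "BlsI": {"min_5mc": 2, "cut": "after N"},
    "PkrI": {"min_5mc": 3, "cut": "after N"},
    "BisI": {"min_5mc": 2, "cut": "before N"},
    "GluI": {"min_5mc": 4, "cut": "before N"},
    "ElmI": {"min_5mc": 2, "cut": "before N"},
}


def enzymes_cleaving(n_5mc):
    """Return the names of enzymes expected to cleave a GCNGC site
    carrying n_5mc methylated cytosines (0 to 4)."""
    if not 0 <= n_5mc <= 4:
        raise ValueError("a GCNGC site has at most four cytosines")
    return sorted(
        name for name, rule in ENZYMES.items() if n_5mc >= rule["min_5mc"]
    )


if __name__ == "__main__":
    for n in range(5):
        print(n, enzymes_cleaving(n))
```

Under these simplifications, a site with two 5-methylcytosines is a substrate for BisI, BlsI, and ElmI, while GluI requires the fully methylated site.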

PCR screening of Enterobacteria DNA collected from natural sources and production of pElmI plasmid with the new MD-endonuclease gene
Coliform bacteria were isolated from natural sources (sewage water) on a selective Endo medium according to [5]. Eight to twenty strains with different morphological characteristics were collected from each sample inoculation (LM, LT, LP, and LV series).
Chromosomal DNA was isolated from these strains and screened by PCR using the primers listed below. The fragment amplified using DNA from one of the wild-type strains was inserted into the pMTL22 plasmid [6] at the BglII and FauNDI restriction sites.
After transformation of the E. coli ER2267 cells, the clones carrying the target plasmid (called pElmI) were plated on Petri dishes with LB agar supplemented with ampicillin (50 µg/ml). The clones were grown overnight at 37°C, subcultured to separate dishes with ampicillin (100 µg/ml), and allowed to grow overnight for further analysis.

Production of biomass of the recombinant ElmI producer strain and assay of the target activity
The recombinant clone of the E. coli ER2267 strain transformed with the pElmI plasmid was transferred with an inoculation loop from a Petri dish to a bottle containing 200 ml of LB broth supplemented with ampicillin (100 µg/ml). The inoculum was grown overnight in a thermostatted shaker (37°C, 120 rpm). Then, 5 ml of the inoculum was seeded into each of 20 bottles containing 200 ml of LB broth supplemented with ampicillin (100 µg/ml) and 0.5 mM isopropyl-β-D-thiogalactopyranoside (IPTG).
The culture was grown for 10 hours in a thermostatic shaker (120 rpm), and then an aliquot (1 ml) for the enzyme activity assay was withdrawn and transferred to a 1.5 ml Eppendorf tube. The cells were pelleted using a 5416 Eppendorf tabletop centrifuge (Eppendorf GmbH, Germany, 12,000 rpm, 2 min). The supernatant was removed, and the precipitate was resuspended in 0.2 ml of a lysis buffer (10 mM Tris-HCl, pH 8.5, 0.1 mg/ml lysozyme, 0.5 M NaCl, 1 mM EDTA, 0.1% Triton X-100).
The activity of the enzyme in the lysate was assayed in 20 µl of the reaction mixture, which contained pFsp4HI3 plasmid, pre-linearized with DriI restriction enzyme, as a DNA substrate [4]. The linearization was performed in SE buffer "W" (10 mM Tris-HCl (pH 8.5 at 25°C), 10 mM MgCl2, 100 mM NaCl, 1 mM dithiothreitol) for 2 hours at 37°C. The amount of ElmI enzyme sufficient for complete hydrolysis of 1 µg of pFsp4HI3 DNA (2 hours, 37°C, SE buffer "W") was taken as one unit of enzyme activity. The presence or absence of hydrolysis of the DNA substrate was determined by electrophoresis in 1% agarose gel.
The cells of all of the produced biomass were pelleted using a J2-21 centrifuge (30 min, 8,000 rpm, JA-10 rotor, Beckman, USA) and frozen.

Production of ElmI enzyme preparation
All procedures for isolation and purification of the enzyme preparation were performed at 4°C using the following solutions. Buffer A: 10 mM Tris-HCl, pH 7.5; 0.1 mM EDTA; 7 mM 2-mercaptoethanol. Buffer B: 10 mM K-phosphate buffer, pH 7.4; 0.1 mM EDTA; 7 mM 2-mercaptoethanol. To determine the activity of the enzyme, pFsp4HI3 DNA [4] was used as the substrate: aliquots (1 µl) of chromatographic fractions were added to 20 µl of the reaction mixture in SE buffer "W" and incubated for 15 min.
Isolation of the crude extract. The biomass (8 g) was suspended in 30 ml of buffer A containing 0.2 M NaCl, 1 mM phenylmethylsulfonyl fluoride (PMSF), 0.1% Triton X-100, and 0.1 mg/ml lysozyme. The cells were disrupted by sonication on a Soniprep 150 sonicator (MSE, UK, 2 cm adapter, amplitude of 20 µm), using four 1-minute periods followed by 1-minute intervals to cool the suspension in an ice bath.
Chromatography on phosphocellulose P-11. The crude extract was diluted twofold with buffer A and applied to a column with phosphocellulose P-11 (30 ml), pre-equilibrated with buffer A containing 0.1 M NaCl, and washed with 160 ml of the same buffer. The adsorbed material was eluted using a linear gradient of NaCl (0.1 to 1 M) in buffer A (total volume of 800 ml); 30 fractions were collected, and fractions 16 to 22 (0.35 to 0.47 M NaCl) exhibiting target activity were pooled together.
Chromatography on hydroxylapatite. The pooled fractions were applied to a column with hydroxylapatite (2 ml), pre-equilibrated with buffer B containing 0.05 M NaCl, and washed with 10 ml of the same buffer. The adsorbed material was eluted using a linear gradient of K-phosphate buffer, pH 7.4 (0.01 to 0.1 M), containing 0.05 M NaCl (total volume of 30 ml). Twenty fractions were collected, and fractions 8 to 12 (0.044 to 0.056 M K-phosphate) exhibiting ElmI activity were pooled together. The pooled fractions were dialyzed for 1 h against 300 ml of buffer A.
Chromatography on heparin-agarose. The dialyzed fractions were applied to a column with heparin-agarose (2 ml), pre-equilibrated with buffer B containing 0.05 M NaCl, and washed with 4 ml of the same buffer. The adsorbed material was eluted using a linear gradient of NaCl (0.05 to 1 M) in buffer B (a total volume of 30 ml); 20 fractions were collected, and fractions 12 and 13 (0.62 to 0.67 M NaCl) exhibiting the target activity were pooled together.
Concentration, activity assays and storage. The pooled fractions were dialyzed for 20 hours against a 15-fold volume of buffer B with 50% glycerol and 0.2 M NaCl, and stored at -20°C.
The DNA sequence was determined by Sanger sequencing on an ABI 3130xl Genetic Analyzer automatic sequencer (Applied Biosystems, USA) according to the manufacturer's instructions.
Preparations of enzymes, DNA, deoxynucleoside triphosphates, and synthetic oligonucleotides, as well as the molecular weight markers (1 kb Ladder and Lambda/StyI) used in this work, were produced by SibEnzyme (Russia).

Cloning of the new MD-endonuclease ElmI gene and comparative analysis of nucleotide and amino acid sequences
Previously, we found MD-endonucleases in bacteria from different taxonomic groups, but mostly in representatives of the Microbacteriaceae and Bacillaceae families. The earlier screening of cell lysates did not allow us to identify similar site-specific enzymes in Enterobacteriaceae strains, which may indicate either the absence or the extremely low activity of MD-endonucleases in this group of bacteria. To resolve this issue, we decided to use a bioinformatic rather than a biochemical method to search for homologous enterobacterial proteins.
The PSI-BLAST (https://blast.ncbi.nlm.nih.gov) software was used to screen the database of Enterobacteriaceae amino acid sequences for sequences homologous to the previously described MD-endonuclease BisI (GenBank AJW87312) [7]. Two search iterations revealed ~50 enterobacterial amino acid sequences homologous to the BisI sequence (32-48% similarity, 17-30% identity). The functions of all these homologous proteins were unknown. The nucleotide sequences of the corresponding genes were extracted from the GenBank database and compared to each other. The comparison showed that the sample contains two groups of highly homologous genes. The first group includes genes of four putative proteins with a length of 143-144 amino acid residues from bacteria of the genera Escherichia (GenBank accession numbers ACT43858 and AKN48098), Cronobacter (CCJ93299), and Klebsiella (KEG36084). A comparison of these genes to each other revealed 93-99% identity. The second group contains enterobacterial genes which encode proteins with a length of 290 amino acid residues, whose N-terminal portion is homologous to the BisI protein (GenBank protein accession numbers: KFC97828, WP_000794335, WP_000794336, WP_000794337, WP_001655794, WP_004952390, WP_008806407, WP_021557167, WP_025912430, WP_032653240, WP_032671961, and WP_033070923). The degree of nucleotide sequence identity in this group is 83-99%. Figure 1 shows a multiple alignment of the amino acid sequences of the four highly homologous enterobacterial proteins from the first group together with the BisI sequence. The sequence of endonuclease ElmI, which was detected by PCR screening, is also shown (see the description below).
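The distinction between percent identity and percent similarity used above can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the two aligned sequences are toy strings, not BisI or its homologues; the amino acid groups are a simplified physico-chemical grouping, whereas BLAST counts "positives" from a substitution matrix; and the percentages here are taken over gap-free alignment columns.

```python
# Simplified, illustrative groups of residues treated as "similar".
# BLAST itself scores similarity with a substitution matrix instead.
SIMILAR_GROUPS = ["ILVM", "KRH", "DE", "ST", "NQ", "FYW", "AG"]


def _similar(x, y):
    """True if two residues are identical or share an illustrative group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned
    sequences of equal length; columns containing a gap are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in columns)
    similar = sum(_similar(x, y) for x, y in columns)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences).
    ident, sim = identity_and_similarity("MKT-LLVA", "MRTALIVA")
    print(f"identity {ident:.1f}%, similarity {sim:.1f}%")
```

Because identical residues are also counted as similar, the similarity value can never be lower than the identity value for the same alignment.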
Multiple alignment of the corresponding enterobacterial genes (Fig. 2) revealed that their nucleotide compositions also have a high degree of identity, even though the host organisms belong to different genera. The sequence of the elmI gene, established in this work, is added to the alignment.
The high degree of sequence identity for the genes from the two aforementioned groups, which also holds true for their terminal regions, allowed us to select primers for PCR screening of wild-type strains for the presence of similar genes. To search for genes encoding proteins related to the first group of proteins, we used the primers Esp-1 and Esp-2, which contain the recognition sites of the FauNDI and BamHI restriction endonucleases for inserting PCR fragments into a plasmid vector. For PCR screening for genes similar to the second group of genes, the primers Esp-3 and Esp-4 were synthesized. Genomic DNA was isolated from 64 strains of coliform bacteria found in sewage water and used as a template for PCR. Amplification with the Esp-3 and Esp-4 primers (the second group of genes) did not yield a fragment of the expected length (~870 bp) with any of the template DNA samples. The use of the Esp-1 and Esp-2 primers (the first group of genes) yielded a PCR fragment of the expected length (~430 bp) in one DNA sample.
This amplified fragment was treated with the BamHI and FauNDI restriction enzymes and ligated into the pMTL22 vector, which had been previously digested with FauNDI and BglII. The resulting plasmid, named pElmI, was used to transform the E. coli ER2267 cells.
The taxonomic identity of the original strain whose genomic DNA was amplified to obtain the fragment was determined using conventional biochemical and morphological criteria [8], and by analyzing the structure of the 16S rRNA fragment by BLAST [9]. The original natural producing strain was identified as E. coli LM N17. The site-specific DNA endonuclease produced by the strain was named ElmI.
The PCR fragment inserted in the pElmI plasmid was sequenced. The nucleotide sequence of the fragment, 432 bp in length, was deposited in GenBank with accession number LN869919. The sequence begins with an ATG start codon and ends with a TAA stop codon, and the reading frame contains no other stop codons; therefore, it can be considered a putative open reading frame of the DNA endonuclease ElmI, and the gene encoding this protein was designated elmI.
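The fragment was cut with BamHI, whereas the vector was cut with BglII. This works because the two enzymes leave identical cohesive ends. The Python sketch below illustrates the point using the standard recognition sequences of BamHI (G^GATCC) and BglII (A^GATCT); the function names are hypothetical, and FauNDI is left out because both the fragment and the vector were cut with the same enzyme on that side.

```python
# Recognition sites and top-strand cut offsets (number of nucleotides
# to the left of the cut). Both sites are palindromic, so the bottom
# strand is cut at the same offset from its own 5' end.
SITES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
}


def five_prime_overhang(enzyme):
    """Return the single-stranded 5' overhang left by a palindromic
    site that is cut symmetrically near its 5' end."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def hybrid_junction(left_enzyme, right_enzyme):
    """Sequence formed when the left part of one site is joined to the
    right part of another through a shared overhang."""
    left_site, left_cut = SITES[left_enzyme]
    right_site, right_cut = SITES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


if __name__ == "__main__":
    print(five_prime_overhang("BamHI"), five_prime_overhang("BglII"))
    print(hybrid_junction("BglII", "BamHI"))
```

Both enzymes produce the same 5'-GATC overhang, so the ends can be ligated; the resulting hybrid junction is a site for neither enzyme.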
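The criteria applied above to the sequenced fragment (an ATG start codon, a terminal stop codon, and no stop codons in between) can be written as a short check. This Python sketch is illustrative only: the function names are hypothetical, the test strings are toy sequences, and the elmI sequence itself (GenBank LN869919) is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_simple_orf(seq):
    """True if seq starts with ATG, ends with a stop codon, has a length
    divisible by three, and contains no in-frame internal stop codon."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )


def encoded_residues(orf_length_nt):
    """Number of amino acid residues encoded by an ORF of the given
    length in nucleotides (the stop codon is not translated)."""
    return orf_length_nt // 3 - 1


if __name__ == "__main__":
    print(is_simple_orf("ATGGCTTAA"))     # toy ORF
    print(is_simple_orf("ATGTAAGCTTAA"))  # toy sequence, internal stop
    print(encoded_residues(432))
```

A 432-bp frame comprises 144 codons including the stop codon, which corresponds to a polypeptide of 143 residues and agrees with the 143-144 residue length of the first group of homologues.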
A comparative analysis of the sequenced fragment of the gene shows that elmI has essentially the same sequence as the genes encoding polypeptides of the closest homologues: E. coli strains BL21 (DE3) (ACT43858) and C41 (DE3) (AKN48098). The only identified substitution was the presence of cytosine at position 131 of the elmI gene instead of thymine in the homologous genes (Fig. 2).
Therefore, the derived amino acid sequence of ElmI endonuclease differs from its closest homologues from the E. coli strains BL21 (DE3) and C41 (DE3) by one amino acid residue: ElmI has serine at position 44, whereas the closest homologues have leucine at the same position (Fig. 1). At the same time, the amino acid sequence similarity between ElmI and BisI is ~50%, and the number of identical amino acids is 112%. Therefore, the cloned DNA fragment that was identified by PCR screening and represents a putative gene of the methyl-directed DNA endonuclease ElmI is highly homologous to portions of genomic DNA from well-known E. coli strains.
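The link between the nucleotide substitution at position 131 and the amino acid difference at position 44 follows from simple codon arithmetic. The Python sketch below is a consistency check under the standard genetic code; the function name is hypothetical, and the list of compatible leucine codons is an inference, since the actual codon is not stated in the text.

```python
# Standard genetic code, restricted to the codons needed here.
LEU_CODONS = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def codon_position(nt_position):
    """Map a 1-based nucleotide position in an ORF to
    (1-based codon number, 1-based position within the codon)."""
    index = nt_position - 1
    return index // 3 + 1, index % 3 + 1


# Leucine codons that become serine codons when the middle T is
# replaced by C (the substitution described for position 131).
leu_to_ser = [
    codon for codon in LEU_CODONS
    if codon[0] + "C" + codon[2] in SER_CODONS
]

if __name__ == "__main__":
    print(codon_position(131))
    print(leu_to_ser)
```

Position 131 is the second nucleotide of codon 44, and a T-to-C change at the second codon position turns a TTA or TTG leucine codon into a serine codon, which is consistent with the Leu/Ser difference at position 44.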

Determination of the new ElmI MD-endonuclease specificity
In contrast to the parental strain, the lysate of E. coli ER2267 clones carrying pElmI plasmid exhibited endonuclease activity and one of the E. coli pElmI clones was chosen for production of biomass and isolation of the enzyme. A total of 8 g of E. coli pElmI biomass were produced as described in the Materials and Methods section. Chromatographic purification of the biomass resulted in 3 ml of the ElmI enzyme preparation with a concentration of 4 u.a./µl.
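The yield figures above translate into total activity by simple arithmetic. The following Python sketch uses only the numbers given in the text (3 ml of preparation at 4 units per µl, obtained from 8 g of biomass); the function name is hypothetical.

```python
def total_units(volume_ml, units_per_ul):
    """Total enzyme activity in a preparation of the given volume."""
    return volume_ml * 1000 * units_per_ul


if __name__ == "__main__":
    units = total_units(3, 4)   # 3 ml at 4 units per microlitre
    per_gram = units / 8        # 8 g of biomass were processed
    print(units, per_gram)
```

This corresponds to 12,000 units in total, or 1,500 units per gram of biomass. By the unit definition used in this work, that amount is sufficient for complete hydrolysis of 12 mg of pFsp4HI3 DNA under the standard conditions.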
Various substrate DNAs were digested under pre-established optimum conditions (37°C, SE reaction buffer "W", 20 µl of the reaction mixture containing 0.5 µg of substrate DNA, 2 h) in order to determine the site specificity of ElmI. DNA was cleaved by the BisI control enzyme under the same conditions, but using the SE reaction buffer "Y." DNA of plasmids carrying genes of different DNA methyltransferases was used as substrates to determine the specificity of the ElmI enzyme. The activity of these genes in the E. coli strains from which the plasmids were isolated resulted in modification of the DNA substrates by the corresponding DNA methyltransferases, and, therefore, the substrates had distinctive patterns of methylation.
The results of the site-specificity analysis of the DNA substrates are shown in Fig. 3.
These fragments are much less visible in the case of BisI. Even though the hydrolysis of pFsp4HI3/DriI plasmid by 2-4 dilutions is much more complete com-
These data suggest that, in contrast to BisI, the new ElmI MD-endonuclease is an order of magnitude more efficient in cleaving the 5'-GCNGC-3' sequence in the presence of three or four 5-methylcytosine residues than in the presence of only two methylated residues. Therefore, the original DNA is completely absent after digestion with 1/16 u.a. of ElmI (lane 5) due to a more efficient cleavage of 5'-GCNGC-3' with three or four 5-methylcytosine residues. In contrast, BisI cleaves such hypermethylated variants less efficiently: therefore, the original DNA fragment remains visible if 1/16 u.a. is used (lane 14).

Determination of the position of the hydrolyzable linkage in the ElmI recognition site
The position of the hydrolyzable linkage was determined by comparing the lengths of the fragments generated by cleavage of the oligonucleotide duplex D1/D2, formed from oligonucleotides D1 and D2, with the ElmI, PkrI, and GluI MD-endonucleases (the latter also recognizes the methylated 5'-GCNGC-3' sequence [4] and, like BisI, cleaves it before the central nucleotide). The putative sequence recognized by ElmI is G(5mC)GG(5mC) in D1 and G(5mC)CG(5mC) in D2:
D1: 5'-GAGTTTAG(5mC)GG(5mC)TATCGATCC-3'
D2: 5'-GGATCGATAG(5mC)CG(5mC)TAAACTC-3'
Figure 5 shows an autoradiograph of the electropherogram of the cleavage products of the radiolabelled D1*/D2 duplex in a 20% polyacrylamide gel with 7 M urea.
As can be seen from Fig. 5, the fragments derived from the hydrolysis of the D1*/D2 duplex with PkrI and ElmI (lanes 2 and 3, respectively) have different electrophoretic mobilities, indicating that these enzymes cleave at different positions relative to the recognition site. At the same time, the electrophoretic mobilities of the DNA fragments produced by ElmI and GluI are identical (lanes 3 and 4, respectively). Therefore, ElmI and GluI have the same position of the hydrolyzable linkage relative to the recognized 5'-GCNGC-3' sequence. Since GluI cleaves the 5'-GC^NGC-3' sequence before the central nucleotide "N" [4], ElmI also cleaves it before the central nucleotide.
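As a quick consistency check on the substrate design, the Python sketch below confirms that D1 and D2 are fully complementary and that each strand carries one 5'-GCNGC-3' site. The sequences are written as plain bases, with the 5-methylcytosine marks dropped (D2 is read as GGATCGATAGCCGCTAAACTC); the function names are hypothetical.

```python
import re

# Oligonucleotides from the text with methylation marks removed.
D1 = "GAGTTTAGCGGCTATCGATCC"
D2 = "GGATCGATAGCCGCTAAACTC"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_gcngc(seq):
    """Return 0-based start positions of all 5'-GCNGC-3' sites."""
    return [m.start() for m in re.finditer(r"(?=GC[ACGT]GC)", seq)]


if __name__ == "__main__":
    print(reverse_complement(D1) == D2)
    print(find_gcngc(D1), find_gcngc(D2))
```

Each strand contains a single GCNGC site with two cytosines, so the duplex site carries four 5-methylcytosines in total, the fully methylated substrate on which all the enzymes compared here are expected to act.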
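The difference in mobility follows directly from the cut positions. The Python sketch below computes the expected length of the labelled product of D1 for a cut before or after the central nucleotide, together with the resulting overhang. Two assumptions are made for illustration: the radioactive label is taken to be at the 5' end of D1 (the text marks D1* as the labelled strand but does not state which end), and both strands are taken to be cut at the same offset from the 5' end of the site. The function names are hypothetical.

```python
# D1 from the text with methylation marks removed; the GCNGC site
# starts at 0-based position 7.
D1 = "GAGTTTAGCGGCTATCGATCC"
SITE_START = D1.find("GCGGC")
SITE_LENGTH = 5

# Number of site nucleotides to the left of the cut on each strand.
CUT_OFFSETS = {"before N": 2, "after N": 3}


def labelled_fragment_length(cut):
    """Length of the 5'-labelled product of D1 (assumed 5' label)."""
    return SITE_START + CUT_OFFSETS[cut]


def overhang(cut):
    """Return (overhang length, overhang type), assuming both strands
    are cut at the same offset from the 5' end of the site."""
    diff = SITE_LENGTH - 2 * CUT_OFFSETS[cut]
    return abs(diff), "5'" if diff > 0 else "3'"


if __name__ == "__main__":
    for cut in CUT_OFFSETS:
        print(cut, labelled_fragment_length(cut), overhang(cut))
```

Under these assumptions, a cut before "N" (ElmI, GluI) gives a 9-nt labelled fragment and single-nucleotide 5' overhangs, whereas a cut after "N" (PkrI) gives a 10-nt fragment, a one-nucleotide difference that is resolved in a denaturing 20% polyacrylamide gel.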

CONCLUSION
Thus, the first identified recombinant enterobacterial MD-endonuclease ElmI recognizes the 5'-GC^NGC-3' nucleotide sequence and cleaves both strands of DNA before the central nucleotide "N," producing 5'-overhanging single-nucleotide ends.
Our results indicate that enterobacterial genomes contain genes for MD-endonucleases whose amino acid sequences have only moderate homology to BisI: only half of the amino acid residues may be regarded as similar in physical and chemical properties. Nevertheless, despite this only moderate homology of the primary structure, BisI and ElmI have similar recognition sites and positions of hydrolyzable linkages.
The use of the Esp-1 and Esp-2 primers with DNA of the laboratory E. coli BL21 (DE3) strain (ACT43858) results in amplification of an ~430 bp DNA fragment which is highly homologous to the elmI gene. According to GenBank, this fragment represents a reading frame that encodes a polypeptide of unknown function: Enterobact1 - WP_001276099.1 hypothetical protein [Escherichia coli] >ref|YP_003035796.1| hypothetical protein ECBD_1551 [Escherichia coli 'BL21]. However, according to our data, this reading frame is a gene encoding a methyl-directed DNA endonuclease. We have denoted the gene corresponding to this frame as ecoBLI, and the encoded protein as EcoBLI. Its properties will be discussed in a separate publication.
The site-specific ElmI endonuclease can be used in epigenetic studies, molecular biology, and genetic engineering for site-specific cleavage of methylated DNA: e.g., for the analysis of genomic DNA methylation in plants [13], where CNG-methylation is considered to be epigenetically important.