Comparative metatranscriptome analysis revealed broad response of microbial communities in two soil types, agriculture versus organic soil

Background Studying expression of genes by direct sequencing and analysis of metatranscriptomes at a particular time and space can disclose structural and functional insights about microbial communities. The present study reports comparative analysis of metatranscriptome from two distinct soil ecosystems referred as M1 (agriculture soil) and O1 (organic soil). Results Analysis of sequencing reads revealed Proteobacteria as major dominant phyla in both soil types. The order of the top 3 abundant phyla in M1 sample was Proteobacteria > Ascomycota > Firmicutes, whereas in sample O1, the order was Proteobacteria > Cyanobacteria > Actinobacteria. Analysis of differentially expressed genes demonstrated high expression of transcripts related to copper-binding proteins, proteins involved in electron carrier activity, DNA integration, endonuclease activity, MFS transportation, and other uncharacterized proteins in M1 compared to O1. Of the particular interests, several transcripts related to nitrification, ammonification, stress response, and alternate carbon fixation pathways were highly expressed in M1. In-depth analysis of the sequencing data revealed that transcripts of archaeal origin had high expression in M1 compared to O1 indicating the active role of Archaea in metal- and pesticide-contaminated environment. In addition, transcripts encoding 4-hydroxyphenylpyruvate dioxygenase, glyoxalase/bleomycin resistance protein/dioxygenase, metapyrocatechase, and ring hydroxylating dioxygenases of aromatic hydrocarbon degradation pathways had high expression in M1. Altogether, this study provided important insights about the transcripts and pathways upregulating in the presence of pesticides and herbicides. Conclusion Altogether, this study claims a high expression of microbial transcripts in two ecosystems with a wide range of functions. It further provided clue about several molecular markers which could be a strong indicator of metal and pesticide contamination in soils. Interestingly, our study revealed that Archaea are playing a significant role in nitrification process as compared to bacteria in metal- and pesticide-contaminated soil. In particular, high expression of transcripts related to aromatic hydrocarbon degradation in M1 soil indicates their important role in biodegradation of pollutants, and therefore, further investigation is needed.


Backgrounds
Understanding microbial community structure and function of the soil ecosystem is vital to delineate ecological roles of the associated microbiome [1,2]. Several studies have shown that more than 97% of microbes in different ecological habitats (soil, water, acid mine drainage, hot spring) cannot be cultured, and hence, their functional and ecological roles in various biochemical processes remained unexplored [3,4]. Advances in next-generation sequencing have provided novel insights about structural and functional organization of bacterial genomes and about the key physiological processes and mechanisms bacteria employ to acclimatize under a set of environmental conditions [5][6][7]. Recently, genome sequencing and analysis of lignocelluloses degrading saprophytic fungi deciphered complete information about the enzymatic machinery these fungi have used to degrade lignocelluloses [8]. Though whole metagenome sequencing and assembly offers opportunity for researchers to profile gene diversity and function under normal conditions, it cannot predict gene functions and pathways which upregulate under particular environmental conditions [9]. In sharp contrast to this, metatranscriptome sequencing and analysis hold immense potential to unravel the functional role of diverse microbiota under various environmental conditions [10,11]. Nonetheless, next-generation sequencing carried out from couple of complex microbial communities (marine, sediments, and soil) has successfully addressed the role of microbes in various ecological processes [12][13][14][15][16][17]. In agricultural lands where microbes interact within and between, various groups of organisms makes it very difficult to discover the role of pesticides and metals in shaping microbial community structure and function [18]. Furthermore, large variation in sorption, desorption, and degradation of pesticides has been reported in different soil types [19,20]. Pesticide usage in agriculture increases the number of pesticides degrading bacteria or archaea in soil [21]. Interestingly, nitrification test as a pesticide side effect has been reported as the best way to depict the role of microorganisms in soil [22]. Studying expression of genes by direct sequencing and analysis of metatranscriptomes at a particular time and space can disclose structural and functional divergence of microbial communities [23].
Spraying of chemicals, e.g., pesticides, herbicides, and fertilizers, in agricultural lands of Punjab (India) is a regular practice [24]. We hypothesized that long usage of such chemicals may alter microbial community structure and function; thus, comparative metatranscriptome can allow researchers to investigate the ecological roles of microbial communities in such environments. Herein, we sequence, analyze, and compare the whole metatranscriptome of agricultural versus organic soil.

Site description and sampling
Soil samples were collected in duplicates from the agricultural field (M1) of Malwa region of Punjab, India (29°30′ and 31°10′ north latitudes and 73°50′ and 76°50′ east longitudes), and from the normal organic soil (O1) where no modern agricultural practices are carried out. Samples were collected at a depth of 0-10 cm in September 2014 (atmospheric temperature 28°C in RNA later (Ambion) (2 ml RNA later added to 2 g of soil) and stored at − 80°C. Agricultural soil was loamy and had pH of 7.5 to 8.0 whereas the organic soil was acidic and had pH of 6.5-7.0; it was deep brown and porous in nature. Soil samples were transported to the laboratory in dry ice and stored in − 80°C freezer till its further use for extraction of RNA. The estimation of metals in soil was performed by inductively coupled plasma mass spectrometer (ICP-MS Agilent 7700), whereas the pesticides in the two soil types were estimated by gas chromatographymass spectrometry (GC-MS, Agilent technologies) as reported in our previous study [25].

RNA extraction
Total RNA was extracted from 2 g soil in duplicates each from M1 and O1 using RNA PowerSoil® Total RNA isolation kit (MO BIO Laboratories, Inc., Carlsbad, CA 92010, USA). Briefly,~2 g soil was homogenized in a 15-ml tube containing silica carbide beads, lysis buffers, phenol to chloroform to isoamyl alcohol (pH 6.5-8), and IRS, to ensure complete lysis of all microorganisms and neutralization of RNases. Clear lysates were precipitated to concentrate the total nucleic acids and were resuspended in a buffer optimized for binding to anionexchange gravity flow columns. RNA was eluted using a high salt buffer and was precipitated to obtain the final pure RNA which was resuspended in RNase-free water. RNA from two extractions was pooled (both from M1 and O1) before cDNA preparation and sequencing.

cDNA library construction and Illumina sequencing
Total RNA was treated with DNase to remove DNA contamination. RNA quality was assessed using NANODROP LITE spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Double-stranded cDNA was generated from amplified RNA using the superscript IITM doublestrand cDNA synthesis kit (Thermo Fisher Scientific, Waltham, MA, USA), as per manufacturer's instructions. Library preparation, processing, and sequencing were performed at the SciGenom Labs Private Limited, Kerala, India, using the Illumina HighSeq2500 with paired-end (PE) sequencing. Data was submitted with MG-RAST server [26]. MG-Rast ID of the data sets is available for O1 with accession number is mgm4653349.3; M1 dataset was provided a Gold ID 0eeec568676d676d343733323034392e33 and is under progress at present; however, it can be accessed upon request to the server.

FASTQ files quality checking
Raw sequences obtained after the sequencing of metatranscriptomes were analyzed using FastQC tool. Base distribution, base composition, and GC distribution of reads were calculated from the QC result. Based on results, we trimmed the sequence read wherever it was necessary in order to retain high-quality sequences for further analysis. Additionally, reads with more than 10% of "N"s, Illumina adapter contaminated reads, and lowquality reads were excluded from the analysis.

Adapter removal
Raw sample reads were initially pre-processed by removing adapter (Illumina) sequences. Adapter removal was carried out using the cutadapt tool (version 1.7.1) with default parameters.

Sequence clustering
The reads from each sample were clustered using UCLUST tool (http://www.drive5.com/usearch) with the cutoff value of > 90%. From clustered sequences, the representative sequencing reads were used as a query for functional annotation.

Taxonomy and functional annotation of sequencing reads
Read-based annotations of unique cDNA sequences were carried out against non-redundant sequence database using standalone BlastX program (http://www. ncbi.nlm.nih.gov/) with optimal e value of 10e−5. The best hits showing sequence similarity greater than 90 and lowest e value were retrieved. The predicted gene functions from each read were annotated using Gene Ontology (GO) and KEGG (Kyoto Encyclopedia of Genes and Genomes analysis). The GI (sequence accession number) of each functional hit was retrieved and queried against UniProt database (http://www.UniProt. org/). The transcript id and gene function is provided in the supplementary figures and tables (Additional files 1, 2, 3, 4 and 5).

Differential expression studies
The analysis of differentially expressed genes was carried out based on the read counts of all the samples using DESeq package (http://bioconductor.org/packages/release/ bioc/html/ DESeq.html). Initially, the read counts of common transcripts from all samples were considered as an input table for DESeq. All the transcript frequencies were brought to the common scale to make them comparable by normalizing the read counts from each sample. Differential expression was carried out by negative binomial test, and as a result, the mean read count of sample combination, fold change, and p values were estimated. The p value between sample combinations shows the significance of differential expression.

Soil characterization and estimation of pesticides and heavy metals
The soil surface was sandy loam in all samples, and the pH of soil for the organic and agriculture soil was 8 and 9.2 respectively, at the time of collection. Out of several pesticides analyzed, M1 soil revealed the presence of cypermethrin I and III in the range of 0.019 ± 0.001 ppb, whereas cypermethrin II and IV were found to be 0.018 ± 0.002 and 0.017 ± 0.002 ppb, respectively. Heavy metals such as nickel (47.7 ± 2.91 mg kg −1 and 58.7 ± 2.91 mg kg −1 ), mercury (23.8 ± 5.58 mg kg −1 and 28.8 ± 5.58 mg kg −1 ), selenium (21.1 ± 4.25 mg kg −1 and 21.1 ± 4.25 mg kg −1 ), and cadmium (3.1 ± 0.8 mg kg −1 and 5.1 ± 0.8 mg kg −1 ) are also reported in our previous studies [25]. No residues of pesticides and metals were detected in the O1 soil sample.
RNA extraction, quality checking and, Gene Ontology (GO) High-quality RNA was extracted from agricultural soil (M1) and organic soil (O1) and was quantified using Nanodrop. Paired-end Illumina cDNA libraries from two datasets yielded~11.49 GBp and~11.9 GBp data for the M1 and O1 samples, respectively (Table 1). A total of 114,908,828 and 119,019,498 raw data (R1 and R2) was sequenced for both M1 and O1respectively. The sequencing data showed 55% GC content and a Phred score ≥ 30. After removal of adapter and non-coding RNA, we assembled the metatranscriptome sequencing reads; however, very less number of reads could be assembled due to complexity and heterogeneity in the metatranscriptome datasets (data not shown). Alternatively, we used a read-based annotation after sequence clustering, as recommended previously [27,28]. Clustering of sequences resulted in~32,768 representative mRNA reads for M1 and 5611 for O1 respectively. The average length of the mRNA representative reads was > 100 and < 250 bp. Read-based annotation of clustered sequences were classified according to cellular (GO 00005575), molecular (GO 0003674), and biological (GO 0008150) processes [28]. Altogether, from GO assignments, it is evident that the two soil types share similar molecular function; however, they displayed a significant level of variations with respect to cellular and biological function. Percent distribution of transcripts assigned to biological, cellular, and molecular functions among the two datasets is shown in Fig. 1. Gene Ontology (GO) assignments and taxonomical classification are depicted in Additional file 3: Table S1 and Additional file 4: Table S2. The two soil systems however demonstrated great variations for processes involving energy generation, metabolism, transportation, and ribosomal activity. High ATPase and GTPase activity is directly related to energy, motility, host-pathogen interactions, and export of toxins and other wastes out of the cell and in multidrug resistance [29] The iron and sulfur binding metalloproteases play important roles in cellular and biological activities in both prokaryotes and eukaryotes [30]. Several transcripts playing crucial role in oxidation-reduction processes showed enhanced expression in M1. The top 25 transcripts from various organisms demonstrating abundant expression in M1 compared to O1 are depicted in Fig. 2 and Table 2. Other differentially expressed genes enlisted in Additional file 5: Table S3 were also analyzed to understand various metabolic processes taking place in soil.

Phylogeny analysis
Taxonomic analysis (Additional file 3: Table S1) of the annotated reads, as presented in Fig. 3, revealed Proteobactreria as major phylum in the two soil types, the order of top 3 abundant phyla in the M1 was Proteobacteria > Ascomycota > Firmicutes, whereas sample O1 demonstrated the following order Proteo-bacteria>Cyanobacteria>Actinobacteria (Fig. 3a). Proteobacteria encompass enormous diversity in terms of physiology, morphology, and metabolism and is considered as a key player in carbon and nitrogen cycle. Deltaproteobacteria and Alphaproteobacteria showed abundant presence in M1, whereas Lilliopsida and Deltaproteobacteria were more in O1. Another important phylum identified in this M1 soil sample belongs to Thaumarchaeota. At the genus level,1 312 transcripts were assigned to the M1 sample, and the most dominant genus identified was Candida (30%) followed by Candidatus and Nitrosophaera in M1 (Fig. 3b). At the species level, Candida albicans (30%) and Sideroxydans lithotrophicus (8%) revealed their unique occurrence in M1. Interestingly, Anaeromyxobacter sp. Fw109-5, Candidatus Nitrososphaera gargensis, N. Viennensis, Anaerolinea thermophila, Candidatus Entotheonella sp. TSY1, and Candidatus Entotheonella sp. TSY2 were less represented in the M1 sample as compared to O1 (Fig. 3c).
Comparing nitrification process in agriculture (M1) and organic soil (O1) Nitrification is used to monitor the side effects of contaminants on microbial community structure, function, and soil quality. KEGG (Kyoto Encyclopedia of Genes and Genomes analysis) analysis of nitrification pathway revealed that several transcripts encoding genes of nitrification were highly expressed in M1 that mainly include amoC from AOA (ammonia-oxidizing archaea) (amoC1, C2 = 4.3, 4.17-fold in N. gargensis, amoC3 = 1.46 in Nitrososphaera viennensis EN76, NTE_00725 = 3, NTE_ 01100 = 1.44-fold in Candidatus Nitrososphaera evergladensis SR1), indicating active contribution of AOA in nitrification process in M1 soil. Furthermore, we observed a high expression of putative NxrA2 and NxrB1/B2, i.e., nitrite oxidoreductase (NXR), in the M1 soil sample (Additional file 4: Table S4) from C. Nitrospira defluvii and a beta subunit of nitrite oxidoreductase from Nitrospira moscoviensis, a key enzyme of NO 2 oxidation in NOB (nitrate-oxidizing bacteria). In addition, coppercontaining nitrite reductases (NirK) from uncultured bacterium also had high expression in the M1 soil. Interestingly, a membrane-bound dissimilatory nitrate reductase (~2.7-fold) Tbd_1403 of T. denitrificans was also highly expressed in agriculture soil M1. High expression of the small subunit of methylmalonyl-CoA mutase (EC 2) that convert malonyl-CoA to succinyl-CoA as the main product of an alternative carbon fixation pathway was also observed in M1.

Aromatic and hydrocarbon metabolism
Transcripts associated with the metabolism of aromatic hydrocarbon demonstrated their high expression in cypermethrin-and metal-contaminated M1 soil ( Fig. 5 and Table 3) compared to O1. Increased expression of transcripts related to aromatic metabolism was widespread and associated with both central and peripheral aromatic metabolic pathways. Interestingly, high expression of transcripts that include 4-hydroxyphenylpyruvate dioxygenase (HPPD),

Discussion
Investigation of transcripts encoding functional protein by direct sequencing and analysis at a particular time and space can disclose structural and functional features of microbial communities. In this study, analysis of metatranscriptome samples from two distinct soils revealed divergences in functional features of microbial communities. Interestingly, during phylogeny analysis, we observed an abundant presence of  archaea especially Thaumarchaeota, a phylum which is widely distributed in extreme, non-extreme [31], terrestrial [32], and metal-contaminated soil [33]; these observations primarily indicate important consequences of archaea, especially towards degradation of the pesticides, ammonia oxidation [34], nitrate leaching in greenhouse gas production, and soil subsidence [35]. Additionally, the abundance of Candida albicans and Sideroxydans lithotrophicus in the M1 soil sample suggests their high tolerance to metals as also reported previously [36]. It was interesting to note that despite of low abundance of archaea in the M1 soil, it has shown high expression of genes related to several pathways, thus indicating archaeal and better adaptation and active contribution in pesticide-and metal-contaminated environment, e.g., several transcripts related to nitrification process, which is used to monitor side effects of pesticide contamination [37], showed high expression in M1. A highly expressed amoC transcript identified in the M1 soil (belongs to group I.1.a of archaea as depicted in Additional file 1: Figure S1, which harbor an additional copy of amoC gene) is well correlated previously with high metal resistance [38][39][40]. Additionally, we observed high expression of putative NxrA2 and NxrB1/B2, i.e., nitrite oxidoreductase (NXR) (Additional file 4: Table S4), from C. Nitrospira defluvii and β subunit of nitrite oxidoreductase from Nitrospira moscoviensis, key enzymes of NO 2 oxidation in NOB indicating active participation of these organisms in nitrification in M1 soil. On further analysis, we found that these genes have shared similarity with NxrA2 and NxrB of N. moscoviensis and with other known NO 2 − /NO 3 − -binding molybdoenzymes, such as NXR of Nitrobacter or bacterial nitrate reductases (NARs) which shuttles two electrons per oxidized NO 2 into the electron transport chain [41]. Another transcript (NirK) from uncultured bacterium also had high expression in M1; this transcript has shared maximum homology (> 70%) with NirK of Chloroflexi thus indicating denitrification through AOB (ammonia-oxidizing bacteria) [42] that forms NO from NO 2 [43]. Besides nirK homologs, no other genes typically contributing to the denitrification process showed high expression in the M1 soil. We therefore propose that high expression of transcripts nirK along with amoC can be a useful molecular marker to monitor soil ammonia oxidation in agricultural soil contaminated with pesticides and metals. Additionally, high expression (~2.7fold) of a membrane-bound dissimilatory nitrate reductase Tbd_1403 of T. denitrificans in agriculture soil indicates nitrate-dependent oxidation of metals with high reduction potentials and can also be used as molecular marker to study metal resistance pathways among denitrifying bacteria showing high metal tolerance U(IV) and Fe(II) [44][45][46].
In addition to high expression of transcripts related with nitrification and ammonia oxidation from Thaumarchaeal viz N. gargensis, we observed high expression of PCK1 in archaea; PCK1 regulates 3-hydroxypropionate/4-hydroxybutyrate carbon fixation pathway as reported previously [47][48][49][50]. In particular, high expression of PCK1 gene from C. albicans that encodes protein of gluconeogenesis pathway indicate soil environment where carbon availability might be changing continuously enabling C. albicans to switch its metabolism by regulating PCK1, as also reported previously in metal-contaminated soil [51,52]. Another transcript that demonstrated high expression in M1 belongs to a putative NADP + -specific D-arabinose dehydrogenase from C. dubliniensis sharing > 95% identity with C. albicans. The exact mechanism of action of D-arabinose dehydrogenase is unknown; however, its activity is shown to be inhibited by metals presence including Hg 2+ [53]; thus, it may be hypothesized that high expression of D-arabinose dehydrogenase is indicating oxidative stress response of C. albicans in agriculture soil.
Microbes demonstrate multiple adaptations to survive under osmotic, oxidative, heavy metals, elevated temperatures, high salinity, and other stress conditions, one of the strategies that microbes employ to cope up these stresses is accumulation of compatible solutes and small soluble organic molecules [54]. In order to investigate the transcripts showing high expression in metal and pesticide stress conditions, we compared and analyze the transcripts associated with stress from two soil types; interestingly, high expression of UDPglucose 6-dehydrogenase (EC 1.1.1.22) (udg_NIDE4145; 2.7-fold) and pyruvate to ferredoxin oxidoreductase δ subunit (EC 1.2.7.1) (porD_NIDE0972;~1.7-fold) indicate active synthesis of glycogen in Ca. N. defluvii in response to stress [38]. These solutes can either be transported into the cell or synthesized de novo. Indeed, glycogen deposits have been observed under electron microscopy in Nitrospira cells under stress conditions [55]. To mitigate oxidative damage, in contrast to most of the aerobic bacteria, N. defluvii lacks SOD and catalase, and its genome lacks superoxide dismutase (SOD) either. High expression of bacterioferritin transcripts in Ca. N. defluvii indicates ROS (reactive oxygen species) detoxification in M1 soil; it binds with free iron thus reduces the risk of ROS generation. Additionally, transcript ID_664789678 (Table 2) with unknown function from Pelosinus species also showed high expression in the M1 soil sample compared to O1. Previously, this organism has shown its presence in various sites that include Melton Branch Watershed, Oak Ridge, TN, USA ( [56] and uranium-contaminated field site near Oak Ridge [57]. During its growth, it uses Fe(III) [58] and U(VI) [59]. As most of the proteins from Pelosinus are not characterized [60], we hypothesized that high expression of one of its transcripts in the M1 soil can be correlated with the presence of heavy metals in M1 compared to the O1 soil. Additionally, high expression of cell division-related protein FtsZ from Anaerolinea thermophila indicates active peptidoglycan biosynthesis in the M1 soil. Furthermore, a multicopper oxidase (transcript ID_ 502368623, 700173093, 152029536) from Geobacter sp., Knoellia sp., and Anaeromyxobacter sp. also showed high expression in the M1 soil. Catalytic sites of these proteins contain copper-binding sites and are implicated in bacterial copper resistance, oxidation of phenolic compounds, and in detoxification of Mn 2+ [61]. Additionally, high expression of putative actin 22 in M1 indicates the formation of actin rods which was previously reported in Dictyostelium during the formation of spores under stress conditions such as heat shock and osmotic pressure as well as non-nutrient conditions [62,63]. Many of these chaperones were of Proteobacterial origin and of high expression. These proteases were also of proteobacterial origin, and their role has not yet established in the metal-and pesticidecontaminated agriculture soil; however, we propose that their high expression can be correlated with high turnover in Proteobacteria which mostly depends on ATP-dependent proteases that are recruited by the cells in the cytosol (Lon, ClpAP, ClpXP, HslVU) and are well reported in cypermethrin-and cadmium-contaminated soil [64][65][66][67] or are associated with the inner membrane (FtsH) [68]. These enzymes in adverse conditions not only help in degradation of misfolded or abnormal proteins but also degrade unstable proteins, like σ 32 heat shock factor. Another transcript Tca2 which is widely distributed in C. albicans, related with Ty1/copia-type retrotransposon [69], and is predominantly present in the M1 soil (transcript ID_68492214, 77022864, 703956851, 3273718) support the fact that Tca2 transpositional activity is favored at high-temperature stress and reveal a close relationship between the Tca2 expression and virulence in C. albicans endonuclease activity in vitro since it also utilizes Mg 2+ as a cofactor for enzymatic activity [70]. It further raises the possibility of co-selection of antibiotic and heavy metal resistance dissemination through mobile genetic elements [71] observed in other transcripts in the current study. High expression of MFS (transcript id_374853338) in uncultured Acetothermia bacterium showing similarity with oxalate/formate antiporter of proteobacteria, i.e., Nitrosomonas sp., point out that it may be participating in exchange of external divalent oxalate with the intracellular monovalent formate [72,73] to generate a proton-motive force that supports membrane functions, including ATP synthesis, accumulation of growth substrates, and extrusion of waste products [72][73][74][75].

Conclusion
In summary, a comparative metatranscriptomic study from agriculture and organic soil types illuminated structural and functional variations. The study further concluded that high expression of microbial transcripts in two soil types is associated with wide range of functions. It has also provided clue about several molecular markers which could be a strong indicator of metal and pesticide contamination in soils.
Interestingly, our study revealed that archaea are playing a significant role in nitrification process compared to bacteria in metal-and pesticide-contaminated soil. In particular, high expression of transcripts related to aromatic hydrocarbon degradation provided clue about degradation potential associated in polluted soil communities.