Genetic diversity analysis in a mini core collection of Damask rose (Rosa damascena Mill.) germplasm from Iran using URP and SCoT markers

Background Rosa damascena Mill. is a well-known species of the rose family, famous for its essential oil content. The aim of the present study was to assess the genetic diversity and population structure of a mini core collection of the Iranian Damask rose germplasm using universal rice primer (URP) and start codon targeted (SCoT) molecular markers. Results Fourteen URP and twelve SCoT primers amplified 268 and 216 loci, with an average of 19.21 and 18.18 polymorphic fragments per primer, respectively. The polymorphic information content for URP and SCoT primers ranged from 0.38 to 0.48 and 0.11 to 0.45, with the resolving power ranging from 8.75 to 13.05 and 9.9 to 14.59, respectively. Clustering was based on the neighbor-joining (NJ) method. The mini core collection contained 40 accessions and was divided into three distinct clusters, based on each marker system and on the combined data. Conclusion Cluster analysis and principal coordinate analysis were consistent with genetic relationships derived by STRUCTURE analysis. The findings showed that patterns of grouping did not correlate with geographical origin. Both molecular markers demonstrated that the accessions were not as genetically diverse as expected, thereby highlighting the possibility that gene flow occurred between populations. Supplementary Information The online version contains supplementary material available at 10.1186/s43141-021-00247-7.


Background
As a large genus in the Rosaceae family, Rosa has 200 species and covers more than 18,000 cultivars [1]. The Caucasus, Syria, Morocco, and Andalusia are all home to Rosa damascena, while Iran is usually referred to as a source of diversity in this respect [2]. Accordingly, a great variation of Damask rose landraces is expected to be discovered in this country. In addition to horticultural uses, roses are of economic importance because of the essential oils in their petals [3]. Rosa damascena has particular genotypes and cultivars which are noteworthy for their medicinal properties and oil [4][5][6]. Since genetic variation is available within the genus Rosa, its breeding depends on the systematic characterization of genetic resources and the study of likely mechanisms for hybridization. Morphological markers describe an organism's phenotypic characteristics and are the first to outline its measurable characteristics. Each species in the Rosa genus has a wide, overlapping range of morphological variations that are affected by environmental factors. Thus, it would be insufficient to classify species and genotypes on morphological grounds only [7]. According to Kiani et al. [8], most Iranian Damask roses are tetraploid; however, some other ploidy levels were observed.
For the classification and recognition of rose genotypes, chemotaxonomic studies have often relied on a large variety of phenolic compounds and on isozyme markers [8][9][10][11]. However, only a limited number of loci can be resolved consistently, which reduces the efficiency of these markers [9,12]. The molecular approach is more acceptable because it provides direct access to the genetic material (genome), which makes it much easier to recognize plant relationships [13]. Molecular markers can identify genetic polymorphism at the DNA level and can be used in analyzing genetic variation, estimating genetic distance, determining parentage, marker-assisted selection, and gene localization. Many DNA-based molecular markers are available for distinguishing biodiversity among plant populations. However, the selection of DNA markers depends on the type of study. Therefore, it is important to compare the various molecular markers and decide which is appropriate for the species under study. New innovations have given rise to new molecular markers that can be used in describing genetic characteristics of plants in the Rosa genus. Several molecular assays have been used in recent years to test the genetic variation of various rose plants [14][15][16][17][18][19][20][21][22]. These molecular approaches differ in principle, procedure, marker class, number of polymorphisms detected, function, and time requirements.
The SCoT marker technique relies on the short conserved region flanking the start codon (ATG) in plant genes [23]. These markers amplify reproducibly across annealing temperatures [24] and have great potential as a relatively popular tool. The SCoT marker system is simple, low cost, polymorphic, reproducible, and reliable. SCoT markers are known to be useful in various studies, such as cultivar recognition, genetic diversity evaluation, DNA fingerprinting, marker-assisted selection, and quantitative trait loci mapping [25,26]. This approach has an important place within genetic studies, and its benefits are numerous [27][28][29][30].
Kang et al. [31] used a polymerase chain reaction (PCR) method with universal rice primers (URP), which provides a powerful tool for investigating the DNA diversity of most eukaryotic and prokaryotic genomes, with potential use in taxonomic and phylogenetic research, as well as in population genotypic screening of individuals, at both the inter- and intraspecies level. Owing to its long primers and elevated annealing temperatures, URP-PCR has an advantage over randomly amplified polymorphic DNA (RAPD) and arbitrarily primed polymerase chain reaction (AP-PCR) methods. DNA marker performance is evaluated through factors like the marker index (MI) and the polymorphism information content (PIC). Comparing the performance of marker techniques can assist researchers in selecting suitable markers for amplifying genome fragments, and thereby in using these markers more effectively in potential breeding studies [32].
This study aimed to investigate genetic variation in different Rosa damascena accessions from Iran and to demonstrate the effectiveness of different marker systems.

Plant materials and DNA extraction
In total, 40 Damask rose genotypes were collected from five regions in Iran (Fig. 1, Table 1). Rose suckers were harvested from Iran's rose oil-producing regions. These areas were divided according to geographical and climatological conditions, and each region comprised several provinces (Fars, Isfahan, East Azerbaijan, Kerman, Semnan, Gilan, Kermanshah, Lorestan, Hormozgan, Tehran, and Markazi provinces). The geographical details are given in Table 1. Accessions were obtained from the gene bank collection of Barij Essence company in Kashan. Young leaves from each accession were collected for DNA extraction from late March to early June. The CTAB procedure [33] was used, with slight modifications (changing the amount and content of the extraction buffer, the incubation time, and adding polyethylene glycol), to extract total genomic DNA. Electrophoresis was performed on a 1% agarose gel to evaluate the quality of DNA. Only high-quality genomic DNA samples, without degraded DNA, were used for amplification.

PCR amplification of different markers
The sequence and annealing temperature of all primers used in the analysis are given in Table 2. The genomic DNA of all 40 genotypes was amplified with a set of 12 SCoT primers and 14 URP primers [34]. The amplification was done in a Bio-Rad (T100) thermal cycler. Each 20 μl PCR reaction mixture consisted of 6.5 μl ddH2O, 10 μl 2X PCR master mix (ready-to-use PCR master mix 2X; Ampliqon), 2 μl isolated DNA per sample (50 ng/μl), and 1.5 μl primer (10 pmole/ml). The PCR program consisted of an initial denaturation at 94°C for 5 min, followed by 35 cycles of denaturation at 94°C for 45 s, primer annealing for 45 s at the primer-specific temperature (Table 2), and primer elongation at 72°C for 90 s, with a final extension at 72°C for 10 min. This procedure was applied for each primer. To detect polymorphism among accessions, the PCR products were loaded into the wells of a 1.2% agarose gel and electrophoresed at 90 V. The gel was then immersed in ethidium bromide solution (10 mg/ml) for 15 min. Banding patterns were visualized under UV light with a gel documentation system (Bio-Rad). Fragments amplified with both SCoT and URP primers were resolved in this way.

Data analysis
The amplified fragments were scored as absent (0) or present (1) in each sample. The primers were screened using several discriminatory criteria, including the number of polymorphic bands (NPB), total amplified bands (TAB), percentage of polymorphic bands (PPB), resolving power (Rp), polymorphism information content (PIC), and marker index (MI). PIC was calculated based on the formula given by Anderson et al. [35].
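The scoring indices above can be illustrated with a minimal sketch. It assumes that each band is treated as a locus with two states (present, absent), so that the Anderson formula PIC = 1 − Σp² reduces to 2p(1 − p), and that Rp is the sum of band informativeness values Ib = 1 − 2|0.5 − p|, a commonly used definition that the text does not state explicitly. The function names and the toy matrix are illustrative and not taken from the study.

```python
def band_frequencies(scores):
    """Return, for each band (column), the fraction of accessions showing it."""
    n = len(scores)
    return [sum(column) / n for column in zip(*scores)]


def pic_per_band(scores):
    """PIC = 1 - sum(p_i^2) over the two band states (assumed: present/absent)."""
    return [1.0 - (p * p + (1.0 - p) * (1.0 - p)) for p in band_frequencies(scores)]


def resolving_power(scores):
    """Rp as the sum of band informativeness Ib = 1 - 2*|0.5 - p| (assumed formula)."""
    return sum(1.0 - 2.0 * abs(0.5 - p) for p in band_frequencies(scores))


def percent_polymorphic(scores):
    """Percentage of bands that are neither absent from nor present in all accessions."""
    freqs = band_frequencies(scores)
    polymorphic = [p for p in freqs if 0.0 < p < 1.0]
    return 100.0 * len(polymorphic) / len(freqs)


# Toy 0/1 matrix: rows are accessions, columns are bands from one primer.
toy_scores = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

print(pic_per_band(toy_scores))
print(resolving_power(toy_scores))
print(percent_polymorphic(toy_scores))
```

In this toy matrix the last band is present in every accession, so it contributes nothing to PIC or Rp and lowers the percentage of polymorphic bands.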
Analysis of molecular variance (AMOVA) was performed in GenAlEx ver. 6.5 to partition genetic diversity [36]. For each sample, GenAlEx ver. 6.5 was also used to determine the percentage of polymorphic loci (PPL), effective number of alleles (Ne), and total number of alleles (Na) [37], Nei's [38] gene diversity (H), and Shannon's information index (I) [39]. Genetic dissimilarities were then calculated with Jaccard's coefficient in DARwin ver. 6 [40]. A fan dendrogram was constructed by the neighbor-joining (NJ) method in MEGA ver. 10.1 [41]. The genetic structure of the populations was analyzed with a Bayesian model in STRUCTURE (ver. 2.3.4) [42], which estimated the number of genetic clusters (K) and the proportion of each individual's membership in each cluster. For each K from 1 to 10, the analysis was repeated ten times, with an initial burn-in period of 100,000 followed by 100,000 Markov Chain Monte Carlo (MCMC) iterations. Finally, ΔK was calculated by STRUCTURE HARVESTER, an online program [42].
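As an illustration of the dissimilarity step, the sketch below computes Jaccard dissimilarity between 0/1 band profiles. It is a minimal stand-in written for this explanation, not the DARwin implementation; the function names and the toy profiles are hypothetical.

```python
def jaccard_dissimilarity(profile_a, profile_b):
    """1 - (bands shared by both) / (bands present in at least one)."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    if either == 0:
        # Neither accession shows any band: treated as identical (an assumption).
        return 0.0
    return 1.0 - shared / either


def dissimilarity_matrix(scores):
    """Pairwise Jaccard dissimilarities for a list of 0/1 band profiles."""
    n = len(scores)
    return [
        [jaccard_dissimilarity(scores[i], scores[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy profiles for three hypothetical accessions scored at five bands.
toy_profiles = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
]

for row in dissimilarity_matrix(toy_profiles):
    print([round(value, 2) for value in row])
```

Shared absences (0/0) are ignored by this coefficient, which is one reason it is often preferred for dominant markers, where a missing band can have several causes.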
Mantel correlation test showed a low and statistically nonsignificant correlation (r = 0.49) between distances revealed by SCoT and URP data for all 40 accessions across five collected regions.
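The idea behind a Mantel test can be sketched as follows. The example assumes a simple one-sided permutation test on the Pearson correlation between the upper triangles of two distance matrices; the text does not state which software, correlation statistic, or number of permutations was used, so every name and default below is illustrative.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabeling rows and columns."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    x = upper_triangle(dist_a)
    r_observed = pearson(x, upper_triangle(dist_b))
    n = len(dist_a)
    at_least_as_large = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # permute accession labels of the second matrix
        if pearson(x, upper_triangle(dist_b, order)) >= r_observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return r_observed, p_value


# Toy symmetric distance matrices for four hypothetical accessions.
toy_a = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
toy_b = [[0, 2, 3, 5], [2, 0, 1, 4], [3, 1, 0, 2], [5, 4, 2, 0]]

r, p = mantel_test(toy_a, toy_b)
print(round(r, 3), round(p, 3))
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, so an ordinary significance test for a correlation coefficient would not be appropriate.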

Principal coordinate analyses (PCoA)
Principal coordinate analysis was also used to analyze and depict the population structure. According to URP (A), SCoT (B), and the combined data (C), the first three principal coordinates explained 36.69, 37.34, and 33.85% of the molecular variation, respectively. The PCoA biplots showed that all accessions displayed a scattered distribution in the plot that did not follow their origins (Fig. 3A-C). The results of cluster analysis supported these observations (Fig. 2).

Population structure analysis
Bayesian clustering was used for determining the population structure of the 40 accessions. K was varied from 1 to 10, and with URP primers the most probable number of clusters was K = 3. Of the 40 accessions, subgroup 1 included 12 accessions from Kermanshah (4), Esfahan (2), Gilan (4), and Kerman (2), while subgroup 2 comprised all accessions from Markazi, Tehran, Hormozgan, and East Azerbaijan, as well as some accessions from Esfahan (5)

Discussion
It is very difficult to evaluate the genetic diversity of R. damascena if only morphological features were to be available as markers. Meanwhile, technological tools for the identification of biodiversity include rapid, reliable procedures to describe genetic relationships and variation among roses. DNA markers are the most common tools in current research trends on rose genetic diversity [43][44][45][46][47].
In this research, the genetic variation of the 40 Rosa damascena accessions was measured using two marker techniques: URP and SCoT. Our results indicated a significant genetic variation within the populations. We compared the effectiveness of URP and SCoT as new gene-based markers for identifying genetic variation among Rosa damascena accessions. With both markers, the proportion of polymorphism was 100% (Table 2), which is higher than the polymorphism ratios reported in earlier studies of roses. Given this polymorphic percentage, these markers can serve as a powerful tool in identifying and discriminating between rose genotypes. Henuka et al. [17] used RAPD markers and reported 98.54% polymorphism. Korkmaz and Dogan [21] observed 90.1% and 88.8% polymorphism among twenty-seven Rosa spp. in Turkey, using ISSR and RAPD markers, respectively. Panwar et al. [18] also reported 94% genetic polymorphism with ISSR markers. Carvalho et al. [48] found 93.7% polymorphism among a selection of rose genotypes based on ISSR markers. Jamali et al. [49] reported 77% polymorphism. These high percentages of polymorphism reflect the heterozygous nature of the polyploid genome structure of rose species. Agarwal et al. [22] studied genetic diversity in 29 Indian rose germplasms using SCoT markers. They observed a high level of polymorphism among the genotypes, which is in line with our results. SSR markers have also revealed a high level of diversity in R. damascena germplasm in Iran, as well as a high level of variation in Pakistani genotypes [50].
In terms of marker informativeness indices, URP markers showed higher values of TAB, TPB, Rp, PIC, and MI than SCoT markers. These values suggest that the Iranian Rosa damascena germplasm has a good degree of genetic diversity. The polymorphism information content (PIC) reflects the degree of polymorphism detected by a marker and, for the markers used here, can vary from 0 to 0.5. The larger the value, the greater the number of alleles and the higher the frequency of polymorphisms at that locus in the study population. In the present study, the relatively high PIC and MI values for the URP primers provided an estimate of the discriminating ability of the URP marker system [51], which showed better resolution and differentiation. Overall, however, only small differences were observed between the two marker systems in terms of these indices. The statistics showed that the SCoT and URP methods perform similarly in detecting genetic polymorphism among the evaluated populations. The high levels of polymorphism also showed that both marker types are useful in studying genetic variation and are equally effective in distinguishing between closely related Rosa damascena populations. According to the AMOVA, 96% and 90% of the genetic variation revealed by the URP and SCoT markers, respectively, was partitioned within populations, indicating that variation within populations was higher than among them (Table 3). Interpopulation differentiation (GST) and gene flow (Nm) estimates supported these findings. The GST values for URP, SCoT, and combined data were 0.117, 0.185, and 0.148, respectively, revealing that genetic differentiation among populations is relatively low. The indirect estimate of gene flow (Nm) via GST was 3.77 (URP), 2.19 (SCoT), and 2.86 (combined data), so the estimated number of migrants per generation exceeds two in every case.
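The link between GST and the indirect gene flow estimate can be made concrete with a short sketch. It assumes the widely used relation Nm = 0.5(1 − GST)/GST; the text does not state which formula or software produced the reported Nm values, and the function name is illustrative.

```python
def gene_flow_from_gst(gst):
    """Indirect estimate of migrants per generation, Nm = 0.5 * (1 - GST) / GST."""
    if not 0.0 < gst < 1.0:
        raise ValueError("GST must lie strictly between 0 and 1")
    return 0.5 * (1.0 - gst) / gst


# GST values as reported in the text for the two marker systems and combined data.
reported_gst = {"URP": 0.117, "SCoT": 0.185, "combined": 0.148}

for label, gst in reported_gst.items():
    print(f"{label}: GST = {gst:.3f}, Nm = {gene_flow_from_gst(gst):.2f}")
```

With the rounded GST values from the text this gives approximately 3.77, 2.20, and 2.88, close to the reported 3.77, 2.19, and 2.86. The small differences would be expected if the published GST values were rounded after Nm had been computed. Under this relation, Nm falls below 1 only when GST exceeds 1/3.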
Here, genetic differences may be partly due to gene flow, as the populations of this species are significantly affected by genetic drift. Local populations are expected to differentiate if Nm < 1 [52]. Nm values were high in the studied populations, and thus gene flow prevented drastic genetic differences among them. Population size and the spread of alleles among various regions can add details to this finding [53]. Kiani et al. [45] studied genetic relationships among 41 R. damascena accessions from Iran using 31 RAPD. The authors reported that the genetic variation within the collected populations was greater than the variation among them. Similar results were obtained in the present study; however, with both markers the variation within populations was higher than in the previous study. Table 4 lists the genetic diversity indices of the populations. Maximum values of the genetic diversity indices (Ne, Na, I, PPL, and He) were found for the region I and region II populations using SCoT and combined data. In the URP marker system, region I had the highest percentage of polymorphic loci (PPL) and the highest number of alleles (Na), whereas the highest values of Shannon's information index and the genetic diversity index were found for the populations of regions II and IV. As divergent populations, regions I and II could be selected according to SCoT and combined data, while regions II and IV could be selected according to the URP data. A larger genetic variability here may reflect the population's frequent allelic variation, while weather conditions can affect ultimate variation among the populations [54]. Furthermore, this finding suggests that these regions could be a strong source of diversity for potential breeding projects, which can benefit from new alleles and candidate genes [55]. Also, with all marker systems, the highest genetic distances were found between accessions from regions I, II, and IV, as reported in the results of the genetic distance analysis.
Therefore, in breeding and hybridization programs, these accessions may be used as parents to achieve maximum heterosis if they have desirable traits. Pirseyedi et al. [56] observed a very high degree of genetic diversity among 12 Iranian Damask rose genotypes, in line with other reports [45,57,58]. In contrast, Agaoglu et al. [59] and Baydar et al. [44], who studied the genetic diversity of R. damascena in Turkey via RAPD and AFLP techniques, found genetic uniformity among R. damascena cultivars.
In the current study, spatial distribution did not align with genetic relationships, based on the neighbor-joining cluster analysis (Fig. 2). For example, using URP analysis, populations from Iran's north (Gilan province) and west (Kermanshah province) were grouped together in the same subgroup. Also, with SCoT marker analysis, the populations of Minab (sampled from the south, Hormozgan province) and Tabriz (sampled from the northwest, East Azerbaijan province) were classified in the same subgroup. Moreover, the combination of URP and SCoT showed a clustering trend that contradicts the spatial distribution of populations. For example, populations from Esfahan (sampled from the central areas of Iran) and Fars (sampled from the south) were clustered together. Because of the country's diverse climate and the adaptation of Damask rose to adverse environmental conditions, it seems that ecotypes of this plant have been moved and relocated by migrating people across the country, especially on foothills where crops usually do not grow. Pirseyedi et al. [56] noted genetic affinity between the Damask roses of the Kashan and Kazeroon districts, despite the long distance between them. Baydar et al. [44] used AFLP and microsatellite markers and found that R. damascena plants in Turkey can be derived from the same original genotype by vegetative propagation. Rusanov et al. [43] reported that rose plants of Iran and India may have a common origin. Based on the results, the patterns of grouping did not correlate with geographical origin. Similar results were observed in microsatellite analysis of Damask rose accessions from various regions of Iran [55]. Usually, a larger sample is necessary to determine the relationship between molecular data and geographical distance, whether there is isolation of populations due to barriers in gene flow, or whether different climatic conditions lead to differentiation within the species [60].
In the current research, the PCoA confirmed the results of cluster analysis and visually depicted the genetic proximities among populations. With both URP and SCoT, no clear-cut relationship between genetic difference and geographical distance was found. Phenotypic traits are in some cases highly correlated, so that the first two components can explain more than 90% of the variation. In contrast, with molecular markers the first few principal coordinates did not explain a large share of the variance of the original variables. In investigating genetic diversity using molecular data, the markers should have a uniform and appropriate distribution in the genome so that the entire genome is sampled. As shown in Fig. 3, the genotypes were well distributed throughout the plot, which may be due to the great variety between genotypes and the suitability of the markers and primers used. The markers applied to the mini core collection covered a large part of the genome and differentiated the genotypes in the plot.
The neighbor-joining cluster analysis was confirmed by the Bayesian clustering algorithm through STRUCTURE analysis of the 40 accessions [61] (Fig. 4). In the combined data system, however, accessions 27 and 28 (Semnan province) were placed in a separate group (Fig. 4C). Without considering predetermined groups, the Bayesian clustering approach uses genetic information to assess the population membership of individuals. Based on multilocus genotypes, it assigns individuals or parts of their genomes to several clusters [62]. Using the online STRUCTURE HARVESTER software and the Evanno method (ΔK), the best K, i.e., the number of subpopulations, was identified. In both marker systems, the best level of population classification was K = 3, and in the combined data system it was K = 4. In this clustering, it was found that different populations of R. damascena can group into one cluster, such as cluster 'BI' in the SCoT marker system, in which seven populations from five regions and different altitudes were grouped. Moreover, our results showed that clustering by both markers and by the combined markers made similar classifications of the populations of Kermanshah (region V) and Gilan (region IV), assigning them to the same subgroup, while Hormozgan (region II) and East Azerbaijan (region IV) were classified together in another subgroup. If genotypes or cultivars from different areas gather into one category, it may mean that they have the same genetic heritage [63,64]. This may have been due to human transmission of plants or genetic movement and displacement by natural variables [65]. The genetic evidence provided here, as well as the available literature, suggests that plant dispersal by humans has played a large role in the development of R. damascena populations throughout Iran. It seems that due to its high tolerance to drought, this crop is one of the most suitable species for arid provinces of the country.
Due to a decrease in agricultural water resources and rainfall, its cultivation can replace many agricultural products which have high water requirements. In addition, the ability of this plant to adapt well to different climates and soil conditions in Iran has made farmers inclined to introduce it to other regions in the country. While the genetic origin of these plants is the same in different regions, the obvious difference may be attributed to the climate in which they emerge. Inter and intraspecific variation can be affected by temperature and rainfall [66].
Roses usually cross-pollinate and are self-incompatible which makes them more genetically diverse between and within populations [67][68][69]. Jurgens et al. [70] investigated the genetic variability of R. canina in Brandenburg (Germany). Fifty-five genotypes were classified into twelve subgroups. They attributed the high genetic variation to the outcrossing, seed dispersal system and polyploidy within the R. canina populations. The level of genetic variation is affected by breeding system, life cycle, seed dispersal, and geographic distribution which are important factors among populations. Rose species are known to be outcrossing, but there is little evidence on their outcrossing frequencies [71]. Contrary to the results of the current research, a study on Rosa canina L. via ISSR markers suggested that geographical distance is effective in causing allelic gaps among genotypes, and ecological conditions could cause genetic variation in R. canina [49].
Model-based clustering was based on Bayesian statistics, assuming an admixture ancestry model and an allele frequency model of the continuous type. With K (the number of populations) ranging from 1 to 10, many populations in the existing germplasm were not completely separated according to the regions from which the genotypes originated or were collected (Fig. 4). The mixing observed in this germplasm confirms the hypothesis that the studied genotypes are of mixed types. That is, individual i may have inherited parts of its genome from ancestors in population k. In fact, the formation of different subgroups in population structures depends on the frequency of allelic differences between the genotypes that make up the population. Most of the genotypes were not assigned completely to a single subgroup, indicating that many genotypes have intermediate genetic traits of various subgroups, which reflects the genetic variation found in this research.
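The choice of K from replicated STRUCTURE runs can be illustrated with a minimal sketch of the ΔK statistic. It assumes ΔK is computed as the absolute second-order difference of the mean log-likelihood across adjacent K values divided by the standard deviation of the log-likelihood at K, which is one common way to implement the Evanno method. The function name and all log-likelihood values below are hypothetical and are not results of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Map each interior K to |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)).

    lnp_by_k: dict mapping K to the list of log-likelihoods from replicate runs.
    K values without both neighbours, or with zero spread, are skipped.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    spreads = {k: stdev(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means or spreads[k] == 0:
            continue
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_difference / spreads[k]
    return delta_k


# Hypothetical log-likelihoods from three replicate runs per K.
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-800.0, -802.0, -798.0],
    4: [-790.0, -795.0, -785.0],
    5: [-785.0, -790.0, -780.0],
}

delta = evanno_delta_k(toy_runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested, which is why a range such as K = 1 to 10 can only support a choice between K = 2 and K = 9.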

Conclusions
Crop improvement is influenced by information about the degree and distribution of genetic variation, as well as relationships between breeding materials. The results of the present study revealed a high level of polymorphism in the Iranian R. damascena populations with the two marker systems. The mean values of PIC for URP and SCoT markers were 0.42 and 0.37, respectively, indicating the efficiency of the two markers in detecting polymorphism among the studied samples. The results also confirmed the efficiency of the combined data in estimating the genetic diversity among the populations. The marker systems used showed a comprehensive pattern of the genetic diversity among the Iranian R. damascena populations, which could provide insight for future Damask rose breeding programs.