Genetic diversity and population structure of date palms (Phoenix dactylifera L.) in Ethiopia using microsatellite markers

Background Date palm (Phoenix dactylifera L.) is a perennial monocotyledonous plant of the Arecaceae family. It contributes substantially to agricultural sustainability and has great socio-economic value in many countries of the world, including Ethiopia. Evaluating the genetic diversity of date palms at the DNA level is very important for breeding and conservation. The results of this study could help in designing genetic improvement and germplasm introduction programmes for date palms, mainly in Ethiopia. Results In this study, 124 date palm genotypes were collected and analysed with 10 polymorphic microsatellite markers. Among the 10 microsatellites, loci MPdCIR085 and MPdCIR093 showed the highest observed and expected heterozygosity, the largest number of alleles and the highest polymorphic information content (PIC). A total of 112 alleles were found. The mean major allele frequency was 0.26, ranging from 0.155 (MPdCIR085) to 0.374 (MPdCIR016); the mean effective number of alleles was 6.61; private alleles ranged from 0.0 to 0.65; observed heterozygosity ranged from 0.355 to 0.726; expected heterozygosity varied from 0.669 to 0.906; and the mean PIC was 0.809. The fixation index of individuals relative to subpopulations (Fis) ranged from 0.028 for locus MPdCIR032 to 0.548 for locus MPdCIR025, while that of subpopulations relative to the total population (Fst) ranged from −0.007 (MPdCIR070) to 0.891 (MPdCIR015). Neighbour-joining clustering based on dissimilarity coefficients grouped the genotypes of all nine accessions into five major clusters; STRUCTURE analysis at the optimal K formed three groups, whereas DAPC separated the genotypes into eight clusters using the first two linear discriminants. The first three axes of the principal coordinate analysis explained 17.33% of the total variation across all populations.
Overall, the results revealed allelic variation and high heterozygosity (> 0.7) in the date palm genotypes. Conclusions Microsatellites (SSRs) are among the most preferred molecular markers for studying the genetic diversity and population structure of plants. In this study, we found genetic variation among date palm genotypes in Ethiopia; this variation is important for crop improvement and conservation programmes and can serve as a source of information for national and international genebanks. Supplementary Information The online version contains supplementary material available at 10.1186/s43141-021-00168-5.


Background
Date palm (Phoenix dactylifera L.) is a diploid (2n = 36), perennial monocotyledonous plant belonging to the Arecaceae family [1,2]. It is one of the oldest known fruit-bearing tree crops, extensively cultivated and utilized in North Africa and the Middle East for at least 5000 years, and is believed to have originated in Mesopotamia [2][3][4][5]. Date palms have a great socio-economic impact and contribute substantially to agricultural sustainability in many arid and semiarid parts of the world [1,6]. It is a multipurpose tree with food, industrial, commercial, medicinal and ornamental value [7,8].
Date fruits have high nutritional value, containing about 70% sugar as well as essential vitamins and minerals, and are processed into various value-added products [9]. Different parts of the date palm serve different purposes: leaves are used for making roofs, mats, staple dishes, hand fans, baskets, packaging material, hats, ropes, fences and animal fodder [10]; trunks are used to construct houses, hives and bridges and as packing material [11]; terminal buds and young leaves can be cooked as vegetables, while rachises are used for paper making [12,13]. Because of its extraordinary nature and long-term productivity, the date palm is called the "tree of life", the "bread of the desert" [14] and the sacred tree [15].
Date palm is distributed throughout the Middle East, North Africa, the South Sahel, areas of East and South Africa, and parts of Europe and the USA [2,16,17]. Date palm was introduced to Ethiopia from Middle Eastern countries approximately 200 years ago by Arabian traders [18]. Cultivation began in the Afar region, particularly at Afambo, Asayta and other places near the Awash River, and then spread to the Errer Gota and Dredawa areas [19,20]. Date farms in these areas mainly serve local consumption and provide income through trading in nearby towns. Different unknown varieties are grown there, producing red and yellow fruits of various shapes. Over the last 8 years, around 14 known varieties have been introduced to Ethiopia from other countries and cultivated at three sites, Humodoyta (Afambo), Asayta and the Melka Werer agricultural research centre (Melka Werer), for an adaptation and improvement programme.
Evaluating the genetic diversity of date palm varieties at the DNA level is of great value for their genetic improvement and conservation. However, no study has yet examined the genetic diversity and population structure of date palms in Ethiopia. DNA typing has proven to be the most convenient method for screening variability between plant varieties, analysing genetic diversity and determining phylogenetic relationships among plants [17]. Many studies have identified date palm genotypes using morphological traits and biochemical markers. However, these markers alone are unreliable for detecting genetic variation among genotypes, because they are influenced by environmental factors and show low levels of polymorphism [2,17,21]. Different DNA markers have been applied to analyse the genetic relationships of date palm cultivars in many countries, such as Egypt [22,23], Tunisia [4,24], Morocco [3], Nigeria [25], Pakistan [2] and Syria [26]. Microsatellite markers remain among the most powerful molecular tools for assessing genetic diversity, population structure and differentiation, owing to their nature and reproducibility. The objective of the present study was therefore to assess the genetic diversity and population structure of date palm genotypes collected from different locations, together with early introduced date palm varieties, using microsatellite markers.

Sampling and DNA extraction
During the fruiting season, a total of 124 date palm samples, both female and male, were collected in the Afar and Somali regions of Ethiopia (Fig. 1). These included early introduced date palm varieties collected from Afambo district, specifically at Humodoyta Kebele (an in vitro date palm adaptation farm); details of all samples are given in Additional file 1 Table 1. Young, yellowish leaf samples were collected from offshoots, cut into pieces and preserved in silica gel. The dried leaves were ground using liquid nitrogen and acidic sterilised sand. Genomic DNA was extracted by the CTAB method in 100 ml of CTAB buffer (pH 8.0) containing 2% cetyltrimethylammonium bromide, 4 ml of 20 mM EDTA, 4 ml of 100 mM Tris-HCl (pH 8.0), 8 g of NaCl (1.4 M) and 0.1% β-mercaptoethanol. To check DNA quality, 3 μl of DNA was run on a 1% agarose gel in 0.5X TAE buffer (90 V, 150 A, 50 W, 30 min), stained with ethidium bromide solution, illuminated under UV light and photographed. DNA concentrations were determined using a Gene Quant spectrophotometer.
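As a quick sanity check on the buffer recipe, the stated 8 g of NaCl and 1.4 M agree only if the salt is dissolved to a final volume of 100 ml. The sketch below assumes that final volume (it is not stated explicitly) and uses the standard molar mass of NaCl, 58.44 g/mol; the helper names are hypothetical.

```python
# Standard molar mass of sodium chloride (g/mol).
NACL_MOLAR_MASS = 58.44


def molarity(mass_g: float, molar_mass: float, final_volume_ml: float) -> float:
    """Concentration (mol/L) of a solute dissolved to the given final volume."""
    moles = mass_g / molar_mass
    return moles / (final_volume_ml / 1000.0)


def grams_needed(target_molar: float, molar_mass: float, final_volume_ml: float) -> float:
    """Mass (g) of solute required to reach target_molar in final_volume_ml."""
    return target_molar * (final_volume_ml / 1000.0) * molar_mass


# Assumption: the 8 g of NaCl is dissolved to a final volume of 100 ml of buffer.
actual = molarity(8.0, NACL_MOLAR_MASS, 100.0)
exact = grams_needed(1.4, NACL_MOLAR_MASS, 100.0)
print(f"8 g NaCl in 100 ml gives {actual:.2f} M")      # 1.37 M
print(f"exactly 1.4 M in 100 ml needs {exact:.2f} g")  # 8.18 g
```

Under this assumption, 8 g gives about 1.37 M, which rounds to the stated 1.4 M.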

Polymerase chain reaction (PCR) amplification
A total of 10 simple sequence repeat (SSR) primers developed by [27] were used (Table 1) … μl of PCR grade water, 5U Taq DNA polymerase (0.5 μl). All PCR amplifications were performed in a thermal cycler (Bio-Rad) with an initial denaturation at 94 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 52 °C for 1 min and extension at 72 °C for 30 s, then a final extension at 72 °C for 5 min and a hold at 4 °C. The PCR products were separated on 1.5% agarose gels, stained with ethidium bromide solution and photographed under UV light. A 50 bp DNA ladder (Bio tools) was used to estimate the approximate molecular size of the PCR products (Additional file 4 Figure 1a-e).
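The cycling programme above can be written down as data, which makes it easy to check the programmed hold time. This is a minimal sketch; the `Step` class and the function name are hypothetical, and the total ignores ramp times between temperatures and the open-ended 4 °C hold, so a real run on the instrument takes longer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


# Thermal profile as described in the text.
INITIAL = [Step("initial denaturation", 94, 5 * 60)]
CYCLE = [
    Step("denaturation", 94, 30),
    Step("annealing", 52, 60),
    Step("extension", 72, 30),
]
FINAL = [Step("final extension", 72, 5 * 60)]
N_CYCLES = 35


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times; ignores ramping and the final 4 °C hold."""
    return (
        sum(s.seconds for s in initial)
        + n_cycles * sum(s.seconds for s in cycle)
        + sum(s.seconds for s in final)
    )


total = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"{total // 60} min of programmed holds")  # 80 min
```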

Data analysis
Amplified DNA fragments at each microsatellite locus were scored manually (see Additional file 2 Table 2). Genetic diversity parameters, i.e. expected heterozygosity (He), observed heterozygosity (Ho), observed number of alleles (Na), effective number of alleles (Ne), private alleles per locus, Shannon's information index (I) and gene flow (Nm, estimated from Fst as Nm = 0.25(1 − Fst)/Fst), were calculated using Popgen32 version 1.31 [28] and GenAlEx version 6.5 [29]. Major allele frequency (MAF) and number of genotypes (NG) per locus were computed using PowerMarker version 3.25. The polymorphic information content (PIC) of each locus was computed with Curves software version 3.0.7, and fixation indices (Fis, Fst, Fit) were calculated using Arlequin version 3.5.2.2. Principal coordinate analysis (PCoA) was conducted on the genetic distance matrix of the accessions using GenAlEx. Nei's genetic identity and distance were computed with Popgen32. DARwin version 6.0.21 was used to construct a neighbour-joining (NJ) dendrogram from the dissimilarity matrix, with bootstrap values computed over 1000 replicates. Population genetic structure was analysed by Bayesian clustering in STRUCTURE 2.3.4 [30] to determine the number of clusters in the dataset, testing K = 1 to K = 10. The admixture ancestry model and the correlated allele frequency model were used for the Markov chain Monte Carlo (MCMC) simulation. The burn-in period was set to 100,000 iterations, followed by 200,000 MCMC iterations, with 10 independent runs for each K. The optimal K was determined by the method of [31] using the STRUCTURE HARVESTER website [32]. The STRUCTURE HARVESTER output files were then analysed with CLUMPP [33] to align cluster assignments across replicate runs, and the clusters at each K were displayed as coloured bar plots.
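To show what the per-locus summaries measure, the sketch below computes them from diploid genotypes at one locus using standard textbook definitions: Ne = 1/Σp², He = 1 − Σp², unbiased uHe = 2N/(2N − 1)·He, Shannon's I = −Σp ln p, Ho as the proportion of heterozygous individuals, MAF as the largest allele frequency, and Nm = 0.25(1 − Fst)/Fst. It also includes heterozygosity-based F-statistics (Fis = 1 − Ho/Hs, Fst = 1 − Hs/Ht, Fit = 1 − Ho/Ht). This is an illustration under those assumptions, not a reimplementation of Popgen32, GenAlEx, PowerMarker or Arlequin; Arlequin in particular estimates fixation indices through AMOVA, which can give different values from this simple heterozygosity version. The allele sizes in the example are hypothetical.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Diversity summaries for one diploid locus.

    genotypes: list of (allele1, allele2) tuples, one per individual,
    with missing genotypes already removed.
    """
    n = len(genotypes)
    counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    he = 1.0 - sum_p2
    return {
        "Na": len(counts),                                  # observed number of alleles
        "Ne": 1.0 / sum_p2,                                 # effective number of alleles
        "I": -sum(p * math.log(p) for p in freqs),          # Shannon's information index
        "Ho": sum(a != b for a, b in genotypes) / n,        # observed heterozygosity
        "He": he,                                           # expected heterozygosity
        "uHe": 2 * n / (2 * n - 1) * he,                    # sample-size corrected He
        "MAF": max(freqs),                                  # major allele frequency
        "NG": len({tuple(sorted(g)) for g in genotypes}),   # distinct genotypes
    }


def f_statistics(ho, hs, ht):
    """Wright's F-statistics from mean observed (Ho), within-subpopulation
    expected (Hs) and total expected (Ht) heterozygosity."""
    fis = 1.0 - ho / hs
    fst = 1.0 - hs / ht
    fit = 1.0 - ho / ht
    return fis, fst, fit


def gene_flow(fst):
    """Nm = 0.25 * (1 - Fst) / Fst (island-model approximation)."""
    return 0.25 * (1.0 - fst) / fst


# Hypothetical allele sizes (bp) for four individuals at one locus.
example = [(150, 152), (150, 150), (152, 154), (150, 154)]
stats = locus_stats(example)
print({k: round(v, 3) for k, v in stats.items()})
```

For this example, Ho = 0.75, He = 0.625 and Ne ≈ 2.67. With these definitions, (1 − Fit) = (1 − Fis)(1 − Fst) holds exactly, which is a useful consistency check on any reported set of fixation indices.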

Discriminant analysis of principal components (DAPC) and two-locus linkage disequilibrium (LD) analysis
In addition to the STRUCTURE analysis, we performed DAPC, a multivariate clustering method that does not rely on Hardy-Weinberg or linkage equilibrium assumptions [34], using the R package 'adegenet' [35] to determine the genetic structure of date palm genotypes among districts. The optimal number of clusters was inferred by k-means clustering [35] of principal component (PC)-transformed SSR data, and the Bayesian information criterion (BIC) was used to identify the best supported model (i.e. the number and nature of clusters). Eight clusters were chosen for the DAPC because this number gave the lowest BIC value, and 20 PCs were retained. We also calculated Ohta's variance components of linkage disequilibrium [36] using Popgen32 version 1.31: D²IS (the average disequilibrium within subpopulations), D²ST (the contribution to the overall disequilibrium caused by differences in allele frequencies among subpopulations), D'²ST (the variance of the correlation of linkage disequilibrium of one population relative to the total population) and D'²IS (the variance of the correlation within a population relative to that of the total population).

SSR marker and its allelic diversity
A wide range of allelic variants was observed at each locus (Table 2). A total of 112 alleles were amplified, with an average of 11.0 per locus, ranging from 5.0 (MPdCIR032) to 16.0 (MPdCIR085 and MPdCIR093). The mean major allele frequency was 0.26, ranging from 0.155 (MPdCIR085) to 0.374 (MPdCIR016). In total, 307 genotypes were identified across all loci, and the number of genotypes per locus ranged from 7 (MPdCIR032) to 47 (MPdCIR085). The mean effective number of alleles was 6.61, ranging from 3.0 (MPdCIR032) to 10.6 (MPdCIR085). All SSR markers in this study were highly informative, with PIC values ≥ 0.50; MPdCIR085 showed the highest PIC (0.899) and MPdCIR032 the lowest (0.597), with an average of 0.809.
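PIC is usually computed with the formula of Botstein et al. (1980), PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², where p_i are the allele frequencies at a locus. The sketch below assumes this definition (we have not verified that Curves 3.0.7 uses exactly this form). It shows why PIC is always somewhat below He = 1 − Σp_i²: the second sum removes matings between identical heterozygotes, which are uninformative. The allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def polymorphic_information_content(freqs):
    """PIC of Botstein et al. (1980) from a locus' allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    sum_p2 = sum(p * p for p in freqs)
    # Matings between identical heterozygotes (i, j) carry no information.
    uninformative = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_p2 - uninformative


def expected_heterozygosity(freqs):
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical locus with three alleles at frequencies 0.5, 0.25 and 0.25.
freqs = [0.5, 0.25, 0.25]
print(round(polymorphic_information_content(freqs), 4))  # 0.5547
print(round(expected_heterozygosity(freqs), 4))          # 0.625
```

With 16 equally frequent alleles (the largest allele count here, at MPdCIR085 and MPdCIR093), the PIC ceiling under this formula is about 0.934; observed values such as 0.899 sit below it because real allele frequencies are uneven.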

Genetic diversity analysis
In this study, high genetic diversity was found among the nine date palm accessions, represented by a total of 124 genotypes. Genetic variation was estimated by observed heterozygosity (Ho), expected heterozygosity (He) and fixation indices (Fis, Fst, Fit); these are presented in Table 2 and … The highest (7.0) and lowest (4.7) mean numbers of different alleles with a frequency ≥ 5% were found in Mego and Legaharae, respectively. The mean number of locally common alleles (frequency ≥ 5%) found in 25% of the total populations ranged from 0.2 (Errer Gota) to 0.7 (Mamulae), whereas the mean number of locally common alleles (frequency ≥ 5%) found in 50% of the total populations ranged from 0.8 (Kerebuda) to 1.7 (Mamulae) (see Fig. 2). The genetic similarity and distance between date palm populations are presented in Table 3. Genetic similarity was high between the Mamulae and Legaharae populations and low between the early introduced varieties and Kerebuda. Correspondingly, the lowest and highest genetic distances among populations were 0.1791 (Mamulae vs Legaharae) and 0.6403 (Introduced vs Kerebuda), respectively. In addition, AMOVA showed a higher percentage of genetic variation within individuals (73%) than among populations (4%) (Table 4).
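Nei's (1972) standard genetic identity and distance can be sketched as follows: for each locus, J_X = Σx_i², J_Y = Σy_i² and J_XY = Σx_i y_i are computed from the allele frequencies x_i and y_i of two populations and averaged over loci; then I = J_XY / √(J_X J_Y) and D = −ln I. We assume Popgen32 reports this standard 1972 version; the allele frequencies below are hypothetical.

```python
import math


def nei_identity_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic identity I and distance D.

    pop_x, pop_y: lists with one dict per locus mapping allele -> frequency,
    with loci in the same order for both populations.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("both populations must be typed at the same loci")
    n_loci = len(pop_x)
    jx = sum(sum(p * p for p in locus.values()) for locus in pop_x) / n_loci
    jy = sum(sum(q * q for q in locus.values()) for locus in pop_y) / n_loci
    # Alleles absent from one population contribute zero to J_XY.
    jxy = sum(
        sum(lx.get(a, 0.0) * ly.get(a, 0.0) for a in set(lx) | set(ly))
        for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    identity = jxy / math.sqrt(jx * jy)
    return identity, -math.log(identity)


# Hypothetical frequencies at two loci for two populations.
pop_a = [{150: 0.5, 152: 0.5}, {200: 0.7, 204: 0.3}]
pop_b = [{150: 0.8, 154: 0.2}, {200: 0.6, 204: 0.4}]
identity, distance = nei_identity_distance(pop_a, pop_b)
print(f"I = {identity:.4f}, D = {distance:.4f}")
```

Because D = −ln I, distances convert back to identities: if the Table 3 values are Nei's standard distance, D = 0.1791 corresponds to I ≈ 0.836 and D = 0.6403 to I ≈ 0.527.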

Clustering and population genetic structure
A dendrogram and principal coordinate analysis were used to examine structural similarity among and within date palm populations and to infer the genetic differences between them. All genotypes separated into five major clusters (Fig. 3a); each population is represented by the colour coded in the number list (Fig. 3b), and the same colours are used in the dendrogram. Clusters 1, 2, 3, 4 and 5 contained 28, 34, 23, 25 and 14 individuals, respectively. Clusters 1, 2, 3 and 5 were each further subdivided into two subclusters, while cluster 4 was divided into three. All subclusters were further divided into many subgroups, each representing a clade of mixed genotypes from different populations. Half of the genotypes of both the Introduced and Humodoyta populations were found together in cluster 1, whereas the other clusters contained mixed genotypes from different populations. The first three PCoA axes explained 17.33% of the total variation (Fig. 4a). Genotypes on the PCoA plot did not cluster by accession; instead, genotypes from different populations were intermixed, as also seen in the NJ tree. In the STRUCTURE analysis, date palm genotypes from all accessions were assigned to three subpopulations under the Bayesian clustering model (STRUCTURE version 2.3.4), following the method of [31]: ΔK had its highest peak at K = 3 (Fig. 4b). The CLUMPP-aligned STRUCTURE output, displayed as light blue, orange and violet bar plots (Fig. 4c), revealed admixture in every date palm population regardless of accession. The distribution of genotypes on the coordinate axes (Fig. 4a) and the clustering of genotypes in the NJ tree (Fig. 3) were consistent with the STRUCTURE results.
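The ΔK statistic of [31], computed here with STRUCTURE HARVESTER, can be sketched from the mean log probability of the data, L(K) = ln P(D), over the replicate runs at each K: ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)). This sketch uses the mean L(K) across runs, which is our assumption about how STRUCTURE HARVESTER applies the method; ΔK is undefined for the smallest and largest K tested. The ln P(D) values are hypothetical, not those of this study, and consecutive K values are assumed.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """Evanno et al. (2005) Delta K from replicate STRUCTURE runs.

    lnp_by_k: dict mapping consecutive K values to a list of ln P(D)
    values, one per replicate run (at least two runs per K).
    Returns a dict K -> Delta K for every K except the first and last.
    """
    ks = sorted(lnp_by_k)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("K values must be consecutive")
    mean = {k: statistics.mean(lnp_by_k[k]) for k in ks}
    sd = {k: statistics.stdev(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        if sd[k] == 0:
            raise ValueError(f"replicate runs at K={k} are identical; Delta K undefined")
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)| using the mean over runs
        second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
        delta[k] = second_diff / sd[k]
    return delta


# Hypothetical ln P(D) values from three runs at each K.
runs = {
    1: [-5000, -5001, -4999],
    2: [-4600, -4602, -4598],
    3: [-4400, -4401, -4399],
    4: [-4390, -4395, -4385],
    5: [-4385, -4395, -4375],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, "best K =", best_k)
```

Because ΔK is never computed for K = 1, the method cannot by itself indicate the absence of population structure.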

Discriminant analysis of principal components and two-locus linkage disequilibrium
Posterior membership probabilities of each date palm individual for the different groups were obtained from the DAPC analysis. The DAPC separated date palm genotypes into eight clusters based on the first two linear discriminants (Fig. 5 and Additional file 3 Table 3). All of the early introduced date palm genotypes except one individual were assigned to two genetic clusters (DAPC clusters 4 and 6). This result suggests that these date palm … Table 5. This relationship in pairs of loci indicates that the non-random association of SSR alleles at particular loci was mainly caused by limited migration and random processes such as genetic drift [36,37]. However, the remaining 15 pairs of loci showed the dual relationships D²ST < D²IS and D'²IS < D'²ST; this pattern is expected when gametes with favourable combinations of alleles increase in every population [36,37].

Discussion
The purpose of the present study was to determine the genetic diversity of, and relationships between, date palm genotypes and populations using SSR markers. All SSR primers tested in this study had PIC values from 0.509 to 0.899; they are therefore considered highly informative markers and reveal allelic variation among populations, as also supported by previous studies [38,39]. A total of 112 alleles were detected from 124 date palm individuals in nine populations. This indicates high genetic diversity within date palm populations. In this study, the number of alleles per locus varied from 5 to 16, and the maximum number of alleles (16) was amplified by the MPdCIR085 and MPdCIR093 loci. [3] detected a maximum of 11 alleles (MPdCIR050), with a range of 4 to 11 alleles per locus, whereas [40] reported 6 to 15 alleles per locus, with the maximum (15) amplified by primer MPdCIR015. The high genetic diversity observed among Ethiopian date palm populations may be due to their high heterozygosity (> 0.72).

Fig. 4 a Two-dimensional plot of principal coordinates analysis of 124 date palm genotypes representing nine populations. b Estimation of the number of populations using ΔK = mean(|L″(K)|) / sd(L(K)) for K ranging from 1 to 10; K = 3 is the optimal value based on the Evanno et al. (2005) method. c Results of STRUCTURE analysis based on microsatellite data, estimating the genetic structure of the nine populations at K = 3. Each population is represented by a vertical bar, separated by a black line and partitioned into coloured segments representing the proportion of the individual's genome in the K clusters.
In this study, the NJ tree and principal coordinate analysis revealed clear genetic relationships between unknown date palm varieties and early introduced cultivated varieties. The dendrogram generated five major clusters. Overall, there was no clear separation of genotypes according to geographical location. In cluster 1, eight individuals from Humodoyta kebele, five from Mamulae kebele, three from Alassabolo kebele and one from Errer Gota clustered together with the early introduced cultivated varieties Sagaii, Jarvis, Khadrawy, Khalas, Medjool Israel and Khayra. Other cultivated varieties (Medjool England, Shish, Zamli and Ashal Al Hassa) were found in cluster 2, mixed with individuals from the Legaharae, Berga, Humodoyta, Errer Gota, Alassabolo and Mamulae populations. The cultivated variety Barhee was found only in cluster 5, together with individuals from other populations. Half of the genotypes from the Kerebuda population were found mainly in a subcluster of cluster 4. This result was also supported by the PCoA and STRUCTURE analyses (i.e. intermixed distributions of genotypes on the principal coordinate axes and admixture structure). The present PCoA results partly match previous studies [24,41] but contrast with others [38][39][40][42]. According to the model-based clustering, three genetically distinct subpopulations were identified that did not correspond to collection sites. The highest mean similarity score was estimated at K = 3; the most probable clustering of populations was observed at this K, and it showed admixture among populations. This result indicates that the date palm populations share a common genetic background and common alleles.
The most differentiated population in this study was Kerebuda, which showed less admixture than the other populations. Overall, the STRUCTURE results showed shared ancestry between unknown date palm genotypes and the early introduced known cultivars. The admixture result at K = 3 agrees with those reported by [39,40,43]. Similarly, [44] found three differentiated date palm groups at K = 3 using different SSRs and four groups at K = 4 using SNPs; with both marker types, they reported admixed population structure at the best K for date palms collected from different parts of the world. The intermixed clustering of date palm genotypes in the present STRUCTURE analysis partly coincides with previous findings of five K groups obtained using different SSR primers [1,41]. The DAPC results of this study showed genetic differentiation between groups while minimising within-group variation, achieving the best discrimination of individuals into predefined groups. We confirmed the model-based clustering results using DAPC, which is free of Hardy-Weinberg and linkage disequilibrium assumptions. DAPC relies on discriminant functions that maximise the diversity between clusters while minimising within-cluster diversity [34]; for this reason, DAPC mostly assigns individuals to single clusters [34]. DAPC has been suggested as an alternative method for identifying and describing clusters of genetically related individuals in complex genetic data and for detecting admixed individuals by determining the probability that each individual belongs to each cluster [40,45].
Overall, the AMOVA and expected heterozygosity results revealed high genetic variation within date palm genotypes, and the NJ tree, PCoA and STRUCTURE analyses of this study were dominated by admixture patterns among populations. This might be because microsatellites are codominant markers that reveal many alleles per locus, and because DNA slippage during replication generates high allelic variation and heterozygosity within date palm genotypes. DNA slippage changes microsatellite length during enzymatic replication of microsatellite regions, usually through insertion or deletion of repeat units [46,47]. In addition, date palms are dioecious, cross-pollinated and wind-pollinated species, which promotes broad genetic variation within date palm genotypes. Cross-pollinating species generally maintain high levels of genetic diversity within populations because of their breeding system [48][49][50]. Moreover, at the collection sites of the current study there was no integrated management system for date palm propagation, i.e. the plants grow as if wild (W. Ahmed, personal communication). This situation could increase the probability that date palms are propagated from seeds rather than offshoots. Because date palms are dioecious, seed-grown palms are products of outcrossing, so increased genetic distance between date palm genotypes is expected; a low rate of gene flow within the population may also contribute to the high genetic diversity observed within rather than among populations.

Conclusions
Preliminary research on the genetic diversity of plant species is essential for long-term plant improvement and conservation and for developing mechanisms to reduce plant vulnerability. The SSR markers used in the present study revealed high genetic diversity within date palm genotypes and populations. These findings therefore provide information on the genetic relationships between known cultivated and unknown date palm varieties for improvement and conservation programmes. Overall, this study provides evidence and a source of information on the genetic diversity of date palms in Ethiopia for regional and international genebanks.