Mining and validation of novel simple sequence repeat (SSR) markers derived from coconut (Cocos nucifera L.) genome assembly

Background In the past, simple sequence repeat (SSR) marker development in coconut was achieved through microsatellite probing in bacterial artificial chromosome (BAC) clones or by using previously developed SSR markers from closely related genomes. These coconut SSRs are publicly available in the published literature and online databases; however, their number is quite limited. Here, we used a locally established, genome-wide SSR prediction bioinformatics pipeline to generate a large number of coconut SSR markers. Results A total of 7139 novel SSR markers were derived from the genome assembly of coconut ‘Catigan Green Dwarf’ (CATD). A subset of 131 markers was selected for synthesis based on motif filtering, contig distribution, product size exclusion, and success of in silico PCR in the CATD genome assembly. The OligoAnalyzer tool was also employed using the following desired parameters: %GC, 40–60%; minimum ΔG value for hairpin loop, −0.3 kcal/mol; minimum ΔG value for self-dimer, −0.9 kcal/mol; and minimum ΔG value for heterodimer, −0.9 kcal/mol. We successfully synthesized, optimized, and amplified the 131 novel SSR markers in coconut using the ‘Catigan Green Dwarf’ (CATD), ‘Laguna Tall’ (LAGT), ‘West African Tall’ (WAT), and SYNVAR (LAGT × WAT) genotypes. Of the 131 SSR markers, 113 were polymorphic among the analyzed coconut genotypes. Conclusion The development of novel SSR markers for coconut will serve as a valuable resource for mapping of quantitative trait loci (QTLs), assessment of genetic diversity and population structure, hybridity testing, and other marker-assisted plant breeding applications.

Ocean current which possibly favored the evolution and dispersal of coconut. Coconut palms thrive in humid coastal environments at about 18° of latitude north or south of the equator where there is fertile soil, favorable temperature, and year-round rainfall [5]. Coconut belongs to the Indian center (II) and the Indo-Malayan subcenter (II-A, which includes the Philippines) in Vavilov's centers of origin of cultivated plants [26]. It is generally classified into two types: tall and dwarf. Tall types are generally allogamous (cross-pollinating, heterozygous) and slow to mature, flowering 6-10 years after planting, with an economic life of 60-70 years. Dwarf types, on the other hand, are highly autogamous (mainly self-pollinating, homozygous) and flower early, at around 4-6 years after planting, with a productive life of 30-40 years [2,6,12].
Coconut is a diploid with 32 chromosomes (2n = 2x = 32). It belongs to the family Arecaceae (Palmaceae), subfamily Cocoideae, and is the lone species of the genus Cocos [17]. The estimated genome size of coconut is approximately 2.6 Gbp, comprising 50-70% repetitive sequences. Lantican et al. [9] reported the estimated genome size of 'CATD' to be 2.14 Gbp. The abundance of repeats in the coconut genome is advantageous for the assessment and characterization of coconut varieties/populations using molecular marker techniques. Molecular tools offer a more accurate assessment than the conventional characterization of coconut through morphological and agronomic traits, most of which are influenced by many environmental factors [15].
Molecular markers have established their importance as a modern breeding tool for crop improvement [7,24,31]. The use of molecular tools can significantly shorten the overall duration of breeding programs for coconut improvement. Among the most extensively used markers in molecular breeding and genetic diversity analyses are simple sequence repeats (SSRs). SSRs are short tandem repeats with repeating units of di-, tri-, tetra-, and pentanucleotides [20]. The repeat units are approximately 1-8 bp long, and SSRs are abundant and well distributed throughout the genome. Because the number of repeat units can vary between genotypes/individuals, SSRs are very useful in fingerprinting, genotyping, and genetic diversity analyses [23].
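To make the definition above concrete, the following minimal Python sketch scans a DNA sequence for perfect tandem repeats using a regular expression. It is an illustration only and does not reproduce the prediction software used in this study; the function name `find_ssrs` and the default thresholds (repeat units of 2-5 bp, at least five copies) are assumptions chosen for the example.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=5):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Thresholds are illustrative assumptions, not the study's settings.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # One captured unit followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AAAA", "AGAG").
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)

# Toy sequence with an (AG)8 and an (AAG)5 repeat.
example = "TTGC" + "AG" * 8 + "CCGT" + "AAG" * 5 + "TTCA"
print(find_ssrs(example))
```

Two genotypes that differ only in the number of copies at such a locus yield PCR products of different lengths, which is the basis for scoring SSR alleles on a gel.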
In the past, SSR marker development in coconut was achieved through microsatellite probing in bacterial artificial chromosome (BAC) clones or using previously developed SSR markers from closely related genomes [15,21]. These coconut SSR markers are publicly available; however, the number and distribution across chromosomes are quite limited for quantitative trait loci (QTL) mapping and genetic diversity studies. Fortunately, with the current advancements in next-generation sequencing (NGS) technologies, it has now become possible to mine SSRs across the entire genome. By using genome-wide bioinformatics prediction, we can generate a vast amount of SSR markers efficiently.
This study aims to provide a valuable resource of SSR markers for potential use in marker-assisted selection breeding for coconut.

Plant materials and leaf collections
Leaf samples of the coconut parental genotypes 'Catigan Green Dwarf' (CATD), 'Laguna Tall' (LAGT), and 'West African Tall' (WAT) and of a synthetic variety denoted as SYNVAR (LAGT × WAT) were obtained from the Philippine Coconut Authority - Zamboanga Research Center (PCA-ZRC) in San Ramon, Zamboanga City, Philippines. Leaflets from the youngest frond (the "first leaf") that were free from pest damage were carefully chosen as samples. Three leaflets were gathered from each of the left and right portions of the midrib near the base of the frond. The samples were transported to the Genetics Laboratory at the Institute of Plant Breeding - University of the Philippines Los Baños (IPB-UPLB), Laguna, Philippines, for DNA extraction.

Genomic DNA extraction of coconut parental genotypes
A total of eight individuals/palms of the coconut genotypes were collected (Table 1). Genomic DNA was extracted following the procedure adapted from Doyle and Doyle [3] with modifications. DNA quality and yield were determined by electrophoresis in 1% UltraPure™ agarose (Invitrogen Corp., Carlsbad, California, USA) in 1× Tris-borate EDTA (TBE) running buffer at 100 V for 40 min, 0.5 µg mL⁻¹ ethidium bromide staining, and UV illumination at 300 nm using the Enduro GDS Touch

Development of SSR markers using the genome assembly of coconut 'Catigan Green Dwarf' (CATD)
Previously, a set of 7139 novel SSRs was automatically generated from the SSR loci annotation of the genome assembly of coconut 'Catigan Green Dwarf' (CATD) using the GMATA software package [9,27]. Given the large number of predicted SSR markers, selection criteria were applied to obtain high-quality markers for eventual use in coconut genotyping. The predicted markers were filtered further by manual checking of motif, contig distribution, and product size. Markers with AT/AT and TA/TA repeat motifs were excluded from the selection. In silico PCR in the 'CATD' genome assembly [9] was then performed to ensure in vitro SSR amplification prior to synthesis [22]. The OligoAnalyzer tool (Integrated DNA Technologies, Inc., Coralville, Iowa) was also used to filter the SSRs further, with the following desired parameters: %GC, 40-60%; minimum ΔG value for hairpin loop, −0.3 kcal/mol; minimum ΔG value for self-dimer, −0.9 kcal/mol; and minimum ΔG value for heterodimer, −0.9 kcal/mol (Fig. 1).
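The selection criteria above can be expressed as a simple filter. The Python sketch below is a hypothetical illustration and not the procedure actually run in this study, where filtering involved manual checking and the OligoAnalyzer tool. It assumes that the ΔG values have already been obtained externally and stored as the worst (most negative) value per primer pair, and that a "minimum ΔG" means the value must be greater than or equal to the stated threshold. The `Candidate` record, its field names, and the example primers are invented for the example.

```python
from dataclasses import dataclass

EXCLUDED_MOTIFS = {"AT", "TA"}   # motifs excluded in the text
GC_RANGE = (40.0, 60.0)          # desired %GC window
# Assumed interpretation: a primer passes if its dG is >= the threshold (kcal/mol).
MIN_DG = {"hairpin": -0.3, "self_dimer": -0.9, "heterodimer": -0.9}

def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)

@dataclass
class Candidate:          # hypothetical record layout
    marker_id: str
    motif: str
    forward: str
    reverse: str
    dg: dict              # precomputed dG values, keyed as in MIN_DG

def passes_filters(c):
    """Apply the motif, %GC, and dG criteria to one candidate marker."""
    if c.motif.upper() in EXCLUDED_MOTIFS:
        return False
    for primer in (c.forward, c.reverse):
        if not GC_RANGE[0] <= gc_percent(primer) <= GC_RANGE[1]:
            return False
    return all(c.dg[key] >= MIN_DG[key] for key in MIN_DG)

# Invented example candidates (not markers from this study).
good = Candidate("ssr_demo_1", "AG", "ACGTACGTAC", "TGCATGCATG",
                 {"hairpin": -0.2, "self_dimer": -0.5, "heterodimer": -0.8})
at_rich = Candidate("ssr_demo_2", "AT", "ACGTACGTAC", "TGCATGCATG",
                    {"hairpin": -0.2, "self_dimer": -0.5, "heterodimer": -0.8})
print([c.marker_id for c in (good, at_rich) if passes_filters(c)])
```

Contig distribution and product size are not modeled here; they would be additional predicates applied in the same way.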

PCR analysis
PCR was carried out with 10 µL reaction volume (

Results
A total of 131 SSR markers were synthesized; 98% of these were dinucleotide repeats (2-mers), while the remaining 2% were tri- and tetranucleotide repeats (1% each), as shown in Fig. 2.
In addition, tri- and tetranucleotide repeats of AAG (1.0%) and ACAT (1.0%) were also observed. All SSRs showed successful amplification in coconut genomic DNA. Of the 131 SSRs, 113 (86%) were polymorphic among the test coconut varieties, while the remaining 18 (14%) were monomorphic. An average of 2.70 alleles per locus was observed across test varieties, implying a high degree of polymorphism of the selected SSRs. Representative gels of polymorphic SSRs optimized among coconut genotypes are presented in Fig. 3, in which distinct and good amplification patterns were observed. The product size of these markers ranged from 130 to 690 bp. The summary of the characteristics of the selected SSRs is presented in Table 2, which includes the name of the marker, annealing temperature, repeat motif, contig distribution, product size range, and number of alleles (Fig. 4).
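The summary statistics reported above follow from per-marker allele counts. The short Python sketch below shows how such a summary could be computed; the function name `summarize` and the toy allele counts are assumptions for illustration and are not data from Table 2. A marker is treated as polymorphic when more than one allele is observed across the test varieties.

```python
def summarize(allele_counts):
    """Summarize polymorphism from a {marker_id: number_of_alleles} mapping."""
    n = len(allele_counts)
    polymorphic = sum(1 for alleles in allele_counts.values() if alleles > 1)
    return {
        "markers": n,
        "polymorphic": polymorphic,
        "monomorphic": n - polymorphic,
        "percent_polymorphic": 100.0 * polymorphic / n,
        "mean_alleles_per_locus": sum(allele_counts.values()) / n,
    }

# Toy allele counts (hypothetical markers, not the study's data).
toy = {"m1": 1, "m2": 2, "m3": 3, "m4": 4}
print(summarize(toy))

# Consistency check on the reported counts: 113 of 131 and 18 of 131.
print(round(100 * 113 / 131), round(100 * 18 / 131))
```

With the reported counts, 113/131 rounds to 86% and 18/131 rounds to 14%, in agreement with the percentages given in the text.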

Discussion
The work of Lantican et al. [9] identified genome-wide SSRs based on de novo prediction of repeat loci across the CATD genome assembly. However, the predicted loci were neither screened nor tested under actual wet lab conditions. Here, the SSR markers generated were subjected to various filtering parameters based on genome distribution, repeat motif, and ideal thermodynamic properties. Markers with AT/AT and TA/TA repeat motifs were excluded from the selection since these are the most common type of repeats found in the coconut/palm genome [9,13,29], and their high repeat content may hinder the specificity of the markers and/or result in nonspecific amplification of products. Markers were also selected based on their distribution across contigs to cover the entire coconut genome. In silico PCR in the CATD genome assembly was performed; this allows the contig specificity of each marker to be checked and ensures in vitro SSR amplification [22]. The allele size range of the markers was also limited to 80-400 bp for easy visualization in gel, and the OligoAnalyzer tool was used to check the dimerization capability and hairpin loop formation of the primers to produce high-quality markers. The predominance of dinucleotide repeats in coconut and other related species is supported by the previous works of Rivera et al. [21], Palliyarakkal et al. [13], Xia et al. [29], and Lantican et al. [9]. This result coincides with the studies of Palliyarakkal et al. [13] and Xia et al. [29], in which AG/GA/TC/CT motifs were also the most common dinucleotide repeats found in the coconut/palm genome.

Table 2 Characteristics of the selected coconut SSRs with name, primer sequence, annealing temperature, repeat motif, contig number, and expected allele No.
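The in silico PCR check described in the Discussion can be sketched as an exact-match search for a primer pair in the assembly. The Python example below is a simplified illustration and not the tool used in this study. It assumes perfect primer matches, searches only the orientation in which the forward primer lies on the given strand, and reuses the 80-400 bp window from the text as its default size limits. The function names and toy sequences are invented for the example.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def in_silico_pcr(contigs, forward, reverse, min_size=80, max_size=400):
    """Return (contig, start, product_size) for exact-match amplicons.

    Simplification: only the forward-primer-on-plus-strand orientation
    is searched, and no mismatches are tolerated.
    """
    forward = forward.upper()
    rc_reverse = reverse_complement(reverse)
    products = []
    for name, seq in contigs.items():
        seq = seq.upper()
        start = seq.find(forward)
        while start != -1:
            end = seq.find(rc_reverse, start + len(forward))
            while end != -1:
                size = end + len(rc_reverse) - start
                if size > max_size:
                    break  # later binding sites only give longer products
                if size >= min_size:
                    products.append((name, start, size))
                end = seq.find(rc_reverse, end + 1)
            start = seq.find(forward, start + 1)
    return products

# Toy assembly: one contig carries an (AG)50 locus flanked by the primer sites.
fwd, rev = "GATTACAGG", "CCTTGGAAC"
contigs = {
    "contig_1": "T" * 10 + fwd + "AG" * 50 + reverse_complement(rev) + "T" * 10,
    "contig_2": "A" * 200,
}
print(in_silico_pcr(contigs, fwd, rev))
```

Under these assumptions, a primer pair that returns exactly one product would be considered contig-specific, whereas zero or multiple products would flag the marker for exclusion.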
The results obtained here are consistent with previous studies in which high levels of polymorphism were attributed to phenotypic variation and to differences in the breeding behaviors of the dwarf and tall varieties, which are generally autogamous (self-pollinating) and allogamous (cross-pollinating), respectively [14,21,25]. The development of SSRs using advanced bioinformatics tools in this study proved very efficient in generating a large number of markers in coconut. The SSRs generated here are expected to contribute to the pool of available molecular markers [10,16,28-30] for fingerprinting, genetic diversity analysis, QTL mapping, and other relevant studies in coconut.

Microsatellites or SSRs are a very useful molecular tool for studying genetic diversity and genotyping of coconut [8,10,15,16,30]. They have been used extensively in these analyses since SSR markers are abundant and well distributed throughout the genome, multi-allelic, co-dominant, highly polymorphic, and highly reproducible [11,20]. Previous studies such as Rivera et al. [21], Perera et al. [15], Xiao et al. [30], and Wu et al. [28] have already developed SSRs in coconut for genetic diversity studies, and these markers showed high levels of polymorphism as well.

Conclusion
Here, we demonstrated that a locally established bioinformatics pipeline can mine SSRs from NGS data with actual utility in terms of amplification and distinguishing power across several varieties of coconut. The advantage of a genome-wide bioinformatics prediction approach in marker development is that it is a relatively fast and cost-effective way of generating large numbers of markers. SSRs and SNPs can be generated automatically from genome sequences with the use of these programs or pipelines.
Polymorphic markers in this study will be further used to genotype the coconut mapping population generated from a three-way cross of 'Pacific' LAGT and CATD and 'Indo-Atlantic' WAT coconut for QTL mapping analysis. The development of novel SSR markers for coconut will serve as a valuable resource for mapping QTLs, assessment of genetic diversity and population structure, hybridity testing, and other marker-assisted plant breeding applications.