Mining and validation of novel simple sequence repeat (SSR) markers derived from coconut (Cocos nucifera L.) genome assembly
Journal of Genetic Engineering and Biotechnology volume 20, Article number: 71 (2022)
In the past, simple sequence repeat (SSR) marker development in coconut was achieved through microsatellite probing in bacterial artificial chromosome (BAC) clones or by using previously developed SSR markers from closely related genomes. These coconut SSRs are publicly available in the published literature and online databases; however, their number is quite limited. Here, we used a locally established, genome-wide SSR prediction bioinformatics pipeline to generate a large number of coconut SSR markers.
A total of 7139 novel SSR markers were derived from the genome assembly of coconut ‘Catigan Green Dwarf’ (CATD). A subset of 131 markers was selected for synthesis based on motif filtering, contig distribution, product size exclusion, and success of in silico PCR in the CATD genome assembly. The OligoAnalyzer tool was also employed using the following desired parameters: %GC, 40–60%; minimum ΔG value for hairpin loop, −0.3 kcal/mol; minimum ΔG value for self-dimer, −0.9 kcal/mol; and minimum ΔG value for heterodimer, −0.9 kcal/mol. We successfully synthesized, optimized, and amplified the 131 novel SSR markers in coconut using the ‘Catigan Green Dwarf’ (CATD), ‘Laguna Tall’ (LAGT), ‘West African Tall’ (WAT), and SYNVAR (LAGT × WAT) genotypes. Of the 131 SSR markers, 113 were polymorphic among the analyzed coconut genotypes.
The development of novel SSR markers for coconut will serve as a valuable resource for mapping of quantitative trait loci (QTLs), assessment of genetic diversity and population structure, hybridity testing, and other marker-assisted plant breeding applications.
Coconut (Cocos nucifera L.) is one of the most economically important crops in the Philippines. In 2017, the country produced 14.05 million metric tons of coconut, and the value of production reached 120.3 million pesos. The Philippines remained the top global supplier of coconut copra and desiccated coconut in both volume and total USD value as of 2010. Coconut oil, one of the many diversified products of coconut, ranked first among the top ten agricultural exports of the Philippines, comprising 21.9% of total agricultural exports in 2015.
Coconut is distributed across the tropical and subtropical latitudes that are accessible to the equatorial Pacific Ocean current, which possibly favored its evolution and dispersal. Coconut palms thrive in humid coastal environments at about 18° of latitude north or south of the equator where there is fertile soil, favorable temperature, and year-round rainfall. Coconut belongs to the Indian center (II) and Indo-Malayan subcenter (II-A, which includes the Philippines) in Vavilov’s centers of origin of cultivated plants. It is generally classified into two types: tall and dwarf. Tall types are generally allogamous (cross-pollinating) and therefore heterozygous; they are slow to mature, flowering 6–10 years after planting, and have an economic life of 60–70 years. Dwarf types, on the other hand, are highly autogamous (mainly self-pollinating) and therefore largely homozygous; they flower early, at around 4–6 years after planting, and have a productive life of 30–40 years [2, 6, 12].
Coconut is a diploid with 32 chromosomes (2n = 2x = 32). It belongs to the family Arecaceae (Palmaceae), subfamily Cocoideae, and is the lone species of the genus Cocos. The estimated genome size of coconut is approximately 2.6 Gbp, of which 50–70% consists of repetitive sequences. Lantican et al. reported the estimated genome size of ‘CATD’ to be 2.14 Gbp. The abundance of repeats in the coconut genome is advantageous for the assessment and characterization of coconut varieties/populations using molecular marker techniques. Molecular tools offer a more accurate assessment than the conventional characterization of coconut through morphological and agronomic traits, most of which are influenced by environmental factors.
Molecular markers have established their importance as a modern breeding tool for crop improvement [7, 24, 31]. The use of molecular tools can significantly shorten the overall duration of breeding programs for coconut improvement. One of the most extensively used marker types in molecular breeding and genetic diversity analyses is the simple sequence repeat (SSR). SSRs are short tandem repeats with repeating units of di-, tri-, tetra-, and pentanucleotides. The repeat units are approximately 1–8 bp long. SSRs are abundant and well distributed throughout the genome, and the number of repeat units can vary between genotypes/individuals, which makes them very useful in fingerprinting, genotyping, and genetic diversity analyses.
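To make the definition of an SSR concrete, the following is a minimal sketch of how perfect tandem repeats could be located in a DNA string with a regular expression. It is not the pipeline used in this study (which relied on GMATA); the function name `find_ssrs`, the motif length range of 2–5 bp, and the minimum of five repeat units are illustrative assumptions.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=5, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Assumed thresholds (not from the paper): motifs of 2-5 bp repeated
    at least 5 times. The shortest repeating unit is reported, so a run
    such as AGAGAGAGAG is called as (AG)5.
    """
    seq = sequence.upper()
    # Group 1 is the whole repeat tract; group 2 is the repeating unit.
    # The lazy {m,n}? makes the engine try the shortest unit first.
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        tract, motif = match.group(1), match.group(2)
        hits.append((match.start(), motif, len(tract) // len(motif)))
    return hits


# Toy sequence: an (AG)8 tract and an (AAG)6 tract in non-repetitive flanks.
toy = "TTGCA" + "AG" * 8 + "CCGTA" + "AAG" * 6 + "TGC"
for start, motif, count in find_ssrs(toy):
    print(f"{motif} x {count} at position {start}")
```

Because repeat counts differ between genotypes, the same locus would be reported with different `repeat_count` values in different individuals, which is the length variation that SSR genotyping detects on a gel.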
In the past, SSR marker development in coconut was achieved through microsatellite probing in bacterial artificial chromosome (BAC) clones or by using previously developed SSR markers from closely related genomes [15, 21]. These coconut SSR markers are publicly available; however, their number and distribution across chromosomes are quite limited for quantitative trait loci (QTL) mapping and genetic diversity studies. With current advancements in next-generation sequencing (NGS) technologies, it has become possible to mine SSRs across the entire genome. Genome-wide bioinformatics prediction can generate a large number of SSR markers efficiently.
This study aims to provide a valuable resource of SSR markers for potential use in marker-assisted selection breeding for coconut.
Plant materials and leaf collections
Leaf samples of the coconut parental genotypes ‘Catigan Green Dwarf’ (CATD), ‘Laguna Tall’ (LAGT), and ‘West African Tall’ (WAT) and of a synthetic variety denoted as SYNVAR (LAGT × WAT) were obtained from the Philippine Coconut Authority-Zamboanga Research Center (PCA-ZRC) in San Ramon, Zamboanga City, Philippines. Leaflets from the youngest frond, or “first leaf,” that were free from pest damage were chosen as samples. Three leaflets were gathered from each of the left and right portions of the midrib near the base of the frond. The samples were transported to the Genetics Laboratory at the Institute of Plant Breeding, University of the Philippines Los Baños (IPB-UPLB), Laguna, Philippines, for DNA extraction.
Genomic DNA extraction of coconut parental genotypes
A total of eight individuals/palms of the coconut genotypes were collected (Table 1). Genomic DNA was extracted following the procedure of Doyle and Doyle with modifications. DNA quality and yield were determined by electrophoresis in 1% UltraPure™ agarose (Invitrogen Corp., Carlsbad, California, USA) in 1× Tris-borate EDTA (TBE) running buffer at 100 V for 40 min, followed by staining with 0.5 µg mL−1 ethidium bromide and UV illumination at 300 nm using the Enduro GDS Touch Imaging System (Labnet International, Inc., Edison, New Jersey, USA). DNA concentration was estimated by visual comparison of gel bands with known concentrations of lambda (λ) DNA molecular weight standards (Sigma-Aldrich Inc., St. Louis, Missouri, USA).
Development of SSR markers using the genome assembly of coconut ‘Catigan Green Dwarf’ (CATD)
Previously, a set of 7139 novel SSRs was automatically generated based on the SSR loci annotation of the genome assembly of coconut ‘Catigan Green Dwarf’ (CATD) using the GMATA software package [9, 27]. Given the large number of predicted SSR markers, selection criteria were employed to obtain high-quality markers for eventual use in coconut genotyping. The predicted markers were first filtered manually by repeat motif, contig distribution, and product size. Markers with AT/AT and TA/TA repeat motifs were excluded. In silico PCR in the ‘CATD’ genome assembly was then performed to check that each marker was likely to amplify in vitro before synthesis. Finally, the SSRs were further filtered with the OligoAnalyzer tool (Integrated DNA Technologies, Inc., Coralville, Iowa) using the following desired parameters: %GC, 40–60%; minimum ΔG value for hairpin loop, −0.3 kcal/mol; minimum ΔG value for self-dimer, −0.9 kcal/mol; and minimum ΔG value for heterodimer, −0.9 kcal/mol (Fig. 1).
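The selection criteria above can be expressed as a simple filter. The sketch below is illustrative only: the `Candidate` record and its field names are hypothetical, the ΔG values are assumed to have been obtained beforehand (for example, copied from OligoAnalyzer output) rather than computed here, and "minimum ΔG" is interpreted as meaning that a value more negative than the threshold fails.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one predicted SSR marker."""
    name: str
    motif: str
    forward: str            # forward primer, 5'->3'
    reverse: str            # reverse primer, 5'->3'
    hairpin_dg: float       # worst hairpin dG of the pair, kcal/mol
    self_dimer_dg: float    # worst self-dimer dG of the pair, kcal/mol
    hetero_dimer_dg: float  # heterodimer dG of the pair, kcal/mol


# Thresholds as stated in the text.
EXCLUDED_MOTIFS = {"AT", "TA"}
GC_RANGE = (40.0, 60.0)
MIN_DG = {"hairpin": -0.3, "self_dimer": -0.9, "hetero_dimer": -0.9}


def gc_percent(primer):
    """Percentage of G and C bases in a primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def passes_filters(c):
    """Apply the motif, %GC, and dG criteria to one candidate."""
    if c.motif.upper() in EXCLUDED_MOTIFS:
        return False
    for primer in (c.forward, c.reverse):
        if not GC_RANGE[0] <= gc_percent(primer) <= GC_RANGE[1]:
            return False
    # Assumption: a dG more negative than the threshold is rejected.
    return (
        c.hairpin_dg >= MIN_DG["hairpin"]
        and c.self_dimer_dg >= MIN_DG["self_dimer"]
        and c.hetero_dimer_dg >= MIN_DG["hetero_dimer"]
    )


# Made-up example primers (not markers from this study).
example = Candidate("demo_1", "AG", "ACGTACGTACGTACGTACGT",
                    "TGCATGCATGCATGCATGCA", -0.1, -0.5, -0.5)
print(example.name, passes_filters(example))
```

Contig distribution and product size exclusion are not covered by this sketch; they depend on the genome coordinates of each marker.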
PCR was carried out in a 10 µL reaction volume containing 15 ng genomic DNA, 1× PCR buffer (10 mM Tris pH 9.1 at 20 °C, 50 mM KCl, 0.01% Triton™ X-100; Vivantis Technologies, Malaysia), 1.5 mM MgCl2, 0.2 mM dNTPs (Promega Corporation, Madison, Wisconsin, USA), 0.2 μM each of forward and reverse primer (Integrated DNA Technologies Pte. Ltd., Singapore), and Taq DNA polymerase (Vivantis Technologies, Malaysia). The temperature profile was as follows: initial denaturation at 95 °C for 3 min; 30 cycles of denaturation (95 °C, 30 s), annealing (45–60 °C depending on the primer pair, 45 s), and extension (72 °C, 1 min); and final extension at 72 °C for 5 min. Amplifications were carried out in the Applied Biosystems Veriti™ 96-well Thermal Cycler (Thermo Fisher Scientific, Madison, Wisconsin, USA). PCR products were resolved by electrophoresis in 8% non-denaturing polyacrylamide gel in 1× Tris-borate EDTA buffer at 100 V for 60–75 min in the C.B.S. Scientific Triple Wide Mini-Vertical System™ (C.B.S. Scientific Company, San Diego, California, USA) and visualized by staining with 0.5 µg mL−1 ethidium bromide and UV illumination using the Enduro GDS Touch Imaging System (Labnet International, Inc., Edison, New Jersey, USA). Gels were scored manually for the presence or absence of bands.
A total of 131 SSR markers were synthesized. Of these, 98% were dinucleotide repeats (2-mers), while the remaining 2% were tri- and tetranucleotide repeats at 1% each, as shown in Fig. 2. AG and GA motifs were the most abundant dinucleotide repeats among the 131 SSR markers, at 29.0% and 18.3%, respectively. These were followed by CT (14.5%), TG (13.7%), TC (11.5%), AC (7.6%), and GT (3.8%) repeats. In addition, the trinucleotide repeat AAG (1.0%) and the tetranucleotide repeat ACAT (1.0%) were observed.
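Several of the motifs listed above describe the same repeat read in a different phase or on the opposite strand (for example, AG, GA, CT, and TC). As an illustrative sketch, not part of the original analysis, the reported percentages can be pooled into strand- and phase-equivalent classes. The helper names `canonical_motif` and `pooled_percent` are hypothetical; the input values are the percentages stated in the text.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Smallest rotation of a motif or of its reverse complement."""
    m = motif.upper()
    rc = m.translate(COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (m, rc) for i in range(len(s))]
    return min(rotations)


# Percentages of the 131 synthesized markers, as reported in the text.
REPORTED_PERCENT = {
    "AG": 29.0, "GA": 18.3, "CT": 14.5, "TG": 13.7, "TC": 11.5,
    "AC": 7.6, "GT": 3.8, "AAG": 1.0, "ACAT": 1.0,
}


def pooled_percent(percent_by_motif):
    """Sum percentages over motifs that share a canonical form."""
    pooled = defaultdict(float)
    for motif, pct in percent_by_motif.items():
        pooled[canonical_motif(motif)] += pct
    return {motif: round(pct, 1) for motif, pct in pooled.items()}


for motif, pct in pooled_percent(REPORTED_PERCENT).items():
    print(f"{motif}: {pct}%")
```

Under this grouping, the AG/GA/CT/TC class accounts for 73.3% of the synthesized markers and the AC/TG/GT class for 25.1%. The totals sum to slightly more than 100% because the reported values are rounded.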
All SSRs amplified successfully from coconut genomic DNA. Of the 131 SSRs, 113 (86%) were polymorphic among the test coconut varieties, while the remaining 18 (14%) were monomorphic. An average of 2.70 alleles per locus was observed across the test varieties, implying a high degree of polymorphism among the selected SSRs. Representative gels of polymorphic SSRs optimized among coconut genotypes are presented in Fig. 3, in which distinct and clear amplification patterns were observed. The product size of these markers ranged from 130 to 690 bp. A summary of the characteristics of the selected SSRs is presented in Table 2, which includes the marker name, annealing temperature, repeat motif, contig distribution, product size range, and number of alleles (Fig. 4).
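The summary statistics above (percentage of polymorphic markers and mean number of alleles per locus) can be derived from manually scored gels. The following is a minimal sketch under stated assumptions: band scores are stored as sets of band sizes per genotype, a marker is called polymorphic when the genotypes do not all share the same band pattern, and the marker names and band sizes in the example are made up rather than taken from Table 2.

```python
def summarize(scores):
    """Summarize band scores.

    scores: {marker: {genotype: set of band sizes in bp}}
    Returns (alleles per marker, polymorphic markers,
             mean alleles per locus, percent polymorphic).
    """
    n_alleles = {}
    polymorphic = []
    for marker, by_genotype in scores.items():
        patterns = [frozenset(bands) for bands in by_genotype.values()]
        # Distinct band sizes seen across all genotypes at this locus.
        n_alleles[marker] = len(frozenset().union(*patterns))
        # Assumption: polymorphic means at least two different patterns.
        if len(set(patterns)) > 1:
            polymorphic.append(marker)
    mean_alleles = sum(n_alleles.values()) / len(n_alleles)
    pct_polymorphic = 100.0 * len(polymorphic) / len(scores)
    return n_alleles, polymorphic, mean_alleles, pct_polymorphic


# Made-up scores for two hypothetical markers across the four genotypes.
toy_scores = {
    "demo_A": {"CATD": {150}, "LAGT": {150, 160},
               "WAT": {170}, "SYNVAR": {160, 170}},
    "demo_B": {"CATD": {200}, "LAGT": {200},
               "WAT": {200}, "SYNVAR": {200}},
}
print(summarize(toy_scores))

# The reported proportion follows from the counts in the text.
print(round(100 * 113 / 131))  # percent polymorphic, rounded
```

Other definitions of polymorphism (for example, more than one allele at a locus regardless of genotype) would give different counts for markers where every genotype shows the same heterozygous pattern.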
Lantican et al. identified genome-wide SSRs based on de novo prediction of repeat loci across the CATD genome assembly. However, the predicted loci were neither screened nor tested under actual wet lab conditions. Here, the SSR markers generated were subjected to filtering based on genome distribution, repeat motif, and thermodynamic properties. Markers with AT/AT and TA/TA repeat motifs were excluded because these are the most common repeats in the coconut/palm genome [9, 13, 29], and their high copy number may reduce marker specificity and/or result in nonspecific amplification. Markers were also selected based on their distribution across contigs to cover the entire coconut genome. In silico PCR in the CATD genome assembly was performed to check the contig specificity of each marker and to help ensure in vitro SSR amplification. The allele size range of the markers was also limited to 80–400 bp for easy visualization on gels, and the OligoAnalyzer tool was used to check the primers for dimerization and hairpin loop formation to produce high-quality markers.
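The idea behind the in silico PCR check can be sketched as follows. This is a simplified illustration, not the electronic PCR tool cited in this study: it assumes exact primer matches, searches a single sequence in one orientation, and uses hypothetical function names. The 80–400 bp window is the allele size range stated above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(text, sub):
    """All start positions of sub in text, including overlapping ones."""
    positions, start = [], text.find(sub)
    while start != -1:
        positions.append(start)
        start = text.find(sub, start + 1)
    return positions


def in_silico_pcr(template, forward, reverse, min_size=80, max_size=400):
    """Return (start, product_size) for each predicted amplicon.

    Assumptions: exact matches only, forward primer on the given strand,
    reverse primer binding downstream on the opposite strand.
    """
    t = template.upper()
    fwd_sites = find_all(t, forward.upper())
    rev_sites = find_all(t, reverse_complement(reverse))
    products = []
    for f in fwd_sites:
        for r in rev_sites:
            size = r + len(reverse) - f
            if r >= f + len(forward) and min_size <= size <= max_size:
                products.append((f, size))
    return products


# Made-up contig: flank + forward site + (AG)30 + reverse site + flank.
fwd = "ACGTTGCAAGCTAGGCTAAC"
rev_site = "TTGACCGATCGGATACCGTA"
contig = "C" * 10 + fwd + "AG" * 30 + rev_site + "C" * 10
hits = in_silico_pcr(contig, fwd, reverse_complement(rev_site))
print(hits)
```

A primer pair returning exactly one product within the size window on one contig would be considered specific under these assumptions; zero or multiple products would flag the marker for exclusion.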
The predominance of dinucleotide repeats in coconut and other related species is supported by the previous works of Rivera et al., Palliyarakkal et al., Xia et al., and Lantican et al. This result coincides with the studies of Palliyarakkal et al. and Xia et al., in which AG/GA/TC/CT motifs were also the most common dinucleotide repeats found in the coconut/palm genome. The results obtained here are consistent with previous studies in which high levels of polymorphism were attributed to phenotypic variation and to differences in the breeding behavior of the dwarf and tall varieties, which are generally autogamous (self-pollinating) and allogamous (cross-pollinating), respectively [14, 21, 25]. The development of SSRs using bioinformatics tools in this study proved very efficient in generating a large number of markers in coconut. The SSRs generated here are expected to contribute to the pool of available molecular markers [10, 16, 28, 29, 30] for fingerprinting, genetic diversity analysis, QTL mapping, and other relevant studies in coconut.
Microsatellites, or SSRs, are a very useful molecular tool for studying genetic diversity and genotyping of coconut [8, 10, 15, 16, 30]. They have been used extensively in these analyses because SSR markers are abundant and well distributed throughout the genome, multi-allelic, co-dominant, highly polymorphic, and highly reproducible [11, 20]. Previous studies such as Rivera et al., Perera et al., Xiao et al., and Wu et al. have already developed SSRs in coconut for genetic diversity studies, and these markers also showed high levels of polymorphism.
Here, we demonstrated that a locally established bioinformatics pipeline can mine SSRs from NGS data that have actual utility in terms of amplification and distinguishing power across several varieties of coconut. The advantage of a genome-wide bioinformatics prediction approach to marker development is that it is a relatively fast and cost-effective way of generating a large number of markers. With such programs or pipelines, SSRs and SNPs can be generated automatically from genome sequences.
Polymorphic markers in this study will be further used to genotype the coconut mapping population generated from a three-way cross of ‘Pacific’ LAGT and CATD and ‘Indo-Atlantic’ WAT coconut for QTL mapping analysis. The development of novel SSR markers for coconut will serve as a valuable resource for mapping QTLs, assessment of genetic diversity and population structure, hybridity testing, and other marker-assisted plant breeding applications.
Availability of data and materials
The dataset(s) supporting the conclusions of this article is (are) included within the article (and its additional file(s)).
Anderson JR, Lubberstedt T (2003) Functional markers in plants. Trends Plant Sci 8:554–560
Batugal P, Bourdeix R, Baudouin L (2009) Coconut breeding. In: Jain SM, Priyadarshan PM (eds) Breeding plantation tree crops: tropical species. Springer Science Business Media, LLC, pp 327–373
Doyle JJ, Doyle JL (1990) Isolation of plant DNA from fresh tissue. Focus 12:13–15
FAOSTAT Database (2013) http://faostat.fao.org/
Foale M (2003) The coconut odyssey: the bounteous possibilities of the tree of life. ACIAR Monograph 101:132
Harries HC (1978) Evolution, dissemination and classification of Cocos nucifera L. Bot Rev 44(3):265–320
Kesawat MS, Kumar BD (2009) Molecular markers: it’s application in crop improvement. J Crop Sci Biotechnol 12(4):169–181
Konan KJN, Koffi KKE, Konan JL, Lebrun P, Dery SK, Sangare A (2007) Microsatellite gene diversity in coconut (Cocos nucifera L.) accessions resistant to lethal yellowing disease. Afr J Biotechnol 6(4):341–347
Lantican D, Strickler S, Canama A, Gardoce R, Mueller L, Galvez H (2019) De novo genome sequence assembly of dwarf coconut (Cocos nucifera L. ‘Catigan green dwarf’) provides insights into genomic variation between coconut types and related palm species. G3 (Bethesda) 9(8):2377–2393. https://doi.org/10.1534/g3.119.400215 PMID: 31167834; PMCID: PMC6686914
Lebrun P, N'cho Y, Seguin M, Grivet L, Baudouin L (1998) Genetic diversity in coconut (Cocos nucifera L.) revealed by restriction fragment length polymorphism (RFLP) markers. Euphytica 101(1):103–108. https://doi.org/10.1023/a:1018323721803
Mason AS (2015) SSR genotyping. In: Batley J (ed) Plant genotyping. Springer, New York, pp 77–89
Meerow AW, Krueger RR, Singh R, Low ETL, Maizuraithnin M, Ooi LCL (2012) Coconut, date, and oil palm genomics. In: Schnell RJ, Priyadarshan PM (eds) Genomics of tree crops. Springer Science Business Media, LLC, pp 299–351. https://doi.org/10.1007/978-1-4614-0920-5_10
Palliyarakkal MK, Ramaswamy M, Vadivel A (2011) Microsatellites in palm (Arecaceae) sequences. Bioinformation 7(7):347–351
Perera L, Russell JR, Provan J, Powell W (1999) Identification and characterization of microsatellites in coconut (Cocos nucifera L.) and the analysis of coconut population in Sri Lanka. Mol Ecol 8:344–346
Perera L, Russell JR, Provan J, Powell W (2003) Studying genetic relationships among coconut varieties/populations using microsatellite markers. Euphytica 132:121–128
Perera L, Russell RJ, Provan J, Mcnicol WJ, Powell W (1998) Evaluating genetic relationships between indigenous coconut (Cocos nucifera L.) accessions from Sri Lanka by means of AFLP profiling. Theor Appl Genet 96(3):545–550. https://doi.org/10.1007/s001220050772
Perera PIP, Hocher V, Verdeil JL, Yakandawala DMD, Weerakoon LK (2007) Recent advances in anther culture of coconut (Cocos nucifera L.). In: Xu Z, Li J, Xue Y, Yang W (eds) Biotechnology and sustainable agriculture 2006 and beyond. Springer, Dordrecht, p 451
Philippine Statistics Authority (2017) Agricultural foreign trade statistics of the Philippines: 2015. https://psa.gov.ph
Philippine Statistics Authority (2018) Selected statistics on agriculture 2013–2017. https://psa.gov.ph
Powell W, Machray GC, Provan J (1996) Polymorphism revealed by simple sequence repeats. Trends Plant Sci 1(7):215–222
Rivera R, Edwards KJ, Barker JHA, Arnold GM, Ayad G, Hodgkin T, Karp A (1999) Isolation and characterization of polymorphic microsatellites in Cocos nucifera L. Genome 42:668–675
Rotmistrovsky K, Jang W, Schuler GD (2004) A web server for performing electronic PCR. Nucleic Acids Res 32(Web Server issue):W108–W112. https://doi.org/10.1093/nar/gkh450
Sharma A, Namdeo AG, Mahadik KR (2008) Molecular markers: new prospects in plant genome analysis. Pharmacogn Rev 2(3):23–34
Sindhumole P, Ambili SN (2011) Marker assisted breeding in coconut (Cocos nucifera L.). Gregor Mendel Foundation Proceedings 2011:30-32
Teulat B, Aldam C, Trehin R, Lebrun P, Barker JHA, Arnold GM, Karp A, Baudouin L, Rognon F (2000) An analysis of genetic diversity in coconut (Cocos nucifera L.) populations from across the geographic range using sequence-tagged microsatellites (SSRs) and AFLPs. Theor Appl Genet 100:764–771. https://doi.org/10.1007/s001220051350
Vavilov NI (1926) Centres of origin of cultivated plants. Bull Appl Bot Genet Plant Breed 16:1–248
Wang X, Wang L (2016) GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Front Plant Sci 7:1350
Wu Y, Yaodong Y, Qadri R, Iqbal A, Li J, Fan H, Wu Y (2019) Development of SSR markers for coconut (Cocos nucifera L.) by selectively amplified microsatellite (SAM) and its applications. Trop Plant Biol 12(1):32–43
Xia W, Xiao Y, Liu Z, Luo Y, Mason A, Haikuo F, Yang Y, Zhao S, Peng M (2014) Development of gene-based simple sequence repeat markers for association analysis in Cocos nucifera. Mol Breed 34(2):1–11
Xiao Y, Luo Y, Yang Y, Fan H, Xia W, As M, Zhao S, Sager R, Qiao F (2013) Development of microsatellite markers in Cocos nucifera and their application in evaluating the level of genetic diversity of Cocos nucifera. Plant Omics J 6(3):193–200
Xu Y, Crouch JH (2008) Marker-assisted selection in plant breeding: from publications to practice. Crop Sci 48:391–407. https://doi.org/10.2135/cropsci2007.04.0191
We express our gratitude to the Department of Science and Technology — Philippine Council for Agriculture, Aquatic and Natural Resources Research and Development (DOST-PCAARRD) for funding the project “QTL mapping in coconut for high yield outstanding quality of copra oil and other coconut major by-products” under the program “Improvement of Coconut Varieties through Genomics, Genetics, and Breeding for a Competitive and Sustainable Philippine Coconut Industry (Genomics-Assisted Molecular Breeding).” This research also has been made possible by the commitment and support of the Philippine Genome Center, and the Philippine Coconut Authority-Zamboanga Research Center (PCA-ZRC), which we also thank for providing the plant materials. We likewise acknowledge the valuable technical services rendered by Ms. Desiree Diaz for the completion of this work.
This work was funded by the Department of Science and Technology-Philippine Council for Agriculture, Aquatic and Natural Resources Research and Development (DOST-PCAARRD).
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Caro, R.E.S., Cagayan, J., Gardoce, R.R. et al. Mining and validation of novel simple sequence repeat (SSR) markers derived from coconut (Cocos nucifera L.) genome assembly. J Genet Eng Biotechnol 20, 71 (2022). https://doi.org/10.1186/s43141-022-00354-z