Mining and validation of novel simple sequence repeat (SSR) markers derived from coconut (Cocos nucifera L.) genome assembly



In the past, simple sequence repeat (SSR) marker development in coconut was achieved through microsatellite probing in bacterial artificial chromosome (BAC) clones or by using previously developed SSR markers from closely related genomes. These coconut SSRs are publicly available in the published literature and in online databases; however, their number is quite limited. Here, we used a locally established, genome-wide SSR prediction bioinformatics pipeline to generate a large number of coconut SSR markers.


A total of 7139 novel SSR markers were derived from the genome assembly of coconut ‘Catigan Green Dwarf’ (CATD). A subset of 131 markers was selected for synthesis based on motif filtering, contig distribution, product size exclusion, and success of in silico PCR in the CATD genome assembly. The OligoAnalyzer tool was also employed with the following desired parameters: %GC, 40–60%; minimum ΔG value for hairpin loop, −0.3 kcal/mol; minimum ΔG value for self-dimer, −0.9 kcal/mol; and minimum ΔG value for heterodimer, −0.9 kcal/mol. We successfully synthesized, optimized, and amplified the 131 novel SSR markers using the ‘Catigan Green Dwarf’ (CATD), ‘Laguna Tall’ (LAGT), ‘West African Tall’ (WAT), and SYNVAR (LAGT × WAT) genotypes. Of the 131 SSR markers, 113 were polymorphic among the analyzed coconut genotypes.


The development of novel SSR markers for coconut will serve as a valuable resource for mapping of quantitative trait loci (QTLs), assessment of genetic diversity and population structure, hybridity testing, and other marker-assisted plant breeding applications.


Coconut (Cocos nucifera L.) is one of the most economically important crops in the Philippines. In 2017, the country produced 14.05 million metric tons of coconut, and the value of production reached 120.3 million pesos [19]. The Philippines remained the top global supplier of coconut copra and desiccated coconut in both volume and total USD value as of 2010 [4]. Coconut oil, one of the many diversified products of coconut, ranked first among the top ten agricultural exports of the Philippines, comprising 21.9% of total agricultural exports in 2015 [18].

Coconut is distributed across the tropical and subtropical latitudes that are accessible to the equatorial Pacific Ocean current, which possibly favored its evolution and dispersal. Coconut palms thrive in humid coastal environments at about 18° of latitude north or south of the equator where there is fertile soil, favorable temperature, and year-round rainfall [5]. Coconut belongs to the Indian center (II) and Indo-Malayan subcenter (II-A, where the Philippines belongs) in Vavilov’s centers of origin of cultivated plants [26]. It is generally classified into two types: tall and dwarf. Tall types are generally allogamous (cross-pollinating) and therefore heterozygous; they are slow to mature, flowering 6–10 years after planting, and have an economic life of 60–70 years. Dwarf types, on the other hand, are highly autogamous (mainly self-pollinating) and therefore largely homozygous; they flower early, at around 4–6 years after planting, and have a productive life of 30–40 years [2, 6, 12].

Coconut is a diploid with 32 chromosomes (2n = 2x = 32). It belongs to the family Arecaceae (Palmaceae), subfamily Cocoideae, and is the lone species of the genus Cocos [17]. The estimated genome size of coconut is approximately 2.6 Gbp, of which 50–70% is repetitive sequence. Lantican et al. [9] reported the estimated genome size of ‘CATD’ to be 2.14 Gbp. The abundance of repeats in the coconut genome is advantageous for assessing and characterizing coconut varieties and populations using molecular marker techniques. Molecular tools offer a more accurate assessment than the conventional characterization of coconut through morphological and agronomic traits, most of which are influenced by many environmental factors [15].

Molecular markers have established their importance as a modern breeding tool for crop improvement [7, 24, 31]. The use of molecular tools can significantly shorten breeding programs for coconut improvement. One of the most extensively used marker types in molecular breeding and genetic diversity analyses is the simple sequence repeat (SSR). SSRs are short tandem repeats with repeating units such as di-, tri-, tetra-, and pentanucleotides [20]. The repeat units are approximately 1–8 bp long, and SSRs are abundant and well distributed throughout the genome. Because the number of repeat units can vary between genotypes or individuals, SSRs are very useful in fingerprinting, genotyping, and genetic diversity analyses [23].
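To make the definition concrete, the following is a minimal sketch of how perfect tandem repeats can be located in a DNA string with a regular expression. It is an illustration only and not the GMATA-based pipeline used in this study; the function name `find_ssrs`, its default thresholds, and the example sequence are assumptions introduced here.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    Illustrative only: unit lengths and the minimum repeat count are
    assumed defaults, not the settings used in the study.
    """
    seq = seq.upper()
    # The lazy unit match makes AGAGAG... report as (AG)n rather than (AGAG)n.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        # Skip homopolymer runs, which would otherwise appear as e.g. (AA)n.
        if len(set(motif)) == 1:
            continue
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

# Hypothetical sequence with an (AG)8 and an (AAG)6 repeat.
example = "TTGC" + "AG" * 8 + "CCGT" + "AAG" * 6 + "TC"
print(find_ssrs(example))
```

Genotypes that differ in the repeat count at such a locus yield PCR products of different lengths, which is what makes the locus usable as a marker.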

In the past, SSR marker development in coconut was achieved through microsatellite probing in bacterial artificial chromosome (BAC) clones or by using previously developed SSR markers from closely related genomes [15, 21]. These coconut SSR markers are publicly available; however, their number and distribution across chromosomes are quite limited for quantitative trait loci (QTL) mapping and genetic diversity studies. With current advances in next-generation sequencing (NGS) technologies, it is now possible to mine SSRs across the entire genome. Genome-wide bioinformatics prediction can generate a large number of SSR markers efficiently.

This study aims to provide a valuable resource of SSR markers for potential use in marker-assisted selection in coconut breeding.


Plant materials and leaf collections

Leaf samples of the coconut parental genotypes ‘Catigan Green Dwarf’ (CATD), ‘Laguna Tall’ (LAGT), and ‘West African Tall’ (WAT) and of a synthetic variety denoted as SYNVAR (LAGT × WAT) were obtained from the Philippine Coconut Authority-Zamboanga Research Center (PCA-ZRC) in San Ramon, Zamboanga City, Philippines. Leaflets from the youngest frond (the “first leaf”) that were free from pest damage were chosen as samples. Three leaflets were gathered from each of the left and right portions of the midrib near the base of the frond. The samples were transported to the Genetics Laboratory at the Institute of Plant Breeding, University of the Philippines Los Baños (IPB-UPLB), Laguna, Philippines, for DNA extraction.

Genomic DNA extraction of coconut parental genotypes

A total of eight individual palms of the coconut genotypes were sampled (Table 1). Genomic DNA was extracted following a procedure adapted from Doyle and Doyle [3] with modifications. DNA quality and yield were determined by electrophoresis in 1% UltraPure™ agarose (Invitrogen Corp., Carlsbad, California, USA) in 1× Tris-borate EDTA (TBE) running buffer at 100 V for 40 min, followed by staining with 0.5 μg/mL ethidium bromide and UV illumination at 300 nm using the Enduro GDS Touch Imaging System (Labnet International, Inc., Edison, New Jersey, USA). DNA concentration was estimated by visual comparison of gel bands with lambda (λ) DNA molecular weight standards of known concentration (Sigma-Aldrich Inc., St. Louis, Missouri, USA).

Table 1 Coconut genotypes used in the study for screening the SSR markers

Development of SSR markers using the genome assembly of coconut ‘Catigan Green Dwarf’ (CATD)

Previously, a set of 7139 novel SSRs was automatically generated from the SSR loci annotation of the genome assembly of coconut ‘Catigan Green Dwarf’ (CATD) using the GMATA software package [9, 27]. Given the large number of predicted SSR markers, selection criteria were applied to obtain high-quality markers for eventual use in coconut genotyping. The predicted markers were first filtered by manual checking of motif, contig distribution, and product size. Markers with AT/AT and TA/TA repeat motifs were excluded. In silico PCR in the ‘CATD’ genome assembly [9] was then performed to check that the SSRs were expected to amplify in vitro prior to synthesis [22]. For further filtering, the OligoAnalyzer tool (Integrated DNA Technologies, Inc., Coralville, Iowa) was employed with the following desired parameters: %GC, 40–60%; minimum ΔG value for hairpin loop, −0.3 kcal/mol; minimum ΔG value for self-dimer, −0.9 kcal/mol; and minimum ΔG value for heterodimer, −0.9 kcal/mol (Fig. 1).
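The selection criteria above can be sketched as a simple pass/fail filter. This is a hedged illustration, not the authors' implementation: the `PrimerPair` record, the function names, and the example primers are hypothetical, the ΔG values are assumed to have been obtained separately (for example from OligoAnalyzer) and entered by hand, "minimum ΔG" is read here as a lower bound (ΔG no more negative than the threshold), and the 80–400 bp size window is the allele size range mentioned later in the text.

```python
from dataclasses import dataclass

@dataclass
class PrimerPair:
    # Hypothetical record; dG values (kcal/mol) come from an external tool.
    name: str
    motif: str
    forward: str
    reverse: str
    product_size: int
    hairpin_dg: float
    self_dimer_dg: float
    hetero_dimer_dg: float

EXCLUDED_MOTIFS = {"AT", "TA"}  # AT/TA repeats were excluded in the study

def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)

def passes_filters(p, size_range=(80, 400)):
    """Apply motif, product size, %GC, and dG thresholds to one primer pair."""
    if p.motif.upper() in EXCLUDED_MOTIFS:
        return False
    if not size_range[0] <= p.product_size <= size_range[1]:
        return False
    if not all(40.0 <= gc_percent(s) <= 60.0 for s in (p.forward, p.reverse)):
        return False
    # Assumed reading: each dG must be at or above the stated minimum.
    return (p.hairpin_dg >= -0.3
            and p.self_dimer_dg >= -0.9
            and p.hetero_dimer_dg >= -0.9)

# Hypothetical candidates, not markers from Table 2.
good = PrimerPair("demo1", "AG", "ACGTACGTACGTACGTACGT",
                  "TGCATGCATGCATGCATGCA", 210, -0.1, -0.5, -0.7)
bad_motif = PrimerPair("demo2", "AT", good.forward, good.reverse,
                       210, -0.1, -0.5, -0.7)
print([p.name for p in (good, bad_motif) if passes_filters(p)])
```

Contig distribution and in silico PCR success are not modelled here, since both depend on the genome assembly itself.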

Fig. 1 Schematic diagram depicting the SSR primer filtering pipeline

PCR analysis

PCR was carried out in a 10 μL reaction volume containing 15 ng genomic DNA, 1× PCR buffer (10 mM Tris pH 9.1 at 20 °C, 50 mM KCl, 0.01% Triton™ X-100; Vivantis Technologies, Malaysia), 1.5 mM MgCl2, 0.2 mM dNTPs (Promega Corporation, Madison, Wisconsin, USA), 0.2 μM each of forward and reverse primer (Integrated DNA Technologies Pte. Ltd., Singapore), and Taq DNA polymerase (Vivantis Technologies, Malaysia). The temperature profile was as follows: initial denaturation at 95 °C for 3 min; 30 cycles of denaturation (95 °C, 30 s), annealing (45–60 °C depending on the primer pair, 45 s), and extension (72 °C, 1 min); and a final extension at 72 °C for 5 min. Amplifications were carried out in the Applied Biosystems Veriti™ 96-well Thermal Cycler (Thermo Fisher Scientific, Madison, Wisconsin, USA). PCR products were resolved by electrophoresis in 8% non-denaturing polyacrylamide gel in 1× Tris-borate EDTA buffer at 100 V for 60–75 min in the C.B.S. Scientific Triple Wide Mini-Vertical System™ (C.B.S. Scientific Company, San Diego, California, USA) and visualized by staining with 0.5 μg/mL ethidium bromide and UV illumination using the Enduro GDS Touch Imaging System (Labnet International, Inc., Edison, New Jersey, USA). Gels were scored manually for the presence or absence of bands.
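As a quick arithmetic check on the protocol above, the sketch below adds up the programmed hold times of the temperature profile and converts the stated final concentrations into absolute amounts per 10 μL reaction. The function names are illustrative assumptions, and ramp times between temperatures are ignored, so the actual run takes longer than the figure computed here.

```python
def cycling_seconds(initial=180, cycles=30, steps=(30, 45, 60), final=300):
    """Total programmed hold time in seconds (ramping not included)."""
    return initial + cycles * sum(steps) + final

def amount_pmol(conc_micromolar, volume_microliter):
    """Amount in pmol: micromol/L multiplied by microliters gives pmol."""
    return conc_micromolar * volume_microliter

total = cycling_seconds()
print(f"Hold time: {total} s ({total / 60:.1f} min)")

# Final concentrations from the text, in a 10 uL reaction.
print(f"Each primer (0.2 uM): {amount_pmol(0.2, 10):.1f} pmol")
print(f"dNTPs (0.2 mM = 200 uM): {amount_pmol(200, 10):.0f} pmol")
```

With the stated profile this gives 180 + 30 × 135 + 300 = 4530 s, or about 75.5 min of hold time.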


A total of 131 SSR markers were synthesized. Of these, 98% were dinucleotide (2-mer) repeats, while tri- and tetranucleotide repeats accounted for 1% each, as shown in Fig. 2. AG and GA were the most abundant dinucleotide motifs among the 131 SSR markers, at 29% and 18.3%, respectively, followed by CT (14.5%), TG (13.7%), TC (11.5%), AC (7.6%), and GT (3.8%). In addition, the trinucleotide repeat AAG (1.0%) and the tetranucleotide repeat ACAT (1.0%) were observed.

Fig. 2 Percentage of repeat motifs of the selected SSRs

All SSRs amplified successfully from coconut genomic DNA. Of the 131 SSRs, 113 (86%) were polymorphic among the test coconut varieties, while the remaining 18 (14%) were monomorphic. An average of 2.70 alleles per locus was observed across the test varieties, implying a high degree of polymorphism among the selected SSRs. Representative gels of polymorphic SSRs optimized among coconut genotypes are presented in Fig. 3, in which distinct and clear amplification patterns were observed. The product size of these markers ranged from 130 to 690 bp. The characteristics of the selected SSRs are summarized in Table 2, which includes the marker name, annealing temperature, repeat motif, contig distribution, product size range, and number of alleles. The percentage of polymorphic SSRs per motif is shown in Fig. 4.
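The summary statistics above follow from simple counting over the gel scores. The sketch below assumes one possible way of recording the manual scores, as the set of band sizes seen in each genotype for each marker; the marker names, band sizes, and function names are hypothetical and are not data from Table 2.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

def summarize_scores(scores):
    """scores maps marker -> {genotype: set of band sizes in bp}.

    A marker is counted as polymorphic when the genotypes do not all
    show the same set of bands.
    """
    n_polymorphic = 0
    allele_counts = []
    for by_genotype in scores.values():
        alleles = set().union(*by_genotype.values())
        allele_counts.append(len(alleles))
        if len({frozenset(bands) for bands in by_genotype.values()}) > 1:
            n_polymorphic += 1
    n = len(scores)
    return {
        "percent_polymorphic": 100.0 * n_polymorphic / n,
        "mean_alleles": sum(allele_counts) / n,
    }

# Reported counts from the text: 113 polymorphic and 18 monomorphic of 131.
print(percent(113, 131), percent(18, 131))

# Hypothetical scores for two made-up markers across the four genotypes.
demo_scores = {
    "demoA": {"CATD": {150}, "LAGT": {150, 160}, "WAT": {170}, "SYNVAR": {160, 170}},
    "demoB": {"CATD": {200}, "LAGT": {200}, "WAT": {200}, "SYNVAR": {200}},
}
print(summarize_scores(demo_scores))
```

Applied to the reported counts, the first line reproduces the 86% and 14% figures quoted above.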

Fig. 3 Representative gels of polymorphic SSRs optimized among coconut genotypes

Table 2 Characteristics of the selected coconut SSRs with name, primer sequence, annealing temperature, repeat motif, contig number, and expected allele

Fig. 4 Percentage of polymorphic SSRs per motif


Lantican et al. [9] identified genome-wide SSRs by de novo prediction of repeat loci across the CATD genome assembly. However, the predicted loci were neither screened nor tested under actual wet lab conditions. Here, the generated SSR markers were subjected to filtering criteria based on genome distribution, repeat motif, and favorable thermodynamic properties. Markers with AT/AT and TA/TA repeat motifs were excluded because these are the most common repeats in the coconut/palm genome [9, 13, 29], where their high abundance may reduce marker specificity and/or result in nonspecific amplification. Markers were also selected based on their distribution across contigs to cover the entire coconut genome. In silico PCR in the CATD genome assembly was performed to check the contig specificity of each marker and to confirm that the SSR was expected to amplify in vitro [22]. The allele size range of the markers was also limited to 80–400 bp for easy visualization in gels, and the OligoAnalyzer tool was used to check the primers for dimerization and hairpin loop formation to produce high-quality markers.
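The idea behind the in silico PCR check can be sketched as an exact-match search for the forward primer and the reverse complement of the reverse primer within each contig. This is a deliberately simplified illustration and not the electronic PCR tool cited in [22]: it allows no mismatches, searches only one strand orientation, and uses hypothetical primer and contig sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def in_silico_pcr(contigs, forward, reverse, max_product=1000):
    """Return (contig, start, product_size) for exact primer-pair hits.

    Simplification: exact matches only, forward primer on the given strand.
    """
    hits = []
    rc_reverse = revcomp(reverse)
    for name, seq in contigs.items():
        start = seq.find(forward)
        while start != -1:
            end = seq.find(rc_reverse, start + len(forward))
            while end != -1:
                size = end + len(rc_reverse) - start
                if size > max_product:
                    break
                hits.append((name, start, size))
                end = seq.find(rc_reverse, end + 1)
            start = seq.find(forward, start + 1)
    return hits

# Hypothetical contigs: only contig1 carries both primer sites around (AG)10.
contigs = {
    "contig1": "TT" + "ATGCCGTA" + "AG" * 10 + "AAGGATCC" + "CC",
    "contig2": "GGGTTTCCCAAAGGGTTTCCCAAA",
}
hits = in_silico_pcr(contigs, "ATGCCGTA", "GGATCCTT")
print(hits)
print("contig-specific" if len(hits) == 1 else "not specific")
```

A primer pair that yields exactly one hit of the expected size is consistent with a contig-specific marker, whereas several hits would suggest nonspecific amplification.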

The predominance of dinucleotide repeats in coconut and related species is supported by the previous works of Rivera et al. [21], Palliyarakkal et al. [13], Xia et al. [29], and Lantican et al. [9]. This result coincides with the studies of Palliyarakkal et al. [13] and Xia et al. [29], in which AG/GA/TC/CT motifs were also the most common dinucleotide repeats found in the coconut/palm genome. The results obtained here are consistent with previous studies in which high levels of polymorphism were likely attributable to phenotypic variation and to differences in the breeding behavior of dwarf and tall varieties, which are generally autogamous (self-pollinating) and allogamous (cross-pollinating), respectively [14, 21, 25]. The development of SSRs using bioinformatics tools in this study proved very efficient in generating a large number of markers in coconut. The SSRs generated here are expected to contribute to the pool of available molecular markers [10, 16, 28, 29, 30] for fingerprinting, genetic diversity analysis, QTL mapping, and other relevant studies in coconut.
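Motifs such as AG, GA, TC, and CT describe the same repeat read in a different register or on the opposite strand, which is why they are grouped together above. The sketch below shows one common convention for collapsing such variants: take the alphabetically smallest string among all rotations of the motif and of its reverse complement. The function name is an assumption, and this grouping is applied here only for illustration; the per-motif percentages are those reported in the Results.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical_motif(motif):
    """Smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    rc = motif.translate(COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, rc) for i in range(len(m))]
    return min(rotations)

# Percentages of the 131 selected SSRs as reported in the Results.
reported = {"AG": 29.0, "GA": 18.3, "CT": 14.5, "TG": 13.7,
            "TC": 11.5, "AC": 7.6, "GT": 3.8}

grouped = {}
for motif, pct in reported.items():
    key = canonical_motif(motif)
    grouped[key] = round(grouped.get(key, 0.0) + pct, 1)
print(grouped)
```

Under this convention the reported dinucleotide motifs fall into two classes, AG/GA/CT/TC and AC/TG/GT, with the first class accounting for most of the selected markers.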

Microsatellites, or SSRs, are a very useful molecular tool for studying genetic diversity and for genotyping coconut [8, 10, 15, 16, 30]. They have been used extensively in these analyses because SSR markers are abundant and well distributed throughout the genome, multi-allelic, co-dominant, highly polymorphic, and highly reproducible [11, 20]. Previous studies by Rivera et al. [21], Perera et al. [15], Xiao et al. [30], and Wu et al. [28] have already developed SSRs in coconut for genetic diversity studies, and those markers also showed high levels of polymorphism.


Here, we demonstrated that a locally established bioinformatics pipeline can mine SSRs from NGS data that have practical utility in terms of amplification and distinguishing power across several coconut varieties. The advantage of a genome-wide bioinformatics prediction approach to marker development is that it is a relatively fast and cost-effective way of generating large numbers of markers. SSRs and SNPs can be generated automatically from genome sequences with such programs or pipelines.

Polymorphic markers in this study will be further used to genotype the coconut mapping population generated from a three-way cross of ‘Pacific’ LAGT and CATD and ‘Indo-Atlantic’ WAT coconut for QTL mapping analysis. The development of novel SSR markers for coconut will serve as a valuable resource for mapping QTLs, assessment of genetic diversity and population structure, hybridity testing, and other marker-assisted plant breeding applications.

Availability of data and materials

The datasets supporting the conclusions of this article are included within the article and its additional files.


  1. Anderson JR, Lubberstedt T (2003) Functional markers in plants. Trends Plant Sci 8:554–560

  2. Batugal P, Bourdeix R, Baudouin L (2009) Coconut breeding. In: Jain SM, Priyadarshan PM (eds) Breeding plantation tree crops: tropical species. Springer Science+Business Media, LLC, pp 327–373

  3. Doyle JJ, Doyle JL (1990) Isolation of plant DNA from fresh tissue. Focus 12:13–15

  4. FAOSTAT Database. 2013.

  5. Foale M (2003) The coconut odyssey: the bounteous possibilities of the tree of life. ACIAR Monograph 101:132

  6. Harries HC (1978) Evolution, dissemination and classification of Cocos nucifera L. Bot Rev 44(3):265–320

  7. Kesawat MS, Kumar BD (2009) Molecular markers: it’s application in crop improvement. J Crop Sci Biotechnol 12(4):169–181

  8. Konan KJN, Koffi KKE, Konan JL, Lebrun P, Dery SK, Sangare A (2007) Microsatellite gene diversity in coconut (Cocos nucifera L.) accessions resistant to lethal yellowing disease. Afr J Biotechnol 6(4):341–347

  9. Lantican D, Strickler S, Canama A, Gardoce R, Mueller L, Galvez H (2019) De novo genome sequence assembly of dwarf coconut (Cocos nucifera L. ‘Catigan green dwarf’) provides insights into genomic variation between coconut types and related palm species. G3 (Bethesda) 9(8):2377–2393. PMID: 31167834; PMCID: PMC6686914

  10. Lebrun P, N'cho Y, Seguin M, Grivet L, Baudouin L (1998) Genetic diversity in coconut (Cocos nucifera L.) revealed by restriction fragment length polymorphism (RFLP) markers. Euphytica 101(1):103–108

  11. Mason AS (2015) SSR genotyping. In: Batley J (ed) Plant genotyping. Springer, New York, pp 77–89

  12. Meerow AW, Krueger RR, Singh R, Low ETL, Maizuraithnin M, Ooi LCL (2012) Coconut, date, and oil palm genomics. In: Schnell RJ, Priyadarshan PM (eds) Genomics of tree crops. Springer Science+Business Media, LLC, pp 299–351

  13. Palliyarakkal MK, Ramaswamy M, Vadivel A (2011) Microsatellites in palm (Arecaceae) sequences. Bioinformation 7(7):347–351

  14. Perera L, Russell JR, Provan J, Powell W (1999) Identification and characterization of microsatellites in coconut (Cocos nucifera L.) and the analysis of coconut population in Sri Lanka. Mol Ecol 8:344–346

  15. Perera L, Russell JR, Provan J, Powell W (2003) Studying genetic relationships among coconut varieties/populations using microsatellite markers. Euphytica 132:121–128

  16. Perera L, Russell JR, Provan J, Mcnicol WJ, Powell W (1998) Evaluating genetic relationships between indigenous coconut (Cocos nucifera L.) accessions from Sri Lanka by means of AFLP profiling. Theor Appl Genet 96(3):545–550

  17. Perera PIP, Hocher V, Verdeil JL, Yakandawala DMD, Weerakoon LK (2007) Recent advances in anther culture of coconut (Cocos nucifera L.). In: Xu Z, Li J, Xue Y, Yang W (eds) Biotechnology and sustainable agriculture 2006 and beyond. Springer, Dordrecht, p 451

  18. Philippine Statistics Authority. 2017. Agricultural foreign trade statistics of the Philippines: 2015.

  19. Philippine Statistics Authority. 2018. Selected Statistics on Agriculture 2013–2017.

  20. Powell W, Machray GC, Provan J (1996) Polymorphism revealed by simple sequence repeats. Trends Plant Sci 1(7):215–222

  21. Rivera R, Edwards KJ, Barker JHA, Arnold GM, Ayad G, Hodgkin T, Karp A (1999) Isolation and characterization of polymorphic microsatellites in Cocos nucifera L. Genome 42:668–675

  22. Rotmistrovsky K, Jang W, Schuler GD (2004) A web server for performing electronic PCR. Nucleic Acids Res 32(Web Server issue):W108–W112

  23. Sharma A, Namdeo AG, Mahadik KR (2008) Molecular markers: new prospects in plant genome analysis. Pharmacogn Rev 2(3):23–34

  24. Sindhumole P, Ambili SN (2011) Marker assisted breeding in coconut (Cocos nucifera L.). Gregor Mendel Foundation Proceedings 2011:30–32

  25. Teulat B, Aldam C, Trehin R, Lebrun P, Barker JHA, Arnold GM, Karp A, Baudouin L, Rognon F (2000) An analysis of genetic diversity in coconut (Cocos nucifera L.) populations from across the geographic range using sequence-tagged microsatellites (SSRs) and AFLPs. Theor Appl Genet 100:764–771

  26. Vavilov NI (1926) Centres of origin of cultivated plants. Bull Appl Bot Genet Plant Breed 16:1–248

  27. Wang X, Wang L (2016) GMATA: an integrated software package for genome-scale SSR mining, marker development and viewing. Front Plant Sci 7:1350

  28. Wu Y, Yaodong Y, Qadri R, Iqbal A, Li J, Fan H, Wu Y (2019) Development of SSR markers for coconut (Cocos nucifera L.) by selectively amplified microsatellite (SAM) and its applications. Trop Plant Biol 12(1):32–43

  29. Xia W, Xiao Y, Liu Z, Luo Y, Mason A, Haikuo F, Yang Y, Zhao S, Peng M (2014) Development of gene-based simple sequence repeat markers for association analysis in Cocos nucifera. Mol Breed 34(2):1–11

  30. Xiao Y, Luo Y, Yang Y, Fan H, Xia W, As M, Zhao S, Sager R, Qiao F (2013) Development of microsatellite markers in Cocos nucifera and their application in evaluating the level of genetic diversity of Cocos nucifera. Plant Omics J 6(3):193–200

  31. Xu Y, Crouch JH (2008) Marker-assisted selection in plant breeding: from publications to practice. Crop Sci 48:391–407


We express our gratitude to the Department of Science and Technology — Philippine Council for Agriculture, Aquatic and Natural Resources Research and Development (DOST-PCAARRD) for funding the project “QTL mapping in coconut for high yield outstanding quality of copra oil and other coconut major by-products” under the program “Improvement of Coconut Varieties through Genomics, Genetics, and Breeding for a Competitive and Sustainable Philippine Coconut Industry (Genomics-Assisted Molecular Breeding).” This research also has been made possible by the commitment and support of the Philippine Genome Center, and the Philippine Coconut Authority-Zamboanga Research Center (PCA-ZRC), which we also thank for providing the plant materials. We likewise acknowledge the valuable technical services rendered by Ms. Desiree Diaz for the completion of this work.


This work was funded by the Department of Science and Technology-Philippine Council for Agriculture, Aquatic and Natural Resources Research and Development (DOST-PCAARRD).

Author information

RESC and JC conducted the wet lab experiments and drafted the original manuscript. RESC and DVL conducted the bioinformatics analyses. RRG and ANCM supervised the wet lab experiments and confirmed the validation procedures. RLR provided the leaf samples for analysis. AOCS, HFG, and CER secured funding for the project. DVL, ANCM, and RRG conceptualized the hypothesis and methodology of the study. All authors have read and approved the final manuscript for publication.

Corresponding author

Correspondence to Anand Noel C. Manohar.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Caro, R.E.S., Cagayan, J., Gardoce, R.R. et al. Mining and validation of novel simple sequence repeat (SSR) markers derived from coconut (Cocos nucifera L.) genome assembly. J Genet Eng Biotechnol 20, 71 (2022).

  • Bioinformatics
  • Catigan green dwarf genome
  • Coconut (Cocos nucifera L.)
  • Marker-assisted breeding
  • SSRs