Short tandem repeat (STR) variation from 6 cities in Iraq based on 15 loci

Abstract

Background

One thousand sixty-one individuals were sampled from the cities of Anbar, Baghdad, Basra, Diyala, Najaf, and Wasit in Iraq and typed for 15 forensic STRs to explore the genetic structure of Iraq and develop a forensic DNA database. The total number of alleles that were identified was 203.

Results

Analyses of molecular variance (AMOVA) were conducted: Baghdad provides a good representation of the rest of the country, while Anbar is the most genetically distinct. Across these loci, the average heterozygosity was 0.779, homozygosity was 0.221, polymorphism information content was 0.77, power of discrimination was 0.927, and power of exclusion was 0.563. At these loci, a matching genotype will occur, on average, in 1 in 8.152 × 10^17 individuals. For paternity tests, the average paternity probability for a matching profile is 99.9997%.

Conclusions

These loci are appropriate for use in forensic and paternity testing for this population. Iraq is similar to other countries in the Middle East, particularly Iran and Turkey, and is more similar to Europe than to either Asia or Africa.

Background

Short tandem repeats (STRs), also known as microsatellites, are repeated sequences of DNA, usually consisting of 2 to 6 bases. They are highly polymorphic and distributed throughout the genome. Because these sequences vary in the number of repeats, they are multiallelic [1]. Consequently, STRs are very useful for forensic and paternity testing when multiple STR loci are employed [2]. The distribution of allele frequencies within the tested population enables the calculation of match probabilities. The objective of this study was to report allele frequencies for 15 autosomal STRs in the Iraqi population, facilitating their application in forensic and paternity investigations.

The population of Iraq was estimated at 39 million in July 2017 [3]. Arabs constitute 75–80% of the population, while Kurds make up 15–20% [3, 4]. Other minorities, including Turkmen, Assyrian, Shabak, and Yazidi, are also present [3]. Samples were collected from six Iraqi cities (Diyala, Anbar, Wasit, Najaf, Baghdad, and Basra) and subjected to STR profiling. Baghdad, Diyala, and Najaf are situated in the center of Iraq; Wasit is in the east; Basra is in the southeast; and Anbar is in the west [4].
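The step from allele frequencies to a match probability can be illustrated with a short sketch. It assumes Hardy-Weinberg proportions (p^2 for a homozygote, 2pq for a heterozygote) and independence between loci (the product rule), with no correction for population substructure. The three allele frequencies are those quoted in the Results of this article; the two-locus profile itself and the function name are hypothetical and chosen only for illustration.

```python
from typing import Optional


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Expected genotype frequency under Hardy-Weinberg proportions.

    One allele frequency means a homozygote (p^2);
    two allele frequencies mean a heterozygote (2pq).
    """
    if q is None:
        return p * p
    return 2.0 * p * q


# Allele frequencies quoted in the Results of this article.
tpox_8 = 0.510
csf1po_11 = 0.307
csf1po_12 = 0.329

# Hypothetical two-locus profile: TPOX 8/8 and CSF1PO 11/12.
tpox_8_8 = genotype_frequency(tpox_8)
csf1po_11_12 = genotype_frequency(csf1po_11, csf1po_12)

# Product rule: multiply genotype frequencies across independent loci.
profile_frequency = tpox_8_8 * csf1po_11_12

print(f"TPOX 8/8:          {tpox_8_8:.4f}")
print(f"CSF1PO 11/12:      {csf1po_11_12:.4f}")
print(f"Two-locus profile: {profile_frequency:.4f}")
```

Even with the two most common genotypes at two of the least variable loci, the profile frequency is already close to 1 in 19, which is why multiplying over 15 loci yields very small match probabilities.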

Methods

Ethics committee approval and patient approval

The collection of samples used in the research (blood, buccal swab, saliva, and fingernails) was approved by the Center’s Scientific Research Ethics Committee before work began. All volunteers signed a written consent form to participate in this study.

Sample collection

A total of 1061 unrelated individuals were sampled from six Iraqi cities: Diyala (n = 139), Anbar (n = 132), Wasit (n = 120), Najaf (n = 119), Baghdad (n = 354), and Basra (n = 198). The samples were collected as buccal swabs from patients and laboratory workers at private laboratories and were used to study population genetic diversity in Iraq, which is of interest because of the country's ethnic diversity. Several earlier studies have analyzed the genetic diversity of Iraqi populations, including studies on the distribution of Y chromosome haplotypes, 23 Y-STR markers, and 15 STRs. These studies aimed to analyze the genetic structure of different cities in Iraq and to create a forensic DNA database for the country.

DNA extraction

Samples were extracted using a PrepFiler Forensic DNA Extraction Kit (Applied Biosystems, Foster City, CA), and their DNA content was quantified with a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE).

PCR amplification

Fifteen autosomal STR markers (the 13 CODIS core loci plus D19S433 and D2S1338) were genotyped along with the amelogenin locus on the X and Y chromosomes using the Applied Biosystems AmpFℓSTR® Identifiler™ kit (3). Approximately 1 ng of template DNA was amplified for each sample following the protocols described in the user’s manual (Applied Biosystems). The samples were amplified with an Applied Biosystems Veriti® PCR System (Applied Biosystems).

DNA typing

Amplification products were diluted 1:15 in Hi-Di™ formamide and GS500-LIZ internal size standard (Applied Biosystems) and analyzed on the 16-capillary ABI Prism® 3100 Genetic Analyzer. POP-4™ polymer (Applied Biosystems) on a 36-cm array was used for higher-resolution separations.

Data collection

Data collection was performed with Data Collection v. 2.0 software (Applied Biosystems), and samples were analyzed with GeneMapper v. 3.2 software (Applied Biosystems) at the Forensic DNA Center for Research and Training of Al-Nahrain University.

Statistical analyses

Allele frequencies, heterozygosity, homozygosity, polymorphism information content (PIC), matching probability (MP), power of discrimination (PD), power of exclusion (PE), and typical paternity index (TPI) were calculated at each locus using PowerStats v1.2 [5].

PIC measures the informativeness of a genetic marker, indicating how well it can distinguish between different alleles in a population:

$PIC = 1 - \sum_i p_i^2 - \sum_{i<j} 2 p_i^2 p_j^2$

Here $p_i$ and $p_j$ are the frequencies of the i-th and j-th alleles at the locus, and the second sum runs over all possible pairwise allele combinations. PIC ranges from 0 to 1, with higher values indicating greater allelic diversity and informativeness of the marker.

MP is the probability that two individuals chosen at random will have the same genotype at the locus:

$MP = \sum_k G_k^2$

where $G_k$ is the frequency of the k-th genotype.

PD measures how well a genetic marker can discriminate between individuals within a population, that is, the probability that two individuals randomly chosen from a population will have different genotypes at a specific locus:

$PD = 1 - MP$

PD also ranges from 0 to 1, with higher values indicating better discriminatory power.

PE measures the probability that a randomly chosen man who is not the biological father is excluded in a paternity test:

$PE = h^2 (1 - 2 h H^2)$

where $h$ and $H$ are the proportions of heterozygotes and homozygotes at the locus. PE also ranges from 0 to 1, with higher values indicating a higher probability of exclusion.

TPI was calculated from the proportion of homozygotes as $TPI = 1 / (2H)$.

The significance level for the HWE test was set at α = 0.05, and p values were calculated using Monte Carlo methods with 100,000 permutations. The Holm-Bonferroni method [6] was used to account for multiple testing. Multidimensional scaling (MDS) was done using the “stats” package (R Core Team, 2016) to compare the Iraqi population to other Middle Eastern countries as well as countries from Europe, Asia, and Africa [7].
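The per-locus statistics described in this section can be computed directly from a list of genotypes. The sketch below is a minimal illustration, not the PowerStats implementation: it assumes the commonly used definitions (PIC after Botstein, MP as the sum of squared genotype frequencies, PD = 1 - MP, PE = h^2(1 - 2hH^2), TPI = 1/(2H)) and numeric allele labels. The function name and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def forensic_parameters(genotypes):
    """Per-locus statistics from a list of (allele_a, allele_b) genotypes."""
    n = len(genotypes)
    # Sort each pair so that (11, 8) and (8, 11) count as the same genotype.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)

    # Allele frequencies: each individual contributes two alleles.
    freqs = [count / (2 * n) for count in allele_counts.values()]
    het = sum(c for (a, b), c in genotype_counts.items() if a != b) / n
    hom = 1.0 - het

    pic = (1.0 - sum(p * p for p in freqs)
           - sum(2.0 * p * p * q * q for p, q in combinations(freqs, 2)))
    mp = sum((c / n) ** 2 for c in genotype_counts.values())
    pe = het ** 2 * (1.0 - 2.0 * het * hom ** 2)
    # TPI is undefined when no homozygotes are observed; report infinity.
    tpi = float("inf") if hom == 0 else 1.0 / (2.0 * hom)
    return {"heterozygosity": het, "homozygosity": hom, "PIC": pic,
            "MP": mp, "PD": 1.0 - mp, "PE": pe, "TPI": tpi}


# Hypothetical toy locus with two alleles (8 and 11) typed in four people.
toy = [(8, 8), (8, 11), (11, 8), (11, 11)]
for name, value in forensic_parameters(toy).items():
    print(f"{name}: {value:.4f}")
```

With real data the function would be called once per locus, and the per-locus results averaged or multiplied as described in the Results.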

Results

A total of 203 alleles were identified in this study. Their distribution across the 15 STR loci is detailed in Table 1. The most frequent alleles were allele 8 of TPOX (51.0%), allele 12 of CSF1PO (32.9%), allele 12 of D5S818 (31.9%), allele 12 of D5S818 (30.8%), allele 11 of CSF1PO (30.7%), allele 11 of D16S539 (30.6%), and allele 12 of D13S317 (30.4%). The most diverse loci in terms of the number of distinct alleles were FGA with 23 alleles, D18S51 with 20 alleles, D2S1338 with 19 alleles, D19S433 with 17 alleles, and D21S11 with 17 alleles.

Table 1 Allele frequencies for STR loci D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA

Baghdad had the largest sample size, with 354 individuals. Five alleles were found only in Baghdad and not in the other Iraqi cities or neighboring countries. Three of these five alleles were absent from the other five Iraqi cities and from all countries used for comparison, including Turkey, Iran [8], Syria [7], Kuwait [9], Saudi Arabia [10], Poland, Belgium [11], China [12], Japan [13], Equatorial Guinea, and Angola [14] (Fig. 1). These were allele 8 at vWA (found in Baghdad at 0.3%), allele 25 at vWA (0.1%), and allele 16 at FGA (0.3%). Baghdad also exhibited two other rare alleles: allele 11 at vWA (0.3%) and allele 15 at FGA (0.1%). These two alleles were absent from the other Iraqi cities, the Middle Eastern nations, and the European and Asian countries used for comparison [15], but they were present in the African countries used for comparison, Equatorial Guinea and Angola [16, 17]. Allele 11 at vWA was observed at a frequency of 3.5% in Equatorial Guinea and 0.4% in Angola, while allele 15 at FGA was found in Equatorial Guinea at 0.4%.

Fig. 1

Multidimensional scaling plot of the genetic distances between Iraq, Turkey, Iran, Syria, Kuwait, Saudi Arabia, Poland, Belgium, China, Japan, Equatorial Guinea, and Angola
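The multidimensional scaling behind a plot of this kind can be sketched briefly. The example below is a minimal NumPy illustration of classical (metric) MDS, the technique provided by cmdscale in R's "stats" package; whether the authors used that particular function, and which genetic distance they supplied to it, is not stated, so both are assumptions here. The distance matrix and the population labels A, B, and C are hypothetical.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points so that Euclidean distances approximate `distances`."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigenvalues, eigenvectors = np.linalg.eigh(b)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    scale = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return eigenvectors[:, order] * scale


# Hypothetical pairwise distances between three populations A, B, and C.
labels = ["A", "B", "C"]
distance_matrix = [[0.0, 3.0, 4.0],
                   [3.0, 0.0, 1.0],
                   [4.0, 1.0, 0.0]]

coordinates = classical_mds(distance_matrix)
for label, (x, y) in zip(labels, coordinates):
    print(f"{label}: ({x:.3f}, {y:.3f})")
```

Populations that are genetically close end up near each other in the two-dimensional plot; the axes themselves carry no direct biological meaning, and their signs are arbitrary.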

An additional rare allele, allele 22 at D13S317, was identified in the city of Wasit, with a frequency of 0.4%, but was not observed in any other Iraqi cities or countries in the comparison set [7].

To assess whether the observed genotype frequencies matched those expected, each locus was tested for Hardy-Weinberg equilibrium (HWE) [18]. Table 2 displays the log-likelihood ratio p values for the HWE test [19, 20]; after Holm-Bonferroni adjustment for multiple testing, none were significant [21].
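A Monte Carlo test of HWE of the kind described in the Methods can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes a log-likelihood ratio (G) statistic, builds the null distribution by shuffling the pooled alleles and re-pairing them into genotypes, and uses far fewer permutations than the 100,000 reported. The function names, the seed, and the toy genotypes are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(genotypes):
    """Log-likelihood ratio statistic comparing observed and HWE-expected counts."""
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    total = 0.0
    for (a, b), obs in observed.items():
        pa = allele_counts[a] / (2 * n)
        pb = allele_counts[b] / (2 * n)
        expected = n * (pa * pa if a == b else 2.0 * pa * pb)
        total += 2.0 * obs * math.log(obs / expected)
    return total


def hwe_permutation_p(genotypes, n_permutations=10_000, seed=0):
    """Monte Carlo p value: share of shuffled samples at least as extreme."""
    rng = random.Random(seed)
    observed_g = g_statistic(genotypes)
    alleles = [allele for g in genotypes for allele in g]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(alleles)
        # Re-pair consecutive alleles into random genotypes.
        shuffled = list(zip(alleles[0::2], alleles[1::2]))
        # Small tolerance guards against floating-point ties.
        if g_statistic(shuffled) >= observed_g - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p value is never zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical sample with an excess of homozygotes at a two-allele locus.
sample = [(8, 8)] * 20 + [(11, 11)] * 20
print(f"p = {hwe_permutation_p(sample, n_permutations=2000):.4f}")
```

Shuffling alleles keeps the allele frequencies fixed, so only the way alleles are paired into genotypes varies between permutations, which is exactly what HWE makes a statement about.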

Table 2 Hardy-Weinberg equilibrium p values for each city at each locus
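The Holm-Bonferroni adjustment applied to p values such as those in Table 2 can be sketched as a step-down procedure: the smallest p value is compared with α/m, the next with α/(m - 1), and so on, stopping at the first comparison that fails. The p values in the example are hypothetical and are not taken from Table 2.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p value: True where the null is rejected."""
    m = len(p_values)
    # Indices of the p values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        # The k-th smallest p value (rank = k - 1) is tested at alpha / (m - rank).
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break  # step-down: every larger p value is also retained
    return reject


# Hypothetical p values for four tests.
example = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(example))
```

Compared with a plain Bonferroni correction, which tests every p value at α/m, the Holm procedure controls the same family-wise error rate while rejecting at least as many hypotheses.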

Details concerning each locus are provided in Table 3, encompassing heterozygosities (He), homozygosities (Ho), polymorphism information content (PIC), matching probabilities (MP), powers of discrimination (PD), powers of exclusion (PE), and the typical paternity index (TPI). Except for TPI, all values fall within a range of 0.0 to 1.0. For heterozygosity, 0.0 denotes the absence of heterozygotes and 1.0 indicates full heterozygosity across the sampled individuals; homozygosity is its complement. Loci suited to forensic and paternity analyses are characterized by high He, low Ho, and high PIC, PD, and PE values. The average values for the 15 loci are He = 0.791, Ho = 0.209, PIC = 0.72, PD = 0.923, and PE = 0.587. The composite metrics, composite matching probability (CMP) and composite paternity index (CPI), are obtained by multiplying the per-locus MP values and the per-locus TPI values, respectively.
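The composite metrics are plain products over loci, which a short sketch makes concrete. The per-locus values below are hypothetical and cover only three loci; they are not taken from Table 3, and the variable names are illustrative.

```python
import math

# Hypothetical per-locus values for three loci (not taken from Table 3).
matching_probabilities = [0.10, 0.05, 0.20]
typical_paternity_indices = [2.0, 1.5, 3.0]

# Composite matching probability: product of the per-locus MP values.
cmp_value = math.prod(matching_probabilities)

# Composite paternity index: product of the per-locus TPI values.
cpi_value = math.prod(typical_paternity_indices)

print(f"CMP = {cmp_value:.6f} (about 1 in {1.0 / cmp_value:,.0f})")
print(f"CPI = {cpi_value:.2f}")
```

Multiplying across loci assumes that the loci are statistically independent, which is the usual working assumption for these unlinked autosomal markers.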

Table 3 Forensic efficiency parameters for 15 STR loci (1061 samples) included matching probabilities, powers of discrimination, polymorphism information content, powers of exclusion, a typical paternity index, homozygosities, and heterozygosities

STR studies of Iraqi populations have been limited, and those that have been conducted used different commercial STR kits, which complicates direct comparison with the results of the current study. Nevertheless, the 15 STR markers overlap between those studies and studies of neighboring countries, as outlined in Table 4.

Table 4 Comparison p values of HWE for STRs data for Arab-related populations

Among these 15 common STR markers, D2S1338 had the highest polymorphism information content (PIC) in this study, whereas the highest PIC values were found at FGA among Jordanians, D19S433 in Turkey, D18S51 and FGA among Palestinians [25], and D19S433 and FGA in Saudi Arabia [27]. Conversely, TPOX had the lowest PIC in this study. D13S317 had the lowest PIC among Jordanians [22], while TPOX had the lowest in Turkey [23], among Palestinians, in Saudi Arabia [27], and in Iran [26].

Discussion

In forensic investigations, the CMP is frequently expressed as the likelihood that one individual in a population of a given size possesses a genotype matching the profile. With a global population of approximately 7.5 billion, the CMP can be interpreted as 1 in 7.5 billion or rarer once the mean MP (taken as a geometric mean) across the 15 loci is about 0.22 or lower; the stricter threshold of 0.173 applies to a 13-locus set. The dataset's mean MP across the 15 loci is 0.073. In paternity reports, the probability of paternity is calculated as CPI/(CPI + 1), so a CPI of 100 or higher corresponds to a probability of 99.0% or higher. Reaching a CPI of 100 requires a mean TPI of about 1.36 with 15 loci (1.43 with 13 loci); the mean TPI over the 15 loci in this study was 2.36 [20].
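The thresholds in this paragraph follow from taking roots of the composite targets, and the sketch below reproduces that arithmetic. It assumes that the "average" per-locus value is a geometric mean (so that raising it to the number of loci gives the composite) and that the probability of paternity uses a prior of 0.5. The function names are hypothetical. Under these assumptions, 15 loci give thresholds of roughly 0.22 for MP and 1.36 for TPI, while 13 loci give roughly 0.174 and 1.43.

```python
def max_mean_mp(population_size: float, n_loci: int) -> float:
    """Largest geometric-mean MP for which CMP <= 1 / population_size."""
    return (1.0 / population_size) ** (1.0 / n_loci)


def min_mean_tpi(target_cpi: float, n_loci: int) -> float:
    """Smallest geometric-mean TPI for which CPI >= target_cpi."""
    return target_cpi ** (1.0 / n_loci)


def probability_of_paternity(cpi: float) -> float:
    """Probability of paternity from a CPI, assuming a prior of 0.5."""
    return cpi / (cpi + 1.0)


world_population = 7.5e9
for n_loci in (13, 15):
    print(f"{n_loci} loci: mean MP <= {max_mean_mp(world_population, n_loci):.3f}, "
          f"mean TPI >= {min_mean_tpi(100.0, n_loci):.2f}")

print(f"CPI of 100 -> probability {probability_of_paternity(100.0):.4f}")

# Rough composites implied by the reported means over 15 loci.
print(f"0.073 ** 15 = {0.073 ** 15:.3e}")
print(f"2.36 ** 15  = {2.36 ** 15:.3e}")
```

The last two lines only approximate the composite values, because the product of 15 different per-locus values generally differs from the arithmetic mean raised to the 15th power.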

The findings of this investigation highlight the utility of these 15 autosomal STR loci as valuable markers for forensic and paternity testing within the Iraqi population. Within certain loci, the presence of a wide range of repeated sequences suggests possible population admixture dynamics. To elucidate the origins of the less common alleles discovered in this study, it may be beneficial to conduct further allele frequency analyses within specific ethnic groups in Iraq.

Lastly, the comparative evaluation of forensic genetic efficiency parameters across diverse populations, including the Middle Eastern region, is vital for comprehending genetic diversity and potential intermingling among these populations. To achieve a more accurate comparison, it is strongly recommended to conduct comparative studies employing the same STR kit across these populations, considering that different STR kits have been used previously to define forensic efficiency parameters.

Consistency and variation

Within each population, some alleles have consistent frequencies across populations, while others vary more. This variation could be due to a variety of factors, including historical migrations, genetic drift, and natural selection.

Genetic diversity

The range of allele frequencies across the different populations indicates the genetic diversity present in these regions. Higher diversity might be indicative of a more mixed or heterogeneous population.

The findings from this study highlight the effectiveness of the 15 autosomal STR loci as markers for forensics and paternity testing within the Iraqi population. It is worth noting that certain loci exhibit a wide range of repeats, which may be indicative of population admixture. To gain insight into the origins of the less common alleles identified in this study, further investigations should focus on analyzing allele frequencies within specific ethnic groups in Iraq.

A comprehensive assessment of forensic genetic efficiency parameters is crucial for understanding genetic diversity and admixture among different populations, including those in the Middle East. To ensure a robust comparison, it is strongly recommended that comparative studies utilize the same STR kit across these populations. Previous studies have employed disparate STR kits, which has affected the description of forensic efficiency parameters.

Conclusion

This study highlights the effectiveness of 15 autosomal STR loci as markers for forensics and paternity testing within the Iraqi population. The CMP represents the likelihood that an individual in a specific population subset possesses a genotype matching a given profile. With an average matching probability (MP) of 0.073 across all 15 loci and a global population of 7.5 billion, the CMP can be interpreted as 1 in 7.5 billion or rarer. The presence of a wide range of repeated sequences within certain loci suggests possible population admixture dynamics. To understand the origins of less common alleles, further allele frequency analyses within specific ethnic groups in Iraq are recommended. Comparative evaluations of forensic genetic efficiency parameters across diverse populations, including those in the Middle East, are essential for understanding genetic diversity and potential intermingling among these populations, and using the same STR kit for these studies is crucial for accurate comparisons. The study also shows that some alleles have consistent frequencies across populations, while others exhibit more variation, which could be attributed to factors such as historical migrations, genetic drift, and natural selection.

Availability of data and materials

All data analyzed during this study are included in this article.

Abbreviations

STR: Short tandem repeat

AMOVA: Analysis of molecular variance

CODIS: Combined DNA Index System

PIC: Polymorphism information content

MP: Matching probability

PD: Power of discrimination

PE: Power of exclusion

PI: Paternity index

HWE: Hardy-Weinberg equilibrium

References

  1. Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5:435–445

  2. Awad EO, Alsafar H, Tay GK, Theyab JBJM, Mubasher M, Sheikh NE-E, AlHarthi H, Crawford MH, El Ghazali G (2015) Autosomal short tandem repeat (STR) variation based on 15 loci in a population from the central region (Riyadh Province) of Saudi Arabia. Forensic Res 6:1

  3. World Factbook (2017) Central intelligence agency, Washington, DC, p 2017. https://www.cia.gov/the-world-factbook/about/cover-gallery/2017-cover/

  4. Kirmanj S (2013) Identity and Nation in Iraq. Lynne Rienner Publishers, Inc: Boulder, CO

  5. Tereba A (1999) Tools for analysis of population statistics. Profiles in DNA 2(3):14–16 PowerStats version 1.2, Promega Corporation

  6. Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65–70 https://www.cia.gov/library/publications/the-world-factbook/geos/iz.html

  7. Abdin L, Shimada I, Brinkmann B, Hohoff C (2003) Analysis of 15 short tandem repeats reveals significant differences between the Arabian populations from Morocco and Syria. Leg Med 5:S150–S155

  8. Avati MK, Akbari MT (2018) Allele frequency of 15 autosomal short tandem repeat loci in iranian population with comparison to some other population. J Human Gen Genome 2(2):e87127

  9. Alenizi M, Goodwin W, Ismael S, Hadi S (2008) STR data for the AmpFℓSTR® Identifiler® loci in Kuwaiti population. Leg Med 10(6):321–325

  10. Al Idrissi E, Crawford MH, El Ghazali G (2015) Autosomal short tandem repeat (STR) variation based on 15 loci in a population from the central region (Riyadh Province) of Saudi Arabia. J Forensic Sci 6:267

  11. Decorte R, Gilissen A, Cassiman JJ (2003) Allele frequency data for 15 STR loci (AMPFISTR®SGM plusTM and AmpFISTR® ProfilerTM) in the Belgian population. Int Congr Ser 1239:219–222

  12. Wang ZY, Yu RJ, Wang F, Li XS, Jin TB (2005) Genetic polymorphisms of 15 STR loci in Han population from Shaanxi (NW China). Forensic Sci Int 147(1):89–91

  13. Hashiyada M (2000) Short tandem repeat analysis in Japanese population. Electrophoresis 21(2):347–350

  14. Babiker HMA, Schlebusch CM, Hassan HY, Jakobsson M (2011) Genetic variation and population structure of Sudanese populations as indicated by 15 Identifiler sequence-tagged repeat (STR) loci. Investig Genet 2(12)

  15. Szczerkowska Z, Kapińska E, Wycocka J, Cybulska L (2004) Northern Polish population data and forensic usefulness of 15 autosomal STR loci. Forensic Sci Int 144(1):69–71

  16. Alves C, Gusmão L, López-Parra AM, Mesa MS, Amorim A, Arroro-Pardo E (2005) STR allelic frequencies for an African population sample (Equatorial Guinea) using AmpFISTR Identifiler and Powerplex 16 kits. Forensic Sci Int 148(2-3):239–242

  17. Beleza S, Alves C, Reis F, Amorim A, Carracedo A, Gusmão L (2005) 17 STR data (AmpF/STR Identifiler and Powerplex 16 System) from Cabinda (Angola). Forensic Sci Int 141(2-3):193–196

  18. Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49–50

  19. Castle WE (1903) The laws of Galton and Mendel and some laws governing race improvement by selection. Proc Am Acad Arts Sci 35:233–242

  20. Edwards AW, Hardy GH (2008) Anecdotal, Historical and Critical Commentaries on Genetics. Genetics Society of America 179:1143–1150

  21. Emel HY, Yonar FC, Karatas O, Rayimoglu G, Engels WR (2009) Exact tests for Hardy-Weinberg proportions. Genetics 183(4):1431–1441 https://CRAN.R-project.org/package=HWxtest

  22. Yasin SR, Hamad MM, Elkarmi AZ, JaranJ AS (2005) African Jordanian population genetic database on fifteen short tandem repeat genetic loci. Croat Med J 46(4):587–92

  23. Asicioglu F, Canpolat E, Ozturk O, Erkan I (2023) Population Data and Internal Validation of the 21 Short Tandem Repeat Loci in Turkish Population. Pak J Zool 55(2):501–512

  24. Yavuz I, Sarikaya AT (2005) Turkish Population Data for 15 STR Loci by Multiplex PCR. J Forensic Sci 50(3):737–738

  25. Halimah MSA (2009) Genetic Variation of 15 Autosomal Short Tandem Repeat (STR) Loci in the Palestinian Population of Gaza Strip. Leg Med 11(4):203–204

  26. Shepard EM, Herrera RJ (2006) Iranian STR variation at the fringes of biogeographical demarcation. Forensic Sci Int 158(2-3):140–148

  27. Osman AE, Alsafar H, Tay GK, JBJM T, Mubasher M, Eltayeb-El Sheikh N, Al Harthi H, R Core Team (2016) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria https://www.R-project.org/

Acknowledgements

We extend our thanks to everyone who agreed to give a sample and contribute to the completion of this research. We would also like to thank Al-Nahrain University and the Forensic DNA Center, represented by the director of the center, who gave his approval for this research. Thanks are also extended to the researchers who participated in the completion of this research.

Funding

Not applicable.

Author information

Contributions

Mohammed M. Al-Zubaidi was responsible for the design, supervision, and preparation of the manuscript. Majeed A. Sabbah contributed to the planning, sampling, and statistical analysis. Thooalnoon Y. Al‑janabi reviewed and edited the manuscript. Dhuha S. Namaa, Haider K. Al‑rubai, and Hala K. Ibrahem were responsible for sample collection, demographic data, and lab work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mohammed M. Al-Zubaidi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sabbah, M.A., Al-Zubaidi, M.M., Al-janabi, T.Y. et al. Short tandem repeat (STR) variation from 6 cities in Iraq based on 15 loci. J Genet Eng Biotechnol 21, 160 (2023). https://doi.org/10.1186/s43141-023-00570-1
