Pan-Arab Array
Overview
The Arab region consists of 22 countries and extends from the Atlantic Ocean in the west to the Arabian Sea in the east, and from the Mediterranean Sea in the north to the Indian Ocean in the southeast. The region spans more than 13.6 million km² and has a total population of more than 447 million (see Appendix 1). Although the populations of this region share many similarities in religion, history, and culture, they are highly diverse in their ethnic and linguistic groups, social structures, and historical identities [Seteney et al., 2013].
Despite the size of the Arab population and its high prevalence of genetic disorders [Tadmouri et al., 2014], the majority of reference panels and genetic studies have focused on European ancestries [Sirugo et al., 2019]. The lack of genetic studies in the Arab region is due mainly to the absence of national strategies targeting precision medicine, the cost of genomic sequencing and of the infrastructure needed to store and analyze the data, and the shortage of trained specialists in genetics, bioinformatics, and related fields. Qatar is among the pioneering countries in the region and has placed precision medicine at the heart of its health strategy. Through its national initiative, the Qatar Genome Program (QGP), the Qatar Precision Medicine Institute (QPMI) has so far contributed to the sequencing of more than 14,669 whole genomes from the Qatari population and over 2,946 whole genomes from 19 other Arab countries (Algeria, Bahrain, Djibouti, Egypt, Iraq, Jordan, Kuwait, Lebanon, Mauritania, Morocco, State of Palestine, Oman, Saudi Arabia, Somalia, Sudan, South Sudan, Syria, Tunisia, United Arab Emirates, and Yemen). The QGP dataset is the largest existing genomic dataset for the Arab and Middle Eastern regions. It contributes to understanding the genetic structure of these populations, elucidating the genetic factors underlying various diseases, and laying the groundwork for precision medicine in the region. It has enabled the creation of efficient population-specific tools for genotyping and imputation, which can boost genetic research in the region by offering cost-effective and accurate tools for the genetic characterization of Arab genomes specifically and Middle Eastern genomes generally [Rodriguez-Flores et al., 2021] [Razali et al., 2021].
An efficient and cost-effective alternative to whole genome sequencing is whole genome genotyping, which analyzes human DNA using genome-wide genotyping arrays together with an imputation reference panel. Whole genome genotyping identifies most of the common single nucleotide variants (SNVs) present in a population. Imputation then infers the SNVs not directly assayed by the array by exploiting the correlation between SNVs (linkage disequilibrium) and the haplotype structure of the reference population.
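The idea behind imputation can be illustrated with a deliberately simplified sketch. It is not the method used by the project or by any production imputation tool (those typically rely on hidden Markov models over phased haplotypes); it only shows how alleles at array-typed positions can point to matching reference haplotypes, which then suggest the allele at an untyped position. The function name `impute_allele`, the toy panel `REFERENCE`, and the 0/1 allele coding are all assumptions made for illustration.

```python
from collections import Counter

def impute_allele(reference, typed_idx, observed, target_idx):
    """Guess the allele at an untyped SNV (target_idx) by majority vote
    among the reference haplotypes that best match the observed alleles
    at the typed SNVs. Returns (allele, fraction of votes)."""
    def score(hap):
        # Number of typed positions where the haplotype matches the sample.
        return sum(hap[i] == a for i, a in zip(typed_idx, observed))

    best = max(score(h) for h in reference)
    # Vote only among the best-matching reference haplotypes.
    votes = Counter(h[target_idx] for h in reference if score(h) == best)
    allele, count = votes.most_common(1)[0]
    return allele, count / sum(votes.values())

# Hypothetical reference panel: 6 haplotypes x 4 SNVs
# (0 = reference allele, 1 = alternate allele).
REFERENCE = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 0, 1),
    (1, 1, 1, 1),
    (0, 1, 1, 0),
]

if __name__ == "__main__":
    # The array assays SNVs 0 and 3; SNVs 1 and 2 are imputed.
    print(impute_allele(REFERENCE, [0, 3], [1, 1], target_idx=1))
    print(impute_allele(REFERENCE, [0, 3], [1, 1], target_idx=2))
```

In this toy panel, SNV 1 is imputed with full agreement among the matching haplotypes, while SNV 2 is less certain because the matching haplotypes disagree. The same logic explains why a reference panel that lacks the haplotypes of the target population tends to impute poorly.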
Most contemporary commercial genome-wide genotyping arrays are based on genetic variation available in the public domain, such as the International HapMap consortium project and the 1000 Genomes Project, neither of which samples genetic variation in Arab or Middle Eastern populations. Whole genome genotyping using commercial arrays and publicly available reference panels has been shown to be less effective for the Qatari population than genotyping with a population-specific reference panel [Razali et al., 2021].
The main aim of the project is to use the QGP reference panel of Arab and Middle Eastern whole genome data to create a grid of variants that maximizes imputation power across the whole genome, or across targeted regions of the genome, in these populations. This whole genome grid will guide the custom design of an array that, used together with an imputation reference panel, will offer a cost-effective whole genome genotyping service for Arab and Middle Eastern populations.
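One common way to approach this kind of grid design is greedy tag-SNV selection based on pairwise linkage disequilibrium. The sketch below is an assumption about how such a selection could look, not the project's actual design procedure: it computes r² between SNVs from phased haplotypes and repeatedly picks the SNV that captures the most not-yet-captured SNVs at a chosen threshold. The names `r_squared`, `greedy_tags`, `PANEL`, and the threshold of 0.8 are illustrative choices.

```python
def r_squared(x, y):
    """Squared correlation (r^2) between two biallelic SNVs, each given
    as a list of 0/1 alleles across the same haplotypes."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:  # a monomorphic SNV carries no LD information
        return 0.0
    d = pxy - px * py
    return d * d / denom

def greedy_tags(haplotypes, threshold=0.8):
    """Greedily choose tag SNVs so that every SNV is either a tag or
    has r^2 >= threshold with at least one tag."""
    n_snvs = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snvs)]
    # covers[j]: SNVs captured if j is placed on the array (j itself included).
    covers = [
        {k for k in range(n_snvs)
         if k == j or r_squared(cols[j], cols[k]) >= threshold}
        for j in range(n_snvs)
    ]
    untagged, tags = set(range(n_snvs)), []
    while untagged:
        # Pick the SNV capturing the most untagged SNVs; lowest index wins ties.
        best = max(range(n_snvs), key=lambda j: (len(covers[j] & untagged), -j))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Hypothetical panel: 6 haplotypes x 5 SNVs forming two LD blocks.
PANEL = [
    (0, 0, 1, 0, 0),
    (0, 0, 1, 1, 1),
    (1, 1, 0, 0, 0),
    (1, 1, 0, 1, 1),
    (1, 1, 0, 0, 0),
    (0, 0, 1, 1, 1),
]

if __name__ == "__main__":
    print(greedy_tags(PANEL))
```

Here two tags are enough to capture all five SNVs, since SNVs 0 to 2 and SNVs 3 to 4 each form a block in perfect LD. A real design would additionally have to weigh allele frequencies across the different populations, probe performance on the array platform, and measured imputation accuracy rather than pairwise r² alone.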
The project will analyze the genome structure of these populations to inform the best design and imputation approach for population-specific whole genome genotyping arrays. Thermo Fisher Scientific, a leading company in array design and manufacturing, will be the partner producing the array. The project also aims to build the infrastructure needed to impute array data, as it becomes available, using the QGP reference panel that will be developed in this project.
The Arab region includes Algeria, Bahrain, Comoros, Djibouti, Egypt, Iraq, Jordan, Kuwait, Lebanon, Libya, Mauritania, Morocco, Palestine, Oman, Qatar, Saudi Arabia, Somalia, Sudan, Syria, Tunisia, the United Arab Emirates, and Yemen.