Background: Despite being underreported, orofacial cleft lip/palate (CLP) remains one of the five most common congenital disorders in South Africa. Maternal exposure to air pollution has been associated with CLP in neonates. South Africa has high air pollution levels from domestic burning, coal-fired power plants, mining, industry, and traffic, among other sources. We investigated air pollutant levels in the geographic locations of CLP cases. Methods: In a retrospective case series study (2006–2020) using a combined dataset from a Gauteng surgeon and Operation Smile South Africa, the maternal address during pregnancy was obtained for 2,515 CLP cases. Data from the South African Air Quality Information System were used to calculate annual averages of particulate matter (PM) concentrations for particles < 10 µm (PM10) and < 2.5 µm (PM2.5). Correlation analysis was used to determine the relationship between average PM2.5/PM10 concentrations and CLP birth prevalence. Spatial clustering and hotspots were assessed using the Average Nearest Neighbor and Hot Spot Analysis tools in ArcGIS. Results: Correlation analysis showed a positive association between CLP birth prevalence and both PM10 (correlation coefficient [CC] = 0.61, 95% CI = 0.38–0.77, p < 0.001) and PM2.5 (CC = 0.63, 95% CI = 0.42–0.77, p < 0.001). Hot spot analysis revealed that maternal residences were concentrated in areas with higher PM10 and PM2.5 concentrations (z-score = –68.2, p < 0.001). CLP birth prevalence hotspot clusters were identified in district municipalities in Gauteng, Limpopo, North-West, Mpumalanga, and Free State provinces. KwaZulu-Natal and Eastern Cape had lower PM10 and PM2.5 concentrations and formed cold spot clusters. Conclusions: Maternal exposure to air pollution is known to affect the fetal environment and increase CLP risk. We found enough evidence of an effect to warrant further investigation.
We advocate a concerted effort by the government, physicians, researchers, non-governmental organizations working with CLP patients, and others to collect high-quality maternal data and pollutant measurements in all provinces of South Africa. Collaboration and data sharing for further research will improve our understanding of the impact of air pollution on CLP in South Africa.
Particulate matter (PM) data for particles with aerodynamic diameters of less than 2.5 µm (PM2.5) and 10 µm (PM10) for 2006–2020 were obtained from the South African Air Quality Information System (https://saaqis.environment.gov.za/) through scripted HTTP POST requests (a method for sending data to a web server over the Internet). Hourly average data were downloaded, filtered, and merged into comprehensive, continuous datasets covering the entire study period for each ambient air quality monitoring station and each of the two pollutants. The datasets were quality controlled in the web-based interface JupyterLab, accounting for negative values, missing data, and outliers. Annual averages were calculated using the 99th and 98th percentiles for PM10 and PM2.5, respectively, and only when data availability for a monitoring station exceeded 50%. This was done to match the temporal resolution of the health data and enable a direct correlation between PM concentrations and CLP birth prevalence. Although 50% data availability is generally considered low, the threshold for including a monitoring station's data was lowered to ensure wider geographical representation of ambient air quality. To give an overview of annual PM concentrations over the study period at the provincial level, descriptive statistics (mean, standard deviation, median, and interquartile range [IQR]) were calculated. The 50% data availability threshold and the use of provincial concentration averages are limitations of the study, because they introduce uncertainty where the data may not be representative owing to missing data or high spatial variability.

A retrospective cohort of patients with CLP for the period 2006–2020 was obtained from two databases and combined into one dataset. The first database consisted of the records of 4,804 patients treated at a hospital in Pretoria, Gauteng by a maxillofacial and oral surgeon.
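The station-level processing described above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the study's script: hourly records are assumed to arrive as `(datetime, value)` pairs, the outlier cap `upper_limit` is a hypothetical value, and the 99th/98th percentile is interpreted here as a percentile of daily mean concentrations for each station-year. The helper names (`clean`, `percentile`, `annual_statistic`, `describe`) are illustrative.

```python
import statistics
from collections import defaultdict
from datetime import date

# Percentile applied per pollutant, as stated in the text.
PERCENTILE = {"PM10": 99, "PM2.5": 98}
# Station-years with 50% or less valid hourly data are excluded.
MIN_AVAILABILITY = 0.5


def clean(value, upper_limit=1000.0):
    """Return a valid hourly concentration (µg/m³) or None.

    Missing and negative values are discarded. `upper_limit` is a
    hypothetical outlier cap, not a threshold taken from the study.
    """
    if value is None or value < 0 or value > upper_limit:
        return None
    return value


def percentile(values, q):
    """Linear-interpolation percentile (q in 0-100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def annual_statistic(hourly, pollutant, year):
    """Pollutant-specific percentile of daily means for one station-year.

    `hourly` is a list of (datetime, value) pairs. Returns None when valid
    hours cover 50% of the year or less.
    """
    hours_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days * 24
    daily = defaultdict(list)
    for ts, raw in hourly:
        value = clean(raw)
        if ts.year == year and value is not None:
            daily[ts.date()].append(value)
    valid_hours = sum(len(v) for v in daily.values())
    if valid_hours / hours_in_year <= MIN_AVAILABILITY:
        return None
    daily_means = [statistics.fmean(v) for v in daily.values()]
    return percentile(daily_means, PERCENTILE[pollutant])


def describe(values):
    """Mean, SD, median and IQR of annual station values in a province."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }
```

Applying `describe` to the annual values of all stations within a province gives the kind of provincial summary described in the text. Note that `statistics.quantiles` uses the exclusive method by default, so IQR values may differ slightly from those produced by other software.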
The maternal place of residence during pregnancy was extracted from the surgeon's patient database (a self-managed database that includes all the patients he treats). All patients were included regardless of age. The second database, provided by Operation Smile South Africa, comprised 485 individuals. Operation Smile is an international medical charity that raises funds to provide free surgery for children and young adults born with CLP. Diagnoses are confirmed at screening by medical practitioners, including pediatricians, nurses, anesthesiologists, and surgeons, all formally licensed, trained, and certified to work with patients at the mission site. For all cases in both databases, CLP was classified into eight categories: cleft lip (CL); cleft lip and cleft alveolus (CLA); cleft lip, cleft alveolus, hard palate cleft, and soft palate cleft (CLAP); hard palate cleft (hP); hard palate cleft and soft palate cleft (hpsP); soft palate cleft (sP); combination cleft (CL or CLA and sP without hP); and oblique cleft (involving the soft tissue and/or skeleton around the eye). Patients were included in our database if they were accompanied by their biological mother (aged 18 years or older) and the mother reported her place of residence (not necessarily her place of residence during pregnancy; this is discussed in the limitations). A total of 5,289 CLP cases were merged from the two datasets; however, only 2,515 could be geocoded because the maternal place of residence during pregnancy was missing for the remainder. Just over half of the CLP cases were in Gauteng province (52%), since the larger of the two databases came from a surgeon based in Gauteng (although 39% of his patients were from other provinces). Research ethics approval for the study was granted by the University of Pretoria Research Ethics Committee (NAS 142/2020 and NAS 334/2020). Data were first managed in Microsoft Office™ packages: Microsoft Excel™ and Microsoft Access™.
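The eight-category classification and the maternal inclusion rule can be expressed as a simple lookup on the cleft components present. This is a hypothetical sketch: the boolean inputs, the `periorbital` flag used for oblique clefts, and the record field names are illustrative, and combinations the scheme does not name (for example, lip plus hard palate only) are returned as "unclassified" rather than forced into a category.

```python
def classify_cleft(lip, alveolus, hard_palate, soft_palate, periorbital=False):
    """Map the cleft components present to one of the eight categories.

    Inputs are hypothetical boolean flags. An oblique cleft (involving the
    soft tissue and/or skeleton around the eye) takes precedence here.
    """
    if periorbital:
        return "oblique"
    categories = {
        (True, False, False, False): "CL",
        (True, True, False, False): "CLA",
        (True, True, True, True): "CLAP",
        (False, False, True, False): "hP",
        (False, False, True, True): "hpsP",
        (False, False, False, True): "sP",
        # CL or CLA together with a soft palate cleft but no hard palate cleft
        (True, False, False, True): "combination",
        (True, True, False, True): "combination",
    }
    return categories.get((lip, alveolus, hard_palate, soft_palate), "unclassified")


def include(record):
    """Inclusion rule from the text; field names are hypothetical.

    The patient must be accompanied by the biological mother, aged 18+,
    who reported a place of residence.
    """
    return (
        record.get("biological_mother_present") is True
        and (record.get("mother_age") or 0) >= 18
        and bool(record.get("residence"))
    )
```

Keeping the scheme as an explicit table makes unnamed combinations visible instead of silently assigning them to the nearest category.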
CLP cases were geocoded in ArcGIS 10.3 using the maternal place of residence and then aggregated to the district municipality level. Lifetime birth prevalence of CLP per district municipality was calculated per 1,000 live births, using yearly live births from Statistics South Africa for 2006–2020 (Stats SA 2020) as the denominator:

$$\text{Birth prevalence}_d = \frac{\text{CLP cases}_d}{\sum_{y=2006}^{2020} \text{Live births}_{d,y}} \times 1000$$

where d denotes the district municipality and y the year.

Correlation analysis, conducted in STATA version 15 [47], was used to determine the relationship between annual average PM2.5 and PM10 concentrations at a site and CLP birth prevalence at the district municipality level. Only PM2.5 and PM10 concentrations from air quality monitoring stations with more than 50% data availability were included in the analysis. Correlation coefficients (CC) are reported with their 95% confidence intervals (CI) and p-values, with statistical significance set at α = 0.05. The Average Nearest Neighbor tool in ArcGIS was used to measure the distribution of CLP cases, to determine whether cases were clustered or dispersed, and to identify possible patterns in clusters. The tool measures the distance between the centroid of each feature and the centroid of its nearest neighbor, averages these nearest-neighbor distances, and computes the ratio of the observed average distance to the expected average distance for a hypothetical random distribution. A ratio less than 1 indicates clustering; a ratio greater than 1 indicates a trend toward dispersion. The Hot Spot Analysis tool in ArcGIS 10.3 was used to identify statistically significant spatial clusters of high values (hot spots) and low values (cold spots) of CLP birth prevalence. The tool returns z-scores and p-values. Z-scores are expressed in standard deviations; very high or very low (negative) z-scores are associated with very small p-values and lie in the tails of the normal distribution.
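The prevalence calculation and the correlation step described above can be sketched as follows. The source does not state which correlation coefficient was computed in STATA; this sketch assumes Pearson's r with a Fisher z-transform confidence interval and an approximate normal-theory p-value, so its numbers may differ from STATA output. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def birth_prevalence(cases, live_births_by_year):
    """Lifetime CLP birth prevalence per 1,000 live births for one district.

    `live_births_by_year` maps each year (2006-2020) to live births.
    """
    return 1000 * cases / sum(live_births_by_year.values())


def pearson_with_ci(x, y, level=0.95):
    """Pearson r, Fisher-z confidence interval and approximate p-value.

    Assumes n > 3 and |r| < 1 (atanh is undefined at r = +/-1).
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r)          # Fisher transform of r
    se = 1 / math.sqrt(n - 3)  # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = math.tanh(z - crit * se), math.tanh(z + crit * se)
    p = 2 * (1 - NormalDist().cdf(abs(z) / se))  # normal approximation
    return r, lower, upper, p
```

Pairing each district's prevalence with the annual PM average of the monitoring sites it contains gives the `x` and `y` lists; how sites were matched to districts is not specified here and would need to follow the study's own procedure.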
For statistically significant positive z-scores, the larger the z-score, the more intense the clustering of high values (hot spot). For statistically significant negative z-scores, the smaller the z-score, the more intense the clustering of low values (cold spot). Hot and cold spots were assigned confidence levels of 90%, 95%, or 99% based on their z-scores.
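The two ArcGIS tools can be approximated in plain Python to show what they compute. This sketch assumes planar (projected) coordinates and a user-supplied study area for the nearest-neighbor test (ArcGIS defaults to the minimum enclosing rectangle); the expected distance 0.5 / sqrt(n / A) and standard error 0.26136 / sqrt(n² / A) follow the formulas in the ArcGIS documentation. For hot spots it uses the Getis-Ord Gi* statistic, which the Hot Spot Analysis tool reports, with binary fixed-distance weights, and bins z-scores at the conventional critical values 1.65, 1.96, and 2.58. The distance band and function names are illustrative assumptions, not the settings used in the study.

```python
import math


def average_nearest_neighbor(points, area):
    """Nearest-neighbor ratio and z-score for planar (x, y) points."""
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)      # random-pattern expectation
    se = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / se


def getis_ord_gi_star(points, values, band):
    """Gi* z-score per feature using binary fixed-distance weights.

    Each feature counts as its own neighbor (the "star" in Gi*). Fails if
    every feature neighbors every other or all values are equal.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for p in points:
        w = [1.0 if math.dist(p, q) <= band else 0.0 for q in points]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, values))
        sw2 = sum(wi * wi for wi in w)
        denom = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        scores.append((swx - mean * sw) / denom)
    return scores


def confidence_bin(z):
    """Label a z-score as a hot/cold spot at 99%, 95% or 90% confidence."""
    for cut, level in ((2.58, 99), (1.96, 95), (1.65, 90)):
        if abs(z) >= cut:
            return f"{'hot' if z > 0 else 'cold'} spot ({level}%)"
    return "not significant"
```

A ratio below 1 with a negative z-score from `average_nearest_neighbor` corresponds to the clustered pattern described in the text, while `confidence_bin` mirrors the 90%, 95%, and 99% confidence levels used to grade hot and cold spots.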