Contamination of groundwater with naturally occurring arsenic (As) poses a health risk to millions of Americans and tens of millions of people worldwide. Because more than two billion people rely on groundwater as their source of drinking water, it is important to identify and predict where contamination occurs. Long-term exposure to high levels of As has been linked to a range of diseases and to increased mortality. Similarly, long-term exposure to manganese (Mn) in drinking water impairs cognitive function in children and is associated with problems affecting memory, attention, and motor skills. A crucial limitation of the current literature is the inability to predict groundwater As or Mn levels at the scales required to make sound decisions on where to site groundwater wells. Previous studies have identified processes of As and Mn release at individual sites, but their findings do not transfer to the global scale. To address these issues, I synthesized a database of groundwater geochemistry and associated hydrological and geologic variables. I aggregated more than one hundred datasets drawn from peer-reviewed publications as well as government and NGO sources. Together, these datasets cover a large portion of the world, with a heavy focus on Southeast Asia and the United States. In addition to As and Mn concentrations, the database contains geospatial, geologic, hydrological, and environmental parameters (e.g. well depth, lithology, water table depth). To integrate data from a range of studies, formats, and reporting approaches, I established a uniform set of data handling and reporting standards and incorporated these into a reproducible data aggregation workflow, which includes instructions on how to maintain and organize geospatial, hydrological, and similar data. To my knowledge, this is the largest global database (n ≈ 1,000,000 with 250 parameters) related to groundwater geochemistry.
I then used this database and a parsimonious set of remotely sensed flooding variables as predictors to develop machine learning models that predict groundwater As and Mn concentrations in Southeast Asia. These models accurately identify whether a location is safe for a drinking water well and produce few of the erroneous predictions that would pose a public health threat.