Detecting the sources of chemicals in the Black Sea using non-target screening and deep learning convolutional neural networks
The Black Sea is an important ecosystem affected by various anthropogenic pressures, such as shipping activities and wastewater inputs from large coastal cities. Major European rivers continuously bring in significant loads of chemical pollutants. This study investigated the spatial distribution of chemicals on the Ukrainian shelf (the northwestern part of the Black Sea) and their main sources. The chemical occurrence data used in the study were generated within the Joint Black Sea Surveys (JBSS, 2016, 2017), which took place as part of the EU/UNDP EMBLAS II project (www.emblasproject.org). During the JBSS, seawater samples were analyzed with a non-target screening workflow using a high-throughput liquid chromatography high-resolution mass spectrometry (LC-HRMS) method. Open-source algorithms were applied to generate a combined dataset of 30,489 detected chemical signals and their intensities. Of these, 35 compounds were tentatively identified with a non-target screening identification workflow based on automated matching of their mass spectra against available mass spectral libraries. The dataset was used to generate images representing the spatial distribution of each signal. These images were then used as input to a deep learning convolutional neural network classification model. The study resulted in an open-source end-to-end workflow for estimating the chemical pollution load contributed by the two major inflowing rivers (Danube and Dnieper) and by other, so far unidentified, sources. A dedicated dashboard was built to facilitate data visualization for each detected signal or compound. The presented model proved especially useful for prioritizing signals of unknown compounds, which is of key importance for follow-up structure elucidation efforts on large non-target screening datasets.
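As an illustration of the image-generation step, the following minimal sketch shows one way the intensities of a single signal measured at sampling stations could be turned into a gridded image suitable as classifier input. It is not the workflow used in the study: the interpolation method (inverse distance weighting), the function name `rasterize_signal`, the grid size, and the station coordinates and intensities are all assumptions made for illustration.

```python
import math


def rasterize_signal(stations, intensities, bounds, shape=(32, 32), power=2.0):
    """Interpolate station intensities onto a regular grid.

    Assumption: inverse distance weighting (IDW) is used here only as a
    simple stand-in; the source does not state the interpolation method.
    Values are scaled by the highest station intensity.
    """
    lon_min, lon_max, lat_min, lat_max = bounds
    rows, cols = shape
    peak = max(intensities) or 1.0  # avoid division by zero for empty signals
    image = []
    for r in range(rows):
        # Row 0 is the northern edge, so the image reads like a map.
        lat = lat_max - (lat_max - lat_min) * r / (rows - 1)
        row = []
        for c in range(cols):
            lon = lon_min + (lon_max - lon_min) * c / (cols - 1)
            num = den = 0.0
            for (s_lon, s_lat), value in zip(stations, intensities):
                d = math.hypot(lon - s_lon, lat - s_lat)
                if d < 1e-12:
                    # Grid point coincides with a station: use its value directly.
                    num, den = value, 1.0
                    break
                w = 1.0 / d ** power
                num += w * value
                den += w
            row.append(num / den / peak)
        image.append(row)
    return image


# Hypothetical stations (lon, lat) and intensities for one signal.
stations = [(29.7, 45.2), (31.5, 46.5), (31.0, 44.0)]
intensities = [1000.0, 200.0, 50.0]
image = rasterize_signal(stations, intensities, bounds=(29.0, 33.0, 43.0, 47.0))
```

Each of the 30,489 signals would yield one such image. A signal whose intensity peaks near a river mouth and decays offshore produces a visually distinct pattern, which is the kind of spatial feature a convolutional classifier can learn to associate with a source.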
This is the first application of a deep learning approach to the prioritization of emerging contaminants in the environment.
The datasets are available at Zenodo (https://doi.org/10.5281/zenodo.6474592).
The datasets and the predicted spatial distribution are visualized in an interactive dash application, which is available at https://norman-data.eu/BS/