Bayesian statistical analysis of hydrogeochemical data using point processes: a new tool for source detection in multicomponent fluid mixtures
Abstract
Hydrogeochemical data may be seen as a point cloud in a multi-dimensional space. Each dimension of this space represents a hydrogeochemical parameter (e.g. salinity, solute concentration, concentration ratio, isotopic composition). Since the composition of many geological fluids is controlled by mixing between multiple sources, a key question for hydrogeochemical data sets is the detection of these sources. By treating hydrogeochemical data as spatial data, this paper presents a new solution to the source detection problem based on point processes. Results are shown on simulated data and on real data from geothermal fluids.
Introduction
The composition of many geological fluids is controlled by variable contributions of multiple sources (e.g. seawater, meteoric water, hydrothermal water). Knowledge of these sources helps to build conceptual and quantitative models of fluid and mass transfer in the Earth's crust (Yardley & Bodnar, 2014). If the sources are known, the contribution of each source to every mixture can be inferred from hydrogeochemical data (e.g. Carrera et al., 2004; Skuce et al., 2015). If the sources are not known, they can be inferred from the data (e.g. Pinti et al., 2020).
This paper presents a Bayesian method of source detection based on point processes. The method is inspired by pattern detection methodologies used in image analysis, animal epidemiology and astronomy (R. Stoica et al., 2004; R. S. Stoica, Gay, & Kretzschmar, 2007; R. S. Stoica, Martinez, et al., 2005; R. S. Stoica, Martínez, & Saar, 2007).
1 Materials and methods
Let $\mathbf{s} = \{s_1, \dots, s_n\}$ be a set of $n$ sources, with $s_i$ giving the position of the $i$-th source within a multi-dimensional (in practice two-dimensional) space formed by the hydrogeochemical parameters. A data point $d$ is a mixture of these sources (i.e. it is explained by these sources) if it is a barycenter of these sources, as stated in Faure (1997), i.e.
$$d = \sum_{i=1}^{n} \alpha_i s_i \qquad (1)$$
with $\alpha_i \geq 0$ for each $i$ and $\sum_{i=1}^{n} \alpha_i = 1$.
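As an illustration of the barycenter condition, the following minimal sketch computes the mixing weights of a data point with respect to three sources in the plane and checks that they are non-negative. The function names and the numerical tolerance are assumptions introduced for this example; they are not part of the model implementation.

```python
def barycentric_weights(point, sources):
    """Weights (a1, a2, a3) such that point = a1*s1 + a2*s2 + a3*s3 and a1 + a2 + a3 = 1."""
    x, y = point
    (x1, y1), (x2, y2), (x3, y3) = sources
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("the three sources are collinear")
    a1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    a2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return a1, a2, 1.0 - a1 - a2


def is_mixture(point, sources, tol=1e-12):
    """True if the point is a barycenter of the sources with non-negative weights."""
    return all(a >= -tol for a in barycentric_weights(point, sources))


# Hypothetical source positions, chosen only for the example.
sources = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = barycentric_weights((0.25, 0.25), sources)
explained = is_mixture((0.25, 0.25), sources)
not_explained = is_mixture((1.0, 1.0), sources)
```

With three sources the weights are unique; with more sources, a point inside the convex hull generally admits several valid weight vectors.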
In the Euclidean plane, the source pattern (i.e. the set of sources) is unknown, but it is partly outlined by the set of hydrogeochemical data points. The key hypothesis underlying our work is that this pattern is made of interacting points. As a preliminary condition for our model, the hidden source pattern is assumed to exhibit the following properties:
- the number of sources is not known, but it should be controlled or minimal in a certain sense
- two sources cannot be too close
- the data points originating from a mixture of sources should be rather close to them
- the convex hull enclosing the set of data points is enclosed within the convex hull given by the source positions.
These hypotheses allow us to consider the sources as a realisation of a point process described by a Gibbs probability density:
$$p(\mathbf{s}) = \frac{\exp[-U(\mathbf{s})]}{Z}$$
with $\mathbf{s}$ the configuration of sources (or set of sources), $Z$ the normalising constant and $U(\mathbf{s})$ the energy function.
The energy function is built as the sum of two components. The first term is the data term; it controls the positioning of the sources with respect to the observed data points. It involves three quantities: the absolute difference between the area of the convex hull of the sources and the area of the convex hull of the data points; the minimum distance between a data point and the source cloud, taken over all the sources in the configuration; and the number of data points that belong to the convex hull given by the sources. The three associated parameters are chosen, respectively, to penalize large differences between the convex hull areas, to encourage the sources to lie rather close to the data points, and to increase the number of data points that are explained by the sources.
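The quantities entering the data term can be evaluated with elementary computational geometry. The sketch below is only an illustration under stated assumptions: it computes the three statistics described above for two-dimensional points, but it does not reproduce the energy expression itself or its parameters, and all function names are hypothetical.

```python
import math


def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(hull):
    """Area of a simple polygon by the shoelace formula."""
    n = len(hull)
    if n < 3:
        return 0.0
    twice = sum(hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def inside_hull(p, hull):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return n >= 3 and all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def data_statistics(sources, data):
    """Area difference, per-point minimum distances, number of explained points."""
    hull_s, hull_d = convex_hull(sources), convex_hull(data)
    area_diff = abs(polygon_area(hull_s) - polygon_area(hull_d))
    min_dists = [min(math.hypot(d[0] - s[0], d[1] - s[1]) for s in sources) for d in data]
    n_inside = sum(1 for d in data if inside_hull(d, hull_s))
    return area_diff, min_dists, n_inside
```

The minimum distances are returned per data point rather than aggregated, since the way they are combined in the energy is not specified in this sketch.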
The second term is the interaction term. It depends on the number of sources in the configuration and on the number of pairs of sources separated by a distance shorter than a pre-fixed, known value. Its two parameters are chosen, respectively, to penalize an excessive number of sources and pairs of sources that are situated too close to each other.
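The two statistics of the interaction term are simple counts. The following minimal sketch computes them for a configuration of two-dimensional sources; the function name and the example values are assumptions made for illustration only.

```python
import math
from itertools import combinations


def interaction_statistics(sources, r):
    """Return (number of sources, number of pairs at distance strictly shorter than r)."""
    n_close = sum(
        1
        for a, b in combinations(sources, 2)
        if math.hypot(a[0] - b[0], a[1] - b[1]) < r
    )
    return len(sources), n_close


# Hypothetical configuration: only the first two sources are closer than r = 2.
n_sources, n_close_pairs = interaction_statistics([(0, 0), (1, 0), (5, 5)], r=2.0)
```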
The point process on a finite domain that is defined by the previous energy function is well defined and locally stable (R. Stoica, 2014). Based on these terms, the model is able to generate point configurations that exhibit the properties required by the assumed hypotheses. The source pattern is estimated by the point configuration that maximises the probability density $p(\mathbf{s})$:
$$\hat{\mathbf{s}} = \arg\max_{\mathbf{s}} p(\mathbf{s}) \qquad (2)$$
The solution of problem (2) is obtained by implementing a simulated annealing algorithm. This algorithm is a global optimisation method that iteratively samples from $p(\mathbf{s})^{1/T}$ while slowly decreasing the temperature $T$ towards zero. Convergence properties of this algorithm are shown in R. S. Stoica, Gregori, and Mateu (2005).
1.1 Optimisation algorithm
The implemented simulated annealing algorithm has the following structure:
1) set , , , , and
2) while
   a) generate with probability
   b) set and
3) set
This structure implements a sub-optimal cooling schedule, for practical reasons. An optimal logarithmic cooling schedule, as specified by R. S. Stoica, Gregori, and Mateu (2005), may be considered.
At each temperature, the sampling is done via the Metropolis-Hastings algorithm described below:
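The general structure of a simulated annealing loop can be sketched as follows. This is a generic illustration and not the implementation used here: the geometric cooling factor, the initial temperature, the number of steps per temperature and the toy one-dimensional energy are all assumptions chosen for the example, and the inner sampler is a plain Metropolis rule with a symmetric proposal.

```python
import math
import random


def simulated_annealing(energy, propose, x0, t0=10.0, cooling=0.8,
                        n_temps=50, n_steps=200, seed=0):
    """Minimise `energy` by sampling exp(-U/T) while T decreases geometrically."""
    rng = random.Random(seed)
    x, u = x0, energy(x0)
    t = t0
    for _ in range(n_temps):
        for _ in range(n_steps):
            y = propose(x, rng)
            uy = energy(y)
            # Metropolis acceptance for the density proportional to exp(-U/T);
            # valid as written only for symmetric proposals.
            if uy <= u or rng.random() < math.exp(-(uy - u) / t):
                x, u = y, uy
        t *= cooling  # geometric (sub-optimal) cooling, assumed for this sketch
    return x


# Toy usage: minimise (x - 3)^2 over the integers with +/-1 moves.
best = simulated_annealing(
    energy=lambda x: (x - 3) ** 2,
    propose=lambda x, rng: x + rng.choice((-1, 1)),
    x0=0,
)
```

A geometric schedule cools much faster than a logarithmic one, which trades the theoretical convergence guarantee for a practical running time.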
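1) set , , , with
2) choose birth, death or change, each with its own probability.
   - birth:
     a) generate a random point on the domain and add it to the current configuration to obtain the proposed configuration
     b) calculate the acceptance ratio of the proposal
   - death:
     a) choose a point of the current configuration and remove it to obtain the proposed configuration
     b) calculate the acceptance ratio of the proposal
   - change:
     a) choose a point of the current configuration, generate a random point in the ball centred on it, and replace the chosen point with the new one to obtain the proposed configuration
     b) calculate the acceptance ratio of the proposal
3) the new configuration is accepted with the appropriate probability; otherwise the algorithm remains in the same state.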
The previous dynamics are irreducible, Harris recurrent and geometrically ergodic, which guarantees the convergence of the algorithm towards the distribution of interest (van Lieshout, 2000; Moller & Waagepetersen, 2003; R. Stoica, 2014).
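The birth, death and change moves can be illustrated with the following minimal sketch on a rectangular domain. It uses the standard acceptance ratios for birth-death samplers of point processes with a density with respect to the unit-rate Poisson process, as found in the cited textbooks. The toy energy, the move probabilities, the ball radius and the unit-square domain are assumptions made for this example and are not the settings of the model.

```python
import math
import random
from itertools import combinations


def energy(x, theta_n=1.0, theta_r=2.0, r=0.1):
    """Toy energy (hypothetical): penalise the number of points and of close pairs."""
    close = sum(1 for a, b in combinations(x, 2)
                if math.hypot(a[0] - b[0], a[1] - b[1]) < r)
    return theta_n * len(x) + theta_r * close


def mh_step(x, rng, width=1.0, height=1.0, p_b=0.4, p_d=0.4, radius=0.05):
    """One Metropolis-Hastings transition; change is chosen with probability 1 - p_b - p_d."""
    area = width * height
    u = rng.random()
    if u < p_b:  # birth: add a uniform point on the domain
        y = x + [(rng.uniform(0, width), rng.uniform(0, height))]
        ratio = (p_d / p_b) * area / len(y) * math.exp(energy(x) - energy(y))
    elif u < p_b + p_d:  # death: remove a uniformly chosen point
        if not x:
            return x
        i = rng.randrange(len(x))
        y = x[:i] + x[i + 1:]
        ratio = (p_b / p_d) * len(x) / area * math.exp(energy(x) - energy(y))
    else:  # change: move a chosen point uniformly within a ball around it
        if not x:
            return x
        i = rng.randrange(len(x))
        rho = radius * math.sqrt(rng.random())
        phi = rng.uniform(0, 2 * math.pi)
        new = (x[i][0] + rho * math.cos(phi), x[i][1] + rho * math.sin(phi))
        if not (0 <= new[0] <= width and 0 <= new[1] <= height):
            return x  # proposals outside the domain are rejected
        y = x[:i] + [new] + x[i + 1:]
        ratio = math.exp(energy(x) - energy(y))
    return y if rng.random() < min(1.0, ratio) else x


def run_chain(n_iter=2000, seed=0):
    """Run the sampler from the empty configuration."""
    rng = random.Random(seed)
    x = []
    for _ in range(n_iter):
        x = mh_step(x, rng)
    return x
```

To use such a sampler inside simulated annealing, the energy difference in each ratio would be divided by the current temperature.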
2 Results
The proposed model was tested on two different data sets. The first one is a simulated data set; the second one is a set of hydrogeochemical data from geothermal fluids described in Pinti et al. (2020). The model is coded in C++, and the results are displayed in R with the library "ggplot".
We set , , , , , and . The parameters were chosen separately for each data set, by trial and error.
2.1 Simulated data
The data are created by generating three sources. The vector of contributions of each source to a data point is generated by a Dirichlet distribution with parameters . Hence the data points are uniformly distributed in the convex hull given by the source positions in the 2D space of hydrogeochemical parameters. Moreover, Gaussian noise, of mean and variance for each coordinate, was added to each data point to represent measurement noise.
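A data set of this kind can be simulated with the following minimal sketch. Several choices are assumptions made for illustration: the Dirichlet parameters are all set to one (the choice that makes the weights uniform on the simplex, consistent with the uniformity stated above), the noise is taken with zero mean, and the source positions and noise standard deviation are hypothetical values.

```python
import random


def dirichlet(alphas, rng):
    """Draw Dirichlet weights by normalising independent Gamma(alpha, 1) variables."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [v / total for v in g]


def simulate_mixtures(sources, n_points, sigma, seed=0, alphas=None):
    """Return noisy mixtures of the sources and the weights that generated them."""
    rng = random.Random(seed)
    alphas = alphas or [1.0] * len(sources)  # all-ones parameters: assumption
    data, weights = [], []
    for _ in range(n_points):
        w = dirichlet(alphas, rng)
        # Barycenter of the sources plus independent Gaussian noise per coordinate.
        x = sum(wi * s[0] for wi, s in zip(w, sources)) + rng.gauss(0.0, sigma)
        y = sum(wi * s[1] for wi, s in zip(w, sources)) + rng.gauss(0.0, sigma)
        data.append((x, y))
        weights.append(w)
    return data, weights


# Hypothetical sources and noise level, chosen only for the example.
sources = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
data, weights = simulate_mixtures(sources, n_points=100, sigma=0.01)
```

With noise, some simulated points may fall slightly outside the triangle formed by the sources, which is the situation the data term of the model has to accommodate.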
We set the parameters to and . The results are shown in Figure 1. The black dots are the data points, the blue symbols are the real sources and the gradient of blue color shows the density of simulated sources.

Three areas exhibit a high density of simulated sources, which corresponds to the actual number of sources. Moreover, their positions are very close to those of the real sources.
2.2 Real data
We now compare the results of our model with those of the model presented in Pinti et al. (2020). This model gives the smallest triangle (in terms of area) that encloses the data. In Figure 2 we set , and in Figure 3 we set . The parameters differ markedly from those used for the simulated data because the data are not on the same scale.


The model is able to detect the number and the position of the sources inferred in Pinti et al. (2020), while the model sufficient statistics provide a more complete morpho-statistical description of the sources.
3 Conclusions and perspectives
The use of this method requires at least partial knowledge of the model parameters. Such knowledge is built by embedding the available geological information into prior distributions.
This new tool should be improved in order to become helpful not only in the analysis of geological fluids but also in other fields that deal with mixtures (e.g. Phillips & Gregg, 2003; Longman et al., 2018). This is possible thanks to the combined use of the spatial and Bayesian paradigms.
Acknowledgments
This work was performed within the framework of the DEEPSURF project ( http://lue.univ-lorraine.fr/fr/impact-deepsurf ) at Université de Lorraine. This work was supported partly by the French PIA project Lorraine Université d'Excellence, reference ANR-15-IDEX-04-LUE.
References
- Carrera, J., Vázquez-Suñé, E., Castillo, O., & Sánchez-Vila, X. (2004). A methodology to compute mixing ratios with uncertain end-members. Water Resources Research, 40(12).
- Faure, G. (1997). Principles and applications of geochemistry (Vol. 625). Prentice Hall, New Jersey, United States.
- Longman, J., Veres, D., Ersek, V., Phillips, D. L., Chauvel, C., & Tamas, C. G. (2018). Quantitative assessment of Pb sources in isotopic mixtures using a Bayesian mixing model. Scientific Reports, 8(1), 6154.
- Moller, J., & Waagepetersen, R. P. (2003). Statistical inference and simulation for spatial point processes. Chapman and Hall/CRC.
- Phillips, D. L., & Gregg, J. W. (2003). Source partitioning using stable isotopes: coping with too many sources. Oecologia, 136(2), 261–269.
- Pinti, D. L., Shouakar-Stash, O., Castro, M. C., Lopez-Hernández, A., Hall, C. M., Rocher, O., … Ramírez-Montes, M. (2020). The bromine and chlorine isotopic composition of the mantle as revealed by deep geothermal fluids. Geochimica et Cosmochimica Acta.
- Skuce, M., Longstaffe, F., Carter, T., & Potter, J. (2015). Isotopic fingerprinting of groundwaters in southwestern Ontario: Applications to abandoned well remediation. Applied Geochemistry, 58, 1–13.
- Stoica, R. (2014). Modélisation probabiliste et inférence statistique pour l'analyse des données spatialisées. Research Habilitation Thesis, Université Lille 1.
- Stoica, R., Descombes, X., & Zerubia, J. (2004). A Gibbs point process for road extraction from remotely sensed images. International Journal of Computer Vision, 57(2), 121–136.
- Stoica, R. S., Gay, E., & Kretzschmar, A. (2007). Cluster pattern detection in spatial data based on Monte Carlo inference. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 49(4), 505–519.
- Stoica, R. S., Gregori, P., & Mateu, J. (2005). Simulated annealing and object point processes: tools for analysis of spatial patterns. Stochastic Processes and their Applications, 115, 1860–1882.
- Stoica, R. S., Martinez, V. J., Mateu, J., & Saar, E. (2005). Detection of cosmic filaments using the Candy model. Astronomy & Astrophysics, 434(2), 423–432.
- Stoica, R. S., Martínez, V. J., & Saar, E. (2007). A three-dimensional object point process for detection of cosmic filaments. Journal of the Royal Statistical Society: Series C (Applied Statistics), 56(4), 459–477.
- van Lieshout, M. N. M. (2000). Markov point processes and their applications. Imperial College Press, London.
- Yardley, B. W., & Bodnar, R. J. (2014). Fluids in the continental crust. Geochemical Perspectives, 3(1), 1–2.