Public defence of PhD thesis: Luc Steinbuch

Abstract

Soils, crop yields and other agronomic variables can be spatially mapped using various methods. This thesis aims to feature and advance current developments in model-based geostatistics and in Bayesian statistics in a spatial context, to bring these developments in line with contemporary computational possibilities, and to explore their limitations. The main focus points are: 1) spatial change of support, 2) incorporating parameter uncertainty, 3) hierarchical spatial modelling, 4) use of informative priors and 5) small-data situations.

I showed the added value of model-based geostatistics in a case study on sorghum and millet in West Africa. Based on potential crop yields calculated by plant scientists for ca. 38 locations, I used kriging with external drift to predict potential crop yields per pixel for the whole area. In combination with land use data and country borders, I summed those potential crop yield predictions to determine area totals; with spatial stochastic simulation, I estimated the uncertainty of those total production potentials as well as the spatial cumulative distribution function. I concluded that model-based geostatistics offers sophisticated tools for exploring relationships between yields and environmental variables, and for providing policy makers with tangible results (including uncertainty) on yield gaps at multiple levels of spatial aggregation.
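
The role of stochastic simulation in the aggregation step can be illustrated with a minimal sketch. It is not the thesis workflow: it assumes that a per-pixel prediction mean and a covariance matrix of the prediction errors are already available, and it uses a made-up one-dimensional transect with an exponential covariance. The function name `simulate_totals` and all numbers are hypothetical. The sketch shows why the uncertainty of an area total cannot be obtained from per-pixel variances alone: spatially correlated errors do not average out.

```python
import numpy as np

def simulate_totals(mean, cov, n_sim=2000, seed=0):
    """Draw Gaussian realisations of the pixel field and return their sums."""
    rng = np.random.default_rng(seed)
    # The Cholesky factor turns independent normals into correlated ones;
    # a tiny diagonal jitter guards against numerical non-positive-definiteness.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(mean)))
    z = rng.standard_normal((n_sim, len(mean)))
    fields = mean + z @ chol.T          # one simulated field per row
    return fields.sum(axis=1)           # area total of each realisation

# Hypothetical transect of 50 pixels (all values are illustrative only).
x = np.arange(50, dtype=float)
mean = np.full(50, 2.0)                          # predicted yield per pixel
dist = np.abs(x[:, None] - x[None, :])           # pairwise pixel distances
cov = 0.25 * np.exp(-dist / 10.0)                # exponential error covariance

totals = simulate_totals(mean, cov)
total_mean = totals.mean()
total_sd = totals.std(ddof=1)                    # simulation-based uncertainty
interval_90 = np.quantile(totals, [0.05, 0.95])  # 90% interval of the total

analytic_sd = np.sqrt(cov.sum())                 # exact sd of the sum
naive_sd = np.sqrt(np.trace(cov))                # sd if pixels were independent
```

In this toy setting the simulated standard deviation of the total should approach `analytic_sd`, whereas `naive_sd`, which ignores spatial correlation, is considerably smaller. The same set of realisations can also be used to estimate other aggregate quantities, such as the spatial cumulative distribution function.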

I explored the use of legacy information to improve the accuracy of a prediction map showing unripened subsoils for a reclamation area in the west of The Netherlands, applying a Bayesian extension of binomial logistic regression. The 'probability of ripening' parameter was modelled through a linear combination of three predictors: soil type, freeboard (the desired water level in the ditches, relative to surface level), and mean lowest groundwater table. My research focused on quantifying the influence of informative prior distributions (inferred from legacy data) with different information levels, in combination with different sample sizes, on the resulting parameters and maps. Using the 'overall accuracy' statistical validation metric, I found, for this case study, an optimal value for the prior information level. The effect of incorporating informative priors, however, is only detectable for smaller datasets. Bayesian binomial logistic regression proved to be a flexible mapping tool, but the accuracy gain compared to conventional logistic regression was marginal and may not outweigh the extra modelling and computing effort.
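
How the information level of a prior interacts with a small sample can be illustrated with a minimal sketch. It is not the implementation used in the thesis: instead of full Bayesian inference it computes the posterior mode (MAP estimate) of a logistic regression under independent normal priors, using plain Newton iterations without step-size control. The function name `map_logistic`, the simulated data, and the prior values are all hypothetical. A smaller prior standard deviation stands in for a higher prior information level.

```python
import numpy as np

def map_logistic(X, y, prior_mean, prior_sd, n_iter=50):
    """Posterior mode of logistic regression with independent normal priors."""
    beta = prior_mean.astype(float).copy()       # start at the prior mean
    prior_prec = np.diag(1.0 / prior_sd**2)      # prior precision matrix
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        # Gradient and negative Hessian of the log-posterior.
        grad = X.T @ (y - p) - prior_prec @ (beta - prior_mean)
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + prior_prec
        beta = beta + np.linalg.solve(hess, grad)  # Newton update
    return beta

# Small simulated dataset (illustrative values only).
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
true_beta = np.array([-0.5, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# Hypothetical prior mean, standing in for coefficients inferred from legacy data.
prior_mean = np.array([-0.4, 0.8])

beta_weak = map_logistic(X, y, prior_mean, prior_sd=np.array([10.0, 10.0]))
beta_strong = map_logistic(X, y, prior_mean, prior_sd=np.array([0.1, 0.1]))

# Distance of each estimate from the prior mean.
shift_weak = np.linalg.norm(beta_weak - prior_mean)
shift_strong = np.linalg.norm(beta_strong - prior_mean)
```

With the weak prior the estimate is driven almost entirely by the 40 observations; with the strong prior it stays close to the prior mean. Repeating such a comparison for larger `n` would show the prior's influence fading as the data come to dominate.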

I investigated the accuracy of prediction uncertainties for sparse data when model-based geostatistics is applied in an area-to-point kriging (ATPK) setting, illustrated by disaggregating millet crop yields in Burkina Faso. Because a dataset of areal means is often considerably smaller (< 50 observations) than the datasets conventionally dealt with in geostatistical analyses, it might be worth including parameter uncertainty. Using simulated data, I compared several models in which an increasing number of parameters was treated as stochastic. In most cases with known short-range behaviour, an approach that disregards uncertainty in the variogram distance parameter gives a reasonable assessment of prediction uncertainty.

I explored and explained an existing implementation of a Bayesian generalised linear geostatistical model, a hierarchical spatial model, including practical issues and their solutions. Using the depth of the Pleistocene sand layer in the Dutch province of Flevoland, with the depth reduced to a binary variable, I created a map of the probability of Pleistocene sand occurring within 1.2 m of the surface. The implementation proved quite demanding with respect to the minimum required sample size and computational costs; a third hurdle, tuning the algorithm, was removed by adding an automated tuning algorithm. The implementation might be especially useful when few relevant predictors are available.

In conclusion, future research might be directed to 1) informative priors, especially in combination with small-data situations; and 2) minimal data requirements. A related, more general discussion is to what extent the soil and crop science communities are actually interested in the tools and possibilities delivered by model-based geostatistics and its Bayesian extensions, alongside (and partly overlapping with) the contemporary main focus on big-data spatial mapping approaches.