Spatial ‘BIG DATA’

Modeling and inference for large, or massive, spatial-temporal data sets

Overview

Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. Dr. Banerjee has led pioneering contributions in model-based solutions for spatial “BIG DATA” analysis. These include the development and implementation of classes of low-rank spatial process models known as “predictive processes”, which achieve dimension reduction by projecting the original process onto an optimal lower-dimensional subspace. Another, more recent, line of research led by Dr. Banerjee develops sparsity-inducing spatial processes known as “Nearest-Neighbor Gaussian processes” (NNGP), which achieve scalability by exploiting sparsity in models without discarding spatial information. Finally, Dr. Banerjee has also explored “meta-kriging”, a divide-and-conquer approach that analyzes subsets of a massive spatial data set and then pools those analyses to draw inference for the entire data set. These approaches have attracted significant attention among statisticians and practitioners and are widely deployed to deliver inference for massive spatial databases without compromising the richness of modeling.
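As a rough illustration of the low-rank idea behind predictive processes, the sketch below builds a covariance matrix for n locations from a much smaller set of m "knots". It is a minimal sketch under stated assumptions, not the implementation used in this research: the exponential covariance function, its parameters (sigma2, phi), the simulated locations, and the choice of knots as the first m locations are all illustrative.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * distance) between point sets a and b.
    The covariance family and parameter values are assumptions for illustration."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(0)
n, m = 500, 25                      # n locations, m << n knots (hypothetical sizes)
locs = rng.uniform(size=(n, 2))     # simulated locations in the unit square
knots = locs[:m]                    # knots taken as a subset of locations (one simple choice)

C_knots = exp_cov(knots, knots)     # m x m covariance among knots
C_cross = exp_cov(locs, knots)      # n x m cross-covariance between locations and knots

# Low-rank covariance: C(s, knots) C(knots, knots)^{-1} C(knots, s').
# Only an m x m system is solved, instead of factorizing the full n x n matrix.
C_lowrank = C_cross @ np.linalg.solve(C_knots, C_cross.T)
```

In this sketch the low-rank covariance matches the original covariance exactly at the knots, while its diagonal at other locations is no larger than the original variance.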