A general framework for active learning with arbitrary data based on Christoffel functions

Ben Adcock
Simon Fraser University

Active learning is an important concept in machine learning, in which the learning algorithm is able to choose where to query the underlying ground truth so as to improve the accuracy of the learned model. As machine learning techniques become more widely used in scientific computing, where data is often expensive to obtain, active learning is expected to play a particularly important role in the design of efficient algorithms.

In this work, we introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and multimodal data (i.e., combinations of different types of measurements). The framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of generalized Christoffel functions and show how they can be used to optimize the sampling measures, and we prove that this leads to near-optimal sample complexity in various important cases.

Our focus is on applications in scientific computing where, as noted, data is usually expensive to generate and active learning is therefore especially desirable. We demonstrate the efficacy of the framework for gradient-augmented learning with polynomials, Magnetic Resonance Imaging (MRI) with generative models, and adaptive sampling for solving PDEs with Physics-Informed Neural Networks (PINNs).

This is joint work with Juan M. Cardenas (CU Boulder) and Nick Dexter (Florida State). The relevant paper can be found at: https://arxiv.org/abs/2306.00945
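To give a flavour of the central object, here is a sketch of the classical special case only (the paper's generalized definition replaces point evaluations by arbitrary sampling operators and allows nonlinear model classes). For an n-dimensional linear subspace P of L^2_\mu with orthonormal basis \phi_1, \dots, \phi_n, the reciprocal of the Christoffel function is

    K(x) = \sup_{p \in P,\, p \neq 0} \frac{|p(x)|^2}{\|p\|_{L^2_\mu}^2} = \sum_{i=1}^{n} |\phi_i(x)|^2,

and the induced near-optimal sampling measure is d\nu^\star(x) = n^{-1} K(x)\, d\mu(x), used together with weighted least squares with weights w(x) = n / K(x).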
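A minimal Python sketch of this classical case (pointwise samples, a fixed polynomial subspace) may help make the recipe concrete. It is illustrative only, not the paper's implementation: the uniform measure on [-1, 1], the Legendre basis, the grid discretization of the sampling measure, and the Runge-type target function are all assumptions made for the example.

import numpy as np
from numpy.polynomial import legendre

def basis(x, n):
    # Vandermonde matrix of the first n Legendre polynomials, rescaled to be
    # orthonormal with respect to the uniform probability measure on [-1, 1]
    return legendre.legvander(x, n - 1) * np.sqrt(2 * np.arange(n) + 1)

rng = np.random.default_rng(0)
n, m = 20, 60                            # subspace dimension, sample budget
f = lambda x: 1 / (1 + 25 * x**2)        # hypothetical target (Runge function)

# Reciprocal Christoffel function K(x) = sum_i |phi_i(x)|^2 on a fine grid
grid = np.linspace(-1, 1, 20001)
K = np.sum(basis(grid, n)**2, axis=1)

# Draw samples from the near-optimal measure d(nu) = (K/n) d(mu),
# approximated here by sampling grid points with probability proportional to K
x = rng.choice(grid, size=m, p=K / K.sum())

# Weighted least squares with weights w(x) = n / K(x)
A = basis(x, n)
w = np.sqrt(n / np.sum(A**2, axis=1))
c, *_ = np.linalg.lstsq(A * w[:, None], w * f(x), rcond=None)

print("max grid error:", np.max(np.abs(basis(grid, n) @ c - f(grid))))

In this Legendre example, Christoffel-based sampling makes a budget on the order of n log n samples suffice for stable recovery with high probability, whereas plain uniform sampling typically needs roughly n^2 samples because K(x) grows quadratically near the endpoints.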