Research in Machine Learning

I. Robustness of deep neural networks. In collaboration with E. Pauwels and V. Magron, we want to analyze the robustness of deep neural networks with respect to noise in the input. Recent works have considered semidefinite relaxations to address this robustness issue for ReLU neural networks. In fact, this relaxation is the first level of the Moment-SOS hierarchy that we have developed, and we claim that the analysis can be improved by solving higher-level semidefinite relaxations of the hierarchy. Indeed, the scalability issue of the Moment-SOS hierarchy can be overcome for certain types of neural nets (e.g. ReLU, but not only) by implementing an adequate "sparse" version of the hierarchy. The same methodology can also be used to provide certified upper bounds on the Lipschitz constant of deep neural nets. See e.g.
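A minimal illustration of why the Moment-SOS hierarchy applies here. A ReLU unit is a semialgebraic relation, so a ReLU network and an input-noise set can be written as polynomial constraints. The encoding below is the standard one from the semidefinite-relaxation literature on ReLU networks. The symbols W_k, b_k, z_k, x̄ and ε are notation introduced for this sketch and are not taken from the cited works.

```latex
% A single ReLU unit as polynomial (in)equalities:
y = \max(0, x) \iff y \ge 0,\quad y - x \ge 0,\quad y\,(y - x) = 0 .
% Layer k of the network, z_{k+1} = \max(0, W_k z_k + b_k) componentwise:
z_{k+1} \ge 0,\qquad z_{k+1} - W_k z_k - b_k \ge 0,\qquad
z_{k+1} \odot \left(z_{k+1} - W_k z_k - b_k\right) = 0 .
% Input noise of size \varepsilon around a nominal input \bar{x} = z_0:
\varepsilon^2 - \left(z_{0,i} - \bar{x}_i\right)^2 \ge 0, \qquad i = 1, \dots, n_0 .
```

Each constraint involves only the variables of two consecutive layers. This is one natural source of the sparsity that a sparse version of the hierarchy can exploit.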

II. The Christoffel function for ML.  This research is done mainly in collaboration with Edouard Pauwels (IMT, Toulouse) and more recently also with Mihai Putinar (University of California, Santa Barbara). The ultimate goal is to promote the Christoffel function as a new and promising tool for some important applications in Machine Learning (ML) and Data Analysis (e.g. data representation and encoding, outlier detection, density estimation). Our foundational approach is summarized in the three papers:

  • Lasserre J.B., Pauwels E. The empirical Christoffel function with applications in data analysis, Adv. Comp. Math. 45 (2019), pp. 1439–1468
  • Pauwels E., Putinar M., Lasserre J.B. Data analysis from empirical moments and the Christoffel function, Found. Comp. Math. 21 (2021), pp. 243–273
  • Lasserre J.B., Pauwels E. Sorting out typicality with the inverse moment matrix SOS polynomial, Proceedings of NIPS 2016, Barcelona, 2016. arXiv:1606.03858

We started from a simple and striking observation. First, we draw a cloud of 2-D points and build the empirical moment matrix M_d (with moments up to order 2d) associated with the points of the cloud. Then we build the SOS polynomial x -> c_d(x) of degree 2d whose Gram matrix is the inverse of M_d, that is, c_d(x) = v_d(x)^T M_d^{-1} v_d(x), where v_d(x) is the vector of monomials of degree at most d. One then readily observes that the level sets of c_d capture the shape of the cloud quite well, even for small d! When µ is a measure with compact support K and with a density f with respect to the Lebesgue measure on K, the reciprocal of c_d (built from the moments of µ) is the Christoffel function, well known in approximation theory. For instance, when K has a simple geometry (a box, an ellipsoid, a simplex), it is well known that, after scaling by the number s(d) of monomials of degree at most d, s(d)/c_d(x) converges pointwise to the ratio of the density f to an equilibrium density intrinsic to K for x in K, and converges to zero for x outside K.
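The construction just described can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the ring-shaped point cloud, the degree d = 4, and the helper names (`monomial_exponents`, `monomial_vectors`, `empirical_moment_matrix`, `christoffel_polynomial`) are all hypothetical choices made for illustration. The sketch is not the authors' code.

```python
import numpy as np

def monomial_exponents(d):
    """Exponent pairs (a, b) with a + b <= d, listed by total degree."""
    return [(a, t - a) for t in range(d + 1) for a in range(t, -1, -1)]

def monomial_vectors(points, d):
    """Row i is v_d(x_i): all monomials x^a * y^b with a + b <= d."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x**a * y**b for a, b in monomial_exponents(d)], axis=1)

def empirical_moment_matrix(points, d):
    """M_d = (1/N) sum_i v_d(x_i) v_d(x_i)^T; its entries are moments up to order 2d."""
    V = monomial_vectors(points, d)
    return V.T @ V / len(points)

def christoffel_polynomial(points, M, d):
    """c_d(x) = v_d(x)^T M_d^{-1} v_d(x) at each row of `points` (no optimization)."""
    V = monomial_vectors(points, d)
    W = np.linalg.solve(M, V.T)  # column i is M_d^{-1} v_d(x_i)
    return np.einsum("ij,ji->i", V, W)

# Hypothetical data: 500 points drawn in a ring of radii 0.8 to 1.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 500)
radius = rng.uniform(0.8, 1.0, 500)
cloud = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

d = 4
M = empirical_moment_matrix(cloud, d)
c_cloud = christoffel_polynomial(cloud, M, d)
c_far = christoffel_polynomial(np.array([[5.0, 5.0]]), M, d)[0]
```

Two facts make this easy to sanity-check. First, the average of c_d over the sample points equals trace(I) = s(d), which is 15 monomials for d = 4 in 2-D. Second, c_d can never exceed N at a sample point, whereas it grows polynomially away from the cloud. Thresholding c_d therefore gives a sublevel set that follows the cloud, and unusually large values flag candidate outliers.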

In other words, the Christoffel function somehow identifies the support K of µ. What is striking is how well this works (even for small d) for the Christoffel function associated with an empirical measure supported on a cloud of points drawn from an unknown distribution µ. It is also worth noting that the empirical moment matrix M_d is easy to construct and to invert (modulo the dimension of the data), with no optimization involved! This should make c_d(x) an appealing and easy-to-use tool in some important applications of Machine Learning (ML) and statistics. For instance, we have already shown that it is remarkably efficient for outlier detection and density estimation. In addition, if the cloud of points lies on a manifold, then the kernel of M_d contains a lot of information that can be used, e.g., to learn the dimension of the manifold when it has an algebraic boundary, with relatively little effort compared to more sophisticated methods in the literature.
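To illustrate what kind of information the kernel of M_d carries, here is a minimal sketch on hypothetical data: 50 points lying exactly on the unit circle, with degree d = 2. It only shows that the kernel recovers the curve's defining equation x² + y² − 1 = 0. It is not the authors' dimension-learning procedure, and the helper names are hypothetical.

```python
import numpy as np

def monomial_exponents(d):
    """Exponent pairs (a, b) with a + b <= d, listed by total degree."""
    return [(a, t - a) for t in range(d + 1) for a in range(t, -1, -1)]

def monomial_vectors(points, d):
    """Row i holds the monomials x^a * y^b (a + b <= d) at point i."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x**a * y**b for a, b in monomial_exponents(d)], axis=1)

# Hypothetical data: points on the unit circle, a 1-D algebraic curve in the plane.
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

d = 2
V = monomial_vectors(circle, d)
M = V.T @ V / len(circle)

# eigh returns eigenvalues in ascending order, so the first eigenvector
# spans the (numerical) kernel of M_d.
eigvals, eigvecs = np.linalg.eigh(M)
kernel_vector = eigvecs[:, 0]

# Monomial order here is (1, x, y, x^2, xy, y^2), so x^2 + y^2 - 1 has
# coefficient vector (-1, 0, 0, 1, 0, 1), normalized below.
expected = np.array([-1.0, 0.0, 0.0, 1.0, 0.0, 1.0]) / np.sqrt(3.0)
alignment = abs(kernel_vector @ expected)
```

Here the kernel is one-dimensional, and its only vector (up to sign) is the coefficient vector of x² + y² − 1. A rank-deficient M_d therefore signals that the data lie on an algebraic set, and the kernel tells which one.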

All these potential applications in Machine Learning and Data Analysis are described in our forthcoming book

"The Christoffel–Darboux Kernel for Data Analysis" by J.B. Lasserre, E. Pauwels, M. Putinar, Cambridge University Press, 2021

III. New applications and connections of the Christoffel function. 

It turns out that a non-standard application of the Christoffel function makes it possible to recover a function accurately from a sample of its values, in several cases even without the Gibbs phenomenon when the function is discontinuous. To illustrate the basic idea, consider a function f: [0,1] -> [0,1]. We (i) build the moment matrix of the empirical measure µ on [0,1]×[0,1] supported on the graph of f at the points where f is evaluated, and (ii) form the associated degree-n Christoffel polynomial C_n(x,y), i.e. the reciprocal of the Christoffel function of µ. Then, for each fixed x in [0,1], we approximate f(x) by f_n(x) := argmin_y C_n(x,y). This can be done efficiently because y -> C_n(x,y) is a univariate SOS polynomial. This strategy is described in
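Steps (i) and (ii) can be sketched as follows. The following are assumptions of this sketch, not details of the cited method: the step-function data, the degree n = 4, the helper names, and the small regularization eps·I. The regularization is needed because the moment matrix is singular when the graph lies on an algebraic curve; here y(y − 1) = 0 on the data. For simplicity, the exact univariate minimization is also replaced by a search over a grid of y values.

```python
import numpy as np

def monomial_exponents(n):
    """Exponent pairs (a, b) with a + b <= n, listed by total degree."""
    return [(a, t - a) for t in range(n + 1) for a in range(t, -1, -1)]

def monomial_vectors(x, y, n):
    """Row i holds the monomials x_i^a * y_i^b with a + b <= n."""
    return np.stack([x**a * y**b for a, b in monomial_exponents(n)], axis=1)

def graph_approximation(xs, fs, n=4, eps=1e-8, y_grid_size=101):
    """Return x -> f_n(x) = argmin_y C_n(x, y), with y restricted to a grid.

    C_n(x, y) = v(x, y)^T (M + eps*I)^{-1} v(x, y), where M is the empirical
    moment matrix of the points (x_i, f(x_i)) on the graph of f.
    The term eps*I is a hypothetical regularization of this sketch.
    """
    V = monomial_vectors(xs, fs, n)
    M_reg = V.T @ V / len(xs) + eps * np.eye(V.shape[1])
    y_grid = np.linspace(0.0, 1.0, y_grid_size)

    def f_n(x):
        # Evaluate y -> C_n(x, y) on the grid and keep the minimizer.
        Vx = monomial_vectors(np.full_like(y_grid, x), y_grid, n)
        C = np.einsum("ij,ji->i", Vx, np.linalg.solve(M_reg, Vx.T))
        return y_grid[np.argmin(C)]

    return f_n

# Hypothetical data: 200 samples of a discontinuous step function on [0, 1].
xs = np.linspace(0.0, 1.0, 200)
fs = (xs >= 0.5).astype(float)
f_n = graph_approximation(xs, fs)
```

Away from the jump, C_n(x, y) is small only near the sampled branch of the graph. For example, f_n(0.1) and f_n(0.9) land on 0 and 1 rather than on an oscillating polynomial fit. Behavior close to the discontinuity depends on n and on the sampling, and this sketch does not address it.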

We have also shown the potential of the Christoffel function as a simple tool for supervised classification in data analysis (of moderate dimension) in
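One natural way to turn the Christoffel function into a classifier is to build one Christoffel polynomial c_d per class and assign a query point to the class whose c_d is smallest, i.e. whose Christoffel function is largest. This rule, the class `ChristoffelClassifier`, and the two-disc data are hypothetical choices for illustration. The decision rule used in the cited work may differ, for example by normalizing the scores.

```python
import numpy as np

def monomial_exponents(d):
    """Exponent pairs (a, b) with a + b <= d, listed by total degree."""
    return [(a, t - a) for t in range(d + 1) for a in range(t, -1, -1)]

def monomial_vectors(points, d):
    """Row i holds the monomials x^a * y^b (a + b <= d) at point i."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x**a * y**b for a, b in monomial_exponents(d)], axis=1)

class ChristoffelClassifier:
    """Hypothetical sketch: one empirical moment matrix per class."""

    def __init__(self, d=2):
        self.d = d

    def fit(self, X, labels):
        self.classes_ = np.unique(labels)
        self.inv_moments_ = []
        for k in self.classes_:
            V = monomial_vectors(X[labels == k], self.d)
            self.inv_moments_.append(np.linalg.inv(V.T @ V / len(V)))
        return self

    def predict(self, X):
        V = monomial_vectors(X, self.d)
        # Column k holds c_d of class k evaluated at each query point.
        scores = np.stack(
            [np.einsum("ij,jk,ik->i", V, Minv, V) for Minv in self.inv_moments_],
            axis=1,
        )
        return self.classes_[np.argmin(scores, axis=1)]

def sample_disc(rng, center, n):
    """n points uniform in the unit disc centered at `center`."""
    r = np.sqrt(rng.uniform(0.0, 1.0, n))
    t = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.asarray(center) + np.column_stack([r * np.cos(t), r * np.sin(t)])

# Hypothetical training data: two well-separated discs, 300 points each.
rng = np.random.default_rng(1)
X = np.vstack([sample_disc(rng, (-2.0, 0.0), 300), sample_disc(rng, (2.0, 0.0), 300)])
labels = np.repeat([0, 1], 300)

clf = ChristoffelClassifier(d=2).fit(X, labels)
pred = clf.predict(np.array([[-2.0, 0.0], [2.0, 0.0]]))
```

Training reduces to building and inverting one small moment matrix per class, with no optimization involved, in line with the remarks on c_d above.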

We have also shown that the Christoffel function of a measure µ on a Cartesian product X×Y disintegrates as the product of the Christoffel function of the marginal of µ on X and the Christoffel function of a measure on Y with the flavor of the conditional probability on Y given x in X, exactly as Borel measures on X×Y disintegrate. See
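Schematically, the statement above parallels the classical disintegration of µ. The notation is introduced here: φ is the marginal of µ on X, ν_x is the measure on Y attached to x ∈ X, Λ^ρ is the Christoffel function of a measure ρ, and the degree index is suppressed. The precise construction of ν_x is in the referenced work and is not reproduced here.

```latex
% Disintegration of a Borel measure \mu on X \times Y:
\mu(A \times B) = \int_A \mu(B \mid x)\, \varphi(dx)
% Analogous factorization of the Christoffel function:
\Lambda^{\mu}(x, y) = \Lambda^{\varphi}(x)\, \Lambda^{\nu_x}(y)
```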

In addition, we have revealed some (in the author's opinion surprising) connections with other disciplines involving the polynomial Pell's equation, certificates of positivity, equilibrium measures of compact sets, and a duality result of Nesterov for some cones of positive polynomials. See e.g.