My work focuses on evaluating and optimizing the performance of large-scale dynamical systems using probabilistic methods. Many of these systems have a heterogeneous structure that can be described by a graph: the Internet, data centers, online matching systems, transportation networks, etc. For these systems, I develop control policies that optimize average performance criteria, such as delay or availability, while respecting the operational constraints specific to each system. The abstraction provided by stochastic modeling and graph theory helps me better understand the impact of decisions at different time scales. Since my arrival at CNRS, I have been particularly interested in developing reinforcement learning algorithms adapted to these systems and in modeling their performance using Markov decision processes (MDPs).
Most of my research so far has revolved, in one way or another, around product-form queueing systems, i.e., queueing systems that can be modeled by discrete-state Markov chains whose stationary measure can be written as a finite product. Such probability distributions often arise from detailed balance equations and are tightly related to exponential families and probabilistic graphical models. For such queueing systems, I am interested in predicting and/or optimizing performance, whether the system parameters are known or unknown. Two questions that I find particularly relevant in this area are: Are there simple (necessary and/or sufficient) conditions under which such product-form distributions arise? Can we exploit the properties of these distributions to improve prediction and optimization?
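To fix notation, here is the generic shape of a product-form stationary distribution (a textbook illustration, not tied to any one model above):

```latex
% A stationary distribution \pi on states x = (x_1, \dots, x_n) has
% product form when it factorizes over the coordinates:
\pi(x_1, \dots, x_n) = \frac{1}{Z} \prod_{i=1}^{n} f_i(x_i),
% e.g., for a Jackson network of n M/M/1 queues with loads \rho_i < 1,
\pi(x_1, \dots, x_n) = \prod_{i=1}^{n} (1 - \rho_i)\, \rho_i^{x_i}.
```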
Although these are fundamental questions with applications in a wide variety of domains, I have studied two applications in particular: redundancy scheduling in data centers and online stochastic matching. Redundancy in data centers was introduced to exploit service variability at the server level (i.e., when the waiting or processing time of a task may vary significantly from one server to another), and redundancy scheduling refers to the question of how to schedule replicas over servers to best exploit this variability. Online stochastic matching is an extension of the classical matching problem in graph theory in which items to be matched arrive randomly over time. The goal is to find the best matches despite uncertainty about the future.
My publications are available on HAL and Google Scholar.
Stochastic networks and queueing systems often lead to Markov decision processes (MDPs) with large state and action spaces as well as nonconvex objective functions, which hinders the convergence of many reinforcement learning (RL) algorithms. Policy-gradient methods, a class of RL algorithms that directly optimize the policy via stochastic gradient ascent on the objective function, perform well on MDPs with large state and action spaces, but they sometimes experience slow convergence due to the high variance of the gradient estimator. In this work, we show that some of these difficulties can be circumvented by exploiting the structure of the underlying MDP. We first introduce a new family of gradient estimators called score-aware gradient estimators (SAGEs). When the stationary distribution of the MDP belongs to an exponential family parametrized by the policy parameters, SAGEs allow us to estimate the policy gradient without relying on value-function estimation, contrary to classical policy-gradient methods like actor-critic. To demonstrate their applicability, we examine two common control problems arising in stochastic networks and queueing systems whose stationary distributions have a product-form, a special case of exponential families. As a second contribution, we show that, under appropriate assumptions, the policy under a SAGE-based policy-gradient method has a large probability of converging to an optimal policy, provided that it starts sufficiently close to it, even with a nonconvex objective function and multiple maximizers. Our key assumptions are that, locally around a maximizer, a nondegeneracy property of the Hessian of the objective function holds and a Lyapunov function exists. We believe that the proof technique is of independent interest and can be adapted to other gradient-based methods. Finally, we conduct a numerical comparison between a SAGE-based policy-gradient method and an actor-critic algorithm. The results demonstrate that the SAGE-based method finds close-to-optimal policies more rapidly, highlighting its superior performance over the traditional actor-critic method.
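As a rough illustration of why an exponential-family stationary distribution helps (generic notation and assumptions of mine, not the paper's exact estimator):

```latex
% Assume the stationary distribution lies in an exponential family
% parametrized through the policy parameters \theta:
\pi_\theta(x) = h(x) \exp\bigl( \langle \phi(\theta), S(x) \rangle - Z(\phi(\theta)) \bigr).
% The score then has a closed form involving only the sufficient statistic
% (D\phi denotes the Jacobian of \phi):
\nabla_\theta \log \pi_\theta(x)
  = D\phi(\theta)^{\top} \bigl( S(x) - \mathbb{E}_{\pi_\theta}[S(X)] \bigr),
% so the gradient of a stationary objective J(\theta) = \mathbb{E}_{\pi_\theta}[r(X)]
% can be written as a covariance, estimable from stationary samples
% without a learned value function:
\nabla_\theta J(\theta)
  = \operatorname{Cov}_{\pi_\theta}\!\bigl( r(X),\, D\phi(\theta)^{\top} S(X) \bigr).
```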
We consider the following online stochastic matching problem. Items from different classes arrive according to independent Poisson processes. Matching compatibilities between items are described by a simple graph over the classes, so that two items can be matched if and only if their classes are neighbors in the graph. This graph is assumed to be connected and non-bipartite. Each incoming item is matched immediately with the oldest compatible unmatched item, if any. The evolution of the sequence of unmatched item classes, ordered by arrival times, defines an irreducible Markov chain. Previous work has derived (i) a necessary and sufficient condition for this Markov chain to be positive recurrent, which we assume is satisfied, and (ii) a closed-form expression for its stationary distribution. In this presentation, we first provide a more direct method to derive the stationary distribution, based on quasi-reversibility, a notion introduced by Kelly (1979). We then derive new closed-form expressions for several performance metrics, such as the waiting probability and the mean waiting time, which can be efficiently implemented using dynamic programming. Both these formulas and the numerical results that they allow us to derive are used to better understand the impact of parameters on performance. In particular, we characterize performance in a so-called heavy-traffic regime, in which the number of items of a subset of the classes goes to infinity while items of other classes become scarce.
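Since only the sequence of arriving classes matters for these dynamics, the first-come-first-matched policy is straightforward to simulate. The sketch below is an illustration of mine (the function name and interface are hypothetical), not code from the work:

```python
import random

def simulate_fcfm(rates, edges, horizon=100000, seed=0):
    """Simulate the first-come-first-matched policy on a compatibility graph.
    rates: dict mapping each class to its Poisson arrival rate
           (only the ratios matter for the embedded class sequence).
    edges: set of frozensets {a, b} for compatible class pairs.
    Returns the empirical mean number of unmatched items."""
    rng = random.Random(seed)
    classes = list(rates)
    weights = [rates[c] for c in classes]
    queue = []        # unmatched items, oldest first
    total = 0
    for _ in range(horizon):
        c = rng.choices(classes, weights=weights)[0]
        for i, d in enumerate(queue):          # oldest compatible item, if any
            if frozenset((c, d)) in edges:
                del queue[i]
                break
        else:
            queue.append(c)                    # no compatible item: c waits
        total += len(queue)
    return total / horizon
```

On a triangle graph with equal rates (connected and non-bipartite, so the stability condition holds), the mean queue length stays small, which makes for a quick sanity check.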
In this presentation, we will consider a stochastic dynamic matching model, in which items of different classes arrive according to independent Poisson processes, and compatibilities between items are described by an undirected graph on their classes. We will first focus on a specific matching policy called first-come-first-matched. Our main contribution is the observation that, under this policy, the matching model is equivalent to an order-independent (loss) queue, a model that has recently gained momentum in the queueing-theory community. Using this equivalence, we will formulate simpler proofs for several existing results and derive closed-form expressions for performance metrics like the waiting time of a class and the matching rate along an edge. We will then use results from graph theory and linear algebra to characterize the set of achievable matching rates under any matching policy.
We consider a stochastic bipartite matching model consisting of multi-class customers and multi-class servers. Compatibility constraints between the customer and server classes are described by a bipartite graph. Each time slot, exactly one customer and one server arrive. The incoming customer (resp. server) is matched with the earliest arrived server (resp. customer) with a class that is compatible with its own class, if there is any, in which case the matched customer-server couple immediately leaves the system; otherwise, the incoming customer (resp. server) waits in the system until it is matched. Contrary to classical queueing models, both customers and servers may have to wait, so that their roles are interchangeable. While (the process underlying) this model was already known to have a product-form stationary distribution, this paper derives a new compact and manageable expression for the normalization constant of this distribution, as well as for the waiting probability and mean waiting time of customers and servers. We also provide a numerical example and make some important observations.
Order-independent (OI) queues, introduced by Berezner et al. in 1995, expanded the family of multi-class queues that are known to have a product-form stationary distribution by allowing for intricate class-dependent service rates. In this presentation, we introduce pass-and-swap (P&S) queues, an extension of OI queues where, upon a service completion, the customer that completes service is not necessarily the one that leaves the system. After defining P&S queues and proving that their stationary distribution indeed has a product form, we demonstrate that closed networks of P&S queues can be applied to describe the dynamics of new and existing load-balancing protocols in clusters of machines in which jobs have assignment constraints. These include the ALIS (assign to the longest idle server) and the first-come-first-served cancel-on-complete redundancy protocols.
The design and analysis of load-balancing algorithms has been a popular research topic over the past three decades. The basic problem consists of assigning a stream of incoming jobs to servers so as to equalize the server loads as evenly as possible. The fundamental challenge is to take the best assignment decision upon the arrival of each job despite uncertainty about the duration of ongoing and future jobs. However, the growing use of cloud-computing platforms for data-intensive applications has added further constraints on load balancing. For instance, the large scale of the system calls for decentralized algorithms that perform well in the presence of many dispatchers, and data locality may restrict the set of servers to which each job can be assigned.
In this presentation, we consider a blocking variant of a well-known load-balancing algorithm, called join-idle-queue, whereby incoming jobs are rejected if no server is available upon their arrival. It was previously observed that, if jobs arrive according to a Poisson process, this blocking variant can be described by a Markov process with a product-form stationary distribution; furthermore, the long-run performance is insensitive to the job size distribution beyond its mean. We first introduce several extensions to this blocking variant that preserve these two properties while also accounting for practical constraints of cloud-computing platforms. We also show how these extensions relate to existing insensitive load-balancing algorithms, and use this observation to compute long-run performance metrics. These results give insights into the performance of the original join-idle-queue algorithm under the practical constraints of cloud-computing platforms.
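As a toy illustration of the blocking dynamics (my own sketch under Poisson arrivals and exponential job sizes, with a single dispatcher; the function name and interface are hypothetical):

```python
import heapq
import random

def blocking_jiq(n_servers, arrival_rate, service_rate, horizon=50000, seed=0):
    """Toy simulation of the blocking variant of join-idle-queue:
    the dispatcher holds one token per idle server; an incoming job
    consumes a token if one is available and is rejected otherwise.
    Returns the empirical blocking probability."""
    rng = random.Random(seed)
    idle = set(range(n_servers))   # tokens currently held by the dispatcher
    departures = []                # min-heap of (completion time, server)
    t, blocked = 0.0, 0
    for _ in range(horizon):
        t += rng.expovariate(arrival_rate)           # next Poisson arrival
        while departures and departures[0][0] <= t:  # recover finished servers
            _, s = heapq.heappop(departures)
            idle.add(s)
        if idle:
            s = idle.pop()
            heapq.heappush(departures, (t + rng.expovariate(service_rate), s))
        else:
            blocked += 1
    return blocked / horizon
```

With identical servers and a single dispatcher, this loss system behaves like an M/M/n/n queue, so the estimated blocking probability should match the Erlang-B formula, which gives a quick sanity check.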
The growing demand for cloud-based services encourages operators to maximize resource efficiency within computer clusters. This motivates the development of new technologies that make resource management more flexible. However, exploiting this flexibility to reduce the number of computers also requires efficient resource-management algorithms that have a predictable performance under stochastic demand. In this thesis, we design and analyze such algorithms using the framework of queueing theory.
Our abstraction of the problem is a multi-server queue with several customer classes. Servers have heterogeneous capacities and the customers of each class enter the queue according to an independent Poisson process. Each customer can be processed in parallel by several servers, depending on compatibility constraints described by a bipartite graph between classes and servers, and each server applies the first-come-first-served policy to its compatible customers. We first prove that, if the service requirements are independent and exponentially distributed with unit mean, this simple policy yields the same average performance as balanced fairness, an extension of processor-sharing known to be insensitive to the distribution of the service requirements. A more general form of this result, relating order-independent queues to Whittle networks, is also proved. Lastly, we derive new formulas to compute performance metrics.
These theoretical results are then put into practice. We first propose a scheduling algorithm that extends the principle of round-robin to a cluster where each incoming job is assigned to a pool of computers by which it can subsequently be processed in parallel. Our second proposal is a load-balancing algorithm based on tokens for clusters where jobs have assignment constraints. Both algorithms are approximately insensitive to the job size distribution and adapt dynamically to demand. Their performance can be predicted by applying the formulas derived for the multi-server queue.
Efficiently exploiting the resources of data centers is a complex task that requires efficient and reliable load-balancing and resource-allocation algorithms. The former are in charge of assigning jobs to servers upon their arrival in the system, while the latter are responsible for sharing the server resources between their assigned jobs. These algorithms should adapt to various constraints, such as data locality, that restrict the feasible job assignments. In this presentation, we propose a token-based algorithm that efficiently balances the load between the servers without requiring any knowledge of the job arrival rates and server capacities. Assuming a balanced fair sharing of the server resources, we show that the resulting dynamic load balancing is insensitive to the job size distribution. Its performance is compared to that obtained under the best static load balancing and in an ideal system that would constantly optimize the resource utilization. We also make the connection with other token-based algorithms such as Join-Idle-Queue.
Understanding the performance of a pool of servers is crucial for proper dimensioning. One of the main challenges is to take into account the complex interactions between servers that are pooled to process jobs. In particular, a job can generally not be processed by any server of the cluster due to various constraints like data locality. In this talk, we will represent these constraints by an assignment graph between jobs and servers. We will present a recursive approach to computing performance metrics like mean response times when the server capacities are shared according to balanced fairness. While the computational cost of these formulas can be exponential in the number of servers in the worst case, we will illustrate their practical interest by introducing broad classes of pool structures that can be exactly analyzed in polynomial time. This considerably extends the class of models for which explicit performance metrics are accessible.
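In its simplest textbook form, the recursion underlying balanced fairness is the balance-function recursion of Bonald and Proutière. The sketch below is my own illustration on a truncated state space; `capacity` is an assumed callback that stands in for the assignment-graph computation:

```python
from functools import lru_cache
from itertools import product

def stationary_distribution(capacity, rates, max_per_class):
    """Balance-function recursion of balanced fairness on a truncated
    state space. `capacity(active)` gives the total service rate
    available to a set of active classes (an assumed callback standing
    in for the assignment-graph computation). Returns the normalized
    stationary distribution over truncated states x."""
    n = len(rates)

    @lru_cache(maxsize=None)
    def phi(x):
        # Balance function: Phi(0) = 1, Phi(x) = sum_i Phi(x - e_i) / C(x).
        if all(v == 0 for v in x):
            return 1.0
        active = frozenset(i for i, v in enumerate(x) if v > 0)
        total = sum(phi(tuple(v - (j == i) for j, v in enumerate(x)))
                    for i in active)
        return total / capacity(active)

    # Unnormalized weights pi(x) proportional to Phi(x) * prod_i rates_i^x_i.
    weights = {}
    for x in product(*(range(m + 1) for m in max_per_class)):
        w = phi(x)
        for i in range(n):
            w *= rates[i] ** x[i]
        weights[x] = w
    norm = sum(weights.values())
    return {x: w / norm for x, w in weights.items()}
```

With a single class and unit capacity, the recursion collapses to a (truncated) geometric distribution, which provides an easy consistency check.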
We consider a network of multi-class multi-server queues. Each job can be processed in parallel by any subset of servers within a pre-defined set that depends on its class. Each server is allocated in FCFS order at each queue. Jobs arrive according to Poisson processes, have independent exponential service requirements and are routed independently at random. We prove that, when stable, the network has a product-form stationary distribution. From a practical perspective, we propose an algorithm on this basis to allocate the resources of a computer cluster according to balanced fairness. We finally examine further developments of this model to allow for dynamic adaptation to the system occupancy.
Exponential families are parametric sets of probability distributions that arise in many applications. These include well-known univariate distributions (such as the binomial, Poisson, geometric, exponential, and normal distributions), but also multivariate distributions like probabilistic graphical models and stationary distributions of several queueing models. In this presentation, we will first recall the definition of exponential families and motivate their study. We will then present a generic method for approximating the normalization constant of these distributions, as the exact calculation of this constant is practically infeasible in high dimension.
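For reference, the canonical form of an exponential family (standard textbook notation):

```latex
% Canonical exponential family with natural parameter \theta and
% sufficient statistic T:
p_\theta(x) = h(x) \exp\bigl( \langle \theta, T(x) \rangle - A(\theta) \bigr),
% where the log-partition function
A(\theta) = \log \int h(x) \exp\langle \theta, T(x) \rangle \, \mathrm{d}x
% encodes the normalization constant Z(\theta) = e^{A(\theta)}
% whose computation becomes infeasible in high dimension.
```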
The Python Data Analysis Library (pandas) provides data structures and tools for manipulating and analyzing data. It is built on top of NumPy, the core package for numerical computations in the Scientific Computing Tools for Python (SciPy) ecosystem. The objective of this presentation is threefold: explain when one can benefit from using pandas, describe its fundamental data structures, and give an overview of the data analysis tools it provides.
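A minimal taste of the two fundamental structures, using a made-up mini-dataset of mine (the column names and values are purely illustrative):

```python
import pandas as pd

# A DataFrame is a 2-D labeled table; each of its columns is a Series
# (a 1-D labeled array built on top of a NumPy array).
df = pd.DataFrame(
    {"queue": ["M/M/1", "M/M/1", "M/G/1"],
     "load": [0.5, 0.9, 0.5],
     "mean_jobs": [1.0, 9.0, 1.2]}
)

# Two typical pandas operations: boolean filtering and split-apply-combine.
light = df[df["load"] < 0.8]                        # rows with load below 0.8
by_queue = df.groupby("queue")["mean_jobs"].mean()  # average per queue type
```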