My work focuses on evaluating and optimizing the performance of large-scale systems using probabilistic methods. Many of these systems have a heterogeneous structure that can be described by a graph: the Internet, data centers, organ donation programs, transportation networks, etc. For these systems, I develop control policies that optimize average performance criteria, such as delay or availability, while respecting the operational constraints specific to each system. The abstraction provided by stochastic modeling and graph theory helps me better understand the impact of decisions at different time scales. Since my arrival at CNRS, I have been particularly interested in developing reinforcement learning algorithms adapted to these systems and in modeling their performance using Markov decision processes.
Stochastic networks and queueing systems often lead to Markov decision processes (MDPs) with large state and action spaces as well as nonconvex objective functions, which hinders the convergence of many reinforcement learning (RL) algorithms. Policy-gradient methods perform well on MDPs with large state and action spaces, but they sometimes experience slow convergence due to the high variance of the gradient estimator. In this paper, we show that some of these difficulties can be circumvented by exploiting the structure of the underlying MDP. We first introduce a new family of gradient estimators called score-aware gradient estimators (SAGEs). When the stationary distribution of the MDP belongs to an exponential family parametrized by the policy parameters, SAGEs allow us to estimate the policy gradient without relying on value-function estimation, unlike classical policy-gradient methods such as actor-critic. To demonstrate their applicability, we examine two common control problems arising in stochastic networks and queueing systems whose stationary distributions have a product form, a special case of exponential families. As a second contribution, we show that, under appropriate assumptions, the policy under a SAGE-based policy-gradient method has a large probability of converging to an optimal policy, provided that it starts sufficiently close to one, even with a nonconvex objective function and multiple maximizers. Our key assumptions are that, locally around a maximizer, a nondegeneracy property of the Hessian of the objective function holds and a Lyapunov function exists. Finally, we conduct a numerical comparison between a SAGE-based policy-gradient method and an actor-critic algorithm. The results demonstrate that the SAGE-based method finds close-to-optimal policies more rapidly, highlighting its superior performance over the traditional actor-critic method.
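For intuition, here is a minimal sketch of the exponential-family identity behind score-aware estimation: if a stationary distribution has the form pi_theta(x) proportional to exp(theta * T(x)), the gradient of a stationary average E[c(X)] equals the covariance Cov(c(X), T(X)), so it can be estimated from samples without a value function. The toy distribution below (a truncated geometric with T(x) = x, loosely mimicking an M/M/1 queue length) and the direct parametrization are illustrative assumptions, not the construction from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stationary(theta, n_max=200):
    # Truncated exponential-family distribution: pi_theta(x) ∝ exp(theta * x),
    # with x playing the role of a queue length and sufficient statistic T(x) = x.
    x = np.arange(n_max + 1)
    w = np.exp(theta * x)
    return x, w / w.sum()

def grad_via_covariance(theta, cost, n_samples=100_000):
    # Score-aware estimate: d/dtheta E[cost(X)] = Cov(cost(X), T(X)), T(x) = x.
    x, p = stationary(theta)
    sample = rng.choice(x, size=n_samples, p=p)
    c = cost(sample)
    return np.cov(c, sample)[0, 1]

def grad_finite_diff(theta, cost, eps=1e-5):
    # Finite-difference check of the same gradient.
    x, p_plus = stationary(theta + eps)
    _, p_minus = stationary(theta - eps)
    c = cost(x)
    return (c @ p_plus - c @ p_minus) / (2 * eps)

cost = lambda x: x  # mean queue length as the objective
theta = -0.5        # stable regime: exp(theta) < 1
print(grad_via_covariance(theta, cost), grad_finite_diff(theta, cost))
```

The two estimates agree up to sampling noise, which is the point: the gradient is obtained from stationary samples alone.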
The stochastic dynamic matching problem has recently drawn attention in the stochastic-modeling community due to its numerous applications, ranging from supply-chain management to kidney exchange programs. In this paper, we consider a matching problem in which items of different classes arrive according to independent Poisson processes. Unmatched items are stored in a queue, and compatibility constraints are described by a simple graph on the classes, so that two items can be matched if their classes are neighbors in the graph. We analyze the efficiency of matching policies, not only in terms of system stability, but also in terms of matching rates between different classes.
Our results rely on the observation that, under any stable policy, the matching rates satisfy a conservation equation that equates the arrival and departure rates of each item class. Our main contributions are threefold. First, we introduce a mapping between the dimension of the solution set of this conservation equation, the structure of the compatibility graph, and the existence of a stable policy. In particular, this allows us to derive a necessary and sufficient stability condition that is verifiable in polynomial time. Second, we describe the convex polytope of non-negative solutions of the conservation equation. When this polytope is reduced to a single point, we give a closed-form expression for the solution; in general, we characterize the vertices of this polytope, again using the graph structure. Lastly, we show that greedy policies cannot, in general, achieve every point in the polytope. In contrast, non-greedy policies can reach any point in the interior of this polytope, and we give a condition under which these policies also reach the boundary of the polytope.
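The conservation equation can be checked on a hypothetical toy instance: for each class i, the arrival rate equals the sum of the matching rates over the edges incident to i. The path graph and the arrival rates below are made up for illustration; here the solution polytope reduces to a single point.

```python
import numpy as np

# Toy instance: classes {0, 1, 2}, compatibility edges 0-1 and 1-2 (a path graph).
edges = [(0, 1), (1, 2)]
lam = np.array([1.0, 3.0, 2.0])  # arrival rates, chosen so that lam[1] = lam[0] + lam[2]

# Conservation equation: for each class i, lam[i] = sum of matching rates mu_e
# over the edges e incident to i (arrival rate = departure rate of the class).
n = len(lam)
A = np.zeros((n, len(edges)))
for k, (i, j) in enumerate(edges):
    A[i, k] = 1.0
    A[j, k] = 1.0

mu, residual, *_ = np.linalg.lstsq(A, lam, rcond=None)
feasible = np.allclose(A @ mu, lam) and (mu >= -1e-9).all()
print(dict(zip(edges, mu.round(6))), feasible)
```

On this instance the unique non-negative solution is mu_{01} = 1 and mu_{12} = 2; perturbing lam[1] makes the system infeasible, illustrating why stability constrains the arrival rates.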
The need for matching items with one another while meeting assignment constraints or preferences has given rise to several well-known problems like the stable marriage and roommate problems. While most of the literature on matching problems focuses on a static setting with a fixed number of items, several recent works have incorporated time by considering stochastic models, in which items of different classes arrive according to independent Poisson processes and assignment constraints are described by an undirected non-bipartite graph on the classes. In this paper, we prove that the continuous-time Markov chain associated with this model has the same transition diagram as an order-independent loss queue, a product-form queueing model. This allows us to adapt existing results on order-independent (loss) queues to stochastic matching models, and in particular to derive closed-form expressions for several performance metrics, like the waiting probability or the mean matching time, that can be implemented using dynamic programming. Both these formulas and the numerical results that they allow us to derive are used to gain insight into the impact of parameters on performance. In particular, we characterize performance in a so-called heavy-traffic regime in which the number of items of a subset of the classes goes to infinity while items of other classes become scarce.
We consider a stochastic bipartite matching model consisting of multi-class customers and multi-class servers. Compatibility constraints between the customer and server classes are described by a bipartite graph. Each time slot, exactly one customer and one server arrive. The incoming customer (resp. server) is matched with the earliest arrived server (resp. customer) with a class that is compatible with its own class, if there is any, in which case the matched customer-server couple immediately leaves the system; otherwise, the incoming customer (resp. server) waits in the system until it is matched. Contrary to classical queueing models, both customers and servers may have to wait, so that their roles are interchangeable. While (the process underlying) this model was already known to have a product-form stationary distribution, this paper derives a new compact and manageable expression for the normalization constant of this distribution, as well as for the waiting probability and mean waiting time of customers and servers. We also provide a numerical example and make some important observations.
Efficiently exploiting servers in data centers requires performance analysis methods that account not only for the stochastic nature of demand but also for server heterogeneity. Although several recent works proved optimality results for heterogeneity-aware variants of classical load-balancing algorithms in the many-server regime, we still lack a fundamental understanding of the impact of heterogeneity on performance in finite-size systems. In this paper, we consider a load-balancing algorithm that leads to a product-form queueing model and can therefore be analyzed exactly even when the number of servers is finite. We develop new analytical methods that exploit its product-form stationary distribution to understand the joint impact of the speeds and buffer lengths of servers on performance. These analytical results are supported and complemented by numerical evaluations that cover a large variety of scenarios.
Order-independent (OI) queues, introduced by Berezner et al. (Queueing Syst 19(4):345–359, 1995), expanded the family of multi-class queues that are known to have a product-form stationary distribution by allowing for intricate class-dependent service rates. This paper further broadens this family by introducing pass-and-swap (P&S) queues, an extension of OI queues where, upon a service completion, the customer that completes service is not necessarily the one that leaves the system. More precisely, we supplement the OI queue model with an undirected graph on the customer classes, which we call a swapping graph, such that there is an edge between two classes if customers of these classes can be swapped with one another. When a customer completes service, it passes over customers in the remainder of the queue until it finds a customer it can swap positions with, that is, a customer whose class is a neighbor in the graph. In its turn, the customer that is ejected from its position takes the position of the next customer it can be swapped with, and so on. This is repeated until a customer can no longer find another customer to be swapped with; this customer is the one that leaves the queue. After proving that P&S queues have a product-form stationary distribution, we derive a necessary and sufficient stability condition for (open networks of) P&S queues that also applies to OI queues. We then study irreducibility properties of closed networks of P&S queues and derive the corresponding product-form stationary distribution. Lastly, we demonstrate that closed networks of P&S queues can be applied to describe the dynamics of new and existing load-distribution and scheduling protocols in clusters of machines in which jobs have assignment constraints.
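For concreteness, the pass-and-swap departure cascade described above can be sketched directly from that description. The list-based queue representation, the class labels, and the toy swapping graph below are illustrative assumptions.

```python
def complete_service(queue, graph, p):
    """Pass-and-swap departure cascade.

    queue: list of customer classes, head of the queue first.
    graph: dict mapping each class to the set of classes it can swap with.
    p:     index of the customer whose service completes.
    Returns (new_queue, departing_class).
    """
    mover = queue.pop(p)
    i = p
    while True:
        # The moving customer passes over the remainder of the queue until it
        # finds a customer whose class is a neighbor of its own in the graph.
        j = next((k for k in range(i, len(queue)) if queue[k] in graph[mover]), None)
        if j is None:
            return queue, mover            # no swap partner left: mover departs
        queue[j], mover = mover, queue[j]  # mover takes position j; occupant is ejected
        i = j + 1                          # the ejected customer scans the rest in turn

# Toy swapping graph: 'a'-'b' and 'b'-'c' can swap with each other.
graph = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}
print(complete_service(['a', 'b', 'c', 'b'], graph, 0))  # -> (['a', 'b', 'c'], 'b')
```

Note that each swapped-in customer takes the exact position of the one it ejects, so the positions of all other customers are preserved, as in the description above; with an empty swapping graph, the cascade reduces to an ordinary OI-queue departure.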
One of the key features of small-world networks is the ability to route messages in a few hops, using a decentralized algorithm in which each node has a limited knowledge of the topology. In 2000, Kleinberg proposed a model based on an augmented grid that asymptotically exhibits such a property.
In this paper, we revisit the original model with the help of numerical simulations. Our approach is fueled by a new algorithm that can sample augmenting links in almost constant time. The speed gain allows us to perform detailed numerical evaluations. We first observe that, in practice, the augmented scheme proposed by Kleinberg is more robust than what is predicted by the asymptotic behavior, even in very large finite grids. We also propose tighter bounds on the asymptotic performance of Kleinberg's greedy routing algorithm. We finally show that, if the model is fed with realistic parameters, the results are in line with real-life experiments.
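As an illustration, here is a naive sketch of Kleinberg's model and of greedy routing on it. This is not the almost-constant-time sampler mentioned above: each shortcut is sampled lazily, in O(N^2) time, when a node is first visited (which is valid here because greedy routing never revisits a node, since the distance to the target strictly decreases at each hop). The grid size is a small illustrative value.

```python
import random

random.seed(1)
N = 50  # grid side; lattice distance is the L1 distance (no wraparound)

def dist(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def sample_shortcut(u, r=2):
    # One long-range contact, chosen with probability proportional to d(u, v)^-r;
    # r = 2 is the harmonic distribution of the two-dimensional model.
    nodes = [(x, y) for x in range(N) for y in range(N) if (x, y) != u]
    weights = [dist(u, v) ** -r for v in nodes]
    return random.choices(nodes, weights)[0]

def greedy_route(s, t):
    # Greedy routing: at each step, hop to the known contact closest to the target.
    hops = 0
    while s != t:
        x, y = s
        contacts = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= x + dx < N and 0 <= y + dy < N]
        contacts.append(sample_shortcut(s))  # shortcut sampled lazily on first visit
        s = min(contacts, key=lambda v: dist(v, t))
        hops += 1
    return hops

print(greedy_route((0, 0), (N - 1, N - 1)))
```

Averaging the hop count over many source-target pairs and grid sizes is exactly the kind of numerical evaluation that the fast sampler of the paper makes tractable for very large grids.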
Efficiently exploiting the resources of data centers is a complex task that requires efficient and reliable load balancing and resource allocation algorithms. The former are in charge of assigning jobs to servers upon their arrival in the system, while the latter are responsible for sharing the server resources between their assigned jobs. These algorithms should adapt to various constraints, such as data locality, that restrict the feasible job assignments. In this paper, we propose a token-based algorithm that efficiently balances the load between the servers without requiring any knowledge of the job arrival rates and the server capacities. Assuming a balanced fair sharing of the server resources, we show that the resulting dynamic load balancing is insensitive to the job size distribution. Its performance is compared to that obtained under the best static load balancing and in an ideal system that would constantly optimize the resource utilization. We also make the connection with other token-based algorithms such as Join-Idle-Queue.
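A minimal sketch of such a token mechanism is given below. The details (FIFO token order, i.e. assign to the longest-idle compatible server, the server names, and rejecting when no compatible token is available) are assumptions made for the sketch, not the exact specification of the algorithm in the paper.

```python
from collections import deque

class TokenDispatcher:
    """Token-based assignment sketch: each server that becomes idle deposits a
    token at the dispatcher; an incoming job grabs the earliest token of a
    compatible server, else it is rejected (or queued). Note that arrival
    rates and server capacities are never needed."""

    def __init__(self):
        self.tokens = deque()  # server ids, oldest (longest idle) first

    def server_becomes_idle(self, server):
        self.tokens.append(server)

    def assign(self, compatible):
        for server in self.tokens:       # scan from the longest-idle token
            if server in compatible:
                self.tokens.remove(server)
                return server
        return None                      # no compatible idle server

d = TokenDispatcher()
for s in ("s1", "s2", "s3"):
    d.server_becomes_idle(s)
print(d.assign({"s2", "s3"}))  # -> s2 (longest-idle compatible server)
```

The compatibility set passed to `assign` is where constraints such as data locality enter: a job is only ever matched with a token of a server allowed to process it.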
Stochastic network calculus is a tool for computing error bounds on the performance of queueing systems. However, deriving accurate bounds for networks consisting of several queues or subject to non-independent traffic inputs is challenging. In this paper, we investigate the relevance of tools from analytic combinatorics, especially the kernel method, to tackle this problem. Applying the kernel method allows us to compute the generating functions of the queue state distributions in the stationary regime of the network. As a consequence, error bounds of arbitrary precision can be computed. In this preliminary work, we focus on simple examples that are representative of the difficulties the kernel method allows us to overcome.
Understanding the performance of a pool of servers is crucial for proper dimensioning. One of the main challenges is to take into account the complex interactions between servers that are pooled to process jobs. In particular, a job generally cannot be processed by just any server of the cluster due to various constraints like data locality. In this paper, we represent these constraints by an assignment graph between jobs and servers. We present a recursive approach to computing performance metrics such as mean response times when the server capacities are shared according to balanced fairness. While the computational cost of these formulas can be exponential in the number of servers in the worst case, we illustrate their practical interest by introducing broad classes of pool structures that can be analyzed exactly in polynomial time. This considerably extends the class of models for which explicit performance metrics are accessible.
We represent a computer cluster as a multi-server queue with some arbitrary graph of compatibilities between jobs and servers. Each server processes its jobs sequentially in FCFS order. The service rate of a job at any given time is the sum of the service rates of all servers processing this job. We show that the corresponding queue is quasi-reversible and use this property to design a scheduling algorithm achieving balanced fair sharing of the computing resources.
We consider a system of processor-sharing queues with state-dependent service rates. These are allocated according to balanced fairness within a polymatroid capacity set. Balanced fairness is known to be both insensitive and Pareto-efficient in such systems, which ensures that the performance metrics, when computable, will provide robust insights into the real performance of the system considered. We first show that these performance metrics can be evaluated with a complexity that is polynomial in the system size if the system is partitioned into a finite number of parts, so that queues are exchangeable within each part and asymmetric across different parts. This in turn allows us to derive stochastic bounds for a larger class of systems which satisfy less restrictive symmetry assumptions. These results are applied to practical examples of tree data networks, such as backhaul networks of Internet service providers, and computer clusters.
Traffic modeling is key to the dimensioning of data networks. Usual models rely on the implicit assumption that each user generates data flows in series, one after the other, with the ongoing flows sharing the considered network link equitably. We relax this assumption and consider the more realistic case where users may generate several data flows in parallel, these flows also having to share the user's access line. We call this model multi-source since each user now behaves as an independent traffic source. Usual performance metrics like mean throughput and congestion rate must now be defined at the user level rather than at the flow level. We derive explicit expressions for these performance metrics under the assumption that flows share bandwidth according to balanced fairness. These results are compared with those obtained by simulation when max-min fairness is imposed, either at the flow level or at the user level.
Kidney exchange programs give rise to a complex matching problem for which no optimal solution is known. This problem can be described by a stochastic dynamic matching model in which items of different classes arrive according to independent Poisson processes. Each item represents a donor-recipient pair in which the donor, although willing to give a kidney to the recipient, is incompatible with them; the class encodes properties of the pair, such as the blood types of the donor and the recipient, that affect its compatibility. Matching constraints are described by an undirected graph whose vertices are the classes: two pairs have neighboring classes in the graph if they can be matched, in the sense that each donor can give their kidney to the recipient of the other pair. Items awaiting a match are stored in a queue. We study the efficiency of matching policies, not only in terms of system stability, but also in terms of control of the matching rates between different classes. The latter point matters for favoring the matchings that maximize other performance measures, such as post-transplant quality of life or the graft survival rate. This article presents the main contributions on dynamic matching from the authors' preprints and implementations [CMB22, Mat]. For conciseness, proofs are omitted from the present version. Section 1 formalizes the problem and introduces the conservation equation that characterizes the matching rates of any stable policy. Section 2 presents a simple relation between the structure of the graph, the conservation equation, and the stability of the system. This relation allows us to characterize, in Section 3, the solution set of the conservation equation as a convex polytope.
Finally, Section 4 shows that every point of the polytope can be attained or approached by simple policies that discard certain edges.
While most of the literature on matching problems considers static models in which the number of items to match is given, several recent works have studied a stochastic variant in which items of different classes arrive according to independent Poisson processes, with the matching constraints between items described by a non-bipartite graph on their classes. These items can represent, for instance, parts to be assembled on a production line or donor-recipient pairs on an organ-donation list. In this paper, we develop new formulas to compute performance metrics such as the mean waiting time. These formulas, as well as the numerical results they allow us to plot, are used to better understand the impact of parameters on performance. In particular, we characterize a limiting regime in which items of some classes accumulate while items of the other classes become scarce.
Optimizing the use of data centers requires efficient and reliable algorithms that balance the load between machines while adapting to various constraints, such as data locality, that restrict the feasible assignments. In this paper, we propose a simple algorithm that performs this task without requiring any knowledge of the request arrival rates or the machine capacities. Its key property, verified using a queueing model, is its insensitivity to the request size distribution: if each machine applies the processor-sharing service discipline and requests arrive according to a Poisson process, then performance depends on the request size distribution only through its mean. Preliminary numerical results on request rejection rates and machine occupancy further indicate that this algorithm outperforms the best static algorithm and comes close to an ideal algorithm that would constantly optimize resource utilization.
Stochastic network calculus is a tool for computing error bounds on the performance of queueing networks. Obtaining accurate bounds for networks consisting of several queues or subject to non-independent arrivals is a delicate exercise. In this paper, we assess the relevance of tools from analytic combinatorics to tackle this problem. In particular, we apply the kernel method to express the generating functions of the queue state distributions, which allows us to compute error bounds of arbitrary precision on the stationary behavior. In this preliminary work, we focus on simple examples that are nonetheless representative of the difficulties the kernel method allows us to overcome. These results are validated by simulations.
Cloud-computing infrastructures rely on clusters of servers that share incoming requests. Dimensioning them requires predicting their performance. One of the main challenges is to account for the complex interactions between servers when they are pooled to process requests in parallel, while integrating constraints, such as data locality, that restrict the feasible assignments. To analyze them, we represent these constraints by an assignment graph between requests and servers; resources are shared according to balanced fairness, which has the advantage of making system performance insensitive to the request size distribution. Our main contribution is a new recursive approach to computing performance metrics, which decomposes the system by switching off the servers one after the other. Although the complexity of the resulting formulas can be exponential in the cluster size in the worst case, we illustrate their practical interest by identifying broad families of structures for which it becomes polynomial. This considerably extends the set of systems for which explicit performance metrics are accessible.
Long a source of anecdotes, routing in small worlds was studied in 2000 by Kleinberg through a simple model, the Kleinberg grid, which captures the influence of building long-range links, or shortcuts, on the navigability of a graph. While the initial model has been enriched with numerous asymptotic analyses and extensions, numerical results on simulated grids have been limited by the computational cost of sampling shortcuts. The main contribution of this article is a new dynamic double-rejection sampling method that enables detailed numerical analysis. It becomes possible to simulate Kleinberg grids as large as the universe, and the rest of the article presents a few applications: evidence of a certain robustness of routing; new asymptotic bounds; an observation of the six degrees of separation.
We consider a cluster of servers processing requests in parallel. While customers generally benefit from having their requests processed by as many servers as possible, the impact of parallelism on the servers is less clear: too little parallelism does not fully use the available resources; too much risks needlessly cluttering the servers with waiting requests. We study this phenomenon using a queueing model in which requests arrive according to a Poisson process and demand amounts of work that follow an exponential distribution. Each incoming request is assigned to a number of servers chosen at random, uniformly and independently of the system state. Each server processes its requests in their order of arrival. We show that there is a degree of parallelism that minimizes the mean number of requests present at each server. This optimal degree is of the order of the square root of the number of servers under light to moderate loads, and decreases down to two under very heavy load.
The growing demand for cloud-based services encourages operators to maximize resource efficiency within computer clusters. This motivates the development of new technologies that make resource management more flexible. However, exploiting this flexibility to reduce the number of computers also requires efficient resource-management algorithms that have a predictable performance under stochastic demand. In this thesis, we design and analyze such algorithms using the framework of queueing theory.
Our abstraction of the problem is a multi-server queue with several customer classes. Servers have heterogeneous capacities and the customers of each class enter the queue according to an independent Poisson process. Each customer can be processed in parallel by several servers, depending on compatibility constraints described by a bipartite graph between classes and servers, and each server applies a first-come-first-served policy to its compatible customers. We first prove that, if the service requirements are independent and exponentially distributed with unit mean, this simple policy yields the same average performance as balanced fairness, an extension of processor sharing known to be insensitive to the distribution of the service requirements. A more general form of this result, relating order-independent queues to Whittle networks, is also proved. Lastly, we derive new formulas to compute performance metrics.
These theoretical results are then put into practice. We first propose a scheduling algorithm that extends the principle of round-robin to a cluster where each incoming job is assigned to a pool of computers by which it can subsequently be processed in parallel. Our second proposal is a load-balancing algorithm based on tokens for clusters where jobs have assignment constraints. Both algorithms are approximately insensitive to the job size distribution and adapt dynamically to demand. Their performance can be predicted by applying the formulas derived for the multi-server queue.
In this presentation, we will consider a stochastic dynamic matching model, in which items of different classes arrive according to independent Poisson processes, and compatibilities between items are described by an undirected graph on their classes. We will first focus on a specific matching policy called first-come-first-matched. Our main contribution is the observation that, under this policy, the matching model is equivalent to an order-independent (loss) queue, a model that has recently gained momentum in the queueing-theory literature. Using this equivalence, we will formulate simpler proofs for several existing results and derive closed-form expressions for performance metrics like the matching rate along an edge and the waiting probability of a class. In the second part, we will use results from graph theory and linear algebra to characterize the set of achievable matching rate vectors under any matching policy.
The first part of this presentation is based on the paper Stochastic Non-Bipartite Matching Models and Order-Independent Loss Queues, published in the journal Stochastic Models (Taylor & Francis, 2022). The second part is based on the recent preprint Stochastic Dynamic Matching: A Mixed Graph-Theory and Linear-Algebra Approach, written in collaboration with Fabien Mathieu (LINCS) and Ana Busic (Inria and PSL University).
Order-independent (OI) queues, introduced by Berezner et al. in 1995, expanded the family of multi-class queues that are known to have a product-form stationary distribution by allowing for intricate class-dependent service rates. In this presentation, we introduce pass-and-swap (P&S) queues, an extension of OI queues where, upon a service completion, the customer that completes service is not necessarily the one that leaves the system. After defining P&S queues and proving that their stationary distribution indeed has a product form, we demonstrate that closed networks of P&S queues can be applied to describe the dynamics of new and existing load-balancing protocols in clusters of machines in which jobs have assignment constraints. These include the ALIS (assign to the longest idle server) and the first-come-first-served cancel-on-complete redundancy protocols.
The design and analysis of load-balancing algorithms has been a popular research topic over the past three decades. The basic problem consists of assigning a stream of incoming jobs to servers, so as to equalize the server loads as far as possible. The fundamental challenge consists of taking the best assignment decision upon the arrival of each job despite uncertainty on the length of ongoing and future jobs. However, the growing use of cloud-computing platforms for data-intensive applications has added further constraints on load balancing. For instance, the large scale of the system calls for decentralized algorithms that perform well in the presence of many dispatchers, and data locality may restrict the set of servers to which each job can be assigned.
In this presentation, we consider a blocking variant of a well-known load-balancing algorithm, called join-idle-queue, whereby incoming jobs are rejected if no server is available upon their arrival. It was previously observed that, if jobs arrive according to a Poisson process, this blocking variant can be described by a Markov process with a product-form stationary distribution; furthermore, the long-run performance is insensitive to the job size distribution beyond its mean. We first introduce several extensions to this blocking variant that preserve these two properties while also accounting for practical constraints of cloud-computing platforms. We also show how these extensions relate to existing insensitive load-balancing algorithms, and use this observation to compute long-run performance metrics. These results give insights into the performance of the original join-idle-queue algorithm under the practical constraints of cloud-computing platforms.
The growing demand for cloud-based services encourages operators to maximize resource efficiency within computer clusters. This motivates the development of new technologies that make resource management more flexible. However, exploiting this flexibility to reduce the number of computers also requires efficient resource-management algorithms that have a predictable performance under stochastic demand. In this thesis, we design and analyze such algorithms using the framework of queueing theory.
Our abstraction of the problem is a multi-server queue with several customer classes. Servers have heterogeneous capacities, and the customers of each class enter the queue according to an independent Poisson process. Each customer can be processed in parallel by several servers, depending on compatibility constraints described by a bipartite graph between classes and servers, and each server applies the first-come-first-served policy to its compatible customers. We first prove that, if the service requirements are independent and exponentially distributed with unit mean, this simple policy yields the same average performance as balanced fairness, an extension of processor-sharing known to be insensitive to the distribution of the service requirements. A more general form of this result, relating order-independent queues to Whittle networks, is also proved. Lastly, we derive new formulas to compute performance metrics.
These theoretical results are then put into practice. We first propose a scheduling algorithm that extends the principle of round-robin to a cluster where each incoming job is assigned to a pool of computers by which it can subsequently be processed in parallel. Our second proposal is a load-balancing algorithm based on tokens for clusters where jobs have assignment constraints. Both algorithms are approximately insensitive to the job size distribution and adapt dynamically to demand. Their performance can be predicted by applying the formulas derived for the multi-server queue.
Efficiently exploiting the resources of data centers is a complex task that requires efficient and reliable load balancing and resource allocation algorithms. The former are in charge of assigning jobs to servers upon their arrival in the system, while the latter are responsible for sharing the server resources between their assigned jobs. These algorithms should adapt to various constraints, such as data locality, that restrict the feasible job assignments. In this presentation, we propose a token-based algorithm that efficiently balances the load between the servers without requiring any knowledge of the job arrival rates and server capacities. Assuming a balanced fair sharing of the server resources, we show that the resulting dynamic load balancing is insensitive to the job size distribution. Its performance is compared to that obtained under the best static load balancing and in an ideal system that would constantly optimize the resource utilization. We also make the connection with other token-based algorithms such as Join-Idle-Queue.
Understanding the performance of a pool of servers is crucial for proper dimensioning. One of the main challenges is to take into account the complex interactions between servers that are pooled to process jobs. In particular, a job cannot, in general, be processed by every server of the cluster because of various constraints like data locality. In this talk, we will represent these constraints by an assignment graph between jobs and servers. We will present a recursive approach to computing performance metrics like mean response times when the server capacities are shared according to balanced fairness. While the computational cost of these formulas can be exponential in the number of servers in the worst case, we will illustrate their practical interest by introducing broad classes of pool structures that can be exactly analyzed in polynomial time. This considerably extends the class of models for which explicit performance metrics are accessible.
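A toy sketch of the recursive flavor of such computations (the two-class instance, server names, and state-space truncation below are illustrative assumptions, not the structures from the talk): under balanced fairness, a balance function Φ satisfies Φ(0) = 1 and Φ(x) = Σ_{i: x_i > 0} Φ(x − e_i) / μ(A(x)), where μ(A(x)) is the total capacity of the servers compatible with the classes active in state x, and the stationary distribution is proportional to Φ(x) ∏_i ρ_i^{x_i}:

```python
from functools import lru_cache

# Hypothetical toy instance: 2 job classes, 2 unit-capacity servers, and a
# bipartite compatibility graph (class -> set of compatible servers).
CAPACITY = {"s1": 1.0, "s2": 1.0}
COMPAT = {0: {"s1"}, 1: {"s1", "s2"}}

def rate(active_classes):
    # Total capacity of the servers compatible with at least one active class
    servers = set().union(*(COMPAT[i] for i in active_classes))
    return sum(CAPACITY[s] for s in servers)

@lru_cache(maxsize=None)
def phi(x):
    # Balance function: phi(0) = 1,
    # phi(x) = sum_{i: x_i > 0} phi(x - e_i) / mu(A(x))
    if all(n == 0 for n in x):
        return 1.0
    active = tuple(i for i, n in enumerate(x) if n > 0)
    total = sum(phi(x[:i] + (x[i] - 1,) + x[i + 1:]) for i in active)
    return total / rate(active)

def stationary(rho, max_jobs):
    # pi(x) proportional to phi(x) * prod_i rho_i^{x_i};
    # the infinite state space is truncated at max_jobs per class.
    states = [(i, j) for i in range(max_jobs + 1) for j in range(max_jobs + 1)]
    weights = {x: phi(x) * rho[0] ** x[0] * rho[1] ** x[1] for x in states}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

pi = stationary(rho=(0.3, 0.4), max_jobs=20)
mean_jobs = sum((x[0] + x[1]) * p for x, p in pi.items())
print(mean_jobs)  # mean number of jobs, from which mean response times follow
```

The memoized recursion makes the exponential worst case visible: the number of states grows with the number of classes, which is precisely why the polynomial-time structures discussed in the talk matter.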
We consider a network of multi-class multi-server queues. Each job can be processed in parallel by any subset of servers within a pre-defined set that depends on its class. Each server allocates its capacity to its compatible jobs in first-come-first-served (FCFS) order at each queue. Jobs arrive according to Poisson processes, have independent exponential service requirements, and are routed independently at random. We prove that, when stable, the network has a product-form stationary distribution. From a practical perspective, we propose an algorithm on this basis to allocate the resources of a computer cluster according to balanced fairness. We finally examine further developments of this model to allow for dynamic adaptation to the system occupancy.
Exponential families are parametric sets of probability distributions that arise in many applications. These include well-known univariate distributions (such as the binomial, Poisson, geometric, exponential, and normal distributions), but also multivariate distributions like probabilistic graphical models and stationary distributions of several queueing models. In this presentation, we will first recall the definition of exponential families and motivate their study. We will then present a generic method for approximating the normalization constant of these distributions, as the exact computation of this constant is infeasible in high dimension.
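To make the normalization problem concrete, here is a small sketch (the toy family, its sufficient statistics, and the parameter values are illustrative assumptions, not the method from the presentation): the constant is estimated by plain importance sampling with a uniform proposal, and checked against brute-force summation in a dimension small enough to enumerate:

```python
import itertools
import math
import random

# A toy exponential family on {0, ..., K}^D with unnormalized weight
# w(x) = exp(theta . T(x)), where T(x) = (sum_i x_i, sum_i x_i * x_{i+1}).
K, D = 4, 6
THETA = (-0.5, 0.1)

def log_weight(x):
    t1 = sum(x)
    t2 = sum(x[i] * x[i + 1] for i in range(len(x) - 1))
    return THETA[0] * t1 + THETA[1] * t2

def exact_z():
    # Brute force, feasible only in low dimension: (K+1)^D terms
    return sum(math.exp(log_weight(x))
               for x in itertools.product(range(K + 1), repeat=D))

def monte_carlo_z(n_samples=100_000, seed=0):
    # Importance sampling with a uniform proposal q(x) = (K+1)^(-D):
    #   Z = E_q[ w(X) / q(X) ]
    rng = random.Random(seed)
    volume = (K + 1) ** D
    total = 0.0
    for _ in range(n_samples):
        x = [rng.randrange(K + 1) for _ in range(D)]
        total += math.exp(log_weight(x))
    return volume * total / n_samples

print(exact_z(), monte_carlo_z())
```

The uniform proposal works here only because the weights vary mildly; in high dimension, a naive proposal degenerates, which is what motivates more careful approximation methods.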
The Python Data Analysis Library (pandas) provides data structures and tools for manipulating and analyzing data. It is built on top of NumPy, the core package for numerical computations in the Scientific Computing Tools for Python (SciPy) ecosystem. The objective of this presentation is threefold: explain when one can benefit from using pandas, describe its fundamental data structures, and give an overview of the data analysis tools it provides.
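A minimal, hypothetical example of the two fundamental data structures (the data below is made up): a `DataFrame` is a table of aligned, labeled columns, each of which is a `Series`, and split-apply-combine operations like `groupby` are where pandas pays off over raw NumPy arrays:

```python
import pandas as pd

# A small DataFrame: each column is a Series sharing the same row index.
jobs = pd.DataFrame(
    {
        "server": ["s1", "s1", "s2", "s2", "s2"],
        "size": [1.2, 0.4, 2.0, 0.7, 1.1],
    }
)

# Label-based selection: one column is a Series with vectorized methods.
print(jobs["size"].mean())  # average job size over all rows

# Split-apply-combine: mean job size per server.
per_server = jobs.groupby("server")["size"].mean()
print(per_server)
```

The same aggregation is possible with NumPy alone, but the labeled index and `groupby` keep rows, labels, and results aligned automatically.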