Scalable High-Performance QoS @ SSRC

This project is no longer active. Information is still available below.

Large-scale, high-performance storage systems are gaining momentum in data centers and high-performance computing systems. Quality of Service (QoS) will be an essential feature as these storage systems scale out in capacity and in the number of clients they support, because QoS can help ensure high resource utilization and fair resource allocation among competing clients. Few existing QoS solutions are designed to work at a scale that supports millions of concurrent client accesses. This research project aims to design scalable QoS solutions for these large-scale, high-performance storage systems.

The previous focus of this project was Automating Contention Management for High-Performance Storage Systems; see the Ascar project page for more information.

Status

The current focus of this project is automated performance enhancement through data layout optimization.

Large distributed storage systems, such as the high-performance computing (HPC) systems used by national and international laboratories, require sufficient performance and scale for demanding scientific workloads, and must handle shifting workloads with ease. Ideally, data is placed where it optimizes performance, but the size and complexity of large storage systems inhibit the rapid, effective restructuring of data layouts needed to maintain performance as workloads shift.

To address these issues, we are developing Geomancy, a tool that models the placement of data within a distributed storage system and reacts to drops in performance. Using a combination of machine learning techniques suitable for temporal modeling, Geomancy determines when and where bottlenecks may occur due to changing workloads and suggests layout changes that mitigate or prevent them. Our approach to optimizing throughput offers benefits for storage systems such as avoiding potential bottlenecks and increasing overall I/O throughput by 11% to 30%.
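The predict-then-relocate loop described above can be illustrated with a toy sketch. This is not Geomancy's implementation: an exponentially weighted moving average stands in for the machine learning models, and the function names, the `drop_threshold` parameter, and the location and file names are all hypothetical, chosen only for illustration.

```python
def ewma_forecast(samples, alpha=0.5):
    """Forecast the next throughput value as an exponentially weighted
    moving average of past samples (oldest first). This is a simple
    stand-in for a learned temporal model."""
    forecast = samples[0]
    for value in samples[1:]:
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


def suggest_moves(history, placement, drop_threshold=0.8):
    """Suggest moving files away from locations with a predicted drop.

    history:   {location: [throughput samples, oldest first]}
    placement: {file name: location currently holding the file}
    Returns a list of (file, source location, target location) tuples.
    """
    forecasts = {loc: ewma_forecast(s) for loc, s in history.items()}
    baselines = {loc: sum(s) / len(s) for loc, s in history.items()}

    # Assumption: a location is a predicted bottleneck when its forecast
    # falls below a fixed fraction of its own historical mean.
    bottlenecks = {
        loc for loc in history
        if forecasts[loc] < drop_threshold * baselines[loc]
    }
    healthy = [loc for loc in history if loc not in bottlenecks]
    if not healthy:
        return []  # nowhere better to move data

    # Send affected files to the healthy location with the best forecast.
    target = max(healthy, key=lambda loc: forecasts[loc])
    return [
        (name, loc, target)
        for name, loc in sorted(placement.items())
        if loc in bottlenecks
    ]


if __name__ == "__main__":
    history = {"ost0": [100, 100, 40, 30], "ost1": [80, 80, 80, 80]}
    placement = {"a.dat": "ost0", "b.dat": "ost1"}
    print(suggest_moves(history, placement))
```

In this example the throughput of the hypothetical location `ost0` is trending down, so the sketch suggests moving `a.dat` to `ost1`. A real system would also have to weigh the cost of moving data and the load the move places on the target.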

Faculty

Publications

Date		Publication
Oct 1, 2020		Oceane Bel, Kenneth Chang, Nathan Tallent, Dirk Duellman, Ethan L. Miller, Faisal Nawab, Darrell D. E. Long, Geomancy: Automated Performance Enhancement through Data Layout Optimization, Proceedings of the Conference on Mass Storage Systems and Technologies (MSST '20), October 2020. [Scalable High-Performance QoS] [Prediction and Grouping] [Storage QoS]
Nov 13, 2017		Yan Li, Oceane Bel, Kenneth Chang, Ethan L. Miller, Darrell D. E. Long, CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning, Supercomputing '17, November 2017. [Scalable High-Performance QoS] [Tracing and Benchmarking] [Ultra-Large Scale Storage]
Jun 2, 2015		Yan Li, Xiaoyuan Lu, Ethan L. Miller, Darrell D. E. Long, ASCAR: Automating Contention Management for High-Performance Storage Systems, 31st International Conference on Massive Storage Systems and Technologies (MSST2015), June 2015. [Scalable High-Performance QoS] [Ultra-Large Scale Storage] [Storage QoS]
Jan 1, 2000		Jehan-François Pâris, Steven W. Carter, Darrell D. E. Long, A Reactive Broadcasting Protocol for Video on Demand, Proceedings of the Multimedia Computing and Networking Conference, January 2000. [Scalable High-Performance QoS]

Last modified 19 Oct 2020