IEEE Cluster 2021 Program

Conference - Wednesday, September 8

08:00 am PT

Welcome: Toni Cortes and Kathryn Mohror

Keynote: Kate Keahey, Argonne National Laboratory, USA

Experiments in the Edge to Cloud Continuum

Abstract

The increasing popularity of IoT devices allows us to communicate better, interact better, and ultimately build a new type of scientific instrument that will allow us to explore our environment in ways that we could only dream about just a few years ago. This disruptive opportunity, however, raises a new set of challenges: how should we manage the massive amounts of data and network traffic such instruments will eventually produce? What types of environments will be most suited to developing their full potential? What new security problems will arise? And finally, what are the best ways of leveraging the intelligent edge to create new types of applications?

In a research area that creates its own new reality, such questions are too often approached only theoretically for lack of a realistic testbed, a scientific instrument that keeps pace with the emergent requirements of science and allows researchers to deploy, measure, and analyze relevant scientific hypotheses. The NSF-funded Chameleon testbed, originally created to provide a platform for exploration of research topics in cloud computing -- such as design of new virtualization solutions, operating systems, or power management -- has now been extended to support experiments in cloud to edge.

In this talk, I will first describe the Chameleon testbed -- a scientific instrument for computer science systems research, originally created to allow exploration of research topics in cloud computing such as virtualization, programmable networking, or power management -- as well as its recent extensions to support experimentation at the edge. I will describe the testbed capabilities required to provide a platform for the edge to cloud continuum, and give examples of edge to cloud research and education projects our users are running. Finally, I will describe tools and methodologies that Chameleon provides to improve experimental methodology and reproducibility of experiments in this environment and illustrate how a common experimentation platform can enhance sharing and scientific productivity.

Chair: Ewa Deelman

09:20 am PT - Session 1

Track A: HPC Applications

Chair: Bing Xie, Oak Ridge National Laboratory

tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores

MiniMod: A Modular Miniapplication Benchmarking Framework for HPC

Track B: Performance Optimization

Chair: Jerry Chou, National Tsing Hua University

Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum

WIRE: Resource-efficient Scaling with Online Prediction for DAG-based Workflows

10:00 am PT

Break

10:20 am PT - Session 2

Track A: AI in practice

Chair: Yang Zhang, University of Notre Dame

HPC AI500 V2.0: The Methodology, Tools, and Metrics for Benchmarking HPC AI Systems

RPTCN: Resource Prediction for High-dynamic Workloads in Clouds based on Deep Learning

READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling

Accelerating DNN Architecture Search at Scale using Selective Weight Transfer

SAP-SGD: Accelerating Distributed Parallel Training with High Communication Efficiency on Heterogeneous Clusters

2PGraph: Accelerating GNN Training over Large Graphs on GPU clusters

Track B: Storage and I/O

Chair: Awais Khan, Sogang University

HFlow: A Dynamic and Elastic Multi-Layered Data Forwarder

Building A Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud Storage

Virtual Log-Structured Storage for High-Performance Streaming

RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows

Lazy-WL: A Wear-aware Load Balanced Data Redistribution Method for Efficient SSD Array Scaling

Streamlining distributed Deep Learning I/O with ad hoc file systems

12:20 pm PT

Break

12:20 - 14:20 PT - Poster Session

12:20 - 13:20 PT

Computational Storage to Increase the Analysis Capability of Tier-2 HEP Data Sites

CVFCC: CV-Based Framework for Container Consolidation in Cloud Data Centers

A Roadmap to Robust Science for High-throughput Applications: The Developers' Perspective

RELAR: A Reinforcement Learning Framework for Adaptive Routing in Network-on-Chips

Supporting Elastic Compaction of LSM-tree with a FaaS Cluster

Exploring Node Connection Modes in Multi-Rail Fat-tree

SDIS: A PB-level seismic data index system with ML methods

A Dynamic Power Capping Library for HPC Applications

CASQ: Accelerate Distributed Deep Learning with Sketch-Based Gradient Quantization

13:20 - 14:20 PT

Load Balancing Policies for Nested Fork-Join

A Transfer Learning Scheme for Time Series Forecasting Using Facebook Prophet

NUMA-aware I/O System Call Steering

Malleability Implementation in a MPI Iterative Method

A Generative Approach to Visualizing Satellite Data

Incorporating Fault-Tolerance Awareness into System-Level Modeling and Simulation

Halcyon: Unified HPC Center Operations

Toward a Comprehensive Benchmark Suite for Evaluating GASPI in HPC Environments

Automatic Parallelisation of Structured Mesh Computations with SYCL

Conference - Thursday, September 9

08:00 am PT - Session 3

Announcements

Best Paper Candidates

Chair: Dingwen Tao, Washington State University

Accelerating GPU Message Communication for Autonomous Navigation Systems

csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs

09:00 am PT - Parallel Sessions

Track A: Applications

Chair: Weifeng Liu, China University of Petroleum-Beijing

Octo-Tiger's New Hydro Module and Performance using HPX+CUDA on ORNL's Summit

Pipelined Preconditioned s-step Conjugate Gradient methods for Distributed Memory Systems

Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs

Track B: Workloads

Chair: Jie Liu, University of California Merced

Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance

Distributed Work Stealing at Scale via Matchmaking

Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts

10:00 am PT

Break

10:20 am PT - Session 4

Track A: Compression and data reduction

Chair: Lin Gan, Tsinghua University

CSwap: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs

cuSZ(x): Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs

Exploring Autoencoder-Based Error-Bounded Compression for Scientific Data

cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions

DPZ: Improving Lossy Compression Ratio with Information Retrieval on Scientific Data

O(1) Communication for Distributed SGD through Two-Level Gradient Averaging

Track B: Systems support for parallel applications

Chair: Keren Zhou, Rice University

Distributed Computation of Persistent Homology from Partitioned Big Data

FineQuery: Fine-Grained Query Processing on CPU-GPU Integrated Architectures

Packet Forwarding Cache of Commodity Switches for Parallel Computers

Two-Chains: High Performance Framework for Function Injection and Execution

HNGraph: Parallel Graph Processing in Hybrid Memory Based NUMA Systems

Modeling the Linux page cache for accurate simulation of data-intensive applications

12:20 pm PT

Break

12:40 - 13:40 PT

Keynote: David Abramson, University of Queensland, Australia

Translational Research in Cluster Computing

Chair: Maciej Malawski

Conference - Friday, September 10

08:00 am PT

Announcements, Cluster 2022 Preview, and Best Paper Award Announcement

Keynote 3: Trilce Estrada, University of New Mexico, USA

Demystifying machine learning in scientific research: a case for embedding domain knowledge in data representation

Abstract

Over the past years, the use of Artificial Intelligence has become ubiquitous in most disciplines, and Scientific High Throughput applications are no exception. Machine Learning increasingly plays a central role across the whole workflow pipeline, from workload forecasting, adaptive scheduling, and self-managed resource allocation to on-the-fly analysis. As we consider a pathway towards reproducible, scalable, and trustworthy science, we must pay special attention to the impact of ML in HTC, and how current practices can advance or hinder these efforts.

A common pitfall when designing ML-based solutions is using data "as is" and hoping that the models will automatically distill relevant features from the raw input. While this approach is feasible in domains with huge datasets and no critical need for verifiable and reproducible results, it is suboptimal in science, where data is expensive, very high dimensional, and noisy. This talk shows the power of embedding domain knowledge into the ML cycle, specifically within the data representation of complex entities, such as proteins. The goal of this encoding is to expose intra- and inter-molecular structural patterns to enable scalable and interpretable high throughput analyses. We present use cases in the context of protein function prediction and in-situ analysis of Molecular Dynamics simulations.

Chair: Hai Ah Nam

09:20 am PT - Session 5

Track A: Impacts of Errors on Applications

Chair: Bogdan Nicolae, Argonne National Laboratory

Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights

Understanding the Effects of DRAM Correctable Error Logging at Scale

Track B: Supporting Applications

Chair: Pengfei Su, UC Merced

Tackling Cold Start of Serverless Applications by Efficient and Adaptive Container Runtime Reusing

Reusability First: Toward FAIR Workflows

10:00 am PT

Break

10:20 am PT - Session 6

Track A: Understanding system interactions with applications

Chair: Jay Lofstead, Sandia National Laboratories

Thinking More about RDMA Memory Semantics

Monitoring Large Scale Supercomputers: A Case Study with the Lassen Supercomputer

Robustness Analysis of Loop-Free Floating-Point Programs via Symbolic Automatic Differentiation

Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration

Track B: Communication Optimization

Chair: Bing Xie, Oak Ridge National Laboratory

On-the-Fly, Robust Translation of MPI Libraries

Daps: A Dynamic Asynchronous Progress Stealing Model for MPI Communication

Combining One-Sided Communications with Task-Based Programming Models

Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures

11:40 - 12:00 pm PT

Closing