IEEE Cluster 2021 Program

Conference - Wednesday, September 8


08:00 am PT

Welcome: Toni Cortes and Kathryn Mohror

Keynote: Kate Keahey, Argonne National Laboratory, USA
Experiments in the Edge to Cloud Continuum
Chair: Ewa Deelman

09:20 am PT - Session 1

Track A: HPC Applications
Chair: Bing Xie, Oak Ridge National Laboratory
tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores

MiniMod: A Modular Miniapplication Benchmarking Framework for HPC

Track B: Performance Optimization
Chair: Jerry Chou, National Tsing Hua University
Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum

WIRE: Resource-efficient Scaling with Online Prediction for DAG-based Workflows

10:00 am PT

Break

10:20 am PT - Session 2

Track A: AI in practice
Chair: Yang Zhang, University of Notre Dame
HPC AI500 V2.0: The Methodology, Tools, and Metrics for Benchmarking HPC AI Systems

RPTCN: Resource Prediction for High-dynamic Workloads in Clouds based on Deep Learning

READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling

Accelerating DNN Architecture Search at Scale using Selective Weight Transfer

SAP-SGD: Accelerating Distributed Parallel Training with High Communication Efficiency on Heterogeneous Clusters

2PGraph: Accelerating GNN Training over Large Graphs on GPU clusters

Track B: Storage and I/O
Chair: Awais Khan, Sogang University
HFlow: A Dynamic and Elastic Multi-Layered Data Forwarder

Building A Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud Storage

Virtual Log-Structured Storage for High-Performance Streaming

RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows

Lazy-WL: A Wear-aware Load Balanced Data Redistribution Method for Efficient SSD Array Scaling

Streamlining distributed Deep Learning I/O with ad hoc file systems

12:20 pm PT

Break

12:20 - 14:20 PT - Poster Session

12:20 - 13:20 PT
Computational Storage to Increase the Analysis Capability of Tier-2 HEP Data Sites

CVFCC: CV-Based Framework for Container Consolidation in Cloud Data Centers

A Roadmap to Robust Science for High-throughput Applications: The Developers' Perspective

RELAR: A Reinforcement Learning Framework for Adaptive Routing in Network-on-Chips

Supporting Elastic Compaction of LSM-tree with a FaaS Cluster

Exploring Node Connection Modes in Multi-Rail Fat-tree

SDIS: A PB-level seismic data index system with ML methods

A Dynamic Power Capping Library for HPC Applications

CASQ: Accelerate Distributed Deep Learning with Sketch-Based Gradient Quantization

13:20 - 14:20 PT
Load Balancing Policies for Nested Fork-Join

A Transfer Learning Scheme for Time Series Forecasting Using Facebook Prophet

NUMA-aware I/O System Call Steering

Malleability Implementation in a MPI Iterative Method

A Generative Approach to Visualizing Satellite Data

Incorporating Fault-Tolerance Awareness into System-Level Modeling and Simulation

Halcyon: Unified HPC Center Operations

Toward a Comprehensive Benchmark Suite for Evaluating GASPI in HPC Environments

Automatic Parallelisation of Structured Mesh Computations with SYCL

Conference - Thursday, September 9

08:00 am PT - Session 3

Announcements

Best Paper Candidates
Chair: Dingwen Tao, Washington State University
Accelerating GPU Message Communication for Autonomous Navigation Systems
csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs

09:00 am PT - Parallel Sessions

Track A: Applications
Chair: Weifeng Liu, China University of Petroleum-Beijing
Octo-Tiger's New Hydro Module and Performance using HPX+CUDA on ORNL's Summit
Pipelined Preconditioned s-step Conjugate Gradient methods for Distributed Memory Systems
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs

Track B: Workloads
Chair: Jie Liu, University of California Merced
Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance
Distributed Work Stealing at Scale via Matchmaking
Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts

10:00 am PT

Break

10:20 am PT - Session 4

Track A: Compression and data reduction
Chair: Lin Gan, Tsinghua University
CSwap: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs
cuSZ(x): Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs
Exploring Autoencoder-Based Error-Bounded Compression for Scientific Data
cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions
DPZ: Improving Lossy Compression Ratio with Information Retrieval on Scientific Data
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging

Track B: Systems support for parallel applications
Chair: Keren Zhou, Rice University
Distributed Computation of Persistent Homology from Partitioned Big Data
FineQuery: Fine-Grained Query Processing on CPU-GPU Integrated Architectures
Packet Forwarding Cache of Commodity Switches for Parallel Computers
Two-Chains: High Performance Framework for Function Injection and Execution
HNGraph: Parallel Graph Processing in Hybrid Memory Based NUMA Systems
Modeling the Linux page cache for accurate simulation of data-intensive applications

12:20 pm PT

Break

12:40 - 13:40 PT

Keynote: David Abramson, University of Queensland, Australia
Translational Research in Cluster Computing
Chair: Maciej Malawski

Conference - Friday, September 10

08:00 am PT

Announcements, Cluster 2022 Preview, and Best Paper Award Announcement

Keynote 3: Trilce Estrada, University of New Mexico, USA
Demystifying machine learning in scientific research: a case for embedding domain knowledge in data representation
Chair: Hai Ah Nam

09:20 am PT - Session 5

Track A: Impacts of Errors on Applications
Chair: Bogdan Nicolae, Argonne National Laboratory
Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights
Understanding the Effects of DRAM Correctable Error Logging at Scale

Track B: Supporting Applications
Chair: Pengfei Su, UC Merced
Tackling Cold Start of Serverless Applications by Efficient and Adaptive Container Runtime Reusing
Reusability First: Toward FAIR Workflows

10:00 am PT

Break

10:20 am PT - Session 6

Track A: Understanding system interactions with applications
Chair: Jay Lofstead, Sandia National Laboratories
Thinking More about RDMA Memory Semantics
Monitoring Large Scale Supercomputers: A Case Study with the Lassen Supercomputer
Robustness Analysis of Loop-Free Floating-Point Programs via Symbolic Automatic Differentiation
Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration

Track B: Communication Optimization
Chair: Bing Xie, Oak Ridge National Laboratory
On-the-Fly, Robust Translation of MPI Libraries
Daps: A Dynamic Asynchronous Progress Stealing Model for MPI Communication
Combining One-Sided Communications with Task-Based Programming Models
Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures

11:40 - 12:00 pm PT

Closing