Conference - Wednesday, September 8
08:00 am PT - Welcome: Toni Cortes and Kathryn Mohror
Keynote: Kate Keahey, Argonne National Laboratory, USA
Experiments in the Edge to Cloud Continuum
Chair: Ewa Deelman
09:20 am PT - Session 1
Track A: HPC Applications
Chair: Bing Xie, Oak Ridge National Laboratory
tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores
MiniMod: A Modular Miniapplication Benchmarking Framework for HPC
Track B: Performance Optimization
Chair: Jerry Chou, National Tsing Hua University
Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum
WIRE: Resource-efficient Scaling with Online Prediction for DAG-based Workflows
10:00 am PT - Break
10:20 am PT - Session 2
Track A: AI in practice
Chair: Yang Zhang, University of Notre Dame
HPC AI500 V2.0: The Methodology, Tools, and Metrics for Benchmarking HPC AI Systems
RPTCN: Resource Prediction for High-dynamic Workloads in Clouds based on Deep Learning
READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling
Accelerating DNN Architecture Search at Scale using Selective Weight Transfer
SAP-SGD: Accelerating Distributed Parallel Training with High Communication Efficiency on Heterogeneous Clusters
2PGraph: Accelerating GNN Training over Large Graphs on GPU clusters
Track B: Storage and I/O
Chair: Awais Khan, Sogang University
HFlow: A Dynamic and Elastic Multi-Layered Data Forwarder
Building A Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud Storage
Virtual Log-Structured Storage for High-Performance Streaming
RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows
Lazy-WL: A Wear-aware Load Balanced Data Redistribution Method for Efficient SSD Array Scaling
Streamlining distributed Deep Learning I/O with ad hoc file systems
12:20 pm PT - Break
12:20 - 14:20 PT - Poster Session
12:20 - 13:20 PT
Computational Storage to Increase the Analysis Capability of Tier-2 HEP Data Sites
CVFCC: CV-Based Framework for Container Consolidation in Cloud Data Centers
A Roadmap to Robust Science for High-throughput Applications: The Developers' Perspective
RELAR: A Reinforcement Learning Framework for Adaptive Routing in Network-on-Chips
Supporting Elastic Compaction of LSM-tree with a FaaS Cluster
Exploring Node Connection Modes in Multi-Rail Fat-tree
SDIS: A PB-level seismic data index system with ML methods
A Dynamic Power Capping Library for HPC Applications
CASQ: Accelerate Distributed Deep Learning with Sketch-Based Gradient Quantization
13:20 - 14:20 PT
Load Balancing Policies for Nested Fork-Join
A Transfer Learning Scheme for Time Series Forecasting Using Facebook Prophet
NUMA-aware I/O System Call Steering
Malleability Implementation in a MPI Iterative Method
A Generative Approach to Visualizing Satellite Data
Incorporating Fault-Tolerance Awareness into System-Level Modeling and Simulation
Halcyon: Unified HPC Center Operations
Toward a Comprehensive Benchmark Suite for Evaluating GASPI in HPC Environments
Automatic Parallelisation of Structured Mesh Computations with SYCL
Conference - Thursday, September 9
08:00 am PT - Session 3
Announcements
Best Paper Candidates
Chair: Dingwen Tao, Washington State University
Accelerating GPU Message Communication for Autonomous Navigation Systems
csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs
09:00 am PT - Parallel Sessions
Track A: Applications
Chair: Weifeng Liu, China University of Petroleum-Beijing
Octo-Tiger's New Hydro Module and Performance using HPX+CUDA on ORNL's Summit
Pipelined Preconditioned s-step Conjugate Gradient methods for Distributed Memory Systems
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs
Track B: Workloads
Chair: Jie Liu, University of California Merced
Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance
Distributed Work Stealing at Scale via Matchmaking
Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts
10:00 am PT - Break
10:20 am PT - Session 4
Track A: Compression and data reduction
Chair: Lin Gan, Tsinghua University
CSwap: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs
cuSZ(x): Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs
Exploring Autoencoder-Based Error-Bounded Compression for Scientific Data
cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions
DPZ: Improving Lossy Compression Ratio with Information Retrieval on Scientific Data
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging
Track B: Systems support for parallel applications
Chair: Keren Zhou, Rice University
Distributed Computation of Persistent Homology from Partitioned Big Data
FineQuery: Fine-Grained Query Processing on CPU-GPU Integrated Architectures
Packet Forwarding Cache of Commodity Switches for Parallel Computers
Two-Chains: High Performance Framework for Function Injection and Execution
HNGraph: Parallel Graph Processing in Hybrid Memory Based NUMA Systems
Modeling the Linux page cache for accurate simulation of data-intensive applications
12:20 pm PT - Break
12:40 - 13:40 PT
Keynote: David Abramson, University of Queensland, Australia
Translational Research in Cluster Computing
Chair: Maciej Malawski
Conference - Friday, September 10
08:00 am PT - Announcements, Cluster 2022 Preview, and Best Paper Award Announcement
Keynote 3: Trilce Estrada, University of New Mexico, USA
Demystifying machine learning in scientific research: a case for embedding domain knowledge in data representation
Chair: Hai Ah Nam
09:20 am PT - Session 5
Track A: Impacts of Errors on Applications
Chair: Bogdan Nicolae, Argonne National Laboratory
Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights
Understanding the Effects of DRAM Correctable Error Logging at Scale
Track B: Supporting Applications
Chair: Pengfei Su, UC Merced
Tackling Cold Start of Serverless Applications by Efficient and Adaptive Container Runtime Reusing
Reusability First: Toward FAIR Workflows
10:00 am PT - Break
10:20 am PT - Session 6
Track A: Understanding system interactions with applications
Chair: Jay Lofstead, Sandia National Laboratories
Thinking More about RDMA Memory Semantics
Monitoring Large Scale Supercomputers: A Case Study with the Lassen Supercomputer
Robustness Analysis of Loop-Free Floating-Point Programs via Symbolic Automatic Differentiation
Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration
Track B: Communication Optimization
Chair: Bing Xie, Oak Ridge National Laboratory
On-the-Fly, Robust Translation of MPI Libraries
Daps: A Dynamic Asynchronous Progress Stealing Model for MPI Communication
Combining One-Sided Communications with Task-Based Programming Models
Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures
11:40 - 12:00 pm PT - Closing