IEEE Cluster 2010 Conference Program

Monday, September 20 2010

Workshop on Parallel Programming and Applications on Accelerator Clusters (PPAAC)
Session 1 / 09:00 – 10:30 / Ourania Hall
09:00–09:15	Opening Remarks
	Alexandros Stamatakis, Technische Universität München
09:15–10:00	Keynote Speech: Highly Parallel Implementations of Bioinformatics Applications
	Ioannis Papaefstathiou, Technical University of Crete
10:00–10:30	A Package for OpenCL Based Heterogeneous Computing on Clusters with Many GPU Devices
	Amnon Barak, Tal Ben-Nun, Ely Levy, Amnon Shiloh, The Hebrew University of Jerusalem
10:30–11:00	Coffee Break
Session 2 / 11:00 – 12:30 / Ourania Hall
11:00–11:30	Accelerating Data Clustering on GPU-based Clusters under Shared Memory Abstraction
	Konstantinos Karantasis, Eleftherios Polychronopoulos and George N. Dimitrakopoulos, University of Patras
11:30–12:00	A Multi-Platform Linear Algebra Toolbox for Finite Element Solvers on Heterogeneous Clusters
	Vincent Heuveline, Chandramowli Subramanian, Dimitar Lukarski, Jan-Philipp Weiss, Karlsruhe Institute of Technology
12:00–12:30	Efficient Complex Matrix Multiplication on the Synergistic Processing Element of the Cell Processor
	Quentin Bourgerie, Pierre Fortin, Jean-Luc Lamotte, Université Pierre et Marie Curie
12:30–14:00	Lunch (Main Restaurant)
Session 3 / 14:00 – 15:45 / Ourania Hall
14:00–14:45	Invited Presentation: Green Flash: Ultra-Efficient Supercomputing
	David Donofrio, Lawrence Berkeley National Laboratory
14:45–15:15	High Performance Triangle versus Box Intersection Checks
	Thomas V. Christensen and Sven Karlsson, Technical University of Denmark
15:15–15:45	Assessment of Barrier Implementations for Fine-Grain Parallel Regions on Current Multi-core Architectures
	Simon A. Berger and Alexandros Stamatakis, Technische Universität München

Tutorial on Practical Approach to Performance Analysis and Modeling

09:00 – 17:00 / Kalia Hall

Adolfy Hoisie, Darren J. Kerbyson, Pacific Northwest National Laboratory

Abstract: This tutorial presents a practical approach to the performance modeling of large-scale scientific applications on high performance systems. Its defining characteristic is a proven modeling approach for full-blown scientific codes, developed at Los Alamos and validated on systems containing tens of thousands of processors and beyond. We show how models are constructed and demonstrate how they are used to predict, explain, diagnose, and engineer application performance in existing or future codes and/or systems. Notably, our approach does not require the use of specific tools but rather is applicable across commonly used environments. Moreover, since our performance models are parametric in terms of machine and application characteristics, they give the user the ability to “experiment ahead” with different system configurations or algorithms/coding strategies. Both will be demonstrated in studies emphasizing the application of these modeling techniques, including verifying system performance, comparison of large-scale systems, and examination of possible future systems.

Workshop on High Performance Computing on Complex Environments (HPCCE)
Session 1 / 08:30 – 10:30 / Clio Hall
08:30–09:00	Opening Remarks
	Emmanuel Jeannot, INRIA
09:00–09:30	Parallel Sorting Algorithms for Optimizing Particle Simulations
	Michael Hofmann, Gudula Rünger, Chemnitz University of Technology; Paul Gibbon, Robert Speck, Jülich Supercomputing Centre
09:30–10:00	Investigation of Selection Strategies in Parallel Branch and Bound Algorithm with Simplicial Partitions
	Remigijus Paulavicius, Julius Žilinskas, Institute of Mathematics and Informatics–Akademijos; Andreas Grothey, University of Edinburgh
10:00–10:30	Investigation of Parallel Particle Swarm Optimization Algorithm With Reduction of the Search Area
	Algirdas Lancinskas, Julius Žilinskas, Institute of Mathematics and Informatics–Akademijos; Pilar Martínez Ortigosa, University of Almeria
10:30–11:00	Coffee Break
Session 2 / 11:00–12:30 / Clio Hall
11:00–11:30	Optimization of Topology of Truss Structures using Grid Computing
	Aleksandr Igumenov, Julius Žilinskas, Institute of Mathematics and Informatics–Akademijos; Krzysztof Kurowski, Mikolaj Mackowiak, Poznan Supercomputing and Networking Center
11:30–12:00	Identifying Cloud Computing Usage Patterns
	Dana Petcu, West University of Timisoara
12:00–12:30	THOR: A Transparent Heterogeneous Open Resource framework
	Jose Luis Vázquez-Poletti, Universidad Complutense de Madrid; Jan Perhac, John Ryan, Anne C. Elster, Norwegian University of Science and Technology
12:30–14:00	Lunch (Main Restaurant)
Session 3 / 14:00–15:00 / Clio Hall
14:00–14:30	Run-Time Optimization of Sends, Receives and File I/O
	Thorvald Natvig, Anne C. Elster, Norwegian University of Science and Technology
14:30–15:00	Applicability of Dynamic Selection of Implementation Variants of Sequential Iterated Runge-Kutta Methods
	Natalia Kalinnik, Matthias Korch, Thomas Rauber, University of Bayreuth
15:00–15:30	Coffee Break
Session 4 / 15:30–16:30 / Clio Hall
15:30–16:00	GPU-Based Segmentation of Cervical Vertebra in X-Ray Images
	Sidi Ahmed Mahmoudi, Fabian Lecron, Pierre Manneback, Mohammed Benjelloun, Saïd Mahmoudi, University of Mons
16:00–16:30	GPU Implementation of the Pixel Purity Index Algorithm for Hyperspectral Image Analysis
	Sergio Sánchez, Antonio Plaza, University of Extremadura
16:30–17:00	Coffee Break
Session 5 (Invited Presentations) / 17:00–18:20 / Clio Hall
17:00–17:20	Performance of Scheduling Strategies in Computational Grids and Clouds
	Helen Karatza, Aristotle University of Thessaloniki
17:20–17:40	Component-based Methodology for High Development Productivity of Complex Applications
	Vladimir Getov, University of Westminster
17:40–18:00	Research Activities at the University of Manchester related to Complex HPC
	Rizos Sakellariou, University of Manchester
18:00–18:20	Selecting High Performance Computing and High Throughput Computing Capabilities for Hydro Meteo Research e-Infrastructures
	Andrea Clematis, Daniele D’Agostino, Antonella Galizia, Alfonso Quarati, IMATI-CNR; Antonio Parodi, Nicola Rebora, CIMA Research Foundation; Dieter Kranzlmueller, Michael Schiffers, Ludwig Maximilian Universität and Leibniz Supercomputing Center

Tuesday, September 21 2010

Plenary Session
Opening Remarks & Keynote 1 / 09:00 – 10:30 / Hermes Hall
09:00–09:15	Opening Remarks
	Dimitrios S. Nikolopoulos, Angelos Bilas, FORTH-ICS; Ricardo Bianchini, Rutgers University
09:15–10:30	Keynote 1
	Title: No Power, No Cloud
	Speaker: Christian Belady, Microsoft Research
10:30–11:00	Coffee Break
Session 1 (Chair: DK Panda) / 11:00 – 12:30 / Hermes Hall
11:00–11:30	Minimizing MPI Resource Contention in Multithreaded Multicore Environments
	David Goodell, Pavan Balaji, Darius Buntinas, ANL; Gabor Dozsa, IBM; William Gropp, University of Illinois; Sameer Kumar, IBM; Bronis De Supinski, LLNL/CASC; Rajeev Thakur, ANL
11:30–12:00	TCCluster: A Cluster Architecture Utilizing the Processor Host Interface as a Network Interconnect
	Heiner Litz, Maximilian Thuermer, Ulrich Bruening, University of Heidelbe
12:00–12:30	Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
	Canqun Yang, Feng Wang, NUDT, PRC; Yunfei Du, Juan Chen, Jie Liu, Huizhan Yi, Kai Lu, School of Computer Science, National University of Defense Technology
12:30–14:00	Lunch (Main Restaurant)
Session 2 (Chair: Brian Wylie) / 14:00 – 15:30 / Hermes Hall
14:00–14:30	How to scale Nested OpenMP Applications on the ScaleMP vSMP Architecture
	Dirk Schmidl, Christian Terboven, Andreas Wolf, Dieter an Mey, Christian Bischof, RWTH Aachen University
14:30–15:00	Synchronizing Concurrent Events in Traces of Hybrid MPI/OpenMP Applications
	Daniel Becker, German Research School for Sim; Markus Geimer, Forschungszentrum Juelich GmbH; Rolf Rabenseifner; Felix Wolf, GRS
15:00–15:30	Getting Rid of Coherency Overhead for Memory-Hungry Applications
	Hector Montaner, Federico Silla, Univ. Politècnica de València; Holger Froning, Universität Heidelberg; Jose Duato, Univ. Politècnica de València
Session 3 (Chair: Ron Brightwell) / 14:00 – 15:30 / Apollon Hall
14:00–14:30	Energy-aware Scheduling in Virtualized Datacenters
	Íñigo Goiri, Ferran Julià, UPC; Ramón Nou, Josep Berral, Jordi Guitart, Jordi Torres, BSC
14:30–15:00	TRACER: A Trace Replay Tool to Evaluate Energy-Efficiency of Mass Storage Systems
	Zhuo Liu, Fei Wu, Xiao Qin, Department of Computer Science and Software Engineering, Auburn University, Auburn; Chang Sheng Xie, Jian Zhou, Huazhong University of Science and Technology; Jianzong Wang
15:00–15:30	Designing OS for HPC applications: Scheduling
	Roberto Gioiosa, BSC; Sally McKee, Chalmers University of Technology; Mateo Valero, BSC
15:30–16:00	Coffee Break
Session 4 (Chair: Darren Kerbyson) / 16:00 – 17:30 / Hermes Hall
16:00–16:30	Exploiting Data Deduplication to Accelerate Live Virtual Machine Migration
	Xiang Zhang, Zhigang Huo, Dan Meng, Chinese Academy of Sciences
16:30–17:00	SHelp: Automatic Self-healing for Multiple Application Instances in Virtual Machine Environment
	Gang Chen, Hai Jin, Deqing Zou, Huazhong Univ. of Sci. & Tech.; Bingbing Zhou, University of Sydney; Weizhong Qiang, Huazhong Univ. of Sci. & Tech.
17:00–17:30	Virtualizing Modern OS-bypass Networks with Performance and Scalability
	Bo Li, Institute of Computing Technology; Zhigang Huo, Panyong Zhang, Dan Meng, Chinese Academy of Sciences
Session 5 (Chair: Vijay Pai) / 16:00 – 17:30 / Apollon Hall
16:00–16:30	RDMA-Based Job Migration Framework for MPI over InfiniBand
	Xiangyong Ouyang, Sonya Marcarelli, Raghunath Rajachandrasekar, Dhabaleswar Panda, The Ohio State University
16:30–17:00	Host Side Dynamic Reconfiguration in Infiniband
	Wei Lin Guay, Sven-Arne Reinemo, Olav Lysne, Tor Skeie, Simula Research Laboratory
17:00–17:30	Multiplexing Endpoints of HCA for Scaling MPI applications: Design and Performance Evaluation with uDAPL
	Jasjit Singh, Yogeshwar Sonawane, C-DAC

Poster Session

19:00-21:00

Design and Evaluation of Remote Memory Disk Cache

Changgyoo Park, Shin-gyu Kim, Hyuck Han, Hyeonsang Eom, Heon Y. Yeom, Seoul National University

Power-aware, Dependable, and High-Performance Communication Link using PCI Express: PEARL

Toshihiro Hanawa, Taisuke Boku, Shin’ichi Miura, Mitsuhisa Sato, Kazutami Arimoto, University of Tsukuba

Cloud-based Synchronization of Distributed File System Hierarchies

Sandesh Uppoor, Michail D. Flouris, Angelos Bilas, FORTH-ICS

Low-latency Explicit Communication and Synchronization in Scalable Multi-core Clusters

Christoforos Kachris, George Nikiforos, Vassilis Papaefstathiou, Stamatis Kavadias, Manolis Katevenis, FORTH-ICS

Non-blocking Adaptive Cycles: Deadlock Avoidance for Fault-tolerant Interconnection Networks

Gonzalo Zarza, Diego Lugones, Daniel Franco, Emilio Luque, Universitat Autonoma Barcelona

A Multi-Pronged Approach to Benchmark Characterization

Nikola Puzovic, University of Siena; Sally McKee, Chalmers University; Revital Eres, Ayal Zaks, IBM Haifa; Paolo Gai, Evidence S.r.l.; Stephan Wong, Delft University of Technology; Roberto Giorgi, University of Siena

Early Experience of Building a Cloud Platform for Service Oriented Software Development

Hailong Sun, Xu Wang, Chao Zhou, Zicheng Huang, Xudong Liu, Beihang University

Adaptable Scheduling Schemes for Scientific Applications on Science Cloud

Seoyoung Kim, Yoonhee Kim, Sookmyung Women's University; Naeyoung Song, Chongam Kim, Seoul National University

Fault-Tolerance Mechanisms for Exascale Systems

Maria Ruiz Varela, University of Delaware; Kurt B. Ferreira, Rolf E. Riesen, Sandia National Laboratories

(Drinks and snacks will be served at the adjoining area)

Wednesday, September 22 2010

Plenary Session
Keynote 2 / 09:00 – 10:30 / Hermes Hall
09:00–10:30	Title: Scaling Storage into the Exascale Era
	Speaker: Garth Gibson, Carnegie Mellon University and Panasas Inc.
10:30–11:00	Coffee Break
Session 6 (Chair: Daniel Katz) / 11:00–12:30 / Hermes Hall
11:00–11:30	The Impact of System Design Parameters on Application Noise Sensitivity
	Kurt Ferreira, Sandia National Labs; Patrick Bridges, Univ. of New Mexico; Ron Brightwell, Kevin Pedretti, Sandia National Labs
11:30–12:00	Computing Contingency Statistics in Parallel: Design Trade-Offs and Limiting Cases
	Philippe Pébay, Janine Bennett, David Thompson, Sandia National Labs
12:00–12:30	Integration Experiences and Performance Studies of A COTS Parallel Archive System
	Hsing-bung (HB) Chen, Los Alamos National Lab
12:30–14:00	Lunch (Main Restaurant)
Session 7 (Chair: Roberto Gioiosa) / 14:00 – 15:30 / Hermes Hall
14:00–14:30	Enforcing SLAs in Scientific Clouds
	Oliver Niehörster, André Brinkmann, Gregor Fels, Paderborn Center for Parallel Computing; Jens Krüger, Univ. of Paderborn; Jens Simon, Paderborn Center for Parallel Computing
14:30–15:00	DRM: A Dynamic Replication Management Scheme for Cloud Storage Cluster
	Qingsong Wei, Data Storage Institute; Bharadwaj Veeravalli, National University of Singapore
15:00–15:30	An Efficient Process Live Migration Mechanism for Load Balanced Distributed Virtual Environments
	Balazs Gerofi, Hajime Fujita, Yutaka Ishikawa, University of Tokyo
Session 8 (Chair: Rob Latham) / 14:00 – 15:30 / Apollon Hall
14:00–14:30	Acceleration of Streamed Tensor Contraction Expressions on GPGPU-based Clusters
	Wenjing Ma, Sriram Krishnamoorthy, Oreste Villa, Karol Kowalski, Pacific Northwest National Laboratory
14:30–15:00	Efficient Parallel Subgraph Counting using G-Tries
	Pedro Ribeiro, Fernando Silva, Luís Lopes, Universidade do Porto
15:00–15:30	Cluster versus GPU Implementation of an Orthogonal Target Detection Algorithm for Remotely Sensed Hyperspectral Images
	Abel Paz, Antonio Plaza, University of Extremadura
15:30–16:00	Coffee Break
Conference Panel / 16:00–17:30 / Hermes Hall
16:00–17:30	Title: Implications of Exascale Computing for Storage Systems Research
	Moderator: Andre Brinkmann, Univ. of Paderborn, Germany
	Panelists: Toni Cortes, UPC / BSC; Garth Gibson, CMU / Panasas; Peter Haas, HLRS Stuttgart; Rob Ross, ANL
19:00–22:00	Conference Beach Dinner

Thursday, September 23 2010

Plenary Session
Keynote 3 / 09:00 – 10:30 / Hermes Hall
09:00–10:30	Title: Image-Based Biomedical Modeling, Simulation and Visualization
	Speaker: Chris Johnson, University of Utah
10:30–11:00	Coffee Break
Session 9 (Chair: Rob Ross) / 11:00–12:30 / Hermes Hall
11:00–11:30	Breaking the MapReduce stage barrier
	Abhishek Verma, Nicolas Zea, Brian Cho, Indranil Gupta, Roy Campbell, University of Illinois at Urbana-Champaign
11:30–12:00	Asynchronous Algorithms in MapReduce
	Karthik Shashank Kambatla, Naresh Rapolu, Suresh Jagannathan, Ananth Grama, Purdue University
12:00–12:30	Reducing Communication Overhead in Large Eddy Simulation of Jet Engine Noise
	Yingchong Situ, Lixia Liu, Chandra Martha, Matthew Louis, Zhiyuan Li, Gregory Blaisdell, Anastasios Lyrintzis, Purdue University
12:30–14:00	Lunch (Main Restaurant)
Session 10 (Chair: Adolfy Hoisie) / 14:00 – 15:30 / Hermes Hall
14:00–14:30	Performance Analysis of Multi-level Time Sharing Task Assignment Policies on Cluster-based Systems
	Malith Jayasinghe, Zahir Tari, Panlop Zeephongsekul, RMIT Univ., Australia
14:30–15:00	A Simulation Framework to Automatically Analyze the Communication-Computation Overlap in Scientific Applications
	Vladimir Subotic, Jose Carlos Sancho, Jesus Labarta, Mateo Valero, BSC
15:00–15:30	Analysis of Tasks Reallocation in a Dedicated Grid Environment
	Ghislain Charrier, INRIA - LIP/ENS Lyon; Frédéric Desprez, Yves Caniou, UCBL - LIP/ENS Lyon
Session 11 (Chair: Toni Cortes) / 14:00 – 15:30 / Apollon Hall
14:00–14:30	Replication-based Highly Available Metadata Management for Cluster File Systems
	Zhuan Chen, ICT; Jin Xiong, Dan Meng, Chinese Academy of Sciences
14:30–15:00	Improving Parallel I/O Performance with Data Layout Awareness
	Yong Chen, Xian-He Sun, Illinois Institute of Tech; Rajeev Thakur, ANL; Huaiming Song, Hui Jin, Illinois Institute of Technology
15:00–15:30	Optimization Techniques at I/O Forwarding Layer
	Kazuki Ohta, Univ. of Tokyo; Dries Kimpe, Univ. of Chicago; Jason Cope, Kamil Iskra, Robert Ross, ANL; Yutaka Ishikawa, Univ. of Tokyo
15:30–16:00	Coffee Break
Session 12 (Industry Session) / 16:00–17:00 / Hermes Hall
16:00–16:30	Paving The Road to Exascale Computing
	Gilad Shainer, Mellanox Technologies
16:30–17:00	HPC and Cluster Systems – Made in Saxony
	Jörg Heydemüller, Megware

Friday, September 24 2010

Workshop on Interfaces and Abstractions for Scientific Data Storage (IASDS)
Session 1 / 08:30 – 10:00 / Ourania Hall
08:30–08:45	Opening Remarks
	Rob Latham, Argonne National Laboratory
08:45–09:30	Invited Presentation: Block-level Virtualization aka Doing Things Below the Filesystem: Examples, Observations, and Challenges
	Angelos Bilas, FORTH-ICS
09:30–10:00	Object Storage Semantics for Replicated Concurrent-Writer File Systems
	Philip Carns, Robert Ross, Samuel Lang, Argonne National Laboratory
10:30–11:00	Coffee Break
Session 2 / 11:00 – 12:30 / Ourania Hall
11:00–11:30	Supporting High-Performance I/O at the Petascale: The Event Data Store for ATLAS at the LHC
	Peter van Gemmeren, David Malon, Argonne National Laboratory
11:30–12:00	Comprehensive Data Infrastructure for Plant Bioinformatics
	Chris Jordan, Dan Stanzione, Texas Advanced Computing Center; Doreen Ware, Christos Noutsos, Jerry Lu, Cold Spring Harbor Laboratory
12:00–12:30	H5hut: A High-Performance I/O Library for Particle-based Simulation
	Mark Howison, Lawrence Berkeley National Laboratory; Andreas Adelmann, Paul Scherrer Institut; E. Wes Bethel, Lawrence Berkeley National Laboratory; Achim Gsell, Benedikt Oswald, Paul Scherrer Institut; Prabhat, Lawrence Berkeley National Laboratory
12:30–14:00	Lunch (Main Restaurant)
Session 3 / 14:00 – 15:30 / Ourania Hall
14:00–14:30	pWalrus: Towards Better Integration of Parallel File Systems into Cloud Storage
	Yoshihisa Abe, Garth Gibson, Carnegie Mellon University
14:30–15:15	Invited Presentation: Title TBA
	Robert Ross, Argonne National Laboratory
15:15–15:30	Closing Remarks

Tutorial on Designing High-End Computing Systems with IB and 10GigEth

08:30 – 12:30 / Kalia Hall

Dhabaleswar K. Panda, Ohio State University; Pavan Balaji, Argonne National Laboratory

Abstract: InfiniBand (IB) and 10-Gigabit Ethernet (10GE) interconnects are generating a lot of excitement towards building next generation High Performance Computing (HPC) systems and enterprise datacenters. This tutorial will provide an overview of these emerging interconnects, their offered features, their current market standing, and their suitability for prime-time HPC. It will start with a brief overview of IB, 10GE and their architectural features. An overview of the emerging OpenFabrics stack which encapsulates both IB and 10GE in a unified manner will be presented. IB and 10GE hardware/software solutions and the market trends will be highlighted. Finally, sample performance numbers highlighting the performance these technologies can achieve in different environments such as MPI, Sockets, Parallel File Systems, Multi-tier Datacenters, and Virtual Machines, will be shown.

Workshop on Application/Architecture Co-design for Extreme-scale Computing (AACEC)
Session 1 / 08:45 – 10:30 / Clio Hall
08:45–09:00	Welcome and Introductory Remarks
09:00–09:30	Invited Presentation: Bringing up Anton: Taking Co-Design into Production
	Joseph Bank, D. E. Shaw Research
09:30–10:00	Invited Presentation: Green Flash: Three Problems, One Solution
	David Donofrio, Lawrence Berkeley National Laboratory
10:00–10:30	Mobile-Subjective Programming for Massively Multithreaded Shared Memory Applications
	Megan Vance, Peter Kogge, University of Notre Dame
10:30–11:00	Coffee Break
Session 2 / 11:00–12:30 / Clio Hall
11:00–11:30	Invited Presentation: Designing Applications, HW and SW together: adventures with 80 and 48 cores
	Tim Mattson, Intel
11:30–12:00	Facilitating Co-Design for Extreme-Scale Systems Through Lightweight Simulation
	Christian Engelmann, Frank Lauer, Oak Ridge National Laboratory
12:00–12:30	Invited Presentation: An Evolutionary Approach to Exascale System Software by Leveraging Co-Design Principles
	Robert Wisniewski, IBM T. J. Watson Research Center
12:30–14:00	Lunch (Main Restaurant)
Session 3 / 14:00–15:30 / Clio Hall
14:00–14:30	Invited Presentation: Co-Designing MPI Library and Applications for InfiniBand Clusters
	Dhabaleswar K. Panda, Ohio State University
14:30–15:00	Efficient Sparse Matrix-Matrix Multiplication on Heterogeneous High Performance Systems
	Jakob Siegel, University of Delaware; Oreste Villa, Sriram Krishnamoorthy, Antonio Tumeo, Pacific Northwest National Laboratory; Xiaoming Li, University of Delaware
15:00–15:30	Confidence: Analyzing Performance With Empirical Probabilities
	Bradley W. Settlemyer, Stephen W. Hodson, Jeffery A. Kuehn, Stephen W. Poole, Oak Ridge National Laboratory
15:30–16:00	Coffee Break
Session 4 / 16:00–16:45 / Clio Hall
16:00–16:30	Invited Presentation: Opportunities and Approaches for System Software in Supporting Application/Architecture Co-Design
	Ron Brightwell, Sandia National Laboratories
16:30–16:45	Concluding Remarks

Tutorial on Practical Parallel Application Performance Engineering Using Innovative Tools

08:30 – 17:00 / Thalia Hall

Bryan J. N. Wylie, Jülich Supercomputing Centre; Michael Gerndt, Technical University of Munich; Wolfgang Nagel, Technical University of Dresden

Abstract: This tutorial presents state-of-the-art tools for engineering performant parallel applications on computer clusters with MPI and/or OpenMP. The suite of tools developed by the Virtual Institute for High Productivity Supercomputing (VI-HPS) is introduced, including Scalasca, Vampir and Periscope. The tools support automated and manually customizable measurement and analyses with hardware counter metrics as well as communication and synchronization overheads. A series of hands-on exercises is included, which participants are encouraged to follow on their notebook computers using a provided Live-DVD that boots a typical HPC cluster Linux environment. This will offer practical experience using the tools and help prepare participants to apply modern methods for locating and diagnosing performance bottlenecks in real-world parallel applications up to the largest scales.