### SIAM AN09 Minisymposium on High Performance Computing on Massive Real-World Graphs

####
Colorado Convention Center

Denver, CO

6-10 July 2009

### Scope and Goals:

Graph theoretic abstractions are extensively used to analyze massive temporal streaming data from socio-economic interactions, social networking sites, communication networks, scientific computing, and applications in national security. Yet these problems pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. Few parallel graph algorithms outperform their best sequential implementation due to long memory latencies and high synchronization costs. In this minisymposium, we consider novel research on graph problems for real-world applications including the representation of these problems in the language of linear algebra.

### Location:

This workshop is co-located with SIAM AN09, held 6-10 July 2009, at the Sheraton Denver Downtown Hotel

in Denver, CO. Registration information for AN09 can be found at here.

Program:

Wednesday, July 8
#### 10:30 AM - 12:30 PM

*Room: 710*

**10:30-10:55 Exascale Analytics of Massive Social Networks***David A. Bader*, Georgia Institute of Technology- Combinatorial techniques are proving useful in solving real-world challenges in traditional and emerging computational sciences such as chemistry, biology, and medicine, as well as applications in national security. Emerging real-world applications can be modeled using algorithms that process massive spatio-temporal complex networks, such as power grid stability, inference of gene function in protein interaction networks, and cancer research, as well as homeland security and national defense. In this talk we focus on the analysis of massive social networks using graph theoretic techniques. Our experimental studies use real-world graph instances with billions of elements and demonstrate superior performance on highly parallel 20 systems such as the Cray XMT multithreaded architecture and the Sun's Niagara 2 Maramba system. We will also describe SNAP (Small- world Network Analysis and Partitioning), an open-source graph framework that we have developed for the exploration of massive complex networks and present new ideas on the exploration of the dynamic structure of massive spatio-temporal networks with billions of entities, such as understanding the genesis and dissipation of communities, allegiance switching, and source detection.
**11:00-11:25 Graph Detection Theory for Power Law Graphs***Jeremy Kepner*, Massachusetts Institute of Technology**11:30-11:55 Parallel Combinatorial BLAS and Applications in Graph Computation***Aydin Buluc*and John R. Gilbert, University of California, Santa Barbara- Graph computations are difficult to parallelize using traditional approaches due to their irregular nature and low operational intensity. Many graph computations, however, contain sufficient coarse grained parallelism that can be uncovered by using the right set of abstractions. We have been developing a parallel combinatorial BLAS that consists of a small but powerful set of linear algebra primitives. We will explain our software design philosophy and report on the initial results with our reference implementation.
**12:00-12:25 Scaling up Graph Algorithms on Emerging Multicore Systems***Kamesh Madduri*, Lawrence Berkeley National Laboratory- Graph algorithms on sparse random networks typically suffer a 3-5X drop in performance when their memory footprint gets larger than the last-level cache on current systems. We attempt to ameliorate this drop for breadth-first search, and present several strategies to limit the number of non-contiguous memory references incurred in graph traversal. We extend these ideas to improve parallel scaling on cache-based multicore systems, by utilizing the shared and private caches more effectively, and by reducing synchronization overhead. We observe that the strategies discussed for breadth-first search are applicable to a large class of graph algorithms, and present another case study of a cache-aware parallel betweenness centrality algorithm for small-world networks.

#### 4:00 PM - 6:00 PM

*Room: 710*

**4:00-4:25 Parallel Implementation of Tensor Decompositions for Large Data Analysis***Mark P. Sears*, Brett W. Bader, and Tammy Kolda, Sandia National Laboratories- Large data sets can often be mapped into multidimensional arrays, which we call tensors; the resulting tensor is usually extremely sparse. Similar to the SVD decomposition of a matrix there are a number of proposed tensor representations in terms of factors. This talk will describe serial and parallel implementations of several algorithms for tensor analysis, in particular algorithms which compute PARAFAC and Tucker representations of tensors. We will discuss previous and potential applications of these representations.
**4:30-4:55 Novel Graph Algorithms for Structure Extraction from Informatics Networks***Michael Mahoney*, Stanford University- Large informatics graphs such as large social and information networks that are increasingly common in Internet and high-performance computing applications have surprising complex and subtle structural and dynamic properties. For instance, their size alone, which can be in the millions or billions or more of nodes, immediately renders many traditional algorithmic tools unusable. In addition, their extreme sparsity and adversarial noise properties render many existing statistical tools inappropriate. Finally, due to their heavy-tailed properties, these networks are not meaningfully low-dimensional nor are they meaningfully high-dimensional, and thus methods that explicitly or implicitly rely on either of those common assumptions have severe limitations. I will describe recently-developed algorithms that begin to deal with some of these problems and that allow the analyst to probe in very fine ways nontrivial structural properties of extremely large informatics networks. Examples of these algorithms include generalizations of spectral graph partitioning to find good cuts that are near a specified node, efficient methods to compute graph resistances that can be used to identify disproportionately important or influential nodes, graph partitioning algorithms that interpolate between spectral and flow-based algorithms, and heuristics to characterize the large-scale clustering structure of large networks.
**5:00-5:25 Scalable & Efficient Parallelization of Graph Methods for Boolean Satisfiability and Power Grid Analysis on the Cray XMT***Daniel Chavarria*, Pacific Northwest National Laboratory- Graph-based algorithms have a broad applicability to many different areas of science and technology. Unfortunately, executing them efficiently for large-scale graphs requires a significant time and resource investment on traditional HPC systems. The Cray XMT multithreaded architecture provides specialized hardware designed for the efficient parallel execution of highly irregular codes, such as those derived from graph algorithms. This presentation will discuss our experiences in implementing two graph-based algorithms on the Cray XMT system: a Boolean satisfiability solver based on parallel Survey Propagation and an exact, edge-based, weighted Betweenness Centrality calculation applied to power grid analysis.
**5:30-5:55 Open Discussion / Challenge Problems**

#### 10:30 AM - 12:30 PM

*Room: 710*

**10:30-10:55 Detecting Community Structure in Dynamic Networks***Sanjukta Bhowmick*, Shweta Bansal, Kelly Fermoyle, and Padma Raghavan, Pennsylvania State University- Community structure detection in networks has important applications in understanding social structures, spread of epidemics, etc. However most algorithms for detecting communities (defined as tightly connected group of nodes) are applicable only to a static network with fixed nodes and edges. In reality, social networks are dynamic and the connections change with time. In this talk, we present an algorithm that can dynamically detect the change in communities as the network evolves over time steps.
**11:00-11:25 Structure of Large Scale Social Contact Graphs and its Effect on Epidemics***Anil Vullikanti*, Virginia Polytechnic Institute & State University- In recent years, there has been a lot of interest in studying the spread of epidemics on social contact graphs. We discuss important structural properties of a class of large scale synthetic social contact graphs. We discuss what effect the underlying graph structure has on the disease dynamics, and how these structural properties can be used in designing effective interventions for epidemics (e.g., whom to vaccinate), which is an important public health issue.
**11:30-11:55 High Performance Computing for Large Graph Problems***Bruce Hendrickson*and Jonathan Berry, Sandia National Laboratories- Graph abstractions have long been central to sparse matrices, parallel computing, chemistry, logistics and many other areas. But recent years have witnessed a further broadening of graph models to represent entities and relationships in ecology, social science, text analysis and other fields. There is a growing need for high performance graph algorithms to address problems in which the data is very large and/or timely response is important. For complex graphs, algorithms often exhibit very poor locality, and so cache-based architectures, and message-passing programming models deliver disappointing performance. In this talk, I will discuss our exploration of non-traditional architectures and programming models for graph algorithms, placing this work in the larger context of ongoing changes within the high performance computing universe.
**12:00-12:25 Anatomy of a Distributed Graph***Andrew Lumsdaine*, Douglas Gregor, and Nick Edmonds, Indiana University- For the large-scale matrix and vector data structures used in
scientific computing the data distribution process has become
relatively well understood. However, the increasing prominence
of large-scale graph-based problems has magnified the importance
of effective computation with distributed graph structures.
Based on our experience with the Parallel Boost Graph Library, we
present an overview of the unique characteristics of distributed
graph data structures and the impact of these characteristics on
application development and performance.

### Workshop Co-Chairs:

**David A. Bader**,*Georgia Institute of Technology***Jeremy Kepner**,*Massachusetts Institute of Technology Lincoln Laboratory***John R. Gilbert**,*University of California, Santa Barbara***Sanjukta Bhowmick**,*Pennsylvania State University***Padma Raghavan**,*Pennsylvania State University*