Web-Scale Data Management
Christopher Olston, Ben Reed (Yahoo! Research)
A new breed of software systems is being developed to manage and process web-scale data sets on large clusters of commodity computers. A typical software stack includes a distributed file system (e.g., GFS, HDFS), a scalable data-parallel workflow system (e.g., MapReduce, Dryad), and a declarative scripting language (e.g., Pig Latin, Hive). These technologies are driven primarily by the needs of large Internet companies like Google, Microsoft, and Yahoo!, but are also finding applications in the sciences, journalism, and other domains. This tutorial gives an overview of web-scale data management technologies, with special focus on open-source offerings.
Networked Systems for Developing Regions
Lakshminarayanan Subramanian (New York University)
This tutorial will highlight many of the interesting systems and networking research challenges within the Information and Communication Technologies for Development (ICTD) space. The area is attracting growing interest within the research community (NSDR, WiNSDR workshops at SIGCOMM, SOSP, and Mobicom), and work here has the potential for significant impact, directly affecting the lives of many people in developing regions. The decreasing cost of, and increasing access to, information and communication technologies (e.g., mobile phones) are rapidly opening new services and markets to previously disconnected populations. However, due to a variety of factors, including cost, literacy, education, and organizational capacity, conventional approaches to technology design and implementation are often not relevant. In this tutorial, I will describe our experiences in identifying the right set of real-world problems in this space, defining specific research challenges, and working on the ground in developing regions, as well as our initial steps in deploying usable and potentially useful systems there. This tutorial will specifically highlight new research opportunities in four broad areas: (a) low-cost network connectivity solutions; (b) extending the Web to developing regions; (c) next-generation mobile services and applications; (d) usable security and trust management issues in developing regions.
Storage Virtualization and High-Performance I/O: Storage I/O Stack Design and Implementation
Manolis Marazakis, Angelos Bilas, Michail Flouris (ICS-FORTH)
I/O performance is a pressing requirement for applications running not only on large-scale server deployments but also on small- to medium-scale systems. At the same time, additional features in the I/O path, such as data protection, versioning, and space optimization, are becoming increasingly relevant for smaller-scale server deployments as well. Commodity components, including storage controllers, hard disks and SSDs, and multicore host CPUs, already exhibit nominal performance characteristics that make them a viable choice for building cost-effective, high-performance I/O servers. In this tutorial, we identify the key challenges in implementing an efficient I/O path between applications running on a host machine and the storage devices, through an I/O controller. We present a detailed walk-through of a prototype I/O stack consisting of host-side drivers and Linux-based firmware running on a representative SoC controller. We examine various design trade-offs using microbenchmarks and realistic I/O workloads, and place issues in context by comparing the performance of our prototype with that of a state-of-the-art commercial I/O controller. Finally, we discuss several open issues, such as options for dividing functionality between the host and the I/O controller, especially in anticipation of the wide availability of systems with multiple processing cores and specialized hardware resources.
Web 2.0 Applications
Armando Fox (UC Berkeley)
Web 2.0 applications are a prominent part of today's deployed systems. More complex workloads, new deployment options such as cloud computing, and new research directions such as scalable non-relational storage make this domain a rich one for both researchers and practitioners. However, the rate at which new tools, frameworks, and development methodologies are appearing is formidable. The goal of this tutorial is to bring together the intellectual vocabulary necessary to discourse intelligently about research and practice in all areas related to Web 2.0 application development, assessment, deployment, and operations, including the roles of related research areas such as machine learning.
Grid and Cloud Computing with XtreemOS
Corina Stratan (Vrije Universiteit Amsterdam), Massimo Coppola (Università di Pisa and ISTI-CNR), Guillaume Pierre (Vrije Universiteit Amsterdam)
Large-scale distributed systems like grids and clouds provide the means for executing complex scientific and business applications. However, they often involve installing and interacting with several layers of middleware, a difficult task for inexperienced users. This tutorial introduces XtreemOS, a Linux-based operating system that provides grid and cloud computing functionalities. XtreemOS aims to give users the illusion of working with a traditional computer while removing the burden of the complex resource management issues typical of distributed environments. Existing POSIX and Grid applications are transparently handled, automatically exploiting container and virtualization support when needed. After introducing the general concepts of the domain, the tutorial will present the XtreemOS functionalities and will demonstrate how they can be used. We shall focus on a few core XtreemOS services for grids: virtual organization management (building and operating dynamic virtual organizations), application execution management (providing scalable resource discovery and job scheduling for distributed interactive applications), and data management (accessing and storing data in XtreemFS, a distributed POSIX-like file system).