NIMO is a system that proactively and automatically builds end-to-end application performance models to enable informed assignment of resources in networked utilities, i.e., distributed collections of compute and storage resources. Self-managing systems must pursue performance goals, e.g., meeting SLAs, optimizing application performance, and maximizing resource utilization, in an automated fashion. To do so, a system must understand the impact on application performance of all the relevant factors, such as: (i) the workload, e.g., the request arrival patterns of an Internet web-service such as Amazon; (ii) the resources assigned to the application, e.g., the amount of CPU, memory, storage, and network resources, covering both the provisioning (how much) and the placement (where) of those resources; and (iii) the data the application processes, e.g., the dataset size of a scientific application.
NIMO enables automated management of such system performance goals.
NIMO's overall architecture consists of: (i) a scheduler that enumerates, selects, and requests resource assignments (e.g., CPU, memory, and network) for applications from the utility resource infrastructure (e.g., from Shirako); (ii) a modeling engine that consists of an application profiler, a resource profiler, and a data profiler that learns performance models for applications; and (iii) a workbench where NIMO conducts active (or proactive) application runs to automatically collect samples for learning performance models. Active learning with acceleration seeks to reduce the time before a reasonably-accurate model is available.
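The interaction between the modeling engine and the workbench can be sketched as an active-learning loop: the workbench executes proactive runs, and the modeling engine folds each completed run into its model. The classes, method names, and the trivial averaging "model" below are purely illustrative assumptions, not NIMO's actual interfaces.

```python
# Hypothetical sketch of NIMO's modeling engine and workbench cooperating
# in an active-learning loop; all names and logic are illustrative.
from dataclasses import dataclass, field

@dataclass
class ModelingEngine:
    """Accumulates (assignment, runtime) samples and fits a trivial model."""
    samples: list = field(default_factory=list)

    def add_sample(self, assignment, runtime):
        self.samples.append((assignment, runtime))

    def predict(self, assignment):
        # Placeholder model: mean runtime over collected samples.
        if not self.samples:
            return None
        return sum(r for _, r in self.samples) / len(self.samples)

@dataclass
class Workbench:
    """Runs an application on a chosen assignment and reports its runtime."""
    def run(self, assignment):
        # Stand-in for a real instrumented run on workbench resources.
        return 10.0 / assignment["cpus"]

def active_learning(engine, bench, candidate_assignments, budget):
    # Proactively execute runs until the sample budget is exhausted.
    for assignment in candidate_assignments[:budget]:
        runtime = bench.run(assignment)
        engine.add_sample(assignment, runtime)
    return engine

engine = active_learning(ModelingEngine(), Workbench(),
                         [{"cpus": 1}, {"cpus": 2}, {"cpus": 4}], budget=3)
print(len(engine.samples))  # 3
```

Acceleration strategies would change only which candidate assignments the loop tries first, trading sample count for early model accuracy.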
We now summarize the components of NIMO in the context of a
computational-science workflow G (see the VLDB paper for details).
NIMO's scheduler is responsible for generating and executing an effective plan for a given workflow G. The scheduler enumerates candidate plans for G, estimates the performance of each plan, and chooses the execution plan with the best estimated performance. A plan P for workflow G is an execution strategy that specifies a resource assignment for each task in G. In addition to the tasks in G, P may interpose additional tasks for staging data between pairs of tasks in G. For example, a staging task Gij between tasks Gi and Gj in the workflow DAG copies the parts of Gj's input dataset produced by Gi from Gi's storage resource to that of Gj.
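The interposition of staging tasks can be sketched as a rewrite over the workflow's edge list: whenever a producer and consumer use different storage resources, a staging task is inserted between them. The data structures and naming convention below are our own illustration, not NIMO's plan representation.

```python
# Minimal sketch of interposing staging tasks in a workflow DAG, assuming
# each task is annotated with the storage resource holding its data.
def add_staging_tasks(edges, storage):
    """edges: list of (Gi, Gj) pairs; storage: task -> storage resource.
    Returns a new edge list with a staging task interposed wherever the
    producer and consumer tasks use different storage resources."""
    new_edges = []
    for gi, gj in edges:
        if storage[gi] != storage[gj]:
            stage = f"stage_{gi}_{gj}"  # copies Gi's output to Gj's storage
            new_edges += [(gi, stage), (stage, gj)]
        else:
            new_edges.append((gi, gj))
    return new_edges

edges = [("G1", "G2"), ("G2", "G3")]
storage = {"G1": "nfs-a", "G2": "nfs-a", "G3": "nfs-b"}
print(add_staging_tasks(edges, storage))
# [('G1', 'G2'), ('G2', 'stage_G2_G3'), ('stage_G2_G3', 'G3')]
```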
The scheduler uses a performance model M(G, I, R) to estimate the performance of G with input dataset I on a resource assignment R. NIMO builds profiles of resources and of frequently executed applications by analyzing instrumentation data gathered from previous runs using common, noninvasive tools (e.g., sar, tcpdump, and nfsdump). A performance model M for an application G predicts the performance of a plan for G given three inputs: (i) G's application profile, (ii) the resource profiles of the resources assigned by the plan, and (iii) the data profile of the input dataset.
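Plan selection then reduces to evaluating M over the candidate assignments and keeping the minimum-cost one. The simple linear cost model below is a stand-in for illustration; NIMO's learned models are more sophisticated, and every name here is an assumption.

```python
# Illustrative plan selection: estimate each candidate resource assignment
# with a stand-in performance model M and pick the best.
def model_M(app_profile, resource_profile, data_profile):
    """Predict runtime (seconds) from the three profiles (toy linear model)."""
    work = app_profile["cycles_per_byte"] * data_profile["bytes"]
    return work / resource_profile["cycles_per_sec"]

def choose_plan(app_profile, data_profile, candidate_resources):
    # The scheduler keeps the assignment with the lowest predicted runtime.
    return min(candidate_resources,
               key=lambda r: model_M(app_profile, r, data_profile))

app = {"cycles_per_byte": 50.0}
data = {"bytes": 1e9}
candidates = [{"name": "slow", "cycles_per_sec": 1e9},
              {"name": "fast", "cycles_per_sec": 3e9}]
best = choose_plan(app, data, candidates)
print(best["name"])  # fast
```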
Intuitively, the application profile captures how an
application uses the input dataset and the resources
assigned to it. Resource profiles specify attributes that
characterize the function and power of those resources in an
application-independent way. For example, a resource
profile might represent a compute server with a fixed
number of CPUs defined by attributes such as clock rate and
cache sizes, with an attached memory of a given size.
Similarly, storage resources can be approximated by
attributes such as capacity, spindle count, seek time, and
transfer speed. The data profile comprises the data
characteristics of G's input dataset, e.g., the input data
size. The profiles are described in our ICAC paper.
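The three profile types described above could be encoded as simple records; the attribute sets follow the text (clock rate, cache size, spindle count, seek time, transfer speed, input data size), but the field names and the application-profile attributes are our own illustrative choices.

```python
# Illustrative encodings of NIMO's three profile types; field names are
# assumptions, not NIMO's actual schema.
from dataclasses import dataclass

@dataclass
class ResourceProfile:           # application-independent attributes
    cpus: int
    clock_rate_ghz: float
    cache_kb: int
    memory_gb: float
    disk_capacity_gb: float
    spindles: int
    seek_ms: float
    transfer_mb_s: float

@dataclass
class DataProfile:               # characteristics of the input dataset
    input_size_gb: float

@dataclass
class ApplicationProfile:        # how the app uses data and resources
    cpu_seconds_per_gb: float    # compute demand per unit of input data
    io_gb_per_gb: float          # I/O volume per unit of input data

r = ResourceProfile(cpus=4, clock_rate_ghz=2.4, cache_kb=512, memory_gb=8,
                    disk_capacity_gb=500, spindles=4, seek_ms=8.0,
                    transfer_mb_s=60.0)
print(r.spindles)  # 4
```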
Instrumentation data is collected during a run, then
aggregated to generate a sample data point as soon as
the run completes. In keeping with NIMO's objective of
being noninvasive, the collection of instrumentation data
requires no changes to the workflow or the underlying
system. Instead, NIMO relies only on high-level metrics
collected by commonly-available monitoring tools.
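Aggregating a run's instrumentation data into a single sample point might look like the sketch below: periodic measurements (e.g., sar-style records) are averaged or summed when the run completes. The record fields and aggregation choices are hypothetical.

```python
# Sketch of reducing one run's periodic monitoring records to a single
# sample point for model learning; field names are illustrative.
def aggregate_run(records, runtime_sec):
    """records: list of periodic measurements taken during one run."""
    n = len(records)
    return {
        "cpu_util": sum(r["cpu_util"] for r in records) / n,  # mean utilization
        "net_mb": sum(r["net_mb"] for r in records),          # total traffic
        "runtime_sec": runtime_sec,
    }

records = [{"cpu_util": 0.8, "net_mb": 12.0},
           {"cpu_util": 0.6, "net_mb": 8.0}]
sample = aggregate_run(records, runtime_sec=120.0)
print(sample["net_mb"])  # 20.0
```

Because only such high-level aggregates are needed, the workflow itself never has to be modified or relinked.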