Large computing systems, especially those running time-varying workloads, are
difficult to keep tuned. Tuning in these environments means dealing with thousands of knobs at each
horizontal level of a computing system, for example a database, storage or network system.
Current work on autonomic computing often
leads to systems that perform only slightly better, and sometimes worse,
than systems that are tuned by skilled administrators.
Clockwork introduces a new way of thinking about autonomic tuning:
predictive autonomicity, based on forward feedback control. A general method
for constructing predictive autonomic systems is proposed that is based on
statistical modeling, tracking and forecasting techniques that are borrowed from
econometrics. Systems employing this method detect, and subsequently forecast,
cyclic variations in load; estimate the impact on future performance;
and use these data to tune themselves dynamically in anticipation of need.
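
One common way to detect a cyclic variation in load is to look for peaks in the autocorrelation of a request-count series. The sketch below illustrates that idea only; it is not taken from Clockwork, and the function names and the synthetic hourly series are assumptions made for the example.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a flat series has no cyclic structure
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance

def dominant_period(series, max_lag):
    """Return the lag of the strongest autocorrelation peak, or None."""
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 2)]
    # A peak is a lag whose autocorrelation exceeds that of its neighbours.
    peaks = [
        lag for lag in range(1, max_lag + 1)
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
    ]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else None

# Hypothetical hourly request counts with a daily (24-sample) cycle.
hourly_requests = [100 + 50 * math.sin(2 * math.pi * t / 24) for t in range(240)]
period = dominant_period(hourly_requests, max_lag=60)
```

For this synthetic series the detected period is 24 samples, that is, one day of hourly measurements. Real load data are noisier, so a production detector would likely need smoothing or a significance test before trusting a peak.
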
At Almaden Research Center, we have built a prototype Network Attached Storage (NAS) system
that gathers key performance measurements and demonstrates the method's
feasibility and practicality.
Clockwork is an autonomic storage management system that compares
a statistical prediction of storage performance against real storage data
from both Storage Area Networks (SANs) and Network Attached Storage (NAS).
The procedure is self-monitoring, self-adjusting and self-correcting
because it continuously performs statistical evaluations of new storage data.
The breakthrough theme in Clockwork is the application of statistics to autonomic computing.
A statistical model of the system is constructed as follows:
- Select a small set of simple measurements of system demand and forecast them.
- Model the impact of controllable parameters on demand.
- Enter policies, such as system reliability criteria, as constraints or objectives.
- Drive the system through time using forecasts of demand.
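
The first and last steps above can be illustrated with a very simple seasonal forecast: predict each future sample as the average of past samples at the same phase of the cycle. This is a minimal sketch under the assumption that the cycle length is already known; the source does not state which forecasting technique Clockwork uses, and `seasonal_forecast` is a hypothetical name.

```python
def seasonal_forecast(history, period, horizon):
    """Forecast `horizon` future samples from per-phase averages of `history`.

    Assumes len(history) >= period, so every phase has at least one sample.
    """
    n = len(history)
    forecasts = []
    for step in range(horizon):
        phase = (n + step) % period          # phase of the future sample
        same_phase = history[phase::period]  # past samples at that phase
        forecasts.append(sum(same_phase) / len(same_phase))
    return forecasts

# Hypothetical demand measurements repeating every three samples.
demand_history = [10, 20, 30, 10, 20, 30, 10, 20, 30]
next_demand = seasonal_forecast(demand_history, period=3, horizon=4)
```

Driving the system through time then amounts to recomputing the forecast as each new measurement arrives and handing the result to the control step.
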
The autonomic NAS system runs without
manual intervention and monitors exceptions
as shown in the diagram below.
Applying Clockwork to managing Network Attached Storage (NAS):
After the statistical model is constructed, a simple policy is defined, such as "Maintain NFS
response time below 5 milliseconds on all nodes". The NAS appliances continually collect data
on the following:
- Actual number of requests
- Actual response time
- Value of system-controllable parameters (number and identity of nodes serving requests)
Request data are used to generate forecasts of system demand without specific details defining the workload.
The control and request data are used to estimate the impact of controls on the objective.
Clockwork adjusts the controls to meet the goal and runs automatically without manual intervention.
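
The control step can be sketched as follows. The example assumes, purely for illustration, that response time grows linearly with the number of requests per node; it fits that line by least squares from collected data and then picks the smallest node count whose predicted response time stays below the policy target. The linear model, the function names and the sample numbers are all assumptions, not Clockwork's actual model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def nodes_needed(forecast_requests, intercept, slope, target_ms, max_nodes):
    """Smallest node count predicted to meet the target, or None if none does."""
    for nodes in range(1, max_nodes + 1):
        predicted_ms = intercept + slope * forecast_requests / nodes
        if predicted_ms < target_ms:
            return nodes
    return None

# Hypothetical observations: requests per node and measured response time (ms).
requests_per_node = [100, 200, 300, 400]
response_ms = [2.0, 3.0, 4.0, 5.0]
intercept, slope = fit_line(requests_per_node, response_ms)

# Policy from the text: keep NFS response time below 5 milliseconds.
nodes = nodes_needed(1000, intercept, slope, target_ms=5.0, max_nodes=8)
```

With these made-up numbers, a forecast of 1000 requests calls for three nodes. Returning `None` when no node count satisfies the policy gives the caller a clear signal to raise an exception for an administrator.
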