The Thorny Challenge Of Elasticity

Listen to people describe their idealized cloud, and sometimes you'll hear them say "infinitely elastic".

Most IT infrastructure people cringe when they hear that sort of talk, and justifiably so. At some point, all infrastructure becomes physical no matter how abstracted or virtualized you make it.
Not only that, being able to meaningfully grow and shrink infrastructure resources dynamically in response to changing application demand is a very difficult problem -- especially for general purpose use cases.
The application environment has to be able to request and release resources. The supporting infrastructure has to work in concert with the application stack, of course.
And then there's the head-hurting topic of policy definition and management: under what circumstances should application "A" get more resources, and applications "B", "C" and "D" get less?
How do you define what types of internal or external infrastructure are allowed candidates for supporting dynamic application expansion?
And how do you communicate back to the business what real-time tradeoffs are being made, and show them that everything is compliant?
I don't think anyone has complete answers for most use cases, but some interesting work in this area is starting to manifest itself.
In particular, the VCE crew is presenting their current progress on a solution dubbed "Application Lifecycle Manager", or ALP for short. And it's an interesting discussion, to be sure.

The Real Action At VMworld Is The Technology Previews

Most people who are new to VMworld tend to focus on the big announcements: products, alliances, services, etc. While those are interesting, I am inevitably drawn in to all the technology previews you can see at the show -- it gives you a good sense of what will be productized -- usually by the next VMworld!

EMC is doing lots of these cool technology previews at the event. One big dose will happen at Pat Gelsinger's Supersession (#SUP1006) Tuesday morning at 10:00 AM -- not to be missed. Another one will be Chad's "kitchen sink" -- "Next Gen Storage and Backup For Your Cloud" (#SPO3977) Tuesday at 2:30.
Trust me, it wouldn't be a Chad show unless there were some off-the-hook technology previews :)
VCE is doing their fair share as well. One of their sessions describes their work on using Vblocks to create new forms of automated elasticity for larger VMware deployments.
And, despite their excellent progress, it gives you an appreciation of just how much work lies ahead ...
The Problem In A Nutshell
Larger applications are inherently dynamic in their resource requirements. In addition to moment-to-moment swings in demand, there's also the lifecycle aspect: from development through test and, eventually, decommissioning.
Virtualized infrastructure resources (compute, memory, storage, network, etc.) are getting inherently more dynamic as well. Marrying dynamic application resource usage with virtualized resources creates the tantalizing potential of a world that delivers better service levels while using far less infrastructure resource.
Traditional applications were designed for the physical world; they aren't designed to scale dynamically -- all we can do is give them static allocations of resources, and try to do a little hidden magic in the background (e.g. automatically tier storage, balance server workloads, etc.)
But the newer application environments (think SpringSource, vFabric, the new Data Director, etc.) *can* express their infrastructure requirements to virtualized infrastructure.
And that makes the potential for orchestrating application resource elasticity more than just a pipe dream.
Why VCE?

One of the advantages of working on a Vblock is that it's a known, popular and standardized environment: APIs, infrastructure management with UIM, and so on.
I can't prove it, but I wouldn't be surprised if there were far more clouds running on Vblocks today than any of the alternatives. A lot more.
That standardization property enables anyone working on a Vblock to have far more cycles to tackle the "complexity above" vs. the inherent complexity below.
The VCE engineering team is using this property to do all sorts of interesting productization and solution engineering. Much of it is at the cutting edge of deployable enterprise cloud technology, and this effort is no exception.
Not only that, the VCE folks are deeply embedded in all sorts of real-world cloud scenarios with customers and partners these days -- an essential component for innovating new solutions to newer challenges.
Imagine A Use Case

A good starting point is to consider a modern three-tier web application: an application layer (perhaps using tcServer), a transactional messaging layer or perhaps an in-memory low-latency data grid (perhaps using GemFire or RabbitMQ) and, of course, a database layer (maybe using the new vFabric Data Director).
One application, three pools of infrastructure resources to dynamically optimize.
Web applications are always a good example for this sort of discussion; not only are they inherently dynamic in their requirements, they tend to be built from newer components.
To make this web app "infrastructure elastic", you'd need some sort of mechanism to define the components (what they were, their initial resource allocations, their targeted performance parameters, etc.) but also the macro-application environment.
You'd want to not only expose the pool of available resources and some mechanisms for taking advantage of them, but also provide some guidance and constraints about exactly how and under what circumstances you'd initiate a resource expansion or contraction.
And understanding those broad requirements is helpful to digging in to some of the details around ALP.
Deconstructing the ALP

The ALP is built on a few simple notions.
One is the concept of a "blueprint" which states desired infrastructure policy for both individual application components (maybe GemFire in this example), as well as macro application policies (e.g. don't let transaction time get too slow!).
Blueprints are used to drive workflows, monitor results and evaluate remediation scenarios where desired state varies from actual state.
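To make the blueprint idea a bit more concrete, here's a minimal Python sketch of how desired state might be written down and compared against actual state. To be clear, this is not the ALP's real blueprint format, which isn't described here; every name in it (ComponentPolicy, Blueprint, find_violations, the metric keys) is a hypothetical I made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentPolicy:
    """Desired state for one application component (hypothetical fields)."""
    name: str
    min_nodes: int
    max_nodes: int
    max_cpu_utilization: float  # fraction of capacity, 0.0 to 1.0


@dataclass
class Blueprint:
    """Per-component policies plus one macro application policy."""
    app_name: str
    max_transaction_ms: float  # macro policy: don't let transactions get too slow
    components: list = field(default_factory=list)


def find_violations(blueprint, observed):
    """Return a list of descriptions where actual state varies from desired state."""
    violations = []

    # Macro application policy: end-to-end transaction time.
    txn_ms = observed["transaction_ms"]
    if txn_ms > blueprint.max_transaction_ms:
        violations.append(
            f"{blueprint.app_name}: transaction time {txn_ms} ms "
            f"exceeds {blueprint.max_transaction_ms} ms"
        )

    # Individual component policies.
    for policy in blueprint.components:
        metrics = observed["components"][policy.name]
        if metrics["cpu"] > policy.max_cpu_utilization:
            violations.append(f"{policy.name}: CPU utilization above policy")
        if not policy.min_nodes <= metrics["nodes"] <= policy.max_nodes:
            violations.append(f"{policy.name}: node count outside policy range")

    return violations


if __name__ == "__main__":
    blueprint = Blueprint(
        app_name="web-shop",
        max_transaction_ms=200.0,
        components=[ComponentPolicy("gemfire", min_nodes=2, max_nodes=8,
                                    max_cpu_utilization=0.75)],
    )
    observed = {
        "transaction_ms": 350.0,
        "components": {"gemfire": {"cpu": 0.90, "nodes": 3}},
    }
    for violation in find_violations(blueprint, observed):
        print(violation)
```

The point of the sketch is simply that a blueprint is data, not code: once desired state is written down this way, anything that can collect the metrics can detect the gap and hand it off for remediation.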
The "blueprint orchestrator" is responsible for doing most of the heavy lifting. It can drive workflows through vCenter Orchestrator (for example), monitor application-level performance and associated resource consumption (using Hyperic in this example) and report back higher-level metrics, using the vCenter Service Manager in this example.
The interesting part for me will be the "remediator" block, shown above.

Although this solution has a limited palette of responses today (e.g. clone application, expand cluster), it's not hard to imagine a greatly expanded repertoire of both remediation activities, as well as policy constraints on those activities.
Policy remediations could run the gamut from re-tiering storage service levels (using something like FAST) all the way through selectively bursting portions of candidate workloads to alternate resource pools (maybe using VPLEX?); either to directly provide more performance to the application in question, or perhaps to free up additional resources by relocating less-important workloads.
Policy constraints could include the usual resource parameters (available infrastructure, latency, etc.) or risk-avoidance concerns (regulatory compliance, minimum geographical separation, must use isolated redundant infrastructure, and so on).
Indeed, it's not hard to see that -- before long -- most of the "secret sauce" will move to this remediator component. It's the one place where all the potential infrastructure resource responses will be exposed -- and key decisions made!
It's also the one place where all the policy constraints on resource reallocation will be captured. And it's where the "smarts" will ultimately live for balancing and optimizing competing demands.
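Here's a rough Python sketch of the kind of decision a remediator has to make: take a list of candidate responses, throw out the ones that violate policy constraints, and pick the best of what's left. This is my own illustration and not how the VCE solution is implemented; the Remediation and Constraints types, the expected_gain and cost scores, and the pool names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Remediation:
    """One candidate response (all fields are hypothetical)."""
    name: str
    target_pool: str      # where the extra resources would come from
    expected_gain: float  # estimated performance improvement, arbitrary units
    cost: float           # estimated resource cost, arbitrary units


@dataclass(frozen=True)
class Constraints:
    """Policy constraints on what the remediator is allowed to do."""
    allowed_pools: frozenset  # e.g. pools that satisfy compliance rules
    max_cost: float


def choose_remediation(candidates, constraints) -> Optional[Remediation]:
    """Pick the permitted candidate with the highest gain; cheaper wins ties."""
    permitted = [
        c for c in candidates
        if c.target_pool in constraints.allowed_pools
        and c.cost <= constraints.max_cost
    ]
    if not permitted:
        return None  # nothing allowed: escalate to a human instead
    return max(permitted, key=lambda c: (c.expected_gain, -c.cost))


if __name__ == "__main__":
    candidates = [
        Remediation("expand cluster", "local-vblock", expected_gain=0.3, cost=2.0),
        Remediation("clone application", "local-vblock", expected_gain=0.5, cost=4.0),
        Remediation("burst to remote pool", "remote-site", expected_gain=0.8, cost=3.0),
    ]
    # Suppose compliance policy keeps this workload on local infrastructure.
    local_only = Constraints(allowed_pools=frozenset({"local-vblock"}), max_cost=5.0)
    choice = choose_remediation(candidates, local_only)
    print(choice.name if choice else "no permitted remediation")
```

Notice that the most attractive response on paper (bursting to the remote pool) never gets considered when the constraint rules it out. That's the "smarts" part in miniature: the policy, not the technology, ends up deciding the outcome.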
Are We There Yet?
Achieving some meaningful measure of automated and dynamic resource elasticity is one of the next "holy grails" for IT architects everywhere. And, no, we're not there yet -- although we can see it from here.

Part of the challenge will, of course, be maturing the underlying technology integrations. Good progress here, but more needs to be done.
However, don't let technology mask the real challenge here -- and that's coming up with agreed policies for allocating shared and pooled infrastructure.
Policy maturity always lags technology enablement by a great deal, and there's no reason to expect that this topic will be any different in that regard.
Indeed, I'd fully expect a closed-loop feedback process to eventually be used here: here's the policy we initially set, here's how well it did, and here's how we're going to tweak it going forward to do even better. Lather, rinse, repeat.
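As a toy illustration of that loop, consider a single policy knob: the CPU utilization threshold at which an application is allowed to scale out. The Python sketch below nudges that threshold after each review period based on how often the service level was missed. The function name, the step size and the target miss rate are all assumptions of mine, chosen only to show the shape of the feedback loop.

```python
def tune_threshold(threshold, sla_miss_rate, target_miss_rate=0.01,
                   step=0.05, floor=0.50, ceiling=0.95):
    """Return an adjusted scale-out threshold (hypothetical policy knob).

    threshold      -- current CPU utilization (0.0 to 1.0) that triggers scale-out
    sla_miss_rate  -- fraction of requests that missed the service level last period
    """
    if sla_miss_rate > target_miss_rate:
        # The policy was too lax: scale out earlier next time.
        threshold -= step
    elif sla_miss_rate < target_miss_rate / 2:
        # Comfortably within target: scale out later and reclaim resources.
        threshold += step
    # Otherwise the policy is doing about what we asked; leave it alone.
    return round(min(ceiling, max(floor, threshold)), 2)


if __name__ == "__main__":
    threshold = 0.80
    # Made-up miss rates for four successive review periods.
    for observed_miss_rate in [0.04, 0.03, 0.008, 0.002]:
        threshold = tune_threshold(threshold, observed_miss_rate)
        print(f"miss rate {observed_miss_rate:.3f} -> threshold {threshold:.2f}")
```

Real policies will have far more than one knob, and the tweaking will likely involve people as much as code, but the set, measure, adjust cycle is the same.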
It's nice to see the progress the VCE team has made with their current solution -- it's quite compelling in its own right.
But there's still a long road ahead to be travelled ...

By: Chuck Hollis

Friday, September 2, 2011
