Tuesday, August 30, 2011

When Virtualization Changed Databases

Inexorably, virtualization is changing how we think about every aspect of IT.
It’s already vastly changed how we think about physical IT infrastructure: servers, storage and network.

From static allocations to dynamic pools of resources – without VMware’s popularity, we really wouldn’t be talking much these days about cloud and transforming IT into a service.
But what about databases and data management?  

Clearly, many of these technologies haven’t made the transition to the new model.  At best, we've only been able to encapsulate and containerize legacy databases using virtual infrastructure vs. revisiting how databases might intelligently work in this new model.

How do we get databases to intelligently use dynamic resources?  How do we deliver database as a service?  And how do we make databases as easy to consume as other forms of infrastructure?
Today, VMware announced its first foray into this new (and important) realm: the new vFabric Data Director.

And many of us think this is very big news indeed.

Today’s Databases Were Designed For A Physical World
The more you look, the more you realize that the vast majority of databases in use today were designed to operate in the physical world, and not the virtual one.  And that’s far from ideal.
One immediate example is the lack of dynamic resource utilization.  All databases use precious resources: memory, CPU and storage.  Even though most database workloads are extremely variable, the vast majority of databases expect a big, fat over-allocation of these resources – they’re not smart enough to request more when they need it, and give it back when they don’t.

Any “virtualized” database should be smart about the environment it’s running in – smart enough to request and release resources as circumstances change.
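
Purely as an illustration of that request-and-release behavior (all of the names below are hypothetical, not any real database or platform API), a virtualization-aware buffer pool might behave roughly like this sketch in Python:

# Hypothetical sketch: a buffer pool that grows toward its working set under load
# and hands memory back to a shared pool when demand falls.  The resource_manager
# object and its request()/release() calls are assumptions for illustration only.

class ElasticBufferPool:
    def __init__(self, resource_manager, floor_mb=512, ceiling_mb=8192):
        self.rm = resource_manager        # hands out grants from a shared memory pool
        self.current_mb = floor_mb
        self.floor_mb = floor_mb
        self.ceiling_mb = ceiling_mb

    def rebalance(self, working_set_mb):
        """Grow toward the working set when busy, shrink back when idle."""
        target_mb = max(self.floor_mb, min(working_set_mb, self.ceiling_mb))
        if target_mb > self.current_mb:
            granted = self.rm.request(target_mb - self.current_mb)   # grant may be partial
            self.current_mb += granted
        elif target_mb < self.current_mb:
            self.rm.release(self.current_mb - target_mb)             # give the excess back
            self.current_mb = target_mb
        return self.current_mb

The point isn’t the code itself; it’s the shape of the behavior: take only what the current workload needs, and return the rest to the pool.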

Another example is the absence of integrated provisioning using standard templates and workflows.
Today, provisioning infrastructure can be dead simple on something like a Vblock: the administrator defines standard infrastructure services and templates, and creates or changes resource instances using a very high level of automation.
This isn’t true for provisioning new database instances – it’s still mostly a manual process that requires hands-on work by an important and scarce resource – the database administrator.   Any “virtualized” database should be as easy to provision as a virtual machine.
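
As a purely illustrative sketch (the template fields and the catalog.create_instance() call are assumptions, not the actual vFabric Data Director interface), template-driven database provisioning might look roughly like this:

# Hypothetical sketch: provisioning a database instance from a standard template,
# much the way a VM is deployed from a template.  All names are illustrative.

SMALL_OLTP_TEMPLATE = {
    "engine": "postgresql",
    "cpu_count": 2,
    "memory_mb": 4096,
    "storage_gb": 100,
    "backup_policy": "nightly",
}

def provision_database(catalog, name, template, overrides=None):
    """Create an instance from a standard template plus a few approved overrides."""
    spec = dict(template)
    spec.update(overrides or {})
    spec["name"] = name
    return catalog.create_instance(spec)   # one automated workflow, no manual steps

# Example: provision_database(catalog, "orders-dev", SMALL_OLTP_TEMPLATE, {"storage_gb": 50})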

A final example might be the need for self-service portals.  In the infrastructure world, using products like vCloud Director, it’s easy for administrators to expose resources to anyone at all: other IT groups, even end users.  A simple portal explains your choices, collects your details, and gives you what you want: typically with little or no human intervention.

More importantly, the system administrator is additionally armed with powerful tools that help manage the pool of resources: allocation, service levels, and so on.  Again, if we’re talking fully virtualized databases, the same generic model should apply.

Consuming a new instance of a database should be as conceptually simple as consuming a virtual machine.  And managing pools of databases ought to be as straightforward as managing pools of virtual machines.

Ideally, virtualized databases would support dynamic resource usage, integrated provisioning and self-service pooled consumption models.  But, outside of a few exceptions, that’s not the case today.

Dynamic Resource Usage
One of the first things that leaps out of the announcement is the set of virtualization-aware enhancements the VMware team has made to the popular PostgreSQL database.  At the outset, a “balloon driver” is able to request and release memory based on changing circumstances.  The same sort of capability seems to be there for GemFire.  The announcement is pretty clear: more options are coming over time.
Extending this idea a bit, it would be logical to assume that – eventually – this concept could extend to storage performance (perhaps using a variety of mechanisms that are extensions of VAAI: linked clones, Storage vMotion and/or storage service pools) – creating and releasing additional database storage instances (or perhaps relocating them to different storage tiers), thereby increasing or decreasing performance.  The same expand/contract potential exists for dynamically using virtual or physical cores.
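
To make the ballooning idea a little more concrete, here is a purely conceptual sketch (not the actual PostgreSQL or GemFire implementation) of how an in-guest balloon driver might mediate between the hypervisor’s memory target and the database’s caches:

# Conceptual sketch only: the hypervisor sets a balloon target; inflating the balloon
# pressures the database to shrink its caches so pages can be reassigned to other VMs,
# and deflating it lets the database grow again.  All names are hypothetical.

class BalloonDriver:
    def __init__(self, database):
        self.db = database
        self.inflated_mb = 0

    def apply_target(self, target_mb):
        """Move the balloon toward the size the hypervisor has requested."""
        delta = target_mb - self.inflated_mb
        if delta > 0:
            # Inflate: ask the database to trim its caches, then pin the freed pages.
            freed = self.db.shrink_caches(delta)
            self.inflated_mb += freed
        elif delta < 0:
            # Deflate: unpin pages and let the database grow its caches again.
            self.db.grow_caches(-delta)
            self.inflated_mb = target_mb
        return self.inflated_mb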

I have observed that massive over-provisioning seems to be the accepted norm in the database world: over-provisioning on memory, over-provisioning on storage performance, and over-provisioning on CPU.

Wouldn’t it be wonderful if databases were smart enough to take what they needed to meet changing service level requirements, and no more?  If they had the same elastic properties as other portions of the infrastructure?

That's the goal here.

Integrated Provisioning
Everyone who has had the pleasure of doing physical server provisioning knows all of the sequential, labor-intensive and occasionally error-prone steps involved.

Indeed, anyone who’s working in a fully integrated virtualized environment (such as a Vblock using UIM) probably doesn’t want to go back to the old way of doing things anytime soon.
In these new environments, valuable system administrator time is now spent on more worthwhile, higher-order tasks vs. the drudgery of before.

The database administrator in many regards is no different – their time is important as well, and they could greatly benefit from the same sorts of capabilities: far less time doing sequential, labor-intensive and occasionally error-prone grunt work; and far more time tackling the more interesting challenges and opportunities.

I haven’t had the opportunity to look at the new vFabric Data Director in gory detail, but from what I can see in the overviews, there appear to be the same sorts of templates and automated workflows you see in virtualized server provisioning.

Ease Of Consumption
Today’s pooled and virtualized environments are designed to be easy to consume -- that's what the whole "as a service" thing is about.

Popular request types can be easily exposed on a portal, and people can get what they need with an absolute minimum of human intervention.  Behind that, resource administrators now have powerful tools that help manage and control the pooled environment in aggregate.

No such luck for most of the database world today.

Getting (or changing) a database instance almost always involves tracking down a database administrator and asking them to do something on your behalf.  And, while database administrators have the tools to manage individual database instances, there’s not much out there that addresses their need to manage and control hundreds or thousands of database instances being delivered as a service.
That changes with the new vFabric Data Director.
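
Purely as an illustration of what that pool-level, self-service model implies (every name here is hypothetical, not the product’s actual API), a portal request might be checked against pool policy and then fulfilled automatically:

# Hypothetical sketch: a self-service request is validated against pool-level policy
# (quota, offered templates) and then provisioned with no DBA involvement, so the
# administrator manages the pool rather than every individual instance.

class DatabasePool:
    def __init__(self, capacity_gb, offered_templates):
        self.capacity_gb = capacity_gb
        self.used_gb = 0
        self.offered_templates = offered_templates   # e.g. {"small-oltp": {...}}
        self.instances = {}

    def handle_request(self, user, name, template_name):
        """Fulfil a portal request automatically, within the pool's policy."""
        template = self.offered_templates.get(template_name)
        if template is None:
            raise ValueError("template not offered in this pool")
        if self.used_gb + template["storage_gb"] > self.capacity_gb:
            raise RuntimeError("pool quota exceeded; request needs approval")
        self.used_gb += template["storage_gb"]
        self.instances[name] = dict(template, owner=user)
        return self.instances[name]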

Digging Deeper
I think once the novelty wears off, most IT thinkers will realize a few simple truths.
First, there’s a big and obvious problem to be solved here.  

I routinely meet customers who have hundreds and occasionally thousands of database instances swirling around their environment.  Telling people not to create new databases just means they’ll go elsewhere.  Not good.

And no one has the stomach for a massive “gee, let’s go consolidate a bunch of existing databases into a single humungous instance” project.  At least, not twice :)

The only viable approach for many?  Use virtualization techniques to lessen resource usage, control service delivery and manage the pool of database instances more efficiently.  Just like you do server instances.

Second, while the technology is capable of supporting demanding workloads, that’s not where it’s going to be used first.  Just as with server virtualization, the most appealing initial target will be non-critical database workloads vs. the big hairy stuff.   Make no mistake, that too will come -- in time.

Third, the underlying hybrid cloud model is extremely relevant here.  If you think for a moment about external database and PaaS offerings (e.g. AWS, Azure, et al.), there’s only one consumption option for each: their particular service.  Easy to get into, somewhat more difficult to get out of …
Compare and contrast this with the vFabric Data Director approach where you’re free to set it up internally, use any number of compatible external service providers, or any particular combination that suits you.

Fourth, I’ve met more than a few people that are looking for a different industry model to deliver database services to the business vs. buying more of what they already have.  Here's a model that's worthy of serious consideration.

The Journey Begins
When server virtualization first became popular, the IT infrastructure world quickly segmented into two distinct camps: those who saw the potential of – and committed to accelerating – a key industry transition, and of course those who valiantly fought the inevitable changes.

Indeed, we still talk about “server huggers” – although there are a lot fewer of them around these days :)
I’d expect the same thing to happen in the database world: there will be those database architects and admins who “get it” and will passionately commit to accelerating the transition.  And there will certainly be many who will find reason after reason to keep doing things as they’ve been done before.
I expect that – before long – we’ll be using terms like “database hugger” to describe this mindset :)
If you live in this world – or are responsible for it – you might want to think of the new vFabric Data Director as a sort of gauntlet being thrown down: new options are now available to significantly change the way you do things – transforming databases from physical entities to fully virtualized ones.

And, as before, you can expect to see the same sorts of dramatic and meaningful impacts that occurred when server environments underwent a similar transition.

Are you up for the journey?

By: Chuck Hollis