Thursday, September 8, 2011

Storage-As-A-Service: Basic Concepts

These days, I don't spend as much time talking about lofty clouds as I used to.
Why?  For most of my IT customers, the real underlying discussion is about transforming into a competitive internal service provider: the IT-as-a-service approach.  In many important respects, "cloud" is about IT delivering competitive services vs. simply handing over lumps of technology.
Not only does ITaaS make sense for the entire IT function; the "as-a-service" concept can also be applied recursively to underlying IT disciplines, often with great results.  And storage is no exception.
Even if the rest of the IT organization lumbers forward using a more traditional approach, many storage professionals are thinking in terms of their idealized "storage service catalog" -- and moving quickly in that direction.
The payoff?  Lower costs, better IT services delivered, and -- ultimately -- happier storage users who get what they want without having to wait for the storage team to eventually get around to it.

It's Not Exactly A New Idea
I don't remember exactly when the notion of storage service catalogs first became popular at EMC -- maybe six or seven years ago?  Longer?
The problem -- at the time -- was that many of our customers were using the available technology very inefficiently -- over provisioning, over protecting, over configuring performance, etc.
Couple that with very manual and labor-intensive management practices, and it was a recipe for intense frustration -- not only within the storage team, but ultimately across IT and over to the business side.
At the time, we found strong demand for short professional services engagements to help customers analyze different requirements, recommend a target storage service catalog, help them implement it using various technologies, and -- most importantly -- work with the storage team to establish the key processes and associated roles that could get them out of their predicament.
As I remember things, it worked out pretty well.
What is old is new again: the conversation and the ideas seem to be coming back in force.  It's now an increasingly frequent part of customer conversations -- either self-generated by the storage team, or in conjunction with a broader private cloud or ITaaS initiative.
Fortunately, the supporting technology to do storage-as-a-service has gotten fantastically better over the years -- back then, storage devices were rather limited in the range of services they could deliver.
These days, newer storage tech can deliver a far wider range of service levels, change those levels dynamically, and do so at lower cost and with more powerful automation.  The management tools are far better as well, as is the integration between different components.
But -- as always -- even though the technology is much better, it's no panacea.  There's always a decent amount of heavy organizational work to transition from a traditional physical storage model to a more modern storage-as-a-service approach.
No silver bullets, I'm sorry to say :)
The Basic Concepts
A storage function should ideally expose a reasonable set of standardized services to those who need them.  Those users might be another part of the IT function (e.g. server administrators, database administrators, etc.), or -- frequently -- non-IT users who just have a lot of data they want to store.
The goal is to move away from hand-carving and hand-managing individual storage requirements, and to create easy-to-access pools of standardized offerings that cover 80% of the day-to-day requirements -- without a storage administrator having to be directly involved!
The payoff for the storage team is obvious: by making the 80% of requirements dead-simple for both the storage team and the people who depend on them, storage admins are now freed up to work more on the 20% that tends to be more interesting, unique and valuable to the business.
Needless to say, you'll also get less badmouthing of the storage team :)
Any storage service in the catalog should have a visible cost associated with it -- regardless of whether you actually chargeback or not.  Think of it as your rate card.
Exposing costs helps people make more intelligent choices.  Note: you'll still have people who want to do unreasonable things; you'll just have fewer of them to deal with :)
Any storage service ought to report back to the consumer a relevant amount of information: simple notifications for the non-critical services, perhaps more sophisticated reports for more sophisticated services.
Now, of course, many exceptions to your standardized services will invariably come up, but the idea is to minimize them, not eliminate them.  If you can get 80% of your storage requests to fit in a handful of standard offerings with standard processes, you win.
To achieve this, someone has to "own" the external view of the storage services -- advertise and promote them, see how people are actually using them, enhance and improve them, and generally tune and refine what's on the rate card to meet the majority of storage requirements.
This last bit can be harder than it looks; see "In Search Of The Missing IT Gene".
I won't be able to do the full topic justice here in this simple post, but -- hopefully -- I can share what we've learned and what we've seen other customers doing.
Primary Storage -- How Much And How Long?
Imagine someone comes to you, the storage administrator, and wants to store some data.  You'd start by asking some questions, wouldn't you?
First up would likely be "how much?" and "how long?"
Since people rarely know exact answers to either, you'll probably want to have some predefined options, maybe something like 10, 25, 50 and 100 GB for usable capacity, and perhaps 90 days, 180 days and a full year for default retention periods.
The logic behind the standard capacity offerings is simple: it establishes a quick starting point that gives people most of what they want quickly.  If you're worried about wasting capacity, that's where virtual provisioning, compression, etc. come in.  And, of course, you need to make sure your customers know you can get them more capacity very quickly if and when they need it.
The rationale behind the standard retention period is blindingly simple: unless you specify a time period, the default will be "store this data for all eternity".  The idea here is that active storage consumers have to periodically indicate that -- yes -- the data set is still active, and still needed.
If no one is willing to put their hand up that they need the data set any more, off it goes to archiving for a while, or -- more ideally -- deleted.
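To make this concrete, here's a minimal sketch (in Python) of how the standard capacity and retention options might be encoded.  The names and the renewal check are purely illustrative -- not any particular product's API.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# The standard offerings discussed above -- illustrative values
CAPACITY_OPTIONS_GB = (10, 25, 50, 100)
RETENTION_OPTIONS_DAYS = (90, 180, 365)

@dataclass
class Allocation:
    owner: str
    capacity_gb: int
    retention_days: int
    created: date = field(default_factory=date.today)

    @property
    def expires(self) -> date:
        return self.created + timedelta(days=self.retention_days)

    def needs_renewal(self, today: date) -> bool:
        # No renewal by the expiry date?  The data set becomes a
        # candidate for archiving -- or, more ideally, deletion.
        return today >= self.expires

def request_allocation(owner: str, capacity_gb: int, retention_days: int) -> Allocation:
    """Only standard options go through without a storage admin in the loop."""
    if capacity_gb not in CAPACITY_OPTIONS_GB or retention_days not in RETENTION_OPTIONS_DAYS:
        raise ValueError("Non-standard request -- route it to the storage team (the 20%).")
    return Allocation(owner, capacity_gb, retention_days)
```

The point isn't the code, it's the constraint: there is no option for "forever".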
How Fast And How Protected?
Storage performance basically boils down to a combination of response time (or latency) and bandwidth (or throughput).  Without getting too complicated, let's think in terms of "low bandwidth", "moderate bandwidth" and "high bandwidth" and perhaps "moderate latency" and "low latency".
A garden variety personal file share, for example, would be "low bandwidth, moderate latency".  Your average decision support database might be "moderate bandwidth, moderate latency".  And so on.
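Encoded as data, these coarse tiers are just two small enums -- a quick sketch, with names taken straight from the prose above:

```python
from enum import Enum

class Bandwidth(Enum):
    LOW = "low bandwidth"
    MODERATE = "moderate bandwidth"
    HIGH = "high bandwidth"

class Latency(Enum):
    MODERATE = "moderate latency"
    LOW = "low latency"

# A garden variety personal file share, per the example above:
personal_file_share = (Bandwidth.LOW, Latency.MODERATE)
```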
No, I'm not going to get into the NAS vs. SAN debate: the world needs both.
The idea here is to avoid broadcasting precise performance specifications, e.g. millisecond response times, or Gb/sec bandwidth numbers.  Way, way too much information.
When I was younger, I remember the Sears catalog always had three choices: "good", "better" and "best".  It's all you really had to know in most situations.  
Yes, I'm that old.
Protection can be a thorny discussion, and it's easy to go overboard.  Don't ask your customers to specify RAID levels, for example, or the exact protection mechanisms used.
Not only should the descriptive language be painfully simple and obvious, but the same terms should be used to describe both data and system availability if possible -- e.g. (sketched in code after the list):
  • "business support" (we back you up every 24 hours with 4 hour restore), 
  • "business critical" (we take a snapshot of your world every 4 hours with a one hour recovery) 
  • and perhaps "mission critical" (continuous replication to a remote site with failover measured in minutes).
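In code, those three tiers might look like this -- a minimal sketch; the intervals and recovery targets are the ones quoted above, except the mission-critical failover time, where "minutes" becomes an illustrative 15:

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class ProtectionTier:
    name: str
    protect_every: timedelta   # how often we capture your data
    recover_within: timedelta  # how quickly we promise to get you back

PROTECTION_TIERS = {
    "business support":  ProtectionTier("business support",  timedelta(hours=24), timedelta(hours=4)),
    "business critical": ProtectionTier("business critical", timedelta(hours=4),  timedelta(hours=1)),
    # continuous replication, so the capture interval is effectively zero;
    # "failover measured in minutes" -- 15 here is purely illustrative
    "mission critical":  ProtectionTier("mission critical",  timedelta(0), timedelta(minutes=15)),
}
```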
These are just examples, but the point is clear: the fewer performance and protection categories you can get away with, the better.
Combining The Two Buckets
I don't think what we have so far counts as "services" yet -- at least, not in our idealized world.  Let's be honest: most people have no flipping idea as to the performance or protection they'll need -- so we have to help out a bit.
If you think about combining these primitives in customer-friendly ways, you'll probably recognize that most file shares don't require real-time remote replication and failover.  It's also usually the case that transactionally intensive applications can't get by with only a daily backup.
The goal is to combine performance and protection attributes into something mere mortals can understand.  You might end up with something that looks like this (with the same catalog sketched in code after the list):
  • File Share Class 1 - base offering for generic file storage
    (low bandwidth, moderate latency, "business support" availability)
  • File Share Class 2 - if you need something special
    (moderate bandwidth, moderate latency, "business critical" availability)
  • Decision Support Class 1 -- best for running departmental reports
    (moderate bandwidth, moderate latency, "business support" availability)
  • Decision Support Class 2 -- bigger data marts and small warehouses
    (high bandwidth, moderate latency, "business critical" availability)
  • Transaction Support Class 1 -- logging requests or events
    (moderate bandwidth, moderate latency, "business critical" availability)
  • Transaction Support Class 2  -- the bigger OLTP apps
    (high bandwidth, low latency, "mission critical" availability).
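Pulling the primitives together, that whole six-class catalog can be expressed as plain data.  A sketch reusing the Bandwidth, Latency and ProtectionTier definitions from the earlier snippets -- the per-GB rates are made up entirely, and are there only because every service should carry a visible cost:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceClass:
    name: str
    description: str
    bandwidth: Bandwidth        # from the performance sketch above
    latency: Latency
    protection: str             # key into PROTECTION_TIERS
    cost_per_gb_month: float    # illustrative rate-card number, not a real price

CATALOG = [
    ServiceClass("File Share Class 1", "base offering for generic file storage",
                 Bandwidth.LOW, Latency.MODERATE, "business support", 0.50),
    ServiceClass("File Share Class 2", "if you need something special",
                 Bandwidth.MODERATE, Latency.MODERATE, "business critical", 1.00),
    ServiceClass("Decision Support Class 1", "best for running departmental reports",
                 Bandwidth.MODERATE, Latency.MODERATE, "business support", 0.75),
    ServiceClass("Decision Support Class 2", "bigger data marts and small warehouses",
                 Bandwidth.HIGH, Latency.MODERATE, "business critical", 1.50),
    ServiceClass("Transaction Support Class 1", "logging requests or events",
                 Bandwidth.MODERATE, Latency.MODERATE, "business critical", 1.25),
    ServiceClass("Transaction Support Class 2", "the bigger OLTP apps",
                 Bandwidth.HIGH, Latency.LOW, "mission critical", 3.00),
]
```

Six rows of data -- that's the whole rate card.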
Ideally, you could move a data set between, say, Class 1 and Class 2 (or back!) non-disruptively.  Broadcasting that ability helps keep people from over-specifying up front -- and gives you a place to put that oh-so-important project when it ain't so important anymore.
Yes, I know this is all rather imprecise, but -- just for a moment -- let's say you could get away with something along the lines I've outlined here.
How much of your existing day-in-day-out storage requests could be handled "good enough" with these six service offerings?  
If the answer is "quite a lot", you're starting to get an appreciation for the approach.
Let's Add Archiving
The discussion above is aimed at primary storage.  There's usually a separate independent service catalog defined around archiving requirements -- performance, cost, retention period, regulatory compliance, and so on.
Rather than make your primary storage service catalog overly complicated, I've seen better results when you ask the storage consumer for two separate decisions: primary requirements and archiving requirements.
Very often, you have to make that second decision on their behalf :)
I'll leave the construction of a sample archiving service catalog for another time -- but the same concepts apply.
And Make It Easy To Consume
I can't tell you how many IT organizations seem to go out of their way to make their services difficult to consume.
It's almost as if they've entered the IT Prevention profession.
Anything as-a-service won't work unless the services are attractive and easy to consume.
That means clear and useful explanations of the services and how the supporting processes work -- that is, without scheduling multiple meetings with the storage team!
It doesn't necessarily mean everything is self-service, but you'd be surprised how often that ends up being a good answer at the end of the day.
Make your storage services attractive and easy to consume -- especially to other IT functions -- and people will prefer your service-based offerings over other less attractive and more difficult alternatives.
Like doing it themselves :)
Behind The Scenes
Of course, we want "simple" when talking to storage consumers.  We also desperately want "simple" for the storage professionals delivering the service.
Provisioning is an excellent example.  Ideally, the majority of the storage services above should be able to be provisioned without the direct involvement of a storage administrator.  That usually means having a variety of consumption mechanisms: perhaps auto provisioning for the VMware team, or perhaps a self-service capability for authorized users.
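Continuing the earlier sketches, here's what the gatekeeping logic might look like: a request for a standard class in a standard size is auto-approved, and anything else is routed to a human.  The function and the escalation queue are hypothetical, of course:

```python
def handle_request(requester: str, class_name: str, capacity_gb: int) -> dict:
    """Auto-provision the standard 80%; escalate the exceptional 20%."""
    svc = next((c for c in CATALOG if c.name == class_name), None)
    if svc is None or capacity_gb not in CAPACITY_OPTIONS_GB:
        # hypothetical escalation path -- a ticket, a queue, a human
        return {"status": "escalated", "to": "storage-team-queue"}
    # a real shop would call the array's provisioning API here;
    # this sketch just records the intent
    return {"status": "provisioned", "class": svc.name,
            "capacity_gb": capacity_gb, "owner": requester}
```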
Reporting on usage ought to be dead-simple as well.  I've seen great value where the storage team simply sets up a periodic email to the average storage consumer: "here's the service offering you've signed up for, here's how much you've allocated, here's how much you've actually used, here's the implied cost, and here's when your allocation expires and has to be renewed".
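That periodic note practically writes itself.  A sketch, again reusing the Allocation and ServiceClass definitions above -- the utilization figure is a stand-in:

```python
def usage_report(alloc: Allocation, svc: ServiceClass, used_gb: float) -> str:
    """The simple periodic note described above -- no dashboards required."""
    return (
        f"Service: {svc.name}\n"
        f"Allocated: {alloc.capacity_gb} GB\n"
        f"Used: {used_gb:.1f} GB\n"
        f"Implied monthly cost: ${alloc.capacity_gb * svc.cost_per_gb_month:.2f}\n"
        f"Expires (renewal needed by): {alloc.expires}\n"
    )
```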
Again, unless you're delivering sophisticated storage services, there's little need for sophisticated reporting back to the people who are using it.
Of course, the storage team needs to report out an aggregate picture to the IT management team, and you'll probably want more sophisticated reports and tools for that -- but there's rarely a need to expose that sort of gory detail to the people using your service.
My monthly phone bill is pretty simple.  And I don't need a detailed network utilization report from the phone company, thank you anyway.
Building The Storage Environment
While I'm sure that many will vigorously debate the ideal technology setup to deliver storage-as-a-service, there is one clear guiding principle: keep it as simple as possible.  Specifically, use as few technology components as possible to deliver the required services at reasonable cost.  Steer towards pre-integrated solutions vs. ones that you have to integrate and support yourself.
Over-engineering brings complexity, and complexity breaks simplicity.  Attempting to over-optimize with too many moving pieces simply defeats what you're after.  Remember, this is supposed to be the boring and standardized part of your storage environment, the 80% solution.  If you feel the need to get fancy, focus it on the 20%, not the 80%.
Obligatory plug: in this light, you'll understand why many EMC products are designed the way they are.  FAST on EMC storage platforms allows a wide range of services to be driven off a single array with a single management interface.
Great management tools like Unisphere and ProSphere give the storage administrator a consistent way to define, create and manage storage services.
Ditto with backup products like Avamar, Data Domain and Data Protection Advisor -- not to mention replication products like RecoverPoint.
You May Need Some Help
Many more modest environments can do a decent job of standing up an internal storage service catalog without too much fuss.  They may already be doing this sort of thing today without using the exact words I'm using here.
For all of you -- congratulations!
Unfortunately, not everyone has it so easy.
We might be talking multiple data centers, or perhaps multiple IT organizations. There might be a very wide spectrum of very specific and precise requirements.  The finance model might not easily support the shared services approach.
Or perhaps the relationship between the storage team and the rest of IT might have degenerated badly, and be in need of repair.  The key processes and roles might not be adequately defined.  The supporting technology on the floor might not easily support creating a shared and variable pool of services.
And that's just a partial list :)
Indeed, many storage environments can grow so fast that they reach a crisis point and things break badly.  What was manageable at small scale is a nightmare for all at larger scale.
And that's why we often bring in the EMC Global Services team to assess the situation, and make some key recommendations to move the ball forward.
The Larger Perspective?
I think by now that everyone has gotten the memo that -- yes -- storage capacities are growing like crazy with absolutely no end in sight.
At some point, explosive storage growth means that there's a strong incentive to change the model from a traditional physical, manual and dedicated model to a more modern service catalog orientation.
Yes, the transition described here isn't easy or simple, but -- ultimately -- it's been proven to prepare the storage team -- and the organization it serves -- for what inevitably lies ahead.