Friday, July 22, 2011

Organizing From Silos To Services

I"ll let you in on a secret I've known for a while.

When I speak to IT leadership, the #1 topic that they're interested in -- by far -- is the organizational changes that result from moving to an IT-as-a-service model, e.g. "cloud".

They understand that they need to move in that direction.  They understand it's a journey, and not an event.

But -- at the end of the day -- IT organizations are made up of people, and -- if you've ever led an organization -- you know it all comes down to the people: what skills, what roles and what structure.
And meaningful organizational change is a daunting task for any leader.

I've written before about the new skills and new roles in next-gen IT organizations, and I've given you an update on our internal progress (EMC IT) as we move our not-inconsequential IT function in that direction.

Today, I'd like to give you a progress report on the structure, roles and measurements of our recently formed Private Cloud Infrastructure Group within EMC IT.  To me, it looks like an organizational pattern we'll be seeing much more frequently before too long.

If you're looking at this from somewhere in your IT organization, and thinking "gee, this doesn't apply to me", I'd encourage you to do yourself a favor and perhaps share it with someone a bit higher up in the organization?

My guess is that they might find it interesting :)

From IT Silos To IT Services
There are many ways to compare and contrast traditional technology-and-project-centric IT organizations and newer IT-as-a-service ones.  For me, the most apt way to describe the transition is "from silos to services".

Just to be absolutely clear, by "silos" I'm referring to a predominance of specialized technology groups, e.g. Windows team, Linux team, VMware team, SAN team, NAS team, backup team, BC/DR team, security team, networking team, etc. etc. with extremely weak "connective tissue" between the disciplines.

EMC's VP of IT Infrastructure and Services -- Jon Peirce (great guy!) -- has a very illustrative slide that looks at a strikingly similar before-and-after transformation that happened in the manufacturing industry, which -- interestingly enough -- was exactly where he started his career.

Consider the picture on the left of old-school manufacturing.

Lots of excess materials stacked everywhere.  People doing individualized and highly-specialized roles.   Not a whole lot of thought given to automation, process reengineering and the like.

Don't smirk too much -- our current IT environments aren't all that different.  For a profession that's supposed to be proficient at technology, we often use it in very inefficient ways.

Now consider the picture on the right of modern manufacturing.

Automation as the default.
No people -- anywhere -- unless there's a problem.  
No waste.  

Completely optimized and matured processes.  
Things are measured and monitored vs. "managed" in a traditional sense.

Take a close look, please.  I think it's a decent proxy for the before-and-after that IT is going through.

If manufacturing -- or telecommunications or logistics or energy distribution or any other darn industry -- can make this sort of seismic transition, then certainly the lumbering and balkanized IT industry can do the same.

At least I hope so.

Key Roles -- Before and After
I swiped this slide from KK, our lead architect within EMC IT, and -- if we had an IT CTO -- well, he'd be it.

I thought it did a good job of capturing some of the key functional transitions that were at play here.
For starters, consider the "design and architecture" role.

Historically, this has been a project-oriented role.
Each new application or project got its own design and/or its own architecture.  Maybe there was re-use of similar component technologies, and maybe some of the design patterns were roughly the same -- but the key point is that they did their job assuming that each application environment would be implemented and run as a separate entity, and not based on shared services.

This is in sharp contrast to the new version of the role, where the goal is to design and architect a single multi-tenant environment that can be shared by as many applications (projects?) as possible.
Still a need for design and architecture skills, except they're building a small number of big things designed to be shared vs. a large number of smaller things that aren't designed to be shared.

Next, consider the "build and operate" roles.
Historically, the "build" role has been acquiring, assembling and configuring the required components, and provisioning them to be used by other parts of the organization.

The "operate" role has been mostly monitoring, with a healthy dose of break-fix when something isn't working.

Keep in mind, this expertise is usually spread across a very wide landscape of different application/infrastructure combinations (one per app!) making repeatability and automation difficult.
In the new world, the roles are still important.

"Build" is more like "provisioning of services when needed" from the shared pool vs. physical assembly.  "Operate" has shifted to monitoring the processes vs. monitoring the individual components of the environments.

Perhaps the most significant change -- at least to me -- is in the front-end of the process, termed the "product and service management" function here.

Historically, these have been the people who (a) take new requests for resources and services and find out what needs to be done, and (b) generally take the lead when things break and need some deeper investigation.

In this new model, they're more like mini-entrepreneurs: they "own" their service -- its definition and composition, the consumption costs associated with it, publicizing and promoting it (whether inside of IT or outside), monitoring service delivery levels, and -- ultimately -- figuring out which services need to be retired (due to lack of demand) and which new ones are needed.

As Adam Wagner (one of the people in EMC IT who works in this group and is living the dream, so to speak) explains the role of the new services manager:  "It's just like a retailer with ten things on the shelf.  If five things sell and five don't, you go get more of the five that sell, and figure out how to replace the five that aren't selling with five that do".

The New Organizational Model

Take all of this, and bake it into an organizational model, and you get something that usually looks like a three-part stack.
The services group is "known" by its interface to the outside world: a published list of services, with service managers behind each and every one of them.

That services group is then supported by a platforms group that is responsible for designing, building and operating the shared platform behind those services.

Behind that, there's a foundational technologies group with the required deep expertise in particular disciplines as needed: servers, virtualization, storage, networking, security technology, et al.

Although this specific example is for our Private Cloud Infrastructure Group (or PCIG for short), the same design pattern is being applied to other IT functions, e.g. applications, user experience, data services, etc.

The same three-part model shows up in each functional instantiation: exposed services from that group, a "platform" that might incorporate services published and managed by other IT groups (e.g. infrastructure), and whatever foundational technologies are unique to that functional area (e.g. middleware).

It's an important point, so I want to be clear -- the vast majority of published and consumed services using this model are entirely consumed by other internal EMC IT groups vs. directly consumed by non-IT users.  Sure, there are services that are directly consumed by users (e.g. the Cloud 9 self-service infrastructure), but that's not the goal of each and every service.

Key Interactions To Note
The "value chain" if you will, is driven by the services manager(s).

He or she has an eye on how the services are being delivered and consumed, costs associated, shifts in demand away from existing services and towards new services -- like any "owner" would think about things.

The "supplier" is the platforms group.  The service manager is constantly pushing the platform group to do more, do it faster, do it better, do new things.  The platform group, in turn, is motivated to cost-reduce service delivery from the platform, standardize and automate things as much as possible, and so on.  Put differently, the platform group "sells" their capability to the services manager.
The platform group, in turn, relies heavily on the foundation technologies group to be out there looking for cool new technologies that help the platforms group do their job better: newer hardware, newer software capabilities, etc.

The same sort of end-to-end value chain is invoked when there's a problem or issue.  Service manager says "we've got a problem", platform manager investigates, and calls in the foundational technology specialists if needed.

More importantly, we're seeing a wonderful flow with "requirements" for the shared services coming down from above (and percolating into the other layers) as well as a steady flow of innovations and enhancements generally coming from the other direction.

All of the interfaces, roles and responsibilities are relatively clear -- at least at a high level.
Just like any supply-chain delivery model :)

End-To-End Supporting Functions
If the "core" of the IT service delivery model is multiple instantiations of this three-part model (services, platform, foundational tech), it's worthwhile to point to a few disciplines that are clearly *outside* of this framework, and serve to support the whole vs. pieces.  IT Finance and HR are obvious examples, as are the Global Operations Center and, of course, the Help Desk.

Perhaps the most important (and interesting) new component of the services-oriented stack is the new Solutions Desk.  Think of it as a front desk for all the other front desks.

The Importance Of The Solutions Desk "Clearinghouse"
Imagine I'm a business user, and I'd like 300GB of capacity for some reason.  I can get in contact with a Solutions Consultant, and share my request.

The answer that's likely to come back is that if I'm willing to accept 250GB at moderate performance and once-a-day backup with a four-hour restore, there's a standard service that I can click on the portal, no questions asked.  Instant gratification.

However, if I'm insistent that I really *DO* need 300GB, and 24 hour RPO / 4 hour RTO isn't good enough, and performance matters, there's a slightly different process for approval and provisioning that's measured in a few days/weeks vs. a few minutes.

Of course, the special "service" is still carved from the same shared platform, using the same processes, etc.  It's just not offered as a standard service for easy consumption.

You'd be surprised how many people would take the 250GB "standard" option to get what they need right now vs. later.
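
For the code-inclined, here's a rough sketch of what that clearinghouse decision might look like.  To be clear, this is purely illustrative -- the service names, data structures and matching rules are invented for this post, not a description of our actual tooling:

```python
from dataclasses import dataclass

@dataclass
class StorageService:
    name: str
    capacity_gb: int
    performance: str     # e.g. "moderate", "high"
    rpo_hours: int       # recovery point objective (backup frequency)
    rto_hours: int       # recovery time objective (restore window)
    self_service: bool   # available on the portal, no questions asked

# A hypothetical standard offering, per the 250GB example above.
CATALOG = [
    StorageService("std-block-250", 250, "moderate", 24, 4, True),
]

@dataclass
class Request:
    capacity_gb: int
    performance: str
    rpo_hours: int   # laxest RPO the requester will accept
    rto_hours: int   # laxest RTO the requester will accept

def route(request: Request):
    """Offer the closest standard service; otherwise fall back to the
    slower custom approval path (days/weeks instead of minutes)."""
    for svc in CATALOG:
        fits = (request.capacity_gb <= svc.capacity_gb
                and request.rpo_hours >= svc.rpo_hours
                and request.rto_hours >= svc.rto_hours
                and request.performance == svc.performance)
        if fits and svc.self_service:
            return ("standard", svc)  # click on the portal, instant
    return ("custom", None)           # logged as a "special" for review

print(route(Request(250, "moderate", 24, 4)))  # -> the standard service
print(route(Request(300, "high", 24, 4)))      # -> the custom path
```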

That back-and-forth interaction turns out to be really important.

First, the consumer of this service (presumably me in this example) often doesn't know what the options are -- someone needs to explain them to me -- at least, the first time around.

Second, the service manager who's defined and managing the service is highly motivated to have as much as possible come through his or her "standard" service.

If too many "specials" come through that look similar, that's a strong indication that maybe a new service needs to be created.  The notion of a retailer with services on the shelf is reasonably accurate here.

Going Farther
The concept is turning out to be very extensible in practice.  The individuals who staff the Solutions Desk really aren't architects in the classical sense; they're more like consultants.

They know what's in the service catalog, and they know what's involved (cost and efforts) in creating new services.  Like any good consultant, they're highly motivated to sell what's on the truck, and do as little customization as possible.

And, of course, for the very big or the very unusual, the process shifts back to a more traditional requirements-and-planning approach, but -- still -- standard service offerings make up a high proportion of the eventual "solution".

An interesting use case arises around remote locations.  Given that we're EMC and we operate our various business functions around the globe, this comes up frequently.

The services team has come up with two broad flavors of offerings.  It turns out that in many situations, not much IT footprint is needed locally.  Between VDI and WAN acceleration, a "dumb" footprint in the location is becoming increasingly common.  Even if there's a server footprint required, there's a standard set of service choices to back it up, monitor it, secure it, etc.

When some lucky EMC employee lands in a new location to set up shop on behalf of the company, they talk to a service consultant who knows the standard remote office offerings -- their pros and cons, their costs -- and who makes it very easy indeed for the local requester to simply get things done and move on to the real job at hand.

The same line of thinking has been extended to providing a limited set of standard user experiences (we don't use the term desktop images anymore), whether that be on a classical laptop, or -- more often -- on a mobile device of one kind or another.

For example, if you've requested "iPhone support", there's a standard set of services you're going to want: email, web access to internal applications, etc.  Make it a packaged "service", and everyone wins.
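
One way to picture these packaged offerings: a catalog entry that simply fans out into elemental services.  The bundle and service names below are made up for illustration:

```python
# Hypothetical bundle definitions -- one catalog entry expands into
# several elemental services that get provisioned together.
BUNDLES = {
    "iphone-support": [
        "corporate-email",
        "mobile-web-access",
        "device-security-policy",
    ],
    "remote-office-basic": [
        "vdi-desktop",
        "wan-acceleration",
        "standard-backup",
        "standard-monitoring",
    ],
}

def expand(bundle_name: str) -> list[str]:
    """Resolve a bundle into the elemental services to provision."""
    return BUNDLES[bundle_name]

print(expand("iphone-support"))
```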

Common Questions
I've now been in more than a few situations where we've put this line of thinking in front of a senior IT team, and there are some common questions that come out.  I thought I'd share the more common questions, and the answers.

Where did you start?  Bottom up, or top down?
Well, if you start top down, you can't make much progress.  After all, every request of the IT organization is inherently different and unique.  Conversely, if you start from the bottom up, you're simply documenting what you already do at a very granular level.

The answer ended up being "middle out" -- create logical groupings of services (e.g. infrastructure) and start there.  Also, for very logical reasons, we tackled the infrastructure function first, hence the title of the group "Private Cloud Infrastructure Group".

We intend to apply the same design pattern to other service-delivery parts of IT.

Do you do chargeback?  How was this funded?
Every group understands its cost-to-serve for each service to about the 80-90% level.  Sure, there are some allocated costs in everything, and it's not as precise as we'd like it to be.

However, there's enough awareness of costs to have an intelligent discussion with someone who feels they need zero-data-loss availability vs. daily backups.  There is a chargeback model in some areas, but not as many as we'd like.  The belief is that understanding and exposing true costs (e.g. "showback") is a necessary first step towards chargeback models.
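
If you're wondering what showback arithmetic looks like in practice, it can be as simple as this sketch -- direct costs measured per consumer, plus a rough allocation of the shared remainder (all numbers invented):

```python
def showback(direct_costs, shared_costs, usage_share):
    """direct_costs: {consumer: dollars we can measure directly}
    shared_costs: total pooled/allocated dollars
    usage_share: {consumer: fraction of pooled usage, summing to ~1.0}"""
    return {
        consumer: direct + shared_costs * usage_share.get(consumer, 0.0)
        for consumer, direct in direct_costs.items()
    }

# Expose costs without billing for them -- showback, not chargeback.
report = showback(
    direct_costs={"sales-app": 12_000, "hr-portal": 4_000},
    shared_costs=10_000,
    usage_share={"sales-app": 0.7, "hr-portal": 0.3},
)
print(report)  # {'sales-app': 19000.0, 'hr-portal': 7000.0}
```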

The hard part was breaking away from the per-project-funded-by-the-business model that defines so much of IT activity.  We basically had to justify the upfront investment in creating shared, pooled services -- and the people and processes to deliver them -- on the promise that they'd ultimately be more cost-effective (not to mention more agile) for the business.

We got that result.

How did you decide on the first round of services?
It was an educated approximation.  Some of the services turned out to be very popular, others weren't.  A lot of requests were turning out to be pretty close to the standard services, which allowed us to tune up the offerings a bit.

Again, it's that continual feedback loop between producer and consumer that results in services that people want.

I'm interested in your self-service environment, which you call Cloud 9.  How did you set pricing, and how do you govern its consumption?
Anyone can get a decent amount of resources on Cloud 9, but only for 90 days -- no exceptions.  That 90 day limit turns out to be very effective in attracting the "right" use cases, and discouraging the "wrong" ones.  If an application or use case needs to be around more than 90 days at the outset, it's probably a different discussion.

It should be pointed out that self-service Cloud 9 resources are made available from the same pool of resources, services and processes that we use for more demanding parts of our business.  It's nothing more than a different consumption model on top of a standard capability.

After much thought, we decided to use Amazon's pricing as a proxy for pricing our internal services.  We thought it better to perhaps slightly subsidize initial consumption so that we could get visibility into what people were doing, provide some basic protection and security, and -- most importantly -- make it very easy to move surviving applications into something more appropriate at the end of 90 days, if it was needed.
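
To make the mechanics concrete, here's an illustrative lease sketch.  The per-GB rate and the data model are hypothetical inventions; the real constants are Amazon's published prices and the hard 90-day limit:

```python
from datetime import date, timedelta

LEASE_DAYS = 90                    # hard limit -- no exceptions
PROXY_RATE_PER_GB_MONTH = 0.10     # hypothetical AWS-like rate, USD

def open_lease(start: date, capacity_gb: int) -> dict:
    """Open a self-service lease priced against the public-cloud proxy rate."""
    return {
        "expires": start + timedelta(days=LEASE_DAYS),
        "est_monthly_cost": capacity_gb * PROXY_RATE_PER_GB_MONTH,
    }

def expired(lease: dict, today: date) -> bool:
    """Past the 90-day mark, the workload moves to a standard service or
    gets reclaimed -- it's a different discussion, not a renewal."""
    return today > lease["expires"]

lease = open_lease(date(2011, 7, 22), capacity_gb=250)
print(lease, expired(lease, today=date(2011, 10, 25)))  # -> True
```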

How many of your new projects are being consumed off of the shared services catalog, vs. project-specific infrastructure and processes?
A significant majority of our new "projects" (we really don't think in terms of projects in the traditional way anymore, but our clients do) consume the majority of their needs from either a standard service or a slightly modified variation.  We're also doing a lot of work to package elemental services together into easy-to-consume "bundles" -- e.g. compute, storage, data protection, etc. -- that are roughly scaled together.

Any optimization we might get by individually fine-tuning the components is more than outweighed by the ease of provisioning and consumption.  Keep in mind, many of the physical resources are actually virtualized: servers, thin-provisioned disk, etc.

The role of the service managers is to make sure we have a minimum number of the "right" services to cover the majority of the incoming requirements without too much customization or modification.

It's a learning process, but it's going pretty quickly.

How do you keep people from over-specifying or under-specifying their requirements?
That's where the role of the solutions consultant comes in.  If it can't be resolved at that level, there's the usual escalation process between business leaders, just as you'd see with any organization requesting services from another.

The majority of the time, though, things can be resolved at an operational level.  It's mostly a conversation around real requirements and tradeoffs.

Is This New?
No, not if you look outside of traditional IT groups.  You'll see many of these same service-oriented IT organizational patterns in the newer IT service providers we're working with.

After all, if you're an IT service provider -- and that's your business -- this is precisely the sort of structure you'd need to effectively deliver IT services that people want.

Or, put differently, many IT organizations are becoming internal service providers -- so they'll have to organize like them, won't they?

Food for thought.

By: Chuck Hollis