All cloud vendors offer standard, one-size-fits-all service level agreements (SLAs) that cover aspects such as high availability, performance, security, response times, support, and so on. However, is what’s written on paper actually implemented in real life? Do you know if your cloud vendor violated any agreements? If your cloud management vendor doesn’t deliver, do you really know your rights?
A Change in Awareness
A recent study by IDG found that 69% of enterprises have applications or infrastructure running in the cloud, and that share is growing by tens of percent every year. The IDG Enterprise Cloud Computing Study 2014 found that cloud investments have increased by almost 20% in large-scale enterprises (1,000+ employees), which now spend over $3M a year on average. According to the study, 24% of IT budgets will be allocated to cloud solutions next year.
As more and more enterprises move to the cloud, cloud vendors will have to start abiding by a general set of guidelines or standards for SLAs that really suit their enterprise customers. Looking back at the evolution of IT, a company could once deliver a service without measuring any quality metrics such as availability or response time. However, once an IT organization becomes a profit and loss (P&L) center, meaning that business units start paying for the services they receive from IT, these internal customers demand an SLA.
It’s reasonable to assume that the same pattern will repeat itself within the cloud. Among cloud vendors, HP has the most realistic SLA, which demonstrates its understanding of and experience with large enterprise IT. For example, HP defines its SLAs granularly, across various metrics or objectives, so that customers are paid penalties for service level violations. Most other vendors have yet to create SLAs that properly measure service level metrics or penalize the vendor for service level violations.
Most of the time, cloud vendors deliver a service in response to consumer demand. So far, large cloud vendors have seen demand for SLAs mainly from large clients that require custom-made SLAs. However, in discussions with SMB cloud clients with $1M annual cloud spend, it seems that they have yet to demand SLAs. Nonetheless, as more traditional and large enterprises adopt the cloud, they will want the services they use to be backed by tighter SLA metrics. I believe that now is the time to really develop these SLAs and make sure vendors stand by their commitments.
Start at Users’ Expectation Levels
On January 10th of this year, Verizon Cloud customers were notified that the service would undergo maintenance. It lasted 40 hours! As of today, cloud consumers don’t seem to negotiate or validate their cloud providers’ SLAs. However, that’s not going to last. The end users of enterprises and independent software vendors (ISVs) don’t care what infrastructure their online services run on. They expect application providers to take full responsibility for their applications. In order to guarantee service quality and live up to these expectations, application vendors and enterprise IT have to push IaaS providers towards more transparency and stronger policy enforcement, providing a truly end-to-end SLA.
In his recent article in InformationWeek magazine, Charles Babcock mentioned a survey that showed that 79% of cloud customers found their SLAs “too simplistic” and 73% believed cloud providers were hiding infrastructure problems that affect workload performance.
With these findings, it’s clear that it will become increasingly unacceptable for latecomer cloud vendors, such as Verizon, to have major interruptions across the board, and that all vendors will have to start reimbursing customers for downtime. As mentioned above, while large enterprises already enjoy custom SLAs, companies of all sizes will soon benefit as well. Enhanced, transparent and enforced SLAs will become the order of the day for all cloud customers.
SLOs in the Cloud
A more realistic SLA is needed, one that holds vendors accountable if there are issues with their services. SLAs cover many different service level objectives (SLOs), such as uptime, the percentage of successful requests, service provisioning requests, time to recover, mean time to recover/respond (MTTR) and mean time between failures (MTBF). These are relevant to the end user experience and to how the cloud impacts an application. In terms of capacity, SLOs include metrics such as the number of simultaneous connections, resource capacity usage and throughput. In terms of support, SLOs ensure that a support representative is available and reaches a solution within the expected time. Another important SLA domain is security, with service level objectives such as reliability, authentication, authorization, mean time to revoke user access, and mean time to grant user access. In general, SLOs span many aspects of a service, and each of them needs to be measured.
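To make the availability objectives above concrete, here is a minimal Python sketch that derives uptime percentage, MTTR and MTBF from a list of outage records. The `Outage` class, the `slo_summary` function, and the dates are hypothetical names and values chosen for illustration. MTBF is approximated here as total uptime divided by the number of outages, which is one common convention rather than a definition taken from any vendor's SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One recorded service interruption (hypothetical record type)."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def slo_summary(outages, period_start, period_end):
    """Compute availability, MTTR and MTBF for one reporting period.

    Assumes outages do not overlap and fall entirely inside the period.
    """
    period = period_end - period_start
    downtime = sum((o.duration for o in outages), timedelta())
    uptime = period - downtime
    count = len(outages)
    one_hour = timedelta(hours=1)
    return {
        "availability_pct": 100.0 * (uptime / period),
        # Mean time to recover: average outage length, in hours.
        "mttr_hours": (downtime / count) / one_hour if count else 0.0,
        # Mean time between failures: uptime per outage, in hours.
        "mtbf_hours": (uptime / count) / one_hour if count else None,
    }


# Illustrative dates only: a single 40-hour outage in a 30-day period.
period_start = datetime(2024, 6, 1)
period_end = period_start + timedelta(days=30)
outage_start = period_start + timedelta(days=9)
summary = slo_summary(
    [Outage(outage_start, outage_start + timedelta(hours=40))],
    period_start,
    period_end,
)
print(summary)
```

A single 40-hour maintenance window, like the one described earlier, leaves 680 of 720 hours available in a 30-day period, or about 94.4% availability.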
The Shared Responsibility Challenge
Service level monitoring is a shared responsibility. In order to accurately develop customized service level monitoring, vendors have to provide standard metrics, which will be detailed below, and enterprises are responsible for linking those metrics together. For example, it would be up to clients to merge metrics from Amazon Web Services, Google, HP and Akamai in order to understand each vendor’s true level of service. This approach clarifies which vendors are responsible for existing issues.
For example, if you run a single EC2 instance on Amazon and rely on various services, such as RDS, ELB, CloudFront, CDN and a number of other services that may or may not fall under Amazon’s responsibility, how can you effectively measure it all? AWS’ basic infrastructure offers monitoring indicators, but these will need to be extended in the near future to cover availability and response time, because at the end of the day, clients expect 100% availability of their entire environments. If the network is broken somewhere down the line, a client has to know which vendor(s) to approach with a claim of service violations.
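One way to see why linking metrics across services matters is to combine per-service availability into an end-to-end figure. The sketch below assumes that every service in the chain must be up for the application to work and that failures are independent, so the availabilities multiply. The function names are hypothetical and the percentages are illustrative measurements, not figures from any vendor's SLA.

```python
import math


def end_to_end_availability(component_pct):
    """Availability (%) of a chain in which every component is required.

    Assumes component failures are independent of one another.
    """
    return 100.0 * math.prod(p / 100.0 for p in component_pct.values())


def weakest_component(component_pct):
    """Name of the component with the lowest measured availability."""
    return min(component_pct, key=component_pct.get)


# Illustrative measurements gathered by the client, in percent.
measured = {
    "EC2": 99.95,
    "RDS": 99.95,
    "ELB": 99.99,
    "CloudFront": 99.90,
}

overall = end_to_end_availability(measured)
print(f"end-to-end availability: {overall:.2f}%")
print(f"first vendor to approach: {weakest_component(measured)}")
```

With these sample numbers the chain as a whole comes out at roughly 99.79%, lower than any single service, which is why per-service figures alone do not tell a client what its users actually experience.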
What typically happens in these types of scenarios is that a cloud vendor representative meets with a client at the end of the quarter, or whenever they do their account status review. Assuming that there were service level violations that could be presented to that representative with clear-cut evidence, the account manager would compensate the client with credit points for the coming year.
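The "clear-cut evidence" for such a review could be as simple as a list of objectives whose measured values missed the committed targets. The following sketch shows one possible shape for that check. The `Slo` class, the `find_violations` function, the metric names and the target values are all hypothetical. Real targets would come from the signed SLA.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """A committed objective (hypothetical structure)."""
    name: str
    target: float
    higher_is_better: bool = True  # False for metrics such as MTTR


def find_violations(slos, measured):
    """Return (name, target, measured value) for every missed objective."""
    violations = []
    for slo in slos:
        value = measured.get(slo.name)
        if value is None:
            continue  # no measurement, so no evidence either way
        if slo.higher_is_better:
            met = value >= slo.target
        else:
            met = value <= slo.target
        if not met:
            violations.append((slo.name, slo.target, value))
    return violations


# Illustrative targets and quarterly measurements.
committed = [
    Slo("availability_pct", 99.95),
    Slo("mttr_hours", 4.0, higher_is_better=False),
]
quarter = {"availability_pct": 94.44, "mttr_hours": 40.0}

for name, target, value in find_violations(committed, quarter):
    print(f"{name}: committed {target}, measured {value}")
```

Keeping the comparison this explicit means the same report can be handed to each vendor's account manager, with only the objectives that vendor is responsible for.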