Service Level Considerations for the Cloud

There has been significant discussion about what cloud computing is and what will be required to ensure compliance. However, what features and service considerations should you look for when assessing the viability of different cloud providers? While Service Level Agreements (SLAs) are valuable tools for assessing availability guarantees, response time commitments, and penalties, they often fail to address key elements of running and maintaining your critical applications. SLAs are usually written by service providers to protect themselves; customers, however, need to understand how they will interact with their application after it is in service. What level of access will customers have to their environment? How will upgrades to the cloud be coordinated with end-user applications? What are the operating practices of the cloud, and will they impact your critical production or maintenance activities? Are there proprietary protocols that would inhibit migration to or from the cloud? What are the failover capabilities and practices within the cloud? These questions are explored below and are ultimately critical to successfully running and maintaining your applications within the cloud.

Not all portals (web access) are created equal. A best-in-class portal will be feature-rich, but does it have what you need to support your environment? At a minimum, the portal should provide the following capabilities and visibility.

- Restart of an application or reboot of a virtual container
- Health of the cloud or status of the virtual servers
- Historical use (peak and monthly average) – CPU, memory, I/O, network
- Current (real time) use statistics – CPU, memory, I/O, network
- Visibility into where your application is running – which servers & site
- Assess available capacity – storage, network, server/compute
- Service Provider alerts and notifications
- Open trouble tickets and current status of tickets
- Core service component status such as patching and backup schedules
- Contracted service schedule
- Billing information
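As an illustration of the "historical use" item, the sketch below shows how peak and monthly average figures could be derived from raw utilization samples, for example if the portal lets you export them. The function name, the sample format (timestamp plus percent used), and the sample values are assumptions made for this example, not any provider's actual export format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarize_usage(samples):
    """Group (timestamp, percent_used) samples by calendar month.

    Returns {(year, month): {"peak": ..., "average": ...}}.
    The sample format is hypothetical; adapt it to whatever your
    provider's portal actually exports.
    """
    by_month = defaultdict(list)
    for timestamp, percent_used in samples:
        by_month[(timestamp.year, timestamp.month)].append(percent_used)
    return {
        month: {"peak": max(values), "average": mean(values)}
        for month, values in by_month.items()
    }


# Made-up CPU utilization samples, for illustration only.
cpu_samples = [
    (datetime(2009, 5, 1, 9), 40.0),
    (datetime(2009, 5, 1, 14), 90.0),
    (datetime(2009, 5, 2, 9), 50.0),
    (datetime(2009, 6, 1, 9), 30.0),
]
summary = summarize_usage(cpu_samples)
for month, stats in sorted(summary.items()):
    print(month, stats)
```

The same summary can be repeated for memory, I/O, and network. Comparing the peak against the average is a quick way to see how much headroom you are paying for versus how much you actually need.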

It is also important to understand the customer support strategy. Who do you call if you have a problem? Once your application is deployed and the implementation team has moved on to support the next customer migration, who will provide support? Is there a team that supports a small pool of customers and is familiar with your environment?

It is important to understand the operating practices of the cloud to ensure they do not conflict with your production or maintenance schedules. Many service providers reserve the right to schedule maintenance windows for their network, facilities, and cloud infrastructure in order to perform maintenance activities and repairs. You need to understand how these notifications will be handled and what they mean for the operating environment. Will there be a loss of capacity or redundancy for a period of time? Does maintenance require failover to an alternate site? Is there alternate site capability? Does the failover site have identical processing, network, and storage capabilities? Were these failover functions tested during your implementation period, or will some critical maintenance be the first test of the failover strategy?
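One simple way to check for such conflicts is to compare the provider's announced maintenance windows against your own critical periods. The following is a minimal sketch under the assumption that both are available as start/end timestamps; the function names and the dates are hypothetical.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """True if the two half-open intervals [start, end) intersect."""
    return a_start < b_end and b_start < a_end


def find_conflicts(provider_windows, critical_periods):
    """Return every (window, period) pair that overlaps in time."""
    return [
        (window, period)
        for window in provider_windows
        for period in critical_periods
        if overlaps(window[0], window[1], period[0], period[1])
    ]


# Hypothetical provider maintenance windows (start, end).
provider_windows = [
    (datetime(2009, 6, 30, 22, 0), datetime(2009, 7, 1, 2, 0)),
    (datetime(2009, 7, 15, 1, 0), datetime(2009, 7, 15, 3, 0)),
]
# Hypothetical customer critical period, e.g. month-end processing.
critical_periods = [
    (datetime(2009, 6, 30, 18, 0), datetime(2009, 7, 1, 0, 0)),
]

conflicts = find_conflicts(provider_windows, critical_periods)
for window, period in conflicts:
    print("Maintenance", window, "conflicts with critical period", period)
```

A check like this only flags timing conflicts. Whether a given window actually reduces capacity or redundancy still has to come from the provider's notification.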

The other core element of the operating practices that needs to be understood is how upgrades to the core components (cloud or virtualization software, network, and storage infrastructure) are handled. Will these components have a coordinated release strategy? How often will they be upgraded: once or twice annually, or more often? Best practice would suggest no more than one major version upgrade and two minor version upgrades annually. For all major version upgrades, there should be a 6 month waiting period prior to a production release. How are these upgrades tested, and will a pre-release upgrade environment be available for testing customer applications before the production environment is upgraded? Will the prior version remain available after the release in the event you cannot perform an upgrade? If so, what capacity will be available, and how will capacity across the environment be provided?
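The upgrade guidelines above (at most one major and two minor upgrades per year, with a 6 month wait before a major version reaches production) can be expressed as a simple check against a provider's published upgrade plan. This is a sketch only: the `Upgrade` record, the field names, and the choice to apply the limits per component are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Upgrade:
    component: str          # e.g. "hypervisor" (hypothetical label)
    kind: str               # "major" or "minor"
    vendor_release: date    # when the vendor released the version
    production_date: date   # when the provider plans to deploy it


def full_months_between(start, end):
    """Number of complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def check_upgrade_plan(upgrades, year):
    """Return a list of guideline violations for the given year.

    Assumption: the annual limits are applied per component.
    """
    problems = []
    in_year = [u for u in upgrades if u.production_date.year == year]
    for component in sorted({u.component for u in in_year}):
        mine = [u for u in in_year if u.component == component]
        majors = [u for u in mine if u.kind == "major"]
        minors = [u for u in mine if u.kind == "minor"]
        if len(majors) > 1:
            problems.append(f"{component}: more than one major upgrade in {year}")
        if len(minors) > 2:
            problems.append(f"{component}: more than two minor upgrades in {year}")
        for u in majors:
            if full_months_between(u.vendor_release, u.production_date) < 6:
                problems.append(
                    f"{component}: major version deployed less than 6 months after release"
                )
    return problems


# Hypothetical plan: a major version pushed to production too early.
plan = [
    Upgrade("hypervisor", "major", date(2009, 1, 15), date(2009, 5, 1)),
    Upgrade("hypervisor", "minor", date(2009, 8, 1), date(2009, 9, 1)),
]
problems = check_upgrade_plan(plan, 2009)
for problem in problems:
    print(problem)
```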

According to the Uptime Institute, no site can achieve 100% availability. As the concept of the cloud suggests, site redundancy should be a core component of the cloud wherever possible. It is important to fully understand site redundancy and assess any limitations in advance of a failover. Depending on your application, there may be limitations on failover of database environments, Domain Name System (DNS) routing and addressing, capacity or hardware, or networking. If your provider has multi-site capability, failover testing prior to deployment and annually thereafter should be incorporated into your maintenance schedule.
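To put availability figures in perspective, the sketch below converts an availability percentage into expected downtime per year and estimates the combined availability of redundant sites. It rests on idealized assumptions that should be stated loudly: sites fail independently, failover is instantaneous and always works, and the 99.9% figure is an example value rather than any provider's commitment.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability):
    """Expected downtime per year for an availability between 0 and 1."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(site_availabilities):
    """Availability when the service is up if at least one site is up.

    Assumes independent site failures and perfect, instant failover,
    which real environments will not fully achieve.
    """
    all_sites_down = 1.0
    for availability in site_availabilities:
        all_sites_down *= (1.0 - availability)
    return 1.0 - all_sites_down


single_site = 0.999  # example figure only
two_sites = combined_availability([single_site, single_site])

print(f"Single site: {annual_downtime_hours(single_site):.2f} hours/year")
print(f"Two sites:   {annual_downtime_hours(two_sites) * 3600:.0f} seconds/year")
```

In practice the limitations listed above (database failover, DNS, capacity, networking) are exactly what keeps real results short of this idealized number, which is why failover testing matters.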

Given the relative newness of cloud and virtualization, it is also important to understand any limitations imposed by the operating environment. Should you find yourself in the difficult position of having to switch providers, what is your migration strategy? If everyone were running the same virtualization software, you could back up your containers and go elsewhere. The reality is that many providers run different software throughout their environments, which will likely inhibit a seamless migration. At a minimum, you need a solution that provides the contracted ability to back up and restore your operating systems, applications, and data from a running environment. Because many backup systems have problems with “open” files, it is important to insist on full backup capability from the running environment.

About Visage Solutions – www.VisageSolutions.com

Visage Solutions is a consulting company operating in the areas of regulatory compliance, risk assessment, information security, risk management and compliance processes. Utilizing our proprietary SingleVue™ and OpsAudit™ methodologies, the company focuses on assisting business entities in mitigating operational risk. Visage has provided solutions to a client base ranging from private, entrepreneurial companies to large multinationals. Our team is comprised of experienced executives, managers and consultants who can assist clients with the development, implementation and execution of their risk management and compliance strategy.

About our guest co-author – Don Clow

Formerly COO & CTO with Hosted Solutions, Don is currently working as an independent consultant delivering data center infrastructure and IT solutions, including data center planning and assessments, data center design, infrastructure hardware procurement (cooling; emergency power such as UPS and generators; electrical systems such as automatic transfer switches, power distribution systems, switchgear, and switchboards; raised floor; physical security systems), construction management and owner's representation, and data center commissioning.


Copyright © 2009 Visage Solutions, LLC.