Service Level Considerations for the Cloud
There has been significant discussion about what cloud computing
is and what compliance efforts it will require. However, what
features and service considerations should you look for when
assessing the viability of different cloud providers? While
Service Level Agreements (SLAs) are valuable tools for assessing
availability guarantees, response time commitments, and
penalties, they often fail to address key elements of running
and maintaining your critical applications.
SLAs are usually written by service providers to protect
themselves; customers, however, need to understand how they will
interact with their application once it is in service. What
level of access will customers have to their environment? How
will upgrades to the cloud be coordinated with end-user
applications? What are the operating practices of the cloud, and
will they impact your critical production or maintenance
activities? Are there proprietary protocols that would inhibit
migration to or from the cloud? What are the failover
capabilities and practices within the cloud? These questions,
explored below, are ultimately critical to successfully running
and maintaining your applications in the cloud.
Not all portals (web access) are created equal. A best-in-class
portal will be feature rich, but does it have what you need to
support your environment? At a minimum, you should have
visibility into the following capabilities:
- Restart of an application or reboot of a virtual container
- Health of the cloud or status of the virtual servers
- Historical use (peak and monthly average): CPU, memory, I/O, network
- Current (real-time) use statistics: CPU, memory, I/O, network
- Where your application is running: which servers and site
- Available capacity: storage, network, server/compute
- Service provider alerts and notifications
- Open trouble tickets and current status of tickets
- Core service component status, such as patching and backup schedules
- Contracted service schedule
- Billing information
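Peak and monthly-average figures are only comparable across providers if they are computed the same way. As an illustration, the Python sketch below derives both from raw utilization samples. It assumes samples can be exported from a portal as (timestamp, percent) pairs; the function name `summarize_usage` and the sample values are hypothetical, not any provider's API or real measurements.

```python
from collections import defaultdict
from datetime import datetime


def summarize_usage(samples):
    """Return {(year, month): (peak, average)} for utilization samples.

    `samples` is an iterable of (datetime, percent) pairs, for example
    CPU utilization exported from a provider portal (hypothetical format).
    """
    by_month = defaultdict(list)
    for ts, value in samples:
        by_month[(ts.year, ts.month)].append(value)

    # Months are returned in chronological order.
    return {
        month: (max(values), sum(values) / len(values))
        for month, values in sorted(by_month.items())
    }


if __name__ == "__main__":
    # Illustrative samples only, not real measurements.
    cpu = [
        (datetime(2024, 3, 1, 9), 35.0),
        (datetime(2024, 3, 1, 14), 92.0),
        (datetime(2024, 3, 2, 9), 40.0),
        (datetime(2024, 4, 1, 9), 20.0),
    ]
    for (year, month), (peak, avg) in summarize_usage(cpu).items():
        print(f"{year}-{month:02d}: peak {peak:.1f}%, average {avg:.1f}%")
```

A plain mean of samples equals a time-weighted average only when the samples are evenly spaced, which is worth confirming with the provider before comparing their reports.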
It is also important to understand the customer support
strategy. Who do you call if you have a problem? Once your
application is deployed and the implementation team has moved on
to the next customer migration, who will provide support? Is
there a dedicated team, supporting a small pool of customers,
that is familiar with your environment?
It is important to understand the operating practices of the
cloud to ensure they do not conflict with your production or
maintenance schedules. Many service providers reserve the
ability to schedule maintenance windows for their network,
facilities, and cloud infrastructure to perform scheduled
maintenance activities and repairs. You need to understand how
these notifications will be handled and what they mean to the
operating environment. Will there be a loss of capacity or
redundancy for a period of time? Does maintenance require
failover to an alternate site? Is there alternate site
capability? Does the failover site have identical processing,
network and storage capabilities? Were these failover functions
tested during your implementation period, or will some critical
maintenance be the first test of the failover strategy?
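Checking whether a provider's announced maintenance windows collide with your own critical periods is, at its core, an interval-overlap test. The Python sketch below assumes both schedules are available as lists of (start, end) datetimes; the function `find_conflicts`, the idea of a customer "blackout" period, and the example dates are illustrative assumptions rather than anything a provider supplies.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def find_conflicts(provider_windows, customer_blackouts):
    """Return (provider_window, customer_blackout) pairs that overlap.

    Both arguments are lists of (start, end) datetime tuples. A
    "blackout" is a period, such as month-end processing, during which
    you cannot tolerate reduced capacity, lost redundancy, or a failover.
    """
    conflicts = []
    for p in provider_windows:
        for c in customer_blackouts:
            if overlaps(p[0], p[1], c[0], c[1]):
                conflicts.append((p, c))
    return conflicts


if __name__ == "__main__":
    # Hypothetical schedules for illustration.
    provider = [
        (datetime(2024, 3, 30, 22), datetime(2024, 3, 31, 4)),
        (datetime(2024, 4, 13, 22), datetime(2024, 4, 14, 2)),
    ]
    month_end_close = [(datetime(2024, 3, 29, 18), datetime(2024, 4, 2, 8))]
    for p, c in find_conflicts(provider, month_end_close):
        print(f"Provider window {p[0]:%Y-%m-%d %H:%M} conflicts with "
              f"blackout starting {c[0]:%Y-%m-%d %H:%M}")
```

Treating intervals as half-open means a window that ends exactly when a blackout begins is not flagged; use `<=` comparisons if you need a buffer between the two.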
The other core operating practice that needs to be understood is
how upgrades are handled for core components: cloud or
virtualization software, network, and storage infrastructure.
Will these components have a coordinated release strategy? How
often will they be upgraded: once or twice annually, or more
often? Best practice suggests no more than one major version
upgrade and two minor version upgrades annually, with a
six-month waiting period before any major version upgrade is
released into production. How are these upgrades tested, and
will a pre-release environment be available for testing customer
applications before the production environment is upgraded? Will
an environment running the previous version remain available if
you cannot perform an upgrade? If so, what capacity will be
available, and how will capacity be provided across the
environment?
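The upgrade guidance above is concrete enough to check a provider's published roadmap against it. The Python sketch below is a hypothetical checker: the `Upgrade` record, its field names, and the example plan are assumptions, the six-month wait is approximated as 182 days and measured from the vendor's release date to the production date (one reading of the guidance), and the annual limits are applied per component.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

# Limits taken from the best-practice guidance in the text.
MAX_PER_YEAR = {"major": 1, "minor": 2}
# "Six-month waiting period", approximated as 182 days (an assumption).
MAJOR_WAIT = timedelta(days=182)


@dataclass
class Upgrade:
    component: str         # e.g. "hypervisor" or "storage" (illustrative)
    kind: str              # "major" or "minor"
    vendor_release: date   # when the vendor made the version available
    production_date: date  # when it is scheduled to reach production


def check_upgrade_plan(upgrades, year):
    """Return policy violations for upgrades reaching production in `year`.

    Limits are applied per component; whether they should apply to the
    environment as a whole instead is a judgment call.
    """
    in_year = [u for u in upgrades if u.production_date.year == year]
    problems = []

    counts = Counter((u.component, u.kind) for u in in_year)
    for (component, kind), n in sorted(counts.items()):
        limit = MAX_PER_YEAR[kind]  # raises KeyError for unknown kinds
        if n > limit:
            problems.append(f"{component}: {n} {kind} upgrades in {year} (limit {limit})")

    for u in in_year:
        waited = u.production_date - u.vendor_release
        if u.kind == "major" and waited < MAJOR_WAIT:
            problems.append(
                f"{u.component}: major upgrade reached production after "
                f"{waited.days} days (minimum {MAJOR_WAIT.days})"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical roadmap for illustration.
    plan = [
        Upgrade("hypervisor", "major", date(2024, 1, 15), date(2024, 4, 1)),
        Upgrade("hypervisor", "minor", date(2024, 5, 1), date(2024, 6, 1)),
        Upgrade("storage", "minor", date(2024, 2, 1), date(2024, 3, 1)),
    ]
    for problem in check_upgrade_plan(plan, 2024):
        print(problem)
```

In this example plan the hypervisor's major upgrade would be flagged, because it reaches production only 77 days after the vendor's release.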
According to the Uptime Institute, no site can achieve 100%
availability. As the concept of the cloud suggests, site
redundancy should be a core component of the cloud wherever
possible. It is important to fully understand site redundancy
and assess any limitations in advance of a failover. Depending
on your application, there may be limitations on failover of
database environments, Domain Name System (DNS) routing and
addressing, capacity or hardware, or networking. If your
provider has multi-site capability, failover testing before
deployment, and annually thereafter, should be incorporated into
your maintenance schedule.
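Why site redundancy matters can be seen with a standard back-of-the-envelope calculation: if each of N sites is independently available a fraction a of the time, the probability that at least one is up is 1 - (1 - a)^N. The Python sketch below assumes independent failures and a failover that always succeeds; shared software, shared networks, or an untested failover procedure break those assumptions, which is why the limitations and testing described above matter. The 99% per-site figure is illustrative only.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(site_availability, sites):
    """Probability that at least one of `sites` redundant sites is up.

    Assumes site failures are independent and that failover always
    succeeds instantly; real deployments rarely meet either assumption.
    """
    return 1 - (1 - site_availability) ** sites


def downtime_hours_per_year(availability):
    """Expected hours of unavailability per (non-leap) year."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single_site = 0.99  # illustrative figure, not a provider guarantee
    for n in (1, 2):
        a = combined_availability(single_site, n)
        print(f"{n} site(s): {a:.4%} available, "
              f"about {downtime_hours_per_year(a):.2f} hours down per year")
```

Under these idealized assumptions, one 99% site is down roughly 87.6 hours a year, while two independent 99% sites are both down only about 0.88 hours (roughly 53 minutes) a year.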
Given the relative newness of cloud and virtualization, it is
also important to understand any limitations imposed by the
operating environment. Should you find yourself in the difficult
position of having to switch providers, what is your migration
strategy? If everyone ran the same virtualization software, you
could back up your containers and go elsewhere. In reality, many
providers run different software throughout their environments,
which will likely inhibit a seamless migration. At a minimum,
you need a solution that provides the contracted ability to back
up and restore your operating systems, applications, and data
from a running environment. Because many backup systems have
trouble with “open” files, it is important to insist on full
backup capability from the running environment.
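A backup capability is only proven by a restore. One way to confirm that a test restore is complete is to compare content hashes of the source and the restored copy. The Python sketch below assumes both trees are reachable as local directories and that the source was quiesced or snapshotted while the backup ran; otherwise, files that were open and changing will legitimately show up as changed, which is the open-file problem described above. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path under `root` (relative, POSIX style) to its SHA-256."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(source, restored):
    """Return (missing, changed) file lists for a restore versus its source."""
    missing = sorted(source.keys() - restored.keys())
    changed = sorted(k for k in source.keys() & restored.keys()
                     if source[k] != restored[k])
    return missing, changed
```

For example, `compare_manifests(build_manifest("/srv/app"), build_manifest("/restore-test/app"))` (placeholder paths) returns two sorted lists, and both should be empty for a complete, consistent restore.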
About Visage Solutions – www.VisageSolutions.com
Visage Solutions is a consulting company operating in the areas
of regulatory compliance, risk assessment, information security,
risk management and compliance processes. Utilizing our
proprietary SingleVue™ and OpsAudit™ methodologies, the company
focuses on assisting business entities in mitigating operational
risk. Visage has provided solutions to a client base ranging
from private, entrepreneurial companies to large multinationals.
Our team consists of experienced executives, managers and
consultants who can assist clients with the development,
implementation and execution of their risk management and
compliance strategy.
About our guest co-author: Don Clow
Formerly COO and CTO with Hosted Solutions, Don currently works
as an independent consultant delivering data center
infrastructure and IT solutions, including data center planning
and assessments, data center design, infrastructure hardware
procurement (cooling; emergency power, such as UPS and
generators; electrical systems, such as automatic transfer
switches, power distribution systems, switchgear, and
switchboards; raised floor; and physical security systems),
construction management and owner’s representation, and data
center commissioning.