Common misconceptions behind cloud migration failures
Tags: cloud, Cloud migrations, RDBMS
Migrating your workloads to the cloud can bring some undeniable benefits to your organisation. For example, you can leverage cloud automation to significantly improve your time to market. You can also benefit from the ever increasing number of cloud regions to place your workloads close to your clients. This improves the response time of your services and, as a result, your customers’ satisfaction.
However, there are numerous misconceptions around public clouds. These misconceptions often lead to costly cloud migration failures and might end up in cloud repatriation. This article is the second in a series aiming to help you avoid costly mistakes around cloud migration. In this blog, we will go over some of the widely shared misconceptions and provide recommendations to help you get the most out of public clouds and make a well-considered decision to only migrate the pertinent workloads.
The “illusion of infinite capacity”
The CEO of AWS pertinently stated in 2020 that “The cloud is all about creating the illusion of infinite capacity”. The problem is that many cloud consumers fully believe in the illusion and even make plans based on those assumptions. The promise of having cloud elasticity and on-demand capacity you can tap into as you scale is compelling. But when you spend enough time using cloud services, you will eventually meet errors like “insufficient capacity”, “SKUNotAvailable” and “not … enough resources available”.
It’s important to understand that even if a cloud provider has millions of available resource units (e.g. VMs), those units might not be of the same type you used to validate your application’s functionality. You might find the resources you are looking for in another region or zone, but they might not fit your compliance and performance constraints. Moving your services to other (and hopefully similar) resources can involve considerable qualification time and effort that is simply incompatible with your Service Level Agreements.
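One way to prepare for such errors is to decide in advance which alternative regions and resource types you are willing to accept. The sketch below is a minimal illustration and not a provider API: `provision` is a hypothetical callable you would supply, `CapacityError` is a hypothetical exception standing in for errors such as “insufficient capacity”, and the region and VM type names are placeholders.

```python
from typing import Callable, Iterable, Optional, Tuple


class CapacityError(Exception):
    """Hypothetical error standing in for provider messages such as
    "insufficient capacity" or "SKUNotAvailable"."""


Candidate = Tuple[str, str]  # (region, vm_type), in order of preference


def provision_with_fallback(
    candidates: Iterable[Candidate],
    provision: Callable[[str, str], str],
) -> Optional[Tuple[Candidate, str]]:
    """Try each pre-qualified (region, vm_type) pair until one succeeds.

    `provision` is a hypothetical function that returns a resource ID or
    raises CapacityError. Returns None if every candidate is exhausted.
    """
    for region, vm_type in candidates:
        try:
            return (region, vm_type), provision(region, vm_type)
        except CapacityError:
            continue  # no capacity here; try the next qualified option
    return None


def fake_provision(region: str, vm_type: str) -> str:
    """Stand-in for a real provisioning call, used only for this example."""
    if (region, vm_type) == ("region-a", "type-large"):
        raise CapacityError("insufficient capacity")
    return f"vm-{region}-{vm_type}"
```

The candidate list should only contain combinations that you have already tested and that satisfy your compliance and performance constraints. Otherwise the fallback simply moves the qualification problem to the worst possible moment.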
You might mitigate some of the above problems by reserving your hardware in advance (for 1, 2 or 3 years typically) but it goes against the promise of elasticity and the associated cost savings. So if you are planning to benefit from cloud elasticity, we recommend that you:
- Adopt multi-region deployment.
- Rely on different resource types and test your application against the possible combinations.
- Check that the elasticity benefit is not outweighed by the reservation benefit.
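The last check comes down to simple arithmetic. The sketch below assumes a hypothetical on-demand hourly price and a hypothetical reservation discount. Neither is taken from any provider’s price list, and real pricing models have more options than these two.

```python
def break_even_utilisation(reserved_discount: float) -> float:
    """Fraction of the time a VM must run before reserving beats on-demand.

    A reservation is paid for every hour, at (1 - discount) of the
    on-demand rate. On-demand is paid only for the hours actually used.
    Both cost the same when utilisation == 1 - discount.
    """
    if not 0.0 <= reserved_discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - reserved_discount


def yearly_cost(on_demand_hourly: float, utilisation: float,
                reserved_discount: float) -> dict:
    """Yearly cost of one VM under both purchasing models."""
    hours = 365 * 24
    return {
        "on_demand": on_demand_hourly * hours * utilisation,
        "reserved": on_demand_hourly * hours * (1.0 - reserved_discount),
    }


# Hypothetical inputs: 0.10 per hour on demand, 40% reservation discount,
# and a VM that only runs half of the time.
costs = yearly_cost(on_demand_hourly=0.10, utilisation=0.5, reserved_discount=0.4)
```

With these made-up numbers, the VM is cheaper on demand, because a 50% utilisation is below the 60% break-even point. A workload that runs most of the time would tip the balance towards the reservation.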
The public cloud has plenty of resources to offer. In order to benefit from this abundance, you need to have the flexibility to deploy in many locations and the flexibility to move between hardware types. You might even consider adopting a multi-cloud strategy, like 81% of the respondents to this HashiCorp survey.
It’s equally important to ask whether such flexibility allows you to increase your revenues or reduce your costs.
Let’s now dive into another common misconception: the cloud is faster.
Cloud-based deployments are more performant
Most public clouds provide a broad choice of compute and storage solutions that no privately owned data centre can claim or even consider having. For example, AWS provides nearly 70 VM instance types, 7 types of EBS remote volumes (think SAN) and 7 storage classes of object storage!
Moreover, most cloud providers tend to regularly renew (around every 3-5 years) their hardware to keep up with the latest innovations. Some cloud providers are even becoming hardware innovators themselves, like AWS with its Graviton lineup of Arm processors.
The previous facts contribute to the false impression that the performance you can get in the cloud cannot be matched by an on-premise set-up. This is far from the truth, as we detailed in this whitepaper. Here are some of the scenarios where your services’ performance can actually degrade after a cloud move:
- Your on-premise set-up is leveraging persistent local flash storage. Cloud providers offer decent options for VMs with local flash storage as with AWS’s I4g series, Azure’s Lasv3 series and Google Cloud’s N series. Yet, local storage is ephemeral in the cloud. So you’re left with 2 choices:
  - Use ephemeral storage and heavily invest in automation to respawn failing nodes (and the associated services) automatically.
  - Resort to remote storage and accept a possible degradation in performance.
  The cloud model is currently incompatible with supporting local and persistent storage. The cloud provider would need to commit to replacing individual failing disks in a short time frame. The latter would impose a lot of strain and complications on cloud operations and the hardware logistics are simply incompatible with the required scale.
- Your on-premise set-up is benefitting from low network latency between your components. The low latency might be due to a simple set-up where most of your components are co-located in a few servers or racks. It might also be due to short distances between your data centres or co-locations. After a move to the cloud, the latency between the same components might increase. This is not because the cloud provider’s network design is less optimised than yours. It’s simply because you are moving from one set-up to another one. Cloud providers would not be able to claim the good availability guarantees they offer (usually around 99.99% for multi availability zone deployments) without relying on better isolation constructs than simple server isolation or rack isolation.
So your cloud deployment might be faster than your on-premise deployment. Yet, the reverse is also possible. The only way to know is to perform benchmarks before and after the migration.
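A before-and-after comparison can be as simple as comparing latency percentiles collected with the same load test in both environments. The sketch below is one possible way to do it, assuming you already have two lists of latency samples in milliseconds. The function names are ours and the choice of percentiles is an assumption, not a standard.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Median, 95th and 99th percentile of latency samples (milliseconds).

    Pass a reasonably large sample; the percentiles of a handful of
    measurements are not meaningful.
    """
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def compare(before_ms: Sequence[float],
            after_ms: Sequence[float]) -> Dict[str, float]:
    """Ratio after/before per percentile; above 1.0 means slower after."""
    before = latency_summary(before_ms)
    after = latency_summary(after_ms)
    return {name: after[name] / before[name] for name in before}
```

Looking at the tail percentiles and not only at the median matters here, because a move from local to remote storage or across availability zones often shows up in the slowest requests first.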
Next, we will discuss one of the growing concerns around cloud migrations: security in the cloud.
The cloud is more secure
The general story line is that cloud providers have among the best security architects in the industry and that they are investing heavily in security-related tools and research. That is absolutely true! Microsoft alone is planning to spend $20 billion on cybersecurity over 5 years. Most cloud providers offer a wide range of services covering threat detection and response. For example, Google has a listing of 36 different products around security and identity management.
So what’s the issue with claiming that the “cloud is more secure”? The problem is that in order to benefit from all the security enhancements offered by the cloud you need to:
- Use cloud services correctly. There is an ever-growing list of famous configuration issues that exposed sensitive files to the public (e.g. Booz Allen Hamilton’s misuse of AWS’s S3).
- Leverage all the security features your cloud provider offers:
  - Firewall rules
  - Routing tables
  - Secrets Manager
  - Certificate Managers
  - Key Management Systems
  - Identity and Access Management services
  - And more
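Misconfigurations such as publicly readable storage can often be caught by automated checks before they reach production. The sketch below is a deliberately simplified illustration that inspects a policy document written in the JSON policy format used for S3 bucket policies. The function name is ours, the bucket name in the example is a placeholder, and the check is not a substitute for the auditing tools your cloud provider offers.

```python
from typing import Any, Dict, List


def public_allow_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements that grant access to any principal ("*").

    Simplification: statements with a Condition block are skipped, because
    a condition may restrict access; they deserve a manual review instead.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be in a list
        statements = [statements]
    flagged = []
    for stmt in statements:
        principal = stmt.get("Principal")
        is_public = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") in ("*", ["*"])
        )
        if stmt.get("Effect") == "Allow" and is_public and "Condition" not in stmt:
            flagged.append(stmt)
    return flagged


# Placeholder policy: the first statement is world-readable, the second is not.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": "*", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
         "Action": "s3:PutObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}
flagged = public_allow_statements(example_policy)
```

Running a check of this kind in your deployment pipeline is one way to make “using cloud services correctly” a habit and not a one-off review.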
We might argue, rightly so, that the controls that we need to implement in public clouds are the same as those we need to implement on-premise. However, you are significantly more exposed in public clouds than in a private data centre. The cloud infrastructure is shared between all users of said cloud, and there are many of them. Despite the measures cloud providers take to segregate resources associated with different tenants and customers, there are always security holes that would allow an attacker to bypass those measures (e.g. Cloud Managed PostgreSQL’s vulnerability).
So “the cloud is more secure” should be made conditional on the proper usage of the cloud’s security features. You need to have the resources and skills to correctly leverage cloud features to outweigh the risks inherent to increased exposure.
Our recommendation is to design for security from the start of your cloud journey. In this whitepaper we provide some of the best practices to adopt the “design for security” posture for database deployments.
Similarly, you need to design to optimise your costs in the early stages of your cloud journey. Otherwise, you might end up in an endless loop of budget overruns. Let’s explore the labyrinth of cloud costs to help you avoid such traps.
The cloud is cheaper
“It depends” is almost always the right answer to any big question, as we learned for clouds and security. The same is true for public clouds’ TCO. The only way to estimate the financial outcome of a cloud migration is by building a realistic business plan. We can split most IT costs into three categories:
- Staffing costs: This represents all the costs related to hiring, retaining and training your IT professionals.
- Licence costs: It represents any licence cost associated with your database or its hosting OS (e.g. Ubuntu Pro subscription).
- Hardware costs: It covers all the costs related to either purchasing hardware or provisioning it using cloud automation.
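The three categories above can be turned into a very small model to compare options over several years. The sketch below is only a starting point for a business plan: the class and function names are ours, and any figures you feed into it are your own estimates.

```python
from dataclasses import dataclass


@dataclass
class YearlyCosts:
    """Yearly costs for one deployment option, in a single currency."""
    staffing: float
    licences: float
    hardware: float

    def total(self) -> float:
        return self.staffing + self.licences + self.hardware


def tco(option: YearlyCosts, years: int, one_off: float = 0.0) -> float:
    """Total cost over `years`, plus a one-off cost such as a migration."""
    if years < 0:
        raise ValueError("years must not be negative")
    return one_off + years * option.total()


def cloud_saving(on_premise: YearlyCosts, cloud: YearlyCosts,
                 years: int, migration_cost: float) -> float:
    """Positive when the cloud option is cheaper over the period."""
    return tco(on_premise, years) - tco(cloud, years, one_off=migration_cost)
```

A model this simple ignores effects such as hardware depreciation cycles and price changes, but it already forces you to write down the one-off cost of the migration, which is easy to forget.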
Let’s dive into these components separately:
Your staffing costs might increase, decrease or remain flat after a cloud migration. Let’s check what you might expect in possible scenarios:
When can you expect a reduction in the staffing cost?
When migrating all of your infrastructure to the cloud you might reasonably expect a reduction in staffing costs related to the management of physical servers. Physical security, cabling, maintenance of power and cooling infrastructures are all outsourced to the cloud vendor. However, the cloud vendor will not take care of everything you previously had to handle in a private cloud. Tasks related to hardware planning (remember the “illusion of infinite capacity”), hardware qualification against your workloads and provider management will still be your responsibility. Moreover, a cloud migration will only be successful if you have the right skill set to perform the migration.
When can you expect an increase in staffing costs?
According to IBM’s 2022 Transformation Index, 69% of respondents admitted that “their teams lack skills to be proficient in architecting/managing cloud applications” and 72% “are creating new positions to fulfil the need for cloud skills”. Whether you choose to train your staff or hire new people with these skills, you need to factor those additional costs into your business plan.
When is it hard to tell in advance?
You might also opt for managed services to reduce some of the operational burden on your staff. By opting for Platform-as-a-Service products, you typically offload activities like security patching and VM deployment to the cloud provider. When you choose to move up the stack by opting for Software-as-a-Service products, you typically offload even more activities, like manual resource management.
Yet, there is “no such thing as a free lunch” and the further you move up the stack, the more you pay. For example, when we compare the AWS cost for a 1TB PostgreSQL Multi-AZ deployment to the cost of provisioning the equivalent hardware (2 AWS EC2 VMs of the same type, with twice the allocated storage in order to account for the backup needs) we find the following figure:
Obviously, the above is not an apples-to-apples comparison as AWS’ RDS comes with solid and proven automation to deploy, patch, upgrade and back up your databases. Developing RDS-like automation can result in a high initial investment and requires a particular set of skills that might be hard to find.
So opting for managed services might help you reduce your staffing requirements in the long run but it might incur significant additional costs that you need to weigh before deciding which way to go.
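One way to weigh the two options is to estimate how long it takes for in-house automation to pay for itself. The sketch below assumes you can express both options as a monthly cost and the automation work as a single up-front investment. All three inputs are your own estimates, and the function name is ours.

```python
import math


def months_to_break_even(managed_monthly: float,
                         self_managed_monthly: float,
                         automation_investment: float) -> float:
    """Months after which building your own automation becomes cheaper
    than paying for the managed service.

    `self_managed_monthly` should include the staff time needed to keep
    the automation running, not only the infrastructure bill.
    Returns math.inf when the managed service is not more expensive per
    month, because the investment is then never recovered.
    """
    premium = managed_monthly - self_managed_monthly
    if premium <= 0:
        return math.inf
    return automation_investment / premium
```

If the result is longer than the expected lifetime of the service, or longer than you can retain the people who built the automation, the managed service is probably the safer choice.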
In sum, there is no simple answer to the cost implications of a cloud migration from a staff perspective. Generally speaking, the following pattern is likely:
- A slight increase in staffing costs during the migration.
- A flat staffing cost in the mid term (after finishing the migration).
- A reduction in hiring needs in the long term (years after finishing the migration).
It’s more prudent not to bet on any significant staffing cost decrease as a result of a cloud migration.
We have covered a few considerations related to staffing costs. Let’s look at more considerations related to hardware.
This is probably the area with the most misconceptions related to the cloud. Some of the related myths are rooted in the following storyline. Cloud providers are buying hardware on a scale that no owner of a private data centre or colocation can sustain or even consider. Therefore, they are benefiting from economies of scale that cannot be matched by traditional self-managed data centres. The latter is absolutely true but it does not mean that the economies of scale are being transferred to the end user, including you.
What adds to the confusion is the contradictory surveys you might find on the internet. For example, HPE claims to provide better or equivalent “performance” than AWS’s infrastructure for analytics workloads while costing only a third of AWS’ cost. Dell claims similar findings for a SAP HANA workload. Gartner provides a reverse example of a “workload migration for 2,500 virtual machines from an on-premises data centre to Amazon Web Services EC2” that led to a gain of around 45%.
So again, there is no simple answer to the impact of a cloud migration on your Total Cost of Ownership. Ultimately, you need to have your own business plan with your own realistic assessments as detailed in this whitepaper.
Here are some possible scenarios and their associated outcomes to guide you in your decision:
When can you expect the hardware cost to increase?
Generally, a company that meets all of the following criteria will not see any significant savings by just migrating to the cloud:
- The company already operates a large deployment in its private data centre or colocation. We consider any deployment of more than a few PBs of raw storage to be large.
- The usage of its hardware resources is not extremely variable. We do not consider a usage pattern extremely variable if its sustained peak is less than 3 times its average.
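The variability criterion can be checked against your own monitoring data. The sketch below assumes that each sample is already an average over a window long enough to count as “sustained” (for example, hourly averages), so that a short spike does not dominate the result. The function names are ours, and the default threshold of 3 comes from the criterion above.

```python
from typing import Sequence


def peak_to_average(usage: Sequence[float]) -> float:
    """Ratio between the highest sample and the mean of all samples."""
    if not usage:
        raise ValueError("usage must contain at least one sample")
    average = sum(usage) / len(usage)
    if average <= 0:
        raise ValueError("average usage must be positive")
    return max(usage) / average


def is_extremely_variable(usage: Sequence[float],
                          threshold: float = 3.0) -> bool:
    """True when the sustained peak is at least `threshold` times the average."""
    return peak_to_average(usage) >= threshold
```

A flat usage profile gives a ratio close to 1, which is the situation where owning the hardware tends to be hard to beat on cost.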
When can you expect the hardware cost to decrease?
Generally speaking, a company that meets any of the following criteria might gain substantial savings by moving to the cloud:
- The company is managing, for every operated location, a small pool of hardware resources.
- Time to market or response times are critical for acquiring new markets or gaining new revenues.
- The company is starting a new business or targeting a new market where it cannot consolidate on its existing hardware resources and knowledge.
Now that we have a good overview of the considerations related to staffing and hardware costs, we will move to the last cost type: licensing.
Licence costs related to cloud migrations are probably the easiest to assess. Some of the licensed products you might be using will be deprecated after the cloud migration, others will remain part of your IT landscape and new ones will be used. Therefore, we can group licence cost evolution into the following groups:
- Products that you will stop using. Their deprecation will generate cost savings.
- Products that you will replace with other products. You need to estimate the cost difference between the two.
- New products that you will start using. Their usage will generate new costs.
- Products that you will keep using. Most of the costs in this category will remain stable. Some cloud providers who also provide licensed on-premise products encourage their existing customers to move to the cloud by providing savings on their cloud costs. For example:
  - Oracle allows you to gain some “universal credits” by re-using your existing on-premise licences.
  - Azure allows you to re-use some of your on-premise licences.
In most cases, we recommend not betting on licence savings following a cloud migration. Licence savings can happen by moving from one product to another (as we detailed in this article) but are not inherent to a cloud migration.
Wrapping up: you need a business plan to assess the financial outcome of a cloud migration
There is no escaping from building a proper business plan to assess the financial outcome of a cloud migration. However, from our point of view, there are 2 types of cloud migrations:
- A cloud migration that aims to generate new revenues by leveraging the cloud as a differentiator to improve time to market, service quality and ultimately customer loyalty. This type has a high chance of success.
- A cloud migration that aims to optimise the Total Cost of Ownership of an existing product. In this case, a more careful approach would be to start optimising on-premise.
In this article, we tried to debunk some of the misconceptions behind most of the painful migration failures. At Canonical, we have the expertise and the products that can help you plan, execute and optimise your cloud migration:
- We can help you secure your assets. Our Ubuntu Pro offering provides you with 10 years of maintenance and security coverage for more than 23,000 packages. It helps you harden your operating system and helps you acquire certifications for various compliance regimes, such as FedRAMP and HIPAA.
- We can manage some of your critical applications at a predictable and transparent cost.
- Our Juju platform helps you operate your workloads the same way on the major public cloud providers and in your private cloud. Therefore, it allows you to lower the risk of any cloud migration as you can move workloads back and forth without any change in your model of operations.
- We can help you build your own private cloud through a wide variety of products:
  - MAAS for setting up and operating bare-metal servers.
  - LXD for managing containers and virtual machines.
  - OpenStack and Canonical Kubernetes for modernising your data centre operations through a public-cloud-like paradigm.
  - Canonical Ceph for centralising and automating your storage management.
Ubuntu offers all the training, software infrastructure, tools, services and support you need for your public and private clouds.