A look forward to storage in 2022

Philip Williams

on 22 December 2021

This article was last updated 2 years ago.

It’s that time of year when we start to look ahead and think about the ongoing trends in our various industries. One thing is certain in the storage industry: capacity demand remains high, and the industry continues to see exponential growth.

Growth, growth, growth

More and more data is being created every day. It truly is non-stop. In 2021 alone, it was predicted that enterprise storage vendors would ship almost 150 exabytes of capacity, and this number is expected to increase again in 2022!

We now see 20TB hard drives on the market to help with these needs, but we have to remain vigilant when building storage clusters, as the access speed of these drives has barely changed over the last few years. In failure scenarios, where we have to recreate replicas or erasure-coded shards of data, recovery can take many hours with drives of such high capacity.
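To put a rough number on that recovery time, here is a minimal sketch that divides drive capacity by a sustained write rate. The 200 MB/s figure is an assumption chosen for illustration, not a measured value, and real recovery is usually slower because the cluster is also serving client traffic.

```python
# Rough lower bound on the time to rewrite one full drive's worth of data.
# The throughput figure used below is an illustrative assumption, not a measurement.

def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream `capacity_tb` (decimal TB) at `throughput_mb_s` (MB/s)."""
    capacity_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB in decimal units
    return capacity_mb / throughput_mb_s / 3600


if __name__ == "__main__":
    assumed_throughput_mb_s = 200.0  # assumed sustained rate for a spinning drive
    for size_tb in (4, 10, 20):
        hours = rebuild_hours(size_tb, assumed_throughput_mb_s)
        print(f"{size_tb:>2} TB drive: {hours:.1f} hours")
```

Under that assumption, a full 20TB drive takes roughly 28 hours to rewrite, compared with under 6 hours for a 4TB drive.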

So the rule of thumb remains the same: a larger number of smaller drives leads to a more predictable system for any given amount of capacity. Of course, you do have to remain pragmatic and balance capacity needs against the cost of increasing the number of spindles.
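The trade-off can be sketched with simple arithmetic: for a fixed raw capacity, smaller drives mean more spindles, and each failure touches a smaller share of the cluster. The 1,000 TB target and the drive sizes below are hypothetical values picked for illustration.

```python
import math


def drive_layout(raw_capacity_tb: float, drive_tb: float) -> tuple[int, float]:
    """Return (drive count, share of raw capacity on a single drive)."""
    drives = math.ceil(raw_capacity_tb / drive_tb)
    return drives, 1 / drives


if __name__ == "__main__":
    raw_capacity_tb = 1000  # hypothetical cluster size
    for drive_tb in (20, 10, 4):
        drives, share = drive_layout(raw_capacity_tb, drive_tb)
        print(f"{drive_tb:>2} TB drives: {drives:>3} spindles, "
              f"{share:.1%} of raw capacity affected per failure")
```

More spindles also means more drives to buy, power, and house, which is the cost side of the balance described above.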

Flash, denser, and faster

Over the last few years, we have seen huge leaps forward in capacity-oriented flash. Intel recently launched a 30TB QLC 3D NAND drive, surpassing even the largest traditional spinning drives. Whilst we wouldn’t suggest using these for very write-heavy workloads, there is definitely a place for them in storage systems to increase throughput above traditional spindle-based configurations. There are power usage benefits too, which become more and more important as clusters scale, and even at the Edge, where power budgets might be quite limited!

Computational storage

An interesting and novel area in hard drive technology is the concept of computational storage, that is, adding more intelligence to the hard drives and SSDs that we use in servers and storage clusters.

We have seen work in this area before, but the use case was almost too narrow. Seagate created a hard drive called Kinetic, which exposed a key/value object storage interface over Ethernet, rather than the usual block interfaces of SAS or SATA. This was interesting for those of us building larger-scale object stores. It meant that, with each hard drive added to a cluster, an additional amount of compute resource was added too, leading to a highly scalable sea-of-compute-and-storage. Furthermore, it reduced the failure domain significantly, to a single disk rather than a whole server containing multiple disks. However, this concept didn’t really gain much traction, as it required significant changes to the software used to build storage clusters. In the case of Ceph, there simply weren’t enough resources on each drive to run an entire OSD.
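The difference between the two kinds of interface can be illustrated with a toy, in-memory model. The class and method names here are hypothetical and are not Seagate's actual Kinetic API; the sketch only contrasts addressing fixed-size sectors by number with addressing variable-size values by key.

```python
class BlockDevice:
    """Toy block interface: fixed-size sectors addressed by number."""

    def __init__(self, sectors: int, sector_size: int = 512) -> None:
        self.sectors = sectors
        self.sector_size = sector_size
        self._data = bytearray(sectors * sector_size)

    def _offset(self, lba: int) -> int:
        if not 0 <= lba < self.sectors:
            raise IndexError("sector number out of range")
        return lba * self.sector_size

    def write_sector(self, lba: int, payload: bytes) -> None:
        if len(payload) != self.sector_size:
            raise ValueError("payload must be exactly one sector")
        start = self._offset(lba)
        self._data[start:start + self.sector_size] = payload

    def read_sector(self, lba: int) -> bytes:
        start = self._offset(lba)
        return bytes(self._data[start:start + self.sector_size])


class KeyValueDrive:
    """Toy key/value interface: variable-size values addressed by key."""

    def __init__(self) -> None:
        self._objects: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._objects[key] = value

    def get(self, key: bytes) -> bytes:
        return self._objects[key]

    def delete(self, key: bytes) -> None:
        del self._objects[key]


if __name__ == "__main__":
    disk = BlockDevice(sectors=8)
    disk.write_sector(2, b"a" * 512)   # caller must track which sectors hold what
    drive = KeyValueDrive()
    drive.put(b"photo-001", b"x" * 1500)  # the drive tracks placement itself
    print(len(disk.read_sector(2)), len(drive.get(b"photo-001")))
```

With the block model, the software above the drive has to map objects onto sectors; with the key/value model that mapping moves onto the drive, which is why existing storage software needed significant changes to use it.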

Fast forward to 2021, and we see some smaller companies starting to offer products that keep the typical SAS and SATA interfaces but also provide on-drive features such as compression for capacity efficiency, or encryption, without requiring any host processing power or changes to the software running on the server.
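As a rough illustration of what on-drive compression could mean for capacity, the sketch below measures a compression ratio on the host with Python's `zlib` and scales a raw capacity by it. Host-side `zlib` is only a stand-in for whatever the drive does internally, and the sample data and the 8 TB figure are assumptions; real ratios depend heavily on the data.

```python
import zlib


def compression_ratio(sample: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size for `sample`."""
    if not sample:
        raise ValueError("sample must not be empty")
    return len(sample) / len(zlib.compress(sample))


def effective_capacity_tb(raw_tb: float, ratio: float) -> float:
    """Logical capacity a drive could hold if all data compressed at `ratio`."""
    return raw_tb * ratio


if __name__ == "__main__":
    # Hypothetical sample: repetitive log-like text compresses far better than typical data.
    sample = b"2021-12-22 INFO request served in 12ms\n" * 10_000
    ratio = compression_ratio(sample)
    logical_tb = effective_capacity_tb(8.0, ratio)
    print(f"ratio {ratio:.1f}x -> {logical_tb:.1f} TB logical on 8 TB raw")
```

Already-compressed or encrypted data will show a ratio close to 1, so the benefit should be estimated from a representative sample of the actual workload.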

This is a lot like what we have already seen in the Ethernet space, where certain tasks are offloaded to SmartNICs. With some computationally aware storage devices, it is already possible to access the compute resources on these drives and use them for pre-processing datasets. In a storage system with thousands of drives, this adds up to a huge amount of additional computing power at your disposal.

Data repatriation – post pandemic splurge

Over the last two years, we have all seen huge changes in the way that we work. To support that, many companies have turned to public clouds to help them scale their operations immediately and maintain business as usual. Cost optimisation has largely been a secondary consideration.

However, as companies have settled into these new ways of operating, we now see a renewed focus on cost optimisation and efficiency. Storage remains the least cloud-friendly piece of infrastructure, as usage is typically static or expanding, and doesn’t have the peaks and troughs that compute might.

More and more companies are waking up to the costs of storing data in the cloud, and are considering near-cloud solutions, where they operate their own hardware in co-location facilities adjacent to major cloud provider facilities and link the two together with private interconnects. Not only does this reduce costs immediately, it also means that there are no penalties when migrating to other cloud providers in the future!
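The shape of that cost argument can be sketched as follows. Every price and volume below is a made-up placeholder rather than any provider's actual rate; the point is only that storage is billed every month on the full data set, and that moving the data out incurs a one-off egress charge proportional to its size.

```python
def cloud_monthly_cost(stored_tb: float, egress_tb: float,
                       storage_price_per_tb: float,
                       egress_price_per_tb: float) -> float:
    """Monthly bill: capacity charge on everything stored plus egress charge."""
    return stored_tb * storage_price_per_tb + egress_tb * egress_price_per_tb


def migration_egress_cost(stored_tb: float, egress_price_per_tb: float) -> float:
    """One-off charge to read the whole data set out when leaving a provider."""
    return stored_tb * egress_price_per_tb


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    stored_tb = 500.0
    monthly_egress_tb = 50.0
    storage_price_per_tb = 20.0   # hypothetical price per TB per month
    egress_price_per_tb = 80.0    # hypothetical price per TB transferred out

    monthly = cloud_monthly_cost(stored_tb, monthly_egress_tb,
                                 storage_price_per_tb, egress_price_per_tb)
    exit_fee = migration_egress_cost(stored_tb, egress_price_per_tb)
    print(f"monthly bill: {monthly:,.0f}  one-off migration egress: {exit_fee:,.0f}")
```

Substituting real quotes for the placeholders, alongside the hardware and co-location costs of a near-cloud deployment, gives a like-for-like comparison.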

Wrap up

We wish you all Happy Holidays and a wonderful New Year!

Open source storage solutions such as Ceph can readily help solve the growth and scaling challenges seen across the industry. Learn more about deploying Ceph from our recent webinar here.

What is Ceph?

Ceph is a software-defined storage (SDS) solution designed to address the object, block, and file storage needs of both small and large data centres.

It’s an optimised and easy-to-integrate solution for companies adopting open source as the new norm for high-growth block storage, object stores and data lakes.

Learn more about Ceph ›

How to optimise your cloud storage costs

Cloud storage is amazing: it’s on demand and ready to go in a couple of clicks. But is it the most cost-effective approach for large, predictable data sets?

In our whitepaper, learn how to understand the true costs of storing data in a public cloud, and how open source Ceph can provide a cost-effective alternative!

Access the whitepaper ›

Interested in running Ubuntu in your organisation? Talk to us today

A guide to software-defined storage for enterprises

In our whitepaper, explore how Ceph can replace proprietary storage systems in the enterprise.

Access the whitepaper ›

Performant, reliable and cost-effective cloud scaling with Ceph

Canonical Ceph simplifies the entire management lifecycle of deployment, configuration, and operation of a Ceph cluster, no matter its size or complexity. Install, monitor, and scale cloud storage with extensive interoperability.

Find out how Ceph scales clouds so cost-effectively ›
