Privacy-enhancing technologies and confidential computing are two of my favorite topics to talk about! So much so that I am writing this blog post on a sunny Saturday afternoon. But wait, what’s that I hear you murmuring? “What is confidential computing? And how does it affect me?” Those are two very good questions.
Before we get into the details, let’s imagine you are the chief information security officer of Palabs, a leading genomics company which specialises in sequencing the DNA of curious citizens who are willing to spit into small containers and ship them across oceans for analysis. In exchange, your company provides them with a data-driven and science-backed report of probabilities detailing where their ancestors might have come from (sorry grandma, you’re not from Italy! You might wanna go easy on all that pasta now).
Palabs takes pride in making us see how connected we all are, data-science style, one spit at a time! As the CISO, however, you’ve been losing sleep over the integrity and confidentiality of your customers’ code and data for all Palabs workloads that are deployed on the public cloud!
You understand how sensitive customers’ data is. It is personally identifiable, and it would be devastating if it ended up in the wrong hands. It’s not just the privacy of your individual customers that is at risk, but also that of their kids, parents, grandparents, siblings, first cousins, uncles…I could go on but you get the point. Any breach would compromise the brand your company has invested years in building. Not to mention your healthy stock price, which has been growing steadily over the past couple of years.
In order to mitigate these risks, you have decided to follow the journey of your public cloud workloads’ data, and secure it at all stages:
- In-transit: for sending data to the public cloud over insecure networks, you only use secure protocols such as TLS.
- At-rest: to secure the data while it sits idle in the public cloud’s storage, you always encrypt it with a key that is generated and managed by your company, and that is further protected by the cloud’s hardware security modules.
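To make these two controls concrete, here is a minimal Python sketch of what they might look like on the client side (hypothetical, not Palabs’ actual pipeline). It builds a TLS client context for the in-transit leg, and encrypts a record with a company-held key before upload for the at-rest leg. The one-time-pad XOR stands in for a real AEAD cipher such as AES-GCM purely to keep the example dependency-free; production code would use a vetted cryptography library and a proper key management service.

```python
import secrets
import ssl

# In transit: a TLS client context that enforces certificate validation
# and a modern protocol floor before any data leaves our network.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# At rest: encrypt before upload, with a key we generate and hold.
# A one-time pad (XOR) stands in for AES-GCM here only to keep the
# sketch stdlib-only; it is NOT what you would ship.
def encrypt(key: bytes, plaintext: bytes) -> bytes:
    assert len(key) == len(plaintext)
    return bytes(k ^ p for k, p in zip(key, plaintext))

record = b"sample-dna-read:ACGTACGT"          # hypothetical record
key = secrets.token_bytes(len(record))        # stays in our KMS, never uploaded
blob = encrypt(key, record)                   # only this ciphertext hits cloud storage
assert encrypt(key, blob) == record           # XOR is its own inverse
```

The key point the sketch makes is architectural: the cloud only ever stores `blob`, and it can only ever be decrypted with a key the company controls.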
In-transit security? Check. At-rest security? Check. So far so good. However, when it’s time to compute over the data, aka sequence the DNA, your public cloud provider needs to first decrypt it and then move it in cleartext from the server’s secondary storage into its system memory (RAM). Computing over the data is unavoidable. After all, this is why your company is using the cloud: to take advantage of its elasticity and the great computational resources that DNA sequencing requires.
Alas, once in system memory, your code and data can be compromised by vulnerable or malicious system-level software (OS, hypervisor, BIOS), or even by a malicious cloud operator with administrator or physical access to your vendor’s platforms.
But why does the security of user-level applications depend on the security of the underlying system software? The reason is the hierarchical architecture of commodity devices: privileged system software gets unrestricted access to all the resources of unprivileged user-level applications, because it controls their execution, their memory, and their access to the underlying hardware. Indeed, it’s a feature, not a bug!
This very lack of security guarantees over the integrity and confidentiality of your code and data at run time is probably keeping you awake at night as a CISO. So what to do now? Other than meditation and melatonin.
Enter confidential computing
Well, let’s first acknowledge that run-time security is a tough problem! In the case of Palabs, you want the cloud to analyse your customers’ DNA without learning anything about the content of that very particular DNA. And you want the cloud’s privileged system software to manage the lifecycle of your workload, but have no impact on its security guarantees. How can you compute over data without actually looking at that data? And how can you expect a vulnerable hypervisor not to threaten the security of the user-level applications it runs? It’s a fascinating riddle, isn’t it? Yes indeed! So much so that it has its own name: privacy-enhancing technologies (PETs).
PETs can be defined as the range of technologies that help us resolve the tension between data privacy and utility. They achieve this by allowing us to compute on data and derive value from it, while also preserving its privacy. This is unlike traditional cryptographic primitives, such as AES (the Advanced Encryption Standard), which preserve data confidentiality but make it impossible to perform any meaningful operation on the encrypted ciphertext. PETs can be realised through cryptographic and statistical approaches such as differential privacy, homomorphic encryption, secure multiparty computation and zero-knowledge proofs, as well as systems approaches like trusted execution environments, otherwise referred to as confidential computing (CC).
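To make one of these cryptographic approaches concrete, here is a minimal sketch of additive secret sharing, a building block behind many secure multiparty computation protocols (the values and three-party setup are illustrative, not from the source). Each party holds a random-looking share of a private input, can add shares locally without ever seeing the inputs, and the reconstructed result is the true sum.

```python
import secrets

P = 2**61 - 1  # a large prime modulus for the arithmetic shares

def share(x: int, n: int = 3) -> list[int]:
    """Split x into n random shares that sum to x modulo P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

def reconstruct(parts: list[int]) -> int:
    return sum(parts) % P

a, b = 46, 54  # two private inputs, e.g. counts held by two labs
shares_a, shares_b = share(a), share(b)

# Each party adds its own pair of shares locally, never seeing a or b...
sum_shares = [(sa + sb) % P for sa, sb in zip(shares_a, shares_b)]

# ...yet the reconstructed result is the true sum.
assert reconstruct(sum_shares) == a + b  # 100
```

Any single share (or any subset smaller than all of them) reveals nothing about the input, which is exactly the privacy-versus-utility trade-off PETs are designed to resolve.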
In this context, the Confidential Computing Consortium defines CC as the set of technologies that allow us to “protect data in use by performing computation in a hardware-based Trusted Execution Environment. These secure and isolated environments prevent unauthorised access or modification of applications and data while in use, thereby increasing the security assurances for organizations that manage sensitive and regulated data”.
And this is why confidential computing is so exciting! It is here to address this very challenge of run-time insecurity. Instead of trying to make all system software secure, confidential computing takes a simple and pragmatic approach to PETs, which just works today.
It acknowledges that system software is either already malicious today or has the potential to become malicious at some point in the future. Therefore, it treats the execution environment that system software bootstraps as untrustworthy, and instead proposes running your security-sensitive workloads in an isolated trusted execution environment (TEE) whose security guarantees can be remotely verified.
In order to be able to reason about TEEs and confidential computing, there are two main primitives that we need to understand: 1) isolation and 2) remote attestation. We will explore how they are designed and implemented in Part II of this mini blog series. Stay tuned.