As we constantly evaluate and refine our methodology for cloud penetration testing in a rapidly changing landscape, spanning both cloud-native and hybrid cloud environments, we sometimes have a hard time placing our approach within the established contexts of an on-prem world.
Some of the things that we now have to think about as we evaluate our potential attack surface:
- Are you cloud native or hybrid cloud?
- Are you migrating existing on-prem applications and services to the cloud?
- Are you using Docker containers and orchestration tools like Kubernetes, which are relatively new attack surfaces?
- To what extent is your infrastructure a mesh of services and configurations?
When organizations still struggle with physical asset management, can we expect that managing virtualized assets – beyond simple virtual machines – and networks at scale will be any easier?
Even determining a starting point and defining what falls within the scope of an engagement can sometimes challenge the status quo.
We’re no longer just talking about pre-defined, publicly accessible endpoints into a closed network and lateral movement within physical infrastructure that is well known to its owners. We’re talking about service mesh infrastructure connecting networks all over the world, with multiple modes of authentication, and abstracting away traditional physical IT borders.
When the lines are increasingly blurred and service configuration becomes the norm over managed infrastructure, where do we draw the line between testing methodologies for an API and for a network? What's external vs. internal? Where does scope start and end? Or do we now need new methodologies that define new parameters for engagement?
Adjusting Our Threat Models and Operations
Just as we’ve come to assume that every network is a hostile network, as it relates to the cloud, we should also start to assume that every account is a compromised account.
If an account is exploited, it's fairly easy for an attacker to maintain persistence for long periods of time without detection. System admins no longer have the ability to simply monitor processes and network connections for signs of compromise. They're having to adapt to completely new environments, and it may take some time for their detection capabilities to become as robust as they were on mature, well-understood systems.
Hopefully you have global CloudTrail event logging in place, but how many CloudWatch Event Rules are you configuring to detect anomalous behavior? Do you even know what to look for when monitoring now essentially means parsing endless, custom, largely undocumented JSON payloads coming through Log Streams?
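As a concrete example of the kind of rule worth having, here is a minimal sketch (assuming boto3 credentials, a default region, and a hypothetical `security-alerts` SNS topic) that wires up a CloudWatch Event Rule to flag a handful of sensitive IAM calls recorded by CloudTrail:

```python
import json
import boto3

events = boto3.client("events")

# Match a few sensitive IAM calls recorded by CloudTrail.
# The eventName list is illustrative, not exhaustive.
pattern = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventName": ["CreateUser", "CreateAccessKey", "AttachUserPolicy"]
    },
}

events.put_rule(
    Name="iam-sensitive-calls",
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)

# Route matches to a (hypothetical) SNS topic your on-call team already watches.
events.put_targets(
    Rule="iam-sensitive-calls",
    Targets=[{"Id": "alert", "Arn": "arn:aws:sns:us-east-1:123456789012:security-alerts"}],
)
```

The event names here are just a starting point; without rules like this, those JSON payloads simply pile up in a Log Stream nobody reads.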
As native Infrastructure as Code tools like CloudFormation and Azure Resource Manager are given more authority and autonomy to manage this complexity, and deployment pipelines built with Jenkins, the AWS Code services, or Azure Pipelines are simply trusted, it's likely that a compromise of any portion of that pipeline, running with higher-level privileges and access, would go undetected.
What if your deployment template was tampered with to use a slightly modified AMI that opens a backdoor when launched into an Auto Scaling Group? Or, instead, an attacker creating and exfiltrating an Azure RunAs certificate: how many would have monitoring in place to detect that new AD Service Principal?
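For the AMI scenario, one low-effort control is to periodically diff the images referenced by your Launch Configurations against an allow-list of images your build process actually produced. A sketch, assuming boto3 credentials and a hypothetical in-house allow-list:

```python
import boto3

# Hypothetical allow-list of AMIs your build pipeline actually produced.
APPROVED_AMIS = {"ami-0123456789abcdef0"}

autoscaling = boto3.client("autoscaling")

paginator = autoscaling.get_paginator("describe_launch_configurations")
for page in paginator.paginate():
    for lc in page["LaunchConfigurations"]:
        if lc["ImageId"] not in APPROVED_AMIS:
            # A template tampered with to reference a backdoored AMI shows up here.
            print(f"Unapproved AMI {lc['ImageId']} in {lc['LaunchConfigurationName']}")
```

The same idea extends to Launch Templates, and to scanning templates before they ever reach a pipeline.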
When developers prototyping an application requested a DNS CNAME record, then took the application offline and released the predictable AWS or Azure subdomain it pointed to, was there a process for detecting that event and the subdomain takeover it invites?
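Detecting that kind of dangling record can start with something as simple as walking your zones and resolving each CNAME target. A minimal sketch, assuming boto3 credentials and a placeholder hosted zone ID:

```python
import socket
import boto3

route53 = boto3.client("route53")
HOSTED_ZONE_ID = "Z0000000000000000000"  # placeholder zone ID

paginator = route53.get_paginator("list_resource_record_sets")
for page in paginator.paginate(HostedZoneId=HOSTED_ZONE_ID):
    for record in page["ResourceRecordSets"]:
        if record["Type"] != "CNAME":
            continue
        for rr in record.get("ResourceRecords", []):
            target = rr["Value"].rstrip(".")
            try:
                socket.gethostbyname(target)
            except socket.gaierror:
                # CNAME points at something that no longer resolves:
                # a candidate for subdomain takeover.
                print(f"Dangling CNAME: {record['Name']} -> {target}")
```

NXDOMAIN only catches the simplest case; endpoints like S3 still resolve after a bucket is released, so provider-specific checks are worth layering on top.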
If an attacker were able to exploit the delivery pipeline or in-place code of an AWS Lambda function with access to a DynamoDB customer database, publish their exploit as a numbered version, and re-publish the original code as $LATEST, who would notice infrequent invocations of a prior version?
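One way to surface that kind of tampering (a sketch, assuming boto3 credentials and a hypothetical function name) is to compare the `CodeSha256` of `$LATEST` against every published version and flag the ones that differ:

```python
import boto3

lambda_client = boto3.client("lambda")
FUNCTION_NAME = "customer-api"  # hypothetical function name

versions = []
paginator = lambda_client.get_paginator("list_versions_by_function")
for page in paginator.paginate(FunctionName=FUNCTION_NAME):
    versions.extend(page["Versions"])

latest = next(v for v in versions if v["Version"] == "$LATEST")
for v in versions:
    if v["Version"] != "$LATEST" and v["CodeSha256"] != latest["CodeSha256"]:
        # A published version whose code diverges from $LATEST deserves a look,
        # especially if it is still receiving invocations.
        print(f"Version {v['Version']} differs from $LATEST ({v['CodeSha256']})")
```

Older versions will legitimately differ after routine deployments, so treat this as a triage signal to pair with per-version invocation metrics rather than a verdict.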
Say you're migrating existing legacy applications to the cloud as Docker containers, using Elastic Container Service and Auto Scaling Groups to achieve High Availability, and you set up role-based access so credentials aren't hardcoded, as you should. One problem: the application still has an RCE bug, and the assigned role has `ecr:*` permissions. An attacker now has the ability to corrupt your internal image repository.
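A least-privilege task role makes that pivot far less interesting. A sketch of a pull-only ECR policy created with boto3 (the policy name is a placeholder, and your workload may need even less):

```python
import json
import boto3

iam = boto3.client("iam")

# Pull-only ECR access: enough to run containers, not enough to poison the registry.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
            ],
            "Resource": "*",
        }
    ],
}

iam.create_policy(
    PolicyName="ecs-task-ecr-pull-only",  # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
```

Attach something like that to the task role instead of `ecr:*`, and the same RCE no longer hands out push access to the registry.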
Not even sure how you would mitigate some of these scenarios as you feel the pressure to “race to the cloud”? Exactly. The landscape is changing in every way. It's forcing both red and blue teams to adapt…quickly.
We now have possible attack vectors originating from the cloud into on-prem infrastructure, or from on-prem into the cloud, and soon enough we'll see attackers moving laterally between clouds as the major players race toward cloud interoperability in an effort to capture more market share from one another.
The Road Ahead
During the Google Cloud Next '19 keynote, Google reported that roughly 80% of enterprise workloads have yet to be migrated to the cloud…but they will be. AWS launched 28 new services at this past year's re:Invent alone.
Goldman Sachs' latest report states that as of December 2019, an estimated 23% of enterprise workloads were in the cloud, up from 19% in June of the same year. What's more, they expect this figure to reach roughly 43% within the next three years.
To summarize: in the next three years, enterprise customers are projected to move as much to the cloud as they have since its inception. What could possibly go wrong?
“This is fine” by KC Green
Services are being “connected” more and more, and while responsibility for security is being offloaded and we enter the era of “shared responsibility” with cloud providers, do the operators really know where their responsibility lies? How do you properly train for and assess the complexity of modern virtual infrastructure?
So while the cloud providers have built an impenetrable door that's simple to install, they've also given people the ability to open 900 windows without telling them how a window really works.
When building, monitoring, and patching comparatively slow-moving operating systems that change every few years has been an ongoing challenge for the industry, what awaits companies trying to keep staff up to date with services that seemingly appear, disappear, or morph overnight?
Businesses may no longer have to worry about securing the physical data center and the machines connected within it, but we still face the very real threat of human error from teams just trying to keep up, much less demonstrate mastery over the infrastructure they deploy and manage.
While the cloud and modern application architectures are forcing offensive playbooks to adapt [limiting our reliance on the tools we've long depended on], as well as building a stronger front door, what happens now **when** someone finds a way in?
When the business world has collectively struggled to move at the pace of networking and operating system technology over the years, how will it keep up with the pace of the cloud?
What confidence can you have that an Ops team has a grasp of the over 6,000 individual IAM permissions currently available in AWS, or has visibility into the effective permissions on an S3 bucket when overlapping IAM policies exist?
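You don't have to reason about overlapping policies by hand; the IAM policy simulator can answer the question action by action. A minimal sketch, assuming boto3 credentials and placeholder role and bucket ARNs:

```python
import boto3

iam = boto3.client("iam")

# Both ARNs are placeholders for your own role and bucket.
response = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::123456789012:role/app-role",
    ActionNames=["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
    ResourceArns=["arn:aws:s3:::customer-data/*"],
)

for result in response["EvaluationResults"]:
    # EvalDecision is "allowed", "explicitDeny", or "implicitDeny".
    print(result["EvalActionName"], result["EvalDecision"])
```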
How many to this day actually understand how to properly secure an Amazon S3 bucket and the difference between Bucket Policies and ACLs?
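At a minimum, the two mechanisms can be checked side by side. A sketch, assuming boto3 credentials, that flags buckets whose policy or ACL grants public access:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

PUBLIC_GRANTEES = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]

    # Bucket policy: does AWS consider it public?
    try:
        if s3.get_bucket_policy_status(Bucket=name)["PolicyStatus"]["IsPublic"]:
            print(f"{name}: bucket policy grants public access")
    except ClientError:
        pass  # no bucket policy attached

    # ACL: any grants to the AllUsers / AuthenticatedUsers groups?
    acl = s3.get_bucket_acl(Bucket=name)
    for grant in acl["Grants"]:
        if grant["Grantee"].get("URI") in PUBLIC_GRANTEES:
            print(f"{name}: ACL grants {grant['Permission']} to {grant['Grantee']['URI']}")
```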
Have the developers inadvertently created an Amazon RDS MySQL instance with a publicly accessible endpoint?
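That question, at least, is cheap to answer. A sketch, assuming boto3 credentials, that lists instances flagged as publicly accessible:

```python
import boto3

rds = boto3.client("rds")

paginator = rds.get_paginator("describe_db_instances")
for page in paginator.paginate():
    for db in page["DBInstances"]:
        if db.get("PubliclyAccessible"):
            # An internet-resolvable endpoint on a customer database is worth a conversation.
            print(f"{db['DBInstanceIdentifier']} ({db['Engine']}) "
                  f"is publicly accessible at {db['Endpoint']['Address']}")
```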
What happens when the Kinesis Stream feeding your machine learning service isn't set up to detect anomalies in the data, or to validate event sources from an IoT monitoring fleet that triggers control mechanisms without user intervention?
The cloud is changing the game of how we think about, build, deploy, monitor and manage systems, users and data, and it will change the game of how we define our methodology, tooling and tactics for offensive security.
That time is now. Are you ready?