In the last 10 years, I’ve had the privilege of working on cloud security at Microsoft, Google, and Pivotal. Most recently at Pivotal, I’ve worked closely with some of the most forward-looking enterprises in the world. All of them want to deliver applications at a faster pace, and they are willing to try new tools, techniques, and processes to get there. I’ve observed another common trait — security is a top concern, both with their existing infrastructure and their next generation cloud infrastructure.
Behaviorally, there is an instinctual reach for previously defined tools and methodologies to help ensure the appropriate level of security. Often they are calcified within the organization. Some are helpful, some are not. In this post, I’ll describe what I believe to be the single most important concept for an enterprise security organization to grasp when evaluating cloud infrastructure. It’s a radical change from the status quo, but I believe it will dramatically and immediately improve the security posture of any IT organization.
The idea is quite simple. Rotate datacenter credentials every few minutes or hours. Repave every server and application in the datacenter every few hours from a known good state. Repair vulnerable operating systems and application stacks consistently within hours of patch availability. Faster is safer. It’s not a fantasy: the tools exist to make most of this a reality today. Do it, and you’ll see a dramatic improvement in enterprise security posture.
Before I describe why and how this works, let’s first take a step back and look at today’s enterprise security culture.
The Trap of Resisting Change to Mitigate Risk
The sad truth is that the foundation of traditional enterprise infrastructure centers on resisting change. Firewall rules, long-lived TLS credentials, and hard-to-update databases support that hypothesis. It’s natural to expect that an enterprise security team, after decades of consuming infrastructure that resists change, would have a culture that also resists change. It’s a seller/buyer socio-technical system. Traditional approaches force enterprises to choose between moving fast and accepting unbounded risk, or slowing down and trying to mitigate risk. Everybody chooses slowing down. It’s an easy decision when it’s your job to protect an organization.
The Dreaded Mega-Breach
At or near the top of security concerns in the datacenter is something called an Advanced Persistent Threat (APT). An APT gains unauthorized access to a network and can stay hidden for a long period of time. Its goal is usually to steal, corrupt, or ransom data.
It’s the dreaded, front-page newsworthy mega-breach.
A lot has been written about the anatomy of an APT. Unfortunately, APT has become an umbrella buzzword, so it will likely mean different things to different people. In this post, I’m focusing on an attack that worms its way into the datacenter, sits in the network, observes, and then does something malicious. To avoid the hype, I’ll simply call it an attack and leave the labeling to the reader.
I believe these types of attacks need at least three resources in order to blossom — 1) time, 2) leaked or misused credentials, and 3) misconfigured and/or unpatched software. Time gives the malware more opportunity to observe, learn, and store. Credentials provide access to other systems and data, possibly even an ingress point. Vulnerable software provides room to penetrate, move around, hide, and gather more data. These are like sunlight, water, and soil to a plant. Remove one or more and it’s not likely to mature.
Now, consider the relationship between the calcified socio-technical system and attacks. First and foremost, there’s lots and lots of time. For example, credentials seldom rotate, so if an attacker can find some, they are likely to remain valid and useful for a long time. In addition, it often takes months to deploy patches to operating systems and application stacks, even in a virtualized world. It’s not uncommon for an enterprise to leave a server vulnerable for 6 months or more. Almost no one regularly repaves their servers or applications from a known good state. Instead we often apply incremental changes, so the slate almost never gets wiped clean. Traditional enterprise software vendors and the trap of the rigid enterprise create rich, fertile, undisturbed pastures for attacks to flourish.
The Acme Pattern
To get a clearer picture of what this means in real terms, let us look at the accreditation process at Acme corporation as an example. The process is there for a good reason, but it also has a nasty side effect.
Let’s say Acme has an enterprise accreditation process that takes two months, and it’s required on every major software release. It’s there to ensure baseline security standards and to ensure the new version doesn’t break existing systems. If a software vendor releases version 2.0 in January, Acme starts the accreditation process in early February. There’s a minor hiccup, so the process doesn’t complete until mid-April. Installation is planned for June, and it takes one month to complete. The total delay is at least six months — plenty of time to give an attack all the resources it needs to transition from a seedling into a monster.
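To make the arithmetic concrete, here is a minimal sketch that adds up the Acme timeline. The story only gives months, so the year and the exact days of the month below are my assumptions, chosen to match "early February", "mid-April", and so on.

```python
from datetime import date

# Assumed dates: the story gives months only, so the year and exact days are illustrative.
milestones = [
    ("vendor releases 2.0", date(2016, 1, 1)),
    ("accreditation starts", date(2016, 2, 1)),
    ("accreditation completes", date(2016, 4, 15)),
    ("installation starts", date(2016, 6, 1)),
    ("installation completes", date(2016, 7, 1)),
]


def exposure_days(events):
    """Days between the first milestone (release) and the last (fully installed)."""
    return (events[-1][1] - events[0][1]).days


def phase_lengths(events):
    """Length in days of each phase between consecutive milestones."""
    return [
        (f"{a_name} -> {b_name}", (b_day - a_day).days)
        for (a_name, a_day), (b_name, b_day) in zip(events, events[1:])
    ]


for phase, days in phase_lengths(milestones):
    print(f"{phase}: {days} days")
print(f"total exposure: {exposure_days(milestones)} days")
```

With these assumed dates the window comes out at roughly six months, and a large share of it is waiting between steps rather than accreditation or installation work.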
There’s more. Acme will often push back on the software vendor to keep prior versions of their software patched for a long period of time. This further complicates improvements because the software vendor must devote non-trivial resources to this effort. Those resources can’t be used to improve the product, so releasing new versions of software can take even longer.
The cycle perpetuates and grows, and inadvertently feeds the attacker. The cycle needs to be broken.
Faster is Better
What’s the industry’s response to this phenomenon? I’ll give you a clue — it isn’t working.
Enter a parade of security software vendors. With the confluence of the slowly changing socio-technical culture and attacks, gas is poured on the demand for security monitoring and detection tools. I guess the reasoning is something like, “It can’t change quickly, so change is a sign of a malicious actor.” We’ve resigned ourselves to live with slowly changing infrastructure, so we spend lots of money monitoring it for change and hoping for the best.
This is a decision to treat the symptoms rather than cure the disease. In fact, many security monitoring solutions embody a self-fulfilling prophecy — since updating a system often looks like an attack, you either stop paying attention to alerts or you resist updates. Both help attackers. I’m not saying all monitoring and detection is unnecessary — I’m asserting that it’s palliative treatment at best. Most monitoring solutions help enterprises deal with attacks in about the same way a table knife helps one eat a bowl of soup.
If you identify with the above reasoning, then it’s natural to wonder about the cure. I don’t have all the answers, but I believe the key is to starve attacks of the resources they need to grow into monsters. Rotate the credentials frequently so they are only useful for short periods of time. Repave servers and applications from a known good state to cut down on the amount of time an attack can live. Repair vulnerable software as soon as updates are available.
Rotate, repave, repair. I call these the three Rs of enterprise security.
Starving Attacks with the Three Rs: Rotate, Repave, and Repair
At high velocity, the three Rs starve attacks of the resources they need to grow. It’s a complete 180-degree change from the traditional approach of carefully avoiding change to mitigate risk. Go fast to stay safer. In other words, speed reduces risk.
To an attacker, it’s like playing a nearly unsolvable video game. She needs to get to level 100, but she can’t get past level 5 because there’s not enough time. In addition, what worked the first try didn’t work on the 20th try.
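To illustrate why rotation shrinks the attacker's window, here is a toy in-memory sketch. It is not how any particular product manages credentials; `RotatingCredentialStore`, its methods, and the five-minute lifetime are all hypothetical names and values I made up for illustration. Time is passed in explicitly so the behavior is easy to follow.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    token: str
    issued_at: float  # seconds on whatever clock the caller uses


class RotatingCredentialStore:
    """Hypothetical issuer that replaces its credential once it reaches a fixed lifetime."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._current: Optional[Credential] = None

    def current(self, now: float) -> Credential:
        """Return the live credential, minting a fresh random one if the old one expired."""
        cred = self._current
        if cred is None or now - cred.issued_at >= self.ttl_seconds:
            cred = Credential(secrets.token_urlsafe(32), now)
            self._current = cred
        return cred

    def is_valid(self, token: str, now: float) -> bool:
        """A token is accepted only if it is the latest one and has not expired."""
        cred = self._current
        if cred is None or now - cred.issued_at >= self.ttl_seconds:
            return False
        return secrets.compare_digest(token, cred.token)


store = RotatingCredentialStore(ttl_seconds=300)   # assumed 5-minute lifetime
stolen = store.current(now=0).token                # attacker copies the token at t=0
print(store.is_valid(stolen, now=100))             # still inside the window
print(store.is_valid(stolen, now=301))             # expired
fresh = store.current(now=301).token               # legitimate caller gets a new token
print(store.is_valid(stolen, now=302), store.is_valid(fresh, now=302))
```

The design point is that the stolen token's usefulness is bounded by the lifetime, no matter how long the attacker stays hidden.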
The promise of the three Rs is profound.
It’s no secret that I’m biased towards Pivotal Cloud Foundry, BOSH, Concourse, and OpsManager. By themselves and in their current state, these products change the attacker/defender game dramatically. It’s totally possible to repave every VM in your datacenter from a known good state every few hours without application downtime. Deploy your applications from CI, and your application containers will also be repaved every few hours. Our patch turnaround time for the entire stack is second to none, and you can deploy those patches to your entire datacenter with a few clicks of a mouse.
We’ve got the repave and repair angles pretty well covered with OpsManager, BOSH, Concourse, and Pivotal Web Services. For example, all the VMs in a Pivotal Cloud Foundry cluster are imaged with an image called a stemcell. Though it’s not yet a default option, it’s possible to repave every VM in the cluster on an interval of your choosing with BOSH. Pivotal Web Services automatically updates buildpacks to ensure application environments are always patched. We’re still working on automated credential management, so stay tuned for updates on that front.
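As a rough illustration of repaving on an interval, the sketch below picks which VMs are due and splits them into small batches so most capacity stays up while each batch is rebuilt. It does not call BOSH or any real API; the `Vm` record, the function names, the image version strings, and the four-hour limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    image_version: str  # version of the base image the VM was built from
    age_hours: float    # hours since the VM was last rebuilt


def due_for_repave(vms, max_age_hours, current_image):
    """Return VMs that are older than the limit or built from a stale image."""
    return [
        vm for vm in vms
        if vm.age_hours >= max_age_hours or vm.image_version != current_image
    ]


def rolling_batches(vms, batch_size):
    """Split the work into batches so only batch_size VMs are down at once."""
    return [vms[i:i + batch_size] for i in range(0, len(vms), batch_size)]


fleet = [
    Vm("router-0", "3262.4", 1.5),
    Vm("cell-0", "3262.4", 5.0),   # too old
    Vm("cell-1", "3262.2", 0.5),   # stale image
    Vm("cell-2", "3262.4", 6.0),   # too old
]
due = due_for_repave(fleet, max_age_hours=4, current_image="3262.4")
for batch in rolling_batches(due, batch_size=2):
    print("repave:", [vm.name for vm in batch])
```

Note that a VM on a stale image is selected even if it is young, which is where repave and repair overlap.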
From a security point of view, I can’t think of a reason not to embrace this model immediately. Regardless of the tools you’re investigating, I encourage you to consider the three Rs when evaluating the security of your cloud platform. If a tool doesn’t help you get there, then it’s probably best to run away.
FAQs
What are the three Rs in security?
Organizations should follow the three Rs of enterprise security (rotate, repave, and repair) in the spirit of continuous delivery and infrastructure automation. Rotate datacenter credentials every few minutes or hours. Repave every server and application in the datacenter every few hours from a known good state. Repair vulnerable operating systems and application stacks within hours of patch availability.
What does it mean to repave a server?
Repaving servers typically means recreating your VMs from a known “good” image, as opposed to the much more common approach of applying incremental changes (meaning that “the slate almost never gets wiped clean”). Your ability to do these things quickly greatly reduces your risk of being compromised.
What is repaving in Kubernetes?
"Repaving" is the process of updating a host to the latest OS and dependent libraries. Some customers do this by policy. We need to add a how-to with the recommended steps for repaving the host that a node is running on without interrupting the cluster's operations. This is similar to decommissioning in part.
What is a patch in Kubernetes?
Patch is a command line option for updating Kubernetes API objects. You can use it to update a running configuration. You do this by supplying it with the section to update, instead of a completely new configuration, as you would with kubectl apply.
What is the difference between replace and patch in k8s?
The patch command allows you to modify part of a resource spec, providing just the changed part on the command line. The replace command behaves kind of like a manual version of the edit command.
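The difference between sending only the changed part and sending a whole new object can be sketched in a few lines. The function below implements JSON Merge Patch (RFC 7386), which is one of the patch types `kubectl patch` accepts via `--type merge`; it is an illustration of the idea, not kubectl's own code, and the Deployment-like dictionary is a made-up example.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7386) to target and return the result."""
    if not isinstance(patch, dict):
        return patch                      # scalars and lists replace the old value
    if not isinstance(target, dict):
        target = {}
    result = dict(target)                 # copy so the original is left untouched
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)         # null means "delete this key"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical, heavily trimmed Deployment-like object.
deployment = {
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 2,
        "template": {"spec": {"containers": [{"name": "web", "image": "example/web:1.0"}]}},
    },
}
change = {"spec": {"replicas": 5}}

patched = merge_patch(deployment, change)   # patch: only replicas changes
replaced = change                           # replace: the new object is all you get

print(patched["spec"]["replicas"], "template" in patched["spec"])
print("template" in replaced["spec"])
```

With a patch, everything not mentioned survives; with a replace, anything missing from the submitted object is gone, which is why replace needs the complete specification.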
What is rollout restart in Kubernetes?
In Kubernetes, kubectl rollout restart is a command used to start a new rollout process for three specific Kubernetes objects: Deployment, DaemonSet, and StatefulSet.
What is the difference between rolling and recreate in Kubernetes?
Rolling deployment: replaces pods running the old version of the application with the new version, one by one, without downtime to the cluster. Recreate: terminates all the pods and replaces them with the new version.
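A small simulation makes the availability difference visible. This is a simplified model, not Kubernetes itself: it replaces pods strictly one at a time, whereas a real Deployment's rolling update is governed by its `maxSurge` and `maxUnavailable` settings. The function names are hypothetical.

```python
def recreate(replicas):
    """Recreate: stop every old pod, then start the new ones.

    Returns the sequence of (old_pods, new_pods) states.
    """
    return [(replicas, 0), (0, 0), (0, replicas)]


def rolling(replicas):
    """Rolling: start one new pod, then stop one old pod, until no old pods remain."""
    states = [(replicas, 0)]
    old, new = replicas, 0
    while old > 0:
        new += 1                      # bring up a pod on the new version first
        states.append((old, new))
        old -= 1                      # then retire one pod on the old version
        states.append((old, new))
    return states


def min_available(states):
    """Smallest number of running pods seen at any point during the rollout."""
    return min(old + new for old, new in states)


print("recreate:", recreate(3), "min available:", min_available(recreate(3)))
print("rolling: ", rolling(3), "min available:", min_available(rolling(3)))
```

In this model the rolling strategy never drops below the desired replica count, while recreate passes through a state with nothing running.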
What is redundancy in Kubernetes?
Node-Level Redundancy
Kubernetes clusters typically consist of multiple worker nodes. By distributing pods across these nodes, Kubernetes achieves redundancy at the node level. If a node fails, the pods running on that node are automatically rescheduled on other healthy nodes, maintaining application availability.
While Kubernetes does offer a failover mechanism, it is not automated in such a way that, in the event of a cluster or service failure, the services are instantly transferred to a replica cluster configuration where they resume functionality.
How do I restore Kubernetes?
- Install the etcd client. ...
- Identify appropriate IP addresses. ...
- Edit a manifest file to update paths. ...
- Locate the Spec section. ...
- Add the initial cluster token to the file. ...
- Update the mount path. ...
- Replace the name of the host path.