Graph Database Blog Posts

Solving Infrastructure Fragmentation with Data

March 31, 2023 · 12 min read

Some Engineer

As companies grow, their cloud infrastructure quickly becomes fragmented and gets out of control. Data about what resources exist and how they relate to each other is tedious to acquire.

In practice, this means that the infrastructure layer often remains a mystery, and engineering teams are unable to see what's happening in their infrastructure. This makes capacity planning impossible, limits organizations' ability to control cloud costs, and leaves teams in the dark about potential security vulnerabilities.

The data to understand cloud growth exists as cloud resource metadata describing the state, configuration, and dependencies of cloud resources. Acquiring and unifying this "infrastructure data" into a single place is the solution for a lot of the problems that infrastructure engineers deal with today—not just cost, but also security and reliability.

But infrastructure is fragmented. Data is locked behind cloud APIs and behind the tools that use those APIs to control the deployment of cloud resources. In this post, I'll explain how Resoto acquires infrastructure data, and then uses that data to write code.

Building an EC2 Asset Inventory

March 8, 2023 · 6 min read

Lars Kamp

Some Engineer

EC2 instances often account for the largest portion of your AWS bill. Yet, it's notoriously difficult to get a simple list of all EC2 instances across all regions and accounts, as threads on Stack Overflow and Reddit show.

Once you have that list, you also want to use it to ask questions about your inventory, such as:

How many total instances are there?
Which instances are running?
Which instances are missing tags?
Which resources have an expiration date?

In this post, I'll describe how to use Resoto to build an EC2 cloud asset inventory. The baseline inventory is a list of all EC2 instances, which you can then use to create narrower, more detailed views.
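To illustrate how a baseline list answers the questions above, here is a minimal Python sketch over an in-memory inventory. It is not Resoto's query language or data model: the `Instance` record, its fields (including the `expires` attribute), and the sample data are hypothetical, and "missing tags" is assumed to mean "lacks one of a set of required tag keys".

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical, simplified record for one EC2 instance. In practice these
# fields would come from collected cloud metadata across accounts/regions.
@dataclass
class Instance:
    instance_id: str
    account: str
    region: str
    state: str                                # e.g. "running", "stopped"
    tags: dict = field(default_factory=dict)
    expires: Optional[datetime] = None        # hypothetical expiration attribute

def summarize(inventory, required_tags=("owner",)):
    """Answer the four example questions over the baseline inventory."""
    return {
        "total": len(inventory),
        "running": [i.instance_id for i in inventory if i.state == "running"],
        # An instance "misses tags" if any required tag key is absent.
        "missing_tags": [i.instance_id for i in inventory
                         if any(t not in i.tags for t in required_tags)],
        "with_expiration": [i.instance_id for i in inventory if i.expires is not None],
    }

def count_by_region(inventory):
    """A narrower view: number of instances per (account, region)."""
    return Counter((i.account, i.region) for i in inventory)

# Usage with made-up sample data.
inventory = [
    Instance("i-001", "prod", "us-east-1", "running", {"owner": "team-a"}),
    Instance("i-002", "prod", "eu-west-1", "stopped"),
    Instance("i-003", "dev", "us-east-1", "running", {"owner": "team-b"},
             datetime(2023, 6, 1)),
]
print(summarize(inventory))
# → {'total': 3, 'running': ['i-001', 'i-003'], 'missing_tags': ['i-002'], 'with_expiration': ['i-003']}
```

Keeping the full list as the single source and deriving each question as a filter or aggregation over it is what makes the narrower views cheap to add.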

What We Can Learn from History

December 23, 2022 · 7 min read

Matthias Veit

Some Engineer

"A generation which ignores history has no past—and no future."
— Robert A. Heinlein

While Heinlein's words refer to human history, they also apply to cloud infrastructure. Most of the time, we care about the current state of resources; but sometimes, we want to know the origin of a resource, when a resource was deleted, or when and how a resource was updated.

Such knowledge is necessary in situations where you need to understand the timeline to investigate a specific system behaviour:

To perform a post-mortem analysis of an outage, we need to know which cloud resources changed and how they changed to produce the behaviour we saw in our application. Without the ability to review a change log, this becomes impossible.
To understand cost spikes in your cloud billing dashboard, you need to understand what resources were created, when they were created, and by whom they were created. Not only do you need a list of changes, but also the ability to filter, group, sort, and aggregate the data to see the big picture.
To check for security issues or compliance violations, you may need to reduce the scope to verify only those resources that were created or updated since the previous scan. Even complex checks can be performed on large infrastructures if they are only run against changed resources.
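The cost-spike and incremental-scan cases above both boil down to filtering and grouping a change log. The following Python sketch shows one way to do that; the `Change` record, its field names, and the sample log are hypothetical and do not describe how Resoto stores its history.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Hypothetical change-log entry; field names are illustrative assumptions.
@dataclass(frozen=True)
class Change:
    at: datetime
    resource_id: str
    kind: str      # resource type, e.g. "aws_ec2_instance"
    action: str    # "created", "updated" or "deleted"
    actor: str     # who made the change

def creations_by_actor(log, start, end):
    """Cost spike: who created resources in the window [start, end)?"""
    return Counter(c.actor for c in log
                   if c.action == "created" and start <= c.at < end)

def changed_since(log, last_scan):
    """Incremental scan: IDs of resources created or updated after
    last_scan. Resources deleted afterwards are dropped, since there is
    nothing left to check."""
    changed = set()
    for c in sorted(log, key=lambda c: c.at):
        if c.at <= last_scan:
            continue
        if c.action == "deleted":
            changed.discard(c.resource_id)
        else:
            changed.add(c.resource_id)
    return changed

# Usage with a made-up log.
log = [
    Change(datetime(2023, 3, 1, 9), "i-1", "aws_ec2_instance", "created", "alice"),
    Change(datetime(2023, 3, 2, 10), "i-2", "aws_ec2_instance", "created", "bob"),
    Change(datetime(2023, 3, 2, 11), "i-3", "aws_ec2_instance", "created", "bob"),
    Change(datetime(2023, 3, 2, 12), "i-1", "aws_ec2_instance", "updated", "alice"),
    Change(datetime(2023, 3, 2, 13), "i-3", "aws_ec2_instance", "deleted", "bob"),
]
print(creations_by_actor(log, datetime(2023, 3, 2), datetime(2023, 3, 3)))
# → Counter({'bob': 2})
print(sorted(changed_since(log, datetime(2023, 3, 2))))
# → ['i-1', 'i-2']
```

Only `i-1` and `i-2` would need re-checking after the March 2 scan, even though three resources were touched that day.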

History is a log of events defining your infrastructure. This event log is important, as it will enable you to answer future questions about the state of your infrastructure retrospectively, including tomorrow's questions that have not yet crossed your mind.
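Because the history is a log of events, the state at any past moment can in principle be rebuilt by replaying the events up to that moment. The Python sketch below shows this event-replay idea under simplifying assumptions: the `Event` record and its `created`/`updated`/`deleted` actions are hypothetical, and an update is assumed to carry only the properties that changed.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical event record; an "updated" event carries only changed properties.
@dataclass
class Event:
    at: datetime
    resource_id: str
    action: str                                   # "created", "updated", "deleted"
    properties: dict = field(default_factory=dict)

def state_at(events, when):
    """Replay events in time order up to `when` and return
    {resource_id: properties} as it was at that moment."""
    state = {}
    for e in sorted(events, key=lambda e: e.at):
        if e.at > when:
            break
        if e.action == "deleted":
            state.pop(e.resource_id, None)
        elif e.action == "created":
            state[e.resource_id] = dict(e.properties)   # copy, don't alias
        else:  # "updated": merge the changed properties
            state.setdefault(e.resource_id, {}).update(e.properties)
    return state

# Usage with made-up events for a single volume.
events = [
    Event(datetime(2023, 1, 1), "vol-1", "created", {"size_gb": 100}),
    Event(datetime(2023, 2, 1), "vol-1", "updated", {"size_gb": 500}),
    Event(datetime(2023, 3, 1), "vol-1", "deleted"),
]
print(state_at(events, datetime(2023, 1, 15)))   # → {'vol-1': {'size_gb': 100}}
print(state_at(events, datetime(2023, 2, 15)))   # → {'vol-1': {'size_gb': 500}}
print(state_at(events, datetime(2023, 3, 15)))   # → {}
```

A question that only comes up later ("how large was this volume in January?") can then be answered from the same log without having planned for it.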

Discover Kubernetes Resources

August 31, 2022 · 12 min read

Matthias Veit

Some Engineer

Kubernetes has dramatically improved the way we manage our workloads. It has become the de facto standard for deploying and managing containerized applications, and is offered as a managed service by all major cloud providers.

A typical setup consists of distinct Kubernetes clusters for each application stage (e.g., dev, test, prod) or a cluster per tenant. Clusters shared between different users and teams often use namespaces and roles to control access. Deploying a single application to a Kubernetes cluster usually involves tens to hundreds of resources (e.g., deployments, services, ConfigMaps, secrets, ingresses).

Even a relatively simple setup quickly becomes tedious to manage as the resource count grows. It is difficult for a human to keep track of resources, especially with user access limited to certain clusters in select namespaces.
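One way to keep track across clusters and namespaces is to collect every resource into a single list keyed by cluster, namespace, and kind, and then query that list instead of each cluster separately. The Python sketch below assumes the resources have already been collected; the `K8sResource` record, the helper names, and the sample data are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional

# Hypothetical, minimal description of one Kubernetes resource.
class K8sResource(NamedTuple):
    cluster: str     # e.g. "dev", "prod"
    namespace: str
    kind: str        # e.g. "Deployment", "Service", "Secret"
    name: str

def count_by_scope(resources):
    """Number of resources per (cluster, namespace, kind)."""
    return Counter((r.cluster, r.namespace, r.kind) for r in resources)

def find(resources, kind, cluster: Optional[str] = None):
    """Sorted names of all resources of one kind, optionally in one cluster."""
    return sorted(r.name for r in resources
                  if r.kind == kind and (cluster is None or r.cluster == cluster))

# Usage with made-up resources from two clusters.
resources = [
    K8sResource("dev", "shop", "Deployment", "web"),
    K8sResource("dev", "shop", "Service", "web"),
    K8sResource("prod", "shop", "Deployment", "web"),
    K8sResource("prod", "shop", "Secret", "db-credentials"),
    K8sResource("prod", "billing", "Deployment", "invoicer"),
]
print(find(resources, "Deployment", cluster="prod"))   # → ['invoicer', 'web']
print(count_by_scope(resources)[("prod", "shop", "Secret")])   # → 1
```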

A Walk in the Graph

May 17, 2022 · 7 min read

Matthias Veit

Some Engineer

Resoto uses a directed graph to represent your infrastructure resources as nodes and the relationships between them as edges. A load balancer, for example, is represented as a node with edges pointing to all of its target compute instances. A compute instance might have a volume attached, in which case there is an edge between the instance node and the volume node.

Nodes represent resources, while edges define the relationship between nodes. It is often the case that a resource has multiple relationships to other resources.
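The load balancer example can be written down as a small directed graph and walked along its edges. This Python sketch uses a plain adjacency list with hypothetical node IDs and a breadth-first walk; it illustrates the graph idea only and is not Resoto's internal representation or query API.

```python
from collections import deque

# Hypothetical resource graph: node ID -> list of successor node IDs.
graph = {
    "lb-1": ["i-1", "i-2"],   # load balancer -> its target instances
    "i-1": ["vol-1"],         # instance -> attached volume
    "i-2": ["vol-2"],
    "vol-1": [],
    "vol-2": [],
}

def walk(graph, start):
    """Breadth-first walk along outgoing edges; returns every node reachable
    from `start` (excluding `start`) in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                order.append(succ)
                queue.append(succ)
    return order

def predecessors(graph, target):
    """Walk one step against edge direction: which nodes point at `target`?"""
    return sorted(n for n, succs in graph.items() if target in succs)

print(walk(graph, "lb-1"))            # → ['i-1', 'i-2', 'vol-1', 'vol-2']
print(predecessors(graph, "vol-1"))   # → ['i-1']
```

Walking outbound from the load balancer finds everything behind it, while walking inbound from a volume tells you which instance uses it.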
