
Aggregating Search Data

Matthias Veit
note

This blog post is the second in a series about Resoto's powerful search functionality. Resoto Search 101 provides an introduction to Resoto's search capabilities.

Resoto's search allows for resources to be selected using filters, combinators, and traversals. Search results can be combined, grouped, and aggregated.


The simplest example of search aggregation in Resoto is the count command, which enables you to count objects or the occurrences of a specific property. Let's say we are interested in the number of compute instances we maintain:

> search is(instance) | count
total matched: 540
total unmatched: 0

Compute instances are of kind instance regardless of cloud provider, so is(instance) selects both aws_ec2_instance and gcp_instance resources. The count command then takes the results and returns the number of occurrences.
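To make the kind-based selection concrete, here is a minimal Python sketch. It is not Resoto's implementation: it assumes each resource carries the list of kinds it belongs to, and the sample records and the helper names `is_kind` and `count` are hypothetical.

```python
# Hypothetical sample data: each resource lists every kind it belongs to,
# so provider-specific kinds also carry the generic "instance" kind.
resources = [
    {"id": "i-1", "kinds": ["aws_ec2_instance", "instance", "resource"]},
    {"id": "i-2", "kinds": ["gcp_instance", "instance", "resource"]},
    {"id": "vol-1", "kinds": ["aws_ec2_volume", "volume", "resource"]},
]


def is_kind(resource, kind):
    """Return True if the resource is of the given kind, directly or via a base kind."""
    return kind in resource["kinds"]


def count(items):
    """Count the selected items, mirroring the 'total matched' line."""
    return sum(1 for _ in items)


# Select all compute instances regardless of provider, then count them.
instances = [r for r in resources if is_kind(r, "instance")]
total_matched = count(instances)
print(f"total matched: {total_matched}")
```

Because both provider-specific records include the generic kind, a single filter picks them up and the volume is left out.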

The count command also allows specifying a grouping value. The following search would return counts by instance_status:

> search is(instance) | count instance_status
stopped: 48
terminated: 151
running: 341
total matched: 540
total unmatched: 0
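The grouped count can be pictured as a frequency table over one property. The following Python sketch is only an illustration under assumptions: the sample records are made up, `count_by` is a hypothetical helper, and treating resources that lack the property as "unmatched" is an assumed reading of the output above.

```python
from collections import Counter

# Hypothetical sample records; only the grouping property matters here.
instances = [
    {"id": "i-1", "instance_status": "running"},
    {"id": "i-2", "instance_status": "running"},
    {"id": "i-3", "instance_status": "stopped"},
    {"id": "i-4"},  # no instance_status: counted as unmatched (assumed semantics)
]


def count_by(items, prop):
    """Count occurrences of each value of prop; items lacking prop are unmatched."""
    groups = Counter(item[prop] for item in items if prop in item)
    matched = sum(groups.values())
    unmatched = len(items) - matched
    return groups, matched, unmatched


groups, matched, unmatched = count_by(instances, "instance_status")
# Print the smallest groups first, followed by the totals.
for value, n in sorted(groups.items(), key=lambda kv: kv[1]):
    print(f"{value}: {n}")
print(f"total matched: {matched}")
print(f"total unmatched: {unmatched}")
```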

While count is often sufficient, the aggregate command is required for more advanced use cases. For example, we could get CPU core and memory data using aggregate:

> search is(instance) | aggregate
  sum(instance_cores) as sum_of_cores,
  max(instance_cores) as max_cores,
  sum(instance_memory) as sum_of_memory,
  max(instance_memory) as max_mem
sum_of_cores: 3441
max_cores: 16
sum_of_memory: 12802.25
max_mem: 64

In this example, we have 3441 cores in total, and no single instance has more than 16 cores. The same data is also available for provisioned memory: we have almost 13 TB of RAM, with no instance having more than 64 GB.
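Conceptually, an ungrouped aggregate applies each function to one property across all selected resources and returns a single result. Here is a minimal Python sketch of that idea; the sample data and the `aggregate` helper are hypothetical, and memory is assumed to be in GB.

```python
# Hypothetical sample records; instance_memory is assumed to be in GB.
instances = [
    {"instance_cores": 4, "instance_memory": 16},
    {"instance_cores": 16, "instance_memory": 64},
    {"instance_cores": 2, "instance_memory": 0.5},
]


def aggregate(items, functions):
    """Apply each (function, property) pair over all items and return named results."""
    return {
        name: fn(item[prop] for item in items)
        for name, (fn, prop) in functions.items()
    }


# Each entry maps a result name to the function and the property it reads.
result = aggregate(
    instances,
    {
        "sum_of_cores": (sum, "instance_cores"),
        "max_cores": (max, "instance_cores"),
        "sum_of_memory": (sum, "instance_memory"),
        "max_mem": (max, "instance_memory"),
    },
)
for name, value in result.items():
    print(f"{name}: {value}")
```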

We can further analyze this aggregated data using grouping variables, which we have already seen in the count example above. Let's try aggregating the available memory by instance status:

> search is(instance) | aggregate
  instance_status as status:
  sum(instance_memory) as memory
group:
  status: running
memory: 8538
---
group:
  status: stopped
memory: 1345
---
group:
  status: terminated
memory: 2919.25

This search returns multiple results, each of which has a group property. The grouping variable value for each result is a property of its group object. In this case, running compute instances have about 8.5 TB of memory altogether, while the stopped and terminated instances account for a combined total of about 4.3 TB.
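The shape of a grouped result, with one record per group value and the grouping variable nested under group, can be sketched in Python as follows. This is an illustration rather than Resoto's implementation; the sample data and the `aggregate_by` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; instance_memory is assumed to be in GB.
instances = [
    {"instance_status": "running", "instance_memory": 16},
    {"instance_status": "running", "instance_memory": 64},
    {"instance_status": "stopped", "instance_memory": 8},
    {"instance_status": "terminated", "instance_memory": 0.5},
]


def aggregate_by(items, group_prop, group_name, value_prop, value_name):
    """Sum value_prop per distinct value of group_prop, one result per group."""
    totals = defaultdict(float)
    for item in items:
        totals[item[group_prop]] += item[value_prop]
    # The grouping variable is stored inside a nested "group" object.
    return [
        {"group": {group_name: key}, value_name: total}
        for key, total in totals.items()
    ]


results = aggregate_by(
    instances, "instance_status", "status", "instance_memory", "memory"
)
for entry in results:
    print(entry)
```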

Ancestors and Descendants

The aggregation capabilities we have seen so far include grouping and aggregation functions. Resoto captures the state of your infrastructure as a graph: resources are nodes, and the relationships between them are edges.

Wouldn't it be great if we could aggregate over not only the data of a single node, but the data of ancestor or descendant nodes in the graph? Resoto's search engine can perform nested search statements for this exact purpose.

Figure: Instance Relationships

The above diagram illustrates the relationships of compute instances. AWS resources are attached to a region, while Google Cloud resources are associated with a zone. Each instance also has an instance_type predecessor node. To access properties of ancestor nodes of a given kind, we can use the following notation:

> search is(instance) | aggregate
  sum(/ancestors.instance_type.reported.ondemand_cost) as cost
cost: 155.73

This search selects all instances, then aggregates the on-demand cost of each element by traversing up to the instance type and selecting the reported.ondemand_cost property.

The path /ancestors.instance_type.reported.ondemand_cost can be translated as a traversal over the node's ancestors until an ancestor of kind instance_type is found. The last part of this path is relative to the node that is found (reported.ondemand_cost in this example). The result is the on-demand cost of all instances.
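The path resolution described above can be sketched in Python. This is a simplified illustration, not Resoto's implementation: the tiny graph, the `predecessors` map, and the helpers `find_ancestor` and `resolve` are hypothetical, and the breadth-first traversal order is an assumption.

```python
from collections import deque

# Hypothetical graph: nodes by id, plus the ids of each node's direct predecessors.
nodes = {
    "acc": {"kinds": ["account"], "reported": {"name": "sales"}},
    "reg": {"kinds": ["region"], "reported": {"name": "us-west-2"}},
    "t1": {"kinds": ["instance_type"], "reported": {"ondemand_cost": 0.5}},
    "t2": {"kinds": ["instance_type"], "reported": {"ondemand_cost": 0.25}},
    "i-1": {"kinds": ["instance"], "reported": {"instance_cores": 4}},
    "i-2": {"kinds": ["instance"], "reported": {"instance_cores": 4}},
    "i-3": {"kinds": ["instance"], "reported": {"instance_cores": 2}},
}
predecessors = {
    "acc": [],
    "reg": ["acc"],
    "t1": ["reg"],
    "t2": ["reg"],
    "i-1": ["t1", "reg"],
    "i-2": ["t1", "reg"],
    "i-3": ["t2", "reg"],
}


def find_ancestor(node_id, kind):
    """Walk inbound edges breadth-first until a node of the given kind is found."""
    queue = deque(predecessors[node_id])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if kind in nodes[current]["kinds"]:
            return nodes[current]
        queue.extend(predecessors[current])
    return None


def resolve(node_id, path):
    """Resolve a path such as '/ancestors.instance_type.reported.ondemand_cost'."""
    _, kind, *rest = path.lstrip("/").split(".")
    value = find_ancestor(node_id, kind)
    # The remaining segments are relative to the ancestor that was found.
    for key in rest:
        if value is None:
            return None
        value = value.get(key)
    return value


path = "/ancestors.instance_type.reported.ondemand_cost"
cost = sum(resolve(node_id, path) for node_id in ("i-1", "i-2", "i-3"))
print(f"cost: {cost}")
```

The same `resolve` helper also works for other kinds, for example `resolve("i-1", "/ancestors.account.reported.name")`, which is the kind of lookup a grouping variable would perform.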

It is possible to walk the graph inbound with ancestors, and outbound using descendants. You can apply this syntax anywhere a property path is defined in a search. Let's use this technique to find running instances aggregated by account and region:

> search is(instance) and instance_status==running | aggregate
  /ancestors.account.reported.name as account,
  /ancestors.region.reported.name as region:
  sum(instance_memory) as memory,
  sum(instance_cores) as cores,
  sum(/ancestors.instance_type.reported.ondemand_cost) as cost
group:
  account: sales
  region: us-west-2
memory: 1936
cores: 484
cost: 23.232
---
group:
  account: sales
  region: us-west1
memory: 30
cores: 8
cost: 0.3799
---
group:
  account: dev
  region: us-east-1
memory: 576
cores: 144
cost: 7.2
...

As you can see, Resoto's search enables you to gather data about your infrastructure that would otherwise be extremely challenging and tedious to tabulate. This search result is also refreshed whenever the graph is updated (every hour by default), which enables the collection of data that is not feasible to manage manually.

Now, imagine feeding results of an aggregation search into a Prometheus time series database and being able to visualize the data in a Grafana dashboard. Resoto Metrics serves this exact purpose, feeding robust aggregation metrics into a time series database.

Please refer to the Resoto documentation for more details about supported aggregation capabilities. I hope the examples presented here clearly illustrate the power of Resoto's search aggregation. If you're new to Resoto, we hope you will try it out! ✨

Some Engineering Inc.