
Lars Kamp
Alex Chantavy

Alex Chantavy is a Senior Software Engineer at Lyft and one of the maintainers of Cartography.

Cartography is a Python-based tool that collects infrastructure assets and their relationships into a graph view.

Cartography is open-source and was developed in-house at Lyft to solve offensive security scenarios. Today, Cartography is also used at Lyft to solve other InfoSec use cases, like container vulnerability management.

Cartography is built on top of the Neo4j graph data platform. The power of the graph is that it facilitates the exploration of many-to-many relationships.
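To make the many-to-many point concrete, here is a minimal sketch in plain Python. It does not use Neo4j or Cartography, and the node names and relationship labels (`MEMBER_OF`, `CAN_ACCESS`) are hypothetical, chosen only for illustration. Users belong to several groups, groups reach several buckets, and one traversal answers "which buckets can this user reach?"

```python
from collections import defaultdict

# Hypothetical edges: (source, relationship, target).
# These are illustrative only and are not Cartography's schema.
EDGES = [
    ("user:alice", "MEMBER_OF", "group:admins"),
    ("user:bob", "MEMBER_OF", "group:admins"),
    ("user:bob", "MEMBER_OF", "group:developers"),
    ("group:admins", "CAN_ACCESS", "bucket:billing"),
    ("group:developers", "CAN_ACCESS", "bucket:logs"),
    ("group:developers", "CAN_ACCESS", "bucket:billing"),
]


def build_adjacency(edges):
    """Map each node to the set of nodes it points to."""
    adjacency = defaultdict(set)
    for source, _relationship, target in edges:
        adjacency[source].add(target)
    return adjacency


def reachable(adjacency, start):
    """Return every node reachable from start by following edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def buckets_for(user):
    """List the buckets a user can reach through any group membership."""
    adjacency = build_adjacency(EDGES)
    return sorted(n for n in reachable(adjacency, user) if n.startswith("bucket:"))
```

In a relational model the same question needs a join per hop. In a graph it is a single traversal, whatever the number of intermediate relationships.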

In this episode, Alex and I discuss the origins of Cartography, how the engineering team at Lyft uses Cartography data for remediation of security issues, and how the graph powers an automated issue management system.

Lars Kamp
Hassan Khajeh-Hosseini

When developers deploy resources, there is little to no insight for them to understand how much a resource is going to cost.

Infracost is changing this by shifting the cost component of cloud resources "left"—i.e. into the hands of developers in a new approach to FinOps.

The existing paradigm of cloud financial management and the traditional FinOps way of managing cloud spend mean waiting for the cloud bill to arrive, then trying to identify opportunities for cost savings.

First-generation FinOps companies like Flexera, Cloudability, and CloudHealth emerged around 2011. They provided an improved user interface for complex billing data, and followed the monthly billing cycle of cloud providers.

However, a monthly cycle is too slow for today's automated and dynamic cloud environments driven by infrastructure-as-code. A new generation of tools has shortened these cycles, and the delay between a cloud bill and its analysis has come down to a day or less.

In broad terms, efforts to lower the cloud bill are based on a simple formula:

$$\text{cost} = \text{usage} \times \text{price}$$

Existing approaches mostly focus on the "price" component of the equation. Procurement mechanisms to lower the price point of a cloud resource include reserved instances, enterprise discount programs, savings plans, etc. Finance "slices and dices" the cloud bill after resources have been deployed to optimize price points and the overall size of the cloud bill.

However, the procurement-driven approach doesn't account for the "usage" component of the equation, which is a function of developer activity. Finance lacks the context that developers have when deploying resources, while developers lack visibility into resource prices and the cost of their deployments.
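A small worked example shows the two levers in the cost formula side by side. The numbers below are made up for illustration (they are not real cloud prices), and `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(usage_hours: float, price_per_hour: float) -> float:
    """cost = usage x price, for a single resource type."""
    return usage_hours * price_per_hour


# Illustrative numbers only, not real cloud prices.
baseline = monthly_cost(usage_hours=1000, price_per_hour=0.5)

# Price lever (procurement): a 25% discount on the hourly rate.
price_lever = monthly_cost(usage_hours=1000, price_per_hour=0.375)

# Usage lever (engineering): run 25% fewer hours at the list price.
usage_lever = monthly_cost(usage_hours=750, price_per_hour=0.5)
```

Both levers produce the same saving in this sketch, but only the second is under the control of the developers who deploy the resources.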

Infracost is closing this gap by providing cloud cost estimates for Terraform in pull requests to show engineering teams how code changes affect their cloud bills. Infracost adds comments to pull requests (e.g., "this change will increase your cloud bill by 25%") which are visible to engineering management, FinOps, and product teams.
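The idea of a cost estimate in a pull request can be sketched as a diff between two resource lists. This is not Infracost's implementation or output format: the price table, resource names, and the `estimate` and `cost_comment` helpers are all hypothetical.

```python
# Hypothetical monthly prices in USD; not real cloud pricing data.
MONTHLY_PRICE_USD = {
    "small_instance": 40.0,
    "large_instance": 65.0,
    "database": 60.0,
}


def estimate(resources):
    """Sum the monthly price of every resource in a deployment."""
    return sum(MONTHLY_PRICE_USD[name] for name in resources)


def cost_comment(before, after):
    """Build a pull request comment describing the estimated cost change."""
    old, new = estimate(before), estimate(after)
    delta = new - old
    if delta == 0:
        return "This change does not affect your estimated monthly cloud bill."
    if old == 0:
        # No baseline to compare against, so report the absolute amount.
        return f"This change adds ${new:.2f} to your estimated monthly cloud bill."
    direction = "increase" if delta > 0 else "decrease"
    percent = abs(delta) / old * 100
    return (
        f"This change will {direction} your estimated monthly cloud bill "
        f"by {percent:.0f}% (${old:.2f} -> ${new:.2f})."
    )
```

Calling `cost_comment(["small_instance", "database"], ["large_instance", "database"])` yields a message reporting a 25% increase, which mirrors the kind of comment described above.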

Hassan Khajeh-Hosseini is Co-Founder and CEO at Infracost, which he co-founded with his brother Ali Khajeh-Hosseini and their friend Alistair Scott. The founding team has a decade of cloud cost history together, with two previous cloud cost start-ups founded and exited.

In this episode, Hassan walks us through the science and engineering behind building Infracost. We also discuss broader infrastructure trends, including "cloud financial engineering" and the general "shift left" of testing, security, and (of course) cost in the development process.

Lars Kamp

IT asset management (ITAM) is an established category in the IT market, with its own Gartner Magic Quadrant.

Gartner defines ITAM as "[providing] an accurate account of technology asset lifecycle costs and risks to maximize the business value of technology strategy, architecture, funding, contractual and sourcing decisions." ITAM is usually divided into two subcategories: software asset management (SAM) and hardware asset management (HAM).

With cloud computing and SaaS tools, the requirements for ITAM have changed.

In the old world of IT, there was tight control over who could purchase servers and software licenses. IT was a (literal) gatekeeper that determined who could push a new server into a rack and provision that server with software.

That control is gone in today's world, where developers and employees have the flexibility to swipe a credit card or push a button in a console to "procure" cloud resources and software.

There are, of course, benefits of giving employees flexibility—namely, "development velocity", the speed to build and launch new products.

The remaining challenge is to get the most value out of these infrastructure expenditures, which means balancing "development velocity" and "business velocity." Without that balance, the result is tool and infrastructure sprawl, as well as out-of-control spending. Decentralized procurement may sound great on paper, but usually leads to the "worst best deal."

Balancing business with development velocity is Amit Mizrahi's job as Head of Strategic Operations at Wix.

Wix's flagship product is their free website builder, around which they've also built a portfolio of e-commerce products. The Wix company mantra is "to measure everything," and Amit's work includes measuring the ROI on Wix's IT assets—a tall order when Wix's employees number nearly 6,000.

In this episode, Amit walks us through how he built an ITAM program at Wix from scratch. The ITAM program is part of the "Value & Impact Center of Excellence at Wix," which has two pillars:

  1. ITAM: Managing procurement and operations for everything related to SaaS products and tools within Wix.

  2. FinOps: An organizational function that is in charge of monitoring cloud activities, governing cloud spending, and educating teams on financial-driven KPIs. (See Episode 5: Shifting From FinOps to Financial Engineering.)

To understand the business value of tools, Amit and his team built an internal data integration and analytics layer that extracts usage data from all tooling—an abstraction across Wix's IT assets. This abstraction layer is coupled with procurement processes that create alignment between development and business velocity for Wix.

Lars Kamp
Yevgeny Pats

ELT describes the process of extracting raw data from a source, loading it into a destination, and then transforming the data for analytics purposes. ELT has become mainstream with the rise of cloud warehouses and data lakes, in a shift away from ETL.

ETL was the dominant paradigm when storage and compute were expensive and pre-aggregating data (i.e., transform) made economic sense. But ETL comes with a trade-off—aggregating data before analysis also means losing fidelity, granularity, and the flexibility to iterate and re-run an analysis in a different way.

The cloud has driven down the cost of compute and storage so that it no longer makes sense to pre-aggregate data in an external processing layer, resulting in the shift to ELT. Today, we can store raw data in data lakes at high fidelity and with the flexibility to write queries tailored to any use case.
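The trade-off between the two approaches can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. The event data, table name, and helper functions are hypothetical. The ETL path keeps only daily totals, while the ELT path loads the raw rows and decides on the aggregation at query time.

```python
import sqlite3

# Hypothetical raw events: (day, region, amount).
RAW_EVENTS = [
    ("2024-01-01", "us", 10),
    ("2024-01-01", "eu", 5),
    ("2024-01-02", "us", 7),
]


def etl_daily_totals(events):
    """ETL style: aggregate before loading, so the region detail is lost."""
    totals = {}
    for day, _region, amount in events:
        totals[day] = totals.get(day, 0) + amount
    return totals


def elt_load(events):
    """ELT style: load the raw rows unchanged into the destination."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (day TEXT, region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    return conn


conn = elt_load(RAW_EVENTS)

# Transform after loading: the same raw table answers both questions.
by_day = conn.execute(
    "SELECT day, SUM(amount) FROM events GROUP BY day ORDER BY day"
).fetchall()
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM events GROUP BY region ORDER BY region"
).fetchall()
```

The pre-aggregated totals from `etl_daily_totals` cannot answer the per-region question at all, which is the loss of granularity described above.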

The main use case for ELT until now has been sales and marketing data, where data sources include systems like Salesforce, Marketo, and Google Analytics.

A new type of data source is cloud infrastructure data, which encompasses information about cloud resources like compute instances, storage buckets, or databases. Cloud infrastructure data describes the configuration of and relationships between cloud resources.

Examples of cloud infrastructure data include not only general properties like start date, name, and tags; but also resource-specific properties like price or type. This data is available via the cloud APIs that infrastructure-as-code tools like Terraform and Pulumi use to deploy resources.

CloudQuery is a high-performance open-source ELT framework built for developers. CloudQuery extracts data from cloud APIs and loads it into databases, data lakes, or streaming platforms for further analysis.

With raw infrastructure data, CloudQuery users are building solutions for security, cost, and governance use cases by writing SQL queries. Querying raw infrastructure data with SQL provides more flexibility and coverage than an opinionated DevOps tool could provide.
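As a rough illustration of what such a query might look like, the sketch below loads a few made-up rows into an in-memory SQLite database. The `storage_buckets` table and its columns are hypothetical and are not CloudQuery's actual schema. The point is only that security and governance checks become ordinary SQL once the data is loaded.

```python
import sqlite3

# Hypothetical rows: (name, is_public, owner_tag). Not CloudQuery's schema.
BUCKETS = [
    ("billing-exports", 0, "finance"),
    ("public-assets", 1, "web"),
    ("tmp-scratch", 1, None),
    ("old-logs", 0, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE storage_buckets (name TEXT, is_public INTEGER, owner_tag TEXT)"
)
conn.executemany("INSERT INTO storage_buckets VALUES (?, ?, ?)", BUCKETS)


def names(query):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(query)]


# Security check: publicly readable buckets that nobody owns.
public_unowned = names(
    "SELECT name FROM storage_buckets "
    "WHERE is_public = 1 AND owner_tag IS NULL ORDER BY name"
)

# Governance check: every bucket missing an owner tag.
untagged = names(
    "SELECT name FROM storage_buckets WHERE owner_tag IS NULL ORDER BY name"
)
```

A new policy is then a new `WHERE` clause rather than a feature request to a tool vendor.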

In this episode, I chat with Yevgeny Pats, CEO and co-founder at CloudQuery. We cover the "why now?" for infrastructure data, and the change in mindset observed among infrastructure engineers and their shift to using data lakes.

Watch this episode to also see a demo of CloudQuery, and learn how the tool evolved from a niche data sync solution to a high-performing ELT framework.

Lars Kamp
Kevin Hu

In this episode, we interview Kevin Hu, co-founder and CEO at Metaplane. Metaplane offers data observability for the modern data stack. Kevin calls Metaplane the "Datadog for data," in reference to observability for microservices and cloud-native stacks.

As data volume and tool usage grow, so does the potential for something to break—resulting in errors and data downtime. In the modern data stack, the chain of SQL-based transformations between the original data source and the computed result is long and complex. For this reason, it's often nearly impossible to pinpoint the source of data errors.

Metaplane's focus is data criticality, and Metaplane has built instrumentation to understand exactly where errors occur. When data is mission-critical to the business, data teams become "solution-aware."

We take a walk down memory lane in this episode. We discuss the early days of the cloud warehouse market and the paradigm shift to separate storage and compute that, overnight, turned Snowflake into a market leader.

As a result of this shift, the market for analytics expanded and spawned a new generation of data tooling across categories like data integration and ETL, customer data platforms, data catalogs, reverse ETL, and data observability, from companies like RudderStack, Airbyte, Census, Hightouch, and, of course, Metaplane.

Some Engineering Inc.