Why you’re overspending on observability vendors, and what to do about it

Observability (the collection and analysis of logs, metrics, and traces) is a hard requirement for understanding and maintaining cloud-native applications. Observability vendors like Splunk, Datadog, and New Relic offer a wide range of products that make monitoring system health possible and (fairly) painless.

But you don’t have to look far to find stories of exorbitant and unexpected observability vendor bills.

Today, drawing on over a decade of experience at New Relic, I’m going to break down how traditional observability vendors can end up being so expensive, and what you can do about it.

The Devil’s in the Details - Vendor Pricing Models

In general, observability vendor pricing is at best opaque, and at worst misleading. The complexity surrounding different products and pricing models is one of the primary drivers of unexpected and excessive observability spending. To make matters worse, pricing models may seem similar across companies on the surface, but can be starkly different under the hood.

Here are the most common pricing models at a high level, as well as some of the gotchas you can look out for next time you speak with your sales rep.

User-based Pricing

Products that charge on a per-seat or per-active-user basis are the easiest to understand and forecast. There isn’t a lot you can do to reduce costs here, besides freeing up seats.

Host-based Pricing

Products and services that fall under this category are charged per host monitored. Historically, this model mapped well to monitoring co-located hardware. You wanted to monitor your rack, so you paid by the CPU or host. This pricing model doesn’t make as much sense in a cloud-native world, where resources are dynamically allocated.

Because of overage costs, you’re incentivized to over-provision resources (i.e., spend a lot more). Ironically, this model can also discourage best practices for high availability and redundancy, since monitoring the additional instances they require can significantly increase your bill.

Volume/Consumption-based Pricing

This is where things start to get murky. While it may seem straightforward on the surface, volume-based pricing is often the most opaque and unpredictable pricing model. These models need to encapsulate the cost of ingesting the data into the vendor’s platform, indexing that data, and the compute used to query that data.

How your data is treated and metered makes a huge difference in what you pay. Unfortunately, how your data is handled and measured is often obfuscated by providers, or simply never shared.

For example, some providers will transform log strings into JSON objects before metering occurs. That can easily inflate the number of bytes ingested by a factor of 100. Another consideration is whether the provider wraps your data with additional metadata before metering. This is data you don’t have any control over, but you will pay for it nonetheless.
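To see the mechanism, here is a minimal Python sketch that compares the size of a raw log line with the size of the same line after it has been wrapped in a JSON envelope. The log line, the envelope, and every field name in it are made up for illustration and are not taken from any real vendor. This toy case inflates the line far less than the worst cases, but the direction is the same: you are metered on bytes you never wrote.

```python
import json

# A plain-text log line as an application might emit it (made-up example).
raw_line = "2024-05-01T12:00:00Z INFO checkout order=1234 status=ok"

# Hypothetical envelope built before metering. All field names and values
# here are illustrative assumptions, not any real vendor's schema.
enriched = {
    "timestamp": "2024-05-01T12:00:00Z",
    "level": "INFO",
    "service": "checkout",
    "message": raw_line,
    "attributes": {"order": "1234", "status": "ok"},
    "host": {"name": "ip-10-0-0-12", "region": "us-east-1", "agent_version": "7.0.0"},
    "tags": ["env:prod", "team:payments", "source:python"],
}

# Size of what you sent versus size of what would be metered.
raw_bytes = len(raw_line.encode("utf-8"))
metered_bytes = len(json.dumps(enriched).encode("utf-8"))
inflation = metered_bytes / raw_bytes

print(f"raw: {raw_bytes} B, metered: {metered_bytes} B, inflation: {inflation:.1f}x")
```

Running the same comparison on a sample of your own logs, using whatever structure your vendor actually shows you in its UI or export, is a quick way to estimate the gap before your next conversation with your sales rep.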

An Ode to Overages

When you commit to a set level of resource consumption, you lock in more affordable rates. This makes sense - it works the same with your phone plan. But unlike your phone plan, observability data is incredibly hard to forecast. Things out of your control (a spike in traffic, a DDoS attack, or an errant log producer) can push your usage above the committed threshold, and that’s where costs really start to spiral out of control.
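The arithmetic behind an overage can be sketched in a few lines of Python. The rates and volumes below are hypothetical, and the assumption that overage is billed at a higher per-GB rate than committed usage is exactly that, an assumption; check your own contract for the real terms.

```python
def monthly_bill(usage_gb: float, committed_gb: float,
                 committed_rate: float, overage_rate: float) -> float:
    """Cost for one month under a simple commit-plus-overage model.

    The full commitment is paid whether or not it is used, and anything
    above it is billed at the (assumed higher) overage rate.
    """
    overage_gb = max(0.0, usage_gb - committed_gb)
    return committed_gb * committed_rate + overage_gb * overage_rate


# Hypothetical contract: 1,000 GB committed at $0.10/GB, overage at $0.30/GB.
quiet_month = monthly_bill(800, 1000, 0.10, 0.30)
spike_month = monthly_bill(1500, 1000, 0.10, 0.30)

print(f"quiet month: ${quiet_month:.2f}")
print(f"spike month: ${spike_month:.2f}")
```

With these made-up numbers, a month that lands 50% over the commitment costs $250 instead of $100, while the quiet month still pays for 200 GB that was never used.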

Feature-based Pricing

Don’t forget the add-ons! Enterprises are upsold SSO, “enterprise” integrations, longer retention periods, better support, audit logs, and other bells and whistles - each with their own pricing model.

The Real Kicker - “But What if…?”

All of this complexity makes reasoning about your actual needs way more difficult than it needs to be. Vendors will often couple this complexity with an appeal to legitimate fears you might have about outages. The result? Over-provisioning and wasteful consumption. Do you really need to keep that trace data for 30 days?

There is clearly a lot of nuance involved in the pricing of observability products. Some of it is to be expected - logs, metrics and traces are all different in nature, and you’d expect their treatment to be different.

But when you look at a company like Datadog, you’ll find they have dozens of products, sub-products, and add-ons, each with their own fine print and pricing model. The excessive complexity means that turning their rules into a concrete framework to optimize spending is nearly impossible. That’s how orgs end up in situations where they are much more likely to under- or over-commit to usage, and where teams are more likely to inadvertently trigger massive fees based on minor changes.

It’s a bit of a shakedown - you need observability, and the observability vendors know it. But not all hope is lost. There is a very good chance that you can significantly lower your bill through negotiations.

Negotiating your way to a lower observability bill

As with most SaaS products, observability vendors charge a handsome premium and enjoy hefty margins. That leaves a lot of wiggle room. The key to successfully negotiating a lower price with your observability vendors lies in building leverage. Understanding how to effectively communicate and use your position can lead to substantial savings and more favorable terms.

Step 1: Just Ask

If you’re talking to a sales rep, you can get a discount. Ask what they can do for you - you might be surprised at how much the needle will move.

Step 2: Abstract Your Observability Tooling

The foundation of any negotiation is leverage. One of the most effective ways to build it is to decouple your telemetry data generation and collection from your observability vendor. Open-source tooling like OpenTelemetry (OTel) allows you to collect and send data to multiple observability platforms simultaneously, giving you the flexibility to switch vendors or use multiple services as needed. Because you're no longer locked into a single vendor, your bargaining power goes up. Shameless plug, but observability pipeline tools like Datable.io also make it easy to evaluate and switch between different vendors by applying routing and filtering logic to your data in real time.

While moving from a vendor’s tools to OTel can be a significant engineering lift, the credible threat of making that move (and sending less data) is often enough to unlock some discounts.
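The decoupling idea in this step can be illustrated with a small, standard-library-only Python sketch. It is a conceptual toy, not the OpenTelemetry API and not how any particular pipeline product is implemented: the `Destination` class, the `route` function, and the vendor names are all hypothetical. It shows the same records fanned out to two backends, with a filter applied only to the one you are trying to send less data to.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]  # one telemetry record, e.g. a log event


def keep_all(record: Record) -> bool:
    """Default rule: forward everything."""
    return True


def drop_info(record: Record) -> bool:
    """Forward everything except INFO-level records."""
    return record.get("level") != "INFO"


class Destination:
    """A hypothetical backend: a name, a keep/drop rule, and what it received."""

    def __init__(self, name: str, keep: Callable[[Record], bool] = keep_all):
        self.name = name
        self.keep = keep
        self.received: List[Record] = []

    def send(self, record: Record) -> None:
        # Only records that pass this destination's rule are forwarded.
        if self.keep(record):
            self.received.append(record)


def route(records: List[Record], destinations: List[Destination]) -> None:
    """Fan every record out to every destination; each applies its own rule."""
    for record in records:
        for destination in destinations:
            destination.send(record)


records = [
    {"level": "INFO", "msg": "request served"},
    {"level": "ERROR", "msg": "payment failed"},
    {"level": "INFO", "msg": "cache hit"},
]

current_vendor = Destination("current-vendor", keep=drop_info)
trial_vendor = Destination("trial-vendor")  # receives everything for evaluation
route(records, [current_vendor, trial_vendor])

for destination in (current_vendor, trial_vendor):
    print(destination.name, len(destination.received))
```

The point of the design is that the producers of `records` never mention a vendor. Changing who receives what becomes a routing decision made in one place, which is what makes the threat of switching credible.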

Step 3: Play Vendors Off Each Other

Churn is a four-letter word. In a very similar vein to abstracting your observability tooling, just invoking a competitor’s name will instill fear in your sales rep.

The observability market is competitive, and playing vendors against each other is a highly effective negotiation tactic. Let each vendor know that they're in competition for your business. This can lead to more competitive pricing and better service offerings as each vendor tries to win or retain your business.

Step 4: Time Your Negotiation

Timing can significantly impact the outcome of your negotiations. Engaging with your sales rep towards the end of a quarter or fiscal year can be shockingly successful, as sales teams are often under pressure to meet quotas and more likely to offer concessions to close deals. This is a well-known strategy in sales negotiations and can be particularly effective in the SaaS industry.

Step 5: Just Ask Again

If you followed the above steps, congratulations, you’re holding all the cards! At this point, you can get creative. I’ve seen customers drop their data volume by 10x as part of their negotiation strategy. All is fair in love and observability.

Following these steps isn’t a guarantee, but it’s more than worth a try. After dropping INFO-level logs, negotiating your bill is the fastest and most effective way to cut your cloud costs.

Wrapping up

At the end of the day, everyone needs observability, and vendors know it. Observability vendors do provide a lot of value, but the cost of using these tools is unnecessarily opaque. With a little confidence and the right leverage, you can make a big impact on your cloud spend in a fairly short amount of time.

I hope you found today’s article useful. In our next blog, we’ll look at some of the technical and architectural decisions you can implement to reduce your observability spend and increase your operational efficiency.

At Datable.io, we're in the business of saving users money and improving the usefulness of telemetry data. If you're interested in taking control of your observability data, sign up today.

Stay tuned!

