Consider this our definitive guide to where OverOps’ continuous reliability solution fits into your CI/CD pipeline.
Confused about all the different vendors in our ecosystem? You’re not alone.
The Cloud Native landscape contains over 1,400 projects and companies, and you literally have to zoom in to see all the logos. So it’s no surprise that engineering teams are overwhelmed by vendor proliferation.
While we’re thrilled to see the ecosystem thriving, we know that with all the overlapping messages it’s sometimes hard to understand what’s what. After all, every vendor under the sun will promise you speed, productivity and reduced costs. Uncovering what they mean by that and what they actually do can often feel like having to solve a riddle.
In this post, we’ll help you decide if OverOps might be a good fit and provide a no-nonsense, simple, and factual description of what we do and where we fit.
First things first: does your organization rely on mission-critical Java- or .NET-based applications? If not, we’re not going to be a fit. But if it does, there’s a high likelihood that you’ll find the following content useful.
Everyone’s Pipeline is Different, but the Basics are the Same
Before we dive into where OverOps fits, let’s make sure we’re all on the same page. Although software delivery pipelines can get pretty complex, at the most basic level they generally look the same: code is written locally in the development environment, and then we’re off to the races through testing, staging and production.
Different teams opt for different levels of automation depending on their needs, and the infrastructure layer slowly but surely blends in with the application layer, but the underlying principles remain the same. The most common solutions we see our customers using here are CI pipelines like CloudBees Jenkins and GitLab, with increasing interest in dedicated CD solutions like Spinnaker and Armory. In terms of architecture, we’re seeing a growing trend of transitioning to container- and microservices-based architectures, on top of both public and private clouds.
How Do You Ensure Code Quality and Reliability Across the Pipeline?
For the next step, let’s focus solely on the application layer and the tools and processes that are used to ensure code quality and reliability. Most of the engineering teams we work with already have at least one solution in each of the main categories: static analysis, testing, log management and metrics, and APM.
The bigger the application, the more solutions it’s likely to rely on, and it’s not uncommon for engineering teams to rely on 10+ vendors to inform the go/no-go decisions they need to make between stages and in production. In fact, we didn’t even mention the dedicated alerting and incident management tools that play a big role in communication for engineering teams. Unfortunately, even with all that, code still breaks, negatively impacting customer experience and developer productivity. What used to be enough to meet customer expectations is no longer acceptable in today’s world where applications are an integral part of how we live our lives. The point we want to make here is that while existing solutions are essential, they’re not sufficient. Tests require foresight and can’t detect 100% of errors, logs & metrics are noisy, and APMs lack developer context, catering mostly to Ops and SRE personas.
Continuous Reliability – the Missing Piece in CI/CD
In one sentence, OverOps is a continuous reliability solution that analyzes code at runtime across the entire pipeline, from development to production, to deliver application error analytics that help engineering teams identify and resolve critical application errors. Continuous Reliability (CR) has emerged as the missing piece in the CI/CD pipeline. It groups together a set of practices that empower development teams to deliver reliable software, improve customer experience and drive innovation with rapid releases by ensuring that code is always in a deployable state, even in the face of thousands of development teams making changes on a daily basis. It’s the difference between hoping that your code will work when released to production and knowing that it will.
For pre-production and CI pipelines, we help engineering teams shift left. OverOps acts as a quality gate that’s integrated into existing static analysis, testing and CI pipeline solutions, reporting on unknown errors and informing go/no-go decisions for new releases. In production and CD pipelines, we help you shift right: OverOps helps limit the blast radius of new errors by proactively detecting issues and capturing the relevant variable context, and it integrates with your existing log management and APM solutions as part of the CD pipeline.
In both cases, OverOps enables an automated feedback loop that’s integrated with logs, APM, ticketing and alerting to deliver rich error snapshots with source code, variable state, relevant logs, and operational context for every critical error.
Now, we know that the last thing you need is another dashboard. This is why we provide our customers with deep, code-level data directly within the tools they already use, and why the number of available use cases keeps growing as our vision for Continuous Reliability is on its way to becoming a reality.
Let’s Dive into How This Might Look in Practice
Imagine the following scenario: your team has just promoted a new version into production, all tests have passed, and no outstanding issues were identified in staging. The logs look okay, your APM is not reporting any issues, and then… it all starts. It could be a spike in online shoppers that are unable to complete their transaction, patients that are running into issues on video calls with their doctor, or a bank application that suddenly hangs when you try to view your balance. If something can go wrong, it most certainly will.
And while there could be almost infinite underlying reasons for these types of errors, one area that often causes problems is exceptions and application errors, specifically those that weren’t logged, or as we like to call them, “swallowed exceptions.” But even for those that are logged, an average application produces so much noise that it can be impossible to understand what to prioritize and what to ignore. Even when dealing with known errors, getting down to their code-level root cause is anywhere between hard and impossible. By root cause we don’t mean a stack trace, but the actual code that ran, with the variable state that caused the error.
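To make the idea of a swallowed exception concrete, here is a minimal Java sketch. It is a generic illustration, not OverOps code: the class name, the method names and the in-memory `LOG` list are all hypothetical. The first method catches an exception and discards it, so nothing reaches the logs; the second records the variable state that caused the failure.

```java
import java.util.ArrayList;
import java.util.List;

public class SwallowedExceptionDemo {

    // Hypothetical in-memory "log" so the example can show what reaches it.
    static final List<String> LOG = new ArrayList<>();

    // Swallowed: the exception is caught and discarded. Nothing is logged,
    // and the caller only sees a default value that looks like valid data.
    static int parseQuantitySwallowed(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return 0; // the failure and the offending input are lost here
        }
    }

    // Logged with variable state: the offending input is recorded
    // before falling back to the same default value.
    static int parseQuantityLogged(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            LOG.add("Invalid quantity, raw=\"" + raw + "\": " + e.getMessage());
            return 0;
        }
    }

    public static void main(String[] args) {
        int swallowed = parseQuantitySwallowed("12a");
        System.out.println("swallowed result=" + swallowed + ", log entries=" + LOG.size());

        int logged = parseQuantityLogged("12a");
        System.out.println("logged result=" + logged + ", log entries=" + LOG.size());
    }
}
```

Both methods return the same value for bad input, which is why the first variant is so hard to spot from logs or metrics alone: the only trace of the problem is the exception that was thrown and silently dropped at runtime.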
For these types of errors, the status quo seems to be just to deploy, cross your fingers, and hope no one complains!
With OverOps, we’re able to detect those issues, even if they weren’t logged, classify them into new, critical and swallowed errors, and automatically capture rich error snapshots with deep, code-level data about the state that caused them – either as a quality gate in earlier stages of the pipeline, or in production. In addition, for known issues that you already identified through your logs and APM, we’ll capture additional snapshots and allow you to quickly resolve them without having to add new logs.
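The quality gate idea can be sketched in a few lines of Java. This is only an illustration of a go/no-go rule under stated assumptions: the `ErrorType` enum mirrors the new, critical and swallowed categories named above, but the class, the `passes` method and the choice of which categories block a release are hypothetical and do not represent the OverOps API or its actual gating logic.

```java
import java.util.Arrays;
import java.util.List;

public class QualityGateSketch {

    // Categories taken from the text; the enum itself is a hypothetical illustration.
    enum ErrorType { NEW, CRITICAL, SWALLOWED }

    // Assumed go/no-go rule: the gate fails if any error detected in the
    // build under test belongs to one of the blocking categories.
    static boolean passes(List<ErrorType> detected, List<ErrorType> blocking) {
        for (ErrorType type : detected) {
            if (blocking.contains(type)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical policy: new and critical errors block promotion.
        List<ErrorType> blocking = Arrays.asList(ErrorType.NEW, ErrorType.CRITICAL);
        // Hypothetical errors observed while the test suite ran.
        List<ErrorType> detected = Arrays.asList(ErrorType.SWALLOWED, ErrorType.NEW);

        System.out.println(passes(detected, blocking) ? "GO" : "NO-GO");
    }
}
```

In a CI pipeline, a check like this would typically run after the test stage and fail the build on a no-go result, so the decision to promote a release depends on what actually happened at runtime rather than only on whether the tests passed.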
Start Analyzing your Mission-Critical Applications Today
With OverOps, you get deep, code-level visibility into your Java and .NET applications in testing and production. Sign up for a free trial, or request a live demo to learn more. See how our customers like Aflac, Comcast, TripAdvisor and Intuit are using OverOps today: overops.com/customers/