New OverOps Reliability Dashboards Deepen Code-Level Visibility Across the Entire SDLC

February 5, 2019


New deployment scores and release certification help QA, DevOps and SRE teams detect anomalies across versions in pre-production and production to proactively prevent Sev1 issues

Today, we’re excited to debut our new Reliability Scoring capabilities, which deepen enterprise visibility across both pre-production and production environments and let you automatically identify and prioritize anomalies before a release, so that bad code is not promoted.

As the pace of software delivery quickens, the risk of poor application quality increases. We help development, QA and operations teams balance this speed-stability paradox by using our unique vantage point inside the application to detect anomalies that existing monitoring tools would otherwise miss. Using micro-agents that operate between application code and the hardware, we capture data that was not previously available, then deduplicate and classify issues and gate critical ones before they move into staging and production.

Our new Reliability Dashboards help organizations visualize this data in out-of-the-box Grafana dashboards, or in any tool of their choice through open REST APIs, so they can see at a glance where the issues are and drill into the cause with one click. Executives can see instantly whether code quality is improving across teams, applications or services, and identify areas of weakness that may require additional attention or resources.

Highlights of the new OverOps Reliability Dashboards include:

Reliability Scorecards and Release Certification

The OverOps Reliability Scorecard allows DevOps teams to observe the reliability of their environment at the highest level and triage critical issues that need attention. Each deployment, application and infrastructure tier is assigned its own dynamic score derived from the detection, classification and prioritization of all anomalies – including newly introduced errors, increasing errors and performance slowdowns.
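To make the idea of a score derived from classified anomalies concrete, here is a minimal sketch in Python. It is not the OverOps formula, which this post does not describe. The penalty weights, the `Anomaly` type and the `reliability_score` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical penalty weights per anomaly type (assumption: the real
# scoring formula is not published in this post).
PENALTIES = {"new_error": 10, "increasing_error": 5, "slowdown": 3}


@dataclass(frozen=True)
class Anomaly:
    kind: str               # one of the PENALTIES keys
    critical: bool = False  # whether the anomaly was prioritized as critical


def reliability_score(anomalies, critical_multiplier=2):
    """Return a 0-100 score: start at 100 and subtract a penalty per anomaly."""
    penalty = 0
    for anomaly in anomalies:
        weight = PENALTIES[anomaly.kind]
        if anomaly.critical:
            weight *= critical_multiplier
        penalty += weight
    # Clamp at zero so a very noisy deployment does not go negative.
    return max(0, 100 - penalty)
```

In this sketch a deployment with no anomalies scores 100, and each new error, increasing error or slowdown lowers the score, with critical anomalies counting double.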

Using these scores, organizations can certify releases to move through their delivery pipeline, or stop them in their tracks to proactively fix any issues. Through the new Jenkins integration, QA teams can see all new anomalies introduced by a release in test or stage, and automatically assign each one a severity based on its potential impact on the code. OverOps certifies each release based on how many issues it introduced, and can automatically stop a risky release from being promoted, sending it back to the engineers with True Root Cause.
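The certification step described above can be pictured as a simple quality gate. The sketch below is an illustration only, not the OverOps Jenkins integration: the severity labels, the default limits and the `certify_release` function are assumptions made for this example.

```python
from collections import Counter

# Hypothetical limits: how many new anomalies of each severity a release
# may introduce and still be certified (assumption, not OverOps defaults).
DEFAULT_LIMITS = {"critical": 0, "major": 2}


def certify_release(severities, limits=None):
    """Decide whether a release may be promoted.

    severities: list of severity labels, one per new anomaly in the release.
    Returns (certified, violations), where violations maps each severity
    that exceeded its limit to the observed count.
    """
    limits = DEFAULT_LIMITS if limits is None else limits
    counts = Counter(severities)
    violations = {
        severity: counts[severity]
        for severity, limit in limits.items()
        if counts[severity] > limit
    }
    return (not violations, violations)
```

A build step could call `certify_release` with the anomalies found in test or stage and exit with a non-zero status when the first element of the result is `False`, which most CI servers, Jenkins included, treat as a failed stage.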

True Root Cause Drill-Down

From the Reliability Scorecard dashboard, users can drill into the details of low-scoring deployments, applications or infrastructure tiers, such as AWS or an RDBMS. The Reliability Analysis dashboard shows the corresponding anomalies and allows users to click straight into the True Root Cause screen, powered by OverOps’ ARC AI technology, where they can view the code and variable state at the moment of an error across the entire call stack, as well as environment state and DEBUG-level statements. With this complete context, QA, DevOps and SRE teams can easily route issues back to the right developer, arming them with everything needed to fix the error, whether programmatic or operational.

Reliability Trends Over Time

The OverOps Reliability Trends dashboard provides a simple and effective way of comparing releases, or two instances of an application running on different nodes, to identify patterns. Building on this capability, the dashboard provides executives with an easy way to see how well their applications and deployments are doing over time with respect to error volume, unique error count, newly introduced or increasing errors, and slowdowns. At a glance, VPs and CXOs can see which applications are falling behind, as well as which teams require more attention. By understanding application quality release over release, executives can make informed decisions about resources and protect application revenue and customer experience.
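A release-over-release comparison of the four measures named above can be sketched as follows. The metric keys and the `compare_releases` function are hypothetical, and the numbers in the usage note are made up for illustration. The real dashboard's data model is not described in this post.

```python
# Hypothetical metric names mirroring the measures listed in the text.
METRICS = (
    "error_volume",
    "unique_errors",
    "new_or_increasing_errors",
    "slowdowns",
)


def compare_releases(previous, current):
    """Compare two releases metric by metric.

    previous, current: dicts mapping each name in METRICS to a count.
    Returns (deltas, regressed): deltas holds current minus previous for
    each metric, and regressed lists the metrics that got worse (rose).
    """
    deltas = {metric: current[metric] - previous[metric] for metric in METRICS}
    regressed = [metric for metric in METRICS if deltas[metric] > 0]
    return deltas, regressed
```

For example, comparing a release with 1,200 errors and 4 slowdowns against a successor with 900 errors and 6 slowdowns would report a negative delta for `error_volume` and flag `slowdowns` as regressed.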

General Availability

OverOps Reliability Dashboards are now available out-of-the-box! For more information, visit

Additional Resources

Read our whitepaper about the four quality gates every SRE team must check before promoting code.
Read our whitepaper about expecting the unexpected to block Sev1 issues from reaching production.
Sign up to attend a webinar detailing how OverOps helps at every point in the software delivery life cycle.
Watch a live demo to see the OverOps Platform in action.

About Us

OverOps captures code-level insight about application quality in real time to help DevOps teams deliver reliable software. Operating in any environment, OverOps employs both static and dynamic code analysis to collect unique data about every error and exception – both caught and uncaught – as well as performance slowdowns. This deep visibility into an application’s functional quality not only helps developers more effectively identify the true root cause of an issue, but also empowers ITOps to detect anomalies and improve overall reliability. As more organizations aim to innovate faster and deliver a seamless digital experience for their customers, OverOps helps avoid costly downtime that can lead to lost revenue and brand degradation. Backed by Lightspeed Venture Partners and Menlo Ventures, OverOps enterprise customers include Comcast, TripAdvisor and Intuit. The company has offices in San Francisco and Tel Aviv.

Tali is a content manager at OverOps covering topics related to software monitoring challenges. She has a degree in theoretical mathematics, and in her free time, she enjoys drawing, practicing yoga and spending time with animals.
