OverOps Automated Timers And Performance Monitoring in Splunk

 ● 02nd Oct 2018

6 min read

We are excited to sponsor the Splunk user conference, .conf18, in Orlando this week, and even more excited to meet hundreds of great Splunk users. Over the years we’ve built a strong integration with Splunk, and as we head into the conference we are highlighting our new app on Splunkbase. We have also added a new capability that will further enhance our partnership and excite the community: automated timers! Finally, ahead of the conference we have deepened our relationship by being approved as a Splunk authorized Technology Alliance Partner (TAP).

Introducing OverOps Automated Timers

Traditionally, timers monitor application load and response times to give you visibility into the cause of a production slowdown or bottleneck. Once an issue is detected, performance monitoring tools capture and present the stack trace so that you can, hopefully, gain insight into what caused the delayed response. However, a stack trace typically shows only a sliver of the problem and doesn’t provide enough information to get to the bottom of the latency issue, and this kind of monitoring typically runs in production only. What about performance monitoring in lower-level (pre-production) environments?
This past week, OverOps added an automated timer capability to our platform. It identifies every entry point into your code and monitors these locations for slowdowns, allowing you to set a target threshold (in milliseconds). If the threshold is exceeded, OverOps captures a root cause snapshot of the entire state of the virtual machine at that moment in time. This extends well beyond the stack trace to include full variable state across the entire stack, environment variables, threads, the frequency and failure rate of the event, classification of new and reintroduced events, and even the associated release numbers for every issue.
This information gives developers code-aware insights, so they can not only see that a threshold was exceeded but also detect and troubleshoot issues more effectively using our rich data and context. Furthermore, OverOps can be used in any environment, so you can catch slowdowns before they ever make it into production.
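To make the threshold idea concrete, here is a minimal sketch in Python of a timer that wraps an entry point and records context when a call runs longer than a target threshold in milliseconds. It is an illustration only and not how OverOps is implemented: OverOps finds entry points automatically and captures far richer state, whereas this sketch relies on a hand-applied decorator. The names `timed_entry_point`, `SNAPSHOTS`, and `handle_request` are hypothetical.

```python
import functools
import threading
import time

# Hypothetical in-memory store for captured snapshots (illustration only).
SNAPSHOTS = []


def timed_entry_point(threshold_ms):
    """Wrap a function and record a snapshot when a call exceeds threshold_ms."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call returned or raised.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                if elapsed_ms > threshold_ms:
                    # A tiny stand-in for a "snapshot": the call's inputs,
                    # its timing, and the thread it ran on.
                    SNAPSHOTS.append({
                        "entry_point": func.__name__,
                        "elapsed_ms": elapsed_ms,
                        "threshold_ms": threshold_ms,
                        "args": args,
                        "kwargs": kwargs,
                        "thread": threading.current_thread().name,
                    })
        return wrapper
    return decorator


@timed_entry_point(threshold_ms=50)
def handle_request(delay_s):
    """Hypothetical entry point that simulates work by sleeping."""
    time.sleep(delay_s)
    return "ok"
```

Calling `handle_request(0.1)` sleeps for about 100 ms, so it exceeds the 50 ms threshold and appends one entry to `SNAPSHOTS`. A fast call such as `handle_request(0.0)` leaves the list unchanged.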

Automated Timers, OverOps and Splunk

At .conf18 this week, we are excited not only to debut our automated timers but also to show attendees how we can surface these issues directly in Splunk! We’ve been busy building this integration and creating new dashboards that not only present this information but, more importantly, allow practitioners to drill down into the exact root cause of the slowdown using the rich OverOps snapshot data.

[Screenshot: Application Performance in Splunk]

As you can see in the screenshot above, we are showing a slowdown in Splunk, with direct links to investigate what happened using OverOps. We give visibility into these performance issues and allow you to drill into them and classify them by release, app, or various other characteristics.
This capability is a valuable extension of how we already add value in Splunk beyond your log files. While log files help identify the “when” or “where” of what went wrong, they still require a lot of work to get to the “why” when troubleshooting an issue. At OverOps, we deliver a whole new approach to gaining insight into the quality of your apps and services. Using both static and dynamic code analysis, we capture complete, code-aware insight into every known and unknown error and exception at the moment it occurs.

Building on our Existing Value with Splunk

We’ve already built a diverse and extensive integration with Splunk. Our metrics dashboard is available on Splunkbase today, and a large percentage of our customers use the two tools together to investigate root cause. We are also integrating with ITSI so that our rich data can extend the predictive analysis features of that new product.

  • OverOps inserts links into your current Splunk log files
    With OverOps links inserted into a log file, you can find where the error occurred using Splunk and then easily link to OverOps to access complete insight into what happened. A Splunk user can now see these links and connect directly to the powerful OverOps ARC UI to troubleshoot issues with this complete information. This can both boost developer productivity and have a huge impact on the QA-to-dev conversation, as your team is armed with exact information about every event.
  • OverOps data within your Splunk Metrics Dashboard
    OverOps also exposes all the data we collect through direct integration with Splunk, so you can profile the OverOps data directly in a Splunk dashboard and see at a glance the applications, deployments, servers, and methods that were affected by different issues. It provides granular information about application-level events and errors classified by “New Today”, “Resurfaced”, “Network”, “Database”, and more. You can analyze and monitor your application’s health, including the number of new errors that were detected, the number of errors that resurfaced, and other information that is useful for your team and product. This is available on Splunkbase today!
  • IT Service Intelligence (ITSI) integration and how great data makes AIOps a differentiator
    A key emerging capability from Splunk is the IT Service Intelligence (ITSI) product, which brings artificial intelligence to events so you can get visibility across IT and business services. It enables you to use AI to go from reactive to predictive IT. OverOps understands the importance of data in this new effort, and our granular set of contextual insights provides an incredibly useful baseline for deep, effective AI. Our API and extracts can be used within ITSI.

The addition of timers to our platform gives Splunk users yet another reason to love our partnership. By integrating OverOps’ rich data with Splunk, we are able to expose critical performance issues in the same tool you use to identify errors and exceptions. Splunk users now have a single console to not only identify what went wrong, but to also troubleshoot more efficiently.
If you are headed to .conf18, swing by booth M21 to see firsthand how OverOps works with Splunk. If you would like to arrange a meeting before the conference or separate from the event, please visit us at overops.com/splunk.

Jim is a 20-year veteran of tech and a developer turned marketer. During his time at OverOps, Jim was responsible for the company narrative and drove field enablement and press/analyst relations.
