How We Built OverOps for Enterprise-Scale Environments

17th Dec 2019


OverOps analyzes code at runtime to tell you when, where and why code breaks in the world’s most demanding production and pre-production environments.

OverOps is a continuous reliability solution for every stage of the SDLC. It enables organizations to identify, prevent and resolve the most critical issues before customers are affected. Whether you’re employing a shift-left strategy or closely monitoring complex production systems, OverOps is optimized for performance and security in any environment.

How Do We Keep Overhead Low While Running In-Depth Analysis?

OverOps analyzes code at runtime and is built to operate under strict performance constraints in both production and pre-production environments. It runs on your server as a micro-agent (per VM) and a daemon collector process. Together, these components detect all events inside your app (caught, uncaught and swallowed exceptions, HTTP errors, and logged errors and warnings) and collect the code and variable data needed to reproduce and fix errors.

OverOps supports all JVM- and CLR-based languages. Because it runs as a micro-agent, it doesn’t require you to change your code or build configuration. Heavy lifting such as complex analysis and processing is offloaded to cloud computing resources, so your server is only marginally affected. Exceptions, HTTP errors, and logged errors and warnings are detected directly from the JVM, without relying on log files.

This design keeps CPU and I/O overhead extremely low, minimizing the effect on your production environment.

How it Works

We use six methods to reduce overhead:

1. Minimal CPU Overhead 

OverOps reacts only to errors and does not affect normal code execution. Even if one of your transactions is experiencing a high rate of failures (whether expected or not), OverOps’ micro-agent auto-adjusts to ensure there’s no impact on throughput.
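
As an illustration only, one common way to keep a burst of failures from consuming CPU is to cap how many full error snapshots a single transaction gets per time window and merely count the rest. The Python sketch below shows that idea under those assumptions; `CaptureThrottle`, the window length and the cap are hypothetical and are not OverOps’ implementation.

```python
import time
from collections import defaultdict


class CaptureThrottle:
    """Hypothetical sketch: cap full error snapshots per transaction per time window.

    Errors past the cap are still counted, so statistics stay accurate, but they
    skip the expensive snapshot step.
    """

    def __init__(self, max_captures_per_window, window_seconds, clock=time.monotonic):
        self.max_captures = max_captures_per_window
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.captured = defaultdict(int)  # transaction -> snapshots taken this window
        self.counted = defaultdict(int)   # transaction -> errors seen this window

    def _roll_window(self):
        # Start a fresh window once the current one has expired.
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.captured.clear()
            self.counted.clear()

    def should_capture(self, transaction):
        """Record one error; return True only if a full snapshot should be taken."""
        self._roll_window()
        self.counted[transaction] += 1
        if self.captured[transaction] < self.max_captures:
            self.captured[transaction] += 1
            return True
        return False


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
throttle = CaptureThrottle(max_captures_per_window=3, window_seconds=60,
                           clock=lambda: fake_now[0])
decisions = [throttle.should_capture("checkout") for _ in range(5)]
errors_seen = throttle.counted["checkout"]
fake_now[0] = 61.0  # the next window begins
after_reset = throttle.should_capture("checkout")
```

The first three errors in the window are captured in full, the next two are only counted, and capturing resumes once a new window starts.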

2. RAM Overhead

OverOps’ installed components operate within a pre-allocated block of memory, so RAM consumption cannot grow uncontrollably and remains virtually unnoticeable. OverOps’ RAM consumption doesn’t affect the JVM being monitored.
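
To make the idea of a pre-allocated block concrete, here is a minimal Python sketch of a fixed-slot ring buffer: all storage is allocated once in the constructor, and new records overwrite the oldest ones instead of growing the buffer. The class and the slot sizes are hypothetical; this post doesn’t say how OverOps lays out its memory block.

```python
from __future__ import annotations


class FixedSlotBuffer:
    """Hypothetical sketch: a ring of fixed-size slots allocated once, up front.

    Storage stays at slot_count * slot_size bytes however many records are
    written; once every slot is used, each new record overwrites the oldest one.
    """

    def __init__(self, slot_count: int, slot_size: int):
        self.slot_count = slot_count
        self.slot_size = slot_size
        self.storage = bytearray(slot_count * slot_size)  # allocated once, never resized
        self.lengths = [0] * slot_count  # bytes actually used in each slot
        self.next_slot = 0
        self.used = 0

    def write(self, record: bytes) -> None:
        """Copy a record into the next slot, truncating it to the slot size."""
        data = record[: self.slot_size]
        start = self.next_slot * self.slot_size
        self.storage[start : start + len(data)] = data  # same-length slice: size unchanged
        self.lengths[self.next_slot] = len(data)
        self.next_slot = (self.next_slot + 1) % self.slot_count
        self.used = min(self.used + 1, self.slot_count)

    def records(self) -> list[bytes]:
        """Return the stored records, oldest first."""
        oldest = (self.next_slot - self.used) % self.slot_count
        result = []
        for i in range(self.used):
            slot = (oldest + i) % self.slot_count
            start = slot * self.slot_size
            result.append(bytes(self.storage[start : start + self.lengths[slot]]))
        return result


buf = FixedSlotBuffer(slot_count=3, slot_size=8)
for record in (b"a", b"bb", b"ccc", b"dddd"):
    buf.write(record)
stored = buf.records()  # b"a" has been overwritten by b"dddd"
buf.write(b"0123456789")  # longer than a slot, so only b"01234567" is kept
```

Overwriting the oldest data is one possible policy when the block is full; rejecting new records is another, and which one fits depends on whether recent or early errors matter more.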

3. Capped Network Overhead (< 50MB per hour)

Error information captured by the agent is placed in shared memory and sent to storage by the daemon process. The information needed to display the in-depth error analysis is highly compact and never uses more than 50MB of network bandwidth per hour.
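
A hard per-hour cap like this can be enforced with a simple byte budget. The sketch below assumes a fixed one-hour window and reads "50MB" as 50 MiB; `HourlyByteBudget` and its parameters are illustrative, not OverOps’ actual metering logic.

```python
import time

BYTES_PER_MB = 1024 * 1024  # assumption: "MB" read as mebibytes


class HourlyByteBudget:
    """Hypothetical sketch: refuse uploads that would exceed a byte cap per fixed window."""

    def __init__(self, cap_bytes=50 * BYTES_PER_MB, window_seconds=3600.0, clock=time.monotonic):
        self.cap_bytes = cap_bytes
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.sent = 0  # bytes sent in the current window

    def try_send(self, payload_size):
        """Return True and record the bytes if they fit in this window's budget."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now  # a new hour begins with a full budget
            self.sent = 0
        if self.sent + payload_size > self.cap_bytes:
            return False  # caller keeps the payload queued (or drops it) until later
        self.sent += payload_size
        return True


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
budget = HourlyByteBudget(clock=lambda: fake_now[0])
results = [budget.try_send(n * BYTES_PER_MB) for n in (30, 25, 20)]  # 30 fits, 25 would exceed, 20 fits
fake_now[0] = 3600.0
next_hour = budget.try_send(25 * BYTES_PER_MB)
```

A fixed window is simple, but it can let up to twice the cap through in a rolling hour that straddles a window boundary; a sliding window or token bucket would enforce the limit more strictly.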

4. No Dependency on OverOps

OverOps’ Java micro-agent does not depend on the availability of OverOps’ central analysis service or the local collector process. In the unlikely event that either becomes unavailable, the micro-agent enters a dormant state that does not affect the execution of code inside the JVM.
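
The key property is that reporting an error never makes the application thread wait on an external component. This hypothetical Python sketch hands events to a bounded queue with `put_nowait`, drops them when the queue is full, and drops everything while the collector is flagged as unavailable. How OverOps detects that the collector is down isn’t covered here, so the `collector_available` flag stands in for an assumed health check.

```python
import queue


class NonBlockingReporter:
    """Hypothetical sketch: hand events to a collector queue without ever blocking.

    While the collector is flagged unavailable, the reporter is dormant and drops
    events; when the queue is full, events are dropped rather than waited on.
    """

    def __init__(self, capacity):
        self.outbox = queue.Queue(maxsize=capacity)
        self.collector_available = True  # assumed to be updated by a health check
        self.dropped = 0

    @property
    def dormant(self):
        return not self.collector_available

    def report(self, event):
        if self.dormant:
            self.dropped += 1  # dormant: do nothing that could slow the caller
            return
        try:
            self.outbox.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # never wait; the application thread keeps running


reporter = NonBlockingReporter(capacity=2)
for event in ("e1", "e2", "e3"):
    reporter.report(event)  # e3 does not fit and is dropped immediately
reporter.collector_available = False
reporter.report("e4")  # dormant: dropped without touching the queue
```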

OverOps’ micro-agent collects statistics on exceptions and error logging directly from the JVM, regardless of whether and how events are eventually logged. OverOps does not access or upload log files from your machine.

5. Garbage Collection

Unlike other tools, OverOps runs at the native JVM level and does not allocate Java objects at run-time. The information is placed directly in shared memory outside the managed heap, to ensure that no overhead is added to your application’s garbage collection (GC) time.
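
Python has no JVM-style managed heap, so the following is only a loose analogy for the agent-to-daemon handoff: the agent side copies a length-prefixed record into a named `multiprocessing.shared_memory` segment, and a separate process (standing in for the daemon) attaches to the same segment by name and reads it. The record layout and the helper names `write_record` and `read_record` are assumptions for illustration.

```python
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # hypothetical layout: 4-byte little-endian length prefix


def write_record(segment, payload):
    """Copy a length-prefixed payload into the start of a shared memory segment."""
    if HEADER.size + len(payload) > segment.size:
        raise ValueError("payload does not fit in the segment")
    HEADER.pack_into(segment.buf, 0, len(payload))
    segment.buf[HEADER.size:HEADER.size + len(payload)] = payload


def read_record(segment):
    """Read the length-prefixed payload back out as bytes."""
    (length,) = HEADER.unpack_from(segment.buf, 0)
    return bytes(segment.buf[HEADER.size:HEADER.size + length])


agent_side = shared_memory.SharedMemory(create=True, size=4096)
try:
    write_record(agent_side, b"NullPointerException in OrderService.place")
    # A separate daemon process would attach by name; here we attach in-process.
    daemon_side = shared_memory.SharedMemory(name=agent_side.name)
    received = read_record(daemon_side)
    daemon_side.close()
finally:
    agent_side.close()
    agent_side.unlink()  # the creating side removes the segment when done
```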

6. Disk Space

OverOps’ JVM micro-agent is designed to use only a few MB of your disk space. A daily maintenance job clears unused files, and the OverOps daemon enforces a predefined disk space limit. Apart from the micro-agent, disk usage is around 400MB and never affects the JVM being monitored.
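
A disk limit combined with a daily cleanup job can be sketched as a single maintenance pass: delete files that haven’t been modified within a maximum age, then, if the directory is still over its cap, delete the oldest remaining files first. The function name, thresholds and oldest-first policy are assumptions, not a description of the OverOps daemon.

```python
from __future__ import annotations

import os
import tempfile
import time
from pathlib import Path


def enforce_disk_limit(directory: Path, max_total_bytes: int, max_age_seconds: float,
                       now: float | None = None) -> list[Path]:
    """Hypothetical maintenance pass; returns the files it deleted."""
    now = time.time() if now is None else now
    files = sorted((p for p in directory.iterdir() if p.is_file()),
                   key=lambda p: p.stat().st_mtime)  # oldest first
    removed = []
    kept = []
    # Pass 1: files not modified within max_age_seconds count as unused.
    for path in files:
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
        else:
            kept.append(path)
    # Pass 2: if still over the cap, evict the oldest remaining files.
    total = sum(p.stat().st_size for p in kept)
    for path in kept:
        if total <= max_total_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    now = time.time()
    # name -> (size in bytes, age in seconds)
    layout = {"a.dat": (1000, 2 * 86400), "b.dat": (600, 3600),
              "c.dat": (600, 60), "d.dat": (600, 10)}
    for name, (size, age) in layout.items():
        path = root / name
        path.write_bytes(b"\0" * size)
        os.utime(path, (now - age, now - age))
    removed_names = sorted(p.name for p in enforce_disk_limit(
        root, max_total_bytes=1000, max_age_seconds=86400, now=now))
    remaining_names = sorted(p.name for p in root.iterdir())
```

Here `a.dat` is removed as stale, then `b.dat` and `c.dat` are evicted oldest-first until the directory fits under the cap. A scheduler would run a pass like this once a day.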

Meet Our 5 Security Layers

In order to keep your information safe, we implement multiple levels of encryption and administration while making sure you have complete visibility and control over how data is collected and managed.

We use 5 main layers to ensure the security, protection and privacy of collected code and data.

1. Data Redaction

OverOps enables you to filter out personal and business-sensitive information before it leaves your machine. We offer two filtering options for redacting variable data at runtime: pattern-based and identifier-based.

Pattern filtering asynchronously scans all collected variable values against sets of predefined regex patterns to identify values such as phone numbers, credit card numbers or other personal information about your users. You can choose from a default list, or add regex patterns for any other data that is sensitive to you or your users.
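
Pattern-based redaction comes down to running each captured value through a list of regular expressions and replacing every match. The patterns in this Python sketch are simplified examples, not OverOps’ default list, and it runs synchronously for clarity, whereas OverOps scans asynchronously.

```python
import re

# Simplified example patterns; not OverOps' defaults and not production-grade.
DEFAULT_PATTERNS = {
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),  # 13 to 16 digits
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b"),
}

REDACTED = "<redacted>"


def redact_value(value, patterns=DEFAULT_PATTERNS):
    """Replace every match of every pattern in a captured string value."""
    for pattern in patterns.values():
        value = pattern.sub(REDACTED, value)
    return value


sample = "card 4111 1111 1111 1111, call (555) 123-4567, mail jane.doe@example.com"
redacted = redact_value(sample)

# Adding a custom pattern, here a hypothetical internal employee ID format.
patterns = {**DEFAULT_PATTERNS, "employee_id": re.compile(r"\bEMP-\d{6}\b")}
custom = redact_value("owner EMP-004211", patterns)
```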

Identifier filtering redacts all data collected from predefined variables, fields and classes in your application. As with pattern filtering, you can create your own customized list and specify the exact class names that should never be collected.
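
Identifier-based redaction works on names rather than values. In this hypothetical Python sketch, a variable is redacted if its name is on a blocklist, and every variable captured from a blocklisted class is redacted; the class and variable names are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REDACTED = "<redacted>"


@dataclass
class IdentifierFilter:
    """Hypothetical sketch of identifier-based redaction using two blocklists."""

    classes: set[str] = field(default_factory=set)  # classes whose variables are never kept
    names: set[str] = field(default_factory=set)    # variable/field names never kept

    def apply(self, class_name: str, variables: dict[str, object]) -> dict[str, object]:
        """Return a copy of the captured variables with blocked values replaced."""
        if class_name in self.classes:
            return {name: REDACTED for name in variables}
        return {name: REDACTED if name in self.names else value
                for name, value in variables.items()}


flt = IdentifierFilter(classes={"com.example.billing.CardDetails"}, names={"password", "ssn"})
login = flt.apply("com.example.auth.LoginRequest", {"username": "jane", "password": "hunter2"})
card = flt.apply("com.example.billing.CardDetails", {"number": "4111111111111111", "expiry": "12/29"})
```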

Both redaction modes are enabled by default.

2. Private Encryption

We encrypt information using 256-bit AES encryption and a set of private keys known only to you, so the information can only be viewed by you, locally. To offload work from your local JVM and analyze errors efficiently, we convert bytecode at runtime into an encrypted abstract graph structure.

Our conversion process removes all jar, package, class, field, method and variable names from your code and from any Java or third-party framework code. It also removes all logical and numeric operators, number and string constants, and code attributes.

OverOps’ abstract graph structure cannot be reverse engineered or executed, so you can be certain your information will not reach unwanted hands. This structure is the fastest method of collecting code fragments and variable values in order to maintain low CPU and IO overhead.

3. Data Storage

OverOps offers three modes for storing code and variable data collected on your machines:

  • On-premise – Code and variables are redacted for PII (Personally Identifiable Information) and privately encrypted. This information is stored and analyzed locally, so it stays behind your firewall and doesn’t leave your network
  • Hybrid – Code and variables collected are redacted for PII, encrypted locally and stored on your machine. Only the metadata is sent out to OverOps’ cloud for analysis
  • Hosted – Code and variables are collected on your machine, redacted for PII and encrypted locally, using your private encryption key. After this process, your encrypted information is stored in the cloud (AWS)

4. User and Access Point Control

The administrator of each cluster, which is defined by the installation key, has full control over the information and the team members who are exposed to it.

You or your admin can control which team members have access to the error analyses collected from monitored machines. You can also restrict access to the service so that authenticated users can connect only from a specific IP address range, such as your corporate network or VPN.
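
Restricting access to an IP range amounts to checking each client address against a list of allowed networks. Here is a minimal sketch using Python’s standard `ipaddress` module, with example ranges (`10.0.0.0/8` for a private network and the documentation range `203.0.113.0/24`) standing in for your own; the check would run in addition to user authentication, which isn’t shown.

```python
import ipaddress

# Example ranges only; substitute your own network or VPN ranges.
ALLOWED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in ("10.0.0.0/8", "203.0.113.0/24")]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if the client address falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    # Membership is always False across IP versions (IPv6 address vs IPv4 network).
    return any(address in network for network in networks)
```

Parsing the allowed ranges once at startup, as above, keeps the per-request check down to a few integer comparisons.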

5. Secure Transport

Communication between OverOps’ daemon process and our central analysis service, which is hosted on AWS, takes place over outbound signed HTTPS on port 443 to a set of fixed IP addresses. You are not required to open any inbound ports.

Fast, Reliable and Secure

OverOps was built with performance and security in mind. Every new version and feature goes through a rigorous testing process to ensure there’s no negative effect on performance and that all data is safe and secure. This ensures our customers can identify, prevent and resolve critical issues in any environment, at any scale.

For more information about OverOps’ architecture and security, visit our website or schedule a demo with one of our solutions experts.

Nicole is a communications and product marketing manager at OverOps. Her expertise includes technologies ranging from artificial intelligence and predictive analysis to DevOps, incident management and more.
