The top Java anomaly detection tools you should know
Application failures can happen for a wide range of reasons, and there are tools that address each possible source of errors, such as log management tools, error trackers and performance monitoring solutions. We’ve researched this quite a bit, covering the different methods of logging in production, the most common ways to solve Java application errors, and how application monitoring tools can assist in detecting errors.
And here comes the BUT…
The data that these tools collect often contains a lot of noise. How can we know what’s important and what’s not? That’s where anomaly detection tools fit in. In this post we’ll go over some of the tools that focus on detecting anomalies and predicting when they might happen. Let’s check them out.
Anomaly detection tools
1. X-Pack
X-Pack is an extension to the ELK Stack that offers anomaly detection. It uses algorithms that help users understand the behavior of their logs and detect when they’re not acting as usual. The package relies on logs as its data source, letting users understand how specific metrics might impact the product and the way users experience it.
- Detecting anomalies within Elasticsearch log data and metrics
- Identifying security issues by monitoring network activity and user behavior
- Identifying log events that usually lead to an anomaly
How it works:
X-Pack uses Elasticsearch log data and models a baseline of its behavior. By analyzing the logs from the application, servers and services, X-Pack can detect trends and periodicity of use, and analyze the data to try to predict when an issue might occur.
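To make the idea of modeling a baseline a bit more concrete, here’s a minimal sketch in Java. It is not X-Pack’s algorithm, which isn’t described here; it simply assumes that "normal" can be summarized by the mean and standard deviation of recent per-minute event counts. The class name, method names and the threshold of 3 standard deviations are all hypothetical.

```java
import java.util.List;

// Illustrative only: a z-score check against a fixed baseline.
// This is an assumed, simplified technique, NOT X-Pack's actual model.
final class BaselineSketch {

    // Mean of the historical counts (e.g. log events per minute).
    static double mean(List<Double> history) {
        if (history.isEmpty()) {
            throw new IllegalArgumentException("history must not be empty");
        }
        double sum = 0;
        for (double v : history) {
            sum += v;
        }
        return sum / history.size();
    }

    // Population standard deviation of the historical counts.
    static double stdDev(List<Double> history) {
        double m = mean(history);
        double squares = 0;
        for (double v : history) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / history.size());
    }

    // Flags a value that sits more than `threshold` standard deviations from the mean.
    static boolean isAnomalous(List<Double> history, double value, double threshold) {
        double m = mean(history);
        double sd = stdDev(history);
        if (sd == 0) {
            return value != m; // a perfectly flat baseline: any change is unusual
        }
        return Math.abs(value - m) / sd > threshold;
    }

    public static void main(String[] args) {
        List<Double> eventsPerMinute = List.of(100.0, 104.0, 98.0, 101.0, 97.0);
        System.out.println(isAnomalous(eventsPerMinute, 102.0, 3.0)); // false
        System.out.println(isAnomalous(eventsPerMinute, 250.0, 3.0)); // true
    }
}
```

A real baseline would also need to account for the trends and periodicity mentioned above, which a single mean and standard deviation cannot capture.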
The anomaly detection feature is enabled by default when installing X-Pack, and it uses existing ELK cluster privileges and built-in roles to make it easier to control which users have authority to view and manage the jobs, data feeds, and results.
Secret sauce: X-Pack anomaly detection is auto-enabled and aggregates data directly from Elasticsearch. It’s made for those who use ELK and want an anomaly detection solution as part of the Elastic suite of tools.
Bottom line: The “unfair” advantage X-Pack has is its integration with the Elastic suite of tools. With that said, if you’re using ELK, you probably already know that you’re not limited to Elastic’s own tools; there’s a wide ecosystem to choose from. And if you’re not using ELK, this tool is not the one for you.
2. Loom Systems
Loom Systems offers an analytics platform for anomaly detection in logs and metrics, and also provides anomaly detection within operational analytics.
- Automated log parsing and analysis from different applications
- Recommended resolutions – Based on the company’s solution database
- Business operation anomaly detection
How it works:
On the technical side, Loom collects log data, parses it to break log lines down into separate fields, and applies anomaly detection algorithms according to each field’s data type. Alongside log events, Loom’s algorithms can handle other textual sources or streams of events, and create anomaly baselines for them.
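As a rough illustration of the "parse, then pick an algorithm per field type" idea, here’s a small Java sketch. It assumes a simple `key=value` log format and only two field types; both are assumptions made for the example, and the class and method names are hypothetical rather than anything from Loom.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: assumes "key=value key=value" log lines,
// a format chosen for this example and not taken from Loom.
final class FieldTypeSketch {

    enum FieldType { NUMERIC, TEXT }

    // Splits a log line on whitespace, then on the first '=', keeping field order.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    // Decides which kind of detector a field would need,
    // based on whether its value parses as a number.
    static FieldType typeOf(String value) {
        try {
            Double.parseDouble(value);
            return FieldType.NUMERIC;
        } catch (NumberFormatException e) {
            return FieldType.TEXT;
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("level=ERROR latencyMs=532 service=checkout");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            System.out.println(field.getKey() + " -> " + typeOf(field.getValue()));
        }
    }
}
```

A numeric field such as `latencyMs` could then be checked against a value range, while a text field such as `level` would be better served by frequency counts.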
The baselines and thresholds set by Loom are dynamic, which means that they change and adapt according to the user’s behavior and application updates. Each anomaly is accompanied by an explanation of what happened, along with recommended resolutions.
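One simple way to picture a dynamic baseline is an exponentially weighted moving average, where every new observation nudges the baseline. The sketch below is an assumption for illustration only, not Loom’s algorithm; `alpha`, `tolerance` and the class name are hypothetical.

```java
// Illustrative only: an exponentially weighted baseline that adapts over time.
// This is an assumed technique, NOT Loom's actual algorithm.
final class DynamicBaselineSketch {
    private final double alpha;     // how quickly the baseline adapts (0 < alpha <= 1)
    private final double tolerance; // allowed relative deviation, e.g. 0.5 means 50%
    private double baseline;
    private boolean initialized;

    DynamicBaselineSketch(double alpha, double tolerance) {
        this.alpha = alpha;
        this.tolerance = tolerance;
    }

    // Returns true if the value deviates too far from the current baseline,
    // then folds the value into the baseline so the threshold keeps adapting.
    boolean observe(double value) {
        if (!initialized) {
            baseline = value;
            initialized = true;
            return false;
        }
        boolean anomalous = Math.abs(value - baseline) > tolerance * Math.abs(baseline);
        baseline = alpha * value + (1 - alpha) * baseline;
        return anomalous;
    }

    double baseline() {
        return baseline;
    }

    public static void main(String[] args) {
        DynamicBaselineSketch latency = new DynamicBaselineSketch(0.5, 0.5);
        double[] samples = {100, 110, 300, 250};
        for (double sample : samples) {
            System.out.println(sample + " anomalous=" + latency.observe(sample)
                    + " baseline=" + latency.baseline());
        }
    }
}
```

In this run the jump to 300 is flagged, but 250 right after it is not, because the baseline has already moved toward the new level. That is the trade-off of an adaptive threshold: it follows real changes in behavior, but it can also learn to accept a sustained problem.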
Secret Sauce: Along with detecting anomalies, Loom offers its knowledge base that shares solutions across the company, helping other developers and teams understand why an anomaly occurred and how it was handled.
Bottom line: Loom uses application logs and metrics to try to understand how applications normally behave, and offers recommended resolutions and action items.
3. OverOps
OverOps tells you when, where and why code breaks in production. It is the only tool that gives you the complete source code and variable state across the entire call stack for every error, and lets you proactively detect when new errors are introduced into the application.
- Full visibility into code and variable state to automatically reproduce any error
- Proactive detection of all new and critical errors by code release
- Native Java agent that doesn’t rely on log files
- Working with any StatsD-compliant tool for custom anomaly detection visualization
- No code and configuration changes, installs in 5 minutes through SaaS, Hybrid, and On-Premises
- A badass dashboard with a dark theme
How it works:
OverOps is a native monitoring agent that operates between the JVM and the processor, extracting information from the application itself. It doesn’t require any code changes, and it doesn’t rely on the information that was logged, but instead on the information coming directly from the application. OverOps helps companies like Fox, Comcast and TripAdvisor transform manual reactive processes of sifting through logs, and turn them into proactive automated processes.
OverOps uses REST APIs to offer advanced visualization and anomaly detection abilities to its users, and correlates the variable state of the application with internal JVM metrics (such as CPU utilization, GC and others) when application errors occur across microservices and deployments. It uses AI to identify issues that are usually hidden within masses of information, which in turn helps dev teams resolve issues more quickly than before.
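To give a feel for what "internal JVM metrics" look like at the moment an error occurs, here’s a minimal sketch that uses only the standard `java.lang.management` API. It is not how OverOps collects its data; the class name and the choice of metrics are assumptions for the example, and system load average is used as a rough stand-in for CPU utilization.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative only: reads standard JVM metrics via java.lang.management.
// This is NOT how OverOps gathers its data.
final class JvmSnapshotSketch {
    final long heapUsedBytes;
    final long gcCount;
    final long gcTimeMillis;
    final double systemLoadAverage; // negative when the platform does not report it

    private JvmSnapshotSketch(long heapUsedBytes, long gcCount,
                              long gcTimeMillis, double systemLoadAverage) {
        this.heapUsedBytes = heapUsedBytes;
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
        this.systemLoadAverage = systemLoadAverage;
    }

    // Captures heap usage, total GC activity and system load at the time of the call.
    static JvmSnapshotSketch capture() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not support the metric.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        long heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        return new JvmSnapshotSketch(heap, count, time, load);
    }

    @Override
    public String toString() {
        return "heapUsedBytes=" + heapUsedBytes + " gcCount=" + gcCount
                + " gcTimeMillis=" + gcTimeMillis + " systemLoadAverage=" + systemLoadAverage;
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            // Attach the JVM state to the error at the moment it is caught.
            System.out.println(e.getMessage() + " | " + capture());
        }
    }
}
```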
The data that OverOps collects is unique and extremely valuable in the context of AI. The product has native capabilities for applying an anomaly detection algorithm to that data, which allows organizations to identify a critical issue, a new issue or a reintroduced issue among billions of events. This is critical for cutting through the noise of a log file.
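The distinction between a new issue and a reintroduced one can be sketched with a couple of sets. The example below is a hypothetical simplification and not OverOps’ implementation; in particular, the fingerprint scheme (exception type plus top stack frame) is an assumption made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: classifies errors as new, known or reintroduced.
// A hypothetical simplification, NOT OverOps' implementation.
final class IssueClassifierSketch {

    enum Status { NEW, KNOWN, REINTRODUCED }

    private final Set<String> open = new HashSet<>();
    private final Set<String> resolved = new HashSet<>();

    // Assumed fingerprint scheme: exception type plus the top stack frame.
    static String fingerprint(Throwable error) {
        StackTraceElement[] frames = error.getStackTrace();
        String location = frames.length > 0
                ? frames[0].getClassName() + "." + frames[0].getMethodName()
                : "unknown";
        return error.getClass().getName() + "@" + location;
    }

    // Records one occurrence and reports how it relates to what was seen before.
    Status record(String fingerprint) {
        if (resolved.remove(fingerprint)) {
            open.add(fingerprint);
            return Status.REINTRODUCED;
        }
        return open.add(fingerprint) ? Status.NEW : Status.KNOWN;
    }

    // Marks an open issue as fixed, so a later occurrence counts as reintroduced.
    void markResolved(String fingerprint) {
        if (open.remove(fingerprint)) {
            resolved.add(fingerprint);
        }
    }

    public static void main(String[] args) {
        IssueClassifierSketch tracker = new IssueClassifierSketch();
        String id = fingerprint(new IllegalStateException("boom"));
        System.out.println(tracker.record(id)); // NEW
        System.out.println(tracker.record(id)); // KNOWN
        tracker.markResolved(id);
        System.out.println(tracker.record(id)); // REINTRODUCED
    }
}
```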
Secret sauce: OverOps knows log files suck. That’s why it has zero reliance on log files, and the data comes directly from the JVM itself. Since OverOps is the only tool to give you the complete source, state and stack for each error, it offers a 360 view of anomalies and issues within your application.
Bottom line: Detecting anomalies is important, but it’s not going to help if you don’t have the real root cause and the variables that led to it.
4. Coralogix
Coralogix clusters and identifies similarities in log data. The tool focuses on common flows, detecting the log messages that are connected to them, and alerting when an action didn’t cause the expected outcome.
- Loggregation – Bundle and summarize logs that have the same pattern
- Flow anomaly – Identification of connected actions, and detection of anomalies within them
- Version based anomalies – Specifying anomalies that only occurred after a new version of the user’s product was deployed
How it works:
Coralogix operates under the assumption that most logs are similar, and that the only thing differentiating them from one another is the variables within them. That’s why Coralogix auto-clusters the data to identify patterns, and connects the dots between the data. If an action calls for a certain response and doesn’t get it, that’s when an anomaly is detected.
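The "action without its expected response" case can be sketched in a few lines of Java. This is a hypothetical simplification and not Coralogix’s implementation: it assumes each event is a string of the form `id:name`, and that the pair of events making up a flow is already known.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: finds actions that never received their expected response.
// The "id:name" event format is an assumption made for this example.
final class FlowCheckSketch {

    // Returns the ids that logged `action` but never logged `expected` afterwards.
    static List<String> missingResponses(List<String> events, String action, String expected) {
        Set<String> pending = new LinkedHashSet<>();
        for (String event : events) {
            int separator = event.indexOf(':');
            if (separator < 0) {
                continue; // not in the assumed format, skip it
            }
            String id = event.substring(0, separator);
            String name = event.substring(separator + 1);
            if (name.equals(action)) {
                pending.add(id);
            } else if (name.equals(expected)) {
                pending.remove(id);
            }
        }
        return new ArrayList<>(pending);
    }

    public static void main(String[] args) {
        List<String> events = List.of(
                "1:payment.started",
                "2:payment.started",
                "1:payment.completed");
        System.out.println(missingResponses(events, "payment.started", "payment.completed")); // [2]
    }
}
```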
Secret sauce: Coralogix has the ability to aggregate logs into their original templates and analyze that data to understand anomalies.
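Here’s a rough Java sketch of what collapsing logs back into templates might look like. It assumes the only variables are numbers, which is far simpler than any real log format, and it is not Coralogix’s algorithm; the `<*>` placeholder and the class name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: groups log lines by a naive template.
// Treating only numbers as variables is an assumption for this example.
final class LogTemplateSketch {

    // Replaces standalone integers and decimals with a placeholder.
    static String template(String line) {
        return line.replaceAll("\\b\\d+(\\.\\d+)?\\b", "<*>");
    }

    // Counts how many lines share each template, in first-seen order.
    static Map<String, Integer> cluster(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(template(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "User 17 logged in",
                "User 99 logged in",
                "Cache miss for key 5");
        cluster(lines).forEach((pattern, count) -> System.out.println(count + " x " + pattern));
    }
}
```

Once lines are grouped this way, a sudden change in how often a template appears becomes a much clearer signal than any single log line.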
Bottom line: Coralogix bundles logs with similar patterns, focusing on the different fields within each message. By doing so, the company can detect anomalies within certain actions and flows, and focus on the bigger anomaly picture and not on single incidents that might occur in the application.
5. Anodot
Anodot offers an anomaly detection system with the relevant analytics for the users. Its focus is on detecting anomalies in databases of any type, along with identifying anomalies in business-related data.
- Behavioral correlation and grouping of similar logs
- Business data anomaly detection – Covering marketing campaigns, clicks and performance indicators
- Alerts handling – Reducing noise by grouping similar anomalies into one alert
How it works:
Anodot uses its algorithms to isolate issues and correlate them across a number of parameters. On the practical side, the company determines the normal range of the application or the action, and gives it a score that it has to keep.
When an event changes that score, the system assesses the importance of the anomaly based on the status of the data, and how long it acted this way. Anodot always alerts the user of the anomaly, whether it’s good or bad, so that they can handle it as they see fit.
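One way to picture a score that depends on both how far the data strayed and how long it stayed there is sketched below. The formula is an assumption made purely for illustration and is not Anodot’s scoring; the class and method names are hypothetical.

```java
// Illustrative only: scores an excursion by its size and its duration.
// The formula is an assumption for this example, NOT Anodot's scoring.
final class SignificanceSketch {

    // How far the value sits outside [low, high], relative to the range width; 0 when inside.
    static double deviation(double value, double low, double high) {
        double width = high - low;
        if (width <= 0) {
            throw new IllegalArgumentException("high must be greater than low");
        }
        if (value > high) {
            return (value - high) / width;
        }
        if (value < low) {
            return (low - value) / width;
        }
        return 0.0;
    }

    // Sums deviations over the most recent run of consecutive out-of-range points,
    // so larger excursions and longer excursions both score higher.
    static double significance(double[] series, double low, double high) {
        double score = 0.0;
        for (int i = series.length - 1; i >= 0; i--) {
            double d = deviation(series[i], low, high);
            if (d == 0.0) {
                break; // the run of anomalous points ends here
            }
            score += d;
        }
        return score;
    }

    public static void main(String[] args) {
        double[] clicksPerHour = {50, 55, 120, 130};
        System.out.println(significance(clicksPerHour, 40, 100));
    }
}
```

Note that the deviation is measured in both directions, which matches the point above: a spike well above the normal range is reported just like a drop below it.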
Secret sauce: Anodot can auto-select the most relevant algorithm needed for the data pattern, which changes and adapts as the patterns change.
Bottom line: Anodot focuses on logs, metrics and business indicators, which can address not only the development team, but other members of the company as well.
Speaking of anomaly detection…
Numenta offers an open source project that takes a broader look at the world of anomaly detection. Its technology can detect anomalies in servers and applications, along with human behavior, geospatial tracking data (GPS tracking), and prediction and classification of natural language. Basically, any dataset that has a baseline or trends.
The most interesting thing about Numenta is the Numenta Anomaly Benchmark (NAB). It’s a benchmark that allows evaluation of algorithms for anomaly detection in streaming, real-time applications. It allows you to test your current algorithms, see benchmarks from the community and get a deeper understanding as to how to detect anomalies.
The library is open source, and consists of over 50 labeled real-world and artificial time series data files plus a scoring mechanism designed for real-time applications. If you’re already using an anomaly detection algorithm, Numenta can help you evaluate it. Also, if you’re looking for an open source tool, this might be the answer for you.
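To show what evaluating a detector against labeled data involves, here’s a deliberately simplified Java sketch. It is not NAB’s scoring function, which is more elaborate; it only counts hits and misses against labeled anomaly windows. The class name, method name and timestamp format are hypothetical.

```java
// Illustrative only: counts hits and misses against labeled anomaly windows.
// A simplified stand-in, NOT the scoring mechanism that NAB actually uses.
final class WindowScoreSketch {

    // windows[i] = {startInclusive, endInclusive} timestamps labeled as anomalous.
    // Returns {truePositives, falsePositives, falseNegatives}.
    static int[] score(long[] detections, long[][] windows) {
        boolean[] hit = new boolean[windows.length];
        int falsePositives = 0;
        for (long timestamp : detections) {
            boolean inside = false;
            for (int i = 0; i < windows.length; i++) {
                if (timestamp >= windows[i][0] && timestamp <= windows[i][1]) {
                    hit[i] = true;
                    inside = true;
                }
            }
            if (!inside) {
                falsePositives++; // flagged something outside every labeled window
            }
        }
        int truePositives = 0;
        for (boolean h : hit) {
            if (h) {
                truePositives++; // each window counts once, however many detections hit it
            }
        }
        return new int[] {truePositives, falsePositives, windows.length - truePositives};
    }

    public static void main(String[] args) {
        long[] detections = {5, 12, 40};
        long[][] labeledWindows = {{10, 20}, {30, 35}};
        int[] result = score(detections, labeledWindows);
        System.out.println("TP=" + result[0] + " FP=" + result[1] + " FN=" + result[2]);
        // TP=1 FP=2 FN=1
    }
}
```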
Anomaly detection helps you gain better insight into production applications. Each tool has its own way to identify anomalies. The most important thing to remember is that it’s not only about the dashboard; it’s about the data. That’s why we urge you to explore each one, and base your final decision on the tool that gives you the best value for the problem that you’re trying to solve. Need more information to help you with a decision? Let’s schedule a call and talk about it.