What are some of the most useful monitoring tools for Java developers?
Monitoring is an essential function in production environments today. Errors and performance issues pop up all the time – not just during business hours – so good monitoring tools need to be active 24/7. There are a lot of tools out there that tackle this issue from different angles, so getting a sense of which ones to consider can be tough.
Today, I’m taking a look at 7 monitoring tools that are on the newer side or are worth considering as an alternative or addition to tools like New Relic and AppDynamics. The tools are a mix of open source and SaaS offerings, and each has its own specialty, be it metrics, visualizations, or error tracking.
1. Datadog
Datadog is a SaaS monitoring tool aimed at DevOps teams. It takes data from your app and a wide range of other tools and provides insights and visualizations. It unifies data from your infrastructure and software in one location, allowing you to build dashboards or search across the data you feed it. It is currently built around aggregating and presenting data rather than performing analytics of its own.
One benefit of Datadog is that they offer full access to their API, which opens up the flexibility to develop your own metrics or integrations.
Downsides: Datadog doesn’t provide much in the way of its own analytics today, which may be something you are looking for in a monitoring tool. It also requires you to weave it into your code, which creates dependencies.
The Java Angle: Officially, the Datadog API supports Python, Ruby, and C#. However, thanks to their open API access, the Datadog community has written several libraries for Java, including ones for StatsD and Codahale metrics.
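To make the StatsD route more concrete, here is a minimal sketch of what a Java StatsD client does under the hood: it formats a metric as a short text line and sends it over UDP. The class name `StatsdSketch`, the metric names, and the `localhost:8125` target (the port a local StatsD-compatible agent commonly listens on) are assumptions for illustration, not part of any Datadog library.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any Datadog or StatsD library.
class StatsdSketch {

    // Formats a counter in the StatsD line format: <name>:<value>|c
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    // Formats a gauge in the StatsD line format: <name>:<value>|g
    static String gauge(String name, double value) {
        return name + ":" + value + "|g";
    }

    // Sends one metric line as a single UDP datagram.
    // UDP is fire-and-forget: delivery is not confirmed.
    static void send(String host, int port, String line) throws Exception {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(
                    payload, payload.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed target: a StatsD-compatible agent on localhost:8125.
        send("localhost", 8125, counter("checkout.orders", 1));
        send("localhost", 8125, gauge("jvm.heap.used_mb", 512.5));
    }
}
```

In practice you would likely use one of the community libraries rather than raw sockets, but the sketch shows why the integration has to live in your code.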
Takeaway: With their alerting capabilities and performance monitoring, Datadog can be used as a cheaper APM alternative. Their range of integrations helps plug them into your environment without much trouble. Datadog is a solid tool for gathering and visualizing metrics, but they aren’t an analytics tool, so look elsewhere if you want those functionalities.
2. Ruxit
Developed by Dynatrace, Ruxit is an application performance monitoring tool delivered as SaaS. It was built to offer a different APM experience and to work in high-scale environments. It installs as a single agent and doesn’t require you to configure your environment for it, which makes setup easier than with similar tools like New Relic. It monitors user activity, application performance, servers, and network activity.
Downsides: Ruxit doesn’t provide as much feature depth as other APM tools in the space.
The Java Angle: Ruxit works with any Java server and Java versions 1.5+. Java was one of the first languages Ruxit supported, so its Java support is among its strongest.
Takeaway: Ruxit is a SaaS APM tool that is easy to set up and integrate into your environment. It offers good breadth of coverage, with views into several different areas of your environment.
3. OverOps
OverOps tells you when and why your code breaks in production. It detects all types of errors and gives you the code and variable state at the moment they happened. OverOps runs as a Java agent and does not rely on log files, which enables it to keep CPU and I/O overhead under 3%. On the installation side, it does not require code changes, binary dependencies, or build configurations. With integrations like JIRA and Slack, OverOps is simple to slide into your existing workflow.
An error analysis view in OverOps
Downsides: It’s a JVM-level tool exclusively, so non-JVM languages are not currently supported.
The Java Angle: OverOps is a JVM-level tool. It works for any JVM-based language without requiring workarounds.
Takeaway: Unlike other tools which stop at the stack trace level, OverOps gets down to the JVM level to bring you the actual code and variable state you need to solve each error. With OverOps, you can tell if a new deployment broke something in your code, get insight into all the errors happening in your application, and zoom in on critical issues.
4. Rollbar
Rollbar focuses on error tracking and monitoring. It uses stack traces to capture errors in your application. The upside is that it works with a wide range of languages and environments. Rollbar also lets you report exceptions and events manually. Beyond tracking uncaught exceptions, it provides some alerting and analysis capabilities.
Downsides: Rollbar automatically captures only uncaught exceptions. If you want to capture caught exceptions or anything else, you have to report them by hand. Because it relies on top-level uncaught exception handlers, it can miss exceptions that your framework swallows to prevent thread death.
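The limitation is easier to see in plain Java. The sketch below uses the JDK's `Thread.UncaughtExceptionHandler` to show which exceptions a handler-based tracker would see. The class `UncaughtHandlerSketch` and the in-memory `reported` list are hypothetical stand-ins for a call to an error tracking service; this is not Rollbar's actual client code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo; the "reported" list stands in for an error tracking service.
class UncaughtHandlerSketch {

    // Runs one task on its own thread and waits for it to finish.
    static void runAndWait(Runnable task, Thread.UncaughtExceptionHandler handler) {
        Thread thread = new Thread(task);
        thread.setUncaughtExceptionHandler(handler);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static List<String> demo() {
        List<String> reported = Collections.synchronizedList(new ArrayList<>());
        Thread.UncaughtExceptionHandler handler =
                (thread, error) -> reported.add("uncaught: " + error.getMessage());

        // 1. The exception escapes the thread, so the handler sees it.
        runAndWait(() -> {
            throw new IllegalStateException("escaped the thread");
        }, handler);

        // 2. The exception is swallowed by a catch block, so the handler never runs.
        runAndWait(() -> {
            try {
                throw new IllegalStateException("swallowed by a catch block");
            } catch (IllegalStateException ignored) {
                // Nothing is reported here.
            }
        }, handler);

        // 3. The exception is caught, then reported by hand.
        runAndWait(() -> {
            try {
                throw new IllegalStateException("caught and reported");
            } catch (IllegalStateException e) {
                reported.add("manual: " + e.getMessage());
            }
        }, handler);

        return reported;
    }
}
```

Calling `demo()` yields two entries, one from the handler and one from the manual report. The swallowed exception leaves no trace, which is the gap described above. A real setup would more likely install the handler once with `Thread.setDefaultUncaughtExceptionHandler`.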
The Java Angle: Rollbar does not have official libraries for Java, but their community has written a few that can send your logs to Rollbar.
Takeaway: Rollbar is a stack trace-based error tracking application that can play well with most languages. The tradeoff is that it automatically captures only uncaught exceptions, plus whatever events you send it manually.
5. Sensu
Sensu is an open source monitoring framework for application and system services. It can collect and ship metrics to a variety of tools and provide alerts for defined events. Written in Ruby, Sensu uses a “checks and handlers” setup, wherein periodic check scripts are run to look for pre-defined conditions, which are then reported to handlers if present. Handlers are used to send notifications or take other actions.
Downsides: Sensu isn’t as broad in scope as true APM tools like Ruxit. Its focus is on server monitoring. On the installation front, Sensu has dependencies on RabbitMQ and Redis, as well as several other dependencies wrapped in their required repositories. There are some concerns around scaling capabilities and maintenance complications, but your mileage may vary.
The Java Angle: Sensu check and handler scripts can be written in any language, and the Sensu community has written some plugins for Java.
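As a rough illustration of a check written in Java, the sketch below follows the Nagios-style convention that Sensu checks use: print one status line and exit with 0 (OK), 1 (warning), 2 (critical), or 3 (unknown). The class name `DiskCheckSketch`, the choice of disk usage as the condition, and the 80%/90% thresholds are assumptions picked for the example.

```java
import java.io.File;

// Hypothetical check for illustration; thresholds and names are assumptions.
class DiskCheckSketch {
    static final int OK = 0;
    static final int WARNING = 1;
    static final int CRITICAL = 2;
    static final int UNKNOWN = 3;
    static final String[] LABELS = {"OK", "WARNING", "CRITICAL", "UNKNOWN"};

    // Percentage of the volume that is in use.
    static double usedPercent(long totalBytes, long usableBytes) {
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    // Maps disk usage onto a check exit status.
    static int status(long totalBytes, long usableBytes,
                      double warnPercent, double critPercent) {
        if (totalBytes <= 0) {
            return UNKNOWN; // e.g. the path does not exist
        }
        double used = usedPercent(totalBytes, usableBytes);
        if (used >= critPercent) {
            return CRITICAL;
        }
        if (used >= warnPercent) {
            return WARNING;
        }
        return OK;
    }

    public static void main(String[] args) {
        File path = new File(args.length > 0 ? args[0] : "/");
        long total = path.getTotalSpace();
        long usable = path.getUsableSpace();
        int status = status(total, usable, 80.0, 90.0);
        if (status == UNKNOWN) {
            System.out.println("DiskCheck UNKNOWN: cannot read " + path);
        } else {
            System.out.printf("DiskCheck %s: %.1f%% used on %s%n",
                    LABELS[status], usedPercent(total, usable), path);
        }
        // The exit status is what the monitoring framework acts on.
        System.exit(status);
    }
}
```

Keep in mind that a Java check pays JVM startup cost on every run, which may matter for checks scheduled at short intervals.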
Takeaway: Sensu is an open source framework alternative for cloud and server monitoring. There are some questions around high scale and complexity, but it’s a strong tool for metrics gathering and service monitoring.
6. ELK stack
The ELK Stack isn’t exactly new, but we’d be remiss to make this list and leave it off. Made up of Elasticsearch, Logstash, and Kibana, the ELK stack is a popular set of open source tools for monitoring, logging, and visualizing your data. Elasticsearch handles the search and analytics, Logstash is the log aggregator, and Kibana does the fancy dashboard visualizations. We’ve been using it at OverOps for a while, feeding it from Java through our logs and Redis, and it’s in use both by developers and for BI. Today, Elasticsearch is pretty much built-in with Logstash, and Kibana is an Elastic product as well, making integration and setup very simple. You can mix and match the three tools if you’d like as well.
When a new deployment rolls out, the dashboards follow custom indicators that you can set up about your app’s health. These indicators update in real time, allowing close monitoring when freshly delivered code takes its first steps after being uploaded to production.
Downsides: The ELK Stack faces the standard downsides for open source tools (namely that setup costs and deployment issues are yours to shoulder). At higher scale, the number of machines needed to run the ELK Stack starts to multiply aggressively, and you have to maintain and monitor those machines yourself. One possible solution is using hosted services like Logz.io to help manage this pressure.
The Java Angle: The ELK Stack is designed for Java. In fact, a Java runtime is required to run Elasticsearch and Logstash. Elastic recommends having at least Java 7, and all Elasticsearch nodes should run on the same JVM version.
Takeaway: ELK stack, the name for Elasticsearch, Logstash, and Kibana, is a set of open source tools that delivers search & analytics, logging, and visualization capabilities. The tools are built to integrate well together, so using the full set is simple (although not required).
7. Graphite
Graphite is a visualization tool for monitoring metrics in your application. Made up of three components (Carbon, Whisper, and Graphite-web), its open source nature makes it easy to customize and tinker with. We have written separately about building your own Graphite architecture.
Graphite has a powerful querying API and a fairly feature-rich setup. It doesn’t capture its own metrics, but the Graphite metric protocol is often chosen as the de facto format by metrics gatherers, so feeding it data is rarely a problem. Using Graphite enables you to create an extensive range of views into your application.
Downsides: Graphite faces the standard downsides for open source tools (namely that setup costs and deployment issues are yours to shoulder). Additionally, Graphite can run into problems at higher scale because of design decisions in its Carbon and Whisper components. This one is a matter of preference, but many people aren’t too enthused with the default GUI either.
The Java Angle: Graphite is language agnostic, and there are many tools that can collect metrics from Java applications and send them to Graphite.
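For a sense of how little is needed to feed Graphite from Java, here is a minimal sketch of its plaintext protocol: one line per data point, in the form `<metric path> <value> <unix timestamp>`, written to Carbon over TCP. The class name `GraphiteSketch`, the host `graphite.example.com`, and the metric path are hypothetical; port 2003 is the usual default for Carbon's plaintext listener, but check your own setup.

```java
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not an official Graphite client.
class GraphiteSketch {

    // Builds one data point: "<metric path> <value> <epoch seconds>\n"
    static String line(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a TCP connection to Carbon and writes a single data point.
    static void send(String host, int port, String line) throws Exception {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}
```

A call such as `GraphiteSketch.send("graphite.example.com", 2003, GraphiteSketch.line("app.checkout.latency_ms", 12.5, System.currentTimeMillis() / 1000))` would record one point. For real workloads, a metrics library with a Graphite reporter that batches and reuses connections is probably a better fit than opening a socket per data point.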
Takeaway: Graphite is a popular open source tool for visualizing and querying metrics you gather from your application. It has the standard open source downsides as well as some limitations around scaling, but both of those can be overcome in various ways if you’re willing to either pay or customize.
Monitoring tools are an essential inclusion in production environments today. Visualizing metrics, tracking errors, monitoring performance, and analyzing your application are all key activities for gaining insight into the workings of your application. Recognizing the need is easy, but choosing which monitoring tool or set of tools to use can be difficult.
The seven tools I wrote about here – Datadog, Ruxit, OverOps, Rollbar, Sensu, ELK Stack, and Graphite – are worthwhile tools to check out. They’re all either on the newer side or provide a valuable alternative to some of the larger tools out there. Part of choosing which tool to deploy is knowing where to start your search. The tools here make for a good starting point.