The best alerting and ChatOps tools for getting the most out of your monitoring ecosystem
We live in a time of automation and continuous deployment, but we still need to make sure that our applications are working as they should. All the automation and management in the world means nothing if you have no visibility into what’s actually happening in your application when something goes wrong.
Alerting tools provide insight into the inner workings and status of your application in production in real time. The main goal of these tools is not to monitor your application, but to aggregate the data from your entire monitoring ecosystem and to notify you when something worth your attention pops up.
Some achieve this by providing a dashboard for error tracking, others use advanced integrations and notification systems, and some even run continuous testing on your application. It’s important to identify your own goals before implementing an alerting system so that you get visibility into the metrics that matter most to you and your team.
1. PagerDuty
PagerDuty is an alerting tool that pings you when issues arise in the monitoring tools you connect to it. It doesn’t monitor anything on its own, but takes alerts generated by other tools and sends them to you and your team based on the escalation and priority rules you set up. It can send alerts by email, phone, and several other means of contact. You can use it to collect alerts from your monitoring tools and to create schedules that coordinate your team.
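To make the idea of escalation and priority rules more concrete, here is a minimal sketch in Python. It is not PagerDuty's API or data model; the policy table, contact names, delays, and the `who_to_notify` function are all hypothetical and only illustrate how an alert's priority and the time it has gone unacknowledged might decide who gets contacted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationStep:
    contact: str        # hypothetical on-call contact name
    channel: str        # e.g. "email", "sms", "phone"
    after_minutes: int  # minutes without acknowledgement before this step applies


@dataclass
class Alert:
    source: str    # monitoring tool that raised the alert
    summary: str
    priority: str  # "high" or "low" in this sketch


# Hypothetical escalation policies keyed by alert priority.
POLICIES = {
    "high": [
        EscalationStep("primary-on-call", "phone", 0),
        EscalationStep("secondary-on-call", "phone", 15),
        EscalationStep("engineering-manager", "sms", 30),
    ],
    "low": [
        EscalationStep("primary-on-call", "email", 0),
    ],
}


def who_to_notify(alert: Alert, minutes_unacknowledged: int) -> List[EscalationStep]:
    """Return every step whose delay has elapsed for an unacknowledged alert."""
    # Unknown priorities fall back to the "low" policy (an assumption of this sketch).
    steps = POLICIES.get(alert.priority, POLICIES["low"])
    return [step for step in steps if step.after_minutes <= minutes_unacknowledged]
```

In this sketch, a high-priority alert that nobody has acknowledged after 20 minutes would have reached both the primary and the secondary on-call contact, while a low-priority alert never goes beyond a single email.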
When to use it: PagerDuty is the main player in this field with only a few competitors (such as VictorOps), so if you’re using several tools and want one unified alerting tool, it’s the one to check out. Also give it a try if an inside-your-environment alerting tool is what you’re looking for in general.
Price: $9-$99 per month per user, depending on feature needs
Pros: PagerDuty integrates with a huge variety of different tools to gather their alerts and notifications. It provides a fairly extensive and nuanced means of getting alerts out to the right people at the right level of urgency. It has means for escalating alerts as well.
Cons: It does require some installation and setup. To get the most out of it, you have to spend a bit of time hooking it up to your different tools and entering the rules for the alerts you want it to send. It’s also one of the most expensive tools in this area.
2. BigPanda
BigPanda is an algorithmic reporting system that helps Operations teams focus on issues that matter. Their platform uses machine learning to correlate alerts from all of your integrated monitoring systems into “incidents” and then can create tickets and notify your team.
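As a rough illustration of what correlating alerts into incidents means, the sketch below groups alerts with a simple rule rather than machine learning: alerts from the same host that arrive within a few minutes of each other land in the same incident. This is not BigPanda's algorithm; the `Alert` and `Incident` types, the `correlate` function, and the five-minute window are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    host: str
    check: str
    timestamp: float  # seconds since some reference point


@dataclass
class Incident:
    host: str
    alerts: List[Alert] = field(default_factory=list)


def correlate(alerts: List[Alert], window_seconds: float = 300.0) -> List[Incident]:
    """Group alerts per host; a gap longer than window_seconds starts a new incident."""
    incidents: List[Incident] = []
    latest_by_host: Dict[str, Incident] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest_by_host.get(alert.host)
        if current is not None and alert.timestamp - current.alerts[-1].timestamp <= window_seconds:
            # Close enough in time to the previous alert on this host: same incident.
            current.alerts.append(alert)
        else:
            # First alert for this host, or the gap was too long: open a new incident.
            current = Incident(host=alert.host, alerts=[alert])
            incidents.append(current)
            latest_by_host[alert.host] = current
    return incidents
```

Even a rule this simple turns a burst of related alerts into one item to triage, which is the kind of noise reduction that correlation aims for.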
When to use it: If you’re looking for a cheaper alternative to PagerDuty. If you want to gain added benefits from AI and machine learning.
Price: Three tiers: Standard, Pro and Enterprise. Starts at $6 per physical/virtual device that sends monitoring data to BigPanda.
Pros: BigPanda uses AI and machine learning to give a boost to what PagerDuty offers. In addition to aggregating monitoring data with custom view filters and streamlining ticketing and notifications, BigPanda correlates related alerts from across the monitoring ecosystem to provide a deeper understanding of what’s happening inside the application.
Cons: As a younger company, BigPanda is still working to build its community and expand its integration offerings.
3. OpsGenie
OpsGenie is an alert management solution that supports DevOps teams throughout the incident lifecycle. The OpsGenie platform offers 200+ integrations with tools across the SDLC to centralize alerting, notify the relevant team members, and improve collaboration. Setup doesn’t take a lot of time, and in addition to detailed tracking and alerting, OpsGenie helps with sophisticated on-call scheduling and incoming call routing.
When to use it: If you want to combine alerting from your application monitoring ecosystem with user-reported incidents. If you want an advanced routing system with alert escalation policies. If you’re looking for a less expensive alternative to PagerDuty.
Price: 14-day free trial. Five account tiers with prices ranging from $5-99/user/month.
Pros: Get coverage for the full error alerting system including user-reported issues. Beyond routing for alerts coming from integrated platforms, OpsGenie provides scheduling and routing for incoming support calls. They also have a useful mobile app that can be used to respond to alerts.
Cons: It’s still in its early years and has some basic functionality issues with the mobile app, like needing to log in each time. There is still significant room for more advanced iterations of current features, such as accounting for team members in different time zones when scheduling.
4. Slack
Slack is quickly becoming one of the most popular messaging services for modern office communication. It’s an excellent system for direct messaging, team collaboration and company-wide communications. Plus, with all of the apps and bots that you can create and use, it has endless potential. The integrations offered for alerting and monitoring tools are extensive, plus you can check out other useful Slack integrations here.
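If none of the ready-made integrations fits, one common way to get an alert into a channel is a Slack incoming webhook, which accepts a JSON body with a `text` field. The sketch below assumes you have already created a webhook for your workspace; the function names and the message format are made up for illustration, and the webhook URL is something you would supply yourself.

```python
import json
import urllib.request


def build_alert_payload(source: str, severity: str, message: str) -> dict:
    """Build the JSON body for an incoming webhook; "text" carries the message."""
    return {"text": f"[{severity.upper()}] {source}: {message}"}


def post_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload to the given webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A monitoring script could call `post_alert(your_webhook_url, build_alert_payload("nagios", "critical", "disk full on db1"))` whenever a check fails, where `your_webhook_url` is a placeholder for the URL Slack generates for you. Keeping payload construction separate from the network call makes the message format easy to test without contacting Slack.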
When to use it: If your team already uses Slack for internal communications, and you want to see alerts from monitoring tools in the same place. If you don’t routinely check your monitoring dashboards, but have a Slack window open at all times. If you prefer seeing the essential alert information without a complex dashboard setup.
Price: Free for small teams, $6.67/user/month for Standard tier and $12.50/user/month for Slack Plus
Pros: For teams that already use Slack, it’s already a valuable part of the software development process because it improves and simplifies internal communication. Integrating it with external monitoring tools extends that value to the error tracking and resolution part of the SDLC by aggregating alerts and facilitating communication for faster resolution times.
Cons: This isn’t the optimal solution for teams not already using Slack. For most teams, this solution won’t be enough to replace a single aggregated dashboard or the dashboards provided by each individual platform.
5. HipChat / Stride
HipChat is a popular platform from Atlassian that was designed specifically for business team communications. Along with direct and group chat capabilities, HipChat also integrates video chat so team members working in different locations can meet face-to-face in the same platform.
Stride is a newly released product from Atlassian. Atlassian recommends that teams upgrade to it, and it will likely replace HipChat entirely in the future (though no plans to deprecate HipChat have been announced). Stride was designed to be a more formidable competitor to Slack, with new and enhanced messaging and communications offerings.
When to use it: If you’re looking specifically for a communication tool to fit along perfectly with your Atlassian suite of tools.
Price: 30-day free trial for Stride Standard. After that, pricing starts at $3/user/year. There is also a free version with more limited features. Stride is slightly more expensive than HipChat ($1/user/year for the Standard tier), but HipChat’s Standard tier was limited to 10 users, beyond which pricing moved to the Enterprise tier. Stride’s pricing is the same regardless of the number of users.
Pros: It offers many of the same messaging features as Slack like direct messages and group chats, but also adds video-chat and screen sharing into the mix. For teams working with other Atlassian products, it’s hard to deny the benefit of being able to access everything without toggling between multiple systems.
Cons: It has a less extensive list of integrations and a lower limit on file sharing size (50MB). Stride is a new product and, as such, still needs some fine-tuning on basic features like web notifications.
The bottom line here is that delivering high-quality applications doesn’t end with deployment. Building a comprehensive monitoring ecosystem for your application in production is just as important as any other stage of the development lifecycle.
Each of these tools offers different features and capabilities, and choosing the “best” tool is really about choosing the right fit for your own ecosystem and needs. No matter which tool you choose, remember that your alerts will only be as useful as the data that they’re able to send.
Imagine getting an alert for a NullPointerException and seeing a snapshot of the complete state of the JVM at the moment the exception was thrown. That’s the level of insight that OverOps gives you for all known and unknown errors across the entire development cycle, including production and pre-production environments. To learn more about the benefits of integrating your alerting and monitoring tools with OverOps data, check out this whitepaper about how we use OverOps to detect unknown errors before they hit production.