Fuze offers Unified Communications as a Service (UCaaS), and my team is responsible for the company’s provisioning portal. This application manages most of our core business processes, such as quoting, contract management, provisioning services, managing handset and client configurations, invoicing and more.
The application is a combination of web-based UIs and REST APIs, providing customers, engineers and support teams with a variety of interfaces via which they can manage multiple operations. These operations include taking contracts and turning them into actual functioning services, along with building call flows, managing call centers and more.
Fuze has been around for over a decade, and during those years the portal code base has grown to more than 12 million lines of code. As the company evolved, some engineers moved between teams, some started working on other parts of the application and, of course, new engineers are always joining us. At this point, the code base is very extensive and some portions of it are very complex, so the platform needs constant monitoring and handling.
Given that the portal is the central management application for our deployment, bugs can impact both our customers and internal stakeholders, so every issue becomes critical. Debugging is a significant part of our daily activities, and detecting bugs, writing code fixes and deploying them to production can take anywhere from a few hours to a couple of days.
One of our main focus points is application health. On one occasion we came across a specific exception that occurred over 3 million times an hour. Like most companies, our workflow used to consist of using log management tools to try to identify the frequency of the most common errors. However, those tools gave us only a partial picture of what was really going on within the application.
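To illustrate the kind of log-based frequency counting described above, here is a minimal Python sketch. The log format, the regular expression and the function name `count_exceptions` are assumptions made for illustration; they are not our actual tooling or any vendor's API.

```python
import re
from collections import Counter

# Assumed log format: "<timestamp> <LEVEL> <message>", where the message may
# name a Java-style exception class such as "java.lang.NullPointerException".
EXCEPTION_RE = re.compile(r"\b((?:[A-Za-z_]\w*\.)*[A-Z]\w*(?:Exception|Error))\b")


def count_exceptions(lines):
    """Tally how often each exception class name appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample data, not real portal logs.
sample_log = [
    "2019-01-01T10:00:00 ERROR java.lang.NullPointerException at Quote.load",
    "2019-01-01T10:00:01 ERROR java.lang.NullPointerException at Quote.load",
    "2019-01-01T10:00:02 WARN java.io.IOException: connection reset",
    "2019-01-01T10:00:03 INFO request completed",
]

# The two most frequent exception types, most common first.
top_errors = count_exceptions(sample_log).most_common(2)
```

A tally like this only sees what was written to the log. Exceptions that are caught and never logged do not show up at all, which is why this approach gives only a partial picture.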
With OverOps, we are able to find every error or exception before it impacts our users. We are also able to see the impact of caught exceptions, and detect when their volume poses a performance risk.
That’s one of the biggest advantages OverOps offers our team, and one of the biggest eye-openers for us: an in-depth look into every error and exception. Thanks to OverOps, we were able to significantly reduce our heap size, improve CPU utilization and cut down our debugging time.
Within two months of deploying OverOps, we were able to improve CPU utilization and heap size by over 50%. A side effect of this was reducing the time it takes to run full automated regression tests to under two hours. OverOps also significantly helps our developers find the root cause of errors, and reduces the time it takes to actually fix them.
Every single piece of code we write goes through code review, and our QA team have incorporated OverOps as part of their testing process. Instead of searching through the log files trying to recreate a certain issue, they have OverOps to point them in the right direction.
The OverOps dashboard pinpoints where an issue happened within the code. When QA open a ticket, they add the relevant URL to the error’s analysis from OverOps, giving our engineers a complete analysis of what actually happened.