RJ Lim
Senior Software Engineer at Zynga

Production Monitoring Ecosystem

Key challenges and pain points:

Even when an application is stable, there are millions of lines written to the log. If an exception is thrown and we want to find it, we have to sift through the log in hopes of finding the specific issue we’ve encountered. Some problems are very hard to debug even with logs, and some can’t be debugged from logs at all. We needed something that could help us debug in production.

Example problem that OverOps helped resolve:

We had updated configurations through an ExecutorService and realized those changes were not being picked up. Since there’s no logging for uncaught exceptions, there was no way to know that the ExecutorService was failing, and no way we would have been able to detect it through logs. However, OverOps catches all uncaught exceptions, so it showed us where the issue happened and gave us the complete variable state across the call stack. Instead of guesswork, we diagnosed the problem quickly and fixed it right away.
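
The case study doesn’t show the code involved, so the following is only a minimal sketch of the general mechanism, assuming the configuration task was handed to `ExecutorService.submit()`. In that case the task’s exception is stored in the returned `Future` and nothing is logged unless some code calls `get()` on it. The names `SilentFailureDemo`, `reloadConfig` and `submitAndInspect` are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SilentFailureDemo {

    // Hypothetical stand-in for a configuration reload task that fails.
    static void reloadConfig() {
        throw new IllegalStateException("bad config value");
    }

    // Submits the failing task and returns the failure message recovered
    // from the Future, or null if the task completed normally.
    static String submitAndInspect() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // submit() wraps the task in a FutureTask: the exception is captured
            // in the Future instead of reaching the thread's uncaught-exception
            // handler, so nothing is printed or logged at this point.
            Future<?> future = executor.submit(SilentFailureDemo::reloadConfig);
            try {
                // The failure only becomes visible when someone calls get().
                future.get();
                return null;
            } catch (ExecutionException e) {
                return e.getCause().getMessage();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Recovered failure: " + submitAndInspect());
    }
}
```

Had the task been passed to `execute()` instead, the exception would reach the worker thread’s uncaught-exception handler, which by default prints the stack trace to standard error rather than to the application log.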

What is the “secret sauce” that made you choose OverOps?

There’s only so much information you can fit inside a log line, and it gives a narrow view of what’s going on inside the application. With OverOps we have all of the information we need, including the complete source code and variable state, across the entire call stack.

Thanks to OverOps, our errors and exceptions gained meaning and we were able to find and fix them rapidly.

How are you integrating OverOps with your daily workflow?

We have a weekly debugging session to clean up errors inside the application. We’re using OverOps to detect those errors and exceptions, and also to tidy up the logs themselves and make them meaningful. Instead of ignoring millions of error lines, we now know what’s important, get real-time notifications via HipChat, and fix issues as soon as they occur.

