Comcast (NASDAQ: CMCSA) is a global media and technology company with two primary businesses: Comcast Cable and NBCUniversal.
Production Monitoring Ecosystem
About the Team:
X1 for XFINITY TV is Comcast’s flagship video application, providing over 10 million customers with a cutting-edge video experience. The interactive platform brings a broad content catalogue (live TV, XFINITY On Demand, Netflix, and DVR recordings) into a single experience with personalized recommendations, voice integration, and apps.
Key challenges and pain points:
X1 delivers video service to tens of millions of set-top boxes nationwide, so a single issue in production could affect a substantial share of our user base. We aim to deploy a new version of our application every week, so we have to stay on top of every new error and exception that might degrade the application’s experience.
While we do use logs, they’re often tedious to work with and rarely reveal the root cause of an issue. Because millions of devices run our application, pinpointing a single error or reproducing it takes a substantial amount of our time and resources.
How did OverOps help you solve issues?
We use OverOps regularly to catch error conditions we didn’t foresee. It automates the process of sifting through log files, making it easier to detect issues as soon as they appear.
In fact, the whole team once dedicated a full day to what we called an “Exception Burn-down Day,” fixing exceptions and log errors that OverOps had identified. As a result, we materially reduced the noise in our application logs and fixed problems that had eluded us in the past.
Thanks to OverOps, we now have visibility into the long tail of problems the system experiences, problems we otherwise would have missed. We know as soon as an error occurs and can react quickly to every issue, error, or exception.
How are you integrating OverOps with your daily workflow?
After installing OverOps, we almost immediately saw detailed data about our application’s performance. We could see where exceptions were thrown and assess the customer impact of each issue the tool surfaced.
We use OverOps alongside our homegrown telemetry and alarming tools to get both a broad and a detailed view of each error. The OverOps dashboard displays the application’s behavior, and through its Slack integration we get alerts as soon as a new error is introduced into the system. This lets us quickly fix any error, exception, or issue without harming the user’s experience.