How a new OverOps user reduced their AWS bill and solved a 4-year-old performance problem

19th May 2021


A relatively new user of OverOps was examining exceptions in their production environment and noticed an AmazonS3Exception. What surprised the user was that the exception had been seen for the first time 4 years ago, and since then it had happened 2.6 million times.

They drilled into the exception and saw this:

At first blush, this seemed reasonable. They built in a retry count when trying to retrieve something from S3. The code would try up to five times before giving up and raising the exception back up the chain.

OverOps allowed the user to get root-cause-level information on the error. On the right-hand side of this screenshot there is a box that says “Recorded Variables”. Immediately below that it says “AmazonS3Exception”.

The user expanded the “AmazonS3Exception” to see what was going on and was expecting to see network timeouts. What the user actually saw was a big surprise! The error code (errorCode) from S3 is “NoSuchKey” (see screenshot below). This is basically the S3 version of File Not Found.

On further thought, re-reading the code revealed a serious flaw in the logic: nowhere does the code actually check the error code. So if a method asks the S3Connector for a file that doesn’t exist, it will try five times to load a file that isn’t there. Each of those interactions with AWS S3 costs money and time.

The code needed to be fixed to check the error code and avoid the additional calls to S3.
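A fix along these lines can be sketched as follows. This is only an illustration, not the user's actual code: `StorageException` is a hypothetical stand-in for the SDK's `AmazonS3Exception` (only its error code is modeled), `ObjectFetcher` and `fetchWithRetry` are made-up names, and treating every error other than “NoSuchKey” as retryable is an assumption made to keep the example short.

```java
import java.util.concurrent.atomic.AtomicInteger;

class S3RetrySketch {

    /** Hypothetical stand-in for AmazonS3Exception; only the error code is modeled. */
    static class StorageException extends RuntimeException {
        private final String errorCode;

        StorageException(String errorCode) {
            super("S3 error: " + errorCode);
            this.errorCode = errorCode;
        }

        String getErrorCode() {
            return errorCode;
        }
    }

    /** Hypothetical wrapper around a single S3 GET request. */
    interface ObjectFetcher {
        byte[] fetch(String key);
    }

    static final int MAX_ATTEMPTS = 5;

    /** Retries failures up to MAX_ATTEMPTS times, but gives up at once on NoSuchKey. */
    static byte[] fetchWithRetry(ObjectFetcher fetcher, String key) {
        StorageException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return fetcher.fetch(key);
            } catch (StorageException e) {
                if ("NoSuchKey".equals(e.getErrorCode())) {
                    // The object does not exist; another request cannot succeed.
                    throw e;
                }
                // Assumption: any other error may be transient, so try again.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated fetcher for a key that does not exist.
        ObjectFetcher missing = key -> {
            calls.incrementAndGet();
            throw new StorageException("NoSuchKey");
        };
        try {
            fetchWithRetry(missing, "reports/does-not-exist.csv");
        } catch (StorageException e) {
            System.out.println(e.getErrorCode() + " after " + calls.get() + " call(s)");
        }
    }
}
```

With the error-code check in place, a missing file costs one request instead of five, while other failures still get the full retry budget. A production version would likely also add a delay between attempts and be more selective about which error codes are worth retrying.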

After making and deploying the fix, they checked the ARC (automated root cause) screen to see the results. Before the fix, the exception occurred hundreds of thousands of times a day:

After the fix was deployed, the exception occurred only a few thousand times a day:

This is part of the value of OverOps! It will tell you things you didn’t know about your production application as it’s running in the “real world”. Armed with that information, you can make your code better for the customer, and better for your bottom line.

Learn more about how OverOps can help, and feel free to reach out to us for a conversation.

Marc is the Director of Engineering at OverOps, focused on growing and expanding the global dev teams. Prior to OverOps, Marc founded 3 companies and worked in almost every facet of the organization as Engineering Manager, Chief Engineer, IT Director, Director of Engineering Operations, VP of Engineering, and VP of Software Security and Compliance.
