What are some of the biggest obstacles with Java exception handling?
Warning: the following string of characters is known to induce stress responses in certain individuals of the human species: “java.lang.NullPointerException”.
If you’ve ever been frustrated with an exception, you’ve reached the right place!
In this post, we highlight the shortcomings of relying on stack traces alone for investigating Java NullPointerExceptions. Although you get the line from which the exception was thrown, knowing if it’s new, why it happened, and who introduced the change that caused it is a whole different ball game.
The typical NullPointerException resolution workflow
While the issue we’re covering here isn’t exclusive to NullPointerExceptions, it makes a good simple example. After all, they’re the most common exception in Java production environments.
Let’s assume a NullPointerException just happened – how are you made aware of it?
Worst case – your customers are negatively impacted and your team is made aware of it through an angry stream of tweets.
Best case – it fails one of your tests and you’re able to stop it from reaching production.
The common case – exceptions happen left and right but you don’t know if they’re new or critical.
For the purpose of this exercise, let’s assume that we have an exception in our hands that we’re tasked with solving, so identification is out of the way (for now). The starting point of the investigation phase would often be your application logs and the exception’s corresponding stack trace. There’s also the possibility that the exception wasn’t logged – we like to call these the silent killers of Java applications.
Let’s work with the best-case scenario and assume the exception was indeed logged.
Now, let’s clear the noise and strip out the 3rd party frames, keeping only the most relevant information.
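Since the original screenshot isn’t reproduced here, a stripped-down trace might look like this (the class name and line number match the example; the package and calling frame are hypothetical illustrations):

```
java.lang.NullPointerException
	at com.example.billing.GetUserBillingServlet.doGet(GetUserBillingServlet.java:64)
	at com.example.billing.BillingController.render(BillingController.java:31)
```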
We see there’s a NullPointerException on line number 64 in the GetUserBillingServlet class.
When we follow through and examine the code, there are 2 possible scenarios – the snakes and ladders of debugging:
1. We’re in luck, there’s only one value that could’ve been null on that line, and maybe we also logged it in a few different spots in the code so we can narrow in on the problematic step.
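As an illustration, here’s a minimal, hypothetical sketch of that lucky case (all names invented) – the debug log plus the single dereference on the failing line pin the null down:

```java
public class ScenarioOne {
    static class User {
        String getBillingId() { return "b-42"; }
    }

    // Hypothetical lookup: returns null for unknown ids.
    static User findUser(String id) {
        return null;
    }

    static String handle(String id) {
        User user = findUser(id);
        // The log already tells us the value was null before the crash.
        System.out.println("DEBUG fetched user=" + user);
        // Only 'user' is dereferenced on this line, so the stack trace is unambiguous.
        return user.getBillingId();
    }

    public static void main(String[] args) {
        try {
            handle("u-7");
        } catch (NullPointerException e) {
            System.out.println("NPE on the getBillingId() line: only 'user' could be null");
        }
    }
}
```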
The “user” object is definitely the source of trouble.
2. Murphy’s law. If something can go wrong, it will go wrong. Consider the following if statement:
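The original snippet isn’t shown here, so here’s a minimal hypothetical stand-in with two dereferences on a single line. (On modern JDKs – 15 and up, via JEP 358 “helpful NullPointerExceptions” – the message names the exact null operand, but on older runtimes the trace leaves it ambiguous.)

```java
public class ScenarioTwo {
    static class User {
        String getName() { return "dana"; }
    }

    static class Account {
        String getOwnerName() { return "dana"; }
    }

    static boolean ownerMatches(User user, Account account) {
        // Both 'user' and 'account' are dereferenced on this single line,
        // so an NPE here doesn't tell us which one was null.
        return user.getName().equals(account.getOwnerName());
    }

    public static void main(String[] args) {
        try {
            ownerMatches(new User(), null); // 'account' is the null one this time
        } catch (NullPointerException e) {
            System.out.println("NPE - but was 'user' or 'account' null?");
        }
    }
}
```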
Now we’re not sure whether it’s “user” or “account” that’s null, and we’re stuck.
As Mr. T once said, “Life’s tough, but I’m tougher”. Let’s look into some possible solutions that would help us advance the investigation.
Solution #1: Breaking down complex lines of code
In the above example, the if statement could have been broken down into one dereference per line.
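A hypothetical version of that split (assuming the compound condition involved both the “user” and “account” objects, as above):

```java
public class SplitLines {
    static class User {
        String getName() { return "dana"; }
    }

    static class Account {
        String getOwnerName() { return "dana"; }
    }

    static boolean ownerMatches(User user, Account account) {
        String userName = user.getName();           // NPE here -> 'user' was null
        String ownerName = account.getOwnerName();  // NPE here -> 'account' was null
        return userName.equals(ownerName);
    }

    public static void main(String[] args) {
        try {
            ownerMatches(null, new Account());
        } catch (NullPointerException e) {
            // The stack trace line number now does the disambiguation for us.
            System.out.println("NPE on the user.getName() line: 'user' was null");
        }
    }
}
```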
The stack trace would include the appropriate line number and let us move forward faster. This is also why splitting aggregate operations on streams is a good practice.
In fact, some style guides insist on the same principle for readability reasons as well. Check out the post where we compared Java style guides from companies like Google, Twitter and Mozilla (and Pied Piper).
Solution #2: More null checks
This is probably the most obvious solution, keeping nulls in check and making sure no rogue values reach critical areas. Code filled with null checks is not pretty, but sometimes it’s a necessary evil.
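For instance, a minimal sketch (all domain names invented) using the standard library’s `Objects.requireNonNull` to fail fast with a descriptive message, instead of letting a bare NPE surface further downstream:

```java
import java.util.Objects;

public class NullChecks {
    static class User {
        String getBillingId() { return "b-42"; }
    }

    // Hypothetical lookup that may return null for unknown ids.
    static User fetchUser(String userId) {
        return "u-1".equals(userId) ? new User() : null;
    }

    static String billingIdFor(String userId) {
        // Fail fast at the boundary, with a message naming the offender.
        Objects.requireNonNull(userId, "userId must not be null");
        User user = fetchUser(userId);
        if (user == null) {
            throw new IllegalStateException("no user found for id=" + userId);
        }
        return user.getBillingId();
    }

    public static void main(String[] args) {
        System.out.println(billingIdFor("u-1"));
        try {
            billingIdFor("u-2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```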
In a previous post about JVM JIT optimization techniques we elaborated on how the JVM uses the uncommon trap mechanism to work around possibly redundant null checks that affect performance.
Solution #3: Higher verbosity logging
If there’s an exception, there’s usually a log message which contains additional hints. Whether it will contain useful information or not is a different story.
The next step could be to add information to the message, or to add log statements that shine some light on the path to the… explosion. This creates the debugging paradox – hoping the error happens again so you can stop it from happening again.
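As a sketch (names invented), attaching the relevant identifiers to the log message with `java.util.logging` means the context arrives with the very first occurrence, instead of waiting for a repeat:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class VerboseLogging {
    private static final Logger LOG = Logger.getLogger(VerboseLogging.class.getName());

    // Hypothetical operation: throws an NPE when userId is null.
    static String loadBilling(String userId, String accountId) {
        return userId.toUpperCase() + "/" + accountId;
    }

    public static void main(String[] args) {
        String userId = null;
        String accountId = "acc-7";
        try {
            loadBilling(userId, accountId);
        } catch (NullPointerException e) {
            // The variable state travels with the stack trace,
            // so one occurrence is enough to start the investigation.
            LOG.log(Level.SEVERE,
                    String.format("Failed to load billing for userId=%s accountId=%s",
                            userId, accountId),
                    e);
            System.out.println("logged with context: userId=" + userId
                    + " accountId=" + accountId);
        }
    }
}
```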
For additional methods to debug production servers at scale, check out this post on the High Scalability blog (which is a great resource for anything related to high scale systems).
Solution #4: Adopt a Continuous Reliability mindset
Improving code quality and ensuring application reliability are tough problems to solve. We made a few assumptions in this post to make things easier but as you know, application errors are a much more complex problem in reality.
Continuous Reliability (CR) defines a new approach for ensuring software quality in Continuous Integration (CI) and Continuous Delivery (CD) pipelines. It helps promote “Shift Left”, “Shift Right” and Developer Productivity initiatives by introducing structured practices for identifying and resolving critical software issues, based on quality gates, application observability, and contextual feedback loops.
At OverOps, we’re laser-focused on making the vision of Continuous Reliability a reality within the scope of mission-critical Java and .NET based applications. Whenever an exception, logged error or warning occurs, OverOps captures and analyzes it, helping prioritize it and providing a snapshot with the complete variable state from the moment of error, along with the code that caused it.
This way, no matter the issue, identifying, prioritizing and resolving it takes only minutes.
NullPointerExceptions aren’t going anywhere anytime soon. That’s why it’s critical to have a good strategy in place to identify, prioritize and resolve them!
This blog post was originally published on August 11, 2016