Error Buckets and Root Causes

by Matthew Boulton

April 13, 2021

4 min read

I’m sure anyone out there using Redis has seen log output like this from time to time:

Users can’t sign up… and our payment processor is failing… and users can’t log in… and our SSO is broken! A low-level exception is bubbling all the way up, and it looks like we have lots of issues, but in fact there’s only one root cause.

If you’re using Redis throughout your application and letting exceptions bubble up, those timeouts can trigger alerts from several different entry points. If you’re using something like Key Transactions or Keyword Log Alerts to monitor the most important parts of your system, you’ll get several alarms suggesting that multiple things have failed. I’m not picking on Redis in particular: any component that is subject to intermittent failures can cause this if it’s used liberally throughout the application, such as a SQL database or cloud storage. The point is that when one of these services goes down, you suddenly start getting alerts saying that multiple parts of your application have failed.
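To make the fan-out concrete, here is a minimal, self-contained Python sketch. The handler names, the `cache_get` helper and the `alert` function are all hypothetical stand-ins for your own code and monitoring; `cache_get` simply raises `TimeoutError` to simulate an unreachable Redis server. Because each entry point reports its own failure, one outage produces one alert per entry point.

```python
# Hypothetical stand-in for a shared Redis helper; it always times out
# to simulate an unreachable cache server.
def cache_get(key: str) -> str:
    raise TimeoutError(f"Timeout reading '{key}' from redis")

# Three unrelated entry points that all depend on the same helper.
def sign_up(user: str) -> None:
    cache_get(f"signup:{user}")

def log_in(user: str) -> None:
    cache_get(f"session:{user}")

def take_payment(order_id: str) -> None:
    cache_get(f"payment:{order_id}")

alerts: list[str] = []

def alert(entry_point: str, exc: Exception) -> None:
    # Hypothetical per-entry-point monitoring: every failing entry point raises its own alarm.
    alerts.append(f"{entry_point} failed: {type(exc).__name__}")

for name, handler, arg in [
    ("sign_up", sign_up, "alice"),
    ("log_in", log_in, "bob"),
    ("take_payment", take_payment, "order-42"),
]:
    try:
        handler(arg)
    except Exception as exc:  # the low-level exception bubbled all the way up
        alert(name, exc)

print(len(alerts))  # prints 3: three alerts, one root cause
```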

Things are a little different if you are using Railtown.

When these errors start occurring, you’ll get one notification in your preferred channel (Slack, Teams, Zapier, email, etc.). If you’re already in the app, you’ll see the new errors in your deployed environments right on the home page:

Here we can see the error is not affecting the production environment, even though it’s running the same build as the test environment. Good news. Now we can dive into the Error Bucket and see what’s going on.

We can see several entry points for the same error, so instead of three alerts or key transaction failures, we have just one root cause to investigate:

Now we can dig into those entry points and see the associated error logs and stack traces. Railtown has already linked them together for you.
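This post doesn’t describe how Railtown computes its Error Buckets, so the following is only a rough illustration of the general idea, not Railtown’s algorithm. It groups error events by a hypothetical fingerprint made of the exception type plus the function and line that raised it, so the same low-level failure reached through different entry points lands in a single bucket. All function names here are hypothetical.

```python
import traceback
from collections import defaultdict

def cache_get(key: str) -> str:
    # Hypothetical shared Redis helper; always times out to simulate an outage.
    raise TimeoutError(f"Timeout reading '{key}' from redis")

# Hypothetical entry points that all reach the same failing helper.
def sign_up() -> None:
    cache_get("signup:alice")

def log_in() -> None:
    cache_get("session:bob")

def take_payment() -> None:
    cache_get("payment:order-42")

def fingerprint(exc: BaseException) -> tuple[str, str, int]:
    """Root-cause key: exception type plus the function and line that raised it.

    The message is deliberately left out because it differs per call (different keys).
    """
    raised_at = traceback.extract_tb(exc.__traceback__)[-1]  # innermost frame
    return (type(exc).__name__, raised_at.name, raised_at.lineno)

# Maps each root-cause fingerprint to the entry points that hit it.
buckets: dict[tuple[str, str, int], list[str]] = defaultdict(list)

for handler in (sign_up, log_in, take_payment):
    try:
        handler()
    except Exception as exc:
        buckets[fingerprint(exc)].append(handler.__name__)

for key, entry_points in buckets.items():
    print(key, entry_points)
```

Leaving the message out of the fingerprint is the choice that matters here: the three errors mention different keys, so grouping on the raw message would have produced three buckets. A production system would likely need more careful normalization (for example, line numbers shift between builds), so treat this purely as a sketch of the concept.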


This is just the start of how Railtown can make your life easier. If you’ve linked Jira or Azure DevOps to your account, we’ll try to match these errors to recent changes in your application via their tickets or work items. Perhaps in this case, you took a ticket to reduce the connection timeout to the Redis server, and the test environment’s infrastructure is not powerful enough to handle that change. Our smart AI ticket matching can figure this out, but that’s a story for another time.


I hope this post has explained how Railtown’s Error Buckets can reduce the noise and take some of the stress out of dealing with intermittent errors in your application’s dependencies.


If you don’t already have a Railtown.ai account, sign up and let us help you improve the quality of your software and increase your developer velocity.
