Debugging 502 errors in Mastodon
Recently, after making some changes to my Mastodon server, I began suffering a host of 502 and 500 errors. The root cause turned out to be something unexpected, so I thought I’d document my debugging process to explain how I tracked it down. I hope this post also serves as an answer for anyone wrestling with the same problem on their own server.
Spoiler: if you get a lot of IPAddr::InvalidAddressError in your service logs, make sure you haven’t set the ALLOWED_PRIVATE_ADDRESSES environment variable to something invalid like localhost. That variable is a comma-separated list of IP addresses; if you set it at all, it should be something like 127.0.0.1.
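For reference, here’s the difference in .env.production (the exact value depends on your setup; 127.0.0.1, the loopback address, is just an example):

# Wrong: localhost is a hostname, not an IP address, so parsing it fails
ALLOWED_PRIVATE_ADDRESSES=localhost

# Fine: a comma-separated list of IP addresses (or simply leave the variable unset)
ALLOWED_PRIVATE_ADDRESSES=127.0.0.1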
Read on to learn why it took so long to sort this out and how I eventually solved it.
The Manifestation
The problem began with me noticing that several features of Mastodon were no longer working on my server:
- I couldn’t search for accounts on other servers.
- If I tried to ‘like’ a post on someone else’s server and it navigated back to my server, I got a “This page failed” page instead of the opportunity to like the post.
- Several images failed to load.
Popping open Chrome Inspector and checking the Network tab showed these were all HTTP 500 or 502 errors. The error message itself revealed nothing (I’m assuming Mastodon doesn’t give a stacktrace on an error page for security reasons). I saw no obvious cause, so off we go to debug.
The Debuggening
Step 1: Did we break it?
I’ve made a couple of customizations to my instance of Mastodon, so my first instinct was to check whether I broke it. I did a quick git rollback to the head of the glitch-soc codebase that I base my changes on. Unfortunately, the rollback plus a restart (systemctl restart mastodon-web) did not fix the issue. At this point, I suspected I had bad state; time to learn more about the error.
Step 2: Finding the error
The 500 sent to the client was useless, so I decided to check the server logs. Since I have Mastodon set up to run as a systemd service, it has fairly robust log storage; journalctl -u mastodon-web -f gave me access to the stdout / stderr of the process. I kicked off another off-node search and, sure enough, the log showed the issue:
IPAddr::InvalidAddressError (invalid address: on https://hachyderm.io/users/<edit: remaining URL omitted for privacy>):
/home/mastodon/.rbenv/versions/3.2.2/lib/ruby/3.2.0/ipaddr.rb:684:in `in6_addr'
… okay, now we’re getting somewhere. I’m not seeing the invalid address that’s being attempted, but I have an invalid address… Somewhere. The backtrace is uselessly bare-bones. So let’s see if I can learn more about this error:
git grep -rn InvalidAddressError
app/lib/request.rb:254: rescue IPAddr::InvalidAddressError
app/models/account_filter.rb:137: rescue IPAddr::InvalidAddressError
lib/mastodon/cli/ip_blocks.rb:139: rescue IPAddr::InvalidAddressError
I can probably rule out the cli file because I’m not using the command-line tools. account_filter might matter. But request deals with making outbound requests from my node, the kind of thing you’d do to resolve a search or send out a like… Okay. Let’s start there.
Here I got a bit lucky. The line that showed up in the grep was irrelevant. But while reading the rest of the file, I chanced across these lines:
def perform
  begin
    response = http_client.request(@verb, @url.to_s, @options.merge(headers: headers))
  rescue => e
    raise e.class, "#{e.message} on #{@url}", e.backtrace[0]
  end
That "#{e.message} on #{@url}"
pattern matches the actual error in the
logs. And I can see precisely why my backtrace is so bare-bones: we’re trimming
it on purpose down to the top of the backtrace!
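To see concretely what that third argument to raise does, here’s a tiny standalone Ruby example (my own sketch, not Mastodon code): passing a single backtrace frame replaces the new exception’s backtrace entirely.

def inner
  raise ArgumentError, 'boom'
end

def outer
  inner
rescue => e
  # Re-raise the same class with extra context, but keep only the top frame
  raise e.class, "#{e.message} on https://example.test", e.backtrace[0]
end

begin
  outer
rescue => e
  puts e.message           # "boom on https://example.test"
  puts e.backtrace.length  # 1 -- the bare-bones, single-frame backtrace
end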
Step 3: Hack the code
We need more information to isolate this error. It’s simple enough to dump the rest of the backtrace by adding this line above the raise:
Rails.logger.info "Oops! Request error with backtrace #{e.backtrace}"
I had to do a systemctl restart mastodon-web to get the change picked up, and it’s slow to test on production, but I couldn’t get the issue to reproduce at all in my development environment.
Now running the search dumps the entire stacktrace. I threw some formatting regexps at it, and saw the following:
["/home/mastodon/.rbenv/versions/3.2.2/lib/ruby/3.2.0/ipaddr.rb:684:in `in6_addr'",
"/home/mastodon/.rbenv/versions/3.2.2/lib/ruby/3.2.0/ipaddr.rb:620:in `initialize'",
"/home/mastodon/live/app/lib/request.rb:332:in `new'",
"/home/mastodon/live/app/lib/request.rb:332:in `block in private_address_exceptions'",
"/home/mastodon/live/app/lib/request.rb:332:in `map'",
"/home/mastodon/live/app/lib/request.rb:332:in `private_address_exceptions'",
"/home/mastodon/live/app/lib/request.rb:326:in `check_private_address'",
"/home/mastodon/live/app/lib/request.rb:268:in `block in open'",
... etc ...
So line 332 jumped out at me:
@private_address_exceptions = (ENV['ALLOWED_PRIVATE_ADDRESSES'] || '').split(/(?:\s*,\s*|\s+)/).map { |addr| IPAddr.new(addr) }
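Stripped of the Mastodon context, that line is doing roughly this (my own standalone mirror of the same split-and-map, not the actual request.rb code): each comma- or whitespace-separated entry gets handed straight to IPAddr.new, with no validation in between.

require 'ipaddr'

# A comma- or whitespace-separated list, each entry parsed as an IP address
value = '127.0.0.1, 10.0.0.0/8'
value.split(/(?:\s*,\s*|\s+)/).map { |addr| IPAddr.new(addr) }
# => [#<IPAddr: IPv4:127.0.0.1/255.255.255.255>, #<IPAddr: IPv4:10.0.0.0/255.0.0.0>]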
Ah-hah! I had recently added ALLOWED_PRIVATE_ADDRESSES=localhost to my .env.production file while setting up translations (editor’s note: to minimize confusion for future readers, I’ve corrected the previous blogpost). Turns out that’s incorrect; the value should be an IP address, not a domain name, not even localhost. Sure enough, I confirmed that loading IPAddr and attempting IPAddr.new('localhost') raises an error in Ruby (a quick check is below). And the issue didn’t show up when I tried to roll all my changes back because the .env.* files aren’t checked into version control; I didn’t have any history of changes to this file.
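That check is easy to reproduce in irb or a one-off script:

require 'ipaddr'

IPAddr.new('127.0.0.1')  # => #<IPAddr: IPv4:127.0.0.1/255.255.255.255>
IPAddr.new('localhost')  # raises IPAddr::InvalidAddressError: invalid address: localhost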
The fix was simple enough: I removed the whole line from the config. It turns out the setting wasn’t necessary anyway.
Should Mastodon crash like this?
The short answer is no; it’s not great that Mastodon isn’t treating environment variables as unsanitized external input and is ram-rodding them into the code at a low level without validating them. But that’s extremely par for the course, not just for open-source software but for software in general; I’ve seen plenty of projects assume that things like env vars are trusted data because they’re also touched by developers. I might try to propose a validation change on the Mastodon project (a rough sketch of what I have in mind is below), but more important than preventing this issue in the first place is having the skills to identify it when it happens. Hopefully, this deep(-ish) dive into the debugging process helps someone down the line pick apart why their server is misbehaving.
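Purely as a hypothetical sketch (this is not Mastodon’s actual code, and safe_private_address_exceptions is a name I made up), the idea would be to parse each entry defensively and warn-and-skip instead of raising in the middle of outbound request handling:

require 'ipaddr'
require 'logger'

LOGGER = Logger.new($stdout)

# Hypothetical: treat ALLOWED_PRIVATE_ADDRESSES as untrusted input.
# Invalid entries are logged and ignored rather than raising
# IPAddr::InvalidAddressError deep inside request handling.
def safe_private_address_exceptions(value = ENV['ALLOWED_PRIVATE_ADDRESSES'])
  (value || '').split(/(?:\s*,\s*|\s+)/).filter_map do |addr|
    begin
      IPAddr.new(addr)
    rescue IPAddr::InvalidAddressError
      LOGGER.warn("Ignoring invalid entry in ALLOWED_PRIVATE_ADDRESSES: #{addr.inspect}")
      nil
    end
  end
end

safe_private_address_exceptions('127.0.0.1, localhost')
# => [#<IPAddr: IPv4:127.0.0.1/255.255.255.255>]  (plus a logged warning about "localhost")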
I suspect that someone will be me, a couple years from now. ;)