Backing up Mastodon and expanding its storage

My Mastodon server has reached the point where it’s complex enough that backing it up would be prudent. I was a little surprised to discover that a script to do this isn’t provided with the project itself. The project does have a comprehensive overview of the steps to back up, though, so I set about making my own backup script. In the process, I also discovered a need for way more storage, so I’ll also talk a bit about how I handled that.

Expanding storage on a Raspberry Pi

In two weeks of operation, the cache of user-generated data (in public/system) has expanded to about 20GB. I can tune how much is retained and for how long, but in general I expect storing images and video to continue to be a concern. So I went ahead and sprang for a 1TB drive from Seagate for all my storage needs.
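(As an aside: if you’d rather trim the cache than add disk, Mastodon’s tootctl can prune cached remote media older than a given number of days. Something along these lines, run as the mastodon user from the live directory, should do it; the seven-day window is just an example:)

RAILS_ENV=production bin/tootctl media remove --days 7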

Getting that configured on the Pi wasn’t too hard; it involved a couple of steps and some tools I’m not practiced with.

  1. Find the drive. After plugging in the drive, run lsblk to find the drive’s device name (probably /dev/sd<something>; mine was /dev/sda, with a partition at /dev/sda1; we care about the partition).
  2. Format the drive for ext4. The drive comes formatted as NTFS, which doesn’t support POSIX-style permissions. To format it to ext4, we do sudo mkfs -t ext4 <whatever device your drive is on>. So for me, sudo mkfs -t ext4 /dev/sda1.
  3. Find the device’s PARTUUID. Run sudo blkid to get information on your devices. You’re looking for the PARTUUID of the drive’s partition. Write that down.
  4. Configure the system to auto-mount the drive. Raspberry Pi OS uses fstab to configure filesystem mounting. To configure it, edit /etc/fstab with root permissions and add a line like this to the file: PARTUUID=<the UUID you wrote down> /media/squire ext4 defaults 0 2. This line configures that device to mount at the /media/squire directory, using ext4 with the default set of mount options. The 0 denotes how often the filesystem should be dumped (“never,” which is generally correct for modern filesystems). The 2 sets the fsck order (2 for anything that isn’t the root filesystem).
  5. Make the directory. sudo mkdir -p /media/squire.
  6. Restart, or mount by hand. At this point I can restart the Pi and the fstab config should kick in, or I can hand-mount the drive with sudo mount /dev/sda1 /media/squire. Now when I change directory to /media/squire, there’s a whole terabyte of storage there!
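For reference, here’s the whole sequence condensed into one place. The device name and PARTUUID are placeholders; substitute the values you found on your own system:

lsblk                          # identify the new drive and its partition (mine was /dev/sda1)
sudo mkfs -t ext4 /dev/sda1    # format the partition as ext4
sudo blkid                     # note the partition's PARTUUID
sudo mkdir -p /media/squire
# then append to /etc/fstab (as root):
#   PARTUUID=<your-partuuid>  /media/squire  ext4  defaults  0  2
sudo mount /media/squire       # or reboot and let fstab mount it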

Using the expanded storage to store assets

To take advantage of the new storage, we’ll symlink from the Mastodon install to that location. I have my Mastodon node set up as per the recommendation on the site, so my downloaded assets (all 20GB of them) are at /home/mastodon/live/public/system. To shift that to the new drive (all of these as root):

mkdir -p /media/squire/mastodon/live/public
chown -R mastodon:mastodon /media/squire/mastodon
cd /home/mastodon/live/public
# Bring the server down so new assets aren't loaded while you're copying the storage
systemctl stop mastodon-web mastodon-streaming mastodon-sidekiq
mv system system-old
cp -Rp system-old /media/squire/mastodon/live/public/system
# ... wait a long time; flash memory is not fast. This took about a half hour for me
ln -s /media/squire/mastodon/live/public/system system
chown -h mastodon:mastodon system
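
Before bringing the services back up, it’s worth a quick sanity check that the symlink points where you expect and that the data actually landed on the new drive; something like:

ls -ld /home/mastodon/live/public/system
# ... should show: system -> /media/squire/mastodon/live/public/system
du -sh /media/squire/mastodon/live/public/system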

Now we’re all set; with the symlink in place, Mastodon is writing the assets to the external drive.

… or it would be, if this were 2002.

[Image: Broccoli Man, the superhero from the 'I just want to serve five terabytes' video]

You think you are done? Haha. You are so funny.

Systemd services have an additional safety feature: they scope the directories they are allowed to access. That would normally be no problem, but the scoping doesn’t follow symlinks, and since we’re symlinking to a directory in a different part of the filesystem, systemd will block the daemons it launches from writing to that path. To address this, we need to edit the files mastodon-web.service, mastodon-streaming.service, and mastodon-sidekiq.service under /etc/systemd/system to include our new location in scope.

In each file, locate the ReadWritePaths line and add the path you just copied your system directory to:

ReadWritePaths=/home/mastodon/live /media/squire/mastodon/live/public/system
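
For context, that line lives in the [Service] section of each unit, so after the edit the relevant part of the file looks roughly like this (other directives omitted; your files may differ slightly):

[Service]
User=mastodon
WorkingDirectory=/home/mastodon/live
ReadWritePaths=/home/mastodon/live /media/squire/mastodon/live/public/system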

Then you’ll need to let systemd read the updated files:

sudo systemctl daemon-reload
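
After the reload, you can confirm systemd picked up the change by querying the property directly; it should echo back both paths:

systemctl show mastodon-web.service -p ReadWritePaths
# ReadWritePaths=/home/mastodon/live /media/squire/mastodon/live/public/system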

Now you’re all set: bring the services back online and Mastodon will serve assets from the larger disk.

systemctl start mastodon-web mastodon-streaming mastodon-sidekiq
# At some point in the future, once you're sure everything is working
rm -rf /home/mastodon/live/public/system-old

Creating backups

With the new storage in place, I can put some backups together. For my backup script, I wrote a bit of Python at /home/mastodon/live/admin/make-archive. No specific reason for Python, other than (a) I’m familiar with it and (b) it’s flexible. If I were more familiar with Rails, that would probably be the best option.

As documented, backing up Mastodon involves archiving data from four sources:

  1. The PostgreSQL database
  2. The .env.production application secrets
  3. The user-uploaded files
  4. The Redis database

To easily access Redis, I expanded my Mastodon user’s permissions with usermod -a -G redis mastodon (run as root; the new group membership takes effect at the next login). Then it was just a matter of putting the script together.

#!/usr/bin/python
# Intended for Python v3.9.2
# Uses click (click.palletsprojects.com)
# Copyright 2023 Mark T. Tomczak
# Released under Creative Commons Attribution-ShareAlike license
# (CC BY-SA 4.0)

import click
from datetime import datetime
import pathlib
import shutil
import subprocess
from typing import Final

README_TEMPLATE: Final = """
# Mastodon server archive

Date: {date}

To restore:
* Read the dump file into postgres: `psql mastodon_production < mastodon_production.sql`
* Copy .env.production back to live/.env.production
* Merge public/system into live/public/system
* Overwrite /var/lib/redis/dump.rdb with the archived dump.rdb
"""

def dump_postgres(path:pathlib.Path, db_name: str) -> None:
  """
  Dumps the postgres database to path.

  path: Location to store the dump
  db_name: Database to dump
  """
  subprocess.run(['pg_dump', f'--file={path}', db_name]).check_returncode()

def emit_readme(path: pathlib.Path) -> None:
  """
  Emits a README at the specified path
  """
  with open(path, 'w') as f:
    f.write(README_TEMPLATE.format(date=datetime.now().strftime("%d/%m/%Y %H:%M:%S")))

def make_tar(pathdir: pathlib.Path) -> None:
  """
  TARs the archive dir to a <dir>archive.tar.gz file
  """
  archive_members = [p.relative_to(pathdir) for p in pathdir.glob('*')]
  cmd = ['tar', '-zchf', pathdir / 'archive.tar.gz'] + archive_members
  subprocess.run(cmd, cwd=pathdir).check_returncode()

@click.command()
@click.argument('dirpath', type=click.Path(file_okay=False, dir_okay=True, path_type=pathlib.Path))
def make_archive(dirpath: pathlib.Path) -> None:
  """
  Generate an archive of the Mastodon server in the directory specified by dirpath.

  Directory will be emptied by the operation and is created if it does not exist.
  """

  dirpath.mkdir(parents=True, exist_ok=True)

  for path in dirpath.glob('**/*'):
    if path.is_file():
      path.unlink()
    elif path.is_dir():
      shutil.rmtree(path)

  print("Dumping postgres...")
  dump_postgres(dirpath / 'mastodon_production.sql', 'mastodon_production')

  print("Backing up .env.production...")
  shutil.copy(pathlib.Path('../.env.production'), dirpath)

  print("Symlinking public/system...")
  (dirpath / 'public').mkdir()
  (dirpath / 'public' / 'system').symlink_to(pathlib.Path('../public/system').resolve())

  print('Archiving redis...')
  shutil.copy(pathlib.Path('/var/lib/redis/dump.rdb'), dirpath)

  print('Generating README.md...')
  emit_readme(dirpath / 'README.md')

  print('Building archive...')
  make_tar(dirpath)


if __name__ == '__main__':
  make_archive()

The script works as follows:

  1. Create an archival directory at a path (I use /media/squire/mastodon/archive).
  2. Into that directory, dump
    1. Postgres via pg_dump
    2. A copy of .env.production
    3. A symlink to the public/system directory
    4. The Redis rdb.
  3. Generate a README.md file describing the date and how to restore from backup. This step I find super-useful: backups drift over time, so by spending just a few kB archiving the steps to restore from backup within the archive itself, I make it much less likely I’ll lose track of that knowledge or try to clobber a newer configuration with an old backup.
  4. tar the archive directory with -zchf options into archive.tar.gz. The h option makes tar follow the symlink to the public/system directory instead of just archiving the link file itself.

Even with the new hard drive, generating the archive takes a while. But at the end of it all, I have a clean backup I can squirrel away elsewhere for peace of mind.
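
Since the script uses relative paths, it needs to be run from the live/admin directory, and as the mastodon user so pg_dump and the file copies have the right permissions. A typical invocation looks like this (after a one-time chmod +x on the script); if you want to automate it, a crontab entry along these lines could work, with the weekly schedule being just an example:

cd /home/mastodon/live/admin
./make-archive /media/squire/mastodon/archive

# optional: run it weekly from the mastodon user's crontab
# 0 4 * * 0  cd /home/mastodon/live/admin && ./make-archive /media/squire/mastodon/archive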
