Backing Up Mastodon and expanding its storage
My Mastodon server has reached the point where it’s complex enough that backing up would be prudent. I was a little surprised to discover that the default project doesn’t provide a script to do this for me. The project does have a comprehensive overview of the steps to back up, so I set about making my own backup script. In the process, I also discovered a need for way more storage, so I’ll talk a bit about how I did that as well.
Expanding storage on a Raspberry Pi
In two weeks of operation, the cache of user-generated data (in public/system) has expanded to about 20GB. I can tune how much is retained and for how long, but in general I expect needing to store images and video will continue to be a concern. So I went ahead and sprang for a 1TB drive from Seagate for all my storage needs.
Getting that configured onto the Pi wasn’t too hard; it involved a couple of steps and some tools I’m not practiced with. (The full command sequence is collected in a sketch after these steps.)
- Find the drive. After plugging in the drive, run lsblk to find the drive’s device name (probably /dev/sd<something>; mine was /dev/sda with a partition at /dev/sda1; we care about the partition).
- Format the drive for ext4. The drive comes with NTFS formatting, which doesn’t support POSIX-style permissions. To format it to ext4, we do sudo mkfs -t ext4 <whatever device your drive is on>. So for me, sudo mkfs -t ext4 /dev/sda1.
- Find the device PARTUUID. Run sudo blkid to get information on your devices. You’re looking for the PARTUUID of the drive. Write that down.
- Configure the system to auto-mount the drive. Raspberry Pi’s OS uses fstab to configure hardware mounting. To configure, edit /etc/fstab with root permissions and add a line like this to the file: PARTUUID=<the UUID you wrote down> /media/squire ext4 defaults 0 2. This configures that device to mount at the /media/squire directory. It also specifies that it uses ext4 and that the default set of mount options should be used. The 0 denotes how often the filesystem should be dumped (“never,” which is generally correct for modern filesystems). The 2 indicates this is not the root filesystem (it sets the fsck order).
- Make the directory: sudo mkdir -p /media/squire.
- Restart, or mount by hand. At this point I can restart the Pi and the fstab config should take effect, or I can hand-mount the drive with sudo mount /dev/sda1 /media/squire. Now when I change directory to /media/squire, there’s a whole terabyte of storage there!
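For convenience, here’s that whole sequence in one place. Treat it as a sketch rather than a copy-paste script: it assumes the drive shows up as /dev/sda1 and mounts at /media/squire, and the mkfs step destroys anything already on the drive.
# Identify the new drive and its partition
lsblk
# Format the partition as ext4 -- this erases it!
sudo mkfs -t ext4 /dev/sda1
# Note the PARTUUID reported for the partition
sudo blkid
# Create the mount point
sudo mkdir -p /media/squire
# Add a line like this to /etc/fstab (edit as root):
#   PARTUUID=<the UUID you wrote down> /media/squire ext4 defaults 0 2
# Then mount it now, or reboot and let fstab do it
sudo mount /dev/sda1 /media/squire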
Using the expanded storage to store assets
To take advantage of the new storage, we’ll symlink from the Mastodon install to that location. I have my Mastodon node set up as per the recommendation on the site, so my downloaded assets (all 20GB of them) are at /home/mastodon/live/public/system. To shift that to the new drive (all of these as root):
mkdir -p /media/squire/mastodon/live/public
chown -R mastodon:mastodon /media/squire/mastodon
cd /home/mastodon/live/public
# Bring the server down so new assets aren't loaded while you're copying the storage
systemctl stop mastodon-web mastodon-streaming mastodon-sidekiq
mv system system-old
cp -Rp system-old /media/squire/mastodon/live/public/system
# ... wait a long time; flash memory is not fast. This took about a half hour for me
ln -s /media/squire/mastodon/live/public/system system
chown -h mastodon:mastodon system
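Before removing system-old later on, it’s worth a quick sanity check that the copy came through intact. Something like this works (the diff is thorough but slow on 20GB of media):
# Quick size comparison between the original and the copy
du -sh system-old /media/squire/mastodon/live/public/system
# More thorough (and much slower): compare file-by-file
diff -rq system-old /media/squire/mastodon/live/public/system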
Now we’re all set; with the symlink in place, Mastodon is writing the assets to the new drive.
… or it would be, if this were 2002.
Systemd services have an additional safety feature: they scope the directories
they are allowed to access. That normally would be no problem, but the scopes
don’t follow symlinks and since we’re symlinking to a directory in a different
part of the filesystem, systemd will block the daemons it launches from
accessing that path. To address this, we need to edit the files mastodon-web.service, mastodon-streaming.service, and mastodon-sidekiq.service under /etc/systemd/system to include our new location in scope.
In each file, locate the ReadWritePaths
line and add the path you just copied your
system directory to:
ReadWritePaths=/home/mastodon/live /media/squire/mastodon/live/public/system
Then you’ll need to let systemd read the updated files:
sudo systemctl daemon-reload
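An aside, and not what I did above: if you’d rather leave the unit files untouched, a drop-in override should accomplish the same thing, since ReadWritePaths assignments accumulate rather than replace each other. A sketch:
# Create a drop-in override for each of the three services
sudo systemctl edit mastodon-web.service
# ...then add these two lines in the editor that opens:
#   [Service]
#   ReadWritePaths=/media/squire/mastodon/live/public/system
# Repeat for mastodon-streaming.service and mastodon-sidekiq.service
# (a systemctl daemon-reload afterwards doesn't hurt)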
Now bring the services back online and you’re all set to serve from the larger disk.
systemctl start mastodon-web mastodon-streaming mastodon-sidekiq
# At some point in the future, once you're sure everything is working
rm -rf /home/mastodon/live/public/system-old
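One quick way to convince yourself everything is wired up (a sketch; none of this is required):
# Services should be up and running
systemctl status mastodon-web mastodon-streaming mastodon-sidekiq
# New media lands on the big drive, so its usage should creep up over time
df -h /media/squire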
Creating backups
With the new storage in place, I can put some backups together. For my backup
script, I wrote a bit of Python at /home/mastodon/live/admin/make-archive. No specific reason for Python, other than (a) I’m familiar with it and (b) it’s flexible. If I were more familiar with Rails, that would probably be the best option.
As documented, backing up Mastodon involves archiving data from four sources:
- The PostgreSQL database
- The .env.production application secrets
- The user-uploaded files
- The Redis database
To easily access Redis, I expanded my Mastodon user’s permissions with usermod -a -G redis mastodon (the new group membership takes effect on the next login session). Then it was just a matter of putting the script together.
#!/usr/bin/python
# Intended for Python v3.9.2
# Uses click (click.palletsproject.com)
# Copyright 2023 Mark T. Tomczak
# Released under Creative Commons Attribution-ShareAlike license
# (CC BY-SA 4.0)

import click
from datetime import datetime
import pathlib
import shutil
import subprocess
from typing import Final

README_TEMPLATE: Final = """
# Mastodon server archive
Date: {date}
To restore:
* Read the dump file into postgres: `psql mastodon_production < mastodon_production.sql`
* Copy .env.production back to live/.env.production
* Merge public/system into live/public/system
* Overwrite /var/lib/redis/dump.rdb with the archived dump.rdb
"""


def dump_postgres(path: pathlib.Path, db_name: str) -> None:
    """
    Dumps the postgres database to path.

    path: Location to store the dump
    db_name: Database to dump
    """
    subprocess.run(['pg_dump', f'--file={path}', db_name]).check_returncode()


def emit_readme(path: pathlib.Path) -> None:
    """
    Emits a README at the specified path
    """
    with open(path, 'w') as f:
        f.write(README_TEMPLATE.format(date=datetime.now().strftime("%d/%m/%Y %H:%M:%S")))


def make_tar(pathdir: pathlib.Path) -> None:
    """
    TARs the archive dir to a <dir>/archive.tar.gz file
    """
    archive_members = [p.relative_to(pathdir) for p in pathdir.glob('*')]
    cmd = ['tar', '-zchf', pathdir / 'archive.tar.gz'] + archive_members
    subprocess.run(cmd, cwd=pathdir).check_returncode()


@click.command()
@click.argument('dirpath', type=click.Path(file_okay=False, dir_okay=True, path_type=pathlib.Path))
def make_archive(dirpath: pathlib.Path) -> None:
    """
    Generate an archive of the Mastodon server in the directory specified by dirpath.

    Directory will be emptied by the operation and is created if it does not exist.
    """
    dirpath.mkdir(parents=True, exist_ok=True)
    for path in dirpath.glob('**/*'):
        if path.is_file():
            path.unlink()
        elif path.is_dir():
            shutil.rmtree(path)
    print("Dumping postgres...")
    dump_postgres(dirpath / 'mastodon_production.sql', 'mastodon_production')
    print("Backing up .env.production...")
    shutil.copy(pathlib.Path('../.env.production'), dirpath)
    print("Symlinking public/system...")
    (dirpath / 'public').mkdir()
    (dirpath / 'public' / 'system').symlink_to(pathlib.Path('../public/system').resolve())
    print('Archiving redis...')
    shutil.copy(pathlib.Path('/var/lib/redis/dump.rdb'), dirpath)
    print('Generating README.md...')
    emit_readme(dirpath / 'README.md')
    print('Building archive...')
    make_tar(dirpath)


if __name__ == '__main__':
    make_archive()
The script works as follows:
- Create an archival directory at a path (I use /media/squire/mastodon/archive).
- Into that directory, dump
  - Postgres via pg_dump
  - A copy of .env.production
  - A symlink to the public/system directory
  - The Redis rdb
- Generate a README.md file describing the date and how to restore from backup. This step I find super-useful: backups drift over time, so by spending just a few kB archiving the steps to restore from backup within the archive itself, I make it much less likely I’ll lose track of that knowledge or try to clobber a newer configuration with an old backup.
- tar the archive directory with -zchf options into archive.tar.gz. The h option makes tar follow the symlink to the public/system directory instead of just archiving the link file itself.
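For reference, running it looks roughly like this. The archive path is just the one I use; the one real constraint is that the script has to be run from the admin directory (and as the mastodon user), since it reaches for ../.env.production and ../public/system relative to the working directory.
# One-time setup: the script depends on click (you may already have it)
pip3 install click
chmod +x /home/mastodon/live/admin/make-archive
# Run from the admin directory as the mastodon user
cd /home/mastodon/live/admin
./make-archive /media/squire/mastodon/archive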
Even with the new hard drive, generating the archive takes a while. But at the end of it all, I have a clean backup I can squirrel away elsewhere for peace of mind.