Backing Up Mastodon and expanding its storage
My Mastodon server has reached the point where it’s complex enough that backing up would be prudent. I was a little surprised to discover that the default project doesn’t provide a script to do this for me. The project does have a comprehensive overview of the steps to back up, so I set about making my own backup script. In the process, I also discovered a need for way more storage, so I’ll talk a bit about how I did that as well.
Expanding storage on a Raspberry Pi
In two weeks of operation, the cache of user-generated data (in public/system) has expanded to about 20GB. I can tune how much is retained and for how long, but in general I expect needing to store images and video will continue to be a concern. So I went ahead and sprang for a 1TB drive from Seagate for all my storage needs.
Getting that configured onto the Pi wasn’t too hard; it involved a couple of steps and some tools I’m not practiced with. (The full command sequence is collected in a sketch after these steps.)
- Find the drive. After plugging in the drive, run lsblk to find the drive’s device name (probably /dev/sd<something>; mine was /dev/sda with a partition at /dev/sda1; we care about the partition).
- Format the drive for ext4. The drive comes with NTFS formatting, which doesn’t support POSIX-style permissions. To format it to ext4, we do sudo mkfs -t ext4 <whatever device your drive is on>. So for me, sudo mkfs -t ext4 /dev/sda1.
- Find the device PARTUUID. Run sudo blkid to get information on your devices. You’re looking for the PARTUUID of the drive. Write that down.
- Configure the system to auto-mount the drive. Raspberry Pi’s OS uses fstab to configure hardware mounting. To configure, edit /etc/fstab with root permissions and add a line like this to the file: PARTUUID=<the UUID you wrote down> /media/squire ext4 defaults 0 2. This configures that device to mount at the /media/squire directory. It also specifies that it uses ext4 and that the default set of mount options should be used. The 0 denotes how often the filesystem should be dumped (“never,” which is generally correct for modern filesystems). The 2 indicates this is not the root filesystem (it sets the fsck order).
- Make the directory: sudo mkdir -p /media/squire.
- Restart, or mount by hand. At this point I can restart the Pi and the fstab config should take effect, or I can hand-mount the drive with sudo mount /dev/sda1 /media/squire. Now when I change directory to /media/squire, there’s a whole terabyte of storage there!
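For convenience, here’s that whole sequence in one place. Treat it as a sketch rather than a copy-paste script: it assumes the drive shows up as /dev/sda1 and mounts at /media/squire, and the mkfs step destroys anything already on the drive.
# Identify the new drive and its partition
lsblk
# Format the partition as ext4 -- this erases it!
sudo mkfs -t ext4 /dev/sda1
# Note the PARTUUID reported for the partition
sudo blkid
# Create the mount point
sudo mkdir -p /media/squire
# Add a line like this to /etc/fstab (edit as root):
#   PARTUUID=<the UUID you wrote down> /media/squire ext4 defaults 0 2
# Then mount it now, or reboot and let fstab do it
sudo mount /dev/sda1 /media/squire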
Using the expanded storage to store assets
To take advantage of the new storage, we’ll symlink from the Mastodon install to that location. I have my Mastodon node set up as per the recommendation on the site, so my downloaded assets (all 20GB of them) are at /home/mastodon/live/public/system. To shift that to the new drive (all of these as root):
mkdir -p /media/squire/mastodon/live/public
chown -R mastodon:mastodon /media/squire/mastodon
cd /home/mastodon/live/public
# Bring the server down so new assets aren't loaded while you're copying the storage
systemctl stop mastodon-web mastodon-streaming mastodon-sidekiq
mv system system-old
cp -Rp system-old /media/squire/mastodon/live/public/system
# ... wait a long time; flash memory is not fast. This took about a half hour for me
ln -s /media/squire/mastodon/live/public/system system
chown -h mastodon:mastodon system
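Before removing system-old later on, it’s worth a quick sanity check that the copy came through intact. Something like this works (the diff is thorough but slow on 20GB of media):
# Quick size comparison between the original and the copy
du -sh system-old /media/squire/mastodon/live/public/system
# More thorough (and much slower): compare file-by-file
diff -rq system-old /media/squire/mastodon/live/public/system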
Now we’re all set; with the symlink in place, Mastodon is writing the assets to the new drive.
… or it would be, if this were 2002.
Systemd services have an additional safety feature: they scope the directories
they are allowed to access. That normally would be no problem, but the scopes
don’t follow symlinks and since we’re symlinking to a directory in a different
part of the filesystem, systemd will block the daemons it launches from
accessing that path. To address this, we need to edit the files mastodon-web.service, mastodon-streaming.service, and mastodon-sidekiq.service under /etc/systemd/system to include our new location in scope.
In each file, locate the ReadWritePaths
line and add the path you just copied your
system directory to:
ReadWritePaths=/home/mastodon/live /media/squire/mastodon/live/public/system
Then you’ll need to let systemd read the updated files:
sudo systemctl daemon-reload
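An aside, and not what I did above: if you’d rather leave the unit files untouched, a drop-in override should accomplish the same thing, since ReadWritePaths assignments accumulate rather than replace each other. A sketch:
# Create a drop-in override for each of the three services
sudo systemctl edit mastodon-web.service
# ...then add these two lines in the editor that opens:
#   [Service]
#   ReadWritePaths=/media/squire/mastodon/live/public/system
# Repeat for mastodon-streaming.service and mastodon-sidekiq.service
# (a systemctl daemon-reload afterwards doesn't hurt)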
Now bring the services back online and you’re all set to serve from the larger disk.
systemctl start mastodon-web mastodon-streaming mastodon-sidekiq
# At some point in the future, once you're sure everything is working
rm -rf /home/mastodon/live/public/system-old
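One quick way to convince yourself everything is wired up (a sketch; none of this is required):
# Services should be up and running
systemctl status mastodon-web mastodon-streaming mastodon-sidekiq
# New media lands on the big drive, so its usage should creep up over time
df -h /media/squire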
Creating backups
With the new storage in place, I can put some backups together. For my backup
script, I wrote a bit of Python at /home/mastodon/live/admin/make-archive. No specific reason for Python, other than (a) I’m familiar with it and (b) it’s flexible. If I were more familiar with Rails, that would probably be the best option.
As documented, backing up Mastodon involves archiving data from four sources:
- The PostgreSQL database
- The .env.production application secrets
- The user-uploaded files
- The Redis database
To easily access Redis, I expanded my Mastodon user’s permissions with usermod -a -G redis mastodon (the new group membership takes effect on the next login session). Then it was just a matter of putting the script together.
#!/usr/bin/python
# Intended for Python v3.9.2
# Uses click (click.palletsproject.com)
# Copyright 2023 Mark T. Tomczak
# Released under Creative Commons Attribution-ShareAlike license
# (CC BY-SA 4.0)

import click
from datetime import datetime
import pathlib
import shutil
import subprocess
from typing import Final

README_TEMPLATE: Final = """
# Mastodon server archive
Date: {date}
To restore:
* Read the dump file into postgres: `psql mastodon_production < mastodon_production.sql`
* Copy .env.production back to live/.env.production
* Merge public/system into live/public/system
* Overwrite /var/lib/redis/dump.rdb with the archived dump.rdb
"""


def dump_postgres(path: pathlib.Path, db_name: str) -> None:
    """
    Dumps the postgres database to path.

    path: Location to store the dump
    db_name: Database to dump
    """
    subprocess.run(['pg_dump', f'--file={path}', db_name]).check_returncode()


def emit_readme(path: pathlib.Path) -> None:
    """
    Emits a README at the specified path
    """
    with open(path, 'w') as f:
        f.write(README_TEMPLATE.format(date=datetime.now().strftime("%d/%m/%Y %H:%M:%S")))


def make_tar(pathdir: pathlib.Path) -> None:
    """
    TARs the archive dir to a <dir>/archive.tar.gz file
    """
    archive_members = [p.relative_to(pathdir) for p in pathdir.glob('*')]
    cmd = ['tar', '-zchf', pathdir / 'archive.tar.gz'] + archive_members
    subprocess.run(cmd, cwd=pathdir).check_returncode()


@click.command()
@click.argument('dirpath', type=click.Path(file_okay=False, dir_okay=True, path_type=pathlib.Path))
def make_archive(dirpath: pathlib.Path) -> None:
    """
    Generate an archive of the Mastodon server in the directory specified by dirpath.

    Directory will be emptied by the operation and is created if it does not exist.
    """
    dirpath.mkdir(parents=True, exist_ok=True)
    for path in dirpath.glob('**/*'):
        if path.is_file():
            path.unlink()
        elif path.is_dir():
            shutil.rmtree(path)
    print("Dumping postgres...")
    dump_postgres(dirpath / 'mastodon_production.sql', 'mastodon_production')
    print("Backing up .env.production...")
    shutil.copy(pathlib.Path('../.env.production'), dirpath)
    print("Symlinking public/system...")
    (dirpath / 'public').mkdir()
    (dirpath / 'public' / 'system').symlink_to(pathlib.Path('../public/system').resolve())
    print('Archiving redis...')
    shutil.copy(pathlib.Path('/var/lib/redis/dump.rdb'), dirpath)
    print('Generating README.md...')
    emit_readme(dirpath / 'README.md')
    print('Building archive...')
    make_tar(dirpath)


if __name__ == '__main__':
    make_archive()
The script works as follows:
- Create an archival directory at a path (I use /media/squire/mastodon/archive).
- Into that directory, dump
  - Postgres via pg_dump
  - A copy of .env.production
  - A symlink to the public/system directory
  - The Redis rdb
- Generate a README.md file describing the date and how to restore from backup. This step I find super-useful: backups drift over time, so by spending just a few kB archiving the steps to restore from backup within the archive itself, I make it much less likely I’ll lose track of that knowledge or try to clobber a newer configuration with an old backup.
- tar the archive directory with -zchf options into archive.tar.gz. The h option makes tar follow the symlink to the public/system directory instead of just archiving the link file itself.
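For reference, running it looks roughly like this. The archive path is just the one I use; the one real constraint is that the script has to be run from the admin directory (and as the mastodon user), since it reaches for ../.env.production and ../public/system relative to the working directory.
# One-time setup: the script depends on click (you may already have it)
pip3 install click
chmod +x /home/mastodon/live/admin/make-archive
# Run from the admin directory as the mastodon user
cd /home/mastodon/live/admin
./make-archive /media/squire/mastodon/archive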
Even with the new hard drive, generating the archive takes a while. But at the end of it all, I have a clean backup I can squirrel away elsewhere for peace of mind.