With my new NAS, it was also time to rethink my backup strategy. I used to have a simple strategy on the NAS itself:
- Stored all data on a RAID-1 array
- Once a day, ran a small rsync script that would create a snapshot (on a separate disk), using `--link-dest` to only store the actual changes (and hardlinks to unchanged files)
- Once a week, used rclone to sync the backup to Backblaze B2 (and had Backblaze configured to perform deletes after 6 months)
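As a rough sketch of what such a daily snapshot script could look like (this is not the original script; `SRC`, `DEST`, the `latest` symlink and the `make_snapshot` helper are assumptions made for illustration):

```shell
#!/bin/sh
# Sketch of a daily hardlink snapshot. SRC and DEST are hypothetical paths.
set -eu

SRC="${SRC:-/data/}"          # trailing slash: copy the contents of the directory
DEST="${DEST:-/mnt/backup}"   # absolute path on the separate backup disk

# make_snapshot NAME: create $DEST/NAME, hardlinking unchanged files
# against the previous snapshot (tracked by the "latest" symlink).
make_snapshot() {
    name="$1"
    if [ -d "$DEST/latest" ]; then
        rsync -a --link-dest="$DEST/latest/" "$SRC" "$DEST/$name/"
    else
        # First run: nothing to link against yet, so do a plain full copy.
        rsync -a "$SRC" "$DEST/$name/"
    fi
    # Point "latest" at the snapshot that was just created.
    ln -sfn "$name" "$DEST/latest"
}

# Run once a day, e.g. from cron:
#   make_snapshot "$(date +%Y-%m-%d)"
```

Every snapshot directory looks like a full copy, but a file that did not change shares its disk blocks with the previous snapshot through a hardlink. That is also why the number of hardlinks keeps growing when old snapshots are never removed.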
My data is as follows:
- In total about 1 TB uncompressed
- Slowly changing. Most files are never changed. The files that do change are small documents, and they don’t change that often. I am not too worried about losing a week of changes (or even a month).
- Most additions are probably photos (that are stored on a phone as well)
This was working quite ok, but had a few disadvantages:
- As I obviously didn’t take the effort to write a script to purge old backups, the backup disk filled up slowly, and the backup destination became slower and slower (apparently ext4 doesn’t like 1500 hardlinks to hundreds (?) of thousands of files)
- The backup was not encrypted. At that time, I preferred availability over security, as I was more worried about not being able to restore my files.
When looking at alternatives, one major concern that I had with ‘real’ backup software was the non-standardized file structure of the backups. I wanted to be sure that I would still be able to restore my backup at some later point. But after some time I came to the realisation that in practice, if a file structure / backup application changed, you would notice that, and have time to switch to something else (and you could question the value of a 10-year-old backup in a home situation anyway).
In the end I opted to go with restic. A few tests were really successful, and although the structure of the backup is not ‘just plain files’, it was described well enough and was logical enough that in a worst-worst-case scenario I would probably be able to write something myself to restore something.
I have been using restic for almost a year, and I really like it. It is almost perfect. It is fast, it is encrypted, and once data is stored, it does not change anymore. It just works.
My setup is as follows:
- On my old NAS, I have a restic-server running, accepting backups in append-only mode.
- On my new NAS I back up data to the restic-server. The passphrase / repository are stored in the environment of the backup user. If this gets exposed, then you’re only able to 1) read the backup (but you have access to the files anyway already), 2) add new (junk?) data to the backup. But you’re not able to delete things from the backup.
- On my old NAS, I am able to forget/prune old backups. I use a default removal policy (keep daily for a week, weekly for a month, monthly for a year, etc). On that NAS, I don’t store the passphrase. I use `RESTIC_PASSWORD_COMMAND` to fetch it from my new NAS.
- On my old NAS, I use rclone to sync the restic repository to Backblaze B2
- The Backblaze B2 repository has file versioning enabled
- Once every x months (yeah, yeah, I know, I did this once), I copy the repository from my old NAS to an external drive, and keep it offsite (just plain ext4 - as the repository is encrypted, no need for LUKS)
- My old NAS is shut down except during a backup
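The recurring jobs in this setup could be sketched roughly as follows. This assumes the restic-server is restic's `rest-server` started with its `--append-only` flag. The host names, paths, the `b2remote` rclone remote, the ssh command used for `RESTIC_PASSWORD_COMMAND`, and the function names are all hypothetical. The `--keep-*` values are one possible reading of "daily for a week, weekly for a month, monthly for a year".

```shell
#!/bin/sh
# Sketch of the recurring jobs. Host names, paths and the B2 remote are
# hypothetical placeholders, not the real configuration.
set -eu

REPO_DIR="${REPO_DIR:-/srv/restic-repo}"               # repository on the old NAS
REPO_URL="${REPO_URL:-rest:http://oldnas:8000/}"       # same repository via rest-server
B2_REMOTE="${B2_REMOTE:-b2remote:my-bucket/restic}"    # rclone remote:bucket/path

# Old NAS, long-running (authentication options omitted):
#   rest-server --path "$REPO_DIR" --append-only

# New NAS: the passphrase is expected in the backup user's environment
# (e.g. RESTIC_PASSWORD). Through the append-only server, snapshots can
# be added but not deleted.
backup_from_new_nas() {
    restic -r "$REPO_URL" backup "$@"
}

# Old NAS: operates on the local directory, bypassing the append-only
# server. The passphrase is fetched on demand instead of stored here.
prune_on_old_nas() {
    RESTIC_PASSWORD_COMMAND="ssh backup@newnas cat .restic-pass" \
        restic -r "$REPO_DIR" forget --prune \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 12
}

# Old NAS: mirror the (already encrypted) repository to Backblaze B2.
sync_offsite() {
    rclone sync "$REPO_DIR" "$B2_REMOTE"
}
```

The split matters: the machine that holds the live data can only append, while the machine that can delete snapshots never stores the passphrase on disk.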
With this approach I think I am relatively safe:
- When I as a human make a mistake with my files, I can just restore them from the old NAS
- When my house burns down, I can use an external drive and/or Backblaze B2 to restore. I tested that you can restore directly from Backblaze through rclone, but by using the external drive as a base, I would only need to download the delta for a full recovery.
- When a cryptolocker encrypts my NAS, it can’t delete the backup. Even if it manages to do that, my Backblaze backups are versioned for months, so I can go back to a previous moment in time and restore from there.
- When the external drive is lost, you need the passphrase to see the data
- When you have access to Backblaze, you need the passphrase to see the data
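The two restore paths for the house-burns-down case could look roughly like this. It is a sketch under assumptions: the `b2remote` remote, the mount point of the external drive and the function names are hypothetical. The restic passphrase is expected in the environment.

```shell
#!/bin/sh
# Sketch of the two disaster-recovery paths. Remote name and paths are
# hypothetical placeholders.
set -eu

B2_REMOTE="${B2_REMOTE:-b2remote:my-bucket/restic}"             # rclone remote:bucket/path
EXTERNAL_REPO="${EXTERNAL_REPO:-/mnt/external/restic-repo}"     # copy on the external drive

# Path 1: restore straight from Backblaze B2 via restic's rclone backend.
restore_from_b2() {
    restic -r "rclone:$B2_REMOTE" restore latest --target "$1"
}

# Path 2: bring the external copy up to date first, then restore locally.
# rclone sync only transfers files that are missing or different, so only
# the delta since the last offsite copy is downloaded. It also deletes
# files on the external copy that no longer exist on B2.
restore_via_external_drive() {
    rclone sync "$B2_REMOTE" "$EXTERNAL_REPO"
    restic -r "$EXTERNAL_REPO" restore latest --target "$1"
}
```

For example, `restore_via_external_drive /mnt/newdisk` would first top up the external repository from B2 and then restore the latest snapshot into `/mnt/newdisk`.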