
  • I strongly recommend ZFS as a filesystem for this, as it can handle your sync, backup, and quota needs very well. It also has data integrity guarantees that should frankly be table stakes in this application. TrueNAS is an easy way to accomplish this, and it can run Docker containers and VMs if you like.

    Tailscale is a great way to connect them all, and to connect to your NAS when you aren’t home. You can share devices between tailnets, so you don’t all have to be on the same Tailscale account.

    I’ll caution against Nextcloud: it has a zillion features, but in my experience it isn’t actually that good at syncing files. It’s complicated to set up, complicated to maintain, and there are frequent bugs. Consider just using SMB file sharing (built into TrueNAS), or an application that only syncs files without trying to be an entire office suite as well.

    For your drive layouts, I’d go with big drives in a mirror. This keeps your power and physical space requirements low. If you want, ZFS can also transparently put metadata and small files on SSDs for better latency and less drive thrashing. (These should also be mirrored.) Do not add an L2ARC drive; it is rarely helpful.
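
    For reference, a rough sketch of that layout on the command line (untested; pool and device names are placeholders, and TrueNAS can do the same thing through its pool creation UI):

    ```sh
    # Mirrored pair of big HDDs (use stable /dev/disk/by-id paths for real pools)
    zpool create tank mirror /dev/disk/by-id/ata-HDD_A /dev/disk/by-id/ata-HDD_B

    # Mirrored SSD "special" vdev for metadata
    zpool add tank special mirror /dev/disk/by-id/nvme-SSD_A /dev/disk/by-id/nvme-SSD_B

    # Also send small files (here, 64K and under) to the special vdev
    zfs set special_small_blocks=64K tank
    ```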

    The boxes are kinda up to you. Avoid USB enclosures if at all possible. TrueNAS can be installed on most prebuilt NAS boxes other than Synology, presuming the hardware meets its requirements. You can also build your own. Hot swap is nice, and a must-have if you need normies to work on it. Label each drive’s serial number on the outside so you can tell the drives apart. Don’t go for less than 4 bays, and more is better even if you don’t need them yet. You want as much RAM as feasibly possible; ZFS uses it for caching, and it gives you room to run containers and VMs.










  • Where I live (not the US) I’m seeing closer to $240 per TB for M-disc. My whole archive is just a bit over 2 TB, though I’m also including exported JPEGs in case I can’t get a working copy of darktable that can render my edits. It’s set to save XMP sidecars on edit, so I don’t bother backing up the database.

    I mostly wanted a tool to divide up the images into disk-sized chunks, and to automatically track changes to existing files, such as sidecar edits or new photos. I’m now seeing I can do both of those and still get files directly on the disk, so that’s what I’ll be doing.

    I’d be careful with using SSDs for long-term, offline storage. I hear they lose data if not powered for a long time. IMO metadata is small enough to just save a new copy when it changes.


  • I’ve been thinking through how I’d write this. With so many files it’s probably worth using SQLite, and then I can match them up by joining on the hash. Deletions and new files can be found with different join conditions. I found a tool called ‘hashdeep’ that can checksum everything, though for incremental runs I’ll probably skip hashing if the size, times, and filename haven’t changed. I’m thinking Nushell for the plumbing? It runs everywhere, though it has frequent breaking changes. Maybe Rust?
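
    Something like this is the shape I have in mind (untested sketch; the prev/curr tables and their columns are made up, assumed to be loaded from two hashdeep runs):

    ```sh
    # New (or renamed) files: present in the current snapshot, hash never seen before
    sqlite3 archive.db \
      "SELECT c.path
       FROM curr AS c LEFT JOIN prev AS p ON p.hash = c.hash
       WHERE p.hash IS NULL;"

    # Deleted (or renamed) files: hash seen before, gone now
    sqlite3 archive.db \
      "SELECT p.path
       FROM prev AS p LEFT JOIN curr AS c ON c.hash = p.hash
       WHERE c.hash IS NULL;"
    ```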

    ZFS checksums are done at the block level, and after compression and encryption. I don’t think they’re meant for this purpose.





  • Ohhh boy, after so many people suggested I just put plain files directly on the disks, I went back and rethought some things. I think I’m landing on a solution that does everything and doesn’t require me to manually manage all these files:

    • fd (and any number of other programs) can produce lists of files that have been modified since a given date.
    • fpart can produce lists of files that add up to a given size.
    • xorrisofs can accept lists of files to add to an ISO.

    So if I fd a list of new files (or don’t for the first backup), pipe them into fpart to chunk them up, and then pass these lists into xorrisofs to create ISOs, I’ve solved almost every problem.
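
    Roughly, as an untested sketch (the dates, sizes, and names are placeholders; run from the root of the photo archive):

    ```sh
    SINCE="2024-01-01"   # date of the last backup; drop the filter entirely for the first run
    CHUNK=23000000000    # bytes per chunk; leaves headroom on a 25 GB disc

    # 1. List files changed since the last backup (fd skips hidden files by default)
    fd --type f --changed-within "$SINCE" . > changed.txt

    # 2. Split the list into disc-sized chunks named parts.N
    fpart -s "$CHUNK" -i changed.txt -o parts

    # 3. One ISO per chunk; graft points (iso_path=disk_path) preserve the folder layout
    #    (paths containing '=' or '\' would need escaping)
    for list in parts.*; do
        n=${list#parts.}
        sed 's|.*|&=&|' "$list" > "graft.$n"
        xorrisofs -r -J -V "photos_$n" -graft-points \
                  -path-list "graft.$n" -o "photos_$n.iso"
    done
    ```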

    • The disks have plain files and folders on them, no special software is needed to read them. My wife could connect a drive, pop the disk in, and the photos would be right there organized by folder.
    • Incremental updates can be accomplished by keeping track of whenever the last backup was.
    • The fpart lists are also a greppable index; I can use them to find particular files easily.
    • Corruption only affects that particular file, not the whole archive.
    • A full restore can be accomplished with rsync or other basic tools.

    Downsides:

    • Change detection is naive. Just mtime. Good enough?
    • Renames will still produce new copies. Solution: don’t rename files. They’re organized well enough, stop messing with it.
    • Deletions will be disregarded. I could solve this with some sort of indexing scheme, but I don’t think I care enough to bother.
    • There isn’t much rhyme or reason to how fpart splits up files. The first backup will be a bit chaotic. I don’t think I really care.
    • If I rsync -a some files into the dataset and they have mtimes older than the last backup, they won’t get slurped up in the next one. This can be solved by checking that all files appear in the existing fpart indices, or by just not doing that.

    Honestly those downsides look quite tolerable given the benefits. Is there some software that will produce and track a checksum database?
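
    (The hashdeep tool I found earlier might already be that; a rough, untested sketch of its audit mode:)

    ```sh
    # Build the checksum database (recursive, relative paths)
    hashdeep -r -l photos/ > checksums.txt

    # Later, audit the tree against it: -a enables audit mode, -k loads the known hashes
    # (add -v/-vv for per-file detail)
    hashdeep -r -l -a -k checksums.txt photos/
    ```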

    Off to do some testing to make sure these things work like I think they do!