Backups: Am I doing this right?

RadDevon@lemmy.zip · 2 months ago

Backups: Am I doing this right?

thelittleblackbird@lemmy.world · 2 months ago

Some clarifications :

The 3 2 1 rule applies only for the data. Not the backup, in my case I have the real/live data, then a daily snapshot in the same volume /pool and a external off-site backup

For the databases you got misleading information, you can copy the files as they are BUT you need to be sure that the database is not running (you could copy the data and n the middle of a transaction leading to some future problems) AND when you restore it, you need to restore to the exact same database version.

Using the export functionality you ensure that the data is not corrupted (the database ensure the correctness of the data) and the possibility to restore to another database version.

My suggestion, use borgbackup or any other backup system with de duplication, stop the docker to ensure no corruptions and save everything. Having a downtime of a minute every day is usually not a deal breaker for home users

RadDevon@lemmy.zip · 2 months ago

Much easier than what I was trying to do. Thank you!

SavvyWolf@pawb.social · 2 months ago

In regards to full system backups, there’s no real need to back up the OS itself. Canonical will give you a clean Ubuntu install if you ask then nice enough, after all. Personally, the risk of having to spend an afternoon reconfiguring my system isn’t that big a deal compared to the storage and time needed to back up an entire image.

I know systems generate a lot of “cruft” in terms of instslled programs and tweaked configurations over time which can be hard to keep track of and remember. But imo that should be avoided at all costs because it leads to compatibility and security issues.

For backing up databases, there’s scripts like automysqlbackup and pg_dump which will export a database to an sql file which can be easily backed up without worrying about copying a broken file.

I actually recently set up borgmatic earlier today and I’d recommend it except for the fact that you seem to be using Docker, and I’m not sure how best to backup containers.

Kng@feddit.rocks · 2 months ago

I usually also backup the etc directory so if I had an issue I would at least have the config files from the old setup. This has already saved me a few times when I have really messed up configuration files.

sugar_in_your_tea@sh.itjust.works · 2 months ago

Yeah, I keep everything as simple as possible. Everything is containerized, and all the configs live in one directory and they store their data on my RAID. I don’t need to go track down configs across the system, and adding a new service doesn’t require any backup config so no risk of forgetting something.

Docker is simple. You map directories in the container to directories on your host, so you put the important data where it’ll get backed up and the less important data (e.g. logs) where it won’t.

Nibodhika@lemmy.world · 2 months ago

I figure the most bang for my buck right now is to set up off-site backups to a cloud provider.

Check out Borgbase, it’s very cheap and it’s an actual backup solution, so it offers some features you won’t get from Google drive or whatever you were considering using e.g. deduplication, recover data at different points in time and have the data be encrypted so there’s no way for them to access it.

I first decided to do a full-system backup in the hopes I could just restore it and immediately be up and running again. I’ve seen a lot of comments saying this is the wrong approach, although I haven’t seen anyone outline exactly why.

The vast majority of your system is the same as it would be if you install fresh, so you’re wasting backup space in storing data you can easily recover in other ways. You would only need to store changes you made to the system, e.g. which packages are installed (just get the list of packages then run an install on them, no need to backup the binaries) and which config changes you made. Plus if you’re using docker for services (which you really should) the services too are very easy to recover. So if you backup the compose file and config folders for those services (and obviously the data itself) you can get back in almost no time. Also even if you do a full system backup you would need to chroot into that system to install a bootloader, so it’s not as straightforward as you think (unless your backup is a dd of the disk, which is a bad idea for many other reasons).

I then decided I would instead cherry-pick my backup locations instead. Then I started reading about backing up databases, and it seems you can’t just back up the data directory (or file in the case of SQLite) and call it good. You need to dump them first and backup the dumps.

Yes and no. You can backup the file completely, but it’s not a good practice. The reason is that if the file gets corrupted you will lose all data, whereas if you dumped the database contents and backed that up is much less likely to corrupt. But in actuality there’s no reason why backing up the files themselves shouldn’t work (in fact when you launch a docker container it’s always an entirely new database pointed to the same data folder)

So, now I’m configuring a docker-db-backup container to back each one of them up, finding database containers and SQLite databases and configuring a backup job for each one. Then, I hope to drop all of those dumps into a single location and back that up to the cloud. This means that, if I need to rebuild, I’ll have to restore the containers’ volumes, restore the backups, bring up new containers, and then restore each container’s backup into the new database. It’s pretty far from my initial hope of being able to restore all the files and start using the newly restored system.

Am I going down the wrong path here, or is this just the best way to do it?

That seems like the safest approach. If you’re concerned about it being too much work I recommend you write a script to automate the process, or even better an Ansible playbook.

RadDevon@lemmy.zip · 2 months ago

Check out Borgbase, it’s very cheap and it’s an actual backup solution, so it offers some features you won’t get from Google drive or whatever you were considering using e.g. deduplication, recover data at different points in time and have the data be encrypted so there’s no way for them to access it.

I looked at Borgbase, but I think it will be a bit more pricey than Restic + Backblaze B2. Looks like Borgbase is $80/year for 1TB, which would be $72/year on B2 and less if I don’t use all of 1TB.

The vast majority of your system is the same as it would be if you install fresh, so you’re wasting backup space in storing data you can easily recover in other ways.

I get this, but it would be faster to restore, right? And the storage I’m going to use to store these files is relatively little compared to the overall volume of data I’m backing up. For example, I’m backing up 100GB of personal photos and home movies. Backing up the system, even though strictly not necessary, will be something like 5% of this, I think, and I’d lean toward paying another few cents every month for a faster restore.

Thanks for your thoughts on the database backups. It’s a helpful perspective!

Nibodhika@lemmy.world · 2 months ago

If all you care is money, then it’s even less on hertzner at 48/year. But the reason I recommended Borgbase is because it’s a bit more known and more trustworthy. $8 a year is a very small difference, sure it will be more than that because, like you said, you won’t use the full TB on B2, but still I don’t think it’ll get that different. However there are some advantages to using a Borg based solution:

Borg can do backup to multiple places at once, so you can have the same thing do a backup to the cloud and to some secondary disk
Borg is an open source tool, so you can run your own Borg server, which means you can have backups sent to your desktop
Again, because Borg is open you can run a raspberry pi with a 1TB usb disk for backup, and that would be cheaper than any solution
Or you could even pair with a friend hosting their backup on your server and he doing the same for you.

And the most important part, migrating from one to the other is simple, just changing config, so you can start with Borgbase, and in a year buy a minicomputer to leave on your parents house and having all of the config changes needed in seconds. Whereas migrating away from B2 will involve a secondary tool. Personally I think that this flexibility is worth way more than those $8/year.

Also Borg has deduplication, versioning and cryptography, I think B2 has all of that but I’m not entirely sure, because it’s my understanding that they duplicate the entire file when some changes happen so you might end up paying lots more for it.

As for the full system backup I still think it’s not worth it, how do you plan on restoring it? You would probably have to plug a liveusb and perform the steps there, which would involve formating your disks properly, connect to the remote server and get your data, chroot into it and install a bootloader. It just seems easier to install the OS and run a script, even if you could shave off 5 minutes if everything worked correctly in the other way and you were very fast doing stuff.

Also your system is constantly changing files, which means more opportunities for files to get corrupted (a similar reason why backing up the folder of a database is a worse idea than backing um a dump of it), and some files are infinite, e.g. /dev/zero or /dev/urandom, so you would need to be VERY careful around what to backup.

At the end of the day I don’t think it’s worth it, how long do you think it takes you to install Linux on a machine? Because I would guess around 20 min, restoring your 1TB backup will certainly take much longer than that (probably a couple of hours) and if you have the system up you can get critical stuff that doesn’t require the full backup early. Another reason why Borg is a good idea, you can have a small critical stuff backup to restore in seconds, and another repository for the stuff that takes longer. So Immich might take a while to come back, but authentik and Caddy can be up in seconds. Again, I’m sure B2 can also do this, but probably not as intuitively.

bluGill@fedia.io · 2 months ago

Just remember any backup is better than nothing. Even if the backup is done wrong (this includes untested!) odds are you can read it and extract at least some data, it just may take a lot of time. Backups that are done right just mean that when (not if!) your computers break you are quickly back up and running.

There are several reasons to backup data only and not the full system. First you may be unable to find a computer exactly/enough like the one that broke, and so the old system backup won’t even run. Second, even if you can find an identical enough system, do you want to, or maybe it is time to upgrade anyway - there are pros and cons of arm (raspberry pi) vs x86 servers (there are other obscure options you might want but those are the main ones), and you may want to switch anyway since you have. Third, odds are some of the services need to be upgraded and so you may as well use this forced computer time to apply the upgrade. Last, you may change how many servers you have, should you split services to different computers, or maybe consolidate the services on the system that died to some other server you already have.

The only advantage of a full system backup is when they work they are the fastest way to get going again.

RadDevon@lemmy.zip · 2 months ago

Just remember any backup is better than nothing.

This is comforting.

There are several reasons to backup data only and not the full system. First you may be unable to find a computer exactly/enough like the one that broke, and so the old system backup won’t even run. Second, even if you can find an identical enough system, do you want to, or maybe it is time to upgrade anyway - there are pros and cons of arm (raspberry pi) vs x86 servers (there are other obscure options you might want but those are the main ones), and you may want to switch anyway since you have. Third, odds are some of the services need to be upgraded and so you may as well use this forced computer time to apply the upgrade. Last, you may change how many servers you have, should you split services to different computers, or maybe consolidate the services on the system that died to some other server you already have.

Some good things to consider here. Whether or not I’ll want to upgrade will depend on how far this theoretical failure is. If storage fails, I might just replace that and restore the backup. If it’s something more significant than that and we’re 2-3 years down the line, I’ll probably look at an upgrade. If it’s less than that, I might just replace with the same to keep things simple.

I guess one other upside of the full system backup is that I could restore just the data out of it if I decide to upgrade when some hardware fails, but I don’t have the reverse flexibility (to do a full system restore) if I opt for a data-only backup.

MangoPenguin@lemmy.blahaj.zone · edit-2 2 months ago

I first decided to do a full-system backup in the hopes I could just restore it and immediately be up and running again. I’ve seen a lot of comments saying this is the wrong approach, although I haven’t seen anyone outline exactly why.

The main downside is the size of the backup, since you’re backing up the entire OS with cache files, log files, other junk, and so on. Otherwise it’s fine.

Then I started reading about backing up databases, and it seems you can’t just back up the data directory (or file in the case of SQLite) and call it good. You need to dump them first and backup the dumps.

You can back up the data directory, that works fine for selfhosted stuff generally because we don’t have tons of users writing to the database constantly.

If you back up /var/lib/docker/volumes, your docker-compose.yaml files for each service, and any other bind mount directories you use in the compose files, then restoring is as easy as pulling all the data back to the new system and running docker compose up -d on each service.

I highly recommend Backrest which uses Restic for backups, very easy to configure and supports Healthchecks integration for easy notifications if backups fail for some reason.

Shimitar@downonthestreet.eu · 2 months ago

Second rest and backrest!

RadDevon@lemmy.zip · 2 months ago

If that’s the main downside to a full-system backup, I might go ahead and try it. I’ll check out Backrest too. Looks great!

MangoPenguin@lemmy.blahaj.zone · edit-2 2 months ago

Yeah there are plenty of advantages of a full system backup, like not having to worry that you’re backing up all the specific directories needed, and super easy restores since the whole bootable system is saved.

Personally I do both, I have a full system backup to local storage using Proxmox Backup Server, and then to Backblaze B2 using Restic I backup only the really important stuff.

marcos@lemmy.world · 2 months ago

I figure the most bang for my buck right now is to set up off-site backups to a cloud provider.

If you don’t have the budget for on-premises backup, you almost certainly can’t afford to restore the cloud backup if anything goes wrong.

Then I started reading about backing up databases

Go read the instructions for your database in particular. They are completely different from each other. Ignore generic instructions.

now I’m configuring a docker-db-backup container

What is perfectly fine. But I’d first look how this interferes with the budget you talked about earlier and if it wouldn’t be better to keep things simpler and put the money on data replication.

Either way, if your budget is low, I’d focus a lot on making sure you have the data when you need to restore, and less on streamlining the restore procedure. (That seems to be the direction you are going, so yeah, I’d say it’s good.) Just make sure to test the restore procedure once in a while.

RadDevon@lemmy.zip · 2 months ago

If you don’t have the budget for on-premises backup, you almost certainly can’t afford to restore the cloud backup if anything goes wrong.

I believe egress is free on Backblaze B2.

Just make sure to test the restore procedure once in a while.

Good call on this. Curious if you have a procedure for actually doing this. I could just wipe out my system and rebuild it from the backup, but then I’m in trouble if it fails. What does a proper test of a backup actually look like?

marcos@lemmy.world · 2 months ago

You test your backup by recreating your system, either in a local environment or in some cheap simulated one.

It’s even better if you write a manual with the steps you needed. And try to follow (and update it) when you do it again.

bigDottee@geekroom.tech · 1 month ago

You are looking for a disaster recovery plan. I believe you are going down the right path, but it’s something that will take time.

I backup important files to my local NAS or directly store them on the local NAS.

This NAS then backs up to an off site cloud backup provider BackBlaze B2 storage.

Finally, I have a virtual machine that has all the same directories mounted and backs up to a different cloud provider.

It’s not quite 3-2-1… but it works.

I only backup important files. I do not do full system backups for my windows clients. I do technically backup full Linux vms from within Proxmox to my NAS…but that’s because I’m lazy and didn’t write a backup script to back up specific files and such. The idea of being able to pull a full system image quickly from a cloud provider will bite you in the ass.

In theory, when backing up containers, you want to backup the configurations, data, and the databases… but you shouldn’t worry about backing up the container image. That can usually be pulled when necessary. I don’t store any of my docker container data in volumes… I use the folder mapping from host to directory in docker container… so I can just backup directories on the host instead of trying to figure out the best way to backup a randomly named docker volume. This way I know what I’m backing up for sure.

Any questions, just ask!

unlogic@lemmy.zip · edit-2 10 days ago

Removed by mod

just_another_person@lemmy.world · edit-2 2 months ago

Some things you should determine first:

Total amount of data you will be backing up
Frequency of backups
Number of copies to keep

Plug these numbers into cost calculators for whatever cloud service you’re hoping to use, because this is honestly not going to be the cheapest route to store off-site if there are ingress charges like with S3.

I know Cloudflare’s R2 service doesn’t charge for ingress or egress (for now), but you might be able to find something even cheaper if you’re only backing up certain types of data that can be easily compressed.

I’d also investigate cheap ways to maybe just store an off-site drive with your data: office/work, family house, friends house…etc. Storage devices are way cheaper than monthly cloud costs.

RadDevon@lemmy.zip · 2 months ago

Had considered a device with some storage at a family member’s house, but then I’d have to maintain that, fix it if it goes down, replace it if it breaks, etc. I think I’d prefer a small monthly fee for now, even if it may work out more expensive in the long run.

Good call on the cost calculation. I’ll take another look at those factors…

Atemu@lemmy.ml · 2 months ago

There’s also the option of just leaving an offline disk at someone’s and visiting them regularly to update the backup.

Having an entirely offline copy also protects you/mitigates against a few additional hazards.

Appoxo@lemmy.dbzer0.com · edit-2 2 months ago

I’m on a shoestring budget at the moment, so I won’t really be able to implement a 3-2-1 strategy just yet

Having any sort of functional (best case tested) backup is more than a majority has ever achieved on a computer.

Personally I am using Veeam with a (free) NFR license.
Though the community edition is plenty for most situations.

As for what I backup: It depends.
My main file storage is backed on a volume basis
My gaming/pc is also volume based but I have another job that backs my game ROMs
My linux server is backed fully as to not loose anything on it
My LDAP server as well as my financial VM is fully bavked up because it’s so small.

Scrubbles@poptalk.scrubbles.tech · edit-2 2 months ago

If you’re using docker (like your DBs run in docker), then I think you’re overthinking it personally. Just back up the volume that the container uses, then you can just plop it back and it will carry on carefree.

I usually did a simple tar cvf /path/to/compressed.tar.gz /my/docker/volume for each of my volumes, then backed up the tar. Kept symlinks and everything nice and happy. If you do that for each of your volumes, and you also have your config for running your containers like a docker-compose, congrats that’s all you need.

I don’t know who said you can’t just back up the volume, to me that’s kind of the point of docker. It’s extreme portability.

RadDevon@lemmy.zip · 2 months ago

OK, cool. That’s helpful. Thank you!

I know in general you can just grab a docker volume and then point at it with a new container later, but I was under the impression that backing up a database in particular in this way could leave you with a database in a bad state after restoring. Fingers crossed that was just bad info. 😅

supersheep@lemmy.world · 2 months ago

In theory the database can end up in an invalid state when you leave the database container running. What I do for most containers is to temporarily stop them, backup the Docker volume and then restart the container.

Scrubbles@poptalk.scrubbles.tech · edit-2 2 months ago

Seconded, and great callout @RadDevon@lemmy.zip , yes part of my script was to stop the container gracefully, tar it, start it again, and then copy the tar somewhere. it “should” be fine, in a production environment where you could have zero downtime I would take a different approach, but we’re selfhosters. Just schedule it for 2am or something.

Oh, and feel free to test! Docker makes it super easy. Just extract the tar somewhere else on the drive, point your container to the new volume, see if it spins up. Then you’ll know your backup strategy is working!

RadDevon@lemmy.zip · 2 months ago

Is your script something you can share? I’d love to see your approach. I can definitely live with a few minutes of down time in the early morning.

Scrubbles@poptalk.scrubbles.tech · 2 months ago

That particular one is long gone I’m afraid, but it’s essentially just docker compose down, tar like I did above, docker compose up -d, and then I used rclone to upload it

RadDevon@lemmy.zip · 2 months ago

Much simpler than my solution. I’ll look into this. Thank you!

just_another_person@lemmy.world · 2 months ago

deleted by creator