Wednesday, January 7, 2015

Using Amazon Simple Storage Service (S3) as a cheap, reliable DIY backup solution



Background:


A non-profit organization's machine had a MySQL failure, and part of the database got corrupted. Unfortunately, we did not have a recent backup, so we restored the last available one.

However, this was a wake-up call to implement a regular backup plan for the organization's data:
A. MySQL dumps - approximately 500 MB
B. Regular disk files - approximately 50+ GB

Backup frequency could be a full backup once per week and an incremental backup once per day.


Investigating available options:


Implementing a regular scp or rsync over the internet to a home machine was not feasible, as the home machines are not constantly connected. I checked out a few online backup service providers like SpiderOak, Backblaze and Carbonite. However, for reasons of cost, lack of OS support, or ease of use, none of these worked out.

On various forums, I saw mention of Amazon S3 as a highly reliable storage option. 
However, there was no pre-packaged tool for doing backups to S3, so I decided to roll my own using the Amazon SDK. Amazon's monthly charges are roughly 1 cent per 2,000 files uploaded and about 3 cents per GB of standard storage per month. For 50 GB and about 20K files, that works out to roughly $1.60, so about $2 per month or less. With a homegrown backup solution and Amazon's highly reliable S3 service, we have a very cheap, very reliable cloud backup solution.

Amazon S3 DIY backup solution:


1. Create an Amazon Web Services account. This requires you to provide a credit card number. They do have a free service tier (a tiny bit of storage) which lets you play with the Amazon services at no cost.

2. From the Amazon console's S3 screen, create an S3 bucket (essentially a directory) that will be the root folder for the backups. Within this bucket, create a sub-folder with the same name as the directory you wish to backup from your machine (keeping the same names keeps things simple).
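For reference, once the AWS CLI from step 6 is installed, the same bucket and sub-folder can also be created from the command line; the bucket name below is just a placeholder:

$ aws s3 mb s3://my-backup-bucket --region us-west-2
$ aws s3api put-object --bucket my-backup-bucket --key data/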

3. Create a user account (an IAM user) within your main account with the correct permissions to read from and write to this S3 bucket.
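The permissions can be set through the console's IAM policy editor. For reference, a minimal read/write policy along the lines of the following should be enough; the file name, user name, policy name and bucket name are placeholders, and the put-user-policy command assumes the CLI from step 6 is already installed under an admin profile:

$ cat > backup-user-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-backup-bucket" },
    { "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-backup-bucket/*" }
  ]
}
EOF
$ aws iam put-user-policy --user-name backup-user --policy-name s3-backup-rw --policy-document file://backup-user-policy.json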

4. Generate API credentials for this user for use with the Amazon SDK. These credentials are only displayed once. Download them as a CSV and save them.
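(The console flow above is the simplest option. For reference, if you already have an administrator profile configured for the CLI, an access key pair for the backup user can also be generated with a command like the following; the user name is a placeholder.)

$ aws iam create-access-key --user-name backup-user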

5. Install Python 2.6 on the CentOS machine; it is required by the Amazon SDK. This requires enabling an alternative yum repository (EPEL). The default Python on CentOS 5 is Python 2.4, which is not compatible with the Amazon SDK.

$  sudo yum install epel-release
$  sudo yum update
$  sudo yum install python26
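Afterwards, the two interpreters live side by side: plain 'python' still reports 2.4 and the new one is available as 'python26'.

$ python -V
$ python26 -V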

6. Download and install the AWS CLI. The CLI has many different options for interacting with the various Amazon services. Specifically, I wanted 'aws s3 sync' to sync all files from the machine to S3.

$ curl -O https://bootstrap.pypa.io/get-pip.py
$ sudo python26 get-pip.py
$ sudo pip install awscli
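A quick check that the install worked:

$ aws --version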

6.1. Configure the AWS CLI with the user credentials from step 4.

user@user.org [~]# aws configure
AWS Access Key ID [None]: AAAAAAAAAAAAAAAAAAAAAAA
AWS Secret Access Key [None]: asdfasdfasdfasdfasdfasdfasdfasdfasfasdfasdf
Default region name [None]: us-west-2
Default output format [None]: json
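At this point it is worth a quick check that the new credentials can actually list the backup bucket (bucket name is a placeholder):

$ aws s3 ls s3://my-backup-bucket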



7. Write a few scripts that do the backups to S3

7.1. Write a script to dump the MySQL database into a file and compress it. Run the script nightly from cron and have it delete dumps older than 7 days, so that 7 days' worth of database backups are kept on local disk. I recommend doing this in a directory set aside for this purpose. The essential commands (a cron-ready sketch of the full script follows below):

$ mysqldump -u dbuser --password=dbpassword databasename | gzip > db_dump`date +%F`.sql.gz
$ find . -name db_dump\* -mtime +7 -delete
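Putting it together, a minimal cron-ready sketch, assuming a dedicated directory /home/user/db_backups and placeholder database credentials:

#!/bin/bash
set -e
# dump and compress the database into the backup directory
cd /home/user/db_backups
mysqldump -u dbuser --password=dbpassword databasename | gzip > db_dump`date +%F`.sql.gz
# prune dumps older than 7 days
find . -name db_dump\* -mtime +7 -delete

An example crontab entry (via crontab -e) to run it nightly at 01:30; the script path and log file are placeholders:

30 1 * * * /home/user/bin/mysql_backup.sh >> /home/user/logs/mysql_backup.log 2>&1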


7.2. Do an initial full sync of the top-level directory on the machine to S3. Note that 'aws s3 sync' is recursive by default, so no extra flag is needed. This can take hours to run.

$ aws s3 sync /home/user/data s3://bucket/data --quiet
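Once it finishes, a rough sanity check is to compare the number of objects under the prefix against the local file count, for example:

$ aws s3 ls s3://bucket/data --recursive | wc -l
$ find /home/user/data -type f | wc -l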

7.3. Write a script to do the same sync and put it in cron; this is the incremental backup. It's possible to optimize the sync by first using the find command to locate recently modified files and copying only those (example crontab entries follow the script). That is:

#!/bin/bash
set -e
# presume the current working directory is to be backed up.
# find regular files modified within the last day and copy each one to S3
find . -type f -mtime -1 | while read -r modfile
do
  aws s3 cp "$modfile" "s3://bucket/${modfile#./}" --quiet
done
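To schedule the backups per the plan above (incremental daily, full weekly), crontab entries along these lines should work; the script path and log file are placeholders, and the weekly full sync also picks up anything the daily find-based copy missed:

# daily incremental at 02:30, weekly full sync early Sunday morning
30 2 * * * /home/user/bin/s3_incremental.sh >> /home/user/logs/s3_backup.log 2>&1
30 3 * * 0 aws s3 sync /home/user/data s3://bucket/data --quiet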

