Scale and Performance of Software

Monday, January 12, 2015

Using Docker on Mac OS X

Installation

Docker is the newest iteration of linux containers - self-contained, chroot-like minature VM. Docker is an OSS product, produced by https://www.docker.com/ .

Installing it on Mac OS X requires installing VirtualBox, Docker and boot2docker. I used Home Brew package manager to install (I had VirtualBox installed already)

brew update
brew install docker
brew install boot2docker
boot2docker init
boot2docker up

The up command returns three environment variables DOCKER_CERT_PATH, DOCKER_TLS_VERIFY and DOCKER_HOST that need to be exported to your environment.

After this, docker is installed and ready to use.

Find a list of docker images with the command

docker images

Initially there will be no images. When using docker to create a new image, the actual image is downloaded and installed. After that first install, that image will be available for future use.

Creating first Docker Container

Let's create a CentOS container.

docker run --rm -i -t centos /bin/bash

That's it. All the needed packages will be downloaded and installed
You will be popped into a root prompt of the container - essentially a fully featured CentOS system with its own filesystem, network stack, IP, etc.

--rm means it will destroyed on exit
-i means interactive (because we are running /bin/bash inside the centos image)

Compute to heart's content. Exit with Control-D and the container is destroyed.

Creating a non-transient docker image

docker run -i -t centos /bin/bash

Make changes to this container (create files) and exit

Now, the command "docker ps -a" should show a new container.
You can save the state of this container aas a new image using
docker commit CONTAINERID NAME
after which "docker images" should show the new image listed with NAME.

Dockerfiles

Dockerfiles are bootstrap files that can be associated with an image. It can be used to install new packages, mount filesystems, execute commands, etc. Sample docker files are shared by the OSS community on hub.docker.com

Wednesday, January 7, 2015

Using Amazon Simple Storage Services (S3) as cheap, reliable DIY backup solution

Using Amazon Simple Storage Services (S3) as a cheap, reliable DIY backup solution

Background:

A non-profit organization's machine had a MySQL failure and part of the database got corrupted. Unfortunately, we did not have a recent backup. We restored the last available backup.

However, this was a wake up call to implement a regular backup plan for the organization's data:
A. MySQL dumps - approximately 500Mb
B. Regular disk files - approximately 50+Gb

Backup frequency could be a full backup once per week and an incremental backup once per day.

Investigating available options:

Implementing a regular scp or rsync over the internet to a home machine was not feasible, as the home machines are not constantly connected. I checked out a few online backup service providers like SpiderOak, BackBlaze and Carbonite. However, for reasons of either cost, lack of OS support and ease of use, I could not utilize any of these.

On various forums, I saw mention of Amazon S3 as a highly reliable storage option.
However, there was no pre-packaged tool for doing backups to S3. So I decided to roll my own using the Amazon SDK. It looks like Amazon monthly charges are about 1 cent for 2000 files uploaded and a fraction of a penny for each GB stored. For 50GB and 20K files, this likely comes to about $2 per month or less. With a homegrown backup solution and Amazon's highly reliable S3 service, we have a very cheap, very reliable cloud backup solution.

Amazon S3 DIY backup solution

1. Create an Amazon Web Services account. This requires you to provide a credit card number. They do have a free service tier (a tiny bit of storage) which provides you the ability to play with the Amazon services for no cost.

2. From the Amazon console's S3 screen, create an S3 bucket (essentially a directory) that will be the root folder for the backups. Within this bucket, create a sub-folder with the same name as the directory you wish to backup from your machine (keeping the same names keeps things simple).

3. Create a user account within your main account with correct permissions to read/write to this S3 bucket.

4. Generate API credentials for this user for use with the Amazon SDK. These credentials are only displayed once. Download them as a CSV and save them.

5. Install Python-2.6 on the CentOS machine which is required by the Amazon SDK. This requires enabling some alternative yum repositories. The default python on CentOS 5 is Python-2.4 which is not compatible with the Amazon SDK.

$ sudo yum install epel-release
$ sudo yum update
$ sudo yum install python26

6. Download and install the Amazon CLI. This CLI has many different options for interacting with the various Amazon services. Specifically, I wanted to use 'aws s3 sync' for using it to sync all files from the machine to S3.

$ sudo python26 get-pip.py
$ sudo pip install awscli

6.1. Configure the AWS CLI with the user credentials

user@user.org [~]# aws configure
AWS Access Key ID [None]: AAAAAAAAAAAAAAAAAAAAAAA
AWS Secret Access Key [None]: asdfasdfasdfasdfasdfasdfasdfasdfasfasdfasdf
Default region name [None]: us-west-2
Default output format [None]: json

7. Write a few scripts that do the backups to S3

7.1. Script to dump the MySQL database into a file and compress it. Put the script in cron to make nightly backups and delete backups that are older than 7 days. This way we maintain 7 days worth of database backups on local disk. Recommend doing this in a directory set aside for this purpose.

$ mysqldump -u dbuser --password=dbpassword databasename > db_dump`date +%F`.sql
$ find . -name db_dump\* -mtime 7 -delete

7.2. Do an initial full sync of top level directory on the machine to S3. This can take hours to run.

$ aws s3 sync /home/user/data s3://bucket/data --recursive --quiet

7.3 Write a script to do the same sync and put it in cron. This is the incremental backup.
It's possible to optimize the synch to first use the find command to locate recently modified files and do a copy of these files only. That is:

#!/bin/bash
set -e
# presume the current working directory is to be backed up.
modified_files=`find . -mtime 1`
for modfile in $modified_files
do
aws s3 sync $modfile s3://bucket/$modfile --quiet
done

Friday, January 2, 2015

New Year Greetings to all contacts

As 2014 ended, I figured it was a good time to send out a New Year's Greeting to all of my professional contacts. But, with over a thousand contacts on LinkedIn alone, it would have been a several days of work to compose an email individually. It should be possible to download all of the LinkedIn contacts and then script a custom email to every one.

Step 1: Export LinkedIn contacts

From LinkedIn Connections, select "Manage Sources" (accessed via the gear icon on the connections page). At the top right, there is a link to export contacts.


https://www.linkedin.com/contacts/manage_sources/ page

Export the CSV file.

Step 2: Script to send a custom email to every contact

Really any scripting language could be used, but, I chose Python since I am not using it daily. Like any skills, I need to practice to keep the skill sharp. I wrote this quick script which worked on CentOS, presuming that email is properly configured. Should also work on any Linux variant.

#!/usr/bin/env python26
import csv
import smtplib
import time

sender='myemail@domain.com'

message="""From: myemail@domain.com
Return-path: myemail@domain.com
Reply-to: myemail@domain.com
To: {0} {1} <{2}>
Subject: Hi {0}, Happy New Year 2015

Dear {0}, Greetings!

Message you want to send.

Happy New Year 2015.
Signature
"""

# CSV Format: FirstName,LastName,Email
f = open('data.csv', 'r')
try:
    reader = csv.reader(f)
    for row in reader:
      print row[2]
      if row[2] != "": # data.csv may be missing an email address
        msg=message.format(row[0], row[1], row[2])
        server = smtplib.SMTP('localhost')
        server.sendmail(sender, row[2], msg)
        server.quit()
        time.sleep(10) # Meter your messages. SMTP relays block messages if too many are sent quickly.
finally:
    f.close()

For a thousand contacts, this script will run for about 3 hours sending one email every 10 seconds.

If you run this script, I suggest wrapping it in a nohup command.

Step 3: Read replies and respond to them.

Just because you are able to send bulk New Year greeting emails, it does not absolve your responsibility to actually communicate with your connections. If I get a response from my connection, I respond and follow up!