AWS re:Invent 2014 – Day 1

Yesterday was the first day of Amazon’s third annual customer and partner conference in Las Vegas. There are 13,500 attendees from 65 countries here, with an additional 15,000 people tuning in for the keynote live streams.

Today’s keynote from Andy Jassy (Senior Vice President of AWS) included a number of interesting product and service announcements (see more details below) and clearly showed that AWS is interested in moving up the cloud stack to capture more customers and revenue.

Continue reading AWS re:Invent 2014 – Day 1

Taking Bullet Journal Digital with Evernote

Back at the end of 2013, I was indulging in a #productivity binge, trying to figure out how I could make myself more productive in 2014. I stumbled across an analogue organisation system called Bullet Journal and found it intriguing. I read up on it and decided to give it a go, starting 2nd January 2014.

Continue reading Taking Bullet Journal Digital with Evernote

Automatically creating and purging snapshots for additional EBS volumes attached to an instance

If you are using AWS in anger, you will likely be storing data that you need to persist. Unfortunately, persistent data doesn’t fit well with Amazon’s disposable-infrastructure philosophy, so you’ll be using EBS volumes rather than ephemeral storage.

As the data’s important enough to persist, you’ll probably want to make sure it’s backed up too. The easiest way to back up EBS volumes is by snapshotting them. It’s easy to automate the process, so you can rest assured that your data is safe.

Snapshots are particularly useful because you can create new EBS volumes from a snapshot and mount them alongside your live data. They are also Amazon’s suggested mechanism for restoring an EBS volume into another Availability Zone (AZ) if the AZ hosting your volumes becomes unavailable for some reason.
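
If you ever need to exercise that restore path, the same legacy EC2 API tools used in the scripts below can do it from the command line. This is just a sketch; the snapshot ID, volume ID, zone, instance ID and device name are all placeholders:

# Create a new volume from an existing snapshot in the AZ you want to recover into
ec2-create-volume --snapshot snap-1a2b3c4d --availability-zone eu-west-1b
# Attach the resulting volume to a recovery instance
ec2-attach-volume vol-9z8y7x6w -i i-0f1e2d3c -d /dev/sdf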

Create snapshot script

Below is a script I wrote to automatically snapshot all volumes attached to a particular instance. The script runs on the host to which the EBS volumes are attached, so that host needs AWS credentials available for it to work. You could quite easily move the script to a centralised management host and feed in the instance ID or some other parameter, but I didn’t have a requirement to do that.
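
For reference, the .bash_profile sourced at the top of the script is where the environment for the legacy EC2 API tools lives. Something along these lines would do (the paths and keys below are purely illustrative and assume the Java-based ec2-api-tools):

# Illustrative /home/ec2-user/.bash_profile entries (values are placeholders)
export JAVA_HOME=/usr/lib/jvm/jre
export EC2_HOME=/opt/aws/apitools/ec2
export AWS_ACCESS_KEY=AKIAEXAMPLEEXAMPLE
export AWS_SECRET_KEY=ExampleSecretKeyExampleSecretKeyExample1
export PATH=$PATH:$EC2_HOME/bin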

The script relies on a couple of custom Tags we apply to all our EBS volumes, but you could modify the script to work around this. I just couldn’t think of a simple way to de-duplicate all the volume IDs you get in the output from ec2-describe-volumes.
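If you want to use the same Tag-based approach, the volumes need tagging up front. Something like the following would apply the Tags the script looks for (the volume ID and values are made up):

# Apply the custom Tags the script expects (volume ID and values are placeholders)
ec2-create-tags vol-1a2b3c4d --tag Environment=production --tag Role=appserver --tag MountPoint=/data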

#!/bin/bash
 
echo "#"
echo "# Starting create_volume_snapshot.sh on `date +%Y-%m-%d` at `date +%H%M`"
echo "#"
source /home/ec2-user/.bash_profile
export YV_INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id 2> /dev/null`
 
# Create a temp text file to store the information about the volumes
TEMP_VOL_FILE="/tmp/YV-ec2-describe-volumes-`date +%s`.txt"
ec2-describe-volumes -F "attachment.instance-id=${YV_INSTANCE_ID}" > ${TEMP_VOL_FILE}
# Check we received a non-empty response
RESPONSE_CHECK=`wc -l < ${TEMP_VOL_FILE}`
if [[ ${RESPONSE_CHECK} -eq 0 ]]; then
  echo "Call to ec2-describe-volumes resulted in a blank response, cannot continue."
  exit 1
fi
 
# Loop through the output file of ec2-describe-volumes command, searching for 4th field 
# which is Tag Key name (and should match MountPoint)
while read LINE
  do
  MPSEARCH=`echo $LINE | cut -f4 -d' '`
  if [[ ${MPSEARCH} == "MountPoint" ]]
    then
    # For each volume, get all data about the volume and output to a temporary file
    VOLUMEID=`echo $LINE | cut -f3 -d' '`
    ec2-describe-volumes ${VOLUMEID} > /tmp/${VOLUMEID}.txt
    # Get the Role, Environment and MountPoint tag values for this volume
    ROLE=`grep Role /tmp/${VOLUMEID}.txt | awk -F" " '{ print $5 }'`
    ENVIRONMENT=`grep Environment /tmp/${VOLUMEID}.txt | awk -F" " '{ print $5 }'`
    MOUNTPOINT=`grep MountPoint /tmp/${VOLUMEID}.txt | awk -F" " '{ print $5 }'`
    DATESTAMP=`date +%Y-%m-%d-%H%M`
    echo "    VolumeID    : ${VOLUMEID}"
    echo "    Environment : ${ENVIRONMENT}"
    echo "    Role        : ${ROLE}"
    echo "    MountPoint  : ${MOUNTPOINT}"
    echo "    DateStamp   : ${DATESTAMP}"
    # Create a snapshot of the volume, including a description
    ec2-create-snapshot ${VOLUMEID} -d snap_${ENVIRONMENT}_${ROLE}_${MOUNTPOINT}_${YV_INSTANCE_ID}_${DATESTAMP}
    # Clean up volume information temporary file
    rm -f /tmp/${VOLUMEID}.txt
  fi
done < ${TEMP_VOL_FILE}
 
# Clean up temporary text file with all volume information
rm -f ${TEMP_VOL_FILE}
echo "#"
echo "# Finished create_volume_snapshot.sh on `date +%Y-%m-%d` at `date +%H%M`"
echo "#"

Purge snapshot script

As with everything in Amazon, you pay for exactly what you use, so you don’t want to keep all those EBS snapshots indefinitely; over time you’ll start paying a lot to store them. What you need is a script which runs regularly and removes snapshots that are older than a certain age. The script below is again based on some of our custom EBS Tags, but it shouldn’t be too hard to modify for your purposes.

#!/bin/bash
 
echo "#"
echo "# Starting purge_volume_snapshot.sh on `date +%Y-%m-%d` at `date +%H%M`"
echo "#"
source /home/ec2-user/.bash_profile
export YV_INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id 2> /dev/null`
 
# Date variables
DATECHECK=`date +%Y-%m-%d --date '15 days ago'`
DATECHECK_EPOCH=`date --date="$DATECHECK" +%s`
 
# Get all volume info and copy to temp file
TEMP_VOL_FILE="/tmp/YV-ec2-describe-volumes-`date +%s`.txt"
ec2-describe-volumes -F "attachment.instance-id=${YV_INSTANCE_ID}" > ${TEMP_VOL_FILE}
# Check we received a non-empty response
RESPONSE_CHECK=`wc -l < ${TEMP_VOL_FILE}`
if [[ ${RESPONSE_CHECK} -eq 0 ]]; then
  echo "Call to ec2-describe-volumes resulted in a blank response, cannot continue."
  exit 1
fi
 
# Loop through the output file of ec2-describe-volumes command, searching for 4th field which is Tag Key name (and should match MountPoint)
while read LINE
  do
  MPSEARCH=`echo $LINE | cut -f4 -d' '`
  if [[ ${MPSEARCH} == "MountPoint" ]]
    then
    # For each volume, get all data about the volume and output to a temporary file
    VOLUMEID=`echo $LINE | cut -f3 -d' '`
    echo "Volume ID : ${VOLUMEID}"
    ec2-describe-snapshots -F "volume-id=${VOLUMEID}" > /tmp/${VOLUMEID}.txt
    # Loop to remove any snapshots older than 15 days
    while read SNAPLINE
      do
      SNAPSHOT_NAME=`echo $SNAPLINE | grep ${VOLUMEID} | awk '{ print $2 }'`
      # Skip lines (e.g. TAG records) that are not SNAPSHOT entries for this volume
      if [[ -z ${SNAPSHOT_NAME} ]]; then
        continue
      fi
      DATECHECK_OLD=`echo $SNAPLINE | grep ${VOLUMEID} | awk '{ print $5 }' | awk -F "T" '{ printf "%s\n", $1 }'`
      DATECHECK_OLD_EPOCH=`date --date=${DATECHECK_OLD} +%s`
      echo "    Snapshot Name        : ${SNAPSHOT_NAME}"
      echo "    Datecheck -15d       : ${DATECHECK}"
      echo "    Datecheck -15d Epoch : ${DATECHECK_EPOCH}"
      echo "    Snapshot Epoch       : ${DATECHECK_OLD_EPOCH}"
      if [[ ${DATECHECK_OLD_EPOCH} -lt ${DATECHECK_EPOCH} ]]; then
        echo "Deleting snapshot $SNAPSHOT_NAME as it is more than 15 days old..."
        ec2-delete-snapshot $SNAPSHOT_NAME
      else
        echo "Not deleting snapshot $SNAPSHOT_NAME as it is less than 15 days old..."
      fi
    done < /tmp/${VOLUMEID}.txt
    # Clean up volume information temporary file
    rm -f /tmp/${VOLUMEID}.txt
  fi
done < ${TEMP_VOL_FILE}
 
# Clean up temporary text file with all volume information
rm -f ${TEMP_VOL_FILE}
echo "#"
echo "# Finished purge_volume_snapshot.sh on `date +%Y-%m-%d` at `date +%H%M`"
echo "#"

Acknowledgement

I would like to thank Kevin at stardot hosting for his post, on which I based the snapshot and purge scripts above.

I hope somebody finds these scripts useful – they’ve certainly been great for my peace of mind!

Syncing files from Linux to S3

Amazon S3 makes for a great off-site backup of your important files. However, getting files into it in the first place in an automated way can be slightly tricky, particularly if you have a set of files that changes on a regular basis (like an iPhoto Library). My Thecus NAS is meant to be capable of syncing files into S3, but I’ve found the implementation to be limited and buggy, so I wanted to look at alternatives.

Given that Amazon charge for each PUT/GET/COPY/POST/LIST request, I wanted an equivalent of rsync for S3, so that only files that had changed or been deleted would be uploaded to or updated in S3. I came across s3cmd, part of the S3 tools project, which seemed to meet my criteria.

The rest of this post outlines the steps I took to get s3cmd installed and working on my CentOS virtual server, and set up to regularly sync my iPhoto Library into S3. In a future post, I may look at regularly pushing a copy of my iPhoto Library into Glacier for point-in-time longer-term storage.

Installation of s3cmd

Installation was pretty simple. The S3 tools Project provide a number of package repositories for the common Linux distros. All I had to do was:

  • Download the relevant .repo file (CentOS 5 in my case) into the /etc/yum.repos.d directory
  • Ensure that the repository is enabled by setting enabled=1 inside the s3tools.repo file (this was done by default)
  • Run yum install s3cmd as root (see the combined commands below)
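
Put together, the whole installation looked roughly like this (run as root; the repository URL is illustrative, so check the S3 tools download page for the correct one for your distro):

cd /etc/yum.repos.d
wget http://s3tools.org/repo/CentOS_5/s3tools.repo   # URL shown as an example only
yum install s3cmd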

Configuration of s3cmd

Before you can run s3cmd, you’ll need to configure it, so it knows what credentials to use, and whether to use HTTPS, etc. Unfortunately, like many tools of its ilk, it does not seem to support the use of IAM Access Keys and Secret Keys, so you have to provide the master credentials for your AWS account. Not too much of a problem for single-user accounts, but not great if you want to use this in the Enterprise.

To start the configuration, simply run s3cmd --configure and provide the following information:

  • Access Key
  • Secret Key
  • An encryption password (I elected not to provide one)
  • Whether to use HTTPS (I decided to use HTTPS)

You’ll then get a chance to test your settings, and assuming they work OK, you’re good to go! s3cmd will save your configuration to ~/.s3cfg.
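
For reference, the saved configuration is a simple INI-style file. A minimal ~/.s3cfg looks roughly like this (the keys are placeholders):

[default]
access_key = AKIAEXAMPLEEXAMPLE
secret_key = ExampleSecretKeyExampleSecretKeyExample1
use_https = True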

Using s3cmd

Once you’ve got s3cmd installed and configured, you can start playing around with it to get a feel for how it works. It largely imitates other *nix commands, so you can run something like s3cmd ls to show all of your S3 buckets:

2011-01-12 21:26  s3://s3-bucket-1
2012-08-21 13:34  s3://s3-bucket-2

If you don’t already have any buckets, or you want to create a new bucket to sync your files into, s3cmd can help with that too: run s3cmd mb s3://s3-bucket-3 (note the s3:// prefix on the bucket name). Listing your buckets again should now show the new one:

2011-01-12 21:26  s3://s3-bucket-1
2012-08-21 13:34  s3://s3-bucket-2
2012-10-03 21:51  s3://s3-bucket-3

The S3 tools HowTo page gives a good overview of what you can do with s3cmd and the commands to run to get familiar with it.

As my ultimate goal was to be able to upload my iPhoto Library into S3, and then keep it in sync by regularly uploading the deltas, I wanted to make use of the ‘sync’ functionality of s3cmd. The command I have decided to use is as follows:

s3cmd sync --dry-run  --recursive --delete-removed --human-readable-sizes \
--progress /mnt/smbserver/Pictures/iPhoto\ Library/* \
s3://s3-bucket-2/Pictures/iPhoto\ Library/ > /var/log/s3cmd-sync.log 2>&1

Note the use of the --dry-run option on the first run, so I get an idea of what will change. I can then run it again without that option for the changes to actually take effect. In all likelihood, I will set up a cron job to run this without the --dry-run option on a regular basis (otherwise I’ll forget, or simply be too busy to do it manually!).
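
When I do get round to it, the crontab entry will probably look something like this (the schedule, s3cmd path and log location are just an example):

# Sync the iPhoto Library to S3 every Sunday at 03:00
0 3 * * 0 /usr/bin/s3cmd sync --recursive --delete-removed /mnt/smbserver/Pictures/iPhoto\ Library/* s3://s3-bucket-2/Pictures/iPhoto\ Library/ >> /var/log/s3cmd-sync.log 2>&1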

It’s also worth noting that this isn’t necessarily a quick process. My iPhoto Library isn’t small (it currently weighs in at 56GB): the s3cmd sync dry run took about 15 minutes, and the actual sync then took over 90 hours(!), although your mileage will vary depending on the total volume of data and your upload speed. A subsequent run of the same sync command took 30 minutes and transferred no data (as there were no local changes).

So far, my new solution is looking good! Hope this is helpful to someone else out there.