AWS re:Invent 2014 – Day 2

The second day of the AWS re:Invent conference has drawn to a close, and I sit here with ears still ringing from the re:Play party, where Skrillex was headlining. AWS certainly know how to throw a party: this one included a 20ft Tetris game, an entire arcade of retro games, a quadcopter obstacle course, and luminous dodgeball.

Back to the keynote. A large portion of today’s was given over to large AWS customers talking about their positive experiences of using the “AWS platform”. This is an important phrase: it has been used far more this year than in previous years, and it reinforces the message Amazon are trying to push, namely that AWS is much more than just IaaS. As Amazon continue to release products and services that move up the stack, this message will become increasingly important. The other key message this year is containers: they have appeared from nowhere in the past 18 months and are quickly gaining importance. Expect to see a lot more of them in the coming years.

AWS re:Invent 2014 – Day 1

Yesterday was the first day of Amazon’s third annual customer and partner conference in Las Vegas. There are 13,500 attendees from 65 countries here, with an additional 15,000 people tuning in for the keynote live streams.

Today’s keynote from Andy Jassy (Senior Vice President of AWS) included a number of interesting product and service announcements and clearly showed that AWS wants to move up the cloud stack to capture more customers and revenue.

Syncing files from Linux to S3

Amazon S3 makes for a great off-site backup of your important files. However, getting files into it in an automated way can be slightly tricky, particularly if you have a set of files that changes regularly (like an iPhoto Library). My Thecus NAS is supposed to be able to sync files into S3, but I’ve found its implementation limited and buggy, so I wanted to look at alternatives.

Given that Amazon charge for each PUT/GET/COPY/POST/LIST request, I wanted an equivalent of rsync for S3, so that only files that had been added, changed or deleted locally would be uploaded to, updated in or removed from S3. I came across s3cmd, part of the S3 tools Project, which seemed to meet my criteria.

The rest of this post outlines the steps I took to get s3cmd installed and working on my CentOS virtual server, and set up to regularly sync my iPhoto Library into S3. In a future post, I may look at regularly pushing a copy of my iPhoto Library into Glacier for point-in-time longer-term storage.

Installation of s3cmd

Installation was pretty simple. The S3 tools Project provide a number of package repositories for the common Linux distros. All I had to do was:

  • Download the relevant .repo file (CentOS 5 in my case) into the /etc/yum.repos.d directory
  • Ensure that the repository is enabled by setting enabled=1 inside the s3tools.repo file (this was done by default)
  • Run yum install s3cmd as root
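The three steps above could be scripted roughly as follows. This is a minimal sketch, assuming a yum-based distro, wget being available, and root privileges; the URL of the .repo file isn’t given here, so it is passed in as a placeholder argument, and the function names repo_enabled and install_s3cmd are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the installation steps, assuming a yum-based distro (e.g. CentOS),
# wget, and root privileges. Function names are hypothetical.

# Succeed if any line of the given .repo file sets enabled=1.
repo_enabled() {
  grep -Eq '^[[:space:]]*enabled[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Download the .repo file, confirm it is enabled, then install s3cmd.
# $1: URL of the s3tools .repo file for your distro (placeholder; supply your own).
install_s3cmd() {
  local repo_url="$1"
  local repo_file=/etc/yum.repos.d/s3tools.repo
  wget -O "$repo_file" "$repo_url" || return 1
  if ! repo_enabled "$repo_file"; then
    echo "Repository disabled: set enabled=1 in $repo_file" >&2
    return 1
  fi
  yum install -y s3cmd   # -y answers yes to yum's confirmation prompt
}
```

Run as root with the URL for your distro’s .repo file as the only argument. The -y flag is the one departure from the manual steps: it stops yum waiting for a confirmation that nobody is there to give.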

Configuration of s3cmd

Before you can run s3cmd, you’ll need to configure it, so it knows what credentials to use, whether to use HTTPS, and so on. It asks for an Access Key and Secret Key, but these don’t have to be the master credentials for your AWS account: the access keys of an IAM user work just as well. Using an IAM user is the safer choice, particularly if you want to use this in the Enterprise, since its permissions can be limited to just the buckets you sync into.

To start the configuration, simply run s3cmd --configure and provide the following information:

  • Access Key
  • Secret Key
  • An encryption password (I elected not to provide one)
  • Whether to use HTTPS (I decided to use HTTPS)

You’ll then get a chance to test your settings and, presuming they work, you’re good to go! s3cmd will save your configuration to ~/.s3cfg.
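The configuration file (~/.s3cfg by default) is plain text, with a [default] section holding key = value entries such as access_key, secret_key and use_https. Because it contains your secret key, it’s worth making sure only you can read it (chmod 600 ~/.s3cfg). If you plan to run s3cmd unattended, a quick pre-flight check can save a failed overnight run. The following is a sketch that assumes the key = value layout just described; s3cfg_value and check_s3cfg are hypothetical names.

```shell
#!/usr/bin/env bash
# Pre-flight check of an s3cmd config file (a sketch).
# Assumes "key = value" lines, as written by s3cmd --configure.

# Print the value of KEY from FILE (default ~/.s3cfg); prints nothing if absent.
s3cfg_value() {
  local key="$1" file="${2:-$HOME/.s3cfg}"
  awk -v k="$key" '
    index($0, k) == 1 {
      rest = substr($0, length(k) + 1)
      if (rest ~ /^[ \t]*=/) {            # exact key, not just a prefix of a longer one
        sub(/^[ \t]*=[ \t]*/, "", rest)   # strip the " = " and keep the value
        print rest
        exit
      }
    }
  ' "$file"
}

# Fail if either credential is missing; warn if HTTPS is switched off.
check_s3cfg() {
  local file="${1:-$HOME/.s3cfg}"
  local key
  for key in access_key secret_key; do
    if [ -z "$(s3cfg_value "$key" "$file")" ]; then
      echo "$key is not set in $file" >&2
      return 1
    fi
  done
  if [ "$(s3cfg_value use_https "$file")" != "True" ]; then
    echo "warning: use_https is not True in $file" >&2
  fi
  return 0
}
```

Calling check_s3cfg with no argument checks ~/.s3cfg; a non-zero exit status means a credential is missing.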

Using s3cmd

Once it’s installed and configured, you can start playing around with s3cmd to get a feel for how it works. Its commands largely mirror familiar *nix commands, so, for example, s3cmd ls lists all your S3 buckets:

2011-01-12 21:26  s3://s3-bucket-1
2012-08-21 13:34  s3://s3-bucket-2

If you don’t already have any buckets, or want a new bucket to sync your files into, s3cmd can create one: run s3cmd mb s3://s3-bucket-3 (note the s3:// prefix on the bucket name). Listing all buckets should now show the new bucket too:

2011-01-12 21:26  s3://s3-bucket-1
2012-08-21 13:34  s3://s3-bucket-2
2012-10-03 21:51  s3://s3-bucket-3
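If you script this, you may only want to create the bucket when it doesn’t already exist. Here is a minimal sketch, assuming the ls output format shown above (date, time, then the s3:// URL on each line). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Helpers around "s3cmd ls" and "s3cmd mb" (a sketch).
# Assumes each bucket line looks like: 2011-01-12 21:26  s3://s3-bucket-1

# Read "s3cmd ls" output on stdin and print only the bucket URLs.
list_bucket_names() {
  awk '$3 ~ /^s3:\/\// { print $3 }'
}

# Create BUCKET (an s3:// URL) unless "s3cmd ls" already lists it.
ensure_bucket() {
  local bucket="$1"
  if s3cmd ls | list_bucket_names | grep -qxF -- "$bucket"; then
    echo "$bucket already exists"
  else
    s3cmd mb "$bucket"
  fi
}
```

For example, ensure_bucket s3://s3-bucket-3 creates the bucket on the first run and just reports that it exists on later runs.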

The S3 tools Howto page gives a good overview of what the tool can do, and the commands to run to get familiar with it.

As my ultimate goal was to be able to upload my iPhoto Library into S3, and then keep it in sync by regularly uploading the deltas, I wanted to make use of the ‘sync’ functionality of s3cmd. The command I have decided to use is as follows:

s3cmd sync --dry-run --recursive --delete-removed --human-readable-sizes \
--progress /mnt/smbserver/Pictures/iPhoto\ Library/ \
s3://s3-bucket-2/Pictures/iPhoto\ Library/ > /var/log/s3cmd-sync.log 2>&1
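For reference, here is the same command wrapped in a shell function, with a comment on what each option does. It’s a sketch: iphoto_sync, SRC, DEST and LOG are hypothetical names. The source is given as the directory with a trailing slash rather than a /* glob. As with rsync, a trailing slash means “the contents of this directory”, so the result in S3 is the same, but hidden files that a shell glob would skip are picked up too.

```shell
#!/usr/bin/env bash
# The sync command from the post, wrapped in a function (a sketch).
# SRC, DEST and LOG default to the paths used in the post; override as needed.
SRC="${SRC:-/mnt/smbserver/Pictures/iPhoto Library/}"
DEST="${DEST:-s3://s3-bucket-2/Pictures/iPhoto Library/}"
LOG="${LOG:-/var/log/s3cmd-sync.log}"

iphoto_sync() {
  # --recursive:            descend into sub-directories
  # --delete-removed:       delete S3 objects whose local file no longer exists
  # --human-readable-sizes: print sizes as e.g. 1.2M rather than raw bytes
  # --progress:             show per-file progress (written to the log here)
  # "$@" passes extra options such as --dry-run straight through to s3cmd.
  s3cmd sync --recursive --delete-removed --human-readable-sizes --progress \
    "$@" "$SRC" "$DEST" >"$LOG" 2>&1
}
```

Run iphoto_sync --dry-run for the first pass, then iphoto_sync on its own once you’re happy with the planned changes.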

Note the use of the --dry-run option on the first run, so I get an idea of what will change. I can then run it again without that option for the changes to actually take effect. In all likelihood, I will set up a cron job to run this without the --dry-run option on a regular basis (otherwise I’ll forget, or simply be too busy to do it manually!).
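To get a quick summary of the dry run rather than reading the whole log, something like the following could work. It’s a sketch that assumes each planned upload is logged on a line starting upload: and each planned deletion on a line starting delete:. That is how s3cmd typically reports them, but the exact format can vary between versions, so check your own log first. count_sync_changes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Count planned uploads and deletions in an s3cmd sync log (a sketch).
# Assumes lines beginning "upload:" and "delete:"; verify against your own log.
count_sync_changes() {
  awk '
    /^upload:/ { up++ }
    /^delete:/ { del++ }
    END { printf "uploads: %d, deletes: %d\n", up, del }
  ' "${1:-/var/log/s3cmd-sync.log}"
}
```

Running count_sync_changes after the dry run prints a single line such as uploads: 120, deletes: 3, which makes it easy to spot an unexpectedly large number of deletions before running the sync for real.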

It’s also worth noting that this isn’t a quick process. My iPhoto Library isn’t small (it currently weighs in at 56GB), and even the s3cmd sync dry-run took about 15 minutes. Running the actual sync then took over 90 hours(!), although your mileage may vary depending on the total volume of data and your upload speed. A subsequent run of the same sync command took 30 minutes and transferred no data (as there were no local changes).
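For scale, 56 GB in a little over 90 hours is an average of at most about 56 × 10^9 bytes ÷ (90 × 3,600 s) ≈ 173 kB/s, or roughly 1.4 Mbit/s. At that rate, a first sync can easily outlast a daily cron interval, so a scheduled job should avoid starting a second copy while one is still running. One way to do that is to use mkdir as a lock, since creating a directory either succeeds or fails atomically. The sketch below shows this with a hypothetical with_lock function, which runs any command only if it can take the lock.

```shell
#!/usr/bin/env bash
# Run a command only if no other run holding the same lock is active (a sketch).
# mkdir is used as the lock because creating a directory is atomic.
with_lock() {
  local lock_dir="$1"
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "lock $lock_dir is held; previous run still active, skipping" >&2
    return 0
  fi
  "$@"                 # run the wrapped command
  local status=$?      # remember its exit status
  rmdir "$lock_dir"    # release the lock
  return "$status"
}
```

For example, a hypothetical script /usr/local/bin/iphoto-sync.sh that defines with_lock and then runs with_lock /tmp/s3cmd-sync.lock s3cmd sync --delete-removed … with the paths above could be scheduled nightly in root’s crontab with the line 0 2 * * * /usr/local/bin/iphoto-sync.sh. One caveat of this design: if the script is killed mid-run, the lock directory is left behind and has to be removed by hand.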

So far, my new solution is looking good! Hope this is helpful to someone else out there.