Syncing files from Linux to S3

Amazon S3 makes for a great off-site backup of your important files. However, getting files into it automatically can be slightly tricky, particularly if you have a set of files that changes on a regular basis (like an iPhoto Library). My Thecus NAS is meant to be capable of syncing files into S3, but I’ve found the implementation to be limited and buggy, so I wanted to look at alternatives.

Given that Amazon charge for each PUT/GET/COPY/POST/LIST request, I wanted an equivalent of rsync for S3. That way, only files that had been added or changed would be uploaded, and files deleted locally would be removed from S3. I came across s3cmd, part of the S3 tools Project, which seemed to meet my criteria.

The rest of this post outlines the steps I took to get s3cmd installed and working on my CentOS virtual server, and set up to regularly sync my iPhoto Library into S3. In a future post, I may look at regularly pushing a copy of my iPhoto Library into Glacier for point-in-time longer-term storage.

Installation of s3cmd

Installation was pretty simple. The S3 tools Project provide a number of package repositories for the common Linux distros. All I had to do was:

  • Download the relevant .repo file (CentOS 5 in my case) into the /etc/yum.repos.d directory
  • Ensure that the repository is enabled by setting enabled=1 inside the s3tools.repo file (this was done by default)
  • Run yum install s3cmd as root
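
As a rough sketch, the three steps above could be scripted along the following lines. The function name install_s3cmd is made up for illustration, and the repository URL is deliberately left as an argument: take it from the S3 tools Project download page for your distro. The function needs to be run as root.

```shell
# Sketch only: install_s3cmd is an illustrative helper, not part of s3cmd.
# Usage (as root): install_s3cmd REPO_URL
install_s3cmd() {
    repo_url="${1:-}"
    repo_file="/etc/yum.repos.d/s3tools.repo"

    if [ -z "$repo_url" ]; then
        echo "usage: install_s3cmd REPO_URL" >&2
        return 2
    fi

    # Step 1: download the .repo file into the yum repository directory.
    curl -fsSL -o "$repo_file" "$repo_url" || return 1

    # Step 2: warn if the repository is not enabled.
    if ! grep -q '^enabled=1' "$repo_file"; then
        echo "warning: repository in $repo_file is not enabled" >&2
    fi

    # Step 3: install the package.
    yum install -y s3cmd
}
```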

Configuration of s3cmd

Before you can run s3cmd, you’ll need to configure it, so it knows what credentials to use, and whether to use HTTPS, etc. Unfortunately, I could not get it to work with the Access Key and Secret Key of an IAM user, so I ended up providing the master credentials for my AWS account. That is not too much of a problem for a single-user account, but not great if you want to use this in an enterprise setting.

To start the configuration, simply run s3cmd --configure and provide the following information:

  • Access Key
  • Secret Key
  • An encryption password (I elected not to provide one)
  • Whether to use HTTPS (I decided to use HTTPS)

You’ll then get a chance to test your settings, and presuming they work OK, you’re good to go! s3cmd will save your configuration to ~/.s3cfg.

Using s3cmd

Once you’re installed and configured, you can start playing around with s3cmd to get a feel for how it works. It largely imitates other *nix commands, so you can do something like s3cmd ls to show all S3 buckets:

2011-01-12 21:26  s3://s3-bucket-1
2012-08-21 13:34  s3://s3-bucket-2

If you don’t already have any buckets, or you want to create a new bucket to sync your files into, s3cmd can help with that: s3cmd mb s3://s3-bucket-3 (note the s3:// prefix on your bucket name). Listing all buckets should now show your new bucket too:

2011-01-12 21:26  s3://s3-bucket-1
2012-08-21 13:34  s3://s3-bucket-2
2012-10-03 21:51  s3://s3-bucket-3

The s3 tools Howto page gives a good overview of what you can do with the tool, and of the commands to run to get familiar with it.

As my ultimate goal was to be able to upload my iPhoto Library into S3, and then keep it in sync by regularly uploading the deltas, I wanted to make use of the ‘sync’ functionality of s3cmd. The command I have decided to use is as follows:

s3cmd sync --dry-run --recursive --delete-removed --human-readable-sizes \
--progress /mnt/smbserver/Pictures/iPhoto\ Library/* \
s3://s3-bucket-2/Pictures/iPhoto\ Library/ > /var/log/s3cmd-sync.log 2>&1

Note the use of the --dry-run option on the first run, so I get an idea of what will change. I can then run it again without that option for the changes to actually take effect. In all likelihood, I will set up a cron job to run this without the --dry-run option on a regular basis (otherwise I’ll forget, or simply be too busy to do it manually!).
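
For the cron job, one option is a small wrapper that refuses to start while a previous run is still going, which matters when a single sync can run for days. What follows is only a sketch under assumptions: run_sync, S3CMD, SRC, DEST and LOCKDIR are illustrative names, the lock location is arbitrary, and the options are those of the command above minus --dry-run and --progress.

```shell
# Sketch only: run_sync, S3CMD, SRC, DEST and LOCKDIR are illustrative names.
S3CMD="${S3CMD:-s3cmd}"
SRC="${SRC:-/mnt/smbserver/Pictures/iPhoto Library}"
DEST="${DEST:-s3://s3-bucket-2/Pictures/iPhoto Library/}"
LOCKDIR="${LOCKDIR:-/tmp/s3cmd-sync.lock}"

run_sync() {
    # mkdir is atomic, so the directory doubles as a simple lock: if it
    # already exists, an earlier sync is assumed to be still running.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous sync still running, skipping this run" >&2
        return 1
    fi

    # Extra options (e.g. --dry-run) can be passed as arguments to run_sync.
    status=0
    "$S3CMD" sync "$@" --recursive --delete-removed --human-readable-sizes \
        "$SRC"/* "$DEST" || status=$?

    # Release the lock and report how the sync itself went.
    rmdir "$LOCKDIR"
    return "$status"
}
```

Saved as a script that ends with a call to run_sync (say /usr/local/bin/s3-sync.sh, a hypothetical path), it could be scheduled with a crontab entry such as 0 2 * * * /usr/local/bin/s3-sync.sh >> /var/log/s3cmd-sync.log 2>&1. If a run is killed part-way through, the lock directory is left behind and has to be removed by hand.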

It’s also worth noting that this isn’t necessarily a quick process. My iPhoto Library isn’t small (it currently weighs in at 56GB), and the s3cmd sync dry-run alone took about 15 minutes. Running the actual sync took over 90 hours(!), although your mileage may vary depending on total volume of data and upload speed. A subsequent run of the same sync command took 30 minutes, and no data was transferred (as there were no local changes).
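
To put those numbers in context, a quick back-of-the-envelope calculation gives the average upload rate they imply. It assumes 1GB = 1024MB and treats 90 hours as the full duration, so the result is an approximation (the real run took somewhat longer). avg_rate is an illustrative helper name.

```shell
# Average throughput implied by uploading SIZE_GB in HOURS.
# avg_rate is an illustrative helper: avg_rate SIZE_GB HOURS
avg_rate() {
    awk -v gb="$1" -v hours="$2" 'BEGIN {
        mb_per_s = gb * 1024 / (hours * 3600)
        printf "%.2f MB/s (%.2f Mbit/s)\n", mb_per_s, mb_per_s * 8
    }'
}

avg_rate 56 90
```

That works out to roughly 0.18 MB/s, or about 1.4 Mbit/s, which is the sort of figure to compare against your own upload speed before planning a first full sync.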

So far, my new solution is looking good! Hope this is helpful to someone else out there.

Using Varnish to set a Maintenance Page on your website

There may come a time when you would like to perform a fairly major update to your website, but you don’t want to be serving out semi-broken pages while you update your stylesheets, JavaScript files or static image assets. Without re-configuring Apache to point to an alternative DocumentRoot, how might you display a simple maintenance page to avoid this? What if you need to take Apache offline (even momentarily) to update some of the dependency modules? Or what if you finally want to do that upgrade from Lighttpd to Apache? What will serve out your Maintenance Page while your webserver is offline?

If you use an HTTP accelerator like Varnish, you can use it to serve out a simple Maintenance Page whilst you do your updates in the background. Instead of passing through incoming requests to your Apache back-end, and then caching them for subsequent requests, Varnish will simply respond with the Maintenance Page that you design.

Sound like something you might want to do? Well, here’s how (on Red Hat and clones):

Create a Varnish VCL configuration file somewhere on your Linux system. My default Varnish configuration file lives in /etc/varnish/default.vcl, so I chose to put my new Varnish Maintenance configuration file at /etc/varnish/maintenance.vcl. This new file should have the following contents (you may need to change small details like the backend host and port to match your specific configuration):

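# /etc/varnish/maintenance.vcl
# Note: this uses the pre-4.0 VCL syntax (error / vcl_error).
backend default {
    .host = "127.0.0.1";
    .port = "8000";
}

sub vcl_recv {
    # Answer every request with a 503; nothing is passed to the backend.
    error 503;
}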
 
sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";
    # Create your Maintenance Page here
    synthetic {"<!DOCTYPE html>
<html>
<head>
<title>We're currently down for maintenance.</title>
</head>
<body>
<h1>We’re currently down for maintenance.</h1>
<p>We won’t be long, so please check back soon. Thanks for your patience.</p>
</body>
</html>
    "};
    return (deliver);
}

This file, /etc/varnish/maintenance.vcl, can live on your filesystem without ever impacting the day-to-day operation of your website. Varnish will still load its default configuration on startup, and it will only use your new Maintenance configuration if you tell it to.

So, how do you tell it to? When you’re ready to start your website updates and want to start displaying your Maintenance Page, you simply log onto your server and run the following commands:

varnishadm vcl.load maintenance_mode /etc/varnish/maintenance.vcl

This will create a new Varnish profile, called maintenance_mode. Note that at the moment, Varnish is still serving normal website content; all you’ve done is made it aware of an alternative configuration to the one it normally uses.

varnishadm vcl.use maintenance_mode

Now we’ve activated the new maintenance_mode configuration profile. If you attempt to load your webpage, Varnish should return your Maintenance Page, and there should be no new hits in the Apache web logs. If you hit Apache directly, you will still see your regular website (but you’ve got a firewall in place to stop every Tom, Dick and Harry doing that, right?). If you’ve got multiple hosts serving the same website content through a load-balancer, you’ll need to run the commands above on each webserver separately.
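
One way to confirm the switch from the command line is to look at the HTTP status code Varnish returns, which should be 503 while the maintenance profile is active. A minimal sketch, assuming curl is installed; in_maintenance and the CURL override are illustrative names, and the URL is whatever address your Varnish instance listens on.

```shell
# Sketch only: in_maintenance is an illustrative helper.
# Succeeds (exit 0) when the given URL answers with HTTP 503.
in_maintenance() {
    # -s: quiet, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the status code.
    code="$("${CURL:-curl}" -s -o /dev/null -w '%{http_code}' "$1")"
    [ "$code" = "503" ]
}
```

For example, in_maintenance http://localhost/ && echo "maintenance page is up" would print the message only while the 503 page is being served (http://localhost/ is a placeholder here).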

You’re now ready to start your maintenance. You can take as long as you need at this point, because Varnish has your back. If you’re going to be a while, I would state on the Maintenance Page when you expect your website to be back online, so that people don’t get annoyed if they keep checking back before your new website is up.

You can stop and start services to your heart’s content, with the obvious exception of Varnish! You can pretty much do whatever you want behind the scenes, as Varnish won’t pass any traffic back to Apache.

Once you’ve finished making your updates, and you’re ready to launch your shiny new website, you simply run the following command:

varnishadm vcl.use boot

This will revert Varnish to its default (startup) configuration, and you should find that it starts serving your updated website just as it did previously.
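
The three varnishadm commands can also be wrapped in a pair of helper functions, so the switch is harder to get wrong under pressure. This is a sketch rather than a tested tool: maintenance_on, maintenance_off and the VARNISHADM, MAINT_VCL and MAINT_NAME variables are illustrative names. It also discards the maintenance profile after switching back, on the assumption that loading a second profile under a name that is already loaded would fail.

```shell
# Sketch only: the function and variable names below are illustrative.
VARNISHADM="${VARNISHADM:-varnishadm}"
MAINT_VCL="${MAINT_VCL:-/etc/varnish/maintenance.vcl}"
MAINT_NAME="${MAINT_NAME:-maintenance_mode}"

maintenance_on() {
    # Compile and load the maintenance VCL, then make it the active one.
    "$VARNISHADM" vcl.load "$MAINT_NAME" "$MAINT_VCL" &&
        "$VARNISHADM" vcl.use "$MAINT_NAME"
}

maintenance_off() {
    # Switch back to the startup configuration, then drop the maintenance
    # profile so the same name can be loaded again next time.
    "$VARNISHADM" vcl.use boot &&
        "$VARNISHADM" vcl.discard "$MAINT_NAME"
}
```

Running varnishadm vcl.list before and after each call shows which profile is currently active.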

Hope you found that useful!