Amazon SimpleDB

December 14th, 2007

Amazon sent out an email this morning unveiling their newest web service, SimpleDB.

Dear AWS Developers,
This is a short note to let a subset of our most active developers know about an upcoming limited beta of our newest web service: Amazon SimpleDB, which is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

Traditionally, this type of functionality has been accomplished with a clustered relational database that requires a sizable upfront investment, brings more complexity than is typically needed, and often requires a DBA to maintain and administer. In contrast, Amazon SimpleDB is easy to use and provides the core functionality of a database – real-time lookup and simple querying of structured data – without the operational complexity.

We're excited about this upcoming service and wanted to let you know about it as soon as possible. We anticipate beginning the limited beta in the next few weeks. In the meantime, you can read more about the service, and sign up to be notified when the limited beta program opens and a spot becomes available for you. To do so, simply click the “Sign Up For This Web Service” button on the web site below and we will record your contact information.

Learn more and sign up

Sincerely,
The Amazon Web Services Team

Amazon also posted a more detailed description of the upcoming limited beta for SimpleDB.

It won’t replace MySQL, but it’s an interesting addition to Amazon’s scalable, pay-for-what-you-use web offerings.

Avoid Emailing Bounced Addresses

September 12th, 2007

In the last post, I explained how I use VERP to handle email bounces and mark bad email addresses. I use email addresses as login handles, so I can’t just delete accounts with email addresses that have become invalid. However, I still want to avoid sending emails to addresses with permanent failures. The address verification needs to happen in one central location before emails are sent. There are numerous actions on the site that trigger notification emails, and I don’t want every programmer to have to remember to check the status of an email address before sending an email.

I aliased the perform_delivery_sendmail() method in ActionMailer to my own gatekeeper method that checks the status of the recipient address before sending an email. Email is my own ActiveRecord model that stores the status of each address. Here’s the code, which I placed in a file called email_gatekeeper.rb in the lib directory of my Rails project:

module ActionMailer
  class Base

    private

    def perform_delivery_sendmail_with_gatekeeper(mail)
      ignore = false

      if (mail.to.size() == 1)
        email_address = Email.find_by_address(mail.to[0])
        if (email_address && (email_address.status == Email::BOUNCED))
          ignore = true
        end
      end

      perform_delivery_sendmail_without_gatekeeper(mail) unless (ignore)
    end

    alias_method_chain :perform_delivery_sendmail, :gatekeeper

  end
end
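
The gatekeeper above only steps in when a message has exactly one recipient. As a hedged variation (not what the code above does), the same check could filter bounced addresses out of a multi-recipient list. The sketch below is plain Ruby so it can run outside Rails; `deliverable_recipients` and the `bounced` set are hypothetical stand-ins for a lookup against the Email model.

```ruby
require 'set'

# Returns the recipients that are still safe to email.
# `bounced` is a hypothetical Set of lowercased addresses standing in for
# a lookup such as Email.find_by_address(...).status == Email::BOUNCED.
def deliverable_recipients(recipients, bounced)
  recipients.reject { |address| bounced.include?(address.downcase) }
end

bounced = Set.new(['dead@example.com'])
p deliverable_recipients(['keaka@example.com', 'dead@example.com'], bounced)
```

If the filtered list comes back empty, the gatekeeper would skip delivery entirely, which matches the single-recipient behavior above.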

VERP on Rails

August 8th, 2007

Web applications that send out emails usually process bounced emails in order to avoid sending emails to the bad addresses in the future. The standard technique for handling bounces is to use a variable envelope return path (VERP).

If you are using Postfix with Ruby on Rails, setting up VERP for outgoing mail is easy. In your environment.rb configuration file, include these settings:

config.action_mailer.delivery_method = :sendmail
config.action_mailer.sendmail_settings = {
  :location       => '/usr/sbin/sendmail',
  :arguments      => '-V -f bounces-main -i -t'
}

The -V flag tells Postfix to use VERP. Check the sendmail manpage (man sendmail) to see which flag is necessary on your system. For a default installation of Mac OS X, the flag is -V. For the Postfix installation on our CentOS machines, the flag is -XV. You’ll also need to specify the location of the sendmail binary on your particular system.

Now all emails sent through Rails will include a variable envelope return path. For example, if you send an email to keaka@example.com, it will have a Return-Path like this:
Return-Path: <bounces-main+keaka=example.com@yourdomain.com>
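
To make the format concrete, here is a minimal sketch of the encoding in plain Ruby. Postfix does this rewriting itself, so you would not normally write this code; `verp_return_path` and its `prefix` and `domain` parameters are illustrative names, not Postfix or Rails settings.

```ruby
# Builds a VERP return path in the format shown above: the recipient's "@"
# becomes "=" and the result is appended to the bounce prefix after a "+".
# `prefix` and `domain` are illustrative defaults, not real configuration.
def verp_return_path(recipient, prefix = 'bounces-main', domain = 'yourdomain.com')
  local, host = recipient.split('@', 2)
  if host.nil? || local.empty? || host.empty?
    raise ArgumentError, "not an email address: #{recipient}"
  end
  "#{prefix}+#{local}=#{host}@#{domain}"
end

puts verp_return_path('keaka@example.com')
```

Because the original recipient is embedded in the envelope sender, a bounce can be attributed to an address without parsing the body of the failure notice.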

Setting up your system to process incoming delivery failure notifications is very system dependent and can be tricky. Basically, you need to intercept emails sent to the Return-Path address, process each email to determine the original recipient, and then mark the original recipient’s email address as bounced in your database.

We use Postfix virtual aliases, so I added this entry to my /etc/postfix/virtual file:
bounces-main@yourdomain.com bounces@localhost
And then I rebuilt the alias index:
$ sudo postmap /etc/postfix/virtual

I also added a ‘bounces’ alias to my /etc/postfix/aliases file that pipes the email into a Rails ActionMailer model:

bounces: | "RAILS_ENV=production /usr/local/bin/ruby /u/apps/your_app/current/script/runner 'BounceHandler.receive(STDIN.read)'"

And then I ran newaliases to tell Postfix about the changes:
$ sudo newaliases

Your BounceHandler model needs to parse out the original recipient’s address, and then perform some custom business logic to mark the address as bounced in your database. Here’s something similar to what my BounceHandler does (although I log all activity in my real version):

class BounceHandler < ActionMailer::Base

  def receive(email)
    begin
      handle_permanent_failure(email) if (email.body =~ /Status: 5/)
    rescue Exception => e
      # Rescue all exceptions so that error messages don't get emailed to sender.
      # I log the exception.
    end
  end

  private

  # Status codes starting with 5 are permanent errors
  def handle_permanent_failure(email)
    address = original_to(email)
    if (address)
      # Use a separate name so the incoming message isn't shadowed.
      record = Email.find_or_create_by_address(address)
      record.status = Email::BOUNCED
      record.save
    end
  end

  # Returns the email address of the original recipient, or nil.
  def original_to(email)
    address = nil

    # To email address should be in this form:
    # bounces-main+foo=example.com@yourdomain.com
    match = email.to[0].match(/.*?\+(.*)@.*/)
    if (match)
      address = match[1].gsub(/=/, '@')
    end

    return(address)
  end

end
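
The handler above treats any body containing "Status: 5" as a permanent failure. As a stricter, hedged alternative, the sketch below anchors on the delivery status field and reads the class digit of the enhanced status code (2 for success, 4 for a transient failure, 5 for a permanent failure). The helper names `dsn_status_class` and `permanent_failure?` are hypothetical, and the sample body is made up for illustration.

```ruby
# Extracts the class digit (2, 4, or 5) from the first "Status:" field in a
# bounce body, or returns nil when no such field is present.
def dsn_status_class(body)
  match = body.match(/^Status:\s*([245])\.\d{1,3}\.\d{1,3}/i)
  match ? match[1].to_i : nil
end

# True only for permanent (5.x.x) failures.
def permanent_failure?(body)
  dsn_status_class(body) == 5
end

# Made-up delivery status fragment for illustration.
sample = "Final-Recipient: rfc822; keaka@example.com\nAction: failed\nStatus: 5.1.1\n"
puts permanent_failure?(sample)
```

Transient 4.x.x failures are left alone here, so a temporarily full mailbox would not get an address marked as bounced.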

Cache-Control Header for Amazon S3

July 27th, 2007

Or “How to set a far future Expires header in S3 to appease the YSlow gods”.

I’m working on a Ruby on Rails site that stores images and other static content on Amazon S3. We want Amazon to serve all of our images with a Cache-Control or Expires header set to a point in the very far future. This will avoid unnecessary HTTP requests on subsequent page views, making the site faster for users and consuming less bandwidth.

Amazon provides an option for specifying the Cache-Control header, but we use the AWS::S3 gem and the attachment_fu plugin for uploading our files to S3. The gem and plugin don’t provide a convenient way to set the Cache-Control header. My solution is to enhance the behavior of the store() method within the AWS::S3 gem so that it always specifies a Cache-Control header of 10 years if another value is not specified. Here’s my patch, which I placed in a file called s3_cache_control.rb in the lib directory of my Rails project:

module AWS
  module S3
    class S3Object
      class << self
        def store_with_cache_control(key, data, bucket = nil, options = {})
          if (options['Cache-Control'].blank?)
            options['Cache-Control'] = 'max-age=315360000'
          end
          store_without_cache_control(key, data, bucket, options)
        end

        alias_method_chain :store, :cache_control
      end
    end
  end
end

In my config/environment.rb file, I added the following lines to load my patch:

require 'aws/s3'
require 's3_cache_control'

Restart your server, and from now on, anything stored to S3 via the AWS::S3 gem will automatically get a Cache-Control header with max-age set to 10 years. Rockin’ tacos.
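
For anyone wondering where 315360000 comes from, it is ten 365-day years expressed in seconds. The sketch below shows the arithmetic and the same "only set it when missing" rule in plain Ruby, without the gem; `with_default_cache_control` is a hypothetical helper, and it uses nil/empty checks in place of Rails' blank?.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60
TEN_YEARS_IN_SECONDS = 10 * 365 * SECONDS_PER_DAY # ignores leap days

# Returns a copy of the options with a far-future Cache-Control header,
# unless the caller already supplied a non-empty value.
def with_default_cache_control(options = {})
  value = options['Cache-Control']
  return options unless value.nil? || value.to_s.strip.empty?
  options.merge('Cache-Control' => "max-age=#{TEN_YEARS_IN_SECONDS}")
end

puts with_default_cache_control({})['Cache-Control']
puts with_default_cache_control({ 'Cache-Control' => 'no-cache' })['Cache-Control']
```

An explicit value such as no-cache passes through untouched, which is the same behavior the patch gives callers of store().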

But what about all those existing images our users have already uploaded? Those need to be updated too, so I added a method to my Photo model which iterates through all photos and sets the Cache-Control. Here’s the method:

def self.set_cache_control
  photos = Photo.find(:all)
  photos.each do |photo|
    begin
      s3_object = AWS::S3::S3Object.find(photo.full_filename,
        'your_bucket_name')
      s3_object.cache_control = 'max-age=315360000'
      s3_object.save({:access => :public_read})
    rescue Exception => e
      logger.error("Unable to update photo with key " +
        "#{photo.full_filename}: #{e}")
    end
  end
end

You can run the update using script/runner:

$ RAILS_ENV=production ./script/runner Photo.set_cache_control

The set_cache_control() method assumes you have a full_filename() method on your Photo class that provides the S3 key. You’ll already have the full_filename() method if you’re using attachment_fu. You’ll also need to replace your_bucket_name with your Amazon S3 bucket name in the code above.

Now you can sing Cache-Control to Major Tom like I’ve been doing all afternoon. In my head. I’ve only been singing it in my head. Mostly.