Accidental Technologist

Musings about Entrepreneurship, Technology and Software Development


Delete Large Numbers of Amazon S3 Files using Ruby

September 28, 2010 by Rob Bazinet


I recently found a problem I needed to solve: remove hundreds of thousands of files from Amazon S3.  It had to be a common problem, right?  Well, it may well be a common problem, but the solution was less than common.

I tried a few of the available tools, including the one from the Amazon S3 site, but it kept erroring out and I was never sure why.  I then turned to third-party tools for managing S3 buckets, but they either errored out or behaved as if they worked when, I later determined, they did nothing.

I posted my need on Twitter and was pointed to a solution (thanks @Kishfy) I had not thought of: use Ruby.  There is a great open-source project named S3Nukem whose sole purpose is to remove Amazon S3 buckets.

S3Nukem

This is an open source project hosted on GitHub.  Installation and setup are pretty simple (from the GitHub repo readme): install the required gems.

For Ruby >= 1.9:

sudo gem install dmarkow-right_aws --source http://gems.github.com

The docs don't mention it, but I needed to install the right_http_connection gem; the above command fails unless it is installed.
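
If you hit that failure, installing the gem first should clear it up. The exact gem source may vary; right_http_connection is a RightScale gem, so the default gem source should have it:

```shell
sudo gem install right_http_connection
```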

For Ruby < 1.9:

sudo gem install right_aws

Install S3Nukem:

curl -O http://github.com/lathanh/s3nukem/raw/master/s3nukem

Make it executable:

chmod 755 s3nukem

This is done in the directory where the above curl command was executed.

Usage:

Usage: ./s3nukem [options] buckets...

Options:
    -a, --access ACCESS              Amazon Access Key (required)
    -s, --secret SECRET              Amazon Secret Key (required)
    -t, --threads COUNT              Number of simultaneous threads (default 10)
    -h, --help                       Show this message
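
For example, deleting a bucket with 20 threads would look something like this (the bucket name and keys below are placeholders, not real values):

```shell
./s3nukem -a AKIA_YOUR_ACCESS_KEY -s YOUR_SECRET_KEY -t 20 my-old-bucket
```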

Running the application in a terminal window shows large numbers of files being deleted:

[Screenshot: s3nukem deleting files in a terminal window]

This script is fast.  I tried running it under both Ruby 1.8.7 and 1.9.2, with 1.9.2 quite a bit faster.  I didn't run any benchmarks, but it was noticeably faster; my goal was really just to delete a large number of files.  Ruby 1.9.2's thread handling really shines here, and the ability to control the number of threads from the command line is really nice.

The nice thing about this version of the script is the cap on the number of items queued for deletion at a time, 1000 * thread_count, where thread_count defaults to 10.  With this limit in place the script won't chew up all your system memory.
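
That capped-batch idea can be sketched roughly as follows. This is not the S3Nukem source: the S3 call is stubbed out, and delete_in_batches is a name made up for illustration.

```ruby
# Sketch of the capped-queue idea: take at most 1000 * thread_count keys
# at a time, then delete that batch across a pool of worker threads.
# The real S3 delete call is stubbed out; S3Nukem uses right_aws for it.
def delete_in_batches(keys, thread_count = 10)
  deleted = []
  mutex   = Mutex.new

  keys.each_slice(1000 * thread_count) do |batch|
    queue = Queue.new
    batch.each { |key| queue << key }

    threads = Array.new(thread_count) do
      Thread.new do
        until queue.empty?
          key = queue.pop(true) rescue break  # non-blocking pop; quit when drained
          # Real code would call s3.delete(bucket, key) here.
          mutex.synchronize { deleted << key }
        end
      end
    end
    threads.each(&:join)
  end

  deleted
end
```

Because only one batch of keys lives in memory at a time, memory use stays flat no matter how many objects the bucket holds.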

This script is designed to delete an entire bucket, but it could be modified to remove just the contents or a directory tree within the bucket.  I may do this for a project I am working on which needs such functionality.
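
A rough sketch of what that modification might look like, written against an object with right_aws-style list_bucket/delete methods. The helper name delete_prefix and its parameters are mine, not from the project:

```ruby
# Hypothetical helper: remove only the objects under a prefix, leaving the
# bucket itself (and anything outside the prefix) alone.  `s3` is expected
# to respond to list_bucket/delete the way right_aws's
# RightAws::S3Interface does; in real use you would
# `require 'right_aws'` and pass RightAws::S3Interface.new(access, secret).
def delete_prefix(s3, bucket, prefix)
  loop do
    # List at most 1000 keys under the prefix per round trip.
    keys = s3.list_bucket(bucket, 'prefix' => prefix, 'max-keys' => 1000)
             .map { |obj| obj[:key] }
    break if keys.empty?
    keys.each { |key| s3.delete(bucket, key) }
  end
end
```

Looping until the listing comes back empty handles prefixes with more than 1000 objects, since S3 listings are paginated.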


Filed Under: Ruby Tagged With: Amazon S3, Ruby, S3Nukem
