Ruby Crawling Resources
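Browser automation: Capybara (simulates a user interacting with a web app) and Poltergeist (a PhantomJS driver for Capybara)
https://github.com/teampoltergeist/poltergeist
https://github.com/jnicklas/capybara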
Common Crawl, an open web-crawling project that publishes its crawl data
https://github.com/commoncrawl/commoncrawl
Anemone, a Ruby web-crawling framework
https://github.com/chriskite/anemone/tree/master
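To make concrete what a crawling framework such as Anemone automates, here is a minimal breadth-first crawl loop using only Ruby's standard library. It is a sketch under stated assumptions, not Anemone's API: `crawl`, `extract_links` and `HREF_PATTERN` are hypothetical names, pages come from an injected `fetch` callable (a real crawler would issue HTTP requests, for example with Net::HTTP), and links are pulled out with a naive regex where a real tool would use an HTML parser.

```ruby
require "set"
require "uri"

# Hypothetical helper: naive link extraction for illustration only; a real
# crawler would use an HTML parser. Captures href values up to a quote or
# '#', so fragments are dropped.
HREF_PATTERN = /href\s*=\s*["']([^"'#]+)/i

# Resolves each href against the page URL; malformed URLs are skipped.
def extract_links(base_url, html)
  links = []
  html.scan(HREF_PATTERN).flatten.each do |href|
    begin
      links << URI.join(base_url, href).to_s
    rescue URI::Error
      # Skip malformed links instead of aborting the crawl.
    end
  end
  links
end

# Hypothetical crawl loop: breadth-first from `seed`, staying on the seed's
# host. `fetch` is any callable mapping a URL to its HTML (or nil on failure).
def crawl(seed, fetch, max_pages: 50)
  host = URI(seed).host
  visited = Set.new
  queue = [seed]
  until queue.empty? || visited.size >= max_pages
    url = queue.shift
    next if visited.include?(url) # the same link may be queued twice
    visited << url
    html = fetch.call(url)
    next unless html
    extract_links(url, html).each do |link|
      queue << link if URI(link).host == host && !visited.include?(link)
    end
  end
  visited.to_a # Set keeps insertion order, so this is the visit order
end
```

The FIFO queue gives breadth-first order, so pages closest to the seed are fetched first; the visited set stops cycles (such as a link back to the home page) from looping forever, and `max_pages` caps the crawl. For example, with a hash standing in for the web, `crawl("http://example.com/", ->(url) { pages[url] })` visits each reachable page on that host exactly once.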
Mechanize: automate interaction with websites, such as filling in and submitting forms
http://mechanize.rubyforge.org/Mechanize.html
Cobweb, another Ruby web crawler
https://www.ruby-toolbox.com/projects/cobweb
Wombat, a lightweight Ruby DSL for crawling pages and scraping structured data
https://github.com/felipecsl/wombat
Distributed computing: Hadoop, plus Wukong for running Hadoop Streaming jobs in Ruby
http://hadoop.apache.org
https://github.com/infochimps-labs/wukong
http://stackoverflow.com/a/4981595
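Hadoop Streaming lets any executable act as a mapper or reducer: it reads tab-separated key/value lines on stdin, writes them to stdout, and the framework sorts mapper output by key before the reducer sees it. Wukong builds on that model for Ruby. The sketch below is plain Ruby rather than Wukong's API; it counts crawled pages per host, and `map_lines`, `reduce_lines`, `local_pipeline` and the file name `hosts.rb` are hypothetical names chosen for illustration.

```ruby
require "uri"

# Hypothetical mapper: one crawled URL per line -> "host\t1".
# Lines that fail to parse or have no host are skipped.
def map_lines(lines)
  lines.each_with_object([]) do |line, out|
    host = begin
      URI(line.strip).host
    rescue URI::Error
      nil
    end
    out << "#{host}\t1" if host
  end
end

# Hypothetical reducer: input arrives sorted by key, so equal hosts are
# adjacent; sum the counts in each run. Buffers its input (fine for a sketch).
def reduce_lines(sorted_lines)
  sorted_lines
    .map(&:chomp)
    .reject(&:empty?)
    .map do |line|
      key, value = line.split("\t", 2)
      [key, value.to_i]
    end
    .chunk_while { |a, b| a[0] == b[0] }
    .map { |run| "#{run[0][0]}\t#{run.sum { |_key, value| value }}" }
end

# Local stand-in for `mapper | sort | reducer` on a single machine.
def local_pipeline(lines)
  reduce_lines(map_lines(lines).sort)
end

# Streaming entry point: `ruby hosts.rb map` or `ruby hosts.rb reduce`.
if __FILE__ == $PROGRAM_NAME
  case ARGV[0]
  when "map" then map_lines($stdin.each_line).each { |l| puts l }
  when "reduce" then reduce_lines($stdin.each_line).each { |l| puts l }
  end
end
```

On a cluster the same script would be handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (for example `ruby hosts.rb map` and `ruby hosts.rb reduce`); the jar's path depends on the installation. `local_pipeline` mimics `cat urls.txt | mapper | sort | reducer` so the logic can be checked without a cluster.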
Headless browsers scriptable with JavaScript (use these to navigate JavaScript-heavy sites): PhantomJS is a headless WebKit with a JavaScript API; Zombie.js is a headless browser for Node.js
http://phantomjs.org/
http://zombie.labnotes.org/