Tuesday, December 28, 2010

Rails4 - curl to download directory content

Thanks to http://www.linuxquestions.org/questions/programming-9/using-curl-to-dl-files-from-http-sites-with-wildcard-379067/ for the initial curl script line, and to http://www.linuxquestions.org/questions/programming-9/extract-substring-using-sed-and-regular-expressions-regexp-702074/ for help extracting the filename from the HTML.

for file in `curl http://media.pragprog.com/titles/rails4/code/depot_b/public/images/ | perl -wlne 'print $1 if /href="([^"]*\.(?:gif|png|jpg))">/'`; do curl -o "${file}" "http://media.pragprog.com/titles/rails4/code/depot_b/public/images/${file}"; done

*Note: I'm not happy with having to specify the file types, but the page HTML contains other links (Download, Parent, etc.), so matching href=.* litters the results with junk files.
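
One possible way around listing the file types, sketched here and not tested against the actual page: skip any link whose href contains "/", "?" or "#". This assumes the page is an Apache-style directory index, where the sort links start with "?" and the parent-directory link contains "/"; whether that catches every junk link on this particular page is an open question. The sketch swaps the perl one-liner for grep and sed, and the function names list_files and fetch_all are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes an Apache-style index page, where sort links start
# with "?" and the parent-directory link contains "/".

# list_files (hypothetical helper): read listing HTML on stdin and print
# every href value that contains no "/", "?" or "#".
list_files() {
  grep -o 'href="[^"/?#][^"/?#]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# fetch_all (hypothetical helper): download each listed file from the
# directory URL given as $1 into the current directory.
fetch_all() {
  base=$1
  curl -s "$base" | list_files | while read -r file; do
    curl -s -o "$file" "$base$file"
  done
}

# Usage (not run here, needs network access):
#   fetch_all http://media.pragprog.com/titles/rails4/code/depot_b/public/images/
```

Keeping the extraction in its own function means it can be tried on a saved copy of the page (list_files < saved.html) before any download is attempted.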


*Note: this would have been easier with wget, but for some reason the darwin-ports DMG for Leopard was unavailable.
Should I get darwin-ports installed, I'll try this instead.

Run from the depot/public/images directory:
wget -rkp -np -nH --cut-dirs=6 http://media.pragprog.com/titles/rails4/code/depot_b/public/images/

@see http://psung.blogspot.com/2008/06/using-wget-or-curl-to-download-web.html
