This scraper downloads copies of the images referenced in Popolo data and saves them to Amazon S3, so that the images are served over HTTPS and remain reliably available.
It runs on morph.io to keep the images up to date.
The images are publicly available at:
https://australian-local-councillors-images.s3.amazonaws.com/#{Popolo ID}.jpg
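The `#{Popolo ID}` placeholder in the URL above uses Ruby string interpolation. As a minimal sketch, assuming the Popolo ID is used verbatim as the object key, the public URL of an image could be built like this (`IMAGE_BASE_URL` and `image_url` are hypothetical names, not part of the scraper):

```ruby
# Base URL of the public S3 bucket, as given in this README.
IMAGE_BASE_URL = "https://australian-local-councillors-images.s3.amazonaws.com"

# Hypothetical helper: builds the public URL of a stored image from a
# Popolo person ID (assumes the ID is used unescaped as the object key).
def image_url(popolo_id)
  "#{IMAGE_BASE_URL}/#{popolo_id}.jpg"
end
```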
The code is fairly generic and all of its configuration is stored in environment variables, so you should be able to repurpose it quite easily.
All the expected environment variables are documented in .env.example. To test locally, copy this file to .env and replace the values with real ones.
To run the script, first install the gems:
bundle
Then run:
bundle exec dotenv ruby scraper.rb
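Running the script through the dotenv command loads the values from .env into ENV before scraper.rb starts. A minimal sketch of how a script might read a required setting and fail with a clear message when it is missing (`required_env` is a hypothetical helper, not taken from the scraper; passing a Hash instead of ENV makes it easy to test):

```ruby
# Hypothetical helper: reads a required setting from the environment and
# raises a KeyError naming the variable if it is unset or blank.
def required_env(name, env = ENV)
  value = env[name]
  if value.nil? || value.strip.empty?
    raise KeyError, "Missing environment variable #{name} (see .env.example)"
  end
  value
end
```

For example, `required_env("MORPH_TARGET_STATE", { "MORPH_TARGET_STATE" => "sa" })` returns `"sa"`.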
To speed things up, you can specify an Australian state or territory to target using the environment variable ENV["MORPH_TARGET_STATE"], e.g.:
MORPH_TARGET_STATE=sa
You can also target a specific organization by ID using ENV["MORPH_TARGET_ORGANIZATION"], e.g.:
MORPH_TARGET_ORGANIZATION=legislature/city_of_unley
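As a rough sketch of how these two optional filters could combine, assuming each organization is a Hash with "id" and "state" keys (an assumption; the scraper's real data structures may differ, and `select_organizations` is a hypothetical name):

```ruby
# Returns the stripped value of an environment variable, or nil if it is
# unset or blank, so an empty setting behaves like "no filter".
def env_value(env, name)
  value = env[name].to_s.strip
  value.empty? ? nil : value
end

# Hypothetical filter: keeps only the organizations matching the optional
# MORPH_TARGET_STATE and MORPH_TARGET_ORGANIZATION settings.
# Each organization is assumed to be a Hash with "id" and "state" keys.
def select_organizations(organizations, env = ENV)
  state = env_value(env, "MORPH_TARGET_STATE")
  org_id = env_value(env, "MORPH_TARGET_ORGANIZATION")

  organizations.select do |org|
    next false if org_id && org["id"] != org_id
    next false if state && org["state"].to_s.downcase != state.downcase
    true
  end
end
```

With neither variable set, every organization is kept; setting both narrows the run to an organization that matches both conditions.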
This scraper can also save a resized version of each image as a JPEG. You can configure the width and height of the resized image with these environment variables:
MORPH_RESIZE_IMAGES=true
MORPH_RESIZE_WIDTH=80
MORPH_RESIZE_HEIGHT=88
# Reprocess and save the resized images
MORPH_CLOBBER_RESIZED_IMAGES=false
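A minimal sketch of how these settings might be parsed and used to decide whether a resized image should be (re)generated. It assumes that "true" enables a flag and that MORPH_CLOBBER_RESIZED_IMAGES=false means existing resized images are left alone; `ResizeConfig`, `resize_config` and `resize_needed?` are hypothetical names, and the actual image resizing is not shown:

```ruby
# Hypothetical container for the resize settings.
ResizeConfig = Struct.new(:enabled, :width, :height, :clobber, keyword_init: true)

# Treats "true", "1" or "yes" (any case) as enabled; anything else, including
# an unset variable, as disabled.
def truthy?(value)
  %w[true 1 yes].include?(value.to_s.strip.downcase)
end

# Reads the settings; width and height are only required (and parsed as
# integers) when resizing is enabled. fetch raises KeyError if they are missing.
def resize_config(env = ENV)
  enabled = truthy?(env["MORPH_RESIZE_IMAGES"])
  ResizeConfig.new(
    enabled: enabled,
    width: enabled ? Integer(env.fetch("MORPH_RESIZE_WIDTH")) : nil,
    height: enabled ? Integer(env.fetch("MORPH_RESIZE_HEIGHT")) : nil,
    clobber: truthy?(env["MORPH_CLOBBER_RESIZED_IMAGES"])
  )
end

# A resized image is produced when resizing is on and either no resized
# copy exists yet or clobbering (reprocessing) was requested.
def resize_needed?(config, resized_exists)
  return false unless config.enabled
  config.clobber || !resized_exists
end
```

Deferring the width and height lookup until resizing is enabled means the two size variables can be omitted entirely when resized images are not wanted.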