When you're writing scrapers it can be very handy to develop them on your local machine. Here we explain how you can do that.
It also means that you don't have to push your changes to GitHub every time you want to test your scraper on morph.io.
There are two distinct ways that you can do this:
Install the morph command-line client, a small tool that you run locally. It uploads the scraper in your current directory to the morph.io server and streams back the results.
Install the development tools for your language of choice on your machine. This is the easiest approach if you already have a development environment set up, or if you only plan to use or write scrapers in a single language.
Either install the morph command-line client using these instructions or install development tools on your machine. You don't need to do both.
First, check that you have Ruby 1.9 or greater installed:
ruby -v
If you need to install or update Ruby, see the instructions at ruby-lang.org.
Then install the morph command-line gem:
gem install morph-cli
To run a scraper, go to the directory containing your scraper code and run:
morph
It will run your local scraper on the morph.io server and stream the console output back to you. You can use this with any supported scraper language without the hassle of having to install lots of things.
Either install development tools on your machine using these instructions or install the morph command-line client. You don't need to do both.
For Ruby scrapers, install bundler: a Ruby package manager you'll use to install the scraperwiki gem.
Also install the sqlite development headers, because scraperwiki will need them.
sudo apt-get install bundler libsqlite3-dev
Fork the repo you want to work on, or start a new one.
Clone it:
mkdir oaf
cd oaf
git clone git@github.com:yourname/example.git
cd example
If there’s no Gemfile, use this simple one:
source 'https://rubygems.org'
gem 'scraperwiki', git: 'https://github.com/openaustralia/scraperwiki-ruby.git', branch: 'morph_defaults'
gem 'mechanize'
Use bundler to install these Ruby gems locally:
bundle install --path ../vendor/bundle
This will create a file called Gemfile.lock. Add Gemfile and Gemfile.lock to your repository.
Run the scraper. Use bundler to initialize the environment:
bundle exec ruby scraper.rb
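If you're starting a scraper from scratch rather than cloning an existing one, here's a minimal sketch of what a scraper.rb might look like using the two gems from the Gemfile above. The URL, selector, and field name are placeholders for illustration only:

# Minimal example scraper: fetch a page, parse it, save rows to data.sqlite.
require 'scraperwiki'
require 'mechanize'

agent = Mechanize.new
# https://example.com is a placeholder; point this at the site you're scraping.
page = agent.get('https://example.com')

# Save one row per heading; 'name' is the unique key for the table.
page.search('h1').each do |heading|
  ScraperWiki.save_sqlite(['name'], { 'name' => heading.text.strip })
end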
For Python scrapers, install virtualenv and pip for package management, and BeautifulSoup4 for HTML parsing:
sudo apt-get install python-pip python-bs4 python-dev python-virtualenv
Create a virtualenv and activate it:
virtualenv --system-site-packages oaf
source oaf/bin/activate
Fork and clone the scraper you're going to work on:
git clone git@github.com:yourname/example.git
cd example
Use pip to install the dependencies:
pip install -r requirements.txt
Run the scraper locally:
python scraper.py
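Here's a minimal sketch of what a scraper.py might look like. It assumes the scraperwiki Python library is listed in requirements.txt, as it is for most morph.io Python scrapers; the URL, selector, and field name are placeholders:

# Minimal example scraper: fetch a page, parse it, save rows to data.sqlite.
import scraperwiki
from bs4 import BeautifulSoup

# https://example.com is a placeholder; point this at the site you're scraping.
html = scraperwiki.scrape("https://example.com")
soup = BeautifulSoup(html, "html.parser")

# Save one row per heading; "name" is the unique key for the table.
for heading in soup.find_all("h1"):
    scraperwiki.sqlite.save(unique_keys=["name"], data={"name": heading.get_text(strip=True)})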
For Perl scrapers, fork and clone the scraper you're going to work on:
git clone git@github.com:yourname/example.git
cd example
If there is a cpanfile in the repository, install the modules it lists, either as packages from your distribution or one at a time from CPAN like:
cpan module
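For reference, a cpanfile is just a list of the modules the scraper needs. A hypothetical one for a scraper that fetches pages and writes to sqlite might look like:

# Hypothetical cpanfile listing the scraper's dependencies.
requires 'LWP::UserAgent';
requires 'Database::DumpTruck';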
Run the scraper locally:
perl scraper.pl
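Here's a minimal sketch of what a scraper.pl might look like, assuming Database::DumpTruck for writing to data.sqlite, as morph.io Perl scrapers conventionally do; the URL and fields are placeholders:

#!/usr/bin/env perl
# Minimal example scraper: fetch a page and save a row to data.sqlite.
use strict;
use warnings;
use LWP::UserAgent;
use Database::DumpTruck;

my $ua = LWP::UserAgent->new;
# https://example.com is a placeholder; point this at the site you're scraping.
my $response = $ua->get('https://example.com');
die $response->status_line unless $response->is_success;

# Write a row to the default "data" table in data.sqlite.
my $dt = Database::DumpTruck->new({ dbname => 'data.sqlite', table => 'data' });
$dt->insert({ title => 'example', length => length $response->decoded_content });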