heathd / civil_service_jobs

UK Civil Service Job Ads

README.md

Civil Service Jobs Scraper

Designed to incrementally scrape the Civil Service Jobs website.

It will parse index pages and individual job pages.

It has a multi-threaded crawler with configurable parallelisation. If you use it, please be considerate: it is very easy to overload a dynamic website with a high number of parallel threads. The recommended number of threads is 4, which will allow you to scrape all of the job listings (~3000) in under 20 minutes.
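As a rough illustration of configurable parallelism, the sketch below runs a fixed number of worker threads over a shared queue of URLs. It is not taken from scraper.rb: the `crawl` method, its `threads:` parameter, and the block that stands in for the HTTP fetch are all hypothetical names chosen for this example.

```ruby
# Hypothetical sketch of a bounded worker pool; not the code in scraper.rb.
# `fetch` is any block that takes a URL and returns the fetched page.
def crawl(urls, threads: 4, &fetch)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # once closed and drained, pop returns nil and workers stop

  results = Queue.new
  workers = Array.new(threads) do
    Thread.new do
      while (url = queue.pop)
        results << [url, fetch.call(url)]
      end
    end
  end
  workers.each(&:join)

  # Collect the [url, page] pairs into a hash keyed by URL.
  Array.new(results.size) { results.pop }.to_h
end

# Usage with a stand-in fetcher instead of a real HTTP request.
pages = crawl(%w[/jobs/1 /jobs/2 /jobs/3], threads: 2) { |url| "body of #{url}" }
puts pages.size
```

Keeping the thread count as a single parameter makes it easy to stay at a low value such as 4, so the number of simultaneous requests to the site never exceeds that limit.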

The scraper uses a local SQLite database and will not re-scrape a job that has been seen before.

It uses the reference number shown on the search results page as a unique key.
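The incremental behaviour described above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: `SeenJobs` and `each_new_job` are hypothetical names, the job data is made up, and an in-memory `Set` stands in for the SQLite table so that the example runs without extra gems. In a real database the same effect would usually come from a unique constraint on the reference column.

```ruby
require "set"

# Hypothetical in-memory stand-in for the SQLite table of scraped jobs.
class SeenJobs
  def initialize
    @refs = Set.new
  end

  # Returns true the first time a reference number is recorded, false afterwards.
  def record(ref)
    !@refs.add?(ref).nil?
  end
end

# Yields only the listings whose reference number has not been seen before.
def each_new_job(listings, seen)
  listings.each do |job|
    yield job if seen.record(job[:ref])
  end
end

# Usage with made-up listings: the second pass only yields the new job.
seen = SeenJobs.new
first_page  = [{ ref: "100001" }, { ref: "100002" }]
second_page = [{ ref: "100002" }, { ref: "100003" }]

each_new_job(first_page, seen)  { |job| puts "scraping #{job[:ref]}" }
each_new_job(second_page, seen) { |job| puts "scraping #{job[:ref]}" }
```

Because the check is keyed on the reference number alone, a job that reappears on a later index page is skipped without its detail page being fetched again.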

Content under Open Government Licence

Fixtures in the spec/fixtures folder contain copies of the HTML source from the Civil Service Jobs website. This is public sector information licensed under the Open Government Licence v3.0.

License

This source code is licensed under the GNU Affero General Public License v3.0.

Last run failed 2023-07-03 with status code 255.

Console output of last run

Injecting configuration and compiling...
-----> Ruby app detected
-----> Compiling Ruby/NoLockfile
 !
 ! Gemfile.lock required. Please check it in.
 !

Statistics

Total run time: 2 minutes

Total cpu time used: less than 5 seconds

Total disk space used: 287 KB

History

Manually ran revision 76dc9301 and failed 2023-07-03.
run time 11 s

Manually ran revision 584820d5 and failed 2023-07-03.
run time 15 s

Manually ran revision 4e9c7e63 and failed 2023-07-03.
run time 19 s

Manually ran revision 7e660043 and failed 2023-06-30.
run time 24 s

Manually ran revision 8789dfa8 and failed 2023-06-30.
run time 39 s

Created on morph.io 2023-06-30

Scraper code

Ruby

civil_service_jobs / scraper.rb