Work in progress...
sudo apt-get install python-dev
To install virtualenv, see https://virtualenv.pypa.io/en/latest/installation.html or follow these instructions:
sudo apt-get install python-virtualenv
sudo apt-get install ca-certificates gnupg
curl https://pypi.python.org/packages/source/v/virtualenv/virtualenv-13.1.0.tar.gz#md5=70f63a429b7dd7c3e10f6af09ed32554 > /pathtovirtualenvdownload/virtualenv-13.1.0.tar.gz # or latest
curl https://pypi.python.org/packages/source/v/virtualenv/virtualenv-13.1.0.tar.gz.asc > /pathtovirtualenvdownload/virtualenv-13.1.0.tar.gz.asc # or latest
mkdir /tmp/.gnupg
chmod 700 /tmp/.gnupg
gpg --homedir /tmp/.gnupg --keyserver hkps.pool.sks-keyservers.net --recv-keys 3372DCFA
gpg --homedir /tmp/.gnupg --fingerprint 3372DCFA # check that the fingerprint is 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
gpg --homedir /tmp/.gnupg --verify /pathtovirtualenvdownload/virtualenv-13.1.0.tar.gz.asc
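The manual checks above can also be scripted. The sketch below is not part of the project: `verify_md5` and `normalize_fpr` are hypothetical helper names, and it assumes that `md5sum` (GNU coreutils) is installed. The first helper compares a file's MD5 digest with the value carried in the `#md5=` fragment of the download URL; the second strips spaces from a fingerprint and upper-cases it, so that spaced and unspaced forms can be compared as strings.

```shell
# Hypothetical helpers, not part of page_watcher_scraper.

# verify_md5 FILE EXPECTED
# Succeeds only if the MD5 digest of FILE equals EXPECTED (lowercase hex).
verify_md5() {
    local actual
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# normalize_fpr FINGERPRINT
# Prints FINGERPRINT without spaces and with hex letters in upper case.
normalize_fpr() {
    printf '%s\n' "$1" | tr -d ' ' | tr 'a-f' 'A-F'
}

# Example usage (paths as in the instructions above):
#   verify_md5 /pathtovirtualenvdownload/virtualenv-13.1.0.tar.gz \
#       70f63a429b7dd7c3e10f6af09ed32554 || echo "checksum mismatch"
#   normalize_fpr "7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA"
```

The MD5 comparison only guards against a corrupted download; the GPG signature check remains the stronger verification.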
tar xzf /pathtovirtualenvdownload/virtualenv-13.1.0.tar.gz --directory /pathtovirtualenvbin/
echo "alias virtualenv='python /pathtovirtualenvbin/virtualenv-13.1.0/virtualenv.py'" >> ~/.bashrc # or your shell's startup file
source ~/.bashrc # or your shell's startup file
mkdir ~/.virtualenvs
virtualenv ~/.virtualenvs/oiienv
source ~/.virtualenvs/oiienv/bin/activate
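After activation it can be useful to confirm that the shell is really using the new environment before installing anything into it. A minimal sketch, relying on the `VIRTUAL_ENV` variable that virtualenv's `activate` script exports; `in_virtualenv` is a hypothetical helper name, not something the project provides.

```shell
# Hypothetical helper, not part of page_watcher_scraper.

# in_virtualenv DIR
# Succeeds if the virtualenv rooted at DIR is the active one.
# The two paths are compared as plain strings, so DIR should be
# written the same way it was passed to virtualenv (no symlink resolution).
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ] && [ "$VIRTUAL_ENV" = "$1" ]
}

# Example usage:
#   in_virtualenv "$HOME/.virtualenvs/oiienv" || echo "oiienv is not active"
```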
git clone https://github.com/juga0/page_watcher_scraper
cd page_watcher_scraper
pip install -r requirements.txt
More about page_watcher_scraper/page_watcher_scraper/settings.py: TBD
More about page_watcher_scraper/config_local.py: TBD
If you need local settings, edit page_watcher_scraper/page_watcher_scraper/settings_local.py and page_watcher_scraper/config_local.py.
To list the scrapers:
scrapy list
To run page_watcher_scraper:
cd page_watcher_scraper
./scraper.py
Create a script like this, replacing /mypath with your own path:
cd page_watcher_scraper
vim run.sh
#!/bin/bash
cd /mypath/page_watcher_scraper && source /mypath/page_watcher_scraper/environment && source /mypath/.virtualenvs/oiienv/bin/activate && /mypath/.virtualenvs/oiienv/bin/python /mypath/page_watcher_scraper/scraper.py >> /mypath/page_watcher_scraper/cronlog.txt
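The script above appends only standard output to cronlog.txt, so error messages from a failed run do not reach the log. As a sketch of one possible variation (not the author's script), the hypothetical helper `run_logged` below writes a timestamp header, appends both standard output and standard error of a command to a log file, and returns the command's exit status.

```shell
# Hypothetical helper, not part of page_watcher_scraper.

# run_logged LOGFILE COMMAND [ARGS...]
# Appends a timestamp line, then COMMAND's stdout and stderr, to LOGFILE.
# The return status is the exit status of COMMAND.
run_logged() {
    local logfile
    logfile=$1
    shift
    printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
    "$@" >> "$logfile" 2>&1
}

# Example usage inside run.sh (paths as in the script above):
#   run_logged /mypath/page_watcher_scraper/cronlog.txt \
#       /mypath/.virtualenvs/oiienv/bin/python /mypath/page_watcher_scraper/scraper.py
```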
Edit crontab:
crontab -e
To run every day at 14:35:
35 14 * * * /bin/bash /mypath/page_watcher_scraper/run.sh
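The five leading fields of a crontab line are minute, hour, day of month, month and day of week, so `35 14 * * *` means 14:35 on every day. As a sketch of a non-interactive alternative to `crontab -e`, the hypothetical helper `add_cron_line` below reads an existing crontab on standard input and prints it with the new entry appended only if that exact line is not already present. It never touches the real crontab itself; that only happens if its output is piped into `crontab -`.

```shell
# Hypothetical helper, not part of page_watcher_scraper.

# add_cron_line ENTRY
# Reads a crontab from stdin and writes it to stdout, appending ENTRY
# unless an identical line already exists (so repeated runs are harmless).
add_cron_line() {
    local entry current
    entry=$1
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -F: fixed string (the entry contains '*'), -x: match the whole line.
    if ! printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Example usage (replace /mypath with your own path):
#   crontab -l 2>/dev/null \
#       | add_cron_line "35 14 * * * /bin/bash /mypath/page_watcher_scraper/run.sh" \
#       | crontab -
```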
Average successful run time: 3 minutes
Total run time: about 1 month
Total CPU time used: about 4 hours
Total disk space used: 131 KB