Sunbelt Computer Software

Tutorial Python files for the WGXC DC Python presentation on Scrapy.

Before You Start

Clone the wgxcoders repo. You only need the 'scrapy_tutorial' directory for this lab.
Install Python if you don't have it; Python 3 is preferred.
Create a Python virtual environment and install Scrapy.
- Python with pip. Create the virtual env in your 'scrapy_tutorial' directory.
```
cd /path/to/repo/scrapy_tutorial
python3 -m venv scrapy_venv
source scrapy_venv/bin/activate
pip install -r requirements.txt
```
- Python with conda. Conda puts the virtual env in your conda envs directory.
```
conda create -n scrapy_env python=3.6
conda install -n scrapy_env -c conda-forge scrapy
conda install -n scrapy_env beautifulsoup4
conda install -n scrapy_env python-dateutil
# on older conda versions use: source activate scrapy_env
conda activate scrapy_env
```
- Note: Scrapy takes a while to install because it has many dependencies. See https://docs.scrapy.org/en/latest/intro/install.html if you run into issues.
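Once the environment is active, a quick stdlib-only check can confirm that the packages are importable. This is a minimal sketch: the import names (`scrapy`, `bs4` for beautifulsoup4, `dateutil` for python-dateutil) come from the conda commands above, and if the repo's requirements.txt lists other packages you would add them (its exact contents are an assumption here). Running `scrapy version` on the command line is another quick check.

```python
import importlib.util

# Import names for the packages installed above (assumption: add any
# others that requirements.txt lists).
REQUIRED = ["scrapy", "bs4", "dateutil"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All tutorial dependencies found.")
```

`find_spec` only locates a module without importing it, so the check is fast and does not run any package code.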

The spiders in this repo's WGXC directory are numbered to show a rough order for working through them.

Let's make the shell of our first spider

```
cd WGXC_tutorial
scrapy genspider meetup_WGXC www.meetup.com/Women-Who-Code-DC/
```

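Move into the inner 'WGXC_tutorial' directory, then run a spider and save its scraped items as JSON (`<my_spider>` is the spider's `name`, e.g. meetup_WGXC):

```
cd WGXC_tutorial
scrapy crawl <my_spider> -o <my_spider>.json
```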
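After the crawl, the items can be read back in plain Python. This sketch assumes a `.json` feed holds a single JSON array of items and a `.jl`/`.jsonl` feed holds one JSON object per line; the file name `meetup_WGXC.json` is only an example.

```python
import json
from pathlib import Path


def load_items(path):
    """Load items from a Scrapy JSON or JSON Lines feed file."""
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        if path.suffix in (".jl", ".jsonl"):
            # JSON Lines: one JSON object per non-empty line
            return [json.loads(line) for line in f if line.strip()]
        # Plain .json feed: a single JSON array of items
        return json.load(f)


if __name__ == "__main__":
    feed = Path("meetup_WGXC.json")  # hypothetical: whatever you passed to -o
    if feed.exists():
        items = load_items(feed)
        print(f"{len(items)} items loaded")
        for item in items[:5]:
            print(item)
    else:
        print(f"{feed} not found; run the crawl first")
```

Note that `-o` appends to an existing file, so crawling twice into the same `.json` file leaves it as invalid JSON. Delete the file first, write JSON Lines instead (for example `-o <my_spider>.jl`), or, in recent Scrapy versions, use `-O` to overwrite.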