# Parking Requirement Database

(For database credentials, place a `database.ini` file in the repo directory on your machine.)

Current development:

- Web scraping/crawling
- ORM database management
- PostgreSQL database (currently hosted on Supabase)
- Wiki documentation

Future development:

- NLP
- Website
- Aggregation and other data analysis

## For Windows users

Because the project uses scrapy-playwright (to load JavaScript elements when scraping with Scrapy), we recommend installing WSL/Ubuntu to run the Scrapy spider.

However, you may still want a conda environment on Windows itself for quick debugging and development in your IDE (e.g., VSCode, PyCharm).

Other utilities to consider include this for commands like `view(response)` in the Scrapy shell.

There may be more dependencies to install, including:

- `playwright install-deps`
- etc.

## Installing the conda environment

1. Open Anaconda Prompt.
2. Create the conda environment from the environment file, which lists all the necessary libraries and packages (including ipykernel):

   ```
   (base) > conda env create -f environment-[os].yml
   ```

   NOTE: the environment file will need to be updated whenever we add packages.
3. Main packages:
   - SQLAlchemy 2.0
   - Camelot
   - Selenium
   - Beautiful Soup 4
   - ipykernel
   - lxml
   - html5lib
   - pandas, numpy
   - scrapy
   - scrapy-playwright
4. Use the following command to update `environment.yml`:

   ```
   conda env export > environment.yml
   ```
    

## Web scraping/crawling

(We are currently migrating from Selenium/BeautifulSoup to Scrapy.)

1. From the web_crawling folder (the one containing settings.py and an inner folder also called web_crawling), run the spider; a minimal sketch of how it is wired up follows this list:

   ```
   scrapy crawl munispider
   ```

2. Check out the wiki for updated information.
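For reference, this is roughly how a scrapy-playwright spider is wired up. It is a minimal sketch, not the actual munispider implementation: the start URL, selector, and item fields below are illustrative assumptions.

```python
# settings.py -- route requests through Playwright so JS-rendered pages load.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

```python
import scrapy


class MuniSpider(scrapy.Spider):
    name = "munispider"

    def start_requests(self):
        # Hypothetical start URL; the real one lives in the repo's spider.
        yield scrapy.Request(
            "https://example.com/municipal-code",
            meta={"playwright": True},  # render with Playwright before parsing
        )

    def parse(self, response):
        # response now contains the JS-rendered HTML.
        yield {"title": response.css("title::text").get()}
```

Requests without `meta={"playwright": True}` still go through Scrapy's normal downloader, so Playwright rendering can be enabled per request.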

## Extra installation step for Jupyter Notebook

1. Open Anaconda Prompt.
2. Install nb_conda_kernels in the base environment; it lets you access conda environments from Jupyter Notebook (as long as ipykernel is installed in them):

   ```
   (base) > conda install nb_conda_kernels
   ```

3. When running a .ipynb, switch the kernel to "Python [conda env: db_env]".
4. A quick test to make sure the environment/kernel is working:

   ```python
   import sqlalchemy
   sqlalchemy.__version__
   # >> '2.0.12'
   ```

## Inserting into database

1. Download the Chrome driver and update its path in get_html() in scraper_functions.py (a sketch of that setup follows this list).
2. The main function is create_csv_url() in data_processing.py:

   ```python
   from data_processing import *
   create_csv_url("CA", "Los Angeles County", [insert url], [insert table number])   # web-scrape
   create_csv_url("CA", "Los Angeles County")   # if the csv already exists
   ```

3. Follow the prompts.
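For orientation, here is a sketch of what the driver setup inside get_html() might look like, assuming Selenium 4's Service API; the actual function in scraper_functions.py may differ.

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service


def get_html(url: str) -> str:
    # Point this at the Chrome driver you downloaded (path is machine-specific).
    service = Service("/path/to/chromedriver")
    driver = webdriver.Chrome(service=service)
    try:
        driver.get(url)
        return driver.page_source  # raw HTML for downstream parsing
    finally:
        driver.quit()
```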

## Reading database

1. The main function is read_database() in database_functions.py:

   ```python
   from database_functions import *
   read_database()
   ```

2. Follow the prompts.
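If you want to poke at the database directly, here is a minimal sketch of building a SQLAlchemy 2.0 engine from `database.ini`; the section and key names below are assumptions about that file's layout, not its documented format.

```python
from configparser import ConfigParser

from sqlalchemy import create_engine, text

parser = ConfigParser()
parser.read("database.ini")
cfg = parser["postgresql"]  # assumed section name in database.ini

engine = create_engine(
    f"postgresql://{cfg['user']}:{cfg['password']}@{cfg['host']}:{cfg['port']}/{cfg['database']}"
)

with engine.connect() as conn:
    # Quick sanity check that the credentials work.
    print(conn.execute(text("SELECT version()")).scalar())
```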

## Reading PDFs

1. The main function is read_pdf() in scraper_functions.py.
2. Install Ghostscript for your OS (Camelot depends on it).
3. Run the function with parameters (see the sketch below).
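Since the stack includes Camelot, the underlying call presumably resembles the sketch below; the file name and page range are placeholders, and read_pdf() in this repo may add its own processing on top.

```python
import camelot

# Extract tables from page 1 of a (hypothetical) ordinance PDF.
tables = camelot.read_pdf("ordinance.pdf", pages="1")
print(tables[0].df)  # each extracted table is exposed as a pandas DataFrame
```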