#382 import loop
zganger committed Jun 26, 2024
1 parent 60cd1e7 commit 6f7e084
Showing 21 changed files with 168 additions and 44 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -32,7 +32,7 @@ This method uses Docker to run the complete application stack.
> **Note**
> When running locally, you may need to update one of the ports in the `.env` file if it conflicts with another application on your machine.
3. Build and run the project with `docker-compose build && docker-compose up -d && docker-compose logs -f`
3. Build and run the project with `docker compose build && docker compose up -d && docker compose logs -f`

## Installation (Frontend Only)

@@ -57,15 +57,15 @@ You'll need to replace `police-data-trust-api-1` with the name of the container
docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c0cf******** police-data-trust-api "/bin/sh -c '/wait &…" About a minute ago Up About a minute 0.0.0.0:5001->5001/tcp police-data-trust-api-1
5e6f******** postgres:16.1 "docker-entrypoint.s…" 3 days ago Up About a minute 0.0.0.0:5432->5432/tcp police-data-trust-db-1
5e6f******** postgres:16 "docker-entrypoint.s…" 3 days ago Up About a minute 0.0.0.0:5432->5432/tcp police-data-trust-db-1
dacd******** police-data-trust-web "docker-entrypoint.s…" 3 days ago Up About a minute 0.0.0.0:3000->3000/tcp police-data-trust-web-1
```

### Backend Tests

The current backend tests can be found in the GitHub Actions workflow file [python-tests.yml](https://github.com/codeforboston/police-data-trust/blob/0488d03c2ecc01ba774cf512b1ed2f476441948b/.github/workflows/python-tests.yml)

To run the tests locally, first start the application with docker-compose. Then open up a command line interface to the running container:
To run the tests locally, first start the application with docker compose. Then open up a command line interface to the running container:

```
docker exec -it "police-data-trust-api-1" /bin/bash
@@ -82,7 +82,7 @@ python -m pytest

The current frontend tests can be found in the GitHub Actions workflow file [frontend-checks.yml](https://github.com/codeforboston/police-data-trust/blob/0488d03c2ecc01ba774cf512b1ed2f476441948b/.github/workflows/frontend-checks.yml)

To run the tests locally, first start the application with docker-compose. Then open up a command line interface to the running container:
To run the tests locally, first start the application with docker compose. Then open up a command line interface to the running container:

```
docker exec -it "police-data-trust-web-1" /bin/bash
2 changes: 1 addition & 1 deletion backend/Dockerfile.cloud
@@ -15,7 +15,7 @@ RUN arch=$(arch) && \
file=pandas-2.2.2-cp312-cp312-manylinux_2_17_${arch}.manylinux2014_${arch}.whl && \
url="https://pypi.debian.net/pandas/${file}" && \
wget ${url} && \
sed -i "s/pandas==1.5.3/${file}/" prod.txt
sed -i "s/pandas==2.2.2/${file}/" prod.txt
RUN pip install --no-cache-dir -r prod.txt

COPY . .
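
The effect of the `sed` pattern change can be illustrated with a small Python sketch. It assumes `prod.txt` pins `pandas==2.2.2` (implied by the new pattern, not shown in this diff) and an `x86_64` architecture; the `flask` pin and the helper name `pin_to_wheel` are hypothetical and exist only for illustration.

```python
import re


def pin_to_wheel(requirements: str, pinned: str, wheel_file: str) -> str:
    """Mimic `sed "s/<pinned>/<wheel_file>/"`: replace the first match on each line."""
    pattern = re.compile(pinned)  # as in sed, "." matches any character
    lines = requirements.split("\n")
    # A function replacement keeps the wheel file name literal.
    return "\n".join(pattern.sub(lambda _m: wheel_file, line, count=1) for line in lines)


# Sample data (assumed, not taken from the repository).
wheel = "pandas-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
prod = "flask==3.0.3\npandas==2.2.2\n"

# The new pattern matches the pin and swaps in the downloaded wheel.
print(pin_to_wheel(prod, "pandas==2.2.2", wheel))

# The old pattern matches nothing, so the file is left unchanged.
print(pin_to_wheel(prod, "pandas==1.5.3", wheel) == prod)
```

Under these assumptions, the old pattern would silently leave the requirements file untouched, and `pip` would then resolve pandas from the index instead of using the downloaded wheel.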
3 changes: 3 additions & 0 deletions backend/api.py
@@ -33,6 +33,9 @@ def create_app(config: Optional[str] = None):
    # def _():
    #     db.create_all()

    # start background processor for SQS imports


    return app
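
The placeholder comment does not say how the background processor would be started. One possibility, sketched here purely as an assumption, is a daemon thread launched from `create_app`; the helper `start_background` is hypothetical and not part of this commit.

```python
import threading
from typing import Callable


def start_background(target: Callable[[], None], name: str = "sqs-importer") -> threading.Thread:
    """Run `target` on a daemon thread so it does not block application startup."""
    # daemon=True lets the interpreter exit even while the polling loop is running.
    thread = threading.Thread(target=target, name=name, daemon=True)
    thread.start()
    return thread
```

With something like this, `create_app` could call `start_background(importer.run)` for an `Importer` instance. Two caveats apply: each worker process would start its own poller, and because `import` is a reserved word, `from backend.import.loop import Importer` is a syntax error, so the module would have to be loaded with `importlib.import_module("backend.import.loop")` unless the package is renamed.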


Empty file added backend/import/__init__.py
50 changes: 50 additions & 0 deletions backend/import/loop.py
@@ -0,0 +1,50 @@
from io import BytesIO
from logging import getLogger
from time import sleep

import boto3
import ujson


class Importer:
    def __init__(self, queue_name: str, region: str = "us-east-1"):
        self.queue_name = queue_name
        self.session = boto3.Session(region_name=region)
        self.sqs_client = self.session.client("sqs")
        self.s3_client = self.session.client("s3")
        # get_queue_url returns a response dict; the URL itself is under "QueueUrl".
        self.sqs_queue_url = self.sqs_client.get_queue_url(QueueName=self.queue_name)["QueueUrl"]
        self.logger = getLogger(self.__class__.__name__)

    def run(self):
        while True:
            resp = self.sqs_client.receive_message(
                QueueUrl=self.sqs_queue_url,
                # Retrieve one message at a time. We could raise this and
                # parallelize, but there is no point until there are many more files.
                MaxNumberOfMessages=1,
                # 10 minutes to process the message before it becomes visible to another consumer.
                VisibilityTimeout=600,
            )
            # The "Messages" key is absent when no message was received.
            messages = resp.get("Messages", [])
            # If no messages were found, wait 10 minutes before the next poll.
            if not messages:
                sleep(600)
                continue

            for message in messages:
                sqs_body = ujson.loads(message["Body"])
                for record in sqs_body["Records"]:  # this comes through as a list, but we expect one object
                    bucket_name = record["s3"]["bucket"]["name"]
                    key = record["s3"]["object"]["key"]
                    with BytesIO() as fileobj:
                        self.s3_client.download_fileobj(bucket_name, key, fileobj)
                        fileobj.seek(0)
                        content = fileobj.read()

                    # TODO: we now have an in-memory copy of the S3 file content. This is where we would run the import.
                    # We want a standardized importer class; we would call something like below:
                    # loader = Loader(content).load()

                    self.logger.info(f"Imported s3://{bucket_name}/{key}")


class Loader:
    def __init__(self, content: bytes):
        self.content = content

    def load(self):
        raise NotImplementedError("unimplemented; extend this class to write a load migration.")
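
The loop above never deletes a message, so each one would become visible again after the visibility timeout. The sketch below shows one way a single polling step could look, assuming the queue receives standard S3 event notifications. The helper names `iter_s3_objects` and `poll_once` are hypothetical, and the standard `json` module stands in for `ujson` to keep the example self-contained.

```python
import json
from typing import Callable, Iterator, Tuple
from urllib.parse import unquote_plus


def iter_s3_objects(sqs_body: str) -> Iterator[Tuple[str, str]]:
    """Yield (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(sqs_body)
    # S3 test events carry no "Records" key, so default to an empty list.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def poll_once(sqs_client, queue_url: str, handle: Callable[[str, str], None]) -> int:
    """Receive at most one message, process it, then delete it.

    Returns the number of messages handled (0 when the queue was empty).
    """
    resp = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        VisibilityTimeout=600,
    )
    # The "Messages" key is absent when nothing was received.
    messages = resp.get("Messages", [])
    for message in messages:
        for bucket, key in iter_s3_objects(message["Body"]):
            handle(bucket, key)
        # Deleting only after handling succeeds stops the message from
        # reappearing once the visibility timeout expires.
        sqs_client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
    return len(messages)
```

Here `sqs_client` is expected to be a boto3 SQS client (or any object with the same `receive_message` and `delete_message` methods), and `handle` would wrap the download and `Loader` call. If `handle` raises, the message is not deleted and is retried after the timeout, which may or may not be the desired behavior.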
6 changes: 3 additions & 3 deletions backend/scraper/data_scrapers/README.md
@@ -14,10 +14,10 @@ You can also run the scraper in Docker:

```bash
# From the base of the repository
docker-compose build api
docker-compose run -u $(id -u) api flask scrape
docker compose build api
docker compose run -u $(id -u) api flask scrape
# Stop the database service
docker-compose down
docker compose down
```

You may see several warnings about mixed types. The script could also take several minutes.
2 changes: 1 addition & 1 deletion backend/scraper/notebooks/cpdp.ipynb
@@ -29,7 +29,7 @@
"\n",
"```bash\n",
"# Stop services and remove volumes, rebuild images, start the database, create tables, run seeds, and follow logs\n",
"docker-compose down -v && docker-compose up --build -d db api && docker-compose logs -f\n",
"docker compose down -v && docker compose up --build -d db api && docker compose logs -f\n",
"```\n",
"\n",
"Then open the notebook with either [VSCode](https://code.visualstudio.com/) or `jupyter notebook`.\n",
2 changes: 1 addition & 1 deletion backend/scraper/notebooks/mpv.ipynb
@@ -23,7 +23,7 @@
"\n",
"```bash\n",
"# Stop services and remove volumes, rebuild images, start the database, create tables, run seeds, and follow logs\n",
"docker-compose down -v && docker-compose up --build -d db api && docker-compose logs -f\n",
"docker compose down -v && docker compose up --build -d db api && docker compose logs -f\n",
"```\n",
"\n",
"Then open the notebook with either [VSCode](https://code.visualstudio.com/) or `jupyter notebook`.\n",
1 change: 0 additions & 1 deletion docker-compose.notebook.yml
@@ -1,4 +1,3 @@
version: "3"
services:
api:
command: bash -c '/wait && flask psql create && flask psql init && jupyter notebook --allow-root --ip=0.0.0.0 --port=8889'
3 changes: 1 addition & 2 deletions docker-compose.yml
@@ -1,7 +1,6 @@
version: "3"
services:
db:
image: postgres:16.2 #AWS RDS latest version
image: postgres:16 #AWS RDS latest version
env_file:
- ".env"
volumes:
6 changes: 3 additions & 3 deletions requirements/Dockerfile
@@ -2,7 +2,7 @@
# requirements, so this image starts with the same image as the database
# containers and installs the same version of python as the api containers

FROM postgres:16.2 as base
FROM postgres:16 as base

RUN apt-get update && apt-get install -y \
make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev \
@@ -15,9 +15,9 @@ SHELL ["bash", "-lc"]
RUN curl https://pyenv.run | bash && \
echo 'export PATH="$HOME/.pyenv/shims:$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc

ENV PYTHON_VERSION=3.12.3
ENV PYTHON_VERSION=3.12.4
RUN pyenv install ${PYTHON_VERSION} && pyenv global ${PYTHON_VERSION}
RUN pip install pip-tools
RUN pip install -U pip-tools

COPY . requirements/

2 changes: 1 addition & 1 deletion requirements/README.md
@@ -20,7 +20,7 @@ python -m pip install -r requirements/dev_unix.txt

```bash
cd requirements
docker-compose up --build --force-recreate
docker compose up --build --force-recreate
```

If you run the application natively, first install the pip-compile tool:
4 changes: 3 additions & 1 deletion requirements/_core.in
@@ -1,5 +1,6 @@
bcrypt==3.2.2
black
boto3
celery
flake8
flask
@@ -35,4 +36,5 @@ numpy
spectree
jupyter
mixpanel
ua-parser
ua-parser
ujson
16 changes: 16 additions & 0 deletions requirements/dev_unix.txt
@@ -44,6 +44,12 @@ bleach==6.1.0
# via nbconvert
blinker==1.7.0
# via flask-mail
boto3==1.34.133
# via -r requirements/_core.in
botocore==1.34.133
# via
# boto3
# s3transfer
build==1.2.1
# via pip-tools
celery==5.3.6
@@ -186,6 +192,10 @@ jinja2==3.1.3
# jupyterlab
# jupyterlab-server
# nbconvert
jmespath==1.0.1
# via
# boto3
# botocore
json5==0.9.25
# via jupyterlab-server
jsonpointer==2.4
@@ -405,6 +415,7 @@ pytest-postgresql==5.1.0
python-dateutil==2.9.0
# via
# arrow
# botocore
# celery
# jupyter-client
# pandas
@@ -451,6 +462,8 @@ rpds-py==0.18.0
# via
# jsonschema
# referencing
s3transfer==0.10.2
# via boto3
send2trash==1.8.2
# via jupyter-server
six==1.16.0
@@ -528,10 +541,13 @@ tzdata==2024.1
# pandas
ua-parser==0.18.0
# via -r requirements/_core.in
ujson==5.10.0
# via -r requirements/_core.in
uri-template==1.3.0
# via jsonschema
urllib3==1.26.18
# via
# botocore
# mixpanel
# requests
vine==5.1.0
16 changes: 16 additions & 0 deletions requirements/dev_windows.txt
@@ -44,6 +44,12 @@ bleach==6.1.0
# via nbconvert
blinker==1.7.0
# via flask-mail
boto3==1.34.133
# via -r requirements/_core.in
botocore==1.34.133
# via
# boto3
# s3transfer
build==1.2.1
# via pip-tools
celery==5.3.6
@@ -186,6 +192,10 @@ jinja2==3.1.3
# jupyterlab
# jupyterlab-server
# nbconvert
jmespath==1.0.1
# via
# boto3
# botocore
json5==0.9.25
# via jupyterlab-server
jsonpointer==2.4
@@ -405,6 +415,7 @@ pytest-postgresql==5.1.0
python-dateutil==2.9.0
# via
# arrow
# botocore
# celery
# jupyter-client
# pandas
@@ -451,6 +462,8 @@ rpds-py==0.18.0
# via
# jsonschema
# referencing
s3transfer==0.10.2
# via boto3
send2trash==1.8.2
# via jupyter-server
six==1.16.0
@@ -528,10 +541,13 @@ tzdata==2024.1
# pandas
ua-parser==0.18.0
# via -r requirements/_core.in
ujson==5.10.0
# via -r requirements/_core.in
uri-template==1.3.0
# via jsonschema
urllib3==1.26.18
# via
# botocore
# mixpanel
# requests
vine==5.1.0
1 change: 0 additions & 1 deletion requirements/docker-compose.yml
@@ -1,4 +1,3 @@
version: "3"
services:
pip-compile:
build: