Dockerfile improvements
Upgrade Django and djangorestframework to reduce the number of known CVEs.
chosak committed Aug 5, 2024
1 parent 81415d4 commit 9831d3a
Showing 3 changed files with 60 additions and 8 deletions.
Dockerfile (18 changes: 12 additions, 6 deletions)
@@ -6,13 +6,12 @@ ENV LANG en_US.UTF-8
 # Disable pip cache dir.
 ENV PIP_NO_CACHE_DIR 1

+# Allow pip install as root.
+ENV PIP_ROOT_USER_ACTION ignore
+
 # Stops Python default buffering to stdout, improving logging to the console.
 ENV PYTHONUNBUFFERED 1

-# Define app home and workdir.
-ENV APP_HOME /usr/src/app
-WORKDIR $APP_HOME
-
 # Create a non-root user for the container.
 ARG USERNAME=app
 ARG USER_UID=1000
@@ -24,9 +23,12 @@ RUN addgroup \
 --uid $USER_UID \
 --ingroup $USERNAME \
 --disabled-password \
---no-create-home \
 $USERNAME

+# Define app home and workdir.
+ENV APP_HOME /home/$USERNAME
+WORKDIR $APP_HOME
+
 # Copy the whole project except for what is in .dockerignore.
 COPY --chown=$USERNAME:$USERNAME . .

@@ -53,6 +55,11 @@ RUN set -eux; \
 ; \
 pip install -U pip; \
 pip install --no-cache-dir -r requirements/base.txt; \
+# Remove keys that aren't needed by the application but would be
+# flagged as a vulnerability by our Docker image scanner.
+rm /usr/local/lib/python3.12/site-packages/tornado/test/test.key; \
+rm /usr/local/lib/python3.12/site-packages/wpull/proxy/proxy.key; \
+rm /usr/local/lib/python3.12/site-packages/wpull/testing/test.pem; \
 apk del .backend-deps

 # Build the frontend.
@@ -71,7 +78,6 @@ RUN set -eux; \
 rm -rf ./node_modules; \
 apk del .frontend-deps

-
 # Run the application with the user we created.
 USER $USERNAME

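The Dockerfile change above deletes three bundled key and certificate files by hard-coded path. Before adding more `rm` lines like these, it can help to list which such files an installed environment actually ships. The following is a minimal Python sketch, not part of this repository: `find_bundled_keys` is a hypothetical helper, and it assumes the image scanner flags files by their `.key` or `.pem` extension (real scanners may match on file contents instead).

```python
"""Sketch: list key/certificate files bundled inside installed Python packages.

Assumption: the image scanner flags files by their .key/.pem extension.
Real scanners may match on file contents instead.
"""
import sysconfig
from pathlib import Path

# Hypothetical set of suffixes, mirroring the files removed in the Dockerfile.
SUSPECT_SUFFIXES = {".key", ".pem"}


def find_bundled_keys(root: Path) -> list[Path]:
    """Return every file under root whose suffix is in SUSPECT_SUFFIXES, sorted."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in SUSPECT_SUFFIXES
    )


if __name__ == "__main__":
    # "purelib" is the site-packages directory of the running interpreter.
    site_packages = Path(sysconfig.get_paths()["purelib"])
    for path in find_bundled_keys(site_packages):
        print(path)
```

Run inside the built image, `purelib` should resolve to the `/usr/local/lib/python3.12/site-packages` directory that the `rm` lines target, so the output would show whether any other packages bundle similar files.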
README.md (46 changes: 46 additions, 0 deletions)
@@ -23,6 +23,8 @@ and export results as CSV or JSON reports.

 ## Crawling a website

+### Using a Python virtual environment
+
 Create a Python virtual environment and install required packages:

 ```
@@ -37,6 +39,23 @@ Crawl a website:
 ./manage.py crawl https://www.consumerfinance.gov crawl.sqlite3
 ```

+### Using Docker
+
+To build the Docker image:
+
+```
+docker build -t website-indexer:main .
+```
+
+Crawl a website:
+
+```
+docker run -it \
+-p 8000:8000 \
+-v `pwd`:/data website-indexer:main \
+python manage.py crawl https://www.consumerfinance.gov /data/crawl.sqlite3
+```
+
 ## Searching the crawl database

 You can use the
@@ -129,6 +148,8 @@ sqlite> SELECT url FROM crawler_page WHERE html LIKE "%<br>%" ORDER BY URL asc;

 ## Running the viewer application

+### Using a Python virtual environment
+
 From the repo's root, compile frontend assets:

 ```
@@ -165,6 +186,31 @@ Finally, run the Django webserver:

 The viewer application will be available locally at http://localhost:8000.

+### Using Docker
+
+To build the Docker image:
+
+```
+docker build -t website-indexer:main .
+```
+
+To run the image using sample data:
+
+```
+docker run -it -p 8000:8000 website-indexer:main
+```
+
+To run the image using a local database dump:
+
+```
+docker run \
+-it \
+-p 8000:8000 \
+-v /path/to/local/dump:/data \
+-e CRAWL_DATABASE=/data/crawl.sqlite3 \
+website-indexer:main
+```
+
 ## Development

 ### Testing
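The README changes above document crawling into, and serving from, a SQLite file mounted at `/data/crawl.sqlite3`. The README's existing `sqlite>` query against `crawler_page` can also be run from Python on that file. This is a minimal sketch, not part of the repository: `pages_containing` is a hypothetical helper, and it assumes only the `crawler_page` table with `url` and `html` columns that the README's query uses.

```python
"""Sketch: run the README's crawler_page LIKE query from Python.

Assumes the crawl database has a crawler_page table with url and html
columns, as used by the README's sqlite> example.
"""
import sqlite3
from contextlib import closing


def pages_containing(db_path: str, fragment: str) -> list[str]:
    """Return URLs of crawled pages whose HTML contains fragment, sorted by URL."""
    # Bind the LIKE pattern as a parameter instead of formatting it into the SQL.
    # SQLite's LIKE is case-insensitive for ASCII, and any % or _ inside
    # fragment still acts as a wildcard.
    pattern = f"%{fragment}%"
    # closing() ensures the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT url FROM crawler_page WHERE html LIKE ? ORDER BY url ASC",
            (pattern,),
        ).fetchall()
    return [url for (url,) in rows]
```

For example, `pages_containing("/data/crawl.sqlite3", "<br>")` would mirror the README's `<br>` query against a database mounted as in the Docker instructions.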
requirements/base.txt (4 changes: 2 additions, 2 deletions)
@@ -1,12 +1,12 @@
 beautifulsoup4==4.12.2
 click==8.0.4
 cssselect==1.1.0
-Django==3.2.22
+Django==3.2.25
 django-click==2.3.0
 django-debug-toolbar==3.2.4
 django-filter==21.1
 django-modelcluster==5.3
-djangorestframework==3.13.1
+djangorestframework==3.15.1
 djangorestframework-csv==2.1.1
 whitenoise==5.3.0

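This commit raises the Django pin from 3.2.22 to 3.2.25 and the djangorestframework pin from 3.13.1 to 3.15.1. One way to keep a later edit from silently reverting those pins is a small check over `requirements/base.txt`. The following is a minimal sketch, not part of the repository: `parse_pins` and `below_minimum` are hypothetical helpers, and they assume plain `name==X.Y.Z` pins with purely numeric components (full PEP 440 versions would need `packaging.version` instead).

```python
"""Sketch: check that pinned versions meet a minimum floor.

Hypothetical helpers; assumes simple "name==X.Y.Z" pins with numeric
components only.
"""


def parse_pins(text: str) -> dict[str, tuple[int, ...]]:
    """Map lowercased package name to a version tuple for each name==version line."""
    pins: dict[str, tuple[int, ...]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and non-pinned requirements
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = tuple(int(part) for part in version.strip().split("."))
    return pins


# Floors taken from the upgraded pins in this commit.
MINIMUMS: dict[str, tuple[int, ...]] = {
    "django": (3, 2, 25),
    "djangorestframework": (3, 15, 1),
}


def below_minimum(
    pins: dict[str, tuple[int, ...]],
    minimums: dict[str, tuple[int, ...]] = MINIMUMS,
) -> list[str]:
    """Return sorted names whose pin is missing or lower than its floor."""
    # A missing pin maps to (), which compares lower than any floor.
    return sorted(name for name, floor in minimums.items() if pins.get(name, ()) < floor)
```

Applied to the pins shown in this diff, `below_minimum` returns an empty list for the new versions and reports both `django` and `djangorestframework` for the old ones. Comparing integer tuples keeps `3.2.25` above `3.2.3`, which a plain string comparison would get wrong.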