Add debug setup for inference server & worker #3575

Open · wants to merge 9 commits into `main` · changes from 7 commits
34 changes: 33 additions & 1 deletion .vscode/launch.json
@@ -106,6 +106,38 @@
"CUDA_VISIBLE_DEVICES": "1,2,3,4,5",
"OMP_NUM_THREADS": "1"
}
}
},
{
"name": "Debug: Inference Server",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5678
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}/inference/server",
"remoteRoot": "/opt/inference/server"
}
],
"justMyCode": false
},
{
"name": "Debug: Worker",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5679
Contributor Author: Note the different ports for server and worker

},
"pathMappings": [
{
"localRoot": "${workspaceFolder}/inference/worker",
"remoteRoot": "/opt/inference/worker"
}
],
"justMyCode": false
}
]
}
5 changes: 5 additions & 0 deletions docker-compose.yaml
@@ -231,12 +231,14 @@ services:
TRUSTED_CLIENT_KEYS: "6969"
ALLOW_DEBUG_AUTH: "True"
API_ROOT: "http://localhost:8000"
DEBUG: "True"
Contributor Author: If I understand correctly, the compose file is only meant for local development, so setting this here shouldn't be a problem?

volumes:
- "./oasst-shared:/opt/inference/lib/oasst-shared"
- "./inference/server:/opt/inference/server"
restart: unless-stopped
ports:
- "8000:8000"
- "5678:5678" # Port to attach debugger
depends_on:
inference-redis:
condition: service_healthy
@@ -254,9 +256,12 @@
MODEL_CONFIG_NAME: ${MODEL_CONFIG_NAME:-distilgpt2}
BACKEND_URL: "ws://inference-server:8000"
PARALLELISM: 2
DEBUG: "True"
volumes:
- "./oasst-shared:/opt/inference/lib/oasst-shared"
- "./inference/worker:/opt/inference/worker"
ports:
- "5679:5679" # Port to attach debugger
deploy:
replicas: 1
profiles: ["inference"]
4 changes: 2 additions & 2 deletions docker/inference/Dockerfile.server
@@ -78,8 +78,8 @@ USER ${APP_USER}
VOLUME [ "${APP_BASE}/lib/oasst-shared" ]
VOLUME [ "${APP_BASE}/lib/oasst-data" ]


CMD uvicorn main:app --reload --host 0.0.0.0 --port "${PORT}"
# In the dev image, we start uvicorn from Python so that we can attach the debugger
CMD python main.py



16 changes: 16 additions & 0 deletions inference/README.md
@@ -60,6 +60,22 @@ python __main__.py
# You'll soon see a `User:` prompt, where you can type your prompts.
```

## Debugging

Both the inference server and the worker allow attaching a Python debugger.
To do this from VS Code, start the inference server and worker with Docker Compose as described above
(e.g. `docker compose --profile inference up --build`), then pick one of the following launch
profiles, depending on which component you want to debug:
- Debug: Inference Server
- Debug: Worker

### Waiting for Debugger on Startup
It can be helpful to have the application wait for the debugger to attach before it starts.
This can be achieved by uncommenting `debugpy.wait_for_client()` in the appropriate location (see the sketch below):
- `inference/server/main.py` for the inference server
- `inference/worker/__main__.py` for the worker
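
With the call uncommented, the startup block looks roughly as follows (a minimal sketch based on the server code in this PR; the worker is identical except that it listens on port 5679):

```python
import debugpy

# Listen for an incoming debugger connection (5679 for the worker)
debugpy.listen(("0.0.0.0", 5678))
# Block here until VS Code attaches via the matching launch profile
debugpy.wait_for_client()
```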


## Distributed Testing

We run distributed load tests using the
16 changes: 16 additions & 0 deletions inference/server/main.py
@@ -148,3 +148,19 @@ async def maybe_add_debug_api_keys():
async def welcome_message():
logger.warning("Inference server started")
logger.warning("To stop the server, press Ctrl+C")


if __name__ == "__main__":
    import os

    import uvicorn

    port = int(os.getenv("PORT", "8000"))
    # bool() of any non-empty string is True, so parse the flag explicitly
    is_debug = os.getenv("DEBUG", "False").lower() in ("true", "1")

    if is_debug:
        import debugpy

        # 5678 matches the "Debug: Inference Server" launch profile and the
        # docker-compose port mapping; the worker listens on 5679
        debugpy.listen(("0.0.0.0", 5678))
        # Uncomment to wait here until a debugger is attached
        # debugpy.wait_for_client()

    uvicorn.run("main:app", host="0.0.0.0", port=port, reload=is_debug)
Contributor Author (@0xfacade, Jul 16, 2023): This method of starting the server is only used for development; the Docker image for production still invokes the `uvicorn` command. I could change that to also use `python main.py` instead for consistency, if desired.

1 change: 1 addition & 0 deletions inference/server/requirements.txt
@@ -4,6 +4,7 @@ asyncpg
authlib
beautifulsoup4 # web_retriever plugin
cryptography==39.0.0
debugpy
fastapi-limiter
fastapi[all]==0.88.0
google-api-python-client
10 changes: 10 additions & 0 deletions inference/worker/__main__.py
@@ -4,6 +4,8 @@
import time
from contextlib import closing

import os

import pydantic
import transformers
import utils
@@ -130,4 +132,12 @@ def main():


if __name__ == "__main__":
    # bool() of any non-empty string is True, so parse the flag explicitly
    is_debug = os.getenv("DEBUG", "False").lower() in ("true", "1")

    if is_debug:
        import debugpy

        # 5679 matches the "Debug: Worker" launch profile and the
        # docker-compose port mapping; the server listens on 5678
        debugpy.listen(("0.0.0.0", 5679))
        # Uncomment to wait here until a debugger is attached
        # debugpy.wait_for_client()

    main()
1 change: 1 addition & 0 deletions inference/worker/requirements.txt
@@ -1,4 +1,5 @@
aiohttp
debugpy
hf_transfer
huggingface_hub
langchain==0.0.142