forked from hpcaitech/ColossalAI
[Feature] Add document retrieval QA (hpcaitech#5020)
* add langchain
* add langchain
* Add files via upload
* add langchain
* fix style
* fix style: remove extra space
* add pytest; modified retriever
* add pytest; modified retriever
* add tests to build_on_pr.yml
* fix build_on_pr.yml
* fix build on pr; fix environ vars
* separate unit tests for colossalqa from build from pr
* fix container setting; fix environ vars
* commented dev code
* add incremental update
* remove stale code
* fix style
* change to sha3 224
* fix retriever; fix style; add unit test for document loader
* fix ci workflow config
* fix ci workflow config
* add set cuda visible device script in ci
* fix doc string
* fix style; update readme; refactored
* add force log info
* change build on pr, ignore colossalqa
* fix docstring, capitalize all initial letters
* fix indexing; fix text-splitter
* remove debug code, update reference
* reset previous commit
* update LICENSE, update README, add key-value mode, fix bugs
* add files back
* revert force push
* remove junk file
* add test files
* fix retriever bug, add intent classification
* change conversation chain design
* rewrite prompt and conversation chain
* add ui v1
* ui v1
* fix avatar
* add header
* Refactor the RAG Code and support Pangu
* Refactor the ColossalQA chain to Object-Oriented Programming and the UI demo.
* resolved conversation. tested scripts under examples. web demo still buggy
* fix ci tests
* Some modifications to add ChatGPT api
* modify llm.py and remove unnecessary files
* Delete applications/ColossalQA/examples/ui/test_frontend_input.json
* Remove OpenAI api key
* add colossalqa
* move files
* move files
* move files
* move files
* fix style
* Add Readme and fix some bugs.
* Add something to readme and modify some code
* modify a directory name for clarity
* remove redundant directory
* Correct a type in llm.py
* fix AI prefix
* fix test_memory.py
* fix conversation
* fix some errors and typos
* Fix a missing import in RAG_ChatBot.py
* add colossalcloud LLM wrapper, correct issues in code review

---------

Co-authored-by: YeAnbang <[email protected]>
Co-authored-by: Orion-Zheng <[email protected]>
Co-authored-by: Zian(Andy) Zheng <[email protected]>
Co-authored-by: Orion-Zheng <[email protected]>
1 parent 3acbf6d, commit e53e729
Showing 69 changed files with 6,758 additions and 0 deletions.
@@ -0,0 +1,54 @@
name: Run colossalqa unit tests

on:
  pull_request:
    types: [synchronize, opened, reopened]
    paths:
      - 'applications/ColossalQA/colossalqa/**'
      - 'applications/ColossalQA/requirements.txt'
      - 'applications/ColossalQA/setup.py'
      - 'applications/ColossalQA/tests/**'
      - 'applications/ColossalQA/pytest.ini'

jobs:
  tests:
    name: Run colossalqa unit tests
    if: |
      github.event.pull_request.draft == false &&
      github.base_ref == 'main' &&
      github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
    runs-on: [self-hosted, gpu]
    container:
      image: hpcaitech/pytorch-cuda:1.12.0-11.3.0
      volumes:
        - /data/scratch/test_data_colossalqa:/data/scratch/test_data_colossalqa
        - /data/scratch/llama-tiny:/data/scratch/llama-tiny
      options: --gpus all --rm
    timeout-minutes: 30
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout ColossalAI
        uses: actions/checkout@v2

      - name: Install colossalqa
        run: |
          cd applications/ColossalQA
          pip install -e .
      - name: Execute Unit Testing
        run: |
          cd applications/ColossalQA
          pytest tests/
        env:
          NCCL_SHM_DISABLE: 1
          MAX_JOBS: 8
          ZH_MODEL_PATH: bigscience/bloom-560m
          ZH_MODEL_NAME: bloom
          EN_MODEL_PATH: bigscience/bloom-560m
          EN_MODEL_NAME: bloom
          TEST_DATA_PATH_EN: /data/scratch/test_data_colossalqa/companies.txt
          TEST_DATA_PATH_ZH: /data/scratch/test_data_colossalqa/companies_zh.txt
          TEST_DOCUMENT_LOADER_DATA_PATH: /data/scratch/test_data_colossalqa/tests/*
          SQL_FILE_PATH: /data/scratch/test_data_colossalqa/sql_file_path
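
The `env` block of the last step hands model names and data paths to pytest through environment variables. How the tests consume them is not shown in this diff; the following is a minimal sketch of one way a test module might read them. Only the variable names come from the workflow; `CI_VARS` and `read_ci_env` are hypothetical names introduced for illustration.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Variable names copied from the workflow's `env` block.
CI_VARS: Tuple[str, ...] = (
    "ZH_MODEL_PATH",
    "ZH_MODEL_NAME",
    "EN_MODEL_PATH",
    "EN_MODEL_NAME",
    "TEST_DATA_PATH_EN",
    "TEST_DATA_PATH_ZH",
    "TEST_DOCUMENT_LOADER_DATA_PATH",
    "SQL_FILE_PATH",
)


def read_ci_env(
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: split CI_VARS into (found values, missing names)."""
    env = os.environ if environ is None else environ
    found = {name: env[name] for name in CI_VARS if name in env}
    missing = [name for name in CI_VARS if name not in env]
    return found, missing


if __name__ == "__main__":
    found, missing = read_ci_env()
    for name, value in found.items():
        print(f"{name}={value}")
    if missing:
        print("not set:", ", ".join(missing))
```

Outside CI, where the variables are usually unset, a test could use the `missing` list to skip itself instead of failing on a `KeyError`.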
@@ -0,0 +1,152 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
docs/.build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# IDE
.idea/
.vscode/

# macos
*.DS_Store
#data/

docs/.build

# pytorch checkpoint
*.pt

# sql
*.db

# wandb log
example/wandb/
example/ui/gradio/
example/vector_db_for_test
examples/awesome-chatgpt-prompts/
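
To get a rough idea of which generated files the simpler patterns in this ignore file cover, a check can be sketched with Python's `fnmatch`. This is a simplified approximation, not git's matcher: it assumes POSIX-style paths, handles only slash-free globs such as `*.pt` and directory names such as `__pycache__/`, and skips anchored patterns such as `example/wandb/` as well as negation and `**`. `PATTERNS` is a small sample of the list above, and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable

# A few slash-free patterns copied from the ignore file above.
PATTERNS = ["__pycache__/", "*.py[cod]", "*.pt", "*.db", "*.log", ".idea/"]


def is_ignored(path: str, patterns: Iterable[str] = PATTERNS) -> bool:
    """Approximate check: does any simple pattern match a component of `path`?"""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        dir_only = pattern.endswith("/")
        name = pattern.rstrip("/")
        if "/" in name:
            # Anchored patterns are out of scope for this sketch.
            continue
        # A trailing slash restricts the pattern to directories, so only the
        # parent components of the path are candidates in that case.
        candidates = parts[:-1] if dir_only else parts
        if any(fnmatchcase(part, name) for part in candidates):
            return True
    return False


if __name__ == "__main__":
    for sample in ("colossalqa/__pycache__/llm.cpython-310.pyc",
                   "checkpoints/model.pt",
                   "colossalqa/retriever.py"):
        print(sample, is_ignored(sample))
```

For authoritative results on a real checkout, `git check-ignore -v <path>` reports the pattern that matches a given path.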