Babak/upgrade triton to v2.34.0 #4
Conversation
…ton-inference-server#4921) * Add testing for empty gpu output tensor with cuda device setting * Fix up
fix broken links in documentation and add tests for backend
* Add sphinx support to build documentation
* added stable diffusion example
…-inference-server#4948) * Check if the format of override model config has extra fields * Revert previous changes * Check the server log for override config error message * Update test script * Fix up typo
* Update trace example in documentation * Add brief description on new trace options
Change metrics http content type from "text/plain" to "text/plain; charset=utf-8".
* Autoformat and add copyrights * Add copyright
…ference-server#4982) * Switch to symlink * Removing straight version on filenames, all these files are covered with 'SymLinks' and 'alternatives'
…riton-inference-server#4950) * Add test for cloud path outside model dir * Add test for absolute cloud path * Group py version strings
…n-inference-server#4962) * Add testing for loading no-autofill model with config override * Address comment * Modify the error message based on the new changes on the core side
* Add trt kUINT8 doc * Add trt uint8 to L0_infer * Expand uint8 test with other types
…p… (triton-inference-server#4991) * Update the limitation of multiple server binding to the same http/grpc port * Address comment * Address comment
…#4789) * Add testing for sequence and ensemble models * Add testing for C-API using system/CUDA memory * review edit
* skipping JAX example test
* Fix the copyrights in the files * Fix ups
* Add checks for trt uint8 support * Unify trt uint8 support check
Add fastertransformer test that uses 1 GPU.
* Don't use mem probe in Jetson * Clarify failure messages in L0_backend_python * Update copyright * Add JIRA ref, fix _test_jetson
* Add testing for python custom metrics API * Add custom metrics example to the test * Fix for CodeQL report * Fix test name * Address comment * Add logger and change the enum usage
* Add HTTP client plugin test * Add testing for HTTP asyncio * Add async plugin support * Fix qa container for L0_grpc * Add testing for grpc client plugin * Remove unused imports * Fix up * Fix L0_grpc models QA folder * Update the test based on review feedback * Remove unused import * Add testing for .plugin method
* Add --metrics-address, add tests to L0_socket, re-order CLI options for consistency * Use non-localhost address
…ence-server#5739) * Add HTTP basic auth test * Add testing for gRPC basic auth * Fix up * Remove unused imports
…nce-server#5550) * Add multi-gpu, multi-stream testing for dlpack tensors
* Update python and conda version * Update CMAKE installation * Update checksum version * Update ubuntu base image to 22.04 * Use ORT 1.15.0 * Set CMAKE to pull latest version * Update libre package version * Removing unused argument * Adding condition for ubuntu 22.04 * Removing installation of the package from the devel container * Nnshah1 u22.04 (triton-inference-server#5770) * Update CMAKE installation * Update python and conda version * Update CMAKE installation * Update checksum version * Update ubuntu base image to 22.04 * updating versions for ubuntu 22.04 * remove re2 --------- Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> * Set ONNX version to 1.13.0 * Fix L0_custom_ops for ubuntu 22.04 (triton-inference-server#5775) * add back rapidjson-dev --------- Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: nv-kmcgill53 <[email protected]>
…-server#5796) (1) reduce MAX_ALLOWED_ALLOC to be more strict for bounded tests, and generous for unbounded tests. (2) allow unstable measurement from PA. (3) improve logging for future triage
…inference-server#5801) * Add note on --metrics-address * Copyright Co-authored-by: Ryan McCormick <[email protected]>
* working thread * remove default install of blinker * merge issue fixed
* Fix L0_backend_python/env test * Address comment * Update the copyright * Fix up
* installing python 3.8.16 for test * spelling Co-authored-by: Neelay Shah <[email protected]> * use util functions to install python3.8 in an easier way --------- Co-authored-by: Neelay Shah <[email protected]>
…r#5876) * Update README and add RELEASE notes for 23.05 * Update README and add RELEASE notes for 23.05 * Update README and add RELEASE notes for 23.05 * Update README and add RELEASE notes for 23.05 * Update README and add RELEASE notes for 23.05
    # },
    "use_edit_page_button": False,
    "use_issues_button": True,
    "use_repository_button": True,

Code scanning / CodeQL warning (documentation): Duplicate key in dict literal; the earlier value is overwritten.
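CodeQL flags repeated keys because a Python dict literal silently keeps only the last value for a key; the earlier entry is dead code. Below is a minimal sketch of the behavior and the fix; the html_theme_options name and the key values are assumptions for illustration, not the actual conf.py contents:

    # Duplicate key: Python evaluates both entries, but the dict keeps
    # only the last one, so the first value is silently discarded.
    html_theme_options = {
        "use_issues_button": False,  # dead: overwritten below
        "use_repository_button": True,
        "use_issues_button": True,   # this value wins
    }
    assert html_theme_options["use_issues_button"] is True

    # Fix: one entry per key, so the intended value is unambiguous.
    html_theme_options = {
        "use_issues_button": True,
        "use_repository_button": True,
    }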
"This protocol is restricted"): | ||
raise user_data._completed_requests.get() | ||
|
||
self.assertTrue(user_data._completed_requests.empty()) |
Check warning
Code scanning / CodeQL
Unreachable code Warning
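The warning means the assertion can never execute: it sits after an unconditional raise. Moving it past the with block fixes that, because assertRaisesRegex consumes the expected exception and control resumes there. A self-contained sketch with a stand-in exception class (the real test uses tritonclient's InferenceServerException):

    import queue
    import unittest

    class RestrictedProtocolError(Exception):
        """Stand-in for tritonclient's InferenceServerException."""

    class ProtocolTest(unittest.TestCase):
        def test_restricted_protocol(self):
            completed_requests = queue.Queue()
            completed_requests.put(
                RestrictedProtocolError("This protocol is restricted"))
            with self.assertRaisesRegex(RestrictedProtocolError,
                                        "This protocol is restricted"):
                raise completed_requests.get()
            # Reachable: the with-block swallowed the expected exception,
            # so execution continues here instead of being cut off.
            self.assertTrue(completed_requests.empty())

    if __name__ == "__main__":
        unittest.main()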
    # Load the same named model concurrently
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # First, load an identity backend model with a 10-second delay
        thread_1 = pool.submit(triton_client.load_model, "identity_model")

Code scanning / CodeQL error: Potentially uninitialized local variable.
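CodeQL raises this error when a local is assigned on only some paths and then read unconditionally; the flagged variable is not visible in this excerpt. A hedged sketch of the pattern and one common remedy, with hypothetical names:

    import concurrent.futures

    def load_twice(triton_client, also_load_delayed):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            thread_1 = pool.submit(triton_client.load_model,
                                   "identity_model")
            # Bug pattern: without the None default, thread_2 would exist
            # only when the branch is taken, yet be read after the branch.
            thread_2 = None
            if also_load_delayed:
                thread_2 = pool.submit(triton_client.load_model,
                                       "identity_model")
            thread_1.result()
            if thread_2 is not None:
                thread_2.result()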
    # responses with different GPU output shapes
    num_requests = 5
    for _ in range(num_requests):
        result = self._client.async_infer(model_name=model_name,

Code scanning / CodeQL warning: Variable defined multiple times; result is redefined.
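The result binding is overwritten on every loop pass before it is ever read, which is what the redefinition warning points at. A sketch of the usual cleanup, assuming tritonclient's HTTP client, whose async_infer returns a handle with get_result():

    # Collect the async handles instead of rebinding one name, so every
    # response stays retrievable and no assignment is dead.
    num_requests = 5
    async_requests = []
    for _ in range(num_requests):
        async_requests.append(
            self._client.async_infer(model_name=model_name, inputs=inputs))
    for request in async_requests:
        request.get_result()  # blocks until that response arrives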
    # Test combinations of BLS and decoupled API in Python backend.
    model_name = "decoupled_bls_stream"
    in_values = [4, 2, 0, 1]
    shape = [1]

Code scanning / CodeQL note: Unused local variable.
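The note means one of these locals is never read afterwards; which one is unused is not visible in this excerpt. Two conventional cleanups, shown with the values from the snippet:

    # Option 1: delete the dead binding outright.
    model_name = "decoupled_bls_stream"
    in_values = [4, 2, 0, 1]

    # Option 2: keep it for documentation, but underscore-prefix the
    # name so linters treat it as intentionally unused.
    _shape = [1]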
    self.assertEqual(result.as_numpy('OUTPUT')[0], 1)

    result_start = triton_client.infer(model_name="no_state_update", inputs=inputs, sequence_id=correlation_id, sequence_end=True)
    result_start = triton_client.infer(model_name="no_state_update",

Code scanning / CodeQL note: Unused local variable.
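Here the first result_start is rebound by the second infer call before anything reads it, so the first response is dead. A sketch of one fix; the sequence_start=True flag on the first call is an assumption about the truncated code, not something visible in the excerpt:

    # Give each response its own name so neither binding is dead
    # (or drop the binding if only the call's success matters).
    result_start = triton_client.infer(model_name="no_state_update",
                                       inputs=inputs,
                                       sequence_id=correlation_id,
                                       sequence_start=True)  # assumed flag
    result_end = triton_client.infer(model_name="no_state_update",
                                     inputs=inputs,
                                     sequence_id=correlation_id,
                                     sequence_end=True)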
    # This test only cares whether the request goes through
    with self.assertRaisesRegex(InferenceServerException,
                                "This protocol is restricted"):
        results = self.client_.infer(model_name=self.model_name_,

Code scanning / CodeQL note: Unused local variable.
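Since the test only checks that the call raises, the simplest fix is to not bind the return value at all; the inputs argument here is assumed from context:

    with self.assertRaisesRegex(InferenceServerException,
                                "This protocol is restricted"):
        # No binding: the response is irrelevant when the call is
        # expected to fail, which removes the unused-variable note.
        self.client_.infer(model_name=self.model_name_, inputs=inputs)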
    # Load identity_zero_1_int32 and unload it while loading.
    # The unload operation should wait until the load is completed.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        load_thread = pool.submit(triton_client.load_model,

Code scanning / CodeQL error: Potentially uninitialized local variable.
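As with the earlier uninitialized-variable error, the flagged name lies outside this excerpt. A hedged sketch of the other standard remedy: bind every future unconditionally before any of them is awaited (the unload_model call mirrors the comment above; the function and model names are illustrative):

    import concurrent.futures

    def load_then_unload(triton_client, model="identity_zero_1_int32"):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            # Both futures are assigned on every path before being read,
            # which satisfies CodeQL's definite-assignment analysis.
            load_thread = pool.submit(triton_client.load_model, model)
            unload_thread = pool.submit(triton_client.unload_model, model)
            load_thread.result()    # unload should wait for this load
            unload_thread.result()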
No description provided.