
Babak/upgrade triton to v2.34.0 #4

Closed
babakbehzad wants to merge 370 commits

Conversation

babakbehzad
No description provided.

krishung5 and others added 30 commits on September 28, 2022 at 17:42
…ton-inference-server#4921)

* Add testing for empty gpu output tensor with cuda device setting

* Fix up
Fix broken links in documentation and add tests for backend
* Add sphinx support to build documentation
…-inference-server#4948)

* Check if the format of override model config has extra fields

* Revert previous changes

* Check the server log for override config error message

* Update test script

* Fix up typo
* Update trace example in documentation

* Add brief description on new trace options
Change metrics HTTP content type from "text/plain" to "text/plain; charset=utf-8".
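A quick way to verify the new header (a sketch, not from the PR; it assumes a local Triton server exposing metrics on the default port 8002):

    import urllib.request

    # Fetch the Prometheus metrics and confirm the explicit charset.
    with urllib.request.urlopen("http://localhost:8002/metrics") as resp:
        content_type = resp.headers.get("Content-Type")
        assert content_type == "text/plain; charset=utf-8", content_type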
* Autoformat and add copyrights

* Add copyright
…ference-server#4982)

* Switch to symlink

* Remove explicit versions from filenames; all these files are covered by symlinks and 'alternatives'
…riton-inference-server#4950)

* Add test for cloud path outside model dir

* Add test for absolute cloud path

* Group py version strings
…n-inference-server#4962)

* Add testing for loading no-autofill model with config override

* Address comment

* Modify the error message based on the new changes on the core side
* Add trt kUINT8 doc

* Add trt uint8 to L0_infer

* Expand uint8 test with other types
…p… (triton-inference-server#4991)

* Update the limitation on multiple servers binding to the same HTTP/gRPC port

* Address comment

* Address comment
…#4789)

* Add testing for sequence and ensemble models

* Add testing for C-API using system/CUDA memory

* review edit
* Fix the copyrights in the files

* Fix ups
* Add checks for trt uint8 support

* Unify trt uint8 support check
jbkyang-nvi and others added 25 commits on May 2, 2023 at 20:30
Add fastertransformer test that uses 1 GPU.
* Don't use mem probe in Jetson

* Clarify failure messages in L0_backend_python

* Update copyright

* Add JIRA ref, fix _test_jetson
* Add testing for python custom metrics API

* Add custom metrics example to the test

* Fix for CodeQL report

* Fix test name

* Address comment

* Add logger and change the enum usage
* Add HTTP client plugin test

* Add testing for HTTP asyncio

* Add async plugin support

* Fix qa container for L0_grpc

* Add testing for grpc client plugin

* Remove unused imports

* Fix up

* Fix L0_grpc models QA folder

* Update the test based on review feedback

* Remove unused import

* Add testing for .plugin method
* Add --metrics-address, add tests to L0_socket, re-order CLI options for consistency

* Use non-localhost address
…ence-server#5739)

* Add HTTP basic auth test

* Add testing for gRPC basic auth

* Fix up

* Remove unused imports
* Update python and conda version

* Update CMAKE installation

* Update checksum version

* Update ubuntu base image to 22.04

* Use ORT 1.15.0

* Set CMAKE to pull latest version

* Update libre package version

* Removing unused argument

* Adding condition for ubuntu 22.04

* Removing installation of the package from the devel container

* Nnshah1 u22.04 (triton-inference-server#5770)

* Update CMAKE installation

* Update python and conda version

* Update CMAKE installation

* Update checksum version

* Update ubuntu base image to 22.04

* updating versions for ubuntu 22.04

* remove re2

---------

Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>

* Set ONNX version to 1.13.0

* Fix L0_custom_ops for ubuntu 22.04 (triton-inference-server#5775)

* add back rapidjson-dev

---------

Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: nv-kmcgill53 <[email protected]>
…-server#5796)

(1) Reduce MAX_ALLOWED_ALLOC to be stricter for bounded tests and more generous for unbounded tests.
(2) Allow unstable measurements from PA.
(3) Improve logging for future triage.
…inference-server#5801)

* Add note on --metrics-address

* Copyright

Co-authored-by: Ryan McCormick <[email protected]>
* working thread

* remove default install of blinker

* merge issue fixed
* Fix L0_backend_python/env test

* Address comment

* Update the copyright

* Fix up
* installing python 3.8.16 for test

* spelling

Co-authored-by: Neelay Shah <[email protected]>

* use util functions to install python3.8 in an easier way

---------

Co-authored-by: Neelay Shah <[email protected]>
…r#5876)

* Update README and add RELEASE notes for 23.05

* Update README and add RELEASE notes for 23.05

* Update README and add RELEASE notes for 23.05

* Update README and add RELEASE notes for 23.05

* Update README and add RELEASE notes for 23.05
# },
"use_edit_page_button": False,
"use_issues_button": True,
"use_repository_button": True,

Code scanning / CodeQL warning (documentation): Duplicate key in dict literal.
Dictionary key 'use_repository_button' is subsequently overwritten.
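For context, a minimal sketch of what this alert means (the `html_theme_options` name is an assumption based on a typical Sphinx conf.py, not taken from the diff): a repeated key in a Python dict literal silently keeps only the last value, so the earlier entry is dead.

    # Assumed reconstruction of the pattern; the first value is discarded.
    html_theme_options = {
        "use_issues_button": True,
        "use_repository_button": True,   # dead store: overwritten below
        "use_repository_button": False,  # this value wins
    }
    print(html_theme_options["use_repository_button"])  # prints: False

    # Fix: keep exactly one entry per key.
    html_theme_options = {
        "use_issues_button": True,
        "use_repository_button": True,
    }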
"This protocol is restricted"):
raise user_data._completed_requests.get()

self.assertTrue(user_data._completed_requests.empty())

Code scanning / CodeQL warning: Unreachable code.
This statement is unreachable.
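A self-contained reduction of the pattern (the queue and exception type are stand-ins for `user_data._completed_requests`; nothing here is verbatim from the test): a raise inside the assertRaises block always transfers control out of the block, so any statement after it in that block can never execute.

    import queue
    import unittest

    class UnreachableExample(unittest.TestCase):
        def test_restricted_protocol(self):
            completed = queue.Queue()
            completed.put(RuntimeError("This protocol is restricted"))
            with self.assertRaisesRegex(RuntimeError, "restricted"):
                raise completed.get()
                self.assertTrue(completed.empty())  # unreachable: never runs
            # Fix: assert after the with block, where control resumes.
            self.assertTrue(completed.empty())

    if __name__ == "__main__":
        unittest.main()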
# Load the same named model concurrently
with concurrent.futures.ThreadPoolExecutor() as pool:
    # First load a 10-second delayed identity backend model
    thread_1 = pool.submit(triton_client.load_model, "identity_model")

Code scanning / CodeQL error: Potentially uninitialized local variable.
Local variable 'triton_client' may be used before it is initialized.
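A hedged reduction of the error (the branch condition and the HTTP client construction are assumptions, not code from the PR): if `triton_client` is bound on only one branch, the later use inside the executor block can raise UnboundLocalError.

    import concurrent.futures

    import tritonclient.http as httpclient

    def load_model_concurrently(protocol: str):
        if protocol == "http":
            triton_client = httpclient.InferenceServerClient("localhost:8000")
        # CodeQL's point: for any other protocol value, triton_client is
        # never bound, and pool.submit(...) below raises UnboundLocalError.
        # Fix: bind the name on every path, e.g.
        #     else: raise ValueError(f"unsupported protocol: {protocol}")
        with concurrent.futures.ThreadPoolExecutor() as pool:
            thread_1 = pool.submit(triton_client.load_model, "identity_model")
            thread_1.result()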
# responses with different GPU output shapes
num_requests = 5
for _ in range(num_requests):
    result = self._client.async_infer(model_name=model_name,

Code scanning / CodeQL warning: Variable defined multiple times.
This assignment to 'result' is unnecessary as it is redefined before this value is used.
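A stand-alone illustration (the client call is stubbed; nothing here is from the PR): rebinding `result` on every pass without reading it makes each assignment except the last a dead store.

    def async_infer(i):
        # Stub standing in for self._client.async_infer(...).
        return f"request-{i}"

    num_requests = 5

    # Flagged pattern: 'result' is overwritten before it is ever read.
    for i in range(num_requests):
        result = async_infer(i)

    # Fix: consume each value, e.g. collect the handles to wait on later.
    handles = [async_infer(i) for i in range(num_requests)]
    print(handles)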
# Test combinations of BLS and decoupled API in Python backend.
model_name = "decoupled_bls_stream"
in_values = [4, 2, 0, 1]
shape = [1]

Code scanning / CodeQL note: Unused local variable.
Variable shape is not used.
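The straightforward fix is to delete the dead `shape = [1]` binding, or, if it was meant to size the input tensor, actually pass it where the input is constructed.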
self.assertEqual(result.as_numpy('OUTPUT')[0], 1)

result_start = triton_client.infer(model_name="no_state_update",
                                   inputs=inputs,
                                   sequence_id=correlation_id,
                                   sequence_end=True)

Code scanning / CodeQL note: Unused local variable.
Variable result_start is not used.
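If the request is issued only for its side effect of closing the sequence, the binding can simply be dropped: call `triton_client.infer(...)` without assigning its return value.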
# This test only cares whether the request goes through
with self.assertRaisesRegex(InferenceServerException,
                            "This protocol is restricted"):
    results = self.client_.infer(model_name=self.model_name_,

Code scanning / CodeQL note: Unused local variable.
Variable results is not used.
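Since the call is expected to raise, the assignment to `results` can never complete anyway; invoking `self.client_.infer(...)` without binding the return value expresses the intent and clears the alert.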
# Load identity_zero_1_int32 and unload it while loading
# The unload operation should wait until the load is completed
with concurrent.futures.ThreadPoolExecutor() as pool:
    load_thread = pool.submit(triton_client.load_model,

Code scanning / CodeQL error: Potentially uninitialized local variable.
Local variable 'triton_client' may be used before it is initialized.
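This is the same shape of error as the earlier uninitialized-variable alert: `triton_client` is bound on only one branch before the executor block, and the same fix applies, namely binding the client on every path or failing fast in an else.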
babakbehzad closed this on Apr 5, 2024