Babak/upgrade triton to v2.34.0 #4
Conversation
…ton-inference-server#4921) * Add testing for empty gpu output tensor with cuda device setting * Fix up
fix broken links in documentation and add tests for backend
* Add sphinx support to build documentation
* added stable diffusion example
…-inference-server#4948) * Check if the format of override model config has extra fields * Revert previous changes * Check the server log for override config error message * Update test script * Fix up typo
* Update trace example in documentation * Add brief description on new trace options
Change metrics http content type from "text/plain" to "text/plain; charset=utf-8".
* Autoformat and add copyrights * Add copyright
…ference-server#4982) * Switch to symlink * Removing straight version on filenames, all these files are covered with 'SymLinks' and 'alternatives'
…riton-inference-server#4950) * Add test for cloud path outside model dir * Add test for absolute cloud path * Group py version strings
…n-inference-server#4962) * Add testing for loading no-autofill model with config override * Address comment * Modify the error message based on the new changes on the core side
* Add trt kUINT8 doc * Add trt uint8 to L0_infer * Expand uint8 test with other types
…p… (triton-inference-server#4991) * Update the limitation of multiple server binding to the same http/grpc port * Address comment * Address comment
…#4789) * Add testing for sequence and ensemble models * Add testing for C-API using system/CUDA memory * review edit
* skipping JAX example test
* Fix the copyrights in the files * Fix ups
* Add checks for trt uint8 support * Unify trt uint8 support check
Add fastertransformer test that uses 1 GPU.
* Don't use mem probe in Jetson * Clarify failure messages in L0_backend_python * Update copyright * Add JIRA ref, fix _test_jetson
* Add testing for python custom metrics API * Add custom metrics example to the test * Fix for CodeQL report * Fix test name * Address comment * Add logger and change the enum usage
* Add HTTP client plugin test * Add testing for HTTP asyncio * Add async plugin support * Fix qa container for L0_grpc * Add testing for grpc client plugin * Remove unused imports * Fix up * Fix L0_grpc models QA folder * Update the test based on review feedback * Remove unused import * Add testing for .plugin method
* Add --metrics-address, add tests to L0_socket, re-order CLI options for consistency * Use non-localhost address
…ence-server#5739) * Add HTTP basic auth test * Add testing for gRPC basic auth * Fix up * Remove unused imports
…nce-server#5550) * Add multi-gpu, multi-stream testing for dlpack tensors
* Update python and conda version * Update CMAKE installation * Update checksum version * Update ubuntu base image to 22.04 * Use ORT 1.15.0 * Set CMAKE to pull latest version * Update libre package version * Removing unused argument * Adding condition for ubuntu 22.04 * Removing installation of the package from the devel container * Nnshah1 u22.04 (triton-inference-server#5770) * Update CMAKE installation * Update python and conda version * Update CMAKE installation * Update checksum version * Update ubuntu base image to 22.04 * updating versions for ubuntu 22.04 * remove re2 --------- Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> * Set ONNX version to 1.13.0 * Fix L0_custom_ops for ubuntu 22.04 (triton-inference-server#5775) * add back rapidjson-dev --------- Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: nv-kmcgill53 <[email protected]>
…-server#5796) (1) reduce MAX_ALLOWED_ALLOC to be more strict for bounded tests, and generous for unbounded tests. (2) allow unstable measurement from PA. (3) improve logging for future triage
…inference-server#5801) * Add note on --metrics-address * Copyright Co-authored-by: Ryan McCormick <[email protected]>
* working thread * remove default install of blinker * merge issue fixed
* Fix L0_backend_python/env test * Address comment * Update the copyright * Fix up
* installing python 3.8.16 for test * spelling Co-authored-by: Neelay Shah <[email protected]> * use util functions to install python3.8 in an easier way --------- Co-authored-by: Neelay Shah <[email protected]>
…r#5876) * Update README and add RELEASE notes for 23.05 * Update README and add RELEASE notes for 23.05 * Update README and add RELEASE notes for 23.05 * Update README and add RELEASE notes for 23.05 * Update README and add RELEASE notes for 23.05
    # },
    "use_edit_page_button": False,
    "use_issues_button": True,
    "use_repository_button": True,

Code scanning / CodeQL warning (documentation): Duplicate key in dict literal; the earlier value is overwritten.
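CodeQL flags repeated keys because a Python dict literal silently keeps only the last value for a key; the earlier entry is dead code. Below is a minimal sketch of the behavior and the fix; the html_theme_options name and the key values are assumptions for illustration, not the actual conf.py contents:

    # Duplicate key: Python evaluates both entries, but the dict keeps
    # only the last one, so the first value is silently discarded.
    html_theme_options = {
        "use_issues_button": False,  # dead: overwritten below
        "use_repository_button": True,
        "use_issues_button": True,   # this value wins
    }
    assert html_theme_options["use_issues_button"] is True

    # Fix: one entry per key, so the intended value is unambiguous.
    html_theme_options = {
        "use_issues_button": True,
        "use_repository_button": True,
    }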
"This protocol is restricted"): | ||
raise user_data._completed_requests.get() | ||
|
||
self.assertTrue(user_data._completed_requests.empty()) |
Check warning
Code scanning / CodeQL
Unreachable code Warning
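The warning means the assertion can never execute: it sits after an unconditional raise. Moving it past the with block fixes that, because assertRaisesRegex consumes the expected exception and control resumes there. A self-contained sketch with a stand-in exception class (the real test uses tritonclient's InferenceServerException):

    import queue
    import unittest

    class RestrictedProtocolError(Exception):
        """Stand-in for tritonclient's InferenceServerException."""

    class ProtocolTest(unittest.TestCase):
        def test_restricted_protocol(self):
            completed_requests = queue.Queue()
            completed_requests.put(
                RestrictedProtocolError("This protocol is restricted"))
            with self.assertRaisesRegex(RestrictedProtocolError,
                                        "This protocol is restricted"):
                raise completed_requests.get()
            # Reachable: the with-block swallowed the expected exception,
            # so execution continues here instead of being cut off.
            self.assertTrue(completed_requests.empty())

    if __name__ == "__main__":
        unittest.main()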
    # Load the same named model concurrently
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # First, load an identity backend model with a 10-second delay
        thread_1 = pool.submit(triton_client.load_model, "identity_model")

Code scanning / CodeQL error: Potentially uninitialized local variable.
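CodeQL raises this error when a local is assigned on only some paths and then read unconditionally; the flagged variable is not visible in this excerpt. A hedged sketch of the pattern and one common remedy, with hypothetical names:

    import concurrent.futures

    def load_twice(triton_client, also_load_delayed):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            thread_1 = pool.submit(triton_client.load_model,
                                   "identity_model")
            # Bug pattern: without the None default, thread_2 would exist
            # only when the branch is taken, yet be read after the branch.
            thread_2 = None
            if also_load_delayed:
                thread_2 = pool.submit(triton_client.load_model,
                                       "identity_model")
            thread_1.result()
            if thread_2 is not None:
                thread_2.result()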
    # responses with different GPU output shapes
    num_requests = 5
    for _ in range(num_requests):
        result = self._client.async_infer(model_name=model_name,

Code scanning / CodeQL warning: Variable defined multiple times; result is redefined.
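The result binding is overwritten on every loop pass before it is ever read, which is what the redefinition warning points at. A sketch of the usual cleanup, assuming tritonclient's HTTP client, whose async_infer returns a handle with get_result():

    # Collect the async handles instead of rebinding one name, so every
    # response stays retrievable and no assignment is dead.
    num_requests = 5
    async_requests = []
    for _ in range(num_requests):
        async_requests.append(
            self._client.async_infer(model_name=model_name, inputs=inputs))
    for request in async_requests:
        request.get_result()  # blocks until that response arrives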
    # Test combinations of BLS and decoupled API in Python backend.
    model_name = "decoupled_bls_stream"
    in_values = [4, 2, 0, 1]
    shape = [1]

Code scanning / CodeQL note: Unused local variable.
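The note means one of these locals is never read afterwards; which one is unused is not visible in this excerpt. Two conventional cleanups, shown with the values from the snippet:

    # Option 1: delete the dead binding outright.
    model_name = "decoupled_bls_stream"
    in_values = [4, 2, 0, 1]

    # Option 2: keep it for documentation, but underscore-prefix the
    # name so linters treat it as intentionally unused.
    _shape = [1]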
    self.assertEqual(result.as_numpy('OUTPUT')[0], 1)

    result_start = triton_client.infer(model_name="no_state_update", inputs=inputs, sequence_id=correlation_id, sequence_end=True)
    result_start = triton_client.infer(model_name="no_state_update",

Code scanning / CodeQL note: Unused local variable.
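Here the first result_start is rebound by the second infer call before anything reads it, so the first response is dead. A sketch of one fix; the sequence_start=True flag on the first call is an assumption about the truncated code, not something visible in the excerpt:

    # Give each response its own name so neither binding is dead
    # (or drop the binding if only the call's success matters).
    result_start = triton_client.infer(model_name="no_state_update",
                                       inputs=inputs,
                                       sequence_id=correlation_id,
                                       sequence_start=True)  # assumed flag
    result_end = triton_client.infer(model_name="no_state_update",
                                     inputs=inputs,
                                     sequence_id=correlation_id,
                                     sequence_end=True)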
    # This test only cares whether the request goes through
    with self.assertRaisesRegex(InferenceServerException,
                                "This protocol is restricted"):
        results = self.client_.infer(model_name=self.model_name_,

Code scanning / CodeQL note: Unused local variable.
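Since the test only checks that the call raises, the simplest fix is to not bind the return value at all; the inputs argument here is assumed from context:

    with self.assertRaisesRegex(InferenceServerException,
                                "This protocol is restricted"):
        # No binding: the response is irrelevant when the call is
        # expected to fail, which removes the unused-variable note.
        self.client_.infer(model_name=self.model_name_, inputs=inputs)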
    # Load identity_zero_1_int32 and unload it while loading.
    # The unload operation should wait until the load is completed.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        load_thread = pool.submit(triton_client.load_model,

Code scanning / CodeQL error: Potentially uninitialized local variable.
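As with the earlier uninitialized-variable error, the flagged name lies outside this excerpt. A hedged sketch of the other standard remedy: bind every future unconditionally before any of them is awaited (the unload_model call mirrors the comment above; the function and model names are illustrative):

    import concurrent.futures

    def load_then_unload(triton_client, model="identity_zero_1_int32"):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            # Both futures are assigned on every path before being read,
            # which satisfies CodeQL's definite-assignment analysis.
            load_thread = pool.submit(triton_client.load_model, model)
            unload_thread = pool.submit(triton_client.unload_model, model)
            load_thread.result()    # unload should wait for this load
            unload_thread.result()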
No description provided.