test: Client-side input shape/element validation #7427
base: main
Conversation
* Fix gRPC test failure and refactor * Add gRPC AsyncIO cancellation tests * Better check if a request is cancelled * Use f-string
* Fixing torch version for vllm
* Switch Jetson model TensorRT models generation to container * Adding missed file * Fix typo * Fix typos * Remove extra spaces * Fix typo
* Ensure notify_state_ gets properly destructed * Fix inflight state tracking to properly erase states * Prevent the notify_state from being erased * Wrap notify_state_ object within unique_ptr
* TRTLLM backend post release * TRTLLM backend post release * Update submodule url for permission issue * Update submodule url * Fix up * Not using postbuild function to workaround submodule url permission issue
Co-authored-by: Neelay Shah <[email protected]>
* Minor fix for L0_model_config
* Test with different sizes of CUDA memory pool * Check the server log for error message * Improve debugging * Fix syntax
Co-authored-by: dyastremsky <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>
* Update README and versions for 23.10 branch (#6399)
* Cherry-picking vLLM backend changes (#6404)
* Update build.py to build vLLM backend (#6394)
* Add Python backend when vLLM backend built (#6397)
---------
Co-authored-by: dyastremsky <[email protected]>
* Add documentation on request cancellation (#6403) (#6407)
* Add documentation on request cancellation
* Include python backend
* Update docs/user_guide/request_cancellation.md
* Update docs/user_guide/request_cancellation.md
* Update docs/README.md
* Update docs/user_guide/request_cancellation.md
* Remove inflight term from the main documentation
* Address review comments
* Fix
* Update docs/user_guide/request_cancellation.md
* Fix
---------
Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Jacky <[email protected]>
* Fixes in request cancellation doc (#6409) (#6410)
* TRT-LLM backend build changes (#6406) (#6430)
* Update url
* Debugging
* Debugging
* Update url
* Fix build for TRT-LLM backend
* Remove TRTLLM TRT and CUDA versions
* Fix up unused var
* Fix up dir name
* Fix cmake patch
* Remove previous TRT version
* Install required packages for example models
* Remove packages that are only needed for testing
* Fixing vllm build (#6433) (#6437)
* Fixing torch version for vllm
Co-authored-by: Olga Andreeva <[email protected]>
* Update TRT-LLM backend url (#6455) (#6460)
* TRTLLM backend post release
* TRTLLM backend post release
* Update submodule url for permission issue
* Update submodule url
* Fix up
* Not using postbuild function to workaround submodule url permission issue
* Remove redundant lines
* Revert "remove redundant lines" (this reverts commit 86be7ad)
* Restore missed lines
* Update build.py
Co-authored-by: Olga Andreeva <[email protected]>
* Update build.py
Co-authored-by: Olga Andreeva <[email protected]>
---------
Co-authored-by: Tanmay Verma <[email protected]>
Co-authored-by: dyastremsky <[email protected]>
Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Jacky <[email protected]>
Co-authored-by: Kris Hung <[email protected]>
Co-authored-by: Katherine Yang <[email protected]>
Co-authored-by: Olga Andreeva <[email protected]>
…ycle) (#6490) * Test torch allocator gpu memory usage directly rather than global gpu memory for more consistency
* Add testing backend and test * Add test to build / CI. Minor fix on L0_http * Format. Update backend documentation * Fix up * Address comment * Add negative testing * Fix up
…n from test script. (#6499)
* Use postbuild function * Remove updating submodule url
* Added testing for python_backend autocomplete: optional input and model_transaction_policy
Co-authored-by: Francesco Petrini <[email protected]>
* Fixing L0_io
* Bumped vllm version
* Add python-based backends testing
* Add python-based backends CI
* Fix errors
* Add vllm backend
* Fix pre-commit
* Modify test.sh
* Remove vllm_opt qa model
* Remove vLLM backend tests
* Resolve review comments
* Fix pre-commit errors
* Update qa/L0_backend_python/python_based_backends/python_based_backends_test.py
Co-authored-by: Tanmay Verma <[email protected]>
* Remove collect_artifacts_from_subdir function call
---------
Co-authored-by: oandreeva-nv <[email protected]>
Co-authored-by: Tanmay Verma <[email protected]>
… pairs (similar to gRPC)
* Add boost-filesystem
Left a couple questions
chore: PA Migration From Client
force-pushed from f432d41 to 3863c39
force-pushed from 17053e9 to 482409e
qa/L0_input_validation/test.sh (outdated)
}
EOL

cp -r $DATADIR/qa_model_repository/graphdef_object_int32_int32 models/.
Why removed?
Use model "simple_identity" instead to test string inputs.
if client_type == "http":
    triton_client = tritonhttpclient.InferenceServerClient("localhost:8000")
else:
    triton_client = tritongrpcclient.InferenceServerClient("localhost:8001")

# Example using BYTES input tensor with utf-8 encoded string that
# has an embedded null character.
null_chars_array = np.array(
    ["he\x00llo".encode("utf-8") for i in range(16)], dtype=np.object_
)
null_char_data = null_chars_array.reshape([1, 16])
identity_inference(triton_client, null_char_data, True)  # Using binary data
identity_inference(triton_client, null_char_data, False)  # Using JSON data

# Example using BYTES input tensor with 16 elements, where each
# element is a 4-byte binary blob with value 0x00010203. Can use
# dtype=np.bytes_ in this case.
bytes_data = [b"\x00\x01\x02\x03" for i in range(16)]
np_bytes_data = np.array(bytes_data, dtype=np.bytes_)
np_bytes_data = np_bytes_data.reshape([1, 16])
identity_inference(triton_client, np_bytes_data, True)  # Using binary data
identity_inference(triton_client, np_bytes_data, False)  # Using JSON data
What is this testing?
Copied from client/src/python/examples/simple_http_string_infer_client.py. It looks like the example demonstrated two ways of preparing string input data. I'll remove one of them.
inputs[0].set_shape([2, 8])
inputs[1].set_shape([2, 8])

with self.assertRaises(InferenceServerException) as e:
Suggested change:
# If number of elements (volume) is correct but shape is wrong, the core will return an error.
with self.assertRaises(InferenceServerException) as e:
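For context, roughly what that scenario looks like end to end (a minimal sketch, not the exact test code; the model name "simple_identity", the input name "INPUT0", and the wiring are assumptions for illustration):

import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

# Assumed setup: "simple_identity" takes a BYTES input "INPUT0" of shape [1, 16].
triton_client = httpclient.InferenceServerClient("localhost:8000")
data = np.array([b"data" for _ in range(16)], dtype=np.object_).reshape([1, 16])

inputs = [httpclient.InferInput("INPUT0", [1, 16], "BYTES")]
inputs[0].set_data_from_numpy(data)

# Volume is unchanged (2 * 8 == 1 * 16), so the client-side byte-size check
# passes and it is the server core that reports the shape mismatch.
inputs[0].set_shape([2, 8])
try:
    triton_client.infer(model_name="simple_identity", inputs=inputs)
except InferenceServerException as e:
    print(e)  # error comes from the server, not the client-side check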
force-pushed from dff25f4 to 48c9b25
What does the PR do?
Adds a client-side input size check to make sure the byte size implied by the input shape matches the byte size of the input data.
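In rough terms, the check amounts to the following (a minimal sketch with a hypothetical helper; the actual validation lives in the client library changes in triton-inference-server/client#742):

import numpy as np

def validate_input_byte_size(shape, element_byte_size, data_byte_size):
    # Hypothetical helper: the byte size implied by the declared shape
    # must match the byte size of the attached data.
    expected = int(np.prod(shape)) * element_byte_size
    if expected != data_byte_size:
        raise ValueError(
            f"input byte size {data_byte_size} does not match "
            f"expected byte size {expected} for shape {list(shape)}"
        )

# A [2, 8] FP32 tensor must carry 2 * 8 * 4 = 64 bytes.
validate_input_byte_size([2, 8], 4, 64)      # passes
# validate_input_byte_size([2, 8], 4, 60)    # would raise ValueError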
Checklist
<commit_type>: <Title>
Commit Type: check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
triton-inference-server/client#742
Where should the reviewer start?
Should look at triton-inference-server/client#742 first.
Test plan:
n/a
CI Pipeline ID: 17202351
Caveats:
Shared memory byte size checks for string inputs are not implemented.
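For background (my summary, not text from the PR): the wire size of a BYTES/string tensor cannot be computed from shape and dtype alone, because each element is serialized as a 4-byte length prefix followed by its variable-length bytes. A minimal sketch:

def serialized_bytes_size(elements):
    # Each BYTES element is a 4-byte length prefix plus the raw bytes.
    return sum(4 + len(e) for e in elements)

# Same shape [3], different byte sizes:
print(serialized_bytes_size([b"a", b"bb", b"ccc"]))  # 18
print(serialized_bytes_size([b"", b"", b""]))        # 12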
Background
Stop malformed input requests on the client side before they are sent to the server.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Relates to #7171