Triton Java API

This is a Triton Java API contributed by the Alibaba Cloud PAI Team. It is based on Triton's HTTP/REST protocol and is designed for both ease of use and performance.

This Java API mimics Triton's official Python API. It has similar classes and methods.

  • triton.client.InferInput describes each input to the model.
  • triton.client.InferRequestedOutput describes each output requested from the model.
  • triton.client.InferenceServerClient is the main inference class.

Currently the Java API supports only a subset of the entire Triton protocol.

A minimal example looks like this:

package triton.client.example;

import java.util.Arrays;
import java.util.List;

import com.google.common.collect.Lists;
import triton.client.InferInput;
import triton.client.InferRequestedOutput;
import triton.client.InferResult;
import triton.client.InferenceServerClient;
import triton.client.pojo.DataType;

public class MinExample {
    public static void main(String[] args) throws Exception {
        boolean isBinary = true;
        InferInput inputIds = new InferInput("input_ids", new long[] {1, 32}, DataType.INT32);
        int[] inputIdsData = new int[32];
        Arrays.fill(inputIdsData, 1); // fill with some data.
        inputIds.setData(inputIdsData, isBinary);

        InferInput inputMask = new InferInput("input_mask", new long[] {1, 32}, DataType.INT32);
        int[] inputMaskData = new int[32];
        Arrays.fill(inputMaskData, 1);
        inputMask.setData(inputMaskData, isBinary);

        InferInput segmentIds = new InferInput("segment_ids", new long[] {1, 32}, DataType.INT32);
        int[] segmentIdsData = new int[32];
        Arrays.fill(segmentIdsData, 0);
        segmentIds.setData(segmentIdsData, isBinary);
        List<InferInput> inputs = Lists.newArrayList(inputIds, inputMask, segmentIds);
        List<InferRequestedOutput> outputs = Lists.newArrayList(new InferRequestedOutput("logits", isBinary));

        InferenceServerClient client = new InferenceServerClient("0.0.0.0:8000", 5000, 5000);
        InferResult result = client.infer("roberta", inputs, outputs);
        float[] logits = result.getOutputAsFloat("logits");
        System.out.println(Arrays.toString(logits));
    }
}
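
Because the client is built on Triton's HTTP/REST protocol, it can help to see roughly what an inference request carries on the wire. The sketch below is not taken from this library's source. It hand-builds the JSON body that the protocol defines for a non-binary request (sent as a POST to `/v2/models/roberta/infer`), reusing the tensor names from the example above. The class `InferRequestSketch` and its helper methods are hypothetical, and the sketch assumes INT32 inputs whose names need no JSON escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of the triton.client package.
final class InferRequestSketch {

    // Number of elements a tensor of the given shape holds (product of its dimensions).
    static long elementCount(long[] shape) {
        long count = 1;
        for (long dim : shape) {
            count *= dim;
        }
        return count;
    }

    // Renders one INT32 input tensor as a JSON object for the request's "inputs" array.
    // Assumes the tensor name contains no characters that need JSON escaping.
    static String inputJson(String name, long[] shape, int[] data) {
        if (elementCount(shape) != data.length) {
            throw new IllegalArgumentException("shape does not match data length for " + name);
        }
        String shapeJson = Arrays.stream(shape)
                .mapToObj(dim -> Long.toString(dim))
                .collect(Collectors.joining(",", "[", "]"));
        String dataJson = Arrays.stream(data)
                .mapToObj(value -> Integer.toString(value))
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"name\":\"" + name + "\",\"shape\":" + shapeJson
                + ",\"datatype\":\"INT32\",\"data\":" + dataJson + "}";
    }

    // Assembles the full request body from rendered inputs and requested output names.
    static String requestJson(List<String> inputObjects, List<String> outputNames) {
        String inputs = String.join(",", inputObjects);
        String outputs = outputNames.stream()
                .map(outputName -> "{\"name\":\"" + outputName + "\"}")
                .collect(Collectors.joining(","));
        return "{\"inputs\":[" + inputs + "],\"outputs\":[" + outputs + "]}";
    }

    public static void main(String[] args) {
        int[] ones = new int[32];
        Arrays.fill(ones, 1);
        String body = requestJson(
                Arrays.asList(
                        inputJson("input_ids", new long[] {1, 32}, ones),
                        inputJson("input_mask", new long[] {1, 32}, ones),
                        inputJson("segment_ids", new long[] {1, 32}, new int[32])),
                Arrays.asList("logits"));
        System.out.println(body);
    }
}
```

The shape check mirrors the relationship in the example between the `{1, 32}` shape and the 32-element arrays: the product of the dimensions has to equal the number of values supplied.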

Supported and Unsupported Java client features

Supported Java client features:

The HTTP client is supported with limited capability. Currently supported:

  • Synchronous inference requests

gRPC has very limited support. Please see the gRPC generated Java client for details.

Unsupported Java client features:

GRPC client:

  • Full feature Java GRPC client and corresponding tests

HTTP client:

  1. Asynchronous inference requests
  2. Streaming inference requests
  3. SSL or HTTPS protocol communications
  4. Requesting/Receiving Server Metadata Information
  5. Requesting/Receiving Model Metadata Information
  6. Requesting/Receiving Model Inference Statistics
  7. Sending inference requests using Shared Memory (System, GPU)
  8. Sending multiple synchronous inferences on server
  9. Extensions
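
Since only synchronous inference requests are supported, a caller that needs non-blocking behavior could run the blocking call on its own thread pool. The sketch below shows one way to do that with the JDK's `CompletableFuture`. It is not a feature of this library: `AsyncWrapperSketch` and `submit` are hypothetical names, the lambda in `main` is a stand-in for a real `client.infer(...)` call, and whether a single `InferenceServerClient` can safely be shared across threads is an assumption you would need to verify.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of the triton.client package.
final class AsyncWrapperSketch {

    // Runs a blocking call on the given executor and exposes the result as a future.
    // Checked exceptions from the call are rethrown wrapped in CompletionException.
    static <T> CompletableFuture<T> submit(Callable<T> blockingCall, Executor pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return blockingCall.call();
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, pool);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Stand-in for a blocking call such as client.infer("roberta", inputs, outputs).
            CompletableFuture<float[]> future = submit(() -> new float[] {0.1f, 0.9f}, pool);
            float[] logits = future.get(5, TimeUnit.SECONDS);
            System.out.println(logits.length);
        } finally {
            pool.shutdown();
        }
    }
}
```

This only moves the blocking call off the calling thread. Each in-flight request still occupies one pool thread, so it is not equivalent to a truly asynchronous HTTP client.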

Building Java Examples

The Java examples can be found in the examples folder. To compile them, run:

$ cd client/src/java
$ mvn clean install -Ddir=examples

You will then find the examples in your target folder, examples, and the compiled jar at target/java-api-0.0.1.jar.