Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detectors with whole doc chunker and other chunker not returning results #271

Open
evaline-ju opened this issue Dec 31, 2024 · 0 comments
Open
Labels
bug Something isn't working

Comments

@evaline-ju
Copy link
Collaborator

evaline-ju commented Dec 31, 2024

Describe the bug

Given a combination of detectors, at least one of which depends on the whole document chunker (whole_doc_chunker) and another which depends on another chunker (e.g. a sentence tokenizer) in a streaming generation with classification request, currently returns an empty result.

Instead with the current aggregation strategy, we would expect the result to not appear "streaming" and just return like a unary result, since the whole document chunker has to process/return the entire text.

Update: This may have implications on any use of any combination of differing types of chunkers

Sample Code

Config with two detectors example

detectors:
    detector_1:
        type: text_contents
        service:
            hostname: detector1.com
            port: 443
        chunker_id: sentence_chunker
        default_threshold: 0.5
    detector_2:
        type: text_contents
        service:
            hostname: detector2.com
            port: 443
        chunker_id: whole_doc_chunker

Call example

curl -v 'POST' \
  'http://localhost:8033/api/v1/task/server-streaming-classification-with-text-generation' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model_id": "my-llm",
  "inputs": "Tell me a story about a dog",
  "guardrail_config": {
    "output": {
      "masks": [],
      "models": {"detector_1":{}, "detector_2": {}}
    },
    "input": {
      "models": {}
    }
  },
  "text_gen_parameters": {
    "max_new_tokens": 99,
    "min_new_tokens": 2,
  }
}'

Expected behavior

Results that look "unary" i.e. whole document results are returned

Observed behavior

No results

@evaline-ju evaline-ju added the bug Something isn't working label Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant