Incorrect BM25Encoder values returned for query and document. #85

Open
2 tasks done
clive-eltropy opened this issue Oct 2, 2024 · 0 comments
Labels
bug Something isn't working

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I encode the same piece of text with encode_documents and encode_queries, the resulting sparse vectors have different values.

Text: "the lazy dog"
encode_documents values: 0.58, 0.58
encode_queries values: 0.5, 0.5

Expected Behavior

I expected encode_documents and encode_queries to return the same values for the same piece of text, i.e. 0.5 for both terms. Instead, the encode_documents values are higher by about 0.088 (0.588 vs. 0.5). Am I missing something?

Steps To Reproduce

    from pinecone_text.sparse import BM25Encoder
    
    corpus = ["The quick brown fox jumps over the lazy dog", "The lazy dog is brown"]

    bm25 = BM25Encoder()
    bm25.fit(corpus)

    print(bm25.encode_documents("the lazy dog")) 
    ### Output: {'indices': [226376294, 2982218203], 'values': [0.5882352941176472, 0.5882352941176472]}
    
    print(bm25.encode_queries("the lazy dog"))
    ### Output: {'indices': [226376294, 2982218203], 'values': [0.5, 0.5]}
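
One possible reading of the numbers above, sketched in plain Python without calling the library: the two methods may be computing different halves of the BM25 score. The sketch assumes (none of this is confirmed in this issue) that encode_documents returns a length-normalized term frequency, tf / (k1 * (1 - b + b * doc_len / avgdl) + tf), with k1 = 1.2 and b = 0.75, that encode_queries returns IDF weights normalized to sum to 1, and that "the", "over" and "is" are dropped as stop words. The helper names `doc_weight` and `query_weights` are hypothetical.

```python
import math

K1, B = 1.2, 0.75  # assumed BM25 defaults, not confirmed in this issue


def doc_weight(tf, doc_len, avgdl, k1=K1, b=B):
    """Assumed document-side weight: length-normalized term frequency."""
    return tf / (k1 * (1.0 - b + b * doc_len / avgdl) + tf)


def query_weights(doc_freqs, n_docs):
    """Assumed query-side weights: IDF per term, normalized to sum to 1."""
    idf = [math.log((n_docs + 1) / (df + 0.5)) for df in doc_freqs]
    total = sum(idf)
    return [w / total for w in idf]


# Token counts after the assumed stop-word removal:
#   doc 1 -> quick, brown, fox, jumps, lazy, dog  (6 tokens)
#   doc 2 -> lazy, dog, brown                     (3 tokens)
avgdl = (6 + 3) / 2

# "the lazy dog" -> lazy, dog: 2 tokens, each occurring once.
doc_values = [doc_weight(tf=1, doc_len=2, avgdl=avgdl) for _ in range(2)]

# Both "lazy" and "dog" appear in 2 of the 2 corpus documents.
query_values = query_weights([2, 2], n_docs=2)

print(doc_values)    # both entries are approximately 0.588
print(query_values)  # both entries are approximately 0.5
```

Under these assumptions the document-side value is 1 / 1.7, which matches the reported 0.588, and the query-side value is 0.5 simply because both terms have the same document frequency. If that reading is right, the two vectors are not meant to be equal: their dot product is what approximates the BM25 score.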

Relevant log output

No response

Environment

OS: Ubuntu 20.04
Python 3.9.12
pinecone-text==0.9.0

Additional Context

No response

@clive-eltropy clive-eltropy added the bug Something isn't working label Oct 2, 2024