Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Queries with string and keyword fields returning incorrect results #3215

Open
skumarp7 opened this issue Dec 19, 2024 · 5 comments
Open
Labels
bug Something isn't working

Comments

@skumarp7
Copy link

We have enabled opensearch-sql plugin in the Opensearch so that we can run SQL like queries on opensearch data .

When we run following command which is very basic, we get following error:

curl -XPOST _plugins/_sql  -d '{"query": "SELECT bundleName FROM log-30-08-2024 WHERE (requestType=\"TERMINATION_REQUEST\") group by  bundleName order by bundleName"}'
{
  "error":
 
{     "reason": "Error occurred in OpenSearch engine: all shards failed",     "details": "Shard[0]: java.lang.IllegalArgumentException: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [bundleName] in order to load field data by uninverting the inverted index. Note that this can use significant memory.\nShard[1]: java.lang.IllegalArgumentException: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [bundleName] in order to load field data by uninverting the inverted index. Note that this can use significant memory.\nShard[2]: java.lang.IllegalArgumentException: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [bundleName] in order to load field data by uninverting the inverted index. Note that this can use significant memory.\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.",     "type": "SearchPhaseExecutionException"   }
 
,
  "status": 400
}
 

How to decide what should be the data type of the field, because after changing to keyword also some queries are not running and fetching relevent fields and its value, e.g.

curl -XPOST _plugins/_sql -d '{"query": "SELECT  sessionId FROM log-30-08-2024"}'
{
  "schema": [
{       "name": "sessionId",       "type": "keyword"     }
  ],
  "datarows": [],
  "total": 0,
  "size": 0,
  "status": 200
}

Note that for the above query, there are records with sessionId field.

curl -XPOST _plugins/_sql  -d '{"query": "SELECT  sessionId,  bundleName FROM log-30-08-2024 WHERE (bundleName=\"test_001\") group by  bundleName order by  bundleName"}'   

{
  "schema": [
{       "name": "sessionId",       "type": "keyword"     }
,
{       "name": "bundleName",       "type": "keyword"     }
  ],
  "datarows": [
    [
      null,
      "test_001"
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}
 

Ideally this plugin should be able to excute queries with text type field also.
Is there any limitations to execute queries from this opensearch-sql plugin?

@skumarp7 skumarp7 added bug Something isn't working untriaged labels Dec 19, 2024
@Swiddis Swiddis changed the title [BUG] [BUG] Queries with string and keyword fields returning incorrect results Dec 19, 2024
@Swiddis
Copy link
Collaborator

Swiddis commented Dec 19, 2024

Please follow the bug template to help prevent churn -- specifically of interest here are the cluster environment, and the index configuration as a repro step.

@Swiddis Swiddis removed the untriaged label Dec 19, 2024
@acarbonetto
Copy link
Collaborator

@skumarp7 text fields are not optimized for search. Please see https://opensearch.org/docs/latest/field-types/supported-field-types/string/ for a description of the different mapping string types.

SELECT  sessionId FROM log-30-08-2024

If sessionId is a keyword field, this query will return all records/documents with sessionId in the index log-30-08-2024. Is it possible there's a problem with your index mapping?

@skumarp7
Copy link
Author

skumarp7 commented Jan 6, 2025

Hi @acarbonetto ,

Below is my sample index mapping:

{
  "cdr-log-30-08-2024": {
    "mappings": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "Code": {
          "type": "text",
          "store": true
        },
        "Id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "RecordTimestamp": {
          "type": "date",
          "store": true,
          "format": "dd-MM-yyyy HH:mm:ss"
        },
        "ServiceId": {
          "type": "keyword",
          "index": false,
          "store": true
        },
        "IndexData": {
          "type": "text",
          "index": false,
          "store": true
        },
        "accessName": {
          "type": "text",
          "index": false,
          "store": true
        },
        "accountID": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "accountName": {
          "type": "text",
          "index": false,
          "store": true
        },
        "sessionDuration": {
          "type": "text",
          "index": false,
          "store": true
        },
        "sessionId": {
          "type": "keyword",
          "index": false,
          "store": true
        },
        "sessionId2": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "sessionId3": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

As per recommendation, "sessionId" field was set to "keyword" and not text. Even after the change, below query did not return any values:

SELECT sessionId FROM log-30-08-2024

We have enough records with sessionId field with valid value mapped to it:

Ex:

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "cdr-log-30-08-2024",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : { .....,
          "sessionId" : [
            "test.abc001.abc219.test.org"
          ], ......
          ]
        }
      }
    ]
  }
}

Are we missing something in the configuration. This looks more like a issue from the plugin itself.

@LantaoJin
Copy link
Member

I can get nothing when the _source set to enabled=false

The reproduce:

PUT /ltjin-test1
{
    "mappings":{
      "_source": {
        "enabled": false
      },
      "properties": {
        "accountName": {
          "type": "text",
          "index": false,
          "store": true
        },
        "sessionId": {
          "type": "keyword",
          "index": false,
          "store": true
        }
      }
    }
}

POST _bulk
{ "index" : { "_index" : "ltjin-test1", "_id" : "1" } }
{ "accountName": "a", "sessionId":"id1" }
{ "index" : { "_index" : "ltjin-test1", "_id" : "2" } }
{ "accountName": "b", "sessionId":"id2" }
{ "index" : { "_index" : "ltjin-test1", "_id" : "3" } }
{ "accountName": "c", "sessionId":"id3" }
{ "index" : { "_index" : "ltjin-test1", "_id" : "4" } }
{ "accountName": "d", "sessionId":"id4" }

POST /_plugins/_sql/
{
    "query": """
    SELECT accountName, sessionId FROM ltjin-test1;
    """
}

return

{
  "schema": [
    {
      "name": "accountName",
      "type": "text"
    },
    {
      "name": "sessionId",
      "type": "keyword"
    }
  ],
  "datarows": [],
  "total": 0,
  "size": 0,
  "status": 200
}

But it works when set to enabled=true (default settings)

@LantaoJin
Copy link
Member

LantaoJin commented Jan 10, 2025

In the latest code base (3.0), when _source disabled, the query will throw

Caused by: java.lang.IllegalArgumentException: unable to fetch fields from _source field: _source is disabled in the mappings for index [ltjin-test1]
at org.opensearch.search.fetch.subphase.FetchSourcePhase.hitExecute(FetchSourcePhase.java:86) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.search.fetch.subphase.FetchSourcePhase$1.process(FetchSourcePhase.java:73) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants