Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicate context from Request Logging #17582

Merged
merged 2 commits into from
Dec 19, 2024

Conversation

ashwintumma23
Copy link
Contributor

Description

Fixed the bug ...

  • Currently, for SQL queries, whenever we output the Request Log, we are printing the context map twice on the same log line, once in the query stats map, and other during the request log line's queryContext.
  • This causes unnecessary duplication of data in the request log, and also increases the volume of data that we send to our logging systems like Splunk, or File based loggers.
  • Steps to Reproduce:
druid.request.logging.type=file
druid.request.logging.dir=/tmp/druidLogs
  • The request logger will output lines as:
2024-12-10T23:41:10.909Z	127.0.0.1		{"sqlQuery/time":440,"sqlQuery/planningTimeMs":357,"sqlQuery/bytes":326208,"success":true,"identity":"allowAll","context":{"sqlQueryId":"95f73480-0916-48c9-bd1c-5c24a439c528","sqlOuterLimit":1001,"maxNumTasks":2,"priority":1,"sqlStringifyArrays":false,"queryId":"95f73480-0916-48c9-bd1c-5c24a439c528","nativeQueryIds":"[95f73480-0916-48c9-bd1c-5c24a439c528]"}}	{"query":"SELECT *\nFROM \"wikipedia\"","context":{"sqlQueryId":"95f73480-0916-48c9-bd1c-5c24a439c528","sqlOuterLimit":1001,"maxNumTasks":2,"priority":1,"sqlStringifyArrays":false,"queryId":"95f73480-0916-48c9-bd1c-5c24a439c528","nativeQueryIds":"[95f73480-0916-48c9-bd1c-5c24a439c528]"}}
  • Formatted JSON looks like, which shows that we have the same information being printed on the same log line:
{
  "sqlQuery/time": 440,
  "sqlQuery/planningTimeMs": 357,
  "sqlQuery/bytes": 326208,
  "success": true,
  "identity": "allowAll",
  "context": {
    "sqlQueryId": "95f73480-0916-48c9-bd1c-5c24a439c528",
    "sqlOuterLimit": 1001,
    "maxNumTasks": 2,
    "priority": 1,
    "sqlStringifyArrays": false,
    "queryId": "95f73480-0916-48c9-bd1c-5c24a439c528",
    "nativeQueryIds": "[95f73480-0916-48c9-bd1c-5c24a439c528]"
  }
}

{
  "query": "SELECT *\nFROM \"wikipedia\"",
  "context": {
    "sqlQueryId": "95f73480-0916-48c9-bd1c-5c24a439c528",
    "sqlOuterLimit": 1001,
    "maxNumTasks": 2,
    "priority": 1,
    "sqlStringifyArrays": false,
    "queryId": "95f73480-0916-48c9-bd1c-5c24a439c528",
    "nativeQueryIds": "[95f73480-0916-48c9-bd1c-5c24a439c528]"
  }
}
  • Proposal: Remove the context from the statsMap, and print it only in the request log line. The new log line will look like:
{
  "sqlQuery/time": 440,
  "sqlQuery/planningTimeMs": 357,
  "sqlQuery/bytes": 326208,
  "success": true,
  "identity": "allowAll"
}

{
  "query": "SELECT *\nFROM \"wikipedia\"",
  "context": {
    "sqlQueryId": "95f73480-0916-48c9-bd1c-5c24a439c528",
    "sqlOuterLimit": 1001,
    "maxNumTasks": 2,
    "priority": 1,
    "sqlStringifyArrays": false,
    "queryId": "95f73480-0916-48c9-bd1c-5c24a439c528",
    "nativeQueryIds": "[95f73480-0916-48c9-bd1c-5c24a439c528]"
  }
}
  • 2019-01-14T10:00:00.000Z 127.0.0.1 {"sqlQuery/time":100, "sqlQuery/planningTimeMs":10, "sqlQuery/bytes":600, "success":true, "identity":"user1"} {"query":"SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE TIME_IN_INTERVAL(\"__time\", '2015-09-12/2015-09-13') GROUP BY page ORDER BY Edits DESC LIMIT 10","context":{"sqlQueryId":"c9d035a0-5ffd-4a79-a865-3ffdadbb5fdd","nativeQueryIds":"[490978e4-f5c7-4cf6-b174-346e63cf8863]"}}
    documentation already captures this behavior.

Release note

  • Changed: The SQL request logger now does not print the query context twice. Previously, it printed query context twice on the same log line, once in the query statistics, and other in the request log line.

Key changed/added classes in this PR
  • SqlExecutionReporter.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz
Copy link
Contributor

kfaraz commented Dec 18, 2024

@ashwintumma23 , there seem to be some unit test failures. Could you please take a look?

@ashwintumma23
Copy link
Contributor Author

@ashwintumma23 , there seem to be some unit test failures. Could you please take a look?

Thanks for the review, @kfaraz ; yup, working on fixing the Unit Test Failures. Will send in the commits.

@ashwintumma23
Copy link
Contributor Author

Hi @kfaraz,
Fixed the unit test to read from SQLContext instead of the context embedded in the StatsMap. Looks like the integration tests are failing on connections and there is an open PR right now attempting to fix it.
Thanks!

@kfaraz
Copy link
Contributor

kfaraz commented Dec 19, 2024

Thanks, @ashwintumma23 !

@kfaraz kfaraz merged commit f7c2c0a into apache:master Dec 19, 2024
75 of 77 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants