Data management API doc refactor #15087
Merged
Conversation
vtlim requested changes on Oct 5, 2023
Co-authored-by: Victoria Lim <[email protected]>
vtlim reviewed on Nov 17, 2023
vtlim approved these changes on Nov 20, 2023
LGTM
* fix time shifting
…ationWithDefaults` (apache#15317)
* + Fix for Flaky Test
* + Replacing TreeMap with LinkedHashMap
* + Changing data structure from LinkedHashMap to HashMap
* Fixed flaky test in S3DataSegmentPusherConfigTest.testSerializationValidatingMaxListingLength
* Minor Changes
…` query. (apache#15243)
* MSQ generates tombstones honoring the query's granularity. This change tweaks it to only account for the infinite-interval tombstones. For finite-interval tombstones, the MSQ query granularity will be used, which is consistent with how MSQ works.
* More tests and some cleanup.
* Checkstyle.
* Comment edits.
* Throw TooManyBuckets fault based on review; add more tests.
* Add javadocs for both methods on reconciling the methods.
* Review: move testReplaceTombstonesWithTooManyBucketsThrowsException to MsqFaultsTest.
* Remove unused imports.
* Move TooManyBucketsException to indexing package for shared exception handling.
* Lower max bucket for tests and fix up count.
* Advance and count the iterator.
* Checkstyle.
Fixes a bug where the MSQ controller task would continue to hold its task slot even after cancel was issued. This was due to a deadlock created on worker launch: the main thread was waiting for tasks to spawn while the cancel thread was waiting for tasks to finish. The fix is to instruct the MSQWorkerTaskLauncher thread to stop creating new tasks, which lets the main thread unblock and release the slot.

Also short-circuited the taskRetriable condition. The check now runs in the MSQWorkerTaskLauncher thread instead of the main event loop, resulting in faster task failure when the task is deemed non-retriable.
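As an aside, the deadlock shape described above can be illustrated with a minimal sketch (hypothetical class and method names, not the actual MSQWorkerTaskLauncher code): the launcher loop keeps trying to reach the desired worker count while another thread waits on it, so a cancel path that only waits for running work to finish can never make progress unless it also tells the launcher to stop launching.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of the fix, with hypothetical names: cancel() flips a flag that the
// launcher loop checks, so the launcher stops spawning workers, the thread waiting on
// it unblocks, and the controller can release its task slot.
class WorkerLauncherSketch
{
  private final AtomicBoolean stopLaunching = new AtomicBoolean(false);
  private volatile int launched = 0;

  // Runs on the launcher thread; without the flag this could block indefinitely
  // waiting for capacity while the cancel thread waited for it to finish.
  void launchUpTo(int desiredWorkers)
  {
    while (launched < desiredWorkers && !stopLaunching.get()) {
      // ... submit one worker task (omitted) ...
      launched++;
    }
  }

  // Runs on the cancel thread: instead of only waiting for tasks to finish,
  // it instructs the launcher to stop creating new ones.
  void cancel()
  {
    stopLaunching.set(true);
  }
}
```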
* Document segment metadata cache behaviour
* Fix typo
* Minor update
* Minor change
…n` by changing string to key:value pair (apache#15207)
* Fix capacity response in mm-less ingestion (apache#14888)
  Changes:
  - Fix capacity response in mm-less ingestion.
  - Add field usedClusterCapacity to the GET /totalWorkerCapacity response. This API should be used to get the total ingestion capacity on the overlord.
  - Remove method `isK8sTaskRunner` from interface `TaskRunner`.
* Using Map to perform comparison
* Minor Change
---------
Co-authored-by: George Shiqi Wu <[email protected]>
There is a problem with Quantiles sketches and KLL Quantiles sketches. Queries using the histogram post-aggregator fail if:
- the sketch contains at least one value, and
- the values in the sketch are all equal, and
- the splitPoints argument is not passed to the post-aggregator, and
- the numBins argument is greater than 2 (or not specified, which leads to the default of 10 being used).

In that case, the query fails and returns this error:

{
  "error": "Unknown exception",
  "errorClass": "org.apache.datasketches.common.SketchesArgumentException",
  "host": null,
  "errorCode": "legacyQueryException",
  "persona": "OPERATOR",
  "category": "RUNTIME_FAILURE",
  "errorMessage": "Values must be unique, monotonically increasing and not NaN.",
  "context": {
    "host": null,
    "errorClass": "org.apache.datasketches.common.SketchesArgumentException",
    "legacyErrorCode": "Unknown exception"
  }
}

This behaviour is undesirable, since the caller doesn't necessarily know in advance whether the sketch has values that are diverse enough. With this change, the post-aggregators return [N, 0, 0, ...] instead of crashing, where N is the number of values in the sketch and the length of the list is equal to numBins. That is what they already returned for numBins = 2.

Here is an example of a query that would fail:

{
  "queryType": "timeseries",
  "dataSource": {
    "type": "inline",
    "columnNames": ["foo", "bar"],
    "rows": [["abc", 42.0], ["def", 42.0]]
  },
  "intervals": ["0000/3000"],
  "granularity": "all",
  "aggregations": [
    {"name": "the_sketch", "fieldName": "bar", "type": "quantilesDoublesSketch"}
  ],
  "postAggregations": [
    {
      "name": "the_histogram",
      "type": "quantilesDoublesSketchToHistogram",
      "field": {"type": "fieldAccess", "fieldName": "the_sketch"},
      "numBins": 3
    }
  ]
}

I believe this also fixes issue apache#10585.
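A plausible illustration of the root cause (a sketch under my own assumptions, not the actual post-aggregator code): when splitPoints is not supplied, split points are generated evenly spaced between the sketch's min and max values, and if every value in the sketch is equal (min == max) those generated points collapse into duplicates, which the DataSketches library rejects.

```java
import java.util.Arrays;

// Hypothetical sketch of default split-point generation between min and max.
public class SplitPointsExample
{
  static double[] evenSplitPoints(double min, double max, int numBins)
  {
    final double[] splits = new double[numBins - 1];
    final double step = (max - min) / numBins;
    for (int i = 0; i < splits.length; i++) {
      splits[i] = min + step * (i + 1);
    }
    return splits;
  }

  public static void main(String[] args)
  {
    // Diverse values: split points are unique and increasing, so the histogram works.
    System.out.println(Arrays.toString(evenSplitPoints(1.0, 10.0, 3)));   // [4.0, 7.0]

    // All values equal (min == max): every split point is 42.0, violating
    // "Values must be unique, monotonically increasing and not NaN."
    System.out.println(Arrays.toString(evenSplitPoints(42.0, 42.0, 3)));  // [42.0, 42.0]
  }
}
```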
Fixing outdated query from deep storage docs.
…pache#14995)
* Prevent a race that may cause multiple attempts to publish segments for the same sequence
Co-authored-by: 317brian <[email protected]>
The TaskQueue maintains a map of active task ids to tasks, which can be used to get active task payloads before falling back to the metadata store.
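A minimal sketch of that lookup pattern (hypothetical names and signatures, not the actual TaskQueue API): consult the in-memory map of active tasks first and only query the metadata store on a miss.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: in-memory lookup of active task payloads with a metadata-store fallback.
class ActiveTaskLookup<T>
{
  private final Map<String, T> activeTasks = new ConcurrentHashMap<>();
  private final Function<String, Optional<T>> metadataStoreLookup;

  ActiveTaskLookup(Function<String, Optional<T>> metadataStoreLookup)
  {
    this.metadataStoreLookup = metadataStoreLookup;
  }

  void onTaskAdded(String taskId, T payload)
  {
    activeTasks.put(taskId, payload);
  }

  void onTaskRemoved(String taskId)
  {
    activeTasks.remove(taskId);
  }

  Optional<T> getPayload(String taskId)
  {
    final T active = activeTasks.get(taskId);
    if (active != null) {
      return Optional.of(active);             // cheap hit for tasks that are still active
    }
    return metadataStoreLookup.apply(taskId); // fall back to the metadata store
  }
}
```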
Fixed the following flaky tests:
- org.apache.druid.math.expr.ParserTest#testApplyFunctions
- org.apache.druid.math.expr.ParserTest#testSimpleMultiplicativeOp1
- org.apache.druid.math.expr.ParserTest#testFunctions
- org.apache.druid.math.expr.ParserTest#testSimpleLogicalOps1
- org.apache.druid.math.expr.ParserTest#testSimpleAdditivityOp1
- org.apache.druid.math.expr.ParserTest#testSimpleAdditivityOp2

These tests have been reported as flaky (tests assuming a deterministic implementation of a non-deterministic specification) when run against the NonDex tool. They contain assertions (Assertion 1 & Assertion 2) that compare an ArrayList created from a HashSet via the ArrayList() constructor with another List. However, HashSet does not guarantee element ordering, so the tests assume a deterministic implementation of HashSet. When the NonDex tool shuffles the HashSet elements, the tests fail.

Co-authored-by: ythorat2 <[email protected]>
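The flakiness pattern described in that commit message is generic Java behaviour; the sketch below (hypothetical names, not the actual ParserTest code) shows why an assertion comparing new ArrayList<>(hashSet) against a fixed-order list only passes when the HashSet happens to iterate in that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetOrderingExample
{
  public static void main(String[] args)
  {
    // HashSet iteration order is an implementation detail, not part of its contract.
    final Set<String> names = new HashSet<>(Arrays.asList("concat", "map", "filter"));

    // Flaky: assumes a particular iteration order of the HashSet.
    final List<String> asList = new ArrayList<>(names);
    System.out.println(asList.equals(Arrays.asList("concat", "map", "filter"))); // may print false

    // Stable: compare order-insensitively (or sort both sides before comparing).
    System.out.println(new HashSet<>(asList).equals(names)); // always true
  }
}
```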
This patch introduces a param snapshotTime in the iceberg inputsource spec that allows the user to ingest data files associated with the most recent snapshot as of the given time. This helps the user ingest data based on older snapshots by specifying the associated snapshot time. This patch also upgrades the iceberg core version to 1.4.1
…d json_query (apache#15320)
* support dynamic expressions for path arguments for json_value and json_query
* reset spec before looking for tile
* improve logging
* log screenshots
* get and log jpeg
* other test tidy up
* Make numCorePartitions as 0 in the TombstoneShardSpec.
* fix up test
* Add tombstone core partition tests
* review comment
* Need to register the test shard type to make jackson happy
…g-segment retry bug. (apache#15260)
* Fix NPE caused by realtime segment closing race; fix possible missing-segment retry bug.

  Fixes apache#12168 by returning empty from FireHydrant when the segment is swapped to null. This causes the SinkQuerySegmentWalker to use ReportTimelineMissingSegmentQueryRunner, which causes the Broker to look for the segment somewhere else.

  In addition, this patch changes SinkQuerySegmentWalker to acquire references to all hydrants (subsegments of a sink) at once, and return a ReportTimelineMissingSegmentQueryRunner if *any* of them could not be acquired. I suspect, although have not confirmed, that the prior behavior could lead to segments being reported as missing even though results from some hydrants were still included.
* Some more test coverage.
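A minimal sketch of the "all or nothing" acquisition described above (hypothetical types, not the actual SinkQuerySegmentWalker code): if any sub-segment reference cannot be acquired, release the ones already held and signal that the whole segment is missing so the query can be retried elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of acquiring references to all sub-segments at once.
final class AllOrNothingAcquire
{
  interface RefSource<R extends AutoCloseable>
  {
    Optional<R> acquire();
  }

  static <R extends AutoCloseable> Optional<List<R>> acquireAll(List<RefSource<R>> sources) throws Exception
  {
    final List<R> acquired = new ArrayList<>();
    for (RefSource<R> source : sources) {
      final Optional<R> ref = source.acquire();
      if (!ref.isPresent()) {
        for (R held : acquired) {
          held.close();            // release partial acquisitions
        }
        return Optional.empty();   // caller reports the whole segment as missing
      }
      acquired.add(ref.get());
    }
    return Optional.of(acquired);
  }
}
```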
yashdeep97 added a commit to yashdeep97/druid that referenced this pull request on Dec 1, 2023
Co-authored-by: Victoria Lim <[email protected]>
Co-authored-by: George Shiqi Wu <[email protected]>
Co-authored-by: 317brian <[email protected]>
Co-authored-by: ythorat2 <[email protected]>
Co-authored-by: Krishna Anandan <[email protected]>
Co-authored-by: Vadim Ogievetsky <[email protected]>
Co-authored-by: Abhishek Radhakrishnan <[email protected]>
Co-authored-by: Karan Kumar <[email protected]>
Co-authored-by: Rishabh Singh <[email protected]>
Co-authored-by: Magnus Henoch <[email protected]>
Co-authored-by: AmatyaAvadhanula <[email protected]>
Co-authored-by: Charles Smith <[email protected]>
Co-authored-by: Yashdeep Thorat <[email protected]>
Co-authored-by: Atul Mohan <[email protected]>
Co-authored-by: Clint Wylie <[email protected]>
Co-authored-by: Gian Merlino <[email protected]>
Refactor the data management API documentation.
This PR has: