Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building #15817

Merged
merged 281 commits into from
Apr 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
281 commits
Select commit Hold shift + click to select a range
33a8dd5
Remove enabled field from SegmentMetadataCacheConfig
findingrish Sep 12, 2023
7a7ca55
Add class to manage druid table information in SegmentMetadataCache, …
findingrish Sep 13, 2023
eb6a145
Merge remote-tracking branch 'origin/master' into coordinator_builds_…
findingrish Sep 13, 2023
b9fb83d
Minor refactoring in SegmentMetadataCache
findingrish Sep 13, 2023
aa2bfe7
Make SegmentMetadataCache generic
findingrish Sep 13, 2023
e97dcda
Add a generic abstract class for segment metadata cache
findingrish Sep 13, 2023
7badce1
Rename SegmentMetadataCache to CoordinatorSegmentMetadataCache
findingrish Sep 13, 2023
25cdce6
Rename PhysicalDataSourceMetadataBuilder to PhysicalDataSourceMetadat…
findingrish Sep 13, 2023
5f5ad18
Fix json property key name in DataSourceInformation
findingrish Sep 13, 2023
08e949e
Add validation in MetadataResource#getAllUsedSegments, update javadocs
findingrish Sep 14, 2023
80fc09d
Minor changes
findingrish Sep 14, 2023
4217cd8
Minor change
findingrish Sep 14, 2023
8b7e483
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Sep 14, 2023
d6ac350
Update base property name for query config classes in Coordinator
findingrish Sep 14, 2023
533236b
Log ds schema change when polling from coordinator
findingrish Sep 15, 2023
70f0888
update the logic to determine is_active status in segments table for …
findingrish Sep 15, 2023
a176bfe
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Sep 15, 2023
b32dfd6
Update the logic to set numRows in the sys segments table, add comments
findingrish Sep 15, 2023
17417b5
Rename config druid.coordinator.segmentMetadataCache.enabled to druid…
findingrish Sep 15, 2023
6a395a9
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Sep 18, 2023
907ace3
Report cache init time irrespective of the awaitInitializationOnStart…
findingrish Sep 20, 2023
cf68c38
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Sep 20, 2023
441f37a
Report metric for fetching schema from coordinator
findingrish Sep 20, 2023
bd5b048
Add auth check in api to return dataSourceInformation, report metrics…
findingrish Sep 21, 2023
933d8d1
Fix bug in Coordinator api to return dataSourceInformation
findingrish Sep 21, 2023
9e7e364
Minor change
findingrish Sep 21, 2023
e7356ce
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Sep 22, 2023
5d16148
Address comments around docs, minor renaming
findingrish Sep 23, 2023
d8884be
Remove null check from MetadataResource#getDataSourceInformation
findingrish Sep 23, 2023
0f0805a
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Sep 23, 2023
e129d3e
Install cache module in Coordinator, if feature is enabled and beOver…
findingrish Sep 25, 2023
b4042c6
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Sep 29, 2023
01c27c9
Minor change in QueryableCoordinatorServerView
findingrish Sep 29, 2023
87c9873
Remove QueryableCoordinatorServerView, add a new QuerySegmentWalker i…
findingrish Oct 14, 2023
971b347
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Oct 14, 2023
89d3845
fix build
findingrish Oct 14, 2023
270dbd5
fix build
findingrish Oct 14, 2023
2da23b8
Fix spelling, intellij-inspection, codeql bug
findingrish Oct 14, 2023
6f568a6
undo some changes in CachingClusteredClientTest
findingrish Oct 14, 2023
fe229c0
minor changes
findingrish Oct 14, 2023
473b25c
Fix typo in metric name
findingrish Oct 14, 2023
39fb248
temporarily enable feature on ITs
findingrish Oct 15, 2023
cac695a
fix checkstyle issue
findingrish Oct 15, 2023
eb3e3c1
Changes in CliCoordinator to conditionally add segment metadata cache…
findingrish Oct 15, 2023
30438f4
temporary changes to debug IT failure
findingrish Oct 16, 2023
e88ad00
revert temporary changes in gha
findingrish Oct 16, 2023
61d130b
revert temporary changes to run ITs with this feature
findingrish Oct 16, 2023
2e4c45b
update docs with the config for enabling feature
findingrish Oct 16, 2023
255cf2c
update docs with the config for enabling feature
findingrish Oct 16, 2023
1a6dfc5
Add IT for the feature
findingrish Oct 17, 2023
a961501
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Oct 17, 2023
2e65726
Merge branch 'coordinator_builds_ds_schema' of github.com:findingrish…
findingrish Oct 17, 2023
4b51c42
Changes in BrokerSegmentMetadataCache to poll schema for all the loca…
findingrish Oct 20, 2023
a4e2097
Address review comments
findingrish Oct 26, 2023
3ca03c9
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Oct 26, 2023
902abd3
Run DruidSchemaInternRowSignatureBenchmark using BrokerSegmentMetadat…
findingrish Oct 26, 2023
bcab458
Address feedback
findingrish Oct 26, 2023
32a4065
Simplify logic for setting isRealtime in sys segments table
findingrish Oct 26, 2023
6b04ee7
Remove forbidden api invocation
findingrish Oct 26, 2023
cb93e43
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Oct 26, 2023
152c480
Debug log when coordinator poll fails
findingrish Oct 26, 2023
80c6d26
Fix CoordinatorSegmentMetadataCacheTest
findingrish Oct 27, 2023
9cca98e
Merge remote-tracking branch 'upstream/master' into coordinator_build…
findingrish Oct 28, 2023
bb69cde
Minor changes
findingrish Oct 28, 2023
a604e65
Initial class for persisting and retrieving segment schema
findingrish Nov 6, 2023
f2d8a5e
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Nov 6, 2023
37ac4a7
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Dec 12, 2023
629f3ad
fix conflicts
findingrish Dec 12, 2023
af34c34
Revert changes in SegmentLoadInfo
findingrish Dec 12, 2023
4a4f1b0
Remove segmentSchemaMapping table
findingrish Dec 12, 2023
b142b29
Initial commit
findingrish Dec 15, 2023
035a884
Outline
findingrish Jan 10, 2024
7935349
Revert some changes
findingrish Jan 10, 2024
26ee22f
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Jan 10, 2024
1555592
minor changes
findingrish Jan 10, 2024
ccd2c42
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Jan 10, 2024
900dad5
Minor changes
findingrish Jan 10, 2024
45423fc
Add kill task, update schema persist code
findingrish Jan 14, 2024
33cfb75
Initial task side changes to publish schema
findingrish Jan 15, 2024
6c7e0d5
Changes to publish schema for streaming task
findingrish Jan 21, 2024
85594a7
Update schema table
findingrish Jan 22, 2024
72f671c
Fix build
findingrish Jan 23, 2024
ecc8fc4
Changes to publish realtime segment schema, poll, cache and use it fo…
findingrish Jan 31, 2024
566d09b
Publish schema for batch task
findingrish Feb 6, 2024
c81b90b
Remove design file
findingrish Feb 6, 2024
3fb3da5
Add MinimalSegmentSchemas class to pass segment schema from task to o…
findingrish Feb 8, 2024
8c7bd0e
Fix build
findingrish Feb 9, 2024
9fc3f0f
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Feb 9, 2024
6208486
Revert some changes
findingrish Feb 9, 2024
96ec515
Fix build KillUnusedSegmentsTaskTest
findingrish Feb 9, 2024
3f7e9c0
Fix intellij-inspection failures
findingrish Feb 9, 2024
f9dde90
checkstyle
findingrish Feb 9, 2024
034d1ef
Add more UTs
findingrish Feb 12, 2024
ee303da
Add test to verify changes in CoordinatorSegmentMetadataCache
findingrish Feb 13, 2024
1d1e983
Fix test failures in indexing and server module
findingrish Feb 13, 2024
500c375
Fix checkstyle
findingrish Feb 14, 2024
d9bc9bc
Fix DatasourceOptimizerTest
findingrish Feb 14, 2024
9e89aed
Fix ConcurrentReplaceAndAppendTest
findingrish Feb 14, 2024
6210504
Revert some change
findingrish Feb 14, 2024
a0d2886
Fix KinesisIndexTaskTest
findingrish Feb 14, 2024
88826f7
Test to up the coverage
findingrish Feb 14, 2024
8e7539e
Minor change
findingrish Feb 14, 2024
a3343ff
checkstyle
findingrish Feb 14, 2024
9852f9a
Fix test
findingrish Feb 15, 2024
fbf1189
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Feb 15, 2024
d1f3180
Create schema table and alter segments table only when feature is ena…
findingrish Feb 16, 2024
ffc8e06
Fix test
findingrish Feb 17, 2024
cede729
Run KillUnreferencedSegmentSchemas duty only when the feature is enabled
findingrish Feb 19, 2024
264b19e
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Feb 19, 2024
052d026
Fix sql based ingestion
findingrish Feb 19, 2024
96c05b1
Fix issues in passing feature flag to task
findingrish Feb 20, 2024
17ca156
Temp changes
findingrish Feb 20, 2024
0b49546
Retrigger build
findingrish Feb 20, 2024
5607196
MSQ side changes to publish segment schema
findingrish Feb 21, 2024
ab5c0cf
Fix bug in segment table creation
findingrish Feb 21, 2024
f417abb
checkstyle
findingrish Feb 21, 2024
0ac4c83
Build segment schema before the directory is removed
findingrish Feb 21, 2024
d8cd78e
Remove logging
findingrish Feb 21, 2024
b66a1f0
Fix centralized-datasource-schema IT
findingrish Feb 21, 2024
b1b9529
Temp changes
findingrish Feb 22, 2024
e8a6d9b
run all IT with the feature
findingrish Feb 22, 2024
7cf889d
test changes
findingrish Feb 23, 2024
e0e97ba
Add docs, checkstyle, minor code changes
findingrish Feb 25, 2024
9175e93
Revert changes in integration-tests-ex
findingrish Feb 25, 2024
aa52df0
Revert changes in examples
findingrish Feb 25, 2024
9fdeedd
Revert change in MetadataStorageUpdaterJobSpec
findingrish Feb 25, 2024
fd4e9f6
Revert changes in integration-tests environment configs
findingrish Feb 25, 2024
e86131c
Minor code changes
findingrish Feb 26, 2024
5d55539
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 1, 2024
5a19650
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 5, 2024
348a72c
CoordinatorSegmentMetadataCache refresh is executed only on leader node
findingrish Mar 5, 2024
5e9e58f
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 11, 2024
f2e9488
Add debug logging
findingrish Mar 11, 2024
564e038
Fix build
findingrish Mar 12, 2024
88b0146
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 12, 2024
2ad6639
Fix build
findingrish Mar 12, 2024
c7980b0
Add exception handling in /druid/coordinator/v1/metadata/segments api
findingrish Mar 14, 2024
39f9328
Fix bug where existing schema is not being associated with segment
findingrish Mar 14, 2024
924268d
checkstyle
findingrish Mar 14, 2024
e14ab29
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 14, 2024
ca33d8b
Fix build
findingrish Mar 14, 2024
cfece06
Fix build
findingrish Mar 14, 2024
e103336
Fix IndexerSQLMetadataStorageCoordinatorTest#testCommitReplaceSegment…
findingrish Mar 14, 2024
9875c44
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 14, 2024
634ced2
Fix DataSegmentPlusTest failure
findingrish Mar 14, 2024
43f1850
Add logs when schemaId or numRows is null in the cache
findingrish Mar 14, 2024
12e36d0
Fix: Schedule schemaBackfillExecutor at fixed rate
findingrish Mar 15, 2024
c65b2f2
Fix schema generation for IndexTask
findingrish Mar 18, 2024
b6b578b
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 18, 2024
8d962da
Add test config to disable segmentMetadataQuery in Coordinator and sc…
findingrish Mar 19, 2024
2723796
Update schema poll logic: Poll schema id greater than max schema id p…
findingrish Mar 19, 2024
5969c60
Merge branch 'master' into coordinator_schema_read_write
findingrish Mar 19, 2024
78d760b
Add 2 new IT profiles to test CentralizedDatasourceSchema feature
findingrish Mar 19, 2024
0fa4f8d
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 19, 2024
bf20a25
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 19, 2024
9f78e1d
Use AtomicReference for latestSchemaId, handle potential race with th…
findingrish Mar 21, 2024
8816298
Fix bug where segment schema is cleared when segment is loaded on mul…
findingrish Mar 21, 2024
ac77102
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 21, 2024
21a7320
Replace pair of segments and schema with SegmentAndSchemas, SegmentAn…
findingrish Mar 21, 2024
0963fca
Fix IndexerSqlMetadataStorageCoordinatorSchemaPersistenceTest#testCom…
findingrish Mar 21, 2024
80c88de
Add test to verify the behaviour when segments is present on multiple…
findingrish Mar 21, 2024
f783636
Update CoordinatorSegmentMetadataCache startup logic
findingrish Mar 22, 2024
0786a6a
Fix test
findingrish Mar 22, 2024
3415418
Run CoordinatorSegmentMetadataCache#refresh only on leader node
findingrish Mar 25, 2024
873f373
Trigger datasource schema polling from Coordinator in each refresh cy…
findingrish Mar 25, 2024
e3736af
Add test for ParallelIndexSupervisorTask and CompactionTask. Update s…
findingrish Mar 26, 2024
f4fb2ec
More task side UTs
findingrish Mar 26, 2024
b11ec29
Add custom metrics
findingrish Mar 27, 2024
52e7c05
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 27, 2024
9a7b1b6
Use DruidLeaderSelector instead of DruidCoordinator in CliCoordinator…
findingrish Mar 28, 2024
249313a
Minor changes
findingrish Mar 28, 2024
5acb93d
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Mar 28, 2024
b0ab4d9
Check if constraint exists before altering table
findingrish Apr 1, 2024
1d1c803
Add config to disable foreign key creation
findingrish Apr 3, 2024
cf7901a
Revert "Add config to disable foreign key creation"
findingrish Apr 4, 2024
5f729f6
Revert "Check if constraint exists before altering table"
findingrish Apr 4, 2024
e883918
Add datasource to the schema table, refactor schema kill logic
findingrish Apr 11, 2024
aa1ab56
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Apr 11, 2024
864bc91
Review comments
findingrish Apr 12, 2024
a582417
Minor change
findingrish Apr 12, 2024
6fb906f
Review comments
findingrish Apr 12, 2024
43111e7
Validate CentralizedDatasourceSchemaConfig in broker, coordinator, hi…
findingrish Apr 12, 2024
55cf345
Fix build
findingrish Apr 12, 2024
2b97164
Add version information in schema table
findingrish Apr 14, 2024
03c0c94
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Apr 14, 2024
08d0498
Fix build
findingrish Apr 14, 2024
1b4497d
Fix test failures
findingrish Apr 14, 2024
ff76aeb
codeql comments
findingrish Apr 15, 2024
5c181f6
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Apr 15, 2024
dd46989
Revert msq changes
findingrish Apr 15, 2024
bc0e4e3
Fix build in ControllerImpl
findingrish Apr 15, 2024
b54bcac
Fix ControllerImplTest
findingrish Apr 15, 2024
29107b9
Refactor createSegmentTable code
findingrish Apr 15, 2024
f8f8a97
Update FingerprintGenerator to use Hex#encodeHexString for converting…
findingrish Apr 15, 2024
8a75693
Change schema version type to integer
findingrish Apr 16, 2024
a768dba
Minor code restructuring in IndexerSQlMetadataStorageCoordinator
findingrish Apr 16, 2024
6e333a0
Remove defensive check in SegmentSchemaManager
findingrish Apr 16, 2024
8548f2a
Refactor schema poll lambda in SqlSegmentsMetadataManager
findingrish Apr 16, 2024
edec5e7
Add docs, some nit code changes
findingrish Apr 17, 2024
e4d525d
Minor changes in SegmentSchemaBackFillQueue
findingrish Apr 17, 2024
86cc24c
Move validation logic to ServerRunnable
findingrish Apr 17, 2024
0f7df79
MinimalSegmentSchemas refactoring and other minor changes
findingrish Apr 17, 2024
43b6b96
Rename MinimalSegmentSchemas to SegmentSchemaMapping
findingrish Apr 17, 2024
b57f987
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Apr 17, 2024
08f1c15
Refactor kill schema duty
findingrish Apr 18, 2024
1c66e5d
Minor change
findingrish Apr 18, 2024
86d2d90
Minor change
findingrish Apr 18, 2024
e629f9a
Merge branch 'coordinator_schema_read_write' of github.com:findingris…
findingrish Apr 18, 2024
77b181f
codeql comments
findingrish Apr 18, 2024
bf37982
Rename DataSegmentWithSchemas to DataSegmentsWithSchemas
findingrish Apr 18, 2024
4efe3fb
Update docs
findingrish Apr 18, 2024
7ac88c4
Persist aggregatorFactories in inTransitSMQResult cache
findingrish Apr 18, 2024
5382669
Use schema fingerprint in segments table
findingrish Apr 19, 2024
f3b1711
Add test for changes in ServerRunnable
findingrish Apr 19, 2024
e0db66c
Minor code changes
findingrish Apr 19, 2024
8c29649
Fix SqlMetadataConnectorTest
findingrish Apr 19, 2024
158f124
Refactor code in IndexerSQLMetadataStorageCoordinator
findingrish Apr 19, 2024
5e6dff5
Some more refactoring
findingrish Apr 19, 2024
fdc46cc
Some more refactoring
findingrish Apr 19, 2024
e9dbe3c
Minor change
findingrish Apr 19, 2024
f3545ac
Fix test failure
findingrish Apr 19, 2024
6138338
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Apr 19, 2024
aba03e0
Fix bug in IndexerSQLMetadataStorageCoordinator#commitReplaceSegments
findingrish Apr 19, 2024
55c4ed4
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Apr 21, 2024
28359e8
Update docs
findingrish Apr 21, 2024
7886412
Update segment schema cache metrics
findingrish Apr 21, 2024
14efbac
Temporary change
findingrish Apr 21, 2024
ca571ee
Change schemaMap in SegmentSchemaCache from ConcurrentMap to Immutabl…
findingrish Apr 21, 2024
4277760
Update docs
findingrish Apr 21, 2024
5e7a6a6
Update test
findingrish Apr 21, 2024
86351ae
Fix checkstyle
findingrish Apr 21, 2024
a5827a6
Add test for FingerprintGenerator
findingrish Apr 22, 2024
5383967
Add config validation in CliPeon
findingrish Apr 22, 2024
5bf86be
Initialize schema cache
findingrish Apr 22, 2024
a07eccf
Enable feature in IngestionTestBase
findingrish Apr 22, 2024
0023d24
Fix SqlSegmentsMetadataManagerSchemaPollTest
findingrish Apr 22, 2024
85208f2
Minor code changes
findingrish Apr 22, 2024
11e5ca7
Poll schema for all datasources in the inventory in Broker
findingrish Apr 22, 2024
e78fb65
Tests for coverage
findingrish Apr 23, 2024
bb7e98b
Minor change in SegmentSchemaCache
findingrish Apr 23, 2024
ae43a83
Tests to meet coverage
findingrish Apr 23, 2024
26a6472
Expose a method in AbstractSegmentMetadataCache to fetch aggregators …
findingrish Apr 23, 2024
e65ff4d
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Apr 23, 2024
a05043b
Add a note
findingrish Apr 23, 2024
c44b289
nit changes
findingrish Apr 23, 2024
9033a46
Rename some methods, add null check when accessing cacheExecFuture
findingrish Apr 23, 2024
7e349ce
Enable CentralizedDatasourceSchema config in ParallelIndexSupervisorT…
findingrish Apr 23, 2024
472bce9
Encapsulate finalizedSegmentStats & finalizedSegmentSchema in a class…
findingrish Apr 23, 2024
bc1e131
Fix potential npe while accessing segmentMetadataInfo
findingrish Apr 24, 2024
4adfc77
Merge remote-tracking branch 'upstream/master' into coordinator_schem…
findingrish Apr 24, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/workflows/standard-its.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ jobs:
strategy:
fail-fast: false
matrix:
testing_group: [batch-index, input-format, input-source, perfect-rollup-parallel-batch-index, kafka-index, kafka-index-slow, kafka-transactional-index, kafka-transactional-index-slow, kafka-data-format, ldap-security, realtime-index, append-ingestion, compaction]
testing_group: [batch-index, input-format, input-source, perfect-rollup-parallel-batch-index, kafka-index, kafka-index-slow, kafka-transactional-index, kafka-transactional-index-slow, kafka-data-format, ldap-security, realtime-index, append-ingestion, compaction, cds-task-schema-publish-disabled, cds-coordinator-smq-disabled]
uses: ./.github/workflows/reusable-standard-its.yml
if: ${{ needs.changes.outputs.core == 'true' || needs.changes.outputs.common-extensions == 'true' }}
with:
Expand Down Expand Up @@ -196,6 +196,6 @@ jobs:
with:
build_jdk: 8
runtime_jdk: 8
testing_groups: -DexcludedGroups=batch-index,input-format,input-source,perfect-rollup-parallel-batch-index,kafka-index,query,query-retry,query-error,realtime-index,security,ldap-security,s3-deep-storage,gcs-deep-storage,azure-deep-storage,hdfs-deep-storage,s3-ingestion,kinesis-index,kinesis-data-format,kafka-transactional-index,kafka-index-slow,kafka-transactional-index-slow,kafka-data-format,hadoop-s3-to-s3-deep-storage,hadoop-s3-to-hdfs-deep-storage,hadoop-azure-to-azure-deep-storage,hadoop-azure-to-hdfs-deep-storage,hadoop-gcs-to-gcs-deep-storage,hadoop-gcs-to-hdfs-deep-storage,aliyun-oss-deep-storage,append-ingestion,compaction,high-availability,upgrade,shuffle-deep-store,custom-coordinator-duties,centralized-datasource-schema
testing_groups: -DexcludedGroups=batch-index,input-format,input-source,perfect-rollup-parallel-batch-index,kafka-index,query,query-retry,query-error,realtime-index,security,ldap-security,s3-deep-storage,gcs-deep-storage,azure-deep-storage,hdfs-deep-storage,s3-ingestion,kinesis-index,kinesis-data-format,kafka-transactional-index,kafka-index-slow,kafka-transactional-index-slow,kafka-data-format,hadoop-s3-to-s3-deep-storage,hadoop-s3-to-hdfs-deep-storage,hadoop-azure-to-azure-deep-storage,hadoop-azure-to-hdfs-deep-storage,hadoop-gcs-to-gcs-deep-storage,hadoop-gcs-to-hdfs-deep-storage,aliyun-oss-deep-storage,append-ingestion,compaction,high-availability,upgrade,shuffle-deep-store,custom-coordinator-duties,centralized-datasource-schema,cds-task-schema-publish-disabled,cds-coordinator-smq-disabled
use_indexer: ${{ matrix.indexer }}
group: other
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
import org.apache.druid.query.metadata.metadata.SegmentAnalysis;
import org.apache.druid.segment.column.ColumnType;
import org.apache.druid.segment.join.JoinableFactory;
import org.apache.druid.segment.metadata.CentralizedDatasourceSchemaConfig;
import org.apache.druid.server.QueryLifecycleFactory;
import org.apache.druid.server.SegmentManager;
import org.apache.druid.server.coordination.DruidServerMetadata;
Expand Down Expand Up @@ -91,7 +92,8 @@ public SegmentMetadataCacheForBenchmark(
brokerInternalQueryConfig,
new NoopServiceEmitter(),
new PhysicalDatasourceMetadataFactory(joinableFactory, segmentManager),
new NoopCoordinatorClient()
new NoopCoordinatorClient(),
CentralizedDatasourceSchemaConfig.create()
);
}

Expand Down
3 changes: 2 additions & 1 deletion docs/configuration/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -878,7 +878,7 @@ These Coordinator static configurations can be defined in the `coordinator/runti
|`druid.coordinator.loadqueuepeon.repeatDelay`|The start and repeat delay for the `loadqueuepeon`, which manages the load and drop of segments.|`PT0.050S` (50 ms)|
|`druid.coordinator.asOverlord.enabled`|Boolean value for whether this Coordinator service should act like an Overlord as well. This configuration allows users to simplify a Druid cluster by not having to deploy any standalone Overlord services. If set to true, then Overlord console is available at `http://coordinator-host:port/console.html` and be sure to set `druid.coordinator.asOverlord.overlordService` also.|false|
|`druid.coordinator.asOverlord.overlordService`| Required, if `druid.coordinator.asOverlord.enabled` is `true`. This must be same value as `druid.service` on standalone Overlord services and `druid.selectors.indexing.serviceName` on Middle Managers.|NULL|
|`druid.centralizedDatasourceSchema.enabled`|Boolean flag for enabling datasource schema building on the Coordinator.|false|
|`druid.centralizedDatasourceSchema.enabled`|Boolean flag for enabling datasource schema building on the Coordinator. Note, when using MiddleManager to launch task, set `druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled` in MiddleManager runtime config. |false|

##### Metadata management

Expand Down Expand Up @@ -1435,6 +1435,7 @@ MiddleManagers pass their configurations down to their child peons. The MiddleMa
|`druid.worker.baseTaskDirs`|List of base temporary working directories, one of which is assigned per task in a round-robin fashion. This property can be used to allow usage of multiple disks for indexing. This property is recommended in place of and takes precedence over `${druid.indexer.task.baseTaskDir}`. If this configuration is not set, `${druid.indexer.task.baseTaskDir}` is used. For example, `druid.worker.baseTaskDirs=[\"PATH1\",\"PATH2\",...]`.|null|
|`druid.worker.baseTaskDirSize`|The total amount of bytes that can be used by tasks on any single task dir. This value is treated symmetrically across all directories, that is, if this is 500 GB and there are 3 `baseTaskDirs`, then each of those task directories is assumed to allow for 500 GB to be used and a total of 1.5 TB will potentially be available across all tasks. The actual amount of memory assigned to each task is discussed in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes)|`Long.MAX_VALUE`|
|`druid.worker.category`|A string to name the category that the MiddleManager node belongs to.|`_default_worker_category`|
|`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This config should be set when CentralizedDatasourceSchema feature is enabled. |false|
Copy link
Contributor

@kfaraz kfaraz Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For follow-up PR:
The config description should be more like Indicates whether centralized schema management is enabled. The description should also link to the page which contains the details of the feature.


#### Peon processing

Expand Down
6 changes: 6 additions & 0 deletions docs/operations/metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,12 @@ Most metric values reset each emission period, as specified in `druid.monitoring
|`metadatacache/schemaPoll/count`|Number of coordinator polls to fetch datasource schema.||
|`metadatacache/schemaPoll/failed`|Number of failed coordinator polls to fetch datasource schema.||
|`metadatacache/schemaPoll/time`|Time taken for coordinator polls to fetch datasource schema.||
|`metadatacache/backfill/count`|Number of segments for which schema was back filled in the database.|`dataSource`|
|`schemacache/realtime/count`|Number of realtime segments for which schema is cached.||Depends on the number of realtime segments.|
Copy link
Contributor

@kfaraz kfaraz Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For follow-up PR
Do these rows render correctly? The preceding rows have only 3 columns, this one seems to have 4.

|`schemacache/finalizedSegmentMetadata/count`|Number of finalized segments for which schema metadata is cached.||Depends on the number of segments in the cluster.|
|`schemacache/finalizedSchemaPayload/count`|Number of finalized segment schema cached.||Depends on the number of distinct schema in the cluster.|
|`schemacache/inTransitSMQResults/count`|Number of segments for which schema was fetched by executing segment metadata query.||Eventually it should be 0.|
|`schemacache/inTransitSMQPublishedResults/count`|Number of segments for which schema is cached after back filling in the database.||Eventually it should be 0.|
Comment on lines +80 to +83
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For follow-up PR:
Is schemacache/ not the same as metadatacache? The similar yet different names can be confusing.

|`serverview/sync/healthy`|Sync status of the Broker with a segment-loading server such as a Historical or Peon. Emitted only when [HTTP-based server view](../configuration/index.md#segment-management) is enabled. This metric can be used in conjunction with `serverview/sync/unstableTime` to debug slow startup of Brokers.|`server`, `tier`|1 for fully synced servers, 0 otherwise|
|`serverview/sync/unstableTime`|Time in milliseconds for which the Broker has been failing to sync with a segment-loading server. Emitted only when [HTTP-based server view](../configuration/index.md#segment-management) is enabled.|`server`, `tier`|Not emitted for synced servers.|
|`subquery/rowLimit/count`|Number of subqueries whose results are materialized as rows (Java objects on heap).|This metric is only available if the `SubqueryCountStatsMonitor` module is included.| |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,8 @@
import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
import org.apache.druid.segment.TestHelper;
import org.apache.druid.segment.indexing.DataSchema;
import org.apache.druid.segment.metadata.CentralizedDatasourceSchemaConfig;
import org.apache.druid.segment.metadata.SegmentSchemaManager;
import org.apache.druid.segment.realtime.firehose.ChatHandlerProvider;
import org.apache.druid.segment.transform.TransformSpec;
import org.apache.druid.server.security.AuthorizerMapper;
Expand Down Expand Up @@ -88,19 +90,28 @@ public class MaterializedViewSupervisorTest
private String derivativeDatasourceName;
private MaterializedViewSupervisorSpec spec;
private final ObjectMapper objectMapper = TestHelper.makeJsonMapper();
private SegmentSchemaManager segmentSchemaManager;

@Before
public void setUp()
{
TestDerbyConnector derbyConnector = derbyConnectorRule.getConnector();
derbyConnector.createDataSourceTable();
derbyConnector.createSegmentSchemasTable();
derbyConnector.createSegmentTable();
taskStorage = EasyMock.createMock(TaskStorage.class);
taskMaster = EasyMock.createMock(TaskMaster.class);
segmentSchemaManager = new SegmentSchemaManager(
derbyConnectorRule.metadataTablesConfigSupplier().get(),
objectMapper,
derbyConnector
);
indexerMetadataStorageCoordinator = new IndexerSQLMetadataStorageCoordinator(
objectMapper,
derbyConnectorRule.metadataTablesConfigSupplier().get(),
derbyConnector
derbyConnector,
segmentSchemaManager,
CentralizedDatasourceSchemaConfig.create()
);
metadataSupervisorManager = EasyMock.createMock(MetadataSupervisorManager.class);
sqlSegmentsMetadataManager = EasyMock.createMock(SqlSegmentsMetadataManager.class);
Expand Down Expand Up @@ -142,8 +153,8 @@ public void testCheckSegments() throws IOException
final Interval day1 = baseSegments.get(0).getInterval();
final Interval day2 = new Interval(day1.getStart().plusDays(1), day1.getEnd().plusDays(1));

indexerMetadataStorageCoordinator.commitSegments(new HashSet<>(baseSegments));
indexerMetadataStorageCoordinator.commitSegments(derivativeSegments);
indexerMetadataStorageCoordinator.commitSegments(new HashSet<>(baseSegments), null);
indexerMetadataStorageCoordinator.commitSegments(derivativeSegments, null);
EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
EasyMock.expect(taskStorage.getActiveTasks()).andReturn(ImmutableList.of()).anyTimes();
Expand All @@ -165,8 +176,8 @@ public void testSubmitTasksDoesNotFailIfTaskAlreadyExists() throws IOException
Set<DataSegment> baseSegments = Sets.newHashSet(createBaseSegments());
Set<DataSegment> derivativeSegments = Sets.newHashSet(createDerivativeSegments());

indexerMetadataStorageCoordinator.commitSegments(baseSegments);
indexerMetadataStorageCoordinator.commitSegments(derivativeSegments);
indexerMetadataStorageCoordinator.commitSegments(baseSegments, null);
indexerMetadataStorageCoordinator.commitSegments(derivativeSegments, null);
EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
EasyMock.expect(taskStorage.getActiveTasks()).andReturn(ImmutableList.of()).anyTimes();
Expand All @@ -187,8 +198,8 @@ public void testSubmitTasksFailsIfTaskCannotBeAdded() throws IOException
Set<DataSegment> baseSegments = Sets.newHashSet(createBaseSegments());
Set<DataSegment> derivativeSegments = Sets.newHashSet(createDerivativeSegments());

indexerMetadataStorageCoordinator.commitSegments(baseSegments);
indexerMetadataStorageCoordinator.commitSegments(derivativeSegments);
indexerMetadataStorageCoordinator.commitSegments(baseSegments, null);
indexerMetadataStorageCoordinator.commitSegments(derivativeSegments, null);
EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
EasyMock.expect(taskStorage.getActiveTasks()).andReturn(ImmutableList.of()).anyTimes();
Expand All @@ -211,7 +222,7 @@ public void testSubmitTasksFailsIfTaskCannotBeAdded() throws IOException
public void testCheckSegmentsAndSubmitTasks() throws IOException
{
Set<DataSegment> baseSegments = Collections.singleton(createBaseSegments().get(0));
indexerMetadataStorageCoordinator.commitSegments(baseSegments);
indexerMetadataStorageCoordinator.commitSegments(baseSegments, null);
EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
EasyMock.expect(taskStorage.getActiveTasks()).andReturn(ImmutableList.of()).anyTimes();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,8 @@
import org.apache.druid.query.topn.TopNQuery;
import org.apache.druid.query.topn.TopNQueryBuilder;
import org.apache.druid.segment.TestHelper;
import org.apache.druid.segment.metadata.CentralizedDatasourceSchemaConfig;
import org.apache.druid.segment.metadata.SegmentSchemaManager;
import org.apache.druid.segment.realtime.appenderator.SegmentSchemas;
import org.apache.druid.server.coordination.DruidServerMetadata;
import org.apache.druid.server.coordination.ServerType;
Expand Down Expand Up @@ -89,12 +91,14 @@ public class DatasourceOptimizerTest extends CuratorTestBase
private IndexerSQLMetadataStorageCoordinator metadataStorageCoordinator;
private BatchServerInventoryView baseView;
private BrokerServerView brokerServerView;
private SegmentSchemaManager segmentSchemaManager;

@Before
public void setUp() throws Exception
{
TestDerbyConnector derbyConnector = derbyConnectorRule.getConnector();
derbyConnector.createDataSourceTable();
derbyConnector.createSegmentSchemasTable();
derbyConnector.createSegmentTable();
MaterializedViewConfig viewConfig = new MaterializedViewConfig();
jsonMapper = TestHelper.makeJsonMapper();
Expand All @@ -106,10 +110,18 @@ public void setUp() throws Exception
jsonMapper,
derbyConnector
);
segmentSchemaManager = new SegmentSchemaManager(
derbyConnectorRule.metadataTablesConfigSupplier().get(),
jsonMapper,
derbyConnector
);

metadataStorageCoordinator = new IndexerSQLMetadataStorageCoordinator(
jsonMapper,
derbyConnectorRule.metadataTablesConfigSupplier().get(),
derbyConnector
derbyConnector,
segmentSchemaManager,
CentralizedDatasourceSchemaConfig.create()
);

setupServerAndCurator();
Expand Down Expand Up @@ -167,7 +179,7 @@ public void testOptimize() throws InterruptedException
1024 * 1024
);
try {
metadataStorageCoordinator.commitSegments(Sets.newHashSet(segment));
metadataStorageCoordinator.commitSegments(Sets.newHashSet(segment), null);
announceSegmentForServer(druidServer, segment, zkPathsConfig, jsonMapper);
}
catch (IOException e) {
Expand All @@ -192,7 +204,7 @@ public void testOptimize() throws InterruptedException
1024
);
try {
metadataStorageCoordinator.commitSegments(Sets.newHashSet(segment));
metadataStorageCoordinator.commitSegments(Sets.newHashSet(segment), null);
announceSegmentForServer(druidServer, segment, zkPathsConfig, jsonMapper);
}
catch (IOException e) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
import org.apache.druid.metadata.MetadataStorageConnectorConfig;
import org.apache.druid.metadata.MetadataStorageTablesConfig;
import org.apache.druid.metadata.SQLMetadataConnector;
import org.apache.druid.segment.metadata.CentralizedDatasourceSchemaConfig;
import org.skife.jdbi.v2.Binding;
import org.skife.jdbi.v2.ColonPrefixNamedParamStatementRewriter;
import org.skife.jdbi.v2.DBI;
Expand Down Expand Up @@ -133,9 +134,13 @@ public class SQLServerConnector extends SQLMetadataConnector
));

@Inject
public SQLServerConnector(Supplier<MetadataStorageConnectorConfig> config, Supplier<MetadataStorageTablesConfig> dbTables)
public SQLServerConnector(
Supplier<MetadataStorageConnectorConfig> config,
Supplier<MetadataStorageTablesConfig> dbTables,
CentralizedDatasourceSchemaConfig centralizedDatasourceSchemaConfig
)
{
super(config, dbTables);
super(config, dbTables, centralizedDatasourceSchemaConfig);

final BasicDataSource datasource = getDatasource();
datasource.setDriverClassLoader(getClass().getClassLoader());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
import com.google.common.base.Suppliers;
import org.apache.druid.metadata.MetadataStorageConnectorConfig;
import org.apache.druid.metadata.MetadataStorageTablesConfig;
import org.apache.druid.segment.metadata.CentralizedDatasourceSchemaConfig;
import org.junit.Assert;
import org.junit.Test;

Expand All @@ -38,7 +39,8 @@ public void testIsTransientException()
Suppliers.ofInstance(new MetadataStorageConnectorConfig()),
Suppliers.ofInstance(
MetadataStorageTablesConfig.fromBase(null)
)
),
CentralizedDatasourceSchemaConfig.create()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be test case which test the schema is created only when the config is set.

);

Assert.assertTrue(connector.isTransientException(new SQLException("Resource Failure!", "08DIE")));
Expand All @@ -59,7 +61,8 @@ public void testLimitClause()
Suppliers.ofInstance(new MetadataStorageConnectorConfig()),
Suppliers.ofInstance(
MetadataStorageTablesConfig.fromBase(null)
)
),
CentralizedDatasourceSchemaConfig.create()
);
Assert.assertEquals("FETCH NEXT 100 ROWS ONLY", connector.limitClause(100));
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1584,9 +1584,9 @@ private static TaskAction<SegmentPublishResult> createAppendAction(
)
{
if (taskLockType.equals(TaskLockType.APPEND)) {
return SegmentTransactionalAppendAction.forSegments(segments);
return SegmentTransactionalAppendAction.forSegments(segments, null);
} else if (taskLockType.equals(TaskLockType.SHARED)) {
return SegmentTransactionalInsertAction.appendAction(segments, null, null);
return SegmentTransactionalInsertAction.appendAction(segments, null, null, null);
} else {
throw DruidException.defensive("Invalid lock type [%s] received for append action", taskLockType);
}
Expand All @@ -1598,9 +1598,9 @@ private TaskAction<SegmentPublishResult> createOverwriteAction(
)
{
if (taskLockType.equals(TaskLockType.REPLACE)) {
return SegmentTransactionalReplaceAction.create(segmentsWithTombstones);
return SegmentTransactionalReplaceAction.create(segmentsWithTombstones, null);
} else if (taskLockType.equals(TaskLockType.EXCLUSIVE)) {
return SegmentTransactionalInsertAction.overwriteAction(null, segmentsWithTombstones);
return SegmentTransactionalInsertAction.overwriteAction(null, segmentsWithTombstones, null);
} else {
throw DruidException.defensive("Invalid lock type [%s] received for overwrite action", taskLockType);
}
Expand Down
Loading
Loading