
[run-tests] [improve][monitoring]PIP-231: Add topic_load_failed metric #20

Open. Wants to merge 280 commits into base: master.


tjiuming (Owner)

This PR runs the tests for upstream PR apache#19236.

lhotari and others added 18 commits March 27, 2023 14:32
…ioned-topic stat (apache#19942)

### Motivation

When we call `pulsar-admin topics partitioned-stats`, Pulsar merges the variable `PartitionedTopicStatsImpl.replication[x].connected` as follows:

```java
this.connected = this.connected & other.connected;
```

However, the variable `connected` of `PartitionedTopicStatsImpl.replication` is initialized to `false`, so the expression `this.connected & other.connected` always evaluates to `false`.

As a result, `pulsar-admin topics partitioned-stats` always reports `false`.

### Modifications

Initialize the variable `` of `PartitionedTopicStatsImpl` to `true`.
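A minimal Java sketch of why the initial value matters: `true` is the identity element of logical AND, so a fold that starts from `false` can never report `true`. `ReplicationStatsSketch` and its `aggregate` helper are hypothetical stand-ins, not Pulsar's `PartitionedTopicStatsImpl`; only the merge expression is taken from the description above.

```java
import java.util.List;

// Hypothetical stand-in for one replication entry of PartitionedTopicStatsImpl;
// only the `connected` flag and its merge rule are modeled.
final class ReplicationStatsSketch {
    boolean connected;

    ReplicationStatsSketch(boolean connected) {
        this.connected = connected;
    }

    // Merge rule quoted above: the aggregate is connected only if
    // every merged partition is connected.
    void add(ReplicationStatsSketch other) {
        this.connected = this.connected & other.connected;
    }

    // Folds per-partition flags into one aggregate that starts at `initial`.
    static boolean aggregate(boolean initial, List<Boolean> partitions) {
        ReplicationStatsSketch total = new ReplicationStatsSketch(initial);
        for (boolean partitionConnected : partitions) {
            total.add(new ReplicationStatsSketch(partitionConnected));
        }
        return total.connected;
    }

    public static void main(String[] args) {
        List<Boolean> allConnected = List.of(true, true, true);
        // Old default: starting from false, the result is false no matter what.
        System.out.println(aggregate(false, allConnected)); // false
        // New default: true is the identity of AND, so the partitions decide.
        System.out.println(aggregate(true, allConnected)); // true
        System.out.println(aggregate(true, List.of(true, false, true))); // false
    }
}
```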
…pache#19851)

PIP: apache#16691

### Motivation
Raising a PR to implement apache#16691.

We need to support the delete namespace bundle admin API.

### Modifications

* Support the delete namespace bundle admin API.
* Add unit tests.
Master Issue: apache#16691, apache#18099

### Motivation

Raising a PR to implement apache#16691 and apache#18099.

We want to reduce the unload frequency caused by flaky traffic.

### Modifications
This PR:
- Introduced a config `loadBalancerSheddingConditionHitCountThreshold` to further restrict shedding conditions based on the hit count.
- Normalized offload traffic
- Lowered the default `loadBalanceSheddingDelayInSeconds` value from 600 to 180, because 10 minutes is too long; 3 minutes should be long enough to capture the new load after unloads.
- Changed the config `loadBalancerBundleLoadReportPercentage` to `loadBalancerMaxNumberOfBundlesInBundleLoadReport` to make the top-k bundle count absolute instead of relative.
- Renamed `loadBalancerNamespaceBundleSplitConditionThreshold` to `loadBalancerNamespaceBundleSplitConditionHitCountThreshold` to be consistent with `*ConditionHitCountThreshold`.
- Renamed `loadBalancerMaxNumberOfBrokerTransfersPerCycle` to `loadBalancerMaxNumberOfBrokerSheddingPerCycle`.
- Added LoadDataStore cleanup logic in BSC monitor.
- Added `msgThroughputEMA` in BrokerLoadData to smooth the broker throughput info.
- Sorted the top-k bundles in ascending order (instead of descending).
- Changed some info logs to show only in debug mode.
- Added load data tombstones upon Own, Releasing, and Splitting.
- Added the bundle ownership (isOwned) check upon split and unload.
- Added swap unload logic
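Two of the items above, the hit-count threshold and `msgThroughputEMA`, can be sketched in Java as follows. `HitCountCondition` and `Ema` are hypothetical helpers, not Pulsar classes; requiring the hits to be consecutive, and the smoothing factor `alpha`, are assumptions made for illustration.

```java
// Hypothetical helpers illustrating two ideas from the list above; these are
// not Pulsar's classes and their exact semantics are assumptions.
final class SheddingSmoothingSketch {

    // Gates a raw shedding condition behind a hit-count threshold
    // (cf. loadBalancerSheddingConditionHitCountThreshold). Assumption: hits
    // must be consecutive, and any miss resets the counter.
    static final class HitCountCondition {
        private final int threshold;
        private int hits;

        HitCountCondition(int threshold) {
            this.threshold = threshold;
        }

        boolean update(boolean conditionHolds) {
            if (!conditionHolds) {
                hits = 0;
                return false;
            }
            hits++;
            return hits >= threshold;
        }
    }

    // Exponential moving average, a common way to smooth a noisy series such
    // as broker throughput (cf. msgThroughputEMA). The alpha value is assumed.
    static final class Ema {
        private final double alpha;
        private double value;
        private boolean initialized;

        Ema(double alpha) {
            if (alpha <= 0 || alpha > 1) {
                throw new IllegalArgumentException("alpha must be in (0, 1]");
            }
            this.alpha = alpha;
        }

        double update(double sample) {
            if (!initialized) {
                value = sample; // seed with the first observation
                initialized = true;
            } else {
                value = alpha * sample + (1 - alpha) * value;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        HitCountCondition condition = new HitCountCondition(3);
        // A single spike does not trigger shedding; three hits in a row do.
        System.out.println(condition.update(true));  // false
        System.out.println(condition.update(false)); // false (counter reset)
        System.out.println(condition.update(true));  // false
        System.out.println(condition.update(true));  // false
        System.out.println(condition.update(true));  // true

        Ema throughput = new Ema(0.5);
        System.out.println(throughput.update(100.0)); // 100.0
        System.out.println(throughput.update(0.0));   // 50.0
        System.out.println(throughput.update(0.0));   // 25.0
    }
}
```

With a threshold of 3, an isolated traffic spike no longer triggers shedding, which is the kind of flakiness these changes aim to reduce.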
…9951)

Motivation
Kafka's schema has an "Optional" flag that is used to validate data and allow nulls.
Pulsar's schema does not carry this information, which makes conversion to a Kafka schema lossy.

Modifications
Added a config parameter that lets users force primitive schemas to be optional.
KV schema is always optional.

The default is false, matching the existing behavior.
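The rule above (primitive schemas become optional only when the flag is set, KV schemas always are) can be sketched as follows. `OptionalSchemaMapper`, `PulsarPrimitive`, `KafkaSchemaSpec`, and the Pulsar-to-Kafka type-name mapping are hypothetical; the actual config parameter name is not given in this description.

```java
// Hypothetical sketch of the optional-schema rule; not the connector's actual API.
final class OptionalSchemaMapper {
    // Assumed flag; defaults to false to keep the existing behavior.
    private final boolean forcePrimitivesOptional;

    OptionalSchemaMapper(boolean forcePrimitivesOptional) {
        this.forcePrimitivesOptional = forcePrimitivesOptional;
    }

    // Maps a Pulsar primitive to a Kafka Connect Schema.Type name (assumed mapping).
    static String kafkaTypeOf(PulsarPrimitive p) {
        return switch (p) {
            case BOOLEAN -> "BOOLEAN";
            case INT8 -> "INT8";
            case INT16 -> "INT16";
            case INT32 -> "INT32";
            case INT64 -> "INT64";
            case FLOAT -> "FLOAT32";
            case DOUBLE -> "FLOAT64";
            case STRING -> "STRING";
            case BYTES -> "BYTES";
        };
    }

    // Primitives become optional only when the flag is set.
    KafkaSchemaSpec mapPrimitive(PulsarPrimitive p) {
        return new KafkaSchemaSpec(kafkaTypeOf(p), forcePrimitivesOptional);
    }

    // KeyValue schemas are always optional, regardless of the flag.
    boolean isKeyValueOptional() {
        return true;
    }

    public static void main(String[] args) {
        System.out.println(new OptionalSchemaMapper(false).mapPrimitive(PulsarPrimitive.INT32));
        System.out.println(new OptionalSchemaMapper(true).mapPrimitive(PulsarPrimitive.DOUBLE));
    }
}

// Pulsar primitive schema kinds modeled by this sketch.
enum PulsarPrimitive { BOOLEAN, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BYTES }

// Minimal description of the Kafka-side schema: type name plus Optional flag.
record KafkaSchemaSpec(String kafkaType, boolean optional) {}
```

Keeping the flag off by default means existing conversions produce exactly the same Kafka schemas as before.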
…pic with ProtoBuf schema (apache#19767)

### Motivation
1. There is a topic1 with a Protobuf schema.
2. Create producer1 with the AutoProduceBytes schema.
3. Creating producer1 fails because getting the schema of a Protobuf schema is not supported.

### Modification
Because the Protobuf schema is built on `AvroBaseStructSchema`, we add a way to get the Protobuf schema just like the Avro schema.
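A minimal sketch of the idea, using hypothetical stand-in classes rather than Pulsar's schema types. It dispatches on the shared base class (in Pulsar, `AvroBaseStructSchema`) so that Protobuf takes the same path as Avro; the actual patch may instead add an explicit Protobuf branch.

```java
import java.util.Optional;

// Hypothetical stand-ins; in Pulsar the shared base class is AvroBaseStructSchema.
final class StructSchemaLookup {

    // Both struct schemas expose an Avro-style definition through a common base.
    abstract static class BaseStructSchema {
        abstract String definition();
    }

    static final class AvroLikeSchema extends BaseStructSchema {
        @Override
        String definition() {
            return "{\"type\":\"record\",\"name\":\"FromAvro\"}";
        }
    }

    static final class ProtobufLikeSchema extends BaseStructSchema {
        @Override
        String definition() {
            return "{\"type\":\"record\",\"name\":\"FromProtobuf\"}";
        }
    }

    // Before the fix (modeled): only the Avro class was recognized.
    static Optional<String> lookupBefore(Object schema) {
        if (schema instanceof AvroLikeSchema avro) {
            return Optional.of(avro.definition());
        }
        return Optional.empty();
    }

    // After the fix (modeled): anything built on the shared base is handled.
    static Optional<String> lookupAfter(Object schema) {
        if (schema instanceof BaseStructSchema struct) {
            return Optional.of(struct.definition());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Object protobuf = new ProtobufLikeSchema();
        System.out.println(lookupBefore(protobuf).isPresent()); // false
        System.out.println(lookupAfter(protobuf).isPresent());  // true
    }
}
```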

github-actions bot commented Apr 2, 2023

The PR has had no activity for 30 days; marking it with the Stale label.

github-actions bot added the Stale label Apr 2, 2023
lordcheng10 and others added 9 commits April 3, 2023 12:55
…, when the updateStats method is executed (apache#19887)

Co-authored-by: lordcheng10 <[email protected]>
apache#19990)

### Motivation

While debugging an issue, I noticed that we call `super.exceptionCaught(ctx, cause);` in the `ProxyConnection` class. This leads to the following log line:

> An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer

Because we always handle exceptions, there is no need to forward them to the next handler.

### Modifications

* Remove the single `super.exceptionCaught(ctx, cause)` call
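The effect of the removed call can be modeled without Netty. In Netty, `ChannelInboundHandlerAdapter#exceptionCaught` forwards the event with `ctx.fireExceptionCaught(cause)`, and an exception that reaches the tail of the pipeline is logged as the warning quoted above. The handler classes below are a hypothetical, simplified model of that propagation, not Netty's API.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of exceptionCaught propagation; not Netty's API.
final class ExceptionPropagationSketch {

    abstract static class Handler {
        Handler next; // null means the event reaches the pipeline tail

        // Default mirrors the adapter behavior: forward to the next handler.
        void exceptionCaught(Throwable cause, List<String> log) {
            if (next != null) {
                next.exceptionCaught(cause, log);
            } else {
                // Stands in for the tail's "reached at the tail of the pipeline" warning.
                log.add("WARN tail: " + cause.getMessage());
            }
        }
    }

    // Old behavior: handle the error, then also call super (which forwards it).
    static final class ForwardingHandler extends Handler {
        @Override
        void exceptionCaught(Throwable cause, List<String> log) {
            log.add("handled: " + cause.getMessage());
            super.exceptionCaught(cause, log);
        }
    }

    // New behavior: handle the error and stop propagation.
    static final class TerminalHandler extends Handler {
        @Override
        void exceptionCaught(Throwable cause, List<String> log) {
            log.add("handled: " + cause.getMessage());
        }
    }

    // Fires an exception at the last handler and returns what was logged.
    static List<String> fire(Handler last, Throwable cause) {
        List<String> log = new ArrayList<>();
        last.exceptionCaught(cause, log);
        return log;
    }

    public static void main(String[] args) {
        Throwable reset = new java.io.IOException("Connection reset by peer");
        // [handled: Connection reset by peer, WARN tail: Connection reset by peer]
        System.out.println(fire(new ForwardingHandler(), reset));
        // [handled: Connection reset by peer]
        System.out.println(fire(new TerminalHandler(), reset));
    }
}
```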

### Verifying this change

This is a trivial change. Note that we do not call the super method in any other handler implementations in the project.

### Documentation

- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->

### Matching PR in forked repository

PR in forked repository: skipping PR for this trivial change
lhotari and others added 28 commits May 30, 2023 07:08
…#20436)

- Add `MetadataStoreTest#testExistsDistributed` for distributed metaStore implementations only
- Add `MetadataStoreTest#testGetChildrenDistributed` for distributed metaStore implementations only
### Motivation

Currently, topics/bundles in `pulsar/system` are filtered out during shedding, a behavior mistakenly introduced by PR apache#15252.
However, we need to unload topics/bundles in `pulsar/system` for load balancing.

### Modifications

Do not filter out topics/bundles in `pulsar/system`.
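A minimal sketch of the change as a shedding-candidate filter. `SheddingCandidateFilter` and its predicates are hypothetical, and bundle names are assumed to have the form `tenant/namespace/range`.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch; bundle names are assumed to look like "tenant/namespace/range".
final class SheddingCandidateFilter {
    // Old behavior (modeled): bundles in pulsar/system were skipped.
    static final Predicate<String> SKIP_SYSTEM = bundle -> !bundle.startsWith("pulsar/system/");
    // New behavior: no namespace-based exclusion, so pulsar/system bundles can be unloaded.
    static final Predicate<String> ALLOW_ALL = bundle -> true;

    static List<String> candidates(List<String> bundles, Predicate<String> filter) {
        return bundles.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> bundles = List.of(
                "pulsar/system/0x00000000_0xffffffff",
                "tenant/ns/0x00000000_0x80000000");
        System.out.println(candidates(bundles, SKIP_SYSTEM).size()); // 1
        System.out.println(candidates(bundles, ALLOW_ALL).size());   // 2
    }
}
```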
Technoboy- force-pushed the dev/PIP-231 branch 2 times, most recently from 38e18c3 to 4d1fab7 June 1, 2023 12:34