Remove fail-fast behaviour on cluster bootstrap when peers discovery fails (#1513)
thampiotr authored Aug 20, 2024
1 parent 6dff731 commit 5f50950
Showing 2 changed files with 7 additions and 4 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -26,6 +26,10 @@ Main (unreleased)
- Updated Snowflake exporter with performance improvements for larger environments.
Also added a new panel to track deleted tables to the Snowflake mixin. (@Caleb-Hurshman)

+- Changed the cluster startup behaviour, reverting to the previous logic where
+a failure to resolve cluster join peers results in the node creating its own cluster. This is
+to facilitate the process of bootstrapping a new cluster following user feedback (@thampiotr)

### Bugfixes

- Fix a bug where custom components don't always get updated when the config is modified in an imported directory. (@ante012)
7 changes: 3 additions & 4 deletions internal/service/cluster/cluster.go
@@ -275,10 +275,9 @@ func (s *Service) Run(ctx context.Context, host service.Host) error {

peers, err := s.getPeers()
if err != nil {
-// Fatal failure on startup if we can't discover peers to prevent a split brain and give a clear signal to the user.
-// NOTE: currently returning error from `Run` will not be handled correctly: https://github.com/grafana/alloy/issues/843
-level.Error(s.log).Log("msg", "fatal error: failed to get peers to join at startup - this is likely a configuration error", "err", err)
-os.Exit(1)
+// Warn when failed to get peers on startup as it can result in a split brain. We do not fail hard here
+// because it would complicate the process of bootstrapping a new cluster.
+level.Warn(s.log).Log("msg", "failed to get peers to join at startup; will create a new cluster", "err", err)
}

// We log on info level including all the peers (without any abbreviation), as it's happening only on startup and
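The comments in this hunk name the trade-off: exiting on a discovery failure guards against a split brain, while continuing makes it easier to bootstrap a new cluster. The toy model below is an illustration only, not a model of Alloy's clustering internals, and counts the resulting clusters and exited nodes under each policy. `startupPolicy`, `failFast`, `warnAndCreate`, and `simulate` are hypothetical names. The model assumes that all nodes whose discovery succeeds end up in one shared cluster, and that on a brand-new deployment discovery can fail simply because no peer is reachable yet.

```go
package main

import "fmt"

// startupPolicy is a hypothetical name for how a node reacts when peer
// discovery fails at startup; it is not an Alloy configuration option.
type startupPolicy int

const (
	failFast      startupPolicy = iota // before this commit: log an error and os.Exit(1)
	warnAndCreate                      // after this commit: log a warning and start a new cluster
)

// simulate is a toy model of a set of nodes starting at the same time.
// discoveryOK[i] reports whether node i resolved its join peers. Nodes that
// resolved peers are assumed to form one shared cluster. It returns how many
// clusters exist after startup and how many nodes exited.
func simulate(p startupPolicy, discoveryOK []bool) (clusters, exited int) {
	shared := false
	for _, ok := range discoveryOK {
		switch {
		case ok:
			shared = true // joins the shared cluster
		case p == failFast:
			exited++ // the process terminates
		default:
			clusters++ // an isolated single-node cluster: the split-brain risk
		}
	}
	if shared {
		clusters++
	}
	return clusters, exited
}

func main() {
	partial := []bool{true, true, false} // one of three nodes fails discovery
	fmt.Println(simulate(failFast, partial))      // 1 1
	fmt.Println(simulate(warnAndCreate, partial)) // 2 0

	fresh := []bool{false} // a brand-new, single-node deployment
	fmt.Println(simulate(failFast, fresh))      // 0 1
	fmt.Println(simulate(warnAndCreate, fresh)) // 1 0
}
```

With one failing node out of three, fail-fast loses that node but keeps a single cluster, whereas warn-and-create keeps every node running at the cost of a second, isolated cluster. For the fresh single-node deployment, fail-fast leaves nothing running, which illustrates, under the stated assumption, why the previous behaviour complicated bootstrapping.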