Helix Cloud Support
Helix is a cluster management framework for partitioned, replicated resources in distributed systems. In the past, it was used mostly in on-premise environments, where companies handle resource deployment, hardware maintenance, security, privacy, and so on themselves. Now that high-performance, low-cost cloud environments are widely available, companies have started moving their software to the cloud. There are several well-known cloud service providers, such as AWS, Azure, and GCP. In a cloud environment, a company can easily scale up or down depending on overall usage, without worrying about provisioning. This trend brings both challenges and opportunities for Helix to better serve systems deployed in the cloud.
This feature focuses on one such opportunity: helping customers automatically register participants with a Helix cluster. Currently, after a Helix cluster is created, there are two ways to add instances (participants) to it. One is manual addition, where customers add each instance config to the cluster themselves. The other is auto join, where customers set the cluster's auto join config to true and each participant populates its own instance config when it connects. However, auto join is fully automatic only in a non-rack-aware environment, that is, one with no fault domain concept. In a rack-aware environment, users still need to enter the domain information into the instance config manually. Considering that most customers would use Helix in a rack-aware environment, it would be beneficial for Helix to provide a fully automatic way for participants to join the cluster.
In an on-premise environment, it is hard for each participant to obtain its own fault domain information. In a cloud environment, however, full automation is feasible, because many cloud providers expose this information to each participant through a metadata endpoint. For example, AWS, Azure, and GCP all use the fixed IP address http://169.254.169.254/ for each instance to retrieve its metadata, which contains the domain information. In AWS, the field is named "placement"; in Azure, "PlatformUpdateDomain"; in GCP, "zone". The value is usually a short identifier, such as an integer, indicating which fault domain the instance belongs to.
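As a rough illustration of the idea, the sketch below pulls a fault domain value out of a metadata document that has already been fetched. The field names come from the paragraph above. The function names `find_field` and `extract_fault_domain`, the provider keys, and the sample payload are hypothetical and are not Helix APIs. Real metadata responses may be shaped differently and may need provider-specific paths and request headers.

```python
import json
from typing import Any, Optional

# Field names as given in the text above; the mapping itself is illustrative.
DOMAIN_FIELD_BY_PROVIDER = {
    "AWS": "placement",
    "AZURE": "PlatformUpdateDomain",
    "GCP": "zone",
}


def find_field(node: Any, field: str) -> Optional[Any]:
    """Depth-first search for the first occurrence of `field` in parsed JSON."""
    if isinstance(node, dict):
        if field in node:
            return node[field]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_field(child, field)
        if found is not None:
            return found
    return None


def extract_fault_domain(provider: str, metadata_json: str) -> Optional[str]:
    """Return the fault domain value for `provider`, or None if it is absent."""
    field = DOMAIN_FIELD_BY_PROVIDER[provider.upper()]
    value = find_field(json.loads(metadata_json), field)
    return None if value is None else str(value)


# Hypothetical payload, shaped only for this example.
sample = '{"compute": {"PlatformUpdateDomain": 2, "name": "host-1"}}'
print(extract_fault_domain("azure", sample))
```

A participant could write a value obtained this way into the domain information of its instance config when it auto joins, which is the manual step that the feature aims to remove.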