This project generates random data, indexes it into an Elasticsearch Leader index, and then verifies that the data is replicated to a Follower index. The script supports both Elastic Cloud and self-managed clusters.
- Python 3.x
- Elasticsearch 8.x
- Virtual Environment (recommended)
git clone https://github.com/sajitsasi/ccr-data-gen.git
cd ccr-data-gen
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
########################Leader Cluster########################
# Leader Cluster Elastic Cloud info
Leader_ELASTIC_CLOUD_ID=<Your-Leader-Cloud-ID>
# Leader Cluster self-managed info
Leader_ELASTIC_HOST=<Your-Leader-Host> # Optional if using Cloud ID
Leader_ELASTIC_PORT=9200 # Default port
# Leader Cluster auth info
Leader_ELASTIC_USERNAME=<Your-Leader-Username>
Leader_ELASTIC_PASSWORD=<Your-Leader-Password>
########################Leader Cluster########################
#######################Follower Cluster#######################
# Follower Cluster Elastic Cloud info
Follower_ELASTIC_CLOUD_ID=<Your-Follower-Cloud-ID>
# Follower Cluster self-managed info
Follower_ELASTIC_HOST=<Your-Follower-Host> # Optional if using Cloud ID
Follower_ELASTIC_PORT=9200 # Default port
# Follower Cluster auth info
Follower_ELASTIC_USERNAME=<Your-Follower-Username>
Follower_ELASTIC_PASSWORD=<Your-Follower-Password>
#######################Follower Cluster#######################
########################Index Settings########################
INDEX_NAME=<Your-Index-Name>
EVENTS_PER_SECOND=<Number-of-Events-Per-Second>
COUNT_INDEX_NAME=<Count-Index-Name> # Created on follower Cluster
########################Index Settings########################
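To make the Cloud ID versus host/port choice concrete, here is a minimal sketch of how the prefixed variables above could be turned into client arguments. It is not taken from the script: the helper name `connection_kwargs` is hypothetical, the `https` scheme is an assumption, and it assumes the `.env` values have already been loaded into the environment.

```python
import os


def connection_kwargs(prefix, env=None):
    """Build Elasticsearch client keyword arguments for one cluster.

    `prefix` is "Leader" or "Follower", matching the variable names above.
    This helper is hypothetical and only illustrates the precedence rule:
    a Cloud ID, when set, is used instead of host and port.
    """
    env = os.environ if env is None else env
    kwargs = {
        "basic_auth": (
            env[f"{prefix}_ELASTIC_USERNAME"],
            env[f"{prefix}_ELASTIC_PASSWORD"],
        ),
    }
    cloud_id = env.get(f"{prefix}_ELASTIC_CLOUD_ID")
    if cloud_id:
        kwargs["cloud_id"] = cloud_id
    else:
        host = env[f"{prefix}_ELASTIC_HOST"]
        port = env.get(f"{prefix}_ELASTIC_PORT", "9200")  # default port
        # Assumption: the self-managed cluster is reachable over HTTPS.
        kwargs["hosts"] = [f"https://{host}:{port}"]
    return kwargs
```

With the `elasticsearch` 8.x Python client installed, the result could be passed along as `Elasticsearch(**connection_kwargs("Leader"))`.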
python ./index_ccr_data.py
The script checks whether the Leader index exists. If it does not, the script creates it and prompts the user to set up the corresponding Follower index in the Follower cluster. Note that the index name must be the same in both the Leader and Follower clusters.
Another index, defined under COUNT_INDEX_NAME, is created on the Follower cluster to keep track of document counts in the Leader and Follower clusters.
The generate_document function creates random documents with fields as specified in the code.
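For readers who want a feel for what a random-document generator looks like, here is a small stand-in. The function name `make_random_document` and every field in it are hypothetical; the fields actually produced are the ones defined in the script's own generate_document.

```python
import random
import uuid
from datetime import datetime, timezone


def make_random_document(timestamp=None):
    """Return one random event as a dict (all field names are illustrative)."""
    timestamp = timestamp or datetime.now(timezone.utc)
    return {
        "@timestamp": timestamp.isoformat(),
        "event_id": str(uuid.uuid4()),
        "status_code": random.choice([200, 201, 400, 404, 500]),
        "response_ms": round(random.uniform(1.0, 500.0), 2),
    }
```

Accepting the timestamp as a parameter lets the same generator serve both backfilled and real-time events.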
The script uses multiple threads to index historical data (covering the previous 30 days) into the Leader index.
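One way to spread a 30-day backfill across threads is to split the period into daily windows and hand each window to a thread pool. The sketch below assumes that approach; how the script actually partitions the work is not stated here. `daily_windows`, `backfill`, and the `index_window` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone


def daily_windows(now, days=30):
    """Return (start, end) pairs covering the `days` before `now`, oldest first."""
    start = now - timedelta(days=days)
    return [
        (start + timedelta(days=i), start + timedelta(days=i + 1))
        for i in range(days)
    ]


def backfill(index_window, now=None, days=30, workers=4):
    """Call `index_window(start, end)` for every window on a thread pool.

    `index_window` is a caller-supplied function that indexes the events
    for one window and returns how many documents it wrote.
    """
    now = now or datetime.now(timezone.utc)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(index_window, start, end)
            for start, end in daily_windows(now, days)
        ]
        return sum(future.result() for future in futures)
```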
A separate thread continuously indexes real-time data into the Leader index at the rate set by EVENTS_PER_SECOND.
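Holding a fixed events-per-second rate usually comes down to scheduling each event against a monotonic clock. The following is a minimal sketch of that idea, not the script's implementation: `run_at_rate` is a hypothetical helper, and it stops after `total_events` for demonstration, whereas the script runs continuously.

```python
import time


def run_at_rate(emit, events_per_second, total_events,
                clock=time.monotonic, sleep=time.sleep):
    """Call `emit()` `total_events` times, paced at `events_per_second`.

    Each event is scheduled relative to the start time rather than to the
    previous event, so slow emits do not accumulate drift.
    """
    interval = 1.0 / events_per_second
    next_due = clock()
    for _ in range(total_events):
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
        emit()
        next_due += interval
```

In the real-time case, `emit` would index one generated document into the Leader index.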
Another thread queries the indices in both the Leader and Follower clusters to get the document count in each, as well as the operations lag between the Leader and the Follower.
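As an illustration of the operations lag, the sketch below derives it from a follower stats response (`GET /<follower_index>/_ccr/stats`), which reports a `leader_global_checkpoint` and a `follower_global_checkpoint` for each shard. Whether the script computes the lag this way is an assumption, and `operations_lag` is a hypothetical helper.

```python
def operations_lag(follow_stats):
    """Sum the leader-minus-follower global checkpoint gap over all shards.

    `follow_stats` is the parsed JSON body of a CCR follower stats response.
    """
    lag = 0
    for index in follow_stats.get("indices", []):
        for shard in index.get("shards", []):
            gap = (shard["leader_global_checkpoint"]
                   - shard["follower_global_checkpoint"])
            lag += max(0, gap)  # ignore transient negative gaps
    return lag
```

A lag of 0 means the Follower has caught up with every operation the Leader has acknowledged at the time of the call.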
Note that this has been tested on Elasticsearch 8.14. Please open an issue if you need it to work with an earlier 8.x version.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request.