nodetool status not showing dynamic pods joining #2

Open
screeley44 opened this issue Dec 3, 2015 · 6 comments

Comments

@screeley44

@vyshane - Hello, I'm experimenting with your examples and everything seems to run fine. I have an OpenShift 3.1 cluster running a master plus one node, with a Gluster cluster on the backend for Persistent Volume support.

I created the peer service, the service and the rc, and my pods run. I'm using a GlusterFS volume for data persistence, and the data persists across multiple restarts of the pods/rc. But when I scale, I don't see the new pods join the C* ring, and I'm not sure what I'm missing. I don't have a ton of experience with Kubernetes or Cassandra, but from each container I can ping cassandra-peer (the peer service), so I know they are able to connect.

Do I need to change my PEER_DISCOVERY_DOMAIN, or is it something else?

      env:
        # Feel free to change the following:
        - name: CASSANDRA_CLUSTER_NAME
          value: Test Cluster
        - name: CASSANDRA_DC
          value: datacenter1
        - name: CASSANDRA_RACK
          value: rack1
        - name: CASSANDRA_ENDPOINT_SNITCH
          value: GossipingPropertyFileSnitch

        # The peer discovery domain needs to point to the Cassandra peer service
        - name: PEER_DISCOVERY_DOMAIN
          value: cassandra-peers.default.cluster.local.
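
For readers new to this setup: the comment in the config says PEER_DISCOVERY_DOMAIN must point at the Cassandra peer service, so the name is expected to resolve to the IP addresses of the Cassandra pods. A minimal Python sketch of such a lookup follows. It is an illustration only, not the image's actual startup code; the function name `discover_peers` and the injectable `resolver` parameter are assumptions made here so the logic can be tried without a cluster.

```python
import socket


def discover_peers(domain, resolver=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses behind `domain`, or [] if it does not resolve.

    `resolver` is a hypothetical hook (defaulting to the standard library lookup)
    so the function can be exercised without real DNS.
    """
    try:
        _name, _aliases, addresses = resolver(domain)
    except (socket.gaierror, socket.herror):
        # NXDOMAIN, SERVFAIL and similar failures end up here.
        return []
    return sorted(addresses)


if __name__ == "__main__":
    # Assumed value, copied from the env block above.
    print(discover_peers("cassandra-peers.default.cluster.local."))
```

An empty list from this lookup means a new pod has no addresses to contact, which is consistent with a node that starts up alone in its own ring.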

Some output from oc (kubectl for OpenShift):
[root@ose1 cassandra-custom]# oc get pods
NAME              READY     STATUS    RESTARTS   AGE
cassandra-vfujv   1/1       Running   0          59s
cassandra-x36ay   1/1       Running   0          1m
[root@ose1 cassandra-custom]# oc exec -it cassandra-x36ay -- nodetool status testspace

Datacenter: datacenter1

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.1.0.32  176.43 KB  256     100.0%            03b19bd1-ce65-4525-89e7-b23c9b3f0a92  rack1

@vyshane
Owner

vyshane commented Dec 7, 2015

@screeley44 Do you get a list of IP addresses when you run dig cassandra-peers.default.cluster.local from a Cassandra container?

What's the output of oc get namespaces? It's possible that OpenShift uses a different namespace from default.cluster.local.

@screeley44
Author

@vyshane - I'm using the default namespace (also referred to as project for OSE):

[root@ose1 usr_configs]# oc get namespaces
NAME              LABELS    STATUS    AGE
default                     Active    18d
openshift                   Active    18d
openshift-infra             Active    18d

My dig is not returning the IP addresses of the containers:

root@cassandra-vfujv:/etc# dig $PEER_DISCOVERY_DOMAIN

; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> cassandra-peers.default.cluster.local.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 32805
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;cassandra-peers.default.cluster.local. IN A

;; AUTHORITY SECTION:
cluster.local. 60 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. 1449158400 28800 7200 604800 60

;; Query time: 1 msec
;; SERVER: 192.168.122.251#53(192.168.122.251)
;; WHEN: Thu Dec 03 16:49:22 UTC 2015
;; MSG SIZE rcvd: 109

root@cassandra-vfujv:/etc# dig cassandra-peers

; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> cassandra-peers
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8607
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;cassandra-peers. IN A

;; Query time: 0 msec
;; SERVER: 192.168.122.251#53(192.168.122.251)
;; WHEN: Thu Dec 03 16:50:23 UTC 2015
;; MSG SIZE rcvd: 33

The services (cassandra-peers and cassandra-service) look good to me based on get services and get endpoints:

[root@ose1 cassandra-custom]# oc get services
NAME                       CLUSTER_IP       EXTERNAL_IP   PORT(S)                 SELECTOR                 AGE
cassandra-peers            None                           7000/TCP,7001/TCP       name=cassandra-cluster   1d
cassandra-service          172.30.209.133                 9042/TCP                name=cassandra-cluster   1d
cassandra-service-google   172.30.207.210                 9042/TCP                app=cassandra            1h
kubernetes                 172.30.0.1                     443/TCP,53/UDP,53/TCP                            14d
[root@ose1 cassandra-custom]# oc get endpoints
NAME                       ENDPOINTS                                                    AGE
cassandra-peers            10.1.0.46:7001,10.1.0.47:7001,10.1.0.46:7000 + 1 more...     1d
cassandra-service          10.1.0.46:9042,10.1.0.47:9042                                1d
cassandra-service-google   10.1.0.54:9042,10.1.0.55:9042                                1h
glusterfs-cluster          192.168.122.221:1,192.168.122.222:1                          3d
kubernetes                 192.168.122.251:53,192.168.122.251:53,192.168.122.251:8443   14d

@vyshane
Owner

vyshane commented Dec 10, 2015

It looks like DNS is not working for services. Is the DNS addon enabled for the Kubernetes cluster?
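
One detail worth checking alongside the DNS addon: the failing query above asks for cassandra-peers.default.cluster.local., while the query that succeeds later in this thread asks for cassandra-peers.default.svc.cluster.local. Kubernetes documents service names in the form service.namespace.svc.cluster-domain. Whether the missing svc label explains the NXDOMAIN on this particular OpenShift 3.1 cluster is not confirmed anywhere in the thread, so treat the sketch below as a quick sanity check rather than a diagnosis. The helper names `service_fqdn` and `has_svc_label` are made up for illustration.

```python
def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """Build <service>.<namespace>.svc.<cluster_domain>."""
    return ".".join([service, namespace, "svc", cluster_domain.strip(".")])


def has_svc_label(domain, cluster_domain="cluster.local"):
    """True if `domain` looks like <service>.<namespace>.svc.<cluster_domain>.

    A trailing dot (fully qualified form) is accepted and ignored.
    """
    labels = domain.rstrip(".").split(".")
    suffix = cluster_domain.strip(".").split(".")
    prefix = labels[:-len(suffix)]
    return (
        labels[-len(suffix):] == suffix
        and len(prefix) == 3
        and prefix[2] == "svc"
    )


if __name__ == "__main__":
    print(service_fqdn("cassandra-peers"))
    print(has_svc_label("cassandra-peers.default.cluster.local."))
```

If the check reports a missing svc label, trying dig against the name produced by `service_fqdn` from inside a Cassandra container is a cheap next step.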

@ScubaDrew

Hello, I am having a similar problem. DNS seems to respond, but the nodes are not joining:

root@cassandra-c7wdb:/# dig $PEER_DISCOVERY_DOMAIN

; <<>> DiG 9.9.5-9+deb8u6-Debian <<>> cassandra-peers.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63289
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;cassandra-peers.default.svc.cluster.local. IN A

;; ANSWER SECTION:
cassandra-peers.default.svc.cluster.local. 30 IN A 10.244.0.4
cassandra-peers.default.svc.cluster.local. 30 IN A 10.244.0.3

;; Query time: 1 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Wed May 04 17:32:47 UTC 2016
;; MSG SIZE  rcvd: 91
root@cassandra-c7wdb:/# nodetool status           
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.0.3  111.87 KB  256          100.0%            058c38c7-0f79-419c-b42a-a7c2f9782110  Kubernetes Cluster
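
The mismatch in the output above (two A records from DNS, one node in the ring) can be checked mechanically. The following is a rough Python sketch that compares the addresses from dig with the nodes that nodetool status reports as UN. The parsing is an assumption based only on the output format shown in this thread, and `ring_members` and `missing_from_ring` are hypothetical helper names.

```python
import re

# Node lines start with a two-letter code: U/D (up/down) then N/L/J/M (state).
_NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def ring_members(nodetool_output):
    """Map each node address in `nodetool status` output to its status code."""
    members = {}
    for line in nodetool_output.splitlines():
        match = _NODE_LINE.match(line.strip())
        if match:
            members[match.group(2)] = match.group(1)
    return members


def missing_from_ring(dns_addresses, nodetool_output):
    """Return the DNS addresses that are not Up/Normal (UN) in the ring."""
    members = ring_members(nodetool_output)
    return sorted(ip for ip in dns_addresses if members.get(ip) != "UN")
```

Fed the nodetool output above together with the two addresses from the dig answer, `missing_from_ring` returns `['10.244.0.4']`, the pod that DNS knows about but the ring does not.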

@vyshane
Owner

vyshane commented May 5, 2016

Do you see any log errors when you tail the cassandra pods?

@ScubaDrew

ScubaDrew commented May 5, 2016

I think what happened was the DNS did not populate in time for the second node. I've got a cluster up now that has two nodes by waiting some time after the first node was up.
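
If the cause really is DNS lagging behind pod startup, one way to avoid waiting by hand is to poll the peer domain until enough addresses appear before starting Cassandra. Here is a minimal Python sketch of that idea. It is not part of the image as far as this thread shows; the function name `wait_for_peers`, its defaults, and the injectable `resolver`, `sleep` and `clock` parameters are all assumptions for illustration.

```python
import socket
import time


def wait_for_peers(domain, min_peers=1, timeout=120.0, interval=5.0,
                   resolver=socket.gethostbyname_ex,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll `domain` until it resolves to at least `min_peers` addresses.

    Returns the sorted addresses seen on the last attempt. The list may be
    shorter than `min_peers` (or empty) if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    while True:
        try:
            addresses = sorted(resolver(domain)[2])
        except (socket.gaierror, socket.herror):
            # The name does not resolve yet; treat it as "no peers so far".
            addresses = []
        if len(addresses) >= min_peers or clock() >= deadline:
            return addresses
        sleep(interval)
```

A startup script could call `wait_for_peers` with the value of PEER_DISCOVERY_DOMAIN and `min_peers=2` on every pod after the first, then decide what to do if the returned list is still too short.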

Have you used this setup very extensively @vyshane ?
