goconserver malformed HTTP response #5313

Closed
kcgthb opened this issue Jun 21, 2018 · 25 comments

@kcgthb
Member

kcgthb commented Jun 21, 2018

congo commands fail with "malformed HTTP response":

# congo list
Could not list resources, Get http://127.0.0.1:12429/nodes: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
# congo version
Version: 0.2.2, BuildTime: 2018-03-29T21:46:21-0400
 Commit: 75077f1e64efb5ec56e70185b4aa0787f4d498a4

Remote console works fine otherwise:

# rcons sh-hn02
[Enter `^Ec?' for help]
goconserver(2018-06-21T16:29:17-07:00): Hello 10.10.0.1:49152, welcome to the session of sh-hn02

CentOS Linux 7 (Core)
Kernel 3.10.0-862.3.2.el7.x86_64 on an x86_64

sh-hn02 login:
@zet809

zet809 commented Jun 22, 2018

Hi @kcgthb, can you show me the goconserver configuration file? I think you have configured an ssl_key file for goconserver.
Since rcons works, you may need to export the following environment variables and then try congo list again.

export CONGO_SSL_CA_CERT="/root/.xcat/ca.pem"
export CONGO_SSL_CERT="/root/.xcat/client-cert.pem"
export CONGO_SSL_KEY="/root/.xcat/client-key.pem"
export CONGO_URL="https://127.0.0.1:12429"
export CONGO_SERVER_HOST=<YOUR_MN_HOSTNAME>

Please refer to https://github.com/xcat2/goconserver/tree/master/scripts/ssl for more information.
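
A note on the error itself: the bytes quoted in the report, \x15\x03\x01\x00\x02\x02, match the layout of a TLS record header (content type 0x15 is an alert, 0x03 0x01 is the record version, 0x00 0x02 is the length). That is consistent with congo speaking plain HTTP to a port that expects TLS. The sketch below decodes such a header. describe_tls_record is a hypothetical helper written for this illustration; it is not part of congo or goconserver.

```shell
# Interpret the first bytes of a reply as a TLS record header.
# Argument: space-separated hex bytes, e.g. "15 03 01 00 02 02".
# (Hypothetical helper for illustration only.)
describe_tls_record() (
    # Split the argument into positional parameters, one per byte.
    set -- $1
    if [ "$#" -lt 6 ]; then
        echo "need at least 6 bytes"
        exit 1
    fi
    # Byte 1: TLS record content type.
    case "$1" in
        14) type="change_cipher_spec" ;;
        15) type="alert" ;;
        16) type="handshake" ;;
        17) type="application_data" ;;
        *)  echo "not a TLS record"; exit 1 ;;
    esac
    # Bytes 4-5: big-endian payload length.
    length=$(( 0x$4$5 ))
    out="type=$type version=$2.$3 length=$length"
    # For alerts, byte 6 is the alert level.
    if [ "$type" = "alert" ]; then
        case "$6" in
            01) level="warning" ;;
            02) level="fatal" ;;
            *)  level="unknown" ;;
        esac
        out="$out level=$level"
    fi
    echo "$out"
)

describe_tls_record "15 03 01 00 02 02"
```

For the bytes in this report, the helper reports a fatal alert with record version 03.01, which is what a TLS server typically sends when it receives something that is not a TLS handshake.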

@kcgthb
Member Author

kcgthb commented Jun 22, 2018

Oh I see, thanks for the pointer.

Defining those environment variables seems a bit redundant, though. Wouldn't it be better if congo could parse /etc/goconserver/server.conf and use its information directly, since everything is already defined there?

@zet809

zet809 commented Jun 25, 2018

In fact, server.conf is only for goconserver itself. congo is a client, so it should not read the server configuration directly.
In your case, rcons works because rcons uses the xCAT default configuration for congo.
Maybe we can enhance congo to look for keys in a configurable directory.
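
Until congo supports something like that, a small shell helper can approximate it on the client side. This is only a sketch: congo_use_cert_dir is a hypothetical name, and the file names are assumed to match the export lines earlier in this thread (the credentials used by rcons may be named differently).

```shell
# Hypothetical helper: point congo at a directory holding the client credentials.
# File names are assumptions taken from the export lines earlier in this thread.
congo_use_cert_dir() {
    _congo_dir=${1:-/root/.xcat}
    export CONGO_SSL_CA_CERT="$_congo_dir/ca.pem"
    export CONGO_SSL_CERT="$_congo_dir/client-cert.pem"
    export CONGO_SSL_KEY="$_congo_dir/client-key.pem"
}

congo_use_cert_dir /root/.xcat
echo "$CONGO_SSL_CERT"
```

CONGO_URL and CONGO_SERVER_HOST still have to be set separately, since they do not depend on the certificate directory.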

@zet809

zet809 commented Jun 25, 2018

BTW @kcgthb, I notice that you use congo directly. How do you plan to use goconserver: standalone, as a hierarchical goconserver cluster, or with xCAT? Thx!

@kcgthb
Member Author

kcgthb commented Jun 25, 2018

@zet809 I see, thanks! It just seems a bit redundant to have to provide the same information about certificates in multiple places.

As for using congo directly, I don't really have a use case, I was just trying to test the functionality I was asking for in #5134 (xcat2/goconserver#45). But I'd love not to have to use congo for this. :)

@zet809

zet809 commented Jun 26, 2018

Got it, thx!
For the broadcast mode function, please refer to https://github.com/xcat2/goconserver/releases for more information.

@zet809 zet809 added the sprint2 label Jun 26, 2018
@kcgthb
Member Author

kcgthb commented Jun 27, 2018

Thank you!
I'm not seeing any reference to the broadcast mode in https://github.com/xcat2/goconserver/releases. Is there a way to use it with rcons/wcons directly, without invoking congo?

@zet809

zet809 commented Jul 9, 2018

rcons goes into broadcast mode automatically if you specify multiple nodes. The ctrl+e c l ? sequence will be sent to all the specified nodes.
Since we haven't had a chance to verify this function, it is not included in the 0.3.0 release; you have to build it yourself.

@kcgthb
Member Author

kcgthb commented Jul 11, 2018

@zet809

Since we haven't had a chance to verify this function, it is not included in the 0.3.0 release; you have to build it yourself.

All right, I'll wait for the next release, then. Thank you!

@zet809 zet809 modified the milestones: 2.14.2, 2.14.3 Jul 13, 2018
@xcat2 xcat2 deleted a comment from zet809 Jul 13, 2018
@zet809

zet809 commented Jul 13, 2018

Hi @kcgthb, I will move this defect forward and let you know when we release a new goconserver with broadcast enabled. Thx!

@zet809

zet809 commented Aug 21, 2018

Hi @kcgthb, goconserver v0.3.1 includes the broadcast function. Please give it a try and feel free to let us know the result. Thx!

@kcgthb
Member Author

kcgthb commented Aug 21, 2018

Hi @zet809
Thank you! I just tried it, and I have 2 observations:

1. there doesn't seem to be any feedback from the broadcast commands.

When running rcons sh-hn02,sh-hn03, I get the following prompt:

# rcons sh-hn02,sh-hn03

sh-hn02: Console session has been established
sh-hn03: Console session has been established
Enter the key to broadcast the buffer. [Enter '^Ec.' to exit]

I can type characters there but nothing is echoed on screen, so I have no way of knowing what I'm typing, or if the commands work.
If I open other consoles at the same time (with wcons sh-hn02,sh-hn03 for instance), I can see that the characters typed in the rcons window are correctly sent to the nodes' console. But not seeing them in rcons directly makes it difficult to use.

2. it doesn't seem to be working in hierarchy mode.

I have:

  • MN: sh-hn01
  • SN: sh-hn02
  • CN: sh-101-[59-60] using sh-hn02 as a SN

Calling rcons on those 2 CNs fails with the following output:

# rcons sh-101-59,sh-101-60
bash: sh-hn02: command not found
Error: could not get the short hostname for sh-hn02
sh-hn02.
#

Please let me know if I can provide any additional information.

@zet809

zet809 commented Aug 23, 2018

Hi @kcgthb, for the 1st question: we looked at the broadcast function of https://github.com/dun/conman, and it also outputs nothing, so we kept the same behavior.
For the 2nd issue, can you show me the output of makegocons -q?
Thx!

@kcgthb
Member Author

kcgthb commented Aug 23, 2018

Hi @zet809

  1. I see, and that's true. Still makes it a bit counter-intuitive, though. Maybe a note in the documentation would be helpful, to mention that no output will be displayed, but that one can use wcons in parallel to actually see what's happening on the consoles?
  2. Sure
# makegocons -q sh-101-[59-60]

NODE                             SERVER                           STATE
sh-101-59                        sh-hn02.SUNet                    avaiable
sh-101-60                        sh-hn02.SUNet                    avaiable

sh-hn02.SUNet is the FQDN of that SN (which is required for other reasons), but all xCAT references to this node use the short name:

# lsdef -c sh-101-[59-60] -i servicenode,conserver
sh-101-59: conserver=sh-hn02
sh-101-59: servicenode=sh-hn02
sh-101-60: conserver=sh-hn02
sh-101-60: servicenode=sh-hn02

@neo954
Contributor

neo954 commented Aug 24, 2018

Hello @kcgthb,

Unfortunately, I cannot reproduce your problem in my test environment. Based on your comments above, it looks like a DNS resolution problem. Can you verify that both the short hostnames and the FQDNs can be resolved on all the nodes in your cluster?

@zet809 zet809 modified the milestones: 2.14.3, 2.14.4 Aug 24, 2018
@zet809 zet809 added sprint1 and removed sprint2 labels Aug 24, 2018
@kcgthb
Member Author

kcgthb commented Aug 24, 2018

@neo954 thanks for looking into this. I'll double check the DNS config on our end, but I think the question is: why is goconserver trying to get the FQDN of the console servers instead of using the name that is provided in the nodehm table?
When console servers are dual-homed, this could lead to errors, where the hostname is resolved to an external-facing interface while the node serves consoles on its internal interface. It would be much better to stick to the name that is provided in the table, IMHO.

@kcgthb
Member Author

kcgthb commented Aug 24, 2018

And I checked, and yes, both the SN/Console server's FQDN and short name can be properly resolved on both the SN itself and the compute nodes:

# clush -w sh-101-[59-60],sh-hn02 host sh-hn02.SUNet
sh-101-59: sh-hn02.SUNet has address 10.102.2.202
sh-101-60: sh-hn02.SUNet has address 10.102.2.202
sh-hn02: sh-hn02.SUNet has address 10.102.2.202

# clush -w sh-101-[59-60],sh-hn02 host sh-hn02
sh-101-59: sh-hn02.int has address 10.10.0.2
sh-101-60: sh-hn02.int has address 10.10.0.2
sh-hn02: sh-hn02.int has address 10.10.0.2

And yet:

# rcons sh-101-59,sh-101-60
bash: sh-hn02: command not found
Error: could not get the short hostname for sh-hn02
sh-hn02.

The bash: sh-hn02: command not found part especially makes me think that there's more than just DNS resolution at play here. It looks like an issue with the parsing of some command output, because that sure looks like the result of trying to execute a hostname.

@kcgthb
Member Author

kcgthb commented Aug 24, 2018

Also, I think you're discussing similar findings in xcat2/goconserver#48 :)

@zet809

zet809 commented Aug 28, 2018

Hi @kcgthb, the error message below suggests to me that the Management Node from which rcons is run cannot resolve sh-hn02. Will you please check that?

bash: sh-hn02: command not found
Error: could not get the short hostname for sh-hn02

@kcgthb
Member Author

kcgthb commented Aug 28, 2018

@zet809 the Management Node from which the rcons command is called can perfectly resolve sh-hn02: it's in both DNS and /etc/hosts:

# rcons sh-101-59,sh-101-60
bash: sh-hn02: command not found
Error: could not get the short hostname for sh-hn02
sh-hn02.

# host sh-hn02
sh-hn02.int has address 10.10.0.2

# ping -c1 sh-hn02
PING sh-hn02.int (10.10.0.2) 56(84) bytes of data.
64 bytes from sh-hn02-em1.int (10.10.0.2): icmp_seq=1 ttl=64 time=0.179 ms

--- sh-hn02.int ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.179/0.179/0.179/0.000 ms

@zet809

zet809 commented Aug 29, 2018

Hi @kcgthb, this issue is really strange to us. The script /opt/xcat/bin/rcons is what gets triggered to open the console session.
Will you please add set -x after the line elif [ $USE_GOCONSERVER == "1" ] in /opt/xcat/bin/rcons, and then let us know the output of rcons?

@kcgthb
Member Author

kcgthb commented Aug 29, 2018

Sure, here you go:

# rcons sh-101-59,sh-101-60
+ [[ -f /etc/init.d/confluent ]]
+ '[' -z '' ']'
++ nodels sh-101-59,sh-101-60 nodehm.conserver
++ awk -F: '{print $2}'
++ tr -d ' '
+ CONSERVER='sh-hn02
sh-hn02'
+ '[' -z 'sh-hn02
sh-hn02' ']'
+ '[' -z 'sh-hn02
sh-hn02' ']'
+ CONGO_ENV='CONGO_SSL_KEY=/root/.xcat/client-cred.pem                CONGO_SSL_CERT=/root/.xcat/client-cred.pem                CONGO_SSL_CA_CERT=/root/.xcat/ca.pem                CONGO_PORT=12430                CONGO_CLIENT_TYPE=xcat                CONGO_SSL_INSECURE=true'
++ hostname
+ '[' 'sh-hn02
sh-hn02' == sh-hn01.SUNet ']'
++ ssh sh-hn02 sh-hn02 hostname -s
bash: sh-hn02: command not found
+ host=
+ '[' 127 -ne 0 ']'
+ echo 'Error: could not get the short hostname for sh-hn02
sh-hn02.'
Error: could not get the short hostname for sh-hn02
sh-hn02.
+ exit 1

Makes the problem easy to spot: the CONSERVER variable contains 2 lines, where it probably expects a single value.
The list should probably be filtered for duplicates, but more importantly, what happens if the compute nodes have different console servers?
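
To make the failure mode concrete, here is a minimal sketch of what the trace shows and of the duplicate filtering suggested above. It is not the actual fix applied to rcons: count_args and unique_conservers are hypothetical helpers, and the CONSERVER value is copied from the trace.

```shell
# Value captured in the trace: one console server name per node, newline-separated.
CONSERVER='sh-hn02
sh-hn02'

# Hypothetical helper: report how many arguments a command line expands to.
count_args() { echo "$#"; }

# Unquoted, the variable is split on the newline, so the command line becomes
# "ssh sh-hn02 sh-hn02 hostname -s" and the second name is run as the remote command.
count_args ssh $CONSERVER hostname -s

# Hypothetical helper: collapse duplicate console server names.
unique_conservers() {
    printf '%s\n' "$1" | sort -u
}

servers=$(unique_conservers "$CONSERVER")
count=$(printf '%s\n' "$servers" | grep -c . || true)
if [ "$count" -eq 1 ]; then
    echo "single console server: $servers"
else
    echo "nodes span $count console servers; each needs its own connection"
fi
```

Filtering duplicates only covers the case where every node shares one console server. Nodes with different console servers would still need one connection per server, which this sketch only detects.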

@zet809

zet809 commented Aug 30, 2018

Yes @kcgthb, you are right: rcons doesn't work in hierarchical mode.
For nodes that share the same conserver, rcons can connect to the goconserver on it directly.
For nodes with different conservers, we need to enhance the current rcons and goconserver to support it. So, I will quickly fix the scenario where all nodes have the same conserver.

@zet809 zet809 added sprint2 and removed sprint1 labels Sep 6, 2018
@zet809

zet809 commented Oct 10, 2018

We have opened xcat2/goconserver#48 to track the issue that the goconserver broadcast function does not support the hierarchical structure.

@zet809 zet809 removed this from the 2.14.4 milestone Oct 10, 2018
@zet809

zet809 commented Oct 23, 2018

Hi @kcgthb, since xcat2/goconserver#48 can be used to track the broadcast function, I will close this issue. Thx!

@zet809 zet809 closed this as completed Oct 23, 2018