
VNF on OneKE appliance doesn't NAT (or get any NAT related info) #89

Open
kCyborg opened this issue Apr 29, 2024 · 6 comments
Assignees
Labels
category: oneke (OpenNebula Kubernetes appliance) · community (Issue created by OpenNebula Community) · status: accepted (The issue is valid and will be planned for fix or implementation) · type: bug (Something isn't working)

Comments

@kCyborg

kCyborg commented Apr 29, 2024

Description
When our team instantiates the OneKE appliance (both the regular and the airgapped version) from the public OpenNebula marketplace, the VNF doesn't get any NAT rule, which makes communication from the public network to the VNF and on to the private k8s cluster impossible :-(

To Reproduce

  1. Download the appliance
  2. Create two networks: one public (in our case with a real public IP) and one private (in this example we will use 192.168.10.1/24)
  3. Instantiate the appliance under the Service tab using the settings shown below:

[screenshots: OneKE service instantiation settings]

Please note that we configured MetalLB, but we also tried without MetalLB and with a few other simple tweaks, and still got no NAT on the VNF.
We also tried different OpenNebula versions (6.4.x and 6.8.2).
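For reference, the private VNET from step 2 is created from a template roughly like this one (a sketch: the name and the address range are illustrative, only the driver attributes match the template we show later in this thread):

NAME              = "oneke-private"
VN_MAD            = "802.1Q"
PHYDEV            = "bond0"
AUTOMATIC_VLAN_ID = "YES"
GATEWAY           = "192.168.10.1"
DNS               = "192.168.10.1"
AR = [
    TYPE = "IP4",
    IP   = "192.168.10.100",
    SIZE = "50"
]

GATEWAY and DNS point at the VNF's private address so the cluster nodes route and resolve through it.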

Expected behavior
A working k8s cluster

Details
If we SSH into the VNF and check the log at /var/log/one-appliance/one-failover.log, we get:

[screenshot: one-failover.log reporting the failure]

It tells us that the VRouter failed, but not why :-(

But if we check /var/log/one-appliance/configure.log, we get:

[screenshot: configure.log output]

It informs us that the /etc/iptables/rules-save file was created, but if we open the file, it is indeed empty:

[screenshot: empty /etc/iptables/rules-save]
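A quick way to double-check that the file exists but contains no rules (run in the same VNF shell as below):

vrouter:~# ls -l /etc/iptables/rules-save
vrouter:~# wc -c /etc/iptables/rules-save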

And if we check the iptables:

vrouter:~# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

And:

vrouter:~# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

And if we try the command recommended in the docs:

iptables -t nat -vnL NAT4
iptables: No chain/target/match by that name.

We got nothing :-(

If we try to reach the public network from the master, storage or worker nodes (whose DNS server in /etc/resolv.conf points to the private IP of the VNF node), we get no answer, meaning those k8s nodes can't reach anything on the internet.
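The checks we run from a node look roughly like this (the prompt and addresses are examples; the nameserver is the VNF's private IP):

k8s-master:~# cat /etc/resolv.conf
k8s-master:~# ping -c 3 8.8.8.8        # plain routed traffic through the VNF
k8s-master:~# ping -c 3 opennebula.io  # adds DNS resolution via the VNF

None of them get an answer back, which matches the missing NAT rules on the VNF.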

Additional context
We don't really know whether the problem is actually in the VNF or whether we are applying a wrong configuration, as the documentation doesn't say much :-(

Progress Status

  • Code committed
  • Testing - QA
  • Documentation (Release notes - resolved issues, compatibility, known issues)
@kCyborg added the type: bug label Apr 29, 2024
@Franco-Sparrow

I confirm the issue

@sk4zuzu self-assigned this Apr 29, 2024
@sk4zuzu
Contributor

sk4zuzu commented Apr 29, 2024

Hi,

the one-apps repo is the correct place to report VR and OneKE related issues. ☝️ 😌

iptables -t nat -vnL NAT4
iptables: No chain/target/match by that name.

Thanks, I've corrected the command in the docs; it was a simple typo.

In general, your OneFlow configuration looks OK and something similar to it seems to be working in my environments.

When the one-failover service "fails", it always tries to bring down every VR module it can, hence NAT (and everything else) is disabled. There must be a reason keepalived reported the FAULT state through the VRRP FIFO. If you examine /var/log/messages, you may find a hint about what is going on with keepalived; you could also take a look at the /etc/keepalived/conf.d/*.conf files to check that everything looks OK, something like:

vrrp_sync_group VRouter {
    group {
        ETH1
    }
}
vrrp_instance ETH1 {
    state             BACKUP
    interface         eth1
    virtual_router_id 2
    priority          100
    advert_int        1
    virtual_ipaddress {
        192.168.10.2/32 dev eth0
        192.168.10.1/32 dev eth1
    }
    virtual_routes {
    }
}
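In practice something like this should be enough to pull out the relevant bits (standard paths on the Alpine-based VR image):

grep -i keepalived /var/log/messages
cat /etc/keepalived/conf.d/*.conf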

🤔

@kCyborg
Author

kCyborg commented Apr 29, 2024

Hi there @sk4zuzu, thank you for your answer, mate.

First, sorry for asking in the wrong repo, it won't happen again.


Answering you:

  1. My VNFs (across several attempts) all have the same keepalived messages in /var/log/messages:
vrouter:~# cat /var/log/messages | grep keep
Apr 29 23:34:41 vrouter local3.debug one-contextd: Script loc-15-keepalived: Starting ...
Apr 29 23:34:41 vrouter local3.debug one-contextd: Script loc-15-keepalived: Finished with exit code 0
Apr 29 23:34:44 vrouter daemon.info Keepalived[2704]: WARNING - keepalived was built for newer Linux 6.3.0, running on Linux 6.1.78-0-virt OpenNebula/one#1-Alpine SMP PREEMPT_DYNAMIC Wed, 21 Feb 2024 08:19:22 +0000
Apr 29 23:34:44 vrouter daemon.info Keepalived[2704]: Command line: '/usr/sbin/keepalived' '--dont-fork' '--use-file=/etc/keepalived/keepalived.conf'
Apr 29 23:34:44 vrouter daemon.info Keepalived[2704]: Configuration file /etc/keepalived/keepalived.conf
Apr 29 23:34:44 vrouter daemon.info Keepalived[2704]: Script user 'keepalived_script' does not exist
Apr 29 23:34:44 vrouter daemon.info Keepalived[2704]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Apr 29 23:34:44 vrouter daemon.info Keepalived[2704]: Configuration file /etc/keepalived/keepalived.conf
Apr 29 23:34:44 vrouter daemon.info Keepalived[2704]: Script user 'keepalived_script' does not exist
Apr 29 23:34:44 vrouter daemon.info Keepalived_vrrp[2818]: Script user 'keepalived_script' does not exist

It's somehow screaming at me that the user 'keepalived_script' does not exist. But I guess this is not needed, since I understand that user is only required for the pre/post scripts keepalived can run (see the note at the end of this comment).

  2. If I check /etc/keepalived/conf.d/vrrp.conf, I get:
cat /etc/keepalived/conf.d/vrrp.conf
vrrp_sync_group VRouter {
    group {
        ETH1
    }
}
vrrp_instance ETH1 {
    state             BACKUP
    interface         eth1
    virtual_router_id 17
    priority          100
    advert_int        1
    virtual_ipaddress {
        10.1.0.11/26 dev eth0
        10.1.0.10/24 dev eth1
    }
    virtual_routes {
    }
}

Note: the IPs may differ from the last example I sent you, because I have tried more than one network configuration.

This differs from the /etc/keepalived/conf.d/*.conf you sent, if you look at these lines:

     virtual_ipaddress {
        10.1.0.11/26 dev eth0
        10.1.0.10/24 dev eth1
    }

The CIDR is different: in yours it is /32 on both NICs, while in my case it is /26 for eth0 and /24 for eth1.
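On the 'keepalived_script' warning mentioned in point 1: as far as I can tell it is harmless while no check/notify scripts are configured. If it ever had to be silenced, my understanding is that keepalived's global_defs can point at an existing script user, roughly like this (just a guess on my side, not something I changed on the appliance):

global_defs {
    enable_script_security
    script_user nobody
}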

@kCyborg
Author

kCyborg commented Apr 30, 2024

Hi there @sk4zuzu, I think I have found the problem (not the solution though, sorry).

At the time we instantiate the cluster, we define the Control Plane Endpoint VIP (IPv4) and the Default Gateway VIP (IPv4):

[screenshot: Control Plane Endpoint VIP and Default Gateway VIP inputs]

Note: I used to work with OneKE about a year ago, and back then those variables needed to be set manually.

If I leave those variables blank (empty), the cluster runs without a problem, regardless of the network I use. I.e.:

[screenshot: running cluster with the VIP variables left empty]
(I created a simple cluster with just a master, a worker and the aforementioned VNF)


Let me explain myself:

  1. I created a network using a simple private network template:
AUTOMATIC_VLAN_ID = "YES"
CLUSTER_IDS = "100"
PHYDEV = "bond0"
VN_MAD = "802.1Q"

The already created private network:

[screenshots: the created private network's attributes and address range]

  2. Then, at instantiation time, I set the above-mentioned variables:

[screenshot: the VIP variables set at instantiation]


Am I doing it wrong?

@rsmontero transferred this issue from OpenNebula/one Apr 30, 2024
@rsmontero added the category: oneke, status: accepted and community labels Apr 30, 2024
@rsmontero
Member

@kCyborg + @sk4zuzu issue has been transferred to the right repo

@sk4zuzu
Contributor

sk4zuzu commented Apr 30, 2024

Hi @kCyborg

First, sorry for asking in the wrong repo, it won't happen again.

It's fine, man :) @rsmontero already saved us. ☝️ 😌

As for the example with the 10.0.0.0 subnet, it seems you set the first VIP to 10.0.0.2, and then the same 10.0.0.2 address was used to create the master node. That can't work.

The first VIP address should preferably come from the public VNET (it should work with the private VNET as well), but it has to be outside the AR you use to deploy the cluster nodes, so there is no conflict at the IP level.
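To illustrate with made-up numbers (reusing the 192.168.10.0/24 private range from your first report, not the values in your screenshots):

AR used to deploy the OneKE nodes:   192.168.10.100 - 192.168.10.150
Default Gateway VIP (private side):  192.168.10.1   (outside that AR, so no conflict)
Control Plane Endpoint VIP:          another free address outside any AR that hands out node leases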
