New network topology for firecracker VMs
Currently, each Firecracker VM needs a TAP network device to route its
packets into the network stack of the physical host. When saving and
restoring a function instance, the TAP device name and the IP address of
the function's server, running inside the container, are preserved (see
also the current requirements for vanilla Firecracker snapshot
loading [1]). This leads to networking conflicts on the host and limits
snapshot restoration to a single instance per physical machine.

To overcome this limitation, the following network topology is proposed
(a command-level sketch of the steps follows the list):

1. A new network namespace (e.g., VMns4) is created for each VM, in which
the TAP device from the snapshotted VM is rebuilt and receives the original
IP address of the function. The TAP device relays all incoming and outgoing
packets between the serverless function and the VM's network interface.
Since each VM runs in its own network namespace, its networking resources
no longer conflict with those of other VMs on the host.

2. A local virtual tunnel is established between the VM inside its network
namespace and the host node via a virtual Ethernet (veth) pair. A link is
then established between the two ends of the veth pair: one in the network
namespace (veth4-0) and one in the host namespace (veth4-1). In contrast,
the default vHive configuration sets up a similar forwarding scheme through
network bridges.

3. Inside the network namespace we add a routing rule that redirects all
packets via the VM end of the veth pair towards a default gateway
(172.17.0.17). Thus, all packets sent by the function show up at the
host's end of the tunnel.

4. To avoid IP conflicts when routing packets to and from functions, each
VM is assigned a unique clone address (e.g., 172.18.0.5). All packets
leaving the VM end of the veth pair get their source address rewritten to
the clone address of the corresponding VM, and packets entering the host
end of the veth pair get their destination address rewritten to the
original address of the VM. As a result, each VM still believes it is
using its original address, while in reality that address is translated to
a clone address that is different for every VM. This is accomplished with
two rules in the NAT table of the VM's network namespace: one in the
POSTROUTING chain and one in the PREROUTING chain. The POSTROUTING rule
alters packets before they are sent out through the virtual tunnel, from
the VM namespace to the host, rewriting their source IP address.
Similarly, the PREROUTING rule rewrites the destination address of
incoming packets before routing. Together they ensure that packets going
into the namespace carry the original IP address of the VM (172.16.0.2) as
destination, while packets coming out of the namespace carry the clone IP
address (172.18.0.5) as source. The original IP address is the same
(172.16.0.2) for all VMs in the enhanced snapshotting mode.

5. In the routing table of the host, we add a rule that routes any packet
whose destination IP is the clone IP of a VM towards the end of the tunnel
situated in the corresponding network namespace, through a fixed gateway
(172.17.0.18). This ensures that whenever packets arrive on the host for a
VM, they are immediately sent down the right virtual tunnel.

6. In the host's nftables filter table we add two rules to the FORWARD
chain that allow traffic from the host end of the veth pair (veth4-1) to
the default host interface (eno49) and vice versa.
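
For concreteness, the following is a command-level sketch of steps 1-6,
driven from Go via os/exec. The namespace, device, and address names
(VMns4, veth4-0/veth4-1, gateways 172.17.0.17 and 172.17.0.18, clone
172.18.0.5, original 172.16.0.2, eno49) are the running examples from the
steps above; the TAP device name (tap4), its gateway-side address, and the
use of the iptables front end (which programs the nftables filter table via
the iptables-nft backend on current hosts) are assumptions made purely for
illustration. The actual networking package sets up the same topology
programmatically through the vishvananda/netlink and netns libraries added
to go.mod.

package main

import (
	"log"
	"os/exec"
)

// run executes one host command; every step below must succeed for the
// topology to work, so fail loudly on the first error.
func run(args ...string) {
	if out, err := exec.Command(args[0], args[1:]...).CombinedOutput(); err != nil {
		log.Fatalf("%v failed: %v (%s)", args, err, out)
	}
}

func main() {
	const (
		netNs      = "VMns4"          // per-VM network namespace (step 1)
		tapName    = "tap4"           // assumed TAP name; the real one is restored from the snapshot
		tapCIDR    = "172.16.0.1/30"  // assumed gateway-side address backing the guest's 172.16.0.2
		vethVM     = "veth4-0"        // namespace end of the veth pair (step 2)
		vethHost   = "veth4-1"        // host end of the veth pair (step 2)
		vmGwCIDR   = "172.17.0.18/30" // veth4-0 address, gateway used by the host route (step 5)
		hostGwCIDR = "172.17.0.17/30" // veth4-1 address, default gateway inside the namespace (step 3)
		guestIP    = "172.16.0.2"     // original IP of the function, identical for every VM
		cloneIP    = "172.18.0.5"     // unique per-VM clone address (step 4)
		hostIface  = "eno49"          // default host interface (step 6)
	)

	// Step 1: rebuild the snapshotted TAP device inside a fresh namespace.
	run("ip", "netns", "add", netNs)
	run("ip", "netns", "exec", netNs, "ip", "tuntap", "add", tapName, "mode", "tap")
	run("ip", "netns", "exec", netNs, "ip", "addr", "add", tapCIDR, "dev", tapName)
	run("ip", "netns", "exec", netNs, "ip", "link", "set", tapName, "up")

	// Step 2: veth tunnel between the namespace (veth4-0) and the host (veth4-1).
	run("ip", "link", "add", vethHost, "type", "veth", "peer", "name", vethVM)
	run("ip", "link", "set", vethVM, "netns", netNs)
	run("ip", "addr", "add", hostGwCIDR, "dev", vethHost)
	run("ip", "link", "set", vethHost, "up")
	run("ip", "netns", "exec", netNs, "ip", "addr", "add", vmGwCIDR, "dev", vethVM)
	run("ip", "netns", "exec", netNs, "ip", "link", "set", vethVM, "up")

	// Step 3: default route inside the namespace towards the host end of the tunnel.
	run("ip", "netns", "exec", netNs, "ip", "route", "add", "default", "via", "172.17.0.17")

	// Step 4: NAT between the original guest address and the per-VM clone address.
	run("ip", "netns", "exec", netNs, "iptables", "-t", "nat", "-A", "POSTROUTING",
		"-o", vethVM, "-s", guestIP, "-j", "SNAT", "--to-source", cloneIP)
	run("ip", "netns", "exec", netNs, "iptables", "-t", "nat", "-A", "PREROUTING",
		"-i", vethVM, "-d", cloneIP, "-j", "DNAT", "--to-destination", guestIP)

	// Step 5: host route sending clone-addressed traffic down the right tunnel.
	run("ip", "route", "add", cloneIP, "via", "172.17.0.18")

	// Step 6: allow forwarding between the host end of the veth pair and the
	// default host interface.
	run("iptables", "-A", "FORWARD", "-i", vethHost, "-o", hostIface, "-j", "ACCEPT")
	run("iptables", "-A", "FORWARD", "-i", hostIface, "-o", vethHost, "-j", "ACCEPT")
}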

Introduce a new networking management component for the topology described
above.

1. https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md#loading-snapshots

Closes #797
Part of #794

Signed-off-by: Georgiy Lebedev <[email protected]>
CuriousGeorgiy authored and leokondrashov committed Sep 6, 2023
1 parent d08c1d5 commit e940948
Showing 7 changed files with 1,205 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/unit_tests.yml
@@ -26,7 +26,7 @@ jobs:
strategy:
fail-fast: false
matrix:
module: [taps, misc, profile]
module: [taps, misc, profile, networking]
steps:

- name: Set up Go 1.19
2 changes: 1 addition & 1 deletion go.mod
@@ -60,6 +60,7 @@ require (
github.com/stretchr/testify v1.8.0
github.com/vhive-serverless/vhive/examples/protobuf/helloworld v0.0.0-00010101000000-000000000000
github.com/vishvananda/netlink v1.1.1-0.20201029203352-d40f9887b852
github.com/vishvananda/netns v0.0.0-20200728191858-db3c7e526aae
github.com/wcharczuk/go-chart v2.0.1+incompatible
golang.org/x/net v0.6.0
golang.org/x/sync v0.1.0
@@ -102,7 +103,6 @@ require (
github.com/opencontainers/runtime-spec v1.0.3-0.20200929063507-e6143ca7d51d // indirect
github.com/opencontainers/selinux v1.8.0 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/vishvananda/netns v0.0.0-20200728191858-db3c7e526aae // indirect
github.com/willf/bitset v1.1.11 // indirect
go.opencensus.io v0.22.4 // indirect
golang.org/x/image v0.7.0 // indirect
33 changes: 33 additions & 0 deletions networking/Makefile
@@ -0,0 +1,33 @@
# MIT License
#
# Copyright (c) 2023 Georgiy Lebedev, Dmitrii Ustiugov, Plamen Petrov and vHive team
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

EXTRAGOARGS:=-v -race -cover

test:
# Need to pass GOROOT because GitHub-hosted runners may have several
# Go versions installed, so invoking go as root may otherwise fail
sudo env "PATH=$(PATH)" "GOROOT=$(GOROOT)" go test ./ $(EXTRAGOARGS)

test-man:
echo "Nothing to test manually"

.PHONY: test test-man
253 changes: 253 additions & 0 deletions networking/networkManager.go
@@ -0,0 +1,253 @@
// MIT License
//
// Copyright (c) 2023 Georgiy Lebedev, Amory Hoste and vHive team
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

// Package networking provides primitives to connect function instances to the network.
package networking

import (
log "github.com/sirupsen/logrus"
"sync"
)

// NetworkManager manages the in use network configurations along with a pool of free network configurations
// that can be used to connect a function instance to the network.
type NetworkManager struct {
sync.Mutex
nextID int
hostIfaceName string

// Pool of free network configs
networkPool []*NetworkConfig
poolCond *sync.Cond
poolSize int

// Mapping of function instance IDs to their network config
netConfigs map[string]*NetworkConfig

// Network configs that are being created
inCreation sync.WaitGroup
}

// NewNetworkManager creates and returns a new network manager that connects function instances to the network
// using the supplied interface. If no interface is supplied, the default interface is used. To take the network
// setup off the critical path of function creation, the network manager tries to maintain a pool of at least
// poolSize ready-to-use network configurations.
func NewNetworkManager(hostIfaceName string, poolSize int) (*NetworkManager, error) {
manager := new(NetworkManager)

manager.hostIfaceName = hostIfaceName
if manager.hostIfaceName == "" {
hostIface, err := getHostIfaceName()
if err != nil {
return nil, err
} else {
manager.hostIfaceName = hostIface
}
}

manager.netConfigs = make(map[string]*NetworkConfig)
manager.networkPool = make([]*NetworkConfig, 0)

startId, err := getNetworkStartID()
if err == nil {
manager.nextID = startId
} else {
manager.nextID = 0
}

manager.poolCond = sync.NewCond(new(sync.Mutex))
manager.initConfigPool(poolSize)
manager.poolSize = poolSize

return manager, nil
}

// initConfigPool fills an empty network pool up to the given poolSize
func (mgr *NetworkManager) initConfigPool(poolSize int) {
var wg sync.WaitGroup
wg.Add(poolSize)

logger := log.WithFields(log.Fields{"poolSize": poolSize})
logger.Debug("Initializing network pool")

// Concurrently create poolSize network configs
for i := 0; i < poolSize; i++ {
go func() {
mgr.addNetConfig()
wg.Done()
}()
}
wg.Wait()
}

// addNetConfig creates and initializes a new network config
func (mgr *NetworkManager) addNetConfig() {
mgr.Lock()
id := mgr.nextID
mgr.nextID += 1
mgr.inCreation.Add(1)
mgr.Unlock()

netCfg := NewNetworkConfig(id, mgr.hostIfaceName)
if err := netCfg.CreateNetwork(); err != nil {
log.Errorf("failed to create network %s:", err)
}

mgr.poolCond.L.Lock()
mgr.networkPool = append(mgr.networkPool, netCfg)
// Signal in case someone is waiting for a new config to become available in the pool
mgr.poolCond.Signal()
mgr.poolCond.L.Unlock()
mgr.inCreation.Done()
}

// allocNetConfig allocates a new network config from the pool to a function instance identified by funcID
func (mgr *NetworkManager) allocNetConfig(funcID string) *NetworkConfig {
// Add netconfig to pool to keep pool to configured size
go mgr.addNetConfig()

logger := log.WithFields(log.Fields{"funcID": funcID})
logger.Debug("Allocating a new network config from network pool to function instance")

// Pop a network config from the pool and allocate it to the function instance
mgr.poolCond.L.Lock()
for len(mgr.networkPool) == 0 {
// Wait until a new network config has been created
mgr.poolCond.Wait()
}

config := mgr.networkPool[len(mgr.networkPool)-1]
mgr.networkPool = mgr.networkPool[:len(mgr.networkPool)-1]
mgr.poolCond.L.Unlock()

mgr.Lock()
mgr.netConfigs[funcID] = config
mgr.Unlock()

logger = log.WithFields(log.Fields{
"funcID": funcID,
"ContainerIP": config.getContainerIP(),
"NamespaceName": config.getNamespaceName(),
"Veth0CIDR": config.getVeth0CIDR(),
"Veth0Name": config.getVeth0Name(),
"Veth1CIDR": config.getVeth1CIDR(),
"Veth1Name": config.getVeth1Name(),
"CloneIP": config.GetCloneIP(),
"ContainerCIDR": config.GetContainerCIDR(),
"GatewayIP": config.GetGatewayIP(),
"HostDevName": config.GetHostDevName(),
"NamespacePath": config.GetNamespacePath()})

logger.Debug("Allocated a new network config")

return config
}

// releaseNetConfig releases the network config of a given function instance with id funcID back to the pool
func (mgr *NetworkManager) releaseNetConfig(funcID string) {
mgr.Lock()
config := mgr.netConfigs[funcID]
delete(mgr.netConfigs, funcID)
mgr.Unlock()

logger := log.WithFields(log.Fields{"funcID": funcID})
logger.Debug("Releasing network config from function instance and adding it to network pool")

// Add network config back to the pool. We allow the pool to grow over its configured size here since the
// overhead of keeping a network config in the pool is low compared to the cost of creating a new config.
mgr.poolCond.L.Lock()
mgr.networkPool = append(mgr.networkPool, config)
mgr.poolCond.Signal()
mgr.poolCond.L.Unlock()
}

// CreateNetwork creates the networking for a function instance identified by funcID
func (mgr *NetworkManager) CreateNetwork(funcID string) (*NetworkConfig, error) {
logger := log.WithFields(log.Fields{"funcID": funcID})
logger.Debug("Creating network config for function instance")

netCfg := mgr.allocNetConfig(funcID)
return netCfg, nil
}

// GetConfig returns the network config assigned to a function instance identified by funcID
func (mgr *NetworkManager) GetConfig(funcID string) *NetworkConfig {
mgr.Lock()
defer mgr.Unlock()

cfg := mgr.netConfigs[funcID]
return cfg
}

// RemoveNetwork removes the network config of a function instance identified by funcID. The allocated network devices
// for the given function instance must not be in use anymore when calling this function.
func (mgr *NetworkManager) RemoveNetwork(funcID string) error {
logger := log.WithFields(log.Fields{"funcID": funcID})
logger.Debug("Removing network config for function instance")
mgr.releaseNetConfig(funcID)
return nil
}

// Cleanup removes and deallocates all network configurations that are in use or in the network pool. Make sure to first
// clean up all running functions before removing their network configs.
func (mgr *NetworkManager) Cleanup() error {
log.Info("Cleaning up network manager")
mgr.Lock()
defer mgr.Unlock()

// Wait till all network configs still in creation are added
mgr.inCreation.Wait()

// Release network configs still in use
var wgu sync.WaitGroup
wgu.Add(len(mgr.netConfigs))
for funcID := range mgr.netConfigs {
config := mgr.netConfigs[funcID]
go func(config *NetworkConfig) {
if err := config.RemoveNetwork(); err != nil {
log.Errorf("failed to remove network %s:", err)
}
wgu.Done()
}(config)
}
wgu.Wait()
mgr.netConfigs = make(map[string]*NetworkConfig)

// Cleanup network pool
mgr.poolCond.L.Lock()
var wg sync.WaitGroup
wg.Add(len(mgr.networkPool))

for _, config := range mgr.networkPool {
go func(config *NetworkConfig) {
if err := config.RemoveNetwork(); err != nil {
log.Errorf("failed to remove network %s:", err)
}
wg.Done()
}(config)
}
wg.Wait()
mgr.networkPool = make([]*NetworkConfig, 0)
mgr.poolCond.L.Unlock()

return nil
}
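
For reference, a minimal usage sketch of the manager above. The import
path is assumed from the module name in go.mod, the pool size and function
ID are arbitrary, and root privileges are required since network
namespaces and devices are created.

package main

import (
	log "github.com/sirupsen/logrus"

	"github.com/vhive-serverless/vhive/networking"
)

func main() {
	// Keep a pool of 10 ready-to-use network configs off the critical path;
	// an empty interface name selects the default host interface.
	mgr, err := networking.NewNetworkManager("", 10)
	if err != nil {
		log.Fatal(err)
	}

	// On instance creation: pop a ready-made config from the pool.
	cfg, err := mgr.CreateNetwork("func-1")
	if err != nil {
		log.Fatal(err)
	}
	log.Infof("container %s reachable via clone IP %s (netns %s)",
		cfg.GetContainerCIDR(), cfg.GetCloneIP(), cfg.GetNamespacePath())

	// On instance teardown: return the config to the pool; Cleanup then tears
	// down every config still in use or pooled.
	if err := mgr.RemoveNetwork("func-1"); err != nil {
		log.Fatal(err)
	}
	if err := mgr.Cleanup(); err != nil {
		log.Fatal(err)
	}
}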