Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
New network topology for firecracker VMs
Currently, each Firecracker VM needs a TAP network device to route its packets into the network stack of the physical host. When a function instance is saved and restored, the TAP device name and the IP address of the function's server, running inside the container, are preserved (see also the current requirements for vanilla Firecracker snapshot loading [1]). This leads to networking conflicts on the host and limits snapshot restoration to a single instance per physical machine. To bypass this obstacle, the following network topology is proposed:

1. A new network namespace (e.g., VMns4) is created for each VM, in which the TAP device from the snapshotted VM is rebuilt and receives the original IP address of the function. The TAP device relays all incoming and outgoing packets between the serverless function and the VM's network interface. Because each VM runs in its own network namespace, VMs no longer conflict over networking resources on the host.

2. A local virtual tunnel is established between the VM inside its network namespace and the host node via a virtual ethernet (veth) pair, with one end in the network namespace (veth4-0) and the other in the host namespace (veth4-1). In contrast, the default vHive configuration sets up a similar forwarding system through network bridges.

3. Inside the network namespace, we add a routing rule that sends all packets via the VM end of the veth pair towards a default gateway (172.17.0.17). Thus, all packets sent by the function show up at the host's end of the tunnel.

4. To avoid IP conflicts when routing packets to and from functions, each VM is assigned a unique clone address (e.g., 172.18.0.5). All packets leaving the VM end of the veth pair get their source address rewritten to the clone address of the corresponding VM. Packets entering the host end of the veth pair get their destination address rewritten to the original address of the VM. As a result, each VM still thinks it is using the original address, while in reality its address is translated to a clone address that is different for every VM. This is accomplished using two rules in the NAT table of the VM's network namespace: one in the POSTROUTING chain and one in the PREROUTING chain. The POSTROUTING rule rewrites the source IP address of packets before they are sent out through the virtual tunnel, from the VM namespace to the host. Similarly, the PREROUTING rule rewrites the destination address of incoming packets before routing. Together, the two rules ensure that packets going into the namespace carry the original IP address of the VM (172.16.0.2) as their destination, while packets coming out of the namespace carry the clone IP address (172.18.0.5) as their source. The original IP address is the same for all VMs in the enhanced snapshotting mode: 172.16.0.2.

5. In the routing table of the host, we add a rule stating that any packet whose destination IP is the clone IP of a VM is routed towards the end of the tunnel situated in the corresponding network namespace, through a set gateway (172.17.0.18). This ensures that packets arriving on the host for a VM are sent down the right virtual tunnel.

6. In the host's NFT filter table, we add two rules to the FORWARD chain that allow traffic from the host end of the veth pair (veth4-1) to the default host interface (eno49) and vice versa.

Introduce a new networking management component for the topology described above.

[1] https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md#loading-snapshots

Closes #797
Part of #794

Signed-off-by: Georgiy Lebedev <[email protected]>
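The six steps above can be sketched as a dry-run generator that only prints the host commands for one VM and executes nothing. This is an illustration under stated assumptions, not the code added by this commit: the `vmNet` type, the `example` and `commands` helpers and the TAP name `tap0` are hypothetical, address assignment and link-up commands are omitted, and step 6 uses `iptables` syntax where the commit message mentions the NFT filter table.

```go
package main

import "fmt"

// vmNet is a hypothetical container for the per-VM values named in the
// commit message; it is not the NetworkConfig type introduced by the commit.
type vmNet struct {
	Namespace   string // per-VM network namespace
	TapName     string // assumed name; the snapshot dictates the real one
	VethVM      string // veth end inside the namespace
	VethHost    string // veth end in the host namespace
	VMIP        string // original address, identical for every restored VM
	CloneIP     string // unique per VM
	NsGateway   string // default gateway used inside the namespace (step 3)
	HostGateway string // next hop of the host route to the clone IP (step 5)
	HostIface   string // default host interface
}

// example returns the sample values used in the commit message.
func example() vmNet {
	return vmNet{
		Namespace: "VMns4", TapName: "tap0",
		VethVM: "veth4-0", VethHost: "veth4-1",
		VMIP: "172.16.0.2", CloneIP: "172.18.0.5",
		NsGateway: "172.17.0.17", HostGateway: "172.17.0.18",
		HostIface: "eno49",
	}
}

// commands lists, in order, the shell commands that would build the topology.
func commands(n vmNet) []string {
	inNS := "ip netns exec " + n.Namespace + " "
	return []string{
		// Step 1: namespace and TAP device.
		"ip netns add " + n.Namespace,
		inNS + "ip tuntap add name " + n.TapName + " mode tap",
		// Step 2: veth pair, with one end moved into the namespace.
		fmt.Sprintf("ip link add %s type veth peer name %s", n.VethHost, n.VethVM),
		fmt.Sprintf("ip link set %s netns %s", n.VethVM, n.Namespace),
		// Step 3: default route inside the namespace.
		inNS + "ip route add default via " + n.NsGateway,
		// Step 4: source NAT on the way out, destination NAT on the way in.
		inNS + fmt.Sprintf("iptables -t nat -A POSTROUTING -o %s -s %s -j SNAT --to-source %s",
			n.VethVM, n.VMIP, n.CloneIP),
		inNS + fmt.Sprintf("iptables -t nat -A PREROUTING -i %s -d %s -j DNAT --to-destination %s",
			n.VethVM, n.CloneIP, n.VMIP),
		// Step 5: host route to the clone address.
		fmt.Sprintf("ip route add %s via %s", n.CloneIP, n.HostGateway),
		// Step 6: allow forwarding between the veth host end and the host interface.
		fmt.Sprintf("iptables -A FORWARD -i %s -o %s -j ACCEPT", n.VethHost, n.HostIface),
		fmt.Sprintf("iptables -A FORWARD -i %s -o %s -j ACCEPT", n.HostIface, n.VethHost),
	}
}

func main() {
	for _, c := range commands(example()) {
		fmt.Println(c)
	}
}
```

Because only the namespace name, veth names and clone IP differ between VMs, a second restored instance would reuse the same original address (172.16.0.2) with a different clone address.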
1 parent d08c1d5, commit e940948
Showing 7 changed files with 1,205 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# MIT License | ||
# | ||
# Copyright (c) 2023 Georgiy Lebedev, Dmitrii Ustiugov, Plamen Petrov and vHive team | ||
# | ||
# Permission is hereby granted, free of charge, to any person obtaining a copy | ||
# of this software and associated documentation files (the "Software"), to deal | ||
# in the Software without restriction, including without limitation the rights | ||
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
# copies of the Software, and to permit persons to whom the Software is | ||
# furnished to do so, subject to the following conditions: | ||
# | ||
# The above copyright notice and this permission notice shall be included in all | ||
# copies or substantial portions of the Software. | ||
# | ||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
# SOFTWARE. | ||
|
||
EXTRAGOARGS:=-v -race -cover | ||
|
||
test: | ||
# Need to pass GOROOT because GitHub-hosted runners may have several | ||
# go versions installed so that calling go from root may fail | ||
sudo env "PATH=$(PATH)" "GOROOT=$(GOROOT)" go test ./ $(EXTRAGOARGS) | ||
|
||
test-man: | ||
echo "Nothing to test manually" | ||
|
||
.PHONY: test test-man |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,253 @@ | ||
// MIT License | ||
// | ||
// Copyright (c) 2023 Georgiy Lebedev, Amory Hoste and vHive team | ||
// | ||
// Permission is hereby granted, free of charge, to any person obtaining a copy | ||
// of this software and associated documentation files (the "Software"), to deal | ||
// in the Software without restriction, including without limitation the rights | ||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
// copies of the Software, and to permit persons to whom the Software is | ||
// furnished to do so, subject to the following conditions: | ||
// | ||
// The above copyright notice and this permission notice shall be included in all | ||
// copies or substantial portions of the Software. | ||
// | ||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
// SOFTWARE. | ||
|
||
// Package networking provides primitives to connect function instances to the network. | ||
package networking | ||
|
||
import ( | ||
log "github.com/sirupsen/logrus" | ||
"sync" | ||
) | ||
|
||
// NetworkManager manages the in-use network configurations along with a pool of free network configurations | ||
// that can be used to connect a function instance to the network. | ||
type NetworkManager struct { | ||
sync.Mutex | ||
nextID int | ||
hostIfaceName string | ||
|
||
// Pool of free network configs | ||
networkPool []*NetworkConfig | ||
poolCond *sync.Cond | ||
poolSize int | ||
|
||
// Mapping of function instance IDs to their network config | ||
netConfigs map[string]*NetworkConfig | ||
|
||
// Network configs that are being created | ||
inCreation sync.WaitGroup | ||
} | ||
|
||
// NewNetworkManager creates and returns a new network manager that connects function instances to the network | ||
// using the supplied interface. If no interface is supplied, the default interface is used. To take the network | ||
// setup off the critical path of function creation, the network manager tries to maintain a pool of ready-to-use | ||
// network configurations of size at least poolSize. | ||
func NewNetworkManager(hostIfaceName string, poolSize int) (*NetworkManager, error) { | ||
manager := new(NetworkManager) | ||
|
||
manager.hostIfaceName = hostIfaceName | ||
if manager.hostIfaceName == "" { | ||
hostIface, err := getHostIfaceName() | ||
if err != nil { | ||
return nil, err | ||
} else { | ||
manager.hostIfaceName = hostIface | ||
} | ||
} | ||
|
||
manager.netConfigs = make(map[string]*NetworkConfig) | ||
manager.networkPool = make([]*NetworkConfig, 0) | ||
|
||
startId, err := getNetworkStartID() | ||
if err == nil { | ||
manager.nextID = startId | ||
} else { | ||
manager.nextID = 0 | ||
} | ||
|
||
manager.poolCond = sync.NewCond(new(sync.Mutex)) | ||
manager.initConfigPool(poolSize) | ||
manager.poolSize = poolSize | ||
|
||
return manager, nil | ||
} | ||
|
||
// initConfigPool fills an empty network pool up to the given poolSize | ||
func (mgr *NetworkManager) initConfigPool(poolSize int) { | ||
var wg sync.WaitGroup | ||
wg.Add(poolSize) | ||
|
||
logger := log.WithFields(log.Fields{"poolSize": poolSize}) | ||
logger.Debug("Initializing network pool") | ||
|
||
// Concurrently create poolSize network configs | ||
for i := 0; i < poolSize; i++ { | ||
go func() { | ||
mgr.addNetConfig() | ||
wg.Done() | ||
}() | ||
} | ||
wg.Wait() | ||
} | ||
|
||
// addNetConfig creates and initializes a new network config | ||
func (mgr *NetworkManager) addNetConfig() { | ||
mgr.Lock() | ||
id := mgr.nextID | ||
mgr.nextID += 1 | ||
mgr.inCreation.Add(1) | ||
mgr.Unlock() | ||
|
||
netCfg := NewNetworkConfig(id, mgr.hostIfaceName) | ||
if err := netCfg.CreateNetwork(); err != nil { | ||
log.Errorf("failed to create network: %s", err) | ||
} | ||
|
||
mgr.poolCond.L.Lock() | ||
mgr.networkPool = append(mgr.networkPool, netCfg) | ||
// Signal in case someone is waiting for a new config to become available in the pool | ||
mgr.poolCond.Signal() | ||
mgr.poolCond.L.Unlock() | ||
mgr.inCreation.Done() | ||
} | ||
|
||
// allocNetConfig allocates a new network config from the pool to a function instance identified by funcID | ||
func (mgr *NetworkManager) allocNetConfig(funcID string) *NetworkConfig { | ||
// Add netconfig to pool to keep pool to configured size | ||
go mgr.addNetConfig() | ||
|
||
logger := log.WithFields(log.Fields{"funcID": funcID}) | ||
logger.Debug("Allocating a new network config from network pool to function instance") | ||
|
||
// Pop a network config from the pool and allocate it to the function instance | ||
mgr.poolCond.L.Lock() | ||
for len(mgr.networkPool) == 0 { | ||
// Wait until a new network config has been created | ||
mgr.poolCond.Wait() | ||
} | ||
|
||
config := mgr.networkPool[len(mgr.networkPool)-1] | ||
mgr.networkPool = mgr.networkPool[:len(mgr.networkPool)-1] | ||
mgr.poolCond.L.Unlock() | ||
|
||
mgr.Lock() | ||
mgr.netConfigs[funcID] = config | ||
mgr.Unlock() | ||
|
||
logger = log.WithFields(log.Fields{ | ||
"funcID": funcID, | ||
"ContainerIP": config.getContainerIP(), | ||
"NamespaceName": config.getNamespaceName(), | ||
"Veth0CIDR": config.getVeth0CIDR(), | ||
"Veth0Name": config.getVeth0Name(), | ||
"Veth1CIDR": config.getVeth1CIDR(), | ||
"Veth1Name": config.getVeth1Name(), | ||
"CloneIP": config.GetCloneIP(), | ||
"ContainerCIDR": config.GetContainerCIDR(), | ||
"GatewayIP": config.GetGatewayIP(), | ||
"HostDevName": config.GetHostDevName(), | ||
"NamespacePath": config.GetNamespacePath()}) | ||
|
||
logger.Debug("Allocated a new network config") | ||
|
||
return config | ||
} | ||
|
||
// releaseNetConfig releases the network config of a given function instance with id funcID back to the pool | ||
func (mgr *NetworkManager) releaseNetConfig(funcID string) { | ||
mgr.Lock() | ||
config := mgr.netConfigs[funcID] | ||
delete(mgr.netConfigs, funcID) | ||
mgr.Unlock() | ||
|
||
logger := log.WithFields(log.Fields{"funcID": funcID}) | ||
logger.Debug("Releasing network config from function instance and adding it to network pool") | ||
|
||
// Add network config back to the pool. We allow the pool to grow over its configured size here since the | ||
// overhead of keeping a network config in the pool is low compared to the cost of creating a new config. | ||
mgr.poolCond.L.Lock() | ||
mgr.networkPool = append(mgr.networkPool, config) | ||
mgr.poolCond.Signal() | ||
mgr.poolCond.L.Unlock() | ||
} | ||
|
||
// CreateNetwork creates the networking for a function instance identified by funcID | ||
func (mgr *NetworkManager) CreateNetwork(funcID string) (*NetworkConfig, error) { | ||
logger := log.WithFields(log.Fields{"funcID": funcID}) | ||
logger.Debug("Creating network config for function instance") | ||
|
||
netCfg := mgr.allocNetConfig(funcID) | ||
return netCfg, nil | ||
} | ||
|
||
// GetConfig returns the network config assigned to a function instance identified by funcID | ||
func (mgr *NetworkManager) GetConfig(funcID string) *NetworkConfig { | ||
mgr.Lock() | ||
defer mgr.Unlock() | ||
|
||
cfg := mgr.netConfigs[funcID] | ||
return cfg | ||
} | ||
|
||
// RemoveNetwork removes the network config of a function instance identified by funcID. The allocated network devices | ||
// for the given function instance must not be in use anymore when calling this function. | ||
func (mgr *NetworkManager) RemoveNetwork(funcID string) error { | ||
logger := log.WithFields(log.Fields{"funcID": funcID}) | ||
logger.Debug("Removing network config for function instance") | ||
mgr.releaseNetConfig(funcID) | ||
return nil | ||
} | ||
|
||
// Cleanup removes and deallocates all network configurations that are in use or in the network pool. Make sure to first | ||
// clean up all running functions before removing their network configs. | ||
func (mgr *NetworkManager) Cleanup() error { | ||
log.Info("Cleaning up network manager") | ||
mgr.Lock() | ||
defer mgr.Unlock() | ||
|
||
// Wait till all network configs still in creation are added | ||
mgr.inCreation.Wait() | ||
|
||
// Release network configs still in use | ||
var wgu sync.WaitGroup | ||
wgu.Add(len(mgr.netConfigs)) | ||
for funcID := range mgr.netConfigs { | ||
config := mgr.netConfigs[funcID] | ||
go func(config *NetworkConfig) { | ||
if err := config.RemoveNetwork(); err != nil { | ||
log.Errorf("failed to remove network: %s", err) | ||
} | ||
wgu.Done() | ||
}(config) | ||
} | ||
wgu.Wait() | ||
mgr.netConfigs = make(map[string]*NetworkConfig) | ||
|
||
// Cleanup network pool | ||
mgr.poolCond.L.Lock() | ||
var wg sync.WaitGroup | ||
wg.Add(len(mgr.networkPool)) | ||
|
||
for _, config := range mgr.networkPool { | ||
go func(config *NetworkConfig) { | ||
if err := config.RemoveNetwork(); err != nil { | ||
log.Errorf("failed to remove network: %s", err) | ||
} | ||
wg.Done() | ||
}(config) | ||
} | ||
wg.Wait() | ||
mgr.networkPool = make([]*NetworkConfig, 0) | ||
mgr.poolCond.L.Unlock() | ||
|
||
return nil | ||
} |
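The pool handling in `addNetConfig`, `allocNetConfig` and `releaseNetConfig` follows a common `sync.Cond` pattern: producers append under the condition's lock and signal, while consumers wait in a loop until the pool is non-empty and then pop the last element. A minimal standalone sketch of that pattern is shown here, with a hypothetical `pool` type holding integers instead of `*NetworkConfig` values.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical stand-in for the manager's networkPool and poolCond.
type pool struct {
	cond  *sync.Cond
	items []int
}

func newPool() *pool {
	return &pool{cond: sync.NewCond(new(sync.Mutex))}
}

// put appends an item and wakes one waiter, as addNetConfig and
// releaseNetConfig do for network configs.
func (p *pool) put(v int) {
	p.cond.L.Lock()
	p.items = append(p.items, v)
	p.cond.Signal()
	p.cond.L.Unlock()
}

// get blocks until an item is available and pops the most recently added one,
// as allocNetConfig does. The wait sits in a loop because the condition must
// be rechecked after every wake-up.
func (p *pool) get() int {
	p.cond.L.Lock()
	defer p.cond.L.Unlock()
	for len(p.items) == 0 {
		p.cond.Wait()
	}
	v := p.items[len(p.items)-1]
	p.items = p.items[:len(p.items)-1]
	return v
}

func main() {
	p := newPool()
	go p.put(42) // a producer fills the pool in the background
	fmt.Println(p.get())
}
```

Keeping a dedicated lock for the pool, separate from the manager's own mutex, means a caller blocked in `get` does not hold up bookkeeping protected by the other lock.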