
Add support for Interpartition Communication Drivers #14

Closed
dadada opened this issue Dec 6, 2022 · 13 comments

dadada (Collaborator) commented Dec 6, 2022

The hypervisor should support initializing and attaching a device driver to a partition, as specified by ARINC 653P5-1. The device driver should be accessible to other partitions through sampling- or queueing-ports and execute inside the partition it is attached to.

In the case of apex-linux, a device driver that exposes the receive / send system calls of a UDP socket could be useful for development purposes. The semantics of UDP sockets can be translated pretty much directly to the semantics of sampling ports, since both are message-based. The following semantics should be implemented by a partition that is handed such a UDP "device driver".

Messages that are received on a UDP socket should be available immediately on the associated sampling port. This can be achieved by writing them to the sampling port source as soon as they are received, and keeping the contents of the port unchanged until another message is received on the UDP socket.

For sending via UDP, each write to the sampling port should lead to exactly one successful send on the UDP socket. If the content of the sampling port does not change (i.e. no newer message has been written to it), no further UDP message should be sent.
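To make the intended mapping concrete, here is a minimal sketch of such a bridging loop. The SamplingPortSource / SamplingPortDestination traits are hypothetical stand-ins for whatever port API the hypervisor ends up exposing to the driver partition; only std::net::UdpSocket is real, and a real driver would block on both sides instead of busy-polling.

```rust
use std::net::UdpSocket;

// Hypothetical partition-side port handles (placeholders, not the real API).
trait SamplingPortSource {
    fn write(&self, msg: &[u8]);
}
trait SamplingPortDestination {
    /// Returns Some(msg) only if a message newer than the last read is present.
    fn read_if_new(&self) -> Option<Vec<u8>>;
}

fn drive(
    udp: &UdpSocket,
    rx_port: &dyn SamplingPortSource,      // network -> partition(s)
    tx_port: &dyn SamplingPortDestination, // partition(s) -> network
    peer: &str,
) -> std::io::Result<()> {
    let mut buf = [0u8; 1500];
    udp.set_nonblocking(true)?;
    loop {
        // Receive path: every datagram overwrites the sampling port source,
        // so the latest message is immediately visible to consumers.
        match udp.recv_from(&mut buf) {
            Ok((len, _src)) => rx_port.write(&buf[..len]),
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => {}
            Err(e) => return Err(e),
        }
        // Send path: exactly one datagram per new message written to the port;
        // an unchanged port triggers no further sends.
        if let Some(msg) = tx_port.read_if_new() {
            udp.send_to(&msg, peer)?;
        }
    }
}
```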

[Diagram attachment: Untitled-2022-12-06-2003]

cvengler self-assigned this Dec 9, 2022
cvengler (Member) commented Dec 9, 2022

I will try to see how this can be accomplished through namespaces.

cvengler (Member) commented Dec 9, 2022

I've begun working on this within the network_namespaces branch. See 2d46f8a

cvengler (Member) commented Dec 9, 2022

I have investigated this a bit further and it looks more complicated than I originally thought, but it's definitely doable. 😄

It looks like netlink(7) sockets are what Linux uses to "modify the routing tables (both IPv4 and IPv6), IP addresses, link parameters, neighbor setups, queueing disciplines, traffic classes, and packet classifiers".
There is a popular Rust crate for this (1,000,000+ downloads).

The fishy 🐟 thing, however, is that after the call to clone3(2) has been made and the process has split in two, the child and parent both need to coordinate in order to create a veth(4) interface for the child and for the parent. More precisely, the parent needs to create the veth(4) pair and move one end into the child's network namespace. After that, it needs to tell the child to proceed with the preparations for running the binary.
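As a rough illustration of that parent-side sequence, here is a sketch using the rtnetlink crate (one candidate for the crate mentioned above; the method names follow its older 0.x API and may differ in current releases, so treat them as an assumption rather than the final interface):

```rust
use futures::TryStreamExt;
use rtnetlink::new_connection;

// Create a veth pair and move the peer end into the child's network namespace.
// Interface names are placeholders.
async fn setup_veth(child_pid: u32) -> Result<(), rtnetlink::Error> {
    let (connection, handle, _) = new_connection().expect("netlink socket");
    tokio::spawn(connection);

    // Equivalent of: ip link add hv0 type veth peer name part0
    handle
        .link()
        .add()
        .veth("hv0".to_string(), "part0".to_string())
        .execute()
        .await?;

    // Equivalent of: ip link set part0 netns <child_pid>
    let mut links = handle.link().get().match_name("part0".to_string()).execute();
    if let Some(link) = links.try_next().await? {
        handle
            .link()
            .set(link.header.index)
            .setns_by_pid(child_pid)
            .execute()
            .await?;
    }
    Ok(())
}
```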

Starting next week, I'll spend my two work days working on a netlink module within the core crate. It should be capable of creating veth(4) pairs, moving interfaces between namespaces, and setting up the loopback interface. The IPC between parent and child is fairly simple: a socket pair is enough, and the child will wait until it sees EOF because the other FD has been closed.
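A tiny self-contained sketch of that handshake, with a thread standing in for the cloned child (in the hypervisor, the two ends of the socket pair would be split across clone3(2)):

```rust
use std::io::Read;
use std::os::unix::net::UnixStream;
use std::thread;

fn main() -> std::io::Result<()> {
    let (child_end, parent_end) = UnixStream::pair()?;

    let child = thread::spawn(move || {
        let mut child_end = child_end;
        let mut buf = [0u8; 1];
        // Blocks until the parent closes its end; read() returning 0 is EOF.
        while child_end.read(&mut buf).unwrap() != 0 {}
        // ... proceed with preparing and exec'ing the partition binary ...
    });

    // Parent: create the veth pair and move one end into the child's netns
    // here, then signal completion by closing its end of the socket pair.
    drop(parent_end);

    child.join().unwrap();
    Ok(())
}
```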

The only thing I am asking myself right now: is veth(4) a good choice? AFAIK, it can only be used to create a new interface on both ends, not to connect to an existing interface. Besides this, am I overlooking anything else?

/cc @wucke13 @dadada

dadada (Collaborator, author) commented Dec 9, 2022

That sounds great! I think for testing and development purposes, a veth is sufficient or even more practical than a physical network interface. If we want to use the other side (on the host / hypervisor) for integration testing, we should be able to do that with Linux. For example, we might add the interface to a software bridge where multiple hypervisors are connected.

What information does the veth expose to the partition (e.g. timings of individual transmissions, whether a frame was sent, whether the interface is busy)? I have a use-case where that information might be relevant.

dadada (Collaborator, author) commented Dec 10, 2022

The partition process would probably also need CAP_NET_RAW to send raw Ethernet frames on the interface. The hypervisor would have to allow it in the capability bounding set. The effective capabilities could then be controlled for example by the file capabilities. How the effective capabilities are calculated is documented in capabilities(7).

P'(permitted) = (P(inheritable) & F(inheritable)) |
                (F(permitted) & cap_bset)

P'(effective) = F(effective) ? P'(permitted) : 0

P'(inheritable) = P(inheritable)    [i.e., unchanged]
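As a concrete (made-up) example of plugging into those rules: if the hypervisor keeps CAP_NET_RAW in cap_bset and the partition binary carries cap_net_raw in F(permitted) with the file effective bit set, then

P'(permitted) ⊇ F(permitted) & cap_bset = { CAP_NET_RAW }
P'(effective) = P'(permitted)            [since F(effective) is set]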

dadada (Collaborator, author) commented Dec 10, 2022

Currently, all partitions seem to be started with =ep, which, if I understand cap_from_text(3) correctly, means that the process has no effective or permitted capabilities.

Another question would be how we might specify which capabilities are allowed for which partition. Maybe we could specify this in the configuration file?

dadada (Collaborator, author) commented Dec 10, 2022

Setting the file capabilities for the hypervisor partition probably won't work, since the executable is copied to the file system of the partition process.

dadada (Collaborator, author) commented Dec 10, 2022

> Setting the file capabilities for the hypervisor partition probably won't work, since the executable is copied to the file system of the partition process.

I've checked. If I understand correctly, we would have to either set the file capabilities when copying the executable to the partition process's file system root, or add CAP_NET_RAW to the inheritable set (cap_net_raw=i) and set cap_net_raw=ep in the partition process.
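For the first option, a sketch of what the hypervisor could do right after copying the binary into the partition root (the helper name and path handling are hypothetical; this shells out to setcap(8) instead of using a capability crate):

```rust
use std::path::Path;
use std::process::Command;

/// Grant CAP_NET_RAW as a permitted + effective file capability on the copied
/// partition binary.
fn grant_net_raw(binary: &Path) -> std::io::Result<()> {
    let status = Command::new("setcap")
        .arg("cap_net_raw+ep")
        .arg(binary)
        .status()?;
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            "setcap failed",
        ));
    }
    Ok(())
}
```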

dadada (Collaborator, author) commented Dec 18, 2022

@emilengler: @wucke13 and I were wondering if it would be better to use UDP sockets for this at first, just to get things working initially. It's probably a lot easier than working with capabilities and veth, although being able to directly use Ethernet would be more useful, since it is closer to the production environment.

cvengler (Member) commented:

> @emilengler: @wucke13 and I were wondering if it would be better to use UDP sockets for this at first, just to get things working initially. It's probably a lot easier than working with capabilities and veth, although being able to directly use Ethernet would be more useful, since it is closer to the production environment.

What do you mean by UDP sockets exactly? A socket to communicate from the host to the partition and vice-versa?

dadada (Collaborator, author) commented Dec 18, 2022

> What do you mean by UDP sockets exactly? A socket to communicate from the host to the partition and vice-versa?

No, what I mean is a socket that can be used to send or receive datagrams from the network. The hypervisor may create such a socket and pass it to the partition process using an AF_UNIX socket.
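A sketch of that hand-off using SCM_RIGHTS over the AF_UNIX socket, assuming the nix crate's sendmsg API (around nix 0.26/0.27; signatures vary between versions, so treat the details as illustrative):

```rust
use std::io::IoSlice;
use std::net::UdpSocket;
use std::os::fd::AsRawFd;
use std::os::unix::net::UnixStream;

use nix::sys::socket::{sendmsg, ControlMessage, MsgFlags, UnixAddr};

/// Pass an already-bound UDP socket to the partition process over an AF_UNIX
/// stream socket using SCM_RIGHTS.
fn pass_udp_socket(channel: &UnixStream, udp: &UdpSocket) -> nix::Result<()> {
    let fds = [udp.as_raw_fd()];
    let cmsgs = [ControlMessage::ScmRights(&fds)];
    // One byte of payload so the control message is not sent on an empty write.
    let iov = [IoSlice::new(b"u")];
    sendmsg::<UnixAddr>(channel.as_raw_fd(), &iov, &cmsgs, MsgFlags::empty(), None)?;
    Ok(())
}
```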

dadada (Collaborator, author) commented Dec 28, 2022

Here is a draft for this using UDP sockets. #24

dadada (Collaborator, author) commented Feb 29, 2024

We have something like that now by sending TCP and UDP sockets to partitions.

dadada closed this as completed Feb 29, 2024