Feedback on chapter 2 and chapter 3 #79

Open
ved-rivos opened this issue Nov 19, 2024 · 12 comments
@ved-rivos

  1. The distinction between a fast channel vs. a normal channel is unclear.
    Do fast and normal channels co-exist between same pair of producers
    and consumers? If so what is the criteria for selecting between the two.
  2. Section 2.2/2.3 - explain more about the caching side effects and what is
    being avoided.
  3. For P2A channels - what is the distinction between a notification and a
    request? Do notifications not require an Ack?
  4. Replace "cache line" with "cache block"
  5. Explain rationale for the head and tail to use the -2 notation? As the
    effective size of the buffer is M-2, there is already a need to special case
    the wrap decision.
  6. Section 2.3.2 - explain the cache maintenance intended to be done to the
    head and tail slots.
  7. What are the semantics of the head and tail. Are they expected to have a
    wrap indicator or is the convention that tail = head-1 indicates a full queue?
  8. Section 2.3.3 - the non-normative comment about having as many slots as
    number of application processors indicates that this queue is expected to be
    concurrently operated on by multiple application processors? If so then in the
    P2A direction: how is the de-multiplexing of requests/notifications/acks
    performed and which application processor fields the interrupts?
  9. section 2.3.3 - the privileged architecture does not have a concept of PMA
    entries. Each location in memory has a PMA. Please uplevel this note to
    explain the intent of the note and call out what type of optimization
    are intended.
  10. Section 2.3.3 : typo "shared shared" -> "shared"
  11. Chapter 2: Which entity is responsible for initializing the head and tail pointer
    and the queue memory. What coordination mechanisms are used to re-init the
    queue (for instance on a kexec).
  12. Section 3.1 - is non-posted request a better term to complement posted request?
  13. section 3.2 - is the message length variable within a slot or can a message
    occupy multiple slots?
  14. Section 3.2.2 - what is the intent of flag[3]. The earlier sections indicate that
    the producer must explicitly ring a doorbell. Is "enabled/disabled" meant to indicate
    that doorbells must not be used for a queue or does it indicate that they are not
    used for this specific message?
  15. Section 3.2.2 - what is the intent of the Token. Are the sequence numbers required
    to be monotonic? What are the sequence numbers initialized to? Are there any error
    checks for missing sequence numbers? How are sequence numbers synchronized
    (for instance after a kexec).
  16. section 3.2.2 - are there any error checks done on data len? What is handling for
    a zero length message as the header does not have any operation code?
  17. When multipart acknowledgements are used, how are the multiple parts identified?
    Can acknowledgements for two different messages be interleaved? Does the STATUS
    field of each multipart message need to be identical and if not what are the rules for
    the overall STATUS?
@pathakraul pathakraul self-assigned this Nov 21, 2024
@dhaval-rivos

Additional comments:

  1. Now that we are extending RPMI to MM as well, in the introduction we should clarify that it is not just for PuC.
  2. Figure 2 should be updated to reflect req-fwd API implementation
  3. Nit: figure2 should have "System Monitoring And Control" instead of just system control.
  4. How do we define Ownership of the channel (I think it is mentioned in one of the diagrams). Is it of any significance?
  5. Messaging Protocol: Typo, response which is "s/send/sent" back as an RPMI acknowledgement
  6. Since we are dealing with PuC, should spec cover a scenario where request times out and any corrective action is required or that is left to the platform FW? There is RPMI_ERR_TIMEOUT but for that Ack has to come which may not happen in some scenarios?
  7. Messaging Protocol: Should this be renamed to /sRPMI_ERR_ALREADY/RPMI_ERR_INPROGRESS?
  8. MPXY: "The SBI MPXY channel must correspond to a single RPMI service group". This means one channel can not be mapped to multiple groups. Can we also clarify other way around that it is possible to have multiple channels that support a service group type. i.e. there can be 2 channels for MM service group.
  9. RPMI service group: Typo "A RPMI service group may be partially implement".."service group may implement..."
  10. We do not need this now I believe: any specific use case for this? MM_ENABLE_NOTIFICATION
  11. MM_SHMEM_ADDR_LOW: In a typical MM implementation NS buffer is allocated by edk2 and then passed into MM as a HOB. If that implementation is done then this attr may not be necessary. But there are some challenges with that implementation that is under discussion. So if eventually implementation does not support this attribute, is there a way to inform caller? Just send 0 in the address?
  12. REQFWD_ENABLE_NOTIFICATION this may not be necessary?

@pathakraul
Collaborator

Hi @ved-rivos and ARC team, thanks for the feedback.

Provided the answers below.

  1. The distinction between a fast channel vs. a normal channel is unclear.
    Do fast and normal channels co-exist between same pair of producers
    and consumers? If so what is the criteria for selecting between the two.

A normal channel is a shared-memory queue shared among multiple RPMI clients; it transmits RPMI messages, each of which carries an RPMI message header.

A fast channel is a shared memory without any message format; it transmits raw data, usually a few bytes (though that is not a limitation). If a service group requires a fast channel, then the discovery of the fast-channel details, its semantics, and which services can send data over it are defined by that particular service group.

Yes, a service group supporting a fast channel can use both normal RPMI messages via the normal channel and the fast channel. Such a service group has to define, if the fast channel is supported and implemented, which of its services will use only the fast channel to send data.

For example, the CPPC service group defines fast-channel support (Section 4.5.1. CPPC Fast-channel) and also provides the mechanism to discover it (Section 4.5.7). It also specifies where the AP should write the desired performance level based on fast-channel availability.

  2. Section 2.2/2.3 - explain more about the caching side effects and what is
    being avoided.

Noted, will update.

  3. For P2A channels - what is the distinction between a notification and a
    request? Do notifications not require an Ack?

A notification is an RPMI message with its own message format, different from a P2A request message, which is a request from the platform (PuC) to the application processor. A request message carries a command that needs to be serviced by the service provider, such as the PuC. A notification message only reports events.

An acknowledgement for a notification message is not necessary because the events carried by notification messages are only meant for logging or reporting purposes. The AP can choose to subscribe to these notification messages for any service group, and even if it has subscribed to them, it is up to the AP and the implementation of that service group to decide what to do with the events.

  4. Replace "cache line" with "cache block"

Noted, will update

  5. Explain rationale for the head and tail to use the -2 notation? As the
    effective size of the buffer is M-2, there is already a need to special case
    the wrap decision.

For an implementation, only the indices of the message slots (which make up the actual queue) matter. The text in this section and the (slot index - 2) notation merely reflect that, counted over all shared-memory slots (as in the image, which includes the head and tail slots), the head and tail store message-slot indices that are offset by 2.

We can add a non-normative text and elaborate this.

  6. Section 2.3.2 - explain the cache maintenance intended to be done to the
    head and tail slots.

The cache maintenance will involve using cache flush/invalidate operations whenever the head and tail are updated.
Noted, will update.

  7. What are the semantics of the head and tail. Are they expected to have a
    wrap indicator or is the convention that tail = head-1 indicates a full queue?

The RPMI spec defines the role of the head for dequeuing and the tail for enqueuing. It also states that ownership of the head and tail is exclusive: only one entity is allowed to write each of them, according to its role as producer or consumer. A queue has an associated message-transmission direction, which implicitly defines who is the producer and who is the consumer.

Special wrap indicators are defined to highlight the wrapping of the queue. Checking empty and full can be done with only the head and tail. Currently, there are no queue empty/full conditions explicitly defined in the specification because these are obvious based on above semantics. We certainly don’t mind defining these conditions.

  8. Section 2.3.3 - the non-normative comment about having as many slots as
    number of application processors indicates that this queue is expected to be
    concurrently operated on by multiple application processors? If so then in the
    P2A direction: how is the de-multiplexing of requests/notifications/acks
    performed and which application processor fields the interrupts?

Multiple application processors share the same RPMI queues and the messages in the queues are not bound to a particular application processor. Now it is up to the software running on the application processors about how to synchronize the access to the queues.

Regarding interrupts to the application processors, there is only one interrupt defined for the RPMI transport in the P2A direction. This interrupt can be a wired interrupt or an MSI. If the P2A interrupt is wired, it is managed through the PLIC or APLIC, which allows application processors to select the target processor. If the P2A interrupt is an MSI, the application processor can configure the IMSIC coordinates (MSI address + data) of the target processor using the BASE service group.

  9. section 2.3.3 - the privileged architecture does not have a concept of PMA
    entries. Each location in memory has a PMA. Please uplevel this note to
    explain the intent of the note and call out what type of optimization
    are intended.

The RPMI shared memory may or may not be cache-coherent for the application processors, and it may also be some on-chip RAM. This means the platform must set up the PMA for the RPMI shared memory so that both the PuC and the application processors can access it coherently.

The number of PMA regions supported by a RISC-V implementation is generally limited, and non-contiguous RPMI shared memory requires multiple PMA regions; hence the suggestion in the note in Section 2.3.3.

We can explicitly mention that defining these attributes is part of the platform/implementation.

  10. Section 2.3.3 : typo "shared shared" -> "shared"

Noted, will update.

  11. Chapter 2: Which entity is responsible for initializing the head and tail pointer
    and the queue memory. What coordination mechanisms are used to re-init the
    queue (for instance on a kexec).

The shared-memory queues (including the head and tail slots) are initialized by the implementation (i.e., the PuC). From the AP perspective, the shared memory is initialized and ready to use, and the details of the queues, as mentioned in Section 2.3.4, are provided to the AP.

Upon kexec, the application processors can simply resume from the current head and tail values without any special coordination.

Please note that kexec is a Linux feature; any OS or firmware may be running on the application processors.

  12. Section 3.1 - is non-posted request a better term to complement posted request?

A normal message exchange usually involves sending a message and then expecting a response, hence the term “normal request”. In limited cases there is no response, so we chose the term “posted request” for those.

There was good consensus on these terms among the folks who have already reviewed the spec, and a lot of software has already been written using them. Unless absolutely required, we would like to continue with these terms.

  13. section 3.2 - is the message length variable within a slot or can a message
    occupy multiple slots?

The message header is fixed-size and the message data length is variable, which makes the overall message length variable. But the message length cannot be greater than the slot size.

A message cannot occupy multiple slots. Multipart messages are different from this; they are explained in the answer to point 17.
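The "fixed header + variable data, within one slot" rule above can be sketched as a simple validity check. This is only an illustration: the slot size and the header field layout here are assumptions for the example, not the normative widths and offsets from Section 3.2.2.

```c
#include <assert.h>
#include <stdint.h>

/* Example slot size; the real value is implementation-defined. */
#define SLOT_SIZE 64u

/* Hypothetical header layout, for illustration only. */
struct rpmi_msg_header_sketch {
    uint32_t flags_token;   /* FLAGS + TOKEN */
    uint32_t service_id;    /* ServiceGroupID + ServiceID */
    uint32_t datalen;       /* length in bytes of the DATA that follows */
};

/* A message (fixed header + variable DATA) must fit in one slot;
 * a message can never span multiple slots. */
static int msg_fits_slot(uint32_t datalen)
{
    return sizeof(struct rpmi_msg_header_sketch) + datalen <= SLOT_SIZE;
}
```

A zero-length DATA section is still a valid message under this check, which matches the DATALEN = 0 case discussed for point 16.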

  14. Section 3.2.2 - what is the intent of flag[3]. The earlier sections indicate that
    the producer must explicitly ring a doorbell. Is "enabled/disabled" meant to indicate
    that doorbells must not be used for a queue or does it indicate that they are not
    used for this specific message?

Transport doorbells are optional. The spec mentions that even if doorbells are supported, either side can ignore them and poll instead. That is why flag[3] is provided: it lets the requester tell the entity servicing the request, on a per-message basis, whether or not to trigger the doorbell.

Yes, this can be used for a particular message or for the entire lifecycle. For example, if the AP and PuC are capable of MSIs and the AP has configured the MSI details via the service defined in Section 4.1.10, then this bit can always be set so that the PuC sends an MSI for each response. But the AP can selectively clear it so that in that case the PuC does not trigger a doorbell.

The RPMI spec can elaborate more on this flag, and we can update this.
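The per-message decision described above could look like the following sketch on the responder side. The flag position and names are assumptions for illustration; as noted later in this thread, the behavior when the bit is set without doorbell support is currently undefined, so this sketch simply ignores the bit in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of flag[3] in the message FLAGS field. */
#define RPMI_FLAG_DOORBELL_REQ (1u << 3)

/* After writing its acknowledgement, the responder rings the
 * doorbell only if the requester asked for it via flag[3] AND
 * doorbell interrupts are actually supported and enabled.
 * (Ignoring the bit when unsupported is one possible policy;
 * the spec leaves this case undefined today.) */
static bool should_ring_doorbell(uint32_t req_flags, bool doorbell_enabled)
{
    return doorbell_enabled && (req_flags & RPMI_FLAG_DOORBELL_REQ) != 0;
}
```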

  15. Section 3.2.2 - what is the intent of the Token. Are the sequence numbers required
    to be monotonic? What are the sequence numbers initialized to? Are there any error
    checks for missing sequence numbers? How are sequence numbers synchronized
    (for instance after a kexec).

The message token helps application processors track the origin of a request when they see a response.
This is very useful when multiple application processors share the same queues. For example, two different processors may queue the same type (ServiceGroupID + ServiceID) of request; when the responses to both requests are received, the token helps differentiate which response belongs to which request.

It is generally recommended to use monotonically increasing token numbers; the token can be initialized to any value, and there is no constraint here.

The PuC does not check the token number, since it is only for the application processors. For kexec, the application processors must start from a new token value, so no special handling is required.
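The correlation role described above can be sketched as an AP-side pending-request table keyed by token. All names and the table shape here are assumptions for illustration; the sketch relies only on the property discussed later in this thread, namely that the TOKEN of a request is preserved in its acknowledgement.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PENDING 8

struct pending {
    uint16_t token;   /* echoed back in the acknowledgement */
    uint8_t  in_use;
    void   (*on_ack)(const void *data, uint16_t datalen);
};

static struct pending table[MAX_PENDING];
static uint16_t next_token;  /* monotonic, free to start anywhere and wrap */

/* Allocate a token for a new request; returns -1 if too many
 * requests are outstanding. */
static int alloc_token(void (*cb)(const void *, uint16_t), uint16_t *out)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!table[i].in_use) {
            table[i].in_use = 1;
            table[i].token  = next_token++;
            table[i].on_ack = cb;
            *out = table[i].token;
            return 0;
        }
    }
    return -1;
}

/* Called when an acknowledgement arrives; match it by token. A stale
 * or unknown token (e.g. an ACK left over from before a kexec) simply
 * fails to match and can be benignly ignored. */
static int dispatch_ack(uint16_t token, const void *data, uint16_t len)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (table[i].in_use && table[i].token == token) {
            table[i].in_use = 0;
            if (table[i].on_ack)
                table[i].on_ack(data, len);
            return 0;
        }
    }
    return -1;
}
```

Responses can arrive out of order relative to the requests, which is exactly why a match-by-token lookup is used rather than a sequence check.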

  16. section 3.2.2 - are there any error checks done on data len? What is handling for
    a zero length message as the header does not have any operation code?

An RPMI message is valid even if the DATALEN field is 0; for example, the service defined in Section 4.1.5 has no request data, and the request is identified from the ServiceGroupID and ServiceID.

Implementations can perform DATALEN checks. For example, the DATALEN value received in an acknowledgement must not be greater than what the RPMI spec defines for that service's acknowledgement. Implementations may also perform further error checks. These checks are not defined by the RPMI specification.

  17. When multipart acknowledgements are used, how are the multiple parts identified?
    Can acknowledgements for two different messages be interleaved? Does the STATUS
    field of each multipart message need to be identical and if not what are the rules for
    the overall STATUS?

A multipart acknowledgement means that the data to be sent in the acknowledgement is too large to fit in a single message. In such cases the service defines extra fields, REMAINING and RETURNED, to indicate that some data is left and that the requester must send another request message with an appropriate START_INDEX to get the remaining data. One such example is in Section 4.4.6. All services where such a case may arise define their acknowledgement layout in this manner.

In this case, the multiple request messages made to get the complete data are considered separate messages. Since they are separate messages, each STATUS field is also independent. The AP in such cases must consider all STATUS values together after all transactions are complete and make decisions accordingly.
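The retrieval loop described above (request with START_INDEX, check RETURNED/REMAINING, repeat) can be sketched as follows. The service call here is a mock standing in for a service such as the one in Section 4.4.6; its signature and the "stop on first error" STATUS policy are assumptions for illustration, since the thread notes the AP must weigh all STATUS values itself.

```c
#include <assert.h>
#include <stdint.h>

struct chunk_ack {
    int32_t  status;     /* STATUS of this part, independent per message */
    uint32_t remaining;  /* entries still left after this part */
    uint32_t returned;   /* entries carried in this part */
};

/* Mock service: serves a 10-entry table in parts of up to 4 entries,
 * copying `returned` items starting at start_index into out[]. */
#define TOTAL 10u
#define PART   4u
static struct chunk_ack rpmi_get_chunk(uint32_t start_index, uint32_t *out)
{
    struct chunk_ack ack = {0};
    uint32_t left = (start_index < TOTAL) ? TOTAL - start_index : 0;
    ack.returned = (left < PART) ? left : PART;
    for (uint32_t i = 0; i < ack.returned; i++)
        out[i] = start_index + i;
    ack.remaining = left - ack.returned;
    return ack;
}

/* Issue separate requests, advancing START_INDEX by RETURNED each
 * time, until REMAINING reports nothing left. */
static int read_all(uint32_t *out, uint32_t max, uint32_t *count)
{
    uint32_t start = 0;
    *count = 0;
    for (;;) {
        struct chunk_ack ack = rpmi_get_chunk(start, out + *count);
        if (ack.status != 0)
            return ack.status;  /* one simple policy: stop on first error */
        *count += ack.returned;
        if (ack.remaining == 0 || *count >= max)
            return 0;
        start += ack.returned;
    }
}
```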

@pathakraul
Collaborator

@dhaval-rivos Thanks for your feedback, will reply to your queries asap

@pathakraul
Collaborator

Additional comments:

  1. Now that we are extending RPMI to MM as well, in the introduction we should clarify that it is not just for PuC.

In Chapter 1 we have already expanded the scope of RPMI and who can implement it, making it implicit that not every service group defined in the RPMI spec is meant for the PuC.

  2. Figure 2 should be updated to reflect req-fwd API implementation

This figure is not meant to encompass everything defined in this spec; it only presents an overview.
We can put a figure for this in the service group section itself, as we have done for Chapter 5.

  3. Nit: figure2 should have "System Monitoring And Control" instead of just system control.

Noted. Will update

  4. How do we define Ownership of the channel (I think it is mentioned in one of the diagrams). Is it of any significance?

I will remove that text from figure 2. It was only meant to highlight that an RPMI transport in the SEE has the SEE as its owner, which is implicit.

  5. Messaging Protocol: Typo, response which is "s/send/sent" back as an RPMI acknowledgement

Noted, will update.

  6. Since we are dealing with PuC, should spec cover a scenario where request times out and any corrective action is required or that is left to the platform FW? There is RPMI_ERR_TIMEOUT but for that Ack has to come which may not happen in some scenarios?

There are two scenarios possible -

One is that the ACK does not reach the application processor, either because the PuC is unable to send the ACK or because of an issue with the transport. In such cases, the type of corrective action depends on the service being called and on the RPMI clients.

Similarly, a service can also return an ACK with RPMI_ERR_TIMEOUT, and the corrective action again depends on the service called on the RPMI client side. For example, a voltage-change request may time out when its implementation on the PuC side cannot service the request in time due to a slow PSU or VRM. In such a case, the corrective action depends on the RPMI client: it may retry immediately, retry after some time, or take some other measure.

  7. Messaging Protocol: Should RPMI_ERR_ALREADY be renamed to RPMI_ERR_INPROGRESS?

Yes, this is a little subjective and was also discussed with others. We decided to go with ALREADY, as we found it more generic and appropriate for conditions that are already in progress or have already happened. INPROGRESS was not chosen since it made more sense to use it only for in-progress actions.

  8. MPXY: "The SBI MPXY channel must correspond to a single RPMI service group". This means one channel can not be mapped to multiple groups. Can we also clarify other way around that it is possible to have multiple channels that support a service group type. i.e. there can be 2 channels for MM service group.

The specification only mandates that one MPXY channel be assigned one RPMI service group, which means that two service groups cannot share one MPXY channel. But an implementation can have multiple MPXY channels mapped to the same service group, either within a single RPMI transport instance or across multiple RPMI transport instances.

  9. RPMI service group: Typo "A RPMI service group may be partially implement".."service group may implement..."

Noted, will update.

  10. We do not need this now I believe: any specific use case for this? MM_ENABLE_NOTIFICATION

If no events are supported, the service can simply return RPMI_ERR_NOT_SUPPORTED. But every service group has to follow requirements 1, 2, and 3 as stated at the start of Chapter 4, and needs to implement the ENABLE_NOTIFICATION service.

  11. MM_SHMEM_ADDR_LOW: In a typical MM implementation NS buffer is allocated by edk2 and then passed into MM as a HOB. If that implementation is done then this attr may not be necessary. But there are some challenges with that implementation that is under discussion. So if eventually implementation does not support this attribute, is there a way to inform caller? Just send 0 in the address?

The discussion in the meeting on ReqFwd and MM concluded that the memory will already be allocated, to avoid passing a physical memory address every time, which may require PMP mapping/unmapping and sanitization of the passed memory address each time; its allocation is mandatory, which led to adding these attributes. Passing 0 would reflect that it is not present, which would defy that. In the case of EDK2, even if these shared-memory details are made static, the EDK2 non-secure-domain client can ignore these attributes instead of the MM service provider in the secure domain passing them as 0.

  12. REQFWD_ENABLE_NOTIFICATION this may not be necessary?

Same as mentioned for question 10.

@ved-rivos
Author

Currently, there are no queue empty/full conditions explicitly defined in the specification because these are obvious based on above semantics. We certainly don’t mind defining these conditions.

There are at least two ways to implement the head and tail for a circular buffer:

  1. In a scheme where both the head and tail pointers are augmented with a wrap bit, the full capacity of the circular buffer can be utilized without reserving a slot to differentiate between full and empty states. The wrap bit toggles each time the head or tail pointer wraps around from the last slot back to the first slot. This additional bit allows the system to distinguish between the buffer being full and empty, even when the head and tail pointers point to the same slot. Specifically, the buffer is empty when the head and tail indices are equal, and their wrap bits are also the same. Conversely, the buffer is full when the head and tail indices are equal, but their wrap bits differ. This approach eliminates the need to reserve a slot, maximizing the usable capacity of the buffer while maintaining simple and efficient enqueue and dequeue operations.
  2. In the traditional circular buffer scheme, one slot is intentionally left unused to differentiate between the full and empty states. The head pointer indicates the next slot to dequeue from, and the tail pointer indicates the next slot to enqueue into. The buffer is considered empty when head == tail, as there are no elements to dequeue. It is considered full when ((tail + 1) % N) == head, where N is the total number of slots, meaning advancing the tail would overwrite the head. This reserved slot simplifies the logic for distinguishing between full and empty states but slightly reduces the effective capacity of the buffer by one slot. Despite this trade-off, the simplicity and robustness of this scheme make it widely used.

The updated specification should be normative about which of the two schemes is intended. Based on the latest update, the second scheme seems to be the intent; if so, that should be stated in normative text.
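The two schemes above can be sketched side by side as follows. This is purely illustrative: the slot count N and all names are assumptions for the example, not taken from the RPMI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N 16u  /* example number of message slots */

/* Scheme 1: the pointer counts over 2*N values, so (p / N) & 1 acts
 * as the wrap bit on top of the slot index p % N. All N slots are
 * usable. */
static uint32_t idx(uint32_t p)     { return p % N; }
static uint32_t wrapbit(uint32_t p) { return (p / N) & 1u; }
static uint32_t advance(uint32_t p) { return (p + 1u) % (2u * N); }

static bool wrapbit_empty(uint32_t head, uint32_t tail) {
    /* same slot index, same wrap bit */
    return idx(head) == idx(tail) && wrapbit(head) == wrapbit(tail);
}
static bool wrapbit_full(uint32_t head, uint32_t tail) {
    /* same slot index, wrap bits differ */
    return idx(head) == idx(tail) && wrapbit(head) != wrapbit(tail);
}

/* Scheme 2: plain indices in [0, N); one slot stays reserved, so at
 * most N-1 slots hold messages. */
static bool reserved_empty(uint32_t head, uint32_t tail) {
    return head == tail;
}
static bool reserved_full(uint32_t head, uint32_t tail) {
    return ((tail + 1u) % N) == head;
}
```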

@ved-rivos
Author

From the AP perspective the shared memory is initialized and ready to use and details of queues as mentioned in Section 2.3.4 are provided to the AP.

Section 2.2.4 does not mention the details of the initialization. If the intent is for the PuC to initialize the queue head and tail, then that should be mentioned. kexec is a Linux concept, but similar mechanisms such as vmphu may apply to other OSs. If an OS is expected to continue from wherever the head and tail happen to be, then such an OS may receive notifications that it has not subscribed to, or may get ACKs for requests it has never made. This should be noted; based on the comments in this thread, it seems they are expected to be benignly ignored.

@ved-rivos
Author

ved-rivos commented Dec 10, 2024

RPMI spec can elaborate this more for this flag and we can update this.

It is better to rename the flag to "doorbell interrupt is requested". The description of the flag states "inform the platform micro-controller to ring the doorbell". Is this flag not applicable to requests sent in the P2A direction? The treatment of flag[3] being set to 1 when doorbell interrupts are not enabled or not supported is undefined. Further, based on this description, the flag seems not to be applicable in an ACK.

@ved-rivos
Author

ved-rivos commented Dec 10, 2024

The message token helps application processors track the origin of the request when it sees a response.

The specification does not state anywhere that the TOKEN in a request must be echoed back in an ACK for the purpose of such correlation. Further, describing this as a "sequence number" is misleading. A better description should be used, as these are primarily message tags and are not used to identify any sequence of messages.

Please provide some explanation of the use of a TOKEN in a notification. Also, when a channel is shared by multiple application-processor harts, what method is used to determine which hart should process a notification message, and can events for multiple harts be packed into a single notification message?

@pathakraul
Collaborator

Currently, there are no queue empty/full conditions explicitly defined in the specification because these are obvious based on above semantics. We certainly don’t mind defining these conditions.

There are at least two ways to implement the head and tail for a circular buffer:

  1. In a scheme where both the head and tail pointers are augmented with a wrap bit, the full capacity of the circular buffer can be utilized without reserving a slot to differentiate between full and empty states. The wrap bit toggles each time the head or tail pointer wraps around from the last slot back to the first slot. This additional bit allows the system to distinguish between the buffer being full and empty, even when the head and tail pointers point to the same slot. Specifically, the buffer is empty when the head and tail indices are equal, and their wrap bits are also the same. Conversely, the buffer is full when the head and tail indices are equal, but their wrap bits differ. This approach eliminates the need to reserve a slot, maximizing the usable capacity of the buffer while maintaining simple and efficient enqueue and dequeue operations.
  2. In the traditional circular buffer scheme, one slot is intentionally left unused to differentiate between the full and empty states. The head pointer indicates the next slot to dequeue from, and the tail pointer indicates the next slot to enqueue into. The buffer is considered empty when head == tail, as there are no elements to dequeue. It is considered full when ((tail + 1) % N) == head, where N is the total number of slots, meaning advancing the tail would overwrite the head. This reserved slot simplifies the logic for distinguishing between full and empty states but slightly reduces the effective capacity of the buffer by one slot. Despite this trade-off, the simplicity and robustness of this scheme make it widely used.

The updated specification should be normative about which of the two schemes is intended. Based on the latest update it seems like the second scheme is the intent. If so that should be in normative text.

Noted, will make the conditions normative text.

@pathakraul
Collaborator

From the AP perspective the shared memory is initialized and ready to use and details of queues as mentioned in Section 2.3.4 are provided to the AP.

Section 2.2.4 does not mention the details of the initialization. If the intent is for the PuC to initialize the queue head and tail, then that should be mentioned. kexec is a Linux concept, but similar mechanisms such as vmphu may apply to other OSs. If an OS is expected to continue from wherever the head and tail happen to be, then such an OS may receive notifications that it has not subscribed to, or may get ACKs for requests it has never made. This should be noted; based on the comments in this thread, it seems they are expected to be benignly ignored.

Noted. Can this be non-normative text explaining this condition?

@pathakraul
Collaborator

RPMI spec can elaborate this more for this flag and we can update this.

It is better to rename the flag to "doorbell interrupt is requested". The description of the flag states "inform the platform micro-controller to ring the doorbell". Is this flag not applicable to requests sent in the P2A direction? The treatment of flag[3] being set to 1 when doorbell interrupts are not enabled or not supported is undefined. Further, based on this description, the flag seems not to be applicable in an ACK.

This flag is applicable to normal requests from both the AP and the PuC.

A sender that sets this bit without doorbell interrupts being enabled, or without support for them, will cause undefined behavior. We can state this in normative text.

@pathakraul
Collaborator

The message token helps application processors track the origin of the request when it sees a response.

The specification does not state anywhere that the TOKEN in a request must be echoed back in an ACK for the purpose of such correlation. Further, describing this as a "sequence number" is misleading. A better description should be used, as these are primarily message tags and are not used to identify any sequence of messages.

The specification does mention that the TOKEN needs to be preserved; this is stated in Section 3.2.2, just below the RPMI message header table.

It is true that these are not used as sequence numbers among a set of messages sent over a period. The specification can clarify this in non-normative text, since this is not a requirement but just an explanation of what TOKEN represents.

Please provide some explanation about the use of a TOKEN in a notification. Also when a channel is shared by multiple application processor harts then what method is used to determine which hart should process the notification message and whether events for multiple harts be packed into a single notification message.

Though the TOKEN in a notification message does not serve a purpose, it has become part of the notification message since RPMI has a common message header. I agree, and the specification can mention this with respect to notification messages.

Noted, will update
