diff --git a/server_soc_requirements.adoc b/server_soc_requirements.adoc index f152cd9..9bc5246 100644 --- a/server_soc_requirements.adoc +++ b/server_soc_requirements.adoc @@ -29,15 +29,18 @@ deliver external interrupts to the RISC-V application processor harts. | IIC_020 | External interrupts MUST be signaled to a hart as message-signaled interrupts (MSI). -2+| _Since MSI are a memory write, it allows for a simplified implementation of - producer-consumer ordering rules. For example, interrupts following a write - from a device must only be observed after the writes have been observed, - and the read completion generated by a device following an interrupt must - only be observed after the interrupt has been observed._ + +2+| _Since Message Signaled Interrupts (MSI) are implemented as memory writes, + they facilitate a simplified enforcement of producer-consumer ordering + rules. Specifically, interrupts issued by a device following a write + operation must be processed only after the previous write operations have + been completed and observed. Similarly, interrupts issued by a device must + be observed before any subsequent read completions generated by the + device._ + + - _Message Signaled interrupts (MSI) are the preferred interrupt signaling - mechanism in PCIe. Supporting MSI for harts allows low-latency interrupt - signaling._ + _MSI is the preferred mechanism for interrupt signaling in PCIe due to its + efficiency and support for low-latency communication between devices and + harts. By adopting MSI, systems can achieve faster and more reliable + interrupt handling, essential for high-performance computing environments._ | IIC_030 | The Incoming Message-signaled Interrupt Controller (IMSIC) MUST implement an interrupt file for S-mode. @@ -46,7 +49,7 @@ deliver external interrupts to the RISC-V application processor harts. 
2+| _Supporting 5 VS-mode interrupt files for a hart allows context switching between up to 5 virtual CPUs (vCPU) on a hart without needing to swap the contents of the interrupt file out to memory. This is particularly - beneficial when devices are directly assigned to virtul machines + beneficial when devices are directly assigned to virtual machines (VMs), as swapping out the context of an IMSIC interrupt file may result in longer latencies due to the need to redirect device interrupts to a memory-resident interrupt file._ @@ -57,12 +60,12 @@ deliver external interrupts to the RISC-V application processor harts. | IIC_060 | The VS-mode interrupt files MUST support at least 63 interrupt identities. -| IIC_070 a| The memory regions for IMSIC interrupt files MUST have the - following PMAs: +| IIC_070 a| The memory regions designated for IMSIC interrupt files MUST have + the following PMAs: * Not cacheable, non-idempotent, coherent, strongly-ordered (I/O ordering) channel 0 I/O region - * Support 4-byte aligned reads and writes. + * Support for 4-byte aligned reads and writes. | IIC_080 a| If the SoC implements devices that use wire-signaled interrupts then the SoC MUST implement an APLIC as specified by the RISC-V @@ -72,7 +75,7 @@ deliver external interrupts to the RISC-V application processor harts. If implemented, the APLIC MUST support: * Supervisor interrupt domain. - * `GEILEN` equal to that implemented by the harts. + * `GEILEN` values matching those implemented by the harts. * MSI delivery mode. * Extempore MSI generation using the `genmsi` register. @@ -110,22 +113,14 @@ deliver external interrupts to the RISC-V application processor harts. wide device IDs. | IOM_040 | An IOMMU that does not govern a PCIe root port MUST support a - device ID width REQUIRED to support all requester IDs originated by + device ID width required to support all requester IDs originated by the devices governed by that IOMMU. 
| IOM_050 | The IOMMU MUST implement all the page based virtual memory system - modes and extensions that are supported by the IOMMU and are also - implemented by the RISC-V application processor harts in the SoC. -2+| _The page based virtual memory system modes that may be optionally supported - by the IOMMU are defined in the IOMMU capabilities register._ - -| IOM_060 a| The IOMMU SHOULD support the following virtual memory extensions: - - * Svadu (enumerated by 1 setting of `capabilities.AMO_HWAD`) - -2+| _Hardware A/D bit updates capability enables efficient support for use - models such as VM migration, shared virtual addressing, and user space work - submission._ + modes and extensions that are implemented by the RISC-V application + processor harts in the SoC. +2+| _The page based virtual memory system modes supported by the IOMMU are + enumerated in the IOMMU capabilities register._ | IOM_070 | The IOMMU SHOULD support pass-through mode and MRIF mode MSI address translation. @@ -138,10 +133,9 @@ deliver external interrupts to the RISC-V application processor harts. translation services (ATS). 2+| _High performance devices such as DPU/SmartNICs, GPUs, and FPGAs, utilized in server platforms rely on ATS and Page Request services to achieve high - throughput and low-latency I/O. + - Supporting ATS is also required for efficiently accommodating usage models - such as Shared Virtual Addressing and direct work submission from user - mode._ + throughput and low-latency I/O. Supporting ATS is also required for + efficiently accommodating usage models such as Shared Virtual Addressing + and direct work submission from user mode._ | IOM_100 | IOMMU governing PCIe root ports SHOULD support the T2GPA mode of operation with ATS if ATS is supported. @@ -264,7 +258,7 @@ deliver external interrupts to the RISC-V application processor harts. the PCIe RID (and if applicable the segment ID). 
|=== -=== PCIe Subsystem Integration +=== PCIe Subsystem A PCIe subsystem consists of a root complex with a collection of root ports, root complex event collectors (RCECs), root complex register blocks (RCRBs), @@ -295,7 +289,7 @@ presented to software using one of the following options: . Presented to software as a PCIe endpoint (EP; See section 1.3.2.2 of the PCIe 6.0 specification) connected to a PCIe root port (See example of such an - endpoint connected to root port 3 in <>). Such PCIe + endpoint connected to root port 3 in <>). Such PCIe endpoints must comply with the PCIe specified rules for endpoints. . Presented to software as a root complex integrated endpoint (RCiEP; See @@ -323,7 +317,7 @@ page in this range such that the address bits (20+b):20 where b=0 to 7 identify the bus number of that function (see also recommendations in the PCIe specification 6.0 section 7.2.2), the address bits 19:15 identify the device number, and the address bits 14:12 identify the function number. The host bridge -in conjunction with the SoC firmware maps the ECAM address range to the +in conjunction with the SoC boot firmware maps the ECAM address range to the hierarchy domain originating at each PCIe root port. [width=100%] @@ -360,7 +354,7 @@ hierarchy domain originating at each PCIe root port. | ECM_040 | A SoC MAY support multiple hierarchies. When multiple hierarchies are supported, the ECAM address range of the hierarchies MUST NOT - overlap, but they are NOT REQUIRED to be contiguous. + overlap, but they are not required to be contiguous. | ECM_050 | The configuration space of the PCIe root ports MUST be associated with the primary bus number of the hierarchy associated with the @@ -409,9 +403,8 @@ hierarchy domain originating at each PCIe root port. each retry of the configuration read. 
2+| _The data response to the Vendor ID register on receipt of an RRS response - MUST follow the PCIe defined rules._ + - + - _See also the recommendations in PCIe specification 6.0 section 2.3.2._ + MUST follow the PCIe defined rules. See also the recommendations in PCIe + specification 6.0 section 2.3.2._ | ECM_110 | Write access from a RISC-V hart to configuration registers of non-existent functions on the primary bus MUST be dropped (silently @@ -446,11 +439,11 @@ hierarchy domain originating at each PCIe root port. as the low MMIO ranges and those suitable for use with 64-bit BARs termed as high MMIO ranges._ -| MMS_030 a| The system physical address ranges designated for mapping - endpoint memory spaces MUST have the following physical memory - attributes (PMAs): +| MMS_030 a| The system physical address ranges designated for mapping endpoint + memory spaces have the following physical memory attributes (PMAs) + requirements: - * Not cacheable, non-idempotent, coherent, strongly-ordered + * MUST be Not-cacheable, non-idempotent, coherent, strongly-ordered (I/O ordering) I/O region. * MUST support all aligned and unaligned access sizes that can be generated by data requests from any of the RISC-V application @@ -467,20 +460,20 @@ hierarchy domain originating at each PCIe root port. device._ + + _See also the implementation note on optimizations based on restricted - programming mode in section 2.3.1 of PCIe specification 6.0._ + programming mode in section 2.3.1 of PCIe specification 6.0._ + + _See also first/last DW byte enable rules in section 2.2.5 of PCIe specification 6.0._ | MMS_040 a| A load from a RISC-V application processor hart to memory ranges - designated for the mapped memory space of endpoints or RCiEP MUST + designated for mapping memory spaces of endpoints or RCiEP MUST complete with an all 1s response and MUST NOT lead to any other behavior (e.g., hangs, deadlocks, etc.) 
if any of the following are TRUE: - * Address is not within any of the following: - ** Memory base/limit or prefetchable memory base/limit of any - root port. + * Address is not within any of the following address ranges: + ** Address range defined by memory base/limit or prefetchable + memory base/limit registers of any root port. ** BAR (including when EA capability is used) mapped range of any RCiEP. ** BAR (including when EA capability is used) mapped range of @@ -493,14 +486,14 @@ hierarchy domain originating at each PCIe root port. * A completion timeout occurs. | MMS_050 a| A store from a RISC-V application processor hart to memory ranges - designated for the mapped memory space of endpoints or RCiEP MUST + designated for mapping memory space of endpoints or RCiEP MUST be dropped (silently ignored or discarded) and MUST NOT lead to any other behavior (e.g., hangs, deadlocks, etc.) if any of the following are TRUE: - * Address is not within any of the following: - ** Memory base/limit or prefetchable memory base/limit of any - root port. + * Address is not within any of the following address ranges: + ** Address range defined by memory base/limit or prefetchable + memory base/limit registers of any root port. ** BAR (including when EA capability is used) mapped range of any RCiEP. ** BAR (including when EA capability is used) mapped range of @@ -521,8 +514,8 @@ hierarchy domain originating at each PCIe root port. forwarding is disallowed, then the poisoned data MUST be replaced with all 1s data. -| MMS_080 | SoC MUST NOT use EA capability to indicate memory for allocation - to endpoints downstream of a PCIe root port. +| MMS_080 | SoC MUST NOT use EA capability to indicate memory resources for + allocation to endpoints downstream of a PCIe root port. |=== ==== Access Control Services (ACS) @@ -544,8 +537,8 @@ devices, and SR-IOV capable devices. * ACS I/O request blocking. 
| ACS_020 | If a PCIe root port or a SoC-integrated downstream switch port - implements memory BAR space, then it SHOULD support the PCIe ACS - DSP memory target access control. + implements a memory BAR, then it SHOULD support the PCIe ACS DSP + memory target access control. 2+| _The ACS DSP memory target access control can be used to prevent unauthorized accesses to protected memory spaces such as the PCIe root port's BAR mapped registers._ @@ -579,7 +572,7 @@ devices, and SR-IOV capable devices. The rules in this section apply to treatment in the root complex of TLPs that are routed by address. An address carried in such transactions may be the address of a host memory location or the address of a location in the memory -space of a peer endpoint or RCiEP. +space of an endpoint or RCiEP. [width=100%] [%header, cols="5,25"] @@ -616,32 +609,33 @@ space of a peer endpoint or RCiEP. whether the transaction is targeting host memory or peer device memory. -| ADR_040 | The host bridge MAY support access to peer device memory. If peer - device memory access is not enabled (by design or by +| ADR_040 | The host bridge MAY support devices accessing peer devices' memory. + If peer device memory access is not enabled (either by design or configuration), then such accesses MUST be responded to with a - UR/CA response. The host bridge MUST NOT cause any other error - (e.g., hang, deadlock, etc.) when rejecting access to peer device - memory. + UR/CA response. The host bridge MUST NOT cause any other errors + (e.g., hang, deadlock, etc.) when rejecting access by a device to + a peer device's memory. 2+| _A virtual machine may violate the peer-to-peer access policies and/or - configurations established by the hypervisor and/or SoC firmware, which - disallow peer device memory accesses. Such a VM may attempt to program - devices passed through to the virtual machine to perform peer memory - accesses. 
Such attempts to violate the peer-to-peer policies MUST NOT - lead to system instabilities (e.g., hangs, deadlocks, etc.) or errors._ - -| ADR_050 | When a posted or non-posted-with-data request is allowed to access - peer device memory, then any poisoned data (EP=1) MUST be forwarded - as poisoned data, unless such forwarding is disallowed (e.g., - poisoned TLP egress blocking or lack of support for data poisoning - is the SoC). - -| ADR_060 | Host memory writes caused by posted or non-posted-with-data requests - with poisoned data (EP=1) MUST mark such data in the host memory as - poisoned. - -| ADR_070 | Host memory reads that have uncorrectable data errors detected - within the SoC MUST cause a response with poisoned data (EP=1) if - transmission of poisoned TLPs is not blocked (see also section + configurations enforced by the hypervisor and/or SoC firmware, which + prohibit peer device memory accesses. In instances where a VM configures + devices passed through to it to perform peer memory accesses, such attempts + must not result in system instabilities (e.g., hangs, deadlocks, etc.) or + errors. Compliance with this directive ensures system resilience against + unauthorized access attempts, maintaining operational integrity._ + +| ADR_050 | When a posted or non-posted-with-data request from a device is + allowed to access peer device memory, then any poisoned data (EP=1) + MUST be forwarded as poisoned data, unless such forwarding is + disallowed (e.g., due to poisoned TLP egress blocking or lack of + support for data poisoning in the SoC). + +| ADR_060 | Host memory writes resulting from posted or non-posted-with-data + requests with poisoned data (EP=1) MUST mark such data as poisoned + in the host memory. 
+ +| ADR_070 | Host memory reads that encounter uncorrectable data errors detected + within the SoC MUST result in a response with poisoned data (EP=1) + if transmission of poisoned TLPs is not blocked (see also section 2.7.2.1 of PCIe specification 6.0). |=== @@ -662,7 +656,9 @@ messages or completions. hierarchies SHOULD be supported. 2+| _MCTP transport protocols using PCIe VDM are used by the BMC to manage PCIe/CXL devices. These messages are used to support manageability - protocols such as PLDM, NVMe-MI, Redfish, etc._ + protocols such as PLDM, NVMe-MI, Redfish, etc. Supporting P2P routing of + VDMs such as those carrying MCTP protocol messages enables greater system + design flexibility in supporting these management protocols._ | IDR_030 | P2P routing of PCIe VDM to/from RCIeP MAY be supported. @@ -677,7 +673,7 @@ messages or completions. | ID# ^| Requirement | CCS_010 | The host bridge MUST enforce PCIe memory ordering rules and SHOULD support the relaxed ordering (RO) and ID-based ordering (IDO). -2+| _An implementation may occassionally or never permit the relaxations allowed +2+| _An implementation may occasionally or never permit the relaxations allowed by RO and/or IDO attributes. Such implementations will result in a more conservative interpretation of the ordering rules, but they will not result in a violation of the ordering rules._ @@ -685,14 +681,14 @@ messages or completions. | CCS_020 | Writes to host or device memory using the RO attribute set to 0 MUST be observed by other harts and bus mastering devices in the order in which the write was received by the PCIe root port or the - host bridge such that all previous writes are globally observed + host bridge, ensuring that all previous writes are globally observed before the RO=0 write is globally observed. 
| CCS_030 | The host bridge MUST enforce the idempotency, coherence, cacheability, and access type physical memory attributes of the accessed memory and perform any reordering or combining of PCIe transactions only if the combination of physical memory attributes - and TLP specified memory ordering attributes allow it. + and TLP-specified memory ordering attributes allow it. | CCS_040 | The host bridge SHOULD implement hardware enforced cache coherency, irrespective of the “No Snoop” attribute in the TLP, unless it has @@ -836,7 +832,7 @@ mechanism in PCIe. [%header, cols="5,25"] |=== | ID# ^| Requirement -| VSR_010 a| Vendors specific registers in the root ports, host bridge, RCiEP, +| VSR_010 a| Vendor specific registers in the root ports, host bridge, RCiEP, and RCRB MUST be implemented using one or more of the following capabilities: @@ -846,7 +842,7 @@ mechanism in PCIe. | VSR_020 | SoC MUST NOT require hypervisor and/or operating system interaction with configuration space registers that are not defined - by an industry standards. Non-standard vendor specific registers, + by an industry standard. Non-standard vendor specific registers, if implemented in the configuration space, must only be used by the SoC firmware. 2+| _Some industry standards such a CXL may define standard DVSEC structures in @@ -884,7 +880,7 @@ mechanism in PCIe. needs management by the operating system or hypervisors. | SID_040 | SoC-integrated PCIe devices that support PCIe SR-IOV capability - SHOULD support MSI-X capability. + SHOULD support the MSI-X capability. 2+| _MSI-X capability enables virtual machines to assign interrupt resources to virtual functions without needing access to the configuration space of the function. Access to the configuration space of the virtual function is @@ -906,14 +902,14 @@ mechanism in PCIe. * ACS P2P completion redirect. * ACS direct translated P2P. 
-| SID_070 | The BAR registers are implemented by SoC-integrated PCIe devices +| SID_070 | If the BAR registers are implemented by SoC-integrated PCIe devices then they MUST be programmable. The Memory Space Indicator (bit 0) - of such BAR registers MUST be 1 and they SHOULD support being - mapped anywhere in the 64 bit memory space. + of such BAR registers MUST be 1, and they SHOULD support being + mapped anywhere in the 64-bit memory space. | SID_080 | RCiEP MAY support the PCIe enhanced allocation (EA) capability for fixed allocation of memory resources. If EA capability is used then - the BEI of the entries MUST be one of 0 through 5 or 9 though 14 + the BEI of the entries MUST be one of 0 through 5 or 9 through 14 and their primary/secondary properties must be one of 0 through 4 or 0xFF. @@ -986,7 +982,7 @@ mechanism in PCIe. | RAS_030 | If poisoned data needs to be transmitted from a first component to a second component that lacks the ability to manage poison, the - first component MUST trigger an uncorrected urgent error report + first component MUST trigger a critical uncorrected error report instead of silently transmitting the corrupted data. 2+| _Some components serve as intermediaries through which data passes. For instance, a PCIe/CXL port acts as an intermediary that receives data from @@ -1004,13 +1000,14 @@ mechanism in PCIe. 2+| _Note RERI is still under construction._ | RAS_050 | When RERI is supported, the RAS error records MUST include the - capability to individually enable error signals for each severity - (UUE, UDE, or CE) of error that could be logged in that specific - error record. + capability to individually enable error signaling for each severity - + Uncorrected Error Critical (UEC), Uncorrected Error Deferred (UED), + and Corrected Error (CE) - of error that could be logged in that + specific error record.
2+| _Configurable enables provide software with the flexibility of using an event-based or polling-based error logging for both corrected errors and deferred errors. Typically, software operates in an event-based mode for - urgent errors, as these errors necessiate immediate remedial action when + critical errors, as these errors necessitate immediate remedial action when they arise._ | RAS_060 | If RERI is supported, RAS error records MUST preserve the @@ -1087,7 +1084,7 @@ and more. significant shared caches and the memory controllers. | QOS_030 | If CBQRI is supported, RISC-V harts within the application - processors of the SoC MUST include support for the `sqoscfg` CSR. + processors of the SoC MUST include support for the `srmcfg` CSR. Furthermore, this CSR MUST support a minimum of 16 RCIDs and at least 32 MCIDs. 2+| _The count of RCID and MCID that can be used in the SoC should scale with @@ -1207,6 +1204,8 @@ data centers and enterprises. |=== +<<< + === Performance Monitoring [width=100%]