Update COORDINATION.md with Marvell review comments #216
base: main
Add Marvell comments and CRS details Signed-off-by: Satananda Burla <[email protected]>
Thanks @sburla-marvell great comments.
Do you want to merge this content, or just collect feedback? Let me know.
@ballle98 @RezaBacchus @dandaly @Gal-Zaidman please review |
Hi,
|
Greetings Dan: |
Most of it is just feedback. I can create a separate pull request for the CRS part if needed. |
yes, please do, so I can merge the CRS part |
Hi Reza,
This way we don't gate IPU/DPU provisioning on requiring a new PCI, UEFI or BIOS change in order for it to work the OPI way. We can also specify the OPI way to work around specific incompatibilities, whatever they may be.
We should try to find a way that works for all vendors. If we can't, we can't, but we need to try first.
I propose we answer the OPI provisioning question within the constraints of what's existing, and see where we land. I also haven't heard from any other vendor that they can't work within the PCI spec, or that they can't work in existing servers. |
I got feedback from one DPU vendor that strongly advocates for Option 3 with OOB management via BMC, so as not to rely on PCIe timing, even if the card is PCIe compliant.
@@ -36,10 +38,13 @@ In-Band refers to PCIe config access to the xPU from UEFI running on the server | |||
- Booted | |||
- Stalled or locked-up | |||
- Halted | |||
- ***SB> what does OS refer to here? there could be multiple software subsystems on the XPU running multiple OS. what is the difference between Halted and Not started ? When does the transition take place?*** |
Halted is an OS_STATUS value, which I imagine would correspond to the Linux definition of halted, similar to a graceful shutdown.
Not Started belongs to a different status register, CRASHDUMP, and is similar to "Normal": software termination has not been detected.
Can we get this rationale into the document? We need to have a clear reason why we need to create a dependency like what Option 3 proposes, since that dependency will limit adoption. |
@@ -87,6 +125,7 @@ meeting the timing constraints around enumeration. Dynamic adding and | |||
deleting devices on the PCI bus via hot plug requires BIOS configuration. | |||
|
|||
## 3: Out-band via platform BMC | |||
***SB> How about multi Host support?*** |
To some degree I see an xPU as multihost. The processor on the xPU and host processor are sharing a network device. The key thing is power domains and if the multiple hosts need to be coordinated. It may be possible to have multiple BMCs on I2C in a multi-master configuration.
- ***SB> How is this related to standard PCIe FLR? Looks like we are assuming XPU internal architecture here. What if the XPU's CPU complex is the one driving its PCIe interface?*** | ||
##1.a: PCIe CRS (Config Retry Status) |
@sburla-marvell let's submit this part as separate PR so I can merge it...
- *“A Root Complex implementation may choose to limit the number of Configuration Request/CRS Completion Status loops before determining that something is wrong with the target of the Request and taking appropriate action, e.g., complete the Request to the host as a failed transaction.” * | ||
- Implementation complicated by software visibility option (if not visible, it is treated similar to CA and UR with 1sec timeout), some old IPs do not support software visibility. | ||
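A minimal sketch of the bounded retry loop that the quoted spec text allows, written from the software side and assuming CRS Software Visibility is enabled (so a retry shows up as Vendor ID `0x0001` in the first config dword). The function name, the `read_vendor_dword` callback, the attempt limit and the backoff values are all hypothetical and chosen for illustration only; they are not taken from the PCIe spec or from COORDINATION.md.

```python
import time
from typing import Callable

CRS_VENDOR_ID = 0x0001   # Vendor ID reported while the device asks for a retry
NO_DEVICE = 0xFFFFFFFF   # all-ones read: nothing answered the request


def wait_for_device(read_vendor_dword: Callable[[], int],
                    max_attempts: int = 8,
                    initial_delay_s: float = 0.001,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the first config dword until the device answers or the budget runs out.

    read_vendor_dword is a hypothetical callback that performs one config
    read of offset 0 (Vendor ID in the low 16 bits).
    """
    delay = initial_delay_s
    for _ in range(max_attempts):
        value = read_vendor_dword()
        if value == NO_DEVICE:
            return "absent"            # no device, or the retry already timed out
        if (value & 0xFFFF) != CRS_VENDOR_ID:
            return "ready"             # a real Vendor ID came back
        sleep(delay)                   # device still initializing: back off and retry
        delay *= 2
    return "failed"                    # loop limit reached: treat as a failed transaction
```

The loop limit is the point of the sketch: without it, an xPU that never finishes booting would hold enumeration indefinitely.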
## 2: Driver Ready Check | ||
From Slack, by @samerhaj:
The UEFI spec has a watchdog that may exit the UEFI driver if it waits for too long. Also, some of the higher-level UEFI drivers used for booting (such as PXE and HTTP) are not provided by the vendor / xPU. They are in the UEFI BIOS itself, and use lower-level xPU-provided UEFI drivers (such as UEFI UNDI/SNP for Ethernet).
@dandaly fyi
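To make the watchdog concern concrete, here is a rough sketch of a ready check that gives up after a fixed time budget instead of waiting indefinitely. It assumes the budget is chosen to be shorter than whatever watchdog timeout the platform applies; `wait_until_ready`, `is_ready` and the interval values are hypothetical names for illustration, not part of any UEFI driver interface.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     budget_s: float,
                     poll_interval_s: float = 0.1,
                     now: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready() until it returns True or budget_s seconds have passed.

    Returns True if the xPU reported ready in time, False if the budget
    expired first, so the caller can fail cleanly before a watchdog fires.
    """
    deadline = now() + budget_s
    while True:
        if is_ready():
            return True
        if now() >= deadline:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, max(0.0, deadline - now())))
```

Returning `False` rather than blocking lets the caller decide whether to skip the device or report an error, which matters when the higher-level boot drivers live in the BIOS and only call into the xPU-provided driver.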