Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fault should be returned to PCIe EP against ATS bypass translation requests #451

Closed
zetalog opened this issue Dec 6, 2024 · 27 comments
Closed

Comments

@zetalog
Copy link

zetalog commented Dec 6, 2024

We've tested IOMMU with PCIe EP VIP and real PCIe EP.

It looks, EP will never send ATS bypass (S1+S2 bare) translation request.

And if IOMMU returns the input address back to the EP VIP golden model, it will misbehave by re-issuing a translated request with ambiguous addresses after receiving such a response.
image

We then check the de-facto implementations. It seems, for ATS bypass translation request, the real world is expecting a fault to return UR. This makes senses and should be safer to be compatible with all kinds of PCIe EPs.

Could we change the IOMMU specification and the reference model to return "type disallowed" to such an ATS bypass translation request?

@ved-rivos
Copy link
Collaborator

Is my interpretation right that this test is using a Mrd64 to request address translations and then interpreting the returned 64-bit address a 32-bit address and using it with Mrd32?

@ved-rivos
Copy link
Collaborator

It looks, EP will never send ATS bypass (S1+S2 bare) translation request.

Whether the TA's page tables are in Bare mode is not visible to an EP. Its not clear what is referred by stating an EP will never send a ATS bypass translation request. I did not think there is a translation request type that can request an ATS bypass.

And if IOMMU returns the input address back to the EP VIP golden model, it will misbehave by re-issuing a translated request with ambiguous addresses after receiving such a response.

What was the ambiguity or misbehavior ?

@zetalog
Copy link
Author

zetalog commented Dec 9, 2024

Is my interpretation right that this test is using a Mrd64 to request address translations and then interpreting the returned 64-bit address a 32-bit address and using it with Mrd32?

An 64-bit translation request with input address of df7_46e6c000 (S1+S2 bare) is responded as 7fff_fffff as output address, meaning all these masked bits should be taken from the input address.
However EP misbehaves with a translated address of 00000000 being used as the address of the following 32-bit write/read requests.

@zetalog
Copy link
Author

zetalog commented Dec 9, 2024

What was the ambiguity or misbehavior ?

For ATS bypass request which is unlike normal translation request, it depends on the implementation of the PCIe EP. How EP interprets the returned output address (normally, higher bits from the output address, lower bits, ex, page offset from the input address) is actually implementation dependent and is out of the scope of the IOMMU.

Back to the IOMMU specification, it doesn't force such a fault. Since there is no such a fault condition, the reference model handles this condition in the default code path now by returning a translation result. Following this implementation by returning a translation result back to the ATS clients, we see this incompatibility.

It looks the easiest way is to just return a fault to such ATS bypass requests, so that all EPs including all kinds of the verification models can be satisfied with this kind of fault. What they should do is just reporting such mis-configuration back to the EP driver if it cannot be handled in the EP firmware. This seems to be compatible with the real application world.

IMO, we should add an ATS bypass fault (probably we should use 260) in the specification and tune the reference model to be compatible with the real world - avoid misbehaviors caused by the implementation difference between the PCIe EPs.

@ved-rivos
Copy link
Collaborator

For ATS bypass request which is unlike normal translation request,

PCIe has no concept of a ATS bypass request. From this thread I am unable to understand what is being referenced as ATS bypass request and what the incompatibility issue is.

Following this implementation by returning a translation result back to the ATS clients, we see this incompatibility.

How can the PCIe device even tell whether S1+S2 is Bare or they have been programmed as non-Bare but with a translation which matches the returned result when Bare. What is the incompatibility?

It looks the easiest way is to just return a fault to such ATS bypass requests,

If ATS requests should be disallowed when S1+S2 is Bare then software can just set ATS enable to 0.

@ved-rivos
Copy link
Collaborator

An 64-bit translation request with input address of df7_46e6c000 (S1+S2 bare) is responded as 7fff_fffff as output address, meaning all these masked bits should be taken from the input address. However EP misbehaves with a translated address of 00000000 being used as the address of the following 32-bit write/read requests.

A 64-bit translation request returns a 64-bit translated address. Its not clear why this implementation is requesting a 64-bit address and then interpreting it as a 32-bit address.

@zetalog
Copy link
Author

zetalog commented Dec 10, 2024

A 64-bit translation request returns a 64-bit translated address. Its not clear why this implementation is requesting a 64-bit address and then interpreting it as a 32-bit address.

This is the vendor PCIe EP VIP's behavior. If the higher bits of the address is 0, it issues 32-bit address wide reads/writes. That means in this case, if the resulting address is 44-bit, the 43:32 bit is 0s.

@zetalog
Copy link
Author

zetalog commented Dec 10, 2024

PCIe has no concept of a ATS bypass request.

That's our internal terminology, means ATS request with S1 and S2 configured as bare. Or we can mark this as EN_ATS=1, IOSATP.MODE=Bare, IOHGATP.MODE=Bare.

@zetalog
Copy link
Author

zetalog commented Dec 10, 2024

If ATS requests should be disallowed when S1+S2 is Bare then software can just set ATS enable to 0.

It's just a verification random tests, testing all kinds of valid configurations with comparison to the reference model, then triggered this problem.

What is the incompatibility?

It looks the real world industry VIP expects a UR error from IOMMU to correctly behave by re-issuing an untranslated request out which is different than the reference model.

@ved-rivos
Copy link
Collaborator

It looks the real world industry VIP expects a UR error from IOMMU to correctly behave by re-issuing an untranslated request out which is different than the reference model.

There is nothing in the PCIe specification that requires a UR to be generated based on the virtual memory system modes. A UR is generated if: "Indicates there is a problem, potentially correctable by system software, that prevents the TA from
returning any Translation Completion entries.". When VS and G stage are in Bare mode, the TA is not prevented from returning a Translation Completion. Bare mode indicates no address translation or protection is applicable.

It looks the real world industry VIP expects a UR error from IOMMU

Could you point me to the section in the PCIe specification that requires a UR to be returned based on the page table mode configured in the TA. What is the purpose of this test? Why is it expecting a UR?

@zetalog
Copy link
Author

zetalog commented Dec 11, 2024

IMO, this is just an application notes level problem. There is no special statement in the PCIe specification. It sounds a kind of area the PCIe specification wouldn't define. Thus the behavior of the PCIe EPs could vary. It's not a compatibility problem against the PCIe specification, it's a compatibility problem against application world EP products.

In the real world, there are various PCIe EPs already tuned against the real world IOMMU/PCIe RC products.
Since configuration 1 "EN_ATS=0" is equivalent to configuration 2 "EN_ATS=1, IOSATP.MODE=Bare, IOHGATP.MODE=Bare". For the configuration 1, there could be chances that PCIe EPs misbehave while the driver gets no indication of such failure. It could be better for IOMMU to just return a UR to notice the configuration 2 failure to the software driver, so that administrators can re-configure to use the configuration 1.

@ved-rivos
Copy link
Collaborator

Still unclear to me what the test is attempting to do. Why is the EP expecting a UR?

@zetalog
Copy link
Author

zetalog commented Dec 12, 2024

EP doesn't expect UR.
IOMMU just follows the RISC-V spec, returning a translation result to the EP.
The result is returning in the way corresponding to the interfaces between PCIe RC and IOMMU.
The interface we are using is DTI (in order to support distribute translation, making IOMMU TLB closer to the DMA masters).
Following the DTI specification, the result is returning using two fields:

  1. OA
  2. TRANS_RNG

Here we are returning a trans_rng=256TB, OA=IA in the translation response, hoping that:

  1. EP can use IA directly, or
  2. EP may use a trans_rng masked OA as OA, and masked lower address bits from IA.

image
However the EP mis-interprets the result, returning a wrong result (0x0).
BTW, you can see from the figure that this ATS response contains a bypass=1 field, that's why we call it ATS bypass request.

Whether the EP can correctly interpret such an "EN_ATS=1, IOSATP.MODE=Bare, IOHGATP.MODE=Bare" response actually is out of IOMMU and PCIe scope.
It seems the ecosystem PCIe IPs/VIPs are already ruled by the ARM world behavior. Where ARM chooses to return UR to notify the software driver to change the configuration to "EN_ATS=0" in this case.
ARM expects UR in this situation, thus there are no specific definitions in the ARM world defined in this case for EP, and EP behavior is actually UNSPECIFIED.
Probably ARM also encountered application problem due to the unspecified EP behavior, then chose to return UR to ask driver to change the settings. Just a guess, you can ignore my assumption. This might be reasonable that in such case, EP shouldn't issue ATS translation request and can get translation result by itself, which could be more performance friendly.

@ved-rivos
Copy link
Collaborator

I don't understand why this ATS response is different than when S1+S2 is NOT Bare but a 256 TB page has been setup in the page tables.

@zetalog
Copy link
Author

zetalog commented Dec 13, 2024

I don't understand why this ATS response is different than when S1+S2 is NOT Bare but a 256 TB page has been setup in the page tables.

That's kind of thing we are also not clear as the golden model VIP code is encrypted and is not accessible. Looks like an EP by EP vendor specific behavior.

@ved-rivos
Copy link
Collaborator

I am not sure there is anything actionable here. This EP seems to be not compliant with the ATS specification. Setting ATS_EN to 0 when the mode of both stages is Bare can be used as a workaround but that would still not address the case where the address translation is not Bare.

@zetalog
Copy link
Author

zetalog commented Dec 17, 2024

This EP seems to be not compliant with the ATS specification.

Is there an ATS specification statement defining the behavior that EP should follow when the translation is bare? PCIe specification only defines how EP should behave when ATS is enabled, which means when translation is not bare. The specification doesn't explicitly define how EP should behave when translation is bare, should this be handled as a translation or should it be forbidden by disabling ATS is beyond the scope that PCIe would define.

but that would still not address the case where the address translation is not Bare.

So far we couldn't see misbehaviors from this EP model when the address translation is not bare.
It seems ARM has noticed such vendor specific issue and start not to define bypass behavior from DTIv2.

@ved-rivos
Copy link
Collaborator

Is there an ATS specification statement defining the behavior that EP should follow when the translation is bare? PCIe specification only defines how EP should behave when ATS is enabled, which means when translation is not bare. The specification doesn't explicitly define how EP should behave when translation is bare, should this be handled as a translation or should it be forbidden by disabling ATS is beyond the scope that PCIe would define.

There is no concept of translation being Bare in PCIe specification. The method by which an IOMMU uses to obtain an address translation should not have any impact on the EP behavior. The EP should only interpret the results as specified by the PCIe specification for ATS translation completion.

The RISC-V IOMMU has no concept of "bypass" in the form used by other architectures. The two-stage address translation is always active. Setting the mode to Bare states that the result of the address translation is an identity map i.e. for the first stage GPA is same as VA and for second stage the SPA is same as GPA. Due to this BYPASS=1 is not a concept for RISC-V IOMMU.

@zetalog
Copy link
Author

zetalog commented Dec 19, 2024

We did this to the reference model:
image
And looks fine with such golden models.
Probably SPEC should be changed to state this (ATS bypass) as "UNSPECIFIED" so that manufactures can decide what to do on their own.
And probably we could add hooks in the reference model TB to allow vendors to build additional "UNSPECIFIED" checkers to pass the comparison in iommu_translate_iova() to meet manufactures' own application environment.

@ved-rivos
Copy link
Collaborator

The spec should not be changed this way and this change to the model is not in conformance with the specification. The RISC-V IOMMU has no concept of a bypass and PCIe ATS as well has no concept of bypass. With RISC-V IOMMU, two stage address translation is always active and there is no optional to disable it; BYPASS should be driven to 0.

@zetalog
Copy link
Author

zetalog commented Dec 20, 2024

With RISC-V IOMMU, two stage address translation is always active and there is no optional to disable it; BYPASS should be driven to 0.

This doesn't work as there is no such granularity in DTI defined for 44-bit where this case is facing.
image
First and last, this is an application problem. IMO, mentioning this as UNSPECIFIED in the specification would be very helpful.

@ved-rivos
Copy link
Collaborator

This doesn't work as there is no such granularity in DTI defined for 44-bit where this case is facing.

There is no requirement in the spec to provide any specific translation size. When the two stages are Bare the implementation can provide a translation of any size that works for the bus interfaces it supports. For instance it could always use 2 MB.

@zetalog
Copy link
Author

zetalog commented Dec 25, 2024

In the distributed ATC environment, is this still considered a problem:

  1. with EN_ATS=0, no traffic will be generated between PCIe and IOMMU and this works.
  2. with EN_ATS=1, traffic will be generated between PCIe and IOMMU and this risks not working.

@ved-rivos
Copy link
Collaborator

Could you explain what risks not working?

@zetalog
Copy link
Author

zetalog commented Dec 25, 2024

Sorry, we haven't tried the workaround yet as we still have concerns. Let me do the re-wording:

In the distributed ATC environment, is this still considered a problem:

  1. with EN_ATS=0, no traffic will be generated between PCIe and IOMMU.
  2. with EN_ATS=1, traffic will be generated between PCIe and IOMMU.

@ved-rivos
Copy link
Collaborator

I assume EN_ATS is the flag in device context structure that indicates whether the device can generate ATS transaction such as Mrd/Mwr with AT=Translated/Translation request. When EN_ATS is 0, transactions with AT=Translated/Translation requests is disallowed and only transactions with AT=Untranslated request are allowed. When EN_ATS is 1, transactions with all AT encodings are allowed.

The configuration of the device context has no bearing on what transactions get generated by the device. So its not clear why the device will not generate transactions when EN_ATS flag in the device-context structure of the IOMMU is 0 and why it will when it is 1.

@zetalog
Copy link
Author

zetalog commented Dec 26, 2024

Sounds reasonable. EN_ATS in DC cannot help to restrict the behavior of PCIe EP driver.
We'll record this case in our application notes. Thanks for the support.

Best regards

@zetalog zetalog closed this as completed Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants