Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in artifact creation of the LX6 instructions #453

Merged
merged 1 commit into from
Jun 24, 2024

Conversation

nirvedhmeshram
Copy link
Contributor

@nirvedhmeshram nirvedhmeshram commented Jun 24, 2024

There was a bug where were appending LX6 instructions to an existing vector for subsequent entry points rather than making a new vector for each entry point. This change fixes that and hence fixes #447 which was indeed a kernel time out due to bad artifacts.
Also adds a multi-dispatch e2e test to CI.

@nirvedhmeshram nirvedhmeshram requested a review from newling June 24, 2024 15:17
@nirvedhmeshram nirvedhmeshram force-pushed the nm_fix_artifact_bug branch 2 times, most recently from 1ee2715 to 77b0747 Compare June 24, 2024 15:22
@makslevental
Copy link
Collaborator

Is this the kernel lag issue we've all been facing? I'm wondering how I was seeing the seeing the same thing even though I wasn't reusing the same buffer/vector in my own path (in xaiepy).

Copy link
Contributor

@newling newling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

In the future we could add a test that there is exactly 1xclbin generated for a test like this, I can try and add that at some stage.

@@ -359,6 +358,8 @@ LogicalResult AIETargetBackend::serializeExecutable(

std::ifstream instrFile(static_cast<std::string>(npuInstPath));
std::string line;
// Vector to store LX6 instructions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I've heard that AIE2p has LX7 instructions (not a request for change!)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's true that forthcoming gens will have LX7 (and higher?) but the instructions themselves don't actually have anything to do with the LX arch (which is Tensilica's naming scheme for their archs and not connected to our firmware that runs these instructions). But you're right the comment will be out of date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have a clean up PR for this, I noticed you did LX[0-9] in some places, maybe we should say LX6/LX7 instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the safest term would be command processor but who cares (honestly when I first started it was useful knowing that it was LX6 because there's a datasheet type thing out there for that so it clarified what people were really talking about).

@nirvedhmeshram
Copy link
Contributor Author

Is this the kernel lag issue we've all been facing? I'm wondering how I was seeing the seeing the same thing even though I wasn't reusing the same buffer/vector in my own path (in xaiepy).

I think that might be a different issue, this one is only hit when we have multiple dispatches in one executable in iree xrt driver.

@nirvedhmeshram
Copy link
Contributor Author

nirvedhmeshram commented Jun 24, 2024

Nice find!

In the future we could add a test that there is exactly 1xclbin generated for a test like this, I can try and add that at some stage.

Yes that would be useful, there is one challenge though, we will create multiple XCLBINs as we create them first and then merge it into the previous one, so the old xclbin artifact is still present even though we wont use it at runtime.

Btw jut to be clear, this test currently will produce three xclbins and use all of them at runtime, we need something like #418 for it to use 1 xclbin at runtime. As I said previously it will still produced three XCLBIN's at compile time.

@nirvedhmeshram nirvedhmeshram merged commit 1ed8257 into main Jun 24, 2024
2 checks passed
@nirvedhmeshram nirvedhmeshram deleted the nm_fix_artifact_bug branch June 24, 2024 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance and tracing issues when looking at multi dispatch execution
3 participants