Port standard instructions to Rust. #13486

Draft
wants to merge 17 commits into
base: main

kevinhartman
Contributor

Summary

Adds a new Rust enum, StandardInstruction, to represent standard instructions (i.e. Barrier, Delay, Measure and Reset) and updates the encoding of PackedOperation to support storing it compactly at rest.

The size of a PackedOperation has now been anchored to always be 64 bits, even on 32-bit systems. This is necessary to encode the data payload of standard instructions like Barrier and Delay. See the docstrings within packed_instruction.rs for details.

Details and comments

The implementation works very similarly to what we currently do for StandardGate, but with a bit of extra consideration for an additional data payload used by variadic/template instructions like Barrier and Delay.

Similarly to StandardGate, the newly added StandardInstruction enum serves as the first-class Rust interface for working with standard instructions. The existing OperationRef enum returned by PackedOperation::view has a new variant StandardInstruction which exposes this.

Unlike StandardGate, StandardInstruction is not a pyclass because it is not directly used to tag instructions as standard in our Python class definitions. Instead, the simple integral enum StandardInstructionType is used as the tag, and OperationFromPython::extract_bound further queries the incoming Python object for variadic/template data when building the complete StandardInstruction.
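
To illustrate the split between the integral tag and the full instruction, here is a minimal sketch. Only the names StandardInstruction, StandardInstructionType and DelayUnit come from this PR; the variant payloads (a u32 for Barrier, a DelayUnit for Delay), the DelayUnit variants, and the kind() helper are assumptions made for illustration and may not match the actual code.

```rust
/// Simple integral tag (hypothetical discriminant values).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum StandardInstructionType {
    Barrier = 0,
    Delay = 1,
    Measure = 2,
    Reset = 3,
}

/// Assumed set of delay units, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum DelayUnit {
    Ns = 0,
    Us = 1,
    Ms = 2,
    S = 3,
    Dt = 4,
}

/// Complete instruction: the tag plus any variadic/template data.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StandardInstruction {
    Barrier(u32),     // assumed payload: number of qubits
    Delay(DelayUnit), // assumed payload: unit of the duration
    Measure,
    Reset,
}

impl StandardInstruction {
    /// Hypothetical helper: recover the plain tag from the full instruction.
    fn kind(&self) -> StandardInstructionType {
        match self {
            StandardInstruction::Barrier(_) => StandardInstructionType::Barrier,
            StandardInstruction::Delay(_) => StandardInstructionType::Delay,
            StandardInstruction::Measure => StandardInstructionType::Measure,
            StandardInstruction::Reset => StandardInstructionType::Reset,
        }
    }
}
```

In this sketch the tag alone is enough to classify a Python object as a standard instruction, while the payload has to be read from the object before the full enum value can be built.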

Encoding

The PackedOperation encoding has been updated as follows:

  • A PackedOperation is now always 64 bits wide, even on 32-bit systems, and is technically a union, viewed as either two sequential 32-bit words or a platform-width usize reinterpreted as a pointer.
  • The discriminant has been widened from 2 to 3 bits (it is now at its maximum width, but we still have room for 3 more variants).
  • The discriminant now has an additional variant, StandardInstruction.

Todo:

  • Add more detail to PR description.
  • Actually test this on 32-bit systems and a big-endian 64-bit arch.

Member

jakelishman commented Nov 25, 2024

I'm concerned that the LoHi struct and PackedOperation union are making it easier to make mistakes on BE and/or 32-bit systems, because the encodings now change between them.

I feel like the bitshift and masking operations could be extended over the whole u64, and that'll all automatically work on BE and 32-bit systems, especially with the size of PackedOperation now fixed to 64 bits - we can even have everything be aligned. The shifts of the payload of instructions can be set such that a u32 payload is always in the high bits, with the padding bits between it and the rest of the discriminants if you're concerned about aligned access, because then the bitshifts will be compiled out into a single register load (just read the u32 from the right place) and the compiler will handle endianness for us. The pointer doesn't need to be handled any differently to any other payload - everything's just shifts and masks anyway (LoHi can't avoid that), and I think introducing the union makes that harder to follow.

The LoHi struct to me seems to be forcing every method of PackedOperation to think about the partial register reads/loads. If we use shifts and masks on a u64 with const masks and shifts, there's no logic split where some of it is done with shifts and some with endianness-dependent partial register access in our own code.
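
A small sketch of the endianness point, using only the standard library. The function names are hypothetical and neither function is taken from the PR. Extracting the high u32 with a shift reads the same logical bits on every target, whereas splitting the value into two words in memory order puts the high half at a different index depending on the target's byte order.

```rust
/// Endianness-independent: always yields the logical high 32 bits.
fn high_u32_by_shift(word: u64) -> u32 {
    (word >> 32) as u32
}

/// Endianness-dependent: views the same value as two u32 words in the
/// order they sit in memory, as a two-field struct overlay would.
fn halves_in_memory_order(word: u64) -> [u32; 2] {
    let b = word.to_ne_bytes();
    [
        u32::from_ne_bytes([b[0], b[1], b[2], b[3]]),
        u32::from_ne_bytes([b[4], b[5], b[6], b[7]]),
    ]
}

/// Index of the logical high half within `halves_in_memory_order`.
fn high_half_index() -> usize {
    if cfg!(target_endian = "little") {
        1
    } else {
        0
    }
}
```

Code built on the second form has to carry something like high_half_index() around, while code built on the first form does not.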

Member

jakelishman commented Nov 25, 2024

I guess my main point is this: the 64-bit PackedOperation can always be thought about as a collection of payloads each of which is stored in a particular mask and needs a particular shift. These are:

  • the PackedOperation discriminant at (0b111, 0)
  • a pointer at (usize::MAX as u64 & !0b111, 0) (regardless of 32-bit or 64-bit)
  • a StandardGate discriminant at (0xff << 3, 3)
  • a StandardInstruction discriminant at (0xf << 3, 3)
  • a DelayUnit payload at (for example) (0x7 << 32, 32)
  • a u32 payload at (u32::MAX as u64 << 32, 32)

Introducing LoHi doesn't remove the need to bitshift and mask for most items, it just means that some of my above list are done one way, some are done another, and the union means that the pointer now has more ways to access it. LoHi also restricts the payload size to u32, when we already have payloads that exceed that (the pointers).
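
As a sketch of why the pointer can be treated like any other masked payload: if the pointee is at least 8-byte aligned, the low three bits of its address are zero and can hold the discriminant. The Aligned type and both function names are hypothetical, and the sketch assumes the pointer mask is meant to exclude the three discriminant bits.

```rust
const DISCRIMINANT_MASK: u64 = 0b111;
// All address bits available on this platform, minus the discriminant bits.
const POINTER_MASK: u64 = usize::MAX as u64 & !DISCRIMINANT_MASK;

/// Hypothetical pointee; the explicit alignment guarantees that the
/// low three address bits are zero even on 32-bit targets.
#[repr(align(8))]
struct Aligned(u64);

/// Pack an aligned pointer and a 3-bit discriminant into one u64.
fn tag_pointer(ptr: *mut Aligned, discriminant: u8) -> u64 {
    let addr = ptr as usize as u64;
    debug_assert_eq!(addr & DISCRIMINANT_MASK, 0);
    addr | (discriminant as u64 & DISCRIMINANT_MASK)
}

/// Recover the pointer and the discriminant with masks only.
fn untag_pointer(word: u64) -> (*mut Aligned, u8) {
    let ptr = (word & POINTER_MASK) as usize as *mut Aligned;
    let discriminant = (word & DISCRIMINANT_MASK) as u8;
    (ptr, discriminant)
}
```

On a 32-bit target the mask simply covers fewer bits, so the same two functions apply without a separate code path.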

If you want a shade more encapsulation than my loose const shift and mask associated items, would something like this look better to you?

struct PayloadLocation<T: SomeBytemuckOrCustomTrait> {
    mask: u64,
    shift: u32,
    datatype: PhantomData<T>,
}
impl<T: SomeBytemuckOrCustomTrait> PayloadLocation<T> {
    // Note we _mustn't_ accept `PackedOperation` until we've got a valid one,
    // because holding a `PackedOperation` implies ownership over its stored
    // pointer, and it mustn't `Drop` or attempt to interpret partial data.

    #[inline]
    fn store(&self, src: T, target: u64) -> u64 {
        let src: u64 = bytemuck::cast(src);
        target | ((src << self.shift) & self.mask)
    }
    #[inline]
    fn load(&self, target: u64) -> T {
        bytemuck::cast((target & self.mask) >> self.shift)
    }
}

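const PACKED_OPERATION_DISCRIMINANT: PayloadLocation<u8> = PayloadLocation { mask: 0b111, shift: 0, datatype: PhantomData };
const STANDARD_GATE_DISCRIMINANT: PayloadLocation<u8> = PayloadLocation { mask: 0xff << 3, shift: 3, datatype: PhantomData };
const POINTER_PAYLOAD: PayloadLocation<usize> = PayloadLocation { mask: usize::MAX as u64 & !0b111, shift: 0, datatype: PhantomData };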
// and so on

(Bear in mind I just typed that raw without trying it, and I didn't think all that hard about what the trait bound should be.)

If everything's inlined and constant, the compiler absolutely should be able to compile out 32 bit shifts and all-ones masks into partial register reads/loads itself, so there oughtn't to be any overhead.
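
For reference, a compilable variant of the idea above might look like the following. It is a sketch, not the PR's code: the Payload trait stands in for the unspecified trait bound, the u32 payload constant is an assumed name, and store() clears the field before writing, which differs from the plain OR in the snippet above.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the unspecified trait bound.
trait Payload: Copy {
    fn to_bits(self) -> u64;
    fn from_bits(bits: u64) -> Self;
}
impl Payload for u8 {
    fn to_bits(self) -> u64 { self as u64 }
    fn from_bits(bits: u64) -> Self { bits as u8 }
}
impl Payload for u32 {
    fn to_bits(self) -> u64 { self as u64 }
    fn from_bits(bits: u64) -> Self { bits as u32 }
}

struct PayloadLocation<T: Payload> {
    mask: u64,
    shift: u32,
    datatype: PhantomData<T>,
}

impl<T: Payload> PayloadLocation<T> {
    const fn new(mask: u64, shift: u32) -> Self {
        Self { mask, shift, datatype: PhantomData }
    }
    /// Write `src` into its field of `target`, leaving other bits untouched.
    #[inline]
    fn store(&self, src: T, target: u64) -> u64 {
        (target & !self.mask) | ((src.to_bits() << self.shift) & self.mask)
    }
    /// Read this field back out of `target`.
    #[inline]
    fn load(&self, target: u64) -> T {
        T::from_bits((target & self.mask) >> self.shift)
    }
}

const PACKED_OPERATION_DISCRIMINANT: PayloadLocation<u8> = PayloadLocation::new(0b111, 0);
const STANDARD_INSTRUCTION_DISCRIMINANT: PayloadLocation<u8> = PayloadLocation::new(0xf << 3, 3);
const U32_PAYLOAD: PayloadLocation<u32> = PayloadLocation::new((u32::MAX as u64) << 32, 32);
```

Because every mask and shift is a constant, each store and load reduces to a fixed sequence of bit operations on a single u64, which is the form the optimizer would need in order to turn the 32-bit case into a partial read or write.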
