Proposal: std.Target: Add more architecture tags. #20835

Closed

@alexrp alexrp commented Jul 27, 2024

This is sort of a mini-proposal in PR form; CI is expected to fail. If we like this direction, I'll turn this into a complete patch.

Introduction

I'm starting this from the premise that we want std.Target to be usable more broadly than just in the Zig compiler itself. I could imagine it being useful in other compilers, assemblers, emulators, etc. I certainly would like to base my own compiler project's target information on it. If this premise is wrong, then of course the rest of this doesn't matter. 🙂

I've surveyed about 40-50 different architectures (depending on how you count) as part of this. This PR contains the selection of architectures that I think are worth proactively adding to std.Target based on some objective and subjective criteria.

The objective criteria:

  • Does it have a GCC port?
  • Does it have an LLVM port?
  • Does it have a Linux kernel port?
  • Does it have a glibc or musl port?
  • Does it have a QEMU port?

(Note: I don't consider deprecated ports to count here. For example, ia64 and nios2 have GCC backends, but they're slated for removal. Also, the Linux and glibc/musl points don't count for microcontrollers.)

If all of these are false, then the architecture is hopelessly dead and thus excluded. If only one or two of these are true, it requires an individual (potentially subjective) evaluation. If three to five are true, it's clearly alive.
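The thresholds above can be sketched as a small Zig function. This is only an illustration: the `Criteria`, `Verdict`, and `classify` names are hypothetical and not part of `std.Target`, and the sketch assumes the caller has already discounted deprecated ports and applied the microcontroller exception from the note above.

```zig
const std = @import("std");

/// Hypothetical record of the five objective criteria for one architecture.
/// Deprecated ports are assumed to be recorded as `false` by the caller.
const Criteria = struct {
    gcc: bool = false,
    llvm: bool = false,
    linux: bool = false,
    libc: bool = false, // glibc or musl
    qemu: bool = false,

    /// Number of criteria that hold.
    fn count(self: Criteria) u8 {
        var n: u8 = 0;
        if (self.gcc) n += 1;
        if (self.llvm) n += 1;
        if (self.linux) n += 1;
        if (self.libc) n += 1;
        if (self.qemu) n += 1;
        return n;
    }
};

const Verdict = enum { hopelessly_dead, individual_evaluation, clearly_alive };

/// 0 criteria: dead. 1 or 2: needs individual evaluation. 3 to 5: alive.
fn classify(c: Criteria) Verdict {
    return switch (c.count()) {
        0 => .hopelessly_dead,
        1, 2 => .individual_evaluation,
        else => .clearly_alive,
    };
}

pub fn main() void {
    // alpha: GCC, Linux, glibc, QEMU.
    const alpha = Criteria{ .gcc = true, .linux = true, .libc = true, .qemu = true };
    // cris: GCC and QEMU only.
    const cris = Criteria{ .gcc = true, .qemu = true };
    std.debug.print("alpha: {s}\n", .{@tagName(classify(alpha))});
    std.debug.print("cris: {s}\n", .{@tagName(classify(cris))});
}
```

The verdict is only a first filter; the one-or-two bucket still needs the case-by-case judgment described above.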

Survey Results

Clearly Alive

These are included without further evaluation.

  • alpha: GCC, Linux, glibc, QEMU
  • arc64: GCC, Linux, glibc, QEMU
    • ARCv3 (arc is ARCv2). This is a new-ish architecture; support is in the process of being upstreamed.
  • hppa: GCC, Linux, glibc, QEMU
    • Surprisingly enough, more alive than ia64 which HP abandoned hppa for...
  • kvx: GCC, LLVM, Linux, musl, QEMU
    • This is a very new architecture; all in the process of being finalized and upstreamed.
  • microblaze: GCC, Linux, glibc, musl, QEMU
  • or1k: GCC, Linux, glibc, musl, QEMU
  • sh: GCC, Linux, glibc, musl, QEMU

Hopelessly Dead

These are excluded without further evaluation due to fulfilling none of the objective criteria.

  • avr32
  • biin
  • c500
  • ia64
  • lm8
  • m88k
  • ns32k
  • s360
  • s370
  • tile
  • vax
  • z8
  • ez8
  • z8k
  • z80k
  • z80
  • ez80
  • z180
  • z280
  • z380
  • zneo

Individual Evaluation

  • bfin: GCC
    • Used to have a Linux port. Still sees limited use in the embedded space, but is considered soft-deprecated.
  • c6x: GCC
    • No new models since ~2012? GCC port is borderline unmaintained.
  • cris: GCC, QEMU
    • Hanging on by a thread in GCC and QEMU. Would not be surprised to see this one get dropped in the coming years.
  • epiphany: GCC
    • Abandoned by Adapteva. Sucks too; I was an early adopter of the Parallella boards. Cool architecture, but basically dead. GCC port likely going in the coming years.
  • fr30: GCC
    • Abandoned by Fujitsu in favor of Arm.
  • frv: GCC
    • Used to have a Linux port. GCC port is borderline unmaintained.
  • ft32: GCC
    • GCC port is borderline unmaintained.
  • h8300: GCC
    • GCC port actually looks to be maintained. Very niche architecture though. No new models since ~2011?
  • iq2000: GCC
    • GCC port is borderline unmaintained.
  • lm32: GCC
    • GCC port is borderline unmaintained.
  • m32c: GCC
    • GCC port is borderline unmaintained. RTEMS port was removed recently; this port will probably go altogether in the near future.
  • m32r: GCC
    • GCC port is borderline unmaintained.
  • mcore: GCC
    • GCC port is borderline unmaintained.
  • mn10300: GCC
    • GCC port is borderline unmaintained.
  • moxie: GCC
    • Cute little ISA created in the open by the maintainer of libffi. GCC port sees ongoing development.
    • I have a soft spot for open ISAs, and especially for simple/elegant ones like this. I think it would be cool to have a Zig backend for this one someday, so I'm including it.
  • nds32: GCC
    • Used to have a Linux port. GCC port is borderline unmaintained.
  • nios2: Linux, glibc
    • GCC port is slated for removal; LLVM port was removed. Intel have been very explicit that they're done with Nios II. I suspect the Linux port will be removed soon after that, and glibc will follow.
  • pdp11: GCC
    • GCC port seems unmaintained. Strongly suspect it's only kept around for the nostalgia factor.
  • pru: GCC
    • GCC port seems to be maintained. Still pretty niche, and I can't even find an ISA manual.
  • rl78: GCC
    • GCC port is borderline unmaintained.
  • rx: GCC, QEMU
    • GCC port is borderline unmaintained. QEMU port seems somewhat maintained. Pretty niche microcontroller though.
  • s390: GCC, glibc
    • Linux port exists but is deprecated in favor of only supporting s390x. Once removed, I expect glibc to follow, and eventually GCC.
  • tricore: QEMU
    • Not even sure why the QEMU port is still around without a compiler.
  • v850: GCC
    • GCC port is borderline unmaintained.
  • visium: GCC
    • GCC port is borderline unmaintained.
  • xstormy16: GCC
    • GCC port sees some ongoing development. Appears to be an academic ISA for teaching or something, but I can't even find an ISA manual.

@alexrp alexrp marked this pull request as draft July 27, 2024 22:22
@@ -979,26 +979,38 @@ pub const Cpu = struct {
};

pub const Arch = enum {
alpha,
Member Author

Alpha technically allows big endian implementations. To my knowledge, no such implementation has ever existed, and no software that supports Alpha supports it in a big endian configuration. So no alphaeb here.

msp430,
or1k,
Member Author

OpenRISC does specify a 64-bit architecture, but no core designs exist for that yet, and no software supports it. In practice, OpenRISC is basically considered a 32-bit architecture, so not adding a 64-bit variant here yet.

thumb,
thumbeb,
x86,
x86_64,
xcore,
xtensa,
xtensaeb,
Member Author

Xtensa supports both little and big endian. QEMU supports both, as does GCC.

@alexrp alexrp force-pushed the target-add-arches branch from 2d51a28 to e5ad877 Compare July 29, 2024 23:00
@andrewrk
Member

Approved.

Some notes:

  • we're willing to add/keep targets for active hobbyist projects, or active communities surrounding "dead" hardware. for example:
  • what do you think about modeling endianness and 32/64-bitness as CPU features rather than entire architectures?

@alexrp
Member Author

alexrp commented Jul 30, 2024

> we're willing to add/keep targets for active hobbyist projects, or active communities surrounding "dead" hardware.

Makes sense.

And FWIW, I was being fairly conservative here. If we wanted to be a tad more liberal with the selection, I think bfin, cris, nios2, and rx are well worth considering as they still at least have real user bases to speak of, despite being on the decline. I also just now realized one thing that might be a decent reason for including ia64: As far as I'm aware, it's the only explicit ILP architecture... and that is an interesting property for a lot of recent research in compiler design - specifically around dataflow-centric IRs that trivially expose ILP by construction. 🤔 Anyway, I don't have strong feelings on these, just something to consider.

> what do you think about modeling endianness and 32/64-bitness as CPU features rather than entire architectures?

Hmmmmm. Mixed feelings right as I'm reading this. Mostly concern about the UX. But let me think/sleep on it and get back to you with some coherent thoughts.

@alexrp
Member Author

alexrp commented Jul 31, 2024

> Hmmmmm. Mixed feelings right as I'm reading this. Mostly concern about the UX. But let me think/sleep on it and get back to you with some coherent thoughts.

@andrewrk ok, did some thinking (sorry if this ended up a bit ramble-y):

I think there's a world where you can separate bitness and endianness from the architecture tag, but I don't think CPU features are the way. If you went that route, you'd need two separate feature flags to indicate "this CPU is capable of 64-bit code" and "we're actually compiling code to be 64-bit". It just seems a bit wrong on a conceptual level. I view CPU features as being strictly capabilities, and the choice of endianness and bitness as almost being similar to the choice of ABI, if that makes sense. (For SPARC v9, being capable of true bi-endianness, it quite literally is 'just' a choice of ABI. But that's an exceptional case.)

So if we wanted to reduce the number of Arch tags, I think the approach that would make more sense to me conceptually is to make bitness and endianness another component or sub-component, separate from both architecture and CPU features.

But this is where the user-friendliness concern I alluded to earlier comes in: Endianness and bitness are much more important than CPU model, features, ABI, and even libc in many cases, when specifying a target triple. Assuming for a second that #20690 is accepted in its current form, we can infer a sensible ABI for probably 95% of all Zig usage that'll ever happen. We also already have mostly reasonable baselines for CPU models/features, and our default choice of libc is probably also good for most users. But if I want to target, say, mips or powerpc, there really just isn't a good default for bitness or endianness. For sparc, though, big endian is the only sane default (for v9; v8 can only be big endian). Then there's x86 which is only little endian, so there is no choice. All this to say that you have to figure out if bitness/endianness components should be mandatory in the target triple, or optional with defaults that we hope are good, or optional only for specific architectures, etc.

Here I would suggest that e.g. for endianness, the combination "some are mandatory (mips, powerpc, ...), some are optional (arm, sparc, ...), some are not permitted (x86, riscv, ...)" would be the most user-friendly approach.

There's also the familiarity aspect to consider. People are used to the Arch tag names we have now because they're pervasive in the Unix world. If people are used to specifying either powerpc or powerpc64, and they then see that Zig's architecture component only accepts powerpc, what are they going to think that this means? For the ABI component, I think this is less of a concern because we expect 95% of users to never need to specify it, so it's not all that important that the ABI names follow exact GNU convention.

So, if we were to go this route, I would strongly suggest that we try to maintain status quo syntax, even if the underlying modeling of the information is more structured. That is:

<arch>[bits][endian][.<cpu>[+~feats]]-<os>[.<ver>][-<libc>[.<ver>][-<abi>[+~opts]]]

The parsing works by first chopping off endianness and bitness suffixes if present (in that order), and then matching against the Arch enum. After that, there's architecture-specific logic for bitness and endianness:

  • If the architecture requires an explicit value, take it if given, or error.
  • If the architecture allows an optional explicit value, take it, otherwise default.
  • If the architecture does not permit an explicit value, error if given.
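A minimal sketch of that parsing scheme, assuming reduced copies of the `Tag`, `Bits`, and `Endian` enums. `Policy`, `resolve`, `endianPolicy`, `bitsPolicy`, and `parseArch` are hypothetical names. The endianness table follows the examples given earlier in this comment (with little endian as an assumed default for arm), while the bitness table is purely an assumption, since no such table is spelled out here.

```zig
const std = @import("std");

// Reduced, illustrative copies of the proposed types.
const Tag = enum { arm, mips, powerpc, riscv, sparc, x86 };
const Bits = enum { @"32", @"64" };
const Endian = enum { big, little };
const Arch = struct { tag: Tag, bits: Bits, endian: Endian };

const ParseError = error{ UnknownArch, MissingBits, BitsNotPermitted, MissingEndian, EndianNotPermitted };

/// The three per-architecture rules: required, optional, or not permitted.
fn Policy(comptime T: type) type {
    return union(enum) {
        required, // explicit value needed, otherwise error
        optional: T, // explicit value allowed; payload is the default
        fixed: T, // explicit value is an error; payload is the only value
    };
}

fn resolve(comptime T: type, policy: Policy(T), given: ?T, missing: ParseError, forbidden: ParseError) ParseError!T {
    return switch (policy) {
        .required => if (given) |v| v else missing,
        .optional => |fallback| given orelse fallback,
        .fixed => |only| if (given == null) only else forbidden,
    };
}

// Follows the mandatory/optional/not-permitted examples in this comment.
fn endianPolicy(tag: Tag) Policy(Endian) {
    return switch (tag) {
        .mips, .powerpc => .required,
        .arm => .{ .optional = .little }, // assumed default
        .sparc => .{ .optional = .big },
        .x86, .riscv => .{ .fixed = .little },
    };
}

// Assumption: this table is invented for the sketch.
fn bitsPolicy(tag: Tag) Policy(Bits) {
    return switch (tag) {
        .mips, .powerpc, .riscv => .required,
        .arm, .sparc, .x86 => .{ .optional = .@"32" },
    };
}

fn parseArch(text: []const u8) ParseError!Arch {
    var rest = text;

    // Chop the endianness suffix first...
    var endian: ?Endian = null;
    if (std.mem.endsWith(u8, rest, "el") or std.mem.endsWith(u8, rest, "le")) {
        endian = .little;
        rest = rest[0 .. rest.len - 2];
    } else if (std.mem.endsWith(u8, rest, "eb") or std.mem.endsWith(u8, rest, "be")) {
        endian = .big;
        rest = rest[0 .. rest.len - 2];
    }

    // ...then the bitness suffix.
    var bits: ?Bits = null;
    if (std.mem.endsWith(u8, rest, "32")) {
        bits = .@"32";
        rest = rest[0 .. rest.len - 2];
    } else if (std.mem.endsWith(u8, rest, "64")) {
        bits = .@"64";
        rest = rest[0 .. rest.len - 2];
    }

    // Whatever is left must name an architecture.
    const tag = std.meta.stringToEnum(Tag, rest) orelse return error.UnknownArch;
    return Arch{
        .tag = tag,
        .bits = try resolve(Bits, bitsPolicy(tag), bits, error.MissingBits, error.BitsNotPermitted),
        .endian = try resolve(Endian, endianPolicy(tag), endian, error.MissingEndian, error.EndianNotPermitted),
    };
}

pub fn main() !void {
    const arch = try parseArch("powerpc64le");
    std.debug.print("{s} {s} {s}\n", .{ @tagName(arch.tag), @tagName(arch.bits), @tagName(arch.endian) });
}
```

One caveat of suffix chopping is that an architecture name that itself ends in `le`, `be`, `32`, or `64` would be misparsed, so a real implementation would likely need an exception list.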

For this scheme to work, aarch64_be becomes aarch64eb (or rather, arm64eb), but I think that's ok anyway; it's a super weird outlier with this naming scheme. We'd also have to normalize le/el vs be/eb, which I think is also ok. x86_64 might need a special case since x8664 looks... goofy. OTOH, we could just allow an optional _ between architecture name, bitness, and endianness in general, so x8664, x86_64, arm64, arm_64, arm_64eb, powerpc_64_el, mipsel, etc... are all valid. Then we would just pick a good-looking canonical name when rendering it back to a string, which would probably mean <arch>_<bits> for x86 and <arch><bits><endian> for everything else.
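As a rough sketch of the spelling normalization, the function below drops underscores and rewrites a trailing `le`/`be` to `el`/`eb` before any suffix chopping happens. The `normalize` name is hypothetical, picking `el`/`eb` as the canonical spelling is an assumption based on the example names above, and dropping every underscore (rather than only those at component boundaries) is a simplification.

```zig
const std = @import("std");

/// Writes a normalized spelling of `name` into `buf` and returns the used part.
/// Underscores are dropped; a trailing "le"/"be" becomes "el"/"eb".
fn normalize(buf: []u8, name: []const u8) error{NoSpaceLeft}![]const u8 {
    var len: usize = 0;
    for (name) |c| {
        if (c == '_') continue; // simplification: drops every underscore
        if (len == buf.len) return error.NoSpaceLeft;
        buf[len] = c;
        len += 1;
    }
    const out = buf[0..len];
    if (std.mem.endsWith(u8, out, "le")) {
        out[len - 2] = 'e';
        out[len - 1] = 'l';
    } else if (std.mem.endsWith(u8, out, "be")) {
        out[len - 2] = 'e';
        out[len - 1] = 'b';
    }
    return out;
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    for ([_][]const u8{ "x86_64", "arm_64eb", "powerpc_64_le", "mipsel" }) |name| {
        std.debug.print("{s} -> {s}\n", .{ name, try normalize(&buf, name) });
    }
}
```

Rendering a name back to its canonical form (for example `<arch>_<bits>` for x86) would be a separate step and is not covered here.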

In this new world order, std.Target would look something like this (simplifying a lot for brevity):

pub const Target = struct {
    arch: Arch,
    cpu: Cpu,
    os: Os,
    libc: LibC,
    abi: Abi,
    ofmt: ObjectFormat,

    // I think there's enough info here to warrant separating it from Cpu, if possible.
    pub const Arch = struct {
        tag: Tag,
        bits: Bits,
        endian: Endian,

        pub const Tag = enum {
            arm,
            mips,
            powerpc,
            riscv,
            sparc,
            x86,
            ...
        };
        pub const Bits = enum {
            @"32",
            @"64",
        };
        pub const Endian = enum {
            big,
            little,
        };
    };

    pub const Cpu = struct {
        model: Model,
        features: Feature.Set,

        pub const Feature = struct { ... };
        pub const Model = struct { ... };
    };

    pub const Os = struct {
        tag: Tag,
        version_range: VersionRange,

        pub const Tag = enum { ... };
    };

    pub const LibC = struct {
        tag: Tag,
        dynamic_linker: DynamicLinker,
        version_range: VersionRange,

        pub const Tag = enum { ... };
        pub const DynamicLinker = struct { ... };
    };

    pub const Abi = struct {
        variant: Variant,
        options: Options,

        // Analogous to Model and Features in Cpu.
        pub const Variant = struct { ... };
        pub const Options = struct { ... };
    };
    
    pub const ObjectFormat = enum { ... };
};

@alexrp
Member Author

alexrp commented Aug 13, 2024

@andrewrk I plan to re-do this as a proper PR shortly since #21020 and #21037 are the only remaining cleanups I planned to do in preparation. Just wanted to check if you had a chance to consider my comment above? (Even though I considered it in the context of #20690, it could be done independently of that.)

@alexrp
Member Author

alexrp commented Aug 28, 2024

(In any case, closing this since there's no reason to have it pollute the PR queue right now.)
