Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite generate_linux_syscalls.zig #20895

Merged
merged 9 commits into from
Aug 5, 2024

Conversation

ippsav
Copy link
Contributor

@ippsav ippsav commented Aug 1, 2024

Related: #20801, #20869

Closes: #20801

Description:

This PR refactors the syscall generation tool aiming to reduce code duplication for both the table based arches and the ones generated using a preprocessor.

The ArchInfo union contains 2 tags .table and .preprocessor

To add an Arch that depends on a tbl file, use the following structure:

    table: struct {
        name: []const u8, // Architecture name
        enum_name: []const u8, // Name of the generated enum
        file_path: []const u8, // Path to the tbl file for enum generation
        header: ?[]const u8, // Optional header to prepend to the enum
        extra_values: ?[]const u8, // This is used to append values at the end of the enum after processing the file mainly
                                  // to avoid conflicts with the generated enums in current master
        process_file: ProcessTableBasedArchFileFn,  // Function to process the tbl file and write the enum
                                                    // Can be customized for specific architecture needs
        filters: Filters, // Different checks on the abi / name tag to modify / skip certain values
        additional_enum: ?[]const u8, // Optional additional enum (see PowerPC architecture)
    },

To add an Arch that is generated using a preprocessor, use the following structure

    preprocessor: struct {
        name: []const u8, // Architecture name
        enum_name: []const u8, // The name of the enum that will be generated
        file_path: []const u8, // The path to the header file that will be used
        child_options: struct {
            comptime additional_args: ?[]const []const u8 = null,  // Additional args that will be passed to zig cc
            target: []const u8, // The target architecture

            pub inline fn getArgs(self: *const @This(), zig_exe: []const u8, file_path: []const u8) []const []const u8 {
                const additional_args: []const []const u8 = self.additional_args orelse &.{};
                return .{ zig_exe, "cc" } ++ additional_args ++ .{ "-target", self.target } ++ default_args ++ .{file_path};
            }
        },
        header: ?[]const u8, // Optional header to prepend to the enum
        extra_values: ?[]const u8, // this is used to append values at the end of the enum after processing the file mainly to 
                                  // avoid conflicts with the generated enums in master
        process_file: ProcessPreprocessedFileFn, // Function to process the header file and write the enum
                                                  // Can be customized for specific architecture needs
        additional_enum: ?[]const u8,

The generated enum will be in this format:

const EnumName = enum {
  // header_values
  
  // values parsed from the processing function
  
  // additional values added at the end
}

How it's tested:

Ran the tool and generated the enums and diffed them with lib/std/os/linux/syscalls.zig from master to make sure everything works the same


if (!mem.eql(u8, prefix, "zigsyscall")) continue;
const linux_dir = try std.fs.openDirAbsolute(linux_path, .{});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const linux_dir = try std.fs.openDirAbsolute(linux_path, .{});
var linux_dir = try std.fs.cwd().openDir(linux_path, .{});
defer linux_dir.close();

This code was here prior to the rewrite but I noticed it while reviewing the diff - openDirAbsolute doesn't actually do anything except assert that the path is absolute and then forward it to std.fs.cwd() openDir , and there's no reason this program should only work with absolute paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense, i should've paid attention to the missing dir close, thanks for pointing that out

comptime additional_args: ?[]const []const u8 = null,
target: []const u8,

pub inline fn get_args(self: *const @This(), zig_exe: []const u8, file_path: []const u8) []const []const u8 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub inline fn get_args(self: *const @This(), zig_exe: []const u8, file_path: []const u8) []const []const u8 {
pub inline fn getArgs(self: *const @This(), zig_exe: []const u8, file_path: []const u8) []const []const u8 {

// TODO: maybe extract these from arch/arm/include/uapi/asm/unistd.h
try writer.writeAll(
const ArchInfo = union(enum) {
tableBasedArch: struct {
Copy link
Member

@alexrp alexrp Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tableBasedArch: struct {
table: struct {

filters: Filters,
additional_enum: ?[]const u8,
},
preprocessedArch: struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
preprocessedArch: struct {
preprocessor: struct {

file_path: []const u8,
header: ?[]const u8,
extra_values: ?[]const u8,
processFile: ProcessTableBasedArchFileFn,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
processFile: ProcessTableBasedArchFileFn,
process_file: ProcessTableBasedArchFileFn,

Likewise in the other case below.

@alexrp
Copy link
Member

alexrp commented Aug 1, 2024

Other than style nitpicks, this looks reasonable to me.

Just as a sanity check, does it produce a byte-for-byte identical syscalls.zig?

@ippsav
Copy link
Contributor Author

ippsav commented Aug 1, 2024

Other than style nitpicks, this looks reasonable to me.

Just as a sanity check, does it produce a byte-for-byte identical syscalls.zig?

Does the hexagon arch work ? the only difference is the order of enums is different

@alexrp
Copy link
Member

alexrp commented Aug 1, 2024

Does the hexagon arch work ?

Yes, #20798 was merged, so a freshly built compiler should work.

the only difference is the order of enums is different

It would be nice to maintain the current order just to reduce diff churn on the next syscalls update. But if that requires more work than just reordering the arch_infos, then eh, don't bother.

@ippsav
Copy link
Contributor Author

ippsav commented Aug 1, 2024

I ll recompile it on linux and test it and ensure all the values are in the same order

@ippsav
Copy link
Contributor Author

ippsav commented Aug 1, 2024

@alexrp, After the changes there is no difference between syscalls.zig and the newly generated one
image

Copy link
Member

@alexrp alexrp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Copy link
Contributor

@The-King-of-Toasters The-King-of-Toasters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate you and @alexrp taking the time to extend/simplify the code, but I think these are the wrong abstractions for the table-based arch's. Given each table has the same structure of:

<number> <abi> <name> [everything else]

I think you can have a function that splits all the lines into these three strings, and then passes them into functions like so:

const Result = union(enum) {
    entry: []const u8,
    skip: void,
    done: void,
};

*const fn (number: usize, abi: []const u8, name: []const u8, buf: []u8) Result;

That way you can pair everything down to a helper function and a couple of arch-specific Result functions. Besides, a little copy-pasta never hurt anyone ;).


fn processPowerPcBasedArch(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a cute trick when I wrote it, but looking back I think it'd be better to split the logic for ppc32/6 and read the same file twice. It's not like this is performance critical.

Copy link
Contributor Author

@ippsav ippsav Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reasons on why it s a bad idea ? when i've worked on it other architectures can benefit from your trick like sparc/sparc64 which gets it's values from the same file arch/sparc/kernel/syscalls/syscall.tbl and there is some weird abi edge cases, but yeah it is not performance critical if you want me to separate them I can do that

const prefix = fields.next() orelse return error.Incomplete;
// As of 5.17.1, the largest table is 23467 bytes.
// 32k should be enough for now.
const buf = try allocator.alloc(u8, 1 << 15);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this so that a single allocation could be done up front - and if there were problems one could bump the limit up. Now this gets called for every table arch and never gets free'd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed, buf is just passed now, and readFile returns a slice of buf [0..end]


const AbiCheckFn = *const fn (abi: []const u8) FlowControl;

fn getAbiCheckFunction(comptime b: []const u8, comptime flow: FlowControl) AbiCheckFn {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really need to be generated at compile-time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No not necessarily, it can be just a slice of (abi, flow) instead of a function pointer

@ippsav
Copy link
Contributor Author

ippsav commented Aug 2, 2024

I appreciate you and @alexrp taking the time to extend/simplify the code, but I think these are the wrong abstractions for the table-based arch's. Given each table has the same structure of:

<number> <abi> <name> [everything else]

I think you can have a function that splits all the lines into these three strings, and then passes them into functions like so:

const Result = union(enum) {
    entry: []const u8,
    skip: void,
    done: void,
};

*const fn (number: usize, abi: []const u8, name: []const u8, buf: []u8) Result;

That way you can pair everything down to a helper function and a couple of arch-specific Result functions. Besides, a little copy-pasta never hurt anyone ;).

I'm not sure I understood what you meant by your impl, what did you mean by three strings, and wouldn't this impl require more mem allocations for entry ? instead of writing it directly, you ll allocPrint entry and then pass it afterwards to the writer ?

@andrewrk
Copy link
Member

andrewrk commented Aug 2, 2024

I'm happy to merge this when all the parties involved reach consensus.

@ehaas
Copy link
Contributor

ehaas commented Aug 2, 2024

Once it's merged would there be interest in updating it to use aro instead of calling zig cc via a subprocess? I'd be happy to do it if so, I don't think it would take much work other than updating aro to support -dD (outputs both the #define directives and the result of preprocessing)

@alexrp
Copy link
Member

alexrp commented Aug 3, 2024

I'm happy to merge this when all the parties involved reach consensus.

I have no strong feelings on the above suggestions. My primary concern was getting the copy/pasted code under control as it was getting annoying when I had to make edits to it. 🙂

@The-King-of-Toasters
Copy link
Contributor

I have no objections on the output - just style. I'm tied up today so I don't mind if this goes in as is. I second @ehaas's comments about just using Aro or clang - I'd like it so that we could run a pre-processor and get a list of key-value definitions.

@alexrp
Copy link
Member

alexrp commented Aug 3, 2024

To quote zig zen: Incremental improvements. 🙂 I think it'd be reasonable to merge this as is and then refine it in further PRs if there's interest.

@andrewrk
Copy link
Member

andrewrk commented Aug 3, 2024

OK I went to merge this but then I realized it effectively has no commit message, not even a commit title ("rewrite $foo" does not contain any more information than the git commit metadata). Can you at least explain what this does? Note, you didn't say closes any particular issue, you just mentioned that it is related. So there's actually no description in the PR writeup or commit messages as to your goals, and whether they were achieved, and how you tested, etc.

@ippsav
Copy link
Contributor Author

ippsav commented Aug 3, 2024

@andrewrk I've updated the PR description, sorry for the hassle i didn't think it need a description will make sure to be more specific next time on each change !

@andrewrk andrewrk merged commit 724804a into ziglang:master Aug 5, 2024
10 checks passed
SammyJames pushed a commit to SammyJames/zig that referenced this pull request Aug 7, 2024
refactors the syscall generation tool aiming to reduce code duplication for both the table based arches and the ones generated using a preprocessor.
@alexrp
Copy link
Member

alexrp commented Aug 8, 2024

@ippsav very minor issue I ran into in #21004: The resulting file has an extraneous newline at the end, which means zig fmt --check will fail if you don't fix that manually. (Not sure if that was there before.)

@ippsav
Copy link
Contributor Author

ippsav commented Aug 8, 2024

@alexrp is the issue still existing in the PR the issue is in syscalls.zig right ? i can take a look and fix it

@alexrp
Copy link
Member

alexrp commented Aug 8, 2024

I pushed a fixed syscalls.zig to that PR. But you can repro it by just copying the generated syscalls.zig on top of lib/std/os/linux/syscalls.zig in master and doing git diff - there will be an extra newline at the end.

@ippsav
Copy link
Contributor Author

ippsav commented Aug 8, 2024

@alexrp it was there before as well, basically after each enum generated };\n\n is written at the end of it which makes the last enum to have a trailing new line that triggers the formatting error, I can fix that so that you no longer have to go format the file after each generation

@ippsav
Copy link
Contributor Author

ippsav commented Aug 8, 2024

@alexrp can you try with this generate_linux_syscalls.zig and see if the issue is still persisting ? I've tried it should be good to go, I ll have to fix my zig build dependencies as I can't rebuild zig on my machine currently :/

PR: #21005

igor84 pushed a commit to igor84/zig that referenced this pull request Aug 11, 2024
refactors the syscall generation tool aiming to reduce code duplication for both the table based arches and the ones generated using a preprocessor.
Rexicon226 pushed a commit to Rexicon226/zig that referenced this pull request Aug 13, 2024
refactors the syscall generation tool aiming to reduce code duplication for both the table based arches and the ones generated using a preprocessor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tools/generate_linux_syscalls.zig could be cleaned up to be table-driven
5 participants