Initial version of type size agnostic changes #59

tenko · 2024-05-19T15:58:03Z

This PR contains changes that avoid hard-coded sizes in the backend.
These changes are needed to later support cross-compiling to targets whose type sizes differ from those
currently hard-coded into the compiler.

This is marked [DRAFT] because I would like comments on these changes:

Currently the integer literal types are hard-coded in the scanner. Perhaps the scanner should just check whether a literal fits into a 64-bit integer, with the logic for detecting sizes moved to sema?
I am not sure how to update the symbol files. Should we change them to use fixed sizes (int8, int16, int32 and int64)?
I am not too familiar with the parts of the code touched here, so I might have overlooked some issues.
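To make the first question concrete, here is a minimal sketch of the proposed split, assuming a scanner that only rejects literals exceeding 64 bits and a sema step that decides which width a value fits. All names (`scanDecimal`, `fitsSigned`, `smallestKind`, `IntKind`) are hypothetical and not taken from the compiler's sources.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical names throughout; this is not the compiler's actual API.
enum class IntKind { Int8, Int16, Int32, Int64 };

// Scanner side: parse a decimal literal and only check that it fits into a
// signed 64-bit integer. No language-level type is chosen here.
std::optional<int64_t> scanDecimal(std::string_view text) {
    if (text.empty()) return std::nullopt;
    const uint64_t max = static_cast<uint64_t>(std::numeric_limits<int64_t>::max());
    uint64_t value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return std::nullopt;           // not a digit
        const uint64_t digit = static_cast<uint64_t>(c - '0');
        if (value > (max - digit) / 10) return std::nullopt;   // would exceed INT64_MAX
        value = value * 10 + digit;
    }
    return static_cast<int64_t>(value);
}

// Sema side: does the value fit a signed two's-complement integer of `bits` bits?
bool fitsSigned(int64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return true;
    const int64_t hi = (int64_t{1} << (bits - 1)) - 1;
    const int64_t lo = -hi - 1;
    return value >= lo && value <= hi;
}

// Sema side: pick the smallest fixed-size kind that can hold the value.
IntKind smallestKind(int64_t value) {
    if (fitsSigned(value, 8)) return IntKind::Int8;
    if (fitsSigned(value, 16)) return IntKind::Int16;
    if (fitsSigned(value, 32)) return IntKind::Int32;
    return IntKind::Int64;
}
```

With this split the scanner stays independent of the target, and only sema needs to know how wide INTEGER or LONGINT is for the target being compiled.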

With these changes the unit tests pass with the same results as before.
Some simple tests with different sizes of SET and LONGINT work as expected.

zaskar9 · 2024-05-21T14:32:38Z

First of all, thank you for starting to work on this problem! I am not sure whether this is functionality that will make it into Version 1.0, but I think it is a good idea to have a place to brainstorm how to properly realize the desired flexibility.

To be honest, my plan for supporting "configurable" type sizes was a bit different. The idea is to introduce new low-level Oberon types (maybe in the SYSTEM namespace) that have fixed precision. In analogy to C/C++'s int32_t, int64_t, etc. types, we could have SYSTEM.INT32, SYSTEM.INT64, etc. I would also add the long-missing BYTE type here. Heck, we could even introduce unsigned types! 😎 These types will then be the types used in Scanner, Parser, Sema, CodeGen as well as in import and export of symbol files, making everything consistent and readable by avoiding code blow-up stemming from repeated case distinctions. To configure the compiler for a specific target, it would then suffice to set the corresponding aliases as already supported, essentially something like this.

TYPE
    LONGINT = SYSTEM.INT32;
    INTEGER = SYSTEM.INT32;
    SHORTINT = SYSTEM.INT16;
    BYTE = SYSTEM.BYTE;
    SET = SYSTEM.SET32;

I realize that my proposal is going to require more of a rewrite, but I fear that if this functionality is not realized in a conceptually clean and systematic way, it will be an endless source of problems. The effort could be smaller than expected though, as I have set out writing the compiler with the mindset LONGINT = 64 bit, INTEGER = 32 bit, SHORTINT = 16 bit, etc. Maybe renaming the current types to fixed-precision types and adding the described layer of indirection is already enough. It could even simplify some Sema code...
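The layer of indirection described here could be sketched roughly as follows: the compiler works only with fixed-precision types internally, and a per-target alias table maps the Oberon names onto them. This is an illustrative sketch only; `Fixed`, `AliasTable`, `aliases32`, `aliasesCurrent` and `bitsOf` are hypothetical names, not existing code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical fixed-precision types used internally by the compiler.
enum class Fixed { Byte, Int8, Int16, Int32, Int64, Set32, Set64 };

// Width in bits of each fixed-precision type.
unsigned bitWidth(Fixed t) {
    switch (t) {
        case Fixed::Byte:
        case Fixed::Int8:  return 8;
        case Fixed::Int16: return 16;
        case Fixed::Int32:
        case Fixed::Set32: return 32;
        case Fixed::Int64:
        case Fixed::Set64: return 64;
    }
    assert(false && "unhandled fixed type");
    return 0;
}

// One alias table per target: Oberon type name -> fixed-precision type.
using AliasTable = std::map<std::string, Fixed>;

// Aliases mirroring the TYPE block above.
AliasTable aliases32() {
    return {{"LONGINT", Fixed::Int32}, {"INTEGER", Fixed::Int32},
            {"SHORTINT", Fixed::Int16}, {"BYTE", Fixed::Byte},
            {"SET", Fixed::Set32}};
}

// Aliases matching the mindset the compiler was written with
// (LONGINT = 64 bit, INTEGER = 32 bit, SHORTINT = 16 bit).
AliasTable aliasesCurrent() {
    return {{"LONGINT", Fixed::Int64}, {"INTEGER", Fixed::Int32},
            {"SHORTINT", Fixed::Int16}};
}

// Look up the width of a language-level type for a given target.
// Throws std::out_of_range if the target does not define the name.
unsigned bitsOf(const AliasTable& target, const std::string& name) {
    return bitWidth(target.at(name));
}
```

Scanner, Sema and CodeGen would then only ever switch on `Fixed`, and retargeting amounts to selecting a different table.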

What do you think?

tenko · 2024-05-21T20:31:34Z

Yes, I agree with your comment about internally using only the fixed integer sizes INT8, INT16, INT32 & INT64
(and likewise SET8, SET16, SET32 & SET64). That would keep the code base free of a lot of indirections that would otherwise be needed.

Real unsigned types are maybe not needed. If we have a type SYSTEM.UINT32, it could be allowed only in procedure
signatures, where it would just ensure that any integer passed is not sign-extended. That would be enough to solve
the problem of calling OS API functions that expect hex-type constants. It could also be used in special procedures
where integers are treated as unsigned values, as is typical in hash, random-number and other bit-fiddling algorithms.
That way we avoid implementing unsigned integer arithmetic, which the end user usually does not need
(and who may not be aware of the overflow problems that plague other languages).
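The sign-extension issue mentioned above can be illustrated with a small sketch. It assumes a 32-bit bit pattern (for example a hex constant passed to an OS API) being widened to 64 bits, once as a signed value and once as a hypothetical UINT32-style parameter would treat it. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 32-bit pattern the way a signed 32-bit argument would be widened:
// the top bit is copied into the upper 32 bits (sign extension).
// Note: the uint32_t -> int32_t conversion is modular since C++20 and
// behaves the same way on common two's-complement compilers before that.
int64_t widenSigned(uint32_t bits) {
    return static_cast<int64_t>(static_cast<int32_t>(bits));
}

// Widen the same pattern the way an unsigned 32-bit parameter would:
// the upper 32 bits are filled with zeros (zero extension).
int64_t widenUnsigned(uint32_t bits) {
    return static_cast<int64_t>(bits);
}

// Small self-check for the pattern 0x80000000 (80000000H in Oberon notation).
void demoWidening() {
    assert(widenSigned(0x80000000u) == -2147483648LL);   // upper bits all ones
    assert(widenUnsigned(0x80000000u) == 2147483648LL);  // upper bits all zeros
}
```

Only the widening step differs between the two; no unsigned arithmetic is involved, which is why restricting such a type to procedure signatures could be sufficient.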

Also remember to take into account the two different float types.
Constants will be different depending on the underlying type.
Oberon-07 has only REAL, but Oberon-2 has both REAL & LONGREAL.

How should exported constants be treated?
Should integer constants always be LONGINT, SYSTEM.INT64 or the smallest size fitting the number?
With the latter we will have issues with sign extension of hex values.

Maybe this topic is big enough, with enough corner cases, that a short design document is needed?

The purpose of this PR was only to make the LLVM backend able to output correct code for different sizes of integer, set and real types. I will then just edit the code in OberonSystem.cpp to make the changes needed to start testing this on 32-bit platforms.

(Maybe it should be fairly easy to add the fixed types to SYSTEM?)

This is just a small step, with changes not intended to affect the current compiler.
The later, bigger changes need some planning I guess, and are probably difficult to break up into smaller changes, except maybe for the symbol files?

Initial version of type size agnostic changes [DEAFT]

52b7044

tenko changed the title ~~Initial version of type size agnostic changes [DEAFT]~~ Initial version of type size agnostic changes [DRAFT] May 19, 2024

zaskar9 marked this pull request as draft May 22, 2024 07:33

zaskar9 changed the title ~~Initial version of type size agnostic changes [DRAFT]~~ Initial version of type size agnostic changes May 22, 2024

zaskar9 added enhancement question labels May 22, 2024

Merge branch 'zaskar9:master' into cross-compile

c703fae
