Initial version of type size agnostic changes #59

Draft
wants to merge 2 commits into
base: master


tenko
Collaborator

@tenko tenko commented May 19, 2024

This PR contains changes that avoid hard-coded type sizes in the backend.
These changes are needed to later support cross-compiling to targets whose type sizes differ from those
currently hard-coded into the compiler.

This is marked [DRAFT] because I would like comments on these changes:

  • Currently the integer literal types are hard-coded in the scanner. Perhaps the scanner should just check that the literal fits in a 64-bit integer, and the logic for detecting sizes should move to sema?
  • I am not sure how to update the symbol files. Should we change them to use fixed sizes (int8, int16, int32 and int64)?
  • I am not very familiar with the parts of the code touched here, so I may have overlooked some issues.
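The first question above, splitting "does it fit in 64 bits" (scanner) from "which type is it" (sema), can be sketched roughly as follows. This is only an illustration under assumptions: `IntKind` and `smallestIntKind` are hypothetical names, not identifiers from the compiler, and the scanner is assumed to hand over the literal as a plain 64-bit value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical type kinds for illustration; not the compiler's actual enum.
enum class IntKind { INT8, INT16, INT32, INT64 };

// True if `value` lies within the value range of the signed type T.
template <typename T>
inline bool fitsIn(std::int64_t value) {
    return value >= std::numeric_limits<T>::min() &&
           value <= std::numeric_limits<T>::max();
}

// Sema-side classification: the scanner only produced a 64-bit value,
// and this step picks the smallest signed kind that can represent it.
inline IntKind smallestIntKind(std::int64_t value) {
    if (fitsIn<std::int8_t>(value))  return IntKind::INT8;
    if (fitsIn<std::int16_t>(value)) return IntKind::INT16;
    if (fitsIn<std::int32_t>(value)) return IntKind::INT32;
    return IntKind::INT64;
}
```

With this split, the scanner needs no knowledge of the target's type sizes; only the classification step would have to consult them.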

With these changes the unit tests pass with the same results as before.
Some simple tests with different sizes of SET and LONGINT work as expected.

@tenko tenko changed the title from "Initial version of type size agnostic changes [DEAFT]" to "Initial version of type size agnostic changes [DRAFT]" May 19, 2024
@zaskar9
Owner

zaskar9 commented May 21, 2024

First of all, thank you for starting to work on this problem! I am not sure whether this is functionality that will make it into Version 1.0, but I think it is a good idea to have a place to brainstorm how to properly realize the desired flexibility.

To be honest, my plan for supporting "configurable" type sizes was a bit different. The idea is to introduce new low-level Oberon types (maybe in the SYSTEM namespace) that have fixed precision. In analogy to C/C++'s int32_t, int64_t, etc. types, we could have SYSTEM.INT32, SYSTEM.INT64, etc. I would also add the long-missing BYTE type here. Heck, we could even introduce unsigned types! 😎 These types will then be the types used in Scanner, Parser, Sema, CodeGen as well as in import and export of symbol files, making everything consistent and readable by avoiding code blow-up stemming from repeated case distinctions. To configure the compiler for a specific target, it would then suffice to set the corresponding aliases as already supported, essentially something like this.

TYPE
    LONGINT = SYSTEM.INT32;
    INTEGER = SYSTEM.INT32;
    SHORTINT = SYSTEM.INT16;
    BYTE = SYSTEM.BYTE;
    SET = SYSTEM.SET32;
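
On the compiler side, the alias layer described above might look roughly like the following C++ sketch. Everything here is an assumption made for illustration: `FixedType`, `TargetAliases`, `aliasesFor32BitTarget` and `sizeInBits` are hypothetical names, and the mapping simply mirrors the TYPE block above (with SYSTEM.BYTE taken to be 8 bits wide).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical fixed-precision kinds; not the compiler's actual types.
enum class FixedType { BYTE, INT16, INT32, INT64, SET32, SET64 };

// Maps a language-level type name to the fixed-precision type behind it.
using TargetAliases = std::map<std::string, FixedType>;

// Alias table mirroring the TYPE block above.
inline TargetAliases aliasesFor32BitTarget() {
    return {
        {"LONGINT",  FixedType::INT32},
        {"INTEGER",  FixedType::INT32},
        {"SHORTINT", FixedType::INT16},
        {"BYTE",     FixedType::BYTE},
        {"SET",      FixedType::SET32},
    };
}

// Width of each fixed-precision type in bits (BYTE assumed to be 8).
inline unsigned bitWidth(FixedType type) {
    switch (type) {
        case FixedType::BYTE:  return 8;
        case FixedType::INT16: return 16;
        case FixedType::INT32: return 32;
        case FixedType::INT64: return 64;
        case FixedType::SET32: return 32;
        case FixedType::SET64: return 64;
    }
    return 0;  // unreachable for valid enum values
}

// Resolves a language-level name through the alias table.
// Throws std::out_of_range if the name has no alias.
inline unsigned sizeInBits(const TargetAliases& aliases, const std::string& name) {
    return bitWidth(aliases.at(name));
}
```

The point of the indirection is that only the alias table changes per target, while every later phase works with fixed-precision types.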

I realize that my proposal is going to require more of a rewrite, but I fear that if this functionality is not realized in a conceptually clean and systematic way, it will be an endless source of problems. The effort could be smaller than expected though, as I have set out writing the compiler with the mindset LONGINT = 64 bit, INTEGER = 32 bit, SHORTINT = 16 bit, etc. Maybe renaming the current types to fixed-precision types and adding the described layer of indirection is already enough. It could even simplify some Sema code...

What do you think?

@tenko
Collaborator Author

tenko commented May 21, 2024

Yes, I agree with your comment about using only fixed integer sizes internally: INT8, INT16, INT32 and INT64
(and likewise SET8, SET16, SET32 and SET64). That would keep the code base free of a lot of indirection that would otherwise be needed.

Real unsigned types are maybe not needed. If we had a type SYSTEM.UINT32, it could be allowed only in procedure
signatures, where it would just ensure that any integer passed is not sign-extended. That would be enough to solve
the problem of calling OS API functions that expect hex constants. It could also be used in special procedures
where integers are treated as unsigned values, typically in hashing, random number generation and other bit-fiddling algorithms.
That way we avoid implementing unsigned integer arithmetic. End users usually do not need unsigned
integers (and may not be aware of the overflow problems that plague other languages).
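
The difference between the two ways of widening a 32-bit argument can be illustrated with a small C++ sketch. It is not taken from the compiler; `widenSigned` and `widenUnsigned` are hypothetical helpers that mimic what a signed INT32 parameter and the proposed SYSTEM.UINT32 parameter would do when the callee takes a 64-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Widening as for a signed INT32 argument: the sign bit is copied
// into the upper 32 bits (sign extension).
inline std::uint64_t widenSigned(std::int32_t value) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(value));
}

// Widening as the proposed SYSTEM.UINT32 would require: the upper
// 32 bits are filled with zeros (zero extension).
inline std::uint64_t widenUnsigned(std::int32_t value) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(value));
}
```

For the bit pattern 80000000H (the value INT32_MIN), `widenSigned` yields FFFFFFFF80000000H while `widenUnsigned` yields 0000000080000000H; for non-negative values both agree. No unsigned arithmetic is involved, only the choice of extension at the call boundary.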

Also remember to take the two different float types into account.
Constants will differ depending on the underlying type.
Oberon-07 has only REAL, but Oberon-2 has both REAL and LONGREAL.

How should exported constants be treated?
Should integer constants always be LONGINT, SYSTEM.INT64, or the smallest size that fits the number?
With the latter we will have issues with sign extension of hex values.
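
One possible reading of the sign-extension concern, sketched in C++ under stated assumptions: suppose a symbol file stored a constant in the narrowest width that holds its bit pattern, and an importer read it back as a signed value of that width. The helpers `bytesForPattern` and `roundTripByPattern` are hypothetical and do not describe the actual symbol file format.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes (1, 2, 4 or 8) needed to hold the bit pattern `bits`.
inline unsigned bytesForPattern(std::uint64_t bits) {
    if (bits <= 0xFFu)       return 1;
    if (bits <= 0xFFFFu)     return 2;
    if (bits <= 0xFFFFFFFFu) return 4;
    return 8;
}

// Stores `value` in the narrowest width that holds its bit pattern, then
// reads it back as a signed integer of that width (sign extension).
inline std::int64_t roundTripByPattern(std::int64_t value) {
    const std::uint64_t bits = static_cast<std::uint64_t>(value);
    const unsigned bytes = bytesForPattern(bits);
    if (bytes == 8) return value;  // full width, nothing is lost
    const unsigned width = bytes * 8;
    const std::uint64_t signBit = 1ULL << (width - 1);
    // bits < 2^width here, so flipping the sign bit and subtracting it
    // again in signed arithmetic performs the sign extension.
    return static_cast<std::int64_t>(bits ^ signBit) -
           static_cast<std::int64_t>(signBit);
}
```

Under these assumptions the hex constant 0FFH (255) comes back as -1, and 8000H comes back as -32768, whereas 7FH survives unchanged. Choosing the width by signed value range instead (so that 255 lands in a 16-bit slot), or always storing 64 bits, avoids the problem at the cost of larger entries.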

Maybe this topic is big enough, and has enough corner cases, that a short design document is needed?

The purpose of this PR was only to make the LLVM backend able to output correct code for different sizes of integer, set and real types. I will then just edit the code in OberonSystem.cpp to make the changes needed to start testing this on 32-bit platforms.

(Maybe it would be fairly easy to add the fixed types to SYSTEM?)

This is just a small step, with changes not intended to affect the current compiler.
The later, bigger changes need some planning, I guess, and are probably difficult to break up into smaller changes, except maybe for symbol files?

@zaskar9 zaskar9 marked this pull request as draft May 22, 2024 07:33
@zaskar9 zaskar9 changed the title from "Initial version of type size agnostic changes [DRAFT]" to "Initial version of type size agnostic changes" May 22, 2024