
Huge memory model is non-optimal, suggest memory model change to large 16-bit #6

Open
joncampbell123 opened this issue Dec 12, 2016 · 13 comments

@joncampbell123
Collaborator

Just a heads up: the huge memory model has the benefit of having the compiler adjust pointers so that pointer math is easier, but it also causes performance loss and overhead in your code.

The large model will give you all the benefits of far code and data without the overhead of the compiler and C library's pointer adjustment routines.

You will have to normalize pointers yourself and deal with crossing 64KB segments for large data, but it's worth it.

Also, the compiled binaries for Large model are significantly smaller than Huge model binaries.
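To make the trade-off concrete, here is a minimal sketch (assuming 16-bit Open Watcom, the large model, and the FP_SEG/FP_OFF/MK_FP macros from <i86.h>) of why large-model code must watch 64KB boundaries: ordinary far pointer arithmetic only touches the 16-bit offset, so it wraps within the segment instead of advancing it, while __huge pointers pay extra code on every pointer operation to avoid that.

#include <stdio.h>
#include <i86.h>

int main(void)
{
    /* Point 16 bytes before the end of segment 0xA000. */
    unsigned char __far *fp = (unsigned char __far *)MK_FP(0xA000, 0xFFF0);

    fp += 0x20;   /* large-model math: the offset wraps to 0x0010, the segment stays 0xA000 */
    printf("far pointer after += 0x20: %04X:%04X\n", FP_SEG(fp), FP_OFF(fp));

    /* A __huge pointer would have advanced into the next segment here,
       at the cost of extra arithmetic emitted for every pointer operation. */
    return 0;
}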

@sparky4 sparky4 self-assigned this Dec 15, 2016
@sparky4
Owner

sparky4 commented Dec 15, 2016

ah ok thanks

@sparky4
Owner

sparky4 commented Dec 17, 2016

ah, the large model breaks scroll's ability to draw chikyuu properly, hmmm

@jmalak
Collaborator

jmalak commented Dec 18, 2016

Take into account that if you use arrays longer than one segment, you need the huge memory model (it handles pointers across segment boundaries), or you must handle this case yourself.

@joncampbell123
Collaborator Author

The VRS rendering code assumes that the offset portion of the pointer is such that rendering a scanline never crosses a 64KB boundary. Try normalizing the pointer yourself before rendering.

Basic (slow) example code:

/* fold as much of the address as possible into the segment; cast before shifting so the high bits aren't lost in 16-bit math */
unsigned long a = ((unsigned long)FP_SEG(ptr) << 4) + (unsigned long)FP_OFF(ptr);
ptr = MK_FP((unsigned)(a >> 4), (unsigned)(a & 0x0FUL));

Is there anything in your code that uses an array that exceeds 64KB?
You might shrink the array, or set up the array in segments so that no part of it exceeds 64KB.

@joncampbell123
Collaborator Author

You might also consider dynamically allocating the array segments.
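One possible shape for that, sketched below (the names and sizes are made up, and it assumes 16-bit Open Watcom with _fmalloc()/_ffree() from <malloc.h>): allocate each row of the big buffer separately so every row stays well under 64KB, and large-model far pointers are then safe within any one row.

#include <stddef.h>
#include <malloc.h>

#define ROWS 400
#define COLS 320   /* 400 * 320 = 128000 bytes total, but each row is only 320 bytes */

static unsigned char __far *row[ROWS];

int alloc_rows(void)
{
    int i;

    for (i = 0; i < ROWS; i++) {
        row[i] = (unsigned char __far *)_fmalloc(COLS);
        if (row[i] == NULL)
            return -1;   /* out of memory; the caller should free what was allocated */
    }
    return 0;
}

void free_rows(void)
{
    int i;

    for (i = 0; i < ROWS; i++) {
        if (row[i] != NULL) {
            _ffree(row[i]);
            row[i] = NULL;
        }
    }
}

/* Access is simply row[y][x]; no single element access ever crosses a segment boundary. */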

@Ruedii

Ruedii commented Dec 18, 2016

Might I recommend a solution?
Switch the arrays out for a matrix that uses page alignment for lines.

Where X and Y form the address, X is the pointer (offset) and Y is the segment designator.

For simplified code:
Data Segment = Base+(Y*Pagesize)
Data Pointer = X

If you REALLY need a linear array, you can then virtualize this into pages by:
X = L Modulus 4K (Utilizing truncation of high bits)
Y = Integer Truncation of L/4K (Utilizing truncation of low bits followed by shifting high bits low)

This of course uses 4K segments; you can use any multiple of 4K, but I recommend keeping to powers of two so the math reduces to binary shift/mask operations and no real arithmetic is needed.
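A rough C translation of that scheme (a sketch only; the function name, the base segment parameter, and the use of the <i86.h> macros are my own assumptions, and the block size is kept a power of two so the split is a shift and a mask rather than a division):

#include <i86.h>

#define BLOCK_SHIFT 12U                    /* 4KB blocks */
#define BLOCK_SIZE  (1UL << BLOCK_SHIFT)
#define BLOCK_MASK  (BLOCK_SIZE - 1UL)

/* Convert a linear index L into a far pointer, given the paragraph (segment)
   address of block 0.  Y selects the block, X is the offset inside it. */
void __far *linear_to_ptr(unsigned base_seg, unsigned long L)
{
    unsigned y = (unsigned)(L >> BLOCK_SHIFT);   /* block number            */
    unsigned x = (unsigned)(L & BLOCK_MASK);     /* offset within the block */

    /* Each 4KB block starts 0x100 paragraphs after the previous one. */
    return MK_FP(base_seg + y * (unsigned)(BLOCK_SIZE >> 4), x);
}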

@jmalak
Collaborator

jmalak commented Dec 18, 2016

With OW you can use the large memory model (more efficient), and the appropriate variables/pointers can be marked as __huge so the correct (slow) arithmetic is used only for those variables/pointers.
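A minimal sketch of that mix (assuming 16-bit Open Watcom built with the large model, and that halloc()/hfree() from <malloc.h> are available for allocating a block larger than 64KB; the size here is made up): only the oversized buffer is declared __huge and pays for the slow pointer arithmetic, while everything else keeps the fast large-model far pointers.

#include <stddef.h>
#include <malloc.h>

#define BIG_COUNT 100000UL   /* more than 64KB of bytes, so it must span segments */

int touch_big_buffer(void)
{
    unsigned char __huge *big;
    unsigned long i;

    big = (unsigned char __huge *)halloc((long)BIG_COUNT, 1);
    if (big == NULL)
        return -1;

    for (i = 0; i < BIG_COUNT; i++)
        big[i] = 0;   /* the compiler emits the segment-crossing (huge) arithmetic here */

    hfree(big);
    return 0;
}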

@sparky4
Owner

sparky4 commented Dec 19, 2016

well holy shit it is SIGNIFICANTLY FASTER GOD DAMN!

@joncampbell123
Collaborator Author

Here's hoping you're not joking or being sarcastic. Good luck :)

@Ruedii

Ruedii commented Jan 4, 2017

The method I mentioned is very clean; it's an old technique for fixed-segment databases.

@sparky4
Owner

sparky4 commented Jun 5, 2018

@Ruedii Wolf 3D does a precalculation of the render offset in an array...

@Ruedii

Ruedii commented Apr 4, 2019

Sorry for the late reply; this got lost in my pile, and my life has been quite busy as well, mostly family issues for the past year.

The X and Y variable names are confusing; I should have used Ps and Po for "Pointer Segment" and "Pointer Offset". Of course, in rendering you can convert this as directly as possible by using certain block sizes.

It might be good to dynamically assign the segment block size based on the platform, if it's capable. Each Intel generation buffers in larger and larger chunks; an optional cache was added in the 486, and the L1/L0 data cache slowly grows with later generations and can usually be read on 486 and later CPUs.

Further assembler optimizations of array memory access would be done in the C library itself, or in supplementary math libraries you can add. Arrays simply add one more multiplier (the component object size) when computing an element's pointer. As long as the component object is smaller than the preferred memory page size, you can handle crossing pages.

Since you aren't protecting pages individually, accessing an object that crosses into the next page via an offset from the previous page should only cause a small performance loss, so that should be a non-issue. However, choosing an object size that divides evenly into the page size would prevent the issue altogether, hands-off. That may mean adding a bit of padding, which you could use for some nice added metadata flags or something.

If you wish to add streaming basic math and copy routines, the proper way to handle them is to arrange the stream-handler loop so that the pointer for the next piece of data is loaded into the data pointer register immediately after the current data is pulled into a register, before computing on the data in the register. That way, the time spent operating on the data in the register gives the memory time to have its registers flipped. This will particularly help on 486 and later processors that have a (very small) internal L0/L1 data buffer or cache.
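A loose C rendering of that interleaving idea (a sketch only; the function and parameter names are mine, and a C compiler is free to reschedule these statements, so at this level it mainly documents the intent): fetch the next element right after the current one has been pulled into a local, then do the arithmetic on the current value while that fetch is in flight.

/* Multiply every byte of src by k into dst, reading one element ahead. */
void scale_buffer(const unsigned char __far *src, unsigned char __far *dst,
                  unsigned n, unsigned char k)
{
    unsigned i;
    unsigned char cur, next;

    if (n == 0)
        return;

    cur = src[0];
    for (i = 0; i + 1 < n; i++) {
        next = src[i + 1];                  /* start loading the next byte first */
        dst[i] = (unsigned char)(cur * k);  /* then compute on the current byte  */
        cur = next;
    }
    dst[i] = (unsigned char)(cur * k);      /* last element, nothing left to fetch */
}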

@sparky4
Owner

sparky4 commented Mar 17, 2022

I've honestly been swamped with school work constantly, and the lack of help kept me from working on it in general. A lot has been going on with my mental health, but I'm getting more stable. I'm not dead, and I haven't forgotten the project; I've just been super busy with school, that's all. The biggest problem is that I don't have the XT sitting around to work on the game some more, as it's in storage, so I can't really test it on authentic hardware except a 286. I'll continue once my life is better and I'm not grinding away at school.
