A 6502 system for high-level programming languages
Fenestra6502 is a 6502-based system with an extra circuit to improve the performance of procedure/function calls.
The circuit realizes "zero-page windows", which are similar to the "register windows" adopted in several RISC processors. It detects the instruction fetch of JSR/RTS and changes the mapping of the zero-page onto physical memory.
Since the 6502 has no pointer registers, operations on stack frames are expensive compared to more modern processors. Zero-page windows greatly simplify these operations and improve the performance of procedure/function calls.
In the BASIC era, from the late 1970s to the early 1980s, the 6502 was competitive with later processors such as the Z80 and MC6809. Then the wave of structured programming arrived, and procedural programming languages such as PASCAL and C came to be used on microprocessors.
Since the MC6809 has modern features that assist procedural programming languages, implementing such languages on it posed no difficulty. The Z80 was also able to adapt thanks to its added index registers, but at a higher cost than the MC6809.
However, the 6502 faced difficulties, for two reasons:
- The hardware stack of the 6502 is limited to only 256 bytes, which is not enough to hold stack frames.
- The 6502 has no pointer registers. To realize a user-defined stack with its own stack pointer (and frame pointer, if necessary), the zero-page indirect indexed addressing mode has to be used instead, which is relatively expensive.
To overcome the above difficulties, I came up with the idea of "zero-page windows". The idea is based on "register windows".
On processors with register windows, the arguments for the called procedure are stored in the registers of the parameter section instead of being pushed onto the stack. When a call is invoked, the registers are automatically "shifted" and the parameter section becomes the current frame section.
The zero-page is divided into four 64-byte sections. The first three sections are "shifted" on calls (the JSR instruction) and returns (RTS). The last section (0x00C0-0x00FF) is fixed. When the stack pointer and the frame pointer are needed, they can be placed in this section.
The zero-page is shifted by 128 bytes for each call. Because the number of nested calls is limited to 128 (the hardware stack is 256 bytes and each JSR pushes a 2-byte return address), 16k+128 bytes of memory are needed for the zero-page windows in addition to the 64k-256 bytes for the fixed area.
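The mapping described above can be sketched in C. This is only a model under stated assumptions, not the actual circuit: the name `zp_window_offset`, the 0-based nesting depth, and the idea of returning an offset into a separate window memory are all illustrative, and where that memory sits in the physical address space is left open.

```c
#include <assert.h>
#include <stdint.h>

#define ZP_FIXED_BASE 0xC0u  /* 0x00C0-0x00FF is the fixed section */
#define ZP_SHIFT      128u   /* bytes the window moves per call */

/* Hypothetical model: returns the byte offset inside the window memory
 * for a zero-page address at the given nesting depth, or -1 when the
 * address lies in the fixed section and is therefore not shifted. */
long zp_window_offset(unsigned depth, uint8_t zp_addr)
{
    assert(depth < 128);          /* assumed limit: 128 nested calls */
    if (zp_addr >= ZP_FIXED_BASE)
        return -1;
    return (long)depth * ZP_SHIFT + zp_addr;
}
```

In this model, `zp_window_offset(0, 0x80)` and `zp_window_offset(1, 0x00)` give the same offset, which is how a value stored by the caller at 0x80 becomes visible to the callee at 0x00.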
To evaluate the effectiveness of the zero-page windows, the cost of operations on stack frames is compared across several microprocessors.
The following assumptions are made:
- The parameter passing via registers is out of scope.
- Frame pointers are not used.
A note on the second assumption: if the size of every object allocated on the stack can be determined at compile time, frame pointers are not needed, and an indexed addressing mode on the stack pointer is enough. On the other hand, if a procedure uses an object whose size is determined at run time (for example, when alloca is used), a frame pointer is needed in addition to the stack pointer, because the compiler cannot calculate the offset by which to release the stack pointer when returning from the procedure.
The following four operations are considered:
- prologue code (make a new frame)
- epilogue code (release the frame and return)
- load from the stack frame
- store to the stack frame
In this part, only the general case is considered. Some compilers can emit more efficient code in particular cases (for example, using HL instead of IX where possible, or using PUSH instructions instead of offsetting the stack pointer SP), and hand-tuning could yield better code still. Such cases are not taken into account here.
Because Z80 does not have indexed addressing modes on the stack pointer SP, a frame pointer that supports the indexed addressing mode is needed. The index register IX is used for this purpose.
Prologue:

```
                        ; clocks
    push ix             ; 15
    ld   ix,#0          ; 14
    add  ix,sp          ; 15
    ld   hl,#<disp>     ; 10
    add  hl,sp          ; 11
    ld   sp,hl          ; 6
```

The SP adjustment can be replaced by PUSH instructions when the offset is small enough.

Epilogue:

```
    ld   sp,ix          ; 10
    pop  ix             ; 14
    ret                 ; 10
```

Load:

```
    ld   r,(ix+<disp>)  ; 19
```

Store:

```
    ld   (ix+<disp>),r  ; 19
```
Since MC6809 supports the indexed addressing mode on the hardware stack pointer S, things are very simple and ideal.
NOTE: If a frame pointer is needed, the user stack pointer U can be used.
Prologue:

```
    leas -<disp>,s      ; 5
```

Epilogue:

```
    leas <disp>,s       ; 5
    rts                 ; 5
```

Load:

```
    lda  <disp>,s       ; 5
```

Store:

```
    sta  <disp>,s       ; 5
```
In the case of 6502, the stack pointer has to be implemented as a pointer placed on the zero-page.
The access to the stack frame can be performed by the zero-page indirect indexed addressing mode with the index register Y.
Prologue:

```
    lda  *SP            ; 3
    sec                 ; 2
    sbc  #<disp>        ; 2
    sta  *SP            ; 3
    lda  *(SP+1)        ; 3
    sbc  #0             ; 2
    sta  *(SP+1)        ; 3
```

Epilogue:

```
    lda  *SP            ; 3
    clc                 ; 2
    adc  #<disp>        ; 2
    sta  *SP            ; 3
    lda  *(SP+1)        ; 3
    adc  #0             ; 2
    sta  *(SP+1)        ; 3
    rts                 ; 6
```

Load:

```
    ldy  #<disp>        ; 2
    lda  (SP),y         ; 5
```
Store:

```
    ldy  #<disp>        ; 2
    sta  (SP),y         ; 6
```
In the case of Fenestra6502, neither prologue nor epilogue code is needed; only the RTS itself (6 clocks) remains at the end of a procedure.
The access to the frame can be performed by the zero-page addressing.
Load:

```
    lda  *<addr>        ; 3
```

Store:

```
    sta  *<addr>        ; 3
```
Operation | Z80 | MC6809 | 6502 | Fenestra6502 |
---|---|---|---|---|
prologue | 71 | 5 | 18 | 0 |
epilogue | 34 | 10 | 24 | 6 |
load | 19 | 5 | 7 | 3 |
store | 19 | 5 | 8 | 3 |
(unit: clocks)
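As a cross-check, the prologue and epilogue rows can be reproduced by summing the per-instruction clocks from the listings. The sketch below does only that arithmetic; the array and function names are illustrative, and the single Fenestra6502 entry is the RTS cost taken from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a fixed-size array. */
#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Per-instruction clocks copied from the listings above. */
const unsigned z80_prologue[]      = {15, 14, 15, 10, 11, 6};
const unsigned z80_epilogue[]      = {10, 14, 10};
const unsigned m6809_prologue[]    = {5};
const unsigned m6809_epilogue[]    = {5, 5};
const unsigned mos6502_prologue[]  = {3, 2, 2, 3, 3, 2, 3};
const unsigned mos6502_epilogue[]  = {3, 2, 2, 3, 3, 2, 3, 6};
const unsigned fenestra_epilogue[] = {6};   /* RTS only */

/* Sum a list of clock counts. */
unsigned sum_clocks(const unsigned *clocks, size_t n)
{
    unsigned total = 0;
    assert(clocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += clocks[i];
    return total;
}

/* Frame overhead of one call: prologue plus epilogue. */
unsigned frame_overhead(const unsigned *pro, size_t np,
                        const unsigned *epi, size_t ne)
{
    return sum_clocks(pro, np) + sum_clocks(epi, ne);
}
```

Under these figures, the frame overhead of a single call is 105 clocks on the Z80, 15 on the MC6809, 42 on the 6502, and 6 on Fenestra6502.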
The zero-page windows are realized with the following components:
- JSR/RTS decoder: detects the fetch of JSR/RTS instructions by observing the clock E and the SYNC signal, and generates up/down signals for the frame counter.
- Frame counter: counts the nesting depth of JSR/RTS. The count is used to calculate the address to which the zero-page is mapped.
- Adder: adds the value of the frame counter to address line A7, shifting the zero-page mapping by 128 bytes for each call.
- Selector: switches between the address of the zero-page window and that of the fixed area.
- Address decoder: decodes the address and determines which area is to be accessed.
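The behavior of the JSR/RTS decoder and the frame counter can be modeled in a few lines of C. This is a behavioral sketch, not the GAL equations: the function names are hypothetical, bus timing is ignored, and the counter is assumed to be 8 bits wide and to wrap silently. The opcode values 0x20 (JSR) and 0x60 (RTS) are the standard 6502 encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OP_JSR 0x20u   /* 6502 opcode: JSR absolute */
#define OP_RTS 0x60u   /* 6502 opcode: RTS */

/* One bus cycle of the modeled decoder plus counter: the counter only
 * changes when SYNC marks an opcode fetch and the opcode is JSR or RTS. */
uint8_t frame_counter_step(uint8_t counter, bool sync, uint8_t data_bus)
{
    if (!sync)
        return counter;                  /* operand or data cycle */
    if (data_bus == OP_JSR)
        return (uint8_t)(counter + 1);   /* "up" signal */
    if (data_bus == OP_RTS)
        return (uint8_t)(counter - 1);   /* "down" signal */
    return counter;
}

/* Feed a sequence of fetched opcodes (SYNC high for each) through the
 * model and return the resulting nesting depth. */
uint8_t depth_after(const uint8_t *opcodes, size_t n)
{
    uint8_t counter = 0;
    assert(opcodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        counter = frame_counter_step(counter, true, opcodes[i]);
    return counter;
}
```

Observing SYNC matters because the byte values 0x20 and 0x60 also occur as operands and data; only an opcode fetch should move the window.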
The address decoder and JSR/RTS decoder are implemented by GAL22V10.
The companion chip, an AVR ATMEGA164P, plays the following four roles:
- Bootstraps the 6502: writes bootstrap code to the memory, then resets and starts the 6502.
- Works as a UART to communicate with the PC. Communication between the AVR and the 6502 is handled through the RDY and IRQ signals together with the WAI instruction.
- Generates the clock for the 6502.
- Measures time with its internal 16-bit timer.
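To check whether the circuit works, I ran the following code and observed the signals with a logic analyzer.

```
    lda #0xAA
    sta *0x80       ; write into the section passed to the callee
    jsr f1
    jmp done
f1:
    lda *0          ; should read the 0xAA stored by the caller
    lda #0xBB
    sta *0x80
    jsr f2
    rts
f2:
    lda *0          ; should read the 0xBB stored by f1
    lda #0xCC
    sta *0x80
    jsr f3
    rts
f3:
    lda *0          ; should read the 0xCC stored by f2
    rts
```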
The behavior of the JSR/RTS decoder and the frame counter can be observed.
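The values that each `lda *0` is expected to fetch can be predicted with a small software model of the windows. The sketch below is an illustration under assumptions, not a capture from the logic analyzer: the window memory layout, the 0-based depth, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { WINDOW_BYTES = 128 * 128 + 128 };    /* 16k+128 bytes */

static uint8_t window_mem[WINDOW_BYTES];    /* modeled window memory */
static unsigned depth;                      /* modeled frame counter */

/* Pointer to the byte a windowed zero-page address currently maps to. */
static uint8_t *zp(uint8_t addr)
{
    assert(addr < 0xC0);                    /* fixed section not modeled */
    return &window_mem[depth * 128u + addr];
}

/* Replay the test program; seen[i] receives what "lda *0" reads in
 * f1, f2 and f3 respectively. */
void trace_test_program(uint8_t seen[3])
{
    memset(window_mem, 0, sizeof window_mem);
    depth = 0;
    *zp(0x80) = 0xAA;  depth++;             /* sta *0x80 ; jsr f1 */
    seen[0] = *zp(0x00);                    /* f1: lda *0 */
    *zp(0x80) = 0xBB;  depth++;             /* sta *0x80 ; jsr f2 */
    seen[1] = *zp(0x00);                    /* f2: lda *0 */
    *zp(0x80) = 0xCC;  depth++;             /* sta *0x80 ; jsr f3 */
    seen[2] = *zp(0x00);                    /* f3: lda *0 */
    depth -= 3;                             /* three rts */
}
```

In this model the three reads return 0xAA, 0xBB and 0xCC, so each callee sees at address 0 the value its caller stored at 0x80.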
In addition to Fenestra6502, three systems were prepared for comparison: a plain W65C02S, a Z80, and an MC6809. All run at full speed with no wait states.
The following compilers are used:
- Z80: SDCC (http://sdcc.sourceforge.net/)
- MC6809: gcc6809 (https://github.com/jmatzen/gcc6809)
- 6502: cc65 (https://cc65.github.io/)
I developed a compiler called CCLV for Fenestra6502 which takes advantage of the zero-page windows.
Each compiler is used with default settings. No optimization options are specified.
The following two functions are evaluated.
The function fib calculates Fibonacci numbers. The argument 23 is chosen because it is the largest argument for which the result of fib fits in a 16-bit signed integer.
The function tarai (a.k.a. the Takeuchi function) causes many recursive calls. In the case of tarai(10, 1, 0), 803,421 calls are invoked. The maximum depth of the stack is 55.
```c
int fib(int n)
{
    if (n < 2)
        return n;
    else
        return fib(n - 1) + fib(n - 2);
}

int tarai(int x, int y, int z)
{
    if (x <= y)
        return y;
    else
        return tarai(tarai(x - 1, y, z),
                     tarai(y - 1, z, x),
                     tarai(z - 1, x, y));
}
```
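To check the call count and nesting depth quoted for tarai on a host machine, the function can be instrumented with counters. The following is a sketch for that purpose only; the names `tarai_counted`, `run_tarai` and the three counters are hypothetical and are not part of the benchmark code.

```c
#include <assert.h>

unsigned long tarai_calls;      /* number of invocations so far */
unsigned tarai_depth;           /* current nesting depth */
unsigned tarai_max_depth;       /* deepest nesting seen */

/* Same recursion as tarai, with call and depth counting added. */
int tarai_counted(int x, int y, int z)
{
    int r;
    tarai_calls++;
    if (++tarai_depth > tarai_max_depth)
        tarai_max_depth = tarai_depth;
    if (x <= y)
        r = y;
    else
        r = tarai_counted(tarai_counted(x - 1, y, z),
                          tarai_counted(y - 1, z, x),
                          tarai_counted(z - 1, x, y));
    tarai_depth--;
    return r;
}

/* Reset the counters, run one top-level call, and return its result. */
int run_tarai(int x, int y, int z)
{
    int r;
    tarai_calls = 0;
    tarai_depth = 0;
    tarai_max_depth = 0;
    r = tarai_counted(x, y, z);
    assert(tarai_depth == 0);   /* every call has returned */
    return r;
}
```

After `run_tarai(10, 1, 0)`, the variables `tarai_calls` and `tarai_max_depth` hold the figures to compare with those given above. The depth counted here is the nesting of calls, which may be defined differently from the stack depth reported in the text.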
System | Compiler | fib(23) | tarai(10, 1, 0) |
---|---|---|---|
Fenestra6502 (2MHz) | CCLV | 3.688 | 36.457 |
W65C02S (2MHz) | cc65 | 11.669 | 198.801 |
W65C02S (2MHz) | CCLV | 9.304 | 96.022 |
Z80 (4MHz) | SDCC | 6.296 | 83.256 |
MC6809 (2MHz) | gcc6809 | 4.824 | 51.421 |
MC6800 (2MHz) | CCLV | 9.461 | 102.639 |
(unit: sec)
I used Bsch3V (https://www.suigyodo.com/online/schsoft.htm) to draw the schematics of Fenestra6502. I would like to thank Mr. 岡田仁史.
I used FGAL (http://elm-chan.org/works/pgal/report_e.html) to compile the GAL22V10 definitions. I also used PGAL and its schematics as a reference. I would like to thank Mr. ChaN.