PDP 11 Co Pro Notes
This page documents various approaches to compiling a Pi Spigot program written in C, such that it's runnable on one of the BBC Micro PDP-11 Co Processors, of which there are two:
- the Pi/ARM-based PiTubeDirect Co Pro
- the FPGA-based Matchbox Co Pro
In addition, the B-em emulator also contains a PDP-11 Co Pro emulation, so maybe that counts as three!
The Pi Spigot is a short C program that prints the first 1000 digits of Pi:
#include <stdio.h>
#define N 3500
main() {
short r[N + 1], i, k, b, c;
long d;
c = 0;
for (i = 1; i <= N; i++)
r[i] = 2000;
for (k = N; k > 0; k -= 14) {
d = 0;
i = k;
for(;;) {
d += r[i]*10000L;
b = i*2 - 1;
r[i] = d%b;
d /= b;
i--;
if (i == 0) break;
d *= i;
}
printf("%.4d", (int)(c + d/10000));
c = d%10000;
}
}
For an explanation of how this works, see this discussion by Ben Lynn.
One notable point about the Pi Spigot is that it requires 32-bit arithmetic (longs in C on the PDP-11).
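To make the 32-bit requirement concrete, here is a minimal host-side sketch of the same loop using fixed-width types. It is not part of the original test programs: the function name `spigot_run` is hypothetical, and `d` is deliberately widened to 64 bits (an assumption made only so the peak value can be measured without wrapping).

```c
#include <stdint.h>
#include <stdio.h>

#define N 3500

/* Host-side re-run of the Pi Spigot. int16_t mirrors the PDP-11 short;
   d is widened to int64_t purely so its peak can be measured safely.
   Writes N/14*4 digits plus a NUL to out and returns the largest
   value d ever held. spigot_run is an illustrative name. */
int64_t spigot_run(char *out)
{
    static int16_t r[N + 1];
    int16_t i, k, b, c = 0;
    int64_t d, max_d = 0;

    for (i = 1; i <= N; i++)
        r[i] = 2000;
    for (k = N; k > 0; k -= 14) {
        d = 0;
        i = k;
        for (;;) {
            d += r[i] * 10000L;
            if (d > max_d)
                max_d = d;              /* track the peak of the accumulator */
            b = (int16_t)(i * 2 - 1);
            r[i] = (int16_t)(d % b);
            d /= b;
            i--;
            if (i == 0)
                break;
            d *= i;
        }
        /* snprintf with size 5 writes at most four digits and a NUL */
        snprintf(out, 5, "%.4d", (int)(c + d / 10000));
        out += 4;
        c = (int16_t)(d % 10000);
    }
    return max_d;
}
```

Calling `spigot_run` with a buffer of at least `N / 14 * 4 + 1` bytes and inspecting the return value shows the accumulator going far beyond the 16-bit range while staying inside a signed 32-bit long, which is why a 16-bit `int` is not enough for `d`.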
Initially we struggled to find any C compilers that were easily runnable on a modern Linux distribution (Ubuntu 18.04) and that actually worked.
Eventually, we tracked down four possibilities:
- the original C compiler from AT&T's Unix Version 7 (V7)
- GCC (GNU C Compiler) built as a PDP-11 cross compiler
- PCC (Portable C Compiler) built as a PDP-11 cross compiler
- ACK (Amsterdam Compiler Kit) which includes a PDP-11 cross compiler as standard
The rest of this page describes our experiences trying to get each of these to successfully compile and run the Pi Spigot program. Some were trivially easy (ACK) and just worked. Some were fiendishly difficult (GCC), and involved fixing bugs in the compiler itself.
Ultimately, all four compilers produced working code that we were able to run on the PDP-11 Co Processor.
Here are links to the four chapters of the adventure:
- Chapter 1 - Compiling with the V7 Unix Compiler in SIMH and later APOUT
- Chapter 2 - Compiling with the PCC PDP-11 Cross Compiler
- Chapter 3 - Compiling with the GCC PDP-11 Cross Compiler
- Chapter 4 - Compiling with the ACK PDP-11 Cross Compiler
The following table summarises our results:
Compiler | Assembler | Maths Lib | Std C Lib | Spigot size | Spigot runtime |
---|---|---|---|---|---|
V7 Unix CC | Yes | Yes | Yes | 1950 bytes | 15.82s (5) |
PCC | No (1) | No (2) | No (3) | 1616 bytes | 17.45s (5) |
GCC | Yes | Yes | No (4) | 1536 bytes | 89.33s (5) |
ACK | Yes | Yes | Yes | 1568 bytes | 17.49s (5) |
Notes:
1. PCC doesn't include an assembler - we ended up using V7 Unix's as and GCC's pdp11-aout-as
2. PCC includes a maths lib (libpcc) but this isn't supported on the PDP-11, so we used some routines from V7 Unix
3. PCC doesn't support compiling libc for PDP-11
4. GCC doesn't support compiling libc for PDP-11
5. Benchmarked on PiTubeDirect on a Pi Zero using the Hognose build with the PDP-Debugger enabled
All the source code for the tests can be found here, including a build script that generates all of the executables.
General issues:
- The Pi Co Pro system calls use different traps to V7 Unix system calls. This means that even if libc is supported, you can't use any calls that rely on Unix system calls. Currently we have provided our own library for outc() and osword(). It might be possible to install a trap handler for some Unix system calls (like printing a character).
- The Pi Co Pro expects code to start at 0x100 - we have managed to accommodate this in all the tool chains.
- The a.out files generated by Unix V7's ld linker are not compatible with GCC's binutils tools: you get a "file truncated" error. It turns out they use an older format for the symbol table. We were able to write a tool (called mangle) to reformat the symbol table, allowing us to use pdp11-aout-objdump to generate consistent disassemblies of executables.
V7 Unix Compiler issues:
- The compiler syntax is archaic (pre K&R) but is well documented in the C Reference Manual. Specifically, function definitions are different, and the integer types are limited to: char, short, unsigned, int and long. We had to maintain separate source files for this compiler.
- There is a known issue with Unix V7's division libraries relying on undefined behaviour. This bug shows up in APOUT, but not SIMH, Matchbox or PiTubeDirect. So not really a problem for us.
- We hit a minor code generation bug (worked around here) that caused the Pi Spigot to output all zeros.
PCC issues:
- PCC doesn't include an assembler, so an external assembler is needed:
- the V7 Unix assembler as works well
- the GCC assembler pdp11-aout-as works less well for two reasons:
- the default number base is decimal not octal (we worked around this in PCC by adding 0 prefixes)
- extended branch instructions (JBR/JCC) are not supported and cannot be faked with macros
- Some bugs remain in PCC's PDP-11 code generation that affected our test programs:
- PMINI outputs 00780078 instead of 00000078
- PTEST fails on unsigned long division tests
- PSPIGIT doesn't correctly report the run time (since fixed)
GCC issues:
- We hit two serious code generation bugs - detailed and fixed below
- Enabling the optimizer breaks everything - we have not investigated this further
- The maths routines in libgcc are written in C and make no use of the PDP-11's div and mul instructions
- Consequently the generated code is very slow indeed
ACK issues:
- None at all - everything worked very well out of the box and the Pi Spigot ran very quickly
Based on this, I think ACK (the Amsterdam Compiler Kit) is currently the best option.
- PDP-11 Programming Card from 1975
- Developing for a PDP-11
- Diane's PDP-11 Page
- subgeniuskitty - PDP-11 Cross-Compiling - Building a cross compiler with GCC for pdp11-aout.
- C Programming on a bare metal PDP-11
- BBC Basic for the PDP-11 (Jonathan Harston)
- PDP-11 CoProcessor Technical Reference (Jonathan Harston)
- MMB/SSD Utils in perl (Stephen Harris)
I followed these instructions to get V7 Unix running on SIMH.
Here's a sample session compiling and running a Pi Spigot:
$ cat > pi.c
#include <stdio.h>
#define N 3500
main() {
short r[N + 1], i, k, b, c;
long d;
c = 0;
for (i = 1; i <= N; i++)
r[i] = 2000;
for (k = N; k > 0; k -= 14) {
d = 0;
i = k;
for(;;) {
d += r[i]*10000L;
b = i*2 - 1;
r[i] = d%b;
d /= b;
i--;
if (i == 0) break;
d *= i;
}
printf("%.4d", (int)(c + d/10000));
c = d%10000;
}
}
$ cc pi.c
$ ls -l a.out
-rwxrwxr-x 1 dmr 5294 Sep 22 08:55 a.out
$ file a.out
a.out: executable not stripped
$ nm -gn a.out
000000 T start
000074 T _main
000542 T _printf
000616 T __doprnt
001732 T pfloat
001732 T pgen
001732 T pscien
001744 T __strout
002244 T __flsbuf
002606 T _fflush
002730 T __cleanu
002766 T _fclose
003130 T _exit
003146 T _malloc
003640 T _free
003676 T _realloc
004150 T _isatty
004220 T _stty
004252 T _gtty
004304 T _close
004332 T _ioctl
004400 T _sbrk
004452 T _brk
004512 T _write
004554 T aldiv
005062 T almul
005136 T cerror
005154 T ldiv
005436 T lmul
005504 T lrem
005742 T csv
005756 T cret
006120 D __iob
006360 D __lastbu
006406 B __sobuf
007406 B __sibuf
010406 B _errno
010410 B _environ
010426 B _end
$ a.out
031410592605358097930238406264033830279502880419701693099370510508209074940459203078016400628602089098620803408253042110706709821048080651302823066407093084460955058202317025350940801284081110745002841027001938052110555096440622904895049300381906442088100975606593034460128407564082330786708316052710201909140564805669023460348061040543206648021330936007260024910412703724058700660063150588107488015200920906282092540917015360436708925090360110330503054088200466502138041460951904151016090433005727036507595091950309201861017380193206117093100511805480074460237909627049560735108857052720489102279038180301109491029830367303624040650664308600213904946039520247307190070210798609430702707053092170176209317067520384607481084670669405130200005681027140526305608027780577103427057780960901736037170872104684040900122409534030140654905853071050792027960892508923054200199506112012900219608640344018150981306297074770130909605018700721103499099990837209780049950105907317032810609603185095020445904553046900830206425022300825303446085030526109311088170101003103783087520886508753032080381402061071770669104730035980253409042087550468703115095620863808235037870593705195077810857708053021710226806610300109278076610119509092016420198
So this runs fine within Unix V7 on SIMH, but I'd actually like to run this on the PiTubeDirect PDP-11 Co Pro.
There are a number of problems:
- Executables on V7 Unix are compiled to run from address 0x0000 (000000) and are generally not position independent. The PDP-11 Co Pro has a table of vectors at address 0, so it expects a program to run from 0x0100 (000400).
- The executable starts with a floating point instruction (setd) that isn't present on the Co Pro.
- Unix V7 uses TRAP instructions to trap to the Kernel, with call parameters mostly embedded in the code after the trap. The PDP-11 Co Pro uses EMT instructions (emulator TRAP), with call parameters passed in registers. Somewhat incompatible!
So let's try a slightly more modern C compiler: PCC...
PCC (Portable C Compiler) is a C compiler that was written by Stephen C. Johnson of Bell Labs in the mid-1970s. A new (circa 2008) version of PCC is now maintained by Anders Magnusson. The website is here
The source was a CVS repository archive; I prefer working with git, so started by converting it:
sudo apt-get install cvs cvs2svn
cd ~/pdp11
wget http://pcc.ludd.ltu.se/ftp/pub/pcc/pcc-cvs-20220117.tgz
tar xf pcc-cvs-20220117.tgz
export CVSROOT=~/pdp11/pcc-cvs-20220117
cvs init
cvs2git --blobfile=git-blob.dat --dumpfile=git-dump.dat --fallback-encoding=utf8 $CVSROOT
mkdir pcc.git
cd pcc.git/
git init --bare
cat ../git-blob.dat ../git-dump.dat | git fast-import
cd ..
rm git-dump.dat git-blob.dat
git clone pcc.git
Building PCC as a Cross Compiler:
Two main steps:
- Build binutils for the target
- Build PCC for the target
We already have a pdp11-aout version of binutils, so we just did step two.
Configure PCC:
git checkout $(git log --pretty=oneline | grep 20211219 | cut -c1-8)
sudo apt-get install build-essential flex bison
cd pcc
sed -i 's/MANPAGE=@BINPREFIX@cpp/MANPAGE=@BINPREFIX@pcc-cpp/' cc/cpp/Makefile.in
sed -i 's/ cxxcom//' cc/Makefile.in
./configure --target=pdp11-aout-bsd --prefix=/usr/local --libexecdir=/usr/local/libexec/pcc --with-assembler=pdp11-aout-as --with-linker=pdp11-aout-ld
make
sudo make install
Notes:
- The last commit where the PDP-11 target builds seems to be the one dated 20211219.
- The first sed patches the manual path to avoid a conflict with cpp on Ubuntu (this was documented)
- The second sed prevents the C++ compiler from being built, as it's incompatible with the PDP-11 target.
Running PCC:
cat > pi.c
//#include <stdio.h>
#define N 3500
main() {
short r[N + 1], i, k, b, c;
long d;
c = 0;
for (i = 1; i <= N; i++)
r[i] = 2000;
for (k = N; k > 0; k -= 14) {
d = 0;
i = k;
for(;;) {
d += r[i]*10000L;
b = i*2 - 1;
r[i] = d%b;
d /= b;
i--;
if (i == 0) break;
d *= i;
}
// printf("%.4d", (int)(c + d/10000));
c = d%10000;
}
}
pdp11-bsd-pcc pi.c
pdp11-aout-as: unrecognised option '-V'
error: pdp11-aout-as terminated with status 1
Seems like an incompatibility with the assembler...
The assembler command being generated is:
pdp11-aout-as -V -u -o /tmp/ctm.4kVtfh /tmp/ctm.iXWzuP
The -V and -u options appear to be specific to 2.11BSD: http://pdp11.nocrew.org/binutils/as-opt.html
So it looks like the pdp11 target needs to be hosted on BSD for it to work. I could continue to hack, but I expect this will be the tip of the iceberg.
Update 21/1/2022: It was indeed the tip of the iceberg...
So the specific case the GNU assembler is failing to handle is extended branch instructions, see section 8.5 of Dennis Ritchie's UNIX Assembler Reference Manual: https://www.tom-yam.or.jp/2238/ref/as.pdf#page=8 i.e. they are effectively synthetic instructions which are not currently handled by the GNU assembler.
So I did a quick SED hack to replace these by short branch instructions.
I then found the GNU assembler failing to deal with embedded data, for example:
.data
.even
_pl:
~~pl:
35632 ; 145000
2765 ; 160400
230 ; 113200
17 ; 41100
1 ; 103240
0 ; 23420
0 ; 1750
0 ; 144
0 ; 12
0 ; 1
0 ; 0
And finally, it looks like the default base for constants is different.
For example, the start of the .s file produced by PCC includes:
_program:
~~program:
jsr r5,csv
sub $20,sp
where $20 here is an immediate octal constant (if it were decimal it would be terminated by a decimal point '.')
If I assemble this .s file (using the GNU assembler), and disassemble the result (using GNU objdump), what I see is:
0000010c <_program>:
10c: 0977 0290 jsr r5, 3a0 <csv>
110: e5c6 0014 sub $24, sp
The value has now become 0x14, or 20 decimal or 24 octal.
According to the manual, GNU assembler is assuming constants are in decimal, unless they start with a '0' digit: https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_node/as_36.html
This is different to the old BSD Unix Assembler (see the earlier link). It's not just immediate operands; it affects all constants in the file. So it affects accessing objects in the stack frame.
For example, 4 successive words in the stack frame (-12 -14 -16 -20):
mov -16(r5),-(sp)
mov -20(r5),-(sp)
mov -12(r5),-(sp)
mov -14(r5),-(sp)
become:
12e: 1d66 fff0 mov -20(r5), -(sp)
132: 1d66 ffec mov -24(r5), -(sp)
136: 1d66 fff4 mov -14(r5), -(sp)
13a: 1d66 fff2 mov -16(r5), -(sp)
Not good!
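The difference between the two conventions can be reproduced on the host with the C library's `strtol`. This is only an illustrative sketch: the function names `parse_unix_as` and `parse_gnu_as` are made up here, and neither models the full assembler syntax (for instance, the trailing '.' that marks a decimal constant in the Unix assembler is not handled).

```c
#include <stdlib.h>

/* Unix as convention: a bare constant is read as octal.
   (Illustrative name; the trailing-'.' decimal form is not handled.) */
long parse_unix_as(const char *text)
{
    return strtol(text, NULL, 8);
}

/* GNU as convention: decimal unless the constant starts with 0.
   strtol with base 0 applies the same leading-0 rule (it also
   accepts a 0x prefix for hex). */
long parse_gnu_as(const char *text)
{
    return strtol(text, NULL, 0);
}
```

With these, "20" is 16 under the Unix rule but 20 (024 octal) under the GNU rule, and the frame offset "-16" is -14 under the Unix rule but -16 (-020 octal) under the GNU rule, matching the disassembly above.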
I gave up in despair at this point and switched over to GCC.
Maybe I should try updating PCC to prefix octal values with a '0'...
...(some time later)...
That actually worked - the files I changed were:
modified: arch/pdp11/local.c
modified: arch/pdp11/local2.c
I used sed again:
$ sed -i 's/\([^0]\)%o/\10%o/g' arch/pdp11/local*.c
$ sed -i 's/\([^0]\)%llo/\10%llo/g' arch/pdp11/local*.c
This allowed me for the first time to run some simple C code and not crash horribly.
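As a quick sanity check of the idea behind the sed edits, the sketch below formats a value with "0%o" (the format string the substitution produces from "%o") and reads it back with the leading-0 rule. The helper name `roundtrip_prefixed_octal` is hypothetical and the code is not taken from PCC.

```c
#include <stdio.h>
#include <stdlib.h>

/* Formats value as "0%o", i.e. what the patched format string emits,
   then parses it the way an assembler with a "leading 0 means octal"
   rule would. Returns the parsed value; buf receives the text. */
long roundtrip_prefixed_octal(unsigned value, char *buf, size_t len)
{
    snprintf(buf, len, "0%o", value);
    return strtol(buf, NULL, 0);
}
```

For example, the value 16 is emitted as "020" and reads back as 16, whereas the unpatched "20" would have been read as decimal twenty.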
For long maths functions, the compiler generates code that calls out to functions like ldiv/lrem.
I'm currently using the implementation of these from the 10th Edition of Unix.
Unfortunately during the Pi Spigot the stack becomes unbalanced during function calls:
1d6: 1066 mov r1, -(sp)
1d8: 1026 mov r0, -(sp)
1da: 1d66 ffec mov -24(r5), -(sp)
1de: 1d66 ffea mov -26(r5), -(sp)
1e2: 09f7 0134 jsr pc, 31a <ldiv>
1e6: 65c6 000a add $12, sp
1ea: 1035 ffea mov r0, -26(r5)
1ee: 1075 ffec mov r1, -24(r5)
The value being added at 1e6 to remove the four call arguments is too large (by 2).
My guess is this is a bug in PCC, but it could conceivably be an incompatibility with the libraries I am using.
More later...
There is still a PDP-11 target present in GCC, and it seems to have been actively maintained from 2004 to 2018 by Paul Koning, so I had high hopes of it working.
For more details, see my PDP-11 GCC Cross Compiler build notes.
The first major issue I encountered was a bug in the code generation when 32-bit longs are used.
For example:
outhex32(*pp);
Which ends up as:
386: 1d40 fffe mov -2(r5), r0
38a: 1200 mov (r0), r0
38c: 1c01 0002 mov 2(r0), r1
390: 1066 mov r1, -(sp)
392: 1026 mov r0, -(sp)
394: 09f7 ff50 jsr pc, 2e8 <_outhex32>
The instructions at 38a and 38c are in the wrong order!
After lots of head scratching, it turns out the bug is in pdp11_expand_operands().
This function expands operands of the 32-bit Single Integer (SI) type to pairs of operands of the 16-bit Half Integer (HI) type. Part of the logic is to decide the order of the two 16-bit halves. And it looks like it doesn't consider the case where the destination register is also the source register. In this case, the order of the two instructions needs to be reversed.
The code to do this looks like:
/* DMB - detect the case where source [1] is an indirect access via a register that
is also used as the destination [0], and force little endian half-word order */
if (GET_CODE (operands[0]) == REG && GET_CODE (operands[1]) == MEM) {
int dstreg = REGNO (operands[0]);
int srcreg = -1;
if (GET_CODE (XEXP (operands[1], 0)) == REG) {
srcreg = REGNO (XEXP (operands[1], 0));
} else if (GET_CODE (XEXP (operands[1], 0)) == PLUS) {
if (GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG) {
srcreg = REGNO (XEXP (XEXP (operands[1], 0), 0));
} else if (GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == REG) {
srcreg = REGNO (XEXP (XEXP (operands[1], 0), 1));
}
}
if (srcreg == dstreg) {
useorder = little;
}
}
This code is rather scary, because operands are represented as small trees of rtx nodes.
So the above code is trying to match a particular pattern in the operand trees.
Operand[0] is the destination and needs to look like:
-->REG
Operand[1] is the source and needs to look like one of:
-->MEM-->REG
-->MEM-->PLUS-->REG
             -->Address
-->MEM-->PLUS-->Address
             -->REG
There are some macros which help processing these operands:
- GET_CODE(rtx) returns the type of the rtx object
- XEXP(rtx, n) follows the nth child of the rtx object
- REGNO(rtx) returns the register number of the rtx object (assuming it's a REG node)
Adding this into pdp11_expand_operands() fixed this particular code generation bug, but still the 32-bit division doesn't work.
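The register overlap can be illustrated with a toy model in C. This is a sketch only: the `Machine` type and both function names are invented for the illustration, and it models nothing beyond two word loads through r0.

```c
#include <stdint.h>

/* Toy model: 8 registers and 16 words of memory (hypothetical types). */
typedef struct {
    uint16_t r[8];
    uint16_t mem[16];   /* mem[(a / 2) % 16] holds the word at byte address a */
} Machine;

static uint16_t load_word(const Machine *m, uint16_t addr)
{
    return m->mem[(addr / 2) % 16];
}

/* Order emitted before the fix:
     mov (r0), r0
     mov 2(r0), r1                                                  */
void load_long_emitted_order(Machine *m)
{
    m->r[0] = load_word(m, m->r[0]);                  /* overwrites the pointer */
    m->r[1] = load_word(m, (uint16_t)(m->r[0] + 2));  /* uses the clobbered r0  */
}

/* Reversed order, as needed when the destination is also the base:
     mov 2(r0), r1
     mov (r0), r0                                                   */
void load_long_reversed_order(Machine *m)
{
    m->r[1] = load_word(m, (uint16_t)(m->r[0] + 2));  /* pointer still intact */
    m->r[0] = load_word(m, m->r[0]);
}
```

With r0 pointing at a long in memory, the emitted order leaves r1 holding whatever lies two bytes past the address given by the long's first word, while the reversed order loads both halves of the long correctly.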
After more debugging, the code that is failing is part of libgcc (the maths support library for gcc):
unsigned long
__udivmodsi4(unsigned long num, unsigned long den, int modwanted)
{
unsigned long bit = 1;
unsigned long res = 0;
while (den < num && bit && !(den & (1L<<31)))
{
den <<=1;
bit <<=1;
}
while (bit)
{
if (num >= den)
{
num -= den;
res |= bit;
}
bit >>=1;
den >>=1;
}
if (modwanted) return num;
return res;
}
This code works fine when compiled for Linux, but fails when compiled for the PDP-11 target.
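When reproducing this on a host, note that unsigned long is 64 bits wide on typical 64-bit Linux systems, so the routine is not exercised at the same width as on the PDP-11. A minimal sketch of a host-side check that pins the width to 32 bits is shown below; the name `udivmod32` is hypothetical, and this is a transcription for testing, not the libgcc source.

```c
#include <stdint.h>

/* Same shift-and-subtract algorithm as __udivmodsi4, with the types
   pinned to 32 bits to mirror a PDP-11 long (illustrative name). */
uint32_t udivmod32(uint32_t num, uint32_t den, int modwanted)
{
    uint32_t bit = 1;
    uint32_t res = 0;

    /* Align den with num, stopping before the top bit is shifted out. */
    while (den < num && bit && !(den & (UINT32_C(1) << 31))) {
        den <<= 1;
        bit <<= 1;
    }
    /* Subtract the shifted divisor wherever it fits. */
    while (bit) {
        if (num >= den) {
            num -= den;
            res |= bit;
        }
        bit >>= 1;
        den >>= 1;
    }
    return modwanted ? num : res;
}
```

Comparing `udivmod32(n, d, 0)` and `udivmod32(n, d, 1)` against the native `n / d` and `n % d` for a range of non-zero divisors gives a reference result to compare with the output of the cross-compiled code.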
The specific thing that's behaving incorrectly is the evaluation of this test:
!(den & (1L<<31))
GCC is (legitimately) mapping this to:
((signed long) den) >= 0
Which results in the following code (when the constant operand is zero):
160: 0bc2 tst r2
162: 0201 bne 166 <_udivmodsi4+0x5a>
164: 0bc3 tst r3
166: 04e2 bge 12c <_udivmodsi4+0x20>
Notes:
- r2 is the high word of den
- r3 is the low word of den
- BGE branches if N xor V = 0; TST sets V=0 (and C=0), so here BGE is effectively BPL
There is an intuitive argument that this code is incorrect. When comparing against zero, the final value of the N flag should only depend on r2 (the high word). In the above code, when r2=0, then N = sign(r3), which is wrong.
This code is coming from the cmpsi template in pdp11.md
This template introduces a cmpsi(a,b) instruction that in the general case produces:
;; compare the high word
cmp ahi, bhi
bne done
;; compare the low word
cmp alo, blo
done:
However, if b is zero, then the CMP instructions are replaced by the TST instructions.
;; compare the high word
tst ahi
bne done
;; compare the low word
tst alo
done:
This works because the TST instruction on the PDP-11 sets the flags identically to CMP A,#0. This optimization is not the source of the bug.
For unsigned comparisons (BHI, BHIS, BLO, BLOS), which test the C/Z bits, the above code works fine. If the high words are equal, the result is based on the comparison of the low word, which yields the correct values for C/Z.
For signed comparisons (BGT, BGE, BLT, BLE), which test the N/V/Z bits, there is a problem with this implementation, as it doesn't correctly set N/V for the 32 bits as a whole.
As GCC is using this "cmpsi" instruction for both signed and unsigned 32-bit comparisons, this is a problem.
After more head scratching, I fixed it as follows:
;; compare the high word
cmp ahi, bhi
bne done ;; A < B or A > B ;; flags correct
;; compare the low word
cmp alo, blo
beq done ;; A=B ;; Result=0 ;; N=0 Z=1 V=0 C=0
;; clear the V bit, as 32-bit overflow is impossible if ahi == bhi
clv
;; copy the C flag to the N flag
cln
bcc done
sen
done:
And for the case of B=0, this simplifies to:
;; compare the high word
tst ahi
bne done ;; A < 0 or A > 0 ;; flags correct
;; compare the low word
tst alo
beq done ;; A=B ;; Result=0 ;; N=0 Z=1 V=0 C=0
cln
done:
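Both sequences can be modelled on the host to see where the original goes wrong. The following is a sketch under simplifying assumptions: the `Flags` type and all function names are invented for this illustration, and only the condition codes of CMP are modelled (TST x behaves like CMP x,#0 here).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, v, c; } Flags;   /* hypothetical model type */

/* Condition codes after "cmp src, dst", which computes src - dst. */
static Flags cmp16(uint16_t src, uint16_t dst)
{
    uint16_t res = (uint16_t)(src - dst);
    Flags f;
    f.n = (res & 0x8000u) != 0;
    f.z = (res == 0);
    /* overflow: operands differ in sign and result has dst's sign */
    f.v = ((src ^ dst) & 0x8000u) != 0 && ((dst ^ res) & 0x8000u) == 0;
    f.c = src < dst;                          /* borrow */
    return f;
}

/* Original sequence: the flags of whichever word compare ran last. */
Flags cmpsi_original(uint32_t a, uint32_t b)
{
    Flags f = cmp16((uint16_t)(a >> 16), (uint16_t)(b >> 16));
    if (!f.z)
        return f;                             /* bne done */
    return cmp16((uint16_t)a, (uint16_t)b);   /* cmp alo, blo */
}

/* Fixed sequence: after an unequal low-word compare, clear V and
   copy C into N so that signed branches see a 32-bit result. */
Flags cmpsi_fixed(uint32_t a, uint32_t b)
{
    Flags f = cmp16((uint16_t)(a >> 16), (uint16_t)(b >> 16));
    if (!f.z)
        return f;                             /* bne done */
    f = cmp16((uint16_t)a, (uint16_t)b);      /* cmp alo, blo */
    if (f.z)
        return f;                             /* beq done */
    f.v = false;                              /* clv */
    f.n = f.c;                                /* cln ; bcc done ; sen */
    return f;
}

/* BGE is taken when N xor V is 0. */
bool bge_taken(Flags f)
{
    return f.n == f.v;
}
```

For a = 0x00008000 and b = 0 the original sequence reports "not greater or equal", because N is taken from the low word, while the fixed sequence reports the correct signed result.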
The change to pdp11.md is to add these extra instructions into the template for cmpsi:
// Correct V/N flags so signed comparisons work
output_asm_insn ("cln", NULL);
if (!CONST_INT_P (exops[1][1]) || INTVAL (exops[1][1]) != 0) {
output_asm_insn ("clv", NULL);
output_asm_insn ("bcc\t%l0", lb);
output_asm_insn ("sen", NULL);
}
And with that in place, the Test program (for 32-bit div/mod) and finally the Pi Spigot work!
BTW, all this has taken me about a week....
The test programs and associated build scripts can be found here: https://github.com/hoglet67/pdp11pispigot
There is one remaining issue: when I enable optimization (-Os or -O2) the test program still passes, but the Pi Spigot generates incorrect results.
With -Os it generates: 0000000000000000....
With -O1, -O2, -O3 it generates: 3140343800000000....
I'm currently undecided about whether to upstream the GCC compiler fixes.
ACK is the Amsterdam Compiler Kit, originally developed by Andrew Tanenbaum and Ceriel Jacobs in the 1980s, later ported to Linux by David Given. It's now an actively maintained GitHub project.
ACK includes a PDP-11 backend and support for the standard C library (libc), and doesn't require the use of a third-party assembler.