PDP 11 Co Pro Notes

Introduction

This page documents various approaches to compiling a Pi Spigot program written in C, such that it's runnable on one of the BBC Micro PDP-11 Co Processors, of which there are two:

the Pi/ARM-based PiTubeDirect Co Pro
the FPGA-based Matchbox Co Pro

In addition, the B-em emulator also contains a PDP-11 Co Pro emulation, so maybe that counts as three!

The Pi Spigot is a short C program that prints the first 1000 digits of Pi:

#include <stdio.h>
#define N 3500
main() {
   short r[N + 1], i, k, b, c;
   long d;
   c = 0;
   for (i = 1; i <= N; i++)
      r[i] = 2000;
   for (k = N; k > 0; k -= 14) {
      d = 0;
      i = k;
      for(;;) {
         d += r[i]*10000L;
         b = i*2 - 1;
         r[i] = d%b;
         d /= b;
         i--;
         if (i == 0) break;
         d *= i;
      }
      printf("%.4d", (int)(c + d/10000));
      c = d%10000;
   }
}

For an explanation of how this works, see this discussion by Ben Lynn.

One notable point about the Pi Spigot is that it requires 32-bit arithmetic (longs in C on the PDP-11).
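To make the 32-bit requirement concrete, here is a host-side sketch (not part of the original tests) that runs the same loop with a 64-bit accumulator and records the largest value d ever reaches. The name spigot_peak and the output buffer are assumptions made for illustration. On a modern host the peak should land far above what a 16-bit int can hold, while still fitting in a signed 32-bit long.

```c
#include <stdint.h>
#include <stdio.h>

#define N 3500

/* Hypothetical helper: runs the Pi Spigot with a 64-bit accumulator,
   writes the digits to out (at least 1001 bytes) and returns the
   largest value that d reached along the way. */
int64_t spigot_peak(char *out)
{
    static int16_t r[N + 1];
    int64_t d, peak = 0;
    int i, k, b, c = 0, pos = 0;

    for (i = 1; i <= N; i++)
        r[i] = 2000;
    for (k = N; k > 0; k -= 14) {
        d = 0;
        i = k;
        for (;;) {
            d += r[i] * 10000L;
            if (d > peak)
                peak = d;               /* track the widest intermediate */
            b = i * 2 - 1;
            r[i] = (int16_t)(d % b);
            d /= b;
            i--;
            if (i == 0) break;
            d *= i;
        }
        /* four digits per pass, same format as the original program */
        snprintf(out + pos, 5, "%.4d", (int)(c + d / 10000));
        pos += 4;
        c = (int)(d % 10000);
    }
    return peak;
}
```

Calling spigot_peak(buf) with a 1001-byte buffer fills buf with the digits and returns the peak, which can then be compared against 32767 and INT32_MAX.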

Initially we struggled to find any C compilers that were easily runnable on a modern Linux distribution (Ubuntu 18.04) and that actually worked.

Eventually, we tracked down four possibilities:

the original C compiler from AT&T's Unix Version 7 (V7)
GCC (GNU C Compiler) built as a PDP-11 cross compiler
PCC (Portable C Compiler) built as a PDP-11 cross compiler
ACK (Amsterdam Compiler Kit) which includes a PDP-11 cross compiler as standard

The rest of this page describes our experiences trying to get each of these to successfully compile and run the Pi Spigot program. Some were trivially easy (ACK) and just worked. Some were fiendishly difficult (GCC), and involved fixing bugs in the compiler itself.

Ultimately, all four compilers produced working code that we were able to run on the PDP-11 Co Processor.

Details

Here are links to the four chapters of the adventure:

Chapter 1 - Compiling with the V7 Unix Compiler in SIMH and later APOUT
Chapter 2 - Compiling with the PCC PDP-11 Cross Compiler
Chapter 3 - Compiling with the GCC PDP-11 Cross Compiler
Chapter 4 - Compiling with the ACK PDP-11 Cross Compiler

Summary

The following table summarises our results:

Compiler	Assembler	Maths Lib	Std C Lib	Spigot size	Spigot runtime
V7 Unix CC	Yes	Yes	Yes	1950 bytes	15.82s (5)
PCC	No (1)	No (2)	No (3)	1616 bytes	17.45s (5)
GCC	Yes	Yes	No (4)	1536 bytes	89.33s (5)
ACK	Yes	Yes	Yes	1568 bytes	17.49s (5)

Notes:

(1) PCC doesn't include an assembler - we ended up using V7 Unix's as and GCC's pdp11-aout-as
(2) PCC includes a maths lib (libpcc) but this isn't supported on the PDP-11, so we used some routines from V7 Unix
(3) PCC doesn't support compiling libc for PDP-11
(4) GCC doesn't support compiling libc for PDP-11
(5) Benchmarked on PiTubeDirect on a Pi Zero using the Hognose build with the PDP-Debugger enabled

All the source code for the tests can be found here, including a build script that generates all of the executables.

General issues:

The Pi Co Pro system calls use different traps to V7 Unix system calls. This means that even if libc is supported, you can't use any functions that rely on Unix system calls. Currently we have provided our own library for outc() and osword(). It might be possible to install a trap handler for some Unix system calls (like printing a character).
The Pi Co Pro expects code to start at 0x100 - we have managed to accommodate this in all the tool chains.
The a.out files generated by Unix V7's ld linker are not compatible with GCC's binutils tools; you get a "file truncated" error. It turns out they use an older format for the symbol table. We were able to write a tool (called mangle) to reformat the symbol table, allowing us to use pdp11-aout-objdump to generate consistent disassemblies of executables.

V7 Unix Compiler issues:

The compiler syntax is archaic (pre K&R) but is well documented in the C Reference Manual. Specifically, function definitions are different, and the integer types are limited to: char, short, unsigned, int and long. We had to maintain separate source files for this compiler.
There is a known issue with Unix V7's division libraries relying on undefined behaviour. This bug shows up in APOUT, but not SIMH, Matchbox or PiTubeDirect. So not really a problem for us.
We hit a minor code generation bug (worked around here) that caused the Pi Spigot to output all zeros.

PCC issues:

PCC doesn't include an assembler, so an external assembler is needed:
- the V7 Unix assembler as works well
- the GCC assembler pdp11-aout-as works less well for two reasons:
  - the default number base is decimal not octal (we worked around this in PCC by adding 0 prefixes)
  - extended branch instructions (JBR/JCC) are not supported and cannot be faked with macros
Some bugs remain in PCC's PDP-11 code generation that affected our test programs:
- PMINI output 00780078 instead of 00000078
- PTEST fails on unsigned long division tests
- PSPIGOT doesn't correctly report the run time (since fixed)

GCC issues:

We hit two serious code generation bugs - detailed and fixed below
Enabling the optimizer breaks everything - we have not investigated this further
The maths routines in libgcc are written in C and make no use of the PDP-11's div and mul instructions
Consequently the generated code is very slow indeed

ACK issues:

None at all - everything worked very well out of the box and the Pi Spigot ran very quickly

Based on this, I think ACK (the Amsterdam Compiler Kit) is currently the best option

Useful links

PDP-11 Programming Card from 1975
Developing for a PDP-11
Diane's PDP-11 Page
subgeniuskitty - PDP-11 Cross-Compiling - Building a cross compiler with GCC for pdp11-aout.
C Programming on a bare metal PDP-11
BBC Basic for the PDP-11 (Jonathan Harston)
PDP-11 CoProcessor Technical Reference (Jonathan Harston)
MMB/SSD Utils in perl (Stephen Harris)

Chapter 1 - Compiling with the V7 Unix Compiler in SIMH and later APOUT

I followed these instructions to get V7 Unix running on SIMH.

Here's a sample session compiling and running a Pi Spigot:

$ cat > pi.c
#include <stdio.h>
#define N 3500
main() {
   short r[N + 1], i, k, b, c;
   long d;
   c = 0;
   for (i = 1; i <= N; i++)
      r[i] = 2000;
   for (k = N; k > 0; k -= 14) {
      d = 0;
      i = k;
      for(;;) {
         d += r[i]*10000L;
         b = i*2 - 1;
         r[i] = d%b;
         d /= b;
         i--;
         if (i == 0) break;
         d *= i;
      }
      printf("%.4d", (int)(c + d/10000));
      c = d%10000;
   }
}
$ cc pi.c
$ ls -l a.out
-rwxrwxr-x 1 dmr      5294 Sep 22 08:55 a.out
$ file a.out
a.out:  executable not stripped
$ nm -gn a.out
000000 T start
000074 T _main
000542 T _printf
000616 T __doprnt
001732 T pfloat
001732 T pgen
001732 T pscien
001744 T __strout
002244 T __flsbuf
002606 T _fflush
002730 T __cleanu
002766 T _fclose
003130 T _exit
003146 T _malloc
003640 T _free
003676 T _realloc
004150 T _isatty
004220 T _stty
004252 T _gtty
004304 T _close
004332 T _ioctl
004400 T _sbrk
004452 T _brk
004512 T _write
004554 T aldiv
005062 T almul
005136 T cerror
005154 T ldiv
005436 T lmul
005504 T lrem
005742 T csv
005756 T cret
006120 D __iob
006360 D __lastbu
006406 B __sobuf
007406 B __sibuf
010406 B _errno
010410 B _environ
010426 B _end
$ a.out
031410592605358097930238406264033830279502880419701693099370510508209074940459203078016400628602089098620803408253042110706709821048080651302823066407093084460955058202317025350940801284081110745002841027001938052110555096440622904895049300381906442088100975606593034460128407564082330786708316052710201909140564805669023460348061040543206648021330936007260024910412703724058700660063150588107488015200920906282092540917015360436708925090360110330503054088200466502138041460951904151016090433005727036507595091950309201861017380193206117093100511805480074460237909627049560735108857052720489102279038180301109491029830367303624040650664308600213904946039520247307190070210798609430702707053092170176209317067520384607481084670669405130200005681027140526305608027780577103427057780960901736037170872104684040900122409534030140654905853071050792027960892508923054200199506112012900219608640344018150981306297074770130909605018700721103499099990837209780049950105907317032810609603185095020445904553046900830206425022300825303446085030526109311088170101003103783087520886508753032080381402061071770669104730035980253409042087550468703115095620863808235037870593705195077810857708053021710226806610300109278076610119509092016420198

So this runs fine within Unix V7 on SIMH, but I'd actually like to run this on the PiTubeDirect PDP-11 Co Pro.

There are a number of problems:

Executables on V7 Unix are compiled to run from address 0x0000 (000000) and are generally not position independent. The PDP-11 Co Pro has a table of vectors at address 0, so expects a program to run from 0x0100 (000400).
The executable starts with a floating point instruction (setd) that isn't present on the Co Pro.
Unix V7 uses TRAP instructions to trap to the Kernel, with call parameters mostly embedded in the code after the trap. The PDP-11 Co Pro uses EMT instructions (emulator TRAP), with call parameters passed in registers. Somewhat incompatible!

So let's try a slightly more modern C compiler: PCC...

Chapter 2 - Compiling with the PCC PDP-11 Cross Compiler

PCC (Portable C Compiler) is a C compiler that was written by Stephen C. Johnson of Bell Labs in the mid-1970s. A new (circa 2008) version of PCC is now maintained by Anders Magnusson. The website is here

The source was a CVS repository archive; I prefer working with git, so started by converting it:

sudo apt-get install cvs cvs2svn
cd ~/pdp11
wget http://pcc.ludd.ltu.se/ftp/pub/pcc/pcc-cvs-20220117.tgz
tar xf pcc-cvs-20220117.tgz
export CVSROOT=~/pdp11/pcc-cvs-20220117
cvs init
cvs2git --blobfile=git-blob.dat  --dumpfile=git-dump.dat --fallback-encoding=utf8 $CVSROOT
mkdir pcc.git
cd pcc.git/
git init --bare 
cat ../git-blob.dat ../git-dump.dat | git fast-import
cd ..
rm git-dump.dat git-blob.dat 
git clone pcc.git

Building PCC as a Cross Compiler:

See notes here and here.

Two main steps:

Build binutils for the target
Build PCC for the target

We already have pdp11-aout version of binutils, so we just did step two.

Configure PCC:

git checkout $(git log --pretty=oneline | grep 20211219 | cut -c1-8)
sudo apt-get install build-essential flex bison
cd pcc
sed -i 's/MANPAGE=@BINPREFIX@cpp/MANPAGE=@BINPREFIX@pcc-cpp/' cc/cpp/Makefile.in
sed -i 's/ cxxcom//' cc/Makefile.in 
./configure --target=pdp11-aout-bsd --prefix=/usr/local --libexecdir=/usr/local/libexec/pcc --with-assembler=pdp11-aout-as --with-linker=pdp11-aout-ld
make
sudo make install

Notes:

The last commit where the PDP-11 target builds seems to be the one dated 20211219.
The first sed patches the man page name to avoid a conflict with cpp on Ubuntu (this was documented).
The second sed prevents the C++ compiler from being built, as it isn't compatible with the PDP-11 target.

Running PCC:

cat > pi.c
//#include <stdio.h>
#define N 3500
main() {
   short r[N + 1], i, k, b, c;
   long d;
   c = 0;
   for (i = 1; i <= N; i++)
      r[i] = 2000;
   for (k = N; k > 0; k -= 14) {
      d = 0;
      i = k;
      for(;;) {
         d += r[i]*10000L;
         b = i*2 - 1;
         r[i] = d%b;
         d /= b;
         i--;
         if (i == 0) break;
         d *= i;
      }
      //      printf("%.4d", (int)(c + d/10000));
      c = d%10000;
   }
}
pdp11-bsd-pcc pi.c 
pdp11-aout-as: unrecognised option '-V'
error: pdp11-aout-as terminated with status 1

Seems like an incompatibility with the assembler...

The assembler command being generated is:

pdp11-aout-as -V -u -o /tmp/ctm.4kVtfh /tmp/ctm.iXWzuP

The -V and -u options appear to be specific to 2.11BSD: http://pdp11.nocrew.org/binutils/as-opt.html

So it looks like the pdp11 target needs to be hosted on BSD for it to work. I could continue to hack, but I expect this will be the tip of the iceberg.

Update 21/1/2022: It was indeed the tip of the iceberg...

The specific case the GNU assembler fails to handle is extended branch instructions; see section 8.5 of Dennis Ritchie's UNIX Assembler Reference Manual (https://www.tom-yam.or.jp/2238/ref/as.pdf#page=8). These are effectively synthetic instructions, which the GNU assembler does not currently handle.

So I did a quick sed hack to replace these with short branch instructions.
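A sketch of why that hack is fragile: a short PDP-11 branch holds a signed 8-bit word offset relative to the updated PC, so it only reaches 128 words backwards or 127 words forwards. The helper below (fits_short_branch is a hypothetical name, not something from as or binutils) checks whether a given branch could be assembled in the short form. As I understand the manual, an assembler that supports the extended branches falls back to a jmp when the target is out of range.

```c
#include <stdint.h>

/* Returns 1 if a short branch located at insn_addr can reach target.
   The offset is counted in words from the updated PC (insn_addr + 2)
   and must fit in a signed 8-bit field. */
int fits_short_branch(uint16_t insn_addr, uint16_t target)
{
    int32_t delta = (int32_t)target - ((int32_t)insn_addr + 2);

    if (delta % 2 != 0)
        return 0;               /* instructions are word aligned */
    delta /= 2;
    return delta >= -128 && delta <= 127;
}
```

Any branch in the PCC output for which this returns 0 would be silently mis-assembled (or rejected) after a blanket replacement with short branches.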

I then found the GNU assembler failing to deal with embedded data, for example:

.data
.even
_pl:
~~pl:
35632 ; 145000
2765 ; 160400
230 ; 113200
17 ; 41100
1 ; 103240
0 ; 23420
0 ; 1750
0 ; 144
0 ; 12
0 ; 1
0 ; 0

And finally, it looks like the default base for constants is different.

For example, the start of the .s file produced by PCC includes:

_program:
~~program:
jsr r5,csv
sub $20,sp

where $20 here is an immediate octal constant (if it were decimal, it would be terminated by a decimal point '.').

If I assemble this .s file (using the GNU assembler), and disassemble the result (using GNU objdump), what I see is:

0000010c <_program>:
 10c:   0977 0290       jsr r5, 3a0 <csv>
 110:   e5c6 0014       sub $24, sp

The value has now become 0x14, or 20 decimal or 24 octal.

According to the manual, GNU assembler is assuming constants are in decimal, unless they start with a '0' digit: https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_node/as_36.html

This is different to the old BSD Unix Assembler (see the earlier link). It's not just immediate operands; it affects all constants in the file. So it affects accessing objects in the stack frame.

For example, 4 successive words in the stack frame (-12 -14 -16 -20):

mov     -16(r5),-(sp)
mov     -20(r5),-(sp)
mov     -12(r5),-(sp)
mov     -14(r5),-(sp)

become:

 12e:   1d66 fff0       mov -20(r5), -(sp)
 132:   1d66 ffec       mov -24(r5), -(sp)
 136:   1d66 fff4       mov -14(r5), -(sp)
 13a:   1d66 fff2       mov -16(r5), -(sp)

Not good!
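The two conventions can be mimicked on the host with strtol. This is only a sketch: unix_as_value and gnu_as_value are illustrative names of mine, and strtol with base 0 is an approximation of the GNU rules (decimal by default, a leading 0 for octal, 0x for hex) rather than the assembler's actual parser.

```c
#include <stdlib.h>
#include <string.h>

/* Unix as convention: digits are octal unless followed by a '.' */
long unix_as_value(const char *tok)
{
    size_t len = strlen(tok);

    if (len > 0 && tok[len - 1] == '.')
        return strtol(tok, NULL, 10);   /* strtol stops at the '.' */
    return strtol(tok, NULL, 8);
}

/* GNU as convention (approximated): decimal unless prefixed with 0 */
long gnu_as_value(const char *tok)
{
    return strtol(tok, NULL, 0);
}
```

With these, the operand -16 from the PCC output is -14 decimal under the Unix rules but -16 decimal (that is, -20 octal) under the GNU rules, which matches the shifted stack offsets in the disassembly above.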

I gave up in despair at this point and switched over to GCC.

Maybe I should try updating PCC to prefix octal values with a '0'...

...(some time later)...

That actually worked - the files I changed were:

        modified:   arch/pdp11/local.c
        modified:   arch/pdp11/local2.c

I used sed again:

$ sed -i 's/\([^0]\)%o/\10%o/g' arch/pdp11/local*.c
$ sed -i 's/\([^0]\)%llo/\10%llo/g' arch/pdp11/local*.c

This allowed me, for the first time, to run some simple C code and not crash horribly.
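The effect of the extra 0 can be checked on the host. In this sketch (reread_as_gnu is a made-up helper, and strtol with base 0 again stands in for the GNU assembler's number parsing) a value is printed the way the compiler would emit it and then read back the way the assembler would interpret it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Prints value in octal, with or without the leading 0 that the sed
   patch adds, then reads the text back using C-style base detection. */
long reread_as_gnu(unsigned value, int with_zero_prefix)
{
    char text[32];

    if (with_zero_prefix)
        snprintf(text, sizeof text, "0%o", value);
    else
        snprintf(text, sizeof text, "%o", value);
    return strtol(text, NULL, 0);
}
```

Without the prefix, 16 is printed as 20 and read back as twenty; with the prefix it is printed as 020 and survives the round trip.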

For long maths functions, the compiler generates code that calls out to functions like ldiv/lrem.

I'm currently using the implementation of these from 10th Edition of Unix.

Unfortunately during the Pi Spigot the stack becomes unbalanced during function calls:

 1d6:   1066            mov r1, -(sp)
 1d8:   1026            mov r0, -(sp)
 1da:   1d66 ffec       mov -24(r5), -(sp)
 1de:   1d66 ffea       mov -26(r5), -(sp)
 1e2:   09f7 0134       jsr pc, 31a <ldiv>
 1e6:   65c6 000a       add $12, sp
 1ea:   1035 ffea       mov r0, -26(r5)
 1ee:   1075 ffec       mov r1, -24(r5)

The value being added at 1e6 to remove the four argument words is too large by 2: four words occupy 8 bytes (10 octal), but the code adds 12 octal (10 decimal).

My guess is this is a bug in PCC, but it could conceivably be an incompatibility with the libraries I am using.

More later...

Chapter 3 - Compiling with the GCC PDP-11 Cross Compiler

There is still a PDP-11 target present in GCC, and it seems to have been actively maintained from 2004 to 2018 by Paul Koning, so I had high hopes of it working.

For more details, see my PDP-11 GCC Cross Compiler build notes.

The first major issue I encountered was a bug in the code generation when 32-bit longs are used.

For example:

      outhex32(*pp);

Which ends up as:

  386:    1d40 fffe          mov    -2(r5), r0
  38a:    1200               mov    (r0), r0
  38c:    1c01 0002          mov    2(r0), r1
  390:    1066               mov    r1, -(sp)
  392:    1026               mov    r0, -(sp)
  394:    09f7 ff50          jsr    pc, 2e8 <_outhex32>

The instructions at 38a and 38c are in the wrong order!
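To see why the order matters, here is a small host-side simulation of the two loads. It is only a model: memory is a tiny array of 16-bit words, and the function names are invented for this illustration. The first function follows the generated code, the second swaps the two moves.

```c
#include <stdint.h>

#define MEM_WORDS 32

/* Fetch the 16-bit word at a byte address from the toy memory. */
static uint16_t word_at(const uint16_t *mem, uint16_t addr)
{
    return mem[(addr / 2) % MEM_WORDS];
}

/* Order emitted by the unpatched compiler. */
uint32_t load_long_buggy(const uint16_t *mem, uint16_t ptr)
{
    uint16_t r0 = ptr, r1;

    r0 = word_at(mem, r0);                  /* mov (r0), r0  - pointer is lost   */
    r1 = word_at(mem, (uint16_t)(r0 + 2));  /* mov 2(r0), r1 - reads wrong place */
    return ((uint32_t)r0 << 16) | r1;
}

/* Same two instructions in the opposite order. */
uint32_t load_long_fixed(const uint16_t *mem, uint16_t ptr)
{
    uint16_t r0 = ptr, r1;

    r1 = word_at(mem, (uint16_t)(r0 + 2));  /* mov 2(r0), r1 - pointer intact    */
    r0 = word_at(mem, r0);                  /* mov (r0), r0                      */
    return ((uint32_t)r0 << 16) | r1;
}
```

With the long 0x00205678 stored at address 0x10, the fixed order returns it unchanged, while the buggy order returns the right high word paired with whatever word happens to live at address 0x22.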

After lots of head scratching, it turns out the bug is in pdp11_expand_operands().

This function expands operands of the 32-bit Single Integer (SI) type into pairs of operands of the 16-bit Half Integer (HI) type. Part of the logic decides the order of the two 16-bit halves, and it looks like it doesn't consider the case where the destination register is also used to address the source. In this case, the order of the two instructions needs to be reversed.

The code to do this looks like:

      /* DMB - detect the case where source [1] is an indirect access via a register that
         is also used as the destination [0], and force little endian half-word order */
      if (GET_CODE (operands[0]) == REG && GET_CODE (operands[1]) == MEM) {
         int dstreg = REGNO (operands[0]);
         int srcreg = -1;
         if (GET_CODE (XEXP (operands[1], 0)) == REG) {
            srcreg = REGNO (XEXP (operands[1], 0));
         } else if (GET_CODE (XEXP (operands[1], 0)) == PLUS) {
            if (GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG) {
               srcreg = REGNO (XEXP (XEXP (operands[1], 0), 0));
            } else if (GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == REG) {
               srcreg = REGNO (XEXP (XEXP (operands[1], 0), 1));
            }
         }
         if (srcreg == dstreg) {
            useorder = little;
         }
      }

This code is rather scary, because operands are represented as small trees of rtx nodes.

So the above code is trying to match a particular pattern in the operand trees.

Operand[0] is the destination and needs to look like:

-->REG

Operand[1] is the source and needs to look like one of:

-->MEM-->REG

-->MEM-->PLUS-->REG
             -->Address

-->MEM-->PLUS-->Address
             -->REG

There are some macros which help processing these operands:

GET_CODE(rtx) returns the type of the rtx object
XEXP(rtx, n) follows the nth child of the rtx object
REGNO(rtx) returns the register number of the rtx object (assuming it's a REG node)
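The pattern match can be tried out away from GCC with a toy tree. Everything below is a stand-in: toy_node, toy_code and the two functions are invented for this sketch and are not GCC's real types or macros. They only mirror the shape of the check, with the code field playing the role of GET_CODE, kid[n] the role of XEXP and regno the role of REGNO.

```c
#include <stddef.h>

enum toy_code { TOY_REG, TOY_MEM, TOY_PLUS, TOY_CONST };

struct toy_node {
    enum toy_code code;             /* node type                      */
    int regno;                      /* only meaningful for TOY_REG    */
    const struct toy_node *kid[2];  /* children, NULL when absent     */
};

/* Returns the register used to address a MEM operand, or -1. */
int toy_address_reg(const struct toy_node *op)
{
    const struct toy_node *addr;

    if (op == NULL || op->code != TOY_MEM || op->kid[0] == NULL)
        return -1;
    addr = op->kid[0];
    if (addr->code == TOY_REG)
        return addr->regno;                         /* MEM -> REG          */
    if (addr->code == TOY_PLUS) {
        if (addr->kid[0] != NULL && addr->kid[0]->code == TOY_REG)
            return addr->kid[0]->regno;             /* MEM -> PLUS -> REG  */
        if (addr->kid[1] != NULL && addr->kid[1]->code == TOY_REG)
            return addr->kid[1]->regno;             /* REG as second child */
    }
    return -1;
}

/* Non-zero when the destination register also addresses the source. */
int toy_needs_swapped_order(const struct toy_node *dst,
                            const struct toy_node *src)
{
    return dst != NULL && dst->code == TOY_REG
        && toy_address_reg(src) == dst->regno;
}
```

Building a MEM node over r0 and asking whether a load into r0 needs the swapped order returns 1; the same source loaded into r1 returns 0.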

Adding the srcreg/dstreg check shown above into pdp11_expand_operands() fixed this particular code generation bug, but the 32-bit division still didn't work.

After more debugging, the code that is failing is part of libgcc (the maths support library for gcc):

unsigned long
__udivmodsi4(unsigned long num, unsigned long den, int modwanted)
{
  unsigned long bit = 1;
  unsigned long res = 0;
  while (den < num && bit && !(den & (1L<<31)))
    {
      den <<=1;
      bit <<=1;
    }
  while (bit)
    {
      if (num >= den)
      {
        num -= den;
        res |= bit;
      }
      bit >>=1;
      den >>=1;
    }
  if (modwanted) return num;
  return res;
}

This code works fine when compiled for Linux, but fails when compiled for the PDP-11 target.

The specific thing that's behaving incorrectly is the evaluation of this test:

 !(den & (1L<<31))

GCC is (legitimately) mapping this to:

((signed long) den) >= 0

Which results in the following code (when the constant operand is zero)

 160:   0bc2            tst     r2
 162:   0201            bne     166 <_udivmodsi4+0x5a>
 164:   0bc3            tst     r3
 166:   04e2            bge     12c <_udivmodsi4+0x20>

Notes:

r2 is the high word of den
r3 is the low word of den
BGE branches if N xor V = 0; TST sets V=0 (and C=0), so this is effectively BPL

There is an intuitive argument that this code is incorrect. When comparing against zero, the final value of the N flag should only depend on r2 (the high word). In the above code, when r2=0, then N = sign(r3), which is wrong.

This code is coming from the cmpsi template in pdp11.md

This template introduces a cmpsi(a,b) instruction that in the general case produces:

;; compare the high word
    cmp     ahi, bhi
    bne     done
;; compare the low word
    cmp     alo, blo
done:

However, if b is zero, then the CMP instructions are replaced by the TST instructions.

;; compare the high word
    tst     ahi
    bne     done
;; compare the low word
    tst     alo
done:

This works because the TST instruction on the PDP-11 sets the flags identically to CMP A,#0. This optimization is not the source of the bug.

For unsigned comparisons (BHI, BHIS, BLO, BLOS), which test the C/Z bits, the above code works fine. If the high words are equal, the result is based on the comparison of the low word, which yields the correct values for C/Z.

For signed comparisons (BGT, BGE, BLT, BLE), which test the N/V/Z bits, there is a problem with this implementation, as it doesn't correctly set N/V for the 32 bits as a whole.

As GCC is using this "cmpsi" instruction for both signed and unsigned 32-bit comparisons, this is a problem.

After more head scratching, I fixed it as follows:

;; compare the high word
    cmp     ahi, bhi
    bne     done        ;; A < B or A > B  ;; flags correct
;; compare the low word
    cmp     alo, blo
    beq     done        ;; A=B ;; Result=0 ;; N=0 Z=1 V=0 C=0
;; clear the V bit, as 32-bit overflow is impossible if ahi == bhi
    clv
;; copy the C flag to the N flag
    cln
    bcc     done
    sen
done:

And for the case of B=0, this simplifies to:

;; compare the high word
    tst     ahi
    bne     done        ;; A < 0 or A > 0  ;; flags correct
;; compare the low word
    tst     alo
    beq     done        ;; A=B ;; Result=0 ;; N=0 Z=1 V=0 C=0
    cln
done:
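Both sequences can be modelled on the host to check the reasoning. This is a sketch under my own assumptions: cmp16 computes the condition codes of a 16-bit CMP as I understand them from the PDP-11 documentation, and the struct and function names are made up for the illustration. Passing b = 0 covers the TST variant, since TST sets the flags like a compare against zero.

```c
#include <stdint.h>

struct flags { int n, z, v, c; };

/* Condition codes after CMP src,dst (the subtraction src - dst). */
struct flags cmp16(uint16_t src, uint16_t dst)
{
    struct flags f;
    uint16_t res = (uint16_t)(src - dst);

    f.n = (res & 0x8000u) != 0;
    f.z = (res == 0);
    f.v = (((src ^ dst) & (src ^ res)) & 0x8000u) != 0; /* signed overflow */
    f.c = (src < dst);                                  /* borrow          */
    return f;
}

/* Original template: cmp high words; bne done; cmp low words. */
struct flags cmpsi_original(uint32_t a, uint32_t b)
{
    struct flags f = cmp16((uint16_t)(a >> 16), (uint16_t)(b >> 16));

    if (!f.z)
        return f;
    return cmp16((uint16_t)a, (uint16_t)b);
}

/* Fixed template: when the low words decide, V is cleared and N
   takes the value of C (clv; cln; bcc done; sen). */
struct flags cmpsi_fixed(uint32_t a, uint32_t b)
{
    struct flags f = cmpsi_original(a, b);

    if ((a >> 16) != (b >> 16) || f.z)
        return f;
    f.v = 0;
    f.n = f.c;
    return f;
}

/* BLT is taken when N xor V = 1. */
int blt_taken(struct flags f)
{
    return f.n ^ f.v;
}
```

For a = 0x00008000 and b = 0 the original model takes BLT (claiming 32768 < 0), whereas the fixed model does not; values whose high words differ behave the same in both.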

The change to pdp11.md is to add these extra instructions into the template for cmpsi:

  // Correct V/N flags so signed comparisons work
  output_asm_insn ("cln", NULL);
  if (!CONST_INT_P (exops[1][1]) || INTVAL (exops[1][1]) != 0) {
   output_asm_insn ("clv", NULL);
   output_asm_insn ("bcc\t%l0", lb);
   output_asm_insn ("sen", NULL);
  }

And with that in place, the test program (for 32-bit div/mod) and, finally, the Pi Spigot work!

BTW, all this has taken me about a week....

The test programs and associated build scripts can be found here: https://github.com/hoglet67/pdp11pispigot

There is one remaining issue: when I enable optimization (-Os or -O2) the test program still passes, but the Pi Spigot generates incorrect results.

With -Os it generates: 0000000000000000....

With -O1, -O2, -O3 it generates: 3140343800000000....

I'm currently undecided about whether to upstream the GCC compiler fixes.

Chapter 4 - Compiling with the ACK PDP-11 Cross Compiler

ACK is the Amsterdam Compiler Kit, originally developed by Andrew Tanenbaum and Ceriel Jacobs in the 1980s, later ported to Linux by David Given. It's now an actively maintained GitHub project.

ACK includes a PDP-11 backend and support for the standard C library (libc), and doesn't require the use of a third-party assembler.