Implementing a New Co Processor

Overview

This tutorial should guide you through adding a new Co Processor to PiTubeDirect.

It uses the task of adding a Z80 Co Processor as a concrete example.

Create the Co Processor Skeleton

First, create a header file for the new Co Processor, declaring a single function named copro_xxx_emulator():

src/copro-z80.h

// copro-z80.h
#ifndef COPRO_Z80_H
#define COPRO_Z80_H

extern void copro_z80_emulator();

#endif

Next, create a skeleton implementation of this function:

src/copro-z80.c

/*
 * Z80 Co Pro Emulation
 *
 * (c) 2016 David Banks
 */
#include <stdio.h>
#include <string.h>
#include "tube-defs.h"
#include "tube.h"
#include "tube-ula.h"

static void copro_z80_poweron_reset() {
   // Wipe memory
   // TODO: add something here
}

static void copro_z80_reset() {
  // Log ARM performance counters
  tube_log_performance_counters();

  // Re-instate the Tube ROM on reset
  // TODO: add something here

  // Reset the processor
  // TODO: add something here

  // Do a tube reset
  tube_reset();

  // Reset ARM performance counters
  tube_reset_performance_counters();
}

void copro_z80_emulator()
{
   static unsigned int last_rst = 0;

   // Remember the current copro so we can exit if it changes
   int last_copro = copro;

   copro_z80_poweron_reset(); 
   copro_z80_reset();
  
   while (1)
   {
      // Execute emulator for one instruction
      // TODO: add something here

      if (tube_mailbox & ATTN_MASK) {
         unsigned int tube_mailbox_copy = tube_mailbox;
         tube_mailbox &= ~(ATTN_MASK | OVERRUN_MASK);
         unsigned int intr = tube_io_handler(tube_mailbox_copy);
         unsigned int nmi = intr & 2;
         unsigned int rst = intr & 4;
         // Reset the processor on active edge of rst
         if (rst && !last_rst) {
            // Exit if the copro has changed
            if (copro != last_copro) {
               break;
            }
            copro_z80_reset();
            // Wait for rst to become inactive before continuing to execute
            tube_wait_for_rst_release();
         }
         // NMI is edge sensitive, so only check after mailbox activity
         if (nmi) {
            // TODO: add something to call the emulator NMI function here
         }
         last_rst = rst;
      }
      // IRQ is level sensitive, so check between every instruction
      if (tube_irq & 1) {
         // TODO: check if the emulator IRQ is enabled
         //if () {
            // TODO: add something to call the emulator IRQ function here
         //}
      }
   }
}

Now decide what co processor number(s) you want to assign to the new coprocessor. If possible, be consistent with the numbering used by the Matchbox Co Processor. In the case of the Z80, numbers 4-7 will be used. Multiple numbers might be used, for example, to select different emulation speeds.

The new co processor can now be added into the top-level application.

Three changes need to be made:

Add a #include for the header file:

src/tube-client.c

#include "copro-lib6502.h"
#include "copro-65tube.h"
#include "copro-80186.h"
#include "copro-arm2.h"
#include "copro-32016.h"
#include "copro-null.h"
#include "copro-z80.h"   // New emulator

Edit the emulator_names[] array to make sure the elements that correspond to your new Co Processor are correctly named:

src/tube-client.c

static const char * emulator_names[] = {
   "65C02 (65tube)",
   "65C02 (65tube)",
   "65C02 (lib6502)",
   "65C02 (lib6502)",
   "Z80",               // updated entry 4
   "Z80",               // updated entry 5
   "Z80",               // updated entry 6
   "Z80",               // updated entry 7
   "80286",
   "6809",
   "68000",
   "PDP11",
   "ARM2",
   "32016",
   "Null/SPI",
   "BIST"
};

And do the same for the emulator_functions[] array:

src/tube-client.c

static const func_ptr emulator_functions[] = {
   copro_65tube_emulator,
   copro_lib6502_emulator,
   copro_65tube_emulator,
   copro_lib6502_emulator,
   copro_z80_emulator,  // updated entry 4
   copro_z80_emulator,  // updated entry 5
   copro_z80_emulator,  // updated entry 6
   copro_z80_emulator,  // updated entry 7
   copro_80186_emulator,
   copro_null_emulator,
   copro_null_emulator,
   copro_null_emulator,
   copro_arm2_emulator,
   copro_32016_emulator,
   copro_null_emulator,
   copro_null_emulator
};
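
The two arrays above are parallel tables indexed by co processor number, so they have to stay the same length and in the same order. As a minimal sketch of that idea (the demo_ names and the three-entry tables are hypothetical, not PiTubeDirect code), a compile-time check can catch the tables drifting apart:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*func_ptr)(void);

/* Stand-in emulators: each just records which table slot ran. */
static int demo_last_run = -1;
static void demo_6502_emulator(void) { demo_last_run = 0; }
static void demo_z80_emulator(void)  { demo_last_run = 1; }
static void demo_null_emulator(void) { demo_last_run = 2; }

static const char *demo_names[] = { "65C02", "Z80", "Null" };
static const func_ptr demo_functions[] = {
   demo_6502_emulator, demo_z80_emulator, demo_null_emulator
};

#define DEMO_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Fail the build if the two tables ever get out of step. */
_Static_assert(DEMO_ARRAY_LEN(demo_names) == DEMO_ARRAY_LEN(demo_functions),
               "name and function tables must have the same length");

/* Run the emulator for a copro number and return its name,
   or NULL if the number is out of range. */
static const char *demo_run_copro(unsigned int copro) {
   if (copro >= DEMO_ARRAY_LEN(demo_functions)) {
      return NULL;
   }
   demo_functions[copro]();
   return demo_names[copro];
}
```

The length check does not catch entries that are in the wrong order, so the "updated entry N" comments are still worth keeping.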

Edit the add_executable() section of the CMakeLists.txt file to include the new files:

src/CMakeLists.txt

# Z80
    copro-z80.h
    copro-z80.c

At this point, everything should compile correctly without warnings:

dmb@quadhog:~/PiTubeDirect/src$ cd scripts/
dmb@quadhog:~/PiTubeDirect/src/scripts$ make
-- Configuring done
-- Generating done
-- Build files have been written to: ~/PiTubeDirect/src/scripts
Scanning dependencies of target tube-client
[  2%] Building C object CMakeFiles/tube-client.dir/tube-client.c.obj
[  5%] Building C object CMakeFiles/tube-client.dir/copro-z80.c.obj
Linking C executable tube-client
Convert the ELF output file to a binary image
[100%] Built target tube-client

Add the processor emulation

That's all the boilerplate done; now to implement the actual emulator. In all likelihood, you won't be writing this from scratch. In the case of the Z80, we'll try using YAZE.

Often, existing emulators are big, complex things that emulate complete systems, whereas we just need the processor part. So try to be as minimal as possible, and start by pulling in just the processor:

In the case of yaze, we start by adding just: src/yaze/mem_mmu.h src/yaze/simz80.h src/yaze/simz80.c

And as before, add these new files to CMakeLists.txt:

src/CMakeLists.txt

# Z80
    copro-z80.h
    copro-z80.c
    yaze/mem_mmu.h
    yaze/simz80.h
    yaze/simz80.c

At that point we can try another Make.

It compiles, but there are several undefined references.

Looking at simz80.h, several things are declared as extern, i.e. elsewhere in yaze:

src/yaze/simz80.h

/* two sets of accumulator / flags */
extern WORD af[2];
extern int af_sel;

/* two sets of 16-bit registers */
extern struct ddregs {
	WORD bc;
	WORD de;
	WORD hl;
} regs[2];
extern int regs_sel;

extern WORD ir;
extern WORD ix;
extern WORD iy;
extern WORD sp;
extern WORD pc;
extern WORD IFF;

For now, let's add definitions of these into simz80.c, just after the #includes:

src/yaze/simz80.c

#include "mem_mmu.h"
#include "simz80.h"
/* Z80 registers */
WORD af[2];			/* accumulator and flags (2 banks) */
int af_sel;			/* bank select for af */

struct ddregs regs[2];		/* bc,de,hl */
int regs_sel;			/* bank select for ddregs */

WORD ir;			/* other Z80 registers */
WORD ix;
WORD iy;
WORD sp;
WORD pc;
WORD IFF;

After building this, we end up with just three undefined references:

in - a function called when the Z80 reads I/O space
out - a function called when the Z80 writes I/O space
ram - a byte array representing main memory

Map IO accesses onto the tube parasite interface

The functions in and out come from here, and can be redefined by changing the macro definitions:

src/yaze/simz80.h

extern int in(unsigned int);
extern void out(unsigned int, unsigned char);
#define Input(port) in(port)
#define Output(port, value) out(port,value)

So let's change these to point back to functions in our co-processor skeleton:

src/yaze/simz80.h

extern int copro_z80_read_io(unsigned int);
extern void copro_z80_write_io(unsigned int, unsigned char);
#define Input(port) copro_z80_read_io(port)
#define Output(port, value) copro_z80_write_io(port,value)

And add the corresponding functions to copro-z80.c:

src/copro-z80.c

int copro_z80_read_io(unsigned int addr) {
   return tube_parasite_read(addr & 7);
}

void copro_z80_write_io(unsigned int addr, unsigned char data) {
   tube_parasite_write(addr & 7, data);
}

As the tube is the only I/O device, we start by mapping all I/O addresses onto the tube.
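
The effect of the addr & 7 mask is that every Z80 port aliases onto one of eight tube addresses. The following is a small sketch of that aliasing only; the demo_tube_regs array is a hypothetical stand-in for the real tube (which does not simply read back what was written):

```c
#include <assert.h>

/* Hypothetical stand-in for the eight parasite-side tube addresses. */
static unsigned char demo_tube_regs[8];

static int demo_read_io(unsigned int addr) {
   return demo_tube_regs[addr & 7];       /* only address bits 0..2 matter */
}

static void demo_write_io(unsigned int addr, unsigned char data) {
   demo_tube_regs[addr & 7] = data;
}
```

With this mapping, ports 0x05, 0x0D and 0xFF05 all select the same register. That is convenient while the tube is the only device, but it would need narrowing if another I/O device were ever added.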

Implement main memory

So now there is just ram to deal with, which is used via macros in mem_mmu.h:

src/yaze/mem_mmu.h

/* Some important macros. They are the interface between an access from
   the simz80-/yaze-Modules and the method of the memory access: */
#define GetBYTE(a)	RAM(a)
#define GetBYTE_pp(a)	RAM_pp(a)
#define GetBYTE_mm(a)	RAM_mm(a)
#define mm_GetBYTE(a)	mm_RAM(a)
#define PutBYTE(a, v)	RAM(a) = v
#define PutBYTE_pp(a,v)	RAM_pp(a) = v
#define PutBYTE_mm(a,v)	RAM_mm(a) = v
#define GetWORD(a)	(RAM(a) | (RAM((a)+1) << 8))

We could just define a 64KB byte array called ram, but there is a complication. On the Z80 Co Processor, on RST or NMI, the ROM is overlaid onto the address map (at address 0). We somehow have to implement this behaviour.

This interface is unfortunately quite complex, but it seems the best tack is to remap the GetBYTE/PutBYTE layer onto a couple of functions that can deal with the ROM overlay. So here goes:

src/yaze/mem_mmu.h

/* Some important macros. They are the interface between an access from
   the simz80-/yaze-Modules and the method of the memory access: */
#define GetBYTE(a)	    copro_z80_read_mem(a)
#define GetBYTE_pp(a)	 copro_z80_read_mem( (a++) )
#define GetBYTE_mm(a)	 copro_z80_read_mem( (a--) )
#define mm_GetBYTE(a)	 copro_z80_read_mem( (--a) )
#define PutBYTE(a, v)	 copro_z80_write_mem(a, v)
#define PutBYTE_pp(a,v)	 copro_z80_write_mem( (a++) , v)
#define PutBYTE_mm(a,v)	 copro_z80_write_mem( (a--) , v)
#define mm_PutBYTE(a,v)	 copro_z80_write_mem( (--a) , v)
#define GetWORD(a)	    (copro_z80_read_mem(a) | ( copro_z80_read_mem( (a) + 1) << 8) )
#define PutWORD(a, v)    { copro_z80_write_mem( (a), (BYTE)((v) & 0xFF) ); copro_z80_write_mem( ((a)+1), (v)>>8 ); }
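
Note that the _pp and mm_ variants modify their address argument, so it must be a plain variable and each macro must use it exactly once. The sketch below mirrors three of the macros against a hypothetical demo_mem array (all DEMO_/demo_ names are illustrative, not part of yaze) to show the side effects and the little-endian word layout:

```c
#include <assert.h>

static unsigned char demo_mem[0x10000];

static int demo_read_mem(unsigned int addr) {
   return demo_mem[addr & 0xffff];
}

static void demo_write_mem(unsigned int addr, unsigned char data) {
   demo_mem[addr & 0xffff] = data;
}

/* Same shapes as the macros above, pointed at the demo functions. */
#define DEMO_GetBYTE_pp(a)    demo_read_mem((a++))
#define DEMO_mm_PutBYTE(a, v) demo_write_mem((--a), (v))
#define DEMO_GetWORD(a)       (demo_read_mem(a) | (demo_read_mem((a) + 1) << 8))

/* Store 0x1234 just below 0x4002 with two pre-decrement writes
   (high byte first), then read it back as a word. */
static unsigned int demo_store_and_fetch(void) {
   unsigned int a = 0x4002;
   DEMO_mm_PutBYTE(a, 0x12);   /* a becomes 0x4001 */
   DEMO_mm_PutBYTE(a, 0x34);   /* a becomes 0x4000 */
   return (unsigned int)DEMO_GetWORD(a);
}

/* Two post-increment reads advance the address by two. */
static unsigned int demo_addr_after_two_reads(unsigned int a) {
   (void)DEMO_GetBYTE_pp(a);
   (void)DEMO_GetBYTE_pp(a);
   return a;
}
```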

Now, let's start implementing copro_z80_read_mem and copro_z80_write_mem in copro-z80.c:

src/copro-z80.c

int overlay_rom = 0;

unsigned char copro_z80_ram[0x10000];

unsigned char copro_z80_rom[0x1000] = {
   // TODO - add client ROM later
};

int copro_z80_read_mem(unsigned int addr) {
   if (addr >= 0x8000) {
      overlay_rom = 0;
   }
   if (overlay_rom) {
      return copro_z80_rom[addr & 0xfff];
   } else {
      return copro_z80_ram[addr & 0xffff];
   }
}

void copro_z80_write_mem(unsigned int addr, unsigned char data) {
   copro_z80_ram[addr & 0xffff] = data;
}

We just need to remember to set overlay_rom to 1 on reset, and when there is an NMI.
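
To make the overlay behaviour concrete, here is a self-contained walk through one reset using the same logic as copro_z80_read_mem. The demo_ names and the marker bytes are hypothetical and chosen only for illustration:

```c
#include <assert.h>

static int demo_overlay = 0;
static unsigned char demo_ram[0x10000];
static unsigned char demo_rom[0x1000];

static int demo_read(unsigned int addr) {
   if (addr >= 0x8000) {
      demo_overlay = 0;          /* first high access pages the ROM out */
   }
   if (demo_overlay) {
      return demo_rom[addr & 0xfff];
   }
   return demo_ram[addr & 0xffff];
}

/* Walk through a reset; returns 1 if every step behaved as described. */
static int demo_overlay_sequence(void) {
   demo_rom[0] = 0xF3;           /* arbitrary marker bytes */
   demo_ram[0] = 0x00;
   demo_ram[0x8000] = 0xC9;

   demo_overlay = 1;                         /* reset or NMI             */
   if (demo_read(0x0000) != 0xF3) return 0;  /* ROM visible at address 0 */
   if (demo_read(0x8000) != 0xC9) return 0;  /* high read turns it off   */
   if (demo_read(0x0000) != 0x00) return 0;  /* RAM visible at 0 again   */
   return 1;
}
```

One consequence of this simple scheme is that, while the overlay is active, the 4KB ROM is mirrored throughout the lower 32KB, because the ROM index is just addr & 0xfff.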

After compiling this, there are still a couple of undefined references to the old ram[] array.

These come from: src/yaze/simz80.c

#define POP(x)	do {                    \
        FASTREG y = RAM_pp(SP);         \
        x = y + (RAM_pp(SP) << 8);      \
} while (0)

#define PUSH(x) do {                    \
        mm_RAM(SP) = (x) >> 8;          \
        mm_RAM(SP) = x;                 \
} while (0)

and src/yaze/simz80.c

    switch(RAM_pp(PC)) {

so let's rewrite these as: src/yaze/simz80.c

#define POP(x)	do {                    \
        FASTREG y = GetBYTE_pp(SP);     \
        x = y + (GetBYTE_pp(SP) << 8);  \
} while (0)

#define PUSH(x) do {                   \
        mm_PutBYTE(SP, (x) >> 8);      \
        mm_PutBYTE(SP, x);             \
} while (0)

and src/yaze/simz80.c

    switch(GetBYTE_pp(PC)) {

And now all the undefined references are gone!
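
As a quick check of the stack semantics the rewritten macros rely on, the sketch below models PUSH and POP against a hypothetical byte array (DEMO_PUSH, DEMO_POP and demo_sp are illustrative names). The high byte is pushed first, so the word ends up little-endian in memory, and the stack pointer wraps from 0x0000 to 0xFFFE:

```c
#include <assert.h>

static unsigned char demo_stack_mem[0x10000];
static unsigned int demo_sp = 0x0000;

/* Pre-decrement, high byte first, as in the PUSH macro above. */
#define DEMO_PUSH(sp, x) do {                                          \
        demo_stack_mem[--(sp) & 0xffff] = (unsigned char)((x) >> 8);   \
        demo_stack_mem[--(sp) & 0xffff] = (unsigned char)(x);          \
} while (0)

/* Post-increment, low byte first, as in the POP macro above. */
#define DEMO_POP(sp, x) do {                                           \
        unsigned int lo_ = demo_stack_mem[(sp)++ & 0xffff];            \
        (x) = lo_ + ((unsigned int)demo_stack_mem[(sp)++ & 0xffff] << 8); \
} while (0)

/* Push a word and return the 16-bit stack pointer afterwards. */
static unsigned int demo_push(unsigned int x) {
   DEMO_PUSH(demo_sp, x);
   return demo_sp & 0xffff;
}

/* Pop a word from the stack. */
static unsigned int demo_pop(void) {
   unsigned int x;
   DEMO_POP(demo_sp, x);
   return x;
}
```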

Add any required support functions to the emulator

There are a few functions we need relating to interrupts that are missing from simz80, so we'll add them:

src/yaze/simz80.h

extern void simz80_reset();

extern void simz80_NMI();

extern void simz80_IRQ();

extern int simz80_is_IRQ_enabled();

extern FASTWORK simz80_execute(int n);

src/yaze/simz80.c

void simz80_reset() {
   pc = 0x0000;
   sp = 0x0000;
}

void simz80_NMI()
{
   FASTREG SP = sp;
   PUSH(pc);
   pc = 0x0066;
   sp = SP;
}

void simz80_IRQ()
{
   FASTREG SP = sp;
   IFF &= ~1;
   PUSH(pc);
   pc = GetWORD(0xfffe);
   sp = SP;
}

int simz80_is_IRQ_enabled() {
   return IFF & 1;
}

The implementation of execute already existed. We just changed its name and signature slightly, so that the number of instructions to execute can be passed in.
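
The interrupt entry functions above can be modelled on their own, which is handy for testing them away from the hardware. This is a simplified sketch under stated assumptions: struct demo_cpu, demo_mem and the vector parameter are hypothetical, and only the behaviour used in this tutorial is modelled (not the full IFF1/IFF2 handling of a real Z80):

```c
#include <assert.h>
#include <stdint.h>

struct demo_cpu {
   uint16_t pc;
   uint16_t sp;
   int iff1;   /* interrupt enable flip-flop */
};

static uint8_t demo_mem[0x10000];

/* Push a 16-bit value, high byte first; sp wraps naturally as uint16_t. */
static void demo_push16(struct demo_cpu *cpu, uint16_t value) {
   demo_mem[--cpu->sp] = (uint8_t)(value >> 8);
   demo_mem[--cpu->sp] = (uint8_t)(value & 0xff);
}

/* NMI: always taken; return address pushed, execution resumes at 0x0066. */
static void demo_nmi(struct demo_cpu *cpu) {
   demo_push16(cpu, cpu->pc);
   cpu->pc = 0x0066;
}

/* Maskable IRQ: only taken when enabled, and disables further IRQs.
   Returns 1 if the interrupt was taken, 0 if it was masked. */
static int demo_irq(struct demo_cpu *cpu, uint16_t vector) {
   if (!cpu->iff1) {
      return 0;
   }
   cpu->iff1 = 0;
   demo_push16(cpu, cpu->pc);
   cpu->pc = vector;
   return 1;
}
```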

Complete the emulator wrapper

All that now needs to be done is to complete the emulator wrapper.

Include the yaze header files: src/copro-z80.c

#include "yaze/mem_mmu.h"
#include "yaze/simz80.h"

On a power-on reset we need to wipe memory:

src/copro-z80.c

static void copro_z80_poweron_reset() {
   // Wipe memory
   memset(copro_z80_ram, 0, 0x10000);
}

On a normal reset we need to overlay the ROM and reset the Z80:

src/copro-z80.c

static void copro_z80_reset() {
  // Log ARM performance counters
  tube_log_performance_counters();
  // Re-instate the Tube ROM on reset
  overlay_rom = 1;
  // Reset the processor
  simz80_reset();
  // Do a tube reset
  tube_reset();
  // Reset ARM performance counters
  tube_reset_performance_counters();
}

Then we are ready to implement the emulator wrapper function:

src/copro-z80.c

void copro_z80_emulator()
{
   static unsigned int last_rst = 0;

   // Remember the current copro so we can exit if it changes
   int last_copro = copro;

   copro_z80_poweron_reset(); 
   copro_z80_reset();
  
   while (1)
   {
      // Execute emulator for one instruction
      simz80_execute(1);

      if (tube_mailbox & ATTN_MASK) {
         unsigned int tube_mailbox_copy = tube_mailbox;
         tube_mailbox &= ~(ATTN_MASK | OVERRUN_MASK);
         unsigned int intr = tube_io_handler(tube_mailbox_copy);
         unsigned int nmi = intr & 2;
         unsigned int rst = intr & 4;
         // Reset the processor on active edge of rst
         if (rst && !last_rst) {
            // Exit if the copro has changed
            if (copro != last_copro) {
               break;
            }
            copro_z80_reset();
            // Wait for rst to become inactive before continuing to execute
            tube_wait_for_rst_release();
         }
         // NMI is edge sensitive, so only check after mailbox activity
         if (nmi) {
            overlay_rom = 1;
            simz80_NMI();
         }
         last_rst = rst;
      }
      // IRQ is level sensitive, so check between every instruction
      if (tube_irq & 1) {
         // check if the emulator IRQ is enabled
         if (simz80_is_IRQ_enabled()) {
            simz80_IRQ();
         }
      }
   }
}
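
The loop above treats rst and nmi as edge-sensitive and irq as level-sensitive. The difference is easy to get wrong, so here is a small sketch of the two detection styles over a sampled signal (all demo_ names are hypothetical):

```c
#include <assert.h>

/* Edge-sensitive: fires once when the line goes from inactive to active. */
static int demo_rising_edge(unsigned int *last, unsigned int now) {
   int fired = (now && !*last);
   *last = now;
   return fired;
}

/* Level-sensitive: fires on every check while the line is active and enabled. */
static int demo_level(unsigned int line, int enabled) {
   return (line & 1) && enabled;
}

/* Count how often each kind of detector fires over a sampled waveform. */
static void demo_count(const unsigned int *samples, int n, int *edges, int *levels) {
   unsigned int last = 0;
   *edges = 0;
   *levels = 0;
   for (int i = 0; i < n; i++) {
      *edges  += demo_rising_edge(&last, samples[i]);
      *levels += demo_level(samples[i], 1);
   }
}
```

For the waveform 0, 1, 1, 1, 0, 1 the edge detector fires twice and the level detector four times. This is why the IRQ handler must itself mask further interrupts (IFF &= ~1), otherwise it would be re-entered after every instruction.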

The last thing that needs to be done is to initialize the ROM contents to the Client ROM for the Z80 Co Processor, which can be found here: http://mdfs.net/Software/Tube/Z80/ClientZ80

Download this, and convert it to hex that can be included in a C program:

wget -qO- http://mdfs.net/Software/Tube/Z80/ClientZ80  | xxd -i -c16

Add this into copro-z80.c as initialization data for the ROM.
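
When xxd -i reads from stdin it emits only the comma-separated byte list, without an array declaration, so the output can be pasted straight between the braces of the initializer. A sketch of the resulting shape follows; the four bytes shown are placeholders, not the real Client ROM contents, and the DEMO_/demo_ names are hypothetical:

```c
#include <assert.h>

#define DEMO_ROM_SIZE 0x1000

/* Placeholder bytes only: paste the real xxd output between the braces. */
static unsigned char demo_copro_z80_rom[DEMO_ROM_SIZE] = {
   0xde, 0xad, 0xbe, 0xef
};
```

Giving the array an explicit size has two useful effects: an image larger than 4KB fails to compile, and a shorter one is padded with zeros.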

Summary

You can see the additions we have had to make here: https://github.com/hoglet67/PiTubeDirect/commit/a989b8c3a04c61ff0c566e5249b795a91b729a0d

And the final version of copro-z80.c here: https://github.com/hoglet67/PiTubeDirect/blob/master/src/copro-z80.c

So, does it actually work?

Build, copy kernel.img onto the SD Card (renaming to kernel7.img for the Pi 2/3), then start it up.

To switch to the Z80 Co Processor, type:

*FX 151,230,4

then hit break.

Et voilà

Epilogue 1

As is often the case, things don't quite go to plan. In testing Z80 BBC Basic, I found I was able to LOAD and LIST programs, but RUN would hang. Interestingly, GOTO 10 worked fine!

This type of very obscure error is quite typical of Co Processor development.

I actually hit this issue before, when developing the Matchbox Z80 Co Processor. And I remembered the solution.

It turns out that RUN relies on the Z80 R register (refresh counter) to initialize a random number generator. If that register returns all zeros, it will try again (expecting it to change). If it never changes from zero, then RUN loops forever.
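
On a real Z80 the low seven bits of R increment on every opcode fetch, while bit 7 keeps whatever was last loaded into it. The sketch below models that behaviour and the retry loop described above. It is an illustration only, not the code from the fix, and the demo_ names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Advance R by one opcode fetch: bits 0..6 count, bit 7 is preserved. */
static uint8_t demo_bump_r(uint8_t r) {
   return (uint8_t)((r & 0x80) | ((r + 1) & 0x7f));
}

/* Model of a caller that retries until R reads non-zero. Returns the
   number of reads it took, or -1 if it gave up after max_tries. */
static int demo_wait_nonzero_r(uint8_t *r, int advance, int max_tries) {
   for (int tries = 1; tries <= max_tries; tries++) {
      if (advance) {
         *r = demo_bump_r(*r);
      }
      if (*r != 0) {
         return tries;
      }
   }
   return -1;
}
```

With advance set to 0 (an emulator that never updates R) the loop never succeeds, which is the hang seen with RUN.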

The fix in this case is very simple: https://github.com/hoglet67/PiTubeDirect/commit/9de797a8c8e046b8857ceac878f34bda99951312

Epilogue 2

The next issue I encountered was that the language transfer on control break hangs.

To investigate, we'll need to log all tube transfers to an in-memory buffer, and dump this on the next reset. That should let us work out where things are going wrong.

Here's the additional code to do this, enabled by defining DEBUG_TUBE: https://github.com/hoglet67/PiTubeDirect/commit/8e6c666d0ff47da3e262269244d787c665c096e7
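
The general technique is a ring buffer: recording is a cheap store in the hot path, and the slow printing is deferred until reset. The following is a minimal sketch of that idea, not the code from the commit; the demo_ names, the buffer size and the use of a FILE pointer are all assumptions:

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_LOG_SIZE 1024   /* must be a power of two */

struct demo_log_entry {
   unsigned char is_write;   /* 1 = Wr, 0 = Rd */
   unsigned char reg;        /* tube register number 1..4 */
   unsigned char value;
};

static struct demo_log_entry demo_log[DEMO_LOG_SIZE];
static unsigned int demo_log_count = 0;   /* entries recorded since last dump */

/* Record one transfer; the oldest entry is overwritten when full. */
static void demo_log_record(int is_write, int reg, int value) {
   struct demo_log_entry *e = &demo_log[demo_log_count++ & (DEMO_LOG_SIZE - 1)];
   e->is_write = (unsigned char)(is_write != 0);
   e->reg = (unsigned char)reg;
   e->value = (unsigned char)value;
}

/* Print the retained entries oldest first, then clear the log.
   Returns the number of entries printed. */
static unsigned int demo_log_dump(FILE *out) {
   unsigned int n = demo_log_count < DEMO_LOG_SIZE ? demo_log_count : DEMO_LOG_SIZE;
   unsigned int start = demo_log_count - n;
   for (unsigned int i = 0; i < n; i++) {
      const struct demo_log_entry *e = &demo_log[(start + i) & (DEMO_LOG_SIZE - 1)];
      fprintf(out, "%s R%u = %02x\n", e->is_write ? "Wr" : "Rd",
              (unsigned int)e->reg, (unsigned int)e->value);
   }
   demo_log_count = 0;
   return n;
}
```

Each line is printed in the same "Wr R1 = 0a" shape as the logs in this section.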

Let's start by looking at an example of a correct log, from the 6502 Co Processor:

Wr R1 = 0a      # Start of reset message
Wr R1 = 41      # A
Wr R1 = 63      # c
Wr R1 = 6f      # o
Wr R1 = 72      # r
Wr R1 = 6e      # n
Wr R1 = 20      # <spc>
Wr R1 = 54      # T
Wr R1 = 55      # U
Wr R1 = 42      # B
Wr R1 = 45      # E
Wr R1 = 20      # <spc>
Wr R1 = 36      # 6
Wr R1 = 35      # 5
Wr R1 = 30      # 0
Wr R1 = 32      # 2
Wr R1 = 20      # <spc>
Wr R1 = 36      # 6
Wr R1 = 34      # 4
Wr R1 = 4b      # K
Wr R1 = 0a      # <nl>
Wr R1 = 0a      # <nl>
Wr R1 = 0d      # <cr>
Wr R1 = 00      # terminating zero

Rd R4 = 07      # Start of type 7 data transfer request
Rd R4 = ff      # ID = ff is language startup
Rd R4 = 00      # addr(3)
Rd R4 = 00      # addr(2)
Rd R4 = 80      # addr(1)
Rd R4 = 00      # addr(0)
Rd R3 = 00      # dummy read to empty R3
Rd R3 = 00      # dummy read to empty R3
Rd R4 = 06      # sync byte

Rd R3 = c9      # Byte 0 of BBC Basic II
Rd R3 = 01      # Byte 1 of BBC Basic II
Rd R3 = f0      # Byte 2 of BBC Basic II
Rd R3 = 1f      # Byte 3 of BBC Basic II
Rd R3 = 60      # Byte 4 of BBC Basic II
Rd R3 = ea      # Byte 5 of BBC Basic II
Rd R3 = 60      # Byte 6 of BBC Basic II
Rd R3 = 0e      # Byte 7 of BBC Basic II

But on the Z80 we see:

Wr R1 = 16      # Start of reset message
Wr R1 = 08      # Mode 8
Wr R1 = 0a      # <nl>
Wr R1 = 0d      # <cr>
Wr R1 = 41      # A
Wr R1 = 63      # c
Wr R1 = 6f      # o
Wr R1 = 72      # r
Wr R1 = 6e      # n
Wr R1 = 20      # <spc>
Wr R1 = 54      # T
Wr R1 = 55      # U
Wr R1 = 42      # B
Wr R1 = 45      # E
Wr R1 = 20      # <spc>
Wr R1 = 5a      # Z
Wr R1 = 38      # 8
Wr R1 = 30      # 0
Wr R1 = 20      # <spc>
Wr R1 = 36      # 6
Wr R1 = 34      # 4
Wr R1 = 4b      # K
Wr R1 = 20      # <spc>
Wr R1 = 31      # 1
Wr R1 = 2e      # .
Wr R1 = 32      # 2
Wr R1 = 31      # 1
Wr R1 = 0a      # <nl>
Wr R1 = 0d      # <cr>
Wr R1 = 0a      # <nl>
Wr R1 = 0d      # <cr>
Wr R1 = 00      # terminating zero

Rd R4 = 07      # Start of type 7 data transfer request
Rd R4 = ff      # ID = ff is language startup
Rd R4 = 00      # addr(3)
Rd R4 = 00      # addr(2)
Rd R4 = 80      # addr(1)
Rd R4 = 00      # addr(0)
Rd R3 = 00      # dummy read to empty R3
Rd R3 = 00      # dummy read to empty R3
Rd R4 = 06      # sync byte

Rd R3 = c9      # Byte 0 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read
Rd R3 = 01      # Byte 1 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read
Rd R3 = f0      # Byte 2 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read
Rd R3 = 1f      # Byte 3 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read
Rd R3 = 60      # Byte 4 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read
Rd R3 = ea      # Byte 5 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read
Rd R3 = 60      # Byte 6 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read
Rd R3 = 0e      # Byte 7 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read
Rd R3 = 01      # Byte 8 of BBC Basic II
Rd R2 = 7f      # spurious read
Rd R1 = 00      # spurious read
Rd R4 = 06      # spurious read

So, for some reason the Z80 Co Processor is making several spurious reads of other registers during the data transfer.

Let's take a look at the code in the client ROM that handles a type-7 data transfer:

FBB4  DB 04        ..    IN A, (r3status)         ; Check TUBE ULA Status Register 3...
FBB6  B7           .     OR A
FBB7  F2 B4 FB     ...   JP P $FBB4
FBBA  ED A2        .     INI
FBBC  C2 B4 FB     ...   JP NZ $FBB4

(from here)

It seems likely that the bug is in the implementation of INI, as this is a complex instruction.

My copy of Rodnay Zaks' "How to program the Z80" has the following description for INI:

INI: Input with increment

Opcode: ED A2

(HL) ⇐ (C); B ⇐ B - 1; HL ⇐ HL + 1
N = 1
Z is set if B = 0 after execution, reset otherwise
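
Translated literally into C, that description looks like the sketch below. It models only what the quoted description says (not the undocumented flag behaviour of a real Z80), and the struct, the memory array and the port callback are hypothetical names introduced for illustration:

```c
#include <assert.h>
#include <stdint.h>

struct demo_ini_state {
   uint8_t  b, c;     /* B = loop count, C = port number */
   uint16_t hl;       /* destination pointer */
   int      z, n;     /* Z and N flags */
};

static uint8_t demo_ini_mem[0x10000];

/* Hypothetical port read callback supplied by the caller. */
typedef uint8_t (*demo_port_read)(uint8_t port);

static void demo_ini(struct demo_ini_state *s, demo_port_read in) {
   demo_ini_mem[s->hl] = in(s->c);   /* (HL) <= (C)  */
   s->b = (uint8_t)(s->b - 1);       /* B <= B - 1   */
   s->hl = (uint16_t)(s->hl + 1);    /* HL <= HL + 1 */
   s->n = 1;                         /* N = 1        */
   s->z = (s->b == 0);               /* Z set if B reached 0 */
}

/* Stub port: returns a value derived from the port number. */
static uint8_t demo_port_stub(uint8_t port) {
   return (uint8_t)(0xA0 | port);
}
```

The key points are that only B is decremented, C (the port number) never changes, and Z reflects B alone.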

This is the implementation of INI in Yaze 2.30.3:

      case 0xA2:        /* INI */
         PutBYTE(HL, Input(lreg(BC))); ++HL;
         SETFLAG(N, 1);
         SETFLAG(P, (--BC & 0xffff) != 0);
         break;

Several things seem wrong with this code:

P is being set, rather than Z
the zero test is of the whole of BC, rather than just B
--BC will decrement BC (including the IO address), rather than B (the loop count)
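
The third point appears to explain the spurious reads in the log. The sketch below replays the buggy decrement and reports which tube data register each INI would touch. It rests on assumptions: the usual Tube parasite register map (even ports are status registers, odd ports are data registers, consistent with r3status being port 4 above), a hypothetical starting value of BC = 0x0005, and only data-register reads appearing in the log:

```c
#include <assert.h>

/* Assumed Tube parasite register map: even ports are status registers,
   odd ports are data registers (1 = R1, 3 = R2, 5 = R3, 7 = R4). */
static int demo_data_register(unsigned int port) {
   unsigned int p = port & 7;
   return (p & 1) ? (int)(p >> 1) + 1 : 0;   /* 0 means "status port" */
}

/* Fill out[] with the data register touched by each of n INIs (0 if the
   INI landed on a status port), decrementing the way the original code does. */
static void demo_buggy_ini_ports(unsigned int bc, int n, int *out) {
   for (int i = 0; i < n; i++) {
      out[i] = demo_data_register(bc & 0xff);   /* lreg(BC) is the port */
      bc = (bc - 1) & 0xffff;                   /* --BC: C changes too  */
   }
}
```

Under those assumptions, nine consecutive INIs touch R3, status, R2, status, R1, status, R4, status, R3. The data reads follow the same R3, R2, R1, R4, R3 pattern seen in the Z80 log.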

Let's also look at the implementation of OUTI, which should be similar:

      case 0xA3:        /* OUTI */
         Output(lreg(BC), GetBYTE(HL)); ++HL;
         SETFLAG(N, 1);
         Sethreg(BC, hreg(BC) - 1);
         SETFLAG(Z, hreg(BC) == 0);
         break;

Now, that actually looks correct.

It seems the issue is just with INI, so let's try to fix it using the code from OUTI:

      case 0xA2:        /* INI */
         PutBYTE(HL, Input(lreg(BC))); ++HL;
         SETFLAG(N, 1);
         Sethreg(BC, hreg(BC) - 1);
         SETFLAG(Z, hreg(BC) == 0);
         break;

It looks like this was the problem, and we now see the expected behaviour on language transfer.

In my experience it's actually very common for processor emulators (and VHDL cores) to contain bugs like this, especially in the areas of interrupt handling and I/O instructions, as these are harder to test.

I hope this tutorial has been useful in explaining how to extend PiTubeDirect, and what to do when things don't quite work as expected.