Ed's Diary 2016 05 18
2016-05-18 Wednesday
Another visit: Ed at Dave's, and later Rich too.
Dave is now running a bootloader so can upload a new kernel.img in half a minute. Also the kernel can now read the cmdline parameter from the config.txt so we can pass in options without necessarily needing to build variants. For example we could choose the engine this way - lib6502 and 65tube are both part of the binary but presently selected as a build-time option.
This does mean the serial input is now needed.
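As a rough sketch of how options passed on the cmdline might be picked apart: the option names `engine` and `speed` and the helper `parse_cmdline` are invented for illustration, and the real kernel code is C rather than Python.

```python
def parse_cmdline(cmdline, defaults):
    """Split a command line string into key=value options.

    Tokens without '=' are kept as flags with the value True.
    Unknown keys are kept too, so new options need no parser change.
    """
    options = dict(defaults)
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


# Hypothetical option names and values, not the project's actual ones.
DEFAULTS = {"engine": "65tube", "speed": "full"}

opts = parse_cmdline("engine=lib6502 speed=3mhz", DEFAULTS)
print(opts["engine"], opts["speed"])
```

Starting from a table of defaults means an empty cmdline still gives a usable configuration, which matches the idea of one binary carrying both engines.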
Dave also has two Pi Zeros. What's more, we think we could power them from the Tube. And they are all certified to run at 1000MHz ARM and 400MHz core freq.
Dave has found that 350MHz is a good core freq value - this is a faster core freq than the revision 1 Pi normally uses, and it turns out that a faster core freq really helps our bit-banging timing.
CLOCKSP reports 157MHz (Basic 2) and we still have a little headroom - core_freq is the clock which controls the I/Os, which is why we find it needs to be high enough, but also it controls the L2 cache and will therefore affect emulated performance a bit too.
We see some very late data for host reads when running Elite - but only a very few - and we never see them when running CLOCKSP. Perhaps L2 cache misses in the 65tube code?
Dave's current breadboard has 100 Ohm series resistors on the data bus and an RC filter on the 3 address lines: 470 Ohm, 15pF.
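For a feel of what that RC filter does, here is the arithmetic for an ideal first-order low-pass using the values above. It ignores source impedance and pin capacitance, so treat the numbers as a rough guide only.

```python
import math

R_OHMS = 470.0      # series resistor on each filtered address line
C_FARADS = 15e-12   # 15 pF to ground

# Time constant and -3 dB corner of an ideal first-order RC low-pass.
tau_s = R_OHMS * C_FARADS
cutoff_hz = 1.0 / (2.0 * math.pi * tau_s)

print(f"time constant: {tau_s * 1e9:.2f} ns")
print(f"-3 dB cutoff:  {cutoff_hz / 1e6:.1f} MHz")
```

That works out at a time constant of about 7 ns and a corner a little above 22 MHz.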
We adjusted the SPHERE benchmark to make it more of a soak test. It keeps overdrawing the same pixels so if it writes a stray line segment we'd expect to notice, especially when the cycle comes around to making the screen blank again.
3 CPUs:
synertek - everything fine - this is the one which this Beeb came with. Likewise Ed and Rich report stability with the CPUs which came with their respective machines. It turns out the address bus becomes valid much later with this historic NMOS part than with the C02 mentioned below, so we have less time to respond to nTube, and reliability was suffering (until we turned up the wick). It is also true that the CMOS part drives Phi1 harder than the NMOS part, which makes the rest of the Beeb a bit advanced in phase relative to the slow part. The Beeb is clocked by a rubbish rising edge!
The rise time of Phi1 out from zero to 2V is 100ns. Wow.
Read timing is affected by the delay time from Phi2 out to nTube: 130ns (120ns on a different part).
Write timing (and the contention window for reads) is affected by the clock skew from the Pi's 2MHzE clock falling edge relative to the CPU's Phi2 out timing reference: 38ns (32ns).
second synertek slightly faster - see above also, causes really brief negative pulses on nTube. Maybe??
Update: we also saw nTube glitches with the first Synertek. Need to test a CPU both hot and cold, perhaps.
rockwell 65C02 2MHz (8424 datecode) - almost fine, but long TIMEs measured and key repeat went fast (after a few minutes of soak). So we think either the system VIA has gone mad (why would it?) or the IRQ input to the CPU has gone mad - but in a very deterministic way. A stray write to the system VIA clock period? The Tube is at FEE0 and the system VIA is at FE40 - FE46 and FE47 are supposed to be &E and &27 respectively. So an error on a single address line would be enough to cause what we saw. Tube Elite is also apparently very slightly unreliable with this CPU in the host. We also once saw a load of stray pixels on the intersection points of the SPHERE, which indicates that one of the GCOL commands didn't get through correctly.
Of course it's a bit of a surprise that a 65c02 will work in a Beeb - it very nearly does work, but our sort of stress testing is perhaps just showing up the very slight incompatibility.
The rise time of Phi1 out from zero to 2V is 32ns. Much better.
Read timing is affected by the delay time from Phi2 out to nTube: 78ns.
Write timing (and the contention window for reads) is affected by the clock skew from the Pi's 2MHzE clock falling edge relative to the CPU's Phi2 out timing reference: 18ns.
This device also exhibits short negative nTube glitches (which would cause a LATE even though we haven't missed a legit ISR).
rockwell 4MHz (1145 datecode and LOLRARE!) - seems fine, except language transfer is really not working - previously we'd seen a final block of 255 bytes, this time we saw really odd small quantities transferred. Elite also bombed out very quickly - it doesn't even get to attract mode. At first we didn't know if the matchbox would work with this CPU
- actually we tried, and the matchbox has the same failure. So this CPU is just out of spec in some sense. We don't understand exactly what the failure is.
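To see why a stray write to FE46 or FE47 would upset TIME and key repeat on the 65C02 above, here is the timer arithmetic. It assumes the usual 6522 behaviour (a free-running T1 interrupts every latch + 2 clock cycles) and a 1MHz VIA clock. The corrupted value is made up purely for illustration.

```python
VIA_CLOCK_HZ = 1_000_000  # assumption: the system VIA is clocked at 1 MHz


def t1_period_us(latch_low, latch_high):
    """Interrupt period in microseconds for a free-running 6522 Timer 1."""
    latch = (latch_high << 8) | latch_low
    # A free-running T1 reloads and interrupts every latch + 2 cycles.
    return (latch + 2) * 1_000_000 / VIA_CLOCK_HZ


# FE46 = &0E and FE47 = &27, as quoted above.
normal = t1_period_us(0x0E, 0x27)

# Hypothetical stray write that changes only the high byte.
corrupted = t1_period_us(0x0E, 0x07)

print(f"normal:    {normal:.0f} us")
print(f"corrupted: {corrupted:.0f} us")
```

With the expected values the period is 10000 us, i.e. a 100Hz tick. Any write that lowers either latch byte shortens the tick, so everything counted in ticks runs fast.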
We added a telltale toggle back into the ISR to allow time measurement (using a GPIO which only appears on the 40 pin expansion header) but initially we made it a minimally short toggle. We learned at once that the very late reads and all read variability come from the time to enter the ISR.
Now, we do mask interrupts at one point in 65tube - it's in the event handler, which is the destination of the alternate dispatch when the 6502 needs to interrupt the instruction flow. Interrupts are masked for quite a few ARM instructions as the event handler can branch to C code... and yet, if we need to mask interrupts to avoid crashing, surely we must be failing to service the nTube soon enough?
Dave believes he's previously moved the clearing of interrupt masking to a slightly earlier point without much effect - and yet now that isn't so - moving the clearing up is causing tube init to fail.
We added a LATE diagnostic - it does sometimes appear. That means nTube is sometimes glitching, or the ISR is sometimes taking an extremely long time to get started.
We did measure that the ISR duration always seems pretty constant - as does the time spent in the event handler, which is the subsequent non-ISR code that presently (somewhat inexplicably) masks interrupts. But we do see that the ISR takes quite a variable time to get launched, after nTube. And also the time of the event handler starting is quite variable. Even so, the event handler is done during the host cycle after the nTube cycle. And the earliest next nTube cycle - ignoring the dummy read problem which we cannot presently deal with - is five cycles after the previous one.
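To put a number on that five-cycle gap, here is the conversion from host cycles to time and to ARM clock cycles. It assumes the 2MHz host clock mentioned above and the 1000MHz ARM clock the Pi Zero is rated for; a Pi running at a different speed would scale accordingly.

```python
HOST_CLOCK_HZ = 2_000_000        # the 2MHzE clock seen by the Tube
ARM_CLOCK_HZ = 1_000_000_000     # assumption: Pi Zero at its rated 1000MHz
MIN_GAP_HOST_CYCLES = 5          # earliest next nTube cycle, per the notes


def budget(host_cycles, host_hz=HOST_CLOCK_HZ, arm_hz=ARM_CLOCK_HZ):
    """Return (nanoseconds, ARM clock cycles) spanned by some host cycles."""
    nanoseconds = host_cycles * 1_000_000_000 // host_hz
    arm_cycles = host_cycles * arm_hz // host_hz
    return nanoseconds, arm_cycles


ns, arm = budget(MIN_GAP_HOST_CYCLES)
print(f"{MIN_GAP_HOST_CYCLES} host cycles = {ns} ns = {arm} ARM cycles")
```

Under those assumptions, ISR launch latency plus ISR duration plus the event handler all have to fit inside 2500 ns between back-to-back Tube accesses.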
Dave demonstrated that the lib6502 engine also works, by no means flawlessly but not crashing instantly. It doesn't mask interrupts at all.
Maybe sporadic very long interrupt latency is behind every bad effect we now see?
Rich turned up, and told us he'd even tried connecting with just series resistors with no apparent damage to Pi or difficulty interoperating with a Beeb. This would probably be more risky with a Master as it probably operates at full rail logic levels.
TODO (not an ordered list):
- start up some wiki content, with pages for Todo, level shifting, hardware build, software build, soak testing, and so on.
- use GitHub Issues for some outstanding questions, as appropriate
- put some ordering into the Todo page
- understand the issue of masking interrupts: we don't know quite why we need to mask them and now we're unsure why we're getting away with missing nTube handling entirely. If we miss a read, then the value driven to the bus by our hardware will be related to the last value we wrote and also related to the pullups on the pins which are outside our control.
- change dispatch code somehow to slug performance to the 3MHz of the original or thereabouts - does not need to be exact.
- figure out the performance counters to see for example if L2 cache misses are important or whether working set of Elite is especially big - only Elite shows very late read data (and not very often)
- write a soak test which repeatedly saves and reloads a block of RAM containing something like Basic, and finally compares or checksums to see if any errors have accumulated
- write a soak test which makes an effort to cache-bust by touching most of 64k memory map (and perhaps also by exercising almost all 6502 instructions)
- support cmdline in config.txt and allow choice of lib6502 or 65Tube or 3MHz vs full speed perhaps
- try the sparkfun nibble-wide level converter. (Rich has it now)
- try the contraption on an Atom. Or a Master Compact, or an Electron...
- make up a little Pi Hat board?
- see if any other HLL emulation core has the trouble which lib6502 has - perhaps a Z80 in C (there must be one) - or a core with a bigger RAM map like an x86, which might have even worse cache pressure at worst case.
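The save-and-reload soak test in the list above could follow a shape like the sketch below. It models the logic only: `transfer` is a hypothetical stand-in for one save plus reload over the Tube, and a real test would more likely be written in BBC BASIC on the Beeb itself.

```python
import zlib


def soak(block, transfer, rounds):
    """Pass a block through transfer() repeatedly.

    Returns the number of the first round whose checksum no longer
    matches the original, or None if every round was clean.
    """
    expected = zlib.crc32(block)
    data = block
    for n in range(1, rounds + 1):
        data = transfer(data)  # errors carry forward, so they accumulate
        if zlib.crc32(data) != expected:
            return n
    return None


def perfect_link(data):
    """Hypothetical transfer that never corrupts anything."""
    return bytes(data)


def make_flaky_link(bad_round):
    """Hypothetical transfer that flips one bit on a chosen round."""
    calls = {"n": 0}

    def link(data):
        calls["n"] += 1
        if calls["n"] == bad_round:
            corrupted = bytearray(data)
            corrupted[0] ^= 0x01
            return bytes(corrupted)
        return bytes(data)

    return link


block = bytes(range(256)) * 64  # 16 KiB stand-in for a block such as Basic
print(soak(block, perfect_link, 100))
print(soak(block, make_flaky_link(37), 100))
```

Checking the checksum every round, rather than only at the end, costs little and records when the first error crept in.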