Over on my Patreon Discord, we have been looking at several versions of Forth for the ZX81, including Husband Forth and Tree Forth.
Those (just about) work on the ZX81, but none of them work on the Minstrel 3. George Beckett is currently reverse engineering them to see how they work, and we think we now know why they aren't running on the Minstrel 3. George's work in progress can be found here.
As part of that, George had isolated the display routines and I suggested the main loop could be replaced with a delay to give a minimal implementation of just the display, so we can analyse it further.
That turned out to be more complicated than I expected. More on that to follow another day.
Whilst that was going on, I thought it would be interesting to compare that to a similarly cut down ZX81 display routine.
So that's what I did.
I have got it down to under 200 bytes (plus the font), and it may be useful for anyone writing some standalone code to run on ZX81 or Minstrel 3 hardware. Maybe as the basis for a self test ROM, a ROM based game, or something like that?
This is based on 8K BASIC (and the incredibly useful annotated disassembly by Geoff Wearmouth). That ROM was supplied to run on the ZX80 and ZX81, but I have only implemented "slow mode" or "compute and display", which is only available on the ZX81 hardware. It will always be running user code in between drawing the screen.
It has simplified things not having to worry about supporting both this and the "fast" mode, where it is either just running code or just drawing the screen.
How does it work?
From the user code point of view, the flowchart looks something like this.
The system starts up, does a bit of initialisation and starts off the display. Then it jumps into the user code, where it does its own setup and finally enters the user code main loop and just keeps running around.
Somewhere in the background, the display is being generated, but the user code does not need to know or care about that (unless it is doing anything timing specific).
What actually happens is slightly different. The display code is actually in control, and it keeps stopping and starting the user code as required.
The user code runs in 56 bursts of around 140 cycles during the top border and another identical set during the bottom border, giving a total of around 16,000 cycles per 20ms frame, or the equivalent of about one quarter speed, or 0.8 MHz.
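As a rough check of those figures, here is a minimal sketch of the arithmetic. The constant and function names are just for illustration, and the 140 cycles per burst is the approximate figure quoted above, so treat the results as ballpark only.

```python
# Rough CPU budget for user code in "slow" mode, using the figures quoted above.
CLOCK_HZ = 3_250_000       # ZX81 Z80 clock (3.25 MHz)
FRAME_SECONDS = 0.020      # one 20 ms frame
BURSTS_PER_BORDER = 56     # one burst per border line
CYCLES_PER_BURST = 140     # approximate figure

def user_cycles_per_frame():
    # Top border plus an identical bottom border.
    return 2 * BURSTS_PER_BORDER * CYCLES_PER_BURST

def effective_mhz():
    # Cycles available to user code per second, expressed in MHz.
    return user_cycles_per_frame() / FRAME_SECONDS / 1_000_000

def fraction_of_full_speed():
    return effective_mhz() * 1_000_000 / CLOCK_HZ

if __name__ == "__main__":
    print(user_cycles_per_frame())             # 15680
    print(round(effective_mhz(), 2))           # 0.78
    print(round(fraction_of_full_speed(), 2))  # 0.24
```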
You can see in this trace the two blocks in red at the start and end of the display cycle (left and right of the trace). This is where the user code is running. It is paused for the entire central section, where the Z80 is occupied drawing the display. There it is executing NOPs to clock through the display file, while the display hardware generates the picture on the fly and sends it to the electron gun drawing the display on the CRT we are both pretending you are viewing it on.
The trace above shows time progressing from left to right over the course of a frame. On the screen, it moves from left to right each line, and from top to bottom over the frame. The computing happens in the borders, and the display happens in the middle, where it needs to be.
Compare that to the ZX80 and ZX81 in "fast" mode, where it can either run code all the time.
or it can only draw the display (and simply wastes time in the borders)
Let's look at the code
The code involved is actually quite short, so we can go through all of it here.
START
The Z80 starts at address $0000. The code here is brief, to leave space for RST handlers at $0008, $0010, etc. up to $0038. (If you aren't aware, those are the places for short, fast, common routines that can be called with a single byte instruction RST $08 etc.)
Those aren't used here, but the space has been left in case the final application needs to make use of them in future.
The first instruction turns off the NMI generator. When active, that will generate an NMI pulse when a hardware counter reaches 64µs at the end of a display line. It is turned off until we are ready.
The second instruction sets the stack pointer to the top of RAM, assuming 16K, but it could be adjusted or a RAM test routine added as required.
Then it jumps to the system init code...
INIT
The code here sets up the display file, an initial newline character, then 24 lines of 32 space characters with a newline at the end of each.
You could start with a collapsed display file, just 25 newline characters, but I don't think there is any reason not to use a full one. I am also thinking that while you are part way through updating the screen, you could have overwritten some newlines but not yet added enough back to fill the screen.
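To make the two layouts concrete, here is a small sketch that builds both as byte arrays. It uses the ZX81 character codes ($76 for newline, $00 for space); the function names are made up for this example and are not taken from the ROM source.

```python
NEWLINE = 0x76   # ZX81 newline character, also the Z80 HALT opcode
SPACE = 0x00     # ZX81 character code for a space
ROWS, COLS = 24, 32

def full_display_file():
    # Initial newline, then 24 rows of 32 spaces, each ending in a newline.
    dfile = bytearray([NEWLINE])
    for _ in range(ROWS):
        dfile += bytes([SPACE] * COLS) + bytes([NEWLINE])
    return dfile

def collapsed_display_file():
    # Just the initial newline plus one newline per (empty) row.
    return bytearray([NEWLINE] * (ROWS + 1))

if __name__ == "__main__":
    print(len(full_display_file()))       # 793
    print(len(collapsed_display_file()))  # 25
```

The full version costs 793 bytes of RAM against 25, but every row is always a fixed 33 bytes long, so character positions are easy to calculate.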
The I register is set to the MSB of the address of the font ROM which is used by the display hardware to read the character bitmap data.
The INIT_DISPLAY routine is called, and then it continues on to the user code (well sort of).
INIT_DISPLAY
What INIT_DISPLAY actually does is effectively pause the main code path, pushing its registers onto the stack (the return address is also already on there as part of the CALL).
It then drops through to START_FRAME.
START_FRAME
This is the start of the display routine proper.
The frame starts with a vertical sync pulse. There is a slight delay before the pulse starts, then a longer delay before it finishes.
8K BASIC originally used a VSync pulse of 380µs, about 6 display lines. The composite video spec says it should be 8 lines, so I have adjusted the timing to match the spec of 8 x 64µs = 512µs. That takes the display from 310 lines as standard to 312, which is what it should be.
This is the place to do any keyboard scanning required. I have removed the code here to simplify things. A single IO read operation is required to start the VSync pulse, and here that is used to read bit 6 of the keyboard port, the NTSC jumper. It sets the number of lines in the top and bottom borders accordingly, 55 lines for a PAL system, 31 for NTSC. There is an extra transition line at the end of each border, so it is actually 56 and 32.
8 + 56 + (24x8) + 56 = 312 lines (two frames = 624 lines, close enough to the 625 specified)
8 + 32 + (24x8) + 32 = 264 lines (two frames = 528 lines, close enough to the 525 specified)
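The same sums as a short sketch, which also gives the resulting frame time assuming every line is exactly 64µs. The names are illustrative only.

```python
VSYNC_LINES = 8
ROWS = 24
SCANLINES_PER_ROW = 8
LINE_US = 64

def lines_per_frame(border_lines):
    # VSync + top border + character rows + bottom border.
    return VSYNC_LINES + border_lines + ROWS * SCANLINES_PER_ROW + border_lines

def frame_time_us(border_lines):
    return lines_per_frame(border_lines) * LINE_US

if __name__ == "__main__":
    print(lines_per_frame(56), frame_time_us(56))  # 312 19968 (PAL)
    print(lines_per_frame(32), frame_time_us(32))  # 264 16896 (NTSC)
```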
Any IO write ends the VSYNC pulse, then it is time to draw the display.
(I did consider using the hardware to time the remaining 6 or 7 lines after this code, rather than a delay, but I thought that might add unnecessary complication, and would make it more complicated to add back in keyboard scanning code.)
The HL register pair is set to the start of the display file (a newline / halt instruction) and then setting bit 7 of the H register moves that to the echo in high RAM. (The display hardware detects instruction reads in the high RAM area and feeds the processor NOP instructions instead)
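In address terms, setting bit 7 of H is the same as OR-ing the 16 bit address with $8000. A tiny sketch, where the display file address of $4000 is only a placeholder assumption (the real address depends on where the code puts the display file in RAM):

```python
DFILE_ADDR = 0x4000   # hypothetical display file address, for illustration only

def high_ram_echo(addr):
    # SET 7,H on the HL pair: set bit 15 of the address.
    return addr | 0x8000

def split_hl(addr):
    # Return the (H, L) register values for a 16 bit address.
    return (addr >> 8) & 0xFF, addr & 0xFF

if __name__ == "__main__":
    print(hex(high_ram_echo(DFILE_ADDR)))            # 0xc000
    print([hex(r) for r in split_hl(high_ram_echo(DFILE_ADDR))])
```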
Then it's time to draw the top border.
DRAW_BORDER
When DRAW_BORDER is called, it pops the return address off the stack and stores it in the IX register. This would be the instruction it would return to if it were to return now, but it has lots of other things to do before it finally returns to where it was called from.
Next it does a bit of manipulation with the number of border lines. The resultant value is set so that if you keep incrementing the A' register, it will hit zero when the appropriate number of lines has been drawn and the border is complete.
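The idea can be modelled as loading the counter with the negative of the line count in 8 bit arithmetic, then counting up until it wraps to zero. This is only a sketch of the principle; the real code may adjust the value by one to allow for the transition line, and the function names here are invented.

```python
def border_preload(lines):
    # 8 bit two's complement negation, e.g. 56 -> 200 ($C8).
    return (-lines) & 0xFF

def nmis_until_zero(preload):
    # Count how many increments it takes for an 8 bit counter to reach zero.
    a = preload
    count = 0
    while True:
        a = (a + 1) & 0xFF
        count += 1
        if a == 0:
            return count

if __name__ == "__main__":
    for lines in (56, 32):
        preload = border_preload(lines)
        print(lines, hex(preload), nmis_until_zero(preload))
```

Counting up to zero means the NMI handler only needs an increment and a test of the zero flag, with no compare against a stored limit.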
It then enables the NMI generator, which will generate an NMI interrupt at the end of each line.
Then the user mode registers are restored from when they were pushed on the stack way back in INIT_DISPLAY.
The RET instruction then returns from that call to INIT_DISPLAY, where it drops through to running the user code.
And then just as it gets started, the timer hits 64µs and a non-maskable interrupt occurs.
NMI
When the hardware counter reaches 208 cycles (64µs), the horizontal sync pulse begins to be generated. When the NMI Generator hardware is enabled (as it is now), that horizontal sync pulse also pulls the NMI pin low and creates a Non Maskable Interrupt. The Z80 pushes the address of the next instruction in user code land onto the stack, then jumps to the NMI handler at address $0066.
This increments the border line counter A' and checks to see if it has reached the end of the border. If it has not, it returns to user land and continues with the user's code until the next NMI.
END_OF_BORDER
When it reaches the end of the border, it is time to pause the user code again, and the registers are pushed onto the stack. The return address is already there from the call to the NMI handler.
It then sets up the pointer to the start of the high RAM mirror of the display file and stops the NMI generator. Finally it returns to the address we saved previously in IX.
AFTER_TOP_BORDER
This code continues setting up the counters and then calls the routine to draw the display. B is the row counter, 25 rows (one of which is the initial empty row), and C is the scan line counter, which is initialised to 1, ready to roll over on the first call.
DRAW_DISPLAY
For the first time here we enable maskable interrupts. The INT pin is hard wired to address line A6. The Z80 will be halted, so it will be internally running NOP instructions and the only external activity will be the refresh cycles. These were designed for dynamic RAM, which was larger and cheaper than the static RAM that the ZX81 used, but needed to be accessed every 2ms to maintain the data. The Z80 could be set to automatically refresh a range of RAM when it was otherwise occupied, simplifying system design.
Here there is no DRAM, and the refresh counter is repurposed. It is preloaded with $F5, and that will count up with each 4 cycle instruction (4x 3.25MHz cycles = 8x 6.5MHz cycles = 1 character width on the screen). So after 11 character widths, it will clock around to $00 and A6 will go low and a maskable interrupt will occur.
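Here is a simplified model of that counter, which reproduces the 11 count for $F5 and also the 35 count for the $DD value used later in the interrupt handler. It assumes the usual Z80 behaviour where only the low 7 bits of R count up and bit 7 keeps whatever was loaded, so the register itself wraps from $FF to $80 while the low seven bits read $00. It ignores exactly when within an instruction the INT pin is sampled.

```python
def step_r(r):
    # One opcode fetch: low 7 bits increment, bit 7 is preserved.
    return (r & 0x80) | ((r + 1) & 0x7F)

def fetches_until_a6_low(preload):
    # Count opcode fetches until bit 6 of R (address line A6) goes low.
    r = preload
    count = 0
    while r & 0x40:
        r = step_r(r)
        count += 1
    return count

if __name__ == "__main__":
    print(fetches_until_a6_low(0xF5))  # 11
    print(fetches_until_a6_low(0xDD))  # 35
```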
The jump instruction goes to the first byte of the high RAM echo of the display file. This is a newline character, which is also the OP code for halt, so when it is executed, the Z80 halts.
This will push the return address onto the stack and call the maskable interrupt handler. The instruction after the initial newline / halt in the display file is the start of the first row of characters. The return address is now pointing to the start of this row (I hope you are still following this).
MASKABLE_INTERRUPT
Each row of characters produces eight lines on the display, one for each row of pixels in the character bitmaps.
When drawing the subsequent lines, the return address is the start of the next row of characters, that is thrown away and it is instead sent back to the previously stored start of the current row.
When the scanline counter reaches zero (which it will the first time through as it was pre-loaded with 1), the return address of the next row is used instead, and it will start on that row next time.
The refresh register is set to $DD, so it will trigger after 35 character widths. It will have been halted sometime before that count is reached, depending on the number of characters in the row. The shorter the row, the more time it spends halted, and it will always get the next interrupt at the same time at the end of each scanline.
When all the rows are complete, it returns to the previous address on the stack, the AFTER_DISPLAY function.
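The counting described above can be modelled in a few lines. This is a sketch based on the description rather than a transcription of the handler, so the exact order of operations is an assumption, but it shows how starting with B = 25 and C = 1 gives 24 rows of 8 scanlines before the handler finally returns.

```python
def simulate_display(rows_b=25, scanline_c=1):
    """Return the (row, scanline) pairs drawn in one frame."""
    drawn = []
    b, c = rows_b, scanline_c
    row = -1                 # -1 stands for the initial newline-only row
    while True:
        # A maskable interrupt has just fired.
        c -= 1
        if c == 0:
            b -= 1
            if b == 0:
                return drawn     # all rows done, back to AFTER_DISPLAY
            c = 8                # reload the scanline counter
            row += 1             # keep the return address of the next row
        # otherwise the return address is discarded and the same row repeats
        drawn.append((row, 8 - c))

if __name__ == "__main__":
    lines = simulate_display()
    print(len(lines))            # 192
    print(lines[0], lines[-1])   # (0, 0) (23, 7)
```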
AFTER_DISPLAY
From there, it calls DRAW_BORDER again, and that loops around drawing the bottom border lines. Each line it jumps back to user code (remember that?) and is then interrupted by the NMI in the same way as the top border.
After the border is complete, it returns to AFTER_BOTTOM_BORDER, which jumps back to START_FRAME and it all begins again.
Meanwhile.....
USER_INIT
Throughout all of this, control has been returning to the user code during drawing of the borders.
The code here is just an example to put something onto the display.
This is filling the screen with a character set, offset each line to make it a little more interesting.
USER_LOOP
After initialisation, there is a loop which goes through the display file incrementing each character. It masks each off so that it will never go out of range. This gives a nice animated effect.
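In outline, one pass of that loop does something like the following. The mask value of $3F (the 64 non-inverted characters) and the choice to skip over the newline bytes are assumptions made for this sketch, not details taken from the actual code.

```python
NEWLINE = 0x76
CHAR_MASK = 0x3F   # assumed mask: keeps codes in the range 0 to 63

def step_display(dfile):
    # Increment every character in place, leaving newline/halt bytes alone.
    for i, ch in enumerate(dfile):
        if ch != NEWLINE:
            dfile[i] = (ch + 1) & CHAR_MASK

if __name__ == "__main__":
    row = bytearray([NEWLINE, 0, 1, 62, 63, NEWLINE])
    step_display(row)
    print(list(row))   # [118, 1, 2, 63, 0, 118]
```

With that mask a character can never turn into $76, so the loop cannot accidentally create or destroy a newline.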
I did write a couple of faster versions, but it was a bit jarring to see it moving so fast, so I have stayed with the slower, more visually appealing version.
You could of course fill the remaining 7.5K of the ROM with whatever code you wanted.
Side notes
Timing is very important in code like this. A few cycles can make all the difference.
Here the top of the display is tearing to the left. This was because one of the lines (the transition between the top border and the first display line) was short, clocking in at 60.6µs instead of 63.7µs like the rest. The TV takes 30 or 40 scan lines before it readjusts.
(If the line is too long, it will tear to the right.)
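To put those times into Z80 terms, here is a quick conversion from microseconds to cycles at 3.25MHz. The measured times are rounded to one decimal place, so the cycle counts are approximate.

```python
CLOCK_MHZ = 3.25   # Z80 clock, cycles per microsecond

def us_to_cycles(microseconds):
    # Nearest whole number of CPU cycles for a given duration.
    return round(microseconds * CLOCK_MHZ)

if __name__ == "__main__":
    normal = us_to_cycles(63.7)
    short = us_to_cycles(60.6)
    print(normal, short, normal - short)   # 207 197 10
```

So the line that caused the tearing was only around 10 cycles short, which is just two or three instructions.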
The scanline counter used to index the character pixel data is in hardware (half of a 393 counter), and has to be duplicated in the Z80 (in the C register). These need to be in sync with each other or you can get effects like this where it is one line out, so you see bitmaps from lines 7,0,1,2,3,4,5,6 instead of 0,1,2,3,4,5,6,7 (see the graphics characters top left).
The counter is reset by the VSync pulse, so there needs to be a specific number of cycles between that and the line starting to keep them in sync with each other.
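The effect of the two counters drifting apart can be shown with a little modular arithmetic. This sketch just lists which bitmap line ends up on each scanline of a character row for a given offset between the hardware counter and the C register; the function name is made up.

```python
def bitmap_lines_shown(offset):
    # Bitmap line fetched on each of the 8 scanlines of a character row.
    return [(scanline + offset) % 8 for scanline in range(8)]

if __name__ == "__main__":
    print(bitmap_lines_shown(0))    # [0, 1, 2, 3, 4, 5, 6, 7]
    print(bitmap_lines_shown(-1))   # [7, 0, 1, 2, 3, 4, 5, 6]
```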
Here are all the timings if you are interested. See the previous post to find out how these were generated.
This differs slightly from the ZX81 in that it uses 312 lines and a 512µs VSync.
Collapsed display file
I did test a version with a collapsed display file, with an increasing number of characters per line. Using the "grey border" jumper on the Minstrel 3 you can see it is correctly halting at the end of the characters, and so the background is used for the rest of the line.
No newline / halt characters?
One thing I investigated was setting it up to run from a fixed block of 32x24 characters, rather than having to include an extra character at the end of each line.
It seems there are no easy answers there. I did try setting it up so the refresh counter would hit the end just after the 32nd character, but that forces the display hard to the right which doesn't work particularly well.
It seems HForth uses something similar, but also has shorter display lines to try to make those more central, which leads to some of the display problems we are seeing.
One way it could be done would be patching the display file on the fly. Read the 33rd character, store it, and then replace it with a newline. When it is time for the next line, restore that saved character and take out the 33rd character of the next line.
That _shouldn't_ affect user code as any changes should happen between the borders where no user code is running, so everything should have been put back as it was before control is returned to the user code in the bottom border. However, I am not sure there will be enough cycles in the maskable interrupt handler to deal with it.
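As a thought experiment, the patching idea might look like this. It is an untested sketch in Python rather than Z80, it says nothing about whether the cycles are available, and it assumes one spare byte after the 32x24 block to hold the final halt. All the names are hypothetical.

```python
HALT = 0x76
COLS, ROWS = 32, 24

def draw_frame(buf):
    """Return each row as the display would fetch it, leaving buf unchanged."""
    rows = []
    saved = None
    for row in range(ROWS):
        if saved is not None:
            buf[saved[0]] = saved[1]      # put back the previously patched byte
        end = (row + 1) * COLS            # the "33rd character" of this row
        saved = (end, buf[end])           # remember what was there
        buf[end] = HALT                   # replace it with a newline/halt
        start = row * COLS
        rows.append(bytes(buf[start:buf.index(HALT, start)]))
    buf[saved[0]] = saved[1]              # restore the last patch
    return rows

if __name__ == "__main__":
    # 32x24 characters with no newlines, plus one spare byte at the end.
    screen = bytearray(i % 64 for i in range(ROWS * COLS)) + bytearray([0])
    before = bytes(screen)
    fetched = draw_frame(screen)
    print(len(fetched), len(fetched[0]), bytes(screen) == before)
```

Each row is seen as exactly 32 characters followed by a halt, and the buffer is back to its original contents by the time user code would run again.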
Where's the source code?
The code for this, and a compiled binary demo ROM, are available on my GitHub. This might make the basis of a self test ROM or a utility ROM, a terminal emulator, or an 8K game maybe? If you do use it for anything, let me know. There is space in the Minstrel 3 ROM for more ROM images if anyone fancies having a go at something.
Adverts
The Minstrel 3 (with hardware support for compute and display mode) and lots of other things are available from my Tindie Store.
Patreon
You can support me via Patreon, and get access to advance previews of development logs on new projects like the Mini PET II and Mini VIC and other behind the scenes updates. New releases like this will be notified to Patreon first, if you want to be sure to get the latest things. This also includes access to my Patreon only Discord server for even more regular updates.