I started writing a blog post about all the Minstrel 4th alternate ROMs. As part of that, I wanted to include some extra ones, such as the "faster than a ZX81" version etc.
Whilst doing that, it seemed appropriate to revisit that and see if I could make that version any faster, and so I started writing a second blog post, one on optimising code for speed.
As I started to write that up, I thought it would be good to include some diagrams and some actual calculations of the speed differences.
Otherwise it would just be an awful lot of words, interspersed with the occasional screenshot from 3D Monster Maze, as many of my blog posts tend towards.
Before I go any further, I will just clear up a few things. You can skip these if you have read the previous posts on ZX81 BASIC for Minstrel 4th.
Firstly, this is about the optional and very much experimental ROMs I am providing to go with the Minstrel 4th. These allow it to run ZX81 BASIC pretty well (other than high res graphics etc.). It works by replacing the routines that draw the screen on the ZX81 with a routine that parses the ZX81 display from memory (which is 24 rows of 0-32 characters plus a newline character) and writes it to the dual port video RAM on the Minstrel 4th (which is fixed 24 rows of 32 characters with no newlines). Whilst that is happening, there is a microcontroller on the Minstrel 4th which is reading the other side of the dual port RAM and using it to generate the video signal.
The second thing is that the ZX80 ran in two states, "compute" and "display". In "display", it generated the screen and scanned the keyboard to decide if it needed to run any code. When it wanted to run code, it had to stop the display and switch to "compute", where it ran user code until the user code was finished, then it returned to "display". This became known as "fast mode", as the code runs at full speed when it is running (although it flickers annoyingly every time you press a key, when it runs the code to process the keypress).
The ZX81 added a third state, so it now had a choice of "compute", "display" and "compute AND display". In the new mode, it ran the user code during the top and bottom borders of the screen. This meant the code overall ran slower (about 1/4 of the speed), so it was referred to as "slow mode", but it didn't flicker and made games look better.
How can I visualise how fast?
Previously, I have been using a utility named CLCKFREQ by Carlo Delhez. I found it on the internet probably a decade or two ago.
It runs various tests, drawing some pretty pictures.
The program is all written in BASIC, and uses the system frames counter at 16436 and 16437. It sets it to $FFFF at the start, and then the ZX81's own display code counts down that timer each frame.
At the end, it works out the difference to get the number of frames counted since the start.
It then gives you a figure in terms of the number of frames taken, and a comparison with the standard UK ZX81, which takes 1863 frames to complete the tasks.
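As a rough sketch of that arithmetic (in Python rather than ZX81 BASIC, and not taken from the CLCKFREQ listing itself), the frame count and percentage could be worked out like this. The function names are my own, and it assumes the counter is simply read back as a 16 bit value and has not wrapped around during the test.

```python
# Sketch of the frame counting arithmetic. Only the addresses (16436/16437),
# the $FFFF start value and the 1863 frame reference come from the post;
# the function names are illustrative.

REFERENCE_FRAMES = 1863   # standard UK ZX81, as reported by CLCKFREQ
START_VALUE = 0xFFFF      # value written to the counter before the tests


def frames_elapsed(low_byte: int, high_byte: int) -> int:
    """Frames counted down since the counter was set to $FFFF.

    low_byte is the value PEEKed from 16436, high_byte from 16437.
    Assumes the counter has not wrapped during the test.
    """
    current = low_byte + 256 * high_byte
    return START_VALUE - current


def speed_percent(frames: int) -> float:
    """Speed relative to the reference machine (100 means the same speed)."""
    return 100.0 * REFERENCE_FRAMES / frames


if __name__ == "__main__":
    frames = frames_elapsed(184, 248)   # counter reads $F8B8 at the end
    print(frames, "frames,", round(speed_percent(frames), 1), "% of ZX81 speed")
```

Fewer frames means a faster machine, so the percentage is the reference count divided by the measured count, not the other way round.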
My screen drawing code was initially running about 26% faster than the ZX81, according to that program.
One of the first revisions to the new Minstrel 4th ROM was adding a delay to that to slow it down to match 100% of the ZX81 speed.
I am relying on that test program being accurate, but at least it appears to be proportional to the time it takes, so the results seem to be representative.
Can I do better (or at least confirm the results?)
I decided to try hooking up the logic analyser to show the time it took the microcontroller to draw the picture content of the screen, alongside the time it took the code in the modified ROM to parse the ZX81 display RAM and write it to the Minstrel 4th video RAM.
Here I have the first block in blue showing the ATmega microcontroller accessing the video RAM to draw the screen. Below that in yellow, the Z80 side copying the data into the video RAM. That starts at the top of the frame, so there should be no overlap with the drawing routine which starts after the top border. At the end is the vertical sync, just so we can see how it all relates. There is one frame of video starting with each vertical sync pulse, lasting 20ms, giving a frame rate of 50Hz for the monochrome PAL TV standard.
The third block in red was the tricky one. That shows when user code is being executed. I needed to see that separately, and also see that there is the delay between the screen copy and the user code to bring the timing into line.
How do I show when user code is being run?
What can I do in a user program that will be detectable on the logic analyser?
I came up with writing to the IO port that drives the speaker on the Minstrel 4th. I can tap that signal and add it to the graph.
All I need to do now is write some code that will do that.
Ah, there is no OUT command. I did actually add one during the development, but didn't quite work out how to do the matching IN command without lots of changes, so left them both out.
So I decided to stop and start writing a fourth blog post about adding those commands .....
Ah no, I realised that was taking things a bit too far, so I thought about the easiest way to do it.
I should be able to do it in two instructions (well, one label and two instructions)
LOOP:
OUT ($FE), A
JR LOOP
So that will write something (I don't care what, and neither does the hardware) to port $FE, and then loop back.
That works out as 4 bytes of code. That's not too bad to type in each time (because I am likely to have to do this three or four times at least).
$D3 $FE OUT ($FE), A
$18 $FC JR LOOP
The jump is relative, so it does not matter what address it is placed at, it will be a simple jump back to the previous instruction. $FC is -4, so that means go back 4 squares, to the start of the OUT command. Do not pass Go, do not collect $0200.
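To double check the $FC, here is a small Python sketch of how the displacement byte works out. It is not a real assembler, and the function names are made up for illustration. The Z80 adds the signed displacement to the address of the instruction after the two byte JR, which is why jumping back over both instructions is -4 rather than -2.

```python
def jr_displacement(jr_address: int, target_address: int) -> int:
    """Displacement byte for a Z80 JR located at jr_address.

    The signed offset is measured from the address of the instruction
    that follows the two byte JR, then stored as a two's complement byte.
    """
    offset = target_address - (jr_address + 2)
    if not -128 <= offset <= 127:
        raise ValueError("target is out of range for JR")
    return offset & 0xFF


def assemble_loop(origin: int) -> bytes:
    """Hand assemble LOOP: OUT ($FE),A / JR LOOP at the given origin."""
    out_instr = bytes([0xD3, 0xFE])                     # OUT ($FE),A
    jr_address = origin + len(out_instr)
    jr_instr = bytes([0x18, jr_displacement(jr_address, origin)])  # JR LOOP
    return out_instr + jr_instr


if __name__ == "__main__":
    # The bytes are the same wherever the code is placed.
    print(assemble_loop(0).hex(" "))
    print(assemble_loop(16514).hex(" "))
```

Both calls produce the same four bytes, which is the point of using a relative jump here.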
To load that into memory, I used the "modify a REM statement" trick (thanks to George Beckett for suggesting that method for the Minstrel Joystick manual)
I typed in the following:
1 REM TEST
10 POKE 16514,211
20 POKE 16515,254
30 POKE 16516,24
40 POKE 16517,252
When I ran that, it changed the REM statement to:
1 REM PEEK RETURN / UNPLOT
(That is still 4 bytes, three of them are keyword tokens, and one is just a forward slash character)
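If you would rather not convert hex to decimal by hand each time, the POKE lines can be generated from the bytes. This is a hypothetical helper in Python, not something I used for the post. It assumes the REM is the first line of the program, so its text starts at 16514 (the program area starts at 16509, followed by two bytes of line number, two bytes of length and the REM token).

```python
def poke_lines(code: bytes, address: int = 16514,
               first_line: int = 10, step: int = 10) -> list[str]:
    """Turn machine code bytes into ZX81 BASIC POKE statements.

    address is where the first byte goes; 16514 assumes the REM
    statement is the first line of the program.
    """
    return [
        f"{first_line + i * step} POKE {address + i},{value}"
        for i, value in enumerate(code)
    ]


if __name__ == "__main__":
    loop_code = bytes([0xD3, 0xFE, 0x18, 0xFC])   # OUT ($FE),A / JR LOOP
    for line in poke_lines(loop_code):
        print(line)
```

The REM statement needs at least as many characters after the REM as there are bytes to POKE, otherwise the POKEs will run on into the next line of the program.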
I could then modify line 10 to:
10 RAND USR 16514
And that should run the code (I also got rid of lines 20-40, but there is no need to, it just looked neater)
Save it at this point if you want to, because it does not test for the break key, it will just run until you press reset or pull the power out.
When I ran that, the speaker started buzzing. Excellent (well, actually not excellent, because it is a bit annoying, but never mind).
That shows up in the trace, I can now see which bit is screen drawing, screen copying and user code. The delay is the gap between the screen copy and the user code.
Excellent, right, now I need to go through all of the ROM options I currently have and capture the same sort of thing.
I went through the 100% ZX81 speed first. Then I switched the Minstrel 4th into NTSC mode. Here it generates video frames at 60Hz to suit the USA and other NTSC territories rather than 50Hz for its native UK and the rest of the PAL world.
I needed to enter the test program again, I had not saved it (as I thought it would be faster to type it back in).
This time I took a shortcut. I added the "PEEK" (by pressing SHIFT+NEW LINE and then O), that meant I could miss out the first POKE command. I don't think there is a way I can get it back into 🅺 mode to allow me to press the Y key for "RETURN", so I just added any character. The slash I could get with SHIFT + V, and again UNPLOT I couldn't get, so just any character. Then only two pokes were required.
The program is ready to go.
Looking at the trace, you can see the VSync is now 16.6ms, 60Hz, but everything else is about the same. The only difference in the way the picture is generated is the number of lines in the borders. The time to draw the screen and the artificial delay are the same, leaving a smaller slice at the end to run the actual code.
The speed test program shows it is now slower, but then again a ZX81 with the "USA" jumper or a TS1000 is slower than a UK / PAL ZX81.
Running the test program on a TS1000 (or rather a Minstrel 3 with the NTSC jumper set), the speed is shown as a little over half the speed of a ZX81.
So I made a version of the ROM, and adjusted the delay to match the performance of the TS1000. I have labelled that TS1000 BASIC for the Minstrel 4th (just in case anyone wants to run software designed for the TS1000 at its original speed).
(maybe I should add a version which runs at 100% the speed of a PAL ZX81, but generates NTSC video - leave it with me).
The other thing I thought I should do is include a version without the delay, and if possible, speed it up even more.
That turned into a whole post on its own.
But the result is clear.
Look how much more time there is to run code, it has to be faster.
The "CLCKFREQ" program says it is 222% of the speed of a ZX81. I'll take that.
There will be uses for that, but I wouldn't recommend it for everything. For example, anything which does not count frames to control its speed will run 2.2x faster. That means Rex will run 2.2x faster.
As an example, the game will not load any faster, as that is done in fast / compute only mode. The scrolling titles are a lot faster as that is BASIC in slow mode.
The maze generation, the "mists of time" in 3D Monster Maze will not be any faster as that also runs in fast mode, so the speed is unchanged. It is only when it is showing the maze and the player is moving around that it will be 2.2x faster.
I have given this version a slightly different boot screen.
2x81 do you see what I did there?
ZX81 Testing
It is visibly faster, and that program says 2.22 times faster, but I should check it against an actual ZX81.
OK, not quite an actual ZX81, but a Minstrel 3 and that is easier to see what is going on, and it appears to run at 100% of the speed of a ZX81.
The ZX81 / Minstrel 3 video generation is very different, so I will need to show different things on the screen, but there is no obvious "drawing the middle bit of the screen" signal I can tap.
There is a jumper on the Minstrel 3 that is useful for testing and investigation, "grey border".
There is an input to the 74HC165 shift register that controls what is output to the screen when no data is being clocked in. Normally this is 0V, so it is the same as the background colour and you get a white border that looks the same as the empty text areas.
To get the grey border, I feed in the line counter XORed with the 3.25MHz clock so it gives the chequerboard pattern with alternate lines inverted.
(Can you tell 3D Monster Maze was a big influence on me?)
There isn't actually a jumper for it, but if you remove the jumper and wire the central pin to 5V, you get a black border.
The ZX81 is in mourning for Rex's latest victim.
Time for the off peak return test program again.
It works very differently, so I am having to monitor different things here. The blue section now shows the video being drawn. Below are the NMI pulses in yellow, the user code IO writes in red and finally the VSync to put it all in context.
This is showing the "compute" in red (during the top and bottom borders) and "display" in blue (during the middle bit).
But because of its nature, the ZX81 actually executes about 110 slices of time per frame, 55 lines in the top border and 55 in the bottom, so there is overhead going into and out of the NMI handler and counting lines etc.
Previously, user code was running in a continuous block, so it was easy to quantify by measuring the overall time.
Here, the code runs for a while, but is interrupted at the start of every line, and there is a delay as the NMI is handled before the user code starts up again.
I wondered how I could quantify it, then I realised it was easy. The code I was running was this:
LOOP:
OUT ($FE), A ; [11]
JR LOOP ; [12]
The first instruction is 11 cycles, and the second is 12, so that will generate one pulse on the IO_WR_FE signal every 23 cycles.
All I need to do is count those pulses, and there is an option for that on the logic analyser, so yes, there we go.
In each of the two 3.36ms border sections, the code generates 345 pulses, so that is 7,935 cycles each border, a total of 15,870 cycles of user code per frame.
I can then do the same thing on the Minstrel 4th ROMs, so here is the theoretically 100% ZX81 speed.
That shows 692 FE writes, and a total of 15,916 cycles.
Wow, it worked!
I mean, of course it worked.
That shows the standard ZX81 BASIC for Minstrel 4th is within 0.3% of the ZX81, and that's well within experimental tolerance.
Let's check the others.
The "fastest" one was 1537 FE writes, so 35,351 cycles, working out at 2.22x.
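For anyone who wants to check the sums, here they are as a short Python sketch. The pulse counts, the 23 cycles per write and the 3.25MHz clock are the figures from this post; the names and the "share of frame" calculation are my own additions.

```python
# Cross-check of the pulse counts quoted above.
CYCLES_PER_WRITE = 11 + 12                      # OUT ($FE),A plus JR LOOP
CLOCK_HZ = 3_250_000                            # Z80 clock
FRAME_RATE_HZ = 50                              # PAL frames per second
CYCLES_PER_FRAME = CLOCK_HZ // FRAME_RATE_HZ    # 65,000 cycles in 20ms

# FE writes counted by the logic analyser in one frame.
WRITES_PER_FRAME = {
    "Minstrel 3 (ZX81)": 345 * 2,               # two border sections
    "Minstrel 4th, 100% ROM": 692,
    "Minstrel 4th, fastest ROM": 1537,
}


def user_cycles(writes: int) -> int:
    """Z80 cycles of user code represented by a number of FE writes."""
    return writes * CYCLES_PER_WRITE


def relative_speed(writes: int, baseline_writes: int) -> float:
    """How many times more user code ran per frame than the baseline."""
    return writes / baseline_writes


def frame_share(writes: int) -> float:
    """Fraction of a 20ms frame that was spent running user code."""
    return user_cycles(writes) / CYCLES_PER_FRAME


if __name__ == "__main__":
    baseline = WRITES_PER_FRAME["Minstrel 3 (ZX81)"]
    for name, writes in WRITES_PER_FRAME.items():
        print(f"{name}: {user_cycles(writes)} cycles, "
              f"{relative_speed(writes, baseline):.3f}x, "
              f"{frame_share(writes):.1%} of the frame")
```

The fastest ROM comes out at about 2.22x against the 692 writes of the 100% ROM, and about 2.23x against the 690 writes measured on the Minstrel 3. The Minstrel 3 figure is also a little under a quarter of the 65,000 cycles in a frame, which fits with slow mode being about 1/4 of the speed.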
That also shows that test program is giving great results as well.
How fast is fast?
Finally, what about "fast mode".
I could test that by adding in a call to switch to FAST mode in the test program.
But the ZX81 does not generate any video at this point, so I wouldn't have any VSync for reference.
I could just base it on mathematics. A 20ms frame at 3.25MHz is 65,000 cycles, that should be 2,826 of the 23 cycle IO writes.
Ah, but, the Minstrel 4th does generate video.
I have it set to fill the screen with the chequerboard character when it switches to fast mode (to replicate the mists of time, i.e. your 1980s black and white TV showing snow because the picture has stopped being generated - although this is a bit of a fallacy, a false memory induced by TV programs wanting something more visual. In practice, the screen just goes black and maybe flickers a bit, as the modulator is still generating an output at the frequency the TV is tuned to. Snow would only be there if the modulator power was removed or the TV re-tuned).
That shows 2,822 writes, pretty close to the 2,826 that the maths predicted (again, well within the tolerance of the experimental error here).
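The same prediction as a small Python sketch, using the 3.25MHz clock, 50Hz frame rate and 23 cycle loop from above. The function names and the percentage shortfall are my own additions.

```python
def expected_writes_per_frame(clock_hz: int = 3_250_000,
                              frame_rate_hz: int = 50,
                              cycles_per_write: int = 23) -> float:
    """FE writes per frame if the loop ran with no interruptions at all."""
    cycles_per_frame = clock_hz / frame_rate_hz
    return cycles_per_frame / cycles_per_write


def shortfall_percent(measured: float, expected: float) -> float:
    """How far the measured count falls below the prediction, in percent."""
    return 100.0 * (expected - measured) / expected


if __name__ == "__main__":
    expected = expected_writes_per_frame()
    print(f"expected {expected:.1f} writes per frame")
    print(f"measured 2822, shortfall {shortfall_percent(2822, expected):.2f}%")
```

The difference is 4 writes in a frame, which is 92 cycles out of 65,000. Passing frame_rate_hz=60 gives the equivalent prediction for the NTSC frame.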
Conclusion
I am happy with that, and hopefully even with all the numbers removed you can see the differences between the various implementations.
Two more posts to follow then. One will go over all the versions, using these timings for reference, and the second will cover code optimisation techniques used to get the speed increases.
And then maybe another on ZX81 display generation to go along with work George is doing on trying to understand Husband Forth and maybe get it working on the Minstrel 3 and even 4th.
Oh, and I also want to have a go at Lambda 8300 BASIC. I think this might run nicely on the Minstrel 4th with a modified display routine. It is not tokenised so you type P R I N T rather than pressing P, which might make it easier to use without the correct keyboard overlay. It also has sound and joystick commands.
I haven't found a disassembly for it yet, I might have to give in and do that myself.
Oh, and I need to get back to optimising the glue logic for the Mini PET II. I have a great idea that might simplify things quite a lot if I can get it to work. That has been simmering away nicely in the back of my head for a week now, really need to give it a go.
Oh, and then back to the Mini VIC.
AAarrrrgggh, too many things to do, not enough time!
Adverts
The Minstrel 3 and Minstrel 4th (with the updated ROM including ZX81 BASIC) are available from my Tindie store.
Either in kit form, or assembled if you prefer.
- https://www.tindie.com/products/tynemouth/minstrel-3-with-keyboard-z80-based-zx81-kit/
- https://www.tindie.com/products/tynemouth/minstrel-4th-z80-forth-rc2014-jupiter-ace-kit/
They are also available from Z80kits.com, home of the RC2014.
There you can get a bundle with the Minstrel 4th, RC2014 backplane and whatever modules you want to go with it.
Patreon
You can support me via Patreon, and get access to advance previews of development logs on new projects and behind the scenes updates. New releases like this will be notified to Patreon first, if you want to be sure to get the latest things. This also includes access to my Patreon only Discord server for even more regular updates.