Sunday, 18 January 2026

A New CRTC for the Minstrel 4th? Part 2

Continuing a series of posts from the last year on Patreon....

In the previous post, I looked at my options for replacing the (apparently) end of life ATmega48PA microcontroller in the Minstrel 4th which acts as the CRT Controller, generating the video sync and clocking through the video RAM.

The best option seemed to be rewriting the code to run on one of the new AVR series of microcontrollers.

Writing the code

I have vanquished the beast of the framework for another project, and I am now able to write plain assembler and build from the command line.

On the Minstrel 4th, this chip controls various latches and buffers to be able to read the video RAM. This contains both the character data and the character bitmaps for each character.

The screen is generated on the fly, so the microcontroller first has to read the character. It then uses that character code and a line counter to address the character bitmap and get the 8 pixels it needs to feed to the shift register (as well as latching in the character inverted state). All this has to be done in 8 cycles per character.

It is also responsible for generating the composite sync signal, which has to be cycle exact. The 154ns difference that one cycle makes is noticeable on the video output as a jaggedy edge to the screen or wobbly characters.

It seems I have the facilities I need here. These chips are based on the later version of the AVR architecture, and most of the IO control operations are a single cycle, such as setting a pin high or low, or changing a whole 8 bit port. And with reference to a previous post, on these chips, one instruction cycle does equal one clock cycle.

Pinouts

On the old ATmega48, I had two ports (PC0-5 and PD0-4) where I could write a counter value directly to the port to provide an address on its lower 5 bits.

The ports available on the new chip are different. I have an 8 bit port (PD0-7), a 4 bit port (PC0-3) and a 2 bit port (PF0-1). There is another 8 bit port (PA0-7), but I lose PA0 and PA1 for the crystal and PA7 for the clock out (all of which are fixed positions, like port B was on the ATmega48).

That only leaves me with PD0-7 for one counter. For the other I have only PA2-PA6, but I think I can get around that by counting up in fours.

(N.B., I had to search through quite a few old datasheets to find the picture of the ATmega48 in the "new" colourful style. The current datasheet on their website has gone back to the "old" black and white style. I give up)

HSync timing

The most critical timing is the start of the horizontal sync pulse. There needs to be a consistent period of 64µs from the start of one pulse to the start of the next or the picture will drift. We saw that in a previous post where the line length changes and the picture shears to the side.

There are three ways I have considered to achieve that.

1) Count cycles

Make sure there are exactly 416 cycles between horizontal sync pulses. That means counting each instruction, and also making sure that all paths through conditionals and loops take exactly the same number of cycles each time. That is quite hard work. That was the method I used on the original Minstrel 4th and Mini PET. (416 * 1/6.5MHz = 64µs)

2) Timer interrupt

I rewrote both the Minstrel 4th and the Mini PET to use a timer interrupt. I would put the microcontroller to sleep at the end of a line, and it would get woken up when an internal timer had counted to 64µs. It would then continue by setting the composite sync low to mark the start of the horizontal sync. This would always happen at the exact time, and so the edges were always clean, no matter when it had finished the line and gone to sleep, it would always be woken at the right time.

3) External interrupt

As above, but the way I am doing it on the Mini PET II is using a hardware clock divider to divide the 16MHz clock by 1024 to get the 15.625kHz (64µs) I need. In this case, I don't think I will have a spare pin, and I would rather not add the extra hardware to do the division.
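Options 1 and 3 both come down to the same 64µs line period, just reached from different clocks. A minimal sketch of that arithmetic follows, in Python purely for the sums (the CRTC code itself is assembler); the function name period_us is made up for illustration.

```python
from fractions import Fraction

def period_us(clock_hz, divide_by):
    """Period in microseconds of a clock divided down by an integer count."""
    return Fraction(divide_by * 1_000_000, clock_hz)

# Option 1: count 416 cycles of the 6.5MHz clock.
line_by_counting = period_us(6_500_000, 416)
# Option 3: divide a 16MHz clock by 1024 in hardware.
line_by_divider = period_us(16_000_000, 1024)

# Line rate in kHz, and the length of a single 6.5MHz cycle in ns.
line_rate_khz = float(1000 / line_by_divider)
cycle_ns = float(period_us(6_500_000, 1) * 1000)

print(line_by_counting, line_by_divider)  # 64 64
print(line_rate_khz)                      # 15.625
print(round(cycle_ns, 1))                 # 153.8
```

Using Fraction keeps the division exact, so a line that is not a whole number of cycles would show up as a fraction rather than being hidden by rounding.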

Let's go with Option 2

My plan was to go for the second option again, as that seemed to offer the best / neatest / shortest code.

As we saw in a previous post (that was all a bit ranty so I cut it out, sorry), the interrupt handlers generated by the framework were 120 bytes or so, without even any user code, so I couldn't use those.

So I wrote my own. My plan was to use the version I had on the Minstrel 4th, go to sleep and set the interrupt to simply wake the user up and return.

It's a little more complicated here as I also need to clear the relevant interrupt flag, which involves writing to one of the registers, and that is one that requires additional instructions to access. I managed to get that down to a few cycles by pre-setting the address and the value to set it to in spare registers.

That seems to be working, the interrupt times are approximate, just testing it out. In theory, I can set an interrupt for each part of the video signal. The horizontal sync, the back porch, the border, the start of the text etc.

I modified the previous code to set up the new style timers and do all the clock initialisation stuff, but it didn't seem quite right.

I was not getting consistent pulse lengths, one or two cycles each way.

- Much time passes -

I tried various other options, including using multiple timer and compare interrupts to generate an interrupt at all the critical points of the line.

One interrupt would fire, set CSync low then return. The next would fire 4.7µs later and set CSync high. Then 5.7µs later another would fire and turn the blanking off, 6µs later another would start the characters being drawn etc.

It was a lot of work restructuring everything to fit into that architecture (I hate special cases, so was trying to make everything work in a neat and consistent way).

And it didn't work.

It was still occasionally a cycle or two out, no pattern, no consistency.

- Even more time passes -

I had to adjust my timings as apparently even in idle mode, where it is not meant to stop the clock, it would take 6 cycles to start the interrupt.

It seems that means "up to 6 cycles" and may explain the inconsistency?

- Expletive deleted -

This was very frustrating, I spent way too much time getting to this point and trying to make it consistent, but it was just not doing it.

- Expletives deleted -

I finally decided to go back to option 1, cycle counting.

Back to Option 1

I am going to have to cycle count the entire thing.

Twice.

The initial version will be for PAL, and I will have to go through again and cycle count a version with timing suitable for NTSC.

Here we go again....

One approach to this is to unroll everything, have pages and pages of NOPs and no conditional expressions, no loops etc.

By my calculation, that would be 129,792 cycles per frame. I could just about fit that into the 131,072 bytes of program memory in the 128K version of the chip, with a few hundred bytes available for initialisation etc.

That would not leave space for the 108,468 cycles for the NTSC frame that the chip also needs to generate, based on a jumper setting.
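Those two frame totals are just lines multiplied by cycles per line. A quick sketch of the sums, assuming the 312 lines of 416 cycles for PAL and the 262 lines of 414 cycles for NTSC that are settled on later in this post (the helper name frame_cycles is hypothetical):

```python
def frame_cycles(lines, cycles_per_line):
    """Total clock cycles in one progressive frame."""
    return lines * cycles_per_line

pal_cycles = frame_cycles(312, 416)
ntsc_cycles = frame_cycles(262, 414)

print(pal_cycles)                # 129792
print(ntsc_cycles)               # 108468
print(pal_cycles + ntsc_cycles)  # 238260
```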

Oh, and unlike the other AVRs, this variant only goes up to 64K anyway.

Looks like I am going to be using loops then.

It's 1973, almost dinner time, and I'm having Loops

With a loop there are three different paths through, which each take a different number of cycles.

These are the first pass, the intermediate passes and the last pass.

Regard this simple salt cellar, I mean regard this simple loop

  •     ldi COUNTER, 19            ; [1]
  • loop:
  •     dec COUNTER                ; [1]
  •     brne loop                  ; [2/1]

The first time through we get the ldi (load immediate, 1 cycle), the dec (decrement, 1 cycle) and the brne (branch if not equal, i.e. not 0). The counter is not yet 0, so it branches back, and that's 2 cycles. 1+1+2 = 4 cycles

The next 17 times we get the dec and the brne (with branch), 1+2 = 3 cycles

The final pass is the dec, but the counter is now 0 so the brne does not branch and is just 1 cycle, so 1+1 = 2 cycles

That actually balances out if you just want a delay, 4 + (17x3) + 2 = 57 = 19x3
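To double check that sort of count without squinting at the listing, the loop can be stepped through in a few lines of Python. This is only a model of the ldi / dec / brne loop above, assuming 1 cycle for ldi and dec, 2 cycles for a taken brne and 1 for an untaken one; the function name is made up.

```python
def delay_loop_passes(count):
    """Cycle cost of each pass through: ldi / loop: dec / brne loop.

    Returns a list with one entry per pass. The ldi is charged to the
    first pass only, as it runs once before the loop is entered.
    """
    passes = []
    counter = count & 0xFF
    cycles = 1                          # ldi COUNTER, count
    while True:
        counter = (counter - 1) & 0xFF  # dec, 8 bit wrap like the real register
        cycles += 1
        if counter != 0:
            cycles += 2                 # brne taken, back to loop
            passes.append(cycles)
            cycles = 0
        else:
            cycles += 1                 # brne not taken, fall out of the loop
            passes.append(cycles)
            return passes

passes = delay_loop_passes(19)
print(len(passes))                          # 19
print(passes[0], passes[1], passes[-1])     # 4 3 2
print(sum(passes))                          # 57
```

Loading 19 gives 19 passes in total (one first, 17 intermediate, one last), so the whole delay is 3 cycles per count.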

But what if you were to add an IO operation in the middle of it, say to write out the counter to port A?

  •     ldi COUNTER, 19            ; [1]
  • loop:
  •     out VPORTA_OUT, COUNTER    ; [1]
  •     dec COUNTER                ; [1]
  •     brne loop                  ; [2/1]

That just adds 1 cycle to each pass.

  • The first time through we get 1+1+1+2 = 5 cycles
  • The next 17 times we get 1+1+2 = 4 cycles
  • The final pass is 1+1+1 = 3 cycles

It might look like you could just treat them all as 4 cycles and the first and last would balance out, but look at where the IO instruction occurs. During the loop, you can think of it as the first instruction, but in the first pass the ldi is executed first, so it is the second instruction in. And the loop exits only two cycles after the last IO instruction, rather than the usual three.

That means in practice, whatever came before needs to be one cycle shorter than normal, and there needs to be an extra one cycle delay before the next one starts.
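The offset is easier to see if you list the cycle on which each out actually happens. A rough Python model of the ldi / out / dec / brne loop, with the same assumed instruction timings as before (1 cycle each, 2 for a taken branch) and a hypothetical function name:

```python
def out_timeline(count):
    """Model: ldi / loop: out / dec / brne loop.

    Returns (out_cycles, exit_cycle): the cycle number (counting from 0 at
    the ldi) on which each out executes, and the cycle on which the first
    instruction after the loop would start.
    """
    t = 1                               # ldi takes cycle 0
    outs = []
    counter = count & 0xFF
    while True:
        outs.append(t)                  # out executes on this cycle
        t += 1
        counter = (counter - 1) & 0xFF  # dec
        t += 1
        if counter != 0:
            t += 2                      # brne taken
        else:
            t += 1                      # brne not taken
            return outs, t

outs, exit_cycle = out_timeline(19)
print(outs[:3])                 # [1, 5, 9]
print(exit_cycle - outs[-1])    # 3
print(exit_cycle)               # 76
```

The outs land every 4 cycles, but the first one is 1 cycle after the ldi starts and the code after the loop starts 3 cycles after the last one rather than 4. The total is still 4 cycles per count, it is just shifted by one.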

Conditionals also need to be counted, for example, at the start of every frame I need to read the PAL/NTSC pin and decide which type of frame to draw.

  • start_of_frame:
  •     sbic VPORTF_IN, PAL_NTSC_PIN        ; check PAL/NTSC pin
  •     rjmp pal_frame                      ; draw a PAL frame
  •     rjmp ntsc_frame                     ; draw an NTSC frame

The sbic instruction skips the next instruction if the bit in the IO port is clear, so if the NTSC jumper is not set, it executes the rjmp instruction to relative jump to the PAL frame code (rjmp is 2 cycles and can jump +/-2K, and my code is only about 2.5K, so saves one cycle over a normal jmp instruction). If the jumper is set, it skips that instruction and rjmps to the NTSC frame code instead, but that adds an extra 2 cycles, so the NTSC code needs to be 2 cycles shorter than the PAL code to compensate.

I also need to deal with resetting row, column and line counters, checking for line >7, row>24, column>32 etc. so there are a lot of cases to check.

There is also the weird vertical sync section of short and long sync pulses, all of which need to match.

This was a lot easier with the external clock on the new analyser, I could scan along the rows, checking each one was the same number of cycles, and spotting the ones where I was a cycle out here and there due to the conditionals etc.

Testing and Tweaking

I think I must have done this sort of thing too many times, it only took a couple of hours to put something together.

Not bad for a first attempt. I have disabled the character read so it is showing random character data.

The display lines were too long in the middle, and there is a problem with the top line, or one nearby, so it skews the picture for a few lines.

I fixed the display line, now for the distortion at the top.

The last line of the top border was one cycle short, 59.076µs vs 59.228µs for all the others. The difference is about 152ns, or one clock cycle.

One nop added, and it is now fixed. (it was the extra cycle after the loop I described above and then fell for myself)

OK, time to add back in the characters.

Oops, forgot to update the character row counter, so I have 24 copies of row 0, but they look OK.

Looking good.

Once it was all sorted, all the lines were the same length, and the whole thing was exactly the calculated number of cycles.

Time for more testing.

The black line on the side flashed up like that before it booted.

It went away when booted. I think I know what that is about. When it boots, it fills the screen with inverse not-space ($FF) (to avoid conflicts when writing the ZX81 character set to the font RAM). When ZX81 BASIC starts, it clears the screen with normal non-inverse spaces ($00).

OK, so the normal display is working OK (10 PRINT "X"; because the semicolon is SHIFT X and I am lazy). The black lines do not appear with non-inverse characters.

Let's try the inverse and see if I am right.

Looks OK? But check out the last character on each line.

There are two extra black pixels on inverted characters at the end of the row.

The invert signal is latched separately from the character data, and I need to manually clear the latch at the end of the line.

I used to have to wait 6 cycles to clear this, but now it is set earlier, so I only need 4.

I swapped the order of a couple of instructions and it's back where it should be.

Cycling Efficiency Test

Previously, I needed all 8 cycles per character to set everything up, but by rearranging the ports I was able to set some bits with the column / line counters, and so was able to shave off a couple of instructions, three in fact.

I made use of two of those to limit the time the dual port video RAM chip was selected.

In the old version, I did not have any spare cycles, so turned on the video RAM chip select before the first character, and turned it off 32 characters later. That meant it was occupied for a while and increased the chance of collision where both sides of the dual port RAM were trying to access the same section at the same time.

Now I have spare cycles, I had space (well, time) to turn the VRAM chip select on and off. I got it down to 4 cycles on, 4 cycles off, which is an improvement on the 264 cycles on, 0 cycles off in the previous version.

That should help reduce the potential for snow on the screen when the video RAM is occupied.

NTSC

The next job was to do all of that again, but for NTSC.

And the first part of that was deciding which NTSC.

It seems more straightforward to do the PAL version. The lines are 64µs long, and there are 625 lines interlaced, so 312.5 lines progressive. 

Ignoring the half line, generating 312 lines x 64µs gives 19.968ms per frame, which gives a frame rate of 50.08Hz. 64µs is exactly 416 cycles at 6.5MHz, so that sounds good to me.

It also means the half line sync pulses are 208 cycles long, so that's all still neat.

The NTSC spec (a mythical thing I have never seen) uses a frame rate of 30/1.001 interlaced frames per second, so 16.683ms per field. With 525 lines per frame (262.5 per field), that makes it 63.55µs per line.

Some implementations use a fixed 64µs for both PAL and NTSC, including the ZX81 and the Commodore PET.

I thought I would see if I could get the 63.55µs line length working, but it doesn't work out as well.

413 cycles is close, giving 63.54µs. That makes a progressive frame of 262 lines 16.647ms or 60.071Hz, which feels a bit high.

The half lines don't split easily, they would have to alternate between 206 and 207 cycles.

I decided to go with 414 cycles, giving 63.69µs. That means all the half lines are 207 cycles.

By having 262 lines rather than 262.5, and a slightly longer line length, the frame works out at 16.687ms or 59.925Hz, which is closer to the spec.
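For comparison, here are the two candidate line lengths side by side. This is only a sketch of the sums above, assuming the 6.5MHz clock and 262 line progressive frame from the text, with 60/1.001 Hz taken as the target field rate (two fields per 30/1.001 frame); the names are made up for illustration.

```python
CLOCK_HZ = 6_500_000
TARGET_FIELD_HZ = 60 / 1.001   # two fields per 30/1.001 interlaced frame

def line_option(cycles_per_line, lines=262):
    """Timing figures for one choice of line length in clock cycles."""
    frame_cycles = cycles_per_line * lines
    rate_hz = CLOCK_HZ / frame_cycles
    return {
        "line_us": cycles_per_line * 1e6 / CLOCK_HZ,
        "frame_ms": frame_cycles * 1e3 / CLOCK_HZ,
        "rate_hz": rate_hz,
        "error_hz": abs(rate_hz - TARGET_FIELD_HZ),
        # An odd line length cannot be split into two equal half lines.
        "half_lines": (cycles_per_line // 2, cycles_per_line - cycles_per_line // 2),
    }

for cycles in (413, 414):
    opt = line_option(cycles)
    print(cycles, round(opt["line_us"], 2), round(opt["frame_ms"], 3),
          round(opt["rate_hz"], 2), opt["half_lines"])
# 413 63.54 16.647 60.07 (206, 207)
# 414 63.69 16.687 59.93 (207, 207)
```

The 414 cycle line is further from the nominal 63.55µs, but it ends up nearer the target rate and gives equal half lines, which is why it wins here.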

Less adjusting to do now as most of it is just a case of shortening the front and back porch elements by 1 cycle each.

Again, it is just a case of looking for lines which are too long or too short and adjusting the timings until they all match.

This was the start of the frame where it did the PAL/NTSC test.

All done, and the NTSC output is looking good.

Sorted.

Phew.

I wish I had just done that a couple of weeks ago, but it was the lure of such a neat solution using the interrupts to trigger all the changes and sleeping most of the time. (which is ironic as I spend too much time sleeping, interrupted at the wrong times by doorbells or phone calls)

I am frustrated I didn't get it to work and annoyed at myself for spending so long only to fail.

What's next?

That was all running from the Curiosity Nano development board, wired up to a Minstrel 4th.

That uses a different microcontroller to the one I will finally use (44 pin vs 28 pin), so the next step is to rewire that to a 28 pin chip and check everything is still OK.

(it should be OK, but the errata sheets on the manufacturer's website are full of things which don't work in some packages for whatever reason, the smaller packages sometimes have fewer LUTs and timers etc. and the links between them sometimes don't work when some are missing)


Adverts

Check out my Tindie store for all sorts of kits, test gear and upgrades for the ZX80, ZX81, Jupiter ACE, Commodore PET.


Patreon

You can support me via Patreon, and get access to advance previews of development logs like the ones these posts are based on, and progress on new projects like the Mini PET II and Mini VIC and other behind the scenes updates. Mini PET II Parts 13 and 14 are part written, I just need to do some prototyping and get some screencaps to complete part 13 and then decide if part 14's plan is even possible. This also includes access to my Patreon only Discord server for even more regular updates.