Sunday 30 April 2023

Mini VIC Development Part 4 - Video out tests

The last of the Mini VIC development logs for the moment. From February 2022.

Over the years, I have come up with all sorts of ways of generating a video signal from a microcontroller. The different technique have pros and cons, and limits. The MiniVIC project has pushed the limits of the techniques and devices I have previously used, so I a need to explore some new ones.

Practically speaking, I need to generate sync signals and video data. For composite video, composite sync (CSync) is generated from a vertical sync and a horizontal sync (HSync), and then merged with the video data in the analogue domain.

Vertical sync is one (usually negative) pulse for each frame, at a rate of 50Hz or 60Hz depending if it's PAL or NTSC.

The frame is composed of a number of lines (312 / 262 for PAL / NTSC), each of which are 64uS, a rate of 15.625KHz (or 63.5us / 15.750KHz for NTSC).

It is very important when generating these signals that the timings remain consistent. Particularly the start of the HSync pulse. If that is out by only one microcontroller cycle, you will end up with wobbly vertical lines.

One of the things I wanted to play with was the Atmel Start tool for configuring a project and automatically generating code ( From previous experience, I will probably end up refactoring the massive number of files it generates into a single file, and then likely re-implementing the whole thing in assembler anyway. But, let's give it a go, it seems to have a lot of potential.

You setup your project, pick your microcontroller, and then can graphically configure the peripherals on the microcontroller. It seems this is the only way to find out some things which are missing from the datasheet. I couldn't find the magic numbers to unlock the clock select register, or the truth tables for the logic cells, and had to reverse engineer the auto generated code to find them.

You get to add various modules for timers, USARTs, ADCs, Digital IO etc.

You can rename some of these to help, but the names don't follow onto all the pages, so you need to keep notes as you go along of what timers you are using for what etc.

You can also name pins etc. and setup each pin individually to set if they are input or output, set the initial states, pullups etc.

You can setup and link all the clocks, but in most cases you only have the choice of the main clock, but it's nice to see.

You can use the "events" system to link things together. In this case, I am linking the overflow of the horizontal counter to the count input of the vertical counter. This is one of the pages were you don't get to see your renamed modules so you have to remember which is TCA0 and which is TCA1 etc.

This means the vertical counter will automatically count up each time a horizontal line is completed. With the horizontal and vertical counters linked up, I have set them in waveform generation mode, which is automatically generating the HSync and VSync signals I need. This requires some initial setup code, but after that, it requires no code when it is actually running. It just sits there generating the precisely timed sync signals. That's a really neat feature.

Finally, there is the combination logic, which is like a GAL chip inside the microcontroller, and you can set that up just on digital IO, or you can reference the internal peripherals. In this case, I am generating the CSync output automatically from the horizontal and vertical counters.

This is one of the things I couldn't work out from the datasheet. Apparently !(A^B) is table number 9, but I had to search through the 160 files in the project that Atmel Start generates to find that. (note it also generates a load of commented out code, not sure that helps)

There are various definitions of what would composite sync should be. The simplest is just ORing the active negative sync pulses to so that CSync is low for the entire VSync period and all the HSyncs. This is not ideal as the TV can loose track of the HSyncs and can take several lines to get back into sync (leading to tearing of the image at the top of the screen). The second is XORing those signal, so the CSync is inverted during the VSync period.

This is better, but also not ideal as the transitions are a bit messy.

Finally there is the third version which is the correct one. It uses specially formatted lines with a pattern of shorter and longer HSync pulses during the VSync period. That is the version I generate in the Mini PET, and what I will end of doing on the Mini VIC, but for the time being, the XOR version will do fine.

Most of these peripherals have the options of driving several alternative pins, but these still seem to end up all over the chip. That is going to be a problem when I need several full 8 bit ports for the address and data busses, but I'll worry about that later.

My tinkering has lead to a usable display, with a few issues.

The vertical lines aren't perfect, which seems to be an issue with the USART output being used as a shift register to generate the video pixels. That is also an issue at the start and end of a line, leading to the thick first pixel, with visibly thicker pixels at the start and end of the lines.

I had problems aligning to the start and end of the USART data, which is going to be a problem synchronising this to the colour data which will need to change every eight pixels. At the end, I had to add a load of NOP instructions to wait until the data had completed being sent. The transmit buffer empty and data transmission flags both indicated it was clear, but there were still a byte and a half to send.

If I reduced the number of NOPs to try to tidy that up, it stopped working and would only ever send the first line. Here I toggle the invert bit on the output at the point it says transmission is complete. As you can see, it's no where near the end. The serial output is high when there is no data, so I planned to disable the USART and set it to a plain digital output set to low. That didn't work, so I ended up using the "invert this pin" setting to invert it, but that left the unsightly thick border at the end. A final option is just to keep sending 0x00 all the way to the start of the next line, but I'd rather not.

I think the USART in SPI mode (or SPI hardware itself) generating the pixel data is going to be problematic if I can't control when it starts or know when it stops.

One idea I had to get around that was to do the colour mapping internally and generate R, G and B pixel data, using three USART ports. In the capture below, yellow is the 4MHz dot clock used to clock these pixels out, and red, green and blue are their respective colour signals.

Here I am sending the character 0x42. Not for the obvious reason, but because it's a good character to test, 0100010, as you can see three times above. I was hoping I would be able to get them to synchronise, but they are not lining up. That will result in a white character with colour ghosting on one side or the other.

I had thought I would be able to send all three signals through a latch gated at the 4MHz dot clock, so the edges would like up (the same technique used on the Minstrel 3 to align the character and invert data). That would work as long they are not more than one pixel out from each other.

But unfortunately not, it is not consistent from line to line, and wobbles about more than one pixel so that's not going to work. You can just about see the visible wobble on a single USART in the black and white image.

I tried lots of different ways to try to get those into sync, but couldn't get it to work. I tried resetting the baud rate counter before sending the bytes to see if that would help. I also tried staging them with a precisely timed delay just less than 1 character before second the second one etc. Nothing seemed to work consistently from line to line.

In the previous posts I had a bit of a go at the autogenerated C code for the AVR microcontrollers from Atmel Start. You may think that was unfounded, or biased, or that I am a stick-in-the-mud-grey-beard-assembler-programmer. It is true that my beard does have grey bits in it (I am quite pleased that it is taking on a very Roger Delgado look at the moment). However, during this development, I caught an example of the sort of thing I was complaining about.

I present the case for the prosecution:

This is from the .lss file generated by the compiler which shows the assembly generated (in black) from the C (in grey). Here I am setting the three data output registers to the value of 0x42. This time it has generated fairly optimal code, it loads 0x42 into a register and then does three STS store instructions to set each of the USART registers.

I added some code above this to set the baud rates, but did not make any changes to the three lines of existing code below. However, it somehow messed up the display.

Looking at the .lss I can see why.

It has replaced the STS instructions with various store instructions relative to X, Y and Z registers. Granted STS is 2 cycles and most of those are single cycle, but with all the setup required, it's using more instructions that the STS would have been. 21 cycles if that was does just using LDI and STS vs 24 for that lot.

It also messed up the relative timing of the point at which the registers were set. Ideally, I would set all three at the same time, but that's not possible as far as I can see, so setting them two cycles apart I can keep track of things.

That wasn't the cause of the USART outputs not being in sync, but when it messed up the code it made it a lot worse, and more difficult to diagnose what was going on because the relative measurements all changed.

This is one of the reasons you can have a bit of code which works fine, you change something seemingly unrelated, rebuild it and it stops working. If you are working with less precise timing, it probably doesn't make any difference, but here just one cycle out can mess up the display. I need to write this stuff in assembler so that once it is correct and working, it will not change.

The prosecution rests.

I found the same degree of variation with the SPI ports. There are only two of them on the AVR Dx series of chips, so I couldn't do the RGB outputs with this chip, but I gave it a try. I couldn't get even two in sync. I also couldn't get the divide by 6 I needed for a 4MHz clock from the 24MHz master clock anyway.

But I could get 4MHz out of the USART ports. When operating in fake Master SPI mode, they generate a clock on the XCK pin. I tried using one USART, and connecting the clock output from the XCK pin to the SCK clock inputs of the two real SPI ports in slave mode.

Blue was the USART, and red and green were the SPI slave ports. This sort of worked, but suffered from the same drift problem as it was still derived from the USART. It also drifted too far later on so that the synchronisation latch wouldn't have worked. I also had problems controlling the levels when not transmitting data (I tried disabling IO, using internal and external pullups and pull downs).

I had better results with an external 4MHz clock, but that would only give me two of the colours, and it was still varying a cycle or two here and there. Running in slave mode also ties up 4 pins per port, as SS is required to enable the output, MISO is the data, SCK is the external clock and although MOSI is not being used, it still has to be set as an input.

So progress of sorts. I have a working composite video generator using just the microcontroller and few external components. All of the synchronisation pulses are being generated by the timers internally. The only code involved is to set this up. After that, they run on their own and the actual code can go away and do other things (i.e. generate the pixel data). That should mean it's all cycle exact and not relying on interrupts (which may occur 1-4 cycles after they are fired depending on how many cycles the current instruction has left, not to mention the function calling overhead)

It looks like I am not going to have sufficient control over when data is clocked out of the USART or SPI ports and that is not being consistent line to line, so I think I am going to have to go back to an external shift register, but it has been interesting to try different methods out, if only to prove that they are not going to be good enough in practice.

That is the last development log for the moment. I think I have gone as far as I can with the microcontroller version to prove it's not going to work unless I go to something like an ARM chip, and then may as well be a Raspberry Pi, and then may as well just run an emulator on there,

I moved on to doing a discrete logic implementation of the VIC, but that was getting a bit silly, almost taking up a 2 pin VIC20 sized PCB full of logic chips.

There are also at least two people working on CPLD VIC chip replacements, so I am not sure if I will need to continue down this road.

To be continued.......


Minstrel 4D

No Mini VIC kits any time soon, but the Minstrel 4D kits are shipping now, you can order one from The Future Was 8 bit - SPECIAL OFFER - £15 off the Minstrel 4D and free shipping to the USA

More info in a previous post:


You can support me via Patreon, and get access to advance previews of posts like this and behind the scenes updates. These are often in more detail than I can fit in here, and some of these posts contain bits from several Patreon posts. This also includes access to my Patreon only Discord server for even more regular updates.