Sunday 28 January 2018

Commodore PET 8032 Repairs - Killer RAM Chips!

This is an old post, preserved for reference.
The products and services mentioned within are no longer available.

Today, a couple of PET 8032 boards in for repair. The first is one of those where I get worried as soon as I see the board.
This one has had a lot of work done to it already. That's usually not a good sign. Many chips have been replaced, others show signs of corrosion and water damage. This one's had a hard life.
The ROMs have all been socketed and some replaced, as have the video RAM and most of the 244 buffers.
This one's not booting, black screen, no sync, so the 6545 hasn't been initialised. You rarely see the garbage screen on these later PETs as the 6545 generates the video sync, so it only works when the PET has started to boot. Early PETs had a series of counters fed from the system clock, so they would generate a video output with no ROM or RAM and even with no CPU.
It looks like the 6545 has already been replaced (by a 6845, which is backward compatible), as has the 6502, now a Rockwell from 1980 rather than a MOS from 1982 as it would have been. Over to the PET diagnostics, and yes, we are getting video out, so the 6545 is OK.
Oh dear, that doesn't look good. It is showing that 12 of the 16 RAM chips have failed, one of the video RAM chips is bad, and only two of the ROM chips are reading correctly. Since some of the RAM is reading correctly, it's probably not going to be the RAM address buffering or refresh counters. The power supply is often the cause of such a wide-ranging failure, the -5V and 12V rails both being required only by the RAM. The supply rails were all reading correctly, but maybe a previous power problem had taken out the RAM. There are also signs of corrosion. Had this thing been running whilst submerged?
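For anyone wondering how a RAM test can point at individual chips: each 4116 holds one bit of every byte in its bank, so a bit that reads back wrong identifies one chip. Here is a minimal sketch of that mapping. It is not how PET diagnostics actually does it; the function name and the (bank, bit) numbering are my own assumptions for illustration, with two 16K banks of eight chips each.

```python
BANK_SIZE = 0x4000  # assumption: each bank is 16K, built from eight 16K x 1 chips


def failing_chips(address, expected, actual):
    """Return (bank, data_bit) pairs for every bit that read back wrong.

    The (bank, data_bit) labelling is hypothetical; a real board would map
    these to chip positions printed on the PCB.
    """
    bank = address // BANK_SIZE
    diff = (expected ^ actual) & 0xFF  # set bits mark the mismatches
    return [(bank, bit) for bit in range(8) if diff & (1 << bit)]
```

So a byte written as 0x55 and read back as 0x15 at address 0x1234 would point at the bit 6 chip in the lower bank.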
This era of boards had a very annoying construction technique: all the chip legs seem to be cut short and bent over. This makes desoldering them a real pain, and so although removing and replacing all the RAM would probably be the best option, it would be rather expensive in terms of time, and new-old-stock 4116 chips are in limited supply.
A ROM/RAM board seemed an easier option at this point, and it did resolve the issues of both the faulty RAM and the partly faulty ROMs. The only thing it can't address is the video RAM, but that just needed a replacement 2114 RAM and that fixed the issue.
A quick test and the keyboard was working, and it was loading from tape. I try not to put things back in the box as soon as I see signs of life, as it's not always that simple. And indeed in this case, it wasn't. After a while it appeared to lock up, and when reset, it started showing random keypresses from the keyboard. I checked nothing was leaning on the keys, and even unplugged the keyboard. Hmm, looks like the 6520 has failed.
I replaced that, and it was working again, keyboard reading correctly. I started further testing. The IEEE-488 port wasn't working, showing errors on both data (the rear 6520) and NDAC and ATN (the 6522). Then it locked up again. And this time when I reset it, the beep noise was wrong. Oh great, the 6522 has gone as well. I removed and socketed both of those. The beep was now right, but the keyboard was messed up again, so I fitted a second new keyboard 6520.
Once again, the keyboard was back to working, and the IEEE-488 was almost there, with a fault still on NDAC, which looked more likely to be a 3446 buffer. Before I got around to fixing that, it locked up and the keyboard went funny again. OK, at this point, all three 6520/6522 chips had been replaced, and were working, but I was reluctant to fit another. Time for a different approach. It seems something on the databus must have a faulty enable line and is enabling itself, or is being enabled at the wrong time, and is causing a bus conflict, as this thing appears to be killing chips.
One of the likely causes of things being enabled at the wrong time is the 74154 which provides 16 enable lines, one for each block of 4K in the system, but I see that has already been replaced. All the ROMs, and the 244 buffers which provide the buffered databus for the video circuitry had already been removed, socketed and replaced, so I removed those, and also took out all the 40 pin chips.
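To illustrate why the 74154 is a suspect, here is a rough model of a 4-to-16 decoder selecting one 4K block from the top four address bits. This is only a sketch: the function names are made up, and any extra gating on the real board is ignored. The point is that exactly one output should be active (low) at any time, so a faulty decoder could enable two things at once.

```python
def select_4k_block(address):
    """Return the index (0-15) of the 4K block a 16-bit address falls in.

    Assumes the decoder inputs are A12-A15, which is what one enable line
    per 4K block implies.
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return address >> 12


def decoder_outputs(address):
    """Return the 16 active-low outputs: the selected one is 0, the rest 1."""
    block = select_4k_block(address)
    return [0 if n == block else 1 for n in range(16)]
```

For example, select_4k_block(0x8000) gives 8, and decoder_outputs(0x8000) has a single 0 in position 8.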
I was going to start testing with a NOP generator, but first I thought I'd check that there was nothing untoward with the clock and control signals and nothing pulling the busses high or low.
Then I looked at the D6 line. Hmm, that's not right. There was a regular pulse permanently on the D6 line at about 15.6kHz (1MHz/64?). All the others were floating as expected, but this was being actively driven. The only thing left connected to D6 at this point was the RAM. Even with no CPU, that was being refreshed, and although I couldn't see any of the address lines at that sort of speed, they were all faster as part of the refresh cycles. I did find a 15.625kHz pulse on one of the outputs of the refresh address counter, so it was presumably a multiple of the update rate, pulling high for 63 of every 64 refresh cycles or something like that. So it looks like one of the RAM chips was indeed faulty and, without being asked, was constantly writing to the databus, fighting with anything else trying to drive the bus. This being an NMOS device, it would have been a FET pulling it high to 5V, not just a weak resistor like a TTL output, so this thing could have been responsible for damaging half the chips on the board.
With the two 4116 RAM chips attached to the D6 line removed, the 15kHz signal had gone and all looked well. I fitted new chips in their place and retested. I had replaced the 3446 buffer, and now the IEEE-488 was working as well. I went back and retried all the previous 6520 and 6522 chips, and unfortunately, all apart from one 6520 were showing faults now. Whilst the system hadn't been booting, they hadn't been writing to the bus, but as soon as it booted and they started writing to the databus and fighting with the RAM chip, their output drivers must have been damaged.
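The 1MHz/64 guess is easy to check. The sketch below assumes a plain binary counter stepping once per 1MHz clock cycle, which may not be exactly how the refresh counter on this board is clocked; the function names are just for illustration. Bit 0 of such a counter toggles at half the clock rate, and each higher bit halves it again.

```python
def counter_bit_frequency(clock_hz, bit):
    """Square-wave frequency on output `bit` of a binary counter."""
    if bit < 0:
        raise ValueError("bit must be non-negative")
    return clock_hz / (2 ** (bit + 1))


def bit_for_division(ratio):
    """Return which counter bit gives clock/ratio, for a power-of-two ratio."""
    if ratio < 2 or ratio & (ratio - 1):
        raise ValueError("ratio must be a power of two, at least 2")
    return ratio.bit_length() - 2
```

Under that assumption, a divide by 64 corresponds to bit 5 of the counter, and counter_bit_frequency(1_000_000, 5) works out to 15625.0, which matches the 15.625kHz seen on the scope.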
The board has now been running all morning with no signs of any further problems, so I think the culprits have been found.

Board 2

The second one also showed signs of previous work, although I was less worried on this occasion, as I had done the work. This was from an 8096, but is the same board as the 8032, the 64K RAM board having been removed.
This board had previously had a couple of RAM faults, including some unusual ones which led to me writing more and more RAM testing routines until I traced them down.
Now it was showing various errors which pointed to the 6520 or 6522, including sometimes not booting, some random keyboard activity, and jumping to the machine code monitor. Could it be that, or given this machine's history, could it be RAM again?
The chips weren't socketed, so I didn't want to go straight for changing them. I did some testing with PET diagnostics again (this time on the LCD version as my screenshots from the previous board were so awful). Some of the time it was reading ok, but I was seeing some occasional errors. One bit in the lower bank of RAM would occasionally fail on power on, but was otherwise fine. Another occasionally failed after running for a while.
Two intermittent RAM chips. Possibly doing the same thing as on the previous board, but not easy to see here as I couldn't remove the other chips to check.
It was clear the RAM was faulty, so I removed the two faulty chips and fitted some new (well, new old stock) 4116 chips. It's a judgement call at this point. 7 of 16 RAM chips have now failed, so is it time to remove them all and replace the full set, or remove them and fit a ROM/RAM board? For the moment, I replaced just the newly faulty chips. With those replaced, the symptoms all seem to have gone, and a long soak test with the 8296 diagnostics program showed no more errors.
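The idea behind soak testing for intermittent faults like these can be sketched in a few lines. This is not the 8296 diagnostics program, just a simplified illustration: it repeatedly writes a pattern and its inverse, reads them back, and counts how often each data bit comes back wrong. The FlakyMemory class is a made-up stand-in for real hardware, and the choice of bit 6 for the simulated fault is arbitrary.

```python
from collections import Counter


def soak_test(read_byte, write_byte, addresses, passes, pattern=0x55):
    """Write and read back `pattern` and its inverse for several passes.

    Returns a Counter mapping data-bit number to how many times it read wrong.
    """
    errors = Counter()
    for _ in range(passes):
        for value in (pattern, pattern ^ 0xFF):
            for addr in addresses:
                write_byte(addr, value)
            for addr in addresses:
                diff = (read_byte(addr) ^ value) & 0xFF
                for bit in range(8):
                    if diff & (1 << bit):
                        errors[bit] += 1
    return errors


class FlakyMemory:
    """Simulated RAM where bit 6 reads as 1 on every `every`-th read."""

    def __init__(self, every):
        self.cells = {}
        self.every = every
        self.reads = 0

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        self.reads += 1
        value = self.cells.get(addr, 0)
        if self.reads % self.every == 0:
            value |= 0x40  # hypothetical intermittent fault on one data bit
        return value
```

A single pass can easily miss a fault like this, which is why the count over many passes is the useful number. A bit that fails even once in a long run is worth noting.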