Sunday, 28 December 2025

Fitting Perilous Swamp into 8K on a PET 2001

In a previous post, I converted the ZX81 game "Perilous Swamp" into Commodore BASIC to run on the VIC20 and PET (and also zero effort C64 and TED ports of the PET version).

The PET version needs 16K, it won't run with only 8K installed.

The VIC20 version does run on a VIC20 with an 8K expansion, but there is the internal 5K to gives 13K in total and with the BASIC and OS overheads, that leaves 11775 bytes available.

It would be nice if the PET version could be made to run in 8K, so it would run on the PET 2001-8, the original model PET with the built in datasette and chiclet keyboard. (There was in theory a 4K model, but I think they had settled on 8K before production started)

The program file is about 8.5K, so that needs to be reduced in size, but it's not that far off.

The program is BASIC, so there are various things that can be done to reduce the file size.

10 PRINT "SAVING SPACE"

Let's have a look at much simpler programs, and see how they are stored under the hood and what can be done to reduce program size.

I'll start with the traditional example.

10 PRINT "HELLORLD"
20 GOTO 10

On the PET, the program memory starts at $0400 with a fixed $00. After that, each line is in the following format:

  • 1 word - the address of the start of the next line
  • 1 word - the line number
  • 1-73 bytes - the line itself (approximately two screen lines on the PET)
  • 1 byte - $00 to indicate the end of line

Looking at the memory with that program entered:

0400  00 12 04 0a  00 99 20 22  48 45 4c 4c  4f 52 4c 44   ...... "HELLORLD
0410  22 00 1b 04  14 00 89 20  31 30 00 00  00 24 24 24   "...... 10...$$$

Starting with the first line, that splits up as follows:

  • $0412 - address of the next line (the 6502 in the PET is little-endian)
  • $000A - line number 10
  • $99 - this is the BASIC token for PRINT
  • $20 - this is the space after PRINT
  • $22 - this is the first quotes "
  • $48 $45 $4C $4C $4f $52 $4C $44 - this is HELLORLD in PETSCII
  • $22 - this is the second quotes "
  • $00 - this is the end of the line end

There are a few points to make, but I will just go straight to the next line.

  • $041B - address of the next line (or the place where the next line would be if there were one)
  • $0014 - line number 20
  • $89 - this is the BASIC token for GOTO
  • $20 - this is the space after GOTO
  • $31 $30 - this is PETSCII for 10, the line number to go to
  • $00 - this is the end of the line end

Finally, there are two more zeroes, $0000, indicating no more lines.

What can we take from that?

Well, the BASIC keywords are represented by tokens, so you can type ? or PRINT, it doesn't actually matter, it will be stored as the token $99, so no saving in size there. (The same for the others like G shift O for GOTO etc.)

Secondly, the space between the line number and the first token is not stored. It is implicit. Whether you type it or not, it will not be stored, but it will be printed in the listing. So again, no savings there.

All of those are stored the same, and are listed the same and run the same.

0400  00 11 04 0a  00 99 22 48  45 4c 4c 4f  52 4c 44 22   ......"HELLORLD"
0410  00 21 04 14  00 99 22 48  45 4c 4c 4f  52 4c 44 22   .!...."HELLORLD"
0420  00 31 04 1e  00 99 22 48  45 4c 4c 4f  52 4c 44 22   .1...."HELLORLD"
0430  00 41 04 28  00 99 22 48  45 4c 4c 4f  52 4c 44 22   .A.(.."HELLORLD"
0440  00 00 00 24  24 24 24 24  24 24 24 24  24 24 24 24   ...$$$$$$$$$$$$$

I am using "HELLORD" for two reasons, one is it makes nice 16 bytes lines for examples like the above. The other would require you to watch Usagi Electirc's You Tube channel, which I recommend you do if you aren't already glued.

Unlike the space before the token, the spaces after the PRINT token are stored, so that does make a difference to the overall file size.

The extra spaces are stored, so take up extra memory, but they do not make any difference to how the program runs, so are only used to make the listing easier to read. (well, it makes the program slightly slower as it has to skip over the spaces)

If you are short of space, you can get rid of the spaces in the code. It will still run the same, but the file will be smaller, requiring less memory. The only downside is it becomes more difficult to follow.

0400  00 11 04 0a  00 99 22 48  45 4c 4c 4f  52 4c 44 22   ......"HELLORLD"
0410  00 22 04 14  00 99 20 22  48 45 4c 4c  4f 52 4c 44   .".... "HELLORLD
0420  22 00 3e 04  1e 00 99 20  20 20 20 20  20 20 20 20   ".>....         
0430  20 20 20 22  48 45 4c 4c  4f 52 4c 44  22 00 00 00      "HELLORLD"...
0440  24 24 24 24  24 24 24 24  24 24 24 24  24 24 24 24   $$$$$$$$$$$$$$$$

A line like

10 FOR N = 1 TO 10 STEP 2

is easier to read than

10FORN=1TO10STEP2

The compressed version saves seven bytes, and that can make a difference over a large program.

Here I am using PRINT FRE(0) to show the number of bytes free, the second figure being 7 bytes less than the first, at the cost of code readability

$000A PRINT "HELLO FROM LINE 10"

From the breakdown above, you can see the line number is stored in two bytes as a raw number, so line 10 was stored as $000A.

That means you will not make great savings by renumbering your program from 10,20,30 to 1,2,3.

They are all stored in the same amount of space, two bytes for each line number.

0400  00 11 04 01  00 99 22 48  45 4c 4c 4f  52 4c 44 22   ......"HELLORLD"
0410  00 21 04 0a  00 99 22 48  45 4c 4c 4f  52 4c 44 22   .!...."HELLORLD"
0420  00 31 04 64  00 99 22 48  45 4c 4c 4f  52 4c 44 22   .1.d.."HELLORLD"
0430  00 41 04 e8  03 99 22 48  45 4c 4c 4f  52 4c 44 22   .A...."HELLORLD"
0440  00 51 04 10  27 99 22 48  45 4c 4c 4f  52 4c 44 22   .Q..'."HELLORLD"
0450  00 00 00 24  24 24 24 24  24 24 24 24  24 24 24 24   ...$$$$$$$$$$$$$

The lines 1,10,100,1000 and 10000 are stored as $0001, $000A, $0064, $03E8 and $2710 respectively.

You might expect the largest line to be 65535, $FFFF, but apparently it is actually 63999, $F9FF.

There is a small benefit to be gained from using lower line numbers. Remember GOTO 10 was stored as GOTO and "1" and "0", so GOTO 10 is one byte more than GOTO 1.

That is stored as follows

0400  00 12 04 0a  00 99 20 22  48 45 4c 4c  4f 52 4c 44   ...... "HELLORLD
0410  22 00 1b 04  14 00 89 20  31 30 00 00  00 24 24 24   "...... 10...$$$

Where as

Is one byte shorter

0400  00 12 04 01  00 99 20 22  48 45 4c 4c  4f 52 4c 44   ...... "HELLORLD
0410  22 00 1a 04  02 00 89 20  31 00 00 00  24 24 24 24   "...... 1...$$$$

That is another one for size vs readability and maintainability, but you can save a byte or two for each GOTO or GOSUB in the program.

Revisiting the Swamp

The PET version of Perilous Swamp from back in November was 8207 bytes long.

The 8K PET has 7167 bytes free, so I need to save over 1000 bytes to get it to load (running is a different matter - more on that later).

Starting point: file 8207 bytes

1) Remove Spaces

I am using CBM PRG Studio and that has a reformat tool that can remove the spaces.

This reduces readability, but makes the program smaller and slightly faster. I wouldn't do this on the normal versions, but in this case, I think it is worthwhile.

That saves 646 bytes and takes us to 7561 bytes. Still too large.

2) Renumber

This makes a small difference, but reduces readability and maintainability.

That saves only 83 bytes, taking it down to 7478 bytes.

3) Combine Lines

From the previous descriptions, you will see that each line has an overhead of five bytes with a two byte pointer, a two byte line number and a 1 byte terminator.

If I take something like this:

2620 a=x
2630 b=y
2640 x=int(rnd(1)*7)+2
2650 y=int(rnd(1)*7)+2

and combine in to:

2620 a=x:b=y:x=int(rnd(1)*7)+2:y=int(rnd(1)*7)+2

That should save 15 bytes minus the three colons, so 12 bytes in total.

You can only do so much of this, as you need to keep below the line length limit.

This is another of those sacrifices of space over readability

Also some lines are jumped to directly, so can't be split up. I couldn't use the above example as there is a GOTO 2640.

{down}, {down}, Deeper and {down}

There are a few example like that, but far more common are multiple PRINT statements, left over from what were ZX81 SCROLL instructions.

10 print "{clear}"
20 print
30 print
40 print "            =============="
50 print "            perilous swamp"
60 print "            =============="
70 print
80 print
90 print
100 print "         a new adventure game"

One way to reduce that is to combine as above, something like this:

10 print "{clear}":print:print:print "            =============="
50 print "            perilous swamp"
60 print "            ==============":print:print:print
100 print "         a new adventure game"

However, you can take it a step further and replace the blank PRINT lines with {down} characters at the start of the next line.

10 print "{clear}{down}{down}{down}            =============="
50 print "            perilous swamp"
60 print "            =============="
100 print "{down}{down}{down}         a new adventure game"

It is not a clear in terms of alignment of the various lines. In the original version it is easier to align the === lines with the text and check centering, but again, it is readability vs file size.

The {down} characters are just used in the development environment, the code itself contains a single character, the inverse Q (the inverse heart is the clear/home as previously) .

Each of the {down} codes converts to a single byte ($11) in memory.

0400  00 28 04 0a  00 99 20 22  93 11 11 11  20 20 20 20   .(.... "....    
0410  20 20 20 20  20 20 20 20  3d 3d 3d 3d  3d 3d 3d 3d           ========
0420  3d 3d 3d 3d  3d 3d 22 00  4b 04 32 00  99 20 22 20   ======".K.2.. " 
0430  20 20 20 20  20 20 20 20  20 20 20 50  45 52 49 4c              PERIL
0440  4f 55 53 20  53 57 41 4d  50 22 00 6e  04 3c 00 99   OUS SWAMP".n.<..
0450  20 22 20 20  20 20 20 20  20 20 20 20  20 20 3d 3d    "            ==
0460  3d 3d 3d 3d  3d 3d 3d 3d  3d 3d 3d 3d  22 00 97 04   ============"...
0470  64 00 99 20  22 11 11 11  20 20 20 20  20 20 20 20   d.. "...        
0480  20 41 20 4e  45 57 20 41  44 56 45 4e  54 55 52 45    A NEW ADVENTURE

The single character replacing a six byte line means each line replaced with a {down} character saves five bytes. This is 126 bytes just on the title and instruction page.

OK, lots of editing to do.....

I went back to the original PET version from November, 8207 bytes. After some judicious merging of lines and removing PRINT statements in favour of {down} codes, I got that down to 7836 bytes.

With the spaces removed, that went down to 7192 bytes.

That's so close.

Time for a few more combined lines, this time, being less precious about making it readable.

Trying it out

OK, that's got it down to 7128 bytes.

That should fit, right?

Yay! it fits, 40 whole glorious bytes free.

Let's give it a try.

So far, so good.

Oh dear, I guess 40 bytes wasn't enough to run the program in.

Line 430 is where all the arrays are DIMensioned, space is reserved in the heap at the end of RAM, and there is not enough space available there as it grows down and hits the end of the program.

It probably failed on the first statement, DIM A(11,11). Commodore BASIC used zero-based arrays, so that is actually a 12x12 array, 144 elements.

It only needs to be 11x11, the map is 9x9 with a border, but ZX81 BASIC uses one-based arrays, so the code accesses elements from 1 through 11, rather than 0 through 10.

I could rewrite the code to use a(x-1,y-1) type accesses, but all those "-1"s will use up space, make it more difficult to follow and potentially error prone if I miss one, so easier to waste the 23 elements for the unused 0th elements.

I had been thinking in terms of bytes rather than elements, but of course this is a floating point BASIC, so each number is actually 5 bytes.

Wow, I didn't expect that, I was willing to blame the string arrays.

That has thrown me slightly. I did have a plan for the string arrays, which I might still need, but I hadn't counted on losing 720 bytes to the map array.

Each elements only needs to hold one of 5 values (border, empty square, swamp, princess, player), so a single byte would be more than enough. 

Ideally, I would use a 1 byte per element array, but I can't get that. I could use a block of RAM and PEEK and POKE the values in there, or maybe a 144 character string and some access function to pick an element (possibly already converted from number into display form). Or maybe a 12 element array with one row of the map in each element?

All of those require quite a bit of change. I think I can save 432 bytes by making a small change.

If I change

DIM A(12,12)

to

DIM A%(12,12)

That will convert it into an integer array. That is not single byte, it is a 16 bit integer, so two bytes per element, but still better than 5 bytes per element.

I just need to change all 23 references to the A array to be A%(X,Y) etc.

Oh, that was easy.

Those 23 changes added 23 bytes to the file, and it is just small enough to load, but even less space available to run.

What about changing the rest of the variables?

Well, changing to an integer array made sense since it saved 3 bytes per entry in a 144 entry array (432; bytes total) and only added 23 bytes to the file, so a net saving of 409 bytes.

If I change the Q variable to Q%, that would only save 3 bytes, but it is references 18 times, so would add 18 bytes to the file changing all the Q to Q%, a net increase of 15 bytes.

No one reads them anyway

One quick thing I can do to save space is to lose the title screen and instructions.

Not ideal, I would like to keep those. Maybe I can put them back later if there is space.

That takes it down to 6378 bytes,

Before running, there are 790 whole bytes available! That should be enough.

You could fly to the moon on 790 bytes. (incidentally, it is frightening to think that there is only about 10 years between those Apollo missions and the ZX81 and the PET)

Yay!

If I quit out immediately, there are still 156 bytes available.

Success.

Now, I wonder what I would need to do to add the titles back?

How much space do I need?

The file size dropped from 7151 with title screen and instructions to 6378 without, so I need to save 773 further bytes to be able to add the title and instructions back.

Currently it is using 634 bytes (790 free after loading, 156 bytes free after running). I know 288 of those are from the map which has to stay (unless I can think of something else for that), but there could be savings to be made from the rest?

Some of that is from the string arrays.

The ZX81 version stored the list of treasures, monster names and some suitable adjectives for the monsters.

Each time it accessed those, it used string manipulation to get to the appropriate string.

When I concerted the program to Commodore BASIC, the string access was different, I couldn't define the string on a single line, and would have to use MID$ to snip out the section (which would require two calculations as it is string, start, end, rather then string, start, length).

I change those to string arrays and DATA statements into arrays.

That means all of these strings at the end will appear twice in memory when the program is running.

Firstly the strings appear as lines of code, the DATA statements. When the program runs, it will DIMension some arrays, and READ the values from the DATA statements into the arrays.

That eats away at the extra RAM required to run the game.

The V and V$ arrays define the treasure you can find, it's value and description

430 ... : dimv(8) : dimv$(8) : ...
440 for n = 1 to 8 : read v(n), v$(n) : next n

(I have put back in the spaces for clarity, and redacted some irrelevant sections)

The data statements appear near the end of the file (this is the original version, the 8K version currently has them packed onto fewer lines to save space):

3980 data 10, "10 silver spoons"
3990 data 30, "a jewelled sword"
4000 data 50, "a jar of rubies"

etc.

The v and v$ arrays are only used once, here:

1670 print "is guarding "; v$(i)
1680 ...
1690 p=v(i)

If I rewrite that as:

1670 if i=1 then p=10 : print "10 silver spoons"
1671 if i=2 then p=30 : print "a jewelled sword"
1672 if i=3 then p=50 : print "a jar of rubies"

etc.

That will mean those strings only appear once in the file. There is extra overhead with the additional IF statements, but overall it only adds 5 bytes to the program file, and crucially, it will save 86 bytes from the heap.

If I run that now, it still works (which is nice). But ouch, that's an unfortunate map, quite a journey to get to the princess. You can get there, you can move diagonally, there are actually two routes.

So far, I have only seen a few maps that were not at first glance possible, but could still work if something happened to move you around the map...... (spoilers)

Quitting out after the arrays have been setup, shows 237 bytes free, previously it was 156, so that is an 81 byte overall saving.

Not a bad start, I think it is worth doing the other arrays.

Both n$ and p$ and setup in the same way, but both are referenced twice:

1620 print "a "; p$(a1); ", "; p$(a2); " "; n$(i)
...
3550 print "{down}the wizard has his pet "; n$(j)

"A FOUL, SMELLY OGRE" and "THE WIZARD HAS HIS PET 2001" sorry "THE WIZARD HAS HIS PET DRAGON 32" sorry "THE WIZARD HAS HIS PET ORIC". Oh I give in.

Time passes.

Code is edited.

Because they were used twice, I went for subroutines which print the strings then return.

1620 print "a "; : gosub 4000+a1 : print ", "; : gosub 4000+a2 : print " "; : gosub 4010+i

The functions are very simple, again with spaces for clarity, later removed.

4010 print "nothing" : return
4011 print "werewolf" : return
4012 print "bunyip" : return
4013 pat "phoenix" : return

etc.

The file has grown again, but hopefully it will be worth it.

Oh, apparently not.

That's now only showing 227 bytes free, that extra work has made it 10 bytes worse.

Undo Undo Undo Undo.....

So, back to where I was before. The first array changes saved a bit of space so it worth keeping, but there is no benefit in keeping the changes to the other two.

Other ideas?

I could make it a multi-part loader. Display the title and instructions, then load the game itself?

I'd rather not. Let's keep going.

I had several more passes through the code, combining lines and adding {down} and {return} to PRINT lines to allow them to be combined and replaced. (I don't like using {return} as it looks odd in the listing, but it has saved quite a few bytes, so I will accept it here).

Other tweaks like NEXT N doesn't actually need the N, it's just there from force of habit. NEXT will loop the most recent FOR loop. Actually, the N is there because I wrote it, and when I write FOR loops I use N so that NEXT N is two presses of the N key on the ZX81. I've been doing that since about 1982, so I do it automatically, even though I am writing this on a PC to run on a PET.

Here N is used in some of the loops I added, but I could use a different variable, one already defined, and possibly save 5 bytes from the heap (depending how good the garbage collection is).

All the NEXT N => NEXTN => NEXT has saved maybe a dozen bytes, but it all adds up.

And after all of that, I got the file down to 5782 bytes.

Still working, and an even more fiendish (but still possible) path to the princess.

That has 840 bytes free after setting up the map etc. and 791 free after completing the game.

That should be enough to put the titles and instructions back.

6540 with titles, 60 free at first out.

That's a bit tight

OK, I hadn't been doing this to make it easier to check when I made changes, but lets renumber the file from line 1 at step 1.

The renumbered file went down to 6428 bytes, and there were 172 bytes free at the first out.

And a far less swampy route.

There were 130 bytes free after completing the game.

I think I am happy with that, I have succeeded in getting the whole game, titles, instructions and all to run on a base 8K PET.

The 130 bytes free should be a decent safety margin.

The code has changed quite a bit.

340 print
350 print "if you could release her..."
360 print
370 print
380 print "should you have to leave early,"
390 print
400 print "type out to leave - permanently"
410 print
420 clr : restore
430 dim a(11,11) : dim v(8) : dim v$(8) : dim n$(10) : dim p$(8) : dim q$(5)
440 for n = 1 to 8 : read v(n), v$(n) : next n
450 for n = 0 to 10 : read n$(n) : next n
460 for n = 0 to 8 : read p$(n) : next n
470 for n = 0 to 5 : read q$(n) : next n

Became

14print"{down}if you could release her..."
15print"{down}{down}should you have to leave early,"
16print"{down}{down}type out to leave - permanently"
17clr:restore:dima%(11,11):dimn$(10):dimp$(8):dimq$(5):fori=0to10:readn$(i):next
18fori=0to8:readp$(i):next:fori=0to5:readq$(i):next

Not the easiest code to follow, but needs must to fit in 8K. The full version is still there if you want to follow the code.

In summary:

Well, that was a lot of work, but it was an interesting challenge, and I like setting myself little challenges like this. Sometimes it's good to get a win with something like this when other things I am meant to be working on are stuck for whatever reason.

I learned a few things, especially the space used by the map array, really wasn't expecting that little 12x12 map to use almost 3/4K of RAM.

The reduced size version also runs faster, as BASIC has less code to parse, so takes less time.

  • Combining Lines: My original PET version had 493 lines. The 8K version has 193 lines. That's 300 lines removed, saving between 4 and 5 bytes per line, 1200-1500 bytes.
  • Removing Space: That saved about 650 bytes.
  • Changing from floating point to integer array: That saved about 400 bytes.
  • Renumbering starting at 1, step 1: Saved 114 bytes.
  • Converting V and V$ Array to IF statements: 86 bytes.
  • Removing variable from NEXTs: 12 bytes.

Some of the later ones may seem like they weren't worth it, but really every byte counts. There are only 130 bytes free after running the game.

Final Comparisons

Version
File Size
RAM Used
Total
PET 16K/32K
8207
1127
9334
PET 8K
6428
610
7038
Savings
1779
517
2296

Both PET versions, along with the VIC20, C64 and TED versions are available on my github:


Update:

After posting this to Patreon, I had some suggestions of other things I could do to save space, these from Andy Hewco:

Remove quotes in data statements

I can change

191data"nothing","werewolf","bunyip","phoenix","troll","goblin","giant"

to

191datanothing,werewolf,bunyip,phoenix,troll,goblin,giant

That seems to work, and saved 40 bytes.

I had to leave the quotes around the map characters as that seemed to break things.

Remove quotes at the end of a line

If the line ends with a PRINT statement that ends with closing quotes, those can be omitted.

10 PRINT "HELLO WORLD"

Can be changed to:

10 PRINT "HELLO WORLD

And that seems to work.

Changing that saved 46 bytes.

Semicolons are not required for string concatenation

10 A$=" WO"
20 PRINT "HELLO";A$;"RLD"
30 PRINT "HELLO"A$"RLD"

That also seems to work, and saved 20 bytes.

I still need ones at the end of a line where I want to continue printing e.g.

10 PRINT "HELLO ";
20 PRINT "WORLD"

Overall, that saved 107 bytes. Nice.

Thanks Andy.

Thandy.


Update 2 - Bug Fix

Whilst looking at other things, I noticed one bug (shock! horror!)

There is a random event that you get given a ring, which can be marked with three things.

I had noticed I was only ever seeing the first.

It would appear that GOTO line+random number does not work.

181 print"he had a magic ring marked ";:goto182+int(rnd(1)*3)

Actually, GOTO line + number doesn't work either.

this is still showing the result of goto 182, not goto 184.

I changed that to ON random number GOTO line, other line

181 print"he had a magic ring marked ";:onrnd(1)*3goto183,184

I have tested it, and it appears I don't need the INT( ) around the number as it is implicit in the ON instruction.

I only needed 2 cases here as INT(RND(1)*3) returns 0, 1 or 2. The ON statement goes to 1, 2, 3 etc. if the number is 0 or greater than the last number in the list, it drops to the next line, which is the 0th case.

10 on rnd(1)*3 goto 30, 40
20 print "option 0" : return
30 print "option 1" : return
40 print "option 2" : return

Testing that, it looks good

And testing it in the game, I finally saw all three options dished out to the bold traveller.

Jac Goudsmit had suggested the ON x GOTO ... as an option. It didn't work out in the place suggested as the PETSCII line numbers and extra RETURN statements took up more space than the IF statements they relaced, but is ideal here.

The bug fix cost 1 byte, so overall the updates have saved 106 bytes, taking the total saved with the 8K version to 2402 bytes.


I have updated the versions in the github:

I have also uploaded all the versions to my itch.io page.

That shows a mock up tape cover, but maybe if enough of you persuade TFW8b that you would buy one.....


Adverts

My Tindie store is starting to pick up orders again, thank you to those of you who are spending your Christmas money wisely.


Patreon

You can support me via Patreon, and get access to advance previews of development logs on new projects like the Mini PET II and Mini VIC when I get back to them in the new year and other behind the scenes updates. This also includes access to my Patreon only Discord server for even more regular updates.

Sunday, 21 December 2025

Faster isn't always better - A cautionary tale

I have a project on the bench that has been dragging on for a while, and I just want to get it finished.

I can see it being finished in one of two ways.

1) I get it working, and working well and get it into production

2) I conclude to my satisfaction that it's not going to work so I can move on to something else

It is one of those projects that is always going to be doomed to failure if you are not careful -  trying to do something in software which would be better done in hardware.

And yes, the answer is easy, just do it in hardware. 

Well, as usual, there are various problems with that. Modern microcontrollers can do an awful lot and can be of great benefit in terms of cost and board space and simplifying things.

And also the hardware is not necessarily available, some of it went out of production in the early 1980s, some of it only recently, but it doesn't seem like any of it is likely to come back

Until they make a 3D printer which can produce custom silicon chips, it's a software solution.

(I am sure that will happen sometime in the next 5 years, people are already making their own silicon. That is assuming we don't end up in some kind of technological dark age where knowledge is feared and banned by the powers that be ....)

What I need to do

I need a microcontroller to do some fast IO. I need it to detect a signal, read some data and respond to it within a short window.

My microcontroller of choice here is the AVR128DA48 or one of that series. These run at 24 MHz and has mostly single cycle instructions, specifically including single cycle access to the IO ports.

That means I should be able to run up to 24 million instructions per second.

In this application, I am waiting for a pulse from a 1MHz system, and so being 24 times faster, I should have 24 instructions to do it?

Well, not quite.

The pulse is actually only active when the clock is high, only half a cycle, but that's fine, that is still 12 instructions.

I got my code down to 10 instruction cycles and tried it.

It worked most of the time.

Not good enough.

Checking deeper, it turns out I don't have 12 cycles, the pulse is asymmetric, so I only get about 8-10.

The first instructions are along the lines of "is the pin low, no? loop back and test again" which takes up to three of the cycles, depending which cycle the tested pulse falls on, so in some cases, it just misses the change and takes the full 10 cycles, and is a bit late.

Here I am monitoring an LED signal I am using for testing. It is activated last, so if that still falls within the pulse, it should be safe.

But if it comes too late, it is likely the preceding pulse will also have been missed.

I have tried various things to streamline the code, but I think it is about as optimal as it can be.

There are various elements to this:

  1. detect the pulse
  2. work out the response
  3. respond
  4. clear and loop back for the next one.

I think I can save a few cycles on steps 3 and 4 by using some additional hardware, and maybe optimise the processing, but that only gets me close to the 8 cycles I have.

I can't work out the response in advance as it will be different every time.

I would prefer to have a lot more clear space than that. 24 cycles to run 10 instructions would have been reasonable, 8 cycles to run 8 instructions is too tight, specially given likely variation in the host hardware, some machines may be 7, some may be 9 etc.

What about interrupts?

Just as a side note, could I save time on the detection using an interrupt?

Well, no.

It looks like it is going to be about 6 cycles before it even gets to my code, so that's not an option.

An Alternative Option

I put that to the side for a while and got on with some other things, including building a Minstrel ZXpand that had been ordered.

Whilst programming the microcontroller for that, I wonder if the PIC18F4525 used on the ZXpand might be better, that has a couple of features that might help?

I covered my early adventures with PIC microcontrollers in a previous post, I glossed over the point I switched my allegiance from Microchip to Atmel.

(get ready to mark off the first "bloated framework" on your Dave rants Bingo card, possibly the first of many)

I had been working on a project and was getting frustrated by the PIC development environment, generating too much code, and code which didn't build or didn't work.

Whilst I had put that to one side, this thing called an Arduino started to appear, and that used the Atmel ATmega328, and I started looking into that and was impressed.

Since then I have been sticking with the Atmel family (which was going great until Microchip bought them out, but that's a rant for another day).

PIC a number

I wonder what's been going on with the PICs, maybe there is something in the PIC world that would help?

The chip in the ZXpand only works up to 20MHz, oh well, maybe not.

But then I noticed some other chips which said they would run up to 64MHz.

Now you're talking, 64MHz internal oscillator, that would mean I have 32 instructions for the half cycle, with the asymmetry, maybe 24, and my code is 10, so that would be a a good clearance.

We're back in business.

Oh great, there are dozens of part numbers and no clear indication of what the differences are.

I mean who put the marketing department in charge of a technical website. Who is that sort of table aimed at? it's just nonsense. Can Microchip please find someone with a grey beard and let them write a useful version of that.

OK, it's getting late in the day, but if I order a few of these now I will get them in the morning. So I scatter bought a selection of the Curiosity Nano boards, which are development boards with integrated programmers, which means I don't need to bother messing about with PICKits or Snaps etc.

Testing Methodology

You get an ology and you're a scientist.

The most important thing here is the IO operations, so to test things, I wrote a little bit of code which toggles an IO pin as fast as it can.

Loops can throw things out, so I have been doing eight instructions that turn the output pin on and then off alternately.

I have written versions in assembler with a register holding the value to write, but here the compiler optimised that to some single cycle instructions, so I am happy enough to stick with C for now.

I am not concerned about the overhead at the start or the loop, as I can just measure the speed of the 4 pulses. It gives 4 pulses at top speed, then a gap, then another 4.

Out of the box, the AVRxDx chips run at 4MHz, so when I run the code, it turns the output on and then off 4 times in 8 cycles, so I see a run of pulses at 2MHz.

With a bit of extra code to change the clock, it can get 24MHz.

The clock settings are locked, so you need to set a register to enable access for a few cycles, so that bit has to be in assembler.

(you also need to set RAMPZ if using the 128K chip, but I am just testing here)

Now I get a train of pulses at 12MHz, which is as fast as I can change them with a 24MHz clock.

If these new chips run at 64MHz, that should mean similar test code would generate a stream of pulses at 32MHz.

Now I just need to wait for the postman......

Lots of red boxes, I tried to cover all the eventualities.

These are all Curiosity Nano boards, quite neat development boards with a built in programmer debugger and pins for all the signals, and available with lots of different microcontrollers on them.

The PIC18F57Q43 seemed to do everything I would need.

I will spare you all the faffing around with different system like Atmel Start and Microchip MCC, Microchip Studio and MPLAB X, automatically generated code and hand turned from a blank page. Suffice to say I tried every combination I could think of, and the best I could get was 8MHz.

I was using the simplest code with the same sort of loop as before.

Checking the listing file, it is indeed using the BCF and BSF instructions that I would like to use.

And double checking, those instructions are 1 cycle instructions, highlighted below.

It is supposed to be running at 64MHz, but the pulses are only showing up at 8MHz, which if they are "single cycle", as it appears from the listing file they are, then the chip is running at only 16MHz?

Looking around it finally dawned on me, on the PIC, clock cycles and instruction cycles are not the same.

They really should have had that in big flashing lights at the start.

Each instruction cycle is actually 4 clock cycles, so 64MHz is 16MIPS.

On the AVR they are the same, 1 instruction cycle is 1 clock cycle, so 24MHz is 24MIPS.

That means these PICs are slower than the AVR.

Well, that was a bit of a waste of time.

I had started thinking about a couple of projects I could revisit with the extra speed, and the better peripherals the PIC chips have over the AVR, but sadly they just aren't going to be fast enough.

They might come in handy one day on something that doesn't need as much speed, but for this application they aren't going to help.

But wait, what's this other red box I bought?

I don't really want to move to an ARM chip, I would like to stay 8-bit if I can, but there are some ATSAMC microcontrollers which might do the job. These run at 48MHz (it says there is a 64MHz option, but that seems to be a different part number with a -64 on the end), but either way, that's got to be better than 24MHz or 16MHz.......

As part of the scatter buying exercise, I had picked up the ATSAMD21 Curiosity Nano. That's not ideal, I would prefer the ATSAMC21 as that is 5V, and the ATSAMD21 is 3.3V, but I thought I could at least get a proof of principle test working and look at other options later.

Credit here for this very useful table in the datasheet which explains the major differences between versions.

Why can't they do that for the other series of chips?

This is what they show for the AVR series. Same nonsense as for the PICs.

More expletives and trying different development environments and toolchains etc. Endless updates and completely reinstalling one of the development studios which broke itself doing even more updates.

5.86MHz is the best I could get out of it. How did it even get there? That isn't an integer multiple?

I think that is running at 48MHz, but it is using internal PLLs generated from 32.768KHz, and I am not quite sure I got the configuration right. How did it get down to 5.88MHz? Confused.

It does have an 8MHz fixed oscillator. I know it's not using that as the best I could hope to get from that would be 4MHz toggling if it were one clock cycle per instruction (which I doubt it is).

I remembered I had looked at the ATSAMC series before, I couldn't find any of the boards I had, but I did find a sole IC.

(how I found one single IC in my house is quite a miracle, finding the PCBs as well would be too much to ask).

The ATSAMC series have two advantages over the ATSAMD as far as I can see. They run at 5V, and they have an internal 48MHz oscillator as well as the 8MHz one. Maybe I had the configuration of the PLL wrong? With a fixed 48MHz oscillation there should be no problem getting full speed.

Can I just swap the ATSAMD on the Curiosity Nano board?

Is that chip smaller or further away?

Ah no then. Although both are available in 48 pin and 64 pin versions (and many other variations), the board is 48 pin and the chip is 64 pin. Of course they are different. I wouldn't be that lucky.

OK, time to see if the jungle website can rush me a breakout board I can use for a 64 pin 0.5mm TQFP.

Anyway, at 1AM I ordered an overpriced breakout board from Mr Bezos, which he promised to bring to my door in the morning.

(how is that "compatible with Arduino", in what conceivable way would you use that with an Arduino? Oh never mind. )

Seems to fit, it's the right pitch, even if the pin numbering and pad ordering is completely wrong as the board is designed for TQFP 100.

Add sockets for all 64 pins and a few decoupling capacitors (and the obligatory power LED).

I just need a programmer.

Luckily, the Curiosity Nano boards can do that, if you cut some of the traces on the back to disconnect the onboard programmer from the original microcontroller..

Not the neatest, but it should work. (It's JTAG, so more wires than the PIC or AVR)

Again, lots of messing around with code generation and bloated frameworks etc.

I also had to reconfigure the Curiosity Nano onboard programmer to suit the different part. I haven't needed to do that on the AVR versions, I built a programmer using a Curiosity Nano that I have used for a variety of parts.

The ATSAM one wasn't having any of it, so I had to change it. I found pyedebuggerconfig which allowed me to to change the target chip type (and host device voltage limit) and I was away, it was talking to the chip.

Great.

Now for the code.

Well, that's helpful. Thank you Atmel Start.

I managed to get some code built in MPLab X, that also required a bit of fiddling to get it to switch to the 48MHz oscillator, but I got there, and here is the fastest I managed to get it to twiddle IO pins.....

8MHz.

Bugger.

I don't know much about ARM assembly, but the compiler got it down to a single instruction for each bit toggle.

I can only assume that instruction takes six clock cycles, and I couldn't find any obvious alternatives that would be faster.

So after all that, it is also slower than the chip I started with.

(even the 64MHz model would end up 10 and a bit MHz, still slower than the AVRs)

Any other options?

I have previously looked at some different ARM chips, including the STM32F405. Those run at 168MHz, so should be fast enough. However, that is a whole new framework and toolchain to learn from a different manufacturer and probably a step too far.

Conclusion

None of these chips are fast enough to do what I need to do in the time available.

A frustrating, but necessary exercise, due diligence.

This project, in this form, is not viable with the parts I have looked at.

I think I have to give up on this project.

Unless something changes or a new part comes along that will do the job (or I find there is one that I have somehow overlooked).

But, on the positive side, I can clear my bench and start on something new in the morning.

Post Script

Whilst putting this post together, I had a look for photos of the board I first tried the SAMC20 with.

I also found I had edited some of the photos, so I must have written a blog post about it, but this was in the dark ages of 2021 when the public blog was offline and I was writing only for Patreon.

Thank you to the Patrons that stayed with me through that and kept me (just about) sane.

I think this was the main photo for that article - I don't think it went very well.

In that post, I ended with a photo of a note I left myself.

If only 

1) I had remembered that 

or 

2) I had found the box.

Oh well.

Those who do not learn from history are doomed to repeat it.....

Post-Post Script

Since posting this on Patreon last month, I have managed to get the project working, not the ideal version I was hoping for, but still quite nice. It's a bit messy, hopefully the production version will be a bit neater. (ignore the board, that was just an old board used as a base to hold it all together.)


Adverts

My Tindie store is still open, the Christmas deadline is passed for everywhere other than the UK (if you are very quick).

Although that doesn't seem to be a problem the way things have been recently.


Patreon

You can support me via Patreon, and get access to advance previews of development logs on new projects like the Mini PET II and Mini VIC and other behind the scenes updates. This also includes access to my Patreon only Discord server for even more regular updates.