Saturday, 23 December 2023

Commodore Disk Directory Structure

One from the Patreon archive, this time from the development of the Penultimate +2 Cartridge file browser.

I am sure any Commodore user is very familiar with typing LOAD "$",8 and then LIST, but what exactly is going on when you do that?

I wanted to add a file browser to the Penultimate +2 Cartridge so I looked for more information. I have several very thick and weighty tomes about Commodore disk drives, but none of them seemed to go into any detail about the directory format, so time to do some digging.

I started doing this on the VIC20, since it seemed appropriate, but the way this is handled in Commodore DOS is pretty much unchanged in all the 8 bit machines. I've gone back to the PET because that's where it all started, and it is easier to see what is going on with a 40 column screen.

I'd buy that for a dollar

Back in the late 1970s, Commodore BASIC was on five very expensive ROM chips, so it would not have been an easy option to extend that to a sixth or seventh to add support for DOS. Instead, the first disk drives found ways to work with what was already there. Rather than adding a DIRECTORY command (which finally appeared in BASIC 4), they created the concept of a special file called $.

To get a directory of a disk, you load that special file.

LOAD "$",8

And then you list it.

LIST

And you see a directory of the files on the disk.

But what you are seeing is actually a BASIC program. The drive creates this BASIC program on the fly, with a line for each file.

You can see that if you add your own lines to it and list again.


Back to BASICs

Time for a quick overview of how Commodore BASIC programs are stored.

Let's start with something simple.

10 PRINT "HELLO WORLD"
20 GOTO 10

The PET stores programs in RAM starting at address $0401.

Looking at the memory at $0401, you can see the following structure:

>C:0401  15 04 0a 00  99 20 22 48  45 4c 4c 4f  20 57 4f 52   ..... "HELLO WOR
>C:0411  4c 44 22 00  1e 04 14 00  89 20 31 30  00 00 00 ff   LD"...... 10....
>C:0421  ff ff ff ff  ff ff ff ff  ff ff ff ff  ff ff ff ff   ................

I filled the RAM with $FF before typing the program, so you can more easily see where it ends.

Each line starts with the address of the next line.

15 04

So the second line starts at $0415 (the 6502 is little-endian).

Next is the line number, in this case 10 ($0A in hex).

0a 00

Then the code starts. Here 99 is the token for PRINT, and 20 is the space after it. (hint, get rid of the spaces if you want your programs to be smaller and faster)

99 20

Then comes the string "HELLO WORLD" in quotes.

22 48 45 4c 4c 4f 20 57 4f 52 4c 44 22

Finally a null ends the line.

00

Line 20 follows the same format: the address of the next line, the line number, GOTO (token $89), a space, the destination line number as a string, and finally a null.

1e 04 14 00 89 20 31 30 00

The address of the next line is $041E, and at $041E, there are two zeroes, indicating the end of the program.

00 00

And that's it.
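To make the structure concrete, here is a minimal Python sketch that walks the bytes from the dump above by following the next-line pointers. It is only an illustration: the names walk_basic, MEM and BASE are made up for this example, and the token table only knows the two tokens used in this program.

```python
# Bytes copied from the memory dump at $0401 above.
BASE = 0x0401
MEM = bytes.fromhex("""
    15 04 0a 00 99 20 22 48 45 4c 4c 4f 20 57 4f 52
    4c 44 22 00 1e 04 14 00 89 20 31 30 00 00 00
""")

# Only the two tokens that appear in this program.
TOKENS = {0x99: "PRINT", 0x89: "GOTO"}

def walk_basic(mem, base):
    """Yield (line_number, text) by following the next-line pointers."""
    pos = 0
    while True:
        # Two-byte little-endian address of the next line.
        next_addr = mem[pos] | (mem[pos + 1] << 8)
        if next_addr == 0:              # 00 00 marks the end of the program
            return
        number = mem[pos + 2] | (mem[pos + 3] << 8)
        end = mem.index(0, pos + 4)     # the null that ends this line
        text = "".join(TOKENS.get(b, chr(b)) for b in mem[pos + 4:end])
        yield number, text
        pos = next_addr - base          # jump to where the pointer says

for number, text in walk_basic(MEM, BASE):
    print(number, text)
```

This prints the two lines as LIST would show them, 10 PRINT "HELLO WORLD" and 20 GOTO 10.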

Now, let's save that to disk. I formatted a new disk first. Again there is no FORMAT command (when BASIC 4 arrived, it used HEADER), so the BASIC OPEN and CLOSE commands are used to send a command string to the drive. N is the NEW command, which formats the disk, and 0 is the drive number within the device (0 or 1 on a dual drive, 0 on a single). The name can be followed by a two-character ID, which is displayed in the directory listings. I used 10 here. It is used by the drive to work out if the disk has been changed so it can reload its cache.

open15,8,15,"n0:diskname,10":close15

I can now save the program with the SAVE command with ,8 to indicate device 8, the disk drive.

SAVE "HELLO",8

(you could verify at this point to make sure it has saved, but we have nice reliable storage these days, so no need).

VERIFY "HELLO",8

Now we can get to the actually important bit.

LOAD "$",8

If I LIST now, the hello program has been overwritten, and I get a directory. It is important to remember that LOAD "$",8 will overwrite the program. It was not until BASIC 4 added the DIRECTORY command (and rather redundant identical CATALOG command) that you could do that without overwriting the program. 

(DIRECTORY was a nightmare to support on the SD2PET, it reads a few bytes - not a consistent number - then stops mid flow to update the screen, then reads a few more - again 1, 2 or 3, then interrupts the drive again. Would have made sense to do 32 characters at a time at least, but maybe it was also trying to deal with some of the oddities - see later.)

So if that has been listed, then it must be a program, right?

Let's have a look at $0401.

>C:0401  1f 04 00 00  12 22 54 45  53 54 44 49  53 4b 20 20   ....."TESTDISK  
>C:0411  20 20 20 20  20 20 22 20  31 30 20 32  41 00 3f 04         " 10 2A.?.
>C:0421  01 00 20 20  20 22 48 45  4c 4c 4f 22  20 20 20 20   ..   "HELLO"    
>C:0431  20 20 20 20  20 20 20 20  50 52 47 20  20 00 5d 04           PRG  .].
>C:0441  97 02 42 4c  4f 43 4b 53  20 46 52 45  45 2e 20 20   ..BLOCKS FREE.  
>C:0451  20 20 20 20  20 20 20 20  20 20 20 00  00 00 0d ff              .....
>C:0461  ff ff ff ff  ff ff ff ff  ff ff ff ff  ff ff ff ff   ................

The structure is the same, just a nonsensical program that is only useful for listing.

So the first two bytes tell us that the second line starts at $041F. Then there is the line number, in this case zero ($00 $00). 

1f 04 00 00

Next there is a $12 which is the PETSCII code to switch to inverse text. After that, there is the string which is the disk title, padded with spaces inside the quotes.

12 22 54 45 53 54 44 49 53 4b 20 20 20 20 20 20 20 20 22

Then a space, then the disk number as a string (as typed in above, in this case 10).

20 31 30

Then another space and the disk type, here the two characters 2A, as in $2A (Do you think the Commodore guys were Hitch Hikers Guide to the Galaxy fans?), and finally a null which signals the end of the line.

20 32 41 00

There is no $92 to cancel the inverse characters, but that will be cancelled by the end of line so it is not required.
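Pulling the three fields out of that title line can be sketched in a few lines of Python. This is a rough illustration rather than the cartridge code; parse_title and TITLE_LINE are names invented here, and it assumes the ID and type contain no spaces.

```python
# The first line of the listing, copied from the dump at $0401
# (link, line number, then the text up to but not including the null).
TITLE_LINE = bytes.fromhex(
    "1f 04 00 00 12 22 54 45 53 54 44 49 53 4b 20 20 "
    "20 20 20 20 20 20 22 20 31 30 20 32 41"
)

def parse_title(line):
    """Return (disk_name, disk_id, disk_type) from the title line."""
    text = line[4:].decode("latin-1")        # skip the link and line number
    first = text.index('"')                  # opening quote (after the $12)
    last = text.rindex('"')                  # closing quote
    name = text[first + 1:last].rstrip(" ")  # drop the padding inside the quotes
    # Assumption: ID and type are separated by spaces and contain none.
    disk_id, disk_type = text[last + 1:].split()
    return name, disk_id, disk_type

print(parse_title(TITLE_LINE))
```

For the dump above this gives the name TESTDISK, the ID 10 and the type 2A.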

The second line is the first file. The first two bytes point to the next line, $043F, and then comes the line number, 0001 in this case, although it is actually the number of blocks used by the file. Neat huh?

3f 04 01 00

Next comes the filename, padded with some spaces outside the quotes.

20 20 20 22 48 45 4c 4c 4f 22 20 20 20 20 20 20 20 20 20 20 20 20

The space padding continues up to the file type, not in quotes, in this case PRG

50 52 47

Finally a couple of spaces and then a null.

20 20 00

The third line is the "blocks free" information.

5d 04

This one starts as normal, the next line is at $045D, but the line number is actually the number of blocks free, 663 in this case, as $0297 in hex.

97 02

After that is the BLOCKS FREE. text (no quotes)

42 4c 4f 43 4b 53 20 46 52 45 45 2e

And then a load of spaces padding it to the end of the line. A final null, and then the two nulls at start of the next line indicate the end of the program.

20 20 20 20 20 20 20 20 20 20 20 20 20 00  00 00

All the entries are exactly 32 bytes long.
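A quick way to check that against the single-file dump is to measure each line from its link address. The Python sketch below does that (line_lengths and DIR are names made up for this example). In memory the file line is 32 bytes and the title and blocks free lines are 30 each. The round 32 for those two comes from the two bytes either side: the two-byte end marker after the last line, and, as an assumption since the dump does not show it, the two-byte load address that precedes any program file.

```python
# The single-file directory, copied from the dump at $0401 above.
DIR = bytes.fromhex("""
    1f 04 00 00 12 22 54 45 53 54 44 49 53 4b 20 20
    20 20 20 20 20 20 22 20 31 30 20 32 41 00 3f 04
    01 00 20 20 20 22 48 45 4c 4c 4f 22 20 20 20 20
    20 20 20 20 20 20 20 20 50 52 47 20 20 00 5d 04
    97 02 42 4c 4f 43 4b 53 20 46 52 45 45 2e 20 20
    20 20 20 20 20 20 20 20 20 20 20 00 00 00
""")

def line_lengths(mem, base):
    """Length in bytes of each line, measured from one link to the next."""
    lengths = []
    addr = base
    while True:
        pos = addr - base
        next_addr = int.from_bytes(mem[pos:pos + 2], "little")
        if next_addr == 0:          # 00 00 end marker
            return lengths
        lengths.append(next_addr - addr)
        addr = next_addr

print(line_lengths(DIR, 0x0401))    # title, file, blocks free

# Assumption: a 2-byte load address is sent ahead of this data.
stream_length = 2 + len(DIR)
print(stream_length, stream_length % 32)
```

With the load address counted, the 96 bytes divide into three 32-byte records.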

When there are multiple files, there are simply more lines in the middle, and if the files are the same size, the line numbers are duplicated, but that does not matter as it is just used for listing.

Here I went back and, using the magic of the screen editor, entered the program again, formatted the disk again (with a different ID) and saved the program 4 times.

>C:0401  1f 04 00 00  12 22 54 45  53 54 44 49  53 4b 20 20   ....."TESTDISK  
>C:0411  20 20 20 20  20 20 22 20  32 30 20 32  41 00 3f 04         " 20 2A.?.
>C:0421  01 00 20 20  20 22 48 45  4c 4c 4f 20  31 22 20 20   ..   "HELLO 1"  
>C:0431  20 20 20 20  20 20 20 20  50 52 47 20  20 00 5f 04           PRG  ._.
>C:0441  01 00 20 20  20 22 48 45  4c 4c 4f 20  32 22 20 20   ..   "HELLO 2"  
>C:0451  20 20 20 20  20 20 20 20  50 52 47 20  20 00 7f 04           PRG  ...
>C:0461  01 00 20 20  20 22 48 45  4c 4c 4f 20  33 22 20 20   ..   "HELLO 3"  
>C:0471  20 20 20 20  20 20 20 20  50 52 47 20  20 00 9f 04           PRG  ...
>C:0481  01 00 20 20  20 22 48 45  4c 4c 4f 20  34 22 20 20   ..   "HELLO 4"  
>C:0491  20 20 20 20  20 20 20 20  50 52 47 20  20 00 bd 04           PRG  ...
>C:04a1  94 02 42 4c  4f 43 4b 53  20 46 52 45  45 2e 20 20   ..BLOCKS FREE.  
>C:04b1  20 20 20 20  20 20 20 20  20 20 20 00  00 00 0d ff              .....
>C:04c1  ff ff ff ff  ff ff ff ff  ff ff ff ff  ff ff ff ff   ................

By the way, on the subject of things you can do with the screen editor: whether by design or accident, the position of the filenames in a directory listing means you can cursor up and have space to type in LOAD before the name, then just cursor along and add ,8 in place of the PRG and load your program. Handy with long filenames, or ones with odd characters that are not easy to type out.


Penultimate Cartridge File Browser

I wanted to integrate the file browser into the existing menu structure. This style of menu was introduced for the Penultimate +, and has been added to with each release, as the list of games grew. The latest improvement that has made it a lot faster to use was adding SHIFT + letter navigation to jump directly to a page of the games list, starting with that letter.

I could have looked at building in something like the FB program that is used on the SD2IEC, but I thought the difference in appearance was a bit jarring, and it would require a lot of work to deal with the memory management of the loading side.

To make that work with the existing menu structure meant I had to adapt that from using the hard coded games list and fixed menus to being able to work with a dynamically loaded file list from disk. I will not go into the internal workings of the menu, but I do want to look at getting the data to create the directory listings.

Having the games list dynamically loaded did come in very handy for adding the "filter by category" and the source for the random games list.

Based on what I had learned, I was able to implement the file browser in the Penultimate +2 Cartridge.

Once you think of the directory as a program, all I have to do is load that program into memory. The maximum number of files on a Commodore disk is 144. Each entry is 32 bytes, so the total is about 4.5K, including the extra lines for the disk title and blocks free. That means I couldn't load the listing to $0401 as it was on the PET, as that would overwrite the screen when it got to $1000.

The menu program on the cartridge is already taking up three 8K blocks of ROM, and is ever growing with all the new things being added. I was thinking I might expand that to four 8K blocks of ROM, but I will need to keep 8K free to load the directory listing.

I override the load address, and load the data to RAM at $2000, the start of the 8K bank 1.
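The sums behind that choice are easy to check. Here is a small Python sketch using the figures from the paragraphs above; it rounds the title and blocks free lines up to 32 bytes each, and the names are just for this example.

```python
MAX_FILES = 144       # maximum directory entries, from the text above
ENTRY_BYTES = 32      # bytes per directory line
EXTRA_LINES = 2       # disk title and BLOCKS FREE lines, rounded up to 32 each

listing_bytes = (MAX_FILES + EXTRA_LINES) * ENTRY_BYTES

def end_address(load_address):
    """First address after a full listing loaded at load_address."""
    return load_address + listing_bytes

print(listing_bytes)                 # total size in bytes
print(hex(end_address(0x0401)))      # where it would end if loaded at $0401
print(hex(end_address(0x2000)))      # where it ends when loaded at $2000
```

A full listing is 4672 bytes. Loaded at $0401 it would run to $1641, well past $1000, whereas from $2000 it ends at $3240 and stays inside the 8K block.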

>C:2000  01 01 00 00  12 22 54 45  53 54 44 49  53 4b 20 20   ....."TESTDISK  
>C:2010  20 20 20 20  20 20 22 20  32 30 20 32  41 00 01 01         " 20 2A...
>C:2020  01 00 20 20  20 22 48 45  4c 4c 4f 20  31 22 20 20   ..   "HELLO 1"  
>C:2030  20 20 20 20  20 20 20 20  50 52 47 20  20 00 01 01           PRG  ...
>C:2040  01 00 20 20  20 22 48 45  4c 4c 4f 20  32 22 20 20   ..   "HELLO 2"  
>C:2050  20 20 20 20  20 20 20 20  50 52 47 20  20 00 01 01           PRG  ...
>C:2060  01 00 20 20  20 22 48 45  4c 4c 4f 20  33 22 20 20   ..   "HELLO 3"  
>C:2070  20 20 20 20  20 20 20 20  50 52 47 20  20 00 01 01           PRG  ...
>C:2080  01 00 20 20  20 22 48 45  4c 4c 4f 20  34 22 20 20   ..   "HELLO 4"  
>C:2090  20 20 20 20  20 20 20 20  50 52 47 20  20 00 01 01           PRG  ...
>C:20a0  94 02 42 4c  4f 43 4b 53  20 46 52 45  45 2e 20 20   ..BLOCKS FREE.  
>C:20b0  20 20 20 20  20 20 20 20  20 20 20 00  00 00 00 00

Looking at the data which is loaded, you can see the addresses for the next lines are not as they were before. They are all $0101, a placeholder sent by the drive. The BASIC LOAD command normally relinks the program lines depending on where they are loaded in memory, but loading the raw data to $2000 skips that step, so the placeholders are still there. The disk drive doesn't know what machine you have. Back in 1978, they could not even dream that the VIC20 would follow a couple of years later and have three different load addresses based on the amount of expansion RAM installed.
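The relinking step can be sketched in Python. This is an illustration of the idea rather than the ROM routine: starting from the raw data with its $0101 links, find the null at the end of each line and write the address of the following line into the link. RAW is the $2000 dump above; relink and links are names invented for this example.

```python
# Raw directory data as loaded at $2000 (every link is 01 01).
RAW = bytes.fromhex("""
    01 01 00 00 12 22 54 45 53 54 44 49 53 4b 20 20
    20 20 20 20 20 20 22 20 32 30 20 32 41 00 01 01
    01 00 20 20 20 22 48 45 4c 4c 4f 20 31 22 20 20
    20 20 20 20 20 20 20 20 50 52 47 20 20 00 01 01
    01 00 20 20 20 22 48 45 4c 4c 4f 20 32 22 20 20
    20 20 20 20 20 20 20 20 50 52 47 20 20 00 01 01
    01 00 20 20 20 22 48 45 4c 4c 4f 20 33 22 20 20
    20 20 20 20 20 20 20 20 50 52 47 20 20 00 01 01
    01 00 20 20 20 22 48 45 4c 4c 4f 20 34 22 20 20
    20 20 20 20 20 20 20 20 50 52 47 20 20 00 01 01
    94 02 42 4c 4f 43 4b 53 20 46 52 45 45 2e 20 20
    20 20 20 20 20 20 20 20 20 20 20 00 00 00 00 00
""")

def relink(raw, base):
    """Return a copy of raw with the line links rewritten for address base."""
    mem = bytearray(raw)
    pos = 0
    while mem[pos] or mem[pos + 1]:      # a 00 00 link ends the program
        end = mem.index(0, pos + 4)      # skip link and line number, find the null
        next_pos = end + 1               # the next line starts right after it
        mem[pos:pos + 2] = (base + next_pos).to_bytes(2, "little")
        pos = next_pos
    return bytes(mem)

def links(mem, base):
    """List the link addresses by following them from the start."""
    found = []
    pos = 0
    while mem[pos] or mem[pos + 1]:
        addr = int.from_bytes(mem[pos:pos + 2], "little")
        found.append(addr)
        pos = addr - base
    return found

print([hex(a) for a in links(relink(RAW, 0x0401), 0x0401)])
```

Relinked for $0401, the links come out as $041F, $043F, $045F, $047F, $049F and $04BD, matching the PET dump of the same disk earlier on.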

So going back to the VIC20, if I load the same "$" file from the same drive, it lists the same, but when you look at the RAM, you can see the line addresses have all been changed.

>C:1000  00 1f 10 00  00 12 22 54  45 53 54 44  49 53 4b 20   ......"TESTDISK 
>C:1010  20 20 20 20  20 20 20 22  20 32 30 20  32 41 00 3f          " 20 2A.?
>C:1020  10 01 00 20  20 20 22 48  45 4c 4c 4f  20 31 22 20   ...   "HELLO 1" 
>C:1030  20 20 20 20  20 20 20 20  20 50 52 47  20 20 00 5f            PRG  ._
>C:1040  10 01 00 20  20 20 22 48  45 4c 4c 4f  20 32 22 20   ...   "HELLO 2" 
>C:1050  20 20 20 20  20 20 20 20  20 50 52 47  20 20 00 7f            PRG  ..
>C:1060  10 01 00 20  20 20 22 48  45 4c 4c 4f  20 33 22 20   ...   "HELLO 3" 
>C:1070  20 20 20 20  20 20 20 20  20 50 52 47  20 20 00 9f            PRG  ..
>C:1080  10 01 00 20  20 20 22 48  45 4c 4c 4f  20 34 22 20   ...   "HELLO 4" 
>C:1090  20 20 20 20  20 20 20 20  20 50 52 47  20 20 00 bd            PRG  ..
>C:10a0  10 94 02 42  4c 4f 43 4b  53 20 46 52  45 45 2e 20   ...BLOCKS FREE. 
>C:10b0  20 20 20 20  20 20 20 20  20 20 20 20  00 00 00 00               ....

It all looks fine, to the naked eye, but it don't really happen that way at all.

That all looks good, but I have found quite a few oddities. Consider this, the 1540/1541 demo disk.

Listing looks good doesn't it.

But looking at the program in memory it is not consistent. Some lines have two spaces before the quotes, some three. 

>C:0401  1f 04 00 00  12 22 31 35  34 30 54 45  53 54 2f 44   ....."1540TEST/D
>C:0411  45 4d 4f 20  20 20 22 20  5a 5a 20 32  41 00 3f 04   EMO   " ZZ 2A.?.
>C:0421  04 00 20 20  20 22 44 49  52 22 20 20  20 20 20 20   ..   "DIR"      
>C:0431  20 20 20 20  20 20 20 20  50 52 47 20  20 00 5f 04           PRG  ._.
>C:0441  06 00 20 20  20 22 56 49  45 57 20 42  41 4d 22 20   ..   "VIEW BAM" 
>C:0451  20 20 20 20  20 20 20 20  50 52 47 20  20 00 7f 04           PRG  ...
>C:0461  0e 00 20 20  22 44 49 53  50 4c 41 59  20 54 26 53   ..  "DISPLAY T&S
>C:0471  22 20 20 20  20 20 20 50  52 47 20 20  20 00 9f 04   "      PRG   ...
>C:0481  04 00 20 20  20 22 43 48  45 43 4b 20  44 49 53 4b   ..   "CHECK DISK
>C:0491  22 20 20 20  20 20 20 20  50 52 47 20  20 00 bf 04   "       PRG  ...
>C:04a1  09 00 20 20  20 22 50 45  52 46 4f 52  4d 41 4e 43   ..   "PERFORMANC
>C:04b1  45 20 54 45  53 54 22 20  50 52 47 20  20 00 df 04   E TEST" PRG  ...
>C:04c1  05 00 20 20  20 22 53 45  51 55 45 4e  54 49 41 4c   ..   "SEQUENTIAL
>C:04d1  20 46 49 4c  45 22 20 20  50 52 47 20  20 00 ff 04    FILE"  PRG  ...
>C:04e1  0d 00 20 20  22 52 41 4e  44 4f 4d 20  46 49 4c 45   ..  "RANDOM FILE
>C:04f1  22 20 20 20  20 20 20 50  52 47 20 20  20 00 1d 05   "      PRG   ...
>C:0501  61 02 42 4c  4f 43 4b 53  20 46 52 45  45 2e 20 20   a.BLOCKS FREE.  

It took me quite a while to realise this was to make all the quotes line up. The line number is the number of blocks, which can be 1, 2 or 3 digits long, so the number of spaces after the line number is reduced by 1 for >9 and by two for >99. Seems perfectly logical once I worked it out, but before that it just looked like it randomly changed the padding.
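That padding rule can be written down as a short Python sketch. It is inferred from the dumps above rather than taken from the drive's code: the leading spaces shrink as the block count gets more digits, and the trailing spaces grow by the same amount so the line stays the same length. The three-digit case does not appear in these dumps, so that part is an extrapolation, and format_entry is a name made up for this example.

```python
def format_entry(blocks, name, ftype):
    """Build one file line as the dumps show it, without the 2-byte link.

    Returns the block count (little-endian), the padded text and the null.
    """
    digits = len(str(blocks))
    lead = 4 - digits                      # 3, 2 or 1 spaces before the quote
    trail = 5 - lead                       # assumption: keeps the line length fixed
    text = (" " * lead
            + '"' + name + '"'
            + " " * (17 - len(name))       # pad so the type column lines up
            + ftype
            + " " * trail)
    return blocks.to_bytes(2, "little") + text.encode("ascii") + b"\x00"

# Two entries from the demo disk: 4 blocks and 14 blocks.
print(format_entry(4, "DIR", "PRG"))
print(format_entry(14, "DISPLAY T&S", "PRG"))
```

Both results match the bytes in the demo disk dump, and every line comes out at 30 bytes, or 32 with the link.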

It does make it quite a challenge to parse that to extract the data because you can't just read filename, skip 32 bytes, read next filename etc.
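One way round that is to look for the quotes instead of counting bytes. Here is a minimal Python sketch of that approach, not the actual cartridge code (which is 6502 assembly and not shown here); parse_entry and the sample names are invented for this example.

```python
# Two lines from the demo disk dump, starting at the block count
# (after the link) and stopping before the null.
DIR_LINE = bytes.fromhex(
    "04 00 20 20 20 22 44 49 52 22 20 20 20 20 20 20 "
    "20 20 20 20 20 20 20 20 50 52 47 20 20"
)
TS_LINE = bytes.fromhex(
    "0e 00 20 20 22 44 49 53 50 4c 41 59 20 54 26 53 "
    "22 20 20 20 20 20 20 50 52 47 20 20 20"
)

def parse_entry(line):
    """Return (blocks, filename, rest) for one directory line.

    filename is None when the line has no quotes (the BLOCKS FREE. line).
    """
    blocks = int.from_bytes(line[:2], "little")
    text = line[2:].decode("latin-1")
    first = text.find('"')
    if first < 0:
        return blocks, None, text.strip()
    last = text.rfind('"')                 # closing quote, wherever it falls
    return blocks, text[first + 1:last], text[last + 1:].strip()

print(parse_entry(DIR_LINE))
print(parse_entry(TS_LINE))
```

Both lines parse the same way even though the quote sits at a different offset in each.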

But, I got there in the end. I spent quite a while searching out various esoteric disk images to make sure all would be supported, and with the exception of some animated ones with trick graphics, all work well. Up to, and including the maximum 144 files.


What about directories?

I couldn't find any way to create a directory within a D64 image, and I do not think it was ever supported. It is, however, supported by things like the SD2IEC disk drive.

However, there is no support for it in the LOAD "$" directories. What you see is the folder name instead of the disk name, and the list of files. No ".." for the root directory as you might expect.

I have to manually add the .. to the list of files, as well as the back arrow to go back to the menu.

I have switched to photos from a real VIC20 (in this case an NTSC one) to show the folder structure, since I cannot get a disk image with subdirectories on the Vice emulator I have been using for the other screenshots in this post.

There was a bug in the original version where the extra line needed for the .. was not taken into account when moving to the second screen, so in the case of A-Z folders in a subdirectory, the Q would be skipped. This is fixed in later versions.
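The general shape of that kind of fix can be sketched in Python: put the extra ".." line into the list first, then split the combined list into screens, so the offset for the second screen already accounts for it. This is only an illustration of the idea, not the cartridge code. The 17 rows per screen is a hypothetical figure chosen so the example lines up with the A-Z case, and pages is a made-up name.

```python
import string

ROWS = 17  # hypothetical rows per screen, not taken from the cartridge

def pages(entries, rows=ROWS, in_subfolder=True):
    """Split entries into screens, adding '..' to the list before paging."""
    items = ([".."] if in_subfolder else []) + list(entries)
    return [items[i:i + rows] for i in range(0, len(items), rows)]

folders = list(string.ascii_uppercase)   # folders A to Z
screens = pages(folders)

print(screens[0])   # '..' followed by A to P
print(screens[1])   # starts at Q, nothing skipped
```

Because the slicing is done on the combined list, the second screen starts at Q rather than R.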

Addendum

Just to add to this, the version of BASIC 4 for the VIC20 includes the DIRECTORY and CATALOG commands, and they appear to be better formatted for the available screen width.

You can type out DIRECTORY or CATALOG in full, but I find the shortcuts easier, I normally use C shift A.


Advertisements

Penultimate +2 Cartridge

The Penultimate +2 Cartridge, with built in file browser and so much more, is in stock at The Future Was 8 bit:

More info in a previous post:

http://blog.tynemouthsoftware.co.uk/2023/06/penultimate-plus-2-cartridge.html

See also a great video from Robin, 8 bit show and tell:


Patreon

You can support me via Patreon, and get access to advance previews of posts like this and behind the scenes updates. These are often in more detail than I can fit in here. This also includes access to my Patreon only Discord server for even more regular updates.

https://www.patreon.com/tynemouthsoftware