Hardware Expansion and Driver Handling

BruceMcF · Post by **BruceMcF** » Tue Jan 12, 2021 7:02 pm

3 hours ago, TomXP411 said:

This is why I wanted to see a second slot for user ROMs. David explicitly said, at one point, that he does not want to install an In System Programmer for the KERNAL ROM, so if we want to re-flash the KERNAL, we will have to remove the chip, flash it with a programmer, and re-install it.

I had not seen that, but given "work", the curse of the programming hobbyist, I certainly have not always been able to keep up.

And I have already seen how cheap some of those are when looking for ones that would handle Microchip SLDs (which are not the cheap ones), so it's surely nowhere near as expensive as it used to be back in the days of actual EPROMs to get a FlashEPROM programmer.

But it's six of one, half a dozen of the other ... I surely would not want to insert and remove a User ROM every time I wanted to use a card, so if there was more than one set of ROM images to write to the FlashROM in the User ROM socket, it would still be a case of putting the FlashROM into a programmer to build its set of ROM slots.

The only alternative that avoids that is the ROM on the card, which requires the Kernal ROM to not be selected when the high bit of the ROM bank is set ... but as mentioned, a block pin header to allow the rest of the bank bits to come out on a ribbon cable would probably be cheaper than a ZIF socket.

lamb-duh · Post by **lamb-duh** » Wed Jan 13, 2021 2:20 am

I've been following along with the thread, but I'm still a bit confused about a few different things,

1. what does it mean that driver software will need to be relocatable-- does this mean code that does not use any absolute addressing, or is there going to be some kind of loader that supports relocation in some way?

2. if I'm writing a driver for a serial port, I would be writing new routines for chrin, chrout, &c, right? so that means that I replace each routine with a new one that will check if this call is directed at this piece of hardware, and process it if it is, or else call the original routine (I guess you would do that based on the device id in this case?)

3. what does non-driver software have to do to not interfere with drivers? are drivers loaded in ram somewhere already marked for kernel use?

4. given that (unless I'm completely wrong on #2) every driver that is loaded will slow down certain system calls, do we really want to autoload drivers when the system comes on? I think in most cases it would make more sense to write startup scripts for software that needs certain drivers that load the needed drivers and start the program. but on the other hand, there are some drivers you might want to always load immediately, and having the option isn't going to hurt anything.

BruceMcF · Post by **BruceMcF** » Wed Jan 13, 2021 11:10 am

8 hours ago, lamb-duh said:

I've been following along with the thread, but I'm still a bit confused about a few different things,

1. what does it mean that driver software will need to be relocatable-- does this mean code that does not use any absolute addressing, or is there going to be some kind of loader that supports relocation in some way?

2. if I'm writing a driver for a serial port, I would be writing new routines for chrin, chrout, &c, right? so that means that I replace each routine with a new one that will check if this call is directed at this piece of hardware, and process it if it is, or else call the original routine (I guess you would do that based on the device id in this case?)

3. what does non-driver software have to do to not interfere with drivers? are drivers loaded in ram somewhere already marked for kernel use?

4. given that (unless I'm completely wrong on #2) every driver that is loaded will slow down certain system calls, do we really want to autoload drivers when the system comes on? I think in most cases it would make more sense to write startup scripts for software that needs certain drivers that load the needed drivers and start the program. but on the other hand, there are some drivers you might want to always load immediately, and having the option isn't going to hurt anything.

1. Relocatable in some way. If the driver has its own loader program, it is up to each driver how it relocates. But whether the driver puts its "check and call" routine in Golden RAM or makes room by pulling down the top of Basic RAM, that code will need to adapt to how much Golden RAM is already in use or to where the top of Basic is located. And if the driver routines are stored in a free High RAM segment, it will have to store the bank number somewhere in its "check and call" routines.

Note that the check and call routine can be LARGELY relocatable, since each check and call routine will be relatively short. You might have only the call to the prior routine in the chain and the far call to the driver routine in High RAM to patch.

2. Exactly. The loader program will store the vector that is already there, and the "check and call" routine will call that prior vector if the check shows the call is not for its device.

3. 3a: nothing other than respect RAM allocation. Grab low RAM from the top of available Basic RAM. If Golden RAM has some allocation mechanism (standard or de facto), respect it. If a High RAM segment is marked as in use, don't use it. 3b: No, in the original mechanism there is no marking of what is used for an OPEN/CLOSE/CHRIN/CHROUT driver, and there hasn't been any promise of adding something like that. To be clear, I don't know if a High RAM segment BAM is promised, but it's been mentioned as a possibility and since it is only 32 bytes, I am hoping it will be included. At present the only memory allocation information is the top of RAM available for a Basic program.

When using this kind of KERNAL routine wedge in the C64, "where to put it" was always an issue. But since there was so much less RAM available for it, loading multiple KERNAL routine wedges was rarely contemplated and the KERNAL wedge technique was not used very often: most wedges used the character input routine in Basic or the Syntax Error vector. KERNAL wedges most often used Golden RAM at $C000-$CFFF, and if you used it, it killed any soft loaded command line wedge. So unless you had a fast loader cart with your Basic command line wedge in ROM, the KERNAL routine wedge wasn't very useful.

4. There is that tradeoff. OPEN will always have to be live, but if the driver handles one open channel, the other routines can pass through when the device is not open, which is something the CLOSE routine can handle.

Since the chain is literally each driver storing the address that was in the vector when it loaded, you can't close by extracting yourself from the chain. But you can easily reduce your overhead in the other calls to 5 clocks by having the start of each check and call routine be:

Driver_op: NOP : NOP

Prior_op: JMP RESET ; This is patched to the prior op vector by the loader routine

Driver_op1: ...

... and when the channel is opened, the NOPs are replaced by "BRA Driver_op1". When the device is closed, the "BRA Driver_op1" is replaced by "NOP : NOP".

That also conserves Low RAM, since most driver API routines will not have to check whether their device is open ... they only execute when their device is open. Only OPEN itself needs to keep track.

This is why it is handy if a new configuration script can be executed at any time, so you can set up the configuration you want to do something, rather than loading up every driver in your SD card "just in case", which could bog down the KERNAL calls.
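To make the NOP/BRA patching above concrete, here is a rough Python model of the five-byte stub. The opcode values are the standard 65C02 ones, but `STUB_BASE`, `PRIOR_OP` and the helper names are made up for illustration, and the model only follows forward branches. It is a sketch of the idea, not anything from the actual KERNAL.

```python
# Model of the self-patching stub. Opcodes are the standard 65C02 values;
# the addresses below are hypothetical and only serve the illustration.
NOP, BRA, JMP_ABS = 0xEA, 0x80, 0x4C

STUB_BASE = 0x0400   # hypothetical load address of the stub
PRIOR_OP = 0xE123    # hypothetical address found in the vector at install time


def build_stub(prior):
    """Driver_op: NOP : NOP / Prior_op: JMP prior (Driver_op1 follows at offset 5)."""
    return bytearray([NOP, NOP, JMP_ABS, prior & 0xFF, prior >> 8])


def on_open(stub):
    # BRA is relative to the byte after its operand, so +3 skips the 3-byte JMP.
    stub[0:2] = bytes([BRA, 3])


def on_close(stub):
    # Restore the pass-through path.
    stub[0:2] = bytes([NOP, NOP])


def entry_target(stub, base=STUB_BASE):
    """Follow the stub from its first byte and return where control ends up."""
    pc = 0
    while True:
        op = stub[pc]
        if op == NOP:
            pc += 1
        elif op == BRA:  # forward displacement only in this model
            return base + pc + 2 + stub[pc + 1]
        elif op == JMP_ABS:
            return stub[pc + 1] | (stub[pc + 2] << 8)
        else:
            raise ValueError(f"unexpected opcode {op:#04x}")


stub = build_stub(PRIOR_OP)
closed = entry_target(stub)   # falls through the NOPs into JMP Prior_op
on_open(stub)
opened = entry_target(stub)   # BRA lands on Driver_op1 at STUB_BASE + 5
on_close(stub)
```

While the device is closed, every call falls straight through to the prior routine in the chain; only the two bytes at the head of the stub ever change.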

kktos · Post by **kktos** » Wed Jan 13, 2021 2:57 pm

3 hours ago, BruceMcF said:

2. Exactly. The loader program will store the vector that is already there, and the "check and call" routine will call that prior vector if the check shows the call is not for its device.

From a software architecture point of view, it makes sense only if you want to override the vector.

But if you want to print chars on screen or to a serial link, you want to have both vectors and you want to access them directly.

So instead of having a vector checking I don't know what in order to know if the call was meant for it or another, I would prefer a kinda generic call with a parameter being an output descriptor.

Does it make sense ?

picosecond · Post by **picosecond** » Wed Jan 13, 2021 3:12 pm

On 1/11/2021 at 6:26 PM, TomXP411 said:

The easy way to do this is put a ROM chip on the expansion card and run a jumper wire from the chip's CS pin to the unused bank pin on the motherboard. Bank out the system RAM and bank in the ROM by setting the correct value in $01, and code is running from the expansion ROM.

This is a good idea that can be done natively using the expansion connector, with only minor changes to the current design. First we need to free up some expansion connector pins.

I never understood the peculiar idea to send five IO selects to every expansion connector. It wastes four connector pins and burdens every expansion card with switches to pick one of them. Instead, send a different IO select to each connector. 32 bytes of memory-mapped IO is plenty for most applications. For those that want more we can exploit the unused ROM banks as described next.

Each expansion slot now has four free pins. Use one pin as the slot select, which is a trivial decode of ROM bank register upper bits and the existing ROM address space decode. The remaining three bits can be just the three LSBs of the bank register. This sparse decoding "wastes" lots of ROM banks but they were unused anyway. Now each slot has eight banks = 128KB of address space to use as desired. Nothing says ROMs need to live here of course. It can be used for anything. A dumb frame buffer might be a fun project.

Yes, this idea does require a minor addition to decoding glue logic. I think it's a bargain but this isn't my project.
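A rough Python model of that decode, to show how little logic is involved. The bit assignment is an assumption, not something specified in the thread: bit 7 of the bank register marks "expansion", bits 6-5 pick one of four slots, bits 4-3 are ignored (the "wasted" banks), and the three LSBs go to the card. The function name is hypothetical too.

```python
ROM_BASE = 0xC000     # start of the 16 KB banked ROM window
ROM_END = 0xFFFF
BANK_SIZE = 0x4000    # 16 KB per bank, so 8 banks = 128 KB per slot


def decode(bank, addr):
    """Model the glue logic for one CPU access.

    Returns None outside the ROM window, ("system", bank) for on-board ROM,
    or ("slot", slot, offset) with offset inside the slot's 128 KB space.
    """
    if not ROM_BASE <= addr <= ROM_END:
        return None
    if not bank & 0x80:                # assumed: high bit clear = system ROM
        return ("system", bank)
    slot = (bank >> 5) & 0x03          # assumed: bits 6-5 choose the slot
    sub_bank = bank & 0x07             # three LSBs passed to the card
    return ("slot", slot, sub_bank * BANK_SIZE + (addr - ROM_BASE))
```

With this assignment, bank $A7 at address $FFFF is the last byte of slot 1's 128 KB space.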

What to do with the remaining unused IO select? I would use it for a UART but I doubt the team has the stomach for that.

m00dawg · Post by **m00dawg** » Wed Jan 13, 2021 4:00 pm

<slight tangent>

erm actually, I'll post this in another thread. I don't want to derail this convo.

</slight tangent>

BruceMcF · Post by **BruceMcF** » Wed Jan 13, 2021 4:11 pm

46 minutes ago, kktos said:

From a software architecture point of view, it makes sense only if you want to override the vector.

But if you want to print chars on screen or to a serial link, you want to have both vectors and you want to access them directly.

So instead of having a vector checking I don't know what in order to know if the call was meant for it or another, I would prefer a kinda generic call with a parameter being an output descriptor.

Does it make sense ?

The original call does not have to check anything, it just jumps to the Kernel routine to perform that action. And the Kernel does NOT have anything set up to "install" device drivers for devices it does not handle. #0 is the keyboard; #1 was the datasette in the C64 and is the SD card in the CX16; #2 is the serial interface implemented through bit banging the User Port; #3 is the display; and #4-#30 are IEC devices (though I guess the CX16 Kernel treats #8 as a synonym for #1).

So when it does an OPEN on Device #7, if your device driver is handling the device that is being referred to as #7, it HAS to say, "oh, wait, don't go do the Kernel routine and put that out on the IEC bus, I am going to handle that instead." The Kernel has no mechanism for "installing" your API for device #7. Instead, you (or, in practice, an installation program) preempt that call, and the wedge jumps to its own OPEN routine when the current DEVICE# set by SETLFS is #7. But if the DEVICE# registered by SETLFS is not #7, it jumps to the routine that was in the vector when the driver installer was run.

Now if #7 is your parallel port slipnet and you also have a serial port extension card, one of those two driver installation programs will run after the other. If both of them respect the rudimentary memory allocation facilities, that's fine, since the most recently installed vector will be in place when the second driver installation program runs. So when OPEN is called, first the serial port driver checks if #2 has been registered by SETLFS; if not, it jumps to the next in the chain and the parallel port driver checks if #7 has been registered; if not, it jumps to the next in the chain and the Kernel OPEN routine executes.
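The chain can be modelled in a few lines of Python. This is only a sketch: `Machine`, `install_wedge` and the `trace` list are made-up names, and the real SETLFS also takes a logical file number and a secondary address, which are left out here.

```python
class Machine:
    """Toy model of one KERNAL vector (OPEN) and the SETLFS device number."""

    def __init__(self):
        self.device = 0
        self.trace = []                      # order in which routines ran
        self.open_vector = self.kernal_open  # vector starts out at the ROM routine

    def setlfs(self, device):
        self.device = device

    def kernal_open(self):
        self.trace.append("KERNAL")
        return "KERNAL"

    def open(self):
        self.trace.clear()
        return self.open_vector()


def install_wedge(machine, device, name):
    """What a driver loader does: save the current vector, then replace it."""
    prior = machine.open_vector

    def check_and_call():
        machine.trace.append(name)
        if machine.device == device:
            return name                      # this call is ours, handle it
        return prior()                       # not ours, pass down the chain

    machine.open_vector = check_and_call


m = Machine()
install_wedge(m, 7, "parallel")   # installed first, so checked second
install_wedge(m, 2, "serial")     # installed last, so checked first
m.setlfs(8)
handled_by = m.open()             # neither wedge claims #8
```

Each wedge only knows about the routine that was in the vector when it was installed, which is why a driver cannot simply unlink itself later.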

Now, if complex configurations become commonplace, a more efficient dispatcher is a central device manager that loads on top of the coldstart vector, copies the cold start vector, and has dispatch tables and an API for installing a device into the device manager. And if makers of hardware or other programmers write driver install programs that use that 3rd party device manager API, then you have a more efficient system that checks the device number and directly dispatches the correct driver.
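A rough Python sketch of what such a device manager could look like. Everything here is hypothetical (the `DeviceManager` class, its `install` and `call` methods, the handler names); no such 3rd party API exists in the thread. The point is only that one table lookup replaces the walk down a chain of wedges.

```python
class DeviceManager:
    """Hypothetical dispatcher: one table lookup instead of a chain of checks."""

    OPS = ("OPEN", "CLOSE", "CHRIN", "CHROUT")

    def __init__(self, kernal_vectors):
        # Copy of the original vectors, taken when the manager is installed.
        self.kernal = dict(kernal_vectors)
        self.table = {}                      # device number -> {op: handler}

    def install(self, device, handlers):
        """API a driver install program would call instead of wedging itself."""
        unknown = set(handlers) - set(self.OPS)
        if unknown:
            raise ValueError(f"unsupported operations: {sorted(unknown)}")
        self.table[device] = dict(handlers)

    def call(self, op, device, *args):
        # Fall back to the original KERNAL routine when no driver replaces op.
        handler = self.table.get(device, {}).get(op, self.kernal[op])
        return handler(*args)


# Stand-ins for the ROM routines and for one driver (assumed names).
kernal = {op: (lambda *a, op=op: ("KERNAL", op)) for op in DeviceManager.OPS}
manager = DeviceManager(kernal)
manager.install(7, {"OPEN": lambda: ("slipnet", "OPEN"),
                    "CHROUT": lambda ch: ("slipnet", ch)})
```

Here device #7 replaces only OPEN and CHROUT; a CLOSE on #7, or any call for another device, still reaches the original routine.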

But for the case where some particular 3rd party device driver is the only device driver installed, the original Kernel vector system is more efficient and where it's only a couple installed, it's probably not slower than an explicit device driver dispatch system. So I'd expect the original Kernel system to be supported for those who want Basic access to a device, and other systems specialized for more complex configurations to be supported by a smaller set of hardware.

Note that since this is not running on an operating system, there is no necessity to use the Kernel API, so lots of cards might ship with just their own API that you can load into High RAM and instructions on how to call its routines. For those, the Kernel API device drivers might be written by a 3rd party.

BruceMcF · Post by **BruceMcF** » Wed Jan 13, 2021 4:48 pm

1 hour ago, picosecond said:

This is a good idea that can be done natively using the expansion connector, with only minor changes to the current design. First we need to free up some expansion connector pins.

I never understood the peculiar idea to send five IO selects to every expansion connector. It wastes four connector pins and burdens every expansion card with switches to pick one of them. Instead, send a different IO select to each connector. 32 bytes of memory-mapped IO is plenty for most applications. For those that want more we can exploit the unused ROM banks as described next.

Each expansion slot now has four free pins. Use one pin as the slot select, which is a trivial decode of ROM bank register upper bits and the existing ROM address space decode. The remaining three bits can be just the three LSBs of the bank register. This sparse decoding "wastes" lots of ROM banks but they were unused anyway. Now each slot has eight banks = 128KB of address space to use as desired. Nothing says ROMs need to live here of course. It can be used for anything. A dumb frame buffer might be a fun project.

Yes, this idea does require a minor addition to decoding glue logic. I think it's a bargain but this isn't my project.

What to do with the remaining unused IO select? I would use it for a UART but I doubt the team has the stomach for that.

I don't think it's a question of stomach, I think it's a question of build cost. They were going to go with the WDC UART until they found out there was a bug, then tried to add a UART for the VERA, but then decided the pins were more important for register addressing, then decided to go with bit-banging serial on the VIA based User port. And given the variety of preferences in how to have the UART work, it really is the kind of thing the expansion cards are FOR.

So the unused I/O select is going to continue to go to the expansion cards.

I would LIKE /IOSelect 4-7 to go directly to slots 1-4, and /IOSelect 3 to be decoded with A3 and A4 to distribute an additional 8 bytes to each of slots 1-4, so each I/O space has a distinct slot it connects to ... which would free up three current /IOSEL pins for other uses ... but I expect the design team is going to go for the approach that makes it easier to have a single slot on the small form factor card that can still support multiple slots through a riser board.
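As a sanity check on that split, here is a small Python model of the decode. The base address is an assumption on my part (/IOSelect 3 starting at $9F60, with each select covering 32 bytes up to $9FFF), and `slot_for` is a made-up name.

```python
IO3_BASE = 0x9F60     # assumed base of /IOSelect 3; each select spans 32 bytes
IO_END = 0x9FFF       # assumed end of /IOSelect 7


def slot_for(addr):
    """Return the slot (1-4) that owns an expansion I/O address, else None."""
    if not IO3_BASE <= addr <= IO_END:
        return None
    select = 3 + (addr - IO3_BASE) // 32    # which /IOSelect line is active
    if select >= 4:
        return select - 3                   # /IOSelect 4-7 go straight to slots 1-4
    return 1 + ((addr >> 3) & 0x03)         # /IOSelect 3: A4 and A3 pick the slot


# Bytes of I/O space each slot ends up with under this scheme.
bytes_per_slot = {s: sum(1 for a in range(IO3_BASE, IO_END + 1) if slot_for(a) == s)
                  for s in range(1, 5)}
```

Under these assumptions each slot gets 32 bytes from its own select plus 8 bytes carved out of /IOSelect 3.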

kktos · Post by **kktos** » Wed Jan 13, 2021 7:25 pm

3 hours ago, BruceMcF said:

a more efficient dispatcher is a central device manager that loads on top of the coldstart vector, copies the cold start vector, and has dispatch tables and an API for installing a device into the device manager. And if makers of hardware or other programmers write driver install programs that use that 3rd party device manager API, then you have a more efficient system that checks the device number and directly dispatches the correct driver.

Precisely what I was thinking!

I do love this idea. Clean and Neat.

Either, for each command, a vector table with an entry for each device.

Or the other way round, for compactness:

For each "allocated" device, its own vector table.

OK, as you said, no OS... but does that mean we cannot do things properly? I know this is retrocomputing... but we're grown up now ;oD

I know how my 16-year-old self would have done it on his //c... I'd rather try differently now :oD

BruceMcF · Post by **BruceMcF** » Wed Jan 13, 2021 7:39 pm

The thing about being able to work on the bare metal is there is no obstacle to a community created dispatcher. The vector table is entirely agnostic as to what is getting executed, as long as the calls that do not have a replacement routine are directed to the original Kernal routine.