Paging of stack

kelli217 · Post by **kelli217** » Fri Apr 30, 2021 4:02 am

Thought was seriously given to using a processor that supports the 65C02 instruction set but also already has relocatable stack and direct pages. However, there was still a problem; this processor has part of the address bus and data bus sharing the same lines via multiplexing, and the external demux logic was determined to be too much to deal with.

And that processor is the 65C816.

x16tial · Post by **x16tial** » Fri Apr 30, 2021 4:50 am

2 hours ago, Scott Robison said:

They can multitask just as fast as they can do anything else (which is to say, not very fast by modern standards).

Right, which is basically what I was implying. It *can* be done, but... why? Part of the charm (imo) is the single task nature of the machine.

Scott Robison · Post by **Scott Robison** » Fri Apr 30, 2021 5:01 am

10 minutes ago, x16tial said:

Right, which is basically what I was implying. It *can* be done, but... why? Part of the charm (imo) is the single task nature of the machine.

Agreed. I wasn't sure if you meant "physically cannot" or "logically why would you". Sorry to be pedantic.

Roman K · Post by **Roman K** » Fri Apr 30, 2021 12:03 pm

LOOOL. I came here today RIGHT AFTER I watched exactly the same video. With exactly the same question. What a coincidence.

Roman K · Post by **Roman K** » Fri Apr 30, 2021 12:13 pm

It's not only about multitasking in the sense of a typical multiprocessing OS. What about something like complex video or music handlers? There are portions of code that have to be executed periodically, like a complex subroutine that can switch context completely without needing to move data in and out. Maybe not even the whole zero page but something like a block of so-called registers: 256 of them multiplied by 16 gives us 4096 bytes of easily accessible memory. And probably the same with the stack: 16 instances of a 256-byte stack. 8K of RAM in total.

Treat it not as a request but rather as a theoretical question - how would you use such an addition?
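To make the arithmetic concrete, here is a small Python sketch of how such a block could be laid out. The names (`context_layout`, `NUM_CONTEXTS`) and the choice to pack all zero-page copies first and all stack copies after them are assumptions for illustration only, not anything the X16 actually provides.

```python
PAGE_SIZE = 256      # one 6502 page (zero page or stack page)
NUM_CONTEXTS = 16    # number of switchable contexts proposed above


def context_layout(base, num_contexts=NUM_CONTEXTS, page_size=PAGE_SIZE):
    """Return a (zero_page_base, stack_base) pair for each context.

    Hypothetical layout: all zero-page copies are packed first,
    followed by all stack copies, starting at offset `base`.
    """
    stack_region = base + num_contexts * page_size
    return [
        (base + i * page_size, stack_region + i * page_size)
        for i in range(num_contexts)
    ]


layout = context_layout(0x0000)
zero_page_bytes = NUM_CONTEXTS * PAGE_SIZE   # 4096
stack_bytes = NUM_CONTEXTS * PAGE_SIZE       # 4096
total_bytes = zero_page_bytes + stack_bytes  # 8192, i.e. 8K
```

With this layout, context 1 would find its zero-page copy at offset `0x0100` and its stack copy at offset `0x1100` within the 8K block.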

StephenHorn · Post by **StephenHorn** » Fri Apr 30, 2021 11:54 pm

11 hours ago, Roman K said:

It's not only about multitasking in the sense of a typical multiprocessing OS. What about something like complex video or music handlers? There are portions of code that have to be executed periodically, like a complex subroutine that can switch context completely without needing to move data in and out. Maybe not even the whole zero page but something like a block of so-called registers: 256 of them multiplied by 16 gives us 4096 bytes of easily accessible memory. And probably the same with the stack: 16 instances of a 256-byte stack. 8K of RAM in total.

Treat it not as a request but rather as a theoretical question - how would you use such an addition?

I mean, we already have banked RAM that can do all of this "putting data in/out of context" thing you're referring to, except in whole blocks of 8K instead of your subset of registers in the ZP. And if I'm working with heavy A/V data, I care about the quantity way more than the exact cycle count of accessing it, because all the cycle count savings in the world won't help if I have to slurp in new data from external storage - we're talking about multiple orders of magnitude difference to copy from memory versus copying from the SD card. And if I'm stuck with a loading screen, I want to be stuck only and exactly once, if possible.

And honestly, if I'm in an A/V situation where I desperately need to maximize the bandwidth to, say, the VERA -- so much so that I can't afford the access penalty of banked himem -- then I'm going to be highly interested in an expansion card that provides hardware DMA. In fact, maybe a card that interfaces with its own SD card and caches the raw binary image of the card (or, ideally, a specified file in a filesystem on the card, maybe through some global launcher that comes with the card, but I'm happy to live within less sophisticated features), and that can then be instructed to use DMA to push selected regions of its local storage to a single address (piping it to the VERA or another device), to an address range (dumping it to RAM), or even to a series of RAM banks. Or hell, even if the card simply provided DMA to copy things between memory ranges and devices on the X16, with no additional memory, that would still be faster than any software copy by an order of magnitude, if it really came down to that.

But then, there's already a lot about the X16 that wouldn't have been remotely possible on the C64, just because the X16 is 8x the clock speed. Pushing it further seems likely to run into more basic problems involving the finite amount of memory on the system, requiring different hardware solutions to expand the available memory so that the increased bandwidth is... well, useful. About the only special purpose where I'd even think I'd want DMA is game programming, to quickly transfer large quantities of assets to the VERA without having to sit on a black screen while 128K or whatever is spooled.
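For a rough sense of the "order of magnitude" figure, here is a back-of-the-envelope sketch in Python. The cycle counts are the standard 65C02 timings for a simple copy loop that reads with `LDA abs,X` and writes to a fixed port address. The one-byte-per-cycle DMA rate is purely an assumption for illustration, not a property of any existing card, and page-crossing and outer-loop overhead are ignored.

```python
CPU_HZ = 8_000_000  # X16 clock, 8x the C64's roughly 1 MHz

# Inner loop of a simple software copy to a fixed port address:
#   LDA src,X   ; 4 cycles (no page crossing)
#   STA port    ; 4 cycles
#   INX         ; 2 cycles
#   BNE loop    ; 3 cycles when taken
SOFTWARE_CYCLES_PER_BYTE = 4 + 4 + 2 + 3

# Assumption: a hypothetical DMA engine moving one byte per bus cycle.
DMA_CYCLES_PER_BYTE = 1


def copy_seconds(num_bytes, cycles_per_byte, clock_hz=CPU_HZ):
    """Time to move num_bytes at a fixed cost per byte."""
    return num_bytes * cycles_per_byte / clock_hz


asset_bytes = 128 * 1024  # the 128K example from the post
software_time = copy_seconds(asset_bytes, SOFTWARE_CYCLES_PER_BYTE)
dma_time = copy_seconds(asset_bytes, DMA_CYCLES_PER_BYTE)
speedup = software_time / dma_time
```

Under these assumptions the software loop takes about 0.21 seconds for 128K and the hypothetical DMA about 0.016 seconds, a factor of 13.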

Really, multitasking seems like the only significant purpose served by swapping the ZP and stack, and I have a whole essay I could go into about that (tl;dr, I'm not a fan and think you really want an entirely different CPU family and system architecture if you really want to look into multitasking as anything more than an extremely fragile parlour trick).

Roman K · Post by **Roman K** » Sat May 01, 2021 8:47 am

Thank you for such a detailed answer!

Is it possible to make an expansion card with an onboard DMA controller, given the current schematics?

ZeroByte · Post by **ZeroByte** » Sat May 01, 2021 2:03 pm

Yes. Lorin Millsap posted a primer for designing such hardware. The expansion slots give access to the full system bus and the RDY, bus enable, and IRQ lines. That’s enough to completely shut down the CPU and drive the system from an expansion port.

I think the most immediate need / benefit for DMA is for PCM audio playback. At 16-bit 48K stereo quality, the DSP burns through a 4K buffer extremely quickly. (if I did my math right, it’s 1/48 of a second), so basically a little more than once per frame, you must blit 4K into the DSP.

BruceMcF · Post by **BruceMcF** » Sat May 01, 2021 2:32 pm

27 minutes ago, ZeroByte said:

Yes. Lorin Millsap posted a primer for designing such hardware. The expansion slots give access to the full system bus and the RDY, bus enable, and IRQ lines. That’s enough to completely shut down the CPU and drive the system from an expansion port.

I think the most immediate need / benefit for DMA is for PCM audio playback. At 16-bit 48K stereo quality, the DSP burns through a 4K buffer extremely quickly. (if I did my math right, it’s 1/48 of a second), so basically a little more than once per frame, you must blit 4K into the DSP.

Yes, if that math is right, you need to pump in 3,200 bytes (about 3.1 KiB) every 1/60th of a second.
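The numbers can be checked with a few lines of Python. This sketch assumes exactly 48,000 samples per second, a buffer of 4096 bytes, and a 60 Hz frame rate; the function name is made up for illustration.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


FRAME_RATE_HZ = 60   # assumed video frame rate
BUFFER_BYTES = 4096  # the "4K buffer" from the post

rate = pcm_bytes_per_second(48_000, 16, 2)  # 16-bit stereo at 48 kHz
bytes_per_frame = rate / FRAME_RATE_HZ      # data consumed per video frame
buffer_seconds = BUFFER_BYTES / rate        # how long a full buffer lasts
buffer_frames = buffer_seconds * FRAME_RATE_HZ
```

That works out to 192,000 bytes per second, or 3,200 bytes per frame, and a full 4096-byte buffer lasts 1.28 frames (a little under 1/46 of a second).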

Michael Kaiser · Post by **Michael Kaiser** » Mon May 03, 2021 3:21 am

The reason I asked is because the C128 could do it. I had a C128 30+ years ago and did not have the skill at the time to write a task switching kernel. I was hopeful that the X16 would be able to do this and had hoped to write a task switching kernel that could do this. And yes, I saw both GeckOS and Contiki and their ability to switch tasks by subdividing the stack. I just found that limiting. I may still do this by subdividing the stack.
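As a rough illustration of what subdividing the stack means, here is a small Python model. It is only a sketch under stated assumptions: the 6502 stack lives in page `$0100` to `$01FF` and grows downward, each task gets an equal slice, and a switch just saves and restores the 8-bit stack pointer. The names `stack_slices` and `Switcher` are made up, and this is not a description of how GeckOS or Contiki actually do it.

```python
STACK_PAGE = 0x0100  # 6502 hardware stack page
STACK_SIZE = 256


def stack_slices(num_tasks):
    """Split the stack page into equal slices, one per task.

    Returns (lowest_address, initial_sp) per task, where initial_sp is
    the 8-bit stack pointer value for an empty slice (its top byte).
    """
    if num_tasks < 1 or STACK_SIZE % num_tasks != 0:
        raise ValueError("num_tasks must divide 256 evenly")
    size = STACK_SIZE // num_tasks
    return [(STACK_PAGE + i * size, i * size + size - 1)
            for i in range(num_tasks)]


class Switcher:
    """Models only the stack-pointer part of a task switch."""

    def __init__(self, num_tasks):
        self.saved_sp = [sp for _, sp in stack_slices(num_tasks)]
        self.current = 0
        self.sp = self.saved_sp[0]

    def switch_to(self, task):
        self.saved_sp[self.current] = self.sp  # like TSX / STX saved_sp,y
        self.current = task
        self.sp = self.saved_sp[task]          # like LDX saved_sp,y / TXS


switcher = Switcher(4)  # four tasks, 64 bytes of stack each
switcher.sp -= 3        # task 0 pushes three bytes
switcher.switch_to(2)   # task 2 starts at the top of its own slice
```

The limitation is visible right away: with four tasks each one has only 64 bytes of stack, and nothing stops a task from overrunning into its neighbour's slice.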