
How to implement slow chips on expansion cards

Posted: Thu Jul 09, 2020 11:18 am
by Lorin Millsap
2MHz should be reliable enough. C128s were sold with both models of SID chip, and those were definitely used at 2MHz. 
It sounds to me like a first task for third-party developers would be to build an 8:1 clock divider, along with some sort of buffer to read from the CPU bus and stretch that cycle on the expansion card side. If we're only talking about writing data (to a SID or VIC), then that should actually work fairly well. I'm just not sure what kind of hardware that would require - I expect a CPLD would be the device of choice, with address and data latches. 

I think I even have a process to make it happen, but I know nothing about programming CPLDs. 
 

Well, if you are using logic chips, all you need to divide the clock is a binary counter or a few flip-flops. For your buffer all you need is a few latches. Then with a few minor supporting gates and flip-flops you could do the simple interfaces without any programmable logic at all. Consider that all of our glue logic on the main board is done with common logic chips and no programmable logic. The bus clock is present on the expansion connector.


Sent from my iPhone using Tapatalk

How to implement slow chips on expansion cards

Posted: Fri Jan 01, 2021 11:09 pm
by aex01127

I am not sure how to proceed if I want to try to do something to work with the IO bus using chips or other electronics that have a different format. Could you give me some ideas about what to search for or read up on in order to do so? Would a much faster CPU (on the other end) not be able to handle the signalling, so that a program could manage the expansion slot in real time?


How to implement slow chips on expansion cards

Posted: Sat Jan 02, 2021 1:39 am
by Lorin Millsap
I am not sure how to proceed if I want to try to do something to work with the IO bus using chips or other electronics that have a different format. Could you give me some ideas about what to search for or read up on in order to do so? Would a much faster CPU (on the other end) not be able to handle the signalling, so that a program could manage the expansion slot in real time?

Your question is somewhat vague, so I’m not completely sure what you are actually asking. But I’ll try to provide an answer.

Chips do come in a variety of formats, if you will, divided into two basic types: parallel and serial. For the purposes of the expansion bus we are discussing here, it is only directly compatible with parallel-type interfaces.

Within the parallel-type chips there is typically a data bus, sometimes an address or register select, as well as a Chip Select or Enable, and depending on the chip there may be an Output Enable and Write Enable or a single combined R/W line. For chips that require separate OE and WE lines you will need to provide the circuitry to split the CPU R/W line into the required OE and WE signals (for example, OE asserted when R/W is high and WE when R/W is low, each qualified by the chip select).

As to asking whether a much faster CPU can be used, I have in other articles tried to explain in layman’s terms why this is an issue. It can be mitigated once you understand what the issue is, but it’s not as simple as just connecting the buses together.

To grasp this you need to think in terms of timing. For my example I’ll use an 8 MHz 6502 and a 48 MHz AVR chip, and for simplicity’s sake we will assume only a single register which can be read and written.

How would you interface it? Well, for starters you would need to assign 8 of the IO pins as the data bus. Next, the CS line coming from the X16 needs to be set up as an external interrupt on the AVR. Since we have only a single register you will not need any address lines, and we can use the X16 R/W line to tell the AVR whether we are reading or writing. So you need 10 IO pins total on the AVR.
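Just to make that pin budget concrete, here is a minimal setup sketch in AVR C. It assumes an ATmega-class part with a free 8-bit port (PORTA here) for the data bus, CS wired to INT0 (PD2) and R/W to PD3; the pin names and part are illustrative, not a tested design.

#include <avr/io.h>
#include <avr/interrupt.h>

#define RW_PIN PD3            /* high = X16 read, low = X16 write */

volatile uint8_t reg_value;   /* the single register we present to the X16 */

static void bus_init(void)
{
    DDRA  = 0x00;                         /* data bus idles as input (released)  */
    DDRD &= ~(_BV(PD2) | _BV(RW_PIN));    /* CS (INT0) and R/W are inputs        */
    EICRA |= _BV(ISC01);                  /* interrupt on the falling edge of CS */
    EIMSK |= _BV(INT0);                   /* enable the CS interrupt             */
    sei();                                /* global interrupt enable             */
}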

The X16 can now either read or write the single register we are presenting. To understand the timing that will be taking place, we have to allow a margin at the edges because the clocks are not synchronous. You need a margin in the design anyway, but more if you cannot guarantee synchronization.

So what happens in timing terms when you attempt to write the register? We are going to do some quick math to determine what our access windows are. For a 6502 at 8 MHz you take 1000 ns (nanoseconds) and divide it by 8 to get the size of our cycles in nanoseconds; in this case it’s 125 ns. However, the 6502 performs accesses in half cycles rather than full cycles, so we need to divide that figure in half, which gives us 62.5 nanoseconds for the access. Next we need to solve for the AVR at 48 MHz. The same equation, 1000 ns / 48, gives us about 20 nanoseconds per cycle.
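If you want to play with those numbers, the arithmetic is easy to check in a few lines of C (a sketch only; the cycle figures are the ones worked out above):

#include <stdio.h>

int main(void)
{
    double t_6502 = 1000.0 / 8.0;    /* 8 MHz 6502: 125 ns per cycle             */
    double window = t_6502 / 2.0;    /* access happens in a half cycle: 62.5 ns  */
    double t_avr  = 1000.0 / 48.0;   /* 48 MHz AVR: about 20.8 ns per cycle      */

    printf("6502 cycle:    %.1f ns\n", t_6502);
    printf("access window: %.1f ns\n", window);
    printf("AVR cycle:     %.1f ns (about %.0f AVR cycles per window)\n",
           t_avr, window / t_avr);
    return 0;
}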

So let’s go step by step through what will happen, starting at the beginning of the access window (when the CS line is triggered). This will invoke an interrupt on the AVR. The AVR will need to complete its current instruction, so more or less the first 20 ns of your 62 ns window is taken up. Next the AVR is going to check whether this is a read or a write access and branch accordingly; this takes a full cycle, so now you are 40 ns into the access. Next the AVR needs to store the value on the data bus, which is going to use a single cycle. You are now 60 ns into the cycle and the X16 has successfully written to the register. But there is a caveat: this assumes the clock edges are closely aligned. In reality they could be up to 20 ns off, in which case you could add nearly 20 ns before the CS/IRQ is registered, which would push back all the actions that follow by the same amount. Since the data on the bus is only guaranteed to be valid for 10 ns after the end of the cycle, this example will possibly work, but it wouldn’t be guaranteed 100% of the time. So it is clear that during this access the AVR only has time to execute two or three instructions before the access window ends and the data that was written is no longer guaranteed to be valid.

So what if the X16 needs to read? Well, in terms of steps the AVR needs to get into its IRQ handler, branch based on the R/W state, and provide the requested data in less than 62 ns. But it gets tougher here: the data needs to be valid and stable for at least 10 ns before the end of the access window. In this scenario it is just barely possible if the clocks are correctly lined up, but otherwise it is a timing violation and will not work correctly.
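Continuing the earlier pin-setup sketch, the read and write branches described above would look roughly like this in AVR C. As noted further down, this assumes interrupt entry and each step cost only about a cycle; on a real 48 MHz AVR the interrupt entry alone takes several cycles, so in practice this is unlikely to fit the window without the waitstates discussed next.

#include <avr/io.h>
#include <avr/interrupt.h>

extern volatile uint8_t reg_value;   /* defined in the setup sketch above */

ISR(INT0_vect)
{
    if (PIND & _BV(PD3)) {        /* R/W high: the X16 is reading       */
        PORTA = reg_value;        /* present the register value         */
        DDRA  = 0xFF;             /* drive the data bus                 */
        while (!(PIND & _BV(PD2)))
            ;                     /* hold the data until CS deasserts   */
        DDRA  = 0x00;             /* then release the bus               */
    } else {                      /* R/W low: the X16 is writing        */
        reg_value = PINA;         /* latch what the X16 put on the bus  */
    }
}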

So how do we mitigate this?

One option is a faster AVR, and by that I mean much faster, probably in the 60-200 MHz range. What about a PIC, ARM, or Propeller? Well, the same things apply to each of those: each architecture will have its own limitations and some may be better suited, but it’s still going to be a matter of how many steps can be completed during that access window.

Another way of mitigating it is by introducing waitstates. This would mean added circuitry, though in some cases it could be implemented by the AVR or equivalent. Basically, when the CS line is asserted, the X16’s RDY line needs to be pulled low and held low as long as required. The issue with this is that the X16 wasn’t exactly designed with waitstates in mind; in theory it would probably work fine, but as of this writing it hasn’t been tested. If this waitstate were implemented in logic it could be done with a counter and a flip-flop. If it were done by the AVR, you could use one of its IO lines connected to RDY to perform the same task: you would just pull that line low first thing in the AVR and then release it when the code has completed. This would have the benefit of allowing as much access time as needed, but it has the drawback of slowing down the X16 CPU, though in most cases it would probably only be a single cycle of delay.
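Here is a sketch of that AVR-driven waitstate, again with illustrative pin names (RDY on PD4 here) and the same untested caveat as above: the ISR pulls RDY low first, does its work at leisure, then lets RDY float back up. Whether the AVR can actually get RDY low early enough in the cycle is exactly the timing question discussed above, so treat this as an illustration of the idea rather than a verified design.

#include <avr/io.h>
#include <avr/interrupt.h>

#define RDY_PIN PD4
#define RW_PIN  PD3

extern volatile uint8_t reg_value;

ISR(INT0_vect)
{
    PORTD &= ~_BV(RDY_PIN);       /* make sure we will drive low, not high       */
    DDRD  |= _BV(RDY_PIN);        /* pull RDY low: stall the X16 CPU             */

    if (PIND & _BV(RW_PIN)) {     /* read access: put the value on the bus       */
        PORTA = reg_value;
        DDRA  = 0xFF;
    } else {                      /* write access: latch the bus                 */
        reg_value = PINA;
    }

    DDRD &= ~_BV(RDY_PIN);        /* release RDY; an external pull-up takes it high */
    /* for a read, the data bus still has to be released once CS deasserts */
}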

As an important note, I am making some assumptions about the AVR, and that other micros will be similar. I am assuming that the IRQ entry only takes a single cycle, that you can execute a branch instruction based on the value of an IO pin, and that all these instructions only take one cycle to complete. If any of these take longer then you definitely have to invoke waitstates.

By contrast, most of these same tasks can easily be done by a sufficiently fast CPLD or FPGA, which is why they are better suited for use on the system bus than a microcontroller. Microcontrollers are better off interfacing to the user port, where the timing is much more forgiving.


Sent from my iPhone using Tapatalk

How to implement slow chips on expansion cards

Posted: Mon Jan 04, 2021 12:21 pm
by aex01127


On 1/2/2021 at 2:39 AM, Lorin Millsap said:

Your question is somewhat vague, so I’m not completely sure what you are actually asking. But I’ll try to provide an answer.

Thanks for your reply. I had to spend a bit of time on research before replying back to you.

I agree that the scope I presented was a bit vague. What I want is to enable communication with a faster device using the I/O area. This faster device is a SoC of some kind that can provide things like wireless connectivity, etc. All IO access is initiated from the X16 side, so in this setting the X16 would be the slow device. The decoding of the address bus (from 16 bits down to 5 bits + CS) will be handled in hardware.

When I was thinking about trying to implement this in software, I wanted to use a much faster system in the GHz range, like a Raspberry Pi Compute Module 4. I would need 5 pins for addressing, 8 pins for data and then a few for system signals (CS, clock, RWB). Using a 1.5 GHz CPU would give a clock cycle of about 0.67 ns, compared to 125 ns. I thought it was possible to manage this in code.

I have spent some time reading and trying to understand CPLDs and FPGAs. A CPLD seems (in theory) fairly easy to implement, and I have some ideas to work on. An FPGA seems somewhat harder to implement but gives me more options that I am not sure I need. For both of these solutions the main issue is how to develop and "compile" the code. The toolchains needed often seem to cost a lot of money, with a few exceptions, and documentation and examples on how to develop are hard to come by.

I could not find what model of FPGA is used for Vera. If I want to look further into this it would be nice to at least try to work with the same vendor that is used by Vera.

Should we think about a "reference design" for how to connect an expansion cartridge using a CPLD or FPGA? I think some of it would follow from how Vera is designed/programmed, and I can't wait for the schematics and code to be published.


How to implement slow chips on expansion cards

Posted: Mon Jan 04, 2021 1:05 pm
by Lorin Millsap
Just because a Raspberry Pi is in the GHz range doesn’t mean its IO is. I couldn’t find any verifiable info on that, but it looks like a Pi 3 can theoretically do around 66 MHz. So it might be possible, depending on whether it can read entire ports in a single cycle or whether it has to read or set one pin at a time.

As to CPLD vs FPGA, there isn’t a huge difference between them, and as to which FPGA the VERA uses, that’s not a big deal either. Most FPGAs are going to be chosen based on your actual needs, i.e. how many macrocells you require.


Sent from my iPhone using Tapatalk