User Port Dreams: SPI

kelli217 · Post by **kelli217** » Wed Oct 12, 2022 5:13 pm

There isn't going to be a datasette port.

BruceMcF · Post by **BruceMcF** » Wed Oct 12, 2022 11:32 pm

On 10/11/2022 at 9:46 PM, Kalvan said:

Hypothetical question for the development team:

If someone else were to do another Wavicle-style home brew of a Commander X16 motherboard, and the only difference (as far as end-user hardware and software interface is concerned) is that it returns that second VIA to its place on the motherboard and its place in the port architecture and memory map, and further assuming the screwdriver that the user port connected up somehow matches the planned Official User Port expansion card's pinout, voltage, signal, and bandwidth profiles exactly, would either that second VIA in its original position, or the User Port in its position among the circuitry break software compatibility with the port architecture and/or memory map of the Commander X16 system in its latest hardware and systems software specification, and/or would the presence of that second VIA and/or the User Port wired to it steal CPU cycles or otherwise interfere with Audio, Video, or I/O timing in any way, shape or form?

I ask this question because, as there are only four expansion slots, and there is no other analogue to, say, cartridge slots on the motherboard, each individual expansion slot represents precious, extremely limited system real estate. I had several concepts for Commander X16 expansion cards in my head, and relegating the User Port to an expansion card cuts into the basic expandability of the machine by at least 2/5 (1 fifth for the expansion slot itself, the second for the missing User Port/Header on the motherboard itself). I know, one could theoretically reclaim some of that expandability by hacking the Commodore Datasette and/or Floppy Drive ports, but those appear only useful for software medium interface.

IIRC, Kevin Williams was talking about the VIA#2 board appearing in the address space where the VIA#2 appears, so it wouldn't be any less space from a software perspective. From a hardware perspective, since the slot lines are brought out to each slot, you'd only need a 1 -> 3 riser board to have a VIA#2 board plus 5 additional slots, so it doesn't seem like a serious bottleneck.

BruceRMcF · Post by **BruceRMcF** » Sat Feb 11, 2023 10:49 pm

A small note about the general purpose "all mode" SPI channels when working with a VIA, but also of wider use.

There is quite a lot of verbose thinking things through in this thread by that "BruceMcF" fellow, so I decided to clean this thread up by setting out the most generally useful part of all of this.

Recap: Consider the VIA serial shift register when it is in single byte PHI2 clock mode. In this mode, the serial clock starts high, the shift register shifts on the transition to low, and then data is latched (by the receiver for output, by the VIA for input) on the return to high. Like this:

___   _   _   _   _   _   _   _   ___
   \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/

This is called "Mode3". The phase is whether the data is sampled on the leading edge (phase=0) or trailing edge (phase=1), the clock polarity is whether the rest state of the clock is 0 (polarity 0) or 1 (polarity 1), and Mode=2*Polarity+Phase.
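As a quick check of that numbering, here is a small Python sketch of Mode=2*Polarity+Phase. The function name `spi_mode` is only an illustration and is not part of any X16 tooling.

```python
def spi_mode(polarity: int, phase: int) -> int:
    """Return the SPI mode number for a clock polarity and phase (each 0 or 1)."""
    if polarity not in (0, 1) or phase not in (0, 1):
        raise ValueError("polarity and phase must each be 0 or 1")
    return 2 * polarity + phase


# Clock rests high, data sampled on the trailing edge: the VIA shift register case.
print("VIA shift register:", spi_mode(polarity=1, phase=1))
# Clock rests low, data sampled on the leading edge: the SD card case.
print("SD card:", spi_mode(polarity=0, phase=0))
```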

SD cards and quite a lot of SPI chips are Mode0, which is like this:

    _   _   _   _   _   _   _   _ 
___/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \___

After all, the first MOS VIA chips were NMOS technology, which pulls down strongly and pulls up weakly, so you use a lot of pull up resistors and design selects and other things that need to have a "clean transition" around pulling down then releasing up. SPI was developed in the 1980s as CMOS was taking over, where CMOS pulls either up or down with similar strength, so a "rest state" at 0 had stronger appeal than in the late 70s.

A lot of the above thread is concerned with bit banging SPI, and with bit banging, you just "follow the recipe". But with a VIA hardware shift register, or with a glue logic serial shift register that works most cleanly in Mode3, it would be convenient to have a simple circuit that can convert a Mode3 SPI Master to control a Mode0 Servant.

This can be done with a /S /R Flip-Flop with Q and /Q outputs and an AND gate (try to say "and an AND gate" four times fast).

With a /S /R flip flop, pulling /S (set) low while /R (reset) is high sets Q=1, /Q=0. This is held so long as /R stays high. Then pulling /R low while /S is high sets Q=0, /Q=1. Conveniently both Q and /Q are output. Below, QH and /QH are the Q and /Q outputs of this flip-flop.

So, you have a GPIO that is the /MODE0_SET output line. That goes into the flip-flop /S input. The internal Mode3 SPI bus clock, SCLK_LOCAL, goes into /R. Then the external clock is AND(SCLK_LOCAL,/QH).

So in Mode0 operation:

With the local SPI clock at rest, therefore high, pulling the /MODE0_SET line low and then releasing it high again sets Q=1, so /Q=0, so AND(SCLK_LOCAL,/QH)=0, so the external clock is low.
Pull the device select low. The device starts with the external SPI_CLK low, as it expects.
When the local SPI clock goes low, this resets Q=0, /Q=1, so now the AND gate passes SCLK through ... but this happens when SCLK is low, so the external SPI clock stays low.
Now Q=0 and /Q=1 remains true no matter how many times the local SPI clock pulses low, so the trailing rising edge of the internal serial clock is the leading rising edge of the external serial clock, and the next bit of data is shifted on the next falling edge.
When 8 bits have been latched, the external channel requires one last falling edge, so it can return to its rest state. So pull the /MODE0_SET line low, then pull it high again, and the channel is ready for another byte.
When the transaction is finished, pull the /Select line high.

To switch from Mode0 to Mode3 operation, simply send a dummy byte of data with all of the select lines high. That pulls Q low and /Q high, where it will stay so long as you leave /MODE0_SET high.

Even better, an /S /R can be made with two NAND gates. Now, if you use a NAND instead of an AND, the clock is inverted -- either Mode1 or Mode2. And the fourth NAND can be used to invert the clock to get the desired MODE0 and MODE3.

There isn't any second GPIO to "set" whether the clock polarity is inverted ... rather, if you want to connect to a MODE1 or MODE2 device, you use the inverted /SPI_CLK line.

This looks like a netlist, but if looking at the chip with pin 1 at the top left, this is in pin order, on the left side from 1-7 and on the right side from 14 to 8.

; Quad 2-input NAND 74xx00, NAND1
;
; From Pin 1 to Pin 7
NAND1_1A := /MODE0_SET ; From bit banged GPIO
NAND1_1B := NAND1_2Y
NAND1_1Y =: NAND1_2A ; QH
;
NAND1_2A := NAND1_1Y
NAND1_2B := MODE3_CLK
NAND1_2Y =: NAND1_1B =: NAND1_4B ; /QH
NAND1_GND := GND
;
; From Pin 14 to Pin 8
NAND1_VCC := VCC
NAND1_4B := NAND1_2Y = NAND1_1B ; /QH
NAND1_4A := MODE3_CLK
NAND1_4Y =: NAND1_3B =: /SPI_CLK
;
NAND1_3B := NAND1_4Y
NAND1_3A := NAND1_4Y
NAND1_3Y =: SPI_CLK
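
A gate-level sketch in Python of the four-NAND arrangement above, assuming ideal gates with no propagation delay and a power-up state of QH=0. The names `Mode0Converter` and `clock_one_byte` are made up for this illustration. It steps the Mode3 clock through one byte and records the external SPI_CLK level after each change.

```python
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1


class Mode0Converter:
    """Two NANDs as the /S /R latch, one as the clock gate, one as an inverter."""

    def __init__(self) -> None:
        # Assumed starting state: Mode3 pass-through (QH=0, /QH=1).
        self.q, self.qn = 0, 1

    def update(self, set_n: int, mode3_clk: int) -> int:
        """Apply /MODE0_SET and MODE3_CLK levels, return the SPI_CLK level."""
        # Iterate the cross-coupled pair until it settles.
        for _ in range(4):
            q = nand(set_n, self.qn)      # NAND 1: QH
            qn = nand(q, mode3_clk)       # NAND 2: /QH
            if (q, qn) == (self.q, self.qn):
                break
            self.q, self.qn = q, qn
        spi_clk_n = nand(mode3_clk, self.qn)   # NAND 4: /SPI_CLK
        return nand(spi_clk_n, spi_clk_n)      # NAND 3 as inverter: SPI_CLK


def clock_one_byte(conv: Mode0Converter) -> list:
    """Pulse /MODE0_SET, send 8 Mode3 clock pulses, pulse /MODE0_SET again."""
    trace = [conv.update(0, 1), conv.update(1, 1)]   # arm: external clock low
    for _ in range(8):
        trace.append(conv.update(1, 0))   # Mode3 clock falls
        trace.append(conv.update(1, 1))   # Mode3 clock rises
    trace += [conv.update(0, 1), conv.update(1, 1)]  # final falling edge
    return trace


print("".join(str(level) for level in clock_one_byte(Mode0Converter())))
```

In this model the external clock sits low before the first bit, rises eight times, and is returned low by the second /MODE0_SET pulse, which is the Mode0 shape drawn earlier. Real gates have delays, so this says nothing about glitches on the actual board.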

rje · Post by **rje** » Tue Feb 28, 2023 10:31 pm

About speed considerations.

(1) Banked RAM

When I think about the banked RAM, I become keenly interested in faster load speeds.

So 18 kb versus 30 kb read times, for example, are significant. 17 seconds versus 28 seconds for loading 512k of data into banks, though, might not reeeeallllly matter.

(2) Network Proxy

Then there's network proxying. If I'm querying an internet data source across a nanny program running on an RPi bridge, granted I will seriously pre-pack the data I want to return. But it still really ought to be on the order of 1 second to transmit. That limits my transaction response to, say, 20 kilobytes of data. Which really can be quite a lot of data, when you think about it.

But, _faster_ is always enticing, so I don't know. I do think that once we're close to 20 kb per second, I think we're probably at reasonable speeds.

BruceRMcF · Post by **BruceRMcF** » Thu Mar 02, 2023 2:18 pm

Wavicle wrote: ↑Wed Sep 21, 2022 4:28 pm Kevin posted an update to Facebook on the topic of 6522s and user port: Commander X16™ Prototype | Facebook

The October, 2022 post is more up to date, it specifies

The user port remains on the PCB as an optional add-on.

So long as a VIA expansion card has the same arrangement of block pin header as the optional motherboard user-port, the only difference would be the port address for the VIA, where it would be straightforward to simply provide a version of the SPI support routines for each possible expansion slot address in addition to the address in the top 16 bytes of the VIA I/O "slot".

___________

rje wrote: ↑Tue Feb 28, 2023 10:31 pm About speed considerations. [...]

A third main SPI target is mass storage -- whether a second SD card slot to leave one in the main slot and have one for sneakernetting, or a thumb drive MCU that has an SPI interface that offers more speed than the already available 100-150kbits/sec of the I2C channel. That is 12KB-18KB/sec bandwidth, obviously less with channel management and processing overheads ... while the X16 is a lot faster than my old C64, it's not the speed demon we are sometimes thinking of when thinking of negligible processing overheads for I2C.

Dedicating a VIA with auto-MISO on the serial shift register and individual hardware generated clock cycles with the Port A handshake lines as with:

TX_SPI_MODE3:
     STA VIA2PORTA
     ASL VIA2PORTA
     ASL VIA2PORTA
     ASL VIA2PORTA
     ASL VIA2PORTA
     ASL VIA2PORTA
     ASL VIA2PORTA
     ASL VIA2PORTA
     LDA VIA2SERDATA
     RTS

... Call/Return 12 cycles, STA/LDA 8 cycles, 7 shifts for 42 cycles, so 62 cycles per byte, or 129KB/sec when running at 8MHz, seems plenty fast to me for those kinds of "load onto the built in SD card" tasks ... a full 640KB of data (512KB HighRAM and 128KB Vera RAM) plus maybe 60KB of associated material would be copied onto the SD card in under 15 seconds -- remembering that the Vera SPI port can be talking to the internal SD card while the 65C02 is talking to the other mass storage device.
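
The arithmetic can be checked with a few lines of Python. This is only a sketch: it uses the documented 65C02 timings of 6 cycles each for JSR and RTS, 4 for an absolute STA or LDA, and 6 for an absolute ASL, and it ignores any per-byte work in the calling loop.

```python
CPU_HZ = 8_000_000  # X16 system clock

# 65C02 cycle counts for the instructions in TX_SPI_MODE3 (absolute addressing).
CYCLES = {"JSR": 6, "RTS": 6, "STA": 4, "LDA": 4, "ASL": 6}


def cycles_per_byte() -> int:
    call_return = CYCLES["JSR"] + CYCLES["RTS"]
    store_load = CYCLES["STA"] + CYCLES["LDA"]
    shifts = 7 * CYCLES["ASL"]
    return call_return + store_load + shifts


def bytes_per_second(cpu_hz: int = CPU_HZ) -> float:
    return cpu_hz / cycles_per_byte()


def seconds_to_copy(kilobytes: int) -> float:
    """Time for the raw transfer only, taking 1KB as 1024 bytes."""
    return kilobytes * 1024 / bytes_per_second()


print("cycles per byte:", cycles_per_byte())
print("bytes per second:", round(bytes_per_second()))
print("seconds for 700KB:", round(seconds_to_copy(700), 1))
```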

And I am really happy that Bill Mensch shifted the "internal processing" clock cycles for 65C02 ops to being reads, so that the "ASL VIA2PORTA" has only one write cycle ... I am not sure whether this style of SPI would work on a C64 (though with 2 serial shift registers, a C64 wouldn't need it).

I am happy for the HighRAM or Vera RAM to be loaded from the faster built-in SD card port, so I wouldn't chase getting up to the 12.5Mbit/sec Vera SPI clock.

rje · Post by **rje** » Fri Mar 03, 2023 2:32 pm

BruceRMcF wrote: ↑Thu Mar 02, 2023 2:18 pm A third main SPI target is mass storage [...]

I think any solution that improves mass storage transfer speeds, automatically serves my two considerations, since I think mass storage is the most demanding requirement...

BruceRMcF · Post by **BruceRMcF** » Fri Mar 03, 2023 3:39 pm

rje wrote: ↑Fri Mar 03, 2023 2:32 pm
BruceRMcF wrote: ↑Thu Mar 02, 2023 2:18 pm A third main SPI target is mass storage [...]
I think any solution that improves mass storage transfer speeds, automatically serves my two considerations, since I think mass storage is the most demanding requirement...

That's what I'm thinking. I'd be most interested in the thumb drive interface and a faster UART interface than the I2C channel can support, but while SPI is not as widely available for various I/O bridge ICs as it was a decade ago, there is still a good number out there.

Another dimension of this is that if an I2C port on the Gen3 board uses an FPGA hardware I2C interface, that can run the I2C channel much faster than bit-banging the VIA, so it could have one I2C channel running on a VIA core for backward compatibility and a second I2C channel for secondary I/O access without using up a lot of FPGA I/O pins -- or multiplex the bit-banged I2C channel on pins that can access a hardware I2C module, for an even lower pin count.

The same thumb drive bridge chip that I was looking at last year supports UART, SPI and I2C interfaces, so in theory you could make one I2C daughterboard with an optional pin header for faster SPI access, and then it would be already ready for a faster I2C channel in a hypothetical Gen3 board that supports (say) I2C Fast Mode Plus (1Mbps) or High Speed (3.4Mbps).

On that design approach (which is just a hypothetical), the SPI to get faster mass storage (etc.) I/O is only really targeting Gen 1 and 2.

BruceRMcF · Post by **BruceRMcF** » Wed Mar 15, 2023 5:48 pm

The newest X16 "PRG" that I have seen documents the 6 pin block header that brings out the unused VIA1 pins.

These are PB0, PB1 and PB2, along with three of the four handshake lines, CA1, CA2, and CB2.

That is, it turns out, plenty to bit-bang an SPI port, with the SPI_CLOCK and SPI_MOSI (Master-Out, Servant-In) lines set to outputs and the SPI_MISO (Master-In, Servant-Out) line set to input. PortB is also used for the IEC, so we have to be careful to ONLY touch port B bits 0, 1 and 2.

CA2 and CB2 can be individually set high or low, so they both work as select lines. So this is enough to bus two different SPI devices.
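
A sketch of the bit patterns involved, assuming the standard 6522 Peripheral Control Register layout: CA2 control in bits 1-3, CB2 control in bits 5-7, with %111 meaning "output high" and %110 meaning "output low". The Python function names are illustrative only. The OR and AND-NOT steps mirror what the 65C02 TSB and TRB instructions do to the register.

```python
CA2_FIELD = 0b00001110   # PCR bits 1-3
CB2_FIELD = 0b11100000   # PCR bits 5-7
DESELECT_ALL = CA2_FIELD | CB2_FIELD   # $EE: both lines driven high

# Clearing the lowest bit of a field turns %111 (high) into %110 (low).
SELECT_BIT = {1: 0x02, 2: 0x20}   # device 1 on CA2, device 2 on CB2


def select(pcr: int, device: int) -> int:
    """Return the new PCR value; device 0 deselects both lines."""
    pcr |= DESELECT_ALL                 # like TSB: set bits, leave CA1/CB1 alone
    if device == 0:
        return pcr
    return pcr & ~SELECT_BIT[device] & 0xFF   # like TRB: clear one bit


for device in (0, 1, 2):
    print(device, f"${select(0x00, device):02X}")
```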

Indeed, adding one IC and an eight-line network resistor as pull-ups, it is enough to bit-bang an SPI port with up to eight selects, since you can hang a serial-in, tri-state parallel-out shift register fed by the SPI MOSI line, connect one select line to the parallel data latch, connect the other to the serial shift register Output Enable, and have up to eight select lines, with all lines being pulled high to de-select when the output-enable is de-selected.
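
A behavioural sketch of that eight-select idea in Python, assuming a generic serial-in shift register with a separate output latch and tri-state outputs (the post does not name a part, so the class and method names here are hypothetical). A "one cold" pattern is shifted in over MOSI, latched by one select line, and driven onto the device selects while the other select line enables the outputs.

```python
class SelectExpander:
    """Serial-in, latched, tri-state parallel-out register with pull-ups."""

    def __init__(self) -> None:
        self.shift = 0xFF
        self.latch = 0xFF
        self.outputs_enabled = False

    def clock_in_byte(self, value: int) -> None:
        """Shift 8 bits in from MOSI, most significant bit first."""
        for bit in range(7, -1, -1):
            self.shift = ((self.shift << 1) | ((value >> bit) & 1)) & 0xFF

    def pulse_latch(self) -> None:
        """First select line: copy the shift register to the output latch."""
        self.latch = self.shift

    def set_output_enable(self, enabled: bool) -> None:
        """Second select line: drive the outputs or let them float."""
        self.outputs_enabled = enabled

    def select_lines(self) -> int:
        # With outputs floating, the pull-up network holds every select high.
        return self.latch if self.outputs_enabled else 0xFF


def select_pattern(device: int) -> int:
    """All selects high except the chosen device (0 to 7)."""
    return 0xFF & ~(1 << device)


expander = SelectExpander()
expander.clock_in_byte(select_pattern(3))
expander.pulse_latch()
expander.set_output_enable(True)
print(f"{expander.select_lines():08b}")
```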

However, what I am sketching here would be for two select lines. The idea is for a daughter-board that plugs into the block pin-header, and has a pass through which passes the CB2 line in the position where the original pin header passes CA2, so the "first" board is always selected at the VIA1 on CA2 and the "second" board selected at CB2.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

; SPI
; VIA#1 CA1 := J4 Pin 1
; VIA#1 CA2 =: J4 Pin 2
; VIA#1 PB0 :=: J4 Pin 3
; VIA#1 PB1 :=: J4 Pin 4
; VIA#1 PB2 :=: J4 Pin 5
; VIA#1 CB2 =: J4 Pin 6

; Note that two selects can generate 3 selects through a 2 to 4
; decoder with the %11 pin not connected, or 8 or more selects
; through using one select to select the parallel latch for a
; serial in, parallel out shift register and the other select to
; select the output enable of the outputs, with the select lines
; attached to a network pull up resistor.

; CA1 can be a general alert line, pulled high through a pull up
; resistor, pulled down by a peripheral.

; This assumes two selects. With two selects, I will specify
; CA2 as Select1, and a daughterboard will have a 6 pin block
; header with the Pin1 and Pin3-5 fed straight through, Pin6
; input fed through to Pin2 output.

; CA1 is SPI_ALERT
; CA2 is SPI_SELECT1
; PB0 is SPI_CLK
; PB1 is MOSI -- Master Out, Servant In
; PB2 is MISO -- Master In, Servant Out
; CB2 is SPI_SELECT2

SPI_CLK	= %00000001
SPI_MOSI	= %00000010
SPI_MISO	= %00000100

MODE0_SELECT: ; A contains selection, 1 to max,
	; A=0 => deselect all
	; Returns selection in A, or 0 if no device selected
	; Uses X, does not use Y
	; Returns Carry Set if select fails

	TAX			; keep selection while A sets up the port
	LDA #(SPI_CLK|SPI_MOSI)
	TSB VIA1DDRB	; set SPI_CLK, MOSI to output
	TRB VIA1PORTB	; "reset" SPI_CLK, MOSI to 0

	LDA #SPI_MISO
	TRB VIA1DDRB	; "reset" MISO to input
	TXA			; restore selection for SPI_SELECT
	BRA SPI_SELECT

MODE3_SELECT: ; A contains selection, 1 to max,
	; A=0 => deselect all
	; Returns selection in A, or 0 if no device selected
	; Uses X, does not use Y
	; Returns Carry Set if select fails

	TAX			; keep selection while A sets up the port
	LDA #(SPI_CLK|SPI_MOSI)
	TSB VIA1DDRB	; set SPI_CLK, MOSI to output
	TSB VIA1PORTB	; "set" SPI_CLK, MOSI to 1

	LDA #SPI_MISO
	TRB VIA1DDRB	; "reset" MISO to input
	TXA			; restore selection for SPI_SELECT
; fall through to SPI_SELECT

SPI_SELECT: ; Re-select same device
	; port will be set up & clock idle will match mode 
	; A contains selection, 1 to max, 0 to deselect all
	; Returns selection in A, or 0 if no device selected
	; Uses X, does not use Y
	; returns Carry Clear on success, Carry Set on fail
	; Note that a call to select the same device will
	; deselect the device then select it again
; PCR: $9F0C
; * bit0: CA1 control, rising/falling edge detect
; * bit1-3: CA2 control, %110 low output, %111 high output
; * bit4: CB1 control, rising/falling edge detect
; * bit5-7: CB2 control, %110 low output, %111 high output

	SEC
	TAX
	LDA #$EE		; Don't touch CA1/CB1 control settings
	TSB VIA1PCR	; Deselect lines
	CPX #0
	BEQ SELECT_DONE
	CPX #1
	BNE +
	LDA #$02
	BRA SELECT_SPI
+	CPX #2
	BNE SELECT_ERR
	LDA #$20
SELECT_SPI:
	TRB VIA1PCR
SELECT_DONE:
	CLC
SELECT_ERR:
	TXA
	RTS

SPI_WORK = $20	; Kernal ABI scratch location
	; move this if interrupt routines use "New ABI" routines

MODE0_BYTE: ; A has output byte, returns with input byte
	; Uses X, not Y
	
; Uses Mode3_byte routine for most operations, but
; cannot decrement PORTB in first bit if MOSI_b7=1
; and in either event must return with clock low
	ASL
	STA SPI_WORK
	LDX #8
	BCS +
; Do Mode3 loop starting with 0 bit, then drop clock
	JSR SPI_BIT0
	DEC VIA1PORTB
	RTS

; Do Mode3 loop starting with 1 bit but no leading clock drop
; then drop clock
+	JSR MODE0_BIT1_ENTRY
	DEC VIA1PORTB
	RTS


MODE3_BYTE: ; A has output byte, returns with input byte
	; Uses X, not Y
	ASL		; First bit in Carry

MODE_BIT0_ENTRY:
	STA SPI_WORK
	LDX #8
	BCS SPI_BIT1
SPI_BIT0:
	LDA #(SPI_CLK | SPI_MOSI)
	TRB VIA1PORTB	; Drop clock & Output Bit = 0
	INC VIA1PORTB 	; Raise clock
	LDA #SPI_MISO
	AND VIA1PORTB
	BEQ +			; Carry is currently clear
	SEC
+	ROL SPI_WORK	; save input bit, get next output bit
	DEX
	BEQ ++	; done
	BCC SPI_BIT0
SPI_BIT1:
	DEC VIA1PORTB	; Drop clock

MODE0_BIT1_ENTRY:
	LDA #SPI_MOSI
	TSB VIA1PORTB	; Output Bit = 1
	INC VIA1PORTB 	; Raise clock
	LDA #SPI_MISO
	AND VIA1PORTB
	BNE +			; Carry is currently set
	CLC
+	ROL SPI_WORK	; save input bit, get next output bit
	DEX
	BEQ ++	; done
	BCS SPI_BIT1
	BCC SPI_BIT0

++	CLC
	LDA SPI_WORK
	RTS
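
For anyone who wants to check the carry and ROL bookkeeping without hardware, here is a Python model of the MODE3_BYTE loop above. It is a sketch only: `mode3_byte` and `make_servant` are invented names, and the "servant" is a stand-in that replies with a fixed byte, most significant bit first. It models the data flow, not the cycle timing.

```python
def mode3_byte(out_byte: int, read_miso) -> tuple:
    """Send out_byte MSB first; return (input byte, list of MOSI bits sent)."""
    carry = (out_byte >> 7) & 1        # ASL: first output bit lands in carry
    work = (out_byte << 1) & 0xFF      # STA SPI_WORK
    mosi_trace = []
    for _ in range(8):                 # LDX #8 ... DEX
        mosi = carry                   # TRB or TSB puts the bit on MOSI, clock low
        mosi_trace.append(mosi)
        carry = read_miso(mosi)        # INC raises clock, MISO copied to carry
        next_out = (work >> 7) & 1     # ROL SPI_WORK: input bit in at the bottom,
        work = ((work << 1) | carry) & 0xFF
        carry = next_out               # next output bit out at the top
    return work, mosi_trace


def make_servant(reply_byte: int):
    """Stand-in device: records MOSI bits, replies with reply_byte MSB first."""
    reply_bits = [(reply_byte >> n) & 1 for n in range(7, -1, -1)]
    received = []

    def read_miso(mosi_bit: int) -> int:
        received.append(mosi_bit)
        return reply_bits[len(received) - 1]

    return read_miso, received


read_miso, received = make_servant(0xA5)
value, sent = mode3_byte(0x3C, read_miso)
print(f"read ${value:02X}, sent bits {sent}")
```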

BruceRMcF · Post by **BruceRMcF** » Thu Mar 16, 2023 2:41 pm

Analyzing that gives a throughput for the routine itself of around 20KB/sec. This is fast enough to run an SPI UART chip at a reasonable speed, since the SPI UART chip that I know of operates with a 16 bit transaction, with the R/W bit and the register address first and then the byte written or read. For this case, the process can be sped up by writing a dedicated write-byte operation, which will proceed over 25% faster than bit banging the standard SPI "transceive byte" operation (because extracting the input bit is relatively expensive). So 20KB/sec raw throughput is 10KB/sec worth of data transfers, and 80,000 bits/sec one way seems like it can fairly easily support perhaps 38,400 baud, 4 times the 9,600 baud of the C64 User Port interface supported by the hardware serial shift register.
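
The same estimate as a few lines of Python, taking the rough 20KB/sec figure as given and assuming two SPI bytes (command/address plus data) per UART data byte. The names are illustrative.

```python
RAW_BYTES_PER_SEC = 20_000     # rough routine throughput from the post
BYTES_PER_TRANSACTION = 2      # command/address byte + data byte


def payload_bits_per_sec(raw_bytes_per_sec: int = RAW_BYTES_PER_SEC) -> int:
    data_bytes = raw_bytes_per_sec // BYTES_PER_TRANSACTION
    return data_bytes * 8


def headroom(baud: int, raw_bytes_per_sec: int = RAW_BYTES_PER_SEC) -> float:
    """How many times over the SPI side can feed a UART running at this baud rate."""
    return payload_bits_per_sec(raw_bytes_per_sec) / baud


print("payload bits per second:", payload_bits_per_sec())
print("headroom at 38,400 baud:", round(headroom(38_400), 2))
```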

But the game might not be worth the candle, since the I2C bus would have nearly this same throughput, and is already supported by the Kernal. Given that RS-232C transmission rarely involved operating at full throughput in both directions, using an I2C UART might be able to achieve similar speeds.

Looking at the code, the most expensive part of the operation is placing the bit read into the carry flag. Clearing the carry flag and using the input bit to set the Z flag and then either jumping over a set carry or letting it go ahead is slower than loading Port B and shifting the input bit directly into the carry flag. With PB0 dedicated to the SPI Clock so it can be toggled with DEC and INC operations, MISO should be in PB1. This gives about a 25% faster throughput for the core byte transceive routine.

; SPI
; VIA#1 CA1 := J4 Pin 1
; VIA#1 CA2 =: J4 Pin 2
; VIA#1 PB0 :=: J4 Pin 3
; VIA#1 PB1 :=: J4 Pin 4
; VIA#1 PB2 :=: J4 Pin 5
; VIA#1 CB2 =: J4 Pin 6

; Note that two selects can generate 3 selects through a 2 to 4
; decoder with the %11 pin not connected, or 8 or more selects
; through using one select to select the parallel latch for a
; serial in, parallel out shift register and the other select to
; select the output enable of the outputs, with the select lines
; attached to a network pull up resistor.

; CA1 can be a general alert line, pulled high through a pull up
; resistor, pulled down by a peripheral.

; This assumes two selects. With two selects, I will specify
; CA2 as Select1, and a daughterboard will have a 6 pin block
; header with the Pin1 and Pin3-5 fed straight through, Pin6
; input fed through to Pin2 output.

; There is contention for the special PB0 pin, which could be
; used to toggle the serial clock with DEC/INC instructions or
; to extract the input bit with "LDA PORTB : LSR". Placing the
; input bit at PB1 costs only two clocks more, so this is the
; approach used here. The MOSI bit will be set using "TSB" or
; "TRB", so it does not cost more to place in PB2 than in PB0
; or PB1.

; CA1 is SPI_ALERT
; CA2 is SPI_SELECT1
; PB0 is SPI_CLK
; PB1 is MISO -- Master In, Servant Out
; PB2 is MOSI -- Master Out, Servant In
; CB2 is SPI_SELECT2

SPI_CLK	= %00000001
SPI_MISO	= %00000010
SPI_MOSI	= %00000100

MODE0_SELECT: ; A contains selection, 1 to max,
	; A=0 => deselect all
	; Returns selection in A, or 0 if no device selected
	; Uses X, does not use Y
	; Returns Carry Set if select fails

	TAX			; keep selection while A sets up the port
	LDA #(SPI_CLK|SPI_MOSI)
	TSB VIA1DDRB	; set SPI_CLK, MOSI to output
	TRB VIA1PORTB	; "reset" SPI_CLK
				; MOSI is a "don't care" at this point
	LDA #SPI_MISO
	TRB VIA1DDRB	; "reset" MISO to input
	TXA			; restore selection for SPI_SELECT
	BRA SPI_SELECT

MODE3_SELECT: ; A contains selection, 1 to max,
	; A=0 => deselect all
	; Returns selection in A, or 0 if no device selected
	; Uses X, does not use Y
	; Returns Carry Set if select fails

	TAX			; keep selection while A sets up the port
	LDA #(SPI_CLK|SPI_MOSI)
	TSB VIA1DDRB	; set SPI_CLK, MOSI to output
	TSB VIA1PORTB	; "set" SPI_CLK, MOSI to 1

	LDA #SPI_MISO
	TRB VIA1DDRB	; "reset" MISO to input
	TXA			; restore selection for SPI_SELECT
; fall through to SPI_SELECT

SPI_SELECT: ; Re-select same device
	; port will be set up & clock idle will match mode 
	; A contains selection, 1 to max, 0 to deselect all
	; Returns selection in A, or 0 if no device selected
	; Uses X, does not use Y
	; returns Carry Clear on success, Carry Set on fail
	; Note that a call to select the same device will
	; deselect the device then select it again
; PCR: $9F0C
; * bit0: CA1 control, rising/falling edge detect
; * bit1-3: CA2 control, %110 low output, %111 high output
; * bit4: CB1 control, rising/falling edge detect
; * bit5-7: CB2 control, %110 low output, %111 high output

	SEC
	TAX
	LDA #$EE		; Don't touch CA1/CB1 control settings
	TSB VIA1PCR	; Deselect lines
	CPX #0
	BEQ SELECT_DONE
	CPX #1
	BNE +
	LDA #$02
	BRA SELECT_SPI
+	CPX #2
	BNE SELECT_ERR
	LDA #$20
SELECT_SPI:
	TRB VIA1PCR
SELECT_DONE:
	CLC
SELECT_ERR:
	TXA
	RTS

SPI_WORK = $20	; Kernal ABI scratch location
	; move this if interrupt routines use "New ABI" routines

MODE0_BYTE: ; A has output byte, returns with input byte
	; Uses X, not Y
	
; Uses Mode3_byte routine for most operations, but
; cannot decrement PORTB in first bit if MOSI_b7=1
; and in either event must return with clock low
	ASL
	STA SPI_WORK
	LDX #8
	BCS +
; Do Mode3 loop starting with 0 bit, then drop clock
	JSR SPI_BIT0
	DEC VIA1PORTB
	RTS

; Do Mode3 loop starting with 1 bit but no leading clock drop
; then drop clock
+	JSR MODE0_BIT1_ENTRY
	DEC VIA1PORTB
	RTS


MODE3_BYTE: ; A has output byte, returns with input byte
	; Uses X, not Y
	ASL		; First bit in Carry

MODE_BIT0_ENTRY:
	STA SPI_WORK
	LDX #8
	BCS SPI_BIT1
SPI_BIT0:
	LDA #(SPI_CLK | SPI_MOSI)
	TRB VIA1PORTB	; Drop clock & Output Bit = 0
	INC VIA1PORTB 	; Raise clock
	LDA VIA1PORTB
	LSR
	LSR			; Input bit
	DEX
	BEQ +			; done
	ROL SPI_WORK	; save input bit, get next output bit
	BCC SPI_BIT0
SPI_BIT1:
	DEC VIA1PORTB	; Drop clock

MODE0_BIT1_ENTRY:
	LDA #SPI_MOSI
	TSB VIA1PORTB	; Output Bit = 1
	INC VIA1PORTB 	; Raise clock
	LDA VIA1PORTB
	LSR
	LSR
	DEX
	BEQ +			; done
	ROL SPI_WORK	; save input bit, get next output bit
	BCC SPI_BIT0	; Repeating 1 bit is faster
SPI_RPT1:
	DEC VIA1PORTB
	INC VIA1PORTB
	LDA VIA1PORTB
	LSR
	LSR
	DEX
	BEQ +
	ROL SPI_WORK
	BCC SPI_BIT0
	BCS SPI_RPT1

+	LDA SPI_WORK
	ROL			; save input bit, carry cleared
	RTS

BruceRMcF · Post by **BruceRMcF** » Sun Mar 19, 2023 9:13 pm

Extending that, with the SPI addressed UART that I have seen a datasheet for, you may not ever have to transceive bytes. For that IC, Select is asserted low for two bytes. The first byte writes the operation -- read / write, data or port address -- and the second byte either writes a byte or reads a byte.

So you can go faster, since you don't have to manipulate both the MOSI and the MISO in the same pass through the SPI routine.

IIRC, this can operate in MODE3, so I will set aside handling MODE0 as would be needed for SD card SPI mode, and just focus on Mode3.

MODE3_READ is straightforward. I go ahead and reset the MOSI bit, to make it clearer on a scope what is going on, and also if this is borrowed for an IC that uses MOSI byte #0 as a "normal operation" status to the peripheral when reading from it. MODE3_WRITE is a little slower when sending high bits than low bits, and can be sped up a bit by skipping the setting of the one bits when it is following a one, but this version gets the idea across. Both of them are on the order of 40% faster than a single combined Read/Write routine.

MODE3_BYTES_1LOOP

MODE0_READ:
	JSR MODE3_READ
	DEC VIA1PORTB
	RTS

MODE3_READ: ; 0 is output as the output byte
	; Returns with input byte in A
	; Uses X, not Y
	STZ SPI_WORK
	LDX #7
	LDA #(SPI_MOSI | SPI_CLK)	; Send #0
	TRB VIA1PORTB	; also ensure clock low, if Mode3
	INC VIA1PORTB
	LDA VIA1PORTB
	LSR
	LSR
	ROL SPI_WORK
RD_BIT_LP:		; remaining 7 bits
	DEC VIA1PORTB
	INC VIA1PORTB
	LDA VIA1PORTB
	LSR
	LSR
	ROL SPI_WORK
	DEX
	BNE RD_BIT_LP
;
; Loop Finished
	LDA SPI_WORK
	RTS

; ~~~~~~~~~~~~~~

MODE3_WRITE: ; A has output byte
	; Returns with low bit equal to final input bit
	; Uses X, not Y
	ASL		; First bit in Carry
	STA SPI_WORK
	LDX #8
	BCS SW_BIT1
SW_BIT0:
	LDA #(SPI_CLK | SPI_MOSI)
	TRB VIA1PORTB	; Drop clock & Output Bit = 0
	INC VIA1PORTB 	; Raise clock
	DEX
	BEQ +			; done
	ASL SPI_WORK	; get next output bit
	BCC SW_BIT0
SW_BIT1:
	DEC VIA1PORTB	; Drop clock
	LDA #SPI_MOSI
	TSB VIA1PORTB	; Output Bit = 1
	INC VIA1PORTB 	; Raise clock
	DEX
	BEQ +			; done
	ASL SPI_WORK	; get next output bit
	BCC SW_BIT0
	BCS SW_BIT1

+	LDA VIA1PORTB
	LSR
	LSR
	LDA #0
	ROL			; return final input bit
	RTS
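
To show how the two routines would be sequenced for a two-byte transaction of the kind described above, here is a Python sketch against a stand-in device. Everything about the device is hypothetical: the `FakeUart` class, the 16 registers, and the command encoding (top bit set for a read, low four bits as the register address) are invented for illustration and are not taken from any datasheet.

```python
READ_FLAG = 0x80   # hypothetical: top bit of the command byte marks a read


class FakeUart:
    """Stand-in for an SPI-addressed UART with a two-byte transaction."""

    def __init__(self) -> None:
        self.regs = [0] * 16
        self.selected = False
        self.command = None

    def select(self) -> None:          # what SPI_SELECT does with A=1
        self.selected = True
        self.command = None

    def deselect(self) -> None:        # what SPI_SELECT does with A=0
        self.selected = False

    def transfer(self, mosi_byte: int) -> int:
        """One 8-clock exchange; returns the byte seen on MISO."""
        if not self.selected:
            return 0xFF
        if self.command is None:       # first byte: operation and address
            self.command = mosi_byte
            return 0x00
        addr = self.command & 0x0F
        if self.command & READ_FLAG:   # second byte of a read
            return self.regs[addr]
        self.regs[addr] = mosi_byte    # second byte of a write
        return 0x00


def write_register(dev: FakeUart, addr: int, value: int) -> None:
    dev.select()
    dev.transfer(addr & 0x0F)          # MODE3_WRITE: command byte
    dev.transfer(value & 0xFF)         # MODE3_WRITE: data byte
    dev.deselect()


def read_register(dev: FakeUart, addr: int) -> int:
    dev.select()
    dev.transfer(READ_FLAG | (addr & 0x0F))   # MODE3_WRITE: command byte
    value = dev.transfer(0x00)                # MODE3_READ: sends 0, returns input
    dev.deselect()
    return value


uart = FakeUart()
write_register(uart, 3, 0x5A)
print(f"${read_register(uart, 3):02X}")
```

The point is only that neither half of the transaction needs the combined transceive routine: the command byte and written data ignore MISO, and the read sends a constant 0.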
