VERA FX

Posted: Wed Aug 23, 2023 7:52 pm
by DragWx
I have a silly question, was the "FX" extension for the VERA named after the SNES's "Super FX" chip? I noticed the similarities in the feature set and thought it might not be a coincidence. :P

Also, it's nice to finally see what this is all about after several commits on GitHub had been referring to it. :P

For anyone who hasn't seen, the VERA's new "FX" features provide some helper functions to accelerate drawing arbitrary lines, polygons, fills, and some simple bitmap rotation and scaling, but they're all passive functions, meaning it's up to the CPU to actually poke the VERA for each byte or pixel that needs to be written to VRAM. It's pretty neat and it seems like it'd be a fun thing to play with at some point, and I can't wait to see what people do with it. :D

Re: VERA FX

Posted: Thu Aug 24, 2023 1:22 am
by ahenry3068
I'm intrigued. Is this ready for flashing to hardware yet?

Re: VERA FX

Posted: Thu Aug 24, 2023 2:02 am
by DragWx
I'm not sure. It looks like there's a pre-release build available on the GitHub page, but I think it might still need some more time before an "official" release, or else we would've seen an announcement here.

Re: VERA FX

Posted: Thu Aug 24, 2023 2:19 am
by Ender
Well, it's in R44 of the emulator, and I've heard of people that already have it on their hardware without issues, so I would say it's probably safe to flash it.

Re: VERA FX

Posted: Fri Aug 25, 2023 2:36 am
by FearLabs
I flashed it onto my hardware, no issues.

Re: VERA FX

Posted: Fri Aug 25, 2023 10:37 pm
by Guybrush
I have a (stupid?) question for the people who worked on VERA FX, so here it goes:

Why is there no option to read the entire 32-bit cache in one read operation, since there is an option to write it in one operation?

It would allow near-DMA speeds when copying data within video RAM. LDA DATA0/1 is 4 cycles and STA DATA0/1 is 5 cycles, which would make it possible to copy 4 bytes in just 9 cycles, not accounting for loops. Let's add 3 more cycles for the loop, which makes it 12 cycles, or 3 cycles per byte. That's pretty damn fast, and still totally under CPU control, unlike traditional DMA.

32-bit cache write could stay just as it is right now, with nibble mask and everything, only a read mode would need to be added where all 4 bytes of the 32-bit cache would be loaded (they're already read from memory anyway). As for what would actually be returned to the CPU by the read operation, it could be the first byte or whatever.

Re: VERA FX

Posted: Fri Aug 25, 2023 11:25 pm
by Ed Minchau
If you're using VERA channel 0 or 1, that STA is also 4 cycles, since it's going to an absolute address.

Re: VERA FX

Posted: Sat Aug 26, 2023 12:10 am
by Guybrush
Ed Minchau wrote: Fri Aug 25, 2023 11:25 pm
If you're using VERA channel 0 or 1, that STA is also 4 cycles, since it's going to an absolute address.

You're absolutely right, I was probably thinking of standard loops and indexed addressing. That means that the simple non-unrolled loop would be 11 cycles per 4 bytes, which is even better :D
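
The two estimates in this exchange can be checked with a few lines of arithmetic. This is only a sketch of the numbers quoted in the posts: the 32-bit cache read is the feature being asked for, not something VERA FX is stated to provide here, and the 3-cycle loop overhead is the poster's own rough allowance. The constant and function names are made up for illustration.

```python
# Cycle counts taken from the posts above. LOOP_OVERHEAD is the poster's
# rough allowance, not a measured value, and the 4-byte cache read is
# hypothetical (it is the feature being requested).
LDA_ABS = 4          # LDA DATA0/1, absolute addressing
STA_ABS = 4          # STA DATA0/1, absolute addressing (the corrected figure)
STA_FIRST_GUESS = 5  # the figure used in the original estimate
LOOP_OVERHEAD = 3
BYTES_PER_PASS = 4   # one 32-bit cache write moves 4 bytes


def cycles_per_pass(load, store, overhead=LOOP_OVERHEAD):
    """Cycles for one load, one store, and the loop overhead."""
    return load + store + overhead


original = cycles_per_pass(LDA_ABS, STA_FIRST_GUESS)
corrected = cycles_per_pass(LDA_ABS, STA_ABS)

print(original, original / BYTES_PER_PASS)    # 12 3.0
print(corrected, corrected / BYTES_PER_PASS)  # 11 2.75
```

So the corrected figure works out to 2.75 cycles per byte for the non-unrolled loop.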

Re: VERA FX

Posted: Wed Aug 30, 2023 6:39 am
by Ed Minchau
I think I found a major bug in the multiplier. I made a test program:

Code:

bra test

testresult: 
.word $0000
.word $0000

testinput:
.word $7fff
.word $4000

test:
 ldx #$00 ;ADDR_L: point VERA channels 0 and 1 at $1df00
 ldy #$df ;ADDR_M
 lda #$00 ;ADDRSEL = 0 (channel 0)
 sta $9f25
 lda #$11 ;step size 1 for channel 0, address bit 16 set
 stx $9f20
 sty $9f21
 sta $9f22
 lda #$01 ;ADDRSEL = 1 (channel 1)
 sta $9f25
 lda #$31 ;step size 4 for channel 1, address bit 16 set
 stx $9f20
 sty $9f21
 sta $9f22

 lda #$0c ;DCSEL = 6
 sta $9f25
 ldy #$00 ;copy the 4 bytes of test input into the cache
:lda testinput,y
 sta $9f29,y
 iny
 cpy #$04
 bne :-

 lda #$04 ;DCSEL = 2
 sta $9f25
 lda #$40 ;enable cache write, addr-1 mode normal
 sta $9f29
 lda #$10 ;multiply
 sta $9f2c
 sta $9f24 ;write to DATA1 to send result to VRAM (A still holds #$10)
 stz $9f29 ;disable cache write
 stz $9f25 ;DCSEL = 0

 ldy #$00 ;copy the 4-byte result from DATA0 to RAM
:lda $9f23
 sta testresult,y
 iny
 cpy #$04
 bne :-
 rts
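
For anyone decoding the magic numbers in the listing above, here is a small sketch that unpacks the values written to $9f25 and $9f22. The bit layouts and the increment table are assumptions taken from the VERA register description rather than from this thread, and the helper names are made up for illustration.

```python
# Assumed layouts (from the VERA register description, not from this thread):
#   CTRL   ($9f25): bit 0 = ADDRSEL, bits 6:1 = DCSEL
#   ADDR_H ($9f22): bit 0 = address bit 16, bits 7:4 = increment index
INCREMENTS = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 40, 80, 160, 320, 640]


def decode_ctrl(value):
    """Split a CTRL byte into its ADDRSEL and DCSEL fields."""
    return {"addrsel": value & 1, "dcsel": (value >> 1) & 0x3F}


def decode_addr_h(value):
    """Split an ADDR_H byte into address bit 16 and the step size."""
    return {"bit16": value & 1, "step": INCREMENTS[value >> 4]}


for ctrl in (0x00, 0x01, 0x0C, 0x04):
    print(f"CTRL   ${ctrl:02x}: {decode_ctrl(ctrl)}")
for addr_h in (0x11, 0x31):
    print(f"ADDR_H ${addr_h:02x}: {decode_addr_h(addr_h)}")
```

Under those assumptions, $0c and $04 decode to DCSEL 6 and 2, and $11 and $31 decode to step sizes 1 and 4, which agrees with the comments in the listing.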

It took a bit of code wrasslin' to figure out that I had to shut off cache write when I was done. Anyhow, my results were unexpected: it looks like bits 16-19 of the result are always 0.

For example,

Code:

inputs      expected output   actual output
7fff 4000   1fffc000          1ff0c000
7fff 2000   0fffe000          0ff0e000
5555 7fff   2aaa2aab          2aa02aab
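
The table can be cross-checked with a short script. This sketch recomputes each product and tests whether clearing bits 16-19 of the expected value reproduces the reported output. It only checks the arithmetic in the table and says nothing about the cause. The names are made up for illustration.

```python
# Rows copied from the table above: (input_a, input_b, actual_output).
ROWS = [
    (0x7FFF, 0x4000, 0x1FF0C000),
    (0x7FFF, 0x2000, 0x0FF0E000),
    (0x5555, 0x7FFF, 0x2AA02AAB),
]

NIBBLE_16_19 = 0x000F0000  # bits 16-19 of the 32-bit result


def clear_bits_16_19(value):
    """Return the 32-bit value with bits 16-19 forced to zero."""
    return value & ~NIBBLE_16_19 & 0xFFFFFFFF


for a, b, actual in ROWS:
    expected = a * b
    matches = clear_bits_16_19(expected) == actual
    print(f"{a:04x} {b:04x}  {expected:08x}  {actual:08x}  {matches}")
```

All three rows match, so the reported outputs are exactly the expected products with that one nibble zeroed.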

Re: VERA FX

Posted: Wed Aug 30, 2023 7:25 am
by Ed Minchau
I've also found a strange problem with bits 7:0. These are the results of sequential runs, without resetting testresult to 00000000 each time. Is it a problem with my test program, or with the multiplier?

Code:

starting with testresult set to 00000000 before the first run
inputs      expected   actual
5555 7fff   2aaa2aab   2aa02a00
5555 7fff   2aaa2aab   2aa02aab
5555 5555   1c718e39   1c708eab
5555 5555   1c718e39   1c708e39
7fff 5555   2aaa2aab   2aa02a39
7fff 5555   2aaa2aab   2aa02aab
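
One way to read this table, offered as a hypothesis rather than a diagnosis: in every run the low byte looks like the low byte that the previous run's product should have left behind, with 00 before the first run. The sketch below encodes that guess, together with the zeroed bits 16-19 from the earlier post, and compares it against the reported values. The function and variable names are made up for illustration.

```python
# Rows copied from the table above: (input_a, input_b, actual_output).
RUNS = [
    (0x5555, 0x7FFF, 0x2AA02A00),
    (0x5555, 0x7FFF, 0x2AA02AAB),
    (0x5555, 0x5555, 0x1C708EAB),
    (0x5555, 0x5555, 0x1C708E39),
    (0x7FFF, 0x5555, 0x2AA02A39),
    (0x7FFF, 0x5555, 0x2AA02AAB),
]


def predict(a, b, previous_low_byte):
    """Guess the reported value: bits 16-19 cleared, low byte stale.

    Returns (predicted_value, low_byte_of_this_product).
    """
    product = a * b
    upper = product & ~0x000F0000 & 0xFFFFFF00
    return upper | previous_low_byte, product & 0xFF


low_byte = 0x00  # testresult was 00000000 before the first run
for a, b, actual in RUNS:
    predicted, low_byte = predict(a, b, low_byte)
    print(f"{a:04x} {b:04x}  {predicted:08x}  {actual:08x}  {predicted == actual}")
```

The guess reproduces all six rows, which suggests the first byte read back is one run stale, although the table alone does not show where the stale byte comes from.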