DragWx wrote: ↑Wed Apr 10, 2024 6:00 pm
hstubbs3 wrote: ↑Wed Apr 10, 2024 3:46 pm
The FX routines also pay no attention to the target being 4bit or 8bit... which could be fun, as using 4bit mode with BYTE increments against an 8bit target could allow you to overwrite either nibble, possibly just switching palette offset of the pixel, maybe with a given palette that means could lighten/darken or do color shifts just changing 1 nibble each byte...
Yep! It's also nice when you just want to do general-purpose pixel plotting in 16-color bitmap mode; you don't need to grab a copy of the current byte (2 pixels) from VRAM before modifying just one pixel within. In addition to FX 4-bit mode, there's also FX Transparent Writes mode ($9F29.7), where writing a "0" results in no change to VRAM, which can also simplify blitting code.
The way I tried it, it did not do 4bit transparent writes AND byte reads...
The use case is blitting 4bit sprite data from somewhere in VRAM to 4bit bitmap layer, because I have >128 sprites to get onto screen at a time, so am blitting the remainder ....
If I set FX 4-bit mode but increment for DATA0 is 1 BYTE, 4bit is ignored... so even though I do have transparency write enabled...
LDA DATA0
LDA DATA0
LDA DATA0
LDA DATA0
STZ DATA1
will only actually do transparency write if the pair of 4bit pixels within the BYTE are both zero... because it is operating in 8bit mode, not 4bit mode...
If I set the DATA0 increment to nibble, would I still get 4bit transparency writes, with the DATA1 increment being 4 bytes?
Or would this just be a waste of time and end up the same as before?
With the DATA0 increment set to nibble:
LDA DATA0 ; pixel 0
LDA DATA0 ; pixel 1
LDA DATA0 ; pixel 2
LDA DATA0 ; pixel 3
LDA DATA0 ; pixel 4
LDA DATA0 ; pixel 5
LDA DATA0 ; pixel 6
LDA DATA0 ; pixel 7
STZ DATA1 ; write cache out
.....
I can 100% accept a limitation like transparency only being able to mask per byte in this case. That can be managed via the assets involved...
Or, if really desperate, I could alter the nibble mask when writing the cache out? (That may make optimized code very weird and maybe not very general.)
https://www.youtube.com/watch?v=1TKdrVahM8g
https://github.com/hstubbs3/CommanderX1 ... ex_sprites
hex.PRG
Pressing '9' should stop it from using sprites, so you can really see it blitting its little heart out..
(It is not optimized; there is a ton of overdraw as the tile sprites are 16x64...)
How to Use VERA FX "Line Helper"?
According to Verilog, FX docs:
When using the 32-bit cache...
If FX 4-Bit Mode is ON, Transparent Writes mode will function on each zero-nybble of the 32-bit cache when you attempt to write the cache to VRAM.
If FX 4-Bit Mode is OFF, Transparent Writes mode will function on each zero-byte of the cache instead.
If you'd like, you could disable FX 4-Bit Mode to load the cache with four reads, then enable FX 4-Bit Mode right before writing the cache to VRAM to still get the nybble-based transparency. Best case scenario, that's 7 memory accesses instead of the 9 you'd need if you just stayed in FX 4-Bit Mode, so it's still faster.
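To make the nybble-versus-byte rule above concrete, here is a small Python model of a 32-bit cache write with Transparent Writes enabled. It is only a sketch of the behaviour as described in this post, not VERA's actual logic; the function name `transparent_cache_write` and the sample byte values are made up for illustration.

```python
def transparent_cache_write(vram, cache, four_bit_mode):
    """Model one 4-byte cache write with Transparent Writes on.

    vram and cache are lists of four byte values (0-255).
    Returns the new VRAM bytes; a zero unit in the cache leaves VRAM alone.
    This is an illustrative model, not a description of the hardware.
    """
    out = []
    for old, new in zip(vram, cache):
        if four_bit_mode:
            # 4-bit mode: each nybble is tested for zero on its own.
            hi = (new & 0xF0) if (new & 0xF0) else (old & 0xF0)
            lo = (new & 0x0F) if (new & 0x0F) else (old & 0x0F)
            out.append(hi | lo)
        else:
            # 8-bit mode: the whole byte must be zero to be skipped.
            out.append(new if new != 0 else old)
    return out


background = [0x11, 0x11, 0x11, 0x11]   # 8 background pixels, color 1
sprite_row = [0x00, 0x20, 0x03, 0x45]   # 8 sprite pixels, 0 = transparent

print([hex(b) for b in transparent_cache_write(background, sprite_row, True)])
print([hex(b) for b in transparent_cache_write(background, sprite_row, False)])
```

In the 8-bit case the zero pixels inside 0x20 and 0x03 overwrite the background, which matches the "only transparent if both pixels in the byte are zero" behaviour reported earlier in the thread.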
Re: How to Use VERA FX "Line Helper"?
DragWx wrote: ↑Wed Apr 10, 2024 7:28 pm
According to Verilog, FX docs:
When using the 32-bit cache...
If FX 4-Bit Mode is ON, Transparent Writes mode will function on each zero-nybble of the 32-bit cache when you attempt to write the cache to VRAM.
If FX 4-Bit Mode is OFF, Transparent Writes mode will function on each zero-byte of the cache instead.
If you'd like, you could disable FX 4-Bit Mode to load the cache with four reads, then enable FX 4-Bit Mode right before writing the cache to VRAM to still get the nybble-based transparency. Best case scenario, that's 7 memory accesses instead of the 9 you'd need if you just stayed in FX 4-Bit Mode, so it's still faster.
LDA #magic_enable_8bit ; 2 cycles
STA FX_CTRL ; 4 cycles, running total 6
LDA DATA0
LDA DATA0
LDA DATA0
LDA DATA0 ; 4 cycles x 4 = +16, running total 22
LDA #magic_enable_4bit ; 2 cycles, running total 24
STA FX_CTRL ; 4 cycles, running total 28
STZ DATA1 ; 4 cycles, running total 32
vs not fiddling with it...
LDA DATA0
LDA DATA0
LDA DATA0
LDA DATA0 ; 4 cycles x 4 = +16
STZ DATA1 ; 4 cycles 20
That is more than 50% extra cycles (32 vs 20) spent switching between modes... if my program wasn't already trying to do too much, I could see it being worth it, sure.
is not so bad, just very retro
Re: How to Use VERA FX "Line Helper"?
This will save 4 cycles, just in case.
Code: Select all
LDA #magic_enable_8bit
LDX #magic_enable_4bit
loop:
STA FX_CTRL ;4, Disable FX 4-bit
BIT DATA0 ;4, Read DATA0 but don't store it in A
BIT DATA0 ;4
BIT DATA0 ;4
BIT DATA0 ;4
STX FX_CTRL ;4, Enable FX 4-bit
STZ DATA1 ;4, Write cache to VRAM with 4-bit transparency
; = 28 cycles
Edit: And then for comparison (for anyone else reading), the "normal" way for 4-bit mode (i.e., read DATA0 eight times and write to DATA1 once) is 36 cycles.
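As a quick cross-check of the cycle counts quoted in this thread, the arithmetic can be written out in a few lines of Python. This is only a tally under the thread's own figures (4 cycles per absolute-addressed read or write, 2 cycles per immediate load); it ignores loop overhead, and the function name and dictionary labels are invented for this sketch.

```python
# Per-instruction costs as quoted in the thread.
READ = 4       # LDA DATA0 or BIT DATA0
WRITE = 4      # STZ DATA1, STA FX_CTRL, STX FX_CTRL
LOAD_IMM = 2   # LDA #value


def cycles_per_8_pixels():
    """Cycles to move 8 four-bit pixels (one 32-bit cache write)."""
    return {
        # Four byte reads, one cache write; transparency is per byte only.
        "byte reads, no mode switch": 4 * READ + WRITE,
        # Reload A with the mode value before each FX_CTRL write.
        "mode switch, reloading A": 2 * (LOAD_IMM + WRITE) + 4 * READ + WRITE,
        # Mode values preloaded in A and X outside the loop.
        "mode switch, A and X preloaded": 2 * WRITE + 4 * READ + WRITE,
        # Stay in 4-bit mode: eight nibble reads, one cache write.
        "nibble reads in 4-bit mode": 8 * READ + WRITE,
    }


for name, cycles in cycles_per_8_pixels().items():
    print(f"{name}: {cycles} cycles")
```

Only the last three variants give per-nibble transparency; the first is the cheapest but masks whole bytes.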
Re: How to Use VERA FX "Line Helper"?
DragWx wrote: ↑Thu Apr 11, 2024 12:55 am
This will save 4 cycles, just in case.
Code: Select all
LDA #magic_enable_8bit
LDX #magic_enable_4bit
loop:
STA FX_CTRL ;4, Disable FX 4-bit
BIT DATA0 ;4, Read DATA0 but don't store it in A
BIT DATA0 ;4
BIT DATA0 ;4
BIT DATA0 ;4
STX FX_CTRL ;4, Enable FX 4-bit
STZ DATA1 ;4, Write cache to VRAM with 4-bit transparency
; = 28 cycles
Edit: And then for comparison (for anyone else reading), the "normal" way for 4-bit mode (i.e., read DATA0 eight times and write to DATA1 once) is 36 cycles.
< bows > Thanks for that. You even left me Y to use as loop counter. BIT is an instruction I have not paid enough attention to.
Re: How to Use VERA FX "Line Helper"?
Here's my progress to date: https://github.com/Russell-S-Harper/EXPLORE
Directory cx16-v2 has an example of dual 16-color screens, with swapping during the VBI, and 16-color line drawing routines, all using VERA. Even though it's written in C and implements clipping, it's still about 25% faster than the 256-color line drawing routines in TGI.
Thanks to hstubbs3 and DragWx for their assistance!