
New demo uploaded: Wolfenstein 3D - raycasting demo with textures

Posted: Sun Mar 28, 2021 7:18 pm
by Ed Minchau

@Jeffrey I'm not just looking at your source code, I'm also looking at the compiled code with the META/L editor.  There is a lot of room for improvement in that code; although macros make things easier to read and C makes it easier to write, the compiler produces much longer code than is necessary.  For instance, there are a lot of places where the code reads:

LDX #00
LDA #22
STAA VERA_DAT_0
LDX #00
LDA #22
STAA VERA_DAT_0

over and over again.  The second and subsequent LDX and LDA instructions aren't necessary because the value being sent to VERA isn't changing, and each one adds two cycles.  The sequences aren't all identical; some of them include unnecessary LDX #00, LDY #08 and LDA ($22),Y instructions repeated over and over, or something similar.  A few cycles here, a few cycles there, repeated hundreds of times per column of pixels, and it really adds up.  If this were all optimized assembly code, a target of 15 FPS to match the original would definitely be achievable.

 



Posted: Sun Mar 28, 2021 8:36 pm
by Jeffrey


I think you are looking at the compiled C code. The assembly version is in the .asm file; the C version is just for testing purposes.

Edit: in fact, I call the asm version from C in different ways (for easier debugging). And some of the C code I haven't converted to assembly yet, because it's not very performance-critical at the moment (like drawing the menu once or clearing the render area once).



Posted: Sun Mar 28, 2021 10:57 pm
by Ed Minchau


Yeah, that makes sense. I'll keep digging. BTW, I generated the lookup tables for interpolation last night; that was the easy part. I should have the code done in a day or so.



Posted: Tue Mar 30, 2021 5:38 pm
by Ed Minchau


On 3/28/2021 at 12:06 AM, Ed Minchau said:




Anyhow, I'll generate the data tables and subroutines needed for the interpolation and will post them here soon.



OK, so I got the data tables generated; this will all go in ray.h 






// interpolation tables

//

// first, the order in which rays (screen columns) are tried; the first 4 are always cast



extern i16 _tryray[] = {

0,256,288,304,128,64,192,32,96,160,224,16,48,80,112,144,

176,208,240,272,8,24,40,56,72,88,104,120,136,152,168,184,

200,216,232,248,264,280,296,4,12,20,28,36,44,52,60,68,

76,84,92,100,108,116,124,132,140,148,156,164,172,180,188,196,

204,212,220,228,236,244,252,260,268,276,284,292,300,2,6,10,

14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,

78,82,86,90,94,98,102,106,110,114,118,122,126,130,134,138,

142,146,150,154,158,162,166,170,174,178,182,186,190,194,198,202,

206,210,214,218,222,226,230,234,238,242,246,250,254,258,262,266,

270,274,278,282,286,290,294,298,302,1,3,5,7,9,11,13,

15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,

47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,

79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,

111,113,115,117,119,121,123,125,127,129,131,133,135,137,139,141,

143,145,147,149,151,153,155,157,159,161,163,165,167,169,171,173,

175,177,179,181,183,185,187,189,191,193,195,197,199,201,203,205,

207,209,211,213,215,217,219,221,223,225,227,229,231,233,235,237,

239,241,243,245,247,249,251,253,255,257,259,261,263,265,267,269,

271,273,275,277,279,281,283,285,287,289,291,293,295,297,299,301,303

};





// the ray previously calculated to the left of the ray being tried



extern i16 _leftray[] = {

32767,32767,32767,32767,0,0,128,0,64,128,192,0,32,64,96,128,

160,192,224,256,0,16,32,48,64,80,96,112,128,144,160,176,

192,208,224,240,256,272,288,0,8,16,24,32,40,48,56,64,

72,80,88,96,104,112,120,128,136,144,152,160,168,176,184,192,

200,208,216,224,232,240,248,256,264,272,280,288,296,0,4,8,

12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,

76,80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,

140,144,148,152,156,160,164,168,172,176,180,184,188,192,196,200,

204,208,212,216,220,224,228,232,236,240,244,248,252,256,260,264,

268,272,276,280,284,288,292,296,300,0,2,4,6,8,10,12,

14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,

46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,

78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,

110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,

142,144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,

174,176,178,180,182,184,186,188,190,192,194,196,198,200,202,204,

206,208,210,212,214,216,218,220,222,224,226,228,230,232,234,236,

238,240,242,244,246,248,250,252,254,256,258,260,262,264,266,268,

270,272,274,276,278,280,282,284,286,288,290,292,294,296,298,300,302

};





// the ray previously calculated to the right of the ray being tried



extern i16 _rightray[] = {

32767,32767,32767,32767,256,128,256,64,128,192,256,32,64,96,128,160,

192,224,256,288,16,32,48,64,80,96,112,128,144,160,176,192,

208,224,240,256,272,288,304,8,16,24,32,40,48,56,64,72,

80,88,96,104,112,120,128,136,144,152,160,168,176,184,192,200,

208,216,224,232,240,248,256,264,272,280,288,296,304,4,8,12,

16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,

80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,140,

144,148,152,156,160,164,168,172,176,180,184,188,192,196,200,204,

208,212,216,220,224,228,232,236,240,244,248,252,256,260,264,268,

272,276,280,284,288,292,296,300,304,2,4,6,8,10,12,14,

16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,

48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,

80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,

112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,

144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,

176,178,180,182,184,186,188,190,192,194,196,198,200,202,204,206,

208,210,212,214,216,218,220,222,224,226,228,230,232,234,236,238,

240,242,244,246,248,250,252,254,256,258,260,262,264,266,268,270,

272,274,276,278,280,282,284,286,288,290,292,294,296,298,300,302,304

};





// if the above two rays are on the same map block and face, then this table
// is the number of rays to interpolate plus 1, which is also rightray minus
// leftray.  In this case 1 is a special value meaning 255 rays (a span of 256),
// and 0 means no interpolation.  This table is also the starting point for the
// interfrac table: if you are interpolating 127 rays, you start at position 128
// on the interfrac table; if you are interpolating 31 rays you start at
// position 32 on the interfrac table



extern i16 _interpolnum[] = {

0,0,0,0,1,128,128,64,64,64,64,32,32,32,32,32,

32,32,32,32,16,16,16,16,16,16,16,16,16,16,16,16,

16,16,16,16,16,16,16,8,8,8,8,8,8,8,8,8,

8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,

8,8,8,8,8,8,8,8,8,8,8,8,8,4,4,4,

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,

4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,

4,4,4,4,4,4,4,4,4,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,

2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2

};



// the different interpolation routines use these fractions (in 256ths); there
// are actually seven tables of fractions in this one page of RAM, one for each
// interpolnum value from 2 to 128, starting at the index equal to that value.
// Note that if you are only interpolating one ray (indicated by a 2 in
// the interpolnum table) then you don't need to use this fraction table,
// as the results will just be the average of the leftray and rightray
// parameters.  If you're interpolating 255 values you also don't need
// this fraction table, as the column's offset from leftray is itself the fraction





extern u8 _interfrac[]={

0,0,0,128,0,64,128,192,0,32,64,96,128,160,192,224,

0,16,32,48,64,80,96,112,128,144,160,176,192,208,224,240,

0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,

128,136,144,152,160,168,176,184,192,200,208,216,224,232,240,248,

0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,

64,68,72,76,80,84,88,92,96,100,104,108,112,116,120,124,

128,132,136,140,144,148,152,156,160,164,168,172,176,180,184,188,

192,196,200,204,208,212,216,220,224,228,232,236,240,244,248,252,

0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,

32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,

64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,

96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,

128,130,132,134,136,138,140,142,144,146,148,150,152,154,156,158,

160,162,164,166,168,170,172,174,176,178,180,182,184,186,188,190,

192,194,196,198,200,202,204,206,208,210,212,214,216,218,220,222,

224,226,228,230,232,234,236,238,240,242,244,246,248,250,252,254

};





Posted: Fri Apr 23, 2021 7:17 am
by Jeffrey

Just a little personal update: I have been very busy lately IRL, but I will be returning to this project.

Also, for the last few days/weeks I have been working on a (completely) new demo, and I am very excited about it :)

Let's just say that the X16 is much more capable than I had thought...



Posted: Tue Feb 22, 2022 7:02 pm
by Rob

The Vera can scale its video output:



Using only 70% of the original resolution in each dimension (226 x 140) still looks decent while cutting the number of pixels to update roughly in half (if 320 x 200 was the target).

How much would the frame rate increase?

It also eliminates the need to draw borders around the first-person view.



It should be easy to implement and verify, right? I'd like to see it!



HSCALE: $9F2A, $2C
VSCALE: $9F2B, $2C
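Rob's "by half" estimate holds because both dimensions shrink by about 0.7:

$$\frac{226 \times 140}{320 \times 200} = \frac{31\,640}{64\,000} \approx 0.49 \approx 0.7^2$$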



Posted: Tue Feb 22, 2022 9:13 pm
by svenvandevelde


Very smart remark.



Posted: Tue Feb 22, 2022 9:40 pm
by Rob


Thanks.

The real intelligence goes into optimizing the algorithm. This is just a cheat.



Posted: Tue Feb 22, 2022 11:59 pm
by Ed Minchau


On 2/22/2022 at 12:02 PM, Rob said:

The Vera can scale its video output: [...]

Good idea, but there is a drawback.  When you use a VSCALE or HSCALE other than $80, VERA still produces a 640x480 image, so your "pixels" are actually more than one physical pixel wide or high.  When V/HSCALE are $40 (i.e. 320x240) or $20 (i.e. 160x120) that isn't a problem: all the "pixels" are just 2x2 or 4x4, respectively.

But for a scaling factor that isn't a power of two, VERA has to make variable-size pixels.  I'm using $33 for Asteroid Commander, giving me a resolution of 255x192.  VERA handles this by making about half the "pixels" 3 pixels wide, alternating with 2-pixel-wide ones, and the same for the height.  So my pixels are either 2x2, 2x3, 3x2, or 3x3.  With a value of $2C, you'd get a resolution of 220x165: 200 of your columns would be 3 pixels wide and the other 20 only 2; similarly, 150 rows would be 3 pixels tall and the other 15 only 2.  Basically every 11th row and column is smaller.

A huge advantage of using $33 or below is that the screen is then at most 255 columns wide, so you only need one byte for a column index.  That simplifies and speeds up a lot of calculations.

 



Posted: Wed Feb 23, 2022 5:08 pm
by Rob


So, in short, unless you're using either the native resolution or a power-of-two scaling factor, you will end up with odd-sized pixels that won't look right unless they are more evenly dispersed.

I'd be okay with a borderless, lower-resolution option to get an even smoother frame rate.



I got to this thread while thinking of a PETSCII Wolfenstein engine, so seeing this engine working on even 160 x 120 would still be glorious.