New demo uploaded: Wolfenstein 3D - raycasting demo with textures

Ed Minchau · Post by **Ed Minchau** » Thu Mar 25, 2021 4:17 am

Thinking more about interpolation:

Suppose pixel column 0 and pixel column 8 are pointing at the same map block and the same face; say both pointing at the north side of map block 0xFE. Suppose further that column 0 is pointing to (group 6) the leftmost column of the image associated with that map block (call it image column 0) and pixel column 8 is pointing to (group 6) an image column about 2/3 of the way to the right, image column 42. And let's say for pixel column 0 the group 7 data is 511 (it's really far away) and for pixel column 8 the group 7 data is 502 (a little closer).

So, we've got 7 pixel columns to interpolate, and two ranges: 42-0= 42 for the image column range and 502-511= -9 for the height or distance range.

You've already got a macro or subroutine that multiplies a 16 bit number by an 8 bit number representing the range 0/256 to 255/256. So for a 7 point interpolation like this, you'd have a table of 7 numbers to multiply these ranges by 1/8, 2/8, 3/8... 7/8. So you use that macro to multiply the lookup by the range and add the value stored in the leftmost pixel column's group 6/7 values.

So pixel column 1's image column would be pixel column 0's image column (0) plus 1/8 times the range (42) = 5 and the height would be 511 + (1/8)(-9) = 510. All the others between pixel column 2 and pixel column 7 would be just that easy, just increment the interpolation pointer as you increment the pixel column pointer.

A 15 pixel range would need a table of 15 numbers, a 31 pixel range would need a table of 31 numbers... you could have these multiplication lookup tables for interpolating 1, 3, 7, 15, 31, 63, and 127 pixel columns all in one 256 byte page of RAM.

So, total overhead: 10 or 11 bytes for each pixel column; in practical terms you'd have 512 bytes set aside for each variable even though there's only 304 columns. The remaining 192 bytes can remain empty or be used for other purposes such as the linear interpolation tables. So that's 20-22 pages of RAM, 5 to 5.5 kb, depending on whether or not you keep groups 5 and 6 separate. You could use slightly less memory by cramming it together more, at the expense of speed.

Ed Minchau · Post by **Ed Minchau** » Thu Mar 25, 2021 5:23 am

It just occurred to me that you wouldn't need a fraction lookup table at all to interpolate 255 pixel columns; the fraction to multiply by would be the column number itself. Also, instead of raycasting column 303 it might work better to cast the imaginary column 304 to make the binary split work better.

ZeroByte · Post by **ZeroByte** » Thu Mar 25, 2021 4:26 pm

This algorithm sounds pretty good. I've been doing a little thinking about it myself, and perhaps a recursive algorithm is in order. Essentially you call Render(MinX, MaxX, MyRay) and it works as follows:

Interpolate the leftmost column (TestCol) of the face the ray hit, clamp TestCol to MinX and cast a ray (testRay) at TestCol
- If TestCol hits the same face,
  - render from that column to MyRayX.
  - new range is MinX .. TestCol
  - Cast a new TestRay at the center of the new range
  - call Render with the new range and testRay
- Else call Render with MinX=MinX, MaxX = MyRay.X, TestRay
  - ~~Render returns the rightmost column it rendered - set MinX = returned column+1~~
  - ~~(it is assumed that~~ everything from old MinX to MyRay.X has now been completely rendered.

Do the same thing going rightwards from MyRay.X
- i.e. if the right end of the face is visible, render from MyRay to that column, and recursive call to render on the remaining rightmost space, else render the rightmost space, then render from MyRay.X to the leftmost column not rendered.

I think the hidden challenge in this method is going to be in providing stack-like behavior for the MinX/MaxX, MyRay values so they don't get lost during the call to yourself. Another optimization would probably be to choose some minimum number of columns where it's faster to just brute force that left to right, and just do it if MinX/MinY are smaller than this threshold.

This algorithm should be very fast when walls are close to the camera, and might be slower when there's a lot of clutter in the wall topology.

desertfish · Post by **desertfish** » Thu Mar 25, 2021 5:27 pm

won't this break when you add non-wall elements such as monsters, projectiles

Ed Minchau · Post by **Ed Minchau** » Thu Mar 25, 2021 5:41 pm

8 minutes ago, desertfish said:

won't this break when you add non-wall elements such as monsters, projectiles

In the original game the bad guys/ projectiles were all sprites. That's going to take some trickery to make it work here because we can't resize sprites.

SlithyMatt · Post by **SlithyMatt** » Thu Mar 25, 2021 5:58 pm

14 minutes ago, Ed Minchau said:

In the original game the bad guys/ projectiles were all sprites. That's going to take some trickery to make it work here because we can't resize sprites.

No, but the sprites did not have a ton of animation to them, so you can easily store sprites at different sizes and switch between them when they approach. Individual sprites can be up to 64x64 and you can easily use multiple sprites for the same figure if they need to be bigger.

ZeroByte · Post by **ZeroByte** » Thu Mar 25, 2021 10:55 pm

5 hours ago, desertfish said:

won't this break when you add non-wall elements such as monsters, projectiles

Z-buffer each pixel column, then draw the sprites on the other BG layer (assuming there's enough VRAM)?

ZeroByte · Post by **ZeroByte** » Thu Mar 25, 2021 10:56 pm

4 hours ago, SlithyMatt said:

No, but the sprites did not have a ton of animation to them

I read through that part of the source code last weekend - it's interesting how the actors are done - the animations are part of the state machine logic - most states only have 2 or 3 frames of animation.

Jeffrey · Post by **Jeffrey** » Fri Mar 26, 2021 3:03 am

FYI: the next couple of days I will be very busy IRL ?

Ed Minchau · Post by **Ed Minchau** » Fri Mar 26, 2021 7:14 am

14 hours ago, ZeroByte said:

This algorithm sounds pretty good. I've been doing a little thinking about it myself, and perhaps a recursive algorithm is in order. Essentially you call Render(MinX, MaxX, MyRay) and it works as follows:

Interpolate the leftmost column (TestCol) of the face the ray hit, clamp TestCol to MinX and cast a ray (testRay) at TestCol

If TestCol hits the same face,

render from that column to MyRayX.

new range is MinX .. TestCol

Cast a new TestRay at the center of the new range

call Render with the new range and testRay

Else call Render with MinX=MinX, MaxX = MyRay.X, TestRay

~~Render returns the rightmost column it rendered - set MinX = returned column+1~~

~~(it is assumed that~~ everything from old MinX to MyRay.X has now been completely rendered.

Do the same thing going rightwards from MyRay.X

i.e. if the right end of the face is visible, render from MyRay to that column, and recursive call to render on the remaining rightmost space, else render the rightmost space, then render from MyRay.X to the leftmost column not rendered.

I think the hidden challenge in this method is going to be in providing stack-like behavior for the MinX/MaxX, MyRay values so they don't get lost during the call to yourself. Another optimization would probably be to choose some minimum number of columns where it's faster to just brute force that left to right, and just do it if MinX/MinY are smaller than this threshold.

This algorithm should be very fast when walls are close to the camera, and might be slower when there's a lot of clutter in the wall topology.

I think I laid out something very similar, except instead of calculating the center of the new range every time, I'd have a precalculated list of which pixel columns to do in which order. You wouldn't be calculating the leftmost column of the face, you'd be interpolating rays between the rays you'd already cast. The column height or distance (however you want to store it, same thing really) is just interpolated between two rays that happen to be on the same face. Likewise, choosing which column of the image that corresponds to (or the xintercept/yintercept, same thing really) is an interpolation between the intercepts/columns. You wouldn't need recursion, just a way of seeing whether the next one in the list has already been mooted by interpolation, and the high byte of the distance variable indicates that.

Basically, if you have two rays that hit the same wall, then all the rays between those two are calculated with one multiplication and one addition for the image column choice, and then one multiplication and one addition for the distance. It's pretty much the same number of cycles as you need for just the final step of a raycast; converting the distance into a distance relative to the direction the player faces uses two multiplications, lookups to sin and cosine tables, and an addition. It would indeed be fast, if you were very close to a wall you might only cast five or six rays and just interpolate all the rest. But even if there's lots of walls in view, you'd still probably only have to cast 30 or 40 rays total before the entire screen was filled in.

The fastest you can push the data to VERA for the whole image is about 260000 cycles, so 30fps wouldn't leave any time to calculate anything else. 20 fps is 400000 cycles though, and I think you'll be able to hit that easily with this method, with time left over to calculate bad guys and pew pews and sound effects etc.