BASIC: Converting/Optimizing a simple BASIC program from another Commodore platform to the X16...

voltage · Post by **voltage** » Wed Aug 25, 2021 7:34 am

That's a huge speed boost. Are you using fixed point math in place of the default rom routines? That's what I would have attempted.

Scott Robison · Post by **Scott Robison** » Wed Aug 25, 2021 2:47 pm

8 hours ago, Snickers11001001 said:

So who wants to guess what I did tonight in order to accomplish this with the X16 conversion?!!

Replaced TI$ with "000347"? ?

ZeroByte · Post by **ZeroByte** » Wed Aug 25, 2021 2:48 pm

I'd guess that maybe you switched from using simple variables to using an array, with that array being the first variable initialized....

Scott Robison · Post by **Scott Robison** » Wed Aug 25, 2021 2:50 pm

Just now, ZeroByte said:

I'd guess that maybe you switched from using simple variables to using an array, with that array being the first variable initialized....

I've not benchmarked that, but it feels to me like that would only slow things down if still using it in BASIC computations. Math still has to be done to compute array index values on top of just getting at the data in the individual values.

ZeroByte · Post by **ZeroByte** » Wed Aug 25, 2021 2:58 pm

It all depends on how much of a penalty it is for the parser to interpret the array references vs. a potentially faster (I have no idea - just a guess) means of accessing the memory once it's known that an array is being used.

An array can be referenced with a memory offset so you can jump straight to the right spot in memory whereas the simple variables are located via a linear search, which is going to incur time loss in the various logic statements (all of the "is this it? Nope." stuff. Granted, using Y as a guaranteed one-shot hit for the most common operations is probably even better than using an array, as the array index decode is going to incur multiplication.

I'm not a BASIC guru by any stretch, so I really don't know what sort of optimization was available to nearly double the performance aside from things like embedding an assembly routine in DATA statements and POKEing those into memory at the get-go. But that really isn't in the spirit of this exercise, so I'm quite sure that's not it.

The only other idea I have is that there was a way to simplify the equations with faster math - kind of like the famous fast 1/sqrt() function in Quake3 Arena... except for BASIC. Honestly, this is more likely to be what he's done.

ZeroByte · Post by **ZeroByte** » Wed Aug 25, 2021 3:02 pm

13 minutes ago, Scott Robison said:

Replaced TI$ with "000347"?

Or added a bunch of DATA statements and replaced the main loop with READ X : READ Y : PSET X,Y,.

Snickers11001001 · Post by **Snickers11001001** » Wed Aug 25, 2021 3:42 pm

On 8/25/2021 at 1:34 AM, voltage said:

That's a huge speed boost. Are you using fixed point math in place of the default rom routines? That's what I would have attempted.

I've actually thought of something along those lines but with sine/cosine table in ML with a little USR() routine that coughs up a lookup. But that's not quite on the menu (yet!).

On 8/25/2021 at 8:47 AM, Scott Robison said:

Replaced TI$ with "000347"? ?

LOL!

On 8/25/2021 at 8:48 AM, ZeroByte said:

I'd guess that maybe you switched from using simple variables to using an array, with that array being the first variable initialized....

I've tried this, and in game type programs especially it actually helps.   Even the first array is slower than about the first 10 or 12 initialized scalar variables because of the interpreter character fetch (parsing) overhead. If you have a variable array "V(n)" and you refer to it in the program with V(1) its got a bunch of extra fetches and a numeric to FAC conversion to do, plus some added overhead because even though BASIC finds your first array at ARYTAB it also runs some extra tests to compare the index you supplied and the dimension of the variable as dimensioned. Also, once there are more than 10 values 'V(0) to V(9)' there's a linear increase in time as you have additional digits of the array index value to parse, i.e., 'V(5)' is faster to parse than 'V(27)'. Alternatively, if you use a regular simple/scaler variable to tell BASIC which element you want e.g., N=A(V), you incur not an extra cost for the 'fetch' of the variable that contains the element number. And its maddening to program with! But it does help create a rather more consistent variable fetch time than just using 24 or 30 scalar variables.

But that's not what I did here.

1.  I did not use any assembly/machine code routines.   There are no SYS calls, no USR() calls, no pokes and peeks.

2.  I did not add data statements.

3.  The program does not load anything from the file system.   No external data is loaded from any source.

4.   The previously fastest program version (the one that clocked in at 6 minutes, 52 seconds above) was 709 bytes long.  This version is 878 bytes long. 42 of the new bytes are a print statement and string; and 23 more bytes are fluff to display a 'progress bar'...

5.   Its not using turbo mode or any modification of the emulator, ROMs, host operating system or anything like that.

I'll write it up tonight when I get back from some medical stuff today, but it comes down to looking hard at the precursor calculation that gets fed into that COSINES stack and which is based on 'Pythagorean Addition' of our outer and inner loop indexing variable values.

Bottom line the program spends about 1 minute and 39 seconds preparing a lookup table as a 2 dimension array, and the rest of the time plotting to the screen.

More this evening. Cheers!

Scott Robison · Post by **Scott Robison** » Wed Aug 25, 2021 3:45 pm

35 minutes ago, ZeroByte said:

It all depends on how much of a penalty it is for the parser to interpret the array references vs. a potentially faster (I have no idea - just a guess) means of accessing the memory once it's known that an array is being used.

An array can be referenced with a memory offset so you can jump straight to the right spot in memory whereas the simple variables are located via a linear search, which is going to incur time loss in the various logic statements (all of the "is this it? Nope." stuff. Granted, using Y as a guaranteed one-shot hit for the most common operations is probably even better than using an array, as the array index decode is going to incur multiplication.

I'm not a BASIC guru by any stretch, so I really don't know what sort of optimization was available to nearly double the performance aside from things like embedding an assembly routine in DATA statements and POKEing those into memory at the get-go. But that really isn't in the spirit of this exercise, so I'm quite sure that's not it.

The only other idea I have is that there was a way to simplify the equations with faster math - kind of like the famous fast 1/sqrt() function in Quake3 Arena... except for BASIC. Honestly, this is more likely to be what he's done.

In Microsoft BASIC for 6502, as used in Commodore legacy machines, a variable reference is stored as it's literal name, so an array of integers named IA% is stored as the three characters IA%.

To access a variable at an index, the reference is stored in its fully expanded form: IA%(index) (where index can be a literal or another variable).

So to access a scalar variable, you have to read the name and look for it in the variable table.

To access an element of an array, you have to read the name and look for it in the variable table. Then you need to get the size of each dimension of the array. Then you have to read the index expressions from the variable use. Then you have to do math to multiply and add each index (if a multi dimensional array) to find the final address relative to the start of the array in memory.

If your index is a constant, BASIC has to interpret it at runtime (convert the string "10" to the number 10, for example). Or you can use a variable, which means looking up another variable in the vartab.

In any case: the multiple levels of indirection will make a subscripted array access slower than a single scalar variable. When you really need an array, that's great, but scalar is faster when you don't need an array. Below screen show shows 10 seconds accessing array indexes in a loop and 3 seconds for scalars. I could replace 42 & 47 with variables to make them faster, but then they'd still require lookup of an extra variable each.

Scott Robison · Post by **Scott Robison** » Wed Aug 25, 2021 3:47 pm

Note: Replacing PRINT TI$ with PRINT TI came out to 641 jiffies for array vs 209 for scalars.

Scott Robison · Post by **Scott Robison** » Wed Aug 25, 2021 11:28 pm

Here is a better benchmark (I think). First the program:

And the results:

The first number is just how much time it takes to do a for next loop 1000 times without anything in the body. Used to subtract out "overhead"

The second number is how long it takes to interpret two constants, use them to lookup locations by index in an array, add the two locations, and assign to x (1000 times).

The third number is how long it takes to lookup locations by variable index in an array, add the two locations, and assign to x (1000 times).

The fourth is how long it takes to interpret two constants, add them together, and assign to x (1000 times).

The fifth is how long it takes to add two variables and assign to x (1000 times).

Working backward, we know it takes 156 jiffies to lookup three variables (b, c, and x), read two of them, add them, and assign one. Just for simplicity, let's call that 7 operations per iteration, 7000 operations total. That gives us an average of 371 microseconds per operation.

The previous step (adding constants) took 170 jiffies longer to do 2000 conversions from digits to floating point which it did in lieu of 2000 variable lookups. This will vary based on the size of the number, but we'll just call it 1416 extra microseconds to convert vs lookup.

Before that it took 248 jiffies longer to do math with array lookup vs simple variables. Two extra array index lookups per loop iteration equates to 2066 microseconds to do an array lookup. This makes sense on 6502 because each array lookup involves a multiplication plus an extra variable lookup.

The worst case is the first one, which takes 168 jiffies longer to do 2000 lookups with constant index values.

The first case takes almost the same amount of extra time over the second as the third does over the fourth, so converting constants to floating point is consistent.