Brute force Double PETSCII search with Python

Chat about anything CX16 related that doesn't fit elsewhere
User avatar
kliepatsch
Posts: 247
Joined: Thu Oct 08, 2020 9:54 pm

Brute force Double PETSCII search with Python

Post by kliepatsch »


Hi,

following up on the brief discussion in this thread


I wanted to see how far I can get with image to double PETSCII conversion. As a first step I wrote a converter to single PETSCII in Python. It works very well and takes about 6 seconds on my machine to find a very good solution. It can pick a custom color palette that works well for the input image.

As the second step, I wanted to write a brute force double PETSCII search and see how far I can optimize it. My brute force method tries all possible combinations of foreground and background characters and picks the closest colors from the palette for each of them, respectively,  and picks the best combinations. Albeit being very inefficient, this is the gold standard to compare other double PETSCII conversion algorithms against.

A naive single-threaded implementation takes about 25 minutes to convert an image. I am afraid that none of the parallelization techniques I tried worked as well as I hoped. Even with 8 processes on my machine with 8 CPUs it still takes more than 10 minutes to complete. It seems that on my laptop, the search algorithm is severely memory bandwidth limited. (I tried the python modules "threading", "multiprocessing" and also "ray", and I found no significant difference in performance between them).

Anyway, find the script plus supporting files in the attachment. To run it, you need Python3 with the following packages installed: numpy, pillow, scipy, matplotlib


brute_force_double_petscii.zip
User avatar
kliepatsch
Posts: 247
Joined: Thu Oct 08, 2020 9:54 pm

Brute force Double PETSCII search with Python

Post by kliepatsch »


The foreground layer can use any of the 256 colors, while the background layer can only use the first 16 colors, but both foreground and background must be set. My script determines a palette and finds the best solutions using that palette. Improvements could be made finding a better/different palette.

See a few examples below:


double_petscii_brute_force_4.png
im03.jpg
double_petscii_brute_force_5.png
im02.jpg
User avatar
desertfish
Posts: 1098
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands

Brute force Double PETSCII search with Python

Post by desertfish »


Hey, great stuff!

Yeah multiprocessing can be a bit iffy, it could be that much time is spent passing the data set across different processes to work on it, and passing the results back.

To avoid this, multi threading could help, but then you're sometimes constrained by Python's Global interpreter lock.

I want to take a look at your code and see if I can tweak it a bit to make it faster!   (for reference it runs in 5 minutes 30 seconds on my Ryzen 2700x)

 

User avatar
kliepatsch
Posts: 247
Joined: Thu Oct 08, 2020 9:54 pm

Brute force Double PETSCII search with Python

Post by kliepatsch »


What times can you get if you increase the "N_processes" variable? Or did you already do that to achieve that time?

User avatar
desertfish
Posts: 1098
Joined: Tue Aug 25, 2020 8:27 pm
Location: Netherlands

Brute force Double PETSCII search with Python

Post by desertfish »


Yeah it didn't matter much. I suspect it is taking a lot of overhead somewhere to pickle/unpickle the data passed between the processes. And indeed, switching to multiprocessing.pool.ThreadPool didn't improve things.

I see no obvious big improvements to the code as-is.

But I think it would be worthwhile to investigate what happens if you split the problem up differently. Rather than dividing it over the tokens to search for, I suggest trying an approach where you split up the source image in strips of 8 pixels high and process each strip in a worker process. So one worker processes one line of characters.

I realize this will require quite a bit of rewriting...

User avatar
kliepatsch
Posts: 247
Joined: Thu Oct 08, 2020 9:54 pm

Brute force Double PETSCII search with Python

Post by kliepatsch »


I doubt the limiting factor is data transfer between the processes, as the parallelization is almost trivial. And in fact, I think that each process holds a copy of all the relevant variables (I think they are even initialized in every process, e.g. the image is loaded, the palette determined etc... but I am not totally sure about that) -- so there is no communication necessary, except for the final results. I have tried parallelization with the "ray" library, which claims to solve issues with large numeric datasets and accessing identical data from multiple processes, but it didn't help.

I have thought the same about splitting the problem up differently. I.e. doing more in-place computation and less memory-intensive tasks. One obvious way to achieve this would be to compute each 8x8 pixel tile individually. That way, all necessary variables could fit into CPU cache, so memory accesses should be a lot faster. With that in mind, I ran a few tests on a very simple problem earlier, but couldn't find any noticeable difference between different ways of splitting it up. I guess the present version is roughly as efficient as I can make it. Improvements will be about using better algorithms.

Well, if there *is* an example where you can actually see different performance for different ways of splitting it up, I will be happy to see it!

CursorKeys
Posts: 82
Joined: Tue Jan 12, 2021 9:52 pm

Brute force Double PETSCII search with Python

Post by CursorKeys »


Wow really cool. Did I understand you try all combination of pixels to all combinations of characters, or do you do this per 8x8 square?   The 8x8 square method is what I do, it's not as sophisticated as yours. Your 2petscii images look really close to the original.  I needed to click on it and check it twice, to see it was petscii, and not a bitmap.  ?.    I did not do 256 color ones yet,  but in your results I see the improvements, they are significant.



Really fun seeing.  I'll test your program when I have some time, I like to see how it runs on my laptop. 



To optimise, for me a really fun experiment would be to go all out, and do like this.

Use the GFX card to do the "mass parallel threading", like with Cuda or so.

This is how they speed up deep learning for example.  Maybe some day ?



Great work so far ?






 

Ed Minchau
Posts: 503
Joined: Sat Jul 11, 2020 3:30 pm

Brute force Double PETSCII search with Python

Post by Ed Minchau »


Are you starting with pictures that are bigger than 640x480 resolution?  The ones I see above are 1000x750.  

This is great.  If we can do this with the PETSCII font, imagine what we can do with a custom font.  PETSCII wasn't originally designed for flipping vertically and horizontally (i.e. we don't need four tiles to be short curves, only one), and the default font on this machine has been modified for easier display on CRT terminals like TV screens; there are very few tiles with a ...010... anywhere in the current font.  A custom font (at least for layer 1) could improve these images significantly.  I think I could re-do the video demos with this technique and cut the video image data by 75% while doubling the resolution; I might be able to get 24fps for a 320x136 video with this technique.

User avatar
kliepatsch
Posts: 247
Joined: Thu Oct 08, 2020 9:54 pm

Brute force Double PETSCII search with Python

Post by kliepatsch »


Thanks. Yes, the input image is scaled down (and cropped if necessary) to 640x480 pixels. It's then subdivided into 8x8 pixel areas. Then the best match for each 8x8 tile is sought. My implementation does all the 8x8 areas at once, making heavy use of numpy expressions. Numpy is one of, if not the most important Python library, offering highly optimized numerical routines and great flexibility.

Yes, I also thought about using the GPU, but haven't tried yet, but could provide significant speedup. The operations that do the heavy lifting are just arithmetic operations and not many comparisons and branching, so it should be possible to accelerate the method with a GPU, I think.

I think it would be desirable to find an algorithm which finds decent solutions with a fraction of computational effort. My idea is to have a two-stage algorithm which tries all the foreground bitmasks and selects a couple of them, say, 10, according to some clever criteria, and then systematically tries all background combinations for each of the 10 foreground candidates. This would already reduce the computational effort by a factor of 20 to 25 over the dumb brute-force method.

CursorKeys
Posts: 82
Joined: Tue Jan 12, 2021 9:52 pm

Brute force Double PETSCII search with Python

Post by CursorKeys »


Not sure if it helps. But one other speed optimization, that I've been thinking of is slashing the resolution in the computation phase.



So, lets say, we do a 640x480 picture, and divide it in 8x8 pixels "blocks".  But instead we scratch that, half the resolution to  320x200, and compare with "pretend" 4x4 pixels blocks. (scale down the font)

Then when done, we still have the same (ish) result, but the calculation time is slashed to 1/4th or so.



I think it may work, the reason is the actual pixel perfect compare is maybe not needed since the end result will never be a perfect match. (unless for some special cases).



Anyway, interesting little project, I keep following it, to see where it leads ?

Post Reply