The process of reading a particular file in a FAT file system:
1. Find the master boot record to identify the partition table type. Is it truly an MBR, or is it really a GPT? These identify partitions in different ways, so a robust system has to handle both, though I suspect we only care about the "true MBR" case and not GPT.
2. Is there more than one partition of a compatible type on the media? Pick one.
3. Get the boot sector of the partition. Use the values in it to find the FAT, root directory, and cluster size.
4. How many clusters does the volume hold? The cluster count, computed from the boot sector values, is the "official" way to identify whether it is FAT12/FAT16/FAT32.
5. Use a directory to find the file of interest, which will tell us the size of the file and the location of its first cluster.
6. Read the cluster.
7. Use the FAT to find the next cluster. If not end of file, go to 6.
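To give a feel for the first part of that list (reading the MBR and picking a partition), here is a rough Python sketch. It assumes a classic 512-byte sector 0 held in memory; the function name `parse_mbr` and the table of "compatible" type bytes are my own choices for illustration, not anything a particular system mandates.

```python
import struct

SECTOR_SIZE = 512
# Partition type bytes commonly used for FAT volumes (a hint only; the real
# FAT variant is decided later from the boot sector).
FAT_TYPE_BYTES = {0x01, 0x04, 0x06, 0x0E, 0x0B, 0x0C}
GPT_PROTECTIVE = 0xEE  # a "protective MBR" entry means the disk is really GPT


def parse_mbr(sector0: bytes):
    """Return (is_gpt, [(slot, type_byte, first_lba, sector_count), ...])."""
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != b"\x55\xAA":
        raise ValueError("missing 0x55 0xAA boot signature")
    is_gpt = False
    candidates = []
    for slot in range(4):
        raw = sector0[446 + 16 * slot: 446 + 16 * (slot + 1)]
        # status, CHS start, type, CHS end, first LBA, sector count
        _status, _chs0, ptype, _chs1, first_lba, count = struct.unpack(
            "<B3sB3sII", raw)
        if ptype == GPT_PROTECTIVE:
            is_gpt = True
        elif ptype in FAT_TYPE_BYTES and count > 0:
            candidates.append((slot, ptype, first_lba, count))
    return is_gpt, candidates
```

If `is_gpt` comes back true, none of the four slots describe the real partitions and a GPT parser is needed instead. If `candidates` holds more than one entry, something still has to decide which one to use.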
Most of these steps depend on values that you do not know how to interpret until after you have read them from the media, and this is just the high-level overview.
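As an illustration of how much depends on values read from the media, here is a minimal Python sketch of the boot sector and FAT steps. The field offsets and the cluster-count thresholds follow Microsoft's FAT specification as I understand it; the names `Layout`, `parse_boot_sector`, `first_sector_of_cluster`, and `cluster_chain` are made up for this example, and FAT12's packed 12-bit entries are deliberately left out.

```python
import struct
from typing import NamedTuple, Optional


class Layout(NamedTuple):
    fat_type: str
    fat_start: int                  # sectors, relative to partition start
    root_dir_start: Optional[int]   # fixed root directory (FAT12/16 only)
    root_cluster: Optional[int]     # root directory cluster (FAT32 only)
    data_start: int                 # first sector of cluster 2
    sectors_per_cluster: int
    cluster_bytes: int


def parse_boot_sector(bs: bytes) -> Layout:
    bps, spc, rsvd, nfats, root_ents, tot16 = struct.unpack_from("<HBHBHH", bs, 11)
    fatsz16 = struct.unpack_from("<H", bs, 22)[0]
    tot32, fatsz32 = struct.unpack_from("<II", bs, 32)
    if bps == 0 or spc == 0:
        raise ValueError("implausible boot sector")
    fat_size = fatsz16 or fatsz32      # the 16-bit field is 0 on FAT32
    total = tot16 or tot32
    root_dir_sectors = (root_ents * 32 + bps - 1) // bps
    data_start = rsvd + nfats * fat_size + root_dir_sectors
    clusters = (total - data_start) // spc
    # The FAT variant is decided by the cluster count alone.
    if clusters < 4085:
        fat_type = "FAT12"
    elif clusters < 65525:
        fat_type = "FAT16"
    else:
        fat_type = "FAT32"
    is32 = fat_type == "FAT32"
    return Layout(
        fat_type=fat_type,
        fat_start=rsvd,
        root_dir_start=None if is32 else rsvd + nfats * fat_size,
        root_cluster=struct.unpack_from("<I", bs, 44)[0] if is32 else None,
        data_start=data_start,
        sectors_per_cluster=spc,
        cluster_bytes=bps * spc,
    )


def first_sector_of_cluster(lay: Layout, cluster: int) -> int:
    # Cluster numbering starts at 2.
    return lay.data_start + (cluster - 2) * lay.sectors_per_cluster


def cluster_chain(fat: bytes, fat_type: str, first_cluster: int):
    """Yield the cluster numbers of one file (FAT16 or FAT32 only)."""
    if fat_type == "FAT16":
        width, stop, mask = 2, 0xFFF7, 0xFFFF
    elif fat_type == "FAT32":
        width, stop, mask = 4, 0x0FFFFFF7, 0x0FFFFFFF  # top 4 bits reserved
    else:
        raise ValueError("FAT12 entries are not handled in this sketch")
    cluster = first_cluster
    # The step limit guards against a corrupted FAT that loops forever.
    for _ in range(len(fat) // width):
        # Stop on free/reserved values, bad-cluster marks, or end of chain.
        if not 2 <= cluster < stop:
            return
        yield cluster
        entry = fat[cluster * width:(cluster + 1) * width]
        cluster = int.from_bytes(entry, "little") & mask
```

Even this stripped-down version branches on the FAT type in three places, and it has not yet touched directories, long file names, or read errors.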
Can an FPGA be tasked with handling this? Sure. It doesn't require a full-blown general-purpose CPU. But it is certainly far more than just one or even a few logic gates.
Could this be implemented so that the CPU sends the commands and tells the FPGA in advance where the next X bytes should end up, so that the CPU does not have to handle the final delivery when VRAM is the destination? Yes, and that would be less complicated than the co-processor (where a co-processor is less than a full CPU but more than a few logic gates). I suspect there is not enough space left in the FPGA even for that, though I do not know for certain. It would also have difficulty dealing with error conditions.
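To make that division of labor concrete, here is a toy Python model of it. Everything in it is hypothetical: `StreamEngine`, `arm`, and `on_storage_data` are names I invented, and a real FPGA design would look nothing like Python. The point is only that the CPU still walks the file system while the hardware does nothing more than copy the next N bytes to a destination it was given in advance.

```python
class StreamEngine:
    """Toy model of 'deliver the next N bytes to address D' hardware."""

    def __init__(self, vram: bytearray):
        self.vram = vram
        self.dest = 0
        self.remaining = 0

    def arm(self, dest: int, count: int) -> None:
        # The CPU programs the destination before issuing the read command.
        if dest < 0 or count < 0 or dest + count > len(self.vram):
            raise ValueError("destination range is outside VRAM")
        self.dest = dest
        self.remaining = count

    def on_storage_data(self, data: bytes) -> bytes:
        """Copy arriving bytes to VRAM; return whatever was not consumed."""
        n = min(self.remaining, len(data))
        self.vram[self.dest:self.dest + n] = data[:n]
        self.dest += n
        self.remaining -= n
        return data[n:]
```

In this model the CPU would call `arm` once per cluster (or per run of contiguous clusters) after consulting the FAT, so all the file system logic stays in software. Note that the model has no way to report a failed or short read, which is the error-handling gap mentioned above.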
Then of course we have the question "what if the bytes in the file must be processed in some way before spewing them into VRAM?" Most file formats include, at minimum, a header of some sort to identify the contents of the file. Compression is often used to make files smaller. Some formats define a program for a virtual machine. For anything more complex than "a raw sequence of bytes already in the format you want for VRAM", you'd have to have the CPU read the data so that some processing can be done: decompression for PNG files, running a VM over TrueType font bytecode, processing a header to know where to seek so that substreams of data can be handled correctly, and so on.
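As a small example of "process the header first", here is a Python sketch that checks the PNG signature and reads the IHDR chunk. The function name `read_png_header` is mine; the byte layout is the standard PNG one.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes):
    """Return (width, height, bit_depth, color_type, interlace)."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Each chunk starts with a big-endian length and a 4-byte type.
    length, chunk_type = struct.unpack_from(">I4s", data, 8)
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height, bit_depth, color_type, _comp, _filt, interlace = (
        struct.unpack_from(">IIBBBBB", data, 16))
    return width, height, bit_depth, color_type, interlace
```

Only after this does the reader know how many bytes per pixel to expect, and the pixel data that follows still has to be inflated and unfiltered before it resembles anything you could write to VRAM.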
All these things require logic. You are correct about what is theoretically possible, though I think your estimates of how many resources would be required are on the low side, given the flexibility built into the FAT32 format. One can mandate that many of the variables be set to specific constant values to limit the complexity for particular use cases, but in my experience you can never get rid of it completely.
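Here is roughly what "mandating constant values" could look like, as a Python sketch. The profile itself (512-byte sectors, two FATs, FAT32-style fields) is an arbitrary example I picked, and `profile_violations` is a made-up name.

```python
import struct
from typing import List

# Hypothetical fixed profile; a real project would choose its own values.
PROFILE_BYTES_PER_SECTOR = 512
PROFILE_NUM_FATS = 2


def profile_violations(bs: bytes) -> List[str]:
    """Return the reasons a boot sector falls outside the fixed profile."""
    bps, spc, _rsvd, nfats, root_ents, _tot16 = struct.unpack_from(
        "<HBHBHH", bs, 11)
    fatsz16 = struct.unpack_from("<H", bs, 22)[0]
    problems = []
    if bps != PROFILE_BYTES_PER_SECTOR:
        problems.append("bytes per sector is %d" % bps)
    if nfats != PROFILE_NUM_FATS:
        problems.append("number of FATs is %d" % nfats)
    if root_ents != 0 or fatsz16 != 0:
        problems.append("boot sector is not FAT32-style")
    if spc == 0 or spc & (spc - 1):
        problems.append("sectors per cluster is not a power of two")
    return problems
```

A check like this removes several branches from the reader, but the cluster chain of each file, the size of each directory, and the position of the partition still vary from card to card, so some logic always remains.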
General-purpose systems provide flexibility, though they cannot offer an optimal solution to every problem.