I think a load format for banked data is very useful and it should be in the ROM.
That way we can just load a file and have it placed in memory without having to re-invent the wheel by providing custom loaders. Does it prevent anyone from rolling their own? Not at all, you are free to load a plain binary and load data and program chunks from the file system as you see fit. I on my side prefer to have a binary file created by a tool and not need to bother how it gets loaded into memory, even when I need to use the bank system.
I like the ELF format myself as it rather simple to use when the requirements are simple, yet it provides a lot of flexibility and can be quite expressive when needed. Most of the complexity comes when adding debug sections or linkable sections, but that is not what we are doing here. I will describe it a bit below to give an idea what it is and show that it actually have a lot of similarities to the proposed format.
An ELF file has an ELF header that starts with a magic number (actually the string ELF) that identifies it as ELF. You only need to sanity check that it is 32-bit ELF, inspect the processor type and values related to program headers that describe where the program headers are in the file, their size and how many they are.
Each program header describes a chunk of memory to be loaded. The headers appear after each other. A header can describe an area to be cleared, or it can point to actual raw data to be loaded and it can specify a start address.
Thus, in a file you have:
-----------------
ELF Header
---------------
Program header 1
----------------
Program header 2
----------------
...
----------------
Program header n
----------------
Load data
----------------
The file can also contain sections, mainly for debug or vendor specific stuff, but you totally ignore them. Even if sections are in the file, you will automatically ignore them when you use the file offsets to locate the program headers and data.
For a banked system you could use a program header for each bank. The address is 32-bit so you can easily combine a bank number with the lower address.
If you compare this to the proposed format you will see that the structure is very similar. The ELF header is 52 bytes (in 32-bit ELF) compared to 64 in the proposed format. Each program header is 32 bytes in ELF and 8 in the proposed format.
The main difference is that in ELF the program headers are located after each other and contains a field that gives the file offset of the actual data.
A loader can interpret relevant fields and do some simple sanity checking, it does not need to be any more complicated. Here is a rough load algorithm:
Open file, read ELF header and check the magic number, sanity check some fields in the ELF header to see that it is 32-bit ELF, for 6502 and that there is at least one program header. Store number of program headers, the size and keep a pointer to the current one, initialize it with the offset of the first program header as given in the ELF header.
Seek to the next program header, load it and inspect type, read start address (if program type), size, offset in file to actual data.
Load raw data from file, if the specified area is larger than what is in file, fill the rest with zero.
Step to next program header, decrement header counter, if more program headers then go to 2.
Jump to the start address of the program.
The main advantage of using an existing format like ELF is that it makes it possible to use other tools for inspecting and altering the executable file. It also avoids re-inventing the wheel again as someone already gave this some thought. We only need to specify how we use the format for the CX16.