In the spec for form-based file upload over HTTP (RFC 1867) it says:
3.3 use of multipart/form-data
The definition of multipart/form-data is included in section 7. A
boundary is selected that does not occur in any of the data. (This
selection is sometimes done probabilisticly.)
Probabilisticly? Who wrote this shit? If you choose a boundary probabilistically then, given enough usage, you will almost certainly at some point pick a boundary which occurs naturally in the data. (The obvious alternative, choosing a boundary deterministically after scanning through all of the data, is not too appealing either.) This sort of thing just shouldn’t be allowed to happen, even if the probability is low, because it can easily be prevented. There are two perfectly suited techniques for avoiding the problem:
- Mandate a Content-Length header in each part (which obviates the need for a boundary tag anyway) OR
- Use a content transfer encoding (escaping) so that the boundary cannot possibly occur in the encoded content
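To make the first option concrete, here is a minimal sketch of length-prefixed parts. The framing is made up for illustration (a bare Content-Length line in front of each part, with hypothetical helpers `write_parts` and `read_parts`); it is not the real multipart/form-data wire format. The point is only that a reader which trusts the length never has to scan the body for anything.

```python
def write_parts(parts: list[bytes]) -> bytes:
    """Serialize parts, each preceded by a Content-Length header (hypothetical framing)."""
    out = bytearray()
    for body in parts:
        out += b"Content-Length: %d\r\n\r\n" % len(body)
        out += body
    return bytes(out)


def read_parts(stream: bytes) -> list[bytes]:
    """Read parts back by length alone; the body bytes are never searched."""
    parts = []
    pos = 0
    while pos < len(stream):
        # The header ends at the first blank line after the current position.
        header_end = stream.index(b"\r\n\r\n", pos)
        _, _, value = stream[pos:header_end].partition(b":")
        length = int(value.decode("ascii").strip())
        body_start = header_end + 4
        parts.append(stream[body_start:body_start + length])
        pos = body_start + length
    return parts


# Bodies that would confuse a boundary scanner, or even look like a header.
original = [
    b"plain text",
    b"data with \r\n--AaB03x\r\n inside",
    b"Content-Length: 5\r\n\r\nfake",
]
round_tripped = read_parts(write_parts(original))
```

Whatever bytes are in the body, including something that looks exactly like a boundary line, the parts come back unchanged.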
Neither of these techniques would be particularly difficult to implement, or costly in terms of processing time or bandwidth (considering that Content-Length for the entire body will generally need to be calculated anyway). The first one seems to be allowed by current standards, but it is not recommended anywhere and certainly not mandated. Arguably it doesn’t really solve the problem unless the standards are updated to say that it does, since as things stand the boundary tag should arguably be recognized wherever it occurs, even partway through a body part according to that part’s known content length. The second one has the problem that the only defined encoding which would be suitable is base64, and that incurs a bandwidth overhead of about 33%.
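Both halves of the base64 claim are easy to check. The sketch below uses Python's standard `base64` module and an example boundary value (`AaB03x`, chosen only for illustration). A delimiter line always starts with `--`, and `-` is not in the standard base64 alphabet, so the delimiter cannot appear in the encoded part. The price is 4 output bytes for every 3 input bytes, and slightly more once the encoded text is wrapped into lines as MIME requires.

```python
import base64

boundary = b"AaB03x"            # example boundary, an assumption for this sketch
delimiter = b"--" + boundary    # the byte sequence a parser scans for

# A payload that happens to contain the delimiter verbatim.
payload = b"some data\r\n--AaB03x\r\nmore data"
raw_collides = delimiter in payload

# Standard base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
# so no '-' (and hence no delimiter) can occur in it.
encoded = base64.b64encode(payload)
encoded_collides = delimiter in encoded
round_trip_ok = base64.b64decode(encoded) == payload

# Overhead: every 3 input bytes become 4 output bytes.
big = bytes(3000)
overhead = len(base64.b64encode(big)) / len(big) - 1
print(f"raw collides: {raw_collides}, encoded collides: {encoded_collides}")
print(f"overhead: {overhead:.1%}")
```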
It’s really annoying that this sort of stupid blunder can make it into a standard which is so widely used. At least it seems it can’t lead to any security vulnerabilities (I think), but I pity the poor sucker whose file-upload-via-POST is failing because of a shoddy standard which says it’s OK for a boundary tag to occur within the content.