re: Fix reliability assumption
Ryan Pierce / Townsend Analytics Ltd. / Archipelago LLC
12 Jan 1998 4:05PM ET
> I was wondering if anyone could tell me what assumptions about reliability the spec makes, e.g., do we assume that the transport mechanism is reliable (e.g. TCP/IP guarantees data will be delivered or the connection will be dropped). Without this assumption the FIX spec does not seem sufficient.
>
> The reliability assumption I'm coming to is that the FIX session protocol assumes that messages will be delivered intact and in order from one party to another (with possible data corruption). Retransmission of messages is for the case when a message is corrupted in transit or to handle a software fault (i.e. a bug) on either end.
This brings up a question I've always wondered about. FIX makes a slightly stricter set of assumptions about its underlying transport mechanism than IP makes about its own. IP assumes best-effort delivery: a datagram may be dropped, corrupted, delivered out of order, or duplicated.
FIX handles dropped messages quite well. While the checksum is not as robust as one would hope for catching garbled messages (reordering bytes will produce the same checksum), changing the spec this far along to require a CRC would probably cause more problems than benefits. Out of order messages should be handled properly, though they would slow things down due to all the resend requests and resends. (TCP handles out of order messages quite elegantly, but using that approach would complicate the protocol, as well as prevent the firm from gap filling over old orders.) But a duplicated message would, per the spec, require the party that received it to log out.
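To make the checksum point concrete, here is a minimal sketch in Python. It assumes the usual FIX definition of the CheckSum field (tag 10): the sum of every byte preceding the field, modulo 256, written as three digits. The function names and the message fragment are made up for illustration, and the fragment is not a complete, valid FIX message.

```python
SOH = b"\x01"  # FIX field delimiter


def fix_checksum(data: bytes) -> int:
    """Sum of all bytes modulo 256 (the value carried in tag 10)."""
    return sum(data) % 256


def format_checksum(data: bytes) -> str:
    """Render the checksum as the three-digit string used on the wire."""
    return "%03d" % fix_checksum(data)


# Hypothetical fragment: an order for 100 shares versus one for 010 shares.
# The two differ only in the order of bytes inside the quantity field.
original = b"35=D" + SOH + b"38=100" + SOH
reordered = b"35=D" + SOH + b"38=010" + SOH

# Addition is commutative, so any permutation of the bytes sums to the same value.
assert original != reordered
assert fix_checksum(original) == fix_checksum(reordered)
```

A CRC would catch this kind of reordering because its result depends on byte position, which a plain sum does not.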
I would think that FIX should work fine over an unreliable transport mechanism, if:
* The message reordering, drops and corruptions are not too great. Any protocol will slow down and gag should the errors become too high, although I think FIX may be more sensitive to this than other protocols like TCP.
* Individual message corruption happens infrequently enough that the 1/256 chance that a random glitch will produce a message with a valid checksum does not become a significant problem, and absolutely no byte reordering takes place inside a message, given the 100% chance of such a corrupted message passing the checksum.
* The transport layer does not duplicate any messages.
I would imagine that encapsulating FIX in something like UDP would probably fail over WANs or the Internet since duplication of datagrams may occur.
It seems that the only thing barring FIX from the same assumptions as IP is the defined behavior of logging out upon receiving a duplicated message. I'm wondering why this is the case instead of logging an error, ignoring the message, and continuing, though I wouldn't be surprised if there's something I'm overlooking which would make doing so a bad idea.
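To illustrate the difference between the two behaviors, here is a rough sketch in Python of how a receiver might classify an incoming sequence number. All names here (Action, classify, tolerate_duplicates) are hypothetical and not taken from the spec, and the sketch ignores details such as resend flags. It only contrasts logging out on a duplicate, as described above, with the log-and-ignore alternative.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process"                # in-order message, handle normally
    RESEND_REQUEST = "resend_request"  # gap detected, ask for the missing range
    LOGOUT = "logout"                  # duplicate: log out, as described above
    IGNORE = "ignore"                  # duplicate: log an error and carry on


def classify(expected: int, received: int, tolerate_duplicates: bool = False) -> Action:
    """Decide what to do with a message given the next expected sequence number."""
    if received == expected:
        return Action.PROCESS
    if received > expected:
        # One or more messages were dropped or arrived out of order.
        return Action.RESEND_REQUEST
    # received < expected: this sequence number has already been seen.
    return Action.IGNORE if tolerate_duplicates else Action.LOGOUT


# A duplicated datagram carrying sequence number 6 arrives when 7 is expected.
print(classify(7, 6))                            # Action.LOGOUT
print(classify(7, 6, tolerate_duplicates=True))  # Action.IGNORE
```

With tolerate_duplicates set, a duplicating transport such as UDP would cost only a log entry per duplicate rather than a session teardown.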
Ryan Pierce
Townsend Analytics Ltd.