Answering your "// @NOTE(final): Why does MSVC have no _InterlockedExchange64 on x86???" comment in the source code: the InterlockedXYZ intrinsics exist only when there is a single CPU instruction that does what you need. 32-bit x86 cannot do a 64-bit atomic operation with a plain LOCK-prefixed opcode (LOCK + xchg/add/..). You can only do it with cmpxchg8b in a loop. That is what the actual "InterlockedExchange64" function does, and it is a function, not an intrinsic.
Also, the cmpxchg8b instruction did not exist in the original x86 instruction set. It was only introduced with the Pentium.
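A rough sketch of what such a cmpxchg8b loop looks like in C, in case it helps. The names cas64 and exchange64 are made up for illustration (they are not from FPL or the Windows headers), and the non-MSVC branch assumes the GCC/Clang __sync builtin so the snippet builds elsewhere too:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: 64-bit compare-and-swap that returns the value
   that was in *dest before the call. */
#if defined(_MSC_VER)
#include <intrin.h>
static int64_t cas64(volatile int64_t *dest, int64_t expected, int64_t desired) {
    /* Note the argument order of the intrinsic: (dest, exchange, comparand). */
    return _InterlockedCompareExchange64((volatile __int64 *)dest, desired, expected);
}
#else
static int64_t cas64(volatile int64_t *dest, int64_t expected, int64_t desired) {
    /* Assumption: GCC/Clang builtin, used here only to keep the sketch portable. */
    return __sync_val_compare_and_swap(dest, expected, desired);
}
#endif

/* Hypothetical exchange built from a CAS loop. Returns the previous value. */
static int64_t exchange64(volatile int64_t *dest, int64_t value) {
    assert(dest != NULL);
    /* A plain 64-bit read may be torn on 32-bit x86. That is fine here:
       it is only a first guess, and the CAS below rejects a wrong guess. */
    int64_t seen = *dest;
    for (;;) {
        int64_t prev = cas64(dest, seen, value);
        if (prev == seen) {
            return prev;   /* the swap happened */
        }
        seen = prev;       /* someone else changed it, retry with the fresh value */
    }
}
```

The loop only retries when another thread changed the value between the guess and the CAS, so in the uncontended case it is a single cmpxchg8b.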
Btw, I think your fplAtomicAddS64 function for 32-bit Windows is wrong. First it tries to cmpxchg the value with 0? What if the original value is not 0?
That explains why there is no exchange for 64-bit. Thanks!
Also, it seems your question solved it magically -> I removed the #if defined(FPL_ARCH_X64) and now always use _InterlockedExchange64 or _InterlockedExchangeAdd64. It seems it got fixed after I corrected the type cast to LONG64.
But regarding your question.
InterlockedCompareExchange always returns the initial value, regardless of whether the exchange happened.
If I compare and exchange a value with 0 and 0, I always get the old value back, and replacing 0 with 0 does not change the "actual" result. Technically it may write a zero over a zero, so it costs a few extra cycles.
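To make the 0/0 trick concrete, here is a minimal sketch of a 64-bit atomic read and an add built on top of it. This is not the actual fplAtomicAddS64 code; cas64, load64 and add64 are hypothetical names, and the non-MSVC branch assumes the GCC/Clang __sync builtin:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: 64-bit compare-and-swap returning the old value. */
#if defined(_MSC_VER)
#include <intrin.h>
static int64_t cas64(volatile int64_t *dest, int64_t expected, int64_t desired) {
    return _InterlockedCompareExchange64((volatile __int64 *)dest, desired, expected);
}
#else
static int64_t cas64(volatile int64_t *dest, int64_t expected, int64_t desired) {
    /* Assumption: GCC/Clang builtin, only here to keep the sketch portable. */
    return __sync_val_compare_and_swap(dest, expected, desired);
}
#endif

/* Atomic 64-bit read: compare with 0 and exchange with 0.
   If the value is 0 it is replaced by 0 (no visible change),
   otherwise nothing is written. Either way the old value comes back. */
static int64_t load64(volatile int64_t *src) {
    assert(src != NULL);
    return cas64(src, 0, 0);
}

/* Atomic add via CAS loop. Returns the value before the addition,
   like _InterlockedExchangeAdd64 does. */
static int64_t add64(volatile int64_t *dest, int64_t addend) {
    int64_t seen = load64(dest);
    for (;;) {
        /* Add as unsigned so wrap-around is not signed-overflow UB. */
        int64_t sum = (int64_t)((uint64_t)seen + (uint64_t)addend);
        int64_t prev = cas64(dest, seen, sum);
        if (prev == seen) {
            return prev;
        }
        seen = prev;   /* value changed in between, retry */
    }
}
```

So the compare with 0 is not an assumption that the value is 0. It is just the cheapest way to get an untorn 64-bit read out of cmpxchg8b.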