On 64-bit CPUs with no-execute support and non-snooping icache, such as
970 or POWER4, we have a software mechanism to ensure coherency of the
cache (using exec faults when needed).
This was broken due to a logic error when the code was rewritten
from assembly to C, previously the assembly code did: