Just so you know, x86 machine-code is now a "high-level" language. What instructions say, and what they do, are very different things.
I mention this because of the comments on this post about OpenSSL's "constant-time" calculations, which are designed to avoid revealing secrets through variations in compute time. The main point commenters raise is that it's hard to do this perfectly in C. My response is that it's hard to do this even in x86 machine code.
Consider registers, for example. Everyone knows that the 32-bit x86 was limited to 8 general-purpose registers, while 64-bit expanded that to 16 registers. This isn't actually true: those are only the architectural register names. The latest Intel processors have 168 physical registers for integer values. The name of a register in x86 code is really just a variable name, similar to how variables work in high-level languages, and the processor decides which physical register actually holds the value.
So many registers are needed because the processor has hundreds of instructions "in flight" at any point in time, in various stages of execution. It rearranges these instructions, executing them out-of-order. Everyone knows that processors can execute things slightly out-of-order, but that's an understatement. Today's processors are massively out-of-order.
Consider the traditional branch pair of a CMP (compare) followed by a JMPcc (conditional jump). To us humans these are two separate instructions, but the processor "macro-fuses" them into a single internal operation.
Consider the "xor eax, eax" instruction, which is how we've traditionally cleared registers. The processor never executes it as an instruction. It simply marks the old "eax" as no longer used, so the next instruction that needs eax gets a new (zeroed) register from that pool of 168 registers, and nothing has to wait for eax's old value.
Consider "mov eax, ebx". Again, on recent processors this usually doesn't execute anything. The processor just renames registers, so that from this point on, eax refers to the same physical register that holds ebx's value.
Reading something from memory takes about 5 clock cycles from L1 cache, 12 cycles from L2 cache, or 30 cycles from L3 cache. A simple in-order processor would have to stop and wait that long. Because this processor is massively out-of-order, however, it can keep executing later instructions that don't depend upon that memory read, including other memory reads. Inside the CPU, the results always appear as if the processor executed everything in order, but outside the CPU, things happen in a strange order.
All this makes it very difficult to get smooth, predictable execution out of the processor. It means that "side-channel" attacks on x86, which leak secrets from crypto software, may always be with us.
One solution to these problems is the CMOV ("conditional move") instruction. It's like a normal MOV instruction, but whether the move happens depends on the condition flags. In some cases it can replace branches, which makes pipelined code more efficient. Currently, it takes constant time: when moving from memory, it still waits for the data to arrive, even when it knows it's going to throw it away. As Linus Torvalds famously pointed out, CMOV doesn't always speed up code. That's not the point here, though. What matters is that it makes execution time more predictable. At the same time, Intel can arbitrarily change this behavior on future processors, making it less predictable.
The upshot is this: Intel's x86 is a high-level language. Coding everything up according to Agner Fog's instruction timings still won't produce the predictable, constant-time code you are looking for. There may be some solutions, like using CMOV, but it will take research.