The cost differences between many kinds of operations may collapse only for the 32-bit CPUs used in microcontrollers, which may have clock frequencies under 100 MHz and may lack a cache hierarchy.
Otherwise, even for not-too-old 32-bit CPUs, it is right to classify instructions into the following groups, based on their cost in clock cycles:
1. Simple integer operations with operands in registers
2. Loads from the L1 cache memory and simple floating-point operations, like addition and multiplication
3. Loads from the L2 cache memory, division (integer or floating-point), square root and mispredicted branches
4. Loads from the L3 cache memory and atomic read-modify-write operations (like atomic exchange, atomic fetch-and-add, atomic compare-and-swap)
5. Loads from the main memory
This classification matches the chart from TFA.
The transistors get smaller every year. The capacitors, like you say, don't anymore. At some point those 5 extra transistors will be cheaper than the capacitor, unless Moore's Law well and truly bites it.
Simulation theory is dead.
Modern C++ CPUs as in LISP CPUs or as in Verilog CPUs?
so the physical layout has a bit vector with one bit for each optional. and a popcnt over that bitvector (masked up to the value we're interested in) will give the actual slot to look into?
would also make sense to reorder / bucket fields by (byte) size
if you want to do that in any low level language (rust, c++) you have to deviate from their standard syntax for optionals, and you have to manually keep track of slot order. but for domains with many optional/default values, this may really reduce cache pressure, no?
In higher level languages you can fake the effect (with flyweight facades), so from python such a packed "dataclass"-like class can look neat and clean. however at the low level there is no abstraction that lets you create your own data layout.
at least I didn't find anything yet.
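to make the popcnt idea concrete, here is a rough python sketch of how I understand it. the class name `PackedRecord` and the field names are made up for illustration, and python of course won't give you the cache benefit; in rust or c++ the slots would be a packed byte buffer rather than a list. it only shows the indexing: one presence bit per optional field, and the slot index is the number of set bits below that field's bit.

```python
class PackedRecord:
    """Stores only the optional fields that are actually present.

    Hypothetical illustration: FIELDS and the class itself are assumptions,
    not taken from any real library.
    """

    # Field order defines the bit position of each optional field.
    FIELDS = ("a", "b", "c", "d")

    def __init__(self, **values):
        unknown = set(values) - set(self.FIELDS)
        if unknown:
            raise TypeError(f"unknown fields: {sorted(unknown)}")
        self.mask = 0    # bit i is set if FIELDS[i] is present
        self.slots = []  # present values only, stored in field order
        for i, name in enumerate(self.FIELDS):
            value = values.get(name)
            if value is not None:
                self.mask |= 1 << i
                self.slots.append(value)

    def get(self, name, default=None):
        bit = 1 << self.FIELDS.index(name)
        if not self.mask & bit:
            return default  # field absent, no slot was allocated for it
        # popcount of the bits below this field's bit = number of present
        # fields stored before it = index of its slot
        slot = bin(self.mask & (bit - 1)).count("1")
        return self.slots[slot]


record = PackedRecord(b=2, d=4)
print(bin(record.mask), record.slots, record.get("d"), record.get("c", 7))
```

here only two slots are stored for four declared fields, and `get("d")` finds its value by counting the single set bit below bit 3.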
The article is interesting in general, since it gives a rough idea of the cost of operations relative to one another. But CPUs are much more complex beasts, so it also gives an incomplete picture, and if you're unaware of that, chances are you will draw incomplete conclusions from it. Understanding the performance implications of software running on actual hardware is much more involved than what one article can fit.