Perils of structured bindings

Published on . Tagged with C++.

C++17 introduced structured bindings:

auto [a, b] = func();

It looks innocent and convenient. You can think of it as:

auto __tmp = func();
// a and b are names for __tmp's members (they act like lvalue references to them)

There are two distinct steps here. First, the result of func() initializes the hidden object __tmp. Copy elision applies at this level, and since C++17 it is guaranteed for this initialization. Second, the bindings are introduced as names for that object's members. Here, things get less friendly.
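The two steps above can be made visible with a small sketch. The types A, B and Pair and the helper binding_distance are hypothetical, chosen only for illustration; the point is that both bindings live inside one hidden object, so the distance between them is fixed by the layout of Pair.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical types for illustration; the article does not define them.
struct A { long id; };
struct B { long payload; };
struct Pair { A a; B b; };

Pair func() { return Pair{A{1}, B{2}}; }

// Returns the distance in bytes between the two bindings.
std::ptrdiff_t binding_distance() {
    auto [a, b] = func();   // one hidden Pair object; a and b name its members

    // decltype of a binding is the member type itself, not a reference type.
    static_assert(std::is_same_v<decltype(a), A>);
    static_assert(std::is_same_v<decltype(b), B>);

    const char* pa = reinterpret_cast<const char*>(&a);
    const char* pb = reinterpret_cast<const char*>(&b);
    return pb - pa;         // same as offsetof(Pair, b)
}
```

Because b is pinned to its parent like this, it cannot be placed anywhere independently of a.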

Consider this example:

B func_ext() {
    auto [a, b] = func();   // RVO here
    return b;               // no NRVO here!
}

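Imagine B is a big structure. Returning b looks cheap. It isn't. b names a member of the hidden object, which lives in func_ext's own frame, so it cannot also be the function's return object. The compiler has to copy it. As the disassembly further down shows, it calls the copy constructor: a structured binding is not a local variable, so the usual implicit move on return does not kick in either.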
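One way to observe this without reading assembly is to instrument B. The sketch below is an assumption-laden illustration, not the article's original code: the counters, Pair, and transfers_after_call are made up for the demo. It counts how many times a B is copied or moved while func_ext runs.

```cpp
#include <cassert>

// Hypothetical instrumented types; not from the article.
struct A { int id = 0; };

struct B {
    static inline int copies = 0;
    static inline int moves = 0;
    int payload = 42;
    B() = default;
    B(const B& o) : payload(o.payload) { ++copies; }
    B(B&& o) noexcept : payload(o.payload) { ++moves; }
};

struct Pair { A a; B b; };

Pair func() { return Pair{}; }   // prvalue: initializes the caller's object directly

B func_ext() {
    auto [a, b] = func();        // no copy: the hidden object is initialized in place
    (void)a;
    return b;                    // b is a member of the hidden object: not elided
}

// Number of copy or move constructions of B caused by one call.
int transfers_after_call() {
    B::copies = 0;
    B::moves = 0;
    B r = func_ext();
    assert(r.payload == 42);
    return B::copies + B::moves;
}
```

Built as C++17 or later, this should report exactly one transfer, and on current compilers it is a copy rather than a move.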

Now compare this with a more traditional approach:

B func_ext() {
    A a;
    B b;
    func(&a, &b);
    return b;               // NRVO here
}

Here, b is a complete local object, so the compiler can apply NRVO and construct it directly as the return object. No detour. No extra copy.
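The same instrumentation can be applied to the out-parameter version. Again, the counters and the Counts helper are hypothetical. NRVO is optional, so the sketch only relies on what the language guarantees: returning a local variable is either elided or treated as a move, never a copy.

```cpp
#include <cassert>

// Hypothetical instrumented types; not from the article.
struct A { int id = 0; };

struct B {
    static inline int copies = 0;
    static inline int moves = 0;
    int payload = 0;
    B() = default;
    B(const B& o) : payload(o.payload) { ++copies; }
    B(B&& o) noexcept : payload(o.payload) { ++moves; }
};

void func(A* a, B* b) { a->id = 1; b->payload = 42; }

B func_ext() {
    A a;
    B b;
    func(&a, &b);
    return b;                    // NRVO candidate: b is a complete local object
}

struct Counts { int copies; int moves; };

Counts counts_after_call() {
    B::copies = 0;
    B::moves = 0;
    B r = func_ext();
    assert(r.payload == 42);
    return {B::copies, B::moves};
}
```

With NRVO applied both counters stay at zero; if a compiler declines to elide, the worst case is a single move.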

Why does this happen?

When you call a function that returns a large object, the caller allocates space on its own stack and passes a hidden pointer to the callee:

caller:                     ; whoever calls func_ext
    sub rsp, sizeof(B)      ; make space for return value
    lea rdi, [rsp]          ; pass pointer to that space as hidden first argument
    call func_ext
    ; now [rsp] contains the B object

The callee receives this pointer and constructs the return value at that address:

func_ext:
    ; rdi = pointer to where caller wants the result
    ; construct B at address rdi
    ret

The "return slot" is just that space the caller allocated. The caller decides where it is. The callee must use it. When the compiler sees:

B func_ext() {
    A a;
    B b;
    func(&a, &b);
    return b;
}

it realizes b will become the return value. So instead of giving b its own space on func_ext's stack, it can use the pointer the caller provided. b is constructed directly where the caller wants it. When the function returns, there's nothing to copy: b and the return value are the same object at the same address.

Why do subobjects break this?

B func_ext() {
    auto [a, b] = func();
    return b;
}

The hidden temporary looks like this in memory:

func_ext's stack frame:
    ┌─────────────────┐
    │      ...        │
    ├─────────────────┤ 0x7fff1000
    │       a         │   <- __tmp starts here
    ├─────────────────┤ 0x7fff1008
    │       b         │   <- b is at __tmp + 8
    ├─────────────────┤
    │      ...        │
    └─────────────────┘

Meanwhile, the caller said "put the return value at 0x7fff2050" (some address in the caller's frame). For NRVO to work, b would need to exist at 0x7fff2050. But b's address is 0x7fff1008 — it's determined by where __tmp lives plus a fixed offset.

Could the compiler fix this?

You might ask: why not place __tmp such that b lands at 0x7fff2050? That would put __tmp at 0x7fff2048 (so that __tmp + 8 = 0x7fff2050). But then a would occupy 0x7fff2048 — memory that's in the caller's stack frame, not ours. We'd be writing into memory we don't own. The caller only promised that 0x7fff2050 is valid for a b. It made no guarantees about the bytes before it.
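The address arithmetic in this thought experiment can be written down directly. The sketch assumes, as in the diagram, that a occupies 8 bytes so that b sits at offset 8; the types and the function required_tmp_start are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout matching the diagram: a at offset 0, b at offset 8.
struct A { std::uint64_t id; };
struct B { std::uint64_t payload; };
struct Pair { A a; B b; };

static_assert(offsetof(Pair, b) == 8, "sketch assumes b sits 8 bytes into Pair");

// Where the hidden object would have to start for its b member
// to land exactly on the caller's return slot.
constexpr std::uintptr_t required_tmp_start(std::uintptr_t return_slot) {
    return return_slot - offsetof(Pair, b);
}

// The bytes in [required_tmp_start(slot), slot) would hold a,
// and the caller never handed those bytes over.
static_assert(required_tmp_start(0x7fff2050u) == 0x7fff2048u, "matches the text");
```

The subtraction always points below the return slot, into bytes that belong to someone else.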

A subobject's address is derived from its parent's address. The return slot's address is chosen by the caller. These are two independent constraints on where b must live, and you can't satisfy both. Hence, a copy:

...
mov     rdx, QWORD PTR [rbp-16]
mov     rax, QWORD PTR [rbp-152]
mov     rsi, rdx
mov     rdi, rax
call    B::B(B const&) [complete object constructor]
...
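If B is cheap to move, one possible mitigation is to ask for the move explicitly. This is a sketch under that assumption, with hypothetical instrumented types; it does not bring NRVO back, it only turns the copy into a move.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented types; not from the article.
struct A { int id = 0; };

struct B {
    static inline int copies = 0;
    static inline int moves = 0;
    int payload = 42;
    B() = default;
    B(const B& o) : payload(o.payload) { ++copies; }
    B(B&& o) noexcept : payload(o.payload) { ++moves; }
};

struct Pair { A a; B b; };

Pair func() { return Pair{}; }

B func_ext() {
    auto [a, b] = func();
    (void)a;
    return std::move(b);         // explicit move out of the hidden object
}

struct Counts { int copies; int moves; };

Counts counts_after_call() {
    B::copies = 0;
    B::moves = 0;
    B r = func_ext();
    assert(r.payload == 42);
    return {B::copies, B::moves};
}
```

Here the counters should show one move and no copies. For types whose move is as expensive as a copy (large arrays stored inline, for example), this buys nothing, and the out-parameter version remains the cheaper one.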

Conclusion

Structured bindings are elegant. They also hide costs. If performance matters, don’t trust the surface syntax. Check the generated code. The disassembly rarely lies.