From b2be36ef3349ebdab5423ea3337c03bcc3319b60 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Wed, 3 Jul 2024 17:56:00 +0100 Subject: [Backport #11036] Add explicit compiler fence when pushing frames to ensure safe profiling (#11090) **What does this PR do?** This PR tweaks the `vm_push_frame` function to add an explicit compiler fence (`atomic_signal_fence`) to ensure profilers that use signals to interrupt applications (stackprof, vernier, pf2, Datadog profiler) can safely sample from the signal handler. This is a backport of #11036 to Ruby 3.3 . **Motivation:** The `vm_push_frame` was specifically tweaked in https://github.com/ruby/ruby/pull/3296 to initialize the a frame before updating the `cfp` pointer. But since there's nothing stopping the compiler from reordering the initialization of a frame (`*cfp =`) with the update of the cfp pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on the Datadog profiler. In practice, after some experimentation + talking to folks, this reordering does not seem to happen. But since modern compilers have a way for us to exactly tell them not to do the reordering (`atomic_signal_fence`), this seems even better. I've actually extracted `vm_push_frame` into the "Compiler Explorer" website, which you can use to see the assembly output of this function across many compilers and architectures: https://godbolt.org/z/3oxd1446K On that link you can observe two things across many compilers: 1. The compilers are not reordering the writes 2. The barrier does not change the generated assembly output (== has no cost in practice) **Additional Notes:** The checks added in `configure.ac` define two new macros: * `HAVE_STDATOMIC_H` * `HAVE_DECL_ATOMIC_SIGNAL_FENCE` Since Ruby generates an arch-specific `config.h` header with these macros upon installation, this can be used by profilers and other libraries to test if Ruby was compiled with the fence enabled. **How to test the change?** As I mentioned above, you can check https://godbolt.org/z/3oxd1446K to confirm the compiled output of `vm_push_frame` does not change in most compilers (at least all that I've checked on that site). --- configure.ac | 2 ++ vm_insnhelper.c | 12 ++++++++++++ 2 files changed, 14 insertions(+) diff --git a/configure.ac b/configure.ac index b77cbea545..a9c3796caa 100644 --- a/configure.ac +++ b/configure.ac @@ -1359,6 +1359,7 @@ AC_CHECK_HEADERS(ucontext.h) AC_CHECK_HEADERS(utime.h) AC_CHECK_HEADERS(sys/epoll.h) AC_CHECK_HEADERS(sys/event.h) +AC_CHECK_HEADERS(stdatomic.h) AS_CASE("$target_cpu", [x64|x86_64|i[3-6]86*], [ AC_CHECK_HEADERS(x86intrin.h) @@ -2039,6 +2040,7 @@ AC_CHECK_FUNCS(_longjmp) # used for AC_ARG_WITH(setjmp-type) test x$ac_cv_func__longjmp = xno && ac_cv_func__setjmp=no AC_CHECK_FUNCS(arc4random_buf) AC_CHECK_FUNCS(atan2l atan2f) +AC_CHECK_DECLS(atomic_signal_fence, [], [], [#include ]) AC_CHECK_FUNCS(chmod) AC_CHECK_FUNCS(chown) AC_CHECK_FUNCS(chroot) diff --git a/vm_insnhelper.c b/vm_insnhelper.c index b72167f75d..913fd86bb1 100644 --- a/vm_insnhelper.c +++ b/vm_insnhelper.c @@ -12,6 +12,10 @@ #include +#ifdef HAVE_STDATOMIC_H + #include +#endif + #include "constant.h" #include "debug_counter.h" #include "internal.h" @@ -388,6 +392,14 @@ vm_push_frame(rb_execution_context_t *ec, .jit_return = NULL }; + /* Ensure the initialization of `*cfp` above never gets reordered with the update of `ec->cfp` below. + This is a no-op in all cases we've looked at (https://godbolt.org/z/3oxd1446K), but should guarantee it for all + future/untested compilers/platforms. */ + + #ifdef HAVE_DECL_ATOMIC_SIGNAL_FENCE + atomic_signal_fence(memory_order_seq_cst); + #endif + ec->cfp = cfp; if (VMDEBUG == 2) { -- cgit v1.2.3