Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug superconsole test failure #9

Closed
wants to merge 2 commits into from
Closed

Conversation

ndmitchell
Copy link
Contributor

Investigate why superconsole fails in our CI

Investigate why superconsole fails in our CI
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 5, 2022
@ndmitchell ndmitchell closed this Sep 5, 2022
@ndmitchell ndmitchell deleted the ndmitchell-patch-1 branch September 5, 2022 18:34
facebook-github-bot pushed a commit that referenced this pull request Apr 7, 2023
Summary:
The introductory video link was incorrect in the README, this PR updates it to the correct video.

X-link: facebookincubator/gazebo#9

Reviewed By: stepancheg

Differential Revision: D44784296

Pulled By: ndmitchell

fbshipit-source-id: 9d0ff66e6eac66ee3843a153bd3cf33f90d3b132
facebook-github-bot pushed a commit that referenced this pull request Feb 15, 2024
Summary:
The stack around D42099161 introduced the ability to dynamically set the log level via a debug-command. This requires using `tracing_subscriber::reload::Layer`. The implementation of that machinery is backed by a `RwLock` [here](https://fburl.com/code/4xv0ihpn). This `RwLock` is read locked once on every single `#[instrumentation]` call to check whether the instrumentation is active or not.

On top of that, something in our fbsource third-party config enables the `parking_lot` feature of `tracing_subscriber`. This means that this is a `parking_lot::RwLock`, not a `std::sync::RwLock`, which is bad because it's well known that parking lot has problems with excessive spinning.

What all that adds up to is that when you put `#[instrument]` on a super hot function in dice, you get dozens of threads that are all simultaneously doing this:

```
  thread #23, name = 'buck2-rt', stop reason = signal SIGSTOP
    frame #0: 0x000000000c94a1fa buck2`<parking_lot::raw_rwlock::RawRwLock>::lock_shared_slow + 122
    frame #1: 0x0000000007a80b54 buck2`<tracing_subscriber::layer::layered::Layered<tracing_subscriber::reload::Layer<tracing_subscriber::filter::layer_filters::Filtered<tracing_subscriber::fmt::fmt_layer::Layer<tracing_subscriber::registry::sharded::Registry, tracing_subscriber::fmt::format::DefaultFields, tracing_subscriber::fmt::format::Format, std::io::stdio::stderr>, tracing_subscriber::filter::env::EnvFilter, tracing_subscriber::registry::sharded::Registry>, tracing_subscriber::registry::sharded::Registry>, tracing_subscriber::registry::sharded::Registry> as tracing_core::subscriber::Subscriber>::enabled + 292
    frame #2: 0x000000000a172d77 buck2`<dice::legacy::incremental::dep_trackers::internals::Dep<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>> as dice::legacy::incremental::graph::dependencies::Dependency>::recompute::{closure#0} + 183
    frame #3: 0x000000000a03606c buck2`<futures_util::stream::futures_unordered::FuturesUnordered<core::pin::Pin<alloc::boxed::Box<dyn core::future::future::Future<Output = core::result::Result<(alloc::boxed::Box<dyn dice::legacy::incremental::graph::dependencies::ComputedDependency>, alloc::sync::Arc<dyn dice::legacy::incremental::graph::GraphNodeDyn>), dice::api::error::DiceError>> + core::marker::Send>>> as futures_util::stream::stream::StreamExt>::poll_next_unpin + 444
    frame #4: 0x0000000009fc4755 buck2`<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::compute_whether_dependencies_changed::{closure#0} (.llvm.4229350879637282184) + 1733
    frame #5: 0x0000000009ef30d4 buck2`<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::compute_whether_versioned_dependencies_changed::{closure#0}::{closure#0} + 228
    frame #6: 0x0000000009f194ec buck2`<buck2_futures::cancellable_future::CancellableFuture<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}> as core::future::future::Future>::poll (.llvm.11181184606289051452) + 3500
    frame #7: 0x0000000009f04bbf buck2`<futures_util::future::future::map::Map<buck2_futures::cancellable_future::CancellableFuture<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}>, buck2_futures::spawn::spawn_inner<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}, dice::api::user_data::UserComputationData, futures_util::future::ready::Ready<()>>::{closure#0}> as core::future::future::Future>::poll + 31
    frame #8: 0x0000000009ff0339 buck2`<futures_util::future::future::flatten::Flatten<futures_util::future::future::Map<futures_util::future::ready::Ready<()>, buck2_futures::spawn::spawn_inner<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}, dice::api::user_data::UserComputationData, futures_util::future::ready::Ready<()>>::{closure#1}>, <futures_util::future::future::Map<futures_util::future::ready::Ready<()>, buck2_futures::spawn::spawn_inner<<dice::legacy::incremental::IncrementalEngine<dice::legacy::key::StoragePropertiesForKey<buck2_build_api::actions::calculation::BuildKey>>>::new_dice_task::{closure#0}, dice::api::user_data::UserComputationData, futures_util::future::ready::Ready<()>>::{closure#1}> as core::future::future::Future>::Output> as core::future::future::Future>::poll + 201
    frame #9: 0x00000000093f9f22 buck2`<tokio::task::task_local::TaskLocalFuture<buck2_events::dispatch::EventDispatcher, core::pin::Pin<alloc::boxed::Box<dyn core::future::future::Future<Output = alloc::boxed::Box<dyn core::any::Any + core::marker::Send>> + core::marker::Send>>> as core::future::future::Future>::poll + 130
    frame #10: 0x000000000939fdcb buck2`<tracing::instrument::Instrumented<<buck2_build_api::spawner::BuckSpawner as buck2_futures::spawner::Spawner<dice::api::user_data::UserComputationData>>::spawn::{closure#0}> as core::future::future::Future>::poll + 139
    frame #11: 0x00000000091ca5a9 buck2`<tokio::runtime::task::core::Core<tracing::instrument::Instrumented<<buck2_build_api::spawner::BuckSpawner as buck2_futures::spawner::Spawner<dice::api::user_data::UserComputationData>>::spawn::{closure#0}>, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle>>>::poll + 169
    frame #12: 0x00000000091c1b22 buck2`<tokio::runtime::task::harness::Harness<tracing::instrument::Instrumented<<buck2_build_api::spawner::BuckSpawner as buck2_futures::spawner::Spawner<dice::api::user_data::UserComputationData>>::spawn::{closure#0}>, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle>>>::poll + 146
    frame #13: 0x000000000c920f6f buck2`<tokio::runtime::scheduler::multi_thread::worker::Context>::run_task + 895
    frame #14: 0x000000000c91f847 buck2`<tokio::runtime::scheduler::multi_thread::worker::Context>::run + 2103
    frame #15: 0x000000000c932264 buck2`<tokio::runtime::context::scoped::Scoped<tokio::runtime::scheduler::Context>>::set::<tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0}, ()> + 52
    frame #16: 0x000000000c932169 buck2`tokio::runtime::context::runtime::enter_runtime::<tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}, ()> + 441
    frame #17: 0x000000000c91efa6 buck2`tokio::runtime::scheduler::multi_thread::worker::run + 70
    frame #18: 0x000000000c906a50 buck2`<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>> as core::future::future::Future>::poll + 160
    frame #19: 0x000000000c8f8af9 buck2`<tokio::loom::std::unsafe_cell::UnsafeCell<tokio::runtime::task::core::Stage<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>>>>>::with_mut::<core::task::poll::Poll<()>, <tokio::runtime::task::core::Core<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>>, tokio::runtime::blocking::schedule::BlockingSchedule>>::poll::{closure#0}> + 153
    frame #20: 0x000000000c90166b buck2`<tokio::runtime::task::core::Core<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>>, tokio::runtime::blocking::schedule::BlockingSchedule>>::poll + 43
    frame #21: 0x000000000c90d9b8 buck2`<tokio::runtime::task::harness::Harness<tracing::instrument::Instrumented<tokio::runtime::blocking::task::BlockingTask<<tokio::runtime::scheduler::multi_thread::worker::Launch>::launch::{closure#0}>>, tokio::runtime::blocking::schedule::BlockingSchedule>>::poll + 152
    frame #22: 0x000000000c90b848 buck2`<tokio::runtime::blocking::pool::Inner>::run + 216
    frame #23: 0x000000000c9002ab buck2`std::sys_common::backtrace::__rust_begin_short_backtrace::<<tokio::runtime::blocking::pool::Spawner>::spawn_thread::{closure#0}, ()> + 187
    frame #24: 0x000000000c90042e buck2`<<std::thread::Builder>::spawn_unchecked_<<tokio::runtime::blocking::pool::Spawner>::spawn_thread::{closure#0}, ()>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} + 158
    frame #25: 0x000000000757e4b5 buck2`std::sys::unix::thread::Thread::new::thread_start::h618b0b8e7bda0b2b + 37
    frame #26: 0x00007f2ba1e9abaf
```

This hit an open source user super hard over in #555 .

This diff approaches this issue by putting all the `#[instrument]`s in dice behind `#[cfg(debug_assertions)]`. This allows them to still be used in debug mode, but keeps them out of any hot paths. I do not have numbers to show that most of these matter. The only one I know matters for sure is `recompute`.

Alternatives are to either try and tailor this to the callsites we know are hot (that sounds error prone) or to revert the stack that inadvertently introduced the RwLock. Even if we did that though, it's probably still safer to keep `#[instrument]` out of super hot paths like this.

Reviewed By: stepancheg

Differential Revision: D53744041

fbshipit-source-id: 85bce9af2fec8ad1d50adc241d3b8e2bfc5cec87
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants