feat: auto heap dump by default if MALLOC_CONF=prof:true #12186

Merged 4 commits on Sep 11, 2023
Changes from 2 commits
10 changes: 3 additions & 7 deletions src/common/src/config.rs
@@ -370,7 +370,7 @@ pub struct ServerConfig {
     pub unrecognized: Unrecognized<Self>,

     /// Enable heap profile dump when memory usage is high.
-    #[serde(default = "default::server::auto_dump_heap_profile")]
+    #[serde(default)]
     pub auto_dump_heap_profile: AutoDumpHeapProfileConfig,
 }

@@ -908,7 +908,7 @@ pub mod default {
     }

     pub mod server {
-        use crate::config::{AutoDumpHeapProfileConfig, MetricLevel};
+        use crate::config::MetricLevel;

         pub fn heartbeat_interval_ms() -> u32 {
             1000
@@ -925,10 +925,6 @@ pub mod default {
         pub fn telemetry_enabled() -> bool {
             true
         }
-
-        pub fn auto_dump_heap_profile() -> AutoDumpHeapProfileConfig {
-            Default::default()
-        }
     }

     pub mod storage {
@@ -1131,7 +1127,7 @@ pub mod default {

     pub mod auto_dump_heap_profile {
         pub fn dir() -> String {
-            "".to_string()
+            ".".to_string() // current directory
Contributor

I'd suggest putting it in ./.risingwave/profiling/auto.

Member Author

"." is not perfect, but ./.risingwave is only used by risedev (for developers), which seems even worse.

Let me try to make it configurable by env var, so that kube-bench or risedev can set a proper output directory.

Contributor @yuhao-su, Sep 8, 2023

It's actually not just used by risedev.

create_dir_all("./.risingwave/sled").expect("should create");

But an env for the prefix will be great. It can be useful for putting all kinds of local files.

Contributor

Also, the CI failed because of this change. You need to update example.toml.

Member Author @fuyufjh, Sep 9, 2023

> It's actually not just used by risedev.
>
> create_dir_all("./.risingwave/sled").expect("should create");
>
> But an env for the prefix will be great. It can be useful for putting all kinds of local files.

Hmm, this is not persuasive. Memory storage is only for tests and the playground; it is never used in production.

Contributor

Could we at least put it in a directory? Writing heap profiles into the root directory looks ugly to me. Also, we will have a directory for manually dumped profiles in the dashboard PR.

Anyway, I think an env var for the local file prefix would be great. Maybe we can do it in a later PR.

Member Author

I just realized the "env var" actually already exists: MALLOC_CONF.

Now it dumps the memory profile to prof.prefix if server.auto_dump_heap_profile.dir is absent. Please take a look. 🥰
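
The fallback described in this comment can be sketched as follows. This is only an illustration of the idea, not the code in this PR: `malloc_conf_get` and `resolve_dump_target` are hypothetical helpers, the option is assumed to be spelled `prof_prefix` as in jemalloc's option list, and the final fallback to "." is likewise an assumption.

```rust
/// Hypothetical helper: look up `key` in a jemalloc-style option string
/// such as "prof:true,prof_prefix:/tmp/heap".
fn malloc_conf_get<'a>(conf: &'a str, key: &str) -> Option<&'a str> {
    conf.split(',')
        // Each option has the form "name:value"; skip anything malformed.
        .filter_map(|kv| kv.split_once(':'))
        .find(|(k, _)| k.trim() == key)
        .map(|(_, v)| v.trim())
}

/// Hypothetical helper: prefer the configured directory; if it is empty,
/// fall back to the prefix from MALLOC_CONF, and finally to ".".
fn resolve_dump_target(configured_dir: &str, malloc_conf: &str) -> String {
    if !configured_dir.is_empty() {
        return configured_dir.to_string();
    }
    malloc_conf_get(malloc_conf, "prof_prefix")
        .unwrap_or(".")
        .to_string()
}
```

A caller would read the variable with `std::env::var("MALLOC_CONF").unwrap_or_default()` and pass it as the second argument, so that tools such as kube-bench or risedev can steer the output location without touching the config file.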

         }

         pub fn threshold() -> f32 {
15 changes: 8 additions & 7 deletions src/compute/src/memory_management/mod.rs
@@ -74,15 +74,8 @@ pub fn build_memory_control_policy(
     total_memory_bytes: usize,
     auto_dump_heap_profile_config: AutoDumpHeapProfileConfig,
 ) -> Result<MemoryControlRef> {
-    use risingwave_common::bail;
-    use tikv_jemalloc_ctl::opt;
-
     use self::policy::JemallocMemoryControl;

-    if !opt::prof::read().unwrap() && auto_dump_heap_profile_config.enabled() {
-        bail!("Auto heap profile dump should not be enabled with Jemalloc profile disable");
-    }
-
     Ok(Box::new(JemallocMemoryControl::new(
         total_memory_bytes,
         auto_dump_heap_profile_config,
@@ -122,6 +115,14 @@ impl MemoryControl for DummyPolicy {
 /// overhead, network buffer, etc. based on `SYSTEM_RESERVED_MEMORY_PROPORTION`. The reserve memory
 /// size must be larger than `MIN_SYSTEM_RESERVED_MEMORY_MB`
 pub fn reserve_memory_bytes(total_memory_bytes: usize) -> (usize, usize) {
+    if total_memory_bytes < MIN_COMPUTE_MEMORY_MB << 20 {
+        panic!(
+            "The total memory size ({}) is too small. It must be at least {} MB.",
+            convert(total_memory_bytes as _),
+            MIN_COMPUTE_MEMORY_MB
+        );
+    }
Member Author

I don't know why MIN_COMPUTE_MEMORY_MB was not checked anywhere, so I added the check while I was here.


     let reserved = std::cmp::max(
         (total_memory_bytes as f64 * SYSTEM_RESERVED_MEMORY_PROPORTION).ceil() as usize,
         MIN_SYSTEM_RESERVED_MEMORY_MB << 20,
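
To make the arithmetic in this hunk concrete, here is a small standalone sketch. The three constant values are placeholders chosen for illustration (the diff does not show the real ones), the function name is hypothetical, it returns a `Result` instead of panicking, and the `(reserved, usable)` return shape is an assumption, since the rest of the function is not visible above.

```rust
// Placeholder values for illustration only; the real constants are not shown in the diff.
const MIN_COMPUTE_MEMORY_MB: usize = 512;
const MIN_SYSTEM_RESERVED_MEMORY_MB: usize = 512;
const SYSTEM_RESERVED_MEMORY_PROPORTION: f64 = 0.3;

/// Hypothetical variant of `reserve_memory_bytes` that reports an error
/// instead of panicking. Returns (reserved bytes, remaining usable bytes).
fn reserve_memory_bytes_sketch(total_memory_bytes: usize) -> Result<(usize, usize), String> {
    // `<<` binds tighter than `<`, so this compares against MB * 2^20 bytes.
    if total_memory_bytes < MIN_COMPUTE_MEMORY_MB << 20 {
        return Err(format!(
            "The total memory size ({} bytes) is too small. It must be at least {} MB.",
            total_memory_bytes, MIN_COMPUTE_MEMORY_MB
        ));
    }
    // Reserve a fixed proportion, but never less than the configured minimum.
    let reserved = std::cmp::max(
        (total_memory_bytes as f64 * SYSTEM_RESERVED_MEMORY_PROPORTION).ceil() as usize,
        MIN_SYSTEM_RESERVED_MEMORY_MB << 20,
    );
    Ok((reserved, total_memory_bytes.saturating_sub(reserved)))
}
```

With these placeholder values, a 1 GiB node hits the minimum-reservation branch (30% would be less than 512 MB), while a 4 GiB node reserves by proportion.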
13 changes: 12 additions & 1 deletion src/compute/src/memory_management/policy.rs
@@ -22,7 +22,9 @@ use risingwave_batch::task::BatchManager;
 use risingwave_common::config::AutoDumpHeapProfileConfig;
 use risingwave_common::util::epoch::Epoch;
 use risingwave_stream::task::LocalStreamManager;
-use tikv_jemalloc_ctl::{epoch as jemalloc_epoch, prof as jemalloc_prof, stats as jemalloc_stats};
+use tikv_jemalloc_ctl::{
+    epoch as jemalloc_epoch, opt as jemalloc_opt, prof as jemalloc_prof, stats as jemalloc_stats,
+};

 use super::{MemoryControl, MemoryControlStats};

@@ -103,9 +105,16 @@ impl JemallocMemoryControl {
         if !self.auto_dump_heap_profile_config.enabled() {
             return;
         }
+
         if cur_used_memory_bytes > self.threshold_auto_dump_heap_profile
             && prev_used_memory_bytes <= self.threshold_auto_dump_heap_profile
         {
+            let opt_prof = jemalloc_opt::prof::read().unwrap();
+            if !opt_prof {
+                tracing::info!("Cannot dump heap profile because Jemalloc prof is not enabled");
+                return;
+            }
yuhao-su marked this conversation as resolved.
+
             let time_prefix = chrono::Local::now().format("%Y-%m-%d-%H-%M-%S").to_string();
             let file_name = format!(
                 "{}.exceed-threshold-aggressive-heap-prof.compute.dump.{}\0",
@@ -124,6 +133,8 @@ impl JemallocMemoryControl {
                 .write(CStr::from_bytes_with_nul(file_path_bytes).unwrap())
             {
                 tracing::warn!("Auto Jemalloc dump heap file failed! {:?}", e);
+            } else {
+                tracing::info!("Successfully dumped heap profile to {}", file_name);
             }
             unsafe { Box::from_raw(file_path_ptr) };
         }
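
Two details of this hunk can be shown in isolation: the dump fires only on the poll where usage first crosses the threshold, and the path handed to jemalloc must be NUL-terminated. The sketch below uses only the standard library; `crossed_threshold` and `dump_path` are hypothetical names, and building the path with `CString` is an alternative to the `"\0"` suffix in the format string above (which also ends up in the logged `file_name`), not what the PR does.

```rust
use std::ffi::{CString, NulError};
use std::path::Path;

/// True only on the poll where usage first rises above `threshold`,
/// mirroring the `cur > threshold && prev <= threshold` condition in the diff.
fn crossed_threshold(prev_used: usize, cur_used: usize, threshold: usize) -> bool {
    cur_used > threshold && prev_used <= threshold
}

/// Hypothetical helper: join directory and file name, then let `CString`
/// append the terminating NUL, so the displayable name stays free of "\0".
/// Fails if either part already contains an interior NUL byte.
fn dump_path(dir: &str, file_name: &str) -> Result<CString, NulError> {
    let path = Path::new(dir).join(file_name);
    CString::new(path.to_string_lossy().into_owned())
}
```

Because the condition is edge-triggered, memory that stays above the threshold across several polls yields a single dump rather than one per poll; a new dump happens only after usage has dropped back to or below the threshold and then risen again.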