Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(frontend): support idle in transaction session timeout #14566

Merged
merged 9 commits into from
Jan 17, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
support idle in transaction session timeout
chenzl25 committed Jan 15, 2024
commit 9f18c51e6b3e989f5c460c582735daa7917f94c9
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 14 additions & 0 deletions e2e_test/batch/transaction/idle_in_transaction_session_timeout.slt
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
statement ok
set idle_in_transaction_session_timeout = 1000

statement ok
begin read only;

statement ok
select 1;

# wait enough time to trigger idle_in_transaction_session_timeout
sleep 2s

statement error
select 1;
4 changes: 4 additions & 0 deletions src/common/src/session_config/mod.rs
Original file line number Diff line number Diff line change
@@ -207,6 +207,10 @@ pub struct ConfigMap {
#[parameter(default = 0u32)]
statement_timeout: u32,

/// Terminate any session that has been idle (that is, waiting for a client query) within an open transaction for longer than the specified amount of time in milliseconds.
#[parameter(default = 60000u32)]
idle_in_transaction_session_timeout: u32,

/// See <https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-LOCK-TIMEOUT>
/// Unused in RisingWave, support for compatibility.
#[parameter(default = 0)]
23 changes: 23 additions & 0 deletions src/frontend/src/session.rs
Original file line number Diff line number Diff line change
@@ -514,6 +514,9 @@ pub struct SessionImpl {

/// execution context represents the lifetime of a running SQL in the current session
exec_context: Mutex<Option<Weak<ExecContext>>>,

/// Last idle instant
last_idle_instant: Arc<Mutex<Option<Instant>>>,
}

#[derive(Error, Debug)]
@@ -552,6 +555,7 @@ impl SessionImpl {
current_query_cancel_flag: Mutex::new(None),
notices: Default::default(),
exec_context: Mutex::new(None),
last_idle_instant: Default::default(),
}
}

@@ -577,6 +581,7 @@ impl SessionImpl {
8080,
))
.into(),
last_idle_instant: Default::default(),
}
}

@@ -658,6 +663,13 @@ impl SessionImpl {
.map(|context| context.last_instant.elapsed().as_millis())
}

pub fn elapse_since_last_idle_instant(&self) -> Option<u128> {
self.last_idle_instant
.lock()
.as_ref()
.map(|x| x.elapsed().as_millis())
}

pub fn check_relation_name_duplicated(
&self,
name: ObjectName,
@@ -1129,10 +1141,21 @@ impl Session for SessionImpl {
let exec_context = Arc::new(ExecContext {
running_sql: sql,
last_instant: Instant::now(),
last_idle_instant: self.last_idle_instant.clone(),
});
*self.exec_context.lock() = Some(Arc::downgrade(&exec_context));
ExecContextGuard::new(exec_context)
}

fn check_idle_in_transaction_timeout(&self) -> bool {
if matches!(self.transaction_status(), TransactionStatus::InTransaction) {
if let Some(elapse_since_last_idle_instant) = self.elapse_since_last_idle_instant() {
return elapse_since_last_idle_instant
> self.config().idle_in_transaction_session_timeout() as u128;
}
}
false
}
}

/// Returns row description of the statement
1 change: 1 addition & 0 deletions src/utils/pgwire/Cargo.toml
Original file line number Diff line number Diff line change
@@ -23,6 +23,7 @@ futures = { version = "0.3", default-features = false, features = ["alloc"] }
itertools = "0.12"
openssl = "0.10.60"
panic-message = "0.3"
parking_lot = "0.12"
risingwave_common = { workspace = true }
risingwave_sqlparser = { workspace = true }
thiserror = "1"
13 changes: 13 additions & 0 deletions src/utils/pgwire/src/pg_protocol.rs
Original file line number Diff line number Diff line change
@@ -550,6 +550,12 @@ where
record_sql_in_span(&sql);
let session = self.session.clone().unwrap();

if session.check_idle_in_transaction_timeout() {
self.process_terminate();
return Err(PsqlError::SimpleQueryError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding a new error variant? I believe the handling logic can be quite similar to handling Panic here.

PsqlError::Panic(_) => {
self.stream
.write_no_flush(&BeMessage::ErrorResponse(Box::new(e)))
.ok()?;
let _ = self.stream.flush().await;
// Catching the panic during message processing may leave the session in an
// inconsistent state. We forcefully close the connection (then end the
// session) here for safety.
return None;
}

"terminating connection due to idle-in-transaction timeout".into(),
));
}
let _exec_context_guard = session.init_exec_context(sql.clone());
self.inner_process_query_msg(sql.clone(), session.clone())
.await
@@ -585,6 +591,7 @@ where
session: Arc<SM::Session>,
) -> PsqlResult<()> {
let session = session.clone();

// execute query
let res = session
.clone()
@@ -792,6 +799,12 @@ where
let sql: Arc<str> = Arc::from(format!("{}", portal));
record_sql_in_span(&sql);

if session.check_idle_in_transaction_timeout() {
self.process_terminate();
return Err(PsqlError::ExtendedPrepareError(
"terminating connection due to idle-in-transaction timeout".into(),
))?;
}
let _exec_context_guard = session.init_exec_context(sql.clone());
let result = session.clone().execute(portal).await;

16 changes: 16 additions & 0 deletions src/utils/pgwire/src/pg_server.rs
Original file line number Diff line number Diff line change
@@ -19,6 +19,7 @@ use std::sync::Arc;
use std::time::Instant;

use bytes::Bytes;
use parking_lot::Mutex;
use risingwave_common::types::DataType;
use risingwave_sqlparser::ast::Statement;
use thiserror_ext::AsReport;
@@ -112,6 +113,8 @@ pub trait Session: Send + Sync {
fn transaction_status(&self) -> TransactionStatus;

fn init_exec_context(&self, sql: Arc<str>) -> ExecContextGuard;

fn check_idle_in_transaction_timeout(&self) -> bool;
}

/// Each session could run different SQLs multiple times.
@@ -120,6 +123,8 @@ pub struct ExecContext {
pub running_sql: Arc<str>,
/// The instant of the running sql
pub last_instant: Instant,
/// A reference used to update when `ExecContext` is dropped
pub last_idle_instant: Arc<Mutex<Option<Instant>>>,
}

/// `ExecContextGuard` holds a `Arc` pointer. Once `ExecContextGuard` is dropped,
@@ -132,6 +137,12 @@ impl ExecContextGuard {
}
}

impl Drop for ExecContext {
fn drop(&mut self) {
*self.last_idle_instant.lock() = Some(Instant::now());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the last_idle_instant is only updated when execution completes, if a batch query runs for a long time e.g. scanning a large amount of data, will it be killed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the last_idle_instant is only updated when execution completes, if a batch query runs for a long time e.g. scanning a large amount of data, will it be killed?

According to the postgresql docs, this variable doesn't care about how long a statement is running, which should be handled by parameters like statement timeout

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From PG's documents:

Terminate any session that has been idle (that is, waiting for a client query) within an open transaction for longer than the specified amount of time.

IIUC, that means a running batch query should not be killed. Do we behave at the same way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we won't kill any query, but unpin the snapshot (either by background monitor or session termination triggered by a later statement). The session would be terminated when the later statement arrives.

Copy link
Member

@fuyufjh fuyufjh Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but unpin the snapshot

By "killed" I mean any means that cause the query to end. When a snapshot is unpinned, it's possible that the (running) iterator runs into errors and terminates, is it the cases? If so, I was saying that we should avoid such thing.

Copy link
Contributor Author

@chenzl25 chenzl25 Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ensure there is no running SQL in the current session during calling unpin_snapshot in check_idle_in_transaction_timeout, so we don't need to worry about that.

}
}

#[derive(Debug, Clone)]
pub enum UserAuthenticator {
// No need to authenticate.
@@ -362,9 +373,14 @@ mod tests {
let exec_context = Arc::new(ExecContext {
running_sql: sql,
last_instant: Instant::now(),
last_idle_instant: Default::default(),
});
ExecContextGuard::new(exec_context)
}

fn check_idle_in_transaction_timeout(&self) -> bool {
false
}
}

async fn do_test_query(bind_addr: impl Into<String>, pg_config: impl Into<String>) {