Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support printing postgresql's bytea data type in its "hex" and "escape" format #3438

Closed
MichaelScofield opened this issue Mar 6, 2024 · 12 comments · Fixed by #3567
Closed
Assignees
Labels
good first issue Good for newcomers

Comments

@MichaelScofield
Copy link
Collaborator

What problem does the new feature solve?

The bytea data type is the binary data type in pg. When we "select" it in GreptimeDB, the output is the raw byte string:

public=> CREATE TABLE test(b BLOB, ts TIMESTAMP TIME INDEX);
OK 0
public=> INSERT INTO test VALUES(X'616263', 1687867163);
OK 1
public=> select * from test;
   b    |             ts             
--------+----------------------------
 616263 | 1970-01-20 12:51:07.163000
(1 row)

The inserted value is a hex string "616263", which is "abc" in ascii.

In pg, however, that binary value can be outputted as different formats:

luofucong=# CREATE TABLE test(b bytea, ts bigint);
CREATE TABLE
luofucong=# INSERT INTO test VALUES('\x616263', 1);
INSERT 0 1
luofucong=# set bytea_output = 'hex';
SET
luofucong=# select * from test;
    b     | ts 
----------+----
 \x616263 |  1
(1 row)

luofucong=# set bytea_output = 'escape';
SET
luofucong=# select * from test;
  b  | ts 
-----+----
 abc |  1
(1 row)

What does the feature do?

The pg has a SET bytea_output client configuration to set the output format of the binary value. We could support that as well to offer better usage experience for pg users. The details can be found here.

Implementation challenges

  • This client configuration is pg only. So the impl is best to be confined to pg related mods in crate servers. That means the actual "set" impl shouldn't be placed in our general stmt executor. However, currently there're none places in the pg related mods that can handle "set"s. This is the first problem to solve.
  • Also since this feature is pg only, we should find a way to save the configuration in the client (or session) context that belongs to pg only. Currently there're none either.
@J0HN50N133
Copy link
Contributor

Hello, I would like to help with this issue and #3442.

@MichaelScofield
Copy link
Collaborator Author

@J0HN50N133 Thanks, feel free to take them and ask any questions!

@J0HN50N133
Copy link
Contributor

I've just read some related code. I think one reasonable choice to save the configuration is the QueryContext::extension. And there are already some codes doing the similar things. But I don't know whether it will break the semantic of extension since there is no any comment.

@MichaelScofield
Copy link
Collaborator Author

@J0HN50N133 Since it's pg only, I think a more suitable place for it is pgwire::api::ClientInfo. It has a "metadata" hashmap we can use to store and retrieve the bytea_output set by the pg client. @sunng87 can we use the "metadata" like this?

@J0HN50N133
Copy link
Contributor

I guest extension is to add some client' specific infomation, and prom_store already have code like:

new_query_ctx​.​set_extension​(​PHYSICAL_TABLE_PARAM​,​ physical_table​)​;

PHYSICAL_TABLE_PARAM should be a prom only parameter.

@MichaelScofield
Copy link
Collaborator Author

Yes, the extension has been used that way, but it's better for pg-only things stay in pg-only codes.

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented Mar 19, 2024

After reading more codes, I think it's more reasonable to just use query_context to pass the bytea value and update it's value in current session like the way set timezone do. But I'm not sure whether it's a good idea to add a field call bytea_format which is very specific, or add a container to hold more session variables like bytea_format, datestyle and etc. I prefer the later solution, but have no idea what's the type of variables' value should be. Although Expr can represent the value, I don't think it's ok to introduce Expr into here.

@MichaelScofield
Copy link
Collaborator Author

bytea_format is pg only, better not set in general query context. On the other hand, "timezone" is more general, ok to be put in the general query context.

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented Mar 19, 2024

query_context only pass variable value(It's not a good idea to access pgserverhandler directly in query_engine for such an easy feature). And value will finally be set in pg related session.

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented Mar 20, 2024

// src/session/src/context.rs
pub struct ParameterConfiguration {
    // The name of the parameter should be constant
    param_name: &'static str,
    value: Value,
}

pub struct QueryContext {
    ...
    // The configuration parameters are used to store the parameters that are set by the user
    #[builder(default)]
    configuration_parameters: Option<ParameterConfiguration>,
}

// context.rs
impl QueryContext {
    pub fn update_session(&self, session: Session) {
        ...
    }
}

// src/operator/src/statement.rs
fn set_configuration_parameters(exprs: Vec<Expr>, ctx: QueryContextRef) -> Result<()>{
    ... // just set the configuration_parameters field in ctx here
}

// src/session/src/lib.rs
pub struct Session{
    ...
    configuration_parameters: DashMap<&'static str, Value> // parameter name -> parameter value
}

impl Session {
    fn set_configuration_parameters(param: ParameterConfiguration){
        match self.conn_info.channel {
            Postgres => { /*invoke update function writen in postgres.rs*/ }
            Mysql => todo!()
        }
    }
}

Here's what I'm going to do to support both bytea_format and datestyle, I think by this design the actual impl of set_bytea, set_datestyle and etc will be confined in postgres crate. As Postgres and Mysql both have set statement, I think it's ok to add a general field to hold the session level parameters

@MichaelScofield
Copy link
Collaborator Author

Looking forward to your PR!

@sunng87
Copy link
Member

sunng87 commented Mar 22, 2024

@sunng87 can we use the "metadata" like this?

Sorry, just saw this. Sure we can use metadata to store postgres specific session variables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
3 participants