Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: using "id" (instead of its "name") to find the table #1525

Closed
MichaelScofield opened this issue May 5, 2023 · 9 comments
Closed
Assignees
Labels
C-enhancement Category Enhancements
Milestone

Comments

@MichaelScofield
Copy link
Collaborator

What type of enhancement is this?

Tech debt reduction

What does the enhancement do?

Now we are using table names to find the tables in catalog, which could cause potential bugs after the table got renamed.

On the other hand, the table id is globally unique for each table, we should use that as far as possible.

Implementation challenges

There are too many places we are using table names to find them. Some places could be changed to table ids, some couldn't(and shouldn't). How to best recognize that and set a refactoring guideline is very challenging I think.

@MichaelScofield MichaelScofield added the C-enhancement Category Enhancements label May 5, 2023
@evenyag
Copy link
Contributor

evenyag commented May 5, 2023

The table engine also has similar issues. Actually, we should use the table id to access the table.

async fn alter_table(
&self,
ctx: &EngineContext,
request: AlterTableRequest,
) -> Result<TableRef>;
/// Returns the table by it's name.
fn get_table(
&self,
ctx: &EngineContext,
table_ref: &TableReference,
) -> Result<Option<TableRef>>;
/// Returns true when the given table is exists.
fn table_exists(&self, ctx: &EngineContext, table_ref: &TableReference) -> bool;
/// Drops the given table. Return true if the table is dropped, or false if the table doesn't exist.
async fn drop_table(&self, ctx: &EngineContext, request: DropTableRequest) -> Result<bool>;

@killme2008 killme2008 added this to the v0.3 milestone May 8, 2023
@killme2008 killme2008 self-assigned this May 8, 2023
@killme2008
Copy link
Contributor

The table name to table id transform must be done at catalog level, because the users only have the table name. After that, all the references to the tables should use table ids.

@DevilExileSu
Copy link
Contributor

pub struct MemoryCatalogManager {
/// Collection of catalogs containing schemas and ultimately Tables
pub catalogs: RwLock<HashMap<String, SchemaEntries>>,
pub table_id: AtomicU32,
}

async fn table(
&self,
catalog: &str,
schema: &str,
table_name: &str,
) -> Result<Option<TableRef>>;

Many places need to look up tables using catalog, schema, and table_name. Can we add an RwLock<HashMap<u32, TableRef>> to enable table retrieval based on table_id?

This way, the requests sent to the datanode can include only the table_id.

@MichaelScofield
Copy link
Collaborator Author

@DevilExileSu For MemoryCatalogManager, I think we can add the map to it. Also you need another map Map<TableName, TableId> to find table id from its name. Sometimes you just only have the table name at you hand at the first place, for example, parsing the SQL.

@evenyag
Copy link
Contributor

evenyag commented Jul 20, 2023

Also you need another map Map<TableName, TableId> to find table id from its name.

@MichaelScofield We could retrieve the table id from TableRef. If we find out the TableRef from the catalog via catalog.schema.table, then we have the table id. Maybe we don't need this map.

Once #1869 is done, I think we don't need to send catalog, schema, and table name to datanodes.

@DevilExileSu
Copy link
Contributor

Also you need another map Map<TableName, TableId> to find table id from its name. Sometimes you just only have the table name at you hand at the first place, for example, parsing the SQL.

Yes, I think that pub catalogs: RwLock<HashMap<String, SchemaEntries>> already includes the functionality of looking up TableRef and TableId by TableName, so we may not need Map<TableName, TableId>.

@DevilExileSu
Copy link
Contributor

@MichaelScofield @killme2008 If you don't mind, I would like to give it a try.

@MichaelScofield
Copy link
Collaborator Author

@killme2008 are you working on this?

@MichaelScofield
Copy link
Collaborator Author

Now we are fully turned to region server, no table name involved in read and write other than necessary table finding for user facing interface. I think this issue is no longer valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements
Projects
None yet
Development

No branches or pull requests

4 participants