Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: performance improvement by lazy python #4613

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

benfdking
Copy link
Contributor

@benfdking benfdking commented Jan 15, 2025

  • make comments and texts getters lazy
  • adds optimizations to extract_string of when to compute things

@benfdking
Copy link
Contributor Author

While this makes the tokenizer much quicker, I think it would be good to benchmark slightly higher level to make sure we aren't seeing the slow down in other calls upstream.

@tobymao
Copy link
Owner

tobymao commented Jan 15, 2025

can you run the regular benchmarks?

@benfdking benfdking force-pushed the making_get_comments_lazy branch from 0825a93 to 1e8ece5 Compare January 15, 2025 01:22
@benfdking
Copy link
Contributor Author

Struggling when running python benchmarks/bench.py, getting

'Tokenizer' object has no attribute '_rs_dialect_settings'
'Tokenizer' object has no attribute '_rs_dialect_settings'
'Tokenizer' object has no attribute '_rs_dialect_settings'
'Tokenizer' object has no attribute '_rs_dialect_settings'
Traceback (most recent call last):
  File "/Users/benjaminking/Developer/sqlglot/benchmarks/bench.py", line 226, in <module>
    {k: v if v == "Query" else str(row[k])[0:7] + diff(row, k) for k, v in row.items()}
                                                  ^^^^^^^^^^^^
  File "/Users/benjaminking/Developer/sqlglot/benchmarks/bench.py", line 199, in diff
    return f" ({str(column / row['sqlglot'])[0:5]})"
                    ~~~~~~~^~~~~~~~~~~~~~~~
TypeError: ufunc 'divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Will have to be something for tomorrow.

@benfdking benfdking force-pushed the making_get_comments_lazy branch from 1e8ece5 to 4efe27f Compare January 17, 2025 13:01
- make comments and texts getters
@benfdking benfdking force-pushed the making_get_comments_lazy branch from 4efe27f to 1f9e2ec Compare January 17, 2025 13:04

#[getter(token_type)]
fn token_type(&self, py: Python) -> PyObject {
match &self.token_type {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this optional?

@benfdking benfdking force-pushed the making_get_comments_lazy branch 2 times, most recently from e7fd582 to b98ff12 Compare January 17, 2025 17:55
@benfdking benfdking force-pushed the making_get_comments_lazy branch from b98ff12 to d7b2c17 Compare January 17, 2025 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants