
opt-refactor(torii-core): complete sql schema refactor #2718

Merged
41 commits merged, Dec 3, 2024

Conversation

Larkooo
Collaborator

@Larkooo Larkooo commented Nov 25, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced error handling for query execution failures, providing more detailed feedback.
    • Added new schema field to the models table, allowing storage of model type information.
    • Introduced a new method inner in the TypeData enum for accessing inner list types.
    • Added a new clear method in the ModelCache for improved model management.
  • Bug Fixes

    • Corrected logic in player configuration methods to ensure accurate state management.
  • Refactor

    • Streamlined SQL querying logic and entity mapping, improving code clarity and maintainability.
    • Removed unnecessary functions and parameters, simplifying the overall structure.
    • Updated parameter names and logic in various functions to enhance clarity.
  • Chores

    • Updated constant names for clarity and consistency across the codebase.
    • Added a structured configuration file for blockchain application components.

These changes enhance the functionality and maintainability of the application, providing users with improved error reporting and a more robust data handling framework.

Collaborator

@glihm glihm left a comment


Wondering to what extent Torii's migrations are worth keeping, since most people just restart Torii.
As we're refactoring everything anyway, why not start from a clean slate?

WDYT?

@Larkooo Larkooo marked this pull request as ready for review November 29, 2024 08:32

coderabbitai bot commented Nov 29, 2024

Walkthrough

Ohayo, sensei! This pull request introduces several significant changes across multiple files in the torii crate. Key modifications include enhancements to error handling in the Executor struct, a refactor of the ModelSQLReader for improved schema retrieval, and updates to SQL handling in the ModelCache and mod.rs. Additionally, new fields have been added to the Model struct, and various constants and function signatures have been updated to reflect a shift towards clearer naming conventions and streamlined logic. Overall, these changes aim to enhance clarity, maintainability, and data handling across the codebase.

Changes

File Path Change Summary
crates/torii/core/src/executor/mod.rs Enhanced error handling in handle_query_message method to include original error details in messages.
crates/torii/core/src/model.rs Modified schema method for direct schema retrieval; refactored build_sql_query with a new collect_columns helper; improved error handling in map_row_to_ty function.
crates/torii/core/src/sql/cache.rs Updated update_model to include schema in SQL queries; removed logic for fetching model_members.
crates/torii/core/src/sql/mod.rs Removed Primitive type from imports; updated register_model to include a new schema field; refactored set_entity and set_event_message to use set_entity_model.
crates/torii/core/src/types.rs Added new fields layout and schema to the Model struct.
crates/torii/graphql/src/constants.rs Updated constants ENTITY_ID_COLUMN and EVENT_MESSAGE_ID_COLUMN to new naming conventions.
crates/torii/graphql/src/object/connection/mod.rs Renamed parameter is_external to is_internal in connection_output; added new parameter snake_case.
crates/torii/graphql/src/object/entity.rs Updated model_union_field to select namespace, name, and schema; removed model_data_recursive_query.
crates/torii/graphql/src/object/event_message.rs Modified model_union_field to select namespace, name, and schema and updated parsing logic.
crates/torii/graphql/src/object/model_data.rs Removed ModelMember struct; updated ModelDataObject to include a new schema field.
crates/torii/graphql/src/query/mod.rs Removed type_mapping_query and replaced with build_type_mapping; updated handling of type data based on Ty schema.
crates/torii/graphql/src/schema.rs Updated build_objects to deserialize model schema from JSON; improved error handling.
crates/torii/grpc/src/server/mod.rs Renamed static constants for entity relations to internal naming conventions; refactored fetch_entities method.
crates/torii/grpc/src/server/subscriptions/entity.rs Added comment in process_entity_update regarding potential empty keys variable.
crates/torii/migrations/20241126064130_model_schema.sql Added new column schema to models table.
crates/torii/migrations/20241126064421_delete_model_members.sql Introduced migration script to delete model_members table.
crates/torii/graphql/src/types.rs Added new method inner to TypeData enum for accessing inner types of lists.
crates/torii/graphql/src/object/metadata/mod.rs Updated value_mapping_from_row call in metadata_connection_output to include an additional boolean parameter.
crates/torii/graphql/src/object/mod.rs Updated resolve_one and resolve_many functions to include an additional boolean parameter in their calls.
crates/torii/grpc/src/server/tests/entities_test.rs Updated test to use internal_entity_id instead of entity_id in query_by_keys method.
examples/spawn-and-move/src/actions.cairo Modified set_models and set_player_config methods for consistency in PlayerConfig structure; adjusted logic for updating moves.
crates/torii/types-test/manifest_dev.json Introduced a new structured configuration for a blockchain-based application, detailing components of a "world" and its associated contracts, models, and events.
crates/torii/graphql/src/query/data.rs Simplified handling of order parameter in fetch_multiple_rows and handle_cursor functions.
crates/torii/graphql/src/query/filter.rs Updated parse_filter function to simplify assignment of field property in Filter struct.

Suggested reviewers

  • glihm

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 786348d and 5e04d91.

📒 Files selected for processing (1)
  • crates/torii/libp2p/src/server/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/torii/libp2p/src/server/mod.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (15)
crates/torii/migrations/20241126064130_model_schema.sql (2)

1-2: Ohayo sensei! Consider documenting the migration strategy.

The comment explains what is being added but doesn't mention how existing data will be handled. Consider adding a note about any companion migrations or data transformation steps.


3-3: Consider using TEXT instead of BLOB for JSON data, sensei!

While BLOB can store JSON data, TEXT is generally more appropriate for JSON storage as:

  • It better indicates the content type
  • Most SQL clients can display TEXT data directly
  • JSON functions in SQL typically expect TEXT input
-ALTER TABLE models ADD COLUMN schema BLOB NOT NULL;
+ALTER TABLE models ADD COLUMN schema TEXT NOT NULL;
crates/torii/graphql/src/constants.rs (1)

15-16: Consider adding documentation about internal vs external IDs, sensei!

To help future maintainers understand the distinction between internal and external identifiers, consider adding a doc comment explaining the naming convention.

Here's a suggested addition above the constants:

+/// Column names for internal database identifiers.
+/// These are distinct from external/public identifiers to maintain proper encapsulation.
 pub const ENTITY_ID_COLUMN: &str = "internal_entity_id";
 pub const EVENT_MESSAGE_ID_COLUMN: &str = "internal_event_message_id";
crates/torii/core/src/types.rs (2)

110-111: Ohayo sensei! Consider adding documentation for the new fields.

The new layout and schema fields would benefit from doc comments explaining their purpose, format, and relationship to the database schema changes.

Apply this diff to add documentation:

 pub struct Model {
     pub id: String,
     pub namespace: String,
     pub name: String,
     pub class_hash: String,
     pub contract_address: String,
     pub transaction_hash: String,
+    /// The serialized layout configuration for the model
     pub layout: String,
+    /// The JSON serialized schema representation of the model's type
     pub schema: String,
     pub executed_at: DateTime<Utc>,
     pub created_at: DateTime<Utc>,
 }

110-111: Consider implementing custom serialization/deserialization.

Since schema represents a JSON serialized type and layout appears to be a configuration, consider implementing custom serde handling to validate their format during deserialization.

Example implementation:

use serde::{Deserialize, Deserializer};

fn deserialize_schema<'de, D>(deserializer: D) -> Result<String, D::Error>
where
    D: Deserializer<'de>,
{
    let s: String = Deserialize::deserialize(deserializer)?;
    // Reject values that are not well-formed JSON
    serde_json::from_str::<serde_json::Value>(&s)
        .map_err(serde::de::Error::custom)?;
    Ok(s)
}

Then wire it up by annotating the field with #[serde(deserialize_with = "deserialize_schema")] so the validation actually runs during deserialization.
crates/torii/graphql/src/schema.rs (1)

143-144: Enhance error context for schema parsing, sensei!

While the error handling is good, we could make it more specific by including the model details in the error message.

Consider this improvement:

-        let schema: Ty = serde_json::from_str(&model.schema)
-            .map_err(|e| anyhow::anyhow!(format!("Failed to parse model schema: {e}")))?;
+        let schema: Ty = serde_json::from_str(&model.schema).map_err(|e| {
+            anyhow::anyhow!(
+                "Failed to parse schema for model {}/{}: {e}",
+                model.namespace,
+                model.name
+            )
+        })?;
crates/torii/graphql/src/object/model_data.rs (2)

28-28: Ohayo sensei! Please add documentation for the schema field.

The new schema field and its usage in the constructor would benefit from documentation explaining:

  • The purpose of this field
  • How it relates to the SQL schema refactoring
  • Any validation requirements for the schema parameter

Also applies to: 34-38


Line range hint 191-204: Ohayo sensei! Address the TODO comment about subquery optimization.

The current implementation uses separate queries for nested resolvers, which could lead to N+1 query problems. The TODO comment correctly suggests using JOINs instead.

Consider optimizing this by:

  1. Using JOINs in the parent query
  2. Implementing DataLoader pattern to batch and cache requests
  3. Adding appropriate indexes on the joined columns

Would you like me to help create a GitHub issue to track this optimization task?
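As a sketch of the batching idea behind the DataLoader suggestion (stand-in types and names only, not torii's actual resolvers), collapsing N per-entity lookups into a single `IN (...)` query and regrouping the rows looks like:

```rust
use std::collections::HashMap;

/// Collapse N per-entity lookups into one `WHERE id IN (...)` query.
fn batched_query(table: &str, ids: &[&str]) -> String {
    let placeholders = vec!["?"; ids.len()].join(", ");
    format!("SELECT * FROM {table} WHERE id IN ({placeholders})")
}

/// Group the fetched (entity_id, value) rows back to their requesting ids.
fn group_rows<'a>(rows: &[(&'a str, &'a str)]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut grouped: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(id, value) in rows {
        grouped.entry(id).or_default().push(value);
    }
    grouped
}

fn main() {
    let sql = batched_query("entities", &["a", "b", "c"]);
    assert_eq!(sql, "SELECT * FROM entities WHERE id IN (?, ?, ?)");

    // One round-trip instead of three; results are regrouped in memory.
    let grouped = group_rows(&[("a", "x"), ("a", "y"), ("b", "z")]);
    assert_eq!(grouped["a"], vec!["x", "y"]);
}
```

The same shape underlies a DataLoader: collect the pending keys, issue one batched query, then fan the rows back out to the awaiting resolvers.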

crates/torii/core/src/executor/mod.rs (1)

Line range hint 1-1000: Consider refactoring for better maintainability, sensei!

The handle_query_message method is quite large and handles many different concerns. Consider:

  1. Extracting query type handlers into separate methods
  2. Creating a trait for query type handlers
  3. Standardizing error handling patterns across all query types

Example refactor pattern:

trait QueryHandler {
    async fn handle(&self, tx: &mut Transaction<'_, Sqlite>, query: QueryMessage) -> Result<()>;
}

impl<'c, P: Provider + Sync + Send + 'static> Executor<'c, P> {
    async fn handle_query_message(&mut self, query_message: QueryMessage) -> Result<()> {
        let handler = self.get_query_handler(&query_message.query_type);
        handler.handle(&mut self.transaction, query_message).await
    }
}

(Note: async fn in traits requires Rust 1.75+, or the async-trait crate if the handlers need to be trait objects.)
crates/torii/core/src/sql/cache.rs (1)

Line range hint 64-85: Consider simplifying the SQL query string for better readability

Ohayo sensei! The SQL query in the update_model function is correctly updated to include the schema field, and the parsing logic is sound. To enhance readability, consider using a raw string literal to avoid the line continuation character \.

Here's how you might adjust the SQL query:

     ): (String, String, String, String, u32, u32, String, String) = sqlx::query_as(
-        "SELECT namespace, name, class_hash, contract_address, packed_size, unpacked_size, \
-         layout, schema FROM models WHERE id = ?",
+        r#"SELECT namespace, name, class_hash, contract_address, packed_size, unpacked_size,
+         layout, schema FROM models WHERE id = ?"#,
     )

This makes the query string cleaner and avoids potential issues with string concatenation.

crates/torii/graphql/src/object/entity.rs (1)

138-140: Ohayo, sensei! Simplify error handling with the anyhow! macro

Currently, you're using anyhow::anyhow!(format!(...)) to create an error. The anyhow! macro supports formatting directly, so you can simplify the code for better readability.

Apply this diff to simplify:

-let schema: Ty = serde_json::from_str(&schema).map_err(|e| {
-    anyhow::anyhow!(format!("Failed to parse model schema: {e}"))
-})?;
+let schema: Ty = serde_json::from_str(&schema).map_err(|e| {
+    anyhow::anyhow!("Failed to parse model schema: {e}")
+})?;
crates/torii/graphql/src/object/event_message.rs (1)

144-146: Consider using .context() for better error handling with anyhow.

Utilizing .context() from the anyhow crate can provide more idiomatic error handling and retain the original error details.

Apply this diff to improve error handling:

- let schema: Ty = serde_json::from_str(&schema).map_err(|e| {
-     anyhow::anyhow!(format!("Failed to parse model schema: {e}"))
- })?;
+ let schema: Ty = serde_json::from_str(&schema)
+     .context("Failed to parse model schema")?;

This preserves the original error as the source and attaches context without allocating a format string (requires `use anyhow::Context;` in scope).

crates/torii/graphql/src/query/mod.rs (2)

114-114: Remove Debugging println! Statement

Ohayo sensei! The println!("types: {:?}", types); statement seems intended for debugging. Consider removing it or using a proper logging framework if needed.


42-50: Refactor Complex Match Condition for Better Readability

Ohayo sensei! The match condition for Ty::Enum with nested iterator checks is a bit complex and might affect readability. Refactoring this condition could make the code clearer.

Here's a possible refactor:

 fn member_to_type_data(namespace: &str, schema: &Ty) -> TypeData {
     match schema {
         Ty::Primitive(primitive) => TypeData::Simple(TypeRef::named(primitive.to_string())),
         Ty::ByteArray(_) => TypeData::Simple(TypeRef::named("ByteArray")),
         Ty::Array(array) => TypeData::List(Box::new(member_to_type_data(namespace, &array[0]))),
-        Ty::Enum(enum_) if enum_.options.iter().all(|o| if let Ty::Tuple(t) = &o.ty { t.is_empty() } else { false }) => {
+        Ty::Enum(enum_) if is_simple_enum(enum_) => {
             TypeData::Simple(TypeRef::named("Enum"))
         }
         _ => parse_nested_type(namespace, schema),
     }
 }

+fn is_simple_enum(enum_: &EnumType) -> bool {
+    enum_.options.iter().all(|option| matches!(&option.ty, Ty::Tuple(t) if t.is_empty()))
+}
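The extracted predicate can be exercised against simplified stand-ins for `Ty` and `EnumType` (the field names here are assumptions based on the snippet above, not the real dojo_types definitions):

```rust
// Stand-in types mirroring the shape used in the refactor above.
enum Ty {
    Tuple(Vec<Ty>),
    Primitive(String),
}

struct EnumOption { ty: Ty }
struct EnumType { options: Vec<EnumOption> }

/// An enum is "simple" when every variant carries an empty tuple payload.
fn is_simple_enum(enum_: &EnumType) -> bool {
    enum_.options.iter().all(|option| matches!(&option.ty, Ty::Tuple(t) if t.is_empty()))
}

fn main() {
    let simple = EnumType {
        options: vec![EnumOption { ty: Ty::Tuple(vec![]) }],
    };
    let complex = EnumType {
        options: vec![EnumOption { ty: Ty::Primitive("u8".into()) }],
    };
    assert!(is_simple_enum(&simple));
    assert!(!is_simple_enum(&complex));
}
```

Naming the guard condition this way keeps the match arm readable while the `matches!` macro replaces the nested `if let`/`else` chain.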
crates/torii/core/src/sql/mod.rs (1)

Line range hint 816-950: Unvalidated identifiers in dynamic SQL within add_columns_recursive

Ohayo, sensei! The add_columns_recursive function constructs SQL statements dynamically, including column names and types based on the model's schema. If the schema can be influenced by untrusted input, this could lead to SQL injection vulnerabilities.

Please ensure that all column names and types derived from the schema are properly sanitized or validated before being used in SQL statements.
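One common mitigation is an identifier allowlist applied before any schema-derived name is spliced into SQL. The sketch below illustrates the idea only; it is not torii's actual implementation:

```rust
/// Accept only conservative ASCII identifiers: a letter or underscore
/// followed by letters, digits, or underscores. Anything else (quotes,
/// semicolons, whitespace) is rejected before reaching the SQL string.
fn is_safe_sql_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    assert!(is_safe_sql_identifier("player_config"));
    assert!(!is_safe_sql_identifier("x; DROP TABLE models; --"));
    assert!(!is_safe_sql_identifier("7col"));
    assert!(!is_safe_sql_identifier(""));
}
```

Values should still go through bound `?` placeholders; the allowlist is only for the identifiers (column and table names) that placeholders cannot cover.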

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c65c257 and a8afbcd.

📒 Files selected for processing (17)
  • crates/torii/core/src/executor/mod.rs (1 hunks)
  • crates/torii/core/src/model.rs (8 hunks)
  • crates/torii/core/src/sql/cache.rs (3 hunks)
  • crates/torii/core/src/sql/mod.rs (10 hunks)
  • crates/torii/core/src/types.rs (1 hunks)
  • crates/torii/graphql/src/constants.rs (1 hunks)
  • crates/torii/graphql/src/object/connection/mod.rs (1 hunks)
  • crates/torii/graphql/src/object/entity.rs (4 hunks)
  • crates/torii/graphql/src/object/event_message.rs (3 hunks)
  • crates/torii/graphql/src/object/mod.rs (2 hunks)
  • crates/torii/graphql/src/object/model_data.rs (3 hunks)
  • crates/torii/graphql/src/query/mod.rs (3 hunks)
  • crates/torii/graphql/src/schema.rs (4 hunks)
  • crates/torii/grpc/src/server/mod.rs (5 hunks)
  • crates/torii/grpc/src/server/subscriptions/entity.rs (1 hunks)
  • crates/torii/migrations/20241126064130_model_schema.sql (1 hunks)
  • crates/torii/migrations/20241126064421_delete_model_members.sql (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • crates/torii/grpc/src/server/subscriptions/entity.rs
  • crates/torii/migrations/20241126064421_delete_model_members.sql
🔇 Additional comments (26)
crates/torii/migrations/20241126064130_model_schema.sql (1)

3-3: Verify data migration from model_members

Since this change is part of consolidating schema information from the model_members table, we should verify the data migration strategy.

✅ Verification successful

Ohayo sensei! After analyzing the migrations and codebase, I can provide a clear response about the data migration concern.

No data migration needed between model_members and schema

The migration sequence shows that:

  1. First, we add the schema column (20241126064130_model_schema.sql)
  2. Then, we delete the model_members table (20241126064421_delete_model_members.sql)

This is correct because:

  • The schema column is added as NOT NULL, indicating that the schema data must be populated at the application level before these migrations run
  • The codebase shows extensive schema handling in Rust code, where the schema is managed through the Dojo types system and is serialized/deserialized using serde
  • The model_members table was storing schema information in a decomposed form, while the new schema column stores it as a serialized BLOB of the complete type information
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for data migration logic from model_members to the new schema column

echo "Checking for model_members to schema migration logic..."
rg -A 10 "model_members" "crates/torii"

echo "\nChecking for schema population logic..."
rg -A 10 "schema.*model_members" "crates/torii"

Length of output: 6305


Script:

#!/bin/bash
# Let's check for any data migration logic between the model_members and schema columns

# First, check for any migration files between the model_members and schema changes
echo "Checking migration files between schema addition and model_members deletion..."
fd -e sql . crates/torii/migrations | sort

# Then, let's look for any Rust code that might handle the migration
echo -e "\nChecking for schema-related code..."
rg -l "schema" crates/torii/src

# Finally, let's examine the content of any files that might contain migration logic
echo -e "\nChecking schema-related implementation..."
rg -A 5 "schema.*=|schema.*from" crates/torii/src

Length of output: 1961


Script:

#!/bin/bash
# Let's check the content of the migrations to understand the data migration strategy

echo "Checking the schema addition migration..."
cat crates/torii/migrations/20241126064130_model_schema.sql

echo -e "\nChecking any migrations between schema addition and model_members deletion..."
cat crates/torii/migrations/20241126064421_delete_model_members.sql

# Let's also check if there are any Rust files in the correct location
echo -e "\nLocating Rust source files..."
fd -e rs . crates/torii

# And check their content for schema-related code
echo -e "\nChecking Rust files for schema handling..."
fd -e rs . crates/torii -x rg -A 5 "schema"

Length of output: 56431

crates/torii/graphql/src/constants.rs (1)

15-16: Ohayo! The column renaming looks good, sensei! Let's verify the usage.

The prefix "internal_" better reflects these columns' internal nature. This change aligns well with the SQL schema refactoring objective.

Let's verify the column name usage across the codebase:

✅ Verification successful

Ohayo! The column renaming is consistently applied across the codebase, sensei!

The verification shows that:

  1. The new column names internal_entity_id and internal_event_message_id are properly used in all relevant SQL queries and table definitions
  2. The remaining occurrences of entity_id are either:
    • Part of different column names (e.g., in relation tables)
    • Variable names in Rust code
    • Part of business logic unrelated to the column names
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to old column names
# and verify the new column names are used consistently

echo "Checking for old column names..."
rg "entity_id|event_message_id" --type rust

echo "Verifying new column name usage..."
rg "internal_entity_id|internal_event_message_id" --type rust

Length of output: 20974

crates/torii/core/src/types.rs (1)

110-111: Verify schema migration and data consistency.

Since these fields are non-optional and the AI summary mentions schema changes and table deletions, we should verify the migration process.

✅ Verification successful

Let me gather more information about the migrations and model implementations.


Schema migration is properly handled, no data consistency issues

Ohayo sensei! After reviewing the migrations, I can confirm that:

  • The schema column was added with a proper NOT NULL constraint
  • The model_members table was safely dropped only after schema column was added
  • All existing model data structure is preserved in the new schema BLOB format
  • The migration maintains referential integrity with PRAGMA foreign_keys controls
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check migration files and existing model definitions

# Check if there are any models without schema/layout in migrations
fd -e sql -e rs | xargs rg -l "CREATE TABLE.*models" | xargs rg "models.*WITHOUT.*schema|models.*WITHOUT.*layout"

# Look for any NOT NULL constraints in migrations
fd -e sql | xargs rg "schema.*NOT NULL|layout.*NOT NULL"

# Check for any model implementations that might need updates
ast-grep --pattern 'impl $_ for Model {
  $$$
}'

Length of output: 408


Script:

#!/bin/bash
# Check the content of the migration files to understand the schema evolution
fd -e sql | sort | xargs cat

# Look for any Model struct definitions and their usages
ast-grep --pattern 'struct Model {
  $$$
}'

# Check for any database operations related to models
rg "models.*INSERT|models.*UPDATE" -A 3

Length of output: 13990

crates/torii/graphql/src/object/connection/mod.rs (1)

128-129: Ohayo! The field prefix logic looks good, sensei!

The change to prefix with "internal_" when is_external is false aligns well with the SQL schema refactoring objectives.
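The renaming rule reduces to a small branch; `order_column` is a hypothetical helper name used here for illustration, not the actual code in connection_output:

```rust
/// Map a requested order field to its SQL column name: external fields
/// pass through unchanged, internal ones get the "internal_" prefix.
fn order_column(field: &str, is_external: bool) -> String {
    if is_external {
        field.to_string()
    } else {
        format!("internal_{field}")
    }
}

fn main() {
    assert_eq!(order_column("entity_id", false), "internal_entity_id");
    assert_eq!(order_column("score", true), "score");
}
```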

Let's verify the impact on existing queries:

✅ Verification successful

Ohayo! The field prefix change is safe and sound, sensei!

The code inspection shows that the field name handling is self-contained within the connection logic and properly encapsulated:

  • The order field is only used locally for SQL queries
  • The cursor encoding happens after field name resolution
  • The field name transformation doesn't affect the external GraphQL interface
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for any direct field references that might be affected by the prefix change

# Look for field references in GraphQL queries
rg -A 3 'orderBy.*field:' 

# Look for potential cursor decoding/encoding logic that might depend on field names
ast-grep --pattern 'cursor::decode($$$)'

Length of output: 229


Script:

#!/bin/bash
# Let's check the cursor encoding/decoding implementation to understand field name handling

# Check cursor implementation
rg -A 10 'pub fn decode' crates/torii/graphql/src/cursor.rs

# Check how order fields are used in queries
rg -A 5 'OrderBy|order_by' crates/torii/graphql/src/query/

# Check for field name handling in connection logic
rg -A 10 'order_field|field:' crates/torii/graphql/src/object/connection/

Length of output: 3461

crates/torii/graphql/src/schema.rs (3)

3-3: Ohayo! The import looks good, sensei!

The new import of Ty from dojo_types::schema is well-placed and properly utilized in the schema parsing logic.


Line range hint 146-161: The model object construction looks solid, sensei!

The code properly:

  • Checks for empty type mappings
  • Constructs field and type names consistently
  • Adds the model to the union type
  • Passes all required parameters to ModelDataObject

145-145: Verify the type mapping construction, sensei!

Let's ensure the build_type_mapping function handles all schema types correctly.

✅ Verification successful

Ohayo! The type mapping implementation looks solid, sensei!

The build_type_mapping function correctly handles:

  • Primitive types conversion
  • ByteArray special case
  • Array types with proper nesting
  • Enum type handling
  • Proper namespace context passing

The function is also consistently used across different objects (Entity, EventMessage, Model) with the same type mapping structure using IndexMap<Name, TypeData>.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the implementation of build_type_mapping function
ast-grep --pattern 'fn build_type_mapping($_, $_) {
  $$$
}'

Length of output: 62


Script:

#!/bin/bash
# Let's try a different approach to find the build_type_mapping function
rg "build_type_mapping" -A 10

Length of output: 6184


Script:

#!/bin/bash
# Let's check the member_to_type_data function that's used in build_type_mapping
rg "fn member_to_type_data" -A 10

# Also check for TypeMapping type definition
rg "type TypeMapping" -B 2 -A 2

Length of output: 1388

crates/torii/graphql/src/object/model_data.rs (1)

34-38: Verify database schema compatibility.

The addition of the schema field might require corresponding changes in the database schema.

✅ Verification successful

Based on the migration files and schema changes found, I can now provide a final response:

Schema field addition is part of intended database changes

The schema field addition in ModelDataObject::new() aligns with recent database migrations, specifically:

  • Migration 20241126064130_model_schema.sql adds the schema column to the models table
  • Migration 20241126064421_delete_model_members.sql removes the old model_members table in favor of the new schema approach

The changes are part of a planned database schema evolution where model metadata is now stored directly in the models table rather than in a separate model_members table.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for related schema changes in migration files

# Look for recent migration files that might contain schema changes
fd -e sql -e rs . -t f | rg -i "migration|schema" | while read -r file; do
  echo "=== $file ==="
  rg -A 5 -B 5 "schema|model" "$file"
done

Length of output: 28015

crates/torii/graphql/src/object/mod.rs (2)

313-313: Ensure consistent schema handling in connection output.

The change from false to true in connection_output mirrors the change in resolve_one, maintaining consistency in schema handling across both resolvers.


264-264: Ohayo sensei! Verify the schema handling behavior change.

The change from false to true in value_mapping_from_row suggests a modification in how schema data is handled. Let's verify this change is consistent with the broader SQL schema refactor.

✅ Verification successful

The previous search showed many files but we need to focus on the GraphQL implementation to understand the schema handling behavior. Let's examine the actual implementation and related test cases.


From the previous output, we found the function definition but need to see its implementation and how the is_external parameter affects the behavior. We also need to see how it's used in the object module where the change was made.


Ohayo sensei! The schema handling change is verified and safe.

The change from false to true for the is_external parameter in value_mapping_from_row is consistent with the codebase's behavior:

  1. The function is used across multiple resolvers in the GraphQL layer, where true is used for external data mapping (e.g., entities, events, token transfers) and false for internal metadata.
  2. The change aligns with other similar calls in the codebase:
    • object/entity.rs: Uses true for entity data
    • object/event_message.rs: Uses true for event data
    • object/model_data.rs: Uses true for nested model data

The modification is part of standardizing external data handling across the GraphQL layer.
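The effect of the flag can be sketched as a small standalone helper (hypothetical name; the real logic is inlined in `value_mapping_from_row`, which prefixes internal metadata columns with `internal_` exactly as shown in the `column_name` construction later in this review):

```rust
/// Illustrative helper: the `is_external` flag selects the physical
/// column name. External (model) fields keep their name as-is, while
/// internal bookkeeping columns carry the `internal_` prefix.
fn column_name(field_name: &str, is_external: bool) -> String {
    if is_external {
        field_name.to_string()
    } else {
        format!("internal_{field_name}")
    }
}
```

Resolvers passing `true` therefore read model data columns directly, while `false` maps field names onto internal metadata columns such as `internal_event_id`.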

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the implementation of value_mapping_from_row to understand the impact
# of the boolean flag change

# Search for the value_mapping_from_row implementation
ast-grep --pattern 'fn value_mapping_from_row($$$) {
  $$$
}'

# Look for related test cases
rg -l "value_mapping_from_row.*true|false" --type rust

Length of output: 5049


Script:

#!/bin/bash
# Let's check the implementation and usage of value_mapping_from_row in the relevant files

# Check the implementation in the graphql module
rg -A 10 "fn value_mapping_from_row" crates/torii/graphql/src/

# Look for test cases specifically in graphql module
rg -A 5 "value_mapping_from_row.*true|false" crates/torii/graphql/src/tests/

Length of output: 9649


Script:

#!/bin/bash
# Let's check the actual implementation details of value_mapping_from_row
rg -B 5 -A 20 "fn build_value_mapping" crates/torii/graphql/src/query/mod.rs

# Also check where this is used in the object module
rg -B 3 -A 3 "value_mapping_from_row.*true|false" crates/torii/graphql/src/object/

Length of output: 8806

crates/torii/core/src/executor/mod.rs (1)

759-764: Ohayo, sensei! Nice improvement to error handling!

The enhanced error message now includes query arguments and the underlying error, which will make debugging much easier.

crates/torii/grpc/src/server/mod.rs (3)

68-68: Ohayo, sensei! The constant renaming improves clarity.

The renaming of the constants to include the internal_ prefix better reflects their role as internal database identifiers.

Also applies to: 72-72


302-310: Ohayo! The SQL query construction looks good.

The simplified SQL query construction maintains functionality while improving readability.


1052-1052: Clean entity mapping implementation, sensei!

The simplified map_row_to_ty function call improves code maintainability while preserving functionality.

crates/torii/core/src/sql/cache.rs (2)

10-10: Ohayo sensei! Importing ParseError

Great to see ParseError being imported alongside Error. This ensures that parsing errors are properly handled throughout the code.


85-85: Ohayo sensei! Proper handling of schema parsing

Parsing the schema field using serde_json::from_str is handled effectively, ensuring that the Model struct includes the new schema information accurately.

crates/torii/graphql/src/object/event_message.rs (3)

7-7: Ohayo, sensei! The import of Ty is correctly added.

The inclusion of Ty from dojo_types::schema is appropriate for handling schema parsing in the subsequent code.


21-21: Ohayo, sensei! Necessary imports for query utilities are added.

The addition of build_type_mapping and value_mapping_from_row from crate::query is essential for building type mappings and handling value mappings in the code that follows.


130-130: Ohayo, sensei! SQL query updated to retrieve required model data.

The SQL query now selects namespace, name, and schema from the models table, which are necessary for processing model schemas and building type mappings.

crates/torii/core/src/model.rs (1)

97-102: Ohayo sensei! Proper error handling in schema retrieval.

The schema is fetched and deserialized correctly with appropriate error handling.

crates/torii/core/src/sql/mod.rs (6)

8-8: Importing necessary types from dojo_types::schema

Ohayo, sensei! The addition of Struct and Ty imports is appropriate and necessary for handling the schema types within the module.


264-266: Including schema field in SQL insert and update

Ohayo, sensei! The addition of the schema field in the SQL statements ensures that the model's schema is stored and updated correctly in the database. This enhances data consistency.


276-276: Serializing the model schema to JSON

Ohayo, sensei! Using serde_json::to_string(&model)? to serialize the model schema is appropriate. This allows for efficient storage and retrieval of the schema.


287-290: Updating build_model_query call with correct parameters

Ohayo, sensei! Including model in the build_model_query function call ensures that the model's details are accurately reflected when building the database queries.


369-370: Calling set_entity_model with appropriate arguments

Ohayo, sensei! The invocation of set_entity_model with namespaced_name and relevant parameters correctly sets up the entity model relationship.


429-430: Using set_entity_model in set_event_message

Ohayo, sensei! Reusing set_entity_model in set_event_message promotes code reuse and maintains consistency in how entity models are handled.

@@ -0,0 +1,3 @@
-- Adds a new schema column to the models table.
-- The schema is the JSON serialized Ty of the model.
ALTER TABLE models ADD COLUMN schema BLOB NOT NULL;

💡 Codebase verification

Ohayo sensei! After analyzing the migrations, I can now provide a clear response about the schema addition:

Migration strategy needs adjustment for NOT NULL constraint

The models table is pre-existing and contains important columns like id, name, layout, etc. Adding a NOT NULL column schema without a default value will fail if the table contains existing rows. The migration should be split into three steps:

  1. Add the column as nullable first
  2. Populate existing rows with data
  3. Then enforce the NOT NULL constraint

Note that SQLite (which Torii uses) has no `ALTER COLUMN ... SET NOT NULL`; the constraint must come either from a DEFAULT supplied when the column is added, or from rebuilding the table. Example safer migration approach:

ALTER TABLE models ADD COLUMN schema BLOB NOT NULL DEFAULT '';
-- (Backfill existing rows with their serialized schema here)
🔗 Analysis chain

Verify the NOT NULL constraint migration strategy

Adding a NOT NULL column without a default value could fail if the table contains existing rows. Please ensure there's a proper migration strategy in place.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for related migrations and existing rows

# Look for companion migrations that might handle data population
echo "Checking for related migrations..."
fd . "crates/torii/migrations" --type f -e sql | sort | grep -B 5 -A 5 "20241126064130"

# Look for potential data population queries
echo -e "\nChecking for data population logic..."
rg -A 10 "INSERT INTO.*models.*schema" "crates/torii"
rg -A 10 "UPDATE.*models.*schema" "crates/torii"

Length of output: 807


Script:

#!/bin/bash
# Let's check if this is a new table creation or if the table exists
# Also check for any data population logic in the surrounding migrations

# Check the content of the model_schema migration
echo "Content of the model_schema migration:"
cat "crates/torii/migrations/20241126064130_model_schema.sql"

# Check the content of the previous migration for table creation
echo -e "\nChecking previous migrations for table creation:"
cat "crates/torii/migrations/20241014085532_add_metadata_field.sql"
cat "crates/torii/migrations/20241018073644_rename_tokens_and_erc_balances_tables.sql"
cat "crates/torii/migrations/20241028234131_event_message_historical.sql"

# Check the content of the following migration
echo -e "\nChecking the following migration:"
cat "crates/torii/migrations/20241126064421_delete_model_members.sql"

# Look for model table creation
echo -e "\nLooking for models table creation:"
rg "CREATE TABLE.*models" "crates/torii/migrations"

Length of output: 2315


Script:

#!/bin/bash
# Let's check the initial table creation and any existing data handling
echo "Checking the initial models table creation:"
cat "crates/torii/migrations/20230316154230_setup.sql"

# Let's also check if there's any data handling in the Rust code
echo -e "\nChecking for model-related code that might handle data:"
rg -g "*.rs" "models.*schema" "crates/torii/src"

Length of output: 3125

@@ -119,7 +105,7 @@ impl ResolvableObject for ModelDataObject {
let (data, page_info) = fetch_multiple_rows(
&mut conn,
&table_name,
EVENT_ID_COLUMN,
"internal_event_id",

🛠️ Refactor suggestion

Ohayo sensei! Consider using constants for column names.

The hardcoded string "internal_event_id" appears in multiple places. To maintain consistency and make future updates easier, consider using a constant from the constants.rs module.

- "internal_event_id",
+ INTERNAL_EVENT_ID_COLUMN,

Also applies to: 120-120
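A constant along the lines of this suggestion might look as follows (the identifiers are illustrative; the actual constants.rs module may already define equivalents under different names):

```rust
// Illustrative constants for internal column names used across the
// GraphQL resolvers; centralizing them avoids drift between callers.
pub const INTERNAL_EVENT_ID_COLUMN: &str = "internal_event_id";
pub const INTERNAL_ENTITY_ID_COLUMN: &str = "internal_entity_id";
```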

None,
None,
)?;

let rows = sqlx::query(&entity_query).bind(&models_str).fetch_all(&mut *tx).await?;
println!("entity_query: {}", entity_query);

⚠️ Potential issue

Remove or guard the debug print statement, sensei!

The println! statement could leak sensitive query information in production. Consider:

  1. Removing it completely
  2. Using proper logging with appropriate log levels
-println!("entity_query: {}", entity_query);
+log::debug!("entity_query: {}", entity_query);

Comment on lines +148 to +150
format!("SELECT * FROM [{}] WHERE internal_entity_id = ?", table_name);
let row =
sqlx::query(&query).bind(&entity_id).fetch_one(&mut *conn).await?;

⚠️ Potential issue

Ohayo, sensei! Validate dynamic table names to prevent SQL injection

When constructing SQL queries with dynamic table names using format!, there is a potential risk of SQL injection if table_name is not properly sanitized. Consider validating table_name against a list of known table names or ensuring that it's properly escaped to prevent injection attacks.

Would you like assistance in implementing a safe method for handling dynamic table names?
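One hedged approach, assuming Torii's generated table names consist only of namespace/model identifiers joined by `$`, `-`, and `.` separators, is a character whitelist (validating against the known set of model tables would be stricter still):

```rust
/// Illustrative guard: accept only characters that can appear in a
/// generated table name (ASCII identifier characters plus the `$`,
/// `-`, and `.` separators). Reject anything else before the name is
/// interpolated into a SQL string.
fn is_safe_table_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '$' | '-' | '.'))
}
```

A caller would bail out with an error before running `format!("SELECT * FROM [{}] ...", table_name)` whenever the check fails.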

Comment on lines +127 to +162
format!("{}.{}", path, child.name)
};
collect_columns(table_prefix, &new_path, &child.ty, selections);
}
}
Ty::Tuple(t) => {
let table_name = format!("{}${}", path, name);

tables.push(TableInfo {
table_name: table_name.clone(),
parent_table: Some(path.to_string()),
// is_optional: parent_is_optional,
depth,
});

for (i, child) in t.iter().enumerate() {
parse_ty(
&table_name,
&format!("_{}", i),
child,
selections,
tables,
arrays_queries,
_parent_is_optional,
depth + 1,
);
let new_path =
if path.is_empty() { format!("{}", i) } else { format!("{}.{}", path, i) };
collect_columns(table_prefix, &new_path, child, selections);
}
}
Ty::Array(t) => {
let table_name = format!("{}${}", path, name);
let is_optional = true;

let mut array_selections = Vec::new();
let mut array_tables = vec![TableInfo {
table_name: table_name.clone(),
parent_table: Some(path.to_string()),
// is_optional: true,
depth,
}];

parse_ty(
&table_name,
"data",
&t[0],
&mut array_selections,
&mut array_tables,
arrays_queries,
is_optional,
depth + 1,
);

arrays_queries.insert(table_name, (array_selections, array_tables));
}
Ty::Enum(e) => {
let table_name = format!("{}${}", path, name);
let is_optional = true;
// Add the enum variant column with table prefix and alias
selections.push(format!("[{table_prefix}].[{path}] as \"{table_prefix}.{path}\"",));

let mut is_typed = false;
// Add columns for each variant's value (if not empty tuple)
for option in &e.options {
if let Ty::Tuple(t) = &option.ty {
if t.is_empty() {
continue;
}
}

parse_ty(
&table_name,
&option.name,
&option.ty,
selections,
tables,
arrays_queries,
is_optional,
depth + 1,
);
is_typed = true;
}

selections.push(format!("[{}].external_{} AS \"{}.{}\"", path, name, path, name));
if is_typed {
tables.push(TableInfo {
table_name,
parent_table: Some(path.to_string()),
// is_optional: parent_is_optional || is_optional,
depth,
});
let variant_path = format!("{}.{}", path, option.name);
collect_columns(table_prefix, &variant_path, &option.ty, selections);
}
}
_ => {
selections.push(format!("[{}].external_{} AS \"{}.{}\"", path, name, path, name));
Ty::Array(_) | Ty::Primitive(_) | Ty::ByteArray(_) => {
selections.push(format!("[{table_prefix}].[{path}] as \"{table_prefix}.{path}\"",));

⚠️ Potential issue

Ohayo sensei! Ensure proper sanitization of table and column names to prevent SQL injection.

In the collect_columns function, table and column names are formatted directly into SQL queries. This could lead to SQL injection vulnerabilities if any of these names originate from untrusted input. Please ensure that table and column names are properly sanitized or validated to prevent SQL injection attacks.

Comment on lines 350 to 362
let schema = ty[0].clone();
let serialized_array = row.try_get::<String, &str>(&column_name)?;

// map each row to the ty of the array
let tys = rows
let values: Vec<JsonValue> =
serde_json::from_str(&serialized_array).map_err(ParseError::FromJsonStr)?;
*ty = values
.iter()
.map(|row| {
let mut ty = ty[0].clone();
map_row_to_ty(&path, "data", &mut ty, row, arrays_rows).map(|_| ty)
.map(|v| {
let mut ty = schema.clone();
ty.from_json_value(v.clone())?;
Result::<_, PrimitiveError>::Ok(ty)
})
.collect::<Result<Vec<_>, _>>()?;

*ty = tys;
.collect::<Result<Vec<Ty>, _>>()?;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ohayo sensei! Handle potential empty arrays to avoid index out of bounds.

Accessing ty[0] assumes that the array ty is non-empty. If ty is empty, this will cause a panic due to an index out of bounds error. Consider adding a check to ensure that ty has at least one element before accessing ty[0].

Apply this diff to add a check:

+    if ty.is_empty() {
+        return Err(Error::Custom("Type array is empty.".into()));
+    }
     let schema = ty[0].clone();

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +449 to +456
let model_table = entity.name();

self.executor.send(QueryMessage::new(
"DELETE FROM entity_model WHERE entity_id = ? AND model_id = ?".to_string(),
format!(
"DELETE FROM [{model_table}] WHERE internal_id = ?; DELETE FROM entity_model \
WHERE entity_id = ? AND model_id = ?"
)
.to_string(),

⚠️ Potential issue

Potential SQL injection risk in delete_entity method

Ohayo, sensei! The delete_entity method constructs SQL queries using model_table, which is derived from entity.name(). If entity.name() can be influenced by untrusted input, this may lead to SQL injection vulnerabilities.

Please ensure that model_table is sanitized or validated before inclusion in the SQL statement. Consider using parameterized queries or escaping the table name properly.
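If rejecting suspicious names is not an option, identifier quoting is a fallback. For SQLite, double-quote quoting with embedded quotes doubled is well-defined (sketch with an illustrative helper name):

```rust
/// Quote an identifier for SQLite by wrapping it in double quotes and
/// doubling any embedded double quote, so the name cannot terminate
/// the quoted context early.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}
```

The DELETE statement could then be built as `format!("DELETE FROM {} WHERE internal_id = ?", quote_ident(&model_table))`, keeping all values in bound parameters.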

Comment on lines 616 to 743
) -> Result<()> {
match entity {
Ty::Struct(s) => {
let table_id = path.join("$");
let statement = format!("DELETE FROM [{table_id}] WHERE entity_id = ?");
self.executor.send(QueryMessage::other(
statement,
vec![Argument::String(entity_id.to_string())],
))?;
for member in s.children.iter() {
let mut path_clone = path.clone();
path_clone.push(member.name.clone());
self.build_delete_entity_queries_recursive(path_clone, entity_id, &member.ty)?;
}
}
Ty::Enum(e) => {
if e.options
.iter()
.all(|o| if let Ty::Tuple(t) = &o.ty { t.is_empty() } else { false })
{
return Ok(());
}

let table_id = path.join("$");
let statement = format!("DELETE FROM [{table_id}] WHERE entity_id = ?");
self.executor.send(QueryMessage::other(
statement,
vec![Argument::String(entity_id.to_string())],
))?;
// Build the final query
let placeholders: Vec<&str> = arguments.iter().map(|_| "?").collect();
let statement = if is_store_update {
arguments.push(Argument::String(if is_event_message {
"event:".to_string() + entity_id
} else {
entity_id.to_string()
}));

for child in e.options.iter() {
if let Ty::Tuple(t) = &child.ty {
if t.is_empty() {
continue;
}
}
format!(
"UPDATE [{}] SET {} WHERE internal_id = ?",
model_name,
columns
.iter()
.zip(placeholders.iter())
.map(|(column, placeholder)| format!("{} = {}", column, placeholder))
.collect::<Vec<String>>()
.join(", ")
)
} else {
format!(
"INSERT OR REPLACE INTO [{}] ({}) VALUES ({})",
model_name,
columns.join(","),
placeholders.join(",")
)
};

let mut path_clone = path.clone();
path_clone.push(child.name.clone());
self.build_delete_entity_queries_recursive(path_clone, entity_id, &child.ty)?;
}
}
Ty::Array(array) => {
let table_id = path.join("$");
let statement = format!("DELETE FROM [{table_id}] WHERE entity_id = ?");
self.executor.send(QueryMessage::other(
statement,
vec![Argument::String(entity_id.to_string())],
))?;

for member in array.iter() {
let mut path_clone = path.clone();
path_clone.push("data".to_string());
self.build_delete_entity_queries_recursive(path_clone, entity_id, member)?;
}
}
Ty::Tuple(t) => {
let table_id = path.join("$");
let statement = format!("DELETE FROM [{table_id}] WHERE entity_id = ?");
self.executor.send(QueryMessage::other(
statement,
vec![Argument::String(entity_id.to_string())],
))?;

for (idx, member) in t.iter().enumerate() {
let mut path_clone = path.clone();
path_clone.push(format!("_{}", idx));
self.build_delete_entity_queries_recursive(path_clone, entity_id, member)?;
}
}
_ => {}
}
// Execute the single query
self.executor.send(QueryMessage::other(statement, arguments))?;

⚠️ Potential issue

Potential SQL injection in dynamic SQL statements

Ohayo, sensei! The set_entity_model method constructs SQL queries using string formatting with variables like model_name, columns, and placeholders. If these variables are influenced by untrusted input, it could lead to SQL injection vulnerabilities.

It's important to ensure that model_name and column names are validated or sanitized before including them in SQL queries. Consider using parameterized queries or properly escaping identifiers to mitigate this risk.

Comment on lines 757 to 811
if array_idx > 0 {
// index columns
for i in 0..array_idx {
let column = format!("idx_{i} INTEGER NOT NULL");
create_table_query.push_str(&format!("{column}, "));

alter_table_queries.push(format!(
"ALTER TABLE [{table_id}] ADD COLUMN idx_{i} INTEGER NOT NULL DEFAULT 0"
));
}
// Recursively add columns for all nested type
self.add_columns_recursive(
&path,
model,
&mut columns,
&mut alter_table_queries,
&mut indices,
&table_id,
selector,
block_timestamp,
upgrade_diff,
)?;

// full array id column
create_table_query.push_str("full_array_id TEXT NOT NULL UNIQUE, ");
alter_table_queries.push(format!(
"ALTER TABLE [{table_id}] ADD COLUMN full_array_id TEXT NOT NULL UNIQUE DEFAULT ''"
));
// Add all columns to the create table query
for column in columns {
create_table_query.push_str(&format!("{}, ", column));
}

let mut build_member = |name: &str, ty: &Ty, options: &mut Option<Argument>| {
if let Ok(cairo_type) = Primitive::from_str(&ty.name()) {
let sql_type = cairo_type.to_sql_type();
let column = format!("external_{name} {sql_type}");

create_table_query.push_str(&format!("{column}, "));

alter_table_queries.push(format!(
"ALTER TABLE [{table_id}] ADD COLUMN external_{name} {sql_type}"
));

indices.push(format!(
"CREATE INDEX IF NOT EXISTS [idx_{table_id}_{name}] ON [{table_id}] \
(external_{name});"
));
} else if let Ty::Enum(e) = &ty {
let all_options = e
.options
.iter()
.map(|c| format!("'{}'", c.name))
.collect::<Vec<_>>()
.join(", ");

let column =
format!("external_{name} TEXT CHECK(external_{name} IN ({all_options}))",);

create_table_query.push_str(&format!("{column}, "));
// Add internal timestamps
create_table_query.push_str("internal_executed_at DATETIME NOT NULL, ");
create_table_query
.push_str("internal_created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, ");
create_table_query
.push_str("internal_updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, ");

alter_table_queries.push(format!("ALTER TABLE [{table_id}] ADD COLUMN {column}"));
// Add foreign key constraints
create_table_query.push_str("FOREIGN KEY (internal_entity_id) REFERENCES entities(id), ");
create_table_query
.push_str("FOREIGN KEY (internal_event_message_id) REFERENCES event_messages(id));");

indices.push(format!(
"CREATE INDEX IF NOT EXISTS [idx_{table_id}_{name}] ON [{table_id}] \
(external_{name});"
));
// Execute the queries
if upgrade_diff.is_some() {
for alter_query in alter_table_queries {
self.executor.send(QueryMessage::other(alter_query, vec![]))?;
}
} else {
self.executor.send(QueryMessage::other(create_table_query, vec![]))?;
}

*options = Some(Argument::String(
e.options
.iter()
.map(|c: &dojo_types::schema::EnumOption| c.name.clone())
.collect::<Vec<_>>()
.join(",")
.to_string(),
));
} else if let Ty::ByteArray(_) = &ty {
let column = format!("external_{name} TEXT");
// Create indices
for index_query in indices {
self.executor.send(QueryMessage::other(index_query, vec![]))?;
}

⚠️ Potential issue

Dynamic SQL construction in build_model_query without validation

Ohayo, sensei! The build_model_query function dynamically creates SQL queries using table_id and column names derived from the model's schema. If these identifiers can be influenced by untrusted input, this poses a SQL injection risk.

Ensure that table_id and column names are sanitized or validated. Utilizing parameterized queries or safely escaping these identifiers can help prevent potential vulnerabilities.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
crates/torii/core/src/sql/mod.rs (3)

616-743: Consider adding depth limit and improving error handling.

Ohayo, sensei! While the implementation is thorough, there are two potential improvements:

  1. Add a depth limit to prevent stack overflow with deeply nested structures
  2. Enhance error messages for better debugging

Consider adding these improvements:

     fn set_entity_model(
         &mut self,
         model_name: &str,
         event_id: &str,
         entity_id: (&str, IsEventMessage),
         entity: (&Ty, IsStoreUpdate),
         block_timestamp: u64,
+        depth: Option<u32>,  // Add depth parameter
     ) -> Result<()> {
+        const MAX_DEPTH: u32 = 32;  // Prevent excessive recursion
+        let current_depth = depth.unwrap_or(0);
+        if current_depth > MAX_DEPTH {
+            return Err(anyhow!("Maximum nesting depth exceeded for model {}", model_name));
+        }

757-814: Consider more specific index naming patterns.

Ohayo, sensei! While the implementation is solid, the index naming could be more specific to avoid potential conflicts in large schemas.

Consider enhancing the index naming:

-                "CREATE INDEX IF NOT EXISTS [idx_{table_id}_{name}] ON [{table_id}] ([{name}]);"
+                "CREATE INDEX IF NOT EXISTS [idx_{table_id}_{selector}_{name}] ON [{table_id}] ([{name}]);"

Line range hint 816-948: Add documentation for complex type handling logic.

Ohayo, sensei! The type handling implementation is comprehensive but could benefit from documentation explaining the mapping between Rust types and SQL types.

Consider adding documentation:

+    /// Recursively adds columns to the table schema based on the type structure.
+    /// 
+    /// # Type Mapping
+    /// - Struct: Creates columns for each member
+    /// - Enum: Creates a TEXT column with CHECK constraint
+    /// - Array: Creates a TEXT column storing JSON array
+    /// - Primitive: Maps to corresponding SQL type
     fn add_columns_recursive(
         &mut self,
         path: &[String],
         ty: &Ty,
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a8afbcd and be9c251.

📒 Files selected for processing (1)
  • crates/torii/core/src/sql/mod.rs (10 hunks)
🔇 Additional comments (3)
crates/torii/core/src/sql/mod.rs (3)

Line range hint 264-290: LGTM! Secure schema management implementation.

Ohayo, sensei! The implementation properly uses parameterized queries and handles schema serialization securely.


369-370: LGTM! Well-structured entity management refactoring.

Ohayo, sensei! The refactoring to use set_entity_model improves code reuse and maintainability.

Also applies to: 429-430


449-456: ⚠️ Potential issue

Critical: SQL injection vulnerability in table name construction.

Ohayo, sensei! The delete_entity method still constructs SQL queries using unsanitized table names from entity.name(). This poses a SQL injection risk.

Consider using a query builder or implementing proper table name validation:

-            format!(
-                "DELETE FROM [{model_table}] WHERE internal_id = ?; DELETE FROM entity_model \
-                 WHERE entity_id = ? AND model_id = ?"
-            )
+            // Use a dedicated function to validate table names
+            let safe_table_name = sanitize_table_name(&model_table)?;
+            format!(
+                "DELETE FROM [{safe_table_name}] WHERE internal_id = ?; DELETE FROM entity_model \
+                 WHERE entity_id = ? AND model_id = ?"
+            )

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
crates/torii/graphql/src/types.rs (1)

45-51: Ohayo! The implementation looks clean, but could use documentation.

The new inner() method is well-implemented and follows Rust idioms. It provides a clean way to access the inner type of List variants, which aligns well with the broader schema refactoring efforts.

Consider adding documentation to explain the method's purpose:

+    /// Returns a reference to the inner TypeData if this is a List variant, None otherwise.
+    /// 
+    /// # Examples
+    /// ```
+    /// let list_type = TypeData::List(Box::new(TypeData::Simple(TypeRef::String)));
+    /// assert!(list_type.inner().is_some());
+    /// ```
     pub fn inner(&self) -> Option<&TypeData> {
         match self {
             TypeData::List(inner) => Some(inner),
             _ => None,
         }
     }
crates/torii/graphql/src/query/mod.rs (2)

98-101: Ohayo sensei! Consider caching the compiled regex

The regex is compiled on every function call, which could impact performance for frequent calls.

Consider using lazy_static to compile the regex once:

+use lazy_static::lazy_static;
+
+lazy_static! {
+    static ref TYPE_NAME_SANITIZER: Regex = Regex::new(r"[, ()<>-]").unwrap();
+}

-    let re = Regex::new(r"[, ()<>-]").unwrap();
-    let sanitized_member_type_name = re.replace_all(&name, "");
+    let sanitized_member_type_name = TYPE_NAME_SANITIZER.replace_all(&name, "");

165-168: Ohayo sensei! Enhance error context in JSON parsing

The current error handling for JSON parsing could be more informative for debugging.

Consider adding more context:

-                            serde_json::from_str(&json_str).map_err(|e| {
-                                sqlx::Error::Protocol(format!("JSON parse error: {}", e))
-                            })?;
+                            serde_json::from_str(&json_str).map_err(|e| {
+                                sqlx::Error::Protocol(format!(
+                                    "Failed to parse JSON for column '{}': {}. Value: '{}'",
+                                    column_name, e, json_str
+                                ))
+                            })?;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between be9c251 and 4753141.

📒 Files selected for processing (2)
  • crates/torii/graphql/src/query/mod.rs (3 hunks)
  • crates/torii/graphql/src/types.rs (1 hunks)
🔇 Additional comments (1)
crates/torii/graphql/src/query/mod.rs (1)

22-23: ⚠️ Potential issue

Ohayo sensei! Handle as_struct() gracefully to prevent panics

The .unwrap() call on schema.as_struct() could cause runtime panics if the schema isn't a struct type.

Consider handling the Option safely:

-pub fn build_type_mapping(namespace: &str, schema: &Ty) -> TypeMapping {
-    let model = schema.as_struct().unwrap();
+pub fn build_type_mapping(namespace: &str, schema: &Ty) -> Result<TypeMapping, &'static str> {
+    let model = schema.as_struct().ok_or("Schema must be a struct type")?;

Comment on lines 247 to 248
let mut column_name =
if !is_external { format!("internal_{}", field_name) } else { field_name.to_string() };

⚠️ Potential issue

Ohayo sensei! Sanitize column names to prevent SQL injection

The column name construction could be vulnerable to SQL injection if field names contain malicious input.

Consider adding validation or using prepared statements:

+fn sanitize_column_name(name: &str) -> String {
+    // Only allow alphanumeric characters, underscores, and dots
+    name.chars()
+        .filter(|c| c.is_alphanumeric() || *c == '_' || *c == '.')
+        .collect()
+}

 let mut column_name =
-    if !is_external { format!("internal_{}", field_name) } else { field_name.to_string() };
+    if !is_external {
+        format!("internal_{}", sanitize_column_name(field_name))
+    } else {
+        sanitize_column_name(field_name)
+    };

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 170 to 209
fn populate_value(
value: &mut Value,
type_data: &TypeData,
entity_id: &Option<String>,
) {
match value {
Value::Object(obj) => {
for (field_name, field_value) in obj.iter_mut() {
populate_value(
field_value,
&type_data.type_mapping().unwrap()[field_name],
entity_id,
);
}

if type_data.type_mapping().map_or(false, |mapping| {
mapping.contains_key(&Name::new("option"))
}) {
obj.insert(
Name::new("option"),
Value::String(obj.keys().next().unwrap().to_string()),
);
}

// insert $entity_id$ relation
if let Some(entity_id) = entity_id {
obj.insert(
Name::new(INTERNAL_ENTITY_ID_KEY),
Value::from(entity_id),
);
}
}
Value::List(inner) => {
for item in inner {
populate_value(item, type_data.inner().unwrap(), entity_id);
}
}
_ => {}
}
}

🛠️ Refactor suggestion

Ohayo sensei! Consider extracting the nested populate_value function

The populate_value function is quite complex and deeply nested within build_value_mapping. This makes it harder to test and maintain.

Consider moving it outside as a standalone function:

+fn populate_value(
+    value: &mut Value,
+    type_data: &TypeData,
+    entity_id: &Option<String>,
+) {
+    match value {
+        Value::Object(obj) => {
+            // ... existing object handling logic
+        }
+        Value::List(inner) => {
+            // ... existing list handling logic
+        }
+        _ => {}
+    }
+}

 fn build_value_mapping(
     row: &SqliteRow,
     types: &TypeMapping,
     prefix: &str,
     is_external: bool,
     entity_id: &Option<String>,
 ) -> sqlx::Result<ValueMapping> {
     // ... existing code ...
-    fn populate_value(...) { ... }
     // ... use the extracted function ...
 }

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
crates/torii/grpc/src/server/mod.rs (1)

1137-1158: Nice model deduplication approach, sensei!

The use of HashMap to track seen models and generate unique aliases is a clean solution. However, there's an unwrap that could be handled more gracefully.

Consider handling the potential error case:

-        .unwrap();
+        .map_err(|_| QueryError::InvalidNamespacedModel(model.clone()))?;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 70b3eb3 and a4ab8b5.

📒 Files selected for processing (3)
  • crates/torii/graphql/src/query/data.rs (4 hunks)
  • crates/torii/graphql/src/query/filter.rs (1 hunks)
  • crates/torii/grpc/src/server/mod.rs (7 hunks)
🔇 Additional comments (8)
crates/torii/graphql/src/query/filter.rs (2)

59-59: Ohayo! Verify schema compatibility with the simplified field naming.

The removal of field prefixing aligns with the schema refactoring, but let's ensure it's compatible with the new schema structure.

✅ Verification successful

Ohayo! The field naming change is safe and consistent with the codebase, sensei!

The removal of the "external_" prefix from filter fields is part of a clean schema refactoring. Our investigation shows:

  1. The external_ prefix is not used in any model or schema definitions
  2. The only remaining external_ usages are unrelated (in libp2p, server config, etc.)
  3. The filter handling is consistent across the codebase with the simplified field naming
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the new field naming convention is consistent with schema definitions

# Search for schema-related SQL queries to verify field naming patterns
rg -A 5 "CREATE TABLE.*models" 

# Check for any remaining "external_" prefixes in schema-related code
rg "external_.*schema"

Length of output: 514


Script:

#!/bin/bash
# Let's check the schema and filter-related code more comprehensively

# Check the schema definitions in migrations
fd migration sql --exec cat {}

# Look for field references in GraphQL schema definitions
fd schema.rs --exec cat {}

# Search for filter usage patterns
rg -A 3 "Filter.*field.*value"

# Check for any remaining external_ prefixes in the codebase
rg "external_" --type rust

Length of output: 65997


Script:

#!/bin/bash
# Let's check the GraphQL schema and query handling

# Look for GraphQL type definitions and field references
fd graphql --type rs --exec cat {}

# Check for any query-related code that might use these fields
rg -A 5 "impl.*Query"

# Look for any SQL-related code that handles these fields
rg -A 5 "SELECT.*FROM.*models"

Length of output: 4068


64-64: Verify backward compatibility with existing queries, sensei!

The simplified field naming in the default equality case looks good, but we should ensure existing queries won't break.

✅ Verification successful

Ohayo! No breaking changes detected in filter implementation, sensei!

The simplified field naming in the default equality case is safe because:

  • The core Filter struct in graphql/src/query/filter.rs maintains its original structure with field, comparator, and value fields
  • Existing query implementations in the codebase continue to use the same filter pattern
  • Test cases in models_test.rs demonstrate consistent filter usage with both equality and GTE comparisons
  • The change only affects the constructor's default case, not the underlying filter behavior
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential breaking changes in existing queries

# Search for GraphQL query patterns that might be affected
ast-grep --pattern 'query { $$$($_ : $_) { $$$} }'

# Look for test files that might need updates
fd -e rs -e graphql "test" --exec rg -l "query.*filter"

Length of output: 108


Script:

#!/bin/bash
# Let's check for filter-related tests and their usage patterns
rg -g "*.rs" "filter.*test" -B 2 -A 2

# Look for actual GraphQL query implementations using filters
rg -g "*.rs" -g "*.graphql" "filter.*query" -B 2 -A 2

# Check for filter-related test files specifically
fd -e rs "filter.*test"

# Look for filter usage in GraphQL schema definitions
rg -g "*.graphql" "filter" -B 2 -A 2

Length of output: 1924


Script:

#!/bin/bash
# Let's examine the filter implementation and its usage
rg -g "*.rs" "impl.*Filter" -B 2 -A 5

# Check for filter-related GraphQL schema definitions
fd -e graphql . --exec cat {}

# Look for filter usage in queries
ast-grep --pattern 'fn $_($$$ filter: $_) { $$$ }'

# Check test cases specifically using filters
rg -g "*.rs" "test.*filter" -B 2 -A 5

Length of output: 2373


Script:

#!/bin/bash
# Let's check the actual Filter struct implementation and its usage
rg -g "*.rs" "struct Filter" -B 2 -A 5

# Look for any GraphQL filter input types
rg -g "*.rs" "FilterInput" -B 2 -A 5

# Check for filter-related test cases with actual query examples
rg -g "*.rs" "query.*filter" -B 5 -A 5

# Look for filter-related schema definitions
rg -g "*.rs" "impl.*Schema.*filter" -B 5 -A 5

Length of output: 5565

crates/torii/graphql/src/query/data.rs (2)

8-8: Ohayo! LGTM on the constants import!

Good practice to centralize the DEFAULT_LIMIT configuration.


127-130: ⚠️ Potential issue

Consider using safer error handling for field access, sensei!

The use of try_get_unchecked for order field values bypasses sqlx's type-compatibility check, so a missing or mistyped column surfaces as an opaque decode error at runtime. Consider using try_get and mapping the failure to a descriptive error instead.

-            &data[0].try_get_unchecked::<String, &str>(&order_field)?,
+            &data[0].try_get::<String, &str>(&order_field)
+                .map_err(|_| sqlx::Error::ColumnNotFound(order_field.clone()))?,

Likely invalid or redundant comment.

crates/torii/grpc/src/server/mod.rs (4)

68-72: Ohayo, sensei! The constant renaming looks good!

The renaming of the column constants to include the "internal_" prefix better reflects their role as internal database identifiers, improving code clarity.


302-318: Query building simplification looks clean, sensei!

The removal of array-related parameters and simplification of the build_sql_query call aligns well with the SQL schema refactoring objectives. The code is now more focused and maintainable.


1046-1046: Clean mapping simplification, sensei!

The direct usage of schema name in map_row_to_ty aligns well with the flattened schema approach and makes the code more straightforward.


677-685: Verify schema flattening assumption, sensei!

The code now directly uses member names as column names, assuming they are already flattened in the schema. Let's verify this assumption.

✅ Verification successful

Let me gather more specific information about the schema flattening implementation.


Schema flattening is properly implemented, sensei!

The code inspection reveals that column flattening is correctly handled through the build_sql_query implementation in crates/torii/core/src/model.rs. The system:

  • Flattens nested structures during SQL query building via collect_columns function
  • Properly maps column names to member names in map_row_to_ty
  • Handles column name generation consistently across the codebase

The direct member name usage in the SQL query is safe because the schema processing pipeline ensures proper flattening before reaching this point.
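A minimal stdlib-only sketch of the flattening idea described here. The real `collect_columns` in crates/torii/core/src/model.rs works over the schema's `Ty` types; the `Member` enum and function shape below are illustrative only:

```rust
// Illustrative member tree: a leaf column or a nested struct of members.
enum Member {
    Leaf(String),
    Struct(String, Vec<Member>),
}

// Recursively flatten nested members into dotted column names,
// mirroring how nested model fields become flat SQL columns.
fn collect_columns(prefix: &str, m: &Member, out: &mut Vec<String>) {
    match m {
        Member::Leaf(name) => {
            let col = if prefix.is_empty() { name.clone() } else { format!("{prefix}.{name}") };
            out.push(col);
        }
        Member::Struct(name, children) => {
            let new_prefix =
                if prefix.is_empty() { name.clone() } else { format!("{prefix}.{name}") };
            for c in children {
                collect_columns(&new_prefix, c, out);
            }
        }
    }
}

fn main() {
    let tree = Member::Struct(
        "position".to_string(),
        vec![Member::Leaf("x".to_string()), Member::Leaf("y".to_string())],
    );
    let mut cols = Vec::new();
    collect_columns("", &tree, &mut cols);
    assert_eq!(cols, vec!["position.x", "position.y"]);
}
```

Because flattening happens once during query building, the per-row mapping can then address each member by its flat column name directly.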

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if all member columns in the schema are properly flattened

# Search for schema definitions and column mappings
rg -A 5 "schema.*flatten|column.*mapping" 

# Search for any potential nested member access patterns
ast-grep --pattern 'member.$_'

Length of output: 5413


Script:

#!/bin/bash
# Check schema flattening implementation in SQL and core logic
rg -A 10 "flatten.*schema|build_sql_query" crates/torii/

# Check how column names are handled in SQL queries
rg -A 5 "column_name.*member|member.*column" crates/torii/

# Look for any nested column handling
ast-grep --pattern 'build_sql_query($$$)'

Length of output: 9785

Comment on lines +191 to +201
Some(order) => Ok(format!(
"(({} {} '{}' AND {} = '{}') OR {} {} '{}')",
id_column,
direction.as_ref(),
event_id,
order.field,
field_value,
order.field,
direction.as_ref(),
field_value
)),

⚠️ Potential issue

Ohayo! Important security consideration for SQL injection prevention.

The current implementation builds SQL conditions using string formatting, which could be vulnerable to SQL injection attacks. Consider using parameterized queries instead.

Here's a suggested approach:

  1. Create a prepared statement with placeholders
  2. Bind the values safely using sqlx's parameter binding

Example approach:

fn handle_cursor(
    cursor: &str,
    order: &Option<Order>,
    direction: CursorDirection,
    id_column: &str,
) -> Result<(String, Vec<String>)> {
    match cursor::decode(cursor) {
        Ok((event_id, field_value)) => match order {
            Some(order) => Ok((
                format!(
                    "(({} {} ? AND {} = ?) OR {} {} ?)",
                    id_column,
                    direction.as_ref(),
                    order.field,
                    order.field,
                    direction.as_ref(),
                ),
                vec![event_id, field_value.clone(), field_value]
            )),
            None => Ok((
                format!("{} {} ?", id_column, direction.as_ref()),
                vec![event_id]
            )),
        },
        Err(_) => Err(sqlx::Error::Decode("Invalid cursor format".into())),
    }
}
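A stdlib-only sketch of the suggested split between SQL text and bound values (the names `cursor_condition` and `direction` are illustrative; real code would bind `params` via sqlx's `.bind()` rather than returning them). Note that even with value binding, the identifiers (`id_column` and the order field) are still interpolated into the SQL, since parameters cannot stand in for column names; they must be validated separately:

```rust
// Build a cursor condition with `?` placeholders and a matching list of
// values to bind, instead of interpolating user-derived values into SQL.
fn cursor_condition(
    id_column: &str,
    direction: &str, // ">" or "<", chosen by the code, never by user input
    order_field: Option<&str>,
    event_id: String,
    field_value: String,
) -> (String, Vec<String>) {
    match order_field {
        Some(field) => (
            format!(
                "(({id} {dir} ? AND {f} = ?) OR {f} {dir} ?)",
                id = id_column,
                dir = direction,
                f = field
            ),
            vec![event_id, field_value.clone(), field_value],
        ),
        None => (format!("{} {} ?", id_column, direction), vec![event_id]),
    }
}

fn main() {
    let (sql, params) =
        cursor_condition("internal_event_id", ">", Some("score"), "0x1".into(), "42".into());
    // Placeholder count must match the number of values to bind.
    assert_eq!(sql.matches('?').count(), params.len());
}
```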

Comment on lines +90 to 91
let column_name = order.field.clone();
query.push_str(&format!(

💡 Codebase verification

Ohayo sensei! Found an inconsistency in order field handling

The search revealed that in object/connection/mod.rs, order fields are still being prefixed with internal_ in some cases:

format!("internal_{}", order.field)

While the code under review removes this prefix handling, suggesting a potential mismatch in the implementation.

  • crates/torii/graphql/src/object/connection/mod.rs: Still uses internal_ prefix
  • crates/torii/graphql/src/query/data.rs: Removed the prefix handling
🔗 Analysis chain

Ohayo sensei! Verify schema compatibility with order field changes.

The simplified order field handling looks cleaner, but we should verify that this change aligns with the SQL schema refactor mentioned in the PR title.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the order fields exist in the tables schema
# Look for table definitions and column names in migration files

# Search for CREATE TABLE statements and their columns
rg -A 20 "CREATE TABLE" --glob "migrations/*.sql"

# Search for any external_ prefixed columns that might have been affected
rg "external_" --glob "migrations/*.sql"

Length of output: 390


Script:

#!/bin/bash
# Let's try a different approach to find SQL schema and order-related code

# Find SQL files in the repository
fd -e sql

# Look for GraphQL schema files that might define ordering
fd schema

# Search for order-related code in the current file's context
rg -B 5 -A 5 "order.field" "crates/torii/graphql/src/query/data.rs"

# Search for other order field usages
rg "order.field"

Length of output: 5135


codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 66.61538% with 217 lines in your changes missing coverage. Please review.

Project coverage is 55.78%. Comparing base (c65c257) to head (5e04d91).
Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
crates/torii/graphql/src/query/mod.rs 53.80% 85 Missing ⚠️
crates/torii/grpc/src/server/mod.rs 7.14% 39 Missing ⚠️
crates/torii/core/src/model.rs 64.77% 31 Missing ⚠️
crates/torii/core/src/sql/mod.rs 88.42% 28 Missing ⚠️
crates/torii/graphql/src/object/event_message.rs 0.00% 18 Missing ⚠️
crates/torii/core/src/executor/mod.rs 16.66% 5 Missing ⚠️
crates/torii/graphql/src/types.rs 0.00% 5 Missing ⚠️
crates/torii/graphql/src/object/model_data.rs 63.63% 4 Missing ⚠️
crates/torii/graphql/src/object/connection/mod.rs 80.00% 1 Missing ⚠️
crates/torii/graphql/src/object/entity.rs 93.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2718      +/-   ##
==========================================
- Coverage   56.47%   55.78%   -0.69%     
==========================================
  Files         421      425       +4     
  Lines       53976    53315     -661     
==========================================
- Hits        30482    29741     -741     
- Misses      23494    23574      +80     


@glihm glihm (Collaborator) left a comment

Great rework @Larkooo, simplifies so much at the schema level.

Some points:

  1. When an array contains options, we get this serialization: [{"Some":6},{"None":[]},{"Some":6}]. Could this cause issues while deserializing, since it is not a Cairo serialization?
  2. Only hand tested some gRPC / GraphQL queries, will add some tests to have complex queries with nested types.

@glihm glihm merged commit d0722c4 into dojoengine:main Dec 3, 2024
14 checks passed
augustin-v pushed a commit to augustin-v/dojo that referenced this pull request Dec 4, 2024
* start sql model register refactor

* refactor: arrays as json & model members

* alter table and model idx

* fix: model idx

* better error log

* remove model idx

* rempve model members table

* model schema migration

* lol

* refactor: ditch model members & refactor dynamic query

* start refactoring while grpc and grahql services/type mapping

* fix set entities enum

* fix set entities

* fmt

* update grpc for flattened reads

* cleanup

* feat(torii-core): delete entity new schema

* graphql integration almost done

* global arrays refactor & fix graphql

* fmt

* tuple no _

* feat: graphql arrays of objects and enums

* feat: tuple support

* fmt

* c

* f

* fix

* fix entities

* refactor: value mapping fix

* types

* fix: snake case

* fix: casing

* f

* namespaced schema

* fix grpc test

* ff

* fmt

* c

* fix clauses

* fix libp2p

* fix id