-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(torii): sql proxy endpoint for querying #2706
Conversation
WalkthroughOhayo, sensei! This pull request introduces significant changes to the database connection handling in the Changes
Possibly related PRs
Suggested reviewers
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
crates/torii/server/Cargo.toml (1)
32-32
: Ohayo sensei! Consider using workspace-level version management.The new dependency doesn't follow the workspace pattern used by other dependencies in this file. For consistency, consider moving the version specification to the workspace level.
-form_urlencoded = "1.2.1" +form_urlencoded.workspace = true
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
bin/torii/src/main.rs
(5 hunks)crates/torii/server/Cargo.toml
(1 hunks)crates/torii/server/src/proxy.rs
(8 hunks)
🔇 Additional comments (7)
crates/torii/server/Cargo.toml (1)
32-32
: 🛠️ Refactor suggestion
Consider using semantic versioning for security updates.
Using an exact version ("1.2.1") might prevent receiving important security updates. Consider using semantic versioning instead:
-form_urlencoded = "1.2.1"
+form_urlencoded = "1.2"
bin/torii/src/main.rs (6)
102-107
: Verify the read-only connection pool setup
Ohayo sensei! Please verify that the read-only connection pool is correctly configured. Ensure that creating readonly_options
with options.read_only(true)
does not mutate the original options
, and that both pools operate independently as intended. Also, consider if the max_connections(100)
setting for the read-only pool is appropriate for the expected database load.
130-130
: Confirm ModelCache
compatibility with read-only pool
Ohayo sensei! Please ensure that the ModelCache
works correctly with the read-only connection pool. If it needs to perform write operations, it should use the writable pool instead.
176-176
: Verify torii_grpc::server::new
uses read-only pool appropriately
Ohayo sensei! Please confirm that torii_grpc::server::new
functions correctly with a read-only database connection. If it requires write capabilities for any of its operations, consider using the writable pool.
191-196
: Ensure artifacts_server
operates with read-only pool
Ohayo sensei! Verify that the artifacts_server
can operate correctly using the read-only connection pool. If it needs to write to the database, please provide it with the writable pool.
217-217
: Confirm Proxy
functionality with read-only pool
Ohayo sensei! Please confirm that the Proxy
server does not perform any write operations to the database and that it can function properly with the read-only pool.
222-222
: Check GraphQL server's compatibility with read-only pool
Ohayo sensei! Ensure that the GraphQL server, instantiated in spawn_rebuilding_graphql_server
, can operate with a read-only database connection. If it requires write access, consider passing the writable pool instead.
crates/torii/server/src/proxy.rs
Outdated
if req.uri().path().starts_with("/sql") { | ||
let query = if req.method() == Method::GET { | ||
// Get the query from URL parameters | ||
let params = req.uri().query().unwrap_or_default(); | ||
form_urlencoded::parse(params.as_bytes()) | ||
.find(|(key, _)| key == "q") | ||
.map(|(_, value)| value.to_string()) | ||
.unwrap_or_default() | ||
} else if req.method() == Method::POST { | ||
// Get the query from request body | ||
let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap_or_default(); | ||
String::from_utf8(body_bytes.to_vec()).unwrap_or_default() | ||
} else { | ||
return Ok(Response::builder() | ||
.status(StatusCode::METHOD_NOT_ALLOWED) | ||
.body(Body::from("Only GET and POST methods are allowed")) | ||
.unwrap()); | ||
}; | ||
|
||
// Execute the query in a read-only transaction | ||
return match sqlx::query(&query).fetch_all(&*pool).await { | ||
Ok(rows) => { | ||
let result: Vec<_> = rows | ||
.iter() | ||
.map(|row| { | ||
let mut obj = serde_json::Map::new(); | ||
for (i, column) in row.columns().iter().enumerate() { | ||
let value: serde_json::Value = match column.type_info().name() { | ||
"TEXT" => row | ||
.get::<Option<String>, _>(i) | ||
.map_or(serde_json::Value::Null, serde_json::Value::String), | ||
"INTEGER" => row | ||
.get::<Option<i64>, _>(i) | ||
.map_or(serde_json::Value::Null, |n| { | ||
serde_json::Value::Number(n.into()) | ||
}), | ||
"REAL" => row.get::<Option<f64>, _>(i).map_or( | ||
serde_json::Value::Null, | ||
|f| { | ||
serde_json::Number::from_f64(f).map_or( | ||
serde_json::Value::Null, | ||
serde_json::Value::Number, | ||
) | ||
}, | ||
), | ||
"BLOB" => row | ||
.get::<Option<Vec<u8>>, _>(i) | ||
.map_or(serde_json::Value::Null, |bytes| { | ||
serde_json::Value::String(STANDARD.encode(bytes)) | ||
}), | ||
_ => row | ||
.get::<Option<String>, _>(i) | ||
.map_or(serde_json::Value::Null, serde_json::Value::String), | ||
}; | ||
obj.insert(column.name().to_string(), value); | ||
} | ||
serde_json::Value::Object(obj) | ||
}) | ||
.collect(); | ||
|
||
let json = serde_json::to_string(&result).unwrap(); | ||
|
||
Ok(Response::builder() | ||
.status(StatusCode::OK) | ||
.header(CONTENT_TYPE, "application/json") | ||
.body(Body::from(json)) | ||
.unwrap()) | ||
} | ||
Err(e) => { | ||
error!("SQL query error: {:?}", e); | ||
Ok(Response::builder() | ||
.status(StatusCode::BAD_REQUEST) | ||
.body(Body::from("Query error")) | ||
.unwrap()) | ||
} | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical Security Risk: Executing Arbitrary SQL Queries
Ohayo, sensei! The /sql
endpoint currently allows users to execute arbitrary SQL queries directly against the database. This poses a significant security risk, including potential data leakage and SQL injection attacks. Even with a read-only SqlitePool
, exposing direct query execution can compromise the integrity and confidentiality of the data.
I recommend removing this endpoint or implementing a safe abstraction that only allows predefined queries or parameterized inputs. Would you like assistance in refactoring this to enhance security?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
crates/torii/server/src/proxy.rs (2)
266-292
: Consider enhancing type handling for broader compatibilityOhayo, sensei! The current SQL type to JSON mapping may not cover all possible data types, which could lead to unexpected behavior if new types are introduced. Consider using a more dynamic approach to handle various data types generically.
308-310
: Ohayo, sensei! Avoid exposing detailed error messages to clientsReturning full error messages in the response may leak sensitive information. It's better to log the error internally and return a generic message to the client.
Apply this diff to modify the error handling:
Err(e) => { + error!(target: LOG_TARGET, "Query execution error: {:?}", e); Ok(Response::builder() .status(StatusCode::BAD_REQUEST) - .body(Body::from(format!("Query error: {:?}", e))) + .body(Body::from("Failed to execute query")) .unwrap()) }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
crates/torii/server/src/proxy.rs
(11 hunks)
🔇 Additional comments (1)
crates/torii/server/src/proxy.rs (1)
239-312
: Ohayo, sensei! Duplicate of previous critical issue regarding /sql
endpoint
The critical security concerns regarding exposing the /sql
endpoint for arbitrary SQL queries, as mentioned in the previous review comment, still apply.
let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap_or_default(); | ||
String::from_utf8(body_bytes.to_vec()).unwrap_or_default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Ohayo, sensei! Avoid silently ignoring errors when reading request body
Using unwrap_or_default()
when reading the request body may mask errors that should be handled appropriately. Consider properly handling errors when converting the request body to bytes and when converting bytes to a UTF-8 string.
Apply this diff to handle errors explicitly:
- let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap_or_default();
- String::from_utf8(body_bytes.to_vec()).unwrap_or_default()
+ let body_bytes = match hyper::body::to_bytes(req.into_body()).await {
+ Ok(bytes) => bytes,
+ Err(e) => {
+ error!(target: LOG_TARGET, "Failed to read request body: {:?}", e);
+ return Ok(Response::builder()
+ .status(StatusCode::BAD_REQUEST)
+ .body(Body::from("Failed to read request body"))
+ .unwrap());
+ }
+ };
+ let query = match String::from_utf8(body_bytes.to_vec()) {
+ Ok(str) => str,
+ Err(e) => {
+ error!(target: LOG_TARGET, "Invalid UTF-8 sequence: {:?}", e);
+ return Ok(Response::builder()
+ .status(StatusCode::BAD_REQUEST)
+ .body(Body::from("Invalid UTF-8 sequence in request body"))
+ .unwrap());
+ }
+ };
Committable suggestion skipped: line range outside the PR's diff.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2706 +/- ##
=======================================
Coverage ? 56.17%
=======================================
Files ? 410
Lines ? 52929
Branches ? 0
=======================================
Hits ? 29733
Misses ? 23196
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
crates/torii/server/src/proxy.rs (1)
266-295
: Simplify and enhance data type handling in query resultsThe current matching on
column.type_info().name()
may not cover all possible SQLite data types and can be simplified. Consider usingrow.try_get_raw()
andValueRef
for more robust handling of different data types.Here's how you could refactor the code:
- let value: serde_json::Value = match column.type_info().name() { - "TEXT" => row - .get::<Option<String>, _>(i) - .map_or(serde_json::Value::Null, serde_json::Value::String), - // for operators like count(*) the type info is NULL - // so we default to a number - "INTEGER" | "NULL" => row - .get::<Option<i64>, _>(i) - .map_or(serde_json::Value::Null, |n| { - serde_json::Value::Number(n.into()) - }), - "REAL" => row.get::<Option<f64>, _>(i).map_or( - serde_json::Value::Null, - |f| { - serde_json::Number::from_f64(f).map_or( - serde_json::Value::Null, - serde_json::Value::Number, - ) - }, - ), - "BLOB" => row - .get::<Option<Vec<u8>>, _>(i) - .map_or(serde_json::Value::Null, |bytes| { - serde_json::Value::String(STANDARD.encode(bytes)) - }), - _ => row - .get::<Option<String>, _>(i) - .map_or(serde_json::Value::Null, serde_json::Value::String), - }; + let value = match row.try_get_raw(i)?.type_info().name() { + "NULL" => serde_json::Value::Null, + "INTEGER" => { + let val: i64 = row.get(i); + serde_json::Value::Number(val.into()) + }, + "REAL" => { + let val: f64 = row.get(i); + serde_json::Value::Number(serde_json::Number::from_f64(val).unwrap_or(serde_json::Number::from(0))) + }, + "TEXT" => { + let val: String = row.get(i); + serde_json::Value::String(val) + }, + "BLOB" => { + let bytes: Vec<u8> = row.get(i); + serde_json::Value::String(STANDARD.encode(bytes)) + }, + _ => serde_json::Value::Null, + };
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
crates/torii/server/src/proxy.rs
(11 hunks)
🔇 Additional comments (1)
crates/torii/server/src/proxy.rs (1)
249-250
: 🛠️ Refactor suggestion
Handle potential errors when reading request body
Using unwrap_or_default()
may silently ignore errors when reading the request body. To ensure robustness, handle errors explicitly to provide appropriate feedback to the client.
Apply this diff to handle errors properly:
- let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap_or_default();
- String::from_utf8(body_bytes.to_vec()).unwrap_or_default()
+ let body_bytes = match hyper::body::to_bytes(req.into_body()).await {
+ Ok(bytes) => bytes,
+ Err(e) => {
+ error!(target: LOG_TARGET, "Failed to read request body: {:?}", e);
+ return Ok(Response::builder()
+ .status(StatusCode::BAD_REQUEST)
+ .body(Body::from("Failed to read request body"))
+ .unwrap());
+ }
+ };
+ let query = match String::from_utf8(body_bytes.to_vec()) {
+ Ok(str) => str,
+ Err(e) => {
+ error!(target: LOG_TARGET, "Invalid UTF-8 sequence: {:?}", e);
+ return Ok(Response::builder()
+ .status(StatusCode::BAD_REQUEST)
+ .body(Body::from("Invalid UTF-8 sequence in request body"))
+ .unwrap());
+ }
+ };
Likely invalid or redundant comment.
return match sqlx::query(&query).fetch_all(&*pool).await { | ||
Ok(rows) => { | ||
let result: Vec<_> = rows | ||
.iter() | ||
.map(|row| { | ||
let mut obj = serde_json::Map::new(); | ||
for (i, column) in row.columns().iter().enumerate() { | ||
let value: serde_json::Value = match column.type_info().name() { | ||
"TEXT" => row | ||
.get::<Option<String>, _>(i) | ||
.map_or(serde_json::Value::Null, serde_json::Value::String), | ||
// for operators like count(*) the type info is NULL | ||
// so we default to a number | ||
"INTEGER" | "NULL" => row | ||
.get::<Option<i64>, _>(i) | ||
.map_or(serde_json::Value::Null, |n| { | ||
serde_json::Value::Number(n.into()) | ||
}), | ||
"REAL" => row.get::<Option<f64>, _>(i).map_or( | ||
serde_json::Value::Null, | ||
|f| { | ||
serde_json::Number::from_f64(f).map_or( | ||
serde_json::Value::Null, | ||
serde_json::Value::Number, | ||
) | ||
}, | ||
), | ||
"BLOB" => row | ||
.get::<Option<Vec<u8>>, _>(i) | ||
.map_or(serde_json::Value::Null, |bytes| { | ||
serde_json::Value::String(STANDARD.encode(bytes)) | ||
}), | ||
_ => row | ||
.get::<Option<String>, _>(i) | ||
.map_or(serde_json::Value::Null, serde_json::Value::String), | ||
}; | ||
obj.insert(column.name().to_string(), value); | ||
} | ||
serde_json::Value::Object(obj) | ||
}) | ||
.collect(); | ||
|
||
let json = serde_json::to_string(&result).unwrap(); | ||
|
||
Ok(Response::builder() | ||
.status(StatusCode::OK) | ||
.header(CONTENT_TYPE, "application/json") | ||
.body(Body::from(json)) | ||
.unwrap()) | ||
} | ||
Err(e) => Ok(Response::builder() | ||
.status(StatusCode::BAD_REQUEST) | ||
.body(Body::from(format!("Query error: {:?}", e))) | ||
.unwrap()), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Implement safeguards against resource exhaustion
Ohayo, sensei! Executing arbitrary SQL queries without limitations can lead to performance issues, such as high memory usage or slow response times if large result sets are returned. Consider implementing safeguards like limiting the number of rows returned or restricting the types of queries that can be executed.
For example, you could limit the number of rows by appending LIMIT
to the query if not already present. Alternatively, define an allowlist of permitted queries or require queries to match certain patterns.
if req.uri().path().starts_with("/sql") { | ||
let query = if req.method() == Method::GET { | ||
// Get the query from URL parameters | ||
let params = req.uri().query().unwrap_or_default(); | ||
form_urlencoded::parse(params.as_bytes()) | ||
.find(|(key, _)| key == "q") | ||
.map(|(_, value)| value.to_string()) | ||
.unwrap_or_default() | ||
} else if req.method() == Method::POST { | ||
// Get the query from request body | ||
let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap_or_default(); | ||
String::from_utf8(body_bytes.to_vec()).unwrap_or_default() | ||
} else { | ||
return Ok(Response::builder() | ||
.status(StatusCode::METHOD_NOT_ALLOWED) | ||
.body(Body::from("Only GET and POST methods are allowed")) | ||
.unwrap()); | ||
}; | ||
|
||
// Execute the query | ||
return match sqlx::query(&query).fetch_all(&*pool).await { | ||
Ok(rows) => { | ||
let result: Vec<_> = rows | ||
.iter() | ||
.map(|row| { | ||
let mut obj = serde_json::Map::new(); | ||
for (i, column) in row.columns().iter().enumerate() { | ||
let value: serde_json::Value = match column.type_info().name() { | ||
"TEXT" => row | ||
.get::<Option<String>, _>(i) | ||
.map_or(serde_json::Value::Null, serde_json::Value::String), | ||
// for operators like count(*) the type info is NULL | ||
// so we default to a number | ||
"INTEGER" | "NULL" => row | ||
.get::<Option<i64>, _>(i) | ||
.map_or(serde_json::Value::Null, |n| { | ||
serde_json::Value::Number(n.into()) | ||
}), | ||
"REAL" => row.get::<Option<f64>, _>(i).map_or( | ||
serde_json::Value::Null, | ||
|f| { | ||
serde_json::Number::from_f64(f).map_or( | ||
serde_json::Value::Null, | ||
serde_json::Value::Number, | ||
) | ||
}, | ||
), | ||
"BLOB" => row | ||
.get::<Option<Vec<u8>>, _>(i) | ||
.map_or(serde_json::Value::Null, |bytes| { | ||
serde_json::Value::String(STANDARD.encode(bytes)) | ||
}), | ||
_ => row | ||
.get::<Option<String>, _>(i) | ||
.map_or(serde_json::Value::Null, serde_json::Value::String), | ||
}; | ||
obj.insert(column.name().to_string(), value); | ||
} | ||
serde_json::Value::Object(obj) | ||
}) | ||
.collect(); | ||
|
||
let json = serde_json::to_string(&result).unwrap(); | ||
|
||
Ok(Response::builder() | ||
.status(StatusCode::OK) | ||
.header(CONTENT_TYPE, "application/json") | ||
.body(Body::from(json)) | ||
.unwrap()) | ||
} | ||
Err(e) => Ok(Response::builder() | ||
.status(StatusCode::BAD_REQUEST) | ||
.body(Body::from(format!("Query error: {:?}", e))) | ||
.unwrap()), | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider restricting or securing the /sql
endpoint
Ohayo, sensei! Exposing an endpoint that allows execution of arbitrary SQL queries can be a significant security risk, even with a read-only database connection. It could lead to data leakage or other unintended consequences.
Consider the following options:
- Restrict the queries: Allow only specific queries or enforce a whitelist of allowed statements.
- Add authentication: Require authentication to access the endpoint to prevent unauthorized use.
- Use parameterized queries: Although not directly applicable here, ensure any dynamic inputs are safely handled.
Would you like assistance in refactoring this endpoint to enhance security?
.status(StatusCode::BAD_REQUEST) | ||
.body(Body::from(format!("Query error: {:?}", e))) | ||
.unwrap()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid exposing detailed error messages to clients
Including detailed error information in HTTP responses can reveal sensitive internal details and pose a security risk. It's better to log the error internally and return a generic message to the client.
Apply this diff to return a generic error message:
- Err(e) => Ok(Response::builder()
- .status(StatusCode::BAD_REQUEST)
- .body(Body::from(format!("Query error: {:?}", e)))
- .unwrap()),
+ Err(e) => {
+ error!(target: LOG_TARGET, "Query execution error: {:?}", e);
+ Ok(Response::builder()
+ .status(StatusCode::BAD_REQUEST)
+ .body(Body::from("Invalid query"))
+ .unwrap())
+ },
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great unlock to have this raw access! We should document the basic tables that can be queried and how torii actually formats the table for now.
The readonly is very great and simplifies a lot. For psql
for instance this option will not be available. We will have to sanitize or see what solution works the best. default_transaction_read_only
is a bit too harsh, granting roles may be annoying. Sanitization may be the easiest at first.
Also @Larkooo the |
Ah true, i think with the SQL refactor we will also address that. |
Summary by CodeRabbit
New Features
Bug Fixes
Chores