-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: root_parent_offset and X-AR-IO-Data-Item-Data-Offset header #228
Conversation
2fe1a26
to
11812ae
Compare
await compose.down(); | ||
}); | ||
|
||
it('Verifying header for L1 bundle', async function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djwhitt do you think I should query DB to check if the calculation was done properly?
I don't think it make sense as this bundle will never change. One thing that I could do is to describe the reasoning like:
bdi has root_parent_offset
X and data_offset
Y so data item data offset should be Z.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlprieb Let's skip checking against the DB but add a comparison of the data retrieved by requesting Dc-q5iChuRWcsjVBFstEqmLTx4SWkGZxcVO9OTEGjkQ
by ID and using the size + root offset. That seems like a good sanity check since that's the functionality we're ultimately trying to enable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djwhitt here 507bc89
(#228)
📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces structural changes to the database and application code by adding a new column, Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
src/workers/ans104-unbundler.ts (2)
148-157
: Add documentation for the offset calculation logic.The offset calculation logic would benefit from documentation explaining:
- The purpose of rootParentOffset
- Why it's calculated as the sum of root_parent_offset and data_offset
- When and why it's only applied for non-root bundles
+ // For nested bundles (where rootTxId !== item.id), we need to calculate + // the absolute offset from the root bundle by adding the parent's offset + // to its data offset. This ensures correct positioning when unbundling + // deeply nested data. let rootParentOffset = 0; if (
Line range hint
167-171
: Improve error handling with specific types and context.The error handling could be enhanced by using a more specific error type and including additional context.
- } catch (error: any) { + } catch (error: unknown) { + const err = error instanceof Error ? error : new Error(String(error)); log.error('Unbundling failed:', { - message: error?.message, - stack: error?.stack, + message: err.message, + stack: err.stack, + itemId: item.id, + rootTxId, + parentIndex: item.index, });src/routes/ar-io.ts (1)
222-222
: LGTM! Consider adding documentation for the new field.The addition of
root_parent_offset
aligns with the PR objectives and follows the existing pattern of null-initialized fields that are backfilled later.Consider adding a code comment above the data item object to document the purpose of the
root_parent_offset
field and its relationship with theX-AR-IO-Data-Item-Data-Offset
header mentioned in the PR title.src/routes/data/handlers.ts (1)
127-136
: Add type safety for offset values.The current implementation assumes the offset values are numbers. Consider adding type assertions or validation to ensure type safety.
Consider adding type validation:
if ( dataAttributes?.rootParentOffset !== undefined && dataAttributes?.dataOffset !== undefined ) { + // Ensure offset values are numbers + const rootParentOffset = Number(dataAttributes.rootParentOffset); + const dataOffset = Number(dataAttributes.dataOffset); + if (isNaN(rootParentOffset) || isNaN(dataOffset)) { + log.warn('Invalid offset values', { + rootParentOffset: dataAttributes.rootParentOffset, + dataOffset: dataAttributes.dataOffset, + }); + return; + } res.header( headerNames.dataItemDataOffset, - (dataAttributes.rootParentOffset + dataAttributes.dataOffset).toString(), + (rootParentOffset + dataOffset).toString(), ); }test/end-to-end/data.test.ts (2)
338-340
: Document the test data constants with explanatory comments.Add comments explaining what these test data constants represent and how they relate to each other in the bundle hierarchy.
- const bundle = '-H3KW7RKTXMg5Miq2jHx36OHSVsXBSYuE2kxgsFj6OQ'; - const bdi = 'fLxHz2WbpNFL7x1HrOyUlsAVHYaKSyj6IqgCJlFuv9g'; - const di = 'Dc-q5iChuRWcsjVBFstEqmLTx4SWkGZxcVO9OTEGjkQ'; + // L1 bundle transaction ID + const bundle = '-H3KW7RKTXMg5Miq2jHx36OHSVsXBSYuE2kxgsFj6OQ'; + // Bundle data item within the L1 bundle + const bdi = 'fLxHz2WbpNFL7x1HrOyUlsAVHYaKSyj6IqgCJlFuv9g'; + // Data item within the bundle data item + const di = 'Dc-q5iChuRWcsjVBFstEqmLTx4SWkGZxcVO9OTEGjkQ';
389-389
: Document the magic numbers in assertions.Add comments explaining how these offset values (3072, 5783) were calculated or what they represent in the bundle structure.
- assert.equal(res.headers['x-ar-io-data-item-data-offset'], '3072'); + // 3072 represents the offset of this bundle data item within the L1 bundle + assert.equal(res.headers['x-ar-io-data-item-data-offset'], '3072');- assert.equal(res.headers['x-ar-io-data-item-data-offset'], '5783'); + // 5783 represents the offset of this data item within its parent bundle data item + assert.equal(res.headers['x-ar-io-data-item-data-offset'], '5783');Also applies to: 395-395
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (12)
migrations/2024.10.30T16.11.15.bundles.add-root-parent-offset.sql
(1 hunks)migrations/down/2024.10.30T16.11.15.bundles.add-root-parent-offset.sql
(1 hunks)src/constants.ts
(1 hunks)src/database/sql/bundles/import.sql
(1 hunks)src/database/sql/core/data-attributes.sql
(1 hunks)src/database/standalone-sqlite.ts
(2 hunks)src/lib/ans-104.ts
(7 hunks)src/routes/ar-io.ts
(1 hunks)src/routes/data/handlers.ts
(1 hunks)src/types.d.ts
(3 hunks)src/workers/ans104-unbundler.ts
(3 hunks)test/end-to-end/data.test.ts
(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- migrations/2024.10.30T16.11.15.bundles.add-root-parent-offset.sql
- migrations/down/2024.10.30T16.11.15.bundles.add-root-parent-offset.sql
🔇 Additional comments (17)
src/database/sql/core/data-attributes.sql (2)
52-52
: LGTM! Proper query termination.
The semicolon is correctly added at the end of the query.
4-48
: LGTM! Clean and consistent implementation of new columns.
The query structure maintains consistency across all four UNION segments, with appropriate NULL handling for transactions where offsets don't apply.
Let's verify the column existence in the referenced tables:
✅ Verification successful
Let me generate one more verification script to check for the data_offset
column since we only found root_parent_offset
in the migrations.
✓ Column changes are properly aligned with schema
The verification confirms that both columns are properly defined in the database schema:
root_parent_offset
is added through migration2024.10.30T16.11.15.bundles.add-root-parent-offset.sql
data_offset
has been a part of the schema since the initial bundle schema (2023.05.15) and is present in both tables
The query's column structure correctly reflects the schema design, with appropriate NULL handling for transactions.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the new columns exist in the referenced tables
# Expected: Find CREATE TABLE statements or migrations adding these columns
# Search for table definitions or migrations related to these columns
rg -A 5 "CREATE TABLE.*(?:new_data_items|stable_data_items)"
rg "ADD.*(?:root_parent_offset|data_offset)"
Length of output: 5523
Script:
#!/bin/bash
# Search for data_offset column in schema files and migrations
rg -B 2 -A 2 "data_offset" --type sql
Length of output: 10236
src/constants.ts (1)
28-28
: LGTM! Verify integration points.
The new header constant follows the established naming conventions and is properly positioned among related headers.
Let's verify the integration points mentioned in the summary:
✅ Verification successful
Let me gather more information about the integration points.
Let me check one more detail about the actual usage in the handlers.
Integration verified successfully
The new header constant is properly integrated:
- Used in
setDataHeaders
function to combinerootParentOffset
anddataOffset
- Has corresponding test coverage in
test/end-to-end/data.test.ts
- Implementation follows the established pattern of conditional header setting
- Header value is correctly computed as the sum of parent offset and data offset
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of the new header constant in handlers and tests
# Test 1: Verify header usage in setDataHeaders function
echo "Checking setDataHeaders usage..."
rg -A 5 "setDataHeaders.*dataItemDataOffset" "src/routes/data/handlers.ts"
# Test 2: Verify test coverage
echo "Checking test coverage..."
rg "X-AR-IO-Data-Item-Data-Offset" "test/end-to-end/data.test.ts"
Length of output: 337
Script:
#!/bin/bash
# Let's check for the actual implementation and test details
# Check for setDataHeaders implementation
echo "Checking setDataHeaders implementation..."
ast-grep --pattern 'function setDataHeaders($_) { $$$ }'
# Check for detailed test implementation
echo "Checking test implementation..."
rg -B 2 -A 10 "X-AR-IO-Data-Item-Data-Offset header" "test/end-to-end/data.test.ts"
# Check for any other usage of this header
echo "Checking other usages..."
rg "dataItemDataOffset|X-AR-IO-Data-Item-Data-Offset" --type ts
Length of output: 1142
Script:
#!/bin/bash
# Check the context around the usage in handlers
echo "Checking handlers implementation..."
rg -B 10 -A 10 "headerNames\.dataItemDataOffset" "src/routes/data/handlers.ts"
Length of output: 816
src/database/sql/bundles/import.sql (2)
Line range hint 102-106
: Well-designed UPDATE clause.
The UPDATE clause's selective update strategy is a good practice. It only updates specific fields (height, root_transaction_id, parent_id, data_offset) while preserving other values, which helps maintain data integrity.
93-94
: LGTM! Verify migration for root_parent_offset.
The addition of root_parent_offset
to the upsertNewDataItem
statement is implemented correctly. The column is properly added to both the column list and VALUES clause, and its immutability is maintained by excluding it from the UPDATE clause.
Let's verify the migration exists:
Also applies to: 99-100
✅ Verification successful
Migration for root_parent_offset verified and properly implemented
The migration file correctly adds the root_parent_offset
column to both new_data_items
and stable_data_items
tables, aligning with the changes in the import SQL file. The implementation is complete and consistent.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the migration file for root_parent_offset exists
# Expected: Find migration file containing ALTER TABLE statements for root_parent_offset
rg -l "root_parent_offset" "migrations/"
Length of output: 175
Script:
#!/bin/bash
# Let's verify the content of the migration files to ensure they properly add the root_parent_offset column
cat migrations/2024.10.30T16.11.15.bundles.add-root-parent-offset.sql
Length of output: 206
src/workers/ans104-unbundler.ts (1)
156-156
: 🛠️ Refactor suggestion
Consider adding overflow protection for offset calculation.
The addition of offsets could potentially overflow for very large nested bundles.
- rootParentOffset = item.root_parent_offset + item.data_offset;
+ // Ensure offset calculation doesn't overflow Number.MAX_SAFE_INTEGER
+ const sum = BigInt(item.root_parent_offset) + BigInt(item.data_offset);
+ if (sum > BigInt(Number.MAX_SAFE_INTEGER)) {
+ throw new Error('Offset calculation would overflow safe integer bounds');
+ }
+ rootParentOffset = Number(sum);
src/lib/ans-104.ts (3)
70-70
: LGTM: Clean implementation of rootParentOffset.
The addition of rootParentOffset
is well-implemented with proper typing and consistent naming conventions.
Also applies to: 79-79, 126-126
283-283
: LGTM: Proper integration of rootParentOffset in parseBundle.
The rootParentOffset
parameter is correctly added to the method signature and properly forwarded to the worker thread.
Also applies to: 288-288, 349-349
421-422
: Consider adding validation for rootParentOffset.
While the implementation is functionally correct, consider adding validation for the rootParentOffset
parameter to ensure it's a valid number before processing. This would prevent potential runtime errors or data corruption.
Add validation after destructuring the message:
const { rootTxId, parentId, parentIndex, bundlePath, rootParentOffset } = message;
if (typeof rootParentOffset !== 'number') {
throw new Error('Invalid rootParentOffset: must be a number');
}
Also applies to: 470-470
src/types.d.ts (3)
276-276
: LGTM: root_parent_offset property correctly typed
The property is appropriately typed as a non-nullable number, which aligns with its purpose of storing offset values for bundle data items.
304-304
: LGTM: root_parent_offset correctly null for optimistic items
The property is appropriately typed as null, maintaining consistency with other fields in the optimistic data item interface where parent-related information is not available.
513-514
: LGTM: Optional offset properties align with header requirements
The optional rootParentOffset
and dataOffset
properties are correctly typed and follow naming conventions. Let's verify their usage in the codebase.
✅ Verification successful
Let me verify the implementation details in the handlers file to ensure proper usage of these properties.
Based on the comprehensive verification results, I can now provide the final response:
Verified: Offset properties are correctly implemented and tested
The rootParentOffset
and dataOffset
properties are properly used in the handlers to calculate and set the X-AR-IO-Data-Item-Data-Offset
header. The implementation:
- Checks both properties are defined before setting the header
- Correctly combines the offsets using addition
- Has comprehensive end-to-end tests verifying the behavior for different scenarios (L1 bundles, bundle data items, and nested data items)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of new offset properties in relation to the X-AR-IO-Data-Item-Data-Offset header
# Test: Search for header usage
echo "Searching for X-AR-IO-Data-Item-Data-Offset header usage..."
rg -l "X-AR-IO-Data-Item-Data-Offset"
# Test: Search for property usage in route handlers
echo "Searching for offset properties usage in routes..."
rg "rootParentOffset|dataOffset" "src/routes/"
Length of output: 633
Script:
#!/bin/bash
# Get the context of how these properties are used in the handlers
rg -B 5 -A 5 "rootParentOffset|dataOffset" "src/routes/data/handlers.ts"
# Check the constants file for header definition
cat src/constants.ts
# Check test implementation
cat test/end-to-end/data.test.ts
Length of output: 27686
src/routes/data/handlers.ts (1)
127-136
: LGTM: Header integration with range requests.
The offset header is correctly included in both full and range responses through the setDataHeaders
function call in handleRangeRequest
.
test/end-to-end/data.test.ts (1)
336-397
: LGTM! Well-structured test suite.
The test suite is well-organized and follows the established patterns:
- Proper setup and teardown of the test environment
- Clear separation of test cases
- Comprehensive coverage of different scenarios
- Consistent assertion patterns
src/database/standalone-sqlite.ts (3)
343-343
: LGTM: root_parent_offset field addition
The addition of the root_parent_offset
field to the database row mapping is correct and follows the existing pattern.
998-999
: LGTM: Data offset fields addition
The addition of rootParentOffset
and dataOffset
fields to the returned attributes is correct and properly handles null cases using optional chaining.
998-999
: Verify data consistency for offset fields
Let's verify that the offset fields are being properly populated in the database and that existing data items have these fields where appropriate.
✅ Verification successful
Based on the search results, I can now provide a final response. The evidence shows that:
- The
root_parent_offset
anddata_offset
fields were recently added through migrations (2024.10.30T16.11.15.bundles.add-root-parent-offset.sql) - These fields are properly defined in database schemas and TypeScript interfaces
- The fields are used in data indexing and retrieval operations
- There are tests verifying the proper handling of these fields
- The implementation includes proper null handling for cases where offsets are not applicable
Offset fields are properly integrated and handled
The codebase shows complete integration of these offset fields across the database schema, TypeScript types, and application logic. The fields are:
- Properly added through migrations
- Included in both new and stable data item tables
- Handled in data indexing and retrieval flows
- Covered by test cases
- Properly nullable where appropriate
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for data items with offset fields
# Expected: Should show counts of items with and without offset fields to verify data consistency
echo "Checking data items with offset fields..."
rg -A 5 "root_parent_offset|data_offset" "src/database/sql/**/*.sql"
echo "Checking if these fields are included in relevant SQL queries..."
rg -A 5 "SELECT.*FROM.*data_items" "src/database/sql/**/*.sql"
Length of output: 499
Script:
#!/bin/bash
# Let's try a different approach to find SQL queries and offset fields
# First, let's find all SQL files in the repository
echo "Finding SQL files in the repository..."
fd -e sql
# Then search for offset fields in the entire codebase
echo -e "\nSearching for offset fields usage..."
rg -A 5 "root_parent_offset|data_offset"
# Look for SQL queries with data_items table
echo -e "\nSearching for data_items table queries..."
rg -A 5 "data_items"
# Check TypeScript interfaces/types related to these fields
echo -e "\nChecking TypeScript definitions..."
rg -A 5 "interface.*DataItem|type.*DataItem"
Length of output: 76078
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #228 +/- ##
===========================================
- Coverage 68.68% 68.52% -0.17%
===========================================
Files 32 32
Lines 7991 8022 +31
Branches 434 434
===========================================
+ Hits 5489 5497 +8
- Misses 2501 2524 +23
Partials 1 1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
test/bundles-schema.sql (2)
115-115
: Consider adding an index for root_parent_offset.If
root_parent_offset
will be frequently used in WHERE clauses or JOIN conditions, consider adding an index to improve query performance. This is particularly important if you plan to query parent-child relationships or perform hierarchical data traversal.Example index creation:
CREATE INDEX stable_data_items_root_parent_offset_idx ON stable_data_items (root_parent_offset);
115-115
: Add column documentation for root_parent_offset.Consider adding SQL comments to document:
- The purpose of
root_parent_offset
- Its relationship with other columns (if any)
- Expected values and constraints
Example documentation:
-- Add before the column definition: -- root_parent_offset: Stores the offset of the root parent data item -- References: Link to relevant documentation or specificationsAlso applies to: 150-150
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
test/bundles-schema.sql
(2 hunks)
🔇 Additional comments (1)
test/bundles-schema.sql (1)
150-150
: Verify data integrity requirements for root_parent_offset.
A few considerations:
- Should there be a foreign key constraint between
root_parent_offset
anddata_offset
to ensure referential integrity? - Consider adding an index if this column will be frequently queried.
- Consider explicitly declaring NULL/NOT NULL constraint based on business requirements.
Let's verify the usage pattern:
✅ Verification successful
No foreign key constraint needed for root_parent_offset
Based on the SQL files analysis:
root_parent_offset
is used as a nullable field in both tables, which is appropriate for its usage pattern- No foreign key constraint is needed as:
- It's used in data selection queries alongside
data_offset
- The field is populated during data item insertion with direct values
- The migration shows it's intentionally designed as a simple INTEGER column
- It's used in data selection queries alongside
- No index is required as there are no JOIN or WHERE clauses using this column - it's only used in SELECT projections
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for SQL queries using root_parent_offset to understand access patterns
rg -l "root_parent_offset" --type sql
# Search for JOIN conditions or WHERE clauses using this column
rg "root_parent_offset.*WHERE|JOIN.*root_parent_offset" --type sql
Length of output: 345
Script:
#!/bin/bash
# Let's check the actual SQL files to understand the usage patterns
echo "=== Checking data-attributes.sql ==="
cat "src/database/sql/core/data-attributes.sql"
echo -e "\n=== Checking import.sql ==="
cat "src/database/sql/bundles/import.sql"
echo -e "\n=== Checking migration file ==="
cat "migrations/2024.10.30T16.11.15.bundles.add-root-parent-offset.sql"
Length of output: 5045
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionality looks good. I added one suggestion about the test.
await compose.down(); | ||
}); | ||
|
||
it('Verifying header for L1 bundle', async function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlprieb Let's skip checking against the DB but add a comparison of the data retrieved by requesting Dc-q5iChuRWcsjVBFstEqmLTx4SWkGZxcVO9OTEGjkQ
by ID and using the size + root offset. That seems like a good sanity check since that's the functionality we're ultimately trying to enable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
test/end-to-end/data.test.ts (1)
372-373
: Consider making the wait time configurable.The hardcoded 10-second wait might be insufficient on slower systems or excessive on faster ones.
- // hopefully enough time to unbundle it - await wait(10000); + const UNBUNDLE_WAIT_MS = process.env.UNBUNDLE_WAIT_MS ? parseInt(process.env.UNBUNDLE_WAIT_MS, 10) : 10000; + await wait(UNBUNDLE_WAIT_MS);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
test/end-to-end/data.test.ts
(1 hunks)
🔇 Additional comments (3)
test/end-to-end/data.test.ts (3)
336-374
: LGTM! Test suite setup follows established patterns.
The setup is well-structured and consistent with other test suites in the file. It properly handles:
- Database cleanup
- Environment configuration
- Bundle processing initialization
- Adequate wait time for unbundling
380-384
: LGTM! Header verification tests are comprehensive.
The tests effectively verify the X-AR-IO-Data-Item-Data-Offset
header behavior for:
- L1 bundle (undefined)
- Bundle data item (3072)
- Nested data item (5783)
Also applies to: 386-390, 392-396
398-416
: Excellent data consistency verification approach.
The test effectively validates the data integrity by:
- Fetching the complete data item
- Using the offset and size to fetch the same data through range request
- Comparing both responses
This provides strong verification that the offset calculations are correct and the data can be retrieved consistently through different methods.
No description provided.