-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(parquet-exporter): export blocks, txs and data items to parquet #213
Conversation
karlprieb
commented
Sep 26, 2024
- Add schema for blocks and transactions tables
- Update schema for tags table
- Refactor ParquetExporter to import blocks, transactions and tags from sqlite to duckdb
- Refactor ParquetExporter to export blocks, transactions and tags to parquet
* Add schema for blocks and transactions tables * Update schema for tags table * Refactor ParquetExporter to import blocks, transactions and tags from sqlite to duckdb * Refactor ParquetExporter to export blocks, transactions and tags to parquet
📝 Walkthrough📝 WalkthroughWalkthroughThe changes encompass modifications to the database schema and enhancements to the data export functionality. The Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #213 +/- ##
===========================================
- Coverage 70.19% 68.69% -1.51%
===========================================
Files 32 32
Lines 7905 7905
Branches 430 430
===========================================
- Hits 5549 5430 -119
- Misses 2356 2474 +118
- Partials 0 1 +1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (4)
src/database/duckdb/schema.sql (1)
11-36
: Approve newtransactions
table with a minor suggestionThe new
transactions
table structure looks comprehensive and well-designed, covering various aspects of a transaction with appropriate data types. This addition aligns well with the PR objectives.Consider renaming the
"offset"
column to avoid using a SQL keyword:- "offset" UBIGINT, + transaction_offset UBIGINT,This change would eliminate the need for quotes and make the schema more consistent with other column names.
src/routes/ar-io.ts (1)
232-242
: Improved input validation, consider enhancing error messages.The added input validation is a good improvement, ensuring that all required parameters are present and of the correct type. This helps prevent potential issues caused by invalid input.
Consider enhancing the error message to specify which parameter(s) failed validation. This would make it easier for API consumers to identify and correct issues. For example:
const invalidParams = []; if (typeof outputDir !== 'string' || outputDir.trim() === '') invalidParams.push('outputDir'); if (!Number.isInteger(startHeight) || startHeight < 0) invalidParams.push('startHeight'); if (!Number.isInteger(endHeight) || endHeight < 0) invalidParams.push('endHeight'); if (!Number.isInteger(maxFileRows) || maxFileRows < 0) invalidParams.push('maxFileRows'); if (invalidParams.length > 0) { res.status(400).send(`Invalid or missing parameters: ${invalidParams.join(', ')}`); return; }This approach provides more specific feedback about which parameters need attention.
src/workers/parquet-exporter.ts (2)
69-69
: Log error objects to include stack tracesWhen logging errors, ensure that you log the error objects instead of just the messages to include stack traces for better debugging.
Update your error logging statements as follows:
- logger.error('Error creating DuckDB database', error); + logger.error('Error creating DuckDB database', { error });This change ensures that the full error details are captured in the logs.
Also applies to: 84-84, 126-126, 181-181, 236-236, 273-273, 312-312, 366-366, 452-452
41-44
: Initialize 'isExporting' inside the constructor for consistencyInitializing class properties inside the constructor ensures consistency and clarity.
Apply this diff to move the initialization:
- private isExporting = false; constructor({ log, db, duckDbPath, }: { log: winston.Logger; db: Database; duckDbPath: string; }) { this.log = log; this.db = db; this.duckDbPath = duckDbPath; + this.isExporting = false; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
flake.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
- src/database/duckdb/schema.sql (1 hunks)
- src/routes/ar-io.ts (1 hunks)
- src/workers/parquet-exporter.ts (3 hunks)
🔇 Additional comments (5)
src/database/duckdb/schema.sql (1)
38-46
: Approve newblocks
table with a questionThe new
blocks
table structure looks good, covering essential information about blocks with appropriate data types. This addition aligns well with the PR objectives.Could you clarify why
indep_hash
is nullable? As it appears to be a unique identifier for blocks, it might be better to make it NOT NULL. If there's a specific reason for it to be nullable, please explain.- indep_hash BLOB, + indep_hash BLOB NOT NULL,If this change is appropriate, please update the schema accordingly. If there's a reason for it to be nullable, consider adding a comment in the schema to explain the rationale.
src/routes/ar-io.ts (1)
248-250
: Updated ParquetExporter configuration and usage.The changes to the ParquetExporter configuration and usage align with the expanded functionality described in the PR objectives. The exporter now seems capable of handling blocks and transactions in addition to tags.
To ensure the new functionality works as intended, please verify the following:
- The new
sqliteCoreDbPath
is correctly set up and accessible.- The
export
method inParquetExporter
correctly handles the expanded dataset (blocks, transactions, and tags).Run the following script to check the
ParquetExporter
implementation:This script will help confirm that the
ParquetExporter
class has been properly updated to handle the new export functionality and that it's using the newsqliteCoreDbPath
.Also applies to: 253-258
✅ Verification successful
src/workers/parquet-exporter.ts
Outdated
await this.db.exec(`INSERT INTO blocks ${query}`); | ||
log.info('Blocks inserted into DuckDB'); | ||
} catch (error) { | ||
throw `Error importing blocks: ${error}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw Error objects instead of strings for better error handling
In the catch
blocks, you're currently throwing strings. Throwing Error
objects preserves the stack trace and is a best practice for consistent error handling.
Apply this diff to fix the error throwing:
- throw `Error importing blocks: ${error}`;
+ throw new Error(`Error importing blocks: ${error}`);
Repeat this change in all similar instances throughout the methods.
Also applies to: 180-180, 235-235, 272-272, 311-311, 365-365, 474-474, 493-493
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlprieb, I think this is a good comment by @coderabbitai. I asked ChatGPT about the best way to catch and rethrow errors and this is what it suggested: https://chatgpt.com/share/e/66f6d1d2-8f7c-8004-9d74-c86d298a6225
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djwhitt, I'm glad you found the suggestion helpful!
(_/)
(^‿^)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. This looks good. Only thing I think need adjustment is how we're rethrowing errors.
src/workers/parquet-exporter.ts
Outdated
await this.db.exec(`INSERT INTO blocks ${query}`); | ||
log.info('Blocks inserted into DuckDB'); | ||
} catch (error) { | ||
throw `Error importing blocks: ${error}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlprieb, I think this is a good comment by @coderabbitai. I asked ChatGPT about the best way to catch and rethrow errors and this is what it suggested: https://chatgpt.com/share/e/66f6d1d2-8f7c-8004-9d74-c86d298a6225
src/workers/parquet-exporter.ts
Outdated
await this.db.exec(`INSERT INTO transactions ${query}`); | ||
log.info('Transactions inserted into DuckDB'); | ||
} catch (error) { | ||
throw `Error importing transactions: ${error}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as above re error handling.
src/workers/parquet-exporter.ts
Outdated
await this.db.exec(`INSERT INTO transactions ${query}`); | ||
log.info('Data items inserted into DuckDB'); | ||
} catch (error) { | ||
throw `Error importing data items: ${error}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as above re error handling.
src/workers/parquet-exporter.ts
Outdated
await this.db.exec(`INSERT INTO tags ${query}`); | ||
log.info('Transaction tags inserted into DuckDB'); | ||
} catch (error) { | ||
throw `Error importing transaction tags: ${error}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as above re error handling.
src/workers/parquet-exporter.ts
Outdated
minHeight = height + 1n; | ||
rowCount = 0n; | ||
} catch (error) { | ||
throw `Error exporting Parquet file ${fileName}: ${error}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as above re error handling.
src/workers/parquet-exporter.ts
Outdated
await this.db.exec(`CHECKPOINT ${tableName};`); | ||
return result[0].count; | ||
} catch (error) { | ||
throw `Error getting row count for height ${height} in ${tableName}: ${error}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as above re error handling.
src/workers/parquet-exporter.ts
Outdated
maxHeight: result[0].max_height, | ||
}; | ||
} catch (error) { | ||
throw `Error getting height range for ${tableName}: ${error}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as above re error handling.
adc5056
to
8b0fcdf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
src/workers/parquet-exporter.ts (1)
345-347
: Simplify height range calculations for clarityThe current calculations for
minHeight
andmaxHeight
can be simplified to enhance readability.Consider refactoring as follows:
- minHeight = - minHeight > BigInt(startHeight) ? minHeight : BigInt(startHeight); - maxHeight = maxHeight < BigInt(endHeight) ? maxHeight : BigInt(endHeight); + minHeight = BigInt(Math.max(Number(minHeight), startHeight)); + maxHeight = BigInt(Math.min(Number(maxHeight), endHeight));This refactoring uses
Math.max
andMath.min
to make the intention clearer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
yarn.lock
is excluded by!**/yarn.lock
,!**/*.lock
📒 Files selected for processing (2)
- package.json (1 hunks)
- src/workers/parquet-exporter.ts (3 hunks)
🔇 Additional comments (2)
package.json (1)
24-24
: Approve the update of duckdb-async, but verify compatibility.The update of
duckdb-async
from^1.0.0
to^1.1.0
is a minor version bump, which typically indicates new features or non-breaking improvements. This update is likely beneficial for the project, potentially bringing performance improvements or bug fixes.However, it's important to ensure that this update doesn't introduce any breaking changes or compatibility issues with other dependencies in the project.
To verify the compatibility and impact of this update, please run the following script:
After running these checks, please review the results to ensure there are no unexpected issues or breaking changes introduced by this update.
src/workers/parquet-exporter.ts (1)
Line range hint
350-374
: Verify correctness ofminHeight
adjustment within loopUpdating
minHeight
inside the loop might lead to unexpected behavior in subsequent iterations.Please ensure that modifying
minHeight
within the loop does not cause any off-by-one errors or skip any heights. Verify that all intended heights are processed correctly.
const newError = new Error('Error importing blocks'); | ||
newError.stack = error.stack; | ||
throw newError; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Preserve original error messages when re-throwing
In the catch blocks, you're creating new Error
objects with a generic message and assigning the original error's stack to the new error. This approach loses the original error's message and any additional context, which can be valuable for debugging.
Consider using the cause
property of the Error
object to wrap the original error. This way, you preserve the original error's message and stack trace.
Here's how you can modify the catch blocks:
- const newError = new Error('Error importing blocks');
- newError.stack = error.stack;
- throw newError;
+ throw new Error('Error importing blocks', { cause: error });
Apply this change to all similar instances throughout the methods:
importBlocks
importTransactions
importDataItems
importTransactionTags
importDataItemTags
exportToParquet
getHeightRange
getRowCountForHeight
Also applies to: 182-185, 239-242, 278-281, 319-322, 376-379, 489-490, 508-509
const log = this.log.child({ method: 'export' }); | ||
|
||
if (this.isExporting) { | ||
log.error('An export is already in progress'); | ||
return; | ||
} | ||
this.isExporting = true; | ||
|
||
if (startHeight > endHeight) { | ||
log.error('startHeight must be less than or equal to endHeight'); | ||
return; | ||
} | ||
|
||
if (maxFileRows <= 0) { | ||
log.error('maxFileRows must be a positive number'); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Throw errors for invalid input parameters to alert calling code
In the export
method, when input validations fail, the function logs an error and returns. This might not sufficiently inform the calling code about the failure.
Consider throwing an error after logging to ensure the issue is propagated:
if (startHeight > endHeight) {
log.error('startHeight must be less than or equal to endHeight');
- return;
+ throw new Error('Invalid height range: startHeight must be less than or equal to endHeight');
}
if (maxFileRows <= 0) {
log.error('maxFileRows must be a positive number');
- return;
+ throw new Error('Invalid parameter: maxFileRows must be positive');
}
This change ensures that the calling code is aware of the invalid parameters and can handle the exceptions appropriately.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const log = this.log.child({ method: 'export' }); | |
if (this.isExporting) { | |
log.error('An export is already in progress'); | |
return; | |
} | |
this.isExporting = true; | |
if (startHeight > endHeight) { | |
log.error('startHeight must be less than or equal to endHeight'); | |
return; | |
} | |
if (maxFileRows <= 0) { | |
log.error('maxFileRows must be a positive number'); | |
return; | |
} | |
const log = this.log.child({ method: 'export' }); | |
if (this.isExporting) { | |
log.error('An export is already in progress'); | |
return; | |
} | |
this.isExporting = true; | |
if (startHeight > endHeight) { | |
log.error('startHeight must be less than or equal to endHeight'); | |
throw new Error('Invalid height range: startHeight must be less than or equal to endHeight'); | |
} | |
if (maxFileRows <= 0) { | |
log.error('maxFileRows must be a positive number'); | |
throw new Error('Invalid parameter: maxFileRows must be positive'); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (2)
src/workers/parquet-exporter.ts (2)
Line range hint
338-385
: Improve security and error handling in exportToParquet methodThe
exportToParquet
method has good use of BigInt for handling large numbers, but requires improvements:
- SQL Injection vulnerability:
Use parameterized queries to prevent SQL injection. Replace the query execution with:
await this.db.exec(` COPY ( SELECT * FROM ${tableName} WHERE height >= ? AND height <= ? ) TO ? (FORMAT PARQUET, COMPRESSION 'zstd') `, [minHeight, height, filePath]);
- Error handling:
Preserve the original error message when re-throwing. Replace the catch block with:
} catch (error) { throw new Error(`Error exporting Parquet file ${fileName}`, { cause: error }); }The use of BigInt for height comparisons is a good practice for handling potentially large block heights.
Line range hint
1-515
: Overall assessment: Good functionality with areas for improvementThe ParquetExporter class provides comprehensive functionality for exporting data to Parquet format. The use of BigInt for height values is commendable, ensuring compatibility with large block heights. However, there are consistent areas for improvement across the methods:
SQL Injection prevention: Most methods need to implement parameterized queries or input validation to prevent SQL injection vulnerabilities.
Error handling: The error handling can be improved by preserving original error messages when re-throwing errors.
Input validation: Some methods, particularly the
export
method, could benefit from stricter input validation with error throwing instead of just logging.Addressing these issues will significantly enhance the security and robustness of the code. Consider applying the suggested changes consistently across all methods in the class.
As a general architectural suggestion, consider implementing a separate SQL query builder or data access layer. This could help centralize SQL injection prevention measures and make the code more maintainable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- src/workers/parquet-exporter.ts (3 hunks)
🔇 Additional comments (2)
src/workers/parquet-exporter.ts (2)
18-26
: LGTM: Import statements and class properties look goodThe new imports and class properties are appropriate for the added functionality. The
isExporting
flag is a good addition to prevent concurrent exports.
31-39
: LGTM: Constructor updates are consistentThe addition of the
duckDbPath
parameter and its initialization in the constructor are consistent with the new class property.
sqliteBundlesDbPath, | ||
sqliteCoreDbPath, | ||
}: { | ||
log: winston.Logger; | ||
duckDbPath: string; | ||
sqliteDbPath: string; | ||
sqliteBundlesDbPath: string; | ||
sqliteCoreDbPath: string; | ||
}) { | ||
const logger = log.child({ class: this.constructor.name }); | ||
const db = await Database.create(duckDbPath); | ||
const logger = log.child({ class: 'ParquetExporter' }); | ||
let db: Database; | ||
|
||
const duckDbSchema = readFileSync( | ||
'./src/database/duckdb/schema.sql', | ||
'utf8', | ||
); | ||
await db.exec(duckDbSchema); | ||
logger.debug('DuckDB schema created'); | ||
try { | ||
db = await Database.create(duckDbPath); | ||
|
||
const duckDbSchema = readFileSync( | ||
'./src/database/duckdb/schema.sql', | ||
'utf8', | ||
); | ||
await db.exec(duckDbSchema); | ||
|
||
logger.debug('DuckDB schema created'); | ||
} catch (error) { | ||
logger.error('Error creating DuckDB database', error); | ||
throw error; | ||
} | ||
|
||
try { | ||
await db.exec(`INSTALL sqlite; LOAD sqlite;`); | ||
await db.exec( | ||
`ATTACH '${sqliteBundlesDbPath}' AS sqlite_bundles_db (TYPE sqlite);`, | ||
); | ||
await db.exec( | ||
`ATTACH '${sqliteCoreDbPath}' AS sqlite_core_db (TYPE sqlite);`, | ||
); | ||
|
||
await db.exec(`INSTALL sqlite; LOAD sqlite;`); | ||
await db.exec(`ATTACH '${sqliteDbPath}' AS sqlite_db (TYPE sqlite);`); | ||
logger.debug('SQLite loaded'); | ||
return new ParquetExporter({ log: logger, db, duckDbPath, sqliteDbPath }); | ||
logger.debug('SQLite databases loaded'); | ||
} catch (error) { | ||
logger.error('Error loading SQLite databases', error); | ||
throw error; | ||
} | ||
|
||
logger.info('DuckDB database created!'); | ||
|
||
return new ParquetExporter({ | ||
log: logger, | ||
db, | ||
duckDbPath, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a security suggestion: Use parameterized queries for ATTACH statements
The improvements in error handling and logging are good. However, there's a potential SQL injection vulnerability in the ATTACH statements.
Consider using parameterized queries for the ATTACH statements to prevent SQL injection. For example:
await db.exec(`ATTACH ? AS sqlite_bundles_db (TYPE sqlite);`, [sqliteBundlesDbPath]);
await db.exec(`ATTACH ? AS sqlite_core_db (TYPE sqlite);`, [sqliteCoreDbPath]);
This approach treats the file paths as parameters rather than part of the SQL string, enhancing security.
private async importBlocks({ | ||
startHeight, | ||
endHeight, | ||
}: { | ||
startHeight: number; | ||
endHeight: number; | ||
}) { | ||
const log = this.log.child({ method: 'importBlocks' }); | ||
const query = ` | ||
SELECT | ||
indep_hash, | ||
height, | ||
previous_block, | ||
nonce, | ||
hash, | ||
block_timestamp, | ||
tx_count, | ||
block_size | ||
FROM | ||
sqlite_core_db.stable_blocks | ||
WHERE | ||
height BETWEEN ${startHeight} AND ${endHeight} | ||
ORDER BY | ||
height ASC; | ||
`; | ||
|
||
try { | ||
await this.db.exec(`INSERT INTO blocks ${query}`); | ||
log.info('Blocks inserted into DuckDB'); | ||
} catch (error: any) { | ||
const newError = new Error('Error importing blocks'); | ||
newError.stack = error.stack; | ||
throw newError; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve security and error handling in importBlocks method
The importBlocks
method has two areas for improvement:
- SQL Injection vulnerability:
Use parameterized queries to prevent SQL injection. Replace the query string with:
const query = `
SELECT
indep_hash,
height,
previous_block,
nonce,
hash,
block_timestamp,
tx_count,
block_size
FROM
sqlite_core_db.stable_blocks
WHERE
height BETWEEN ? AND ?
ORDER BY
height ASC;
`;
await this.db.exec(`INSERT INTO blocks ${query}`, [startHeight, endHeight]);
- Error handling:
Preserve the original error message when re-throwing. Replace the catch block with:
} catch (error) {
throw new Error('Error importing blocks', { cause: error });
}
This approach maintains the original error information while adding context.
private async importTransactions({ | ||
startHeight, | ||
endHeight, | ||
maxFileRows, | ||
outputDir, | ||
}: { | ||
startHeight: number; | ||
endHeight: number; | ||
maxFileRows: number; | ||
outputDir: string; | ||
}) { | ||
const sqliteQuery = ` | ||
SELECT | ||
sdit.height, | ||
sdit.data_item_id AS id, | ||
sdit.data_item_tag_index AS tag_index, | ||
sdi.indexed_at AS created_at, | ||
tn.name AS tag_name, | ||
tv.value AS tag_value, | ||
1 AS is_data_item | ||
FROM | ||
sqlite_db.stable_data_item_tags sdit | ||
JOIN | ||
sqlite_db.tag_names tn ON sdit.tag_name_hash = tn.hash | ||
JOIN | ||
sqlite_db.tag_values tv ON sdit.tag_value_hash = tv.hash | ||
JOIN | ||
sqlite_db.stable_data_items sdi ON sdit.data_item_id = sdi.id | ||
WHERE | ||
sdit.height BETWEEN ${startHeight} AND ${endHeight}; | ||
const log = this.log.child({ method: 'importTransactions' }); | ||
const query = ` | ||
SELECT | ||
st.id, | ||
NULL AS indexed_at, | ||
st.block_transaction_index, | ||
0 AS is_data_item, | ||
st.target, | ||
st.quantity, | ||
st.reward, | ||
st.last_tx as anchor, | ||
st.data_size, | ||
st.content_type, | ||
st.format, | ||
st.height, | ||
st.owner_address, | ||
st.data_root, | ||
NULL AS parent, | ||
st."offset", | ||
NULL AS size, | ||
NULL AS data_offset, | ||
NULL AS owner_offset, | ||
NULL AS owner_size, | ||
CASE | ||
WHEN octet_length(w.public_modulus) <= 64 THEN w.public_modulus | ||
ELSE NULL | ||
END AS owner, | ||
NULL AS signature_offset, | ||
NULL AS signature_size, | ||
NULL AS signature_type | ||
FROM | ||
sqlite_core_db.stable_transactions st | ||
LEFT JOIN | ||
sqlite_core_db.wallets w ON st.owner_address = w.address | ||
WHERE | ||
st.height BETWEEN ${startHeight} AND ${endHeight} | ||
ORDER BY | ||
st.height ASC; | ||
`; | ||
|
||
await this.db.exec(`INSERT INTO tags ${sqliteQuery}`); | ||
try { | ||
await this.db.exec(`INSERT INTO transactions ${query}`); | ||
log.info('Transactions inserted into DuckDB'); | ||
} catch (error: any) { | ||
const newError = new Error('Error importing transactions'); | ||
newError.stack = error.stack; | ||
throw newError; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apply security and error handling improvements to importTransactions method
The importTransactions
method has the same issues as importBlocks
:
- SQL Injection vulnerability:
Use parameterized queries to prevent SQL injection. Replace the query execution with:
await this.db.exec(`INSERT INTO transactions ${query}`, [startHeight, endHeight]);
Ensure that ${startHeight}
and ${endHeight}
in the query string are replaced with ?
.
- Error handling:
Preserve the original error message when re-throwing. Replace the catch block with:
} catch (error) {
throw new Error('Error importing transactions', { cause: error });
}
These changes will improve security and maintain better error context.
private async importDataItems({ | ||
startHeight, | ||
endHeight, | ||
}: { | ||
startHeight: number; | ||
endHeight: number; | ||
}) { | ||
const log = this.log.child({ method: 'importDataItems' }); | ||
const query = ` | ||
SELECT | ||
sdi.id, | ||
sdi.indexed_at, | ||
NULL AS block_transaction_index, | ||
1 AS is_data_item, | ||
sdi.target, | ||
NULL AS quantity, | ||
NULL AS reward, | ||
sdi.anchor, | ||
sdi.data_size, | ||
sdi.content_type, | ||
NULL AS format, | ||
sdi.height, | ||
sdi.owner_address, | ||
NULL AS data_root, | ||
sdi.parent_id AS parent, | ||
sdi."offset", | ||
sdi.size, | ||
sdi.data_offset, | ||
sdi.owner_offset, | ||
sdi.owner_size, | ||
CASE | ||
WHEN octet_length(w.public_modulus) <= 64 THEN w.public_modulus | ||
ELSE NULL | ||
END AS owner, | ||
sdi.signature_offset, | ||
sdi.signature_size, | ||
sdi.signature_type | ||
FROM | ||
sqlite_bundles_db.stable_data_items sdi | ||
LEFT JOIN | ||
sqlite_bundles_db.wallets w ON sdi.owner_address = w.address | ||
WHERE | ||
sdi.height BETWEEN ${startHeight} AND ${endHeight} | ||
ORDER BY | ||
sdi.height ASC; | ||
`; | ||
|
||
this.log.debug('Data inserted into DuckDB'); | ||
try { | ||
await this.db.exec(`INSERT INTO transactions ${query}`); | ||
log.info('Data items inserted into DuckDB'); | ||
} catch (error: any) { | ||
const newError = new Error('Error importing data items'); | ||
newError.stack = error.stack; | ||
throw newError; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement security and error handling improvements in importDataItems method
The importDataItems
method requires the same improvements as the previous import methods:
- SQL Injection vulnerability:
Use parameterized queries to prevent SQL injection. Replace the query execution with:
await this.db.exec(`INSERT INTO transactions ${query}`, [startHeight, endHeight]);
Ensure that ${startHeight}
and ${endHeight}
in the query string are replaced with ?
.
- Error handling:
Preserve the original error message when re-throwing. Replace the catch block with:
} catch (error) {
throw new Error('Error importing data items', { cause: error });
}
These changes will enhance security and provide better error context.
private async importTransactionTags({ | ||
startHeight, | ||
endHeight, | ||
}: { | ||
startHeight: number; | ||
endHeight: number; | ||
}) { | ||
const log = this.log.child({ method: 'importTransactionTags' }); | ||
const query = ` | ||
SELECT | ||
stt.height, | ||
stt.transaction_id AS id, | ||
stt.transaction_tag_index AS tag_index, | ||
NULL AS indexed_at, | ||
tn.name AS tag_name, | ||
tv.value AS tag_value, | ||
0 AS is_data_item | ||
FROM | ||
sqlite_core_db.stable_transaction_tags stt | ||
JOIN | ||
sqlite_core_db.tag_names tn ON stt.tag_name_hash = tn.hash | ||
JOIN | ||
sqlite_core_db.tag_values tv ON stt.tag_value_hash = tv.hash | ||
WHERE | ||
stt.height BETWEEN ${startHeight} AND ${endHeight} | ||
ORDER BY | ||
stt.height ASC; | ||
`; | ||
|
||
try { | ||
await this.db.exec(`INSERT INTO tags ${query}`); | ||
log.info('Transaction tags inserted into DuckDB'); | ||
} catch (error: any) { | ||
const newError = new Error('Error importing transaction tags'); | ||
newError.stack = error.stack; | ||
throw newError; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apply security and error handling improvements to importTransactionTags method
The importTransactionTags
method needs the same improvements as the previous import methods:
- SQL Injection vulnerability:
Use parameterized queries to prevent SQL injection. Replace the query execution with:
await this.db.exec(`INSERT INTO tags ${query}`, [startHeight, endHeight]);
Ensure that ${startHeight}
and ${endHeight}
in the query string are replaced with ?
.
- Error handling:
Preserve the original error message when re-throwing. Replace the catch block with:
} catch (error) {
throw new Error('Error importing transaction tags', { cause: error });
}
These changes will improve security and maintain better error context.
private async importDataItemTags({ | ||
startHeight, | ||
endHeight, | ||
}: { | ||
startHeight: number; | ||
endHeight: number; | ||
}) { | ||
const log = this.log.child({ method: 'importDataItemTags' }); | ||
const query = ` | ||
SELECT | ||
sdit.height, | ||
sdit.data_item_id AS id, | ||
sdit.data_item_tag_index AS tag_index, | ||
sdi.indexed_at, | ||
tn.name AS tag_name, | ||
tv.value AS tag_value, | ||
1 AS is_data_item | ||
FROM | ||
sqlite_bundles_db.stable_data_item_tags sdit | ||
JOIN | ||
sqlite_bundles_db.tag_names tn ON sdit.tag_name_hash = tn.hash | ||
JOIN | ||
sqlite_bundles_db.tag_values tv ON sdit.tag_value_hash = tv.hash | ||
JOIN | ||
sqlite_bundles_db.stable_data_items sdi ON sdit.data_item_id = sdi.id | ||
WHERE | ||
sdit.height BETWEEN ${startHeight} AND ${endHeight} | ||
ORDER BY | ||
sdit.height ASC; | ||
`; | ||
|
||
await this.db.close(); | ||
try { | ||
await this.db.exec(`INSERT INTO tags ${query}`); | ||
log.info('Data item tags inserted into DuckDB'); | ||
} catch (error: any) { | ||
const newError = new Error('Error importing data item tags'); | ||
newError.stack = error.stack; | ||
throw newError; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement security and error handling improvements in importDataItemTags method
The importDataItemTags
method requires the same improvements as the previous import methods:
- SQL Injection vulnerability:
Use parameterized queries to prevent SQL injection. Replace the query execution with:
await this.db.exec(`INSERT INTO tags ${query}`, [startHeight, endHeight]);
Ensure that ${startHeight}
and ${endHeight}
in the query string are replaced with ?
.
- Error handling:
Preserve the original error message when re-throwing. Replace the catch block with:
} catch (error) {
throw new Error('Error importing data item tags', { cause: error });
}
These changes will enhance security and provide better error context.
async export({ | ||
outputDir, | ||
startHeight, | ||
endHeight, | ||
maxFileRows, | ||
}: { | ||
outputDir: string; | ||
startHeight: number; | ||
endHeight: number; | ||
maxFileRows: number; | ||
}): Promise<void> { | ||
const log = this.log.child({ method: 'export' }); | ||
|
||
if (this.isExporting) { | ||
log.error('An export is already in progress'); | ||
return; | ||
} | ||
this.isExporting = true; | ||
|
||
if (startHeight > endHeight) { | ||
log.error('startHeight must be less than or equal to endHeight'); | ||
return; | ||
} | ||
|
||
if (maxFileRows <= 0) { | ||
log.error('maxFileRows must be a positive number'); | ||
return; | ||
} | ||
|
||
if (!existsSync(outputDir)) { | ||
mkdirSync(outputDir, { recursive: true }); | ||
} | ||
|
||
try { | ||
// Import data into DuckDB | ||
await this.importBlocks({ startHeight, endHeight }); | ||
await this.importTransactions({ startHeight, endHeight }); | ||
await this.importDataItems({ startHeight, endHeight }); | ||
await this.importTransactionTags({ startHeight, endHeight }); | ||
await this.importDataItemTags({ startHeight, endHeight }); | ||
|
||
// Export data to Parquet files | ||
await this.exportToParquet({ | ||
outputDir, | ||
tableName: 'blocks', | ||
startHeight, | ||
endHeight, | ||
maxFileRows, | ||
}); | ||
|
||
await this.exportToParquet({ | ||
outputDir, | ||
tableName: 'transactions', | ||
startHeight, | ||
endHeight, | ||
maxFileRows, | ||
}); | ||
|
||
await this.exportToParquet({ | ||
outputDir, | ||
tableName: 'tags', | ||
startHeight, | ||
endHeight, | ||
maxFileRows, | ||
}); | ||
|
||
log.info('Parquet export complete'); | ||
} catch (error) { | ||
log.error('Error exporting Parquet files', { | ||
error: error instanceof Error ? error.message : String(error), | ||
stack: error instanceof Error ? error.stack : undefined, | ||
}); | ||
} finally { | ||
await this.db.close(); | ||
|
||
// Delete the duckdb file | ||
try { | ||
rmSync(this.duckDbPath, { recursive: true, force: true }); | ||
rmSync(`${this.duckDbPath}.wal`, { force: true }); | ||
} catch (error) { | ||
log.error(`Error deleting duckdb file ${this.duckDbPath}:`, error); | ||
} | ||
this.isExporting = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good overall structure, with suggestions for input validation
The export
method has a good structure for orchestrating the export process. The use of a flag to prevent concurrent exports is a good practice. However, there's room for improvement in input validation:
Consider throwing errors for invalid input parameters instead of just logging. This will allow the calling code to handle these issues more effectively. For example:
if (startHeight > endHeight) {
throw new Error('startHeight must be less than or equal to endHeight');
}
if (maxFileRows <= 0) {
throw new Error('maxFileRows must be a positive number');
}
The error handling in the catch block and the cleanup process in the finally block are well implemented. The method effectively manages the entire export process, including proper resource cleanup.
): Promise<{ minHeight: bigint; maxHeight: bigint }> { | ||
const query = ` | ||
SELECT MIN(height) as min_height, MAX(height) as max_height | ||
FROM ${tableName} | ||
`; | ||
const result = await this.db.all(query); | ||
|
||
return { | ||
minHeight: result[0].min_height, | ||
maxHeight: result[0].max_height, | ||
}; | ||
try { | ||
const result = await this.db.all(query); | ||
|
||
return { | ||
minHeight: result[0].min_height, | ||
maxHeight: result[0].max_height, | ||
}; | ||
} catch (error: any) { | ||
const newError = new Error(`Error getting height range for ${tableName}`); | ||
newError.stack = error.stack; | ||
throw newError; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve security and error handling in getHeightRange method
The getHeightRange
method makes good use of BigInt for height values, but requires improvements:
- SQL Injection vulnerability:
Use a whitelist of allowed table names to prevent SQL injection. For example:
const allowedTables = ['blocks', 'transactions', 'tags'];
if (!allowedTables.includes(tableName)) {
throw new Error(`Invalid table name: ${tableName}`);
}
- Error handling:
Preserve the original error message when re-throwing. Replace the catch block with:
} catch (error) {
throw new Error(`Error getting height range for ${tableName}`, { cause: error });
}
The use of BigInt for height values is appropriate for handling potentially large block heights.
height: bigint, | ||
): Promise<bigint> { | ||
const query = ` | ||
SELECT COUNT(*) as count | ||
FROM ${tableName} | ||
WHERE height = ${height} | ||
`; | ||
const result = await this.db.all(query); | ||
|
||
return Number(result[0].count); | ||
} | ||
try { | ||
const result = await this.db.all(query); | ||
|
||
private async truncateTable(tableName: string): Promise<void> { | ||
await this.db.exec(`TRUNCATE TABLE ${tableName};`); | ||
await this.db.exec(`CHECKPOINT ${tableName};`); | ||
return result[0].count; | ||
} catch (error: any) { | ||
const newError = new Error( | ||
`Error getting row count for height ${height} in ${tableName}`, | ||
); | ||
newError.stack = error.stack; | ||
throw newError; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhance security and error handling in getRowCountForHeight method
The getRowCountForHeight
method appropriately uses BigInt for the height value, but needs improvements:
- SQL Injection vulnerability:
Use a whitelist for table names and parameterized query for the height value:
const allowedTables = ['blocks', 'transactions', 'tags'];
if (!allowedTables.includes(tableName)) {
throw new Error(`Invalid table name: ${tableName}`);
}
const query = `
SELECT COUNT(*) as count
FROM ${tableName}
WHERE height = ?
`;
const result = await this.db.all(query, [height]);
- Error handling:
Preserve the original error message when re-throwing. Replace the catch block with:
} catch (error) {
throw new Error(`Error getting row count for height ${height} in ${tableName}`, { cause: error });
}
The use of BigInt for the height value is appropriate for handling potentially large block heights.