-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] LocalFileExtractor block #478
[FEATURE] LocalFileExtractor block #478
Conversation
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
I have read the CLA Document and I hereby sign the CLA |
For clarity, this is in Draft and you wanted to fix the high level things we spoke about first, right? Then I'll first remove myself as a reviewer and you please readd me when you think the PR is ready. @OmarFourati |
b792b9d
to
18cb347
Compare
@rhazn didn't manage to repair the pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are really close, well done!
We still have to:
- Ensure that no path traversal up can happen using regex instead of
.startsWith
(and test for it) - Get the CI pipeline to work again
Regarding the CI pipeline I am honestly not sure what is happening there, the import looks identical to the HttpExtractor
to me. But it has to work. Does it work locally for you?
// | ||
// SPDX-License-Identifier: AGPL-3.0-only | ||
|
||
import assert = require('assert'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We import import { strict as assert } from 'assert';
, not the actual assert lib of node.
rawData.buffer as ArrayBuffer, | ||
); | ||
|
||
assert(file instanceof BinaryFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this is a useful assert, it is created as binary file above so this should always be true?
); | ||
|
||
try { | ||
const rawData = await fs.readFile(filePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to add a context log message here like Successfully extraced file {path}
, similar how it is done in the HttpExtractor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
block TestBlock oftype LocalFileExtractor { | ||
filePath: './test.csv'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better naming would be ./does-not-exist.csv
which makes it obvious what the point is.
validationContext: ValidationContext, | ||
) { | ||
if (propName === 'filePath') { | ||
if (typeof propValue === 'string' && propValue.startsWith('../')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a correct check, you need to ensure there are no ..
anywhere in the path. The following path is valid for you but actually traverses up: ./../../something.csv
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to use regex here (check for the pattern ..
to exist in the string I assume? Not sure, but I am sure that is something that is easy to google). This is security relevant so has to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are tests missing for invalid file path properties that should also include edge cases. I'd suggest ../path-traversal-up.csv
and ./../../path-traversal-up.csv
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sadly don't understand exactly what or how this should be implemented. With the regex implementation, I thought all the edge cases would through the error of the path traversal.
Need a hint there.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add tests that verify that you check the property correctly. This is done already in the ZipArchiveInterpreter
we spoke about before. The two paths I mentioned in my original comment would be good examples to test, because they are both invalid and should throw errors related to invalid paths (not missing file).
I do not know what you mean by "With the regex implementation, I thought all the edge cases would through the error of the path traversal.", the test cases are to verify that is true, independently of the implementation.
// SPDX-License-Identifier: AGPL-3.0-only | ||
|
||
/** | ||
* Extracts a `File` from the local storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusing naming since localstorage
exists in browsers. I'd speak of file system.
* Extracts a `File` from the local storage. | |
* Extracts a `File` from the local file system. |
output default oftype File; | ||
|
||
/** | ||
* The PATH to the file in the local storage to extract. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why PATH
? There is a PATH
env variable on basicaly any operating system so this is needlessly confusing. Just go with path
. Same comment about localstorage
as for the block description as well.
* The PATH to the file in the local storage to extract. | |
* The path to the file in the local file system to extract. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are really close, well done!
We still have to:
- Ensure that no path traversal up can happen using regex instead of
.startsWith
(and test for it) - Get the CI pipeline to work again
Regarding the CI pipeline I am honestly not sure what is happening there, the import looks identical to the HttpExtractor
to me. But it has to work. Does it work locally for you?
Yes it does. Locally all the tests and everything runs okey. |
@@ -264,7 +264,9 @@ function checkLocalFileExtractorProperty( | |||
validationContext: ValidationContext, | |||
) { | |||
if (propName === 'filePath') { | |||
if (typeof propValue === 'string' && propValue.startsWith('../')) { | |||
const validPathRegex = /^(?!.*\.\.\/).*$/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be much easier to test for an invalid path (which is anything that does include two dots I assume but please google a bit for if that makes sense). So I'd invert this check and check for invalid paths and if that is true, throw the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add tests that verify that you check the property correctly. This is done already in the ZipArchiveInterpreter
we spoke about before. The two paths I mentioned in my original comment would be good examples to test, because they are both invalid and should throw errors related to invalid paths (not missing file).
I do not know what you mean by "With the regex implementation, I thought all the edge cases would through the error of the path traversal.", the test cases are to verify that is true, independently of the implementation.
6b16cb6
to
6904ec0
Compare
f4b57ef
to
9e5ee3a
Compare
Merge branch 'main' into feature/local-file-extractor
Moved changes from old PR that is now out of sync with main (#478) to a fresh PR, based on latest main commit. Co-authored-by: OmarFourati <[email protected]>
Closed in favor of #519 so we can work in a cleaner branch. |
Related to #393
Create logic and a new extractor block that would take care of local files
Closes #393