Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] LocalFileExtractor block #478

Closed

Conversation

OmarFourati
Copy link
Collaborator

@OmarFourati OmarFourati commented Dec 1, 2023

Related to #393

Create logic and a new extractor block that would take care of local files

Closes #393

Copy link

github-actions bot commented Dec 1, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@OmarFourati
Copy link
Collaborator Author

I have read the CLA Document and I hereby sign the CLA

@OmarFourati OmarFourati marked this pull request as draft December 1, 2023 10:03
@OmarFourati OmarFourati added the softwarecampus Issues related to the Softwarecampus grant label Dec 1, 2023
@OmarFourati OmarFourati self-assigned this Dec 1, 2023
@OmarFourati OmarFourati requested a review from rhazn December 1, 2023 13:32
@rhazn rhazn removed their request for review December 4, 2023 13:42
@rhazn
Copy link
Contributor

rhazn commented Dec 4, 2023

For clarity, this is in Draft and you wanted to fix the high level things we spoke about first, right? Then I'll first remove myself as a reviewer and you please readd me when you think the PR is ready. @OmarFourati

@OmarFourati OmarFourati marked this pull request as ready for review December 21, 2023 13:40
@OmarFourati OmarFourati requested a review from rhazn December 23, 2023 15:48
@OmarFourati OmarFourati force-pushed the feature/local-file-extractor branch from b792b9d to 18cb347 Compare December 26, 2023 10:08
@OmarFourati
Copy link
Collaborator Author

@rhazn didn't manage to repair the pipeline.
It fails because of file-util not being found but it follows the exaxt logic of other files and should definitely be found.
I need help there to be able to get review on this PR

Copy link
Contributor

@rhazn rhazn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are really close, well done!

We still have to:

  • Ensure that no path traversal up can happen using regex instead of .startsWith (and test for it)
  • Get the CI pipeline to work again

Regarding the CI pipeline I am honestly not sure what is happening there, the import looks identical to the HttpExtractor to me. But it has to work. Does it work locally for you?

//
// SPDX-License-Identifier: AGPL-3.0-only

import assert = require('assert');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We import import { strict as assert } from 'assert';, not the actual assert lib of node.

rawData.buffer as ArrayBuffer,
);

assert(file instanceof BinaryFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is a useful assert, it is created as binary file above so this should always be true?

);

try {
const rawData = await fs.readFile(filePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to add a context log message here like Successfully extraced file {path}, similar how it is done in the HttpExtractor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

block TestBlock oftype LocalFileExtractor {
filePath: './test.csv';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better naming would be ./does-not-exist.csv which makes it obvious what the point is.

validationContext: ValidationContext,
) {
if (propName === 'filePath') {
if (typeof propValue === 'string' && propValue.startsWith('../')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a correct check, you need to ensure there are no .. anywhere in the path. The following path is valid for you but actually traverses up: ./../../something.csv.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to use regex here (check for the pattern .. to exist in the string I assume? Not sure, but I am sure that is something that is easy to google). This is security relevant so has to be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tests missing for invalid file path properties that should also include edge cases. I'd suggest ../path-traversal-up.csv and ./../../path-traversal-up.csv.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sadly don't understand exactly what or how this should be implemented. With the regex implementation, I thought all the edge cases would through the error of the path traversal.
Need a hint there.
Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add tests that verify that you check the property correctly. This is done already in the ZipArchiveInterpreter we spoke about before. The two paths I mentioned in my original comment would be good examples to test, because they are both invalid and should throw errors related to invalid paths (not missing file).

I do not know what you mean by "With the regex implementation, I thought all the edge cases would through the error of the path traversal.", the test cases are to verify that is true, independently of the implementation.

// SPDX-License-Identifier: AGPL-3.0-only

/**
* Extracts a `File` from the local storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing naming since localstorage exists in browsers. I'd speak of file system.

Suggested change
* Extracts a `File` from the local storage.
* Extracts a `File` from the local file system.

output default oftype File;

/**
* The PATH to the file in the local storage to extract.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why PATH? There is a PATH env variable on basicaly any operating system so this is needlessly confusing. Just go with path. Same comment about localstorage as for the block description as well.

Suggested change
* The PATH to the file in the local storage to extract.
* The path to the file in the local file system to extract.

Copy link
Contributor

@rhazn rhazn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are really close, well done!

We still have to:

  • Ensure that no path traversal up can happen using regex instead of .startsWith (and test for it)
  • Get the CI pipeline to work again

Regarding the CI pipeline I am honestly not sure what is happening there, the import looks identical to the HttpExtractor to me. But it has to work. Does it work locally for you?

@OmarFourati
Copy link
Collaborator Author

I think we are really close, well done!

We still have to:

  • Ensure that no path traversal up can happen using regex instead of .startsWith (and test for it)
  • Get the CI pipeline to work again

Regarding the CI pipeline I am honestly not sure what is happening there, the import looks identical to the HttpExtractor to me. But it has to work. Does it work locally for you?

Yes it does. Locally all the tests and everything runs okey.
I don't understand neither. the util import doesn't show any error locally as well

@OmarFourati OmarFourati requested a review from rhazn December 27, 2023 18:02
@@ -264,7 +264,9 @@ function checkLocalFileExtractorProperty(
validationContext: ValidationContext,
) {
if (propName === 'filePath') {
if (typeof propValue === 'string' && propValue.startsWith('../')) {
const validPathRegex = /^(?!.*\.\.\/).*$/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be much easier to test for an invalid path (which is anything that does include two dots I assume but please google a bit for if that makes sense). So I'd invert this check and check for invalid paths and if that is true, throw the error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add tests that verify that you check the property correctly. This is done already in the ZipArchiveInterpreter we spoke about before. The two paths I mentioned in my original comment would be good examples to test, because they are both invalid and should throw errors related to invalid paths (not missing file).

I do not know what you mean by "With the regex implementation, I thought all the edge cases would through the error of the path traversal.", the test cases are to verify that is true, independently of the implementation.

@OmarFourati OmarFourati force-pushed the feature/local-file-extractor branch from 6b16cb6 to 6904ec0 Compare January 9, 2024 11:22
@OmarFourati OmarFourati force-pushed the feature/local-file-extractor branch from f4b57ef to 9e5ee3a Compare January 12, 2024 15:25
rhazn added a commit that referenced this pull request Feb 6, 2024
Moved changes from old PR that is now out of sync with main (#478) to a fresh PR, based on latest main commit.

Co-authored-by: OmarFourati <[email protected]>
@rhazn rhazn mentioned this pull request Feb 6, 2024
@rhazn
Copy link
Contributor

rhazn commented Feb 6, 2024

Closed in favor of #519 so we can work in a cleaner branch.

@rhazn rhazn closed this Feb 6, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
softwarecampus Issues related to the Softwarecampus grant
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Extract files from local file system
2 participants