fix(sftp): make sure to delete last file when watch and delete_on_finish are enabled #3037

Draft
wants to merge 4 commits into
base: main
from

Conversation

ooesili
Contributor

@ooesili ooesili commented Nov 26, 2024

Fixes #2435

Questions

I believe I have fixed the underlying issue, but I am not sure how to write an integration test to verify the fix. I have created a new integration test function with a TODO comment on where I got stuck. The questions I have around this are:

  • My plan was to start a pipeline with watch and delete_on_finish enabled, then use an SFTP client directly to inspect which files exist on the server and make sure they are all deleted after the pipeline runs. However, I'm not sure how to actually run the pipeline. Is this too specific a test to run using integration.StreamTests(), and if not, could you point me in the right direction?
  • The other pattern I've seen is to call newSFTPReaderFromParsed() directly from the tests, then use Connect() and ReadBatch() to interact with the plugin. However, this plugin is structured unusually in the way it progresses through the input files. Connect() finds the first file and sets up the scanner for it. When the file is exhausted, ReadBatch() returns service.ErrNotConnected, which causes the engine to re-run Connect(), which advances to the next file. If the plugin only required Connect() to be called once, I would be happy to drive the plugin directly in the tests, but because of the reconnection logic required, I was hesitant to reimplement the reconnection loop in the tests. Is there a utility somewhere that I can use from a test that implements the reconnect logic?

@ooesili ooesili added bug inputs Any tasks or issues relating specifically to inputs labels Nov 26, 2024
@ooesili ooesili self-assigned this Nov 26, 2024
This commit reduces the scope of critical sections guarded by scannerMut
to remove a deadlock that causes the last file to not be deleted when
the SFTP input is used with watching enabled.
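The thread does not show the exact lock ordering involved, so the following is only an illustrative sketch of one common shape of this problem in Go, where `sync.Mutex` is not reentrant. The `fileReader` type and its fields are hypothetical and are not the plugin's code; only the idea of shrinking the critical section comes from the commit message.

```go
package main

import "sync"

// fileReader is a hypothetical reader whose state is guarded by mu.
type fileReader struct {
	mu       sync.Mutex
	current  string
	finished []string // files that have been closed (and would be deleted)
}

// closeCurrent acquires mu itself, so callers must not already hold it.
func (r *fileReader) closeCurrent() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.current != "" {
		r.finished = append(r.finished, r.current)
		r.current = ""
	}
}

// advance moves on to the next file.
//
// A wide critical section would look like this and block forever, because
// closeCurrent tries to lock mu while advance still holds it:
//
//	r.mu.Lock()
//	defer r.mu.Unlock()
//	r.closeCurrent()
//	r.current = next
//
// The narrowed version holds mu only around the state it mutates directly.
func (r *fileReader) advance(next string) {
	r.closeCurrent()
	r.mu.Lock()
	r.current = next
	r.mu.Unlock()
}
```

The trade-off is that the close and the swap are no longer one atomic step, which is acceptable only if no other goroutine can change `current` between them.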
`(*watcherPathProvider).Next()` currently uses recursion to loop until a
path is found. This commit refactors that function to use a for loop
instead, which is more straightforward to read.
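The shape of that refactor can be sketched as follows. This is a simplified illustration under assumed names: `pathQueue`, its `poll` callback, and `errNoMorePaths` are hypothetical and do not reproduce the real `watcherPathProvider`.

```go
package main

import "errors"

// errNoMorePaths is a hypothetical sentinel used only in this sketch.
var errNoMorePaths = errors.New("no more paths")

// pathQueue hands out paths one at a time, polling for more when it runs dry.
type pathQueue struct {
	pending []string
	poll    func() ([]string, error) // assumed to list newly found paths
}

// nextRecursive calls itself after every poll that may have found nothing.
func (p *pathQueue) nextRecursive() (string, error) {
	if len(p.pending) > 0 {
		path := p.pending[0]
		p.pending = p.pending[1:]
		return path, nil
	}
	paths, err := p.poll()
	if err != nil {
		return "", err
	}
	p.pending = append(p.pending, paths...)
	return p.nextRecursive()
}

// nextLoop has the same behaviour, with the retry expressed as a for loop.
func (p *pathQueue) nextLoop() (string, error) {
	for {
		if len(p.pending) > 0 {
			path := p.pending[0]
			p.pending = p.pending[1:]
			return path, nil
		}
		paths, err := p.poll()
		if err != nil {
			return "", err
		}
		p.pending = append(p.pending, paths...)
	}
}
```

Besides readability, the loop form does not add a stack frame for every empty poll, since Go does not eliminate tail calls.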
@rockwotj
Collaborator

I don't think there is a utility so either you need to do option 1 or implement the retry logic - which I don't think should be too bad?

Here's the code that drives this in benthos AFAIK: https://github.com/redpanda-data/benthos/blob/dad70374cd8fb323f0c7f47452498ea94c2ed7aa/internal/component/input/async_reader.go#L115

The pipeline option (number 1) might be the best route, but I'm not too familiar with that test helper myself.

Labels
bug inputs Any tasks or issues relating specifically to inputs
Development

Successfully merging this pull request may close these issues.

SFTP input - last file not deleted
2 participants