Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File_info_poller send_stream() problem #656

Open
kurotych opened this issue Oct 31, 2023 · 5 comments
Open

File_info_poller send_stream() problem #656

kurotych opened this issue Oct 31, 2023 · 5 comments

Comments

@kurotych
Copy link
Member

Hello.
When I worked with the file_store library I faced a critical (as for me) issue in the file_info_poller module.
Currently, I don't use this part of the code but I think it is worth mentioning.

I had an error "Error streaming entry" due to reset connections by a peer (I kept it too long open) .

But I can't detect it because all I have "outside" is only the next() function.
So I can't distinguish if it is the end of the file or an error in a stream. In both cases, the loop finishes because next() returns None.

let mut rewards = file.into_stream(&mut tx).await?;
while let Some(reward) = rewards.next().await {
  ...
}

https://github.com/helium/oracles/blob/main/file_store/src/file_info_poller.rs#L175
image

@madninja
Copy link
Member

@kurotych do you actually have an instance where streaming from a file causes an error that ends up in a None? The primary case here would be if you're trying to decode the wrong kind of proto from the file I guess.. or the filesystem has corruption which would be very rare

@kurotych
Copy link
Member Author

kurotych commented Nov 2, 2023

Currently, I don't have any instances.

The error message was "Error streaming entry in the file of type..." so it is not a proto problem.
I believe the problem was caused by closing the connection from the AWS side (I kept it open long because of heavy computation).

I'm going to return to developing an appropriate module that can use this code next week and I'll try to prepare clear steps to reproduce.

@kurotych
Copy link
Member Author

Releated: #715

@bbalser
Copy link
Contributor

bbalser commented Feb 22, 2024

@kurotych I believe this issue is fixed with #715 , can you confirm?

@kurotych
Copy link
Member Author

kurotych commented Feb 23, 2024

Hi @bbalser. I'm not sure.

What I see is a good improvement in file_info_poller. Now you parse proto files inside file_info_poller this significantly reduces the open connection time (to AWS) so reduces the chance of occurring error described above.

But, I still see an old part of the code in main branch that caused an issue and "hides" the connection error under coupling ok() and filter_map().
That means that the chance to parse half of the file (for example) before the connection was closed (reset by aws or due to connection reason) exists.

I believe parse_file must return an error if the connection was closed/reset because it means that the file wasn't fully downloaded.
Currently, it looks like in the connection trouble case it just finishes parsing the file, prints errors to logs, and returns with Ok() losing another part of the file.

Maybe I'm wrong but I don't have the appropriate tool and time to test it.

That's why I still prefer (or have to) to use the cut version of this file, without this function.

https://github.com/helium/oracles/blob/main/file_store/src/file_info_poller.rs#L262
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants