You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Running on g5.8xlarge instance (AMI- amazon/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20250115) in private VPC that access S3 through gateway. Instance use long term credentials exposed via environment variables.
Mountpoint options
We are using mountpoint-s3-client directly
What happened?
While investigating an issue where users occasionally receive empty responses for GET requests (see awslabs/s3-connector-for-pytorch#293), we identified a potential race condition in the get_object logic. The issue may arise when the part_receiver is not yet ready at this point, but the subsequent request call here completes before data transfer begins. This leads to the stream being closed at this line, despite no data having been transferred to the client.
Relevant log output
I have modified source code of mountpoint-s3-client to get logs presented below. I'm going to add diff with changes after logs.2025-03-20T10:07:46.976405Z DEBUG _mountpoint_s3_client::mountpoint_s3_client_inner: Executing initial request key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" params=GetObjectParams { range: None, if_match: None, checksum_mode: None }2025-03-20T10:07:46.976626Z DEBUG get_object{id=343 bucket="imagenet-ds" key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" range=None if_match=None}: mountpoint_s3_client::s3_crt_client::get_object: new request2025-03-20T10:07:47.049884Z TRACE s3getobjectresponse{key=ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG}: mountpoint_s3_client::s3_crt_client::get_object: poll_next called, request terminated: false2025-03-20T10:07:47.050118Z TRACE get_object{id=343 bucket="imagenet-ds" key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" range=None if_match=None}: mountpoint_s3_client::s3_crt_client: body part received start=0 length=1379192025-03-20T10:07:47.050171Z TRACE get_object{id=343 bucket="imagenet-ds" key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" range=None if_match=None}: mountpoint_s3_client::s3_crt_client::get_object: passed bytes offset=0 lenght=1379192025-03-20T10:07:47.050191Z TRACE get_object{id=343 bucket="imagenet-ds" key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" range=None if_match=None}: mountpoint_s3_client::s3_crt_client: body part passed to client start=0 length=1379192025-03-20T10:07:47.050239Z DEBUG get_object{id=343 bucket="imagenet-ds" key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" range=None if_match=None}: mountpoint_s3_client::s3_crt_client: S3 request finished request_type=GetObject crt_error=None http_status=206 range=Some(0..137919) duration=72.837951ms ttfb=Some(68.210737ms) request_id=request_id2025-03-20T10:07:47.050287Z TRACE get_object{id=343 bucket="imagenet-ds" key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" range=None if_match=None}: mountpoint_s3_client::s3_crt_client: S3 request completed detailed_metrics=RequestMetrics { request_id: Some("Request_Id"), start_timestamp_ns: 1275996844397509, end_timestamp_ns: 1275996917235460, send_start_timestamp_ns: Some(1275996844961420), send_end_timestamp_ns: Some(1275996845136930), receive_start_timestamp_ns: Some(1275996913347667), receive_end_timestamp_ns: Some(1275996915398108), response_status_code: Some(206), response_headers: Some(Headers { inner: 0x7779a40204d0 }), request_path_query: Some("/ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG"), host_address: Some("bucket-name.s3.us-east-1.amazonaws.com"), ip_address: Some("ip_address"), connection_id: Some(131364326309792), thread_id: Some(ThreadId(131367090259648)), request_stream_id: Some(87), request_type: GetObject, error_code: Error(0, "aws-c-common: AWS_ERROR_SUCCESS, Success.") }2025-03-20T10:07:47.050459Z DEBUG get_object{id=343 bucket="imagenet-ds" key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" range=None if_match=None}: mountpoint_s3_client::s3_crt_client: meta request finished duration=73.354069ms2025-03-20T10:07:47.050543Z TRACE s3getobjectresponse{key=ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG}: mountpoint_s3_client::s3_crt_client::get_object: Request completed successfully2025-03-20T10:07:47.050567Z DEBUG _mountpoint_s3_client::mountpoint_s3_client_inner: Stream ended key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" params=GetObjectParams { range: None, if_match: None, checksum_mode: None }2025-03-20T10:07:47.094659Z TRACE s3getobjectresponse{key=ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG}: mountpoint_s3_client::s3_crt_client::get_object: poll_next called, request terminated: true2025-03-20T10:07:47.094748Z DEBUG _mountpoint_s3_client::mountpoint_s3_client_inner: Stream ended key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" params=GetObjectParams { range: None, if_match: None, checksum_mode: None }2025-03-20T10:07:47.094809Z TRACE s3getobjectresponse{key=ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG}: mountpoint_s3_client::s3_crt_client::get_object: poll_next called, request terminated: true2025-03-20T10:07:47.094834Z DEBUG _mountpoint_s3_client::mountpoint_s3_client_inner: Stream ended key="ILSVRC/Data/CLS-LOC/val/ILSVRC2012_val_00013144.JPEG" params=GetObjectParams { range: None, if_match: None, checksum_mode: None }Diff to get extra logs:Subject: [PATCH] Adding extra log to `get_object` pathSigned-off-by: Ilya Isaev <[email protected]>---Index: mountpoint-s3-client/src/s3_crt_client/get_object.rsIDEA additional info:Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP<+>UTF-8===================================================================diff --git a/mountpoint-s3-client/src/s3_crt_client/get_object.rs b/mountpoint-s3-client/src/s3_crt_client/get_object.rs--- a/mountpoint-s3-client/src/s3_crt_client/get_object.rs (revision 24e0743265531ea816f4d205526bc85ada78efd1)+++ b/mountpoint-s3-client/src/s3_crt_client/get_object.rs (date 1742557505469)@@ -14,6 +14,7 @@ use mountpoint_s3_crt::s3::client::{MetaRequest, MetaRequestResult}; use pin_project::pin_project; use thiserror::Error;+use tracing::{debug, error, trace, Span, debug_span}; use crate::object_client::{ Checksum, ClientBackpressureHandle, GetBodyPart, GetObjectError, GetObjectParams, ObjectClientError,@@ -105,7 +106,15 @@ } }, move |offset, data| {- let _ = part_sender.unbounded_send((offset, data.into()));+ let result = part_sender.unbounded_send((offset, data.into()));+ match result {+ Ok(_) => {+ trace!(offset=offset, lenght=data.len(), "passed bytes");+ }+ Err(err) => {+ error!(offset=offset, lenght=data.len(), error=?err, "failed to pass data");+ }+ } }, move |result| { if result.is_err() {@@ -139,6 +148,7 @@ } else { None };+ let span = debug_span!("s3getobjectresponse", key = %key); Ok(S3GetObjectResponse { request, part_receiver,@@ -146,6 +156,7 @@ backpressure_handle, headers, next_offset,+ span, }) } }@@ -198,6 +209,7 @@ headers: Headers, /// Next offset of the data to be polled from [poll_next] next_offset: u64,+ span: tracing::Span, } #[cfg_attr(not(docsrs), async_trait)]@@ -233,27 +245,39 @@ type Item = ObjectClientResult<GetBodyPart, GetObjectError, S3RequestError>; fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {- if self.request.is_terminated() {+ let this = self.project();+ let span_clone = this.span.clone();+ let _enter = span_clone.enter(); // Enter the span for this polling operation+ trace!("poll_next called, request terminated: {}", this.request.is_terminated());+ if this.request.is_terminated() { return Poll::Ready(None); }- let this = self.project();- if let Poll::Ready(Some(item)) = this.part_receiver.poll_next(cx) {+ trace!("poll_ready: request is finished"); *this.next_offset = item.0 + item.1.len() as u64; return Poll::Ready(Some(Ok(item))); } match this.request.poll(cx) {- Poll::Ready(Ok(_)) => Poll::Ready(None),- Poll::Ready(Err(e)) => Poll::Ready(Some(Err(e))),+ Poll::Ready(Ok(_)) => {+ trace!("Request completed successfully");+ Poll::Ready(None)+ }+ Poll::Ready(Err(e)) => {+ error!("Request error: {:?}", e);+ Poll::Ready(Some(Err(e)))+ } Poll::Pending => { // If the request is still not finished but the read window is not enough to poll // the next chunk we want to return error instead of keeping the request blocked. // This prevents a risk of deadlock from using the [S3CrtClient], users must implement // their own logic to block the request if they really want to block a [GetObjectRequest].+ trace!("Request is still pending"); if let Some(handle) = &this.backpressure_handle { if *this.next_offset >= handle.read_window_end_offset() {+ error!("Backpressure limit reached: next_offset={}, window_end={}",+ *this.next_offset, handle.read_window_end_offset()); return Poll::Ready(Some(Err(ObjectClientError::ClientError( S3RequestError::EmptyReadWindow, ))));
The text was updated successfully, but these errors were encountered:
…1334)
Address an issue in the `Stream` implementation for
`S3GetObjectResponse` that could immediately return `None` (i.e.
terminate the stream) when detecting that the meta request had
completed, before returning previously received parts. Reported in
#1331.
The fix changes the mechanism used to extract the response body parts
and the request completion from the meta request callbacks. Instead of
multiple independent channels, it introduces a single channel that
supports multiple `S3GetObjectEvent`s. The events in the new channel
match the order in which the callbacks are invoked, which is guaranteed
by the CRT. The events channel also includes the `Headers` event,
avoiding the need of a separate channel to await for the headers to be
returned.
When using Mountpoint, an occurrence of this issue would result in a
read request failing with an `Input/output error`, with a warning entry
in the logs containing this message:
```
mountpoint_s3_fs::fuse: read failed with errno 5: get request failed: get request terminated unexpectedly
```
Note however that we were not able to trigger the issue in our tests.
### Does this change impact existing behavior?
No.
### Does this change need a changelog entry? Does it require a version
change?
Bug fix entry and increase patch version.
---
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and I agree to the terms of
the [Developer Certificate of Origin
(DCO)](https://developercertificate.org/).
---------
Signed-off-by: Alessandro Passaro <[email protected]>
Mountpoint for Amazon S3 version
mountpoint-s3-client v0.13.0
AWS Region
us-east-1
Describe the running environment
Running on g5.8xlarge instance (AMI- amazon/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20250115) in private VPC that access S3 through gateway. Instance use long term credentials exposed via environment variables.
Mountpoint options
What happened?
While investigating an issue where users occasionally receive empty responses for GET requests (see awslabs/s3-connector-for-pytorch#293), we identified a potential race condition in the
get_object
logic. The issue may arise when thepart_receiver
is not yet ready at this point, but the subsequentrequest
call here completes before data transfer begins. This leads to the stream being closed at this line, despite no data having been transferred to the client.Relevant log output
The text was updated successfully, but these errors were encountered: