While investigating failures of Netherite in customer code (see here), I noticed a stack trace in which FASTER threw `OutOfMemoryException`s during shutdown. This is surprising because at that point all outstanding memory operations were simply being cancelled, so I did not expect any OOMs to be thrown.
```
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.IO.BinaryReader.ReadBytes(Int32 count)
   at DurableTask.Netherite.Faster.FasterKV.Value.Serializer.Deserialize(Value& obj) in //src/DurableTask.Netherite/StorageLayer/Faster/FasterKV.cs:line 1594
   at FASTER.core.GenericAllocator`2.Deserialize(Byte* raw, Int64 ptr, Int64 untilptr, Record`2[] src, Stream stream)
   at FASTER.core.GenericAllocator`2.AsyncReadPageWithObjectsCallback[TContext](UInt32 errorCode, UInt32 numBytes, Object context)
   at DurableTask.Netherite.Faster.AzureStorageDevice.CancelAllRequests() in //src/DurableTask.Netherite/StorageLayer/Faster/AzureBlobs/AzureStorageDevice.cs:line 246
   at System.Threading.CancellationToken.<>c.b__12_0(Object obj)
   at System.Threading.CancellationTokenSource.Invoke(Delegate d, Object state, CancellationTokenSource source)
   at System.Threading.CancellationTokenSource.CallbackNode.<>c.b__9_0(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.CancellationTokenSource.CallbackNode.ExecuteCallback()
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)
```
The trace shows `AsyncReadPageWithObjectsCallback` being invoked from `AzureStorageDevice.CancelAllRequests()`, i.e. as part of cancelling a pending read. Taking a closer look at that callback, I can see that `errorCode` is essentially ignored, other than being logged. Why is it OK for this code to read and deserialize the results even though this callback is a cancellation, i.e. the read never completed?
```csharp
private void AsyncReadPageWithObjectsCallback<TContext>(uint errorCode, uint numBytes, object context)
{
    if (errorCode != 0)
    {
        logger?.LogError($"AsyncReadPageWithObjectsCallback error: {errorCode}");
    }

    PageAsyncReadResult<TContext> result = (PageAsyncReadResult<TContext>)context;

    Record<Key, Value>[] src;

    // We are reading into a frame
    if (result.frame != null)
    {
        var frame = (GenericFrame<Key, Value>)result.frame;
        src = frame.GetPage(result.page % frame.frameSize);
    }
    else
        src = values[result.page % BufferSize];

    // Deserialize all objects until untilptr
    if (result.resumePtr < result.untilPtr)
    {
        MemoryStream ms = new(result.freeBuffer2.buffer);
        ms.Seek(result.freeBuffer2.offset, SeekOrigin.Begin);
        Deserialize(result.freeBuffer1.GetValidPointer(), result.resumePtr, result.untilPtr, src, ms);
        ms.Dispose();

        result.freeBuffer2.Return();
        result.freeBuffer2 = null;
        result.resumePtr = result.untilPtr;
    }

    // If we have processed entire page, return
    if (result.untilPtr >= result.maxPtr)
    {
        result.Free();

        // Call the "real" page read callback
        result.callback(errorCode, numBytes, context);
        return;
    }

    // ... (remainder of the method not shown in this excerpt)
}
```
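For comparison, here is a minimal sketch of the behavior I would have expected: when `errorCode` is nonzero, the callback skips deserialization entirely and only forwards the error to the downstream callback. This is a standalone simulation, not FASTER code; `OnObjectPageRead`, `DeserializeObjects` and `PageReadCallback` are hypothetical stand-ins, and 995 is just an example nonzero error code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simulation: none of these names are FASTER's actual API.
var deserializeCalls = 0;
var forwarded = new List<uint>();

// Stand-in for deserializing the objects contained in a page buffer.
// A real implementation would parse the buffer; here we only count calls.
void DeserializeObjects(byte[] buffer) => deserializeCalls++;

// Stand-in for the "real" page read callback that the wrapper forwards to.
void PageReadCallback(uint errorCode, uint numBytes) => forwarded.Add(errorCode);

// Object-page read callback that only deserializes when the read succeeded.
void OnObjectPageRead(uint errorCode, uint numBytes, byte[] buffer)
{
    if (errorCode != 0)
    {
        // The buffer contents are undefined (e.g. the read was cancelled),
        // so do not deserialize; just propagate the error downstream.
        Console.WriteLine($"read failed with error {errorCode}; skipping deserialization");
        PageReadCallback(errorCode, numBytes);
        return;
    }

    DeserializeObjects(buffer);
    PageReadCallback(errorCode, numBytes);
}

OnObjectPageRead(0, 4096, new byte[4096]);  // successful read: deserializes
OnObjectPageRead(995, 0, new byte[4096]);   // failed/cancelled read: does not
```

A real fix in `GenericAllocator` would presumably still have to release the page buffers on the error path (as `result.Free()` and `freeBuffer2.Return()` do on the success path) before forwarding the error; the sketch leaves that out.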