We may construct a situation in which we incur data loss in our data storage when operations in the IO queue fail.
The following is a construction of the abstracted case:
The main problem lies in the state of the cache after an IO operation has finished with an error. If the value is still present in the cache, all is well and we can try to write the data again. The problem is the eviction of data, which can happen after an operation has failed due to two factors:
The unpinning of data in the cache.
The preemptive movement of nodes from the InWriteback state to Unmodified.
We will briefly explain and discuss both below.
Unpinning of data
Entries of the cache are held in a special Pinned state while cache references to them exist. This happens transparently to the user whenever a value is retrieved from the cache and is bound to the lifetime of the returned CacheValueRef. This way we can ensure that a reference is always valid and that the value is never evicted from the cache while it is still required somewhere in the stack.
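For illustration, here is a minimal sketch of how such lifetime-bound pinning works in principle; the types are stand-ins and not the actual Cache/CacheValueRef implementation:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Stand-in for a cached node: pinned as long as at least one guard is alive.
struct Entry {
    pins: AtomicUsize,
    data: Vec<u8>,
}

// Stand-in for CacheValueRef: taking a reference pins the entry, dropping the
// reference unpins it again, transparently to the user.
struct ValueRef {
    entry: Arc<Entry>,
}

impl ValueRef {
    fn new(entry: Arc<Entry>) -> Self {
        entry.pins.fetch_add(1, Ordering::SeqCst);
        ValueRef { entry }
    }

    fn data(&self) -> &[u8] {
        &self.entry.data
    }
}

impl Drop for ValueRef {
    fn drop(&mut self) {
        self.entry.pins.fetch_sub(1, Ordering::SeqCst);
    }
}

// The cache may only evict entries that are not pinned by any live reference.
fn may_evict(entry: &Entry) -> bool {
    entry.pins.load(Ordering::SeqCst) == 0
}
```

As soon as the write-back path drops its reference to work on the compressed buffer instead, the entry becomes evictable again, which is exactly the window discussed below.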
The IO write-backs fetch the desired references from the cache and apply compression and checksumming to the data. Afterwards, this cache reference is dropped and we hold a buffer representing our changed data. This buffer is then handed to a closure in the IO queue in the following style:
```rust
fn begin_write(&self, data: Buf, offset: DiskOffset) -> Result<(), VdevError> {
    let inner = self.inner.clone();
    let (enqueue_done, wait_for_enqueue) = futures::channel::oneshot::channel();
    let write = self.inner.pool.spawn_with_handle(async move {
        wait_for_enqueue.await.unwrap();
        let res = inner
            .by_offset(offset)
            .write(data, offset.block_offset())
            .await;
        inner.write_back_queue.mark_completed(&offset).await;
        res
    })?;
    let ret = self.inner.write_back_queue.enqueue(offset, Box::pin(write));
    // Sending fails if receiver is dropped at this point,
    // which means the future
    enqueue_done
        .send(())
        .expect("Couldn't unlock enqueued write task");
    ret
}
```
After the operation has completed, successfully or not, the data value runs out of scope and is dropped. The reader might already recognize the problem with this: if no reference into the cache exists and the buffer is dropped, we lose the state of whichever data was stored there. In short, we notice an error with the storage device but may never recover or re-enqueue the write operation, as the state of the data is now corrupted.
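To make the ownership issue concrete, here is a minimal sketch (stand-in types, not the actual vdev API): the buffer is moved into the write path, so if only a plain error comes back, the caller has nothing left that it could re-enqueue.

```rust
// Stand-in types for the sketch.
type Buf = Vec<u8>;
#[derive(Debug)]
struct VdevError;

// The buffer is consumed by the write; on failure only the error is returned
// and the data that should have been persisted is dropped with it.
fn write_consuming(data: Buf) -> Result<(), VdevError> {
    let _ = data; // moved in here, dropped on the error path
    Err(VdevError) // pretend the device reported a failure
}

fn main() {
    let buf = vec![1u8; 4096];
    let res = write_consuming(buf);
    // `buf` has been moved and dropped: the same bytes cannot be written again.
    assert!(res.is_err());
}
```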
Preemptive node state movement
Another mechanism to prevent early eviction of non-persistent data are the ObjectKey variants Modified and InWriteback. The main problem here is that the transition between node states does not await the result of the IO queue; instead, as soon as the operation has been committed to the IO queue, the nodes are moved immediately, either evicted directly from the cache or re-keyed to the third ObjectKey variant Unmodified, which allows unrestricted eviction.
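For orientation, the three key variants referred to above look roughly like this; the field layout is an assumption for the sketch, not the exact definition in the code base.

```rust
// Stand-in types for the sketch.
type ModifiedObjectId = u64;
type DiskOffset = u64;
type Generation = u64;

// Rough sketch of the ObjectKey variants and their eviction semantics.
enum ObjectKeySketch {
    // Modified in memory; must not be evicted before a write back happens.
    Modified(ModifiedObjectId),
    // Currently being written back; still protected from plain eviction.
    InWriteback(ModifiedObjectId),
    // Matches the on-disk state; may be evicted without restriction.
    Unmodified {
        offset: DiskOffset,
        generation: Generation,
    },
}
```

The following snippet shows the spot where, right after the operation has been put on the IO queue, the entry is either force-removed or re-keyed to Unmodified without waiting for the IO result: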
```rust
let was_present;
{
    let mut cache = self.cache.write();
    // We can safely ignore pins.
    // If it's pinned, it must be a readonly request.
    was_present = if evict {
        cache.force_remove(&ObjectKey::InWriteback(mid), object_size)
    } else {
        cache.force_change_key(
            &ObjectKey::InWriteback(mid),
            ObjectKey::Unmodified {
                offset: obj_ptr.offset,
                generation: obj_ptr.generation,
            },
        )
    };
    if was_present {
        debug!("Inserted {mid:?} into written_back");
        self.written_back.lock().insert(mid, obj_ptr.clone());
    }
}
```
Available Avenues
This process reveals some possibilities for improvement in terms of data safety. We might either keep data pinned until the IO operation has finished successfully, or persist the compressed buffer so that the write can be attempted again at a later point. Both carry some implications for the tree structure and the interlinking between nodes, to avoid a loss of coherence between them.
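As a hedged illustration of the second avenue, persisting the compressed buffer for a later retry, the write path could hand the buffer back on failure instead of dropping it. The names and types here are stand-ins, not the existing begin_write API:

```rust
use std::collections::VecDeque;

// Stand-in types for the sketch.
type Buf = Vec<u8>;
type DiskOffset = u64;
#[derive(Debug)]
struct VdevError;

// On failure the buffer travels back together with the error, so it can be
// parked for a retry instead of being lost with the dropped future.
fn write_once(data: Buf, _offset: DiskOffset) -> Result<(), (VdevError, Buf)> {
    let device_failed = true; // pretend the device reported an error
    if device_failed {
        Err((VdevError, data)) // ownership of the buffer returns to the caller
    } else {
        Ok(())
    }
}

fn main() {
    let mut retry_queue: VecDeque<(DiskOffset, Buf)> = VecDeque::new();
    let offset = 42;
    match write_once(vec![0u8; 4096], offset) {
        Ok(()) => {}
        Err((err, buf)) => {
            eprintln!("write failed: {err:?}; keeping buffer for retry");
            retry_queue.push_back((offset, buf));
        }
    }
    assert_eq!(retry_queue.len(), 1);
}
```

Whether such a retry buffer lives next to the write_back_queue or the data simply stays pinned in the cache until the result is known is exactly the design choice between the two avenues named above.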