Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHM: optimize metadata #1714

Merged
merged 25 commits into from
Jan 29, 2025

Conversation

yellowhatter
Copy link
Contributor

closes #1703

Combined SHM metadata in single storage that is now identified with single descriptor.

By-the-way fixed undiscovered edge case bug with buffer validity check in the receiver code.

Improves SHM performance by 35% (1.7M -> 2.3M) that moves SHM efficiency threshold to somewhat less than 512 bytes.

Breaks backward compatibility for SHM communication. Non-SHM is not affected.

@yellowhatter yellowhatter added enhancement Existing things could work better breaking-change Indicates that the issue implies a breaking change (be it at compile time or at runtime) labels Jan 14, 2025
@yellowhatter yellowhatter self-assigned this Jan 14, 2025
@yellowhatter
Copy link
Contributor Author

Put ProtocolID and ChunkDescriptor into SHM metadata: less wire overhead and less internal structure sizes for SHM

Gains 2% more performance for SHM, but this is not the only thing: it opens a way to implement publish-before-allocated-and-written concept, that should dramatically increase SHM performance and improve latency!

@@ -0,0 +1,58 @@
//
// Copyright (c) 2023 ZettaScale Technology
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure it's 2025 for newly-created files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#[stabby::stabby]
pub struct Metadata<const S: usize> {
headers: [ChunkHeaderType; S],
watchdogs: [AtomicU64; S], // todo: replace with (S + 63) / 64 when Rust supports it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious to know more about the todo comment here. What's the intended behaviour and current limitations?

Copy link
Contributor Author

@yellowhatter yellowhatter Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watchdogs are packed as bits of u64. This means that for S headers we need S/64-sized u64 array. However, stable Rust doesn't support const arithmetic for const generic parameter like this:
watchdogs: [AtomicU64; (S + 63) / 64]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation

)
}

pub fn count(&self) -> usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If count is const, couldn't be an associated const instead of a function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in a bit different way: pub const fn
The reason is that associated constant cannot be called through object instance, only as Type::constant, so I need to carry full object type (and it is compilcated) everywhere
const fn is intended to work in the way we need

@Mallets
Copy link
Member

Mallets commented Jan 15, 2025

LGTM. I'll just wait to merge it because of the 1.1.2 release coming out tomorrow. I'd rather avoid introducing a breaking change in a patch release.

@yellowhatter
Copy link
Contributor Author

LGTM. I'll just wait to merge it because of the 1.1.2 release coming out tomorrow. I'd rather avoid introducing a breaking change in a patch release.

Sure, that's intended as I didn't put "release" label :)

@Mallets
Copy link
Member

Mallets commented Jan 21, 2025

@yellowhatter It's well understood that SHM is still marked as unstable, meaning that breaking changes may be introduced.
However, although I've already approved this PR, I was thinking wether we need to deal with this breaking change in a backward-compatible manner.
In Zenoh we have a Patch extension that identifies the actual protocol patch during the protocol handshake. Would that help, or be of any use in this case?

@yellowhatter
Copy link
Contributor Author

yellowhatter commented Jan 22, 2025

@yellowhatter It's well understood that SHM is still marked as unstable, meaning that breaking changes may be introduced. However, although I've already approved this PR, I was thinking wether we need to deal with this breaking change in a backward-compatible manner. In Zenoh we have a Patch extension that identifies the actual protocol patch during the protocol handshake. Would that help, or be of any use in this case?

@Mallets The idea is clear. I think we can add smth similar to Patch into existing SHM extension that is used to negotiate on SHM usage and supported data protocols at transport level. There will be no "backward compatibility", Zenoh will just fallback to non-SHM mode ;)

@Mallets
Copy link
Member

Mallets commented Jan 22, 2025

@Mallets The idea is clear. I think we can add smth similar to Patch into existing SHM extension that is used to negotiate on SHM usage and supported data protocols at transport level. There will be no "backward compatibility", Zenoh will just fallback to non-SHM mode ;)

I think it makes sense. This would allow us to better control the introduction of additional changes in the future.

@yellowhatter
Copy link
Contributor Author

yellowhatter commented Jan 27, 2025

@Mallets The idea is clear. I think we can add smth similar to Patch into existing SHM extension that is used to negotiate on SHM usage and supported data protocols at transport level. There will be no "backward compatibility", Zenoh will just fallback to non-SHM mode ;)

I think it makes sense. This would allow us to better control the introduction of additional changes in the future.

I have recently added versioning support. There is no additional OAM extension - I made version negotiation through SHM probing segment. That is more flexible and saves bytes on wire

self.header.header().generation.load(Ordering::SeqCst) == self.info.generation
let header = self.metadata.owned.header();

header.generation.load(Ordering::SeqCst) == self.info.generation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here two atomic operations are performed but not synchronised. What is the guarantee that watchdog_invalidated refers to the same header.generation?

It could happen that header.generation.load(Ordering::SeqCst) == self.info.generation but header.watchdog_invalidated.load(Ordering::SeqCst) (i.e. is invalidated). But between the first and the second atomic operation, the generation header could change and the watchdog_invalidated refers to a new state?

Is this a possible scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I will fix it, thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

watchdog_mask: u64,
}

unsafe impl Send for OwnedMetadataDescriptor {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the guarantees of these unsafe impls for Send and Sync traits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed these unsafe impls and made OwnedMetadataDescriptor to be Send and Sync fairly


#[inline(always)]
pub fn header(&self) -> &ChunkHeaderType {
unsafe { &(*self.header) }
Copy link
Member

@Mallets Mallets Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add # SAFETY comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed anymore, removed this unsafe

}

pub(crate) fn validate(&self) -> u64 {
unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add # SAFETY comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed anymore, removed this unsafe

}

pub fn confirm(&self) {
unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add # SAFETY comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed anymore, removed this unsafe

core::cmp::Ordering::Equal => {}
ord => return ord,
}
self.watchdog_mask.cmp(&other.watchdog_mask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very stylistic comment but this check could be moved to the line 94 as part of the Equal branch body.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second point, what would be the case that the same watchdog_atomic pointer is associated to differentwatchdog_masks?

Copy link
Contributor Author

@yellowhatter yellowhatter Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style fixed

For the masks and watchdog_atomic: watchdogs are packed as bits of u64. Mask defines the exact bit within this u64. Descriptors with the same watchdog_atomic but different masks are different descriptors. I'm thinking of storing mask as u8 carrying the exact bit position, but it is a matter of further micro optimization and should be validated

.header()
.generation
.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
let mut guard = self.available.lock().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In zenoh-core there is the zlock! macro, which allows to quickly instrument any lock in the whole Zenoh codebase if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

pub fn validate_challenge(&self, expected_challenge: AuthChallenge, s: &str) -> bool {
let challnge_in_shm = self.challenge();
if challnge_in_shm != expected_challenge {
tracing::trace!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be promoted to debug! level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made debug


let version_in_shm = unsafe { *self.array.elem(VERSION_INDEX) };
if version_in_shm != SHM_VERSION {
tracing::trace!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be promoted to debug! level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made debug

@Mallets
Copy link
Member

Mallets commented Jan 29, 2025

@yellowhatter it seems the latest changes make some tests fail in the CI.

@yellowhatter
Copy link
Contributor Author

@yellowhatter it seems the latest changes make some tests fail in the CI.

yes, I'm working on it

@yellowhatter
Copy link
Contributor Author

[closes #1563]

@Mallets Mallets merged commit 9ba08eb into eclipse-zenoh:main Jan 29, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Indicates that the issue implies a breaking change (be it at compile time or at runtime) enhancement Existing things could work better
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SHM: storages optimization
2 participants