Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websocket: send/receive reducer & table ids instead of names #1883

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

Centril
Copy link
Contributor

@Centril Centril commented Oct 21, 2024

Description of Changes

Receive and send reducer ids and table ids as opposed to names in the SDK.

TODO:

Fixes https://github.com/clockworklabs/SpacetimeDBPrivate/issues/1091.

API and ABI breaking changes

Yes.

@Centril Centril force-pushed the centril/websocket-light branch 2 times, most recently from c659ca4 to 7e91c80 Compare October 22, 2024 20:17
@Centril Centril force-pushed the centril/websocket-use-ids branch 4 times, most recently from 13053e0 to 9ed5071 Compare October 23, 2024 02:02
@Centril Centril force-pushed the centril/websocket-light branch from 18f4e87 to b6689e9 Compare October 23, 2024 06:56
@Centril Centril force-pushed the centril/websocket-use-ids branch from a9e63d5 to fd40008 Compare October 23, 2024 07:02
@Centril Centril force-pushed the centril/websocket-light branch from b6689e9 to 33e815c Compare October 23, 2024 07:16
@Centril Centril force-pushed the centril/websocket-use-ids branch from fd40008 to 73488dc Compare October 23, 2024 07:57
@Centril Centril force-pushed the centril/websocket-use-ids branch from 3c60243 to 10c465f Compare October 23, 2024 14:50
@Centril Centril force-pushed the centril/websocket-light branch from eafc64b to 9c85842 Compare October 23, 2024 15:04
@Centril Centril force-pushed the centril/websocket-use-ids branch from 10c465f to e6c7062 Compare October 23, 2024 15:04
@Centril Centril force-pushed the centril/websocket-light branch from 9c85842 to 01d3cd9 Compare October 30, 2024 12:50
@Centril Centril force-pushed the centril/websocket-use-ids branch from b224fd6 to 6cc4313 Compare October 30, 2024 13:50
Copy link
Contributor

@gefjon gefjon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please update the PR description to match the template.
  • I would like to see a Rust SDK test added which verifies that it's possible to construct a connection and then immediately invoke a reducer, without first waiting for the on-connect callback. I would expect this to queue the reducer call until the handshake message is received. The code appears to do this correctly, but I want a test to make sure we don't regress it in the future.

Comment on lines -315 to +317
fn generate_table(&self, module: &ModuleDef, namespace: &str, tbl: &TableDef) -> String;
fn generate_table(&self, idx: u32, module: &ModuleDef, namespace: &str, tbl: &TableDef) -> String;
fn generate_type(&self, module: &ModuleDef, namespace: &str, typ: &TypeDef) -> String;
fn generate_reducer(&self, module: &ModuleDef, namespace: &str, reducer: &ReducerDef) -> String;
fn generate_reducer(&self, idx: u32, module: &ModuleDef, namespace: &str, reducer: &ReducerDef) -> String;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change to add indices to the client codegen? I believe, but would like it confirmed, that these are not table IDs which could be sent over a WebSocket; rather, they appear to be an unrelated optimization to the SDK internals.

If that is the case, I would strongly prefer this be broken out into a separate PR, and frankly I would see very little reason to approve such a PR without measurements showing that HashMap lookups of table and reducer name strings are a significant performance overhead in the SDKs. I would also question why we'+ prefer to introduce a new ID to this interface, rather than a change purely internal to the codegen, possibly involving either generating these indices within the codegen, or possibly involving a perfect hash function.

If that is not the case, and this change is necessary or useful towards the actual goal of this PR, namely transmitting integer IDs rather than name strings over the WebSocket API, then I would like to see documentation on this trait's methods describing what these indices mean, where they come from and how they're used. I would also like the PR description amended to describe this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change to add indices to the client codegen? I believe, but would like it confirmed, that these are not table IDs which could be sent over a WebSocket; rather, they appear to be an unrelated optimization to the SDK internals.

Indeed, these are indices into the 2 arrays, one for reducer names and one for reducer ids and the same for table names <-> table ids. This seemed like the most efficient way to represent things, avoiding hash maps in many cases in favor of just Vec<_>s.

If that is the case, I would strongly prefer this be broken out into a separate PR, [...]

Sure, I can do that, but then it will likely miss the train...

[...] and frankly I would see very little reason to approve such a PR without measurements showing that HashMap lookups of table and reducer name strings are a significant performance overhead in the SDKs. I would also question why we'+ prefer to introduce a new ID to this interface, rather than a change purely internal to the codegen, possibly involving either generating these indices within the codegen, or possibly involving a perfect hash function.

Why it would need to be a significant perf overhead. It seems clear to me that it is a perf win and more predictable to identity hash an u32 and using it as an index into a vector, rather than hashing strings. I don't understand what you mean regarding a change internal to the codegen. Whatever we do, we have to keep mappings K -> ID and sometimes ID -> K in the incoming-message-loop and DbContextImpl, as the values of ID are determined at handshake. K could then be &'static str or an index into the list of reducers/tables. This PR opted for the latter for efficiency reasons.

If that is not the case, and this change is necessary or useful towards the actual goal of this PR, namely transmitting integer IDs rather than name strings over the WebSocket API, then I would like to see documentation on this trait's methods describing what these indices mean, where they come from and how they're used. I would also like the PR description amended to describe this change.

Sounds good, I will definitely add that documentation/amend the PR description.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not dispute that this is obviously better for runtime performance, but there are non-technical costs to making changes like this. For example:

  • I have a stack of outstanding PRs, ending in an actual P1 ticket with actual performance and usability implications for the SDK, which will significantly conflict with this change.
  • Reviewing this PR was made artificially and unnecessarily more difficult because it included this change, and I had to sort through which parts of the diff related to the stated goal of the PR versus which parts related to this change.
  • Introducing a third identifier for database objects, in addition to the string names and the runtime IDs, adds significant potential for confusion. It's already not great that tables have both names and IDs, with different semantics, but at least the names are of an obviously distinct type from the IDs, so there's less risk of confusing the two. I believe there to be a very real cost in code complexity and maintenance to adding another integer identifier for database objects which is not interchangeable with IDs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the additional cost that making a larger change may introduce more bugs. As the person who has to deal with all the bugs and their downstream ramifications, I can tell you that the cost is very very high. We just dealt with it most recently with the change to u256 for Identity. The nature of these things is that it always feels like it will probably be fine, but it rarely is and then we pay the price downstream.

I agree with @gefjon, please separate the PR. I don't think this one can go in, but we'll get it in eventually. Either that or we can just have this be a different version of the API and maintain both and eventually deprecate the one Lightfox is using currently.

crates/cli/src/subcommands/generate/mod.rs Show resolved Hide resolved
Comment on lines -513 to +514
let table_name = &*table.get_schema().table_name;

if !deletes.is_empty() {
let table_name = &table.get_schema().table_name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to the PR, or just a drive-by change? Is it actually an optimization, i.e. is get_schema expensive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the caveat that I haven't spent time on this PR in a few days, this does look like a drive-by change, but also trivial code motion... This is a micro optimization, shaving off a hash map lookup, that LLVM is unlikely to elide.. Mostly though, I think I mostly just wanted to move it closer to the usage, for better readability.

let table_name = &*commit_table.get_schema().table_name;

if !inserts.is_empty() {
let table_name = &commit_table.get_schema().table_name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -899,7 +899,6 @@ impl<F: FnMut(u64)> spacetimedb_commitlog::payload::txdata::Visitor for ReplayVi
reader: &mut R,
) -> std::result::Result<Self::Row, Self::Error> {
let schema = self.committed_state.schema_for_table(table_id)?;
// TODO: avoid clone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this comment removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an Arc now, so cloning is cheap and so we don't need to care anymore.

@@ -82,7 +82,7 @@ Requested namespace: {namespace}",
output.into_inner()
}

fn generate_table(&self, module: &ModuleDef, namespace: &str, table: &TableDef) -> String {
fn generate_table(&self, _idx: u32, module: &ModuleDef, namespace: &str, table: &TableDef) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does TypeScript not need or use these indices?

Copy link
Contributor Author

@Centril Centril Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit of the PR was not fully done; TS should also be using the indices. I'll get that done today.

Comment on lines -117 to +121
UnknownField { field: String, tables: Vec<Box<str>> },
UnknownField { field: String, tables: Vec<Arc<str>> },
#[error("Unknown field name: `{field}` not found in the table(s): `{tables:?}`")]
UnknownFieldName { field: FieldName, tables: Vec<Box<str>> },
UnknownFieldName { field: FieldName, tables: Vec<Arc<str>> },
#[error("Field(s): `{fields:?}` not found in the table(s): `{tables:?}`")]
UnknownFields { fields: Vec<String>, tables: Vec<Box<str>> },
UnknownFields { fields: Vec<String>, tables: Vec<Arc<str>> },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these (and similar changes from Box to Arc in this PR) related to the goal of the PR? Could they be split into a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I decided to do this at the point I remembered that we wanted to keep the names in the JSON format but not in the BSATN format. This meant that we would have had to, in most cases, clone e.g., table_name: Box<str> only to throw them away at the end. That seemed wasteful, so I decided to make these into Arcs to avoid the actual unnecessary heap allocations. This could be split into a separate PR, but that would make it harder to merge this in time though.

Comment on lines 53 to +54
/// Maps table name to a set of callbacks.
table_callbacks: HashMap<&'static str, TableCallbacks<M>>,
table_callbacks: IntMap<u32, TableCallbacks<M>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is no longer correct. It's also not clear what the actual key is, now - is it the table ID received during handshake, or the table index computed at codegen time?

Base automatically changed from centril/websocket-light to master November 4, 2024 17:19
@bfops bfops force-pushed the centril/websocket-use-ids branch from 0ff7391 to 7915b10 Compare November 4, 2024 17:37
Copy link
Collaborator

@bfops bfops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on the CLI changes that I'm a codeowner for - I did not review crates/cli/src/subcommands/generate/mod.rs because that diff is more about "code generation" than about "CLI".

@Centril
Copy link
Contributor Author

Centril commented Nov 4, 2024

Please update the PR description to match the template.

👍

  • I would like to see a Rust SDK test added which verifies that it's possible to construct a connection and then immediately invoke a reducer, without first waiting for the on-connect callback. I would expect this to queue the reducer call until the handshake message is received. The code appears to do this correctly, but I want a test to make sure we don't regress it in the future.

This is exercised by the test exec_caller_always_notified which failed before I adjusted the SDK to first always process the handshake.

@bfops
Copy link
Collaborator

bfops commented Dec 2, 2024

@Centril is this API breaking and ABI breaking? Can you add the appropriate labels?

@bfops bfops added the api-break A PR that makes an API breaking change label Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-break A PR that makes an API breaking change release-rc1-nice-to-have
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Websocket API: Establish reducer/table ids <-> names during handshake (IdentityConnected)
4 participants