Skip to content

feat(common, node): best effort to emit BatchedUpdateNotification so clients know the rowid that changed, not just which table #646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

douglascayers
Copy link

@douglascayers douglascayers commented Jun 28, 2025

Background

For my use case I need to know which rows have changed when the PowerSync client downloads data from the upstream postgres database.

The watchQuery* and onChange* client methods do not provide that granular of data. The watchQuery* methods simply rerun the query returning all records rather than the ones that changed (to be fixed in #614). The onChange* methods only indicate the table names that had changes.

The PowerSync sqlite extension's updateHook method emits a tuple of database name, table name, DML operation, and rowid when a change occurs on a table. This hook provides the data emitted by the onChange* method but the node SDK is only emitting the table names, unlike the OPSQLiteConnection which emits a full BatchedUpdateNotification object.

Problems

  • The node SDK is inconsistent with OPSQLiteConnection in emitting a full BatchedUpdateNotification so subscribers can know at least which rowids were affected. As discussed in this PR, primary key IDs is preferred but ROWIDs is what's available currently.
  • The use of ControlledExecutor may miss table updates if the updates arrive quicker than the executor can run them because the executor keeps at most one call-to-run queued.
  • The onChangeWithCallback has memory leak because does not remove the abort signal's event listener.

Changes

packages/common

  • Expand WatchOnChangeEvent interface to add update: BatchedUpdateNotification
  • Replace use of ControlledExecutor with a setInterval to flush updates at most every throttleMs
  • Introduce DisposeManager class to centralize logic around calling disposers in order and managing event listeners on abort signals

packages/node

  • Update BetterSQLite3DBAdapter.ts to emit a full BatchedUpdateNotification object rather than a partial that only indicated the table names
  • Update SqliteWorker.ts to emit a full UpdateNotification object based on data from the updateHook instead of only the table name

Copy link

changeset-bot bot commented Jun 28, 2025

🦋 Changeset detected

Latest commit: 29f6253

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages
Name Type
@powersync/common Minor
@powersync/node Minor
@powersync/web Minor
@powersync/op-sqlite Patch
@powersync/react-native Patch
@powersync/tanstack-react-query Patch
@powersync/diagnostics-app Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@simolus3
Copy link
Contributor

Without having looked at the implementation in detail yet, I think this is pretty neat. However, could you share some more details or a use-case that requires access to the rowid? I can't come up with one at the top of my head, is this primarily for more fine-grained update tracking in queries? (because we're also working on that with #614).

With PowerSync, isn't the generated id column more relevant than the rowid (I know we can't get that out of the update hook that easily)? I'm also not sure if we can guarantee that the rowid for a logical PowerSync row stays the same across sync iterations (in practice, it mostly will today - but that feels like an implementation detail to me).

@douglascayers douglascayers marked this pull request as draft June 29, 2025 02:56
@douglascayers
Copy link
Author

could you share some more details or a use-case that requires access to the rowid? I can't come up with one at the top of my head, is this primarily for more fine-grained update tracking in queries? (because we're also working on that with #614).

I'm wanting to know when sqlite rows are updated by PowerSync when it pulls data from the postgres buckets.

I started by looking at the watch* methods on AbstractPowerSyncDatabase but each time the table changed then I received all records that matched the query. A problem that #614 will fix.

So I next looked at the onChange* methods which would allow me to simply monitor one or more tables. However, it was only telling me that something happened to the watched tables but not which rows, and I need to know the rows.

I dug deeper and saw that the PowerSync sqlite extensions provide a updateHook which publishes the table name, the DML operation, and the rowid. As you noted, the actual ID would be preferred, but that hook only provides the rowid. Because of technical reasons for how sqlite publishes data on hooks only to the connection that subscribed to them, I need to go through the PowerSync client to get those updates. My own separate connection won't be notified. I then saw that the AbstractPowerSyncDatabase got its table updates by subscribing to a tablesUpdated event, which, for users using @powersync/better-sqlite3 driver, publishes data from collectCommittedUpdates which publishes data from that updateHook.

Seeing that the SDK already defines the UpdateNotification interface with those properties from the update hook event, and that the OPSQLiteConnection publishes data matching that interface, I thought that it may be an oversight that the BetterSQLite3DBAdapter only published the table names.

So the goal of this PR is to get the BetterSQLite3DBAdapter to publish the update hook data like the OPSLiteConnection does so that the PowerSyncDatabase onChange* methods can publish them, too.

With PowerSync, isn't the generated id column more relevant than the rowid (I know we can't get that out of the update hook that easily)? I'm also not sure if we can guarantee that the rowid for a logical PowerSync row stays the same across sync iterations (in practice, it mostly will today - but that feels like an implementation detail to me).

Correct. The ID column is preferred. The ROWID column does change as rows are deleted/inserted, so it is not stable. But it's better than nothing at the moment for my use case.

Because the PowerSync team's effort is focused on #614 and we've noted that long-term relying on ROWID is not ideal, I'll refactor this PR to make the least amount of changes I can to bubble up the update hook data to minimally conform to the existing UpdateNotification interface and OPSQLiteConnection behavior. If you all choose to accept the change, great, if not, I still need to use it in my project which I can do via npx patch-package.

Thanks!

@douglascayers douglascayers force-pushed the node-collect-committed-updates branch from 733d9b7 to ce67260 Compare June 29, 2025 17:17
@douglascayers
Copy link
Author

@simolus3 I've refactored the approach to mitigate memory leaks and race condition that could lead to missed updates, would appreciate your follow up review at your convenience

@douglascayers douglascayers marked this pull request as ready for review June 29, 2025 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants