raw tables support (WIP) (#198)

benitav · simolus3 · web-flow · commit 4d4a4406cebc · 2025-07-17T09:30:13.000+02:00
* init

* Code snippets

* Fix JS typo

* More typos

* Make it clearer that powersync_crud is virtual

* polish

* polish

* mention raw tables in client architecture

* more fitting icon

* Consolidate message

* additional polish

* Mention release versions for JS

* More notes on migrations

* Add minimum versions for Dart and Kotlin

* Title case

---------

Co-authored-by: Simon Binder &lt;simon@journeyapps.com&gt;
diff --git a/architecture/client-architecture.mdx b/architecture/client-architecture.mdx
@@ -64,4 +64,8 @@ The client SDK maintains the following tables:
 
 Most rows will be present in at least two tables — the `ps_data__<table>` table, and in `ps_oplog`. It may be present multiple times in `ps_oplog`, if it was synced via multiple buckets.
 
-The copy in `ps_oplog` may be newer than the one in `ps_data__<table>`. Only when a full checkpoint has been downloaded, will the data be copied over to the individual tables. If multiple rows with the same table and id has been synced, only one will be preserved (the one with the highest `op_id`).
+The copy in `ps_oplog` may be newer than the one in `ps_data__<table>`. Only when a full checkpoint has been downloaded, will the data be copied over to the individual tables. If multiple rows with the same table and id has been synced, only one will be preserved (the one with the highest `op_id`).
+
+<Note>
+  If you run into limitations with the above JSON-based SQLite view system, check out [this experimental feature](/usage/use-case-examples/raw-tables) which allows you to define and manage raw SQLite tables to work around some limitations. We are actively seeking feedback about this functionality.
+</Note>
diff --git a/docs.json b/docs.json
@@ -187,6 +187,7 @@
                   "usage/use-case-examples/offline-only-usage",
                   "usage/use-case-examples/postgis",
                   "usage/use-case-examples/prioritized-sync",
+                  "usage/use-case-examples/raw-tables",
                   "usage/use-case-examples/custom-write-checkpoints"
                 ]
               },
diff --git a/usage/use-case-examples.mdx b/usage/use-case-examples.mdx
@@ -1,8 +1,6 @@
 ---
 title: "Use Case Examples"
 description: "Learn how to use PowerSync in common use cases"
-mode: wide
-sidebarTitle: Overview
 ---
 
 The following examples are available to help you get started with specific use cases for PowerSync:
@@ -19,6 +17,7 @@ The following examples are available to help you get started with specific use c
   <Card title="Local-only Usage" icon="laptop" href="/usage/use-case-examples/offline-only-usage" horizontal/>
   <Card title="PostGIS" icon="map" href="/usage/use-case-examples/postgis" horizontal/>
   <Card title="Prioritized Sync" icon="star" href="/usage/use-case-examples/prioritized-sync" horizontal/>
+  <Card title="Raw SQLite Tables" icon="table" href="/usage/use-case-examples/raw-tables" horizontal/>
 </CardGroup>
 
 ## Additional Resources
diff --git a/usage/use-case-examples/raw-tables.mdx b/usage/use-case-examples/raw-tables.mdx
@@ -0,0 +1,290 @@
+---
+title: "Raw SQLite Tables to Bypass JSON View Limitations"
+description: "Use raw tables for native SQLite functionality and improved performance."
+sidebarTitle: "Raw SQLite Tables"
+---
+
+<Warning>
+Raw tables are an experimental feature. We're actively seeking feedback on:
+
+- API design and developer experience
+- Additional features or optimizations needed
+
+Join our [Discord community](https://discord.gg/powersync) to share your experience and get help.
+</Warning>
+
+By default, PowerSync uses a [JSON-based view system](/architecture/client-architecture#schema) where data is stored schemalessly in JSON format and then presented through SQLite views based on the client-side schema. Raw tables allow you to define native SQLite tables in the client-side schema, bypassing this.
+
+This eliminates overhead associated with extracting values from the JSON data and provides access to advanced SQLite features like foreign key constraints and custom indexes.
+
+<Note>
+**Availability**
+
+Raw tables were introduced in the following versions of our client SDKs:
+- __JavaScript__ (Node: `0.8.0`, React-Native: `1.23.0`, Web: `1.24.0`)
+- __Dart__: Version 1.15.0 of `package:powersync`.
+- __Kotlin__: Version 1.3.0
+
+Also note that raw tables are only supported by the new [Rust-based sync client](https://releases.powersync.com/announcements/improved-sync-performance-in-our-client-sdks), which is currently opt-in.
+</Note>
+
+## When to Use Raw Tables
+
+Consider raw tables when you need:
+
+- **Advanced SQLite features** like `FOREIGN KEY` and `ON DELETE CASCADE` constraints
+- **Indexes** - PowerSync's default schema has basic support for indexes on columns, while raw tables give you complete control to create indexes on expressions, use `GENERATED` columns, etc
+- **Improved performance** for complex queries (e.g., `SELECT SUM(value) FROM transactions`) - raw tables more efficiently get these values directly from the SQLite column, instead of extracting the value from the JSON object on every row
+- **Reduced storage overhead** - eliminate JSON object overhead for each row in `ps_data__<table>.data` column
+- **To manually create tables** - Sometimes you need full control over table creation, for example when implementing custom triggers
+  
+## How Raw Tables Work
+
+### Current JSON-Based System
+
+Currently the sync system involves two general steps:
+
+1. Download sync bucket operations from the PowerSync Service
+2. Once the client has a complete checkpoint and no pending local changes in the upload queue, sync the local database with the bucket operations
+
+The bucket operations use JSON to store the individual operation data. The local database uses tables with a simple schemaless `ps_data__<table_name>` structure containing only an `id` (TEXT) and `data` (JSON) column.
+
+PowerSync automatically creates views on that table that extract JSON fields to resemble standard tables reflecting your schema.
+
+### Raw Tables Approach
+
+When opting in to raw tables, you are responsible for creating the tables before using them - PowerSync will no longer create them automatically.
+
+Because PowerSync takes no control over raw tables, you need to manually:
+
+1. Tell PowerSync how to map the [schemaless protocol](/architecture/powersync-protocol#protocol) to your raw tables when syncing data.
+2. Configure custom triggers to forward local writes to PowerSync.
+
+For the purpose of this example, consider a simple table like this:
+
+```sql
+CREATE TABLE todo_lists (
+   id TEXT NOT NULL PRIMARY KEY,
+   created_by TEXT NOT NULL,
+   title TEXT NOT NULL,
+   content TEXT
+) STRICT;
+```
+
+#### Syncing into raw tables
+
+To sync into the raw `todo_lists` table instead of `ps_data__`, PowerSync needs the SQL statements extracting
+columns from the untyped JSON protocol used during syncing.
+This involves specifying two SQL statements:
+
+1. A `put` SQL statement for upserts, responsible for creating a `todo_list` row or updating it based on its `id` and data columns.
+2. A `delete` SQL statement responsible for deletions.
+
+The PowerSync client as part of our SDKs will automatically run these statements in response to sync lines being sent from the PowerSync Service.
+
+To reference the ID or extract values, prepared statements with parameters are used. `delete` statements can reference the id of the affected row, while `put` statements can also reference individual column values.
+Declaring these statements and parameters happens as part of the schema passed to PowerSync databases:
+
+<Tabs>
+
+<Tab title="JavaScript">
+Raw tables are not included in the regular `Schema()` object. Instead, add them afterwards using `withRawTables`.
+For each raw table, specify the `put` and `delete` statement. The values of parameters are described as a JSON
+array either containing:
+
+- the string `Id` to reference the id of the affected row.
+- the object `{ Column: name }` to reference the value of the column `name`.
+
+```JavaScript
+const mySchema = new Schema({
+   // Define your PowerSync-managed schema here
+   // ...
+});
+mySchema.withRawTables({
+  // The name here doesn't have to match the name of the table in SQL. Instead, it's used to match
+  // the table name from the backend database as sent by the PowerSync service.
+  todo_lists: {
+    put: {
+      sql: 'INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)',
+      params: ['Id', { Column: 'created_by' }, { Column: 'title' }, { Column: 'content' }]
+    },
+    delete: {
+      sql: 'DELETE FROM lists WHERE id = ?',
+      params: ['Id']
+    }
+  }
+});
+```
+
+We will simplify this API after understanding the use-cases for raw tables better.
+</Tab>
+
+<Tab title="Dart">
+Raw tables are not part of the regular tables list and can be defined with the optional `rawTables` parameter.
+
+```dart
+final schema = Schema(const [], rawTables: const [
+  RawTable(
+    // The name here doesn't have to match the name of the table in SQL. Instead, it's used to match
+    // the table name from the backend database as sent by the PowerSync service.
+    name: 'todo_lists',
+    put: PendingStatement(
+      sql: 'INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)',
+      params: [
+        PendingStatementValue.id(),
+        PendingStatementValue.column('created_by'),
+        PendingStatementValue.column('title'),
+        PendingStatementValue.column('content'),
+      ],
+    ),
+    delete: PendingStatement(
+      sql: 'DELETE FROM todo_lists WHERE id = ?',
+      params: [
+        PendingStatementValue.id(),
+      ],
+    ),
+  ),
+]);
+```
+</Tab>
+
+<Tab title="Kotlin">
+To define a raw table, include it in the list of tables passed to the `Schema`:
+
+```Kotlin
+val schema = Schema(listOf(
+   RawTable(
+      // The name here doesn't have to match the name of the table in SQL. Instead, it's used to match
+      // the table name from the backend database as sent by the PowerSync service.
+      name = "todo_lists",
+      put = PendingStatement(
+         "INSERT OR REPLACE INTO todo_lists (id, created_by, title, content) VALUES (?, ?, ?, ?)",
+         listOf(
+            PendingStatementParameter.Id,
+            PendingStatementParameter.Column("created_by"),
+            PendingStatementParameter.Column("title"),
+            PendingStatementParameter.Column("content")
+         )
+      ),
+      delete = PendingStatement(
+         "DELETE FROM todo_lists WHERE id = ?", listOf(PendingStatementParameter.Id)
+      )
+   )
+))
+```
+</Tab>
+
+<Tab title="Swift">
+Unfortunately, raw tables are not available in the Swift SDK yet.
+</Tab>
+
+<Tab title=".NET">
+Unfortunately, raw tables are not available in the .NET SDK yet.
+</Tab>
+
+</Tabs>
+
+After adding raw tables to the schema, you're also responsible for creating them by executing the
+corresponding `CREATE TABLE` statement before `connect()`-ing the database.
+
+#### Collecting local writes on raw tables
+
+PowerSync uses an internal SQLite table to collect local writes. For PowerSync-managed views, a trigger for
+insertions, updates and deletions automatically forwards local mutations into this table.
+When using raw tables, defining those triggers is your responsibility.
+
+The [PowerSync SQLite extension](https://github.com/powersync-ja/powersync-sqlite-core) creates an insert-only virtual table named `powersync_crud` with these columns:
+
+```SQL
+CREATE VIRTUAL TABLE powersync_crud(
+   -- The type of operation: 'PUT' or 'DELETE'
+   op TEXT,
+   -- The id of the affected row
+   id TEXT,
+   type TEXT,
+   -- optional (not set on deletes): The column values for the row
+   data TEXT,
+   -- optional: Previous column values to include in a CRUD entry
+   old_values TEXT,
+   -- optional: Metadata for the write to include in a CRUD entry
+   metadata TEXT,
+);
+```
+
+The virtual table associates local mutations with the current transaction and ensures writes made during the sync
+process (applying server-side changes) don't count as local writes.
+This means that triggers can be defined on raw tables like so:
+
+```SQL
+CREATE TRIGGER todo_lists_insert
+   AFTER INSERT ON todo_lists
+   FOR EACH ROW
+   BEGIN
+      INSERT INTO powersync_crud (op, id, type, data) VALUES ('PUT', NEW.id, 'todo_lists', json_object(
+         'created_by', NEW.created_by,
+         'title', NEW.title,
+         'content', NEW.content
+      ));
+   END;
+
+CREATE TRIGGER todo_lists_update
+   AFTER INSERT ON todo_lists
+   FOR EACH ROW
+   BEGIN
+      INSERT INTO powersync_crud (op, id, type, data) VALUES ('PUT', NEW.id, 'todo_lists', json_object(
+         'created_by', NEW.created_by,
+         'title', NEW.title,
+         'content', NEW.content
+      ));
+   END;
+
+CREATE TRIGGER todo_lists_delete
+   AFTER DELETE ON todo_lists
+   FOR EACH ROW
+   BEGIN
+      INSERT INTO powersync_crud (op, id, type) VALUES ('DELETE', OLD.id, 'todo_lists');
+   END;
+```
+## Migrations
+
+In PowerSync's [JSON-based view system](/architecture/client-architecture#schema) the client-side schema is applied to the schemaless data, meaning no migrations are required. Raw tables however are excluded from this, so it is the developers responsibility to manage migrations for these tables.
+
+### Adding raw tables as a new table
+
+When you're adding new tables to your sync rules, clients will start to sync data on those tables - even if the tables aren't mentioned
+in the client's schema yet.
+So at the time you're introducing a new raw table to your app, it's possible that PowerSync has already synced some data for that
+table, which would be stored in `ps_untyped`. When adding regular tables, PowerSync will automatically extract rows from `ps_untyped`.
+With raw tables, that step is your responsibility. To copy data, run these statements in a transaction after creating the table:
+
+```
+INSERT INTO my_table (id, my_column, ...)
+   SELECT id, data ->> 'my_column' FROM ps_untyped WHERE type = 'my_table';
+DELETE FROM ps_untyped WHERE type = 'my_table';
+```
+
+This does not apply if you've been using the raw table from the beginning (and never called `connect()` without them) - you only
+need this for raw tables you already had locally.
+
+Another workaround is to clear PowerSync data when changing raw tables and opt for a full resync.
+
+### Migrating to raw tables
+
+To migrate from PowerSync-managed tables to raw tables, first:
+
+1. Open the database with the new schema mentioning raw tables. PowerSync will copy data from tables previously managed by PowerSync into
+   `ps_untyped`.
+2. Create raw tables.
+3. Run the `INSERT FROM SELECT` statement to insert `ps_untyped` data into your raw tables.
+
+### Migrations on raw tables
+
+When adding new columns to raw tables, there currently isn't a way to re-sync that table to add those columns from the server - we are
+investigating possible workarounds and encourage users to try out if they need this.
+
+To ensure the column values are accurate, you'd have to delete all data after a migration and wait for the next complete sync.
+
+## Deleting data and raw tables
+
+APIs that clear an entire PowerSync database, like e.g. `disconnectAndClear()`, don't affect raw tables.
+This should be kept in mind when you're using those methods - data from raw tables needs to be deleted explicitly.

Original file line number	Diff line number	Diff line change
`@@ -187,6 +187,7 @@`
`187`	`187`	`"usage/use-case-examples/offline-only-usage",`
`188`	`188`	`"usage/use-case-examples/postgis",`
`189`	`189`	`"usage/use-case-examples/prioritized-sync",`
	`190`	`+ "usage/use-case-examples/raw-tables",`
`190`	`191`	`"usage/use-case-examples/custom-write-checkpoints"`
`191`	`192`	`]`
`192`	`193`	`},`