SQL logic error when schemas out of sync #261

psviderski · 2024-09-26T11:19:07Z

When DB schema is changed (e.g. a new column is added) on node A and not on node B, inserts/updates from node A using new schema fail to apply on node B:

2024-09-26T10:59:26.534155Z ERROR corro_agent::agent::handlers: could not process multiple changes: rusqlite: SQL logic error (actor_id: Some(ActorId(f1d3880f-c3b7-4cbb-b0df-6e72a4459517)), version: None)
2024-09-26T10:59:27.033472Z ERROR process_multiple_changes:process_single_version:process_complete_version{actor_id=ActorId(f1d3880f-c3b7-4cbb-b0df-6e72a4459517) versions=Version(4)..=Version(4)}: corro_agent::agent::util: error=SQL logic error
2024-09-26T10:59:27.033544Z ERROR process_multiple_changes:process_single_version: corro_agent::agent::util: error=SQL logic error
2024-09-26T10:59:27.033636Z ERROR process_multiple_changes: corro_agent::agent::util: error=rusqlite: SQL logic error (actor_id: Some(ActorId(f1d3880f-c3b7-4cbb-b0df-6e72a4459517)), version: None)

A similar error occurs when trying to process a change for a new table that doesn't exist on node B:

2024-09-26T11:07:32.499239Z ERROR corro_agent::agent::handlers: could not process multiple changes: rusqlite: crsql - could not find the schema information for table todos (actor_id: Some(ActorId(a424b2f7-a75c-49dd-9f8e-cf1d5c4f0fc5)), version: None)
2024-09-26T11:07:47.472566Z  INFO corro_agent::agent::handlers: synced 26 changes w/ f1d3880fc3b74cbbb0df6e72a4459517 in 0.007235708s @ 3593.290387063712 changes/s
2024-09-26T11:07:47.499287Z ERROR process_multiple_changes:process_single_version:process_complete_version{actor_id=ActorId(a424b2f7-a75c-49dd-9f8e-cf1d5c4f0fc5) versions=Version(1)..=Version(1)}: corro_agent::agent::util: error=crsql - could not find the schema information for table todos
2024-09-26T11:07:47.499360Z ERROR process_multiple_changes:process_single_version: corro_agent::agent::util: error=crsql - could not find the schema information for table todos
2024-09-26T11:07:47.499462Z ERROR process_multiple_changes: corro_agent::agent::util: error=rusqlite: crsql - could not find the schema information for table todos (actor_id: Some(ActorId(a424b2f7-a75c-49dd-9f8e-cf1d5c4f0fc5)), version: None)

This behavior seems reasonable to prevent data loss. However, it requires every node in the cluster to run the same schema version.
To comply with that requirement, what should be the process for deploying schema changes to the cluster? One approach could be having another distributed table that would store the latest schema along with a separate process that would watch the latest schema and synchronize it to db.schema_paths followed by executing corrosion reload.
Is there a better way to handle this? How do you manage schema changes at Fly?

The text was updated successfully, but these errors were encountered:

jeromegn · 2024-09-26T13:32:03Z

That's a great question. Schema synchronization has been largely left to users.

At Fly: We have a repository containing Corrosion-related configuration that we deploy independently.

Usually this is a multi-step process, similar to a migration for a normal database:

Make changes to the schema
Deploy to 1 node and make sure it applies properly
Since schema changes should not be destructive w/ Corrosion, it shouldn't cause any issues as long as nothing writes with this new schema definition
Deploy to all other nodes
Deploy the change that uses the new schema (whatever writes to it)

If we do this properly, every node has the same schema and there are no errors. If there are no errors, that's ok because nodes will keep trying to synchronize versions they haven't been able to apply.

It's not perfect and a bit tedious... if you want to distribute the schema automatically via Corrosion, that sounds doable in the way you've described it, but a little scary sometimes. Automatic schema propagation was something we decided not to do to keep more explicit control of what happens to the schema.

gedw99 · 2024-10-06T02:06:13Z

Very interesting subject area. Sometimes called Schema Evolution.

Reminds me of Protocol Buffers, and the pattern of never doing a delete or rename, but instead always creating a new field. Then you have a little transformation engine that watches the Schema, and maps things into the Fields correctly. This mapper has to be Runtime, not designed time based, so that the Transformation changes when the Schema changes.

I use Benthos for that.

CDC of Schema changes is useful, because it can at least tell you "something is going to break".

You can then compensate at Design time, and anything that is dependant on that Schema.

If the things that are dependent on the Schema are generated off the Schema, that a Compensation technique.

If not, you can at least design to know what is "bound" to your Schema, and tell them. Sort of like how Github Dependant helps you know that a package has changed that your are importing.

psviderski · 2024-10-07T00:13:47Z

@gedw99 Can you please elaborate or provide some examples on how you use Benthos for managing transformations?

gedw99 · 2024-10-07T11:39:29Z

@gedw99 Can you please elaborate or provide some examples on how you use Benthos for managing transformations?

im a bit time short, but benthos has great docs for this and its very easy.

you can make custom processors in Benthos to transform anything flowing through the systems as a "stream".

https://github.com/redpanda-data/benthos

https://github.com/akhenakh/geo-benthos is a example.

In the case of a DB, JSONSchema is nice and reflective off some JSON, which you , I think, have on you API coming out.

https://docs.redpanda.com/redpanda-connect/components/processors/json_schema/

gedw99 · 2025-01-26T01:04:11Z

Hey @psviderski

Did you come up with a strategy ?

You can use nats Jetstream as a way to manage the process of doing a schema upgrade maybe .

It’s kind of good as a control plane for these sorts of things . Like an old command and control bot net , but a corrosion net :)

psviderski · 2025-01-30T03:39:00Z

Thanks for the advice! No, I haven’t come up with a strategy yet. I will eventually need a more or less automated and controlled way to deploy schema changes to a cluster of machines in uncloud project that uses corrosion as a decentralised cluster state. But at this stage I make sure the latest schema is embedded into the daemon binary. When the daemon starts, it updates the schema file for corrosion. With this in place, I just need to update all the nodes to the latest daemon version in a relatively short period of time. I’m fine with this for now.

gedw99 · 2025-01-30T05:07:39Z

Thanks @psviderski for the explanation.

I will swing back to uncloud soon . In terraform and kamal fun and games currently:)

It’s funny because every layer we make we also have to upgrade that layer , like uncloud itself

gedw99 · 2025-02-07T03:55:14Z

I will do what @jeromegn suggests in #261 (comment)

Have done similar before .

I am guessing it will also have to account for unreachable nodes too a bit like:

Any node not reachable, when it’s comes back online , will ask for latest data up to the last schema change , then get the schema change , then keep catching up .

This will require a registry that’s global and all nodes check in to I suspect ..

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL logic error when schemas out of sync #261

SQL logic error when schemas out of sync #261

psviderski commented Sep 26, 2024

jeromegn commented Sep 26, 2024

gedw99 commented Oct 6, 2024 •

edited

Loading

psviderski commented Oct 7, 2024

gedw99 commented Oct 7, 2024 •

edited

Loading

gedw99 commented Jan 26, 2025

psviderski commented Jan 30, 2025

gedw99 commented Jan 30, 2025 •

edited

Loading

gedw99 commented Feb 7, 2025

SQL logic error when schemas out of sync #261

SQL logic error when schemas out of sync #261

Comments

psviderski commented Sep 26, 2024

jeromegn commented Sep 26, 2024

gedw99 commented Oct 6, 2024 • edited Loading

psviderski commented Oct 7, 2024

gedw99 commented Oct 7, 2024 • edited Loading

gedw99 commented Jan 26, 2025

psviderski commented Jan 30, 2025

gedw99 commented Jan 30, 2025 • edited Loading

gedw99 commented Feb 7, 2025

gedw99 commented Oct 6, 2024 •

edited

Loading

gedw99 commented Oct 7, 2024 •

edited

Loading

gedw99 commented Jan 30, 2025 •

edited

Loading