Backfill teams - CE #4925

aerosol · 2024-12-18T09:47:50Z

No description provided.

zoldar · 2024-12-18T10:07:44Z

lib/plausible/data_migration/backfill_teams.ex

+          site: s,
+          inviter: inv
+        }
+        # preload: [inviter: inv, site: s]


aerosol · 2024-12-18T10:11:36Z

@ruslandoga the belief here is that running this (with dry_run?: false) should be the only step required for self-hosters to update to current master. Would you mind giving it a look? Thanks

ruslandoga · 2024-12-18T10:44:07Z

👋

Could you please summarise the changes / possible problems here? I don't know what to be looking out for :)

ruslandoga · 2024-12-18T11:38:23Z

lib/plausible/data_migration/backfill_teams.ex

+        Application.get_env(:plausible, Plausible.Repo)[:url]
+      )
+
+    @repo.start(db_url, pool_size: 2 * @max_concurrency)


Can this be lowered? Some free-tier / entry-level hosted PostgreSQL instances limit the number of connections to 25, so the "normal" repo and this data migration repo would compete for available connections and flood the logs with non-actionable errors.

Yes, this is adapted from what we've been running on prod.
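One way to address the connection-limit concern would be to cap the migration pool rather than derive it from concurrency alone. This is a minimal sketch, not the code in this PR: the `PoolSizeSketch` module, its `pool_size/2` function, and the cap of 5 are all hypothetical.

```elixir
defmodule PoolSizeSketch do
  # Hypothetical helper: keeps the "2 * max_concurrency" rule from the diff,
  # but never asks for more than `cap` connections and never fewer than 1.
  def pool_size(max_concurrency, cap)
      when is_integer(max_concurrency) and is_integer(cap) do
    max(1, min(2 * max_concurrency, cap))
  end
end

# Example values are assumptions, chosen only to show the cap taking effect.
capped = PoolSizeSketch.pool_size(12, 5)
uncapped = PoolSizeSketch.pool_size(1, 5)
IO.inspect({capped, uncapped})
```

The result could then be passed as `pool_size:` when starting the migration repo, leaving headroom for the application's regular pool on small hosted databases.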

ruslandoga · 2024-12-18T11:39:39Z

lib/plausible/data_migration/backfill_teams.ex

+
+    orphaned_teams =
+      from(
+        t in Plausible.Teams.Team,


Since CE can run this migration at any point in the future, would it be safer to hard-code table names (here and in every other Ecto query) as they stand at the next CE release?

I think we have already had a similar problem with sites.

Because it's compiled within lib/, I think we'll get warnings sooner?

I don't think there would be any warnings.

> I think we have already had a similar problem with sites.

Sorry, what problem exactly?

I can't find it, but basically there was a similar query that used a schema with no select, and it was failing in CE: a column declared in the Ecto schema didn't exist in the database yet, because the migration adding that column ran afterwards. I think you were the one who fixed it by adding an explicit select: :)

The problem with these is that they only warn / raise at runtime.

Found it: #4459 (comment)

I wonder if Elixir 1.18 would surface this at compile time 🤔
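To make the suggestion concrete, here is a sketch of a query that names the table as a string and lists its columns explicitly, so it does not depend on whatever fields `Plausible.Teams.Team` declares when the migration is eventually run. The table name `"teams"` and the columns `id` and `inserted_at` are assumptions for illustration, as is the `SchemalessQuerySketch` module; `Mix.install/1` is used only so the snippet can run as a standalone script (it needs network access to fetch Ecto).

```elixir
Mix.install([{:ecto, "~> 3.0"}])

defmodule SchemalessQuerySketch do
  import Ecto.Query

  # The source is a plain string rather than a schema module, and the
  # select lists every column the migration reads. Ecto expects an
  # explicit select for schemaless sources, so a later schema change
  # cannot silently widen what this query asks the database for.
  def teams_query do
    from(t in "teams",
      select: %{id: t.id, inserted_at: t.inserted_at}
    )
  end
end

# Building the query needs no database connection.
query = SchemalessQuerySketch.teams_query()
IO.inspect(query)
```

The trade-off is that rows come back as plain maps, so any code that currently relies on struct fields or associations would need to be adapted.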

ruslandoga · 2024-12-18T11:40:26Z

lib/plausible/data_migration/backfill_teams.ex

+    backfill(dry_run?)
+  end
+
+  defp backfill(dry_run?) do


Would it benefit from being a single transaction?

Otherwise it seems like if it fails half-way, it would always fail in the future due to insert conflicts.

The only reason for multiple transactions is to speed up execution for many sites. Are there any specific insert conflicts you're predicting? We've been re-running it successfully; on subsequent executions it only catches up with records that still require attention. Setting the pool size to one should make it possible to wrap everything in a single transaction, even with parallel Tasks spawned, I think.

Any of the repo calls can raise (db pool timeout, no connection, etc.) and stop the execution, leaving the DB in a potentially inconsistent state where some records have already been inserted and inserting them again would cause a conflict. I don't know the specifics of this migration, so I can't immediately point to where exactly that can happen. But simply wrapping it in a single transaction is easy and safe.

This backfill procedure was specifically designed in a way that lets it be resumed at any point. Operations where consistency needs to be ensured across multiple DB operations are wrapped in a transaction. We can of course wrap it all up in a single transaction as well, if you prefer.

Yes, let's wrap it in a single transaction and remove async operations.

If it would be easier, I can make the changes I commented about and then re-request review from the core team to make sure the intended migration logic is preserved.
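The shape being agreed on here could look roughly like the sketch below: every step runs in order inside one transaction, with no tasks spawned. It is not the PR's implementation. `SingleTransactionSketch` and `DemoRepo` are hypothetical, and `DemoRepo` is only a stand-in so the snippet runs without a database; a real `Ecto.Repo` would also roll back everything if any step raised.

```elixir
defmodule SingleTransactionSketch do
  # `repo` is any module exposing transaction/2 shaped like
  # Ecto.Repo.transaction/2. `steps` is a list of zero-arity functions,
  # run sequentially inside a single transaction.
  def run(repo, steps) do
    repo.transaction(
      fn -> Enum.map(steps, fn step -> step.() end) end,
      # A long backfill may exceed the default transaction timeout.
      timeout: :infinity
    )
  end
end

defmodule DemoRepo do
  # Stand-in for demonstration only: it just calls the function and wraps
  # the result, without any real transactional behaviour.
  def transaction(fun, _opts) when is_function(fun, 0), do: {:ok, fun.()}
end

result =
  SingleTransactionSketch.run(DemoRepo, [
    fn -> :teams end,
    fn -> :memberships end
  ])

IO.inspect(result)
```

With a real repo, a failure half-way through would leave no partial inserts behind, which is the property asked for in this thread.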

ruslandoga · 2024-12-18T11:42:53Z

lib/plausible/data_migration/backfill_teams.ex

+        :pass
+    end)
+    |> Enum.with_index()
+    |> Task.async_stream(


Does it have to be async?

Only if we want to utilize the pool, shouldn't matter for small CE instances. I think we can remove any parallel processing then.

> I think we can remove any parallel processing then.

Let's do this, I think it would be easier to understand and debug if it goes wrong for one of the self-hosters.

ruslandoga · 2024-12-18T11:45:05Z

lib/plausible/data_migration/backfill_teams.ex

+      |> Ecto.Changeset.put_change(:updated_at, guest_invitation.updated_at)
+      |> @repo.update!()
+
+      if rem(idx, 1000) == 0 do


It can probably be lowered for CE. Or just print each site that's being updated right now.

Absolutely 👍
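Combined with the earlier agreement to drop `Task.async_stream`, the loop could become a plain sequential pass that reports every item instead of every 1000th. A minimal sketch, assuming a hypothetical `SequentialBackfillSketch` module; the domains and the `String.upcase/1` callback are placeholders for the real per-site work:

```elixir
defmodule SequentialBackfillSketch do
  # Processes items one at a time and logs each one, so a self-hoster can
  # see exactly which record was in flight if something goes wrong.
  def run(items, fun, log \\ &IO.puts/1) do
    total = length(items)

    items
    |> Enum.with_index(1)
    |> Enum.map(fn {item, idx} ->
      log.("[#{idx}/#{total}] processing #{inspect(item)}")
      fun.(item)
    end)
  end
end

# Placeholder data and callback, for illustration only.
results = SequentialBackfillSketch.run(["a.example", "b.example"], &String.upcase/1)
IO.inspect(results)
```

Passing the logger as an argument keeps the function easy to silence or redirect, which may be useful for a dry run.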

ruslandoga · 2024-12-18T11:49:05Z

Btw, does this mean Teams is ready to be released and I don't need to cherry-pick v2.1.5? Or should we still wait? If we should wait, then I think it's better to merge this migration right before the CE release to avoid problems with the schemas.

Uku said it might still be too early to include Teams in CE, even after this PR, and that we should instead wait until there are new features using the new Teams schemas. I'll keep on cherry-picking for the next release for now. I think this can stay open until the next full CE release.

aerosol · 2024-12-19T06:01:23Z

> I think this can stay open until the next full CE release.

Great, thanks. Roughly, when will that be?

aerosol · 2024-12-19T06:08:17Z

> Could you please summarise the changes / possible problems here? I don't know what to be looking out for :)

Gah I'm sorry for not providing much context.
We're working on introducing Teams to Plausible, which will, among other things, change how memberships and subscriptions are managed. That significantly affects the underlying data model. For users who are sole guests or sole owners, not much should change even at the UI level, so we've been trying to remap data from the old model to the new one gradually, without disruptions, before actually implementing any of the new features. So we switched all the reads first (to use backfilled/mirrored data) and then started writing to those new tables exclusively, silently deprecating the old schema. For CE it's hopefully much simpler: shut down, run a quick backfill, bring it back up with the upgraded code.

aerosol added 2 commits December 18, 2024 10:50

OG script

b50bf62

Adapt to CE

13d751f

aerosol force-pushed the backfill-teams-ce branch from 5b7fd98 to 13d751f Compare December 18, 2024 09:55

aerosol assigned zoldar Dec 18, 2024

zoldar reviewed Dec 18, 2024


🔥

Remove commented line

8a2d4e6

zoldar approved these changes Dec 18, 2024

aerosol requested a review from ruslandoga December 18, 2024 10:10

ruslandoga reviewed Dec 18, 2024
