
Invalidate compiled script cache when it is updated #2434

Open
killme2008 opened this issue Sep 18, 2023 · 18 comments
Labels
C-bug (Category Bugs), C-enhancement (Category Enhancements), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@killme2008
Contributor

killme2008 commented Sep 18, 2023

What type of enhancement is this?

User experience, API

What does the enhancement do?

In cluster mode, when a script is updated, we have to invalidate the compiled Python script caches on the other frontends.

We need an event broadcast mechanism to do it.

Implementation challenges

No response

@killme2008 killme2008 added C-bug Category Bugs C-enhancement Category Enhancements labels Sep 18, 2023
@MichaelScofield
Collaborator

It can be done by calling Mailbox::broadcast to submit a "script invalidation request" to all the frontend nodes. There's an example to start with, see struct InvalidateCache.

@MichaelScofield MichaelScofield added the good first issue Good for newcomers label Feb 26, 2024
@killme2008 killme2008 added the help wanted Extra attention is needed label Feb 26, 2024
@xxxuuu
Contributor

xxxuuu commented Mar 24, 2024

Hi, I would like to try this one. Could you assign it to me?

@WenyXu
Member

WenyXu commented Mar 24, 2024

Hi, I would like to try this one. Could you assign it to me?

Have fun🚀

@xxxuuu
Contributor

xxxuuu commented Mar 27, 2024

Hi, I would like to try this one. Could you assign it to me?

Have fun🚀

Hi @WenyXu, I plan to communicate via Frontend --notify--> Metasrv --broadcast--> Frontend. While implementing it, I found that mailbox::send in the Frontend requires an InstructionReply, and it seems that an Instruction is always sent by Metasrv as a heartbeat response. Making the Frontend notify Metasrv seems to break this semantics and design. Should I do it this way? Could you give me any suggestions?

@WenyXu
Member

WenyXu commented Mar 28, 2024

Hi, I would like to try this one. Could you assign it to me?

Have fun🚀

Hi @WenyXu, I plan to communicate via Frontend --notify--> Metasrv --broadcast--> Frontend. While implementing it, I found that mailbox::send in the Frontend requires an InstructionReply, and it seems that an Instruction is always sent by Metasrv as a heartbeat response. Making the Frontend notify Metasrv seems to break this semantics and design. Should I do it this way? Could you give me any suggestions?

PTAL @waynexia

@WenyXu
Member

WenyXu commented Mar 29, 2024

Hi, I would like to try this one. Could you assign it to me?

Have fun🚀

Hi @WenyXu, I plan to communicate via Frontend --notify--> Metasrv --broadcast--> Frontend. While implementing it, I found that mailbox::send in the Frontend requires an InstructionReply, and it seems that an Instruction is always sent by Metasrv as a heartbeat response. Making the Frontend notify Metasrv seems to break this semantics and design. Should I do it this way? Could you give me any suggestions?

PTAL @waynexia

It seems we need to design an RPC service to accept cache invalidation requests from the Frontend. Would you like to implement this RPC service? We need another issue to track this feature.

@waynexia
Member

We still cannot ensure a correct result: the broadcast -> other frontends step is not synchronized. Let's implement this first to achieve a relaxed invalidation process.

@xxxuuu
Contributor

xxxuuu commented Mar 29, 2024

We still cannot ensure a correct result: the broadcast -> other frontends step is not synchronized. Let's implement this first to achieve a relaxed invalidation process.

In this context, what level of consistency are we expecting?

@waynexia
Member

Expect read-after-write consistency, like other insertion operations.

@xxxuuu
Contributor

xxxuuu commented Mar 29, 2024

Expect read-after-write consistency, like other insertion operations.

My plan is to add a version field to the scripts table (or use gmt_modified, but I'm not sure if there will be clock drift).

When a Frontend tries to execute a script, it needs to read the scripts table and determine whether the local cache is outdated.

If the read and write operations follow read-after-write consistency, this ensures we always see the latest data.

The cost is that we have to read the scripts table, which adds an extra RTT. However, I believe this is acceptable because I think caching the script is not intended to save this particular RTT, but rather to reduce the CPU overhead during script compilation.

Moreover, we no longer need to implement broadcasting in the MetaSrv.

Do you think this plan is feasible?

@waynexia
Member

A version should be enough. The map from script to "compiled binary" must (and does already) exist. We only need to leverage some cache invalidation mechanism over version. Having an extra read to the datanode also looks fine to me; the data that feeds into scripts also comes from the datanode, and there are ways to reduce this overhead.

This does override the need for broadcasting if we choose to implement this "strict" mode now. Can you give a more detailed per-step plan of how you would break this task down and implement it?

@waynexia
Member

cc @discord9

@xxxuuu
Contributor

xxxuuu commented Apr 1, 2024

version should be enough. The map from script to "compiled binary" must (and already) exist. We only need to leverage some cache invalidation mechanism over version. Having an extra read to datanode also looks fine to me. Data feeds into scripts also come from datanode. And there are ways to reduce this overhead.

This does override the need for broadcasting if we choose to implement this "strict" mode now Can you give a more detailed per-step plan of how you break this task down and plan to implement it?

Sure.

I have made some changes to my plan because I found a new issue. After inserting a script, it cannot be executed through SQL in other Frontends unless it is first executed through the HTTP API using /run-script. This is because scripts are only registered as UDFs in ScriptManager::compile. I previously overlooked the possibility of executing a script directly as a UDF instead of going through the /run-script API.

Taking the UDF part into consideration, here is my plan:

  1. First, add a providers field and a get_function_fallback function to FunctionRegistry. The get_function_fallback function will attempt to dynamically retrieve the Function through the providers.
  2. Then, in the QueryEngineState::udf_function function, use FUNCTION_REGISTRY.get_function_fallback to retrieve the latest UDF.
  3. The ScriptManager will implement this provider. I plan to modify the try_find_script_and_compile function to read from the scripts table each time and compare it with the locally cached version to determine whether to recompile and update the cache (ScriptManager::compiled).
  4. Of course, there will be an additional field in the cache to indicate the version. But I'm not sure whether to add a new version field to the scripts table or simply use gmt_modified. I'm concerned that adding a new field might cause some complications during the upgrade.

Now, whether it is called through the HTTP API or SQL UDF, the latest script can be executed.

@killme2008
Contributor Author

A version should be enough. The map from script to "compiled binary" must (and does already) exist. We only need to leverage some cache invalidation mechanism over version. Having an extra read to the datanode also looks fine to me; the data that feeds into scripts also comes from the datanode, and there are ways to reduce this overhead.
This does override the need for broadcasting if we choose to implement this "strict" mode now. Can you give a more detailed per-step plan of how you would break this task down and implement it?

Sure.

I have made some changes to my plan because I found a new issue. After inserting a script, it cannot be executed through SQL in other Frontends unless it is first executed through the HTTP API using /run-script. This is because scripts are only registered as UDFs in ScriptManager::compile. I previously overlooked the possibility of executing a script directly as a UDF instead of going through the /run-script API.

Taking the UDF part into consideration, here is my plan:

  1. First, add a providers field and a get_function_fallback function to FunctionRegistry. The get_function_fallback function will attempt to dynamically retrieve the Function through the providers.
  2. Then, in the QueryEngineState::udf_function function, use FUNCTION_REGISTRY.get_function_fallback to retrieve the latest UDF.
  3. The ScriptManager will implement this provider. I plan to modify the try_find_script_and_compile function to read from the scripts table each time and compare it with the locally cached version to determine whether to recompile and update the cache (ScriptManager::compiled).
  4. Of course, there will be an additional field in the cache to indicate the version. But I'm not sure whether to add a new version field to the scripts table or simply use gmt_modified. I'm concerned that adding a new field might cause some complications during the upgrade.

Now, whether it is called through the HTTP API or SQL UDF, the latest script can be executed.

Looks good to me! Let's take the first step.

@killme2008
Contributor Author

Expect read-after-write consistency, like other insertion operations.

My plan is to add a version field to the scripts table (or use gmt_modified, but I'm not sure if there will be clock drift).

When a Frontend tries to execute a script, it needs to read the scripts table and determine whether the local cache is outdated.

If the read and write operations follow read-after-write consistency, this ensures we always see the latest data.

The cost is that we have to read the scripts table, which adds an extra RTT. However, I believe this is acceptable because I think caching the script is not intended to save this particular RTT, but rather to reduce the CPU overhead during script compilation.

Moreover, we no longer need to implement broadcasting in the MetaSrv.

Do you think this plan is feasible?

A version field is great, but I wonder how to handle the atomic increment of this field; GreptimeDB doesn't currently support transactions or updates.

@xxxuuu
Contributor

xxxuuu commented Apr 1, 2024

Expect read-after-write consistency, like other insertion operations.

My plan is to add a version field to the scripts table (or use gmt_modified, but I'm not sure if there will be clock drift).
When a Frontend tries to execute a script, it needs to read the scripts table and determine whether the local cache is outdated.
If the read and write operations follow read-after-write consistency, this ensures we always see the latest data.
The cost is that we have to read the scripts table, which adds an extra RTT. However, I believe this is acceptable because I think caching the script is not intended to save this particular RTT, but rather to reduce the CPU overhead during script compilation.
Moreover, we no longer need to implement broadcasting in the MetaSrv.
Do you think this plan is feasible?

A version field is great, but I wonder how to handle the atomic increment of this field; GreptimeDB doesn't currently support transactions or updates.

This is indeed a problem, and I overlooked this point. Both options have some issues:

  • With version, concurrent updates may write the same version number, and they may also lead to version rollback.
  • Updating gmt_modified is not a 'read-modify-write' operation; it simply inserts the latest value, similar to 'last write wins'. However, it can still have similar issues during concurrency when there is clock drift between multiple frontends. If Node A has a faster clock than Node B, concurrent updates from A and B may occur, with B's write arriving later. The cache on Node A then cannot be updated, because its cached timestamp is greater than that of the data written by B.

I have browsed the documentation and found the Metasrv Distributed Lock. Perhaps it can be used to ensure atomicity when updating. The likelihood of concurrent updates to the same script is low, so lock contention should be minimal. Since users are aware of what they are doing, I believe this is acceptable.

@killme2008
Contributor Author

killme2008 commented Apr 1, 2024

I have browsed the documentation and found the Metasrv Distributed Lock. Perhaps it can be used to ensure atomicity when updating. The likelihood of concurrent updates to the same script is low, so lock contention should be minimal. Since users are aware of what they are doing, I believe this is acceptable.

@xxxuuu Yes, I believe it's ok.

@xxxuuu
Contributor

xxxuuu commented Apr 16, 2024

I don't have much time on weekdays due to work, but I'm still dedicated to this issue. In fact, I have already completed the logical implementation of the functionality, but testing has not been done yet. I will be able to finish it this week :)
