Process to restore Clickhouse state from S3 raw events? #4763
-
Congrats team on the public launch today! I'm currently working on standing up a greenfield v3 installation of Langfuse but have a question about backup/recovery. The docs at https://langfuse.com/self-hosting/infrastructure/clickhouse#backups mention the possibility of ClickHouse state being restored from S3 data. Operationally, how is this performed? Let's say that three months into running v3 my ClickHouse instance becomes 100% corrupted and unrecoverable: could I stand up a new ClickHouse instance and have Langfuse automatically start recreating the ClickHouse state in the background from the raw events in S3?
Replies: 2 comments
-
Hey there, @yuyuma! I'm here to help you with any questions or issues you have. Let's tackle this together!

Restoring ClickHouse state from S3 raw events in Langfuse involves several steps, but how such a restore would be initiated automatically is not fully detailed in the available documentation or codebase. You can, however, make sure the configuration needed to interact with S3 is in place by setting the appropriate environment variables, including the S3 bucket name, region, access keys, and related settings [1].

The restoration process is computationally intensive and may take a significant amount of time, as mentioned in the documentation. While the exact mechanism for detecting a new ClickHouse instance and starting a restore is not explicitly documented, Langfuse does include error handling and logging for processing events from S3 into ClickHouse, which suggests some level of automation and monitoring is involved [2][3]. For a more detailed understanding, you might need to explore the codebase further or refer to the ClickHouse backup guide for additional strategies.
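For reference, here is a minimal preflight sketch of the kind of configuration Dosu is pointing at. The variable names follow the Langfuse v3 self-hosting docs as I remember them and should be verified against the configuration reference before relying on them:

```typescript
// Preflight check (sketch): verify the S3 event-upload configuration a replay
// would depend on. Variable names are taken from the Langfuse v3 self-hosting
// docs from memory -- double-check them against the configuration reference.
const requiredVars = [
  "LANGFUSE_S3_EVENT_UPLOAD_BUCKET",
  "LANGFUSE_S3_EVENT_UPLOAD_REGION",
  "LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID",
  "LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY",
  // Optional depending on your setup, e.g. a custom endpoint or key prefix:
  // "LANGFUSE_S3_EVENT_UPLOAD_ENDPOINT",
  // "LANGFUSE_S3_EVENT_UPLOAD_PREFIX",
];

const missing = requiredVars.filter((name) => !process.env[name]);
if (missing.length > 0) {
  throw new Error(`Missing S3 event upload configuration: ${missing.join(", ")}`);
}
```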
-
Hey @yuyuma ,
Right now, it's technically possible (and we've used it manually at times), but not automated anywhere within Langfuse. Our approach so far was to extract all events that happened within a given timeframe - via our application logs or the S3 access logs - and to emit a new message into the ingestion-queue on Redis. The worker will pick it up and perform a full merge of all updates that reside within S3.
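To make that concrete, here is a hedged sketch of what such a replay script could look like. It assumes a BullMQ-style queue on Redis; the queue name `ingestion-queue` and the payload shape are illustrative assumptions, and the real definitions should be taken from the Langfuse worker codebase:

```typescript
import { Queue } from "bullmq";

// Hypothetical sketch: re-enqueue ingestion work for events identified from
// application logs or S3 access logs. Queue name and payload shape below are
// assumptions -- look up the actual ones in the Langfuse worker before use.
type ReplayEvent = {
  projectId: string;
  entityType: string; // e.g. "trace" | "observation" | "score" (assumed)
  entityId: string;
};

async function replayEvents(events: ReplayEvent[]) {
  const queue = new Queue("ingestion-queue", {
    connection: { host: process.env.REDIS_HOST ?? "localhost", port: 6379 },
  });

  for (const event of events) {
    // The worker is expected to read every update for this entity from S3
    // and perform a full merge into ClickHouse.
    await queue.add("replay", event);
  }

  await queue.close();
}
```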
One could also do it by recursively iterating through all directories within S3, but that would probably be an expensive operation given the S3 API pricing.
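For completeness, a minimal sketch of that brute-force walk using the AWS SDK v3 paginator (bucket and prefix are placeholders for your event-upload settings):

```typescript
import { S3Client, paginateListObjectsV2 } from "@aws-sdk/client-s3";

// Sketch: walk every object under the raw-event prefix. Each underlying LIST
// request returns at most 1,000 keys and is billed per request, which is why
// this gets expensive on large buckets.
async function listAllEventKeys(bucket: string, prefix: string): Promise<string[]> {
  const client = new S3Client({});
  const keys: string[] = [];

  for await (const page of paginateListObjectsV2({ client }, { Bucket: bucket, Prefix: prefix })) {
    for (const object of page.Contents ?? []) {
      if (object.Key) keys.push(object.Key);
    }
  }
  return keys;
}
```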
Overall, we strongly recommend performing ClickHouse backups on a regular basis (https://clickhouse.com/docs/en/operations/backup for self-hosters), as that will be a significantly faster and cheaper way to reproduce a previous, healthy state. Let me know if you have any further questions.