[Dataset] Steps for building the data image #28
We currently need to build a data image for Data Science and Engineering.
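However, I personally lean toward the first approach for the build. With the first approach, the final form can keep the data separate from the container image: all OpenDigger sample datasets can share the same image, and the URLs of the table-schema file and the data file are passed in at startup, so the container pulls the data directly from OSS and imports it to complete initialization. This is highly extensible, because the sampling side only has to generate static files and upload them to OSS. With the second approach, every sample dataset needs its own new image, which would also put considerable storage pressure on the image service.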
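The startup flow described above could be sketched roughly as follows. This is only an illustration, not the project's actual script: the environment variable names `TABLE_URL` and `DATA_URL`, the helper function names, and the availability of `curl` inside the image are all assumptions; the database and table names are taken from the init script in this thread.

```shell
#!/bin/bash
set -e

# Hypothetical helper: fail early unless both URLs were passed in at startup.
# TABLE_URL and DATA_URL are assumed names; the thread does not define them.
require_urls() {
  if [ -z "${TABLE_URL:-}" ] || [ -z "${DATA_URL:-}" ]; then
    echo "TABLE_URL and DATA_URL must both be set" >&2
    return 1
  fi
}

# Hypothetical helper: download the schema and data files, then import them.
# Assumes curl is installed in the image and the data file is in Native format.
fetch_and_import() {
  require_urls
  curl -fsSL "$TABLE_URL" -o table   # table-schema file (CREATE TABLE statement)
  curl -fsSL "$DATA_URL" -o data     # sampled data file
  clickhouse client -q "CREATE DATABASE IF NOT EXISTS github_log"
  clickhouse client -m < table
  clickhouse client -q "INSERT INTO github_log.events FORMAT Native" < data
}

# Example call (not run here): fetch_and_import
```

With this shape, one image serves every sample dataset, and only the two URLs change per deployment.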
#!/bin/bash
set -e
cd /docker-entrypoint-initdb.d
LOCKFILE=inited.lock
DB=github_log
if test -f "$LOCKFILE"; then
echo "$LOCKFILE exists."
else
echo "Start to init database."
echo "Start to extract data from tar file."
tar -xzf data.tar.gz #extract data files
echo "Extract data done."
clickhouse client -q "CREATE DATABASE $DB;" # create database
clickhouse client -m < table # create table
echo "Init database done."
clickhouse client -q "INSERT INTO $DB.events FORMAT Native" < data # insert data
echo "Insert data done."
touch "$LOCKFILE" # create lock file
fi

For how to use the initialization script, see the documentation of the official image.
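Steps for building a ClickHouse sample data image:
Export the table schema to a file with SHOW CREATE TABLE github_log.events FORMAT TabSeparated OUTFILE.
Export the sampled data to a file with SELECT * FROM github_log.events WHERE ... FORMAT Native/JSONCompact OUTFILE. The WHERE clause holds the filter conditions, such as time range, sampling frequency, and repository scope. FORMAT sets the output format: Native is a binary format that takes the least space but is not human-readable, so it cannot be inspected for verification; JSONCompact is compact JSON that contains only the values, so it can be used for verification.
Originally posted by @frank-zsy in #27 (comment)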