-
Notifications
You must be signed in to change notification settings - Fork 6
ShannonBase Introduction
Shannonbase is the only fully managed database service that combines transactions, analytics, and machine learning services into one MySQL Database, without the complexity, latency, and cost of ETL duplication.Shannonbase will include Lakehouse, letting users query half a petabyte of data in object storage—in a variety of file formats, such as CSV, Parquet, and export files from other databases. Customers can leverage the benefits of HTAP even when their data is stored outside a MySQL database.
ShannonBase is an integrated HTAP database that adopts hybrid row-column store and in-memory computing. It is fully compatible with MySQL version 8.1.
-
Transaction workload process
It keeps all TP(Transaction workloads Process) features which MySQL has.
-
Analytical workloads process
In shannonbase, it employes a novel engnine, named IMCS(in-memory column store) to process all AP workloads.
-
rapid secondary engine.
An in-memory column store, named
rapid
secondary engine be incorporated. it's used to process AP workloads.
mysql> show engines;
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine | Support | Comment | Transactions | XA | Savepoints |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| ndbcluster | NO | Clustered, fault-tolerant tables | NULL | NULL | NULL |
| FEDERATED | NO | Federated MySQL storage engine | NULL | NULL | NULL |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO |
| InnoDB | DEFAULT | Supports transactions, row-level locking, and foreign keys | YES | YES | YES |
| PERFORMANCE_SCHEMA | YES | Performance Schema | NO | NO | NO |
| Rapid | YES | Shannon Rapid storage engine | NO | NO | NO |
| MyISAM | YES | MyISAM storage engine | NO | NO | NO |
| ndbinfo | NO | MySQL Cluster system information storage engine | NULL | NULL | NULL |
| MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO |
| BLACKHOLE | YES | /dev/null storage engine (anything you write to it disappears) | NO | NO | NO |
| CSV | YES | CSV storage engine | NO | NO | NO |
| ARCHIVE | YES | Archive storage engine | NO | NO | NO |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
- Added new catalog table
1: performance_schema.rpd_column_id
The rpd_column_id table provides information about columns of tables that are loaded in ShannBase.
The rpd_column_id table has these columns:
ID
A unique identifier for the column.
TABLE_ID
The ID of the table to which the column belongs.
COLUMN_NAME
The column name.
2: performance_schema.rpd_columns
The rpd_columns table provides column encoding information for columns of tables loaded in ShannonBase.
The rpd_columns table has these columns:
TABLE_ID
A unique identifier for the table.
COLUMN_ID
A unique identifier for the table column.
NDV
The number of distinct values in the column.
ENCODING
The type of encoding used.
DATA_PLACEMENT_INDEX
None
DICT_SIZE_BTYES
The dictionary size per column, in row count.
3:performance_schema.rpd_preload_stats
TABLE_SCHEMA
The schema name.
TABLE_NAME
The table name.
COLUMN_NAME
The column name.
AVG_BYTE_WIDTH_INC_NULL
The average byte width of the column. The average value includes NULL values.
4: performance_schema.rpd_table_id
ID
A unique identifier for the table.
NAME
The full table name including the schema.
SCHEMA_NAME
The schema name.
TABLE_NAME
The table name.
5: performance_schema.rpd_tables (Not avaiable now) ID
A unique identifier for the table.
SNAPSHOT_SCN
The system change number (SCN) of the table snapshot. The SCN is an internal number that represents a point in time according to the system logical clock that the table snapshot was transactionally consistent with the source table.
PERSISTED_SCN
The SCN up to which changes are persisted.
POOL_TYPE
The load pool type of the table. Possible values are RAPID_LOAD_POOL_SNAPSHOT and RAPID_LOAD_POOL_TRANSACTIONAL.
DATA_PLACEMENT_TYPE
The data placement type.
NROWS
The number of rows that are loaded for the table. The value is set initially when the table is loaded, and updated as changes are propagated.
LOAD_STATUS
The load status of the table. Statuses include:
NOLOAD_RPDGSTABSTATE
The table is not yet loaded.
LOADING_RPDGSTABSTATE
The table is being loaded.
AVAIL_RPDGSTABSTATE
The table is loaded and available for queries.
UNLOADING_RPDGSTABSTATE
The table is being unloaded.
INRECOVERY_RPDGSTABSTATE
The table is being recovered. After completion of the recovery operation, the table is placed back in the UNAVAIL_RPDGSTABSTATE state if there are pending recoveries.
STALE_RPDGSTABSTATE
A failure during change propagation, and the table has become stale.
UNAVAIL_RPDGSTABSTATE
The table is unavailable.
LOAD_PROGRESS
The load progress of the table expressed as a percentage value.
For Lakehouse, the following values are returned:
10%: the initialization phase is complete.
10-70%: the transformation to native HeatWave format is in progress.
70% - 80%: the transformation to native HeatWave format is complete and the aggregation phase is in progress.
80-99%: the recovery phase is in progress.
100%: data load is complete.
SIZE_BYTES
The amount of data loaded for the table, in bytes.
TRANSFORMATION_BYTES:
The total size of raw Lakehouse data transformed, in bytes.
NROWS:
The number of rows loaded to the external table.
QUERY_COUNT
The number of queries that referenced the table.
LAST_QUERIED
The timestamp of the last query that referenced the table.
LOAD_START_TIMESTAMP
The load start timestamp for the table.
LOAD_END_TIMESTAMP
The load completion timestamp for the table.
RECOVERY_SOURCE
Indicates the source of the last successful recovery for a table. The values are MySQL, that is InnoDB, or ObjectStorage.
RECOVERY_START_TIMESTAMP
The timestamp when the latest successful recovery started.
RECOVERY_END_TIMESTAMP
The timestamp when the latest successful recovery ended.
Database changed
mysql> alter table tt5 secondary_load;
Query OK, 3 rows affected (0.08 sec)
mysql> alter table tt51 secondary_load;
ERROR 1146 (42S02): Table 'test.tt51' doesn't exist
mysql> alter table tt1 secondary_load;
Query OK, 3 rows affected (0.07 sec)
mysql> select * from performance_schema.rpd_columns;
+----------+-----------+-----+----------+----------------------+-----------------+
| TABLE_ID | COLUMN_ID | NDV | ENCODING | DATA_PLACEMENT_INDEX | DICT_SIZE_BTYES |
+----------+-----------+-----+----------+----------------------+-----------------+
| 93 | 0 | 0 | N/A | 0 | 0 |
| 93 | 1 | 0 | SORTED | 0 | 25 |
| 93 | 2 | 0 | N/A | 0 | 0 |
| 93 | 3 | 0 | SORTED | 0 | 25 |
| 106 | 0 | 0 | N/A | 0 | 0 |
| 106 | 1 | 0 | N/A | 0 | 3 |
| 106 | 2 | 0 | N/A | 0 | 0 |
| 106 | 3 | 0 | N/A | 0 | 2 |
| 102 | 0 | 0 | N/A | 0 | 0 |
| 102 | 1 | 0 | N/A | 0 | 3 |
| 102 | 2 | 0 | N/A | 0 | 0 |
| 102 | 3 | 0 | N/A | 0 | 2 |
+----------+-----------+-----+----------+----------------------+-----------------+
12 rows in set (0.00 sec)
mysql> select * from performance_schema.rpd_column_id;
+----+----------+-------------+
| ID | TABLE_ID | COLUMN_NAME |
+----+----------+-------------+
| 0 | 93 | N_NATIONKEY |
| 1 | 93 | N_NAME |
| 2 | 93 | N_REGIONKEY |
| 3 | 93 | N_COMMENT |
| 0 | 106 | score |
| 1 | 106 | name |
| 2 | 106 | id |
| 3 | 106 | gender |
| 0 | 102 | score |
| 1 | 102 | name |
| 2 | 102 | id |
| 3 | 102 | gender |
+----+----------+-------------+
- How to build ShannonBase from source code step 0: Requirement Installation
1: compiling tools:
apt-get install -y g++
apt-get install -y libbison-dev
apt-get install -y flex
apt-get install -y clang-format
apt-get install -y lcov
apt-get install -y pkg-config
2: Cmake
apt-get install -y cmake
3: tools
apt-get install -y git
apt-get install -y wget
apt-get install -y tar
apt-get install -y bzip2
apt-get install -y unzip
4: openssl etc.
apt-get install -y libssl-dev
apt-get install -y libncurses-dev
apt-get install -y libudev-dev
apt-get install -y libgsasl-dev
apt-get install -y libldap-dev
5: Boost [from source code]
wget https://boostorg.jfrog.io/artifactory/main/release/1.77.0/source/boost_1_77_0.tar.bz2
tar -xvf boost_1_77_0.tar.bz2 && cd boost_1_77_0 && ./bootstrap.sh && ./b2 && ./b2 install
step 1: clone the source code from Shannonbase Repo
git clone [email protected]:Shannon-Data/ShannonBase.git
step 2: makes a directory where we build the source code.
cd ShannonBase && mkdir cmake_build && cd cmake_build
step 3: run cmake
cmake ../ \
-DWITH_BOOST=/path-to-boost-include-files/ \
-DCMAKE_BUILD_TYPE=[Release|Debug] \
-DCMAKE_INSTALL_PREFIX=/path-to-shannon-bin \
-DMYSQL_DATADIR=/home/path-to-shannon-bin/data \
-DSYSCONFDIR=. \
-DMYSQL_UNIX_ADDR=/home/path-to-shannon-bin/tmp/mysql.sock \
-DWITH_EMBEDDED_SERVER=OFF \
-DWITH_MYISAM_STORAGE_ENGINE=1 \
-DWITH_INNOBASE_STORAGE_ENGINE=1 \
-DWITH_PARTITION_STORAGE_ENGINE=1 \
-DMYSQL_TCP_PORT=3306 \
-DENABLED_LOCAL_INFILE=1 \
-DEXTRA_CHARSETS=all \
-DWITH_PROTOBUF=bundled \
-DWITH_SSL_PATH=/path-to-open-ssl/ \
-DDEFAULT_SET=community \
-DWITH_UNIT_TESTS=OFF \
[-DENABLE_GCOV=1 \ |
-DWITH_ASAN=1 \ |
]
-DCOMPILATION_COMMENT="MySQL Community Server, and Shannon Data AI Alpha V.- (GPL)"
step 4: compile and install
make -j5 && make install
step 5: initialize the database
/path-to-shannbase-bin/bin/mysqld --defaults-file=/path-to-shannonbase-bin/my.cnf --initialize --user=xxx
step 6: run shannonbase
/path-to-shannbase-bin/bin//mysqld --defaults-file=/path-to-shannonbase-bin/my.cnf --user=xxx &
- How to use Rapid engine step 1: create a table with secondary engine.
CREATE TABLE `NATION` (
`N_NATIONKEY` int NOT NULL,
`N_NAME` char(25) COLLATE ascii_bin NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED',
`N_REGIONKEY` int NOT NULL,
`N_COMMENT` varchar(152) COLLATE ascii_bin DEFAULT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED',
PRIMARY KEY (`N_NATIONKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin SECONDARY_ENGINE=RAPID
step 2: Load data into rapid engine;
alter table NATION secondary_load;
step 3: Queries.
select * from NATION;
ver 0.1, 2024-01-29
1: Rapid Supported
2: Supported Type The following types has been supported in shannon rapid engine. (will be updated in future)
- Numeric
1: Integer
2: double
- String
1: char
2: varchar
3: text
*Datetime
1: date
2: time
3: datetime
In this part, we will list all the limitation in ShannonBase.
0: Supported the version is great than 8.0. MySQL 5.7 or less is not available.
1: Rapid Engine is mandatory installed. Rapid Engine is a memory column store, used to accelerate AP workloads.
2: Supported data type in Rapid Engine: (1) int, bigint, numeric, double, float; (2) Date, time; (3) char, varchar, text.
3: A table with rapid secondary engine MUST BE has at least ONE user defined primary key, and it's type is numeric supported now.