Skip to content

ShannonBase Introduction

Shannon Data AI Lab edited this page Feb 29, 2024 · 15 revisions

Welcome to the ShannonBase

Introduction

Shannonbase is the only fully managed database service that combines transactions, analytics, and machine learning services into one MySQL Database, without the complexity, latency, and cost of ETL duplication.Shannonbase will include Lakehouse, letting users query half a petabyte of data in object storage—in a variety of file formats, such as CSV, Parquet, and export files from other databases. Customers can leverage the benefits of HTAP even when their data is stored outside a MySQL database.

ShannonBase is an integrated HTAP database that adopts hybrid row-column store and in-memory computing. It is fully compatible with MySQL version 8.1.

image

Features

  • Transaction workload process

    It keeps all TP(Transaction workloads Process) features which MySQL has.

  • Analytical workloads process

    In shannonbase, it employes a novel engnine, named IMCS(in-memory column store) to process all AP workloads.

  • rapid secondary engine.

    An in-memory column store, named rapid secondary engine be incorporated. it's used to process AP workloads.

mysql> show engines; 
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                        | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| ndbcluster         | NO      | Clustered, fault-tolerant tables                               | NULL         | NULL | NULL       |
| FEDERATED          | NO      | Federated MySQL storage engine                                 | NULL         | NULL | NULL       |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables      | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, and foreign keys     | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                             | NO           | NO   | NO         |
| Rapid              | YES     | Shannon Rapid storage engine                                   | NO           | NO   | NO         |
| MyISAM             | YES     | MyISAM storage engine                                          | NO           | NO   | NO         |
| ndbinfo            | NO      | MySQL Cluster system information storage engine                | NULL         | NULL | NULL       |
| MRG_MYISAM         | YES     | Collection of identical MyISAM tables                          | NO           | NO   | NO         |
| BLACKHOLE          | YES     | /dev/null storage engine (anything you write to it disappears) | NO           | NO   | NO         |
| CSV                | YES     | CSV storage engine                                             | NO           | NO   | NO         |
| ARCHIVE            | YES     | Archive storage engine                                         | NO           | NO   | NO         |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+

  • Added new catalog table

1: performance_schema.rpd_column_id

The rpd_column_id table provides information about columns of tables that are loaded in ShannBase.

The rpd_column_id table has these columns:

ID

A unique identifier for the column.

TABLE_ID

The ID of the table to which the column belongs.

COLUMN_NAME

The column name.

2: performance_schema.rpd_columns

The rpd_columns table provides column encoding information for columns of tables loaded in ShannonBase.

The rpd_columns table has these columns:

TABLE_ID

A unique identifier for the table.

COLUMN_ID

A unique identifier for the table column.

NDV

The number of distinct values in the column.

ENCODING

The type of encoding used.

DATA_PLACEMENT_INDEX

None

DICT_SIZE_BTYES

The dictionary size per column, in row count.

3:performance_schema.rpd_preload_stats

TABLE_SCHEMA

The schema name.

TABLE_NAME

The table name.

COLUMN_NAME

The column name.

AVG_BYTE_WIDTH_INC_NULL

The average byte width of the column. The average value includes NULL values.

4: performance_schema.rpd_table_id

ID

A unique identifier for the table.

NAME

The full table name including the schema.

SCHEMA_NAME

The schema name.

TABLE_NAME

The table name.

5: performance_schema.rpd_tables (Not avaiable now) ID

A unique identifier for the table.

SNAPSHOT_SCN

The system change number (SCN) of the table snapshot. The SCN is an internal number that represents a point in time according to the system logical clock that the table snapshot was transactionally consistent with the source table.

PERSISTED_SCN

The SCN up to which changes are persisted.

POOL_TYPE

The load pool type of the table. Possible values are RAPID_LOAD_POOL_SNAPSHOT and RAPID_LOAD_POOL_TRANSACTIONAL.

DATA_PLACEMENT_TYPE

The data placement type.

NROWS

The number of rows that are loaded for the table. The value is set initially when the table is loaded, and updated as changes are propagated.

LOAD_STATUS

The load status of the table. Statuses include:

NOLOAD_RPDGSTABSTATE

The table is not yet loaded.

LOADING_RPDGSTABSTATE

The table is being loaded.

AVAIL_RPDGSTABSTATE

The table is loaded and available for queries.

UNLOADING_RPDGSTABSTATE

The table is being unloaded.

INRECOVERY_RPDGSTABSTATE

The table is being recovered. After completion of the recovery operation, the table is placed back in the UNAVAIL_RPDGSTABSTATE state if there are pending recoveries.

STALE_RPDGSTABSTATE

A failure during change propagation, and the table has become stale.

UNAVAIL_RPDGSTABSTATE

The table is unavailable.

LOAD_PROGRESS

The load progress of the table expressed as a percentage value.

For Lakehouse, the following values are returned:

10%: the initialization phase is complete.

10-70%: the transformation to native HeatWave format is in progress.

70% - 80%: the transformation to native HeatWave format is complete and the aggregation phase is in progress.

80-99%: the recovery phase is in progress.

100%: data load is complete.

SIZE_BYTES

The amount of data loaded for the table, in bytes.

TRANSFORMATION_BYTES:

The total size of raw Lakehouse data transformed, in bytes.

NROWS:

The number of rows loaded to the external table.

QUERY_COUNT

The number of queries that referenced the table.

LAST_QUERIED

The timestamp of the last query that referenced the table.

LOAD_START_TIMESTAMP

The load start timestamp for the table.

LOAD_END_TIMESTAMP

The load completion timestamp for the table.

RECOVERY_SOURCE

Indicates the source of the last successful recovery for a table. The values are MySQL, that is InnoDB, or ObjectStorage.

RECOVERY_START_TIMESTAMP

The timestamp when the latest successful recovery started.

RECOVERY_END_TIMESTAMP

The timestamp when the latest successful recovery ended.

Database changed
mysql> alter table tt5 secondary_load;
Query OK, 3 rows affected (0.08 sec)

mysql> alter table tt51 secondary_load;
ERROR 1146 (42S02): Table 'test.tt51' doesn't exist
mysql> alter table tt1 secondary_load;
Query OK, 3 rows affected (0.07 sec)

mysql> select * from performance_schema.rpd_columns;
+----------+-----------+-----+----------+----------------------+-----------------+
| TABLE_ID | COLUMN_ID | NDV | ENCODING | DATA_PLACEMENT_INDEX | DICT_SIZE_BTYES |
+----------+-----------+-----+----------+----------------------+-----------------+
|       93 |         0 |   0 | N/A      |                    0 |               0 |
|       93 |         1 |   0 | SORTED   |                    0 |              25 |
|       93 |         2 |   0 | N/A      |                    0 |               0 |
|       93 |         3 |   0 | SORTED   |                    0 |              25 |
|      106 |         0 |   0 | N/A      |                    0 |               0 |
|      106 |         1 |   0 | N/A      |                    0 |               3 |
|      106 |         2 |   0 | N/A      |                    0 |               0 |
|      106 |         3 |   0 | N/A      |                    0 |               2 |
|      102 |         0 |   0 | N/A      |                    0 |               0 |
|      102 |         1 |   0 | N/A      |                    0 |               3 |
|      102 |         2 |   0 | N/A      |                    0 |               0 |
|      102 |         3 |   0 | N/A      |                    0 |               2 |
+----------+-----------+-----+----------+----------------------+-----------------+
12 rows in set (0.00 sec)

mysql> select * from performance_schema.rpd_column_id;
+----+----------+-------------+
| ID | TABLE_ID | COLUMN_NAME |
+----+----------+-------------+
|  0 |       93 | N_NATIONKEY |
|  1 |       93 | N_NAME      |
|  2 |       93 | N_REGIONKEY |
|  3 |       93 | N_COMMENT   |
|  0 |      106 | score       |
|  1 |      106 | name        |
|  2 |      106 | id          |
|  3 |      106 | gender      |
|  0 |      102 | score       |
|  1 |      102 | name        |
|  2 |      102 | id          |
|  3 |      102 | gender      |
+----+----------+-------------+

Getting Started

  • How to build ShannonBase from source code step 0: Requirement Installation

1: compiling tools:

apt-get install -y g++
apt-get install -y libbison-dev
apt-get install -y flex
apt-get install -y clang-format
apt-get install -y lcov
apt-get install -y pkg-config

2: Cmake

apt-get install -y cmake

3: tools

apt-get install -y git
apt-get install -y wget
apt-get install -y tar
apt-get install -y bzip2
apt-get install -y unzip

4: openssl etc.

apt-get install -y libssl-dev
apt-get install -y libncurses-dev
apt-get install -y  libudev-dev
apt-get install -y libgsasl-dev
apt-get install -y libldap-dev

5: Boost [from source code]

wget https://boostorg.jfrog.io/artifactory/main/release/1.77.0/source/boost_1_77_0.tar.bz2
tar -xvf boost_1_77_0.tar.bz2 && cd boost_1_77_0 && ./bootstrap.sh && ./b2  && ./b2 install

step 1: clone the source code from Shannonbase Repo

  git clone [email protected]:Shannon-Data/ShannonBase.git

step 2: makes a directory where we build the source code.

  cd ShannonBase && mkdir cmake_build && cd cmake_build

step 3: run cmake

 cmake ../ \
  -DWITH_BOOST=/path-to-boost-include-files/ \
  -DCMAKE_BUILD_TYPE=[Release|Debug]  \
  -DCMAKE_INSTALL_PREFIX=/path-to-shannon-bin \
  -DMYSQL_DATADIR=/home/path-to-shannon-bin/data \
  -DSYSCONFDIR=. \
  -DMYSQL_UNIX_ADDR=/home/path-to-shannon-bin/tmp/mysql.sock \
  -DWITH_EMBEDDED_SERVER=OFF \
  -DWITH_MYISAM_STORAGE_ENGINE=1 \
  -DWITH_INNOBASE_STORAGE_ENGINE=1 \
  -DWITH_PARTITION_STORAGE_ENGINE=1 \
  -DMYSQL_TCP_PORT=3306 \
  -DENABLED_LOCAL_INFILE=1 \
  -DEXTRA_CHARSETS=all \
  -DWITH_PROTOBUF=bundled \
  -DWITH_SSL_PATH=/path-to-open-ssl/ \
  -DDEFAULT_SET=community \
  -DWITH_UNIT_TESTS=OFF \
  [-DENABLE_GCOV=1 \ |
  -DWITH_ASAN=1 \    | 
  ]
  -DCOMPILATION_COMMENT="MySQL Community Server, and Shannon Data AI Alpha V.- (GPL)" 

step 4: compile and install

  make -j5 && make install

step 5: initialize the database

  /path-to-shannbase-bin/bin/mysqld --defaults-file=/path-to-shannonbase-bin/my.cnf --initialize  --user=xxx

step 6: run shannonbase

 /path-to-shannbase-bin/bin//mysqld --defaults-file=/path-to-shannonbase-bin/my.cnf   --user=xxx &  
  • How to use Rapid engine step 1: create a table with secondary engine.
CREATE TABLE `NATION` (
  `N_NATIONKEY` int NOT NULL,
  `N_NAME` char(25) COLLATE ascii_bin NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED',
  `N_REGIONKEY` int NOT NULL,
  `N_COMMENT` varchar(152) COLLATE ascii_bin DEFAULT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED',
  PRIMARY KEY (`N_NATIONKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin SECONDARY_ENGINE=RAPID

step 2: Load data into rapid engine;

 alter table NATION secondary_load;

step 3: Queries.

  select * from NATION;

Releases

ver 0.1, 2024-01-29

1: Rapid Supported

2: Supported Type The following types has been supported in shannon rapid engine. (will be updated in future)

  • Numeric

1: Integer

2: double

  • String

1: char

2: varchar

3: text

*Datetime

1: date

2: time

3: datetime


Limitations

In this part, we will list all the limitation in ShannonBase.
0: Supported the version is great than 8.0. MySQL 5.7 or less is not available.
1: Rapid Engine is mandatory installed. Rapid Engine is a memory column store, used to accelerate AP workloads.
2: Supported data type in Rapid Engine: (1) int, bigint, numeric, double, float; (2) Date, time; (3) char, varchar, text.
3: A table with rapid secondary engine MUST BE has at least ONE user defined primary key, and it's type is numeric supported now.

Clone this wiki locally