diff --git a/docs/MatrixOne/Deploy/deploy-Matrixone-cluster/deploy-MatrixOne-cluster-with-k8.md b/docs/MatrixOne/Deploy/deploy-Matrixone-cluster/deploy-MatrixOne-cluster-with-k8.md
new file mode 100644
index 000000000..ff923951c
--- /dev/null
+++ b/docs/MatrixOne/Deploy/deploy-Matrixone-cluster/deploy-MatrixOne-cluster-with-k8.md
@@ -0,0 +1,543 @@
# Cluster Deployment Guide

This document focuses on how to deploy a **MatrixOne cluster** on an existing **Kubernetes** and **S3** environment.

## Resource requirements

### Experience environment

The MatrixOne cluster experience environment can be used for simple experience testing, learning, or development, but is not suitable for production. The following is a resource plan for the experience environment:

Container resources:

|Components |Roles |Service Replicas |Replica Distribution Policy |CPU (C) |Memory (G) |Storage Resource Type |Storage Volume Format |Storage Size (G) |Access Mode |
| :-------- | :------ | :------ | :---------------- | :---|:----- | :------ | :------- |:-------- |:------------ |
|logservice | Write-ahead log (WAL) management | 3 | 3 nodes, 1 replica per node | 2 | 4 | PVC | File system | 100 | ReadWriteOnce|
|tn | Transaction management | 1 | Single node, single replica | 4 | 8 | PVC | File system | 100 | ReadWriteOnce|
|cn | Data computation | 1 | Single node, single replica | 2 | 4 | PVC | File system | 100 | ReadWriteOnce|

Object storage resources (S3):

| Components | Interface Protocol | Storage Size (G)|
| :----------------- | :------ | :------ |
| Business, monitoring, logging, and other data | s3v4 | >=50 |

### Recommended environment

The MatrixOne cluster recommended environment offers high availability, reliability, and robust performance for real-world business production.
The following is a resource plan for the recommended environment:

Container resources:

|Components |Roles |Service Replicas |Replica Distribution Policy |CPU (C) |Memory (G) |Storage Resource Type |Storage Volume Format |Storage Size (G) |Access Mode |
| :-------- | :------ | :------ | :---------------- | :---|:----- | :------ | :------- |:-------- |:------------ |
|logservice | Write-ahead log (WAL) management | 3 | 3 nodes, 1 replica per node | 4 | 8 | PVC | File system | 100 | ReadWriteOnce|
|tn | Transaction management | 1 | Single node, single replica | 16 | 64 | PVC | File system | 100 | ReadWriteOnce|
|cn | Data computation | N | N depends on business requirements; 2 or more are recommended for high availability. | 16 | 32 | PVC | File system | 100 | ReadWriteOnce|

Object storage resources (S3):

| Components | Interface Protocol | Storage Size (G) | IOPS | Bandwidth |
| :----------------- | :------ | :------ | :--------------------------------- | :------ |
| Business, monitoring, logs, and other data | s3v4 | Depends on business; >=500 recommended | Sequential read/write: >=2000, random read/write: >=10000 | >=10GB |

For more information on resource requirements, refer to the [experience environment](../deployment-topology/experience-deployment-topology.md) and [recommended production environment](../deployment-topology/recommended-prd-deployment-topology.md) in the cluster topology planning chapter.

## Preconditions

Before you begin, make sure you have the following environments ready:

- A Kubernetes cluster and an S3 environment that meet the resource requirements.

- A client machine that can connect to the Kubernetes cluster.

- The client machine must have the helm and kubectl clients installed, with a kubeconfig file configured for cluster access and permissions sufficient to deploy Helm chart packages and install CRD resource objects.
- Internet access, for example to github.io and hub.docker.com. If you cannot access the internet, you need to provide a private image registry for uploading the relevant images, and change the image repository address in the MatrixOne cluster YAML definition to the private registry address.

- Cluster nodes can access the object storage, for example, by being able to resolve the domain name of the object storage service.

__Note__: The following actions are performed on the client machine unless otherwise indicated.

## Installing MatrixOne-Operator

[MatrixOne Operator](https://github.com/matrixorigin/matrixone-operator) is a standalone software tool for deploying and managing MatrixOne clusters on Kubernetes. You can choose to deploy online or offline.

- **Deploy online**

Follow these steps to install MatrixOne Operator. We will create a separate namespace `matrixone-operator` for the Operator.

1. Add the matrixone-operator address to the helm repository:

    ```
    helm repo add matrixone-operator https://matrixorigin.github.io/matrixone-operator
    ```

2. Update the repository:

    ```
    helm repo update
    ```

3. View the MatrixOne Operator version:

    ```
    helm search repo matrixone-operator/matrixone-operator --versions --devel
    ```

4. Specify the release to install MatrixOne Operator:

    ```
    helm install matrixone-operator matrixone-operator/matrixone-operator --version ${VERSION} --create-namespace --namespace matrixone-operator
    ```

    !!! note
        The parameter VERSION is the version number of the MatrixOne Operator to be deployed, such as 1.0.0-alpha.2.

5. After a successful installation, confirm the installation status using the following command:

    ```
    kubectl get pod -n matrixone-operator
    ```

    Ensure that all Pod states in the above command output are Running.
    ```
    [root@master0 matrixone-operator]# kubectl get pod -n matrixone-operator
    NAME                                 READY   STATUS    RESTARTS   AGE
    matrixone-operator-f8496ff5c-fp6zm   1/1     Running   0          3m26s
    ```

The Pod states shown in the output above are normal.

- **Deploy offline**

You can select the Operator release installation package you need from the project's [Release list](https://github.com/matrixorigin/matrixone-operator/releases) for offline deployment.

1. Create a standalone namespace mo-op for the Operator:

    ```
    NS="mo-op"
    kubectl create ns "${NS}"
    kubectl get ns # the returned list contains mo-op
    ```

2. Download and extract the matrixone-operator installation package:

    ```bash
    wget https://github.com/matrixorigin/matrixone-operator/releases/download/chart-1.1.0-alpha2/matrixone-operator-1.1.0-alpha2.tgz
    tar xvf matrixone-operator-1.1.0-alpha2.tgz
    ```

    If the original GitHub address downloads too slowly, you can try downloading the mirror package from:

    ```bash
    wget https://githubfast.com/matrixorigin/matrixone-operator/releases/download/chart-1.1.0-alpha2/matrixone-operator-1.1.0-alpha2.tgz
    ```

    Extraction produces the folder `matrixone-operator` in the current directory.

3. Deploy matrixone-operator:

    ```
    NS="mo-op"
    cd matrixone-operator/
    helm install -n ${NS} mo-op ./charts/matrixone-operator --dependency-update # on success, the returned status is deployed
    ```

    The Docker images this chart depends on are:

    - matrixone-operator
    - kruise-manager

    If you cannot pull the images from Docker Hub, you can pull them from Aliyun using the following command:

    ```
    helm -n ${NS} install mo-op ./charts/matrixone-operator --dependency-update --set image.repository="registry.cn-hangzhou.aliyuncs.com/moc-pub/matrixone-operator" --set kruise.manager.image.repository="registry.cn-hangzhou.aliyuncs.com/moc-pub/kruise-manager"
    ```

    See matrixone-operator/values.yaml for details.

4.
Check the Operator deployment status:

    ```
    NS="mo-op"
    helm list -n "${NS}" # returns the corresponding helm chart package with status deployed
    kubectl get pod -n "${NS}" -owide # returns Pods in the Running state
    ```

To learn more about MatrixOne Operator, check out [Operator Administration](../../Deploy/MatrixOne-Operator-mgmt.md).

## Deploying MatrixOne

This section describes two ways to deploy MatrixOne: YAML and Chart.

### Prepare before you start

1. Create namespace `mo` for MatrixOne:

    ```
    NS="mo"
    kubectl create ns "${NS}"
    kubectl get ns # the returned list contains mo
    ```

    !!! note
        It is recommended that this namespace be separate from the namespace of MatrixOne Operator; do not use the same one.

2. Create the Secret for accessing S3 in namespace mo by executing the following command.

    No S3 CA certificate is required if S3 is accessed via the HTTP protocol:

    ```
    NS="mo"
    name="s3mo"

    kubectl -n "${NS}" create secret generic "${name}" --from-literal=AWS_ACCESS_KEY_ID=51e1bHqcbfKla0fuakAtoJ2LMEvKThg4NiMjxxxx --from-literal=AWS_SECRET_ACCESS_KEY=aDMWw1hO2rqxltyIcBN6sy8qE_leIgzo6Satxxxx # modify to your own keys
    kubectl get secret -n "${NS}" "${name}" -oyaml # normally outputs the key information
    ```

    S3 access via the HTTPS protocol requires a CA certificate. Before you begin, perform the relevant actions and configure the relevant files as follows:

    - Based on the CA certificate file, create a Kubernetes Secret object:

        ```
        NS="mo"
        ca_file_path="/data/deploy/csp_cert/ca.crt" # path of the certificate file referenced in the secret
        ca_file_name="csp.cert" # file name of the certificate inside the secret
        ca_secret_name="csp.cert" # name of the certificate secret itself
        # Create the secret
        kubectl -n ${NS} create secret generic ${ca_secret_name} --from-file=${ca_file_name}=${ca_file_path}
        ```

    - Configure spec.logService.sharedStorage.s3.certificateRef in the YAML file of the MatrixOne cluster object (the mo.yaml
and values.yaml files in the deployment steps below), as follows:

        ```
        sharedStorage:
          s3:
            endpoint: xx.yy.com
            path: mypath
            # secretRef is required when there is no environment based auth available.
            secretRef:
              # secretRef.name is the name of the secret holding the S3 access keys
              name: csp
            certificateRef:
              # certificateRef.name corresponds to ${ca_secret_name}
              name: csp.cert
              files:
              # the values in the certificateRef.files array correspond to ${ca_file_name}
              - csp.cert
        ```

3. Label the machines

    The following labels need to be applied to the nodes before deployment; otherwise scheduling fails. In principle, the labels should land on different nodes, with multiple nodes per replica group where possible, or at least 1 node if that is not possible. (7 different nodes are recommended.)

    ```
    matrixone/cn: true
    matrixone/tn: true
    matrixone/lg: true
    ```

    The first group: find three different machines and label each with cn.

    ```
    NODE_1="10.0.0.1" # Replace with actual IP
    NODE_2="10.0.0.2" # Replace with actual IP
    NODE_3="10.0.0.3" # Replace with actual IP

    kubectl label node ${NODE_1} matrixone/cn=true
    kubectl label node ${NODE_2} matrixone/cn=true
    kubectl label node ${NODE_3} matrixone/cn=true
    ```

    Group 2: find a 4th machine and label it with tn.

    ```
    NODE_4="10.0.0.4" # Replace with actual IP

    kubectl label node ${NODE_4} matrixone/tn=true
    ```

    Group 3: find three more machines and label each with lg.

    ```
    NODE_5="10.0.0.5" # Replace with actual IP
    NODE_6="10.0.0.6" # Replace with actual IP
    NODE_7="10.0.0.7" # Replace with actual IP

    kubectl label node ${NODE_5} matrixone/lg=true
    kubectl label node ${NODE_6} matrixone/lg=true
    kubectl label node ${NODE_7} matrixone/lg=true
    ```

### yaml-style deployment

1.
Customize the YAML file for the MatrixOne cluster by writing the following mo.yaml file (adjust the resource requests as appropriate):

    ```
    apiVersion: core.matrixorigin.io/v1alpha1
    kind: MatrixOneCluster
    metadata:
      name: mo
      namespace: mo

    spec:
      # 1. Configuring cn
      cnGroups:
      - cacheVolume:
          size: 800Gi
        config: |2
          [log]
          level = "info"
        name: cng1
        nodeSelector: # Add labels as appropriate
          matrixone/cn: "true"
        serviceType: NodePort
        nodePort: 31429
        replicas: 3
        resources:
          requests:
            cpu: 16000m
            memory: 64000Mi
          limits:
            cpu: 16000m
            memory: 64000Mi
        overlay:
          env:
          - name: GOMEMLIMIT
            value: "57600MiB"
      # 2. Configuring tn
      tn:
        cacheVolume:
          size: 100Gi
        config: |2

          [log]
          level = "info"
        nodeSelector:
          matrixone/tn: "true"
        replicas: 1
        resources:
          requests:
            cpu: 16000m
            memory: 64000Mi
          limits:
            cpu: 16000m
            memory: 64000Mi
      # 3. Configuring logservice
      logService:
        config: |2
          [log]
          level = "info"
        nodeSelector:
          matrixone/lg: "true"
        pvcRetentionPolicy: Retain
        replicas: 3
        # Configuring the s3 storage that logservice maps to
        sharedStorage:
          s3:
            endpoint: s3-qos.iot.qiniuec-test.com
            path: mo-test
            s3RetentionPolicy: Retain
            secretRef: # Configure the secret for accessing s3, named s3mo.
              name: s3mo
        volume:
          size: 100Gi
        resources:
          requests:
            cpu: 4000m
            memory: 16000Mi
          limits:
            cpu: 4000m
            memory: 16000Mi
      topologySpread:
        - kubernetes.io/hostname
      imagePullPolicy: IfNotPresent
      imageRepository: registry.cn-shanghai.aliyuncs.com/matrixorigin/matrixone
      version: 1.1.1 # This is the version of the MO image
    ```

2. Create a MatrixOne cluster by executing the following command:

    ```
    kubectl apply -f ./mo.yaml
    ```

### chart way to deploy

1. Add the matrixone-operator repository to helm:

    ```
    helm repo add matrixone-operator https://matrixorigin.github.io/matrixone-operator
    ```

2.
Update the repository:

    ```
    helm repo update
    ```

3. View the MatrixOne Chart version:

    ```
    helm search repo matrixone-operator/matrixone --devel
    ```

    Example return:

    ```
    > helm search repo matrixone-operator/matrixone --devel
    NAME                                    CHART VERSION   APP VERSION   DESCRIPTION
    matrixone-operator/matrixone            0.1.0           1.16.0        A Helm chart to deploy MatrixOne on K8S
    matrixone-operator/matrixone-operator   1.1.0-alpha2    0.1.0         Matrixone Kubernetes Operator
    ```

4. Deploy MatrixOne.

    Modify the values.yaml file: empty the original file and replace its contents with the following (adjust the resource requests as appropriate):

    ```
    # 1. Configuring cn
    cnGroups:
    - cacheVolume:
        size: 800Gi
      config: |2
        [log]
        level = "info"
      name: cng1
      nodeSelector:
        matrixone/cn: "true"
      serviceType: NodePort
      nodePort: 31429
      replicas: 3 # Number of cn replicas
      resources:
        requests:
          cpu: 16000m
          memory: 64000Mi
        limits:
          cpu: 16000m
          memory: 64000Mi
      overlay:
        env:
        - name: GOMEMLIMIT
          value: "57600MiB"
    # 2. Configuring tn
    tn:
      cacheVolume:
        size: 100Gi
      config: |2

        [log]
        level = "info"
      nodeSelector:
        matrixone/tn: "true"
      replicas: 1 # The number of tn replicas, which cannot be modified. The current version only supports a setting of 1.
      resources:
        requests:
          cpu: 16000m
          memory: 64000Mi
        limits:
          cpu: 16000m
          memory: 64000Mi
    # 3. Configuring logService
    logService:
      config: |2
        [log]
        level = "info"
      nodeSelector:
        matrixone/lg: "true"
      pvcRetentionPolicy: Retain
      replicas: 3 # Number of logService replicas
      # Configuring the s3 storage that logService maps to
      sharedStorage:
        s3:
          endpoint: s3-qos.iot.qiniuec-test.com
          path: mo-test
          s3RetentionPolicy: Retain
          secretRef: # Configure the secret for accessing s3, named s3mo.
            name: s3mo
      volume:
        size: 100Gi
      resources:
        requests:
          cpu: 4000m
          memory: 16000Mi
        limits:
          cpu: 4000m
          memory: 16000Mi
    topologySpread:
      - kubernetes.io/hostname
    imagePullPolicy: IfNotPresent
    imageRepository: registry.cn-shanghai.aliyuncs.com/matrixorigin/matrixone
    version: 1.1.1 # This is the version of the MO image
    ```

    Install the MatrixOne Chart (this deploys a MatrixOneCluster object). Note that a Helm release name must not contain underscores, and the chart version should match the CHART VERSION returned by the search above:

    ```
    NS="mo"
    RELEASE_NAME="mo-chart"
    VERSION="0.1.0"
    helm install -n ${NS} ${RELEASE_NAME} matrixone-operator/matrixone --version ${VERSION} -f values.yaml
    ```

### Checking cluster status

Observe the cluster status until it is Ready:

```
NS="mo"
kubectl get mo -n "${NS}" # wait until STATUS is Ready
```

Observe the Pod status until all Pods are Running:

```
NS="mo"
kubectl get pod -n "${NS}" -owide # wait until all states are Running
```

## Connecting a MatrixOne Cluster

In order to connect to a MatrixOne cluster, you need to map the port of the corresponding service to the MatrixOne node.
Here is a guide for connecting to a MatrixOne cluster using `kubectl port-forward` (replace `svc_name` with the actual name of the Service in namespace mo):

- Allow local access only:

    ```
    nohup kubectl port-forward -nmo svc/svc_name 6001:6001 &
    ```

- Allow a specified machine, or all machines, to access:

    ```
    nohup kubectl port-forward -nmo --address 0.0.0.0 svc/svc_name 6001:6001 &
    ```

After allowing **local access only** or access from **a specified machine or all machines**, you can connect to MatrixOne using a MySQL client:

```
# Connect to the MySQL-compatible service using the 'mysql' command line tool
# Use 'kubectl get svc/svc_name -n mo -o jsonpath='{.spec.clusterIP}'' to get the cluster IP address of the service in the Kubernetes cluster
# The '-h' parameter specifies the hostname or IP address of the MySQL service
# The '-P' parameter specifies the port number of the MySQL service, in this case 6001
# '-uroot' means log in as root
# '-p111' indicates that the initial password is 111
mysql -h $(kubectl get svc/svc_name -n mo -o jsonpath='{.spec.clusterIP}') -P 6001 -uroot -p111
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 163
Server version: 8.0.30-MatrixOne-v1.1.1 MatrixOne

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
```

Once the `mysql>` prompt appears, the connection to the distributed MatrixOne cluster is complete.

!!! note
    The login account in the above code section is the initial account. Please change the initial password promptly after logging into MatrixOne; see [Password Management](../../Security/password-mgmt.md).
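The note above recommends changing the initial password promptly. As a minimal sketch (the new password is a placeholder you should replace, and the port-forward set up earlier is assumed to still be active on 127.0.0.1:6001), the change can be made with the MySQL-compatible `ALTER USER` statement:

```shell
# Sketch: change the initial root password right after the first login.
# Assumes the kubectl port-forward above is still running on 127.0.0.1:6001;
# 'YourNewStrongPassword' is a placeholder.
mysql -h 127.0.0.1 -P 6001 -uroot -p111 \
  -e "ALTER USER 'root' IDENTIFIED BY 'YourNewStrongPassword';"

# Verify that the new password works before closing the session:
mysql -h 127.0.0.1 -P 6001 -uroot -p'YourNewStrongPassword' -e "SELECT 1;"
```

These commands require a live cluster and cannot be run standalone; they only illustrate the order of operations.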
diff --git a/docs/MatrixOne/Deploy/deploy-Matrixone-cluster/deploy-MatrixOne-cluster-without-k8.md b/docs/MatrixOne/Deploy/deploy-Matrixone-cluster/deploy-MatrixOne-cluster-without-k8.md
new file mode 100644
index 000000000..1ad826821
--- /dev/null
+++ b/docs/MatrixOne/Deploy/deploy-Matrixone-cluster/deploy-MatrixOne-cluster-without-k8.md
@@ -0,0 +1,727 @@
# Cluster Deployment Guide

This document focuses on how to deploy, from scratch, MatrixOne, a cloud-native distributed database with separated storage and compute, on a privatized Kubernetes cluster.

## Key steps

1. Deploy a Kubernetes cluster
2. Deploy the object storage MinIO
3. Create and connect a MatrixOne cluster

## Terminology

Since this document involves many Kubernetes-related terms, here is a brief explanation of the important ones so that you can follow the build process. If you need to know more about Kubernetes, refer directly to the [Kubernetes Chinese Community](http://docs.kubernetes.org.cn/).

- Pod

    A Pod is the smallest resource management unit in Kubernetes and the minimal resource object for running containerized applications. A Pod represents a process running in a cluster. In a nutshell, a set of applications that together provide a specific function can be called a Pod; it contains one or more container objects that serve the outside world together.

- Storage Class

    A Storage Class, or **SC**, labels the characteristics and performance of a storage resource. An administrator can define storage resources as categories, much like a storage device describes its own configuration profile. Based on the SC description, the characteristics of the various storage resources can be seen at a glance, and storage resources can be requested based on the application's demand for them.
- CSI

    Kubernetes provides the **CSI** (Container Storage Interface), based on which custom CSI plug-ins can be developed to support specific storage, for decoupling purposes.

- PersistentVolume

    A PersistentVolume, or **PV**, is a storage resource that includes settings for key information such as storage capacity, access modes, storage type, reclaim policy, and backend storage type.

- PersistentVolumeClaim

    A PersistentVolumeClaim, or **PVC**, is a user request for storage resources. It mainly covers settings such as the storage space request, access mode, PV selection criteria, and storage class.

- Service

    Also called **SVC**, a mechanism that matches a set of Pods to an external access service by way of label selection. Each SVC can be understood as a microservice.

- Operator

    A Kubernetes Operator is a way to encapsulate, deploy, and manage Kubernetes applications. We deploy and manage Kubernetes applications on Kubernetes using the Kubernetes API and the kubectl tooling.

## Deployment Architecture

### Dependent Components

The MatrixOne distributed system relies on the following components:

- Kubernetes: the resource management platform for the entire MatrixOne cluster. Components such as Logservice, CN, and TN run in Pods managed by Kubernetes. If a failure occurs, Kubernetes is responsible for weeding out the failed Pod and starting a new one to replace it.

- MinIO: provides object storage services for the entire MatrixOne cluster; all MatrixOne data is stored in object storage provided by MinIO.

In addition, for container management and orchestration on Kubernetes, we need the following plugins:

- Helm: a package management tool for managing Kubernetes applications, similar to APT for Ubuntu and YUM for CentOS. It is used to manage preconfigured installation package resources called Charts.
- local-path-provisioner: a Kubernetes plug-in that implements the CSI (Container Storage Interface) interface. local-path-provisioner is responsible for creating persistent volumes (PVs) for the Pods of each MatrixOne component and for MinIO, for persistent storage of data.

### Overall architecture

The overall deployment architecture is shown in the following figure:
+ +
The overall architecture consists of the following components:

- The bottom layer is three server nodes: the first as host1, where the Kubernetes springboard is installed; the second as the Kubernetes master node; and the third as a Kubernetes worker node.

- Above that are the installed Kubernetes and Docker environments, which make up the cloud-native platform layer.

- Next is a layer of Kubernetes plug-ins managed with Helm, including the local-path-storage plug-in that implements the CSI interface, MinIO, and MatrixOne Operator.

- The top layer is the set of Pods and Services generated by these component configurations.

### MatrixOne's Pod and Storage Architecture

MatrixOne creates a series of Kubernetes objects based on the Operator's rules. These objects are categorized by component into resource groups: CNSet, TNSet, and LogSet.

- Service: the services in each resource group need to be exposed externally through a Service. The Service hosts the externally reachable endpoint, ensuring the service is still available if a Pod crashes or is replaced. External applications connect through the Service's public port, while the Service forwards the connections to the appropriate Pods through its internal forwarding rules.

- Pod: a containerized instance of a MatrixOne component, running MatrixOne's core kernel code.

- PVC: each Pod declares its required storage resources through a PVC (Persistent Volume Claim). In our architecture, CN and TN request storage resources as caches, while LogService requires the corresponding S3 resources. These requirements are stated via PVCs.

- PV: a PV (Persistent Volume) is an abstract representation of a storage medium and can be viewed as a storage unit. After a PVC has been requested, a PV is created through software that implements the CSI interface and bound to the PVC requesting the resource.
+ +
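To make the PVC and PV mechanics described above concrete, here is an illustrative sketch of a PersistentVolumeClaim of the kind generated for a CN cache volume. The object name and sizes are hypothetical, not what the Operator emits verbatim; the StorageClass is the `local-path` class installed later in this guide:

```yaml
# Illustrative only: a PVC requesting a cache volume from the local-path StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mo-cn-cache-example   # hypothetical name
  namespace: mo
spec:
  accessModes:
    - ReadWriteOnce            # matches the access mode used throughout this guide
  storageClassName: local-path # provided by local-path-provisioner
  resources:
    requests:
      storage: 100Gi
```

Once such a claim exists, local-path-provisioner creates a matching PV on a node's local disk and binds the two; the Pod then mounts the bound volume.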
## 1. Deploy a Kubernetes cluster

Because the distributed deployment of MatrixOne relies on a Kubernetes cluster, we need one. This article guides you through setting up a Kubernetes cluster using **Kuboard-Spray**.

### Preparing the Cluster Environment

For the cluster environment, you need to prepare as follows:

- 3 virtual machines
- The operating system is CentOS 7.9 (root remote login must be allowed): two machines to deploy Kubernetes and MatrixOne-related dependencies, and one as a springboard machine to build the Kubernetes cluster.
- Internet access: all 3 servers need to pull images from the internet.

The machines are distributed as follows:

| **Host** | **Intranet IP** | **Extranet IP** | **mem** | **CPU** | **Disk** | **Role** |
| ------------ | ------------- | --------------- | ------- | ------- | -------- | ----------- |
| kuboardspray | 10.206.0.6 | 1.13.2.100 | 2G | 2C | 50G | springboard |
| master0 | 10.206.134.8 | 118.195.255.252 | 8G | 2C | 50G | master, etcd |
| node0 | 10.206.134.14 | 1.13.13.199 | 8G | 2C | 50G | worker |

#### Deploy Kuboard-Spray on the springboard machine

Kuboard-Spray is a tool for visually deploying Kubernetes clusters. It uses Docker to quickly bring up a web application that can visually deploy a Kubernetes cluster. Once the Kubernetes cluster environment has been deployed, the Docker application can be stopped.

##### Springboard environment preparation

1. Install Docker: since Docker is used, an environment with Docker is required. Install and start Docker on the springboard machine using the following command:

    ```
    curl -sSL https://get.docker.io/ | sh
    # In a restricted network environment in mainland China, you can use the following domestic mirror address instead
    curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
    ```

2.
Start Docker: + + ``` + [root@VM-0-6-centos ~]# systemctl start docker + [root@VM-0-6-centos ~]# systemctl status docker + ● docker.service - Docker Application Container Engine + Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled) + Active: active (running) since Sun 2023-05-07 11:48:06 CST; 15s ago + Docs: https://docs.docker.com + Main PID: 5845 (dockerd) + Tasks: 8 + Memory: 27.8M + CGroup: /system.slice/docker.service + └─5845 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock + + May 07 11:48:06 VM-0-6-centos systemd[1]: Starting Docker Application Container Engine... + May 07 11:48:06 VM-0-6-centos dockerd[5845]: time="2023-05-07T11:48:06.391166236+08:00" level=info msg="Starting up" + May 07 11:48:06 VM-0-6-centos dockerd[5845]: time="2023-05-07T11:48:06.421736631+08:00" level=info msg="Loading containers: start." + May 07 11:48:06 VM-0-6-centos dockerd[5845]: time="2023-05-07T11:48:06.531022702+08:00" level=info msg="Loading containers: done." + May 07 11:48:06 VM-0-6-centos dockerd[5845]: time="2023-05-07T11:48:06.544715135+08:00" level=info msg="Docker daemon" commit=94d3ad6 graphdriver=overlay2 version=23.0.5 + May 07 11:48:06 VM-0-6-centos dockerd[5845]: time="2023-05-07T11:48:06.544798391+08:00" level=info msg="Daemon has completed initialization" + May 07 11:48:06 VM-0-6-centos systemd[1]: Started Docker Application Container Engine. + May 07 11:48:06 VM-0-6-centos dockerd[5845]: time="2023-05-07T11:48:06.569274215+08:00" level=info msg="API listen on /run/docker.sock" + ``` + +Once the environment is ready, Kuboard-Spray can be deployed. 
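Before deploying Kuboard-Spray, an optional pre-flight check (a sketch using standard systemd and Docker tooling) can confirm the daemon is actually up:

```shell
# Optional sanity check before starting the Kuboard-Spray container.
systemctl is-active docker                   # should print "active"
docker info --format '{{.ServerVersion}}'    # prints the daemon version if reachable
```

If either command fails, revisit the Docker installation step above before continuing.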
#### Deploy Kuboard-Spray

Install Kuboard-Spray by executing the following command:

```
docker run -d \
  --privileged \
  --restart=unless-stopped \
  --name=kuboard-spray \
  -p 80:80/tcp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/kuboard-spray-data:/data \
  eipwork/kuboard-spray:latest-amd64
```

If the image pull fails due to a network problem, you can use the following alternate address:

```
docker run -d \
  --privileged \
  --restart=unless-stopped \
  --name=kuboard-spray \
  -p 80:80/tcp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/kuboard-spray-data:/data \
  swr.cn-east-2.myhuaweicloud.com/kuboard/kuboard-spray:latest-amd64
```

Once this is done, you can enter `http://1.13.2.100` (the springboard machine's IP address) in your browser to open the Kuboard-Spray web interface. Enter the username `admin` and the default password `Kuboard123` to log into the Kuboard-Spray interface, as follows:

![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-1.png)

Once logged in, you can begin deploying the Kubernetes cluster.

### Visually deploy a Kubernetes cluster

Once logged into the Kuboard-Spray interface, you can begin the visual deployment of your Kubernetes cluster.

#### Import Kubernetes-related resource packages

The installation interface downloads the resource package corresponding to the Kubernetes cluster version via an online download, to enable offline installation of the Kubernetes cluster.

1. Click **Resource Package Management** and select the appropriate version of the Kubernetes resource package to download:

    Download version `spray-v2.18.0b-2_k8s-v1.23.17_v1.24-amd64`

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-2.png)

2. Click **Import**, select **Load Resource Package**, choose the appropriate download source, and wait for the resource package download to complete.

    !!! note
        It is recommended that you choose Docker as the container engine for the Kubernetes cluster. After selecting Docker as the container engine, Kuboard-Spray automatically uses Docker to run the various components of the Kubernetes cluster, including containers on the master and worker nodes.

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-3.png)

3. This pulls the relevant image dependencies:

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-4.png)

4. After the image resource package has been pulled successfully, return to Kuboard-Spray's web interface; you can see that the corresponding version of the resource package has been imported.

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-5.png)

#### Installing a Kubernetes Cluster

This chapter will guide you through the installation of the Kubernetes cluster.

1. Select **Cluster Management** and select **Add Cluster Installation Plan**:

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-6.png)

2. In the pop-up dialog box, define the name of the cluster, select the version of the resource package you just imported, and click **OK**, as shown in the following figure:

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-7.png)

##### Cluster planning

The Kubernetes cluster is deployed in a `1 master + 1 worker + 1 etcd` pattern according to the predefined role classification.

After defining the cluster name and selecting the resource package version in the previous step, click **OK** to proceed directly to the cluster planning phase.

1.
Select the role and name of the corresponding node:

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-8.png)

    - master node: select the ETCD and control node roles and name it master0. (You can also select the worker role if you want the master node to carry workloads; this approach improves resource utilization but reduces the high availability of Kubernetes.)
    - worker node: select only the worker role and name it node0.

2. After filling in the role and node name for each node, fill in the connection information for the corresponding node on the right, as shown in the following figures:

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-9.png)

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-9-1.png)

3. Click **Save** when you have filled out all the roles. Next you are ready to install the Kubernetes cluster.

#### Start installing the Kubernetes cluster

After completing all roles in the previous step and **saving** them, click **Execute** to begin the installation of the Kubernetes cluster.

1. Click **OK** to begin installing the Kubernetes cluster, as shown in the following figure:

    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-10.png)

2. The Kubernetes cluster is installed by executing an `ansible` script on the corresponding nodes. The whole process can take anywhere from 5 to 10 minutes, depending on machine configuration and network; please wait for it to finish.

    __Note:__ If an error occurs, you can look at the log contents and check whether it is a version mismatch in Kuboard-Spray. If it is, replace it with the appropriate version.

3.
After installation, go to the master node of the Kubernetes cluster and execute `kubectl get node`:
+
+    ```
+    [root@master0 ~]# kubectl get node
+    NAME      STATUS   ROLES                  AGE   VERSION
+    master0   Ready    control-plane,master   52m   v1.23.17
+    node0     Ready    <none>                 52m   v1.23.17
+    ```
+
+4. The command output above indicates that the Kubernetes cluster installation is complete.
+
+5. Adjust the DNS configuration on each Kubernetes node. On each machine, open the file below, locate the nameserver record containing `169.254.25.10`, and delete it. (This record may affect communication efficiency between pods; if it does not exist, no change is needed.)
+
+    ```
+    vim /etc/resolv.conf
+    ```
+
+    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-10-1.png)
+
+## 2. Deploying Helm
+
+Helm is a package management tool for managing Kubernetes applications. It simplifies the process of deploying and managing applications by using charts (preconfigured installation package resources). Similar to APT for Ubuntu and YUM for CentOS, Helm provides a convenient way to install, upgrade, and manage Kubernetes applications.
+
+Before installing MinIO, we need to install Helm, because the MinIO installation depends on it. Here are the steps to install Helm:
+
+__Note:__ The operations in this chapter are performed on the master0 node.
+
+1. Download the Helm installation package:
+
+    ```
+    wget https://get.helm.sh/helm-v3.10.2-linux-amd64.tar.gz
+    # If you are in a restricted network environment, you can use the following mirror address instead
+    wget https://mirrors.huaweicloud.com/helm/v3.10.2/helm-v3.10.2-linux-amd64.tar.gz
+    ```
+
+2. Extract and install:
+
+    ```
+    tar -zxf helm-v3.10.2-linux-amd64.tar.gz
+    mv linux-amd64/helm /usr/local/bin/helm
+    ```
+
+3. 
Verify the version to see if the installation is complete:
+
+    ```
+    [root@k8s01 home]# helm version
+    version.BuildInfo{Version:"v3.10.2", GitCommit:"50f003e5ee8704ec937a756c646870227d7c8b58", GitTreeState:"clean", GoVersion:"go1.18.8"}
+    ```
+
+    Installation is complete when the version information shown above appears.
+
+## 3. CSI Deployment
+
+CSI is the storage plug-in interface for Kubernetes; here it provides storage services for MinIO and MatrixOne. This chapter will guide you through using the `local-path-provisioner` plugin.
+
+__Note:__ The operations in this chapter are performed on the master0 node.
+
+1. Install the CSI plugin with the following commands:
+
+    ```
+    wget https://github.com/rancher/local-path-provisioner/archive/refs/tags/v0.0.23.zip
+    unzip v0.0.23.zip
+    cd local-path-provisioner-0.0.23/deploy/chart/local-path-provisioner
+    helm install --set nodePathMap[0].paths[0]="/opt/local-path-provisioner",nodePathMap[0].node=DEFAULT_PATH_FOR_NON_LISTED_NODES --create-namespace --namespace local-path-storage local-path-storage ./
+    ```
+
+    If the original GitHub address downloads too slowly, you can try the mirror instead:
+
+    ```
+    wget https://githubfast.com/rancher/local-path-provisioner/archive/refs/tags/v0.0.23.zip
+    ```
+
+2. After a successful installation, the command line output appears as follows:
+
+    ```
+    root@master0:~# kubectl get pod -n local-path-storage
+    NAME                                                        READY   STATUS    RESTARTS   AGE
+    local-path-storage-local-path-provisioner-57bf67f7c-lcb88   1/1     Running   0          89s
+    ```
+
+    __Note:__ After installation, the storageClass provides storage services in the "/opt/local-path-provisioner" directory of the worker node. You can change this to another path.
+
+3. Set the default `storageClass`:
+
+    ```
+    kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
+    ```
+
+4. 
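If you want to sanity-check the patch payload before touching a live cluster, the JSON can be validated locally first. This is an optional sketch that assumes only `python3` is available; it does not contact Kubernetes:

```shell
# Validate the default-storageclass patch JSON locally before applying it
PATCH='{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
echo "$PATCH" | python3 -m json.tool
```

If `json.tool` pretty-prints the document without complaint, the payload is well-formed and safe to pass to `kubectl patch`.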
After setting the default successfully, the command line output appears as follows:
+
+    ```
+    root@master0:~# kubectl get storageclass
+    NAME                   PROVISIONER                                                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
+    local-path (default)   cluster.local/local-path-storage-local-path-provisioner   Delete          WaitForFirstConsumer   true                   115s
+    ```
+
+## 4. MinIO Deployment
+
+The role of MinIO is to provide object storage for MatrixOne. This chapter will guide you through deploying a single-node MinIO.
+
+__Note:__ The operations in this chapter are performed on the master0 node.
+
+### Installation and Startup
+
+1. The commands to install and start MinIO are as follows:
+
+    ```
+    helm repo add minio https://charts.min.io/
+    mkdir minio_ins && cd minio_ins
+    helm fetch minio/minio
+    ls -lth
+    tar -zxvf minio-5.0.9.tgz # This version may vary; use the version actually downloaded
+    cd ./minio/
+
+    kubectl create ns mostorage
+
+    helm install minio \
+    --namespace mostorage \
+    --set resources.requests.memory=512Mi \
+    --set replicas=1 \
+    --set persistence.size=10G \
+    --set mode=standalone \
+    --set rootUser=rootuser,rootPassword=rootpass123 \
+    --set consoleService.type=NodePort \
+    --set image.repository=minio/minio \
+    --set image.tag=latest \
+    --set mcImage.repository=minio/mc \
+    --set mcImage.tag=latest \
+    -f values.yaml minio/minio
+    ```
+
+    !!! note
+        - `--set resources.requests.memory=512Mi` sets MinIO's minimum memory request.
+        - `--set persistence.size=10G` sets MinIO's storage size to 10 G.
+        - `--set rootUser=rootuser,rootPassword=rootpass123` sets the rootUser and rootPassword; these parameters are required later when creating the Kubernetes cluster's secrets file, so use values that are easy to remember.
+        - If the installation is repeated several times because of network or other problems, you need to uninstall first:
+
+    ```
+    helm uninstall minio --namespace mostorage
+    ```
+
+2. 
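A side note on the credentials chosen above: the same rootUser and rootPassword are later stored in a Kubernetes Secret for MatrixOne, and Kubernetes keeps Secret values base64-encoded. You can preview the encoded form locally without a cluster (a sketch; the values are the examples used throughout this guide):

```shell
# Preview how Kubernetes will store the MinIO credentials inside a Secret
printf 'rootuser'    | base64   # prints: cm9vdHVzZXI=
printf 'rootpass123' | base64   # prints: cm9vdHBhc3MxMjM=
```

These are exactly the strings you would see under `data` when inspecting the Secret with `kubectl get secret -o yaml`.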
After installing and starting MinIO successfully, the command line output appears as follows:
+
+    ```
+    NAME: minio
+    LAST DEPLOYED: Sun May 7 14:17:18 2023
+    NAMESPACE: mostorage
+    STATUS: deployed
+    REVISION: 1
+    TEST SUITE: None
+    NOTES:
+    MinIO can be accessed via port 9000 on the following DNS name from within your cluster:
+    minio.mostorage.svc.cluster.local
+
+    To access MinIO from localhost, run the below commands:
+
+    1. export POD_NAME=$(kubectl get pods --namespace mostorage -l "release=minio" -o jsonpath="{.items[0].metadata.name}")
+
+    2. kubectl port-forward $POD_NAME 9000 --namespace mostorage
+
+    Read more about port forwarding here: http://kubernetes.io/docs/user-guide/kubectl/kubectl_port-forward/
+
+    You can now access MinIO server on http://localhost:9000. Follow the below steps to connect to MinIO server with mc client:
+
+    1. Download the MinIO mc client - https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
+
+    2. export MC_HOST_minio-local=http://$(kubectl get secret --namespace mostorage minio -o jsonpath="{.data.rootUser}" | base64 --decode):$(kubectl get secret --namespace mostorage minio -o jsonpath="{.data.rootPassword}" | base64 --decode)@localhost:9000
+
+    3. mc ls minio-local
+    ```
+
+    MinIO has now been installed successfully. During the subsequent MatrixOne installation, MatrixOne will communicate with MinIO directly through Kubernetes' Service (SVC), without additional configuration.
+
+    However, if you want to connect to MinIO from `localhost`, you can run the following commands to set the `POD_NAME` variable and forward port 9000 of `mostorage`:
+
+    ```
+    export POD_NAME=$(kubectl get pods --namespace mostorage -l "release=minio" -o jsonpath="{.items[0].metadata.name}")
+    nohup kubectl port-forward --address 0.0.0.0 $POD_NAME -n mostorage 9000:9000 &
+    ```
+
+3. Once the port forwarding is started, log in to MinIO's web console to create the object storage information. 
As shown in the following figure, the account and password are the rootUser and rootPassword set via `--set rootUser=rootuser,rootPassword=rootpass123` in the steps above:
+
+    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-13.png)
+
+4. Once the login is complete, you need to create a bucket for object storage:
+
+    Click **Bucket > Create Bucket** and enter the bucket name **minio-mo** in **Bucket Name**. Once completed, click the **Create Bucket** button at the bottom right.
+
+    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/deploy/deploy-mo-cluster-14.png)
+
+## 5. MatrixOne Cluster Deployment
+
+This chapter will guide you through deploying a MatrixOne cluster.
+
+__Note:__ The operations in this chapter are performed on the master0 node.
+
+#### Installing MatrixOne-Operator
+
+[MatrixOne Operator](https://github.com/matrixorigin/matrixone-operator) is a standalone software tool for deploying and managing MatrixOne clusters on Kubernetes. You can choose to deploy online or offline.
+
+- **Deploy online**
+
+Follow these steps to install MatrixOne Operator on master0. We will create a separate namespace `matrixone-operator` for the Operator.
+
+1. Add the matrixone-operator address to the helm repository:
+
+    ```
+    helm repo add matrixone-operator https://matrixorigin.github.io/matrixone-operator
+    ```
+
+2. Update the repository:
+
+    ```
+    helm repo update
+    ```
+
+3. View the MatrixOne Operator versions:
+
+    ```
+    helm search repo matrixone-operator/matrixone-operator --versions --devel
+    ```
+
+4. Specify the release to install MatrixOne Operator:
+
+    ```
+    helm install matrixone-operator matrixone-operator/matrixone-operator --version <VERSION> --create-namespace --namespace matrixone-operator
+    ```
+
+    !!! note
+        The parameter VERSION is the version number of the MatrixOne Operator to be deployed, such as 1.0.0-alpha.2.
+
+5. 
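To keep the version number out of the pasted command, the install invocation above can be parameterized with a shell variable. The version string here is an assumed example taken from the note above, not a recommendation; this sketch only builds and prints the command so you can inspect it before running it:

```shell
# Build the operator install command with the version pinned in one place
VERSION="1.0.0-alpha.2"   # assumed example; pick one from `helm search repo ... --versions`
CMD="helm install matrixone-operator matrixone-operator/matrixone-operator --version ${VERSION} --create-namespace --namespace matrixone-operator"
echo "${CMD}"             # inspect the command, then run it with: eval "${CMD}"
```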
After a successful installation, confirm the installation status using the following command:
+
+    ```
+    kubectl get pod -n matrixone-operator
+    ```
+
+    Ensure that all Pod states in the above command output are Running.
+
+    ```
+    [root@master0 matrixone-operator]# kubectl get pod -n matrixone-operator
+    NAME                                 READY   STATUS    RESTARTS   AGE
+    matrixone-operator-f8496ff5c-fp6zm   1/1     Running   0          3m26s
+    ```
+
+The corresponding Pod states are normal, as shown in the output above.
+
+- **Deploy offline**
+
+You can select the Operator Release version installation package you need from the project's [Release list](https://github.com/matrixorigin/matrixone-operator/releases) for offline deployment.
+
+1. Create a standalone namespace mo-op for the Operator:
+
+    ```
+    NS="mo-op"
+    kubectl create ns "${NS}"
+    kubectl get ns # the output should include mo-op
+    ```
+
+2. Download and extract the matrixone-operator installation package:
+
+    ```
+    wget https://github.com/matrixorigin/matrixone-operator/releases/download/chart-1.1.0-alpha2/matrixone-operator-1.1.0-alpha2.tgz
+    tar xvf matrixone-operator-1.1.0-alpha2.tgz
+    ```
+
+    If the original GitHub address downloads too slowly, you can try the mirror instead:
+
+    ```
+    wget https://githubfast.com/matrixorigin/matrixone-operator/releases/download/chart-1.1.0-alpha2/matrixone-operator-1.1.0-alpha2.tgz
+    ```
+
+    Extraction produces a `matrixone-operator` folder in the current directory.
+
+3. 
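When carrying the package into an offline environment, it is worth verifying that it arrived intact. The following is a hedged sketch of a checksum workflow, demonstrated on a placeholder file since the real tarball may not be present on this machine; substitute `matrixone-operator-1.1.0-alpha2.tgz` for `pkg.tgz` in practice:

```shell
# Record a checksum on the machine that downloaded the package ...
printf 'placeholder package contents' > pkg.tgz
sha256sum pkg.tgz > pkg.tgz.sha256
# ... and verify it on the offline machine after copying both files over
sha256sum -c pkg.tgz.sha256   # prints: pkg.tgz: OK
```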
Deploy matrixone-operator:
+
+    ```
+    NS="mo-op"
+    cd matrixone-operator/
+    helm install -n ${NS} mo-op ./charts/matrixone-operator --dependency-update # success returns a status of deployed
+    ```
+
+    The docker images this chart depends on are:
+
+    - matrixone-operator
+    - kruise-manager
+
+    If you cannot pull the images from Docker Hub, you can pull them from Aliyun using the following command:
+
+    ```
+    helm -n ${NS} install mo-op ./charts/matrixone-operator --dependency-update --set image.repository="registry.cn-hangzhou.aliyuncs.com/moc-pub/matrixone-operator" --set kruise.manager.image.repository="registry.cn-hangzhou.aliyuncs.com/moc-pub/kruise-manager"
+    ```
+
+    See matrixone-operator/values.yaml for details.
+
+4. Check the operator deployment status:
+
+    ```
+    NS="mo-op"
+    helm list -n "${NS}" # the corresponding helm chart should have a deployed status
+    kubectl get pod -n "${NS}" -owide # the pods should be in the Running state
+    ```
+
+To learn more about MatrixOne Operator, check out [Operator Administration](../../Deploy/MatrixOne-Operator-mgmt.md).
+
+### Creating a MatrixOne Cluster
+
+1. First create the namespace for MatrixOne:
+
+    ```
+    NS="mo-hn"
+    kubectl create ns ${NS}
+    ```
+
+2. Customize the `yaml` file for the MatrixOne cluster by writing the following `mo.yaml` file:
+
+    ```
+    apiVersion: core.matrixorigin.io/v1alpha1
+    kind: MatrixOneCluster
+    metadata:
+      name: mo
+      namespace: mo-hn
+    spec:
+      # 1. Configuring tn
+      tn:
+        cacheVolume: # disk cache for tn
+          size: 5Gi # modify according to actual disk size and requirements
+          storageClassName: local-path # if omitted, the default storage class is used
+        resources:
+          requests:
+            cpu: 100m # 1000m = 1c
+            memory: 500Mi # 1024Mi
+          limits: # limits should not be lower than requests, nor exceed the capacity of a single node; in general, setting limits equal to requests is sufficient
+            cpu: 200m
+            memory: 1Gi
+        config: | # configuration of tn
+          [dn.Txn.Storage]
+          backend = "TAE"
+          log-backend = "logservice"
+          [dn.Ckp]
+          flush-interval = "60s"
+          min-count = 100
+          scan-interval = "5s"
+          incremental-interval = "60s"
+          global-interval = "100000s"
+          [log]
+          level = "error"
+          format = "json"
+          max-size = 512
+        replicas: 1 # the number of tn replicas; this cannot be modified, as the current version only supports a setting of 1
+      # 2. Configuring logservice
+      logService:
+        replicas: 3 # the number of logservice replicas
+        resources:
+          requests:
+            cpu: 100m # 1000m = 1c
+            memory: 500Mi # 1024Mi
+          limits: # limits should not be lower than requests, nor exceed the capacity of a single node; in general, setting limits equal to requests is sufficient
+            cpu: 200m
+            memory: 1Gi
+        sharedStorage: # configure the s3 storage that logservice maps to
+          s3:
+            type: minio # the s3 storage type being connected is minio
+            path: minio-mo # the minio bucket path for mo, created earlier via the console or the mc command
+            endpoint: http://minio.mostorage:9000 # the svc address and port of the minio service
+            secretRef: # configure the key for accessing minio, i.e. the secret named minio
+              name: minio
+        pvcRetentionPolicy: Retain # the pvc retention policy after cluster destruction: Retain to keep, Delete to delete
+        volume:
+          size: 1Gi # the size of the S3 object store; modify according to actual disk size and requirements
+        config: | # configuration of the logservice
+          [log]
+          level = "error"
+          format = "json"
+          max-size = 512
+      # 3. Configuring cn
+      tp:
+        cacheVolume: # disk cache for cn
+          size: 5Gi # modify according to actual disk size and requirements
+          storageClassName: local-path # if omitted, the default storage class is used
+        resources:
+          requests:
+            cpu: 100m # 1000m = 1c
+            memory: 500Mi # 1024Mi
+          limits: # limits should not be lower than requests, nor exceed the capacity of a single node; in general, setting limits equal to requests is sufficient
+            cpu: 200m
+            memory: 2Gi
+        serviceType: NodePort # cn needs to provide an external access portal, so its svc is set to NodePort
+        nodePort: 31429 # nodePort port setting
+        config: | # configuring cn
+          [cn.Engine]
+          type = "distributed-tae"
+          [log]
+          level = "debug"
+          format = "json"
+          max-size = 512
+        replicas: 1
+      version: nightly-54b5e8c # the version of the MO image; you can check it on dockerhub. Usually cn, tn, and logservice are packaged in the same image, so one field can specify them all; they can also be specified separately in their respective sections, but unless there are special circumstances, use a unified image version.
+      # https://hub.docker.com/r/matrixorigin/matrixone/tags
+      imageRepository: matrixorigin/matrixone # image repository address; if the tag was modified after a local pull, you can adjust this configuration item
+      imagePullPolicy: IfNotPresent # image pulling policy, consistent with the official configurable k8s values
+    ```
+
+3. Create a Secret in namespace `mo-hn` for accessing MinIO by executing the following command:
+
+    ```
+    kubectl -n mo-hn create secret generic minio --from-literal=AWS_ACCESS_KEY_ID=rootuser --from-literal=AWS_SECRET_ACCESS_KEY=rootpass123
+    ```
+
+    where the username and password use the `rootUser` and `rootPassword` set when creating the MinIO cluster.
+
+4. Deploy the MatrixOne cluster by executing the following command:
+
+    ```
+    kubectl apply -f mo.yaml
+    ```
+
+5. Wait patiently for about 10 minutes. If a Pod restart occurs during this period, keep waiting 
until you see the following message indicating successful deployment:
+
+    ```
+    [root@master0 mo]# kubectl get pods -n mo-hn
+    NAME         READY   STATUS    RESTARTS      AGE
+    mo-tn-0      1/1     Running   0             74s
+    mo-log-0     1/1     Running   1 (25s ago)   2m2s
+    mo-log-1     1/1     Running   1 (24s ago)   2m2s
+    mo-log-2     1/1     Running   1 (22s ago)   2m2s
+    mo-tp-cn-0   1/1     Running   0             50s
+    ```
+
+## Connecting to a MatrixOne Cluster
+
+In order to connect to a MatrixOne cluster, you need to map the port of the corresponding service to the MatrixOne node. Here is a guide for connecting to a MatrixOne cluster using `kubectl port-forward`:
+
+- Allow local access only:
+
+    ```
+    nohup kubectl port-forward -n mo-hn svc/mo-tp-cn 6001:6001 &
+    ```
+
+- Allow a specified machine or all machines to access:
+
+    ```
+    nohup kubectl port-forward -n mo-hn --address 0.0.0.0 svc/mo-tp-cn 6001:6001 &
+    ```
+
+After allowing **local access only** or access from **a specified machine** or all machines, you can connect to MatrixOne using a MySQL client:
+
+```
+# Connect to the MySQL-compatible service using the 'mysql' command line tool
+# Use "kubectl get svc/mo-tp-cn -n mo-hn -o jsonpath='{.spec.clusterIP}'" to get the cluster IP address of the service in the Kubernetes cluster
+# The '-h' parameter specifies the hostname or IP address of the MySQL service
+# The '-P' parameter specifies the port number of the MySQL service, in this case 6001
+# '-uroot' means log in as root
+# '-p111' indicates that the initial password is 111
+mysql -h $(kubectl get svc/mo-tp-cn -n mo-hn -o jsonpath='{.spec.clusterIP}') -P 6001 -uroot -p111
+mysql: [Warning] Using a password on the command line interface can be insecure.
+Welcome to the MySQL monitor. Commands end with ; or \g.
+Your MySQL connection id is 163
+Server version: 8.0.30-MatrixOne-v1.1.1 MatrixOne
+
+Copyright (c) 2000, 2023, Oracle and/or its affiliates.
+
+Oracle is a registered trademark of Oracle Corporation and/or its
+affiliates. Other names may be trademarks of their respective
+owners.
+
+Type 'help;' or '\h' for help. 
Type '\c' to clear the current input statement.
+
+mysql>
+```
+
+Once the `mysql>` prompt appears, the connection to the distributed MatrixOne cluster is complete.
+
+!!! note
+    The login account in the above code section is the initial account. Please change the initial password promptly after logging in to MatrixOne; see [Password Management](../../Security/password-mgmt.md).
diff --git a/docs/MatrixOne/Develop/Publish-Subscribe/pub-sub-overview.md b/docs/MatrixOne/Develop/Publish-Subscribe/pub-sub-overview.md
index 467c936f4..a52d214b2 100644
--- a/docs/MatrixOne/Develop/Publish-Subscribe/pub-sub-overview.md
+++ b/docs/MatrixOne/Develop/Publish-Subscribe/pub-sub-overview.md
@@ -1,79 +1,104 @@
-# Publish-subscribe
+# Publish-Subscribe
-Publish-Subscribe (Pub/Sub for short) of a database is a messaging model in which **Publisher** sends messages to one or more **Subscribers**, and **Subscribers** The message is received and processed. In this mode, publishers and subscribers are loosely coupled, and no direct communication is required between them, thus improving the scalability and flexibility of the application.
+A database's Publish-Subscribe (Pub/Sub) is a messaging model in which a **publisher** sends a message to one or more **subscribers**, who in turn receive and process the message. In this model, publishers and subscribers are loosely coupled and do not need to communicate directly with each other, thus increasing application scalability and flexibility.
-In databases, the publish-subscribe function is usually used in scenarios such as real-time data updates, cache synchronization, and business event notification. For example, when the data of a particular table in the database changes, the subscribers can be notified in real-time through the publish and subscribe function, to realize real-time data synchronization and processing. 
In addition, the notification of business events can also be recognized through the publish and subscribe function, such as an order being canceled, a certain inventory quantity is insufficient, and so on.
+In databases, publish-subscribe is often used in scenarios such as real-time data updates, cache synchronization, and business event notifications. For example, when the data of a table in a database changes, subscribers can be notified in real time through the publish-subscribe feature, enabling real-time data synchronization and processing. In addition, notifications of business events, such as an order being cancelled or an inventory item running low, can also be delivered through publish-subscribe.

-There can be a many-to-many relationship between publishers and subscribers; one publisher can publish messages to multiple subscribers, and one subscriber can also subscribe to various messages/data. Usually, the publish-subscribe function of the database consists of two parts: **Publisher** and **Subscriber**. **Publisher** is responsible for publishing messages, while **Subscriber** subscribes to corresponding messages to achieve data synchronization.
+Typically, a database's publish-subscribe function consists of two parts: the **publisher** and the **subscriber**. The **publisher** is responsible for publishing messages, while the **subscriber** subscribes to the corresponding messages for data synchronization. There can be a many-to-many relationship between publishers and subscribers: one publisher can publish messages to multiple subscribers, and one subscriber can subscribe to multiple messages/data.
## Application scenarios

-The publish-subscribe function has many typical application scenarios:
+The publish-subscribe feature has several typical application scenarios:

-- **Data Synchronization**: When a database needs to be kept in sync with another database, the publish-subscribe feature can send data changes to the subscriber database. For example, when a website needs to transfer data from one geographic location to another, publish-subscribe functionality can ensure data synchronization between the two databases.
+- **Data synchronization**: When one database needs to be synchronized with another, the publish-subscribe feature can be used to send data changes to the subscriber database. For example, when a website needs to transfer data from one geographic location to another, the publish-subscribe feature can be used to ensure data synchronization between the two databases.

-- **Business data distribution**: The publish and subscribe function can distribute business data to different systems or processes. For example, when a bank needs to distribute customer account information to multiple business systems, the publish-subscribe function can distribute data to corresponding systems to ensure data consistency between various business processes.
+- **Business data distribution**: The publish-subscribe feature can be used to distribute business data to different systems or business processes. For example, when a bank needs to distribute customer account information to multiple business systems, the publish-subscribe feature can be used to distribute data to the appropriate systems, ensuring data consistency across business processes.

-- **Data backup**: The publish-subscribe function can back up data. For example, when one database needs to be backed up to another database, the publish-subscribe part can be used to back up the data to the subscriber database so that the data can be recovered in the event of failure of the primary database.
+- **Data backup**: The publish-subscribe feature can be used to back up data. For example, when one database needs to be backed up to another, the publish-subscribe feature can be used to back up data to the subscriber database so that data can be restored in the event of a primary database failure.

-- **Real-time data processing**: The publish-subscribe function can be used to realize real-time data processing. For example, when a website needs to process data from different users, the publish-subscribe part can be used to transmit data to a processing program for processing, to realize real-time data analysis and decision-making.
+- **Real-time data processing**: The publish-subscribe feature can be used to enable real-time data processing. For example, when a website needs to process data from different users, the publish-subscribe feature can be used to transfer the data to a handler for processing, enabling real-time data analysis and decision-making.

-## Concepts
+## Terminology

-- **Publication**: In a database, a publication often refers to the process of setting a database object to be accessible by other accounts. It is a crucial step in data sharing and replication, where the published objects can be subscribed to by other accounts, and their data can be accessed.
+- **Publishing**: In a database, publishing usually refers to setting a database object to a state accessible to other tenants. This is an important step in data sharing and replication, where published objects can be subscribed to and acquired by other tenants.

-- **Subscription**: A subscription refers to a database choosing to receive and replicate the data of a published database object.
+- **Subscription**: A subscription is when a database chooses to receive and copy data from published database objects.

-- **Publisher (Pub)**: The Publisher is the database that performs the publishing operation. 
The Publisher is responsible for creating and managing the published objects, as well as managing the access permissions of databases subscribing to these published objects.
+- **Publisher (Pub)**: A publisher is a database that performs publishing operations. The publisher is responsible for creating and managing published objects, as well as managing the access permissions of databases that subscribe to those published objects.

-- **Subscriber (Sub)**: The Subscriber is the account that subscribes to the published objects.
+- **Subscriber (Sub)**: A subscriber is a tenant that subscribes to a published object.

-- **Published Object**: A published object is a database created by the Publisher and made available for publication, namely a database. The data of these objects can be accessed and replicated by the Subscriber.
+- **Published object**: A published object is a database object created on the publisher side and set to publishable, i.e., a database. The data of these objects can be accessed and copied by the subscriber.

-- **Subscribed Object**: A subscribed object is a published object replicated and stored on the Subscriber. The data of the subscribed object is updated according to the data on the Publisher.
+- **Subscribed object**: A subscribed object is a published object that is copied and stored on the subscriber side. The subscribed object's data is updated based on the publisher's data.

-## Publication/Subscription Scope Explanation
+## Publish-Subscribe Scope Description

-### Publish/Subscribe Application Scope
+### Publish/Subscribe Scope of Application

-Both **Publisher** and **Subscriber** are accounts of MatrixOne.
+- Both the **publisher (Pub)** and the **subscriber (Sub)** are tenants of MatrixOne.

-### Publishable/Subscribable Permissions
+### Publishable/Subscribable Permission Range

-- Only ACCOUNTADMIN or MOADMIN role can create publications and subscriptions on the Publisher. 
-Subscribers are controlled by ACCOUNTADMIN or MOADMIN roles to access subscription data permissions.
+- **Publisher (Pub)**: only the ACCOUNTADMIN or MOADMIN role can create publications and subscriptions.
+- **Subscriber (Sub)**: access to subscription data is controlled by the ACCOUNTADMIN or MOADMIN role.

-### Publication/Subscription Data Scope
+### Publish/Subscribe Data Range

-- A single **Publication** can only be associated with one database.
-- Publications and subscriptions are only implemented at the database level, with no current support for direct publication and subscription at the table level.
-- The **Subscriber** only has read access to the **Subscribed database**.
-- If the **Publisher** adjusts the sharing scope of the publication, those accounts that are no longer within the new scope and have already created a subscribed database will find that their access to the **Subscribed database** is invalid.
-- If the **Publisher** attempts to delete a database that has been published, the deletion will fail.
-- If the **Publisher** deletes a **Publication**, but the corresponding object still exists in the subscribed database, an error will be triggered when the **Subscriber** attempts to access this object. The **Subscriber** will need to delete the corresponding **Subscription**.
-- If the **Publisher** deletes a **Published object**, but the corresponding object still exists in the subscribed database, an error will be triggered when the **Subscriber** attempts to access this object. The **Subscriber** must delete the corresponding **Subscribed object**.
+- A **publication** can only be associated with a single database.
+- Publishing and subscribing are only implemented at the database level; direct table-level publishing and subscribing is not currently supported.
+- The **subscriber** only has read access to the **subscribed database**. 
+- If the **publisher** adjusts the sharing scope of a publication, tenants that are no longer within the new scope and have already created a subscribed database will find that their access to the **subscribed database** is invalid.
+- If the **publisher** modifies the publication, the **subscriber** will see the update without additional action.
+- If the **publisher** tries to delete a database that has been published, the deletion will fail.
+- If the **publisher** deletes a **publication** but the corresponding object still exists in the subscribed database, accessing this object from the **subscriber (Sub)** will trigger an error; the **subscriber** then needs to delete the corresponding **subscription**.
+- If the **publisher (Pub)** deletes a **published object** but the corresponding object still exists in the subscribed database, accessing this object from the **subscriber (Sub)** triggers an error; the **subscriber** then needs to delete the corresponding **subscribed object**.

-### Examples
+### Publish-Subscribe Example

-![](https://github.com/matrixorigin/artwork/blob/main/docs/develop/pub-sub/example-en.png?raw=true)
+This chapter gives an example with three tenants, sys, acc1, and acc2, that currently exist in a MatrixOne cluster, operating on the three tenants in order:

-This chapter will give an example to introduce that there are currently three accounts in the MatrixOne cluster, sys, *acc1*, and *acc2*, and operate on the three accounts according to the order of operations:
+![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/develop/pub-sub/data-share.png)

-1. **Publisher**: sys account creates database *sub1* and table *t1*, and publishes *pub1*:
+1. 
**Publisher**: sys tenant creates database sub1 with table t1 and publishes pub1: ```sql create database sub1; create table sub1.t1(a int,b int); - create publication pub1 database sub; + create publication pub1 database sub1; + mysql> show publications; + +-------------+----------+---------------------+-------------+-------------+----------+ + | publication | database | create_time | update_time | sub_account | comments | + +-------------+----------+---------------------+-------------+-------------+----------+ + | pub1 | sub1 | 2024-04-23 10:28:15 | NULL | * | | + +-------------+----------+---------------------+-------------+-------------+----------+ + 1 row in set (0.01 sec) ``` -2. **Subscriber**: both *acc1* and *acc2* create a subscription database *syssub1*, and thus get the shared table *t1*: +2. **Subscribers**: acc1 and acc2 both create the subscription library syssub1, resulting in the shared data table t1: ```sql - -- The SQL statements for acc1 and acc2 to create the subscription library are the same, so there will not repeat them + -- The all option allows you to see all the subscriptions that you have permission to subscribe to, and the unsubscribed sub_time and sub_name are null, so if you don't add all, you can only see the information that you have already subscribed to. + mysql> show subscriptions all; + +----------+-------------+--------------+---------------------+----------+----------+ + | pub_name | pub_account | pub_database | pub_time | sub_name | sub_time | + +----------+-------------+--------------+---------------------+----------+----------+ + | pub1 | sys | sub1 | 2024-04-23 10:28:15 | NULL | NULL | + +----------+-------------+--------------+---------------------+----------+----------+ + 1 row in set (0.01 sec) + + -- The sql statements for creating a subscription library are the same for both acc1 and acc2, so I won't repeat them here. 
create database syssub1 from sys publication pub1;
    use syssub1;
-    show tables;
+
+    mysql> show subscriptions;
+    +----------+-------------+--------------+---------------------+----------+---------------------+
+    | pub_name | pub_account | pub_database | pub_time            | sub_name | sub_time            |
+    +----------+-------------+--------------+---------------------+----------+---------------------+
+    | pub1     | sys         | sub1         | 2024-04-23 10:28:15 | syssub1  | 2024-04-23 10:35:13 |
+    +----------+-------------+--------------+---------------------+----------+---------------------+
+    1 row in set (0.00 sec)
+
+    mysql> show tables;
    +--------------------+
    | Tables_in_syssub1  |
@@ -83,41 +108,60 @@ This chapter will give an example to introduce that there are currently three ac
    2 rows in set (0.02 sec)
    ```

-3. **Publisher**: *sys* account creates table *t2*:
+3. **Publisher**: sys tenant creates data table t2:

    ```sql
    create table sub1.t2(a text);
    ```

-4. **Subscribers**: *acc1* and *acc2* get shared tables *t1* and *t2*:
+4. **Subscribers**: acc1 and acc2 get shared data tables t1 and t2:

    ```sql
-    show tables;
-    +--------------------+
-    | Tables_in_syssub1  |
-    +--------------------+
-    | t1                 |
-    +--------------------+
-    | t2                 |
-    +--------------------+
-    2 rows in set (0.02 sec)
+    use syssub1;
+    mysql> show tables;
+    +-------------------+
+    | Tables_in_syssub1 |
+    +-------------------+
+    | t1                |
+    | t2                |
+    +-------------------+
+    2 rows in set (0.01 sec)
    ```

-5. **Publisher**: *sys* account creates database *sub2* and table *t2*, and publishes *pub2* to accounts *acc1* and *acc3*:
+5. **Publisher**: sys tenant creates database sub2 with table t1 and publishes pub2 to tenant acc1:

    ```sql
    create database sub2;
    create table sub2.t1(a float);
-    create publication pub2 database sub2 account acc1,acc3;
+    create publication pub2 database sub2 account acc1;
    ```

-6. 
**Subscriber**: both *acc1* and *acc2* create the subscription database *syssub2*, and *acc1* gets the shared data table *t1*; *acc2* fails to create the subscription database *syssub2*:
+6. **Subscribers**: acc1 and acc2 both create the subscription library syssub2; acc1 gets the shared data table t1, while acc2 fails to create the subscription library syssub2:

-    - *acc1*
+    - acc1

        ```sql
+        mysql> show subscriptions all;
+        +----------+-------------+--------------+---------------------+----------+---------------------+
+        | pub_name | pub_account | pub_database | pub_time            | sub_name | sub_time            |
+        +----------+-------------+--------------+---------------------+----------+---------------------+
+        | pub1     | sys         | sub1         | 2024-04-23 10:28:15 | syssub1  | 2024-04-23 10:30:43 |
+        | pub2     | sys         | sub2         | 2024-04-23 10:40:54 | NULL     | NULL                |
+        +----------+-------------+--------------+---------------------+----------+---------------------+
+        2 rows in set (0.01 sec)
+
        create database syssub2 from sys publication pub2;
        use syssub2;
+
+        mysql> show subscriptions all;
+        +----------+-------------+--------------+---------------------+----------+---------------------+
+        | pub_name | pub_account | pub_database | pub_time            | sub_name | sub_time            |
+        +----------+-------------+--------------+---------------------+----------+---------------------+
+        | pub2     | sys         | sub2         | 2024-04-23 10:40:54 | syssub2  | 2024-04-23 10:42:31 |
+        | pub1     | sys         | sub1         | 2024-04-23 10:28:15 | syssub1  | 2024-04-23 10:30:43 |
+        +        +----------+-------------+--------------+---------------------+----------+---------------------+
+        2 rows in set (0.01 sec)
+
        mysql> show tables;
        +--------------------+
        | Tables_in_syssub2  |
@@ -127,24 +171,61 @@ This chapter will give an example to introduce that there are currently three ac
        2 rows in set (0.02 sec)
        ```

-    - *acc2*
+    - acc2

        ```sql
-        create database syssub2 from sys publication pub2;
-        > ERROR 20101 (HY000): internal error: the account acc3 is not allowed to subscribe the publication pub2
+        -- acc2 
cannot see pub2 because it has no permission to subscribe to it
+        mysql> show subscriptions all;
+        +----------+-------------+--------------+---------------------+----------+---------------------+
+        | pub_name | pub_account | pub_database | pub_time            | sub_name | sub_time            |
+        +----------+-------------+--------------+---------------------+----------+---------------------+
+        | pub1     | sys         | sub1         | 2024-04-23 10:28:15 | syssub1  | 2024-04-23 10:35:13 |
+        +----------+-------------+--------------+---------------------+----------+---------------------+
+        1 row in set (0.01 sec)
+
+        mysql> create database syssub2 from sys publication pub2;
+        ERROR 20101 (HY000): internal error: the account acc2 is not allowed to subscribe the publication pub2
        ```

-7. **Publisher**: The *sys* account modifies and publishes *pub2* to all accounts:
+7. **Publisher**: sys tenant modifies publication pub2 to publish to all tenants:

    ```sql
    alter publication pub2 account all;
+    mysql> show publications;
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    | publication | database | create_time         | update_time         | sub_account | comments |
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    | pub2        | sub2     | 2024-04-23 10:40:54 | 2024-04-23 10:47:53 | *           |          |
+    | pub1        | sub1     | 2024-04-23 10:28:15 | NULL                | *           |          |
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    2 rows in set (0.00 sec)
    ```

-8. **Subscriber**: *acc2* successfully created the subscription database *syssub2*, and got the shared data table *t1*:
+8. **Subscriber**: acc2 creates the subscription library syssub2 successfully and gets the shared data table t1:

    ```sql
+    -- acc2 can now see pub2. 
+ mysql> show subscriptions all; + +----------+-------------+--------------+---------------------+----------+---------------------+ + | pub_name | pub_account | pub_database | pub_time | sub_name | sub_time | + +----------+-------------+--------------+---------------------+----------+---------------------+ + | pub1 | sys | sub1 | 2024-04-23 10:28:15 | syssub1 | 2024-04-23 10:35:13 | + | pub2 | sys | sub2 | 2024-04-23 10:40:54 | NULL | NULL | + +----------+-------------+--------------+---------------------+----------+---------------------+ + 2 rows in set (0.00 sec) + create database syssub2 from sys publication pub2; use syssub2; + + mysql> show subscriptions all; + +----------+-------------+--------------+---------------------+----------+---------------------+ + | pub_name | pub_account | pub_database | pub_time | sub_name | sub_time | + +----------+-------------+--------------+---------------------+----------+---------------------+ + | pub2 | sys | sub2 | 2024-04-23 10:40:54 | syssub2 | 2024-04-23 10:50:43 | + | pub1 | sys | sub1 | 2024-04-23 10:28:15 | syssub1 | 2024-04-23 10:35:13 | + +----------+-------------+--------------+---------------------+----------+---------------------+ + 2 rows in set (0.00 sec) + mysql> show tables; +--------------------+ | Tables_in_syssub2 | @@ -154,48 +235,150 @@ This chapter will give an example to introduce that there are currently three ac 2 rows in set (0.02 sec) ``` -9. **Publisher**: *sys* account deletes publication *pub1*: +9. 
**Publisher**: sys tenant deletes publication pub1:

    ```sql
    drop publication pub1;
+    mysql> show publications;
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    | publication | database | create_time         | update_time         | sub_account | comments |
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    | pub2        | sub2     | 2024-04-23 10:40:54 | 2024-04-23 10:47:53 | *           |          |
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    1 row in set (0.00 sec)
    ```

-10. **Subscriber**: *acc1* failed to connect to *syspub1*:
+10. **Subscribers**: acc1 and acc2 fail to access the subscription library syssub1:

-    ```sql
-    use syssub1;
-    ERROR 20101 (HY000): internal error: there is no publication pub1
-    ```
+    ```sql
+    mysql> use syssub1;
+    ERROR 20101 (HY000): internal error: there is no publication pub1
+    ```

-11. **Subscriber**: *acc2* delete *syspub1*:
+11. **Publisher**: sys tenant creates a new database sub1\_new and republishes it as pub1:

-    ```sql
+    ```sql
+    create database sub1_new;
+    use sub1_new;
+    create table t3(n1 int);
+    insert into t3 values (1);
+    create publication pub1 database sub1_new;
+    mysql> show publications;
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    | publication | database | create_time         | update_time         | sub_account | comments |
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    | pub2        | sub2     | 2024-04-23 10:40:54 | 2024-04-23 10:47:53 | *           |          |
+    | pub1        | sub1_new | 2024-04-23 10:59:11 | NULL                | *           |          |
+    +-------------+----------+---------------------+---------------------+-------------+----------+
+    2 rows in set (0.00 sec)
+    ```
+
+12. **Subscribers**: acc1 and acc2 access syssub1 and see the new content of pub1; in other words, when the publisher changes the published content, the subscriber sees the update without taking any action. 
+ + ```sql + use syssub1; + mysql> show subscriptions; + +----------+-------------+--------------+---------------------+----------+---------------------+ + | pub_name | pub_account | pub_database | pub_time | sub_name | sub_time | + +----------+-------------+--------------+---------------------+----------+---------------------+ + | pub2 | sys | sub2 | 2024-04-23 10:40:54 | syssub2 | 2024-04-23 10:42:31 | + | pub1 | sys | sub1_new | 2024-04-23 10:59:11 | syssub1 | 2024-04-23 10:30:43 | + +----------+-------------+--------------+---------------------+----------+---------------------+ + 2 rows in set (0.01 sec) + + mysql> show tables; + +-------------------+ + | Tables_in_syssub1 | + +-------------------+ + | t3 | + +-------------------+ + 1 row in set (0.01 sec) + + mysql> select * from t3; + +------+ + | n1 | + +------+ + | 1 | + +------+ + 1 row in set (0.01 sec) + ``` + +13. **Subscriber**:acc1 Delete subscription: + + ```sql + -- Remove a subscription by drop database drop database syssub1; - ``` + mysql> show subscriptions; + +----------+-------------+--------------+---------------------+----------+---------------------+ + | pub_name | pub_account | pub_database | pub_time | sub_name | sub_time | + +----------+-------------+--------------+---------------------+----------+---------------------+ + | pub2 | sys | sub2 | 2024-04-23 10:40:54 | syssub2 | 2024-04-23 10:42:31 | + +----------+-------------+--------------+---------------------+----------+---------------------+ + 1 row in set (0.00 sec) + ``` -12. **Publisher**: *sys* account recreates *pub1*: +14. **Publisher**: Before a sys tenant deletes a published database, delete its corresponding publication: - ```sql - create publication pub1 database sub; - ``` + ```sql + mysql> drop database sub1_new; + ERROR 20101 (HY000): internal error: can not drop database 'sub1_new' which is publishing + mysql> drop publication pub1; + Query OK, 0 rows affected (0.00 sec) -13. 
**Subscriber**: *acc1* connects to *syspub1* successfully:

    ```sql
-    create database syssub1 from sys publication pub1;
-    use syssub1;
-    mysql> show tables;
-    +--------------------+
-    | Tables_in_syssub1  |
-    +--------------------+
-    | t1                 |
-    +--------------------+
-    2 rows in set (0.02 sec)
+    alter publication pub2 comment "this is pub2"; -- alter the comment
+    mysql> show publications;
+    create database new_sub2;
+    create table new_sub2.new_t (xxx int);
+    insert into new_sub2.new_t values (123);
+    alter publication pub2 database new_sub2; -- alter the published database
+    mysql> show publications;
+    +-------------+----------+---------------------+---------------------+-------------+--------------+
+    | publication | database | create_time         | update_time         | sub_account | comments     |
+    +-------------+----------+---------------------+---------------------+-------------+--------------+
+    | pub2        | new_sub2 | 2024-04-23 10:40:54 | 2024-04-23 11:04:20 | *           | this is pub2 |
+    +-------------+----------+---------------------+---------------------+-------------+--------------+
+    1 row in set (0.00 sec)
    ```

-## Reference
+16. 
**Subscribers**: acc1 and acc2 view their subscriptions and see the modified content of the published database:

    ```sql
    mysql> show subscriptions;
    +----------+-------------+--------------+---------------------+----------+---------------------+
    | pub_name | pub_account | pub_database | pub_time            | sub_name | sub_time            |
    +----------+-------------+--------------+---------------------+----------+---------------------+
    | pub2     | sys         | new_sub2     | 2024-04-23 10:40:54 | syssub2  | 2024-04-23 10:42:31 |
    +----------+-------------+--------------+---------------------+----------+---------------------+
    1 row in set (0.00 sec)

    use syssub2;
    mysql> show tables;
    +-------------------+
    | Tables_in_syssub2 |
    +-------------------+
    | new_t             |
    +-------------------+
    1 row in set (0.00 sec)

    mysql> select * from new_t;
    +------+
    | xxx  |
    +------+
    |  123 |
    +------+
    1 row in set (0.00 sec)
    ```

+## Reference Documents

-### Publisher Reference
+### Publisher Reference Documents

- [CREATE PUBLICATION](../../Reference/SQL-Reference/Data-Definition-Language/create-publication.md)
- [ALTER PUBLICATION](../../Reference/SQL-Reference/Data-Definition-Language/alter-publication.md)
@@ -203,7 +386,7 @@ This chapter will give an example to introduce that there are currently three ac
- [SHOW PUBLICATIONS](../../Reference/SQL-Reference/Other/SHOW-Statements/show-publications.md)
- [SHOW CREATE PUBLICATION](../../Reference/SQL-Reference/Other/SHOW-Statements/show-create-publication.md)

-### Subscriber Reference
+### Subscriber Reference Documents

- [CREATE...FROM...PUBLICATION...](../../Reference/SQL-Reference/Data-Definition-Language/create-subscription.md)
- [SHOW SUBSCRIPTIONS](../../Reference/SQL-Reference/Other/SHOW-Statements/show-subscriptions.md)
diff --git a/docs/MatrixOne/Develop/Vector/cluster_centers.md b/docs/MatrixOne/Develop/Vector/cluster_centers.md
new file mode 100644
index 000000000..a3b75edb2
--- /dev/null
+++ 
b/docs/MatrixOne/Develop/Vector/cluster_centers.md
@@ -0,0 +1,169 @@
+# Cluster Center
+
+## What is a Cluster Center
+
+When using clustering algorithms, especially K-means, the number of clusters K is the number of groups into which you want to divide the data set. Each cluster is represented by its centroid, which is the central point or average position of all the data points within the cluster.
+
+In the K-means algorithm, the choice of K has a great influence on the clustering results. Choosing an appropriate K value helps you better understand the structure and patterns of your data, while a poorly chosen K value can cause the following problems:
+
+- The K value is too small: different clusters may be merged together, losing important patterns in the data.
+- The K value is too large: the data may be over-segmented, with each cluster containing very few data points, which can mask the general trends in the data.
+
+MatrixOne provides a cluster center query to determine the K cluster centers of a vector column.
+
+## Application scenarios for cluster centers
+
+Clustering plays an important role in data analysis and machine learning. Here are some of the main application scenarios for cluster centers:
+
+- Market segmentation: In market analysis, cluster centers can help identify the characteristics of different customer groups so that marketing strategies can be customized for each group.
+
+- Image segmentation: In image processing, cluster centers are used to distinguish different regions or objects in an image, and are often used for image compression and segmentation.
+
+- Social network analysis: Cluster centers make it possible to identify groups of users with similar behaviors or interests in social networks.
+
+- Anomaly detection: Cluster centers can help identify anomalies in the data, as anomalies are often far from all cluster centers. 
+
+- Astronomical data analysis: In astronomy, cluster centers can be used to identify the characteristics of galaxy clusters or star clusters.
+
+## Algorithms involved
+
+Determining the cluster centers of a vector dataset in MatrixOne involves the following algorithms:
+
+- Random (random initialization): The algorithm randomly selects n\_clusters observations from the data set as the initial centroids. This method is simple and fast, but the quality of the clustering results can depend heavily on the initial centroids, since the random picks may not fall in dense regions of the data.
+
+- K-means++ (k-means++ initialization): k-means++ is a more advanced initialization method designed to remedy the shortcomings of random initialization. It selects the initial centroids through a multi-step process, increasing the probability that the chosen centroids represent the overall distribution of the data.
+
+- Regular Kmeans: A widely used clustering method that divides data points into K clusters so that points within a cluster are as similar as possible and points in different clusters are as different as possible. It measures similarity between data points by Euclidean distance, so it is better suited to data in a flat (Euclidean) space.
+
+- Spherical Kmeans: A clustering algorithm whose computation of cluster centers involves first normalizing the data points to unit length. It is especially suitable for high-dimensional, sparse data, or for data where the direction of the data points matters more than their magnitude, such as text data, geographic locations, or user-interest models.
+
+## Examples
+
+### Example 1
+
+Suppose we have annual shopping data for a group of customers, including their annual income and total annual consumption. 
We want to use this data to understand our customers' consumption behavior and divide them into different consumer groups.
+
+#### Steps
+
+1. Create the customer table and insert data
+
+    Create a table named `customer_table` and insert 10 rows of customer data. Each two-dimensional vector represents a customer's annual income and total annual consumption.
+
+    ```sql
+    CREATE TABLE customer_table(id int auto_increment PRIMARY KEY,in_ex vecf64(2));
+    INSERT INTO customer_table(in_ex) VALUES("[120,50]"),("[80,25]"),("[200,100]"),("[100,40]"),("[300,120]"),("[150,75]"),("[90,30]"),("[250,90]"),("[75,20]"),("[150,60]");
+
+    mysql> select * from customer_table;
+    +------+------------+
+    | id   | in_ex      |
+    +------+------------+
+    |    1 | [120, 50]  |
+    |    2 | [80, 25]   |
+    |    3 | [200, 100] |
+    |    4 | [100, 40]  |
+    |    5 | [300, 120] |
+    |    6 | [150, 75]  |
+    |    7 | [90, 30]   |
+    |    8 | [250, 90]  |
+    |    9 | [75, 20]   |
+    |   10 | [150, 60]  |
+    +------+------------+
+    10 rows in set (0.01 sec)
+    ```
+
+2. Determine the cluster centers
+
+    ```sql
+    mysql> SELECT cluster_centers(in_ex kmeans '2,vector_l2_ops,random,false') AS centers FROM customer_table;
+    +------------------------------------------------------------------------+
+    | centers                                                                |
+    +------------------------------------------------------------------------+
+    | [ [109.28571428571428, 42.857142857142854],[250, 103.33333333333333] ] |
+    +------------------------------------------------------------------------+
+    1 row in set (0.00 sec)
+    ```
+
+3. Check the cluster centers
+
+    A good clustering usually appears as clearly separated groups in a visualization. As the figure below shows, the chosen cluster centers fit the data well.
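As a sanity check outside the database, the same two centers can be reproduced with a few lines of plain k-means over the ten points. This is an illustrative NumPy-only sketch; the fixed initial centers below stand in for the `random` initialization used in the query:

```python
import numpy as np

# The ten (annual income, total consumption) points from customer_table.
points = np.array([[120, 50], [80, 25], [200, 100], [100, 40], [300, 120],
                   [150, 75], [90, 30], [250, 90], [75, 20], [150, 60]],
                  dtype=float)

# Plain Lloyd's algorithm: assign each point to its nearest center,
# then move each center to the mean of its assigned points.
centers = points[[1, 4]]  # fixed init ([80, 25] and [300, 120]) instead of random
for _ in range(10):
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(centers)  # ≈ [[109.2857, 42.8571], [250, 103.3333]], matching the query
```

The iteration converges after one pass on this small data set, giving the same two centroids that `cluster_centers` returned.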
+ +
+ +By identifying cluster centers, we can divide our customers into two groups: those with middle income and middle consumption levels (cluster center A) and those with higher income and higher consumption levels (cluster center B). Merchants can tailor their product positioning to each group's consumption characteristics, such as offering better value for money for Cluster Center A and high-end or luxury brands for Cluster Center B. + +### Example 2 + +A music streaming service wants to divide users into groups based on their preferences for different music types in order to offer personalized playlists. They collected user preference ratings (1 for disinterested and 5 for very fond) for five music types: rock, pop, jazz, classical, and hip-hop. + +#### Steps + +1. Build music type table and insert data + + Prepare a table named `music_table` that inserts 5 pieces of user data. The five-dimensional vectors correspond to user preference scores for five genres of music: rock, pop, jazz, classical and hip-hop. + + ```sql + CREATE TABLE music_table(id int,grade vecf64(5)); + INSERT INTO music_table VALUES(1,"[5,2,3,1,4]"),(2,"[3,5,2,1,4]"),(3,"[4,3,5,1,2]"),(4,"[2,5,4,3,1]"),(5,"[5,4,3,2,5]"); + + mysql> select * from music_table; + +------+-----------------+ + | id | grade | + +------+-----------------+ + | 1 | [5, 2, 3, 1, 4] | + | 2 | [3, 5, 2, 1, 4] | + | 3 | [4, 3, 5, 1, 2] | + | 4 | [2, 5, 4, 3, 1] | + | 5 | [5, 4, 3, 2, 5] | + +------+-----------------+ + 5 rows in set (0.01 sec) + ``` + +2. 
View vector normalization results + + ```sql + mysql> select normalize_l2(grade) from music_table; + +---------------------------------------------------------------------------------------------------------+ + | normalize_l2(grade) | + +---------------------------------------------------------------------------------------------------------+ + | [0.6741998624632421, 0.26967994498529685, 0.40451991747794525, 0.13483997249264842, 0.5393598899705937] | + | [0.40451991747794525, 0.6741998624632421, 0.26967994498529685, 0.13483997249264842, 0.5393598899705937] | + | [0.5393598899705937, 0.40451991747794525, 0.6741998624632421, 0.13483997249264842, 0.26967994498529685] | + | [0.26967994498529685, 0.6741998624632421, 0.5393598899705937, 0.40451991747794525, 0.13483997249264842] | + | [0.562543950463012, 0.4500351603704096, 0.3375263702778072, 0.2250175801852048, 0.562543950463012] | + +---------------------------------------------------------------------------------------------------------+ + 5 rows in set (0.01 sec) + ``` + +3. 
Determining the Cluster Center + + ```sql + mysql> SELECT cluster_centers(grade kmeans '2,vector_l2_ops,kmeansplusplus,true') AS centers FROM music_table; + +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | centers | + +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | [ [0.3370999312316211, 0.6741998624632421, 0.40451991747794525, 0.26967994498529685, 0.3370999312316211],[0.5920345676322826, 0.3747450076112172, 0.4720820500729982, 0.16489917505683388, 0.4571945951396342] ] | + +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + 1 row in set (0.00 sec) + ``` + +4. Check Cluster Center + + Use t-SNE to reduce high-dimensional data to 2D and visualize clustering results. As can be seen from the figure below, the data points are clearly separated by cluster centers in the space after dimension reduction, which increases confidence in the correctness of the cluster centers. + +
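The spherical result can also be cross-checked outside the database: normalizing the five rating vectors and then running plain k-means on the normalized data reproduces the two centers returned by the query. This is a NumPy-only sketch; the fixed initial pick of two users stands in for the k-means++ initialization:

```python
import numpy as np

# The five 5-dimensional rating vectors from music_table.
ratings = np.array([[5, 2, 3, 1, 4],
                    [3, 5, 2, 1, 4],
                    [4, 3, 5, 1, 2],
                    [2, 5, 4, 3, 1],
                    [5, 4, 3, 2, 5]], dtype=float)

# normalize=true in the query: scale every vector to unit L2 length first.
unit = ratings / np.linalg.norm(ratings, axis=1, keepdims=True)

# Lloyd iterations on the normalized vectors; users 2 and 1 as initial centers.
centers = unit[[1, 0]]
for _ in range(10):
    dists = ((unit[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    centers = np.array([unit[labels == k].mean(axis=0) for k in range(2)])

print(centers)  # matches the two centers returned by the query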
+ +
+
+By determining the cluster centers, we can divide users into two groups: Cluster 1 consists mainly of users who prefer rock and hip-hop music and may represent users seeking modern, rhythmic music; Cluster 2 consists of users who prefer pop and jazz music and may represent users who favor melodic, relaxed music. The media company can then push suitable styles of music to users according to each group's preferences.
+
+## Reference Documents
+
+[Vector data type](../../Reference/Data-Types/vector-type.md)
+
+[CLUSTER_CENTERS()](../../Reference/Functions-and-Operators/Vector/cluster_centers.md)
+
+[L2_DISTANCE()](../../Reference/Functions-and-Operators/Vector/l2_distance.md)
+
+[NORMALIZE_L2()](../../Reference/Functions-and-Operators/Vector/normalize_l2.md)
diff --git a/docs/MatrixOne/Develop/Vector/vector_search.md b/docs/MatrixOne/Develop/Vector/vector_search.md
new file mode 100644
index 000000000..7ad105b22
--- /dev/null
+++ b/docs/MatrixOne/Develop/Vector/vector_search.md
@@ -0,0 +1,79 @@
+# Vector retrieval
+
+## What is vector retrieval
+
+Vector retrieval finds, in a given vector dataset, the K vectors closest to a query vector under some distance measure (K-Nearest Neighbors, KNN). It is a technique for finding vectors similar to a given query vector in large-scale, high-dimensional vector data, and it has a wide range of applications in AI fields such as image retrieval, text retrieval, speech recognition, and recommendation systems. Vector retrieval differs greatly from traditional database retrieval: scalar search in a traditional database targets structured data and returns exact matches, while vector search targets the vectorized representations of unstructured data and returns similar results, which can only approximate the best match.
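The core idea above can be sketched in a few lines: exact (brute-force) KNN ranks every stored vector by its distance to the query and keeps the K closest. An illustrative NumPy sketch with toy data:

```python
import numpy as np

# Toy "dataset" of 4-dimensional vectors plus one query vector.
corpus = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])

def knn(data, q, k):
    """Exact KNN: rank every vector by L2 distance to the query, keep the k closest."""
    dists = np.linalg.norm(data - q, axis=1)
    return np.argsort(dists)[:k]

print(knn(corpus, query, 2))  # [0 1] -- the two vectors nearest the query
```

A vector index replaces this exhaustive scan with an approximate search structure, trading a little accuracy for much lower latency on large data sets.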
+ +
+
+MatrixOne currently supports vector retrieval with the following distance measure functions:
+
+- Cosine similarity function [`cosine_similarity`](../../Reference/Functions-and-Operators/Vector/cosine_similarity.md)
+- Cosine distance function [`cosine_distance`](../../Reference/Functions-and-Operators/Vector/cosine_distance.md)
+- L2 distance function [`l2_distance`](../../Reference/Functions-and-Operators/Vector/l2_distance.md)
+
+!!! note
+    MatrixOne currently only supports fast KNN queries through vector indexes on the l2\_distance measure.
+
+## Application scenarios for vector retrieval
+
+Vector capability in a database means the system can store, query, and analyze vector data. Such vectors are often associated with complex data analysis, machine learning, and data mining tasks. Here are some application scenarios for a database with vector processing capability:
+
+- **Generative AI applications**: These databases can serve as the backend for generative AI applications, enabling them to obtain nearest-neighbor results for user-supplied queries and improving output quality and relevance.
+- **Advanced object recognition**: They are invaluable for building advanced object recognition platforms that identify similarities between different data sets, with practical applications in areas such as plagiarism detection, facial recognition, and DNA matching.
+- **Personalized recommendation systems**: Vector databases can enhance recommendation systems by integrating user preferences and choices, yielding more accurate and targeted recommendations that improve user experience and engagement.
+- **Anomaly detection**: A vector database can store feature vectors representing normal behavior; anomalies can then be detected by comparing input vectors against the stored vectors. This is useful in cybersecurity and industrial quality control. 
+- **Marketing optimization**: By analyzing and mining user data, a vector database can support personalized recommendation, customer segmentation, market trend forecasting, and other functions that give enterprises precise marketing strategies.
+- **Natural language processing**: A vector database can handle large-scale text data and support semantic similarity search, text classification, document clustering, and other natural language processing tasks; it is widely used in intelligent customer service, public opinion analysis, and other fields.
+- **Semantic search and retrieval**: In applications involving large language models, vector databases can store and retrieve massive numbers of text vectors, and intelligent text matching and semantic search can be achieved by computing similarities between vectors.
+
+## Examples
+
+The Iris dataset is a well-known multi-class classification dataset that can be found and downloaded online. It contains 150 samples divided into 3 classes: Iris Setosa (mountain iris), Iris Versicolour (chromatic iris), and Iris Virginica (virginian iris). Each sample has 4 features: sepal length, sepal width, petal length, and petal width. Below we run a KNN query (based on l2\_distance) against the Iris dataset to determine the type of a sample by finding the K samples whose features most closely resemble it.
+
+1. Create the Iris table and import data
+
+    Create a table named `iris_table` and load the Iris dataset into it. The dataset has 150 rows, each consisting of a four-dimensional feature vector and the species.
+
+    ```sql
+    CREATE TABLE iris_table(
+    species varchar(100), -- category
+    attributes vecf64(4) -- feature
+    );
+    LOAD DATA INFILE '/your_path/iris.csv' INTO TABLE iris_table;
+    ```
+
+2. 
Use KNN to predict the category of the input feature

+    ```sql
+    mysql> select * from iris_table order by l2_distance(attributes,"[4,3.3,3,0.9]") asc limit 1;
+    +------------------+--------------------+
+    | species          | attributes         |
+    +------------------+--------------------+
+    | Iris-versicolour | [4.9, 2.4, 3.3, 1] |
+    +------------------+--------------------+
+    1 row in set (0.00 sec)
+
+    mysql> select * from iris_table order by l2_distance(attributes,"[4,3.3,3,0.9]") asc limit 5;
+    +------------------+----------------------+
+    | species          | attributes           |
+    +------------------+----------------------+
+    | Iris-versicolour | [4.9, 2.4, 3.3, 1]   |
+    | Iris-versicolour | [5.1, 2.5, 3, 1.1]   |
+    | Iris-versicolour | [5, 2.3, 3.3, 1]     |
+    | Iris-setosa      | [4.8, 3.4, 1.9, 0.2] |
+    | Iris-versicolour | [5.2, 2.7, 3.9, 1.4] |
+    +------------------+----------------------+
+    5 rows in set (0.00 sec)
+    ```
+
+From this search we can reasonably conclude that the sample is Iris Versicolour (chromatic iris).
+
+To understand the role of vector retrieval in building RAG applications, see the [RAG Application Foundation](../../Tutorial/rag-demo.md) example in the application development examples.
+
+## Reference Documents
+
+[Vector data type](../../Reference/Data-Types/vector-type.md)
+
+[L2_DISTANCE()](../../Reference/Functions-and-Operators/Vector/l2_distance.md)
diff --git a/docs/MatrixOne/Develop/Vector/vector_type.md b/docs/MatrixOne/Develop/Vector/vector_type.md
new file mode 100644
index 000000000..04c00b538
--- /dev/null
+++ b/docs/MatrixOne/Develop/Vector/vector_type.md
@@ -0,0 +1,53 @@
+# Vector Type
+
+## What is a vector?
+
+In a database, vectors are usually a set of numbers arranged in a particular way to represent some data or feature. These vectors can be one-dimensional arrays, multi-dimensional arrays, or data structures with higher dimensions. In machine learning and data analysis, vectors are used to represent data points, features, or model parameters. 
They are typically used with unstructured data such as images, speech, and text: a machine learning model transforms the unstructured data into embedding vectors, which can then be processed and analyzed. 
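As a toy illustration of the idea, content with similar meaning maps to vectors pointing in similar directions. The 4-dimensional "embeddings" below are made up for the example; real models produce hundreds or thousands of dimensions:

```python
import numpy as np

# Hypothetical embeddings; real models output far higher dimensions.
cat    = np.array([0.8, 0.1, 0.6, 0.2])
kitten = np.array([0.7, 0.2, 0.5, 0.1])
car    = np.array([0.1, 0.9, 0.0, 0.7])

def cosine_similarity(a, b):
    # 1.0 means identical direction (very similar content); smaller means less similar.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(cat, kitten))  # close to 1: semantically similar
print(cosine_similarity(cat, car))     # much smaller: dissimilar
```

Similarity between such vectors is what the distance functions in the next sections compute over a vector column.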
+ +
+
+## MatrixOne support for vector types
+
+Traditional vector databases are specially designed to process high-dimensional vector data, which is mostly unstructured, and they have certain limitations. They may not support non-vector fields (such as metadata or text descriptions) as richly as traditional relational databases, they lack the ability to handle complex data relationships and transactions, and their support for data integrity constraints and metadata management is limited. Therefore, vector databases may not suit scenarios that require complex queries, support for diverse data types, or strong data consistency guarantees.
+
+MatrixOne, as a relational database with vector capability, provides powerful data management. It combines the transactional consistency, data integrity, ease of integration, and rich tool ecosystem of traditional relational databases, while adding the ability to store high-dimensional vector data and efficiently search for similar vectors. This combination lets the database manage and query structured and unstructured data in a unified way, supporting complex AI and machine learning applications while maintaining data security and governance, reducing maintenance costs and system complexity, and providing a flexible, comprehensive data solution for modern applications.
+
+MatrixOne currently supports vectors of type `float32` and `float64`, called `vecf32` and `vecf64` respectively; string and integer element types are not supported.
+
+## Best Practices
+
+- **Vector type conversion**: When converting a vector from one type to another, it is recommended to specify the dimension as well. For example:
+
+    ```sql
+    SELECT b + CAST("[1,2,3]" AS vecf32(3)) FROM t1;
+    ```
+
+    This approach ensures accuracy and consistency in vector type conversion.
+
+- **Use binary format**: To improve overall insertion performance, consider using the binary format instead of the text format. 
Make sure the array is in little-endian byte order before converting it to hexadecimal encoding. The following is sample Python code:

    ```python
    import binascii
    # 'value' is a NumPy object
    def to_binary(value):
        if value is None:
            return value

        # little-endian floating point array
        value = np.asarray(value, dtype='<f4')
    ```

```sql
mysql> SELECT
    -> behavior,
    -> occur_year,
    -> SUM(BITMAP_COUNT(bitmap))
    -> FROM precompute
    -> GROUP BY behavior,occur_year;
+----------+------------+---------------------------+
| behavior | occur_year | sum(bitmap_count(bitmap)) |
+----------+------------+---------------------------+
| browser | 2022 | 939995 |
| browser | 2023 | 1003173 |
| purchase | 2022 | 669474 |
| purchase | 2023 | 660605 |
| returns | 2023 | 4910 |
| returns | 2022 | 4350 |
+----------+------------+---------------------------+
6 rows in set (0.01 sec)

mysql> select behavior,occur_year,count(distinct user_id) from user_behavior_table group by behavior,occur_year;
+----------+------------+-------------------------+
| behavior | occur_year | count(distinct user_id) |
+----------+------------+-------------------------+
| browser | 2022 | 939995 |
| browser | 2023 | 1003173 |
| purchase | 2022 | 669474 |
| purchase | 2023 | 660605 |
| returns | 2023 | 4910 |
| returns | 2022 | 4350 |
+----------+------------+-------------------------+
6 rows in set (3.26 sec)
```

Calculate the number of users who browsed, purchased, and returned items during 2022-2023.
```sql
mysql> SELECT behavior, SUM(cnt) FROM (
    -> SELECT
    -> behavior,
    -> BITMAP_COUNT(BITMAP_OR_AGG(bitmap)) cnt
    -> FROM precompute
    -> GROUP BY behavior,bucket
    -> )
    -> GROUP BY behavior;
+----------+----------+
| behavior | sum(cnt) |
+----------+----------+
| browser | 1003459 |
| purchase | 780308 |
| returns | 9260 |
+----------+----------+
3 rows in set (0.01 sec)

mysql> select behavior,count(distinct user_id) from user_behavior_table group by behavior;
+----------+-------------------------+
| behavior | count(distinct user_id) |
+----------+-------------------------+
| browser | 1003459 |
| purchase | 780308 |
| returns | 9260 |
+----------+-------------------------+
3 rows in set (1.44 sec)
```

Comparing the execution times of the two queries, using `BITMAP` is clearly more efficient. With `BITMAP`, merchants can quickly filter out specific types of events and count the total number of users exhibiting a given behavior.

## Reference Documents

- [BITMAP](../../Reference/Functions-and-Operators/Aggregate-Functions/bitmap.md)
- [COUNT](../../Reference/Functions-and-Operators/Aggregate-Functions/count.md)
diff --git a/docs/MatrixOne/Develop/distinct-data/count-distinct.md b/docs/MatrixOne/Develop/distinct-data/count-distinct.md
new file mode 100644
index 000000000..58dde17a2
--- /dev/null
+++ b/docs/MatrixOne/Develop/distinct-data/count-distinct.md
@@ -0,0 +1,59 @@
# Deduplication of data using COUNT(DISTINCT)

`COUNT(DISTINCT)` provides accurate deduplication results but may be less efficient on large data sets; use [BITMAP](bitmap.md) for large data sets.

This article explains how to deduplicate small amounts of data using `COUNT(DISTINCT)`.

## Prepare before you start

Complete the [standalone deployment](../../Get-Started/install-standalone-matrixone.md) of MatrixOne.
+ +## Examples + +```sql +--Create an orders table with two fields, customer_id and product_id, which represent the unique identifiers of the customer and the product, respectively. +CREATE TABLE orders ( + order_id INT AUTO_INCREMENT PRIMARY KEY, + customer_id INT, + product_id INT, + order_date DATE, + quantity INT +); + +--Insert some sample data: +INSERT INTO orders (customer_id, product_id, order_date, quantity) +VALUES + (1, 101, '2023-04-01', 2), + (1, 102, '2023-04-02', 1), + (2, 101, '2023-04-03', 5), + (3, 103, '2023-04-04', 3), + (2, 104, '2023-04-05', 1), + (4, 101, '2023-04-06', 2), + (4, 102, '2023-04-07', 1), + (5, 105, '2023-04-08', 4), + (1, 101, '2023-04-09', 2); + +--Calculate the number of different customers: +mysql> SELECT COUNT(DISTINCT customer_id) AS unique_customer_count FROM orders; ++-----------------------+ +| unique_customer_count | ++-----------------------+ +| 5 | ++-----------------------+ +1 row in set (0.01 sec) + +--Calculate the quantities of different products: +mysql> SELECT COUNT(DISTINCT product_id) AS unique_product_count FROM orders; ++----------------------+ +| unique_product_count | ++----------------------+ +| 5 | ++----------------------+ +1 row in set (0.01 sec) +``` + +The two queries return the number of unique customers and the number of unique products in the orders table, respectively. This information is useful for analyzing customer diversity and product range. 
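The deduplication above can be sanity-checked outside the database; a minimal Python sketch over the same sample rows (only the `customer_id` and `product_id` columns are reproduced here):

```python
# (customer_id, product_id) pairs from the sample INSERT statement above
orders = [
    (1, 101), (1, 102), (2, 101), (3, 103), (2, 104),
    (4, 101), (4, 102), (5, 105), (1, 101),
]

# equivalents of COUNT(DISTINCT customer_id) and COUNT(DISTINCT product_id):
# collect each column into a set, then take its size
unique_customer_count = len({c for c, _ in orders})
unique_product_count = len({p for _, p in orders})

print(unique_customer_count, unique_product_count)  # 5 5
```

Both counts match the query results above, since a set keeps exactly one copy of each value, which is precisely what `DISTINCT` does.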
## Reference Documents

- [COUNT](../../Reference/Functions-and-Operators/Aggregate-Functions/count.md)
diff --git a/docs/MatrixOne/Develop/export-data/modump.md b/docs/MatrixOne/Develop/export-data/modump.md
index 83563b4f9..68640ec59 100644
--- a/docs/MatrixOne/Develop/export-data/modump.md
+++ b/docs/MatrixOne/Develop/export-data/modump.md
@@ -1,93 +1,189 @@
-# Export data by MODUMP
# Exporting data with the mo-dump tool
-There are two methods to export data with MatrixOne:
MatrixOne supports two ways to export data:

- `SELECT INTO...OUTFILE`
- `mo-dump`

-This document will introduce about how to export data with `mo-dump`.
This document focuses on how to export data using `mo-dump`.

-## What is `mo-dump`
## What is mo-dump

-Like `mysqldump`, MatrixOne has a client utility tool called `mo-dump` that can perform backups of a MatrixOne database by exporting a ".sql" file type that contains SQL statements can be executed to recreate the original database.
`mo-dump` is a client-side utility for MatrixOne that, like `mysqldump`, backs up a MatrixOne database by exporting a `.sql` file containing SQL statements that can be executed to recreate the original database.

-To use the `mo-dump` tool, you must have access to a server running an instance of MatrixOne. You must also have user credentials with the required privileges for the database you want to export.
To use the `mo-dump` tool, you must have access to the server running the MatrixOne instance, and your user must have the required privileges on the database being exported.

-### Syntax
## mo-dump syntax structure

```bash
./mo-dump -u ${user} -p ${password} \
    -h ${host} -P ${port} -db ${database}\
    [--local-infile=true] [-csv]\
    [-no-data] [-tbl ${table}...]\
    -net-buffer-length ${net-buffer-length} > {importStatement.sql}
```
-./mo-dump -u ${user} -p ${password} -h ${host} -P ${port} -db ${database} [--local-infile=true] [-csv] [-tbl ${table}...]
-net-buffer-length ${net-buffer-length} > {dumpfilename.sql}
-```
-The parameters are as following:
**Parameter description**

- **-u \[user]**: Username to connect to the MatrixOne server. Only users with database and table read access can use the `mo-dump` utility. Default value: `dump`.

- **-p \[password]**: Valid password for the MatrixOne user. Default value: `111`.

- **-h \[host]**: Host IP address of the MatrixOne server. Default value: `127.0.0.1`.

- **-P \[port]**: Port of the MatrixOne server. Default value: `6001`.

- **-db \[databaseName]**: Required parameter. The name of the database to back up. Multiple database names can be specified, separated by `,`.
-- **-u [user]**: It is a username to connect to the MatrixOne server. Only the users with database and table read privileges can use `mo-dump` utility. Default value: dump

- **-net-buffer-length \[packet size]**: Packet size, i.e., the total size in characters of a SQL statement. The packet is the basic unit of exported SQL data. If the parameter is not set, the default is 1048576 Byte (1M) and the maximum is 16777216 Byte (16M). If it is set to 16777216 Byte (16M), then when data larger than 16M is exported, the data is split into multiple 16M packets; all but the last packet are 16M in size.
-- **-p [password]**: The valid password of the MatrixOne user. Default value: 111

- **-csv**: The default is false. When set to true, the exported data is in CSV format: the generated database and table structure and the import SQL statements are saved in the generated sql file, and the data is exported to `${databaseName}_${tableName}.csv` files in the current directory.
-- **-h [host]**: The host ip address of MatrixOne server. Default value: 127.0.0.1

- **--local-infile**: The default is true and takes effect only when the parameter -csv is set to true. When the parameter is true, the SQL script generated by mo-dump uses `LOAD DATA LOCAL INFILE`.
When the parameter is false, the SQL script generated by mo-dump uses `LOAD DATA INFILE`.
-- **-P [port]**: The port of MatrixOne server. Default value: 6001

- **-tbl \[tableName]**: Optional argument. If the argument is empty, the entire database is exported. If you want to back up specific tables, add the `-tbl` parameter followed by the table names. If multiple tables are specified, the table names are separated by `,`.
-- **-db [database name]**: Required parameter. Name of the database that you want to take backup.

- **-no-data**: The default is false. When set to true, no data is exported, only the table structure.
-- **-net-buffer-length [packet size]**: Packet size, the total size of the characters in the SQL statement. The data packet is the basic unit of SQL exported data. If no parameter is set, the default is 1048576 Byte (1M), and the maximum can be set to 16777216 Byte (16M). If the parameter here is set to 16777216 Byte (16M), then when the data larger than 16M is to be exported, the data will be split into multiple 16M data packets, except for the last data packet, the size of other data packets is 16M.

- **> {importStatement.sql}**: Writes the generated SQL statements to the file *importStatement.sql*; otherwise they are printed to the screen.
-- **-csv**: The default value is false. The exported data is in *CSV* format when set to true.

## Install the mo-dump tool
-- **--local-infile**: The default value is true and only takes effect when the parameter **-csv** is set to true. Indicates support for native export of *CSV* files.

The two download methods below require the `wget` or `curl` download tool; if neither is installed, install one of them first.
-- **-tbl [table name]**: Optional parameter. If the parameter is empty, the whole database will be exported. If you want to take the backup specific tables, then you can specify multiple `-tbl` and table names in the command.
+- Install under macOS -## Build the mo-dump binary +=== "**Download Method One: `The wget` tool downloads binary packages**" -To use `mo-dump` utility, we need to build the tool first. `mo-dump` is embedded in the MatrixOne source code. You can build the binary from the source code. + x86 Architecture System Installation Package: -__Tips:__ Same as MatrixOne `mo-dump` is written by Golang, building it will require a Golang installation and environment setting. + ``` + wget https://github.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-darwin-x86_64.zip + unzip mo-dump-1.0.0-darwin-x86_64.zip + ``` -1. Execute the following code to build the `mo-dump` binary from the MatrixOne source code: + ARM Architecture System Installation Package: + + ``` + wget https://github.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-darwin-arm64.zip + unzip mo-dump-1.0.0-darwin-arm64.zip + ``` + + If the original github address downloads too slowly, you can try downloading the mirror package from: ``` - git clone https://github.com/matrixorigin/matrixone.git - cd matrixone - make build modump + wget https://githubfast.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-darwin-xxx.zip ``` +=== "**Download mode two: `curl` tool downloads binary packages**" -2. Then you can find the `mo-dump` executable file in the MatrixOne folder. + x86 Architecture System Installation Package: -!!! note - This built `mo-dump` file can also work in a same hardware platform. But a binary built in a x86 platform will not work correctly in a darwin ARM platform. The best practice is to build and use the binary file within the same operating system and hardware platform. `mo-dump` only supports Linux and macOS for now. 
+ ``` + curl -OL https://github.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-darwin-x86_64.zip + unzip mo-dump-1.0.0-darwin-x86_64.zip + ``` -## Steps to Export your MatrixOne Database using `mo-dump` + ARM Architecture System Installation Package: -`mo-dump` is easy to use with the command line. Here are the steps to take to export a complete database in the form of SQL commands: + ``` + curl -OL https://github.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-darwin-arm64.zip + unzip mo-dump-1.0.0-darwin-arm64.zip + ``` -Open up a command line or terminal window on your computer, then verify that from this terminal you can connect to your MatrixOne instance, enter this command to export the database: + If the original github address downloads too slowly, you can try downloading the mirror package from: -``` -./mo-dump -u username -p password -h host_ip_address -P port -db database > exporteddb.sql -``` + ``` + curl -OL https://githubfast.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-darwin-xxx.zip + ``` -For example, if you are launching the terminal in the same server as the MatrixOne instance, and you want to generate the backup of the single database, run the following command. The command will generate the backup of the "**t**" database with structure and data in the `t.sql` file. The `t.sql` file will be located in the same directory as your `mo-dump` executable. 
+- Install under Linux -``` -./mo-dump -u root -p 111 -h 127.0.0.1 -P 6001 -db t > t.sql -``` +=== "**Download Method One: `The wget` tool downloads binary packages**" + + x86 Architecture System Installation Package: + + ``` + wget https://github.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-linux-x86_64.zip + unzip mo-dump-1.0.0-linux-x86_64.zip + ``` + + ARM Architecture System Installation Package: + + ``` + wget https://github.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-linux-arm64.zip + unzip mo-dump-1.0.0-linux-arm64.zip + ``` + + If the original github address downloads too slowly, you can try downloading the mirror package from: + + ``` + wget https://githubfast.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-linux-xxx.zip + ``` +=== "**Download mode two: `curl` tool downloads binary packages**" + + x86 Architecture System Installation Package: + + ``` + curl -OL https://github.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-linux-x86_64.zip + unzip mo-dump-1.0.0-linux-x86_64.zip + ``` -If you want to export the tables in the database *t* to *CSV* format, refer to the following command: + ARM Architecture System Installation Package: + ``` + curl -OL https://github.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-linux-arm64.zip + unzip mo-dump-1.0.0-linux-arm64.zip + ``` + + If the original github address downloads too slowly, you can try downloading the mirror package from: + + ``` + curl -OL https://githubfast.com/matrixorigin/mo_dump/releases/download/1.0.0/mo-dump-1.0.0-linux-xxx.zip + ``` +!!! note Due to limitations of the linux kernel, mo-dump may not function properly on OS with lower kernels (less than 5.0), at which point you need to upgrade your kernel version. + +## How to export a MatrixOne database using `mo-dump` + +`mo-dump` is very easy to use from the command line. 
Open a terminal window on your local computer, go to the unzipped mo\_dump folder directory, locate the `mo-dump` executable: *mo-dump*, enter the following command, connect to MatrixOne, and export the database: + +``` bash +./mo-dump -u username -p password -h host_ip_address -P port -db database > importStatement.sql ``` -./mo-dump -u root -p 111 -db t -csv --local-infile=false > ttt.csv + +## Examples + +**Example 1** + +If you start the terminal in the same server as the MatrixOne instance and you want to generate a single or multiple database and a backup of all the tables in it, run the following command. This command will generate a backup **of the mydb1** and **mydb2** databases and the structure and data of the tables in the *importMydb.sql* file. The *importMydb.sql* file is saved in the current directory: + +```bash +./mo-dump -u root -p 111 -h 127.0.0.1 -P 6001 -db mydb1,mydb2 > importMydb.sql ``` -If you want to generate the backup of a single table in a database, run the following command. The command will generate the backup of the `t1` table of `t` database with structure and data in the `t.sql` file. +**Example 2** +If you want to export data from tables within database *mydb* to *CSV* format, the data from all tables in database *mydb* will be exported in the current directory in the format `${databaseName}_${tableName}.csv` and the generated database and table structure and imported SQL statements will be saved in the *mydb.sql* file: + +```bash +./mo-dump -u root -p 111 -h 127.0.0.1 -P 6001 -db mydb -csv > mydb.sql ``` -./mo-dump -u root -p 111 -db t -tbl t1 > t1.sql + +**Example 3** + +If you want to specify in the database to generate a backup of a table or tables, you can run the following command. This command will generate a structural and data backup of the *t1* and *t2* tables in database *db1*, saved in the *tab2.sql* file. 
+ +```bash + ./mo-dump -u root -p 111 -db db1 -tbl t1,t2 > tab2.sql ``` -## Constraints +**Example 4** + +If you want a structural backup of a table or tables in the database, you can run the following command. This command will generate the structure of the *t1* and *t2* tables in database *db1*, saved in the *tab\_nodata.sql* file. + +``` bash +./mo-dump -u root -p 111 -db db1 -no-data -tbl t1,t2 > tab_nodata.sql +``` -* `mo-dump` only supports exporting the backup of a single database, if you have several databases to backup, you need to manually run `mo-dump` for several times. +## Limitations -* `mo-dump` doesn't support exporting only the structure or data of databases. If you want to generate the backup of the data without the database structure or vise versa, you need to manually split the `.sql` file. +* `mo-dump` does not yet support exporting data only. If you want to generate a backup of your data without a database and table structure, then you need to manually split the `.sql` file. diff --git a/docs/MatrixOne/Develop/read-data/window-function.md b/docs/MatrixOne/Develop/read-data/window-function.md deleted file mode 100644 index 033ae5c45..000000000 --- a/docs/MatrixOne/Develop/read-data/window-function.md +++ /dev/null @@ -1,77 +0,0 @@ -# Window Function - -Window Function (Window Function) is a unique function that can perform calculation operations on a specific window (Window) of the query result set. Window functions can be used to group, sort, and aggregate the result set and calculate the relevant value of each row of data within each window without changing the number of rows in the result set. That is, the result set can be flexibly analyzed and processed through the window function without introducing additional subqueries or join operations. - -SQL window functions have a wide range of applications in various business scenarios: - -1. 
**Intra-row comparison**: Compare a specific value of each row with other rows in the same group, such as calculating the difference between each employee's salary and the department's average salary. At this time, you can use window functions. - -2. **Data ranking**: The window function can quickly generate data ranking information. For example, you can use the `RANK()` or `ROW_NUMBER()` function to check the sales ranking. - -3. **Rolling Calculation**: Calculate the moving average. You can define the window range of the window function and then perform rolling calculations. - -## List of window functions - -- Most aggregate functions can also be used as window functions, for example, `SUM()`, `AVG()`, and `COUNT()`. These aggregate functions can be used with window functions to calculate the value of a column within a window Sum, average, or count. For aggregate functions and reference documents supported by MatrixOne that can be used as window functions, see: - - * [AVG](../../Reference/Functions-and-Operators/Aggregate-Functions/avg.md) - * [COUNT](../../Reference/Functions-and-Operators/Aggregate-Functions/count.md) - * [MAX](../../Reference/Functions-and-Operators/Aggregate-Functions/max.md) - * [SUM](../../Reference/Functions-and-Operators/Aggregate-Functions/sum.md) - * [MIN](../../Reference/Functions-and-Operators/Aggregate-Functions/min.md) - -- See the table below for other window functions: - -|Function name|description| -|---|---| -|[DENSE_RANK()](../../Reference/Functions-and-Operators/Window-Functions/dense_rank.md)| Used to assign ranks to rows in a dataset, always assigning consecutive ranks to the next value, even if previous values ​​have the same rank. | -|[RANK()](../../Reference/Functions-and-Operators/Window-Functions/rank.md)|Assigns a rank value to each row in the query result set, rows with the same value will have the same rank, and the next rank value will skip the same number of rows. 
| -|[ROW_NUMBER()](../../Reference/Functions-and-Operators/Window-Functions/row_number.md)|Assigns a unique integer value to each row in the query result set, ordered according to the specified collation. | - -## How to use window functions - -Using window functions usually requires the following steps: - -1. Define the window (Window): By using the OVER clause to define the scope of the window, you can specify the sorting rules, partition method and row range of the window, etc. - -2. Write the window function: In the `SELECT` statement, list the window function together with other columns, and specify the columns and operations that need to be calculated within the window. - -Here is an example of how to use window functions to calculate the total sales for each department and the sales rank for each employee within the department: - -```sql -CREATE TABLE SalesTable ( - Department VARCHAR(50), - Employee VARCHAR(50), - Sales INT -); - -INSERT INTO SalesTable (Department, Employee, Sales) VALUES -('Marketing', 'John', 1000), -('Marketing', 'Jane', 1200), -('Sales', 'Alex', 900), -('Sales', 'Bob', 1100), -('HR', 'Alice', 800), -('HR', 'Charlie', 850); - -SELECT - Department, - Employee, - Sales, - SUM(Sales) OVER(PARTITION BY Department) AS DepartmentSales, - RANK() OVER(PARTITION BY Department ORDER BY Sales DESC) AS SalesRank -FROM - SalesTable; -+------------+----------+-------+-----------------+-----------+ -| department | employee | sales | DepartmentSales | SalesRank | -+------------+----------+-------+-----------------+-----------+ -| HR | Charlie | 850 | 1650 | 1 | -| HR | Alice | 800 | 1650 | 2 | -| Marketing | Jane | 1200 | 2200 | 1 | -| Marketing | John | 1000 | 2200 | 2 | -| Sales | Bob | 1100 | 2000 | 1 | -| Sales | Alex | 900 | 2000 | 2 | -+------------+----------+-------+-----------------+-----------+ -6 rows in set (0.01 sec) -``` - -In the above example, the `PARTITION BY` clause is used to partition the result set by the department, and then the 
`SUM()` function calculates the total sales for each department. Also, the `ORDER BY` clause specifies sorting in descending order by sales, and the `RANK()` function assigns ranks to employees within each department based on sales. diff --git a/docs/MatrixOne/Develop/read-data/window-function/time-window.md b/docs/MatrixOne/Develop/read-data/window-function/time-window.md new file mode 100644 index 000000000..4aa1b35ba --- /dev/null +++ b/docs/MatrixOne/Develop/read-data/window-function/time-window.md @@ -0,0 +1,122 @@
# Time window

In a time-series scenario, data usually arrives as a stream, and a stream is usually unbounded: we cannot know when the data source will stop sending data, so aggregate operations on a stream (count, sum, etc.) are handled differently than in batch processing. Time windows are generally used on time-series data streams to limit the scope of an aggregation, such as counting website hits over the last 2 minutes. Conceptually, a time window collects a dynamic, finite table of data based on acquisition time, over which we can run aggregations. As time passes, the window slides forward, continuously capturing new data for calculation.

Time windows are divided into tumbling windows (Tumble Window) and sliding windows (Sliding Window). A tumbling window has a fixed length, and the windows do not overlap. A sliding window also has a fixed length, but consecutive windows overlap so that data changes are captured more frequently.

When using the time window feature, calculations are performed within each time window, and the window slides forward as time passes. When defining a continuous query, you need to specify the size of the time window and the increment by which the next window moves forward.

## Downsampling

Downsampling is the process of extracting smaller, more manageable subsets from large amounts of data.
This is especially important when dealing with large-scale time-series data: it reduces storage requirements, improves query efficiency, and produces clearer trend charts in data visualization. The time window is the core database capability for implementing downsampling: by defining a time window, the data within each window can be aggregated, and the window size and sliding distance determine the granularity of the downsampling.

## Time-Series Tables and Time Window Syntax

In MatrixOne, a time window must be used together with a time-series table: a table whose primary key is a column named `ts` of type `timestamp`.

```sql
DDL Clause:
    CREATE TABLE TS_TBL (ts timestamp(6) primary key, SIGNAL1 FLOAT, SIGNAL2 DOUBLE, ...);

time_window_clause:
    INTERVAL(timestamp_col, interval_val, time_unit) [SLIDING (sliding_val)] [fill_clause]

time_unit:
    SECOND | MINUTE | HOUR | DAY

fill_clause:
    FILL(NONE | PREV | NEXT | NULL | VALUE, val | LINEAR)
```

When creating a time-series table, the `ts` column can specify a `timestamp` precision of up to `timestamp(6)` (microseconds).

Parameters of the INTERVAL clause:

* timestamp_col: the timestamp column.
* interval_val: the length of the time window.
* time_unit: the unit of time (seconds, minutes, hours, days).
* SLIDING (sliding_val): optional; specifies the time distance by which the window slides.
* FILL(fill_method): optional; specifies how to fill the data within the window.

INTERVAL(timestamp_col, interval_val) partitions the data into windows of length interval_val, and SLIDING specifies the distance sliding_val by which each window slides forward.

- When interval_val equals sliding_val, the window is a tumbling window.

- When interval_val is greater than sliding_val, the window is a sliding window.
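The relationship between interval_val and sliding_val can be sketched in Python (a simplified model that uses plain integer seconds instead of real timestamps):

```python
def window_starts(t_min, t_max, interval_val, sliding_val):
    """Return (wstart, wend) pairs covering [t_min, t_max); times are plain seconds."""
    windows = []
    t = t_min
    while t < t_max:
        windows.append((t, t + interval_val))  # window of length interval_val
        t += sliding_val                       # slide forward by sliding_val
    return windows

# tumbling window: interval_val == sliding_val, windows do not overlap
print(window_starts(0, 30, 10, 10))  # [(0, 10), (10, 20), (20, 30)]

# sliding window: interval_val > sliding_val, consecutive windows overlap
print(window_starts(0, 30, 10, 5))   # 6 overlapping windows, (0, 10) through (25, 35)
```

The real engine aligns windows on timestamps, but the same arithmetic explains why a query with `interval(ts, 10, minute) sliding(5, minute)` emits one result row every 5 minutes, each covering 10 minutes of data.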
Other usage notes:

- The INTERVAL and SLIDING clauses must be used together with aggregate or select functions; the aggregate functions currently supported in a time window are max, min, sum, avg, and count.
- The window width of the aggregation period is specified with the keyword INTERVAL, with a minimum interval of 1 second.
- In the results returned by a time window, the time series increases strictly monotonically.
- interval\_val must be a positive integer.
- When querying with INTERVAL, \_wstart(ts) and \_wend(ts) are pseudo-columns generated for the window: the start and end times of the window, respectively.

Example of use:

This example demonstrates a 10-minute time window sliding every 5 minutes, producing the maximum and minimum temperature every 5 minutes.

```sql
mysql> drop table if exists sensor_data;
CREATE TABLE sensor_data (ts timestamp(3) primary key, temperature FLOAT);
INSERT INTO sensor_data VALUES('2023-08-01 00:00:00', 25.0);
INSERT INTO sensor_data VALUES('2023-08-01 00:05:00', 26.0);
INSERT INTO sensor_data VALUES('2023-08-01 00:15:00', 28.0);
INSERT INTO sensor_data VALUES('2023-08-01 00:20:00', 30.0);
INSERT INTO sensor_data VALUES('2023-08-01 00:25:00', 27.0);
INSERT INTO sensor_data VALUES('2023-08-01 00:30:00', null);
INSERT INTO sensor_data VALUES('2023-08-01 00:35:00', null);
INSERT INTO sensor_data VALUES('2023-08-01 00:40:00', 28);
INSERT INTO sensor_data VALUES('2023-08-01 00:45:00', 38);
INSERT INTO sensor_data VALUES('2023-08-01 00:50:00', 31);
insert into sensor_data values('2023-07-31 23:55:00', 22);
mysql> select _wstart, _wend, max(temperature), min(temperature) from sensor_data where ts > "2023-08-01 00:00:00.000" and ts < "2023-08-01 00:50:00" interval(ts, 10, minute) sliding(5, minute);
+-------------------------+-------------------------+------------------+------------------+
| _wstart                 | _wend                   | max(temperature) | min(temperature) |
+-------------------------+-------------------------+------------------+------------------+
| 2023-08-01 00:00:00.000 | 2023-08-01 00:10:00.000 | 26 | 26 |
| 2023-08-01 00:05:00.000 | 2023-08-01 00:15:00.000 | 26 | 26 |
| 2023-08-01 00:10:00.000 | 2023-08-01 00:20:00.000 | 28 | 28 |
| 2023-08-01 00:15:00.000 | 2023-08-01 00:25:00.000 | 30 | 28 |
| 2023-08-01 00:20:00.000 | 2023-08-01 00:30:00.000 | 30 | 27 |
| 2023-08-01 00:25:00.000 | 2023-08-01 00:35:00.000 | 27 | 27 |
| 2023-08-01 00:30:00.000 | 2023-08-01 00:40:00.000 | NULL | NULL |
| 2023-08-01 00:35:00.000 | 2023-08-01 00:45:00.000 | 28 | 28 |
| 2023-08-01 00:40:00.000 | 2023-08-01 00:50:00.000 | 38 | 28 |
| 2023-08-01 00:45:00.000 | 2023-08-01 00:55:00.000 | 38 | 38 |
+-------------------------+-------------------------+------------------+------------------+
10 rows in set (0.04 sec)

```

## Interpolation

Missing values are often encountered when processing time-series data. The interpolation (FILL) feature fills in these missing values in a variety of ways, ensuring the continuity and integrity of the data, which is critical for data analysis and downsampling. The `FILL` clause of the time window fills the aggregated results.

MatrixOne offers several interpolation methods to accommodate different data processing needs:

- FILL(NONE): no filling, i.e., the column is left unchanged
- FILL(VALUE, expr): fill with the result of expr
- FILL(PREV): fill with the previous non-NULL value
- FILL(NEXT): fill with the next non-NULL value
- FILL(LINEAR): linear interpolation based on the nearest non-NULL values before and after

Example of use:

This example adds interpolation logic to the previous query, filling in the NULL values.
+ +```sql +select _wstart(ts), _wend(ts), max(temperature), min(temperature) from sensor_data where ts > "2023-08-01 00:00:00.000" and ts < "2023-08-01 00:50:00.000" interval(ts, 10, minute) sliding(5, minute) fill(prev); + _wstart | _wend | max(temperature) | min(temperature) | +================================================================================================== + 2023-08-01 00:00:00.000 | 2023-08-01 00:10:00.000 | 26.0000000 | 26.0000000 | + 2023-08-01 00:05:00.000 | 2023-08-01 00:15:00.000 | 26.0000000 | 26.0000000 | + 2023-08-01 00:10:00.000 | 2023-08-01 00:20:00.000 | 28.0000000 | 28.0000000 | + 2023-08-01 00:15:00.000 | 2023-08-01 00:25:00.000 | 30.0000000 | 28.0000000 | + 2023-08-01 00:20:00.000 | 2023-08-01 00:30:00.000 | 30.0000000 | 27.0000000 | + 2023-08-01 00:25:00.000 | 2023-08-01 00:35:00.000 | 27.0000000 | 27.0000000 | + 2023-08-01 00:30:00.000 | 2023-08-01 00:40:00.000 | 27.0000000 | 27.0000000 | + 2023-08-01 00:35:00.000 | 2023-08-01 00:45:00.000 | 28.0000000 | 28.0000000 | + 2023-08-01 00:40:00.000 | 2023-08-01 00:50:00.000 | 38.0000000 | 28.0000000 | + 2023-08-01 00:45:00.000 | 2023-08-01 00:55:00.000 | 38.0000000 | 38.0000000 | +``` \ No newline at end of file diff --git a/docs/MatrixOne/Develop/read-data/window-function/window-function.md b/docs/MatrixOne/Develop/read-data/window-function/window-function.md new file mode 100644 index 000000000..d7abd03de --- /dev/null +++ b/docs/MatrixOne/Develop/read-data/window-function/window-function.md @@ -0,0 +1,77 @@ +# Window function + +A Window Function is a special function that can perform computation operations on a window (Window) in a query result set. Window functions can be used to group, sort, and aggregate result sets, while also being able to calculate correlation values for each row of data within each window without changing the number of rows in the result set. 
That is, through window functions, the result set can be analyzed and processed flexibly without introducing additional subqueries or join operations.

SQL window functions have a wide range of applications across business scenarios:

1. **Intra-row comparison**: Compare a value in each row to other rows in the same group, such as calculating the difference between each employee's salary and the average salary of their department. A window function fits this case well.

2. **Data ranking**: Window functions can easily generate ranking information for data. For example, to see a sales ranking, you can use the `RANK()` or `ROW_NUMBER()` functions.

3. **Rolling calculation**: Calculate moving averages. You can define the window range of the window function and then perform a rolling calculation.

## List of window functions

- Most aggregate functions can also be used as window functions. For example, `SUM()`, `AVG()`, and `COUNT()` can be used with a window to calculate the sum, average, or count of a column within that window.
Aggregate functions and reference documentation for windowable functions supported by MatrixOne can be found at: + + * [AVG](../../../Reference/Functions-and-Operators/Aggregate-Functions/avg.md) + * [COUNT](../../../Reference/Functions-and-Operators/Aggregate-Functions/count.md) + * [MAX](../../../Reference/Functions-and-Operators/Aggregate-Functions/max.md) + * [SUM](../../../Reference/Functions-and-Operators/Aggregate-Functions/sum.md) + * [MIN](../../../Reference/Functions-and-Operators/Aggregate-Functions/min.md) + +- See the following table for other window functions: + +|Function name | Description| +|---------------|------------| +|[DENSE_RANK()](../../../Reference/Functions-and-Operators/Window-Functions/dense_rank.md)|Use to assign a rank to rows in the dataset, always assigning a consecutive rank to the next value, even if the previous value has the same rank.| +|[RANK()](../../../Reference/Functions-and-Operators/Window-Functions/rank.md)|Assign a rank value to each row in the query result set, rows of the same value will have the same rank, and the next rank value will skip the same number of rows.| +|[ROW_NUMBER()](../../../Reference/Functions-and-Operators/Window-Functions/row_number.md)|Assign a unique integer value to each row in the query result set, determining the order based on the specified sort rules.| + +## How to use window functions + +Using a window function usually requires the following steps: + +1. Define Window: By using the OVER clause to define the scope of a window, you can specify the window's sort rule, partitioning method, row range, etc. + +2. Write a window function: In a `SELECT` statement, list the window function with other columns, specifying the columns and actions that need to be computed within the window. 
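The tie-handling semantics of the three ranking functions in the table above can be sketched in plain Python; this is an illustrative sketch of the semantics only, not MatrixOne code:

```python
# Sketch of ROW_NUMBER / RANK / DENSE_RANK over one partition,
# sorted descending as "ORDER BY ... DESC" would.
def rankings(values):
    ordered = sorted(values, reverse=True)
    # ROW_NUMBER: a unique, consecutive number for every row, ties included.
    row_number = list(range(1, len(ordered) + 1))
    # RANK: ties share a rank; the next distinct value skips the tied rows.
    rank = [1 + sum(1 for w in ordered if w > v) for v in ordered]
    # DENSE_RANK: ties share a rank; later ranks stay consecutive.
    distinct = sorted(set(ordered), reverse=True)
    dense_rank = [distinct.index(v) + 1 for v in ordered]
    return ordered, row_number, rank, dense_rank

print(rankings([100, 90, 90, 80]))
# ([100, 90, 90, 80], [1, 2, 3, 4], [1, 2, 2, 4], [1, 2, 2, 3])
```

Note how `RANK()` leaves a gap after the tie (no rank 3), while `DENSE_RANK()` does not.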
+ +Here is an example that demonstrates how to use the window function to calculate the total sales for each department and the sales ranking for each employee within the department: + +```sql +CREATE TABLE SalesTable ( + Department VARCHAR(50), + Employee VARCHAR(50), + Sales INT +); + +INSERT INTO SalesTable (Department, Employee, Sales) VALUES +('Marketing', 'John', 1000), +('Marketing', 'Jane', 1200), +('Sales', 'Alex', 900), +('Sales', 'Bob', 1100), +('HR', 'Alice', 800), +('HR', 'Charlie', 850); + +SELECT + Department, + Employee, + Sales, + SUM(Sales) OVER(PARTITION BY Department) AS DepartmentSales, + RANK() OVER(PARTITION BY Department ORDER BY Sales DESC) AS SalesRank +FROM + SalesTable; ++------------+----------+-------+-----------------+-----------+ +| department | employee | sales | DepartmentSales | SalesRank | ++------------+----------+-------+-----------------+-----------+ +| HR | Charlie | 850 | 1650 | 1 | +| HR | Alice | 800 | 1650 | 2 | +| Marketing | Jane | 1200 | 2200 | 1 | +| Marketing | John | 1000 | 2200 | 2 | +| Sales | Bob | 1100 | 2000 | 1 | +| Sales | Alex | 900 | 2000 | 2 | ++------------+----------+-------+-----------------+-----------+ +6 rows in set (0.01 sec) +``` + +In the above example, the `PARTITION BY` clause is used to partition the result set by department, and then the `SUM()` function calculates the total sales for each department. Meanwhile, the `ORDER BY` clause specifies a descending order of sales, and the `RANK()` function assigns a ranking to employees within each department based on sales. diff --git a/docs/MatrixOne/Develop/schema-design/1.1-overview.md b/docs/MatrixOne/Develop/schema-design/1.1-overview.md deleted file mode 100644 index 03f17aa2e..000000000 --- a/docs/MatrixOne/Develop/schema-design/1.1-overview.md +++ /dev/null @@ -1,54 +0,0 @@ -# Database Schema Design Overview - -This document provides the basics of MatrixOne database schema design. 
This document introduces terminology related to MatrixOne databases and subsequent data read and write examples. - -## Key concept in MatrixOne - -Database Schema: The database schema mentioned in this article is the same as the logical object database. It is the same as MySQL. - -## Database - -A database in MatrixOne is a collection of objects such as tables. - -To view the default database contained by MatrixOne, ues `SHOW DATABASES;` statment. - -To create a new database, ues `CREATE DATABASE database_name;` statement. - -## Table - -A table is a collection of related data in a database. - -Each table consists of rows and columns. Each value in a row belongs to a specific column. Each column allows only a single data type. To further qualify columns, you can add some constraints. - -## Index - -An index is a data structure used to find data in database tables quickly. It can be seen as a 'table of contents' that contains pointers to the data of each row in the table, making it possible for queries to locate data that meets specific conditions more quickly. - -The indexes commonly used in databases include primary key indexes, secondary indexes, etc. Among them, unique indexes are used to ensure the uniqueness of specific columns or combinations of columns, ordinary indexes are used to improve query performance, and full-text indexes are used for full-text search in text data. - -There are two common types of indexes, namely: - -- **Primary Key**: Primary key index, the index that identifies the primary key column. The primary key index uniquely identifies each row of data in the table. -- **Secondary index**: The secondary index is identified on the non-primary key. The secondary index, a non-clustered index, is used to improve query performance and speed up data retrieval. - -__Note:__ Currently, MatrixOne only supports **Primary Key**. - -## Vector - -MatrixOne now supports storing and querying vectors. 
Vector is a numerical array generally produced by AI models, including Large Language Models. These vectors can be seamlessly stored and queried, allowing for tasks like finding nearest neighbors, all while accommodating relational data. - -__Note:__ Currently, MatrixOne only supports inserting and querying vector data. - -For more information, see [vector](vector.md) - -## Other supported logical objects - -MatrixOne supports the following logical objects at the same level as the table: - -- View: a view acts as a virtual table whose schema is defined by the SELECT statement that creates the view. - -- Temporary table: a table whose data is not persistent. - -## Access Control - -MatrixOne supports both user-based and role-based access control. To allow users to view, modify, or delete data, for more information, see [Access control in MatrixOne](../../Security/role-priviledge-management/about-privilege-management.md). diff --git a/docs/MatrixOne/Develop/schema-design/create-table-as-select.md b/docs/MatrixOne/Develop/schema-design/create-table-as-select.md new file mode 100644 index 000000000..7a1ec2fd0 --- /dev/null +++ b/docs/MatrixOne/Develop/schema-design/create-table-as-select.md @@ -0,0 +1,89 @@ +# Using CTAS to replicate tables + +## What is CTAS + +CTAS ([Create Table As Select](../../Reference/SQL-Reference/Data-Definition-Language/create-table-as-select.md)), is a SQL statement used to quickly create a new table (replicate table) based on an existing table or query result. When the CTAS statement is executed, a new table is created directly from the data generated by the SELECT clause, and the column structure and data type of the new table are consistent with the result set in the SELECT clause. 
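`CREATE TABLE ... AS SELECT` is not MatrixOne-specific; as an engine-neutral illustration of the mechanism, here is the same pattern with Python's built-in `sqlite3` module (table and column names are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 19.99), (2, 29.99)])

# CTAS: the new table's columns and data come from the SELECT result set.
cur.execute("CREATE TABLE big_orders AS SELECT id, amount FROM orders WHERE amount > 20")

rows = cur.execute("SELECT id, amount FROM big_orders").fetchall()
print(rows)  # [(2, 29.99)]
```

The new `big_orders` table inherits the `id` and `amount` columns from the SELECT clause, exactly as described above.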
## Application scenarios

CTAS has a wide range of application scenarios, including:

- Data migration: with CTAS, you can quickly migrate data from one table to another while changing the storage structure and distribution strategy of tables to accommodate different query and storage needs.

- Data backup: CTAS can be used to create backup copies of data, which is useful for data recovery and historical data analysis.

- Table structure changes: when you need to modify a table structure (such as adding or removing columns, or changing data types), CTAS can create a new table that reflects these changes without affecting the original table.

- Data science and machine learning: in data science projects, CTAS can be used to prepare data sets and create clean, well-formatted tables suitable for training machine learning models.

CTAS is an efficient SQL operation that dramatically improves the efficiency of data processing and analysis by simplifying data management processes and enhancing operational flexibility. However, when applying CTAS, the target database system's level of support for CTAS and its potential impact on system performance need to be taken into account to ensure data synchronization and operational accuracy and effectiveness.

## Prepare before you start

Complete the [standalone deployment](../../Get-Started/install-standalone-matrixone.md) of MatrixOne.

## How to use CTAS

### Syntax

`CTAS` statements typically take the following form:

```sql
CREATE [TEMPORARY] TABLE table_name as select
```

See the [Create Table As Select](../../Reference/SQL-Reference/Data-Definition-Language/create-table-as-select.md) chapter for a full syntax description.

### Cases

Suppose we have an e-commerce platform and we want to create a table to analyze the details of each order, including order number, customer ID, order date, product ID, product quantity, and product price.
+ +```sql +CREATE TABLE orders( +order_id int auto_increment PRIMARY KEY, +customer_id int, +order_date date, +product_id int, +quantity int, +price float +); + +INSERT INTO orders(customer_id,order_date,product_id,quantity,price) values(30,"2023-04-01",5001,2,19.99); +INSERT INTO orders(customer_id,order_date,product_id,quantity,price) values(40,"2023-04-02",5002,1,29.99); +INSERT INTO orders(customer_id,order_date,product_id,quantity,price) values(30,"2023-04-03",5001,1,19.99); + +mysql> select * from orders; ++----------+-------------+------------+------------+----------+-------+ +| order_id | customer_id | order_date | product_id | quantity | price | ++----------+-------------+------------+------------+----------+-------+ +| 1 | 30 | 2023-04-01 | 5001 | 2 | 19.99 | +| 2 | 40 | 2023-04-02 | 5002 | 1 | 29.99 | +| 3 | 30 | 2023-04-03 | 5001 | 1 | 19.99 | ++----------+-------------+------------+------------+----------+-------+ +3 rows in set (0.00 sec) + +--For analysis purposes, we want to calculate the total price for each order and create a new table containing the order number, customer ID, order date, and total order price. 
+CREATE TABLE orders_analysis AS +SELECT + order_id, + customer_id, + order_date, + product_id, + quantity, + price, + CAST((quantity * price) AS float) AS total_price +FROM + orders; + +mysql> select * from orders_analysis; ++----------+-------------+------------+------------+----------+-------+-------------+ +| order_id | customer_id | order_date | product_id | quantity | price | total_price | ++----------+-------------+------------+------------+----------+-------+-------------+ +| 1 | 30 | 2023-04-01 | 5001 | 2 | 19.99 | 39.98 | +| 2 | 40 | 2023-04-02 | 5002 | 1 | 29.99 | 29.99 | +| 3 | 30 | 2023-04-03 | 5001 | 1 | 19.99 | 19.99 | ++----------+-------------+------------+------------+----------+-------+-------------+ +3 rows in set (0.00 sec) +``` + +In this example, the CTAS statement not only copies the columns in the original table, but also adds a new calculated column, total_price, which calculates the total price of the order line item by multiplying the number of products per order by the price. This gives us a new table suitable for sales analysis that can be used directly to generate reports or for further data analysis. + +This example demonstrates the power of CTAS in data conversion and preparation, which facilitates data analysis by allowing us to clean and convert data while creating new tables. \ No newline at end of file diff --git a/docs/MatrixOne/Develop/schema-design/data-integrity/foreign-key-constraints.md b/docs/MatrixOne/Develop/schema-design/data-integrity/foreign-key-constraints.md index 07f7fa8d7..cf48f04bd 100644 --- a/docs/MatrixOne/Develop/schema-design/data-integrity/foreign-key-constraints.md +++ b/docs/MatrixOne/Develop/schema-design/data-integrity/foreign-key-constraints.md @@ -22,6 +22,8 @@ When defining FOREIGN KEY, the following rules need to be followed: **Foreign Key Characteristics** +- Foreign key self-referencing: is when a column in a table references the primary key of the same table. 
This design is often used to represent hierarchical or parent-child relationships, such as organizational structures and classified directories.

- Multi-column foreign key: this type of foreign key occurs when two or more columns in a table jointly reference another table's primary key. In other words, these columns together define the reference to the other table. They must exist as a group and satisfy the foreign key constraint simultaneously.

- Multi-level foreign key: this situation usually involves three or more tables with dependency relationships. One table's foreign key can be another table's primary key, and that table's foreign key can in turn be the primary key of a third table, forming a multi-level foreign key chain.

@@ -92,7 +94,41 @@ ERROR 20101 (HY000): internal error: Cannot add or update a child row: a foreign

**Example Explanation**: In the above example, column c of t2 can only refer to a value in column a of t1 or be null, so inserting rows 1 and 2 of t1 succeeds, but 103 in row 3 is not a value in column a of t1, which violates the foreign key constraint, so the insert fails.

-### Example 2 - Multi-column foreign key
+### Example 2 - Foreign key self-reference

```sql
-- Create a table named categories to store product categorization information
+CREATE TABLE categories ( + id INT AUTO_INCREMENT PRIMARY KEY, + name VARCHAR(255) NOT NULL, + parent_id INT, + FOREIGN KEY (parent_id) REFERENCES categories(id) +); + +mysql> INSERT INTO categories (name) VALUES ('Electronics'),('Books'); +Query OK, 2 rows affected (0.01 sec) + +mysql> INSERT INTO categories (name, parent_id) VALUES ('Laptops', 1),('Smartphones', 1),('Science Fiction', 2),('Mystery', 2); +Query OK, 4 rows affected (0.01 sec) + +mysql> select * from categories; ++------+-----------------+-----------+ +| id | name | parent_id | ++------+-----------------+-----------+ +| 1 | Electronics | NULL | +| 2 | Books | NULL | +| 3 | Laptops | 1 | +| 4 | Smartphones | 1 | +| 5 | Science Fiction | 2 | +| 6 | Mystery | 2 | ++------+-----------------+-----------+ +6 rows in set (0.01 sec) + +``` + +**Example Explanation**:In the above code, we have created a table named `categories` to store the category information of the products and first inserted two top level categories `Electronics` and `Books`. Then, we added subcategories to each of the top-level categories, for example, `Laptops` and `Smartphones` are subcategories of `Electronics`, and `Science Fiction` and `Mystery` are subcategories of `Books`. + +### Example 3 - Multi-column foreign key ```sql -- Creating a "Student" table to store student information @@ -121,7 +157,7 @@ CREATE TABLE StudentCourse ( **Example Explanation**: In the above example, there are three tables: the `Student` table, the `Course` table, and the `StudentCourse` table for recording which students have chosen which courses. In this case, the `Student ID` and `Course ID` in the course selection table can serve as foreign keys, jointly referencing the primary keys of the student table and the course table. 
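The multi-column foreign key described above amounts to a joint membership check: a `StudentCourse` row is accepted only if both referenced keys exist in their parent tables. A minimal Python sketch of that check (hypothetical IDs, illustrative only):

```python
# Hypothetical parent-table key sets standing in for the Student and Course rows.
students = {1001, 1002}
courses = {501, 502}

def can_insert_student_course(student_id, course_id):
    # Both columns of the foreign key group must satisfy the constraint together.
    return student_id in students and course_id in courses

print(can_insert_student_course(1001, 501))  # True
print(can_insert_student_course(1001, 999))  # False: 999 is not a Course key
```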
-### Example 3 - Multi-level foreign key
+### Example 4 - Multi-level foreign key

```sql
-- Creating a "Country" table to store country information

diff --git a/docs/MatrixOne/Develop/schema-design/overview.md b/docs/MatrixOne/Develop/schema-design/overview.md
index 6f687c266..466134d36 100644
--- a/docs/MatrixOne/Develop/schema-design/overview.md
+++ b/docs/MatrixOne/Develop/schema-design/overview.md
@@ -39,6 +39,12 @@ MatrixOne supports the following logical objects at the same level as table:

- Temporary table: a table whose data is not persistent.

## Vector

MatrixOne now supports storing and querying vectors. Vectors are lists of numbers typically generated by AI models such as large language models.

For more information, see [vector](../Vector/vector_type.md)

## Access Control

MatrixOne supports both user-based and role-based access control. To allow users to view, modify, or delete data, for more information, see [Access control in MatrixOne](../../Security/role-priviledge-management/about-privilege-management.md).

diff --git a/docs/MatrixOne/Develop/udf/udf-python-advanced.md b/docs/MatrixOne/Develop/udf/udf-python-advanced.md
new file mode 100644
index 000000000..77acda356
--- /dev/null
+++ b/docs/MatrixOne/Develop/udf/udf-python-advanced.md
@@ -0,0 +1,378 @@

# UDF-Python-Advanced

This document will guide you through the advanced features of UDFs, including building UDFs from Python files and whl packages.

## Prepare before you start

### Environment Configuration

Before you begin, confirm that you have downloaded and installed the following software:

- Verify that you have installed [Python 3.8 (or later)](https://www.python.org/downloads/), and check the Python version with the following command to confirm that the installation was successful:

    ```bash
    #To check with Python installation and its version
    python3 -V
    ```

    !!!
note
        If you have both Python 2 and Python 3 in your operating system, you need to configure Python 3 globally before using UDFs, for example by renaming `/usr/bin/python` and then creating a python3 softlink with the same name. Example commands:

        ```bash
        mv /usr/bin/python /usr/bin/python.bak
        ln -s /usr/local/python3/bin/python3 /usr/bin/python
        ```

- Download and install the `protobuf` and `grpcio` tools with the following commands:

    ```
    pip3 install protobuf
    pip3 install grpcio
    ```

- Verify that you have completed installing the MySQL client.

### Start MatrixOne

1. Follow the steps in the [Quick Start](../../Get-Started/install-standalone-matrixone.md) chapter to complete the deployment of MatrixOne using mo_ctl. When the deployment is complete, execute the following command to modify the configuration of mo_ctl:

    ```bash
    mo_ctl set_conf MO_CONF_FILE="${MO_PATH}/matrixone/etc/launch-with-python-udf-server/launch.toml"
    ```

2. After modifying the configuration, start (or restart) the MatrixOne service for the configuration to take effect, for example:

    ```bash
    mo_ctl start
    ```

3. Wait for the MatrixOne service to start normally (if MatrixOne is starting for the first time, background initialization takes about ten seconds; connect after initialization is complete). Execute the following command to access the MatrixOne service:

    ```
    mo_ctl connect
    ```

    You will be taken to the mysql client command-line tool after a successful connection.

## Import a Python file to build a UDF

Embedded UDFs write the function body directly in SQL, which can inflate SQL statements and be detrimental to code maintenance if the function logic is complex.
To avoid this, we can write the UDF function body in an external, standalone Python file, and then create the function in MatrixOne by importing that file.

1. Prepare your python file

    Write the python code from the original SQL function body into the `/opt/add_func.py` file:

    ```python
    # function

    def add(a, b):
        return a + b
    ```

2. Creating a UDF Function

    Use the following command to create the function. We use the import keyword to import the `add_func.py` file under the specified path.

    ```sql
    create or replace function py_add_2(a int, b int)
    returns int language python
    import '/opt/add_func.py'  -- absolute path of the python file in the OS
    handler 'add';
    ```

3. Call the UDF function

    Once the function has been created, the UDF function can be called with the **function name** plus a parameter list of matching types, for example:

    ```mysql
    select py_add_2(12345,23456);
    +-------------------------+
    | py_add_2(12345, 23456)  |
    +-------------------------+
    |                   35801 |
    +-------------------------+
    1 row in set (0.02 sec)
    ```

## Import whl package to build UDF

A WHL file is a standard built-in package format for Python distributions that allows packages to be installed without building a source distribution. A WHL file is essentially a ZIP file.

### Preparations

1. Before building the whl package, we need to install the following tools:

    ```bash
    pip install setuptools wheel
    # setuptools: for building and packaging Python libraries
    # wheel: used to generate .whl files
    ```

### Build a whl package

1. Create the files and their contents according to the following structure

    We use a simple Python project directory, `func_add` (the folder name can be arbitrary), with the following directory structure:

    ```bash
    func_add/
    ├── add_udf.py
    └── setup.py
    ```

    where `add_udf.py` is a normally executable Python code file that implements the function body logic and can be treated as a module. The code in `add_udf.py` is, for example:

    ```python
    # function

    def add(a, b):
        return a + b
    ```

    The `setup.py` file is used to define the library's metadata, configuration information, etc., with code such as:

    ```python
    # setup.py

    from setuptools import setup

    setup(
        name="udf",
        version="1.0.0",
        # The module name is the python file name containing the function body, without the .py extension
        py_modules=["add_udf"],
    )
    ```

2. Build the whl package

    Once the project files are in place, execute the following command inside the `func_add` directory to build the wheel package:

    ```bash
    python setup.py bdist_wheel
    ```

    When packaging is complete, the `udf-1.0.0-py3-none-any.whl` file is generated in the `func_add/dist` directory.

### Create and call UDF functions

1. Creating a UDF Function

    Copy the whl package to the planned function repository directory, such as the path `/opt/udf/udf-1.0.0-py3-none-any.whl`, and reference the whl package in the create statement to create the UDF function. An example of the create statement is as follows:

    ```sql
    create or replace function py_add_3(a int, b int)
    returns int language python
    import '/opt/udf/udf-1.0.0-py3-none-any.whl'  -- directory in which the wheel package resides
    handler 'add_udf.add';  -- specifies the add function of the add_udf module in the whl package
    ```

2.
Call the UDF function

    Once the function has been created, the UDF function can be called with the **function name** plus a parameter list of matching types, for example:

    ```sql
    select py_add_3(12345,23456);
    +-------------------------+
    | py_add_3(12345, 23456)  |
    +-------------------------+
    |                   35801 |
    +-------------------------+
    1 row in set (0.02 sec)
    ```

## Function Vector

In some scenarios, we would expect the python function to receive multiple tuples at once to improve efficiency. In model inference, for example, we usually process a batch at a time, where the batch is a vector of tuples; MatrixOne provides the vector option of the function to handle this situation. We still use the py_add function as an example to show the use of the vector option.

1. Create a data table named grades under the udf_test library:

    ```sql
    create table grades(chinese int,math int);
    ```

2. Insert several pieces of test data:

    ```sql
    insert into grades values(97,100),(85,89),(79,99);
    ```

3. View the data in the table:

    ```mysql
    select * from grades;
    +---------+------+
    | chinese | math |
    +---------+------+
    | 97 | 100 |
    | 85 | 89 |
    | 79 | 99 |
    +---------+------+
    ```

4. Create a UDF function by executing the following command. We use `add.vector = True` to mark that the python function add receives two int lists (vectors) instead of two int values:

    ```sql
    create or replace function py_add_4(a int, b int) returns int language python as 
    $$
    def add(a, b):  # a, b are lists
        return [a[i] + b[i] for i in range(len(a))]
    add.vector = True
    $$
    handler 'add';
    ```

5. Call the UDF function

    The function is also called by its name and argument list; for the argument list we can use the two integer columns in the grades table, for example:

    ```sql
    select py_add_4(chinese,math) as Total from grades;
    +-------+
    | Total |
    +-------+
    | 197 |
    | 174 |
    | 178 |
    +-------+
    ```

    With the vector option, we are free to choose how the function processes its input: one tuple at a time, or a batch (vector) of tuples at a time.

## Machine Learning Case: Credit Card Fraud Detection

This section describes the use of python UDFs in a machine learning inference pipeline, using credit card fraud detection as an example. (The code is detailed in [github-demo](https://github.com/matrixorigin/matrixone/tree/main/pkg/udf/pythonservice/demo) and includes the following files to be downloaded and written.)

### Environment Configuration

In this section, we need to make sure that the local python environment has numpy, scikit-learn, and joblib installed:

```bash
pip install numpy
pip install scikit-learn
pip install joblib
```

### Background and data

Credit card companies need to identify fraudulent transactions to prevent customers' credit cards from being used maliciously. (See [kaggle Credit Card Fraud Detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud) for more details.)

The data set contains transactions made by European cardholders using credit cards in September 2013.
The data format is as follows:

| Column Name | Type | Meaning |
| :--- | :--- | :--- |
| Time | int | Number of seconds elapsed between this transaction and the first transaction in the dataset |
| V1~V28 | double | Features extracted using PCA (to protect user identity and sensitive features) |
| Amount | double | Transaction amount |
| Class | int | 1: fraudulent transaction, 0: non-fraudulent transaction |

We split the data into a training set, a validation set, and a test set in an 8:1:1 ratio. Since the training process is not the focus of this article, it is not covered in detail here.

We store the test set in MatrixOne as the new data that emerges from the production process. [Click here](https://github.com/matrixorigin/matrixone/blob/main/pkg/udf/pythonservice/demo/ddl.sql) to get the `ddl.sql` file, then import the data table and some of the test data with the following statement:

```sql
source /your_download_path/ddl.sql
```

### Preparing the python-whl package

1. Write `detection.py`:

    ```python
    # coding = utf-8
    # -*- coding:utf-8 -*-
    import decimal
    import os
    from typing import List

    import joblib
    import numpy as np

    model_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'model_with_scaler')


    def detect(featuresList: List[List[int]], amountList: List[decimal.Decimal]) -> List[bool]:
        model_with_scaler = joblib.load(model_path)

        columns_features = np.array(featuresList)
        column_amount = np.array(amountList, dtype='float').reshape(-1, 1)
        column_amount = model_with_scaler['amount_scaler'].transform(column_amount)
        data = np.concatenate((columns_features, column_amount), axis=1)
        predictions = model_with_scaler['model'].predict(data)
        return [pred == 1 for pred in predictions.tolist()]


    detect.vector = True
    ```

2. Write `__init__.py`:

    ```python
    # coding = utf-8
    # -*- coding:utf-8 -*-
    from .detection import detect
    ```

3.
[Click to download](https://github.com/matrixorigin/matrixone/blob/main/pkg/udf/pythonservice/demo/credit/model_with_scaler) the trained model `model_with_scaler`

4. Write `setup.py`:

    ```python
    # coding = utf-8
    # -*- coding:utf-8 -*-
    from setuptools import setup, find_packages

    setup(
        name="detect",
        version="1.0.0",
        packages=find_packages(),
        package_data={
            'credit': ['model_with_scaler']
        },
    )
    ```

5. Organize the above files into the following structure:

    ```bash
    |-- demo/
        |-- credit/
            |-- __init__.py
            |-- detection.py  # inference function
            |-- model_with_scaler  # model
        |-- setup.py
    ```

6. Go to the `demo` directory and build the wheel package detect-1.0.0-py3-none-any.whl with the following command:

    ```bash
    python setup.py bdist_wheel
    ```

### Fraud detection using udf

1. Create a udf function:

    ```sql
    create or replace function py_detect(features json, amount decimal)
    returns bool
    language python
    import 'your_code_path/detect-1.0.0-py3-none-any.whl'  -- replace with your path to the wheel package
    handler 'credit.detect';  -- the detect function under the credit module
    ```

2. Call the udf function for fraud detection:

    ```sql
    select id, py_detect(features, amount) as is_fraud from credit_card_transaction limit 10;
    ```

    Output:

    ```sql
    +---------+----------+
    | id      | is_fraud |
    +---------+----------+
    | 1       | false    |
    | 2       | false    |
    | 3       | true     |
    | 4       | false    |
    | 5       | false    |
    | 6       | false    |
    | 7       | false    |
    | 8       | true     |
    | 9       | false    |
    | 10      | false    |
    +---------+----------+
    ```

At this point, we have completed the inference for the credit card fraud detection task in MatrixOne.

As the case shows, we can easily use python UDFs for tasks that SQL alone cannot solve. Python UDFs greatly improve development efficiency, both by extending the semantics of SQL and by eliminating the need to manually program data movement and transformation.
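Stripped of the model, the vectorized-handler convention used in this case is just a Python function that takes whole columns as lists, returns a list of the same length, and carries a `.vector = True` attribute; a minimal sketch (hypothetical threshold, illustrative only):

```python
from typing import List

def is_large(amounts: List[float]) -> List[bool]:
    # Receives a whole batch (vector) of values per call instead of one value.
    return [a > 100.0 for a in amounts]

# Mark the handler as vectorized, just as detect.vector = True does above.
is_large.vector = True

print(is_large([19.99, 250.0, 99.5]))  # [False, True, False]
```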
## Reference Documents

For basic usage of UDFs in MatrixOne, see [UDF Basic Usage](udf-python.md).

For the specific parameters for creating UDFs in MatrixOne, see [Creating UDFs](../../Reference/SQL-Reference/Data-Definition-Language/create-function-python.md).

For the specific parameters for deleting UDFs in MatrixOne, see [Removing UDFs](../../Reference/SQL-Reference/Data-Definition-Language/drop-function.md).

diff --git a/docs/MatrixOne/Develop/udf/udf-python.md b/docs/MatrixOne/Develop/udf/udf-python.md
new file mode 100644
index 000000000..1c9978046
--- /dev/null
+++ b/docs/MatrixOne/Develop/udf/udf-python.md
@@ -0,0 +1,140 @@

# UDF-Python

You can write handlers for user-defined functions (UDFs) in Python. This document guides you through creating a simple Python UDF, including the required environment and UDF creation, viewing, use, and deletion.

## Prepare before you start

### Environment Configuration

Before you begin, confirm that you have downloaded and installed the following software:

- Verify that you have installed [Python 3.8 (or later)](https://www.python.org/downloads/), and check the Python version with the following command to confirm that the installation was successful:

    ```bash
    #To check with Python installation and its version
    python3 -V
    ```

    !!! note
        If you have both Python 2 and Python 3 in your operating system, you need to configure Python 3 globally before using UDFs, for example by renaming `/usr/bin/python` and then creating a python3 softlink with the same name. Example commands:

        ```bash
        mv /usr/bin/python /usr/bin/python.bak
        ln -s /usr/local/python3/bin/python3 /usr/bin/python
        ```

- Download and install the `protobuf` and `grpcio` tools with the following commands:

    ```
    pip3 install protobuf
    pip3 install grpcio
    ```

- Verify that you have completed installing the MySQL client.
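The Python version requirement above can also be verified from inside Python itself; a small sketch:

```python
import sys

# UDFs require Python 3.8 or later; fail fast if the interpreter is older.
if sys.version_info < (3, 8):
    raise RuntimeError("Python 3.8+ required, found " + sys.version.split()[0])
print("Python version OK:", sys.version.split()[0])
```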
+
+### Start MatrixOne
+
+1. Follow the steps in the [Quick Start](../../Get-Started/install-standalone-matrixone.md) chapter to complete the deployment of MatrixOne using mo\_ctl. When the deployment is complete, execute the following command to modify the mo\_ctl configuration:
+
+    ```bash
+    mo_ctl set_conf MO_CONF_FILE="\${MO_PATH}/matrixone/etc/launch-with-python-udf-server/launch.toml"
+    ```
+
+2. After modifying the configuration, start (or restart) the MatrixOne service for the configuration to take effect, for example with mo\_ctl:
+
+    ```bash
+    mo_ctl start
+    ```
+
+3. Wait for the MatrixOne service to start normally (if MatrixOne is starting for the first time, background initialization takes about ten seconds; connect after initialization completes). Execute the following command to access the MatrixOne service:
+
+    ```
+    mo_ctl connect
+    ```
+
+    You will be taken to the mysql client command line tool after a successful connection.
+
+## Embedded Build UDF
+
+MatrixOne supports creating UDFs in SQL by writing the function body directly in Python code after the AS keyword. UDFs created in this form are called **embedded UDFs**.
+
+1. Create a test database
+
+    Before you can create a UDF, you need to create a test database:
+
+    ```sql
+    mysql> create database udf_test;
+    Query OK, 1 row affected (0.02 sec)
+
+    mysql> use udf_test;
+    Database changed
+    ```
+
+2. Create a UDF function
+
+    Within the target database, execute the CREATE command together with Python statements to create a UDF. For example, the following SQL defines a function named **py\_add** whose argument list receives two arguments of type `int`; the function returns the sum of the two arguments, with the function logic in the Python code after `as`.
Then, use the handler keyword to specify the name of the Python function to call:
+
+    ```sql
+    create or replace function py_add(a int, b int) returns int language python as
+    $$
+    def add(a, b):
+      return a + b
+    $$
+    handler 'add';
+    ```
+
+    !!! note
+        The current version of MatrixOne does not check the Python syntax when creating a UDF. Users need to guarantee the correctness of the Python syntax themselves, or an error will occur when the function is subsequently executed.
+
+3. Call the UDF function
+
+    Once the function has been created, the UDF can be called with the **function name** plus a parameter list of matching types, for example:
+
+    ```sql
+    select py_add(12345,23456);
+    +-------------------------+
+    | py_add(12345, 23456)    |
+    +-------------------------+
+    |                   35801 |
+    +-------------------------+
+    1 row in set (0.02 sec)
+    ```
+
+## Delete UDF
+
+Created UDF functions can be removed with the `drop function` command. MatrixOne fully identifies a UDF by its **function name (parameter list)**, so deleting a UDF requires explicitly specifying the **function name** and **parameter list**, for example:
+
+```sql
+drop function py_add(int, int);
+```
+
+## View UDF
+
+Information about created UDF functions is saved in MatrixOne's metadata. You can obtain the details of existing UDFs in MatrixOne by querying the system table `mo_catalog.mo_user_defined_function`, for example:
+
+```mysql
+mysql> select * from mo_catalog.mo_user_defined_function\G
+*************************** 1.
row ***************************
+         function_id: 9000016
+                name: py_add
+               owner: 0
+                args: [{"name": "a", "type": "int"}, {"name": "b", "type": "int"}]
+             rettype: int
+                body: {"handler":"add","import":false,"body":"\ndef add(a, b):\n    return a + b\n"}
+            language: python
+                  db: udf_test
+             definer: root
+       modified_time: 2023-12-26 13:59:39
+        created_time: 2023-12-26 13:59:39
+                type: FUNCTION
+       security_type: DEFINER
+             comment:
+character_set_client: utf8mb4
+collation_connection: utf8mb4_0900_ai_ci
+  database_collation: utf8mb4_0900_ai_ci
+```
+
+## Reference Documents
+
+For advanced usage of UDFs in MatrixOne, see [UDF Advanced Usage](udf-python-advanced.md).
+
+For the specific parameters used to create a UDF in MatrixOne, see [Creating UDFs](../../Reference/SQL-Reference/Data-Definition-Language/create-function-python.md).
+
+For the specific parameters used to delete a UDF in MatrixOne, see [Removing UDFs](../../Reference/SQL-Reference/Data-Definition-Language/drop-function.md).
diff --git a/docs/MatrixOne/FAQs/deployment-faqs.md b/docs/MatrixOne/FAQs/deployment-faqs.md
index 156ce620a..02dc96d4a 100644
--- a/docs/MatrixOne/FAQs/deployment-faqs.md
+++ b/docs/MatrixOne/FAQs/deployment-faqs.md
@@ -1,129 +1,203 @@
-# Deployment FAQs
+# Deployment Frequently Asked Questions

-## Operating system requirements
+## Environment related

-### **What are the required operating system versions for deploying MatrixOne?**
+**What OS version is required to deploy MatrixOne?**

-MatrixOne supports the following operating system:
+- MatrixOne currently supports the operating systems listed in the table below.
-| Linux OS | Version |
| :----------------------- | :------------------------ |
-| Debian | 11.0 or later |
| Ubuntu LTS               | 20.04 or later            |
| Red Hat Enterprise Linux | 9.0 or later releases     |
| Oracle Enterprise Linux  | 9.0 or later releases     |
+| CentOS | 7.0 or later releases |

-MatrixOne also supports macOS operating system, but it's only recommended to run as a test and development environment.
+For Linux systems with older kernels, if you are deploying from a binary package, it is recommended to use a binary package built on musl libc; see the **recommended installation environment** section of the [Standalone Deployment MatrixOne overview](../Get-Started/install-standalone-matrixone.md) for details.

-| macOS | Version |
+- MatrixOne also supports the macOS operating system, currently recommended only for test and development environments.
+
+| macOS | Version |
| :---- | :--------------------- |
| macOS | Monterey 12.3 or later |

-## Hardware requirements
+- As a database developed in China, MatrixOne is currently compatible with the following Chinese domestic operating systems:
+
+| OS | OS Version | CPU | Memory |
+| :------ |:------ | :------ | :----- |
+| OpenCloudOS | v8.0 / v9.0 | x86 CPU; 4 Core | 16 GB |
+| openEuler | 20.03 | x86 / ARM CPU; 4 Core | 16 GB |
+| TencentOS Server | v2.4 / v3.1 | x86 CPU; 4 Core | 16 GB |
+| UOS | V20 | ARM CPU; 4 Core | 16 GB |
+| KylinOS | V10 | ARM CPU; 4 Core | 16 GB |
+| KylinSEC | v3.4 | x86 / ARM CPU; 4 Core | 16 GB |
+
+**Can I use MatrixOne properly under Red Hat-family systems like CentOS 7?**
+
+MatrixOne does not have strict operating system requirements and supports CentOS 7, but CentOS 7 reached end of life at the end of June 2024, so a newer operating system version is recommended.
+
+**Does MatrixOne support deployment in domestic (Chinese homegrown) environments?**
+
+For domestic chips, we have adapted to Kunpeng and Hygon; for domestic operating systems, we have adapted to Kylin (KylinOS), openEuler, and KylinSec.
+
+**Where can I deploy MatrixOne?**
+
+MatrixOne can be deployed on premises, in a public cloud, in a private cloud, or on Kubernetes.
+
+**Does MatrixOne support distributed deployment on Alibaba Cloud ECS servers?**
+
+Currently, distributed deployment requires k8s built on ECS, or Alibaba Cloud ACK.
+
+**Do cluster deployments only support k8s? Is distributed deployment on physical machines possible?**
+
+If there is no k8s or minio environment prepared in advance, our installation tools ship with k8s and minio and can also deploy on physical machines with one click.
+
+**Does the current non-k8s version support a master-slave configuration?**
+
+MatrixOne does not currently support a master-slave configuration outside k8s; this will be supported later.
-### **What are the required hardware for deploying MatrixOne?**
+**Can a production environment only be deployed in k8s mode?**
-For standalone installation, MatrixOne can be running on the 64-bit generic hardware server platform in the Intel x86-64 and ARM architecture. The requirements and recommendations about server hardware configuration for development, testing and production environments are as follows:
+Yes. For distributed stability and scalability, we recommend deploying production systems with k8s. If you don't have k8s available out of the box, you can deploy with managed k8s to reduce complexity.
-* Development and testing environments
+## Hardware related
-| CPU | Memory | Local Storage |
-| :------ | :----- | :-------------- |
+**What are MatrixOne's hardware configuration requirements for deployment?**
+
+In standalone installations, MatrixOne can currently run on 64-bit general-purpose hardware server platforms with Intel x86-64 and ARM architectures.
+
+Server hardware configuration requirements and recommendations for development, test, and production environments are as follows:
+
+- Development and test environment requirements
+
+| CPU | Memory | Local Storage |
+| :------ | :--------- | :------------- |
| 4 core+ | 16 GB+ | SSD/HDD 200 GB+ |

-The Macbook M1/M2 with ARM architecture is also a good fit for a development environment.
+A MacBook with an ARM-architecture M1/M2 chip is also suitable as a development environment.

-* Production environment
+- Production environment requirements

-| CPU | Memory | Local Storage |
-| :------- | :----- | :-------------- |
+| CPU | Memory | Local Storage |
+| :-------- | :----- | :------------- |
| 16 core+ | 64 GB+ | SSD/HDD 500 GB+ |

-For comprehensive details on deploying MatrixOne in a distributed setting, see [Cluster Topology Planning Overview](../Deploy/deployment-topology/topology-overview.md). This guide includes specific server hardware configuration requirements and recommendations tailored for development, testing, and production environments.
+For distributed installations, refer to the [Cluster Topology Planning Overview](../Deploy/deployment-topology/topology-overview.md) for server hardware configuration requirements and recommendations for development, test, and production environments.
+
+## Configuration related
+
+**Do I need to change any settings during installation?**
+
+Normally, you don't need to change any settings when you install; the `launch.toml` defaults are enough to run MatrixOne directly. But if you need to customize the listening port, IP address, or the path where data files are stored, you can modify the corresponding [`cn.toml`](https://github.com/matrixorigin/matrixone/blob/main/etc/launch-with-proxy/cn.toml), [`tn.toml`](https://github.com/matrixorigin/matrixone/blob/main/etc/launch-with-proxy/tn.toml), or [`log.toml`](https://github.com/matrixorigin/matrixone/blob/main/etc/launch-with-proxy/log.toml).
Details of parameter configuration within these files can be found in [General Parameters Configuration](../Reference/System-Parameters/system-parameter.md).
+
+**What should I do when I want to save MatrixOne's data directory to a directory that I specify?**
+
+When you start MatrixOne with Docker, you can mount the data directory you specify into the Docker container; see [Mounting directories to the Docker container](../Maintain/mount-data-by-docker.md).
+
+When you compile and launch MatrixOne from source or a binary package, you can modify the default data directory path in the configuration files: open the MatrixOne configuration directory `matrixone/etc/launch-with-proxy` and, in the three files `cn.toml`, `tn.toml`, and `log.toml`, change the parameter `data-dir = "./mo-data"` to `data-dir = "your_local_path"`, then save and restart MatrixOne.
+
+## Tools related
+
+**Can binary package installations be managed via mo\_ctl?**
+
+Yes. By setting MO\_PATH to the path of the binary package, you can manage it with mo\_ctl.
-## Installation and deployment
+**Does the mo\_ctl tool support source deployment upgrades?**
-### **What settings do I need to change for installation?**
+The upgrade command lets you specify a version or commit id for fine-grained upgrades; take care to set MO\_PATH to the current version and prepare the compilation environment.
-Normally you don't need to change anything for installation. A default setting of `launch.toml` is enough to run MatrixOne directly.
But if you want to customize your listening port, ip address, stored data files path, you may modify the corresponding records of [`cn.toml`](https://github.com/matrixorigin/matrixone/blob/main/etc/launch-with-proxy/cn.toml), [`tn.toml`](https://github.com/matrixorigin/matrixone/blob/main/etc/launch-with-proxy/tn.toml) or [`log.toml`](https://github.com/matrixorigin/matrixone/blob/main/etc/launch-with-proxy/log.toml), for more details about parameter configuration in these files, see [Boot Parameters for standalone installation](../Reference/System-Parameters/system-parameter.md).
+**Does the mo\_ctl tool support deployment of MatrixOne clusters?**
-### **After the MySQL client is installed, I open the terminal and run `mysql`, I got an error of `command not found: mysql`.**
+Not currently supported; cluster deployment and management may be added later.
-To solve the error, you need to set the environment variable. Open a new terminal window, run the following command:
+**After installing the operator with helm, how can I check whether the installation succeeded?**
+
+Run `helm list -A` to check.
+
+**How do I uninstall an operator deployed with helm?**
+
+Uninstall it with the `helm uninstall` command, specifying its name and namespace.
+
+**Is a specific operator version required for deployment?**
+
+The operator is used to manage the MatrixOne cluster, so the operator version should match the cluster version as closely as possible. For example, if the cluster runs version 1.0.0-rc2, the operator installed beforehand should also be 1.0.0-rc2. If no matching operator version is available, a close version is recommended.
+
+## Error reporting related
+
+**After installing the MySQL client, opening a terminal and running `mysql` produces the error `command not found: mysql`. How do I fix this?**
+
+This error occurs because the environment variable has not been set.
Open a new terminal and execute the following command:

=== "**Linux Environment**"

-    ```bash
-    echo 'export PATH="/path/to/mysql/bin:$PATH"' >> ~/.bash_profile
-    source ~/.bash_profile
-    ```
+    ```bash
+    echo 'export PATH="/path/to/mysql/bin:$PATH"' >> ~/.bash_profile
+    source ~/.bash_profile
+    ```

-    Replace `/path/to/mysql/bin` in the above code with the MySQL installation path in your system. Usually, it is `/usr/local/mysql/bin`; if you are not sure about the installation path of MySQL, you can use the following command to find it:
+    Replace `/path/to/mysql/bin` in the above code with the installation path for MySQL on your system. Typically it is `/usr/local/mysql/bin`; if you are unsure of the MySQL installation path, you can find it using the following command:

-    ```bash
-    whereis mysql
-    ```
+    ```bash
+    whereis mysql
+    ```

=== "**MacOS Environment**"

-    After macOS 10, zsh is used as the default shell. Here, zsh is used as an example. If you use other shells, you can convert it yourself.
+    After macOS 10, `zsh` is the default shell. zsh is used here as an example; if you use another shell, you can adapt accordingly.

-    ```zsh
-    echo export PATH=/path/to/mysql/bin:$PATH >> ~/.zshrc
-    source ~/.zshrc
-    ```
+    ```zsh
+    echo export PATH=/path/to/mysql/bin:$PATH >> ~/.zshrc
+    source ~/.zshrc
+    ```

-    Replace `/path/to/mysql/bin` in the above code with the MySQL installation path in your system. Usually, it is `/usr/local/mysql/bin`; if you are not sure about the installation path of MySQL, you can use the following command to find it:
+    Replace `/path/to/mysql/bin` in the above code with the installation path for MySQL on your system.
Typically it is `/usr/local/mysql/bin`; if you are unsure of the MySQL installation path, you can find it using the following command:

-    ```bash
-    whereis mysql
-    ```
+    ```bash
+    whereis mysql
+    ```

-### **When I install MatrixOne by building from source, I got an error of the following and the build failed, how can I proceed?**
+**When I build MatrixOne from source, the build fails with the following error. How do I proceed?**

-Error: `Get "https://proxy.golang.org/........": dial tcp 142.251.43.17:443: i/o timeout`
+Error: `Get "https://proxy.golang.org/...": dial tcp 142.251.43.17:443: i/o timeout`

-As MatrixOne needs many go libraries as dependency, it downloads them at the time of building it. This is an error of downloading timeout, it's mostly a networking issue. If you are using a Chinese mainland network, you need to set your go environment to a Chinese image site to accelerate the go library downloading. If you check your go environment by `go env`, you may see `GOPROXY="https://proxy.golang.org,direct"`, you need to set it by
+Because MatrixOne requires many Go libraries as dependencies, it downloads them while it builds. The error above is a download timeout, mainly caused by network issues.
+
+- If you're using a mainland Chinese network, you need to point your Go environment at a Chinese mirror site to speed up the download of the Go libraries.
+
-### **When I want to save the MatrixOne data directory to my specified file directory, how should I do it?**
-
-- When you use Docker to start MatrixOne, you can mount the data directory you specify to the Docker container, see [Mount directory to Docker container](../Maintain/mount-data-by-docker.md).
+- If you check your Go environment via `go env` and see `GOPROXY="https://proxy.golang.org,direct"`, set the following:
-- When you use the source code or binary package to compile and start MatrixOne, you can modify the default data directory path in the configuration file: open the MatrixOne source file directory `matrixone/etc /launch-with-proxy`, modify the configuration parameter `data-dir = "./mo-data"` in the three files of `cn.toml`, `tn.toml` and `log.toml` is `data-dir = "your_local_path"`, save and restart MatrixOne It will take effect.
+```
+go env -w GOPROXY=https://goproxy.cn,direct
+```
-### **When I was testing MatrixOne with MO-tester, I got an error of `too many open files`?**
+
+Once the setup is complete, `make build` should finish quickly.
-MO-tester will open and close many SQL files in a high speed to test MatrixOne, this kind of usage will easily reach the maximum open file limit of Linux and macOS, which is the reason for the `too many open files` error.
+**When I test MatrixOne with MO-Tester, how do I resolve the `too many open files` error?**
-* For MacOS, you can just set the open file limit by a simple command:
+To test MatrixOne, MO-Tester opens and closes many SQL files at high speed, quickly reaching the maximum open file limit of Linux and macOS systems, which is what causes the `too many open files` error.
+
+* For macOS systems, you can set the open file limit with a simple command:

```
-ulimit -n 65536
+ulimit -n 65536
```

-* For Linux, please refer to this detailed [guide](https://www.linuxtechi.com/set-ulimit-file-descriptors-limit-linux-servers/) to set the ulimit to 100000.
+* For Linux systems, refer to the detailed [guide](https://www.linuxtechi.com/set-ulimit-file-descriptors-limit-linux-servers/) to set the *ulimit* to 100,000.

-After setting this value, there will be no more `too many open files` error.
+Once the setup is complete, the `too many open files` error will no longer occur.

-### **Ssb-dbgen cannot be compiled on a PC with M1 chip**
+**My PC has an M1 chip, and when I run an SSB test I find that ssb-dbgen cannot be compiled successfully**

-To complete the following configuration, then compiling 'SSB-DBgen' for a PC with M1 chip:
+PCs with M1 chips also need the following configuration before compiling `ssb-dbgen`:

1. Download and install [GCC11](https://gcc.gnu.org/install/).

-2. To ensure the gcc-11 is successful installed, run the following command:
+2. Enter the following command to confirm that gcc-11 was installed successfully:

-    ```
-    gcc-11 -v
-    ```
+    ```
+    gcc-11 -v
+    ```

-    The successful result is as below:
+    The following result indicates success:

    ```
    Using built-in specs.
@@ -136,63 +210,61 @@ To complete the following configuration, then compiling 'SSB-DBgen' for a PC wit
    gcc version 11.3.0 (Homebrew GCC 11.3.0)
    ```

-3. Modify the *bm_utils.c* file in the *ssb-dbgen* directory:
+3. Manually modify the *bm\_utils.c* file in the *ssb-dbgen* directory:

-    - Change the `#include <malloc.h>` in line 41 to `#include <sys/malloc.h>`
+    - Modify `#include <malloc.h>` on line 41 to `#include <sys/malloc.h>`

-    - Change the `open(fullpath, ((*mode == 'r')?O_RDONLY:O_WRONLY)|O_CREAT|O_LARGEFILE,0644);` in line 398 to `open(fullpath, ((*mode == 'r')?O_RDONLY:O_WRONLY)|O_CREAT,0644);`
+    - Change `open(fullpath, ((*mode == 'r')?O_RDONLY:O_WRONLY)|O_CREAT|O_LARGEFILE,0644);` on line 398 to `open(fullpath, ((*mode == 'r')?O_RDONLY:O_WRONLY)|O_CREAT,0644);`

-4. Modify the *varsub.c* file in the *ssb-dbgen* directory:
+4. Manually modify the *varsub.c* file in the *ssb-dbgen* directory:

-    - Change the `#include <malloc.h>` in line 5 to `#include <sys/malloc.h>`
+    - Modify `#include <malloc.h>` on line 5 to `#include <sys/malloc.h>`

-5. Modify the *makefile* file in the *ssb-dbgen* directory:
+5. Manually modify the *makefile* in the *ssb-dbgen* directory:

-    - Change the `CC = gcc` in line 5 to `CC = gcc-11`
+    - Modify `CC = gcc` on line 5 to `CC = gcc-11`

-6. Enter into *ssb-dbgen* directory again and compile:
+6. Go to the *ssb-dbgen* directory again and compile:

    ```
-    cd ssb-dbgen
-    make
+    cd ssb-dbgen
+    make
    ```

-7. Check the *ssb-dbgen* directory, when the the *dbgen* file is generated, indicating that the compilation is successful.
+7. Check the *ssb-dbgen* directory; when the *dbgen* executable has been generated, the compilation succeeded.

-### **I built MatrixOne in the main branch initially but encountered panic when switching to other versions for building**
+**I built MatrixOne on the main branch first, and now switching to another version before building causes a panic**

-The storage formats between MatrixOne version 0.8.0 and its earlier versions are not compatible with each other. This means that when executing `make build`, the system will automatically generate a data directory file named *mo-data* to store data.
+This problem can occur when switching between MatrixOne versions involves versions prior to 0.8.0 and the `make build` command is used. It is an incompatibility caused by an upgrade of the MatrixOne data storage format; formats remain compatible from version 0.8.0 onward.

-In the future, if you need to switch to another branch and re-execute `make build` to build MatrixOne, it may cause a panic situation to occur. In this case, you need first to clean the *mo-data* data directory (that is, execute the `rm -rf mo-data` command), and then rebuild MatrixOne.
+!!!
note
+    In this case, we strongly recommend that you reinstall the [latest stable version](../../MatrixOne/Get-Started/install-standalone-matrixone.md) of MatrixOne for subsequent data compatibility, and we recommend using the mo\_ctl tool for quick builds and startup.

-Reference code example:
+Specifically, prior to MatrixOne version 0.8.0, performing `make build` automatically generated a data directory named *mo-data* to hold the data. If you switch to another branch and run `make build` again, the *mo-data* directory is not automatically deleted, which may cause a panic due to incompatible data formats.

-```
-[root ~]# cd matrixone // Go to the matrixone directory
-[root ~]# git branch // Check the current branch
-* 0.8.0
-[root ~]# make build // Build matrixone
-... // The build process code is omitted here. If you want to switch to another version, such as version 0.7.0,
-[root ~]# git checkout 0.7.0 // Switch to version 0.7.0
-[root ~]# rm -rf mo-data // Clean up the data directory
-[root ~]# make build // Build matrixone again
-... // The build process code is omitted here
-[root ~]# ./mo-service --daemon --launch ./etc/launch/launch.toml &> test.log & // Start MatrixOne service in the terminal backend
-```
+To fix this, clean up the *mo-data* data directory (that is, execute the `rm -rf mo-data` command) before rebuilding MatrixOne.

-!!! note
-    The MatrixOne version 0.8.0 is compatible with the storage format of older versions. If you use version 0.8.0 or a higher version, there is no need to clean the data file directory when switching to other branches and buildings.
+The following reference code example shows the build process on an earlier version:
+
+```bash
+[root ~]# cd matrixone   // Go to the matrixone directory
+[root ~]# git branch     // View the current branch
+* 0.8.0
+[root ~]# make build     // Build matrixone
+... // Omit the build process code here.
If you want to switch to a different version at this point, for example version 0.7.0:
+[root ~]# git checkout 0.7.0   // Switch to version 0.7.0
+[root ~]# rm -rf mo-data       // Clean up the data directory
+[root ~]# make build           // Build matrixone again
+... // Omit the build process code here.
+```

-### **Password authentication error when connecting to the MatrixOne cluster via proxy with CN label**
+**When I connect through the proxy with a CN label and log in to the MatrixOne cluster, password validation fails**

-- **Issue Reason**: Incorrect connection string formatting. Support for extending the username field is available when connecting to the MatrixOne cluster through the MySQL client. You can add a `?` after the username and follow it with CN group labels. CN group labels consist of key-value pairs separated by `=` and multiple key-value pairs are separated by commas `,`.
+- **Cause of problem**: The connection string was written incorrectly. When connecting to a MatrixOne cluster via a MySQL client, the user name field can be extended by adding a `?` after the user name, followed by CN group labels; a label's key and value are joined with `=`, and multiple key-value pairs are separated by commas `,`.

-- **Solution**: Please refer to the following example.
+- **Workaround**: Refer to the following example.

-Assuming the configuration for the CN group in your MatrixOne's `mo.yaml` configuration file is as shown below:
+Suppose the configuration of the CN group in your MatrixOne `mo.yaml` configuration file looks like this:

```yaml
-## Displaying partial code only
+## Only part of the code is shown
...
- cacheVolume:
    size: 100Gi
@@ -203,6 +275,10 @@ Assuming the configuration for the CN group in your MatrixOne's `mo.yaml` config
...
```

-When connecting to the MatrixOne cluster through the MySQL client, you can use the following command example: `mysql -u root?workload=bk -p111 -h 10.206.16.10 -P 31429`. In this command, `workload=bk` is the CN label, connected using `=`.
+
+When connecting to the MatrixOne cluster via a MySQL client, you can use the following command example: `mysql -u root?workload=bk -p111 -h 10.206.16.10 -P 31429`, where `workload=bk` is the CN label, joined with `=`.
+
+**When installing the latest operator, a pod called job-bucket keeps restarting. How should I troubleshoot it?**
+
+Check whether the secret is missing. It may be that the minio connection information is not configured, so the pod cannot connect to minio.
-Similarly, when using the `mo-dump` tool to export data, refer to the following command example: `mo-dump -u "dump?workload=bk" -h 10.206.16.10 -P 31429 -db tpch_10g > /tmp/mo/tpch_10g.sql`.
+Similarly, when exporting data with the `mo-dump` tool, you can refer to the following command example: `mo-dump -u "dump?workload=bk" -h 10.206.16.10 -P 31429 -db tpch_10g > /tmp/mo/tpch_10g.sql`.
diff --git a/docs/MatrixOne/FAQs/product-faqs.md b/docs/MatrixOne/FAQs/product-faqs.md
index 852c9e6ff..53ec4ce7e 100644
--- a/docs/MatrixOne/FAQs/product-faqs.md
+++ b/docs/MatrixOne/FAQs/product-faqs.md
@@ -1,39 +1,96 @@
-# **Product FAQs**
+# Product Frequently Asked Questions

-* **What is MatrixOne?**
+## Product related

-    MatrixOne is a future-oriented hyperconverged cloud & edge native DBMS that supports transactional, analytical, and streaming workload with a simplified and distributed database engine, across multiple data centers, clouds, edges, and other heterogenous infrastructures. The all-in-one architecture of MatrixOne will significantly simplify database management and maintenance, creating a single database that can serve multiple data applications.
-    For information about MatrixOne, you can see [MatrixOne Introduction](../Overview/matrixone-introduction.md).
+
+**What is MatrixOne?**
-* **Where can I apply MatrxOne?**
+MatrixOne is a future-oriented, hyperconverged, heterogeneous cloud-native database. With a hyperconverged data engine it supports hybrid workloads such as transactional, analytical, and streaming; with a heterogeneous cloud-native architecture it supports cross-datacenter, multi-site, and cloud-edge collaboration. MatrixOne aims to simplify the cost of developing and operating data systems, reduce data fragmentation across complex systems, and break down the boundaries of data convergence.
+To learn more about MatrixOne, you can browse the [Introduction to MatrixOne](../Overview/matrixone-introduction.md).
-    MatrixOne provides users with HTAP services to support hybrid workloads. It can be used to build data warehouse or data platforms.
+**Is MatrixOne developed based on MySQL or another database?**
-* **Is MatrixOne based on MySQL or some other database?**
+MatrixOne is a new database built from scratch. It is compatible with part of MySQL's syntax and semantics, and will support more semantics than MySQL in the future, so that it can become a more powerful hyperconverged database. For compatibility with MySQL, see [MySQL compatibility](../Overview/feature/mysql-compatibility.md).
-    MatrixOne is a completely redesigned database. It's compatible with part of MySQL syntax and semantics. We are currently working to support more database semantics such as PostgreSQL, Hive, Clickhouse, since we intend to develop MatrixOne as a hyperconverged database.
-    To learn more about the compatibility with MySQL, you can visit [MySQL-Compatibility](../Overview/feature/mysql-compatibility.md).
+**What programming language was MatrixOne developed in?**
-* **Which programming language is MatrixOne developed with?**
+MatrixOne currently uses **Golang** as its primary programming language.
-    Currently, the primary programming language used for our programming in **Golang**.
+
+**What programming languages are currently supported for connecting to MatrixOne?**
-* **What operating system does MatrixOne support?**
+MatrixOne supports connections from Java, Python, and Golang, as well as ORM connections. Other languages can also connect to MatrixOne as they would to MySQL.
-    MatrixOne supports Linux and macOS. Please refer to [deployment FAQ](deployment-faqs.md) for more details.
+**What compression algorithm does MatrixOne column storage use?**
-* **Which MatrixOne data types are supported?**
+MatrixOne column storage currently uses the LZ4 compression algorithm, and this cannot be changed via configuration.
-    You can see [data types in MatrixOne](../Reference/Data-Types/data-types.md) to learn more about the data types we support.
+**Can I upgrade a lower version directly to the latest version?**
-* **Where can I deploy MatrixOne?**
+MatrixOne 0.8.0 and above can be upgraded directly from a lower version to the latest version using `mo_ctl upgrade latest`; see the [mo\_ctl tool](../Maintain/mo_ctl.md). For versions prior to 0.8.0, if an upgrade is required we recommend backing up the data, redeploying, and re-importing.
-    MatrixOne can be deployed on-premise, public cloud, private cloud, or Kubernetes.
+**Is MatrixOne stable now? Which version is recommended?**
-* **Can I contribute to MatrixOne?**
+MatrixOne is now available in version 1.2.2. We've done a lot of optimization work on stability, and it is ready for use in production business.
-    Yes, MatrixOne is an open-source project developed on GitHub. Contribution instructions are published in [Contribution Guide](../Contribution-Guide/make-your-first-contribution.md). We welcome developers to contribute to the MatrixOne community.
+**Is there a cloud version of MatrixOne that I can test quickly?**
-* **In addition to the MatrixOne documentation, are there any other ways to acquire MatrixOne knowledge?**
+Yes. MatrixOne Cloud is now in public beta.
+For details, see the [MatrixOne Cloud Documentation](https://docs.matrixorigin.cn/zh/matrixonecloud/MatrixOne-Cloud/Get-Started/quickstart/).
-  Currently, [MatrixOne documentation](https://docs.matrixorigin.cn/en/) is the most important and timely way to get MatrixOne related knowledge. In addition, we also have some technical communication groups in Slack and WeChat. If you have any needs, contact [opensource@matrixorigin.io](mailto:opensource@matrixorigin.io).
+
+## Architecture related
+
+**Are MatrixOne permissions also designed on the RBAC model? Can permissions be granted directly to a user?**
+
+MatrixOne's permission management is designed and implemented with a combination of the role-based access control (RBAC) and discretionary access control (DAC) security models. MatrixOne does not support granting permissions directly to users; authorization must go through roles.
+
+**How does the high-availability architecture work?**
+
+The standalone version of MatrixOne currently has no high-availability architecture, and the high-availability architecture for the master-slave version is still being designed. The distributed version is inherently highly available: both Kubernetes and S3 are themselves highly available. MatrixOne's CN and TN nodes are stateless and can be restarted at any time if they go down. The log service is stateful: its 3 nodes form a Raft group, so the system keeps running if 1 node fails but becomes unavailable if 2 nodes fail.
+
+**Does the TN node of a Kubernetes cluster currently support scaling?**
+
+The TN node of MatrixOne does not yet support scaling.
+
+**What are the various components used for? What is the minimum number of instances to deploy? Can the cluster later be scaled out without downtime?**
+
+MatrixOne has 4 core components: proxy, CN, TN, and log service. CN is the stateless compute node, TN is the transaction node, and the log service stores transaction logs, equivalent to a WAL.
+The proxy handles load balancing and resource group management. With all components deployed together, the cluster can run on 3 physical or virtual machines. Scaling without downtime is possible: MatrixOne separates storage from compute, so scaling storage means scaling S3, while scaling compute means scaling CN. Since CNs are stateless containers running on Kubernetes, they can be scaled out quickly.
+
+**How is resource isolation achieved between multiple tenants?**
+
+The core of MatrixOne's resource isolation is that an ACCOUNT can be mapped to the resource group of a CN set, so tenant isolation can be regarded as container-level isolation of CNs. Besides assigning different resource groups to different tenants, CN resource groups can be further divided within a single tenant by business type for finer-grained control. For a complete description of resource isolation, see [Load and Tenant Isolation](../Deploy/mgmt-cn-group-using-proxy.md).
+
+**Can table engines in MySQL be migrated directly? Is MatrixOne compatible with engines like InnoDB?**
+
+MatrixOne does not support MySQL engines such as InnoDB or MyISAM, but MySQL statements that mention them can be used directly: MatrixOne simply ignores the engine clause. MatrixOne has only one storage engine, TAE, which was developed entirely independently and handles all kinds of scenarios well, so there is no need to switch engines with `ENGINE=XXX`.
+
+## Functionally related
+
+**What applications does MatrixOne support?**
+
+MatrixOne provides users with a complete HTAP service. It can be used in enterprise data centers, big data analytics, and other scenarios.
+
+**Which database is MatrixOne compatible with?**
+
+MatrixOne remains highly compatible with MySQL 8.0 in usage, including SQL syntax, transfer protocols, operators and functions, and more. A list of differences from MySQL 8.0 can be found in the [MySQL compatibility list](../Overview/feature/mysql-compatibility.md).
+
+**How good is MySQL compatibility? Can MatrixOne be used directly as MySQL in BI tools?**
+
+MatrixOne is highly compatible with MySQL 8.0 and is generally consistent with MySQL in communication protocol, SQL syntax, connection tools, and development mode, so many administration and ecosystem tools built for MySQL can be reused. BI tools can use it directly as MySQL; see [FineBI for visual reports](../Develop/Ecological-Tools/BI-Connection/FineBI-connection.md), [Yonghong BI for visual reports](../Develop/Ecological-Tools/BI-Connection/yonghong-connection.md), and [Superset for visual monitoring](../Develop/Ecological-Tools/BI-Connection/Superset-connection.md).
+
+## Database Comparison Related
+
+**How does standalone MatrixOne performance compare with MySQL?**
+
+The standalone version of MatrixOne slightly outperforms MySQL in TP performance, and far outperforms MySQL in loading, streaming writes, and analytical queries.
+
+**What is the difference from the HTAP database TiDB?**
+
+MatrixOne and TiDB have different architectures. MatrixOne separates storage from compute: it is a cloud-native shared-storage architecture where data lives in one place as a single copy, and HTAP is implemented with a single engine. TiDB is a shared-nothing architecture where data must be sharded: TiKV handles TP and TiFlash handles AP, implementing HTAP with two engines connected by an ETL process, so data is stored in two copies.
+
+## Other
+
+* **Can I participate in contributing to the MatrixOne project?**
+
+MatrixOne is an open-source project hosted entirely on GitHub, and all developers are welcome to contribute. See our [Contribution Guide](../Contribution-Guide/make-your-first-contribution.md) for more information.
+
+* **Is there an alternative to the official documentation for acquiring MatrixOne knowledge?**
+
+Currently, the [MatrixOne documentation](https://docs.matrixorigin.cn) is the most important and timely way to gain knowledge about MatrixOne. In addition, we have a number of technical communities on Slack and WeChat.
+For any requests, please contact [opensource@matrixorigin.io](mailto:opensource@matrixorigin.io).
diff --git a/docs/MatrixOne/FAQs/sql-faqs.md b/docs/MatrixOne/FAQs/sql-faqs.md
index b5cfa351d..0ff26706d 100644
--- a/docs/MatrixOne/FAQs/sql-faqs.md
+++ b/docs/MatrixOne/FAQs/sql-faqs.md
@@ -1,34 +1,135 @@
-# **SQL FAQs**
+# SQL Frequently Asked Questions
+
-* **Are functions and other keywords case sensitive?**
+## Basic function related
+
-  No, they are not case sensitive. Only in one case case is sensitive in MatrixOne, if user creates table and attributes with \`\`, the name in\`\` is case sensitive. To find this table name or attribute name in your query, it needs to be in \`\` as well.
+**Is MatrixOne case sensitive to identifiers?**
+
-* **How do I export data from MatrixOne to a file?**
+MatrixOne is case-insensitive for identifiers by default; case sensitivity can be enabled with the `lower_case_table_names` parameter, described in more detail in [Case Sensitivity Support](../Reference/Variable/system-variables/lower_case_tables_name.md).
+
-  You can use `SELECT INTO OUTFILE` command to export data from MatrixOne to a **csv** file (only to the server host, not to the remote client).
-  For this command, you can see [SELECT Reference](../Reference/SQL-Reference/Data-Query-Language/select.md).
+**What SQL statements does MatrixOne support?**
+
-* **What is the size limit of a transaction in MatrixOne?**
+For the SQL statements currently supported by MatrixOne, see the [SQL statement classification](../Reference/SQL-Reference/SQL-Type.md).
+
-  The transaction size is limited to the memory size you have for hardware environment.
+**What data types does MatrixOne support?**
+
-* **What kind of character set does MatrixOne support?**
+MatrixOne currently supports common integer, floating-point, string, date and time, boolean, enumeration, binary, and JSON types; see the [Data Type Overview](../Reference/Data-Types/data-types.md).
-  MatrixOne supports the UTF-8 character set by default and currently only supports UTF-8.
+
+**What types of character sets does MatrixOne support?**
+
-* **What is the `sql_mode` in MatrixOne?**
+MatrixOne supports the UTF-8 character set by default and currently supports only UTF-8.
+
-  MatrixOne doesn't support modifying the `sql_mode` for now, the default `sql_mode` is as the `only_full_group_by` in MySQL.
+**What constraints and indexes does MatrixOne support?**
+
-* **How do I bulk load data into MatrixOne?**
+MatrixOne currently supports the Primary Key, Unique Key, Not Null, Foreign Key, and Auto Increment constraints, as well as secondary indexes. Secondary indexes currently implement only syntax support, without query acceleration. In addition, MatrixOne provides a sort key (`CLUSTER BY`) for tables without a primary key, which pre-sorts the columns to be queried and accelerates queries.
+
-  MatrixOne provides two methods of bulk load data: 1. Using `source filename` command from shell, user can load the SQL file with all DDL and insert data statements. 2. Using `load data infile...into table...` command from shell, user can load an existing .csv file to MatrixOne.
+**What query types does MatrixOne support?**
+
-* **How do I know how my query is executed?**
+MatrixOne supports most common SQL queries:
+
-  To see how MatrixOne executes for a given query, you can use the [`EXPLAIN`](../Reference/SQL-Reference/Other/Explain/explain.md) statement, which will print out the query plan.
+Basic queries: supports grouping, deduplication, filtering, sorting, limiting, regular expressions, and other basic query capabilities.
+
+Advanced queries: supports views, subqueries, joins, unions, common table expressions (CTEs), window functions, Prepare preprocessing, and other advanced query capabilities.
+
+Aggregate functions: supports common aggregate functions such as AVG, COUNT, MIN, MAX, and SUM.
+
+System functions and operators: supports common string, date and time, and mathematical functions, as well as common operators.
+
+**What are the reserved keywords of MatrixOne?**
+
+A list of MatrixOne's reserved keywords can be found in [Keywords](../Reference/Language-Structure/keywords.md).
+
+When using a reserved keyword as an identifier, it must be wrapped in backticks, otherwise an error occurs. Non-reserved keywords can be used as identifiers directly, without backticks.
+
+**Does MatrixOne currently support materialized views?**
+
+MatrixOne does not currently support materialized views; with its current AP performance, querying directly already gives a good analytical experience. Materialized views are on MatrixOne's roadmap. If you have a hard requirement for them, feel free to open an issue describing your scenario.
+
+**Does MatrixOne currently support the Geometry type?**
+
+Not yet; it will be supported later.
+
+**Are functions and keywords in MatrixOne case sensitive?**
+
+They are case-insensitive. There is only one situation in MatrixOne that is case sensitive: if you create tables and attributes with names wrapped in backticks (\`\`), those names are case sensitive, and to query such a table or attribute name you must also wrap it in backticks.
+
+**Does MatrixOne support transactions? What transaction isolation levels are supported?**
+
+MatrixOne supports ACID (atomicity, consistency, isolation, durability) transaction capabilities, supports both pessimistic and optimistic transactions, and uses pessimistic transactions by default. Pessimistic transactions use the Read Committed isolation level; switching to optimistic transactions uses the Snapshot Isolation level.
+
+## Data Import/Export Related
+
+**How do I import data into MatrixOne?**
+
+MatrixOne supports the same [`INSERT`](../Develop/import-data/insert-data.md) statements as MySQL for real-time data writing, as well as the [`LOAD DATA`](../Develop/import-data/bulk-load/bulk-load-overview.md) statement for offline bulk import.
+
+**How do I export data from MatrixOne to a file?**
+
+In MatrixOne, you can use the binary tool [`mo-dump`](../Develop/export-data/modump.md) to export data to SQL or csv files, or use [`SELECT INTO`](../Develop/export-data/select-into-outfile.md) to export csv files.
+
+**How do I export only the table structure with the mo-dump tool?**
+
+You can skip the data by adding the `-no-data` parameter to the export command.
+
+**If some fields are missing from a JSON object imported with load data, does the import report an error?**
+
+Yes. When importing JSON, if an object has more fields than the table, it imports normally and the extra fields are ignored; if it has fewer fields, it cannot be imported.
+
+**Can I use relative paths to import files when performing a source import?**
+
+Yes, but the path is resolved relative to the current directory of the mysql client, so to avoid errors we recommend writing the full path. Also pay attention to file permissions.
+
+**Can the import of a large file with the load data command be optimized?**
+
+You can enable parallel import by setting `PARALLEL` to `true`. For example, for a large 2 GB file, two threads can load it: the second thread seeks to the 1 GB position, reads forward to a line boundary, and loads from there.
+This allows the two threads to read the large file simultaneously, each reading about 1 GB of data. You can also split the data file yourself.
+
+**Is a load data import transactional?**
+
+All load statements are transactional.
+
+**Do triggers and stored procedures in a SQL file take effect when importing with source?**
+
+Currently, if the SQL file contains incompatible data types, triggers, functions, or stored procedures, you still need to modify them manually, otherwise execution reports an error.
+
+**Does mo-dump support batch export of multiple databases?**
+
+Currently only a single database can be exported per run. If you have multiple databases to back up, run mo-dump manually once for each.
+
+**Does MatrixOne support importing data from MinIO?**
+
+Yes. The load data command supports importing data into MatrixOne from local files, S3 object storage services, and S3-compatible object storage services. Since MinIO is also based on the S3 protocol, it is supported as well; see [Local Object Storage](https://docs.matrixorigin.cn/1.2.2/MatrixOne/Deploy/import-data-from-minio-to-mo/) for details.
+
+**When MatrixOne imports or exports data, how do we generally solve garbled data caused by encoding problems?**
+
+Since MatrixOne supports only UTF8 as the default encoding and this cannot be changed, garbled data on import cannot be fixed by modifying the character set of the database or tables. Instead, convert the data itself to UTF8. Common conversion tools are iconv and recode; for example, to convert GBK-encoded data to UTF-8: `iconv -f GBK -t UTF8 t1.sql > t1_utf8.sql`.
+
+**What permissions are required for importing and exporting in MatrixOne?**
+
+If you are a tenant administrator, you can import and export directly with the default role.
+For normal users, importing a table requires the `insert` permission on it; exporting with `select ... into outfile` requires the `select` permission on the table; and exporting with mo-dump requires the `select` permission on all tables (`table.*`) and the `show tables` permission on all databases (`database.*`).
+
+## Permission related
+
+**Can regular users grant the MOADMIN role?**
+
+No. MOADMIN is the highest cluster administrator privilege, and only the root user has it.
+
+## Other
+
+**What is `sql_mode` in MatrixOne?**
+
+MatrixOne's default `sql_mode` is MySQL's `only_full_group_by`: by default, every `select` field that is not inside an aggregate function must appear in `group by`. MatrixOne also supports modifying `sql_mode` to accept `group by` queries that do not follow the full specification.
+
+**show tables in MatrixOne cannot display temporary tables; how can I check whether one was created successfully?**
+
+Currently you can check with `show create table <temporary table name>`. A temporary table is visible only in the session that created it; when that session ends, the database automatically drops the temporary table and frees all its space, so users are generally aware of it throughout its lifetime.
+
+**How do I view my query execution plan?**
+
+To see how MatrixOne executes a given query, you can use the [`EXPLAIN`](../Reference/SQL-Reference/Other/Explain/explain.md) statement, which prints out the query plan.

    ```
-    EXPLAIN SELECT col1 FROM tbl1;
+    EXPLAIN SELECT col1 FROM tbl1;
    ```
diff --git a/docs/MatrixOne/Get-Started/install-standalone-matrixone.md b/docs/MatrixOne/Get-Started/install-standalone-matrixone.md
index a25e5ba53..0b5c5ea3e 100644
--- a/docs/MatrixOne/Get-Started/install-standalone-matrixone.md
+++ b/docs/MatrixOne/Get-Started/install-standalone-matrixone.md
@@ -10,10 +10,29 @@ MatrixOne supports **Linux** and **MacOS**.
For quick start, we recommend the following hardware specifications:

|Operating System | Operating System Version | CPU | Memory |
| :------ | :----- | :-------------- | :------|
|Debian| 11 and later | x86 / ARM CPU; 4 Cores | 16 GB |
|Ubuntu| 20.04 and later | x86 / ARM CPU; 4 Cores | 16 GB |
+|CentOS| 7 and later | x86 / ARM CPU; 4 Cores | 16 GB |
|macOS| Monterey 12.3 and later | x86 / ARM CPU; 4 Cores | 16 GB |
+
+!!! note
+    If you are using a Linux kernel version lower than 5.0, deploying MatrixOne with binary packages built against glibc may report glibc-related errors due to kernel limitations. In that case, you can use the binary packages built against musl libc in [Binary Package Deployment](./install-on-linux/install-on-linux-method2.md). musl libc is a lightweight C standard library designed for Linux systems; packaging an application with musl libc produces static binaries that do not depend on the system C library. In addition, since CentOS 8 is no longer officially supported and CentOS 7 will end its maintenance cycle on June 30, 2024, users on these versions face some risk, so we recommend using other operating system versions.

 For more information on the required operating system versions for deploying MatrixOne, see [Hardware and Operating system requirements](../FAQs/deployment-faqs.md).
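The kernel-version check described in the note above can be scripted. A minimal sketch in Python (illustrative only; it assumes a Linux-style `major.minor` kernel release string, and the 5.0 threshold is the one stated in the note):

```python
import platform

# Kernel release string, e.g. "5.15.0-89-generic" on Linux
release = platform.release()
major = int(release.split(".")[0])

if major < 5:
    print("kernel < 5.0: prefer the musl libc binary package")
else:
    print("kernel >= 5.0: the glibc binary package should work")
```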
+### **Support for domestic systems**
+
+As a database developed in China, MatrixOne is currently compatible with and supports the following Chinese domestic operating systems:
+
+|Operating System |Operating System Version | CPU |Memory|
+| :------ |:------ | :------ | :----- |
+|OpenCloudOS| v8.0 / v9.0 | x86 CPU; 4 Cores | 16 GB |
+|openEuler | 20.03 | x86 / ARM CPU; 4 Cores | 16 GB |
+|TencentOS Server | v2.4 / v3.1 | x86 CPU; 4 Cores | 16 GB |
+|UOS | V20 | ARM CPU; 4 Cores | 16 GB |
+|KylinOS | V10 | ARM CPU; 4 Cores | 16 GB |
+|KylinSEC | v3.0 | x86 / ARM CPU; 4 Cores | 16 GB |
+
+__NOTE__: Supported domestic CPUs include TengCloud S2500, FT2000+/64, Kunpeng 916, Kunpeng 920 and Haikuang H620-G30.
+
 ## **Deploy on macOS**

 You can install and connect to MatrixOne on macOS in one of three ways that work best for you:
diff --git a/docs/MatrixOne/Maintain/backup-restore/backup-restore-overview.md b/docs/MatrixOne/Maintain/backup-restore/backup-restore-overview.md
index aa46853b9..a04943f99 100644
--- a/docs/MatrixOne/Maintain/backup-restore/backup-restore-overview.md
+++ b/docs/MatrixOne/Maintain/backup-restore/backup-restore-overview.md
@@ -1,78 +1,97 @@
 # MatrixOne Backup and Recovery Overview

-Database backup and recovery are core operations for any database management system and are crucial for ensuring data security and availability. MatrixOne provides flexible and powerful database backup and recovery capabilities to ensure the integrity and continuity of user data. This article provides an overview of the importance of MatrixOne database backup and recovery, backup strategies, and backup methods.
+Database backup and recovery is one of the core operations of any database management system and an important guarantee of data security and availability. MatrixOne likewise provides flexible and robust backup and recovery capabilities to ensure the integrity and continuity of user data.
+This article describes the importance of MatrixOne database backup and recovery, backup strategies, and backup methods.
+
-## Backup and Recovery Strategies
+## Backup and recovery strategies
+
-Database backup and recovery can restore operational status in various disaster recovery scenarios.
+In different disaster recovery scenarios, a database backup can be used to restore the database to a healthy state.
+
-Here are backup and recovery methods for different situations:
+Here are the backup and recovery methods for different scenarios:
+
-1. **Operating System Crash**:
-    - With Physical Backup: Restore the entire database state using physical backup after a crash. Restore the backup to normal hardware conditions and apply redo logs to ensure data consistency.
-    - With Logical Backup: Rebuild the database architecture and data on a new server using logical backup. First, install the database software, execute SQL statements from the logical backup to create tables, indexes, and more, and then import the data.
+1. **Operating system crash**:
+
+    - If there is a physical backup: after the crash, restore the entire database state from the physical backup, restore the backup to a normal hardware environment, and apply redo logs to ensure data consistency.
+    - If there is a logical backup: rebuild the database schema and data on a new server from the logical backup. Install the database software, execute the SQL statements in the logical backup to create tables, indexes, and so on, and then import the data.
+
-2. **Power (Detection) Failure**:
+2. **Power supply (detection) failure**:
-    - With Physical Backup: Recover the database using physical backup after a failure. Restore the backup to normal hardware conditions and apply redo logs for data consistency.
-    - With Logical Backup: Similarly, rebuild the database on a new server using logical backup.
+
+    - If there is a physical backup: after the failure, restore the database from the physical backup, restore the backup to a normal hardware environment, and apply redo logs to ensure data consistency.
+    - If there is a logical backup: likewise, rebuild the database on a new server from the logical backup.
+
-3. **File System Crash**:
-    - With Physical Backup: Use a physical backup to recover the database, restore the backup to normal hardware conditions, and apply redo logs for data consistency.
-    - With Logical Backup: After a crash, rebuild the database architecture and data on a new server.
+3. **File system crash**:
+
+    - If there is a physical backup: restore the database from the physical backup, restore the backup to a normal hardware environment, and apply redo logs to ensure data consistency.
+    - If there is a logical backup: after the crash, rebuild the database schema and data on a new server.
+
-4. **Hardware Issues (e.g., Hard Drive, Motherboard)**:
+4. **Hardware issues (hard drives, motherboards, etc.)**:
-    - With Physical Backup: Recover the database using physical backup, restoring the backup to new hardware conditions and applying redo logs for data consistency.
-    - With Logical Backup: Rebuild the database on new hardware using logical backup.
+
+    - If there is a physical backup: restore the database from the physical backup, restore the backup to a new hardware environment, and apply redo logs to ensure data consistency.
+    - If there is a logical backup: use the logical backup to rebuild the database in the new hardware environment.
+
-For backup and recovery, consider the following strategies:
+The following strategies can be followed for backup and recovery:
+
-1. **Backup Frequency**: Determine the frequency of backups, typically divided into full and incremental backups. Full backups consume more storage space and time but offer faster recovery, while incremental backups are more economical.
+1. **Backup frequency**: Determine the frequency of backups, usually divided into full and incremental backups. Full backups take more storage space and time but restore faster, while incremental backups are more economical.
+
-2. **Backup Storage**: Choose a secure backup storage location to ensure backup data is not easily compromised or lost. Typically, use offline storage media or cloud storage for backups.
+2. **Backup storage**: Choose a secure backup storage location to ensure the backup data is not vulnerable to corruption or loss. Offline storage media or cloud storage is often used to house backups.
+
-3. **Backup Retention Period**: Determine the retention period for backup data to facilitate historical data retrieval and recovery when needed. Establish data retention policies based on regulations and business requirements.
+3. **Backup retention period**: Determine the retention period for backup data so that historical data can be retrieved and recovered when needed. Develop appropriate data retention policies based on regulations and business needs.
+
-Regardless of the recovery scenario, follow these principles:
+Whatever the recovery scenario, the following principles should be followed:
+
-1. Consider stopping database operations to prevent data changes.
-2. Choose an appropriate backup for recovery based on backup type.
-3. Restore backup files.
-4. Consider applying corresponding redo logs to ensure data consistency.
-5. Start the database and perform the necessary testing.
+1. Consider stopping database operation to prevent data changes.
+2. Select the appropriate backup for recovery based on the backup type.
+3. Restore the backup file.
+4. Consider applying the appropriate redo logs to ensure data consistency.
+5. Start the database and perform the necessary tests.
+
-## Database Backup Methods
+## Database backup methods
+
-MatrixOne provides multiple backup methods, considering database requirements, performance, resources, and recovery time.
+MatrixOne offers a variety of backup methods that take into account factors such as database requirements, performance, resources, and recovery time.
+
-MatrixOne databases offer various backup tools to meet different scenarios and needs:
+The MatrixOne database provides multiple backup tools to meet different scenarios and needs:
+
-1. **modump**: Used for exporting data and schemas from the database. It generates recoverable SQL scripts for logical backups.
+1. **mo-dump**: Used to export data and schemas from a database. It generates recoverable SQL scripts for logical backups.
+
-2. **mo-backup**: Used for physical backup and recovery. `mo-backup` is a tool for physical backup and recovery of MatrixOne enterprise services, helping protect your MatrixOne database and perform reliable recovery operations when needed.
+2. **mo-backup**: Used for physical backup and recovery. `mo-backup` is a physical backup and recovery tool for MatrixOne enterprise-class services that helps you protect your MatrixOne databases and perform reliable recovery operations when needed.
+
-    !!! note
-        **mo-backup** is a physical backup and recovery tool for enterprise-level services. Contact your MatrixOne account manager for the tool download path and user guide.
+    !!! note
+        **mo-backup** is a physical backup and recovery tool for enterprise-level services; contact your MatrixOne account manager for the tool download path.

 ### Logical Backup and Recovery

-#### Using `SELECT INTO OUTFILE` for Backup
+#### Backup with `SELECT INTO OUTFILE`
+
-Use the `SELECT ... INTO OUTFILE` command to export retrieved data in a specific format to a file. The exported file is created by the MatrixOne service and is only available on the MatrixOne server host. Exporting to the client file system is not supported. Ensure that the export directory does not have files with the same name to avoid overwriting new files.
+Use the `SELECT ... INTO OUTFILE` command to export the retrieved data to a file in the specified format. The exported file is created by the MatrixOne service and exists only on the MatrixOne server host; exporting to the client file system is not supported. Make sure the export directory contains no files with the same name, to avoid overwriting.
+
-For more information on operational steps and examples, see [Export data by SELECT INTO](../../Develop/export-data/select-into-outfile.md).
+For operational steps and examples, see [`SELECT INTO...OUTFILE`](../../Develop/export-data/select-into-outfile.md).
+
-#### Using `modump` for Backup
+#### Backup with `mo-dump`
+
-MatrixOne supports logical backup using the `modump` tool, which generates SQL statements that can be used to recreate database objects and data.
+MatrixOne supports logical backups using the `mo-dump` tool, which generates SQL statements that can be used to recreate database objects and data.
+
-For more information on operational steps and examples, see [Export data by MODUMP](../../Develop/export-data/modump.md).
+For operational steps and examples, see the [`mo-dump` tool](../../Develop/export-data/modump.md).
+
-#### Using Command-Line Batch Import for Recovery
+#### Bulk Import Recovery Using the Command Line
+
-MatrixOne supports inserting many rows into database tables using the `LOAD DATA` command. It also supports importing table structures and data into the entire database using the' SOURCE' command.
+MatrixOne supports inserting large numbers of rows into database tables using the `LOAD DATA` command. It also supports importing table structures and data into the entire database using the `SOURCE` command.
+
-For more information, see [Bulk Load Overview](../../Develop/import-data/bulk-load/bulk-load-overview.md).
+For more information, see [Batch Import](../../Develop/import-data/bulk-load/bulk-load-overview.md).
+
+### Physical Backup and Recovery
+
+#### Backup and restore with `mo_br`
+
+MatrixOne supports regular physical and snapshot backups using the `mo_br` tool.
+
+See the [`mo_br` User Guide](../backup-restore/mobr-backup-restore/mobr.md) for steps and examples.
+
+#### Backup and Restore with SQL
+
+MatrixOne supports snapshot backup and recovery using SQL.
+
+Refer to the following documentation for snapshot backup and recovery with SQL:
+
+- [CREATE SNAPSHOT](../../Reference/SQL-Reference/Data-Definition-Language/create-snapshot.md)
+- [DROP SNAPSHOT](../../Reference/SQL-Reference/Data-Definition-Language/drop-snapshot.md)
+- [SHOW SNAPSHOTS](../../Reference/SQL-Reference/Data-Definition-Language/create-snapshot.md)
+- [RESTORE ACCOUNT](../../Reference/SQL-Reference/Data-Definition-Language/restore-account.md)
diff --git a/docs/MatrixOne/Maintain/backup-restore/key-concepts.md b/docs/MatrixOne/Maintain/backup-restore/key-concepts.md
index 91366f0e3..ce06c6fe6 100644
--- a/docs/MatrixOne/Maintain/backup-restore/key-concepts.md
+++ b/docs/MatrixOne/Maintain/backup-restore/key-concepts.md
@@ -1,48 +1,52 @@
-# Backup and Recovery Concepts
+# Backup and Recovery Related Concepts
+
-## Physical Backup vs. Logical Backup
+## Physical, snapshot and logical backups
+
-### Physical Backup
+### Physical backup
+
-Physical backup directly copies database files to a backup medium, such as tape or disk. This method involves copying all physical data blocks of the database, including data files, control files, and redo log files. The backed-up data is the binary data stored on disk, making recovery operations typically faster.
+Physical backup is the process of copying database files directly to backup media such as tape or hard drives. This method copies all physical blocks of the database to the backup media, including data files, control files, and redo log files. The backed-up data is the binary data actually stored on disk, so recovery operations are usually fast.
-### Logical Backup +### Snapshot backup -Logical backup involves backing up logical objects (e.g., tables, indexes, stored procedures) in the database using SQL statements. This backup method exports the definitions and data of logical objects to a backup file but does not include the binary data of database files. Although recovery may be slower, backup data is usually more readable and modifiable. +Database snapshot backup is a form of physical backup, but unlike traditional physical backup, it creates an instant copy of data by capturing a read-only static view of the database at a specific point in time. This backup method utilizes an incremental storage mechanism that records only blocks of data that have changed since the last snapshot, making efficient use of storage space. Snapshot backups support fast recovery because they provide a complete, consistent view of the database for data protection, report generation, analysis, and other scenarios that require data consistency. In addition, they typically rely on the snapshot capabilities of the underlying storage system and provide secure access to a copy of the data without affecting the normal operation of the database. -### Differences Between Physical and Logical Backup -The primary difference between physical and logical backup lies in the form of the backed-up data. Physical backup copies the binary data on disk, while logical backup backs up the definitions and data of analytical objects. The two methods have differences in backup speed, data size, and backup flexibility. +### Logical backup -## Full Backup, Incremental Backup, and Differential Backup +A logical backup is a backup of logical objects (such as tables, indexes, and stored procedures) in a database through SQL statements. This backup method exports definitions and data for logical objects to a backup file, but does not involve the binary data of database files. While recovery is slower, backup data is often easier to read and modify.
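The copy-on-write mechanism behind snapshot backups can be sketched in a few lines. This is a conceptual Python model only — not MatrixOne's implementation — showing how a snapshot saves a page only at the moment the live database is about to overwrite it:

```python
class SnapshotDB:
    """Toy copy-on-write model: a snapshot stores a page only when
    the live database is about to overwrite that page."""

    def __init__(self, pages):
        self.pages = dict(pages)   # live state: page_id -> content
        self.snapshots = []        # each snapshot: {page_id: pre-change content}

    def snapshot(self):
        snap = {}                  # starts empty: nothing has changed yet
        self.snapshots.append(snap)
        return snap

    def write(self, page_id, content):
        # Copy-on-write: preserve the old page in every snapshot
        # that has not yet saved its own copy of it.
        for snap in self.snapshots:
            if page_id not in snap:
                snap[page_id] = self.pages[page_id]
        self.pages[page_id] = content

    def read(self, snap):
        # Snapshot view = saved pre-change pages, plus unchanged live pages.
        return {**self.pages, **snap}


db = SnapshotDB({"p1": "a", "p2": "b"})
snap = db.snapshot()
db.write("p1", "a2")       # only now is the old "p1" copied aside
print(db.read(snap))       # the snapshot still sees the state at snapshot time
print(db.pages)            # the live database sees the new state
```

Note how the snapshot grows only as the source database changes, which is why snapshot files start small and must be monitored as writes accumulate.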
-## Full Backup, Incremental Backup, and Differential Backup +### The differences -### Full Backup +Physical backup, logical backup, and snapshot backup are three different data protection strategies: physical backup creates a full copy of the database by directly replicating its storage files, for rapid recovery and large-scale data migration; logical backup exports the logical structure of the database, such as SQL statements, storing data and structure as text to ease migration across platforms and versions; snapshot backup is a read-only view of the database at a given point in time that uses incremental storage to record changes, enabling rapid recovery to a specific point-in-time state, and often depends on storage system support. -Full backup involves backing up the entire dataset to storage devices, including all files and folders. While it is typically time-consuming and requires substantial storage space, it can restore data without relying on other backup files. +## Full, incremental, and differential backups -### Incremental Backup +### Full backup -Incremental backup only backs up recently changed or added files or data blocks. It backs up changes made since the last backup, typically consuming less backup time and storage space. Incremental backups are often performed between regular full backups to ensure the backup data stays up-to-date. You need all incremental backups and the latest full backup to restore data. +Full backup backs up the entire data set to the storage device, including all files and folders. Although it is often time consuming and requires large storage space, data can be fully restored from it without additional backup files. -### Differential Backup +### Incremental backup -Differential backup backs up data that has changed since the last full backup, resulting in more extensive backup data and longer backup times than incremental backups.
When restoring data, you only need to restore the latest full and differential backups. As it does not depend on previous backup data, the recovery process is relatively simpler, and backup and restoration times are shorter. +Incremental backups back up only recently changed or added files or data blocks. They cover only the changes made since the last backup, so they usually require less backup time and storage. Incremental backups are usually performed between regular full backups to keep backup data up to date. Restoring data requires the most recent full backup plus all subsequent incremental backups. -### Differences Between Full, Incremental, and Differential Backup +### Differential backup -- Full backup provides a complete data backup but requires more time and storage space. -- Incremental backup is suitable for environments with minimal data changes, saving backup storage space and time but requiring reliance on previous backup data. -- Differential backup suits environments with significant data changes, resulting in more extensive backup data, shorter recovery times, and a relatively straightforward backup and recovery process. +Differential backups back up the data that has changed since the last full backup, so the backup set is larger and takes longer to create than an incremental backup. When restoring data, simply restore the most recent full backup and then the most recent differential backup. Because it does not depend on a chain of earlier backups, the recovery process is relatively simple and recovery times are shorter. + +### Differences between full, incremental, and differential backups + +- Full backup provides a complete data backup, but requires more time and storage. +- Incremental backups are suitable for environments with few data changes, saving backup storage space and time, but recovery takes longer and depends on the chain of earlier backups.
+- Differential backup is suitable for environments with more data changes; its backup sets are larger, but recovery is faster than with incremental backup and the backup and recovery process is relatively simple. ## Recovery ### Physical Recovery -Physical recovery involves database recovery using physical backups. It is typically used in severe failures such as disk failures, operating system failures, or file system failures. In physical recovery, specialized recovery tools or software read backup files or storage media's actual data blocks and attempt to repair damaged blocks. +Physical recovery is database recovery using physical backups. It is typically used for severe failures such as hard disk, operating system, or file system failures. During physical recovery, specialized recovery tools or software read the actual data blocks from the backup files or storage media and attempt to repair the damaged blocks. -Physical recovery can quickly restore the database without executing SQL statements or other high-level operations. Additionally, physical recovery can restore all database objects, including tables, indexes, stored procedures, and more. +Physical recovery has the advantage of restoring the database quickly, because it processes data blocks directly without executing SQL statements or other high-level operations. In addition, physical recovery restores all database objects, including tables, indexes, stored procedures, and more. -### Complete Recovery vs. Incomplete Recovery +### Full versus incomplete recovery -- Complete Recovery: Applies all redo logs from the backup set, restoring data to a point where all logs are committed. -- Incomplete Recovery: Applies some redo logs or incremental backups, restoring data to a specific time defined by the backup redo logs. +- Full Recovery: Apply all redo logs from the backup set, restoring the data to the state in which all logs in the backup were committed.
+- Incomplete Recovery: Apply only part of the redo logs or incremental backups from the backup set, restoring the database to a specific point in time covered by the backup redo logs. diff --git a/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr-physical-backup-restore.md b/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr-physical-backup-restore.md new file mode 100644 index 000000000..551a57e46 --- /dev/null +++ b/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr-physical-backup-restore.md @@ -0,0 +1,254 @@ +## Overview of principles + +A regular physical backup of a database is a direct copy of the database's physical storage files, including data files, log files, and control files, to create a separate copy of the database. This process is usually performed at the file system level and can be achieved by operating system commands. The resulting backup is a full backup of the database, containing all the data and objects. Backup files can be stored on multiple media and can be compressed and encrypted to save space and improve security. On recovery, these files can be copied directly to the desired location to quickly restore the entire database. In addition, physical backups support cross-platform migration for disaster recovery and database migration scenarios, but may require more storage space and time. + +A full backup is a backup process that backs up all data in a database. It creates a full copy of the database, which usually requires more storage space and takes longer to complete. Because it contains all the data, a full backup is simpler to restore: the database can be brought back directly to the state it was in at backup time. + +An incremental backup backs up only the data that has changed since the last backup. It replicates only the blocks or data files modified between backups, so backup sets are typically smaller and faster to create.
Incremental backups can save storage space and backup time, but restores can be more complex because a series of incremental backups must be applied in order to reach the latest state. + +MatrixOne supports full and incremental physical backup and restore using `mo_br`. + +!!! note + `mo_br` is the backup and recovery tool for enterprise-level services; you need to contact your MatrixOne account manager for the tool download path. + +## Examples + +### Example 1 Full Backup Recovery + +- Connect to MatrixOne and create databases db1 and db2. + +```sql +create database db1; +create database db2; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db1 | +| db2 | +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| system | +| system_metrics | ++--------------------+ +9 rows in set (0.00 sec) +``` + +- Create a full backup + +``` +./mo_br backup --host "127.0.0.1" --port 6001 --user "dump" --password "111" --backup_dir "filesystem" --path "/Users/admin/soft/backuppath/syncback1" + +Backup ID + 25536ff0-126f-11ef-9902-26dd28356ef3 + +./mo_br list ++--------------------------------------+-------+----------------------------------------+---------------------------+--------------+---------------------------+-----------------------+------------+ +| ID | SIZE | PATH | AT TIME | DURATION | COMPLETE TIME | BACKUPTS | BACKUPTYPE | ++--------------------------------------+-------+----------------------------------------+---------------------------+--------------+---------------------------+-----------------------+------------+ +| 25536ff0-126f-11ef-9902-26dd28356ef3 | 65 MB | BackupDir: filesystem Path: | 2024-05-15 11:56:44 +0800 | 8.648091083s | 2024-05-15 11:56:53 +0800 | 1715745404915410000-1 | full | +| | | /Users/admin/soft/backuppath/syncback1 | | | | | |
++--------------------------------------+-------+----------------------------------------+---------------------------+--------------+---------------------------+-----------------------+------------+ +``` + +- Connect to MatrixOne, drop database db1, and create database db3. + +```sql +drop database db1; +create database db3; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db2 | +| db3 | +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| system | +| system_metrics | ++--------------------+ +9 rows in set (0.00 sec) +``` + +- Stop the MatrixOne service, delete the mo-data directory, and restore the backup + +``` +mo_ctl stop +rm -rf /Users/admin/soft/matrixone/mo-data + +./mo_br restore 25536ff0-126f-11ef-9902-26dd28356ef3 --restore_dir filesystem --restore_path "/Users/admin/soft/matrixone" +From: + BackupDir: filesystem + Path: /Users/admin/soft/backuppath/syncback1 + +To + BackupDir: filesystem + Path: /Users/admin/soft/matrixone + +TaePath + ./mo-data/shared +restore tae file path ./mo-data/shared, parallelism 1, parallel count num: 1 +restore file num: 1, total file num: 733, cost : 549µs +Copy tae file 1 + 018f7a41-1881-7999-bbd6-858c3d4acc18_00000 => mo-data/shared/018f7a41-1881-7999-bbd6-858c3d4acc18_00000 + ... +``` + +- Start MatrixOne and check the recovery + +``` +mo_ctl start +``` + +```sql +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db1 | +| db2 | +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| system | +| system_metrics | ++--------------------+ +9 rows in set (0.00 sec) +``` + +As you can see, the recovery was successful.
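In scripts, the full-backup step above is usually followed by capturing the printed Backup ID so it can be passed to later `restore` or `list` calls. A small hedged helper — assuming only the two-line `Backup ID` output format shown above — might look like:

```python
import re


def parse_backup_id(output: str) -> str:
    """Extract the UUID printed after the 'Backup ID' header in mo_br output."""
    m = re.search(r"Backup ID\s+([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})", output)
    if not m:
        raise ValueError("no backup ID found in mo_br output")
    return m.group(1)


# Sample taken from the mo_br output shown above.
sample = """Backup ID
  25536ff0-126f-11ef-9902-26dd28356ef3
"""
backup_id = parse_backup_id(sample)
print(backup_id)
```

A wrapper script could then feed the captured ID to `./mo_br restore <ID> ...`, avoiding manual copy-paste of the UUID.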
+ +### Example 2 Incremental Backup Recovery + +- Connect to MatrixOne and create databases db1 and db2 + +```sql +create database db1; +create database db2; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db1 | +| db2 | +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| system | +| system_metrics | ++--------------------+ +9 rows in set (0.00 sec) +``` + +- Create a full backup + +``` +./mo_br backup --host "127.0.0.1" --port 6001 --user "dump" --password "111" --backup_dir "filesystem" --path "/Users/admin/soft/backuppath/syncback2" + +Backup ID + 2289638c-1284-11ef-85e4-26dd28356ef3 +``` + +- Create an incremental backup based on the above full backup + +``` +./mo_br backup --host "127.0.0.1" --port 6001 --user "dump" --password "111" --backup_dir "filesystem" --path "/Users/admin/soft/backuppath/syncback2" --backup_type "incremental" --base_id "2289638c-1284-11ef-85e4-26dd28356ef3" + +Backup ID + 81531c5a-1284-11ef-9ba3-26dd28356ef3 + +./mo_br list ++--------------------------------------+-------+----------------------------------------+---------------------------+--------------+---------------------------+-----------------------+-------------+ +| ID | SIZE | PATH | AT TIME | DURATION | COMPLETE TIME | BACKUPTS | BACKUPTYPE | ++--------------------------------------+-------+----------------------------------------+---------------------------+--------------+---------------------------+-----------------------+-------------+ +| 2289638c-1284-11ef-85e4-26dd28356ef3 | 70 MB | BackupDir: filesystem Path: | 2024-05-15 14:26:59 +0800 | 9.927034917s | 2024-05-15 14:27:09 +0800 | 1715754419668571000-1 | full | +| | | /Users/admin/soft/backuppath/syncback2 | | | | | | +| 81531c5a-1284-11ef-9ba3-26dd28356ef3 | 72 MB | BackupDir: filesystem Path: | 2024-05-15 14:29:38 +0800 | 2.536263666s | 2024-05-15 14:29:41 +0800 | 1715754578690660000-1 | incremental | +| | | /Users/admin/soft/backuppath/syncback2 | | | | | |
++--------------------------------------+-------+----------------------------------------+---------------------------+--------------+---------------------------+-----------------------+-------------+ +``` + +Comparing the time consumption of incremental and full backups, you can see that incremental backups take less time. + +- Connect to MatrixOne, drop database db1, and create database db3. + +```sql +drop database db1; +create database db3; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db2 | +| db3 | +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| system | +| system_metrics | ++--------------------+ +9 rows in set (0.00 sec) +``` + +- Stop the MatrixOne service, delete the mo-data directory, and restore the backup + +``` +mo_ctl stop +rm -rf /Users/admin/soft/matrixone/mo-data + +./mo_br restore 81531c5a-1284-11ef-9ba3-26dd28356ef3 --restore_dir filesystem --restore_path "/Users/admin/soft/matrixone" +2024/05/15 14:35:27.910925 +0800 INFO malloc/malloc.go:43 malloc {"max buffer size": 2147483648, "num shards": 8, "classes": 23, "min class size": 128, "max class size": 1048576, "buffer objects per class": 22} +From: + BackupDir: filesystem + Path: /Users/admin/soft/backuppath/syncback2 + +To + BackupDir: filesystem + Path: /Users/admin/soft/matrixone + +TaePath + ./mo-data/shared +... +``` + +- Start MatrixOne and check the recovery + +``` +mo_ctl start +``` + +```sql +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db1 | +| db2 | +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| system | +| system_metrics | ++--------------------+ +9 rows in set (0.00 sec) +``` + +As you can see, the recovery was successful.
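Conceptually, restoring the incremental backup above means replaying the base full backup first and then each incremental in order, so later block versions win. A minimal sketch of that restore chain (illustrative only, not mo_br's internals):

```python
def restore(full_backup, incrementals):
    """Replay a full backup, then each incremental in order.

    full_backup: complete block map (block id -> content).
    incrementals: list of partial block maps, oldest first, each holding
    only the blocks changed since the previous backup.
    """
    state = dict(full_backup)       # start from the complete copy
    for inc in incrementals:
        state.update(inc)           # later changes overwrite earlier blocks
    return state


full = {"blk1": "v1", "blk2": "v1"}
inc1 = {"blk2": "v2"}               # blk2 changed after the full backup
inc2 = {"blk1": "v2"}               # blk1 changed after inc1
print(restore(full, [inc1, inc2]))
```

This ordering requirement is why restoring an incremental backup is more complex than restoring a full one: every incremental in the chain back to the base full backup must be present and applied in sequence.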
diff --git a/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr-snapshot-backup-restore.md b/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr-snapshot-backup-restore.md new file mode 100644 index 000000000..26098c654 --- /dev/null +++ b/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr-snapshot-backup-restore.md @@ -0,0 +1,288 @@ +# mo_br Tool for Snapshot Backup and Recovery + +## Snapshot Backup Recovery Implementation Principles + +Database snapshot backup and restore works by creating a read-only static view of the database at a specific point in time, known as a snapshot. Snapshots utilize the storage system's copy-on-write (COW) technology to copy and store the original data page only before it is modified, creating a consistent copy of the database as of the time the snapshot was created. When you need to recover data, you can pick the data from the snapshot and copy or restore it to a new or existing database. Snapshot files are initially small and grow as the source database changes, so their size needs to be monitored and managed as necessary. Snapshots must be on the same server instance as the source database, and since they are read-only, writes cannot be made directly on them. Note that snapshot recovery operations overwrite the current data, so caution is required. + +## Application scenarios + +Database snapshots are a powerful tool to improve database availability and performance in multiple scenarios. Here are some application scenarios for snapshots: + +- **Data Backup and Recovery**: Snapshots can be used as a way to back up a database, allowing a read-only copy of the database to be created for data backup and recovery without stopping the database service. + +- **Reports and Data Analysis**: Snapshots can be used to avoid impacting online transactions when databases are required to remain static for report generation or data analysis.
+ +- **Development and testing**: Before developing a new feature or testing a system, a copy of the database can be created by snapshot so that testing can be done without affecting the production environment. + +- **Data migration**: During data migration, snapshots can be used to ensure data consistency and avoid data changes during migration. + +- **High-risk operational protection**: Before performing operations that may have an impact on database stability, such as database upgrades or structural changes, snapshots can be created so that the database can be quickly restored if the operation fails. + +## MatrixOne support for snapshots + +MatrixOne supports two ways to perform tenant-level snapshot backup and restore: + +- SQL statements +- the `mo_br` tool + +This document focuses on using `mo_br` for tenant-level snapshot backup and restore. + +!!! note + `mo_br` is the backup and recovery tool for enterprise-level services; you need to contact your MatrixOne account manager for the tool download path. + +## Prepare before you start + +- Completed the [standalone deployment](../../../Get-Started/install-standalone-matrixone.md) of MatrixOne + +- Deployed the `mo_br` tool + +## Examples + +### Example 1 Table Level Recovery + +- Connect to the MatrixOne sys tenant and execute the table creation statements + +```sql +create database if not exists snapshot_read; +use snapshot_read; +create table test_snapshot_read (a int); +INSERT INTO test_snapshot_read (a) VALUES(1), (2), (3), (4), (5),(6), (7), (8), (9), (10), (11), (12),(13), (14), (15), (16), (17), (18), (19), (20),(21), (22), (23), (24), (25), (26), (27), (28), (29), (30),(31), (32), (33), (34), (35), (36), (37), (38), (39), (40),(41), (42), (43), (44), (45), (46), (47), (48), (49), (50),(51), (52), (53), (54), (55), (56), (57), (58), (59), (60),(61), (62), (63), (64), (65), (66), (67), (68), (69), (70),(71), (72), (73), (74), (75), (76), (77), (78), (79), (80), (81), (82), (83), (84), (85), (86), (87), (88), (89), (90),(91), (92), (93), (94),
(95), (96), (97), (98), (99), (100); + +mysql> select count(*) from snapshot_read.test_snapshot_read; ++----------+ +| count(*) | ++----------+ +| 100 | ++----------+ +``` + +- Create a snapshot + +``` +./mo_br snapshot create --host "127.0.0.1" --port 6001 --user "dump" --password "111" --level "account" --sname "sp_01" --account "sys" + +./mo_br snapshot show --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" +SNAPSHOT NAME TIMESTAMP SNAPSHOT LEVEL ACCOUNT NAME DATABASE NAME TABLE NAME +sp_01 2024-05-10 02:06:08.01635 account sys +``` + +- Connect to the MatrixOne sys tenant and delete some of the data in the table. + +```sql +delete from snapshot_read.test_snapshot_read where a <= 50; + +mysql> select count(*) from snapshot_read.test_snapshot_read; ++----------+ +| count(*) | ++----------+ +| 50 | ++----------+ +``` + +- Restore the table to the same tenant at table level + +``` +./mo_br snapshot restore --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" --db "snapshot_read" --table "test_snapshot_read" --sname "sp_01" +``` + +- Connect to the MatrixOne sys tenant and verify the recovery + +```sql +mysql> select count(*) from snapshot_read.test_snapshot_read; ++----------+ +| count(*) | ++----------+ +| 100 | ++----------+ +``` + +### Example 2 Database Level Recovery + +- Connect to the MatrixOne sys tenant and execute the SQL statements + +```sql +create database if not exists snapshot_read; +use snapshot_read; +create table test_snapshot_read (a int); +INSERT INTO test_snapshot_read (a) VALUES(1), (2), (3), (4), (5),(6), (7), (8), (9), (10), (11), (12),(13), (14), (15), (16), (17), (18), (19), (20),(21), (22), (23), (24), (25), (26), (27), (28), (29), (30),(31), (32), (33), (34), (35), (36), (37), (38), (39), (40),(41), (42), (43), (44), (45), (46), (47), (48), (49), (50),(51), (52), (53), (54), (55), (56), (57), (58), (59), (60),(61), (62), (63), (64), (65), (66), (67), (68), (69), (70),(71), (72), (73), (74), (75), (76), (77), (78), (79), (80),
(81), (82), (83), (84), (85), (86), (87), (88), (89), (90),(91), (92), (93), (94), (95), (96), (97), (98), (99), (100); +create table test_snapshot_read_1(a int); +INSERT INTO test_snapshot_read_1 (a) VALUES(1), (2), (3), (4), (5),(6), (7), (8), (9), (10), (11), (12),(13), (14), (15), (16), (17), (18), (19), (20),(21), (22), (23), (24), (25), (26), (27), (28), (29), (30),(31), (32), (33), (34), (35), (36), (37), (38), (39), (40),(41), (42), (43), (44), (45), (46), (47), (48), (49), (50),(51), (52), (53), (54), (55), (56), (57), (58), (59), (60),(61), (62), (63), (64), (65), (66), (67), (68), (69), (70),(71), (72), (73), (74), (75), (76), (77), (78), (79), (80), (81), (82), (83), (84), (85), (86), (87), (88), (89), (90),(91), (92), (93), (94), (95), (96), (97), (98), (99), (100); + +mysql> select count(*) from snapshot_read.test_snapshot_read; ++----------+ +| count(*) | ++----------+ +| 200 | ++----------+ +1 row in set (0.00 sec) + +mysql> select count(*) from snapshot_read.test_snapshot_read_1; ++----------+ +| count(*) | ++----------+ +| 100 | ++----------+ +1 row in set (0.01 sec) +``` + +- Create a snapshot + +``` +./mo_br snapshot create --host "127.0.0.1" --port 6001 --user "dump" --password "111" --level "account" --sname "sp_02" --account "sys" + +./mo_br snapshot show --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" +SNAPSHOT NAME TIMESTAMP SNAPSHOT LEVEL ACCOUNT NAME DATABASE NAME TABLE NAME +sp_02 2024-05-10 02:47:15.638519 account sys +``` + +- Connect to the MatrixOne sys tenant and delete some data + +```sql +delete from snapshot_read.test_snapshot_read where a <= 50; +delete from snapshot_read.test_snapshot_read_1 where a >= 50; + +mysql> select count(*) from snapshot_read.test_snapshot_read; ++----------+ +| count(*) | ++----------+ +| 100 | ++----------+ +1 row in set (0.00 sec) + +mysql> select count(*) from snapshot_read.test_snapshot_read_1; ++----------+ +| count(*) | ++----------+ +| 49 | ++----------+ +1 row in set
(0.01 sec) +``` + +- Restore the database to the same tenant at database level + +``` +./mo_br snapshot restore --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" --db "snapshot_read" --sname "sp_02" +``` + +- Connect to the MatrixOne sys tenant and verify the recovery + +```sql +mysql> select count(*) from snapshot_read.test_snapshot_read; ++----------+ +| count(*) | ++----------+ +| 200 | ++----------+ +1 row in set (0.00 sec) + +mysql> select count(*) from snapshot_read.test_snapshot_read_1; ++----------+ +| count(*) | ++----------+ +| 100 | ++----------+ +1 row in set (0.00 sec) +``` + +### Example 3 Tenant Level Recovery + +- Connect to the MatrixOne sys tenant and execute the SQL statement + +```sql +create database if not exists snapshot_read; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| snapshot_read | +| system | +| system_metrics | ++--------------------+ +8 rows in set (0.00 sec) +``` + +- Create a snapshot + +``` +./mo_br snapshot create --host "127.0.0.1" --port 6001 --user "dump" --password "111" --level "account" --sname "sp_03" --account "sys" + +./mo_br snapshot show --host "127.0.0.1" --port 6001 --user "dump" --password "111" +SNAPSHOT NAME TIMESTAMP SNAPSHOT LEVEL ACCOUNT NAME DATABASE NAME TABLE NAME +sp_03 2024-05-11 03:20:16.065685 account sys +``` + +- Connect to the MatrixOne sys tenant and drop the database + +```sql +drop database snapshot_read; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| system | +| system_metrics | ++--------------------+ +7 rows in set (0.01 sec) +``` + +- Restore to the same tenant at tenant level + +``` +./mo_br snapshot restore --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" --sname "sp_03" +``` + +- Restore to a new tenant at tenant level + +```
+./mo_br snapshot restore --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" --sname "sp_03" --new_account "acc2" --new_admin_name "admin" --new_admin_password "111"; +``` + +- Connect to the MatrixOne sys tenant and verify the recovery + +```sql +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mo_debug | +| mo_task | +| mysql | +| snapshot_read | +| system | +| system_metrics | ++--------------------+ +8 rows in set (0.00 sec) +``` + +- Connect to the new tenant acc2 and verify the recovery + +```sql +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mysql | +| snapshot_read | +| system | +| system_metrics | ++--------------------+ +6 rows in set (0.00 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr.md b/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr.md new file mode 100644 index 000000000..8fc14bfc5 --- /dev/null +++ b/docs/MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr.md @@ -0,0 +1,673 @@ +# mo_br Backup and Recovery + +Database physical backup and snapshot backup are two important data protection strategies that play an important role in many scenarios. Physical backups enable fast and complete database recovery by replicating the physical files of the database, such as data files and log files, and are especially suitable for overall database migration or disaster recovery situations. Snapshot backup, on the other hand, provides a fast and storage-efficient way of backing up data by recording its status at a specific point in time, for scenarios that require point-in-time recovery or read-only query operations, such as report generation or data analysis.
Combining physical backups, whose recovery can take longer, with snapshot backups, which provide fast data access, gives the database comprehensive protection, ensuring data security and business continuity. + +MatrixOne supports regular physical and snapshot backups via the `mo_br` utility. This section describes how `mo_br` is used. + +!!! note + `mo_br` is the physical backup and recovery tool for enterprise-level services; you need to contact your MatrixOne account manager for the tool download path. + +## Reference Command Guide + +`help` - print the reference guide + +``` +./mo_br help +the backup and restore tool for the matrixone + +Usage: + mo_br [flags] + mo_br [command] + +Available Commands: + backup backup the matrixone data + check check the backup + completion Generate the autocompletion script for the specified shell + delete delete the backup + help Help about any command + list search the backup + restore restore the matrixone data + snapshot Manage snapshots + +Flags: + --config string config file (default "./mo_br.toml") + -h, --help help for mo_br + --log_file string log file (default "console") + --log_level string log level (default "error") + +Use "mo_br [command] --help" for more information about a command. +``` + +## Physical backup + +### Create a backup + +#### Syntax structure + +``` +mo_br backup + --host + --port + --user + --password + --backup_dir s3|filesystem + //s3 oss minio + --endpoint + --access_key_id + --secret_access_key + --bucket + --filepath + --region + --compression + --role_arn + --is_minio + --parallelism + //filesystem + --path + --parallelism + --meta_path + //incremental backup required + --backup_type + --base_id +``` + +**Parameter description** + +| Parameter | Description | +| ---- | ---- | +|host | IP address of the target MatrixOne| +|port|Port number| +|user| Username| +|password | User's password| +|backup_dir | Destination path type for backups.
s3 or filesystem| +|endpoint| URL of the service to back up to s3| +|access_key_id| Access key ID for backup to s3| +|secret_access_key| Secret access key for backup to s3| +|bucket| The s3 bucket the backup needs to access| +|filepath| Relative file path for the backup on s3| +|region| Region of the s3 object storage service| +|compression| Compression format of the files backed up to s3| +|role_arn| Resource name of the role used for backup to s3| +|is_minio| Specifies whether the s3 target is MinIO| +|path| Local file system backup path| +|parallelism|Degree of parallelism| +|meta_path|Specifies the location of the meta file. It can only be a path in the file system. If not specified, the default is the mo_br.meta file in the same directory.| +|backup_type|Specifies the backup type; set to `incremental` for an incremental backup.| +|base_id|ID of the previous backup, mainly used to determine the timestamp of the previous backup.| + +#### Examples + +- Full backup to local file system + +``` +./mo_br backup --host "127.0.0.1" --port 6001 --user "dump" --password "111" --backup_dir "filesystem" --path "yourpath" +``` + +- Full backup to minio + +``` +./mo_br backup --host "127.0.0.1" --port 6001 --user "dump" --password "111" --backup_dir "s3" --endpoint "http://127.0.0.1:9000" --access_key_id "S0kwLuB4JofVEIAxxxx" --secret_access_key "X24O7t3hccmqUZqvqvmLN8464E2Nbr0DWOu9xxxx" --bucket "bucket1" --filepath "/backup1" --is_minio +``` + +- Incremental backup to local file system + +``` +./mo_br backup --host "127.0.0.1" --port 6001 --user "dump" --password "111" --backup_dir "filesystem" --path "yourpath" --backup_type "incremental" --base_id "xxx" +``` + +### View backups + +#### Syntax structure + +``` +mo_br list + -- ID + // To query the backup data.
If the backup is on s3 (oss, minio), you also need to specify:
    --access_key_id
    --secret_access_key
    --not_check_data
    --meta_path
```

**Parameter description**

| parameters | clarification |
| ---- | ---- |
| ID | ID of the backup|
|access_key_id | Access key ID of the backup on s3|
|secret_access_key | Secret access key of the backup on s3|
|not_check_data | Only view the information in the meta file, without checking the backup data. Without this flag, the backed-up files are checked; currently this only checks whether each backup file exists.|
|meta_path | Specifies the location of the meta file. If not specified, the default is the mo_br.meta file in the same directory.|

#### Examples

- View a list of all backups

```
./mo_br list
+--------------------------------------+--------+--------------------------------+---------------------------+--------------+---------------------------+
| ID | SIZE | PATH | AT TIME | DURATION | COMPLETE TIME |
+--------------------------------------+--------+--------------------------------+---------------------------+--------------+---------------------------+
| 4d21b228-10dd-11ef-9497-26dd28356ef2 | 586 kB | BackupDir: filesystem Path: | 2024-05-13 12:00:12 +0800 | 1.700945333s | 2024-05-13 12:00:13 +0800 |
| | | /Users/admin/soft/backup | | | |
| 01108122-10f9-11ef-9359-26dd28356ef2 | 8.3 MB | BackupDir: filesystem Path: | 2024-05-13 15:18:28 +0800 | 3.394437375s | 2024-05-13 15:18:32 +0800 |
| | | /Users/admin/soft/backup | | | |
+--------------------------------------+--------+--------------------------------+---------------------------+--------------+---------------------------+
```

- View the backup with the specified ID; when an ID is given, `list` also checks the backed-up files.

```
./mo_br list 4d21b228-10dd-11ef-9497-26dd28356ef2
+--------------------------------------+--------+--------------------------------+---------------------------+--------------+---------------------------+
| ID | SIZE | PATH | AT TIME | DURATION | COMPLETE TIME |
+--------------------------------------+--------+--------------------------------+---------------------------+--------------+---------------------------+
| 4d21b228-10dd-11ef-9497-26dd28356ef2 | 586 kB | BackupDir: filesystem Path: | 2024-05-13 12:00:12 +0800 | 1.700945333s | 2024-05-13 12:00:13 +0800 |
| | | /Users/admin/soft/backup | | | |
+--------------------------------------+--------+--------------------------------+---------------------------+--------------+---------------------------+

Checking the backup data(currently,no checksum)...

check: /backup_meta
check: /mo_meta
check: hakeeper/hk_data
check: tae/tae_list
check: tae/tae_sum
check: config/log.toml_018f70d1-3100-7762-b28b-8f85ac4ed3cd
check: config/tn.toml_018f70d1-310e-78fc-ac96-aa5e06981bd7
...
```

### Delete Backup

#### Syntax structure

```
mo_br delete ID
    //To delete backup data. If the backup is on s3 (oss, minio), you also need to specify:
    --access_key_id
    --secret_access_key
    --not_delete_data
    --meta_path
```

**Parameter description**

| parameters | clarification |
| ---- | ---- |
| ID | ID of the backup to be deleted|
|access_key_id | Access key ID of the backup on s3|
|secret_access_key | Secret access key of the backup on s3|
|not_delete_data | Only the information in the meta file is deleted; the backup data itself is not deleted.|
|meta_path | Specifies the location of the meta file. If not specified, the default is the mo_br.meta file in the same directory.|

#### Examples

- Delete a local file system backup

```
./mo_br delete e4cade26-3139-11ee-8631-acde48001122
```

- Delete a backup on MinIO

```
./mo_br delete e4cade26-3139-11ee-8631-acde48001122 --access_key_id "S0kwLuB4JofVEIAxWTqf" --secret_access_key "X24O7t3hccmqUZqvqvmLN8464E2Nbr0DWOu9Qs5A"
```

### Restore Backup

#### Syntax structure

- Restore a backup with the specified ID

```
mo_br restore ID
    //Reads the backup data with the specified ID. If the backup is on s3 (oss, minio), you also need to specify:
    --backup_access_key_id
    --backup_secret_access_key

    //Destination path to restore to (restore_directory)
    --restore_dir s3|filesystem
    //s3
    --restore_endpoint
    --restore_access_key_id
    --restore_secret_access_key
    --restore_bucket
    --restore_filepath
    --restore_region
    --restore_compression
    --restore_role_arn
    --restore_is_minio
    //filesystem
    --restore_path
    --dn_config_path
    --meta_path
    --checksum
    --parallelism
```

**Parameter description**

| parameters | clarification |
| ---- | ---- |
|ID | ID of the backup to restore|
|backup_access_key_id | Access key ID of the backup on s3|
|backup_secret_access_key | Secret access key of the backup on s3|
|restore_dir | Destination path type for the restore: s3 or filesystem. Used when specifying the restore destination.|
|restore_endpoint | URL of the s3 service to restore to|
|restore_access_key_id | Access key ID for the s3 restore target|
|restore_secret_access_key | Secret access key for the s3 restore target|
|restore_bucket | s3 bucket the restore needs access to|
|restore_filepath | Relative file path of the restore on s3|
|restore_region | Object storage service region of the s3 restore target|
|restore_compression | Compression format of the files restored to s3|
|restore_role_arn | Resource name of the role used for the s3 restore|
|restore_is_minio | Specifies whether the s3 restore target is MinIO|
|restore_path | Local path to restore to|
|dn_config_path | dn configuration path|
|meta_path | Specifies the location of the meta file. It can only be a path in the file system.
If not specified, the default is the mo_br.meta file in the same directory.|
|checksum | Parallelism of tae file replication during the restore; default is 1|
|parallelism | Degree of parallelism|

- Restore without specifying a backup ID

```
//Restore
mo_br restore
    --backup_dir s3|filesystem //Path type where the backup is located. Used when specifying the backup location.
    //s3
    --backup_endpoint
    --backup_access_key_id
    --backup_secret_access_key
    --backup_bucket
    --backup_filepath
    --backup_region
    --backup_compression
    --backup_role_arn
    --backup_is_minio
    //filesystem
    --backup_path
    //Destination path to restore to (restore_directory)
    --restore_dir s3|filesystem //Destination path type for the restore. Used when specifying the restore destination.
    //s3
    --restore_endpoint
    --restore_access_key_id
    --restore_secret_access_key
    --restore_bucket
    --restore_filepath
    --restore_region
    --restore_compression
    --restore_role_arn
    --restore_is_minio
    //filesystem
    --restore_path
    --dn_config_path
    --meta_path
    --checksum
    --parallelism
```

**Parameter description**

| parameters | clarification |
| ---- | ---- |
|backup_dir | Path type where the backup is located: s3 or filesystem|
|backup_endpoint | URL of the s3 service where the backup resides|
|backup_access_key_id | Access key ID of the backup on s3|
|backup_secret_access_key | Secret access key of the backup on s3|
|backup_bucket | s3 bucket where the backup resides|
|backup_filepath | Relative file path of the backup on s3|
|backup_region | Object storage service region of the backup on s3|
|backup_compression | Compression format of the backed-up files on s3|
|backup_role_arn | Resource name of the role used for the backup on s3|
|backup_is_minio | Specifies whether the backup s3 is MinIO|
|backup_path | Local backup path|
|restore_dir | Destination path type for the restore: s3 or filesystem|
|restore_endpoint | URL of the s3 service to restore to|
|restore_access_key_id | Access key ID for the s3 restore target|
|restore_secret_access_key | Secret access key for the s3 restore target|
|restore_bucket | s3 bucket the restore needs access to|
|restore_filepath | Relative file path of the restore on s3|
|restore_region | Object storage service region of the s3 restore target|
|restore_compression | Compression format of the files restored to s3|
|restore_role_arn | Resource name of the role used for the s3 restore|
|restore_is_minio | Specifies whether the s3 restore target is MinIO|
|restore_path | Local path to restore to|
|dn_config_path | dn configuration path|
|meta_path | Specifies the location of the meta file. It can only be a path in the file system. If not specified, the default is the mo_br.meta file in the same directory.|
|checksum | Parallelism of tae file replication during the restore; default is 1|
|parallelism | Degree of parallelism|

#### Examples

Restore from file system to file system:

**Step one:** Stop MatrixOne and delete the mo-data directory.

**Step two:** Execute the following restore command:

```
./mo_br restore fb26fd88-41bc-11ee-93f8-acde48001122 --restore_dir filesystem --restore_path "your_mopath"
```

After the restore, a new mo-data directory is generated under the matrixone directory.

**Step three:** Start MatrixOne.

### Verify the backup checksum

`mo_br check` reads each file in the backup folder together with its sha256 file, computes the file's sha256 value, and compares it with the value recorded in the sha256 file. The sha256 file is created when the file is created or updated.

#### Syntax structure

- Verify a backup by ID

```
mo_br check ID
    //Checks the backup data for the specified ID. If the backup is on s3 (oss, minio), you also need to specify:
    --backup_access_key_id string
    --backup_secret_access_key string
    --meta_path string //Specifies the meta file location. If not specified, the default is the mo_br.meta file in the same directory.
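    //Hypothetical invocation sketch (the ID is taken from the `mo_br list` output shown earlier;
    //assumes a local file system backup using the default meta file):
    //  ./mo_br check 4d21b228-10dd-11ef-9497-26dd28356ef2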
+``` + +**Parameter description** + +| parameters | clarification | +| ---- | ---- | +|backup_access_key_id| Access key ID of the backup in s3| +|backup_secret_access_key| Secret access key backed up in s3| +|meta_path|Specifies the location of the meta file. It can only be a path in the file system. If not specified, the default is the mo_br.meta file in the same directory.| + +- Verify the backup, specifying the path to the backup + +``` +mo_br check + --backup_dir s3|filesystem + //s3 + --backup_endpoint + --backup_access_key_id + --backup_secret_access_key + --backup_bucket + --backup_filepath + --backup_region + --backup_compression + --backup_role_arn + --backup_is_minio + //filesystem + --backup_path + --meta_path +``` + +**Parameter description** + +| parameters | clarification | +| ---- | ---- | +|backup_dir | The type of path where the backup is located, which must be specified if no ID is specified. s3 or filesystem| +|backup_endpoint| Connect to the URL of the backup on s3| +|backup_access_key_id| Access key ID of the backup in s3| +|backup_secret_access_key| Secret access key backed up in s3| +|backup_bucket| Backup bucket in s3| +|backup_filepath| Relative file paths for backups in s3| +|backup_region| Backup service area in s3| +|backup_compression| The compressed format of the files backed up in s3.| +|backup_role_arn| The resource name of the role backed up in s3.| +|backup_is_minio| Specifies whether the backup s3 is minio| +|backup_path| Path to local backup| +|meta_path|Specifies the location of the meta file. It can only be a path in the file system. 
If not specified, the default is the mo_br.meta file in the same directory.|

#### Examples

- Verify a backup by ID

```
./mo_br check 1614f462-126c-11ef-9af3-26dd28356ef3
+--------------------------------------+--------+-----------------------------------+---------------------------+---------------+---------------------------+
| ID | SIZE | PATH | AT TIME | DURATION | COMPLETE TIME |
+--------------------------------------+--------+-----------------------------------+---------------------------+---------------+---------------------------+
| 1614f462-126c-11ef-9af3-26dd28356ef3 | 126 MB | BackupDir: filesystem Path: | 2024-05-15 11:34:28 +0800 | 22.455633916s | 2024-05-15 11:34:50 +0800 |
| | | /Users/admin/soft/incbackup/back2 | | | |
+--------------------------------------+--------+-----------------------------------+---------------------------+---------------+---------------------------+

Checking the backup data...

check: /backup_meta
check: /mo_meta
check: hakeeper/hk_data
check: tae/tae_list
check: tae/tae_sum
check: config/launch.toml_018f7a50-d300-7017-8580-150edf08733e
...
```

- Verify backups in a backup directory

```
(base) admin@admindeMacBook-Pro mo-backup % ./mo_br check --backup_dir filesystem --backup_path /Users/admin/soft/incbackup/back2
2024/05/15 11:40:30.011160 +0800 INFO malloc/malloc.go:42 malloc {"max buffer size": 1073741824, "num shards": 16, "classes": 23, "min class size": 128, "max class size": 1048576, "buffer objects per class": 23}
check: /backup_meta
check: /mo_meta
check: hakeeper/hk_data
check: tae/tae_list
check: tae/tae_sum
check: config/launch.toml_018f7a50-d300-7017-8580-150edf08733e
check: config/log.toml_018f7a50-d30c-7ed0-85bc-191e9f1eb753
...
```

## Snapshot backup

### Create a snapshot

#### Syntax structure

```
mo_br snapshot create
    --host
    --port
    --user
    --password
    --level
    --account
    --sname
```

**Parameter description**

| Parameters | Description |
| ---- | ---- |
|host | IP address of the target MatrixOne|
|port | Port number|
|user | Username|
|password | User's password|
|level | Scope of the snapshot; only account is supported for now|
|account | Name of the tenant to snapshot|
|sname | Snapshot name|

#### Examples

- Create a snapshot for the system tenant sys:

```
./mo_br snapshot create --host "127.0.0.1" --port 6001 --user "dump" --password "111" --level "account" --sname "snapshot_01" --account "sys"
```

- The system tenant creates a snapshot for normal tenant acc1:

```
 ./mo_br snapshot create --host "127.0.0.1" --port 6001 --user "dump" --password "111" --level "account" --sname "snapshot_02" --account "acc1"
```

- A normal tenant creates a snapshot:

    - Create normal tenant acc1:

    ```sql
    create account acc1 admin_name admin IDENTIFIED BY '111';
    ```

    - Create a snapshot as acc1:

    ```
    ./mo_br snapshot create --host "127.0.0.1" --port 6001 --user "acc1#admin" --password "111" --level "account" --account "acc1" --sname "snapshot_03"
    ```

### View snapshots

#### Syntax structure

```
mo_br snapshot show
    --host
    --port
    --user
    --password
    --account
    --db
    --table
    --sname
    --beginTs
    --endTs
```

**Parameter description**

| Parameters | Description |
| ---- | ---- |
|host | IP address of the target MatrixOne|
|port | Port number|
|user | Username|
|password | User's password|
|account | Tenant name to filter; available to the sys administrator only|
|db | Database name to filter|
|table | Table name to filter|
|sname | Snapshot name to filter|
|beginTs | Start of the snapshot timestamp range to filter|
|endTs | End of the snapshot timestamp range to filter|

#### Examples

- To
view snapshots created under the system tenant:

```
./mo_br snapshot show --host "127.0.0.1" --port 6001 --user "dump" --password "111"
SNAPSHOT NAME TIMESTAMP SNAPSHOT LEVEL ACCOUNT NAME DATABASE NAME TABLE NAME
snapshot_02 2024-05-11 02:29:23.07401 account acc1
snapshot_01 2024-05-11 02:26:03.462462 account sys
```

- View the snapshots created under acc1:

```
./mo_br snapshot show --host "127.0.0.1" --port 6001 --user "acc1#admin" --password "111"
SNAPSHOT NAME TIMESTAMP SNAPSHOT LEVEL ACCOUNT NAME DATABASE NAME TABLE NAME
snapshot_03 2024-05-11 02:29:31.572512 account acc1
```

- View the snapshots created for tenant acc1 under the system tenant, filtered by start time:

```
./mo_br snapshot show --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "acc1" --beginTs "2024-05-11 00:00:00"
SNAPSHOT NAME TIMESTAMP SNAPSHOT LEVEL ACCOUNT NAME DATABASE NAME TABLE NAME
snapshot_02 2024-05-11 02:29:23.07401 account acc1
```

### Delete Snapshot

#### Syntax structure

```
mo_br snapshot drop
    --host
    --port
    --user
    --password
    --sname
```

**Parameter description**

| Parameters | Description |
| ---- | ---- |
|host | IP address of the target MatrixOne|
|port | Port number|
|user | Username|
|password | User's password|
|sname | Name of the snapshot to delete|

#### Examples

- Delete a snapshot created by the system tenant:

```
./mo_br snapshot drop --host "127.0.0.1" --port 6001 --user "dump" --password "111" --sname "snapshot_01"
```

- Delete a snapshot created by a normal tenant:

```
./mo_br snapshot drop --host "127.0.0.1" --port 6001 --user "acc1#admin" --password "111" --sname "snapshot_03"
```

### Restoring a snapshot

#### Syntax structure

```
mo_br snapshot restore
    --host
    --port
    --user
    --password
    --sname
    --account
    --db
    --table
    --new_account
    --new_admin_name
    --new_admin_password
```

**Parameter description**

| Parameters | Description |
| ---- | ---- |
|host | IP address of the target MatrixOne|
|port | Port number|
|user | Username|
|password | User's password|
|sname | Name of the snapshot to be restored|
|account | Name of the tenant to restore; available to the sys administrator only|
|db | Name of the database to be restored|
|table | Name of the table to be restored|
|new_account | Name of the new tenant to create|
|new_admin_name | Administrator name for the new tenant|
|new_admin_password | Administrator password for the new tenant|

__NOTE__: Only the system tenant can restore data to a new tenant, and only tenant-level restores are allowed in that case.

#### Examples

- Table-level restore to the same tenant

```
./mo_br snapshot restore --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" --db "snapshot_read" --table "test_snapshot_read" --sname "sp_01"
```

- Database-level restore to the same tenant

```
./mo_br snapshot restore --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" --db "snapshot_read" --sname "sp_02"
```

- Tenant-level restore to the same tenant

```
./mo_br snapshot restore --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" --sname "sp_03"
```

- Tenant-level restore to a new tenant

```
./mo_br snapshot restore --host "127.0.0.1" --port 6001 --user "dump" --password "111" --account "sys" --sname "sp_03" --new_account "acc2" --new_admin_name "admin" --new_admin_password "111"
```
\ No newline at end of file
diff --git a/docs/MatrixOne/Maintain/backup-restore/modump-backup-restore.md b/docs/MatrixOne/Maintain/backup-restore/modump-backup-restore.md
index be2daa407..24278e742 100644
--- a/docs/MatrixOne/Maintain/backup-restore/modump-backup-restore.md
+++ b/docs/MatrixOne/Maintain/backup-restore/modump-backup-restore.md
@@ -1,22 +1,22 @@
-# Backup and Restore by using mo-dump
+# mo-dump Backup and Recovery

-It is essential to back up your databases to recover your data and be up and running again in case problems occur, such as system crashes,
hardware failures, or users deleting data by mistake. Backups are also essential as a safeguard before upgrading a MatrixOne installation, and they can be used to transfer a MatrixOne building to another system.
+For businesses that produce large amounts of data every day, backing up the database is essential: in the event of a system crash, hardware failure, or user error, you can recover the data and get the system running again without loss.

-MatrixOne currently only supports logical backup through the `modump` utility. `modump` is a command-line utility used to generate the logical backup of the MatrixOne database. It produces SQL Statements that can be used to recreate the database objects and data. You can look up the syntax and usage guide in the [modump](../../Develop/export-data/modump.md) chapter.
+In addition, backups serve as a safeguard before upgrading a MatrixOne installation, and they can be used to transfer a MatrixOne installation to another system.

-We will take a simple example to walk you through the backup and restore process with the `modump` utility.
+MatrixOne supports logical backups via the `mo-dump` utility. `mo-dump` is a command-line utility that generates logical backups of a MatrixOne database. It produces SQL statements that can be used to recreate the database objects and data. You can find its syntax description and usage guide in the [mo-dump](../../Develop/export-data/modump.md) chapter.

-## Steps
+We'll walk through a simple example of how to use the `mo-dump` utility to complete the backup and restore process.

-### 1. [Build the modump binary](../../Develop/export-data/modump.md)
+## Steps

-For more information on how to build the `modump` binary, see [Build the modump binary](../../Develop/export-data/modump.md).
+### 1. Deployment of mo-dump

-If the `modump` binary has been built, you can continue to browse the next chapter **Generate the backup of a single database**.
+See the [mo-dump tool writing](../../Develop/export-data/modump.md) chapter to complete the deployment of `the mo-dump` tool. -### 2. Generate the backup of a single database +### 2. Generate a backup of a single database -We have a database **t** which is created by the following SQL. +An example is the database *t* and its table *t1* created using the following SQL: ``` DROP DATABASE IF EXISTS `t`; @@ -48,36 +48,36 @@ create table t1 insert into t1 values (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, '2019-01-01', '2019-01-01 00:00:00', '2019-01-01 00:00:00', 'a', 'a', '{"a":1}','1212.1212', 'a', 'aza', '00000000-0000-0000-0000-000000000000'); ``` -If you want to generate the backup of the single database, run the following command. The command will generate the backup of the **t** database with structure and data in the `t.sql` file. +If you want to generate a backup of a single database, you can run the following command. This command will generate a backup of the database named *t* with the structure and data in the *t.sql* file. ``` -./modump -u root -p 111 -h 127.0.0.1 -P 6001 -db t > t.sql +./mo-dump -u root -p 111 -h 127.0.0.1 -P 6001 -db t > t.sql ``` -If you want to generate the backup of a single table in a database, run the following command. The command will generate the backup of the `t1` table of `t` database with structure and data in the `t.sql` file. +If you want to generate a backup of a single table in the database, you can run the following command. This command generates a backup of the *t1* table of the database named *t*, which contains the structure and data in the *t.sql* file. ``` -./modump -u root -p 111 -db t -tbl t1 > t1.sql +./mo-dump -u root -p 111 -db t -tbl t1 > t1.sql ``` !!! note - If you have multiple databases, you need to run `modump` multiple times to generate SQLs one by one. + If you want to generate a backup of multiple databases/tables, you need to separate the database names/table names with `,` . -### 3. 
Restore the backup to MatrixOne server
+### 3. Restore Backup to MatrixOne Server

-Restoring a MatrixOne database using the exported 'sql' file is very simple. To restore the database, you must create an empty database and use `mysql client` to restore.
+Restoring an exported *sql* file to a MatrixOne database is straightforward. To recover your database, first create an empty database, then restore it with the *MySQL client*.

-Connect to MatrixOne with MySQL client in the same server, and make sure the exported `sql` file is also in the same machine as the MySQL client.
+Connect to MatrixOne with a MySQL client on the same server, and make sure the exported *sql* file is also on that server.

 ```
 mysql> create database if not exists t;
 mysql> source /YOUR_SQL_FILE_PATH/t.sql
 ```

-Once command executes successfully, execute the following command to verify that all objects have been created on the `t` database.
+After the above commands execute successfully, run the following commands to check whether all objects were created in database *t*.

 ```
 mysql> use t;
 mysql> show tables;
 mysql> select count(*) from t1;
-```
+```
\ No newline at end of file
diff --git a/docs/MatrixOne/Overview/1.1-matrixone-feature-list.md b/docs/MatrixOne/Overview/1.1-matrixone-feature-list.md
deleted file mode 100644
index 395721f77..000000000
--- a/docs/MatrixOne/Overview/1.1-matrixone-feature-list.md
+++ /dev/null
@@ -1,194 +0,0 @@
-# MatrixOne Features
-
-This document lists the features supported by the latest version of MatrixOne and features that are common and in MatrixOne's roadmap but not currently supported.
- -## Data definition language (DDL) - -| Data definition Language(DDL) | Supported(Y)/Not supported (N) /Experimental (E)| -| ----------------------------- | ---- | -| CREATE DATABASE | Y | -| DROP DATABASE | Y | -| ALTER DATABASE | N | -| CREATE TABLE | Y | -| ALTER TABLE | E, The clauses: `CHANGE [COLUMN]`, `MODIFY [COLUMN]`, `RENAME COLUMN`, `ADD [CONSTRAINT [symbol]] PRIMARY KEY`, `DROP PRIMARY KEY`, and `ALTER COLUMN ORDER BY` can be used in ALTER It can be freely combined in the TABLE statement. Still, it is not supported to be used with other clauses for the time being. | -| RENAME TABLE | N, Can be replaced by `ALTER TABLE tbl RENAME TO new_tbl` | -| DROP TABLE | Y | -| CREATE INDEX | Y, Secondary indexes have no speedup | -| DROP INDEX | Y | -| MODIFY COLUMN | N | -| PRIMARY KEY | Y | -| CREATE VIEW | Y | -| ALTER VIEW | Y | -| DROP VIEW | Y | -| TRUNCATE TABLE | Y | -| AUTO_INCREMENT | Y | -| SEQUENCE | Y | -| TEMPORARY TABLE | Y | -| CREATE STREAM | E, Only some types are supported | -| PARTITION BY | E, Only some types are supported | -| CHARSET, COLLATION | N, Only UTF8 is supported by default | - -## Data manipulation/query language (DML/DQL) - -| SQL Statement | Supported(Y)/Not supported (N) /Experimental (E) | -| ---------------------- | ------------------------------------ | -| SELECT | Y | -| INSERT | Y | -| UPDATE | Y | -| DELETE | Y | -| REPLACE | Y | -| INSERT ON DUPLICATE KEY UPDATE | Y | -| LOAD DATA | Y | -| SELECT INTO | Y | -| INNER/LEFT/RIGHT/OUTER JOIN | Y | -| UNION, UNION ALL | Y | -| EXCEPT, INTERSECT | Y | -| GROUP BY, ORDER BY | Y | -| CLUSTER BY | Y | -| SUBQUERY | Y | -| (Common Table Expressions, CTE) | Y | -| BEGIN/START TRANSACTION, COMMIT, ROLLBACK | Y | -| EXPLAIN | Y | -| EXPLAIN ANALYZE | Y | -| LOCK/UNLOCK TABLE | N | -| User-defined Variables | Y | - -## Advanced SQL Features - -| Advanced SQL Features | Supported(Y)/Not supported (N) /Experimental (E) | -| ----------------------------- | 
---------------------------------- | -| PREPARE | Y | -| STORED PROCEDURE | N | -| TRIGGER | N | -| EVENT SCHEDULER | N | -| UDF | N | -| Materialized VIEW | N | - -## Data types - -| Data type categories | Data types | Supported(Y)/Not supported (N) /Experimental (E) | -| -------------------- | ----------------- | ---- | -| Integer Numbers | TINYINT/SMALLINT/INT/BIGINT (UNSIGNED) | Y | -| | BIT | N | -| Real Numbers | FLOAT | Y | -| | DOUBLE | Y | -| String Types | CHAR | Y | -| | VARCHAR | Y | -| | BINARY | Y | -| | VARBINARY | Y | -| | TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT | Y | -| | ENUM | Y, Not support **Filtering ENUM values** and **Sorting ENUM values** | -| | SET | N | -| Binary Types | TINYBLOB/BLOB/MEDIUMBLOB/LONGBLOB | Y | -| Time and Date Types | DATE | Y | -| | TIME | Y | -| | DATETIME | Y | -| | TIMESTAMP | Y | -| | YEAR | Y | -| Boolean | BOOL | Y | -| Decimal Types | DECIMAL | Y, up to 38 digits | -| JSON Types | JSON | Y | -| vector type | VECTOR | N | -| Spatial Type | SPATIAL | N | - -## Indexing and constraints - -| Indexing and constraints | Supported(Y)/Not supported (N) /Experimental (E) | -| ------------------------------------ | ---- | -| PRIMARY KEY | Y | -| Composite PRIMARY KEY | Y | -| UNIQUE KEY | Y | -| Secondary KEY | Y, Syntax only implementation | -| FOREIGN KEY | Y | -| Enforced Constraints on Invalid Data | Y | -| ENUM and SET Constraints | N | -| NOT NULL Constraint | Y | -| AUTO INCREMENT Constraint | Y | - -## Transactions - -| Transactions | Supported(Y)/Not supported (N) /Experimental (E) | -| ------------------------ | ---- | -| Pessimistic transactions | Y | -| Optimistic transactions | Y | -| Distributed Transaction | Y | -| Snapshot Isolation | Y | -| READ COMMITTED | Y | - -## Functions and Operators - -| Functions and Operators Categories | Supported(Y)/Not supported (N) /Experimental (E) | -| ---------------------------------- | ------------------- | -| Aggregate Functions | Y | -| Mathematical | Y | -| Datetime | Y | -| 
String | Y | -| CAST | Y | -| Flow Control Functions | E | -| Window Functions | Y | -| JSON Functions | Y | -| System Functions | Y | -| Other Functions | Y | -| Operators | Y | - -## PARTITION - -| PARTITION | Supported(Y)/Not supported (N) /Experimental (E) | -| ----------------- | ---------------------------------- | -| KEY(column_list) | E | -| HASH(expr) | E | -| RANGE(expr) | E | -| RANGE COLUMNS | E | -| LIST | E | -| LIST COLUMNS | E | - -## Import and Export - -| Import and Export | Supported(Y)/Not supported (N) /Experimental (E) | -| ----------------- | ---------------------------------- | -| LOAD DATA | Y | -| SOURCE | Y | -| Load data from S3 | Y | -| modump| Y | -| mysqldump | N | - -## Security and Access Control - -| Security | Supported(Y)/Not supported (N) /Experimental (E) | -| -------------------------- | ---------------------- ------------ | -| Transport Layer Encryption TLS | Y | -| Encryption at rest | Y | -| Import from Object Storage | Y | -| Role-Based Access Control (RBAC) | Y | -| Multi-Account | Y | - -## Backup and Restore - -| Backup and Restore | Supported(Y)/Not supported (N) /Experimental (E) | -| ------------ | ---------------------------------- | -| Logical Backup and Restore | Y, Only the modump tool is supported | -| Physical Backup and Restore | Y | - -## Management Tool - -| Management Tool | Supported(Y)/Not supported (N) /Experimental (E) | -| -------------------- | ---------------------------------- | -| Stand-alone mo_ctl deployment management | Y | -| Distributed mo_ctl deployment management | E, Enterprise Edition only | -| Visual management platform | E, Public cloud version only | -| System Logging | Y | -| System indicator monitoring | Y | -| Slow query log | Y | -| SQL record | Y | -| Kubernetes operator | Y | - -## Deployment Method - -| Deployment Method | Supported(Y)/Not supported (N) /Experimental (E) | -| -------------------- | ---------------------------- ------- | -| Stand-alone environment privatization 
deployment | Y | -| Distributed environment privatization deployment | Y, self-built Kubernetes and minIO object storage | -| Alibaba Cloud distributed self-built deployment | Y, ACK+OSS | -| Tencent Cloud Distributed Self-built Deployment | Y, TKE+COS | -| AWS distributed self-built deployment | Y, EKS+S3 | -| Public Cloud Serverless | Y, MatrixOne Cloud, Support AWS, Alibaba Cloud | diff --git a/docs/MatrixOne/Overview/architecture/1.1-matrixone-architecture-design.md b/docs/MatrixOne/Overview/architecture/1.1-matrixone-architecture-design.md deleted file mode 100644 index c7384340b..000000000 --- a/docs/MatrixOne/Overview/architecture/1.1-matrixone-architecture-design.md +++ /dev/null @@ -1,133 +0,0 @@ -# **MatrixOne Architecture Design** - -## **MatrixOne Overview** - -MatrixOne is a future-oriented hyperconverged cloud & edge native DBMS that supports transactional, analytical, and streaming workload with a simplified and distributed database engine working across multiple datacenters, clouds, edges, and other heterogenous infrastructures. This combination of engines is called HSTAP. - -As a redefinition of the HTAP database, HSTAP aims to meet all the needs of Transactional Processing (TP) and Analytical Processing (AP) within a single database. Compared with the traditional HTAP, HSTAP emphasizes its built-in streaming capability used for connecting TP and AP tables. This provides users with an experience that a database can be used just like a Big Data platform, with which many users are already familiar thanks to the Big Data boom. With minimal integration efforts, MatrixOne frees users from the limitations of Big Data and provides one-stop coverage for all TP and AP scenarios for enterprises. - -## **MatrixOne Architecture Layers** - -MatrixOne implements three independent layers, each with its object units and responsibilities. Different nodes can freely scale, no longer constrained by other layers. 
These three layers are: - -![](https://github.com/matrixorigin/artwork/blob/main/docs/overview/architecture/architecture-1.png?raw=true) - -- Compute Layer: Based on Compute Nodes (CNs), MatrixOne enables serverless computing and transaction processing with its cache, which is capable of random restarts and scaling. -- Transaction Layer: Based on Transaction Nodes and Log Services, MatrixOne provides complete logging services and metadata information, with built-in Logtail for recent data storage. -- Storage Layer: Full data is stored in object storage, represented by S3, implementing a low-cost, infinitely scalable storage method. A unified File Service enables seamless operations on underlying storage by different nodes. - -![](https://github.com/matrixorigin/artwork/blob/main/docs/overview/architecture/architecture-2.png?raw=true) - -After deciding on TAE as the sole storage engine, multiple design adjustments were made to the fused TAE engine, resulting in the TAE storage engine. This engine has the following advantages: - -- Columnar Storage Management: Uniform columnar storage and compression methods provide inherent performance advantages for OLAP businesses. -- Transaction Processing: Shared logs and TN nodes jointly support transaction processing for compute nodes. -- Hot and Cold Separation: Using S3 object storage as the target for File Service, each compute node has its cache. - -![](https://github.com/matrixorigin/artwork/blob/main/docs/overview/architecture/architecture-3.png?raw=true) - -The compute engine is based on the fundamental goal of being compatible with MySQL, with higher requirements for node scheduling, execution plans, and SQL capabilities. The high-performance compute engine has both MPP (massively parallel processing) and experimental architecture: - -- MySQL Compatible: Supports MySQL protocol and syntax. -- Fused Engine: Rebuilds execution plans based on DAG, capable of executing both TP and AP. 
-- Node Scheduling: Future support for adaptive intra-node and inter-node scheduling, meeting both concurrency and parallelism requirements. -- Comprehensive SQL Capability: Supports subqueries, window functions, CTE, and spill memory overflow processing. -- Vector Support: Supports vector storage and querying, making it a valuable storage tool for various machine learning applications, including facial recognition, recommendation systems, and genomics analytics. - -## **MatrixOne Architecture Design** - -The MatrixOne architecture is as follows: - -![MatrixOne Architecture](https://github.com/matrixorigin/artwork/blob/main/docs/overview/matrixone_new_arch.png?raw=true) - -The architecture of MatrixOne is divided into several layers: - -## **Cluster Management Layer** - -Being responsible for cluster management, it interacts with Kubernetes to obtain resources dynamically when in the cloud-native environment, while in the on-premises deployment, it gets hold of resources based on the configuration. Cluster status is continuously monitored with the role of each node allocated based on resource information. Maintenance works are carried out to ensure that all system components are up and running despite occasional node and network failures. It rebalances the loads on nodes when necessary as well. Major components in this layer are: - -* Prophet Scheduler: take charge of load balancing and node keep-alive. -* Resource Manager: being responsible for physical resources provision. - -## **Serverless Layer** - -Serverless Layer is a general term for a series of stateless nodes, which, as a whole, contains three categories: - -* Background tasks: the most important one is called Offload Worker, which is responsible for offloading expensive compaction tasks and flushing data to S3 storage. -* SQL compute nodes: responsible for executing SQL requests, here divided into write nodes and read nodes. The former also provides the ability to read the freshest data. 
-* Stream task processing node: responsible for executing stream processing requests. - -## **Log(Reliability) Layer** - -As MatrixOne's Single Source of Truth, data is considered as persistently stored in MatrixOne once it is written into the Log Layer. It is built upon our world-class expertise in the Replicated State Machine model to guarantee state-of-the-art high throughput, high availability, and strong consistency for our data. Following a fully modular and disaggregated design by itself, it is also the central component that helps to decouple the storage and compute layers. This in turn earns our architecture much higher elasticity when compared with traditional NewSQL architecture. - -## **Storage Layer** - -The storage layer transforms the incoming data from the Log Layer into an efficient form for future processing and storage. This includes cache maintenance for fast accessing data that has already been written to S3. - -In MatrixOne, **TAE (Transactional Analytic Engine)** is the primary interface exposed by the Storage Layer, which can support both row and columnar storage together with transaction capabilities. Besides, the Storage Layer includes other internally used storage capabilities as well, e.g. the intermediate storage for streaming. - -## **Storage Provision Layer** - -As an infrastructure agnostic DBMS, MatrixOne stores data in shared storage of S3 / HDFS, or local disks, on-premise servers, hybrid, and any cloud, or even smart devices. The Storage Provision Layer hides such complexity from upper layers by just presenting them with a unified interface for accessing such diversified storage resources. 
- -## MatrixOne System Components - -![MatrixOne Component](https://github.com/matrixorigin/artwork/blob/main/docs/overview/mo-component.png?raw=true) - -In MatrixOne, to achieve the integration of distributed and multi-engine, a variety of different system components are built to complete the functions of the architecture-related layers: - -### **File Service** - -File Service is the component of MatrixOne responsible for reading and writing all storage media. Storage media include memory, disk, object storage, and so on., which provide the following features: - -- File Service provides a unified interface so that reading and writing of different media can use the same interface. -- The design of the interface follows the concept of immutable data. After the file is written, no further updates are allowed. The update of the data is realized by generating a new file. -- This design simplifies operations such as data caching, migration, and verification and is conducive to improving the concurrency of data operations. -- Based on a unified read-write interface, File Service provides a hierarchical cache and a flexible cache strategy to balance read-write speed and capacity. - -### **Log Service** - -Log Service is a component specially used to process transaction logs in MatrixOne, and it has the following features: - -- The Raft protocol ensures consistency, and multiple copies are used to ensure availability. -- Save and process all transaction logs in MatrixOne, ensure that Log Service logs are read and written typically before the transaction is committed, and check and replay the log content when the instance is restarted. -- After the transaction is submitted and placed, truncate the content of the Log Service to control the size of the Log Service. The content that remains in the Log Service after truncation is called Logtail. -- If multiple Log Service copies are down at the same time, the entire MatrixOne will be down. 
- -### **Transaction Node** - -The Transaction Node (TN) is the carrier used to run MatrixOne's distributed storage engine TAE, which provides the following features: - -- Manage metadata information in MatrixOne and transaction log content saved in Log Service. -- Receive distributed transaction requests sent by Computing Node (CN), adjudicate the read and write requests of distributed transactions, push transaction adjudication results to CN, and push transaction content to Log Service to ensure the ACID characteristics of transactions. -- Generate a snapshot according to the checkpoint in the transaction to ensure the snapshot isolation of the transaction, and release the snapshot information after the transaction ends. - -### **Computing Node** - -The computing node (CN) is a component of Matrixone that accesses user requests and processes SQL. The toolkit includes the following modules: - -- Frontend, it handles the client SQL protocol, accepts the client's message, parses it to get the executable SQL of MatrixOne, calls other modules to execute the SQL, organizes the query results into a message, and returns it to the client. -- Plan, parse the SQL processed by Frontend, generate a logical execution plan based on MatrixOne's calculation engine and send it to Pipeline. -- Pipeline, which parses the logical plan, converts the logical plan into an actual execution plan and then runs the execution plan through Pipeline. -- Disttae, responsible for specific read and write tasks, including synchronizing Logtail from TN and reading data from S3, and sending the written data to TN. - -### **Stream Engine** - -Stream Engine is a new component within MatrixOne, serving as an integrated stream engine designed for real-time querying, processing, and enriching data stored in a series of incoming data points, also known as data streams. With Stream Engine, you can employ SQL to define and create streaming processing pipelines, offering real-time data backend services. 
Additionally, you can utilize SQL to query data within streams and establish connections with non-streaming datasets, thereby further streamlining the data stack. - -### **Proxy** - -The Proxy component is a powerful tool mainly used for load balancing and SQL routing. It has the following functions: - -- Through SQL routing, resource isolation between different accounts is realized, ensuring that the CNs of different accounts will not affect each other. -- Through SQL routing, users can do a second split in the resource group of the same account, improving resource utilization. -- The load balancing between different CNs is realized in the second split resource group, making the system more stable and efficient. - -## **Learn More** - -This page outlines the overall architecture design of MatrixOne. For information on other options that are available when trying out MatrixOne, see the following: - -* [Install MatrixOne](../../Get-Started/install-standalone-matrixone.md) -* [MySQL Compatibility](../feature/mysql-compatibility.md) -* [What's New](../whats-new.md) diff --git a/docs/MatrixOne/Overview/architecture/architecture-wal.md b/docs/MatrixOne/Overview/architecture/architecture-wal.md new file mode 100644 index 000000000..bf4736669 --- /dev/null +++ b/docs/MatrixOne/Overview/architecture/architecture-wal.md @@ -0,0 +1,157 @@ +# WAL Technical Details + +WAL (Write Ahead Log) is a technology related to database atomicity and persistence that converts random writes into sequential reads and writes when a transaction is committed. Transactional changes occur randomly on pages that are scattered, and the overhead of random writes is greater than sequential writes, which degrades commit performance. WAL only records changes to a transaction, such as adding a line to a block. 
The new WAL entry is appended sequentially to the end of the WAL file when the transaction commits; the dirty pages are then flushed asynchronously after the commit, and the corresponding WAL entries are discarded to free up space.

MatrixOne's WAL is a physical log: it records the exact location of every row update, so that after each replay the data is not only logically identical but also physically organized the same way underneath.

## Commit Pipeline

The commit pipeline is the component that handles transaction commits. Before a commit completes, the memtable must be updated and the WAL entry persisted, and the time spent on these tasks determines commit performance. Persisting a WAL entry involves IO and is time-consuming, so MatrixOne uses the commit pipeline to persist WAL entries asynchronously without blocking the in-memory updates.

![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/overview/architecture/wal_Commit_Pipeline.png)

**The transaction commit process is:**

- Apply the changes to the memtable. Before a transaction enters the commit pipeline, memtable updates run concurrently without blocking each other. At this point the changes are uncommitted and invisible to any transaction;

- Enter the commit pipeline and check for conflicts;

- Persist the WAL entry: collect the WAL entry from memory and hand it to the backend. Persistence is asynchronous; the queue simply passes the WAL entry to the backend and returns immediately without waiting for the write to succeed, so subsequent transactions are not blocked. The backend handles a batch of entries at a time, further accelerating persistence through Group Commit;

- Update the state in the memtable to make the transaction visible. Transactions update their state in the order in which they were enqueued, so the order in which transactions become visible matches the order in which their WAL entries were written to the queue.
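The steps above can be sketched as a queue plus a single background writer. This is an illustrative Go sketch only; all type and function names are invented for the example and are not MatrixOne's actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// entry models a WAL entry queued by the commit pipeline; the names
// here are invented for the sketch, not MatrixOne's actual types.
type entry struct {
	lsn  int
	done chan struct{} // closed once the entry has been "persisted"
}

// pipeline hands entries to one background writer so that callers
// never block on IO; visibility follows enqueue (FIFO) order.
type pipeline struct {
	queue chan *entry
	wg    sync.WaitGroup
}

func newPipeline() *pipeline {
	p := &pipeline{queue: make(chan *entry, 64)}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		for e := range p.queue {
			// A real implementation would persist the WAL entry here
			// (possibly batching several entries); the sketch only
			// marks it done, making the transaction visible in queue order.
			close(e.done)
		}
	}()
	return p
}

// commit enqueues the entry and returns immediately; the caller waits
// on done only when it needs durability confirmation.
func (p *pipeline) commit(lsn int) *entry {
	e := &entry{lsn: lsn, done: make(chan struct{})}
	p.queue <- e
	return e
}

func main() {
	p := newPipeline()
	var entries []*entry
	for i := 1; i <= 3; i++ {
		entries = append(entries, p.commit(i))
	}
	for _, e := range entries {
		<-e.done // becomes visible in the order it was enqueued
		fmt.Println("committed", e.lsn)
	}
	close(p.queue)
	p.wg.Wait()
}
```

Note how the enqueue order fixes the visibility order even though persistence itself runs asynchronously in the background goroutine.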
## Checkpoint

A checkpoint writes dirty data to storage, destroys old log entries, and frees up space. In MatrixOne, a checkpoint is a task launched in the background that proceeds as follows:

- Select an appropriate timestamp as the checkpoint and scan for changes before that timestamp. In the figure, t0 is the last checkpoint and t1 is the currently selected one. Changes that occurred in \[t0,t1] need to be persisted.

- Dump the DML modifications. DML changes are stored in blocks in the memtable. Logtail Mgr is an in-memory module that records which blocks each transaction changed. Scan the transactions in \[t0,t1] on Logtail Mgr, launch background transactions to dump those blocks to storage, and record their addresses in the metadata. This way, every DML change committed before t1 can be traced through an address in the metadata. To checkpoint promptly and keep the WAL from growing without bound, a block in the interval must be dumped even if only one of its rows changed.
+ +
+ +- Scans for Catalog dump DDL and metadata changes. The Catalog is a tree that records all DDL and metadata information, and each node on the tree records the timestamp at which the change occurred. Collect all changes that fall between \[t0,t1] when scanning. + +
+ +
+ +- Destroy the old WAL entry. The LSN corresponding to each transaction is stored in Logtail Mgr. Based on the timestamp, find the last transaction before t1 and tell Log Backend to clean up all logs before the LSN of this transaction. + +## Log Backend + +MatrixOne's WAL can be written in various Log Backends. The original Log Backend was based on the local file system. For distributed features, we developed our own highly reliable low latency Log Service as the new Log Backend. We abstract a virtual backend to accommodate different log backends, developed by some very lightweight drivers, docking different backends. + +**Driver needs to adapt these interfaces:** + +- Append, write log entry asynchronously when committing a transaction: + +``` Append(entry) (Lsn, error) ``` + +- Read, batch read log entry on reboot: + +``` Read(Lsn, maxSize) (entry, Lsn, error) ``` + +- The Truncate interface destroys all log entries before the LSN, freeing up space: + +``` Truncate(lsn Lsn) error ``` + +## Group Commit + +Group Commit accelerates persistent log entries. Persistent log entry involves IO, is time consuming, and is often a bottleneck for commits. To reduce latency, bulk write log entries to Log Backend. For example, fsync takes a long time in a file system. If each entry is fsynced, it takes a lot of time. In file system-based Log Backend, where multiple entries are written and fsynced only once, the sum of the time costs of these entry swipes approximates the time of one entry swipe. + +![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/overview/architecture/wal_Group_Commit.png) + +Concurrent writes are supported in the Log Service, and the time of each entry swipe can overlap, which also reduces the total time to write an entry and improves the concurrency of commits. 
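The effect of batching entry flushes can be illustrated with a small Go sketch; the `flushBatch` helper and the sync counter are invented for the example and are not MatrixOne's real backend code:

```go
package main

import "fmt"

// flushBatch simulates writing a group of WAL entries followed by a
// single fsync for the whole group; it returns the number of syncs
// issued. All of this is illustrative only.
func flushBatch(pending [][]byte, fsync func()) int {
	for range pending {
		// write each entry into the log file (omitted in the sketch)
	}
	fsync() // one fsync covers the whole group
	return 1
}

func main() {
	syncs := 0
	fsync := func() { syncs++ }

	// Without group commit: one fsync per entry -> 8 syncs.
	for i := 0; i < 8; i++ {
		flushBatch([][]byte{{byte(i)}}, fsync)
	}
	perEntry := syncs

	// With group commit: the same 8 entries cost a single fsync.
	syncs = 0
	flushBatch(make([][]byte, 8), fsync)
	fmt.Println(perEntry, syncs) // 8 1
}
```

Since the fsync dominates the cost, the time for a grouped flush approximates the time of flushing a single entry.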
+ +## Handle out-of-order LSNs for Log Backend + +To accelerate, concurrent entries are written to Log Backend in an inconsistent order of success and the order in which the requests are made, resulting in inconsistent LSNs generated in Log Backend and logical LSNs passed to the Driver by the upper layers. Truncate and reboot to handle these out-of-order LSNs. In order to ensure that the LSNs in Log Backend are basically ordered and the out-of-order span is not too large, a window of logical LSNs is maintained that stops writing new entries to Log Backend if there are very early log entries that are not being written successfully. For example, if the window is 7 in length and an entry with an LSN of 13 in the figure has not been returned, it blocks an entry with an LSN greater than or equal to 20. + +![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/overview/architecture/wal_Log_Backend.png) + +Destroy the log in Log Backend with the truncate operation, destroying all entries before the specified LSN. entry before this LSN corresponds to a logical LSN that is smaller than the logical truncate point. For example, the logic truncates through 7 in the figure. This entry corresponds to 11 in Log Backend, but the logical LSNs for 5, 6, 7, and 10 in Log Backend are all greater than 7 and cannot be truncate. Log Backend can only truncate 4. + +On restart, those discontinuous entries at the beginning and end are skipped. For example, when Log Backend on the diagram writes to 14, the entire machine loses power, and the first 8,9,11 are filtered out on reboot based on the last truncate information. When all entries are read and the logical LSNs of 6,14 and the other entries are not continuous, the last 6 and 14 are discarded. + +## Specific format for WAL in MatrixOne + +Each write transaction corresponds to one log entry and consists of an LSN, Transaction Context, and multiple Commands. 
+ +``` +---------------------------------------------------------+ | Transaction Entry | +-----+---------------------+-----------+-----------+- -+ | LSN | Transaction Context | Command-1 | Command-2 | ... | +-----+---------------------+-----------+-----------+- -+ ``` + +**LSN**: Each log entry corresponds to one LSN. The LSN is incremented continuously and is used to delete entries when doing checkpoints. + +**Transaction Context**:Logging transaction information + +- StartTS is the timestamp when the transaction started. +- CommitTS is the timestamp of the end. +- Memo records where a transaction changes data. Upon reboot, this information is restored to Logtail Mgr and used for checkpointing. + +``` +---------------------------+ | Transaction Context | +---------+----------+------+ | StartTS | CommitTS | Memo | +---------+----------+------+ ``` + +**Transaction Commands**: Each write operation in a transaction corresponds to one or more commands. log entry logs all commands in the transaction. + +| Operator | Command | | :----------------- | :---------------- | | DDL | Update Catalog | | Insert | Update Catalog | | | Append | | Delete | Delete | | Compact&Merge | Update Catalog | + +- Operators: The DN in MatrixOne is responsible for committing transactions, writing log entries into Log Backend, doing checkpoints. DN supports build library, delete library, build table, delete table, update table structure, insert, delete, while background automatically triggers sorting. The update operation is split into insert and delete. + + - DDL + DDL includes building libraries, deleting libraries, building tables, deleting tables, and updating table structures. The DN records information about tables and libraries in the Catalog. The Catalog in memory is a tree and each node is a catalog entry. catalog entry has 4 classes, database, table, segment and block, where segment and block are metadata that changes when data is inserted and sorted in the background. 
Each database entry corresponds to one library and each table entry corresponds to one table. Each DDL operation corresponds to a database/table entry, which is logged in entry as Update Catalog Command. + + - Insert + The newly inserted data is recorded in the Append Command. The data in the DN is recorded in blocks, which form a segment. If there are not enough blocks or segments in the DN to record the newly inserted data, a new one is created. These changes are documented in the Update Catalog Command. In large transactions, the CN writes the data directly to S3 and the DN commits only the metadata. This way, the data in the Append Command will not be large. + + - Delete + The line number where the DN record Delete occurred. When reading, read all the inserted data before subtracting the rows. In a transaction, all deletions on the same block are merged to correspond to a Delete Command. + + - Compact & Merge + The DN background initiates a transaction to dump the data in memory onto s3. Sort the data on S3 by primary key for easy filtering when reading. compact occurs on a block and the data within the block is ordered after compact. merge occurs within a segment and involves multiple blocks, after merge the entire segment is ordered. The data before and after compact/merge remains the same, changing only the metadata, deleting the old block/segment, and creating a new block/segment. Each delete/create corresponds to one Update Catalog Command. + +- Commands + +
&nbsp;&nbsp;&nbsp;1. &nbsp;Update Catalog
The Catalog is, from top to bottom, database, table, segment, and block. An Update Catalog Command corresponds to one catalog entry; there is one Update Catalog Command per DDL operation or piece of new metadata. The Update Catalog Command contains a Dest and an EntryNode.

```
+-------------------+
|  Update Catalog   |
+-------+-----------+
| Dest  | EntryNode |
+-------+-----------+
```

Dest is where the command takes effect, recording the id of the corresponding node and of its ancestor nodes. On reboot, Dest is used to locate the position in the Catalog where the command applies.

| Type            | Dest                                        |
| :-------------- | :------------------------------------------ |
| Update Database | database id                                 |
| Update Table    | database id, table id                       |
| Update Segment  | database id, table id, segment id           |
| Update Block    | database id, table id, segment id, block id |

EntryNode records when an entry was created and deleted. If the entry has not been deleted, the deletion time is 0. If the current transaction is still creating or deleting it, the corresponding time is UncommitTS.

```
+-------------------+
|    Entry Node     |
+---------+---------+
| Create@ | Delete@ |
+---------+---------+
```

For segments and blocks, the EntryNode also records metaLoc and deltaLoc, the S3 addresses of the data and of the deletes, respectively.

```
+----------------------------------------+
|               Entry Node               |
+---------+---------+---------+----------+
| Create@ | Delete@ | metaLoc | deltaLoc |
+---------+---------+---------+----------+
```

For tables, the EntryNode also records the table structure, i.e. the schema.

```
+----------------------------+
|         Entry Node         |
+---------+---------+--------+
| Create@ | Delete@ | schema |
+---------+---------+--------+
```
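As an illustration, the Update Catalog layout above might be modeled like this in Go; the struct names, field types, and the visibility rule are assumptions made for this sketch, not MatrixOne's actual code:

```go
package main

import "fmt"

// Dest mirrors the Dest column above; zero-valued ids mean
// "not applicable" for that command type (illustrative only).
type Dest struct {
	DBID, TableID, SegmentID, BlockID uint64
}

// EntryNode mirrors the Create@/Delete@ pair above; DeleteAt == 0
// means the entry has not been deleted.
type EntryNode struct {
	CreateAt, DeleteAt uint64
}

// UpdateCatalog bundles a Dest and an EntryNode, as in the diagram.
type UpdateCatalog struct {
	Dest Dest
	Node EntryNode
}

// visibleAt reports whether the catalog entry exists at timestamp ts
// (an assumed visibility rule for the sketch).
func (n EntryNode) visibleAt(ts uint64) bool {
	if ts < n.CreateAt {
		return false
	}
	return n.DeleteAt == 0 || ts < n.DeleteAt
}

func main() {
	// An Update Table command: the table was created at ts 5 and
	// dropped at ts 9, so it is visible in between.
	cmd := UpdateCatalog{
		Dest: Dest{DBID: 1, TableID: 2},
		Node: EntryNode{CreateAt: 5, DeleteAt: 9},
	}
	fmt.Println(cmd.Node.visibleAt(6), cmd.Node.visibleAt(9)) // true false
}
```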
&nbsp;&nbsp;&nbsp;2. &nbsp;Append
The inserted data and its location are recorded in the Append Command.

```
+-------------------------------------------+
|              Append Command               |
+--------------+--------------+-   -+-------+
| AppendInfo-1 | AppendInfo-2 | ... | Batch |
+--------------+--------------+-   -+-------+
```

- Batch is the inserted data.

- AppendInfo
  The data in one Append Command may span multiple blocks. Each block corresponds to one AppendInfo, which records both the location of the data within the Command's Batch (pointer to data) and the destination of the data in the block.

```
+------------------------------------------------------------------------------+
|                                  AppendInfo                                  |
+-----------------+------------------------------------------------------------+
| pointer to data |                        destination                         |
+--------+--------+-------+----------+------------+----------+--------+--------+
| offset | length | db id | table id | segment id | block id | offset | length |
+--------+--------+-------+----------+------------+----------+--------+--------+
```
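To illustrate how one Append Command can span several blocks, here is a hedged Go sketch; the `splitBatch` helper and all field names are invented for the example:

```go
package main

import "fmt"

// AppendInfo mirrors the layout above: a slice of the command's Batch
// (pointer to data) plus the destination block. Names are illustrative.
type AppendInfo struct {
	Offset, Length                    uint32 // location inside the Batch
	DBID, TableID, SegmentID, BlockID uint64 // destination
	DestOffset                        uint32 // where the rows land inside the block
}

// splitBatch distributes `rows` rows of a Batch across blocks that each
// have `blockCap` free rows, producing one AppendInfo per block; this
// helper is invented for the example.
func splitBatch(rows, blockCap uint32) []AppendInfo {
	var infos []AppendInfo
	var off uint32
	blockID := uint64(1)
	for off < rows {
		n := blockCap
		if rows-off < n {
			n = rows - off
		}
		infos = append(infos, AppendInfo{Offset: off, Length: n, BlockID: blockID})
		off += n
		blockID++
	}
	return infos
}

func main() {
	// 10 inserted rows with 4 free slots per block span 3 blocks (4+4+2).
	for _, ai := range splitBatch(10, 4) {
		fmt.Println(ai.BlockID, ai.Offset, ai.Length)
	}
}
```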
&nbsp;&nbsp;&nbsp;3. &nbsp;Delete Command
+ +Each Delete Command contains only one delete from a block. + +``` +---------------------------+ | Delete Command | +-------------+-------------+ | Destination | Delete Mask | +-------------+-------------+ ``` + +- Destination record on which Block Delete occurred. +- Delete Mask records the deleted line number. diff --git a/docs/MatrixOne/Overview/feature/mysql-compatibility.md b/docs/MatrixOne/Overview/feature/mysql-compatibility.md index f4bebcb83..628f53cbf 100644 --- a/docs/MatrixOne/Overview/feature/mysql-compatibility.md +++ b/docs/MatrixOne/Overview/feature/mysql-compatibility.md @@ -31,9 +31,6 @@ MatrixOne is highly compatible with the MySQL 8.0 protocol and commonly used fea ### About TABLE -* The `CREATE TABLE .. AS SELECT` statement is not supported. -* Support `AUTO_INCREMENT` in the column definition, but not the `AUTO_INCREMENT` custom start value in a table definition. -* `CHARACTER SET/CHARSET` and `COLLATE` in column definitions are not supported. * `ENGINE=` in the table definition is not supported. * The clauses: `CHANGE [COLUMN]`, `MODIFY [COLUMN]`, `RENAME COLUMN`, `ADD [CONSTRAINT [symbol]] PRIMARY KEY`, `DROP PRIMARY KEY`, and `ALTER COLUMN ORDER BY` can be freely combined in `ALTER TABLE`, these are not supported to be used with other clauses for the time being. * Temporary tables currently do not support using `ALTER TABLE` to modify the table structure. @@ -43,7 +40,6 @@ MatrixOne is highly compatible with the MySQL 8.0 protocol and commonly used fea ### About VIEW -* `CREATE OR REPLACE VIEW` is not supported. * The `with check option` clause is not supported, but MatrixOne simply ignores' ENGINE= '. * The `DEFINER` and `SQL SECURITY` clauses are not supported. @@ -83,7 +79,7 @@ MatrixOne is highly compatible with the MySQL 8.0 protocol and commonly used fea ### About INSERT -* MatrixOne does not support modifiers such as `LOW_PRIORITY`, `DELAYED`, `HIGH_PRIORITY`, `IGNORE`. 
+* MatrixOne does not support modifiers such as `LOW_PRIORITY`, `DELAYED`, `HIGH_PRIORITY`.

### About UPDATE

@@ -100,7 +96,6 @@ MatrixOne is highly compatible with the MySQL 8.0 protocol and commonly used fea
### About LOAD

* MatrixOne supports `SET`, but only in the form of `SET columns_name=nullif(expr1,expr2)`.
-* MatrixOne does not support `ESCAPED BY`.
* MatrixOne supports `LOAD DATA LOCAL` on the client side, but the `--local-infile` parameter must be added when connecting.
* MatrixOne supports the import of `JSONlines` files but requires some unique syntax.
* MatrixOne supports importing files from object storage but requires some unique syntax.

@@ -119,8 +114,13 @@ MatrixOne is highly compatible with the MySQL 8.0 protocol and commonly used fea
* Triggers are not supported.
* Stored procedures are not supported.
* Event dispatchers are not supported.
-* Custom functions are not supported.
* Materialized views are not supported.
+* Custom functions are supported, but only in Python, and their usage differs significantly from MySQL.
+
+## Stream Computing
+
+* Stream computing is unique to MatrixOne; version 1.2.2 currently supports only Kafka connectors.
+* Kafka connectors must be created and used with a dedicated syntax.

## Data Types

@@ -130,19 +130,27 @@ MatrixOne is highly compatible with the MySQL 8.0 protocol and commonly used fea
* DATETIME: The maximum value range of MySQL is `'1000-01-01 00:00:00'` to `'9999-12-31 23:59:59'`, and the maximum range of MatrixOne is `'0001-01-01 00:00:00'` to `'9999-12-31 23:59:59'`.
* TIMESTAMP: The maximum value range of MySQL is `'1970-01-01 00:00:01.000000'` UTC to `'2038-01-19 03:14:07.999999'` UTC, the maximum range of MatrixOne is `'0001-01-01 00:00:00'` UTC to `'9999-12-31 23:59:59'` UTC.
* MatrixOne supports `UUID` type.
+* MatrixOne supports vector types.
* Spatial types are not supported.
-* `BIT` and `SET` types are not supported.
+* `SET` types are not supported.
* `MEDIUMINT` type is not supported.
## Indexes and Constraints

+* MatrixOne supports vector indexing.
* Secondary indexes only implement syntax and have no speedup effect.
* Foreign keys do not support the `ON CASCADE DELETE` cascade delete.

## Partition Support

-* Only support `KEY`, `HASH` two partition types.
-* Subpartitions implement only syntax, not functionality.
+* Supports the `KEY`, `HASH`, `RANGE`, `RANGE COLUMNS`, `LIST`, and `LIST COLUMNS` partition types.
+* Partition pruning is implemented only for `KEY` and `HASH`; it is not yet available for the other four types.
+* Sub-partitioning only implements the syntax, not the functionality.
+* `ADD/DROP/TRUNCATE PARTITION` is not yet supported.
+
+## MatrixOne Keywords
+
+* MatrixOne and MySQL keywords have many differences, see [MatrixOne Keywords](../../Reference/Language-Structure/keywords.md).

## Functions and Operators

@@ -187,22 +195,32 @@ MatrixOne is highly compatible with the MySQL 8.0 protocol and commonly used fea
* MatrixOne defaults to optimistic transactions.
* different from MySQL, DDL statements in MatrixOne are transactional, and DDL operations can be rolled back within a transaction.
+* SET operations within a transaction are not allowed in MatrixOne.
* Table-level lock `LOCK/UNLOCK TABLE` is not supported.

## Backup and Restore

-* The mysqldump backup tool is not supported; only the modump tool is supported.
-* Physical backups are supported.
-* Does not support binlog log backup.
-* Incremental backups are not supported.
+* Physical backups based on the mobackup tool are supported.
+* Snapshot backup is supported.
+* The mysqldump backup tool is not supported, only the mo-dump tool.
+* Binlog log backup is not supported.

## System variables

-* MatrixOne's `lower_case_table_names` has 5 modes; the default is 1.
+* MatrixOne's `lower_case_table_names` has 2 modes; the default is 1.
+* MatrixOne's `sql_mode` only supports `ONLY_FULL_GROUP_BY`.
+
+## System Tables
+
+* MatrixOne has its own unique system tables while remaining broadly compatible with MySQL's system tables.
+* The default `mysql` and `information_schema` databases in MatrixOne are compatible with MySQL's usage model.
+* The `system_metrics` system database in MatrixOne collects and stores a range of runtime status monitoring data for MatrixOne services.
+* The `system` system database in MatrixOne collects the statements executed by users and the system, together with system logs.
+* The `mo_catalog` system database in MatrixOne stores the various database objects and metadata in MatrixOne.

## Programming language

-* Java, Python, Golang connectors, and ORM are basically supported, and connectors and ORMs in other languages may encounter compatibility issues.
+* Java, Python, C#, and Golang connectors and ORMs are generally supported; connectors and ORMs in other languages may encounter compatibility issues.

## Other support tools

diff --git a/docs/MatrixOne/Overview/feature/stream.md b/docs/MatrixOne/Overview/feature/stream.md
new file mode 100644
index 000000000..0533310bd
--- /dev/null
+++ b/docs/MatrixOne/Overview/feature/stream.md
@@ -0,0 +1,29 @@
# Stream

## Features of streaming data

With the rise of real-time data analytics, streaming data is becoming increasingly important in many areas. Its sources include, but are not limited to, real-time social media activity, online retail transactions, real-time market analysis, network security monitoring, instant messaging records, and real-time data from smart city infrastructure.
Streaming data has a wide range of applications, such as:
+
+- Real-time monitoring systems: network traffic monitoring, user online behavior analysis, IoT device status monitoring;
+- E-commerce platforms: real-time tracking of user shopping behavior, dynamic inventory adjustment, real-time price updates;
+- Real-time interactive applications: real-time social media feeds, online gamer interaction data;
+- Real-time risk management: financial transaction anomaly monitoring, network security threat detection;
+- Smart city management: real-time traffic flow monitoring, public safety monitoring, environmental quality monitoring.
+
+The distinguishing features of streaming data are real-time arrival and continuity: data is constantly generated and transmitted instantly, reflecting the latest situation at all times. In addition, because data streams are large and change rapidly, traditional data processing methods often struggle to cope, and more efficient processing and analysis techniques are required. Streaming data processing therefore typically calls for:
+
+- Real-time data aggregation: real-time aggregation and analysis of continuously flowing data;
+- Dynamic data windows: analyzing the data stream within a set time period for trend analysis and pattern recognition;
+- High throughput and low latency: processing large volumes of data while ensuring the immediacy and accuracy of data processing.
+
+These characteristics give streaming data an increasingly important role in modern data-driven decision-making, especially in scenarios that require rapid response and real-time insight.
+
+## MatrixOne's ability to stream
+
+### Source
+
+MatrixOne synchronizes data between external data streams and MatrixOne database tables through a Source. With a precise connection and data-mapping mechanism, a Source not only ensures seamless integration of data streams, but also guarantees data integrity and accuracy.
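The Source abstraction described above is declared in SQL. As a hedged sketch (the table name, columns, and Kafka options below are illustrative assumptions, not authoritative syntax; consult the MatrixOne `CREATE SOURCE` reference for the exact options):

```sql
-- Illustrative only: map a Kafka topic carrying JSON messages
-- onto a relational schema via a Source.
CREATE SOURCE stream_test (
    device_id VARCHAR(64),
    metric    DOUBLE,
    ts        TIMESTAMP
) WITH (
    "type"              = 'kafka',
    "topic"             = 'test',
    "value"             = 'json',
    "bootstrap.servers" = '127.0.0.1:9092'
);
```

Once declared, downstream objects such as dynamic tables can consume `stream_test` much like a regular table.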
+### Dynamic Table
+
+Dynamic Table is the core embodiment of MatrixOne's streaming capabilities. Dynamic tables capture, process, and transform data flowing in from both Sources and plain data tables in real time, guaranteeing instant updates and an accurate representation of information flows throughout the system. This design not only improves the flexibility and efficiency of data processing, but also optimizes the responsiveness and processing performance of the whole system in complex data scenarios.
diff --git a/docs/MatrixOne/Overview/feature/time-series.md b/docs/MatrixOne/Overview/feature/time-series.md
new file mode 100644
index 000000000..7445aeb55
--- /dev/null
+++ b/docs/MatrixOne/Overview/feature/time-series.md
@@ -0,0 +1,27 @@
+# Time Series Capability
+
+## Characteristics of Time Series Data
+
+With the development of the Internet of Things, demand for time series databases keeps growing: for example, data generated by smart cars, equipment monitoring in factories, and market indicator data in the financial industry. Common business scenarios include:
+
+- Monitoring software systems: virtual machines, containers, services, applications;
+- Monitoring physical systems: hydrological monitoring, equipment monitoring in manufacturing plants, national-security-related data monitoring, communications monitoring, sensor data, blood glucose meters, blood pressure changes, heart rate, etc.;
+- Asset tracking applications: cars, trucks, physical containers, shipping pallets;
+- Financial trading systems: traditional securities, emerging crypto digital currencies;
+- Event applications: tracking user and customer interaction data;
+- Business intelligence tools: tracking key indicators and the overall health of the business;
+- Internet industry: likewise very large volumes of time series data, such as user behavior tracks on websites and log data generated by applications.
+Time series data has two defining characteristics. First, it is temporal: new data is constantly produced as time passes. Second, its volume is enormous, with tens of millions or even hundreds of millions of records written every second. These two characteristics shape the business needs that time series databases serve, the most common being:
+
+1. Get the latest status: query the most recent data (e.g. the latest status of a sensor);
+2. Interval statistics: over a specified time range, query statistics such as average, maximum, minimum, and count;
+3. Anomaly detection: retrieve and filter abnormal data based on specified criteria.
+
+## Time series capabilities of MatrixOne
+
+The industry already has a number of dedicated NoSQL time series databases, such as InfluxDB, OpenTSDB, and TDengine. MatrixOne differs from them in that it remains a general-purpose database: its core is HTAP application development (inserts, deletes, updates, and data analysis), it keeps the relational data model, and its query language is still classic SQL. MatrixOne adds time series capabilities on top of these general-purpose capabilities, a positioning somewhat similar to TimescaleDB. Functionally, MatrixOne supports common time series capabilities such as time windows, downsampling, interpolation, and partitioning; in terms of performance, it can meet the high-throughput, high-compression, real-time analysis demands of time series scenarios. Its overall architecture also provides strong scalability, hot/cold data separation, read/write separation, and other features well suited to time series workloads, while retaining traditional database support for updates and transactions. MatrixOne is therefore a good fit for businesses that need an ordinary relational database but also have hybrid scenarios requiring some time series processing power.
+MatrixOne's time series capabilities include:
+
+- Supports the string, numeric, date, and other data types common in traditional databases, as well as newer types such as JSON and vectors; see [data types](../../Reference/Data-Types/data-types.md) for details.
+- Supports creating dedicated time series tables with a timestamp primary key and arbitrary dimension/metric columns, as detailed in [Time Window](../../Develop/read-data/window-function/time-window.md).
+- Provides common time window capabilities for downsampling queries over different time intervals, as detailed in [Time Window](../../Develop/read-data/window-function/time-window.md).
+- Supports interpolation of null values, with interpolation methods for different policies, as detailed in [Time Window](../../Develop/read-data/window-function/time-window.md).
+- Supports the simple and complex query capabilities of traditional databases. Details can be found in [Single Table Query](../../Develop/read-data/query-data-single-table.md), [Multi Table Query](../../Develop/read-data/multitable-join-query.md), [Sub Query](../../Develop/read-data/subquery.md), [View](../../Develop/read-data/subquery.md), [CTE](../../Develop/read-data/cte.md).
+- Supports multiple high-speed write paths, including [offline import](../../Develop/import-data/bulk-load/bulk-load-overview.md), [streaming data write](../../Develop/import-data/stream-load.md), and [INSERT INTO write](../../Develop/import-data/insert-data.md).
+- Supports various [aggregate functions](../../Reference/Functions-and-Operators/Aggregate-Functions/count.md) for computing over time series data.
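To make the table-plus-window combination above concrete, here is a hedged sketch. The table and column names are invented, and the `INTERVAL(...)`/`FILL(...)` clauses and `_wstart`/`_wend` pseudo-columns follow the Time Window chapter linked above, which should be treated as authoritative:

```sql
-- Illustrative: a time series table keyed by timestamp,
-- with device and metric columns.
CREATE TABLE sensor_record (
    ts          TIMESTAMP NOT NULL PRIMARY KEY,
    device_id   INT,
    temperature FLOAT
);

-- Downsample one day of readings into 10-minute windows,
-- filling empty windows with the previous value.
SELECT _wstart, _wend, MAX(temperature), MIN(temperature)
FROM sensor_record
WHERE ts >= '2024-01-01 00:00:00' AND ts < '2024-01-02 00:00:00'
INTERVAL(ts, 10, MINUTE) FILL(PREV);
```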
diff --git a/docs/MatrixOne/Overview/feature/udf.md b/docs/MatrixOne/Overview/feature/udf.md
new file mode 100644
index 000000000..9e3a69c7e
--- /dev/null
+++ b/docs/MatrixOne/Overview/feature/udf.md
@@ -0,0 +1,28 @@
+# User-Defined Functions (UDF)
+
+You can write user-defined functions (UDFs) to extend the system to do things that MatrixOne's built-in system-defined functions cannot, and a UDF can be reused many times after it is created.
+
+## What is a UDF?
+
+In database management systems, user-defined functions (UDFs) are a powerful feature that lets users create custom functions for specific needs. These functions can perform complex calculations, data conversions, and other operations that may fall outside the scope of standard SQL functions.
+
+## Core capabilities of UDFs
+
+- Enhanced data processing: complex mathematical operations on data, such as advanced statistical analysis or financial model calculations, often exceed the capabilities of standard SQL functions. By creating a UDF, you can perform these complex operations inside the database without exporting the data to an external program.
+- Simplified complex queries: a complex query operation that must be executed frequently can be encapsulated in a UDF, simplifying SQL queries and making them clearer and easier to manage.
+- Better code reuse and maintenance: the same data processing logic may be needed in different queries and database applications. By creating a UDF, you can reuse the same function wherever that logic is required, which helps maintain consistency and reduces duplicated code.
+- Performance optimization: certain operations, such as string processing or complex conditional logic, can be more efficient when implemented at the database level through a UDF than at the application layer, since this reduces network data transfer and application-side processing.
+- Customization and flexibility: specific business needs, such as currency conversion, tax calculation, or special date-time processing, may have no direct counterpart in standard SQL. With UDFs, you can tailor these features to your business needs.
+- Cross-platform compatibility: many database systems support similar UDF creation and execution logic, so UDFs developed in one database system may, with minor modifications, be usable in another, increasing the portability of the code.
+
+## MatrixOne support for UDFs
+
+In the current release, MatrixOne supports UDFs written in Python.
+
+For the basic usage of Python UDFs in MatrixOne, see [UDF-python basic usage](../../Develop/udf/udf-python.md).
+
+For advanced usage of Python UDFs in MatrixOne, see [UDF-python advanced usage](../../Develop/udf/udf-python-advanced.md).
+
+For the parameters used to create UDFs in MatrixOne, see [Creating UDFs](../../Reference/SQL-Reference/Data-Definition-Language/create-function-python.md).
+
+For the parameters used to drop UDFs in MatrixOne, see [Removing UDFs](../../Reference/SQL-Reference/Data-Definition-Language/drop-function.md).
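To make this concrete, a MatrixOne Python UDF wraps an ordinary Python function as its handler. The function below is plain, runnable Python; the registration statement shown in the comment follows the Creating UDFs reference linked above, and the names (`py_add`, `add`) are illustrative:

```python
# Plain Python handler for a hypothetical MatrixOne UDF.
# Registration would look roughly like this (see "Creating UDFs"):
#
#   create or replace function py_add(a int, b int) returns int
#   language python as
#   $$
#   def add(a, b):
#       return a + b
#   $$
#   handler 'add';

def add(a: int, b: int) -> int:
    """Called once per row with the argument column values."""
    return a + b

print(add(1, 2))  # prints 3
```

After registration, a query such as `SELECT py_add(c1, c2) FROM t;` would apply the handler row by row.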
diff --git a/docs/MatrixOne/Overview/matrixone-feature-list.md b/docs/MatrixOne/Overview/matrixone-feature-list.md
index 3e014ad7a..5213aacc8 100644
--- a/docs/MatrixOne/Overview/matrixone-feature-list.md
+++ b/docs/MatrixOne/Overview/matrixone-feature-list.md
@@ -61,9 +61,29 @@ This document lists the features supported by the latest version of MatrixOne an
| STORED PROCEDURE | N |
| TRIGGER | N |
| EVENT SCHEDULER | N |
-| UDF | N |
+| UDF | Y |
| Materialized VIEW | N |

+## Stream Computing
+
+| Stream computing capabilities | Supported (Y) / Not Supported (N) / Experimental (E) |
+|-------------------------------|------------------------------------------------------|
+| Dynamic Tables | E |
+| Kafka Connectors | E |
+| Materialized View | N |
+| (Incremental) Materialized View | N |
+
+## Time Series
+
+| Time series capabilities | Supported (Y) / Not Supported (N) / Experimental (E) |
+|--------------------------|------------------------------------------------------|
+| Time series table | Y |
+| Sliding window | Y |
+| Downsampling | Y |
+| Interpolation | Y |
+| TTL (Time To Live) | N |
+| ROLLUP | Y |
+
## Data types

| Data type categories | Data types | Supported(Y)/Not supported (N) /Experimental (E) |
diff --git a/docs/MatrixOne/Overview/matrixone-vs-other_databases/matrixone-positioning .md b/docs/MatrixOne/Overview/matrixone-vs-other_databases/matrixone-positioning .md
new file mode 100644
index 000000000..3ff51ffa7
--- /dev/null
+++ b/docs/MatrixOne/Overview/matrixone-vs-other_databases/matrixone-positioning .md
@@ -0,0 +1,12 @@
+# Positioning of MatrixOne
+
+Within a large and complex data technology stack crowded with database products, MatrixOne is positioned as a SQL relational database focused on one-stop convergence and flexible scalability.
MatrixOne was designed to provide a database product with a user experience as simple as MySQL's that can handle a variety of business loads and data types, including OLTP and OLAP, while sensing changes in user load and data volume and automatically performing rapid elastic scaling, thereby simplifying the user's existing complex data architecture of multiple database products plus ETL.
+
+Among the industry's existing database offerings, the product closest to MatrixOne is SingleStore: both use a unified storage model that supports the convergence of OLTP, OLAP, and a host of other data loads and data types, and both treat cloud-native design and flexible scalability as core architectural capabilities.
+
+![mo\_vs\_singlestore](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/overview/mo-other-database/mo_vs_singlestore.png)
+
+- Architecturally, MatrixOne is a fully cloud-native and containerized database. MatrixOne draws on Snowflake's storage-compute separation design for [cloud-native data warehouses](https://event.cwi.nl/lsde/papers/p215-dageville-snowflake.pdf), handing storage over entirely to shared storage on the cloud while building the compute layer entirely from stateless containers. At the same time, to accommodate the fast write requests of OLTP-type loads, MatrixOne adds the TN and LogService components: high-frequency writes are backed by block storage, the write-ahead log (WAL) is kept highly available under a Raft three-replica consistency guarantee, and WALs are asynchronously flushed into shared storage. SingleStore, by contrast, extends from a share-nothing architecture toward cloud-native storage-compute separation, puts only cold data in shared storage (see the [SingleStore architecture paper](https://dl.acm.org/doi/pdf/10.1145/3514221.3526055)), and still requires data sharding and rebalancing. MatrixOne, like Snowflake, is entirely based on shared storage without any data sharding.
+- In terms of load types, MatrixOne takes HTAP as its core and is gradually expanding into a variety of load types such as stream computing, time series processing, machine learning, and search. On the HTAP technical route, MatrixOne differs from TiDB's dual storage-and-compute-engine approach: MatrixOne implements HTAP directly on a single storage engine and a single compute engine, and the storage engine persists data on disk in a single columnar format, which is also closer to SingleStore. Routes such as TiDB's require internal data synchronization between two engines, with TP and AP data stored separately; MatrixOne needs neither this synchronization nor the duplicated storage. Compared with SingleStore, which already supports a variety of business types beyond HTAP well (stream computing, search, GIS, machine learning, etc.), MatrixOne evolved later and its support for load types other than HTAP is currently weaker.
+- In terms of experience, MatrixOne takes MySQL as its core compatibility target, covering the wire protocol, SQL syntax, and basic functionality. MatrixOne introduces its own syntax only for capabilities MySQL does not have, such as streaming writes and vector types. Conversely, some advanced MySQL capabilities with low usage, such as triggers and stored procedures, are not yet supported in MatrixOne. In general, MySQL-based applications can be migrated to MatrixOne very easily, and MySQL-ecosystem tools (visual management and modeling tools such as Navicat and DBeaver, ETL tools such as DataX and Kettle, and computing engines such as Spark and Flink) can use their MySQL connectors directly to interface with MatrixOne.
diff --git a/docs/MatrixOne/Overview/matrixone-vs-other_databases/matrixone-vs-oltp.md b/docs/MatrixOne/Overview/matrixone-vs-other_databases/matrixone-vs-oltp.md
new file mode 100644
index 000000000..11ede5e66
--- /dev/null
+++ b/docs/MatrixOne/Overview/matrixone-vs-other_databases/matrixone-vs-oltp.md
@@ -0,0 +1,82 @@
+# MatrixOne versus common OLTP databases
+
+## General OLTP database features
+
+OLTP (online transaction processing) refers to database management systems oriented toward business transactions. An OLTP database processes a large number of short transactions arising from routine business operations such as order processing, inventory management, and banking. It provides high concurrency and real-time data processing to meet an enterprise's need for instant data access.
+
+The main features of an OLTP database are as follows:
+
+- ACID: an OLTP system must ensure that each transaction is recorded correctly. A transaction typically involves a program executing multiple steps or operations; it completes only when all parties confirm it, a product or service is delivered, or a certain number of updates are made to a particular table. The transaction is recorded only when every step involved has been executed and recorded; if any step fails, the entire transaction must be aborted and all of its steps undone. OLTP systems must therefore comply with atomicity, consistency, isolation, and durability (ACID) to ensure the accuracy of the data in the system.
+- High concurrency: the user base of an OLTP system can be very large, with many users attempting to access the same data simultaneously. The system must ensure that all users attempting to read or write can do so simultaneously.
Concurrency control ensures that two users accessing the same data at the same time cannot both change it, or that one user must wait until the other has finished before modifying the data.
+- High availability: an OLTP system must always be available and ready to accept transactions. Failure to process a transaction may mean lost revenue or legal consequences. Transactions can be initiated from anywhere in the world at any time, so the system must be available 24/7.
+- Fine-grained data access: OLTP databases typically provide data access at the record level, support efficient insert, delete, and update operations, and provide fast transaction commit and rollback.
+
+- High reliability: OLTP systems must be resilient in the event of any hardware or software failure.
+
+## Classification of OLTP systems in the current industry
+
+By architecture and technical route, OLTP databases can be divided into centralized, distributed, and cloud-native databases.
+
+* Most well-known OLTP databases are traditional centralized databases, such as Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and DB2, most of which were born between 1980 and 2000.
+* Distributed OLTP databases, typified by Google's Spanner (2012), take share-nothing as their architectural core, scale by sharding data and computation across machines, and achieve distributed consistency through consensus protocols. This architecture is often called NewSQL; representative products include CockroachDB, SAP HANA, TiDB, and OceanBase.
+* A third technical route is cloud-native OLTP databases such as Aurora, PolarDB, and NeonDB.
What sets these apart from the share-nothing architecture is their shared storage design, with a more thorough separation of storage and compute: scalability comes from cloud storage systems that scale on their own. MatrixOne also follows this cloud-native technical route.
+
+It is worth noting that there are no strict dividing lines between these three classifications; as it evolves in practice, each database has gradually begun to absorb capabilities from the other routes. Oracle's RAC architecture, for example, is a typical shared storage architecture with some scalability, and products like CockroachDB and TiDB are also evolving toward cloud-native designs and shared storage. In practice, OLTP is the most widely needed database scenario, and products on all three technical routes are used by large numbers of users.
+
+ +
+
+## OLTP features of MatrixOne
+
+MatrixOne's basic capabilities meet the characteristics of a typical OLTP database.
+
+* Data manipulation and ACID features: MatrixOne supports row-level insert, delete, update, and lookup operations, and its transactions provide full ACID properties. For a detailed description of these capabilities, refer to [MatrixOne's transaction description](../../Develop/Transactions/matrixone-transaction-overview/overview.md).
+* High concurrency: MatrixOne supports highly concurrent business requests, reaching tens of thousands of tpmC in the industry-standard TPC-C benchmark for OLTP, with throughput growing as nodes are added.
+* High availability: MatrixOne is built on Kubernetes and shared storage, both of which are proven in cloud environments for high availability. MatrixOne's own design also accounts for the availability and failure-recovery mechanisms of each of its components. Details can be found in [the highly available introduction](../../Overview/feature/high-availability.md) to MatrixOne.
+
+As shown in the figure above, in terms of architecture and technical route MatrixOne belongs to the cloud-native category and is closest to Aurora. The biggest advantage over the share-nothing architecture is that, once storage and compute are separated, both can be consumed on demand.
+
+There are two differences from Aurora:
+
+* Aurora exposes its write node to the user layer, and users can write through that single node only. MatrixOne instead hides write handling inside the internal TN and LogService components, so users can read and write through any CN node.
+* Aurora's shared storage still relies heavily on block storage as primary storage, using object storage only for backups. MatrixOne uses object storage directly as the primary storage for the full data set.
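The ACID behavior above can be exercised with ordinary SQL. Note that DDL in MatrixOne is also transactional (see the MySQL-compatibility notes), so even a CREATE TABLE rolls back; a minimal sketch with an invented table name:

```sql
-- In MatrixOne, DDL is transactional, so both the CREATE TABLE
-- and the INSERT below are undone by the ROLLBACK.
BEGIN;
CREATE TABLE t_orders (id INT PRIMARY KEY, amount DECIMAL(10,2));
INSERT INTO t_orders VALUES (1, 99.50);
ROLLBACK;
-- t_orders no longer exists at this point.
```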
+
+Of course, MatrixOne is not limited to OLTP capabilities, and its ability to accommodate other loads distinguishes it clearly from Aurora's positioning.
+
+![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/overview/mo-other-database/mo_vs_aurora.png)
+
+## MatrixOne versus MySQL
+
+MatrixOne's primary compatibility goal is MySQL, itself the [world's most popular open source database](https://db-engines.com/en/ranking), and a large portion of MatrixOne's users migrate from open source MySQL. Here we compare MatrixOne with MySQL in detail.
+
+| | MySQL | MatrixOne |
+| ----------------------- | ---------------------------------------------- | --------------------------------------------- |
+| Version | 8.0.37 | Latest version |
+| License | GPL License | Apache License 2.0 |
+| Architecture | Centralized database | Distributed cloud-native database |
+| Load types | OLTP; analytics depends on enterprise HeatWave | HTAP, time series |
+| Storage engine | InnoDB/MyISAM (row store) | TAE |
+| Interaction | SQL | SQL |
+| Deployment | Standalone / master-slave | Standalone / master-slave / distributed / K8s |
+| Scale-out | Relies on sharding middleware | Native support |
+| Transactions | Pessimistic/optimistic + 4 ANSI isolation levels (InnoDB) | Pessimistic/optimistic + RC/SI |
+| Data types | Numeric, date/time, character, JSON, spatial | Numeric, date/time, character, JSON, vector |
+| Indexes and constraints | Primary key, unique key, foreign key | Primary key, unique key, foreign key |
+
+Additional details can be
found in [MatrixOne's MySQL compatibility details](../../Overview/feature/mysql-compatibility.md).
+
+Overall, MatrixOne is a highly MySQL-compatible cloud-native HTAP database that works seamlessly with most MySQL-based applications, while offering far greater scalability and the ability to carry other types of business load. In addition, thanks to MatrixOne's storage-compute separation and multi-tenancy, users can flexibly design their application architecture, using MatrixOne as a one-stop solution for load isolation problems previously handled by applications, middleware, or other databases.
+
+ +
+
+For MySQL users, MatrixOne is the more appropriate option when they hit bottlenecks such as:
+
+* A single table grows past the tens-of-millions-of-rows level, query performance degrades, and table sharding becomes necessary.
+* The overall data volume exceeds the terabyte level, and MySQL requires very expensive physical machines.
+* Multi-table join queries, or aggregate analytical queries over large single tables, are needed.
+* Large-scale real-time data writes are required, such as millions of records per second.
+* Multi-tenant design is needed at the application level, as in SaaS scenarios.
+* Capacity must be scaled up and down regularly as business application load changes.
+* Data must constantly flow between systems and be processed collaboratively.
+* The database needs to run alongside the application stack in a Kubernetes environment to reduce operational complexity.
+* Streaming data processing, such as real-time data ingestion and processing, is needed.
+* Vector data needs to be stored and searched.
+
+In MatrixOne's technical blog, we also have more articles on comparing MySQL with MatrixOne and on migration.
+ +[Comprehensive Comparison of MatrixOne and MySQL--Deployment Article](https://mp.weixin.qq.com/s?__biz=Mzg2NjU2ODUwMA==&mid=2247491148&idx=2&sn=a83e592da9504d6b4ab356abd6cc2369&chksm=cf9274a6b133599752c811ea241d1c0b25fc44dcc255bf907de131b9a9bb6972d5ebd076d1b6&scene=0&xtrack=1#rd) + +[Comprehensive Comparison of MatrixOne and MySQL--Multitenant Articles](https://mp.weixin.qq.com/s?__biz=Mzg2NjU2ODUwMA==&mid=2247491293&idx=1&sn=e1967b12371a7f8b57b336d1f8ada986&chksm=cf974c93821360fb559c865b5eba71adb155c410a99e3bc4d0f7aac675a80eab6d95a24853f6&scene=0&xtrack=1#rd) + +[Comprehensive Comparison of MatrixOne and MySQL--Migration Article](https://mp.weixin.qq.com/s?__biz=Mzg2NjU2ODUwMA==&mid=2247491369&idx=2&sn=a0bab26c2709edd7bc278a1bcbb07d64&chksm=cf3ea15bec8aef761e476a5281b9723638c90f059af813b0c0cc799a3256a92fc96d483e0670&scene=0&xtrack=1#rd) diff --git a/docs/MatrixOne/Reference/1.1-System-tables.md b/docs/MatrixOne/Reference/1.1-System-tables.md deleted file mode 100644 index 9dc52ee2c..000000000 --- a/docs/MatrixOne/Reference/1.1-System-tables.md +++ /dev/null @@ -1,644 +0,0 @@ -# MatrixOne System Database and Tables - -MatrixOne system database and tables are where MatrixOne stores system information. We can access the system information through them. MatrixOne creates 6 system databases at initialization: `mo_catalog`, `information_schema`, `system_metrcis`, `system`, `mysql`, and `mo_task`. `mo_task` is under development and have no direct impact on users. -The other system databases and table functions are described in this document. - -The system can only modify system databases and tables, and users can only read from them. - -## `mo_catalog` database - -`mo_catalog` stores the metadata of MatrixOne objects: databases, tables, columns, system variables, accounts, users, and roles. - -Start with MatrixOne 0.6 has introduced the concept of multi-account, the default `sys` account and other accounts have slightly different behaviors. 
The system table `mo_account`, which serves the multi-tenancy management, is only visible for the `sys` account; the other accounts don't see this table. - -### mo_indexes table - -| Column | Type | comments | -| -----------------| --------------- | ----------------- | -| id | BIGINT UNSIGNED(64) | index ID | -| table_id | BIGINT UNSIGNED(64) | ID of the table where the index resides | -| database_id | BIGINT UNSIGNED(64) | ID of the database where the index resides | -| name | VARCHAR(64) | name of the index | -| type | VARCHAR(11) | The type of index, including primary key index (PRIMARY), unique index (UNIQUE), secondary index (MULTIPLE) | -| is_visible | TINYINT(8) | Whether the index is visible, 1 means visible, 0 means invisible (currently all MatrixOne indexes are visible indexes) | -| hidden | TINYINT(8) | Whether the index is hidden, 1 is a hidden index, 0 is a non-hidden index| -| comment | VARCHAR(2048) | Comment information for the index | -| column_name | VARCHAR(256) | The column name of the constituent columns of the index | -| ordinal_position | INT UNSIGNED(32) | Column ordinal in index, starting from 1 | -| options | TEXT(0) | options option information for index | -| index_table_name | VARCHAR(5000) | The table name of the index table corresponding to the index, currently only the unique index contains the index table | - -### mo_table_partitions table - -| Column | Type | comments | -| ------------ | ------------ | ------------ | -| table_id | BIGINT UNSIGNED(64) | The ID of the current partitioned table. | -| database_id | BIGINT UNSIGNED(64) | The ID of the database to which the current partitioned table belongs. | -| number | SMALLINT UNSIGNED(16) | The current partition number. All partitions are indexed in the order they are defined, with 1 assigned to the first partition. | -| name | VARCHAR(64) | The name of the partition. | -| partition_type | VARCHAR(50) | Stores the partition type information for the table. 
For partitioned tables, the values can be "KEY", "LINEAR_KEY", "HASH", "LINEAR_KEY_51", "RANGE", "RANGE_COLUMNS", "LIST", "LIST_COLUMNS". For non-partitioned tables, the value is an empty string. Note: MatrixOne does not currently support RANGE and LIST partitioning. | -| partition_expression | VARCHAR(2048) | The expression for the partitioning function used in the CREATE TABLE or ALTER TABLE statement that created the partitioned table's partitioning scheme. | -| description_utf8 | TEXT(0) | This column is used for RANGE and LIST partitions. For a RANGE partition, it contains the value set in the partition's VALUES LESS THAN clause, which can be an integer or MAXVALUE. For a LIST partition, this column contains the values defined in the partition's VALUES IN clause, which is a comma-separated list of integer values. For partitions with partition_type other than RANGE or LIST, this column is always NULL. Note: MatrixOne does not currently support RANGE and LIST partitioning, so this column is NULL. | -| comment | VARCHAR(2048) | The text of the comment, if the partition has one. Otherwise, this value is empty. | -| options | TEXT(0) | Partition options information, currently set to NULL. | -| partition_table_name | VARCHAR(1024) | The name of the subtable corresponding to the current partition. 
| - -### mo_user table - -| Column | Type | comments | -| --------------------- | ------------ | ------------------- | -| user_id | int | user id, primary key | -| user_host | varchar(100) | user host address | -| user_name | varchar(100) | user name | -| authentication_string | varchar(100) | authentication string encrypted with password | -| status | varchar(8) | open,locked,expired | -| created_time | timestamp | user created time | -| expired_time | timestamp | user expired time | -| login_type | varchar(16) | ssl/password/other | -| creator | int | the creator id who created this user | -| owner | int | the admin id for this user | -| default_role | int | the default role id for this user | - -### mo_account table (Only visible for `sys` account) - -| Column | Type | comments | -| ------------ | ------------ | ------------ | -| account_id | int unsigned | account id, primary key | -| account_name | varchar(100) | account name | -| status | varchar(100) | open/suspend | -| created_time | timestamp | create time | -| comments | varchar(256) | comment | -| suspended_time | TIMESTAMP | Time of the account's status is changed | -| version | bigint unsigned | the version status of the current account| - -### mo_database table - -| Column | Type | comments | -| ---------------- | --------------- | --------------------------------------- | -| dat_id | bigint unsigned | Primary key ID | -| datname | varchar(100) | Database name | -| dat_catalog_name | varchar(100) | Database catalog name, default as `def` | -| dat_createsql | varchar(100) | Database creation SQL statement | -| owner | int unsigned | Role id | -| creator | int unsigned | User id | -| created_time | timestamp | Create time | -| account_id | int unsigned | Account id | -| dat_type | varchar(23) | Database type, common library or subscription library | - -### mo_role table - -| Column | Type | comments | -| ------------ | ------------ | ----------------------------- | -| role_id | int unsigned | role id, 
primary key | -| role_name | varchar(100) | role name | -| creator | int unsigned | user_id | -| owner | int unsigned | MOADMIN/ACCOUNTADMIN ownerid | -| created_time | timestamp | create time | -| comments | text | comment | - -### mo_user_grant table - -| Column | Type | comments | -| ----------------- | ------------ | ----------------------------------- | -| role_id | int unsigned | ID of the granted role, associated primary key | -| user_id | int unsigned | ID of the user who received the role, associated primary key | -| granted_time | timestamp | granted time | -| with_grant_option | bool | Whether the granted user is allowed to grant the role to another user or role | - -### mo_role_grant table - -| Column | Type | comments | -| ----------------- | ------------ | ----------------------------------- | -| granted_id | int | the role id being granted, associated primary key | -| grantee_id | int | the role id to grant others, associated primary key | -| operation_role_id | int | operation role id | -| operation_user_id | int | operation user id | -| granted_time | timestamp | granted time | -| with_grant_option | bool | Whether the granted role can be granted to another user or role | - -### mo_role_privs table - -| Column | Type | comments | -| ----------------- | --------------- | ----------------------------------- | -| role_id | int | role id, associated primary key | -| role_name | varchar(100) | role name: accountadmin/public | -| obj_type | varchar(16) | object type: account/database/table, associated primary key | -| obj_id | bigint unsigned | object id, associated primary key | -| privilege_id | int | privilege id, associated primary key | -| privilege_name | varchar(100) | privilege name: the list of privileges | -| privilege_level | varchar(100) | level of privileges, associated primary key | -| operation_user_id | int unsigned | operation user id | -| granted_time | timestamp | granted time | -| with_grant_option | bool | If
permission granting is permitted | - -### mo_user_defined_function table - -| Column | Type | comments | -| -----------------| --------------- | ----------------- | -| function_id | INT(32) | ID of the function, primary key | -| name | VARCHAR(100) | the name of the function | -| owner | INT UNSIGNED(32) | ID of the role who created the function | -| args | TEXT(0) | Argument list for the function | -| rettype | VARCHAR(20) | return type of the function | -| body | TEXT(0) | function body | -| language | VARCHAR(20) | language used by the function | -| db | VARCHAR(100) | database where the function is located | -| definer | VARCHAR(50) | name of the user who defined the function | -| modified_time | TIMESTAMP(0) | time when the function was last modified | -| created_time | TIMESTAMP(0) | creation time of the function | -| type | VARCHAR(10) | type of function, default FUNCTION | -| security_type | VARCHAR(10) | security processing method, uniform value DEFINER | -| comment | VARCHAR(5000) | comment specified when the function was created | -| character_set_client | VARCHAR(64) | Client character set: utf8mb4 | -| collation_connection | VARCHAR(64) | Connection collation: utf8mb4_0900_ai_ci | -| database_collation | VARCHAR(64) | Database connection collation: utf8mb4_0900_ai_ci | - -### mo_mysql_compatbility_mode table - -| Column | Type | comments | -| -----------------| --------------- | ----------------- | -| configuration_id | INT(32) | Configuration item id, an auto-increment column, used as a primary key to distinguish different configurations | -| account_name | VARCHAR(300) | The name of the tenant where the configuration is located | -| dat_name | VARCHAR(5000) | The name of the database where the configuration is located | -| configuration | JSON(0) | Configuration content, saved in JSON format | - -### mo_pubs table - -| Column | Type | comments | -| -----------------| --------------- | ----------------- | -| pub_name | VARCHAR(64) | publication name | -| database_name |
VARCHAR(5000) | The name of the published database | -| database_id | BIGINT UNSIGNED(64) | ID of the published database, corresponding to dat_id in the mo_database table | -| all_table | BOOL(0) | Whether the publication contains all tables in the database corresponding to database_id | -| all_account | BOOL(0) | Whether all accounts can subscribe to the publication | -| table_list | TEXT(0) | When all_table is false, the list of tables contained in the publication; the table names correspond to tables in the database identified by database_id | -| account_list | TEXT(0) | When all_account is false, the list of accounts allowed to subscribe to the publication | -| created_time | TIMESTAMP(0) | Time when the publication was created | -| owner | INT UNSIGNED(32) | ID of the role that created the publication | -| creator | INT UNSIGNED(32) | ID of the user who created the publication | -| comment | TEXT(0) | Comment specified when the publication was created | - -### mo_stages table - -| Column | Type | comments | -| -----------------| ---------------- | ----------------- | -| stage_id | INT UNSIGNED(32) | data stage ID | -| stage_name | VARCHAR(64) | data stage name | -| url | TEXT(0) | Path to object storage (without authentication), or path to a file system | -| stage_credentials| TEXT(0) | Authentication information, encrypted and saved | -| stage_status | VARCHAR(64) | ENABLED/DISABLED, default: DISABLED | -| created_time | TIMESTAMP(0) | creation time | -| comment | TEXT(0) | comment | - -### mo_sessions view - -| Column | Type | comments | -| --------------- | ----------------- | ------------------------------------------------------------ | -| node_id | VARCHAR(65535) | Unique identifier for MatrixOne nodes. Once started, it cannot be changed. | -| conn_id | INT UNSIGNED | Unique identifier associated with client's TCP connections in MatrixOne, generated by Hakeeper.
| -| session_id | VARCHAR(65535) | Unique UUID used to identify sessions. A new UUID is generated for each new session. | -| account | VARCHAR(65535) | Name of the tenant. | -| user | VARCHAR(65535) | Name of the user. | -| host | VARCHAR(65535) | IP address and port where the CN node receives client requests. | -| db | VARCHAR(65535) | Name of the database used when executing SQL. | -| session_start | VARCHAR(65535) | Timestamp when the session was created. | -| command | VARCHAR(65535) | Type of MySQL command, such as COM_QUERY, COM_STMT_PREPARE, COM_STMT_EXECUTE, etc. | -| info | VARCHAR(65535) | The executed SQL statement. Multiple statements may be present within a single SQL statement. | -| txn_id | VARCHAR(65535) | Unique identifier for the related transaction. | -| statement_id | VARCHAR(65535) | Unique identifier (UUID) for a statement within the SQL. | -| statement_type | VARCHAR(65535) | Type of a statement within the SQL, such as SELECT, INSERT, UPDATE, etc. | -| query_type | VARCHAR(65535) | Category of a statement within the SQL, such as DQL (Data Query Language), TCL (Transaction Control Language), etc. | -| sql_source_type | VARCHAR(65535) | Source of a statement within the SQL, such as external or internal. | -| query_start | VARCHAR(65535) | Timestamp when a statement within the SQL started execution. | -| client_host | VARCHAR(65535) | IP address and port of the client. | -| role | VARCHAR(65535) | Role name of the user. | - -### `mo_configurations` table - -| Column Name | Data Type | Description | -| ------------- | --------------- | --------------------------------------- | -| node_type | VARCHAR(65535) | Type of the node: cn, tn, log, proxy. | -| node_id | VARCHAR(65535) | Unique identifier for the node. | -| name | VARCHAR(65535) | Name of the configuration setting, prefixed with nested structure if applicable. | -| current_value | VARCHAR(65535) | Current value of the configuration setting. 
| -| default_value | VARCHAR(65535) | Default value of the configuration setting. | -| internal | VARCHAR(65535) | Indicates whether the configuration parameter is internal. | - -### mo_locks view - -| Column | Type | comments | -| ------------- | --------------- | ------------------------------------------------ | -| txn_id | VARCHAR(65535) | Transaction holding the lock. | -| table_id | VARCHAR(65535) | The table on which the lock is placed. | -| lock_type | VARCHAR(65535) | Type of lock, which can be `point` or `range`. | -| lock_content | VARCHAR(65535) | Locked content, represented in hexadecimal. For `range` locks, it represents a range; for `point` locks, it represents a single value. | -| lock_mode | VARCHAR(65535) | Lock mode, which can be `shared` or `exclusive`.| -| lock_status | VARCHAR(65535) | Lock status can be `wait`, `acquired`, or `none`.
`wait`: No transaction holds the lock, but transactions are waiting for it.
`acquired`: A transaction holds the lock.
`none`: No transaction holds the lock, and no transactions await it. | -| waiting_txns | VARCHAR(65535) | Transactions waiting on this lock. | - -### `mo_variables` view - -| Column | Type | comments | -| ------------- | --------------- | ------------------------------------------------ | -| configuration_id | INT(32) | Auto-incremented identifier for each configuration. | -| account_id | INT(32) | Unique identifier for the account (tenant). | -| account_name | VARCHAR(300) | Name of the account (tenant). | -| dat_name | VARCHAR(5000)| Name of the database. | -| variable_name | VARCHAR(300) | Name of the configuration variable. | -| variable_value | VARCHAR(5000)| Value of the configuration variable. | -| system_variables | BOOL(0) | Indicates whether the configuration variable is a system-level variable. | - -### `mo_transactions` view - -| Column | Type | comments | -| ------------- | --------------- | ------------------------------------------------ | -| cn_id | VARCHAR(65535) | Unique identifier for the CN (Compute Node). | -| txn_id | VARCHAR(65535) | Unique identifier for the transaction. | -| create_ts | VARCHAR(65535) | Records the transaction creation timestamp, following the RFC3339Nano format ("2006-01-02T15:04:05.999999999Z07:00"). | -| snapshot_ts | VARCHAR(65535) | Represents the snapshot timestamp of the transaction, in a format that combines physical and logical time. | -| prepared_ts | VARCHAR(65535) | Represents the prepared timestamp of the transaction, in a format that combines physical and logical time. | -| commit_ts | VARCHAR(65535) | Represents the commit timestamp of the transaction, in a format that combines physical and logical time. | -| txn_mode | VARCHAR(65535) | Indicates the transaction mode, which can be either pessimistic or optimistic. | -| isolation | VARCHAR(65535) | Represents the isolation level of the transaction, which can be SI (Snapshot Isolation) or RC (Read Committed). 
| -| user_txn | VARCHAR(65535) | Indicates a user transaction, i.e., a transaction created by SQL operations executed by a user connecting to MatrixOne via a client. | -| txn_status | VARCHAR(65535) | Represents the current status of the transaction, with possible values including active, committed, aborting, aborted. In distributed transaction 2PC mode, it may also include prepared and committing. | -| table_id | VARCHAR(65535) | Represents the ID of the table involved in the transaction. | -| lock_key | VARCHAR(65535) | Represents the type of lock, which can be either range (range lock) or point (point lock). | -| lock_content | VARCHAR(65535) | Represents the value for a point lock or a range for a range lock, usually in the form "low - high." Please note that a transaction may involve multiple locks, but only the first lock is displayed here. | -| lock_mode | VARCHAR(65535) | Represents the lock mode, which can be either exclusive or shared. | - -### mo_columns table - -| Column | type | comments | -| --------------------- | --------------- | ------------------------------------------------------------ | -| att_uniq_name | varchar(256) | Primary Key. Hidden, composite primary key, format is like "${att_relname_id}-${attname}" | -| account_id | int unsigned | accountID | -| att_database_id | bigint unsigned | databaseID | -| att_database | varchar(256) | database Name | -| att_relname_id | bigint unsigned | table id | -| att_relname | varchar(256) | The table this column belongs to.(references mo_tables.relname) | -| attname | varchar(256) | The column name | -| atttyp | varchar(256) | The data type of this column (zero for a dropped column). | -| attnum | int | The number of the column. Ordinary columns are numbered from 1 up. | -| att_length | int | bytes count for the type. | -| attnotnull | tinyint(1) | This represents a not-null constraint. | -| atthasdef | tinyint(1) | This column has a default expression or generation expression. 
| -| att_default | varchar(1024) | default expression | -| attisdropped | tinyint(1) | This column has been dropped and is no longer valid. A dropped column is still physically present in the table, but is ignored by the parser and so cannot be accessed via SQL. | -| att_constraint_type | char(1) | p = primary key constraint, n = no constraint | -| att_is_unsigned | tinyint(1) | unsigned or not | -| att_is_auto_increment | tinyint(1) | auto increment or not | -| att_comment | varchar(1024) | comment | -| att_is_hidden | tinyint(1) | hidden or not | -| attr_has_update | tinyint(1) | This column has an update expression | -| attr_update | varchar(1024) | update expression | -| attr_is_clusterby | tinyint(1) | Whether this column is used as the `cluster by` key when creating the table | - -### mo_tables table - -| Column | Type | comments | -| -------------- | --------------- | ------------------------------------------------------------ | -| rel_id | bigint unsigned | Primary key, table ID | -| relname | varchar(100) | Name of the table, index, view, and so on. | -| reldatabase | varchar(100) | The database that contains this relation. references mo_database.datname | -| reldatabase_id | bigint unsigned | The database id that contains this relation.
references mo_database.dat_id | -| relpersistence | varchar(100) | p = permanent table, t = temporary table | -| relkind | varchar(100) | r = ordinary table, e = external table, i = index, S = sequence, v = view, m = materialized view | -| rel_comment | varchar(100) | | -| rel_createsql | varchar(100) | Table creation SQL statement | -| created_time | timestamp | Create time | -| creator | int unsigned | Creator ID | -| owner | int unsigned | Creator's default role id | -| account_id | int unsigned | Account id | -| partitioned | blob | Partition by statement | -| partition_info | blob | partition information | -| viewdef | blob | View definition statement | -| constraint | varchar(5000) | Table-related constraints | -| rel_version | INT UNSIGNED(0) | Version number of the primary key or table | -| catalog_version | INT UNSIGNED(0) | Version number of the system table | - -## `system_metrics` database - -`system_metrics` collects the status and statistics of SQL statements and CPU & memory resource usage. - -`system_metrics` tables share mostly the same column types; the fields in these tables are described as follows: - -- collecttime: collection time -- value: the value of the collected metric -- node: the MatrixOne node uuid -- role: the MatrixOne node role, can be CN, TN or LOG. -- account: default "sys", the account that issued the SQL request. -- type: SQL type, can be `select`, `insert`, `update`, `delete`, or `other`. - -### `metric` table - -| Column | Type | Comment | -| ----------- | ------------ | ------------------------------------------------------------ | -| metric_name | VARCHAR(128) | metric name, like: sql_statement_total, server_connections, process_cpu_percent, sys_memory_used, ..
| -| collecttime | DATETIME | metric data collect time | -| value | DOUBLE | metric value | -| node | VARCHAR(36) | MatrixOne node uuid | -| role | VARCHAR(32) | MatrixOne node role | -| account | VARCHAR(128) | account name, default "sys" | -| Type | VARCHAR(32) | SQL type: like insert, select, update ... | - -The other tables are all views of the `metric` table: - -* `process_cpu_percent` table: Process CPU busy percentage. -* `process_open_fs` table: Number of open file descriptors. -* `process_resident_memory_bytes` table: Resident memory size in bytes. -* `server_connection` table: Server connection numbers. -* `sql_statement_errors` table: Counter of sql statements executed with errors. -* `sql_statement_total` table: Counter of executed sql statements. -* `sql_transaction_errors` table: Counter of transactional statements executed with errors. -* `sql_statement_hotspot` table: Records the longest-running SQL query executed by each tenant within each minute. Only SQL queries whose execution time exceeds a certain aggregation threshold are included in the statistics. -* `sql_transaction_total` table: Counter of transactional sql statements. -* `sys_cpu_combined_percent` table: System CPU busy percentage, averaged among all logical cores. -* `sys_cpu_seconds_total` table: System CPU time spent in seconds, normalized by number of cores. -* `sys_disk_read_bytes` table: System disk read in bytes. -* `sys_disk_write_bytes` table: System disk write in bytes. -* `sys_memory_available` table: System memory available in bytes. -* `sys_memory_used` table: System memory used in bytes. -* `sys_net_recv_bytes` table: System net received in bytes. -* `sys_net_sent_bytes` table: System net sent in bytes. - -## `system` database - -The `system` database stores MatrixOne's historical SQL statements, system logs, and error information. - -### `statement_info` table - -It records user and system SQL statements with detailed information.
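As a quick illustration (a sketch only, assuming a running MatrixOne instance and the column set listed below; not part of the original reference), slow statements can be pulled from this table with an ordinary query:

```sql
-- Illustrative sketch: the 10 slowest successful statements of the past hour.
-- duration is recorded in nanoseconds, so divide to express it in seconds.
SELECT statement_id, statement, duration / 1000000000 AS duration_s, rows_read
FROM system.statement_info
WHERE status = 'Success'
  AND request_at >= NOW() - INTERVAL 1 HOUR
ORDER BY duration DESC
LIMIT 10;
```

Because `statement_id` and `transaction_id` also appear in `rawlog`, the same identifiers can be used to join statements to their logs.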
- -| Column | Type | Comments | -| --------------------- | ------------- | ------------------------------------------------------------ | -| statement_id | VARCHAR(36) | statement unique id | -| transaction_id | VARCHAR(36) | Transaction unique id | -| session_id | VARCHAR(36) | session unique id | -| account | VARCHAR(1024) | account name | -| user | VARCHAR(1024) | user name | -| host | VARCHAR(1024) | user client ip | -| database | VARCHAR(1024) | the database the current session is in | -| statement | TEXT | sql statement | -| statement_tag | TEXT | note tag in statement (reserved) | -| statement_fingerprint | TEXT | note tag in statement (reserved) | -| node_uuid | VARCHAR(36) | node uuid, which node generated this data | -| node_type | VARCHAR(64) | node type in MO, val in [TN, CN, LOG] | -| request_at | DATETIME | request accept datetime | -| response_at | DATETIME | response send datetime | -| duration | BIGINT | exec time, unit: ns | -| status | VARCHAR(32) | sql statement running status, enum: Running, Success, Failed | -| err_code | VARCHAR(1024) | error code | -| error | TEXT | error message | -| exec_plan | JSON | statement execution plan | -| rows_read | BIGINT | total rows read | -| bytes_scan | BIGINT | total bytes scanned | -| stats | JSON | global stats info in exec_plan | -| statement_type | VARCHAR(1024) | statement type, val in [Insert, Delete, Update, Drop Table, Drop User, ...] | -| query_type | VARCHAR(1024) | query type, val in [DQL, DDL, DML, DCL, TCL] | -| role_id | BIGINT | role id | -| sql_source_type | TEXT | type of SQL source internally generated by MatrixOne | -| aggr_count | BIGINT(64) | the number of statements aggregated | -| result_count | BIGINT(64) | the number of rows in the sql execution result | - -### `rawlog` table - -It records very detailed system logs.
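For example (again a sketch rather than part of the original reference, assuming the columns described below), recent error-level entries can be filtered directly:

```sql
-- Illustrative sketch: the 20 most recent error-or-worse log entries.
SELECT timestamp, level, caller, message
FROM system.rawlog
WHERE level IN ('error', 'panic', 'fatal')
ORDER BY timestamp DESC
LIMIT 20;
```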
| - -| Column | Type | Comments | -| -------------- | ------------- | ------------------------------------------------------------ | -| raw_item | VARCHAR(1024) | raw log item | -| node_uuid | VARCHAR(36) | node uuid, which node generated this data | -| node_type | VARCHAR(64) | node type in MO, val in [TN, CN, LOG] | -| span_id | VARCHAR(16) | span unique id | -| statement_id | VARCHAR(36) | statement unique id | -| logger_name | VARCHAR(1024) | logger name | -| timestamp | DATETIME | timestamp of action | -| level | VARCHAR(1024) | log level, enum: debug, info, warn, error, panic, fatal | -| caller | VARCHAR(1024) | where the log was written, like: package/file.go:123 | -| message | TEXT | log message | -| extra | JSON | log dynamic fields | -| err_code | VARCHAR(1024) | error code | -| error | TEXT | error message | -| stack | VARCHAR(4096) | | -| span_name | VARCHAR(1024) | span name, for example: step name of execution plan, function name in code, ... | -| parent_span_id | VARCHAR(16) | parent span unique id | -| start_time | DATETIME | | -| end_time | DATETIME | | -| duration | BIGINT | exec time, unit: ns | -| resource | JSON | static resource information | - -The other three tables (`log_info`, `span_info`, and `error_info`) are views of the `statement_info` and `rawlog` tables. - -## `information_schema` database - -Information Schema provides an ANSI-standard way of viewing system metadata. MatrixOne also provides a number of custom `information_schema` tables, in addition to the tables included for MySQL compatibility. - -Many `INFORMATION_SCHEMA` tables have a corresponding `SHOW` command. The benefit of querying `INFORMATION_SCHEMA` is that it is possible to join between tables. - -### Tables for MySQL compatibility - -| Table Name | Description | -| :--------------- | :----------------------------------------------------------- | -| KEY_COLUMN_USAGE | Describes the key constraints of the columns, such as the primary key constraint.
| -| COLUMNS | Provides a list of columns for all tables. | -| PROFILING | Provides some profiling information during SQL statement execution. | -| PROCESSLIST | Provides similar information to the command `SHOW PROCESSLIST`. | -| USER_PRIVILEGES | Summarizes the privileges associated with the current user. | -| SCHEMATA | Provides similar information to `SHOW DATABASES`. | -| CHARACTER_SETS | Provides a list of character sets the server supports. | -| TRIGGERS | Provides similar information to `SHOW TRIGGERS`. | -| TABLES | Provides a list of tables that the current user has visibility of. Similar to `SHOW TABLES`. | -| PARTITIONS | Provides information about table partitions. | -| VIEWS | Provides information about views in the database. | -| ENGINES | Provides a list of supported storage engines. | -| ROUTINES | Provides some information about stored procedures. | -| PARAMETERS | Provides information about stored procedures' parameters and return values. | -| KEYWORDS | Provides information about keywords in the database; see [Keywords](Language-Structure/keywords.md) for details. | - -### `CHARACTER_SETS` table - -The description of columns in the `CHARACTER_SETS` table is as follows: - -- `CHARACTER_SET_NAME`: The name of the character set. -- `DEFAULT_COLLATE_NAME`: The default collation name of the character set. -- `DESCRIPTION`: The description of the character set. -- `MAXLEN`: The maximum length required to store a character in this character set. - -### `COLUMNS` table - -The description of columns in the `COLUMNS` table is as follows: - -- `TABLE_CATALOG`: The name of the catalog to which the table with the column belongs. The value is always `def`. -- `TABLE_SCHEMA`: The name of the schema in which the table with the column is located. -- `TABLE_NAME`: The name of the table with the column. -- `COLUMN_NAME`: The name of the column. -- `ORDINAL_POSITION`: The position of the column in the table. -- `COLUMN_DEFAULT`: The default value of the column.
If the explicit default value is `NULL`, or if the column definition does not include the `default` clause, this value is `NULL`. -- `IS_NULLABLE`: Whether the column is nullable. If the column can store null values, this value is `YES`; otherwise, it is `NO`. -- `DATA_TYPE`: The type of data in the column. -- `CHARACTER_MAXIMUM_LENGTH`: For string columns, the maximum length in characters. -- `CHARACTER_OCTET_LENGTH`: For string columns, the maximum length in bytes. -- `NUMERIC_PRECISION`: The numeric precision of a number-type column. -- `NUMERIC_SCALE`: The numeric scale of a number-type column. -- `DATETIME_PRECISION`: For time-type columns, the fractional seconds precision. -- `CHARACTER_SET_NAME`: The name of the character set of a string column. -- `COLLATION_NAME`: The name of the collation of a string column. -- `COLUMN_TYPE`: The column type. -- `COLUMN_KEY`: Whether this column is indexed. This field might have the following values: - - `Empty`: This column is not indexed, or this column is indexed and is the second column in a multi-column non-unique index. - - `PRI`: This column is the primary key or one of multiple primary keys. - - `UNI`: This column is the first column of the unique index. - - `MUL`: The column is the first column of a non-unique index, in which a given value is allowed to occur multiple times. -- `EXTRA`: Any additional information about the given column. -- `PRIVILEGES`: The privileges that the current user has on this column. -- `COLUMN_COMMENT`: Comments contained in the column definition. -- `GENERATION_EXPRESSION`: For generated columns, this value displays the expression used to calculate the column value. For non-generated columns, the value is empty. -- `SRS_ID`: This value applies to spatial columns. It contains the column's `SRID` value, which indicates the spatial reference system for values stored in the column.
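To make these columns concrete, a typical metadata query looks like the following (a sketch; the `mo_catalog.mo_tables` table is used purely as an example, and any visible table works):

```sql
-- Illustrative sketch: list each column of one table in definition order.
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_KEY, COLUMN_DEFAULT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'mo_catalog'
  AND TABLE_NAME = 'mo_tables'
ORDER BY ORDINAL_POSITION;
```

Unlike `SHOW COLUMNS`, this result set can be joined against other `INFORMATION_SCHEMA` tables such as `TABLES` or `STATISTICS`.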
| - -### `ENGINES` table - -The description of columns in the `ENGINES` table is as follows: - -- `ENGINE`: The name of the storage engine. -- `SUPPORT`: The level of support that the server has for the storage engine. -- `COMMENT`: A brief comment on the storage engine. -- `TRANSACTIONS`: Whether the storage engine supports transactions. -- `XA`: Whether the storage engine supports XA transactions. -- `SAVEPOINTS`: Whether the storage engine supports `savepoints`. - -### `PARTITIONS` table - -The description of columns in the `PARTITIONS` table is as follows: - -- `TABLE_CATALOG`: The name of the catalog to which the table belongs. This value is always `def`. -- `TABLE_SCHEMA`: The name of the schema (database) to which the table belongs. -- `TABLE_NAME`: The name of the table containing the partition. -- `PARTITION_NAME`: The name of the partition. -- `SUBPARTITION_NAME`: If the `PARTITIONS` table row represents a subpartition, the name of the subpartition; otherwise NULL. -- `PARTITION_ORDINAL_POSITION`: All partitions are indexed in the same order as they are defined, with 1 being the number assigned to the first partition. The indexing can change as partitions are added, dropped, and reorganized; the number shown in this column reflects the current order, taking into account any indexing changes. -- `SUBPARTITION_ORDINAL_POSITION`: Subpartitions within a given partition are also indexed and reindexed in the same manner as partitions are indexed within a table. -- `PARTITION_METHOD`: One of the values `RANGE`, `LIST`, `HASH`, `LINEAR HASH`, `KEY`, or `LINEAR KEY`. __Note:__ MatrixOne does not currently support RANGE and LIST partitioning. -- `SUBPARTITION_METHOD`: One of the values `HASH`, `LINEAR HASH`, `KEY`, or `LINEAR KEY`. -- `PARTITION_EXPRESSION`: The expression for the partitioning function used in the `CREATE TABLE` or `ALTER TABLE` statement that created the table's current partitioning scheme.
-- `SUBPARTITION_EXPRESSION`: This works in the same fashion for the subpartitioning expression that defines the subpartitioning for a table as `PARTITION_EXPRESSION` does for the partitioning expression used to define a table's partitioning. If the table has no subpartitions, this column is `NULL`. -- `PARTITION_DESCRIPTION`: This column is used for `RANGE` and `LIST` partitions. For a `RANGE` partition, it contains the value set in the partition's `VALUES LESS THAN` clause, which can be either an integer or `MAXVALUE`. For a `LIST` partition, this column contains the values defined in the partition's `VALUES IN` clause, which is a list of comma-separated integer values. For partitions whose `PARTITION_METHOD` is other than `RANGE` or `LIST`, this column is always `NULL`. __Note:__ MatrixOne does not currently support RANGE and LIST partitioning. -- `TABLE_ROWS`: The number of table rows in the partition. -- `AVG_ROW_LENGTH`: The average length of the rows stored in this partition or subpartition, in bytes. This is the same as `DATA_LENGTH` divided by `TABLE_ROWS`. -- `DATA_LENGTH`: The total length of all rows stored in this partition or subpartition, in bytes; that is, the total number of bytes stored in the partition or subpartition. -- `INDEX_LENGTH`: The length of the index file for this partition or subpartition, in bytes. -- `DATA_FREE`: The number of bytes allocated to the partition or subpartition but not used. -- `CREATE_TIME`: The time that the partition or subpartition was created. -- `UPDATE_TIME`: The time that the partition or subpartition was last modified. -- `CHECK_TIME`: The last time that the table to which this partition or subpartition belongs was checked. -- `CHECKSUM`: The checksum value, if any; otherwise `NULL`. -- `PARTITION_COMMENT`: The text of the comment, if the partition has one. If not, this value is empty. 
The maximum length for a partition comment is defined as 1024 characters, and the display width of the `PARTITION_COMMENT` column is also 1024 characters, to match this limit. -- `NODEGROUP`: This is the nodegroup to which the partition belongs. -- `TABLESPACE_NAME`: The name of the tablespace to which the partition belongs. The value is always `DEFAULT`. - -### `PROCESSLIST` table - -Fields in the `PROCESSLIST` table are described as follows: - -- ID: The ID of the user connection. -- USER: The name of the user who is executing `PROCESS`. -- HOST: The address that the user is connecting to. -- DB: The name of the currently connected default database. -- COMMAND: The command type that `PROCESS` is executing. -- TIME: The current execution duration of `PROCESS`, in seconds. -- STATE: The current connection state. -- INFO: The requested statement that is being processed. - -### `SCHEMATA` table - -The `SCHEMATA` table provides information about databases. The table data is equivalent to the result of the `SHOW DATABASES` statement. Fields in the `SCHEMATA` table are described as follows: - -- `CATALOG_NAME`: The catalog to which the database belongs. -- `SCHEMA_NAME`: The database name. -- `DEFAULT_CHARACTER_SET_NAME`: The default character set of the database. -- `DEFAULT_COLLATION_NAME`: The default collation of the database. -- `SQL_PATH`: The value of this item is always `NULL`. -- `DEFAULT_TABLE_ENCRYPTION`: Defines the *default encryption* setting for databases and general tablespaces. - -### `TABLES` table - -The description of columns in the `TABLES` table is as follows: - -- `TABLE_CATALOG`: The name of the catalog to which the table belongs. The value is always `def`. -- `TABLE_SCHEMA`: The name of the schema to which the table belongs. -- `TABLE_NAME`: The name of the table. -- `TABLE_TYPE`: The type of the table. The base table type is `BASE TABLE`, the view table type is `VIEW`, and the `INFORMATION_SCHEMA` table type is `SYSTEM VIEW`.
-- `ENGINE`: The type of the storage engine. -- `VERSION`: Version. The value is `10` by default. -- `ROW_FORMAT`: The row format. The value is one of `Compact`, `Fixed`, `Dynamic`, `Compressed`, or `Redundant`. -- `TABLE_ROWS`: The number of rows in the table according to statistics. For `INFORMATION_SCHEMA` tables, `TABLE_ROWS` is `NULL`. -- `AVG_ROW_LENGTH`: The average row length of the table. `AVG_ROW_LENGTH` = `DATA_LENGTH` / `TABLE_ROWS`. -- `DATA_LENGTH`: Data length. `DATA_LENGTH` = `TABLE_ROWS` * the sum of storage lengths of the columns in the tuple. -- `MAX_DATA_LENGTH`: The maximum data length. The value is currently `0`, which means the data length has no upper limit. -- `INDEX_LENGTH`: The index length. `INDEX_LENGTH` = `TABLE_ROWS` * the sum of lengths of the columns in the index tuple. -- `DATA_FREE`: Data fragment. The value is currently `0`. -- `AUTO_INCREMENT`: The current step of the auto-increment primary key. -- `CREATE_TIME`: The time at which the table is created. -- `UPDATE_TIME`: The time at which the table is updated. -- `CHECK_TIME`: The time at which the table is checked. -- `TABLE_COLLATION`: The collation of strings in the table. -- `CHECKSUM`: Checksum. -- `CREATE_OPTIONS`: Create options. -- `TABLE_COMMENT`: The comments and notes of the table. - -### `USER_PRIVILEGES` table - -The `USER_PRIVILEGES` table provides information about global privileges. - -Fields in the `USER_PRIVILEGES` table are described as follows: - -- `GRANTEE`: The name of the granted user, which is in the format of `'user_name'@'host_name'`. -- `TABLE_CATALOG`: The name of the catalog to which the table belongs. This value is always `def`. -- `PRIVILEGE_TYPE`: The privilege type to be granted. Only one privilege type is shown in each row. -- `IS_GRANTABLE`: If you have the `GRANT OPTION` privilege, the value is `YES`; otherwise, the value is `NO`. - -### `VIEWS` table - -- `TABLE_CATALOG`: The name of the catalog the view belongs to. The value is `def`.
-- `TABLE_SCHEMA`: The name of the database to which the view belongs. -- `TABLE_NAME`: The name of the view. -- `VIEW_DEFINITION`: The `SELECT` statement that provides the view definition. It contains most of what you see in the `Create View` column generated by `SHOW CREATE VIEW`. -- `CHECK_OPTION`: The value of the `CHECK_OPTION` property. Values are `NONE`, `CASCADE`, or `LOCAL`. -- `IS_UPDATABLE`: A flag indicating whether the view is updatable, set at `CREATE VIEW` time; if `UPDATE` and `DELETE` (and similar operations) are legal for the view, the flag is set to `YES(true)`. Otherwise, the flag is set to `NO(false)`. -- `DEFINER`: The account of the user who created the view, in the format `username@hostname`. -- `SECURITY_TYPE`: The view's `SQL SECURITY` attribute. Values are `DEFINER` or `INVOKER`. -- `CHARACTER_SET_CLIENT`: The session value of the `character_set_client` system variable when the view was created. -- `COLLATION_CONNECTION`: The session value of the `collation_connection` system variable when the view was created. - -### `STATISTICS` Table - -Provides detailed information about table indexes and statistics. For example, you can check whether an index is unique, understand the order of columns within an index, and estimate the number of unique values in an index. - -- `TABLE_CATALOG`: The catalog name of the table (always `def`). -- `TABLE_SCHEMA`: The name of the database to which the table belongs. -- `TABLE_NAME`: The name of the table. -- `NON_UNIQUE`: Indicates whether the index allows duplicate values. If 0, the index is unique. -- `INDEX_SCHEMA`: The database name to which the index belongs. -- `INDEX_NAME`: The name of the index. -- `SEQ_IN_INDEX`: The position of the column within the index. -- `COLUMN_NAME`: The name of the column. -- `COLLATION`: The collation of the column. -- `CARDINALITY`: An estimated count of unique values in the index. -- `SUB_PART`: The length of the index part. For the entire column, this value is NULL. 
-- `PACKED`: Indicates whether compressed storage is used. -- `NULLABLE`: Indicates whether the column allows NULL values. -- `INDEX_TYPE`: The index type (e.g., BTREE, HASH, etc.). -- `COMMENT`: Comment information about the index. - -## `mysql` database - -### Grant system tables - -These system tables contain grant information about user accounts and their privileges: - -- `user`: user accounts, global privileges, and other non-privilege columns. -- `db`: database-level privileges. -- `tables_priv`: table-level privileges. -- `columns_priv`: column-level privileges. -- `procs_priv`: stored procedure and stored function privileges. diff --git a/docs/MatrixOne/Reference/Data-Types/data-types.md b/docs/MatrixOne/Reference/Data-Types/data-types.md index 5a6e63d9a..ec0cfd88e 100644 --- a/docs/MatrixOne/Reference/Data-Types/data-types.md +++ b/docs/MatrixOne/Reference/Data-Types/data-types.md @@ -145,6 +145,95 @@ mysql> select min(big),max(big),max(big)-1 from floattable; 1 row in set (0.05 sec) ``` +## **Binary type** + +| Data Type | Storage Space | Minimum Value | Maximum Value | Syntax | Description | +| --------| --------| ----- | -------------------- | -------- | ---- | +| BIT | 1-8 bytes | 0 | 18446744073709551615 | BIT(M) | Data type for storing bit values. M ranges from 1 to 64 and defaults to 1; if the stored data is shorter than M bits, it is zero-padded on the left. 
| + +### **Examples** + +```sql +create table t1 (a bit); +mysql> desc t1;-- M defaults to 1, so the column type is bit(1) ++-------+--------+------+------+---------+-------+---------+ +| Field | Type | Null | Key | Default | Extra | Comment | ++-------+--------+------+------+---------+-------+---------+ +| a | BIT(1) | YES | | NULL | | | ++-------+--------+------+------+---------+-------+---------+ +1 row in set (0.01 sec) + +create table t2 (a bit(8)); + +-- Assigning values with bit-value literal syntax +insert into t2 values (0b1); +insert into t2 values (b'1'); +mysql> select * from t2; ++------------+ +| a | ++------------+ +| 0x01 | +| 0x01 | ++------------+ +2 rows in set (0.00 sec) + +truncate table t2; + +-- Assigning values with hex-value literal syntax +insert into t2 values (0x10); +insert into t2 values (x'10'); +mysql> select * from t2; ++------------+ +| a | ++------------+ +| 0x10 | +| 0x10 | ++------------+ +2 rows in set (0.00 sec) + +truncate table t2; + +-- Assignment from int values is supported, but the binary representation of the int must not be longer than the bit type. +insert into t2 values (255);-- a = b'11111111' +mysql> insert into t2 values (256);-- The binary representation of 256 is longer than 8 bits. +ERROR 20301 (HY000): invalid input: data too long, type width = 8, val = 100000000 + +mysql> select * from t2; ++------------+ +| a | ++------------+ +| 0xFF | ++------------+ +1 row in set (0.00 sec) + +truncate table t2; + +-- Floating-point data is first rounded to an int and then assigned as an int. +insert into t2 values (2.1);-- a = b'00000010' +mysql> select * from t2; ++------------+ +| a | ++------------+ +| 0x02 | ++------------+ +1 row in set (0.00 sec) + +truncate table t2; + +-- Character data is stored as its encoded value; the total encoded length of the string must not exceed the width of the bit type. 
+insert into t2 values ('a');-- a = b'01100001' +mysql> insert into t2 values ('啊');-- utf8('啊') = 0xe5958a; +ERROR 20301 (HY000): invalid input: data too long, type width = 8, val = 111001011001010110001010 + +mysql> select * from t2; ++------------+ +| a | ++------------+ +| 0x61 | ++------------+ +1 row in set (0.00 sec) +``` + ## **String Types** | Data Type | Size |Length | Syntax | Description| @@ -295,6 +384,14 @@ mysql> select * from jsontest; | DateTime | 8 bytes | microsecond | 0001-01-01 00:00:00.000000 | 9999-12-31 23:59:59.999999 | YYYY-MM-DD hh:mi:ssssss | | TIMESTAMP|8 bytes|microsecond|0001-01-01 00:00:00.000000|9999-12-31 23:59:59.999999|YYYYMMDD hh:mi:ss.ssssss| +The `TIME`, `DATE`, and `TIMESTAMP` types support the following literal prefix syntax when inserting data: + +- `Time`: {t 'xx'}, {time 'xx'} + +- `Date`: {d 'xx'}, {date 'xx'} + +- `TIMESTAMP`: {ts 'xx'}, {timestamp 'xx'} + ### **Examples** - TIME @@ -302,17 +399,18 @@ mysql> select * from jsontest; ```sql -- Create a table named "timetest" with 1 attributes of a "time" create table time_02(t1 time); -insert into time_02 values(200); -insert into time_02 values(""); +insert into time_02 values(200),(time'23:29:30'),({t'12:11:12'}),(''); mysql> select * from time_02; +----------+ | t1 | +----------+ | 00:02:00 | +| 23:29:30 | +| 12:11:12 | | NULL | +----------+ -2 rows in set (0.00 sec) +4 rows in set (0.01 sec) ``` - DATE @@ -320,17 +418,17 @@ mysql> select * from time_02; ```sql -- Create a table named "datetest" with 1 attributes of a "date" create table datetest (a date not null, primary key(a)); -insert into datetest values ('2022-01-01'), ('20220102'),('2022-01-03'),('20220104'); - -mysql> select * from datetest order by a asc; +insert into datetest values ({d'2022-01-01'}), ('20220102'),(date'2022-01-03'),({d now()}); +mysql> select * from datetest; +------------+ | a | +------------+ | 2022-01-01 | | 2022-01-02 | | 2022-01-03 | -| 2022-01-04 | +| 2024-03-19 | +------------+ +4 rows in set (0.00 sec) ``` - 
DATETIME @@ -357,17 +455,18 @@ mysql> select * from datetimetest order by a asc; ```sql -- Create a table named "timestamptest" with 1 attribute of a "timestamp" create table timestamptest (a timestamp(0) not null, primary key(a)); -insert into timestamptest values ('20200101000000'), ('2022-01-02'), ('2022-01-02 00:00:01'), ('2022-01-02 00:00:01.512345'); +insert into timestamptest values ('20200101000000'), (timestamp'2022-01-02 11:30:40'), ({ts'2022-01-02 00:00:01'}), ({ts current_timestamp}); mysql> select * from timestamptest; +---------------------+ | a | +---------------------+ | 2020-01-01 00:00:00 | -| 2022-01-02 00:00:00 | +| 2022-01-02 11:30:40 | | 2022-01-02 00:00:01 | -| 2022-01-02 00:00:02 | +| 2024-03-19 17:22:08 | +---------------------+ +4 rows in set (0.00 sec) ``` ## **Bool** @@ -453,3 +552,25 @@ mysql> select * from t1; +----------------------------------------+ 1 row in set (0.00 sec) ``` + +## **Vector Type** + +| Type | Description | +|------------|--------------------- | +| vecf32 | Vector with `float32` elements | +| vecf64 | Vector with `float64` elements | + +### **Example** + +```sql +create table t1(n1 vecf32(3), n2 vecf64(2)); +insert into t1 values("[1,2,3]",'[4,5]'); + +mysql> select * from t1; ++-----------+--------+ +| n1 | n2 | ++-----------+--------+ +| [1, 2, 3] | [4, 5] | ++-----------+--------+ +1 row in set (0.00 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Data-Types/vector-type.md b/docs/MatrixOne/Reference/Data-Types/vector-type.md new file mode 100644 index 000000000..13979773b --- /dev/null +++ b/docs/MatrixOne/Reference/Data-Types/vector-type.md @@ -0,0 +1,72 @@ +# Vector Type + +In a database, vectors are usually a set of numbers that are arranged in a particular way to represent some data or feature. These vectors can be one-dimensional arrays, multi-dimensional arrays, or data structures with higher dimensions. MatrixOne supports a vector data type. 
+ +In MatrixOne, the vector type is similar to an Array type in programming languages (MatrixOne does not currently support array types), but it is more restricted. First, it is strictly one-dimensional, so it cannot be used to build a matrix. Second, only `float32` and `float64` element types are currently supported, called `vecf32` and `vecf64` respectively; string and integer elements are not supported. + +When creating a vector column, you can specify its dimension, such as vecf32(3), which is the length of the vector array; up to 65,535 dimensions are supported. + +## How to use vector types in SQL + +The syntax for using vectors is the same as for regular table creation, data insertion, and queries. + +### Create a vector column + +The following SQL statement creates two vector columns, one of type `vecf32` and the other of type `vecf64`, both with dimension 3. + +Currently, vector types cannot be used as primary or unique keys. + +``` +create table t1(a int, b vecf32(3), c vecf64(3)); +``` + +### Insert vectors + +MatrixOne supports inserting vectors in two formats. + +**Text Format** + +``` +insert into t1 values(1, "[1,2,3]", "[4,5,6]"); +``` + +**Binary Format** + +If you have a Python NumPy array, you can insert it into MatrixOne directly by encoding the array in hexadecimal instead of converting it to comma-separated text. This is faster when inserting vectors with higher dimensions. + +```sql +insert into t1 (a, b) values (2, cast(unhex("7e98b23e9e10383b2f41133f") as blob)); + -- "7e98b23e9e10383b2f41133f" is the little-endian hexadecimal encoding of []float32{0.34881967, 0.0028086076, 0.5752134} + ``` + +### Query vectors + +Vector columns can also be read in two formats. 
+ +**Text Format** + +```sql +mysql> select a, b from t1; ++------+---------------------------------------+ +| a | b | ++------+---------------------------------------+ +| 1 | [1, 2, 3] | +| 2 | [0.34881967, 0.0028086076, 0.5752134] | ++------+---------------------------------------+ +2 rows in set (0.00 sec) +``` + +**Binary Format** + +Binary format is useful if you need to read the vector result set directly into a NumPy array with minimal conversion costs. + +```sql +mysql> select hex(b) from t1; ++--------------------------+ +| hex(b) | ++--------------------------+ +| 0000803f0000004000004040 | +| 7e98b23e9e10383b2f41133f | ++--------------------------+ +2 rows in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/bitmap.md b/docs/MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/bitmap.md new file mode 100644 index 000000000..b2562ad6e --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/bitmap.md @@ -0,0 +1,380 @@ +# BITMAP function + +## Function Description + +The `BITMAP` functions are a set of built-in functions for processing bitmaps, which are contiguous pieces of memory stored as binary data types. These functions are particularly useful for counting distinct values when dealing with hierarchical aggregations, such as multiple grouping sets; they return results consistent with [`COUNT(DISTINCT)`](count.md), but more efficiently. + +Each element is represented by a single bit, 1 for presence or 0 for absence, so the nth bit of the bitmap records whether element n exists. + +The maximum width of a bitmap is 32768 bits (2^15, i.e. 4 KB). For a non-negative integer n, its lower 15 bits (in binary) give the position in the bitmap, and the remaining high bits give the number of the bitmap bucket. 
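This high-bits/low-bits split can be sketched in Python; the snippet below is an illustrative model of the rule just described, not MatrixOne internals:

```python
# Illustrative model of the bucket/position split: a bucket holds
# 2^15 = 32768 bit positions; high bits select the bucket.
BUCKET_WIDTH = 1 << 15  # 32768

def bucket_number(n: int) -> int:
    # Everything above the lower 15 bits selects the bucket.
    return n >> 15

def bit_position(n: int) -> int:
    # The lower 15 bits select the position inside the bucket.
    return n & (BUCKET_WIDTH - 1)

print(bucket_number(40000), bit_position(40000))  # 1 7232
```

For example, 40000 is 0b1001110001000000 in binary: the high bit gives bucket 1, and the lower 15 bits (0b001110001000000 = 7232) give the position inside that bucket.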
The following diagram shows the logic of bitmap: + +![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/reference/bitmap.png) + +Each bucket is a bitmap, and since the buckets are orthogonal, operations such as `or` and `bit_count` can be performed on each bucket independently, without involving the other buckets. + +Here are some common `BITMAP` functions and their usage: + +### BITMAP_BUCKET_NUMBER + +The `BITMAP_BUCKET_NUMBER()` function determines the number of the bucket to which the given value belongs. A bucket is a contiguous set of bits, each bit representing a specific value in the data set. A bucket number is typically used to group bitmaps when performing aggregation operations. + +#### Grammar + +``` +> BITMAP_BUCKET_NUMBER(numeric_expr) +``` + +#### Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| numeric_expr | Required. An expression that can be cast into a non-negative integer. | + +#### Examples + +```sql +mysql> SELECT bitmap_bucket_number(0);-- Returns 0, indicating that it belongs to the first bucket, which records positions 0-32767. 
++-------------------------+ +| bitmap_bucket_number(0) | ++-------------------------+ +| 0 | ++-------------------------+ +1 row in set (0.00 sec) + +mysql> SELECT bitmap_bucket_number(32767);-- Returns 0, since 32767 belongs to the end of the first bucket ++-----------------------------+ +| bitmap_bucket_number(32767) | ++-----------------------------+ +| 0 | ++-----------------------------+ +1 row in set (0.00 sec) + +mysql> SELECT bitmap_bucket_number(32768);-- Returns 1, since 32768 is the starting position of the second bucket ++-----------------------------+ +| bitmap_bucket_number(32768) | ++-----------------------------+ +| 1 | ++-----------------------------+ +1 row in set (0.00 sec) +``` + +### BITMAP_BIT_POSITION + +The `BITMAP_BIT_POSITION()` function returns the relative bit position of the given value in the bucket (indexed from 0 to 32767). Used together with `BITMAP_BUCKET_NUMBER()`, it uniquely identifies any number in the bitmap: `BITMAP_BIT_POSITION()` captures the lower 15 bits of the argument (in binary), while `BITMAP_BUCKET_NUMBER()` captures the remaining higher bits. + +#### Grammar + +``` +BITMAP_BIT_POSITION(numeric_expr) +``` + +#### Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| numeric_expr | Required. An expression that can be cast into a non-negative integer. 
| + +#### Examples + +```sql +mysql> SELECT bitmap_bit_position(0);-- Returns 0, since 0 is in the first position of the first bucket ++------------------------+ +| bitmap_bit_position(0) | ++------------------------+ +| 0 | ++------------------------+ +1 row in set (0.00 sec) + +mysql> SELECT bitmap_bit_position(32767);-- Returns 32767, since 32767 is the last position in the first bucket ++----------------------------+ +| bitmap_bit_position(32767) | ++----------------------------+ +| 32767 | ++----------------------------+ +1 row in set (0.00 sec) + +mysql> SELECT bitmap_bit_position(32768);-- Returns 0, because 32768 is in the first position of the second bucket ++----------------------------+ +| bitmap_bit_position(32768) | ++----------------------------+ +| 0 | ++----------------------------+ +1 row in set (0.00 sec) + +-- The binary of 40000 is 1001110001000000; bitmap_bit_position records the lower 15 bits (001110001000000), and bitmap_bucket_number records the remaining high bit. +mysql> select bin(bitmap_bucket_number(40000)), bin(bitmap_bit_position(40000)),bin(40000); ++----------------------------------+---------------------------------+------------------+ +| bin(bitmap_bucket_number(40000)) | bin(bitmap_bit_position(40000)) | bin(40000) | ++----------------------------------+---------------------------------+------------------+ +| 1 | 1110001000000 | 1001110001000000 | ++----------------------------------+---------------------------------+------------------+ +1 row in set (0.01 sec) +``` + +### BITMAP_COUNT + +The `BITMAP_COUNT()` function calculates the number of bits set to 1 in a bitmap, which gives the total number of distinct values. This is equivalent to a `COUNT(DISTINCT)` operation on a bitmap, but is usually faster than a traditional `COUNT(DISTINCT)` query. + +The `BITMAP_COUNT()` function is generally used in conjunction with the `BITMAP_CONSTRUCT_AGG()` and `BITMAP_OR_AGG()` functions described below. 
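The distinct-count idea behind these functions can be sketched in Python; this is an illustrative model under the 32768-bit bucket rule above, not MatrixOne internals:

```python
# Illustrative model: keep one bitmap per bucket, set one bit per value,
# then count the set bits across all buckets to get the distinct count.
def distinct_count(values):
    buckets = {}  # bucket number -> bitmap stored as an int
    for n in values:
        b = n >> 15  # bucket number (high bits)
        buckets[b] = buckets.get(b, 0) | (1 << (n & 0x7FFF))  # set bit at low-15-bit position
    return sum(bin(bm).count("1") for bm in buckets.values())

print(distinct_count([0, 1, 1, 32767, 32768, 32769, 65535]))  # 6
```

Duplicates set an already-set bit, so they contribute nothing; counting set bits per bucket and summing reproduces `COUNT(DISTINCT)`.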
+ +### BITMAP_CONSTRUCT_AGG + +`BITMAP_CONSTRUCT_AGG()` is an aggregate function used to build bitmaps. + +The `BITMAP_CONSTRUCT_AGG()` function is useful when you need to count a dense set of non-repeating integer values because it efficiently converts those values into bitmap form. + +#### Grammar + +``` +BITMAP_CONSTRUCT_AGG( bit_position ) +``` + +#### Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| bit_position | Required. Position in the bitmap (returned by the BITMAP_BIT_POSITION function) | + +#### Examples + +```sql +CREATE TABLE t1 ( n1 int); +INSERT INTO t1 VALUES(0),(1),(1),(32767);-- Insert data in [0,32767]. + +mysql> select * from t1; ++-------+ +| n1 | ++-------+ +| 0 | +| 1 | +| 1 | +| 32767 | ++-------+ +4 rows in set (0.01 sec) + +mysql> SELECT BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1)) AS bitmap FROM t1; ++------------------------+ +| bitmap | ++------------------------+ +| :0 ? | ++------------------------+ +1 row in set (0.00 sec) +``` + +!!! note + The bitmap column contains the physical representation of the bitmap and is not human-readable. To determine which bits are set, use a combination of `BITMAP` functions rather than inspecting the binary value directly. + +```sql +mysql> SELECT bitmap_count(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1))) AS n1_discnt FROM t1;-- The number of bits set to 1 in the bitmap. 
++-----------+ +| n1_discnt | ++-----------+ +| 3 | ++-----------+ +1 row in set (0.00 sec) + +mysql> SELECT count(DISTINCT n1) AS n1_discnt FROM t1;-- Consistent with the result above ++-----------+ +| n1_discnt | ++-----------+ +| 3 | ++-----------+ +1 row in set (0.01 sec) + +INSERT INTO t1 VALUES(32768),(32769),(65535);-- Insert data greater than 32767 + +mysql> select * from t1; ++-------+ +| n1 | ++-------+ +| 0 | +| 1 | +| 1 | +| 32767 | +| 32768 | +| 32769 | +| 65535 | ++-------+ +7 rows in set (0.01 sec) + +-- The result is unchanged because bitmap_bit_position = n1 % 32768: the values inserted the second time occupy the same positions (in different buckets) as the first batch, so without grouping by bucket they are deduplicated together. +mysql> SELECT bitmap_count(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1))) AS n1_discnt FROM t1; ++-----------+ +| n1_discnt | ++-----------+ +| 3 | ++-----------+ +1 row in set (0.00 sec) + +mysql> SELECT bitmap_bit_position(0),bitmap_bit_position(1),bitmap_bit_position(32767),bitmap_bit_position(32768),bitmap_bit_position(65535); ++------------------------+------------------------+----------------------------+----------------------------+----------------------------+ +| bitmap_bit_position(0) | bitmap_bit_position(1) | bitmap_bit_position(32767) | bitmap_bit_position(32768) | bitmap_bit_position(65535) | ++------------------------+------------------------+----------------------------+----------------------------+----------------------------+ +| 0 | 1 | 32767 | 0 | 32767 | ++------------------------+------------------------+----------------------------+----------------------------+----------------------------+ +1 row in set (0.00 sec) +``` + +So to deduplicate data larger than 32767, you need to combine it with the `BITMAP_BUCKET_NUMBER()` function. + +```sql +-- Grouped by bucket, the first bucket contains three non-repeating numbers (0,1,32767) and the second bucket contains three non-repeating numbers (32768,32769,65535). 
+mysql> SELECT bitmap_count(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1))) AS t1_bitmap FROM t1 GROUP BY BITMAP_BUCKET_NUMBER(n1); ++-----------+ +| t1_bitmap | ++-----------+ +| 3 | +| 3 | ++-----------+ +2 rows in set (0.01 sec) + +-- Combine this with the sum() function to count the distinct values of n1. +mysql> SELECT SUM(t1_bitmap) FROM ( + -> SELECT bitmap_count(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(n1))) AS t1_bitmap + -> FROM t1 + -> GROUP BY BITMAP_BUCKET_NUMBER(n1) + -> ); ++----------------+ +| sum(t1_bitmap) | ++----------------+ +| 6 | ++----------------+ +1 row in set (0.01 sec) +``` + +### BITMAP_OR_AGG + +The `BITMAP_OR_AGG()` function computes the bitwise OR of multiple bitmaps. It is typically used to merge multiple bitmaps so that one bitmap represents the combined information of all inputs. + +`BITMAP_OR_AGG()` is useful when data must be aggregated across different dimensions, especially in data warehouses and analytic queries. + +#### Grammar + +``` +BITMAP_OR_AGG( bitmap ) +``` + +#### Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| bitmap | Required. The bitmaps to be merged by bitwise OR. | + +#### Examples + +```sql +-- Create a table that stores information about authors' published books: the author's name, the publication year, and the book id. 
+CREATE TABLE book_table( + id int auto_increment primary key, + author varchar(100), + pub_year varchar(100), + book_id int +); +INSERT INTO book_table(author,pub_year,book_id) VALUES +('A author','2020',1),('A author','2020',1),('A author','2020',32768), +('A author','2021',32767),('A author','2021',32768),('A author','2021',65536), +('B author','2020',2),('B author','2020',10),('B author','2020',32769), +('B author','2021',5),('B author','2021',65539); + +mysql> select * from book_table; ++------+----------+----------+---------+ +| id | author | pub_year | book_id | ++------+----------+----------+---------+ +| 1 | A author | 2020 | 1 | +| 2 | A author | 2020 | 1 | +| 3 | A author | 2020 | 32768 | +| 4 | A author | 2021 | 32767 | +| 5 | A author | 2021 | 32768 | +| 6 | A author | 2021 | 65536 | +| 7 | B author | 2020 | 2 | +| 8 | B author | 2020 | 10 | +| 9 | B author | 2020 | 32769 | +| 10 | B author | 2021 | 5 | +| 11 | B author | 2021 | 65539 | ++------+----------+----------+---------+ +11 rows in set (0.00 sec) + +-- Define a precomputed table that stores coarse-grained aggregation results. Later aggregations along various dimensions can be derived from this table with only a little extra computation, which accelerates those queries. +CREATE TABLE precompute AS +SELECT + author, + pub_year, + BITMAP_BUCKET_NUMBER(book_id) as bucket, + BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(book_id)) as bitmap +FROM book_table +GROUP BY author,pub_year,bucket; + +mysql> select * from precompute; ++---------+----------+--------+----------------------+ +| author | pub_year | bucket | bitmap | ++---------+----------+--------+----------------------+ +| A author| 2020 | 0 | :0 | +| A author| 2020 | 1 | :0 | +| A author| 2021 | 0 | :0 ? 
| A author| 2021 | 1 | :0 | +| A author| 2021 | 2 | :0 | +| B author| 2020 | 0 | :0 | +| B author| 2020 | 1 | :0 | +| B author| 2021 | 0 | :0 | +| B author| 2021 | 2 | :0 | ++---------+----------+--------+----------------------+ + +-- Count distinct book_id values when aggregating by author and publication year, i.e. how many different books each author published in each year. +-- The sum() function adds up the number of 1-bits in the bitmaps of the different buckets. +-- For example, for author=A author, pub_year=2020, book_id=(1,1,32768) deduplicates to book_id=(1,32768); but 1 is in the first bucket and 32768 in the second, so the per-bucket counts must be summed. +mysql> SELECT + -> author, + -> pub_year, + -> SUM(BITMAP_COUNT(bitmap)) + -> FROM precompute + -> GROUP BY author,pub_year; ++---------+----------+---------------------------+ +| author | pub_year | sum(bitmap_count(bitmap)) | ++---------+----------+---------------------------+ +| A author| 2020 | 2 | +| A author| 2021 | 3 | +| B author| 2020 | 3 | +| B author| 2021 | 2 | ++---------+----------+---------------------------+ +4 rows in set (0.00 sec) + +mysql> SELECT author,pub_year,count( DISTINCT book_id) FROM book_table group by author,pub_year;-- Consistent with the result above ++----------+----------+-------------------------+ +| author | pub_year | count(distinct book_id) | ++----------+----------+-------------------------+ +| A author | 2020 | 2 | +| A author | 2021 | 3 | +| B author | 2020 | 3 | +| B author | 2021 | 2 | ++----------+----------+-------------------------+ +4 rows in set (0.00 sec) + +-- Count distinct book_id values when aggregating by author only, i.e. how many different books each author published in total. +-- The BITMAP_OR_AGG() function merges bitmaps across dimensions (same author, different years). 
+-- For example, for author=A author, book_id deduplicates to (1,32768) for pub_year=2020 and to (32767,32768,65536) for pub_year=2021; BITMAP_OR_AGG ORs the bitmaps of the two years together, giving book_id=(1,32767,32768,65536), and finally sum() accumulates the counts of the different buckets. +mysql> SELECT author, SUM(cnt) FROM ( + -> SELECT + -> author, + -> BITMAP_COUNT(BITMAP_OR_AGG(bitmap)) cnt + -> FROM precompute + -> GROUP BY author,bucket + -> ) + -> GROUP BY author; ++---------+----------+ +| author | sum(cnt) | ++---------+----------+ +| A author| 4 | +| B author| 5 | ++---------+----------+ +2 rows in set (0.01 sec) + +mysql> SELECT author,count(DISTINCT book_id) FROM book_table GROUP BY author;-- Consistent with the result above ++----------+-------------------------+ +| author | count(distinct book_id) | ++----------+-------------------------+ +| A author | 4 | +| B author | 5 | ++----------+-------------------------+ +2 rows in set (0.00 sec) + +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/var_pop.md b/docs/MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/var_pop.md new file mode 100644 index 000000000..0d6f77263 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/var_pop.md @@ -0,0 +1,42 @@ +# **VAR_POP** + +## **Function Description** + +`VAR_POP()` is an aggregate function that calculates the population variance. Synonymous with `VARIANCE()`. 
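Population variance divides the sum of squared deviations by N (not by N-1, as sample variance does). A quick sketch of the computation, using the `RunScored` values from the example below:

```python
# Sketch of population variance: mean of squared deviations, dividing by N.
def var_pop(xs):
    m = sum(xs) / len(xs)                       # arithmetic mean
    return sum((x - m) ** 2 for x in xs) / len(xs)  # divide by N

print(var_pop([52, 30, 18, 10, 11, 0]))  # ≈ 284.80555555555554
```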
+ +## **Function syntax** + +``` +> VAR_POP(expr) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| ---- | ---- | +| expr | The name of a column of any numeric type | + +## **Examples** + +```sql +CREATE TABLE t1(PlayerName VARCHAR(100) NOT NULL,RunScored INT NOT NULL,WicketsTaken INT NOT NULL); +INSERT INTO t1 VALUES('KL Rahul', 52, 0 ),('Hardik Pandya', 30, 1 ),('Ravindra Jadeja', 18, 2 ),('Washington Sundar', 10, 1),('D Chahar', 11, 2 ), ('Mitchell Starc', 0, 3); + +-- Calculate the population variance of the RunScored column +mysql> SELECT VAR_POP(RunScored) as Pop_Standard_Variance FROM t1; ++-----------------------+ +| Pop_Standard_Variance | ++-----------------------+ +| 284.8055555555555 | ++-----------------------+ +1 row in set (0.01 sec) + +-- Calculate the population variance of the WicketsTaken column +mysql> SELECT VAR_POP(WicketsTaken) as Pop_Std_Var_Wickets FROM t1; ++---------------------+ +| Pop_Std_Var_Wickets | ++---------------------+ +| 0.9166666666666665 | ++---------------------+ +1 row in set (0.01 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/convert-tz.md b/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/convert-tz.md new file mode 100644 index 000000000..7ad00ebd0 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/convert-tz.md @@ -0,0 +1,55 @@ +# **CONVERT_TZ()** + +## **Function Description** + +The `CONVERT_TZ()` function is used to convert a given datetime from one time zone to another. If the arguments are invalid, the function returns NULL. + +## **Function syntax** + +``` +> CONVERT_TZ(dt,from_tz,to_tz) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| ---- | ---------------- | +| dt | Required. The datetime value to convert. | +| from_tz | Required. The source time zone identifier. | +| to_tz | Required. 
The target time zone identifier. | + +## **Examples** + +```sql +mysql> SELECT CONVERT_TZ('2004-01-01 12:00:00','GMT','MET'); ++-------------------------------------------+ +| convert_tz(2004-01-01 12:00:00, GMT, MET) | ++-------------------------------------------+ +| 2004-01-01 13:00:00 | ++-------------------------------------------+ +1 row in set (0.00 sec) + +mysql> SELECT CONVERT_TZ('2004-01-01 12:00:00','+00:00','+10:00'); ++-------------------------------------------------+ +| convert_tz(2004-01-01 12:00:00, +00:00, +10:00) | ++-------------------------------------------------+ +| 2004-01-01 22:00:00 | ++-------------------------------------------------+ +1 row in set (0.01 sec) + +mysql> select convert_tz('2023-12-31 10:28:00','+08:00', 'America/New_York') as dtime; ++---------------------+ +| dtime | ++---------------------+ +| 2023-12-30 21:28:00 | ++---------------------+ +1 row in set (0.00 sec) + +mysql> select convert_tz(NULL,'-05:00', '+05:30') as dtime; ++-------+ +| dtime | ++-------+ +| NULL | ++-------+ +1 row in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/now.md b/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/now.md new file mode 100644 index 000000000..08050c2aa --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/now.md @@ -0,0 +1,55 @@ +# NOW() + +## Function Description + +The `NOW()` function returns the current date and time as a value in 'YYYY-MM-DD HH:MM:SS' format. + +`NOW()` returns the time at which the statement started executing. This differs from the behavior of [`SYSDATE()`](sysdate.md), which returns the real-time clock value at the moment it is evaluated. + +## Function syntax + +``` +> NOW(fsp) +``` + +## Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| fsp | Optional. If fsp is given, it specifies a fractional-seconds precision from 0 to 6, and the return value includes that many fractional digits. 
| + +## Examples + +```sql +mysql> select now(); ++----------------------------+ +| now() | ++----------------------------+ +| 2024-04-29 08:03:50.479238 | ++----------------------------+ +1 row in set (0.03 sec) + +mysql> select now(6); ++----------------------------+ +| now(6) | ++----------------------------+ +| 2024-04-29 08:05:26.528629 | ++----------------------------+ +1 row in set (0.02 sec) + +mysql> SELECT NOW(), SLEEP(2), NOW(); ++----------------------------+----------+----------------------------+ +| now() | sleep(2) | now() | ++----------------------------+----------+----------------------------+ +| 2024-04-29 08:17:23.876546 | 0 | 2024-04-29 08:17:23.876546 | ++----------------------------+----------+----------------------------+ +1 row in set (2.06 sec) + +mysql> SELECT SYSDATE(), SLEEP(2), SYSDATE(); ++----------------------------+----------+----------------------------+ +| sysdate() | sleep(2) | sysdate() | ++----------------------------+----------+----------------------------+ +| 2024-04-29 16:19:21.439725 | 0 | 2024-04-29 16:19:23.440187 | ++----------------------------+----------+----------------------------+ +1 row in set (2.01 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/str-to-date.md b/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/str-to-date.md new file mode 100644 index 000000000..b3a037c29 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/str-to-date.md @@ -0,0 +1,67 @@ +# **STR\_TO\_DATE()** + +## **Function Description** + +The `STR_TO_DATE()` function converts a string to a date or datetime value according to the specified format; it is synonymous with [`TO_DATE()`](to-date.md). + +Format strings can contain literal characters and format specifiers beginning with %. Literal characters and format specifiers in format must match str; expressions are also supported. 
The `STR_TO_DATE` function returns NULL if str cannot be parsed according to format or if either argument is NULL. + +See the [`DATE_FORMAT()`](date-format.md) function description for the format specifiers that can be used. + +## **Function syntax** + +``` +> STR_TO_DATE(str,format) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| ---- | ---- | +| str | String to format as date (input string) | +| format | Format string to use | + +## **Examples** + +```sql +mysql> SELECT STR_TO_DATE('2022-01-06 10:20:30','%Y-%m-%d %H:%i:%s') as result; ++---------------------+ +| result | ++---------------------+ +| 2022-01-06 10:20:30 | ++---------------------+ +1 row in set (0.00 sec) + +mysql> SELECT STR_TO_DATE('09:30:17','%h:%i:%s'); ++---------------------------------+ +| str_to_date(09:30:17, %h:%i:%s) | ++---------------------------------+ +| 09:30:17 | ++---------------------------------+ +1 row in set (0.00 sec) + +-- The format argument can be built from string expressions, such as replace() +mysql> SELECT str_to_date('2008-01-01',replace('yyyy-MM-dd','yyyy-MM-dd','%Y-%m-%d')) as result; ++------------+ +| result | ++------------+ +| 2008-01-01 | ++------------+ +1 row in set (0.00 sec) + +--The STR_TO_DATE function ignores the extra characters at the end of the input string str when parsing it according to the format string format +mysql> SELECT STR_TO_DATE('25,5,2022 extra characters','%d,%m,%Y'); ++---------------------------------------------------+ +| str_to_date(25,5,2022 extra characters, %d,%m,%Y) | ++---------------------------------------------------+ +| 2022-05-25 | ++---------------------------------------------------+ +1 row in set (0.00 sec) + +mysql> SELECT STR_TO_DATE('2022','%Y'); ++-----------------------+ +| str_to_date(2022, %Y) | ++-----------------------+ +| NULL | ++-----------------------+ +1 row in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/sysdate.md b/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/sysdate.md new file mode
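For readers more familiar with Python, the parse step is analogous to `datetime.strptime` (an analogy only: the MySQL-style `%i` minutes specifier corresponds to Python's `%M`, and unlike `STR_TO_DATE`, `strptime` raises on trailing extra characters instead of ignoring them):

```python
from datetime import datetime

# STR_TO_DATE('2022-01-06 10:20:30', '%Y-%m-%d %H:%i:%s') in SQL;
# Python spells minutes as %M and seconds as %S.
dt = datetime.strptime('2022-01-06 10:20:30', '%Y-%m-%d %H:%M:%S')
assert dt == datetime(2022, 1, 6, 10, 20, 30)

# A NULL-on-failure wrapper mirroring STR_TO_DATE's behavior on unparsable input.
def str_to_date(s: str, fmt: str):
    try:
        return datetime.strptime(s, fmt)
    except ValueError:
        return None

assert str_to_date('not-a-date', '%Y-%m-%d') is None
```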
100644 index 000000000..513dd3247 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Datetime/sysdate.md @@ -0,0 +1,55 @@ +# SYSDATE() + +## Function Description + +The `SYSDATE()` function returns a value in 'YYYY-MM-DD HH:MM:SS' format for the current date and time. + +`SYSDATE()` returns the real-time clock value at the moment it is evaluated during execution. This is different from the behavior of [`NOW()`](now.md), which returns the time at which the statement started executing. + +## Function syntax + +``` +> SYSDATE(fsp) +``` + +## Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| fsp | Optional parameter. If fsp is given to specify a fractional-seconds precision from 0 to 6, the return value includes a fractional-seconds part with that many digits. | + +## Examples + +```sql +mysql> select sysdate(); ++----------------------------+ +| sysdate() | ++----------------------------+ +| 2024-04-30 10:49:39.554807 | ++----------------------------+ +1 row in set (0.00 sec) + +mysql> select sysdate(6); ++----------------------------+ +| sysdate(6) | ++----------------------------+ +| 2024-04-30 10:50:08.452370 | ++----------------------------+ +1 row in set (0.00 sec) + +mysql> SELECT SYSDATE(), SLEEP(2), SYSDATE(); ++----------------------------+----------+----------------------------+ +| sysdate() | sleep(2) | sysdate() | ++----------------------------+----------+----------------------------+ +| 2024-04-30 10:50:30.004912 | 0 | 2024-04-30 10:50:32.005203 | ++----------------------------+----------+----------------------------+ +1 row in set (2.00 sec) + +mysql> SELECT NOW(), SLEEP(2), NOW(); ++----------------------------+----------+----------------------------+ +| now() | sleep(2) | now() | ++----------------------------+----------+----------------------------+ +| 2024-04-30 10:50:47.904309 | 0 | 2024-04-30 10:50:47.904309 | ++----------------------------+----------+----------------------------+ +1 row in set (2.00 sec) +``` diff --git
a/docs/MatrixOne/Reference/Functions-and-Operators/Other/sample.md b/docs/MatrixOne/Reference/Functions-and-Operators/Other/sample.md new file mode 100644 index 000000000..d30e7d478 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Other/sample.md @@ -0,0 +1,30 @@ +# SAMPLE Sampling Function + +The SAMPLE sampling function is a key tool for analyzing large amounts of data, primarily used to quickly narrow down queries. + +1. Syntax structure + +```sql +SELECT SAMPLE(<column_list>, <N ROWS> | <K PERCENT>) FROM <table_name> [WHERE ...] [GROUP BY ...] [ORDER BY ...] [LIMIT ...] [OFFSET ...] +``` + +* `<column_list>`: List of selected column names. +* `<N ROWS> | <K PERCENT>`: Specifies the number (N rows) or percentage (K%) of samples returned. + +2. Functional Features + +* Filtering (e.g., the WHERE clause) is applied to the table before sampling is performed. +* Returns N random samples, or K% random samples, from the table. +* When N rows are specified, N is a positive integer between 1 and 1000. +* When K% is specified, K ranges from 0.01 to 99.99 and represents the probability that each row is selected. The result may differ on each run, and the number of rows returned is not fixed. For example, if the table has 10,000 rows and you run SAMPLE(a, 50 PERCENT), each row has a 50% chance of being selected; like tossing a coin 10,000 times, each toss is 50/50, but the final count might be, say, 4,950 heads and 5,050 tails. +* Multiple column sampling is supported, such as SELECT SAMPLE(a,b,c,100 ROWS) FROM t1;. +* Can be used in conjunction with WHERE clause, GROUP BY clause, etc. + +3.
Application Examples + +```sql +SELECT SAMPLE(a, 100 ROWS) FROM t1; --Returns 100 random samples +SELECT SAMPLE(a, 0.2 PERCENT) FROM t1; --Returns about 0.2 percent of samples +SELECT SAMPLE(a, 100 ROWS) FROM t1 WHERE a > 1; --Filters before sampling +SELECT a, SAMPLE(b, 100 ROWS) FROM t1 GROUP BY a; --Groups after sampling +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Other/serial_extract.md b/docs/MatrixOne/Reference/Functions-and-Operators/Other/serial_extract.md new file mode 100644 index 000000000..e534aa3f0 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Other/serial_extract.md @@ -0,0 +1,75 @@ +# SERIAL_EXTRACT function + +## Function Description + +The `SERIAL_EXTRACT()` function is used to extract the individual elements in a sequence/tuple value and is used in conjunction with the functions [`MAX()`](../Aggregate-Functions/max.md), [`MIN()`](../Aggregate-Functions/min.md), [`SERIAL()`](../../Operators/operators/cast-functions-and-operators/serial.md), [`SERIAL_NULL()`](../../Operators/operators/cast-functions-and-operators/serial_full.md). + +## Function syntax + +``` +>SERIAL_EXTRACT(serial_col, pos as type) +``` + +## Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| serial_col | Required parameters. Holds a string row of serial/serial_full function values. If you need to change the output type you can use it in conjunction with the [`CAST()`](../../Operators/operators/cast-functions-and-operators/cast.md) function. | +| pos | Required parameters. Position of the field to extract, 0 is the first. | +| type| Required parameter. The original type of the exported element. Requires consistency with extracted element type. 
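Conceptually (a Python sketch for illustration, not MatrixOne internals): `SERIAL()` packs several columns into one ordered composite value, `MAX()`/`MIN()` compare those composites lexicographically, and `SERIAL_EXTRACT()` pulls one field back out by position:

```python
rows = [
    (2, [2, 3, 4], None),
    (3, [4, 5, 6], [1, 2, 3, 4, 5]),
    (4, [7, 8, 9], [2, 3, 4, 5, 6]),
]

# serial() yields NULL when any packed field is NULL, so such rows
# drop out of MAX()/MIN() over the serialized value.
packed = [r for r in rows if all(v is not None for v in r)]

best = max(packed)             # lexicographic comparison: id is compared first
assert best[1] == [7, 8, 9]    # position 1, like SERIAL_EXTRACT(..., 1 as vecf64(3))

worst = min(packed)
assert worst[2] == [1, 2, 3, 4, 5]
```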
| + +## Examples + +```sql +drop table if exists vtab64; +create table vtab64(id int primary key auto_increment,`vecf64_3` vecf64(3),`vecf64_5` vecf64(5)); +insert into vtab64(vecf64_3,vecf64_5) values("[1,NULL,2]",NULL); +insert into vtab64(vecf64_3,vecf64_5) values(NULL,NULL); +insert into vtab64(vecf64_3,vecf64_5) values("[2,3,4]",NULL); +insert into vtab64(vecf64_3,vecf64_5) values ("[4,5,6]","[1,2,3,4,5]"); +insert into vtab64(vecf64_3,vecf64_5) values ("[7,8,9]","[2,3,4,5,6]"); + +mysql> select * from vtab64; ++------+-----------+-----------------+ +| id | vecf64_3 | vecf64_5 | ++------+-----------+-----------------+ +| 1 | NULL | NULL | +| 2 | [2, 3, 4] | NULL | +| 3 | [4, 5, 6] | [1, 2, 3, 4, 5] | +| 4 | [7, 8, 9] | [2, 3, 4, 5, 6] | ++------+-----------+-----------------+ +4 rows in set (0.01 sec) + +-- max(serial(id, `vecf64_3`, `vecf64_5`)) gets the maximum serialized value; normally the maximum would be the record (4, [7, 8, 9], [2, 3, 4, 5, 6]), and the 1 selects the value in the second position (counting from 0), which is [7, 8, 9].
+mysql> select serial_extract(max(serial(id, `vecf64_3`, `vecf64_5`)), 1 as vecf64(3)) as a from vtab64; ++-----------+ +| a | ++-----------+ +| [7, 8, 9] | ++-----------+ +1 row in set (0.01 sec) + +mysql> select serial_extract(min(serial(id, `vecf64_3`, `vecf64_5`)), 2 as vecf64(5)) as a from vtab64; ++-----------------+ +| a | ++-----------------+ +| [1, 2, 3, 4, 5] | ++-----------------+ +1 row in set (0.00 sec) + +mysql> select serial_extract(max(serial_full(cast(id as decimal), `vecf64_3`)), 0 as decimal) as a from vtab64; ++------+ +| a | ++------+ +| 4 | ++------+ +1 row in set (0.01 sec) + +mysql> select serial_extract(min(serial_full(cast(id as decimal), `vecf64_3`)), 1 as vecf64(3)) as a from vtab64; ++------+ +| a | ++------+ +| NULL | ++------+ +1 row in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/from_base64.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/from_base64.md new file mode 100644 index 000000000..c43690ecb --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/from_base64.md @@ -0,0 +1,45 @@ +# FROM\_BASE64() + +## Function Description + +`FROM_BASE64()` is used to convert a Base64 encoded string back to raw binary data (or text data). Data that is Base64 encoded using the [`TO_BASE64()`](to_base64.md) function can be decoded. If the argument is NULL, the result is NULL. + +## Function syntax + +``` +> FROM_BASE64(str) +``` + +## Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| str | Required parameters. Base64 encoded string to convert. 
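The encoding is standard Base64, so the same round trip can be cross-checked with Python's `base64` module:

```python
import base64

# FROM_BASE64('MjU1') -> '255'
assert base64.b64decode('MjU1') == b'255'

# FROM_BASE64(TO_BASE64('abc')) -> 'abc'
encoded = base64.b64encode(b'abc')
assert encoded == b'YWJj'
assert base64.b64decode(encoded) == b'abc'
```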
| + +## Examples + +```SQL +mysql> select from_base64('MjU1'); ++-------------------+ +| from_base64(MjU1) | ++-------------------+ +| 255 | ++-------------------+ +1 row in set (0.00 sec) + +mysql> SELECT TO_BASE64('abc'), FROM_BASE64(TO_BASE64('abc')); ++----------------+-----------------------------+ +| to_base64(abc) | from_base64(to_base64(abc)) | ++----------------+-----------------------------+ +| YWJj | abc | ++----------------+-----------------------------+ +1 row in set (0.00 sec) + +mysql> select from_base64(null); ++-------------------+ +| from_base64(null) | ++-------------------+ +| NULL | ++-------------------+ +1 row in set (0.01 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/lcase.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/lcase.md new file mode 100644 index 000000000..99736de24 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/lcase.md @@ -0,0 +1,37 @@ +# **LCASE()** + +## **Function Description** + +`LCASE()` is used to convert a given string to lowercase, a synonym for [`LOWER()`](lower.md). + +## **Function syntax** + +``` +> LCASE(str) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| ---- | ---- | +| str | Required parameters, alphabetic characters. 
| + +## **Examples** + +```sql +mysql> select lcase('HELLO'); ++--------------+ +| lcase(HELLO) | ++--------------+ +| hello | ++--------------+ +1 row in set (0.02 sec) + +mysql> select lcase('A'),lcase('B'),lcase('C'); ++----------+----------+----------+ +| lcase(A) | lcase(B) | lcase(C) | ++----------+----------+----------+ +| a | b | c | ++----------+----------+----------+ +1 row in set (0.03 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/locate.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/locate.md new file mode 100644 index 000000000..ce660eecb --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/locate.md @@ -0,0 +1,73 @@ +# **LOCATE()** + +## **Function Description** + +The `LOCATE()` function finds the position of a substring in a string. It returns the position of the substring in the string, or 0 if not found. + +Because the `LOCATE()` function returns an integer value, it can be nested inside other functions, for example to extract part of a string with the SUBSTRING function. + +The `LOCATE()` function is case-insensitive. + +## **Function syntax** + +``` +> LOCATE(substr,str,pos) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| ---- | ---- | +| substr | Required parameter. `substring` is the string you are looking for. | +| str | Required parameter. `string` is the string to search in. | +| pos | Optional parameter. `position` is the position at which the search starts.
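A minimal Python model of this behavior (1-based positions, 0 when absent, case-insensitive matching; the helper name is only for illustration):

```python
def locate(substr: str, s: str, pos: int = 1) -> int:
    """1-based LOCATE(); returns 0 when the substring is not found.
    MatrixOne's LOCATE is case-insensitive, so compare lowercased copies."""
    i = s.lower().find(substr.lower(), pos - 1)
    return i + 1  # str.find returns -1 on failure, mapping to 0

assert locate('bar', 'footbarbar') == 5
assert locate('bar', 'footbarbar', 6) == 8
assert locate('a', 'ABC') == 1
assert locate('x', 'abc') == 0
```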
| + +## **Examples** + +- Example 1 + +```sql +mysql> SELECT LOCATE('bar', 'footbarbar'); ++-------------------------+ +| locate(bar, footbarbar) | ++-------------------------+ +| 5 | ++-------------------------+ +1 row in set (0.00 sec) +``` + +- Example 2 + +```sql +mysql>SELECT LOCATE('bar', 'footbarbar',6); ++----------------------------+ +| locate(bar, footbarbar, 6) | ++----------------------------+ +| 8 | ++----------------------------+ +1 row in set (0.00 sec) +``` + +- Example 3 + +```sql +mysql>SELECT SUBSTRING('hello world',LOCATE('o','hello world'),5); ++---------------------------------------------------+ +| substring(hello world, locate(o, hello world), 5) | ++---------------------------------------------------+ +| o wor | ++---------------------------------------------------+ +1 row in set (0.00 sec) +``` + +- Example 4 + +```sql +mysql>select locate('a','ABC'); ++----------------+ +| locate(a, ABC) | ++----------------+ +| 1 | ++----------------+ +1 row in set (0.00 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/lower.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/lower.md new file mode 100644 index 000000000..f248bbe13 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/lower.md @@ -0,0 +1,37 @@ +# **LOWER()** + +## **Function Description** + +`LOWER()` Converts the given string to lowercase. + +## **Function syntax** + +``` +> LOWER(str) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| ---- | ---- | +| str | Required parameters, alphabetic characters. 
| + +## **Examples** + +```sql +mysql> select lower('HELLO'); ++--------------+ +| lower(HELLO) | ++--------------+ +| hello | ++--------------+ +1 row in set (0.02 sec) + +mysql> select lower('A'),lower('B'),lower('C'); ++----------+----------+----------+ +| lower(A) | lower(B) | lower(C) | ++----------+----------+----------+ +| a | b | c | ++----------+----------+----------+ +1 row in set (0.03 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/md5.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/md5.md new file mode 100644 index 000000000..b7323cd1e --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/md5.md @@ -0,0 +1,37 @@ +# MD5() + +## Function Description + +The `MD5()` function is a widely used hash function that generates a 32-character hexadecimal MD5 hash of a string. It converts an input message of arbitrary length into a 128-bit (16-byte) hash value, usually represented as a 32-character hexadecimal string. Returns NULL if the argument is NULL. + +## Function syntax + +``` +> MD5(str) +``` + +## Parameter interpretation + +| Parameters | Description | +| -------- | ------------ | +| str | Required parameter.
String to convert | + +## Examples + +```SQL +mysql> select md5("hello world"); ++----------------------------------+ +| md5(hello world) | ++----------------------------------+ +| 5eb63bbbe01eeed093cb22bb8f5acdc3 | ++----------------------------------+ +1 row in set (0.00 sec) + +mysql> select md5(null); ++-----------+ +| md5(null) | ++-----------+ +| NULL | ++-----------+ +1 row in set (0.00 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/sha1.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/sha1.md new file mode 100644 index 000000000..a5e844ce1 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/sha1.md @@ -0,0 +1,45 @@ +# SHA1()/SHA() + +## Function Description + +The `SHA1()/SHA()` function is an encrypted hash function that calculates and returns the SHA-1 hash value for a given string. It converts an input message of any length into a fixed length (160 bits, or 20 bytes) hash value, typically expressed as 40 hexadecimal characters. Returns NULL if the argument is NULL. + +## Function syntax + +``` +> SHA1/SHA(str) +``` + +## Parameter interpretation + +| Parameters | Description | +| -------- | ------------ | +| str | Required parameters. 
String to encrypt | + +## Examples + +```SQL +mysql> select sha1("hello world"); ++------------------------------------------+ +| sha1(hello world) | ++------------------------------------------+ +| 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed | ++------------------------------------------+ +1 row in set (0.00 sec) + +mysql> select sha("hello world"); ++------------------------------------------+ +| sha(hello world) | ++------------------------------------------+ +| 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed | ++------------------------------------------+ +1 row in set (0.00 sec) + +mysql> select sha1(null); ++------------+ +| sha1(null) | ++------------+ +| NULL | ++------------+ +1 row in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/sha2.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/sha2.md new file mode 100644 index 000000000..3e4c63400 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/sha2.md @@ -0,0 +1,46 @@ +# **SHA2()** + +## **Function Description** + +The `SHA2()` encryption function is used to calculate the SHA2 hash of the input string. The first argument is the plaintext string to hash. The second argument indicates the desired bit length of the result, which must be 224, 256, 384, 512, or 0 (equivalent to 256), corresponding to the SHA-224, SHA-256, SHA-384, and SHA-512 algorithms, respectively. Returns NULL if either argument is NULL or the length is not a legal value. + +## **Function syntax** + +``` +> SHA2(str, hash_length) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| -------- | ------------ | +| str | Required parameter. The string for which to calculate the hash value. | +| hash_length | Required parameter. The hash length in bits: 224, 256, 384, 512, or 0 (equivalent to 256).
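These digests follow the standard algorithms, so the documented outputs can be cross-checked with Python's `hashlib`:

```python
import hashlib

assert hashlib.md5(b'hello world').hexdigest() == '5eb63bbbe01eeed093cb22bb8f5acdc3'
assert hashlib.sha1(b'hello world').hexdigest() == '2aae6c35c94fcfb415dbe95f408b9ce91ee846ed'

# SHA2(str, 384) corresponds to SHA-384; length 0 selects SHA-256.
sha384 = hashlib.sha384(b'hello world').hexdigest()
assert sha384 == ('fdbd8e75a67f29f701a4e040385e2e23986303ea10239211'
                  'af907fcbb83578b3e417cb71ce646efd0819dd8c088de1bd')
```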
| + +## **Examples** + +```SQL +mysql> select sha2("hello world", 384); ++--------------------------------------------------------------------------------------------------+ +| sha2(hello world, 384) | ++--------------------------------------------------------------------------------------------------+ +| fdbd8e75a67f29f701a4e040385e2e23986303ea10239211af907fcbb83578b3e417cb71ce646efd0819dd8c088de1bd | ++--------------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) + +mysql> select sha2(null, 512); ++-----------------+ +| sha2(null, 512) | ++-----------------+ +| NULL | ++-----------------+ +1 row in set (0.00 sec) + +mysql> select sha2("abc", 99); ++---------------+ +| sha2(abc, 99) | ++---------------+ +| NULL | ++---------------+ +1 row in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/to_base64.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/to_base64.md new file mode 100644 index 000000000..2d48d1b6f --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/to_base64.md @@ -0,0 +1,47 @@ +# TO\_BASE64() + +## Function Description + +The `TO_BASE64()` function is used to convert a string to a Base64 encoded string. If the argument is not a string, it is converted to a string before conversion. If the argument is NULL, the result is NULL. + +You can decode a Base64 encoded string using the [`FROM_BASE64()`](from_base64.md) function. + +## Function syntax + +``` +> TO_BASE64(str) +``` + +## Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| str | Required parameters. 
To convert to a Base64 encoded string | + +## Examples + +```SQL +mysql> SELECT TO_BASE64('abc'); ++----------------+ +| to_base64(abc) | ++----------------+ +| YWJj | ++----------------+ +1 row in set (0.00 sec) + +mysql> SELECT TO_BASE64(255); ++----------------+ +| to_base64(255) | ++----------------+ +| MjU1 | ++----------------+ +1 row in set (0.00 sec) + +mysql> SELECT TO_BASE64(null); ++-----------------+ +| to_base64(null) | ++-----------------+ +| NULL | ++-----------------+ +1 row in set (0.01 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/ucase.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/ucase.md new file mode 100644 index 000000000..7471b3ad9 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/ucase.md @@ -0,0 +1,37 @@ +# **UCASE()** + +## **Function Description** + +`UCASE()` is used to convert a given string to uppercase form, a synonym for [`UPPER()`](upper.md). + +## **Function syntax** + +``` +> UCASE(str) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| ---- | ---- | +| str | Required parameters, alphabetic characters. 
| + +## **Examples** + +```sql +mysql> select ucase('hello'); ++--------------+ +| ucase(hello) | ++--------------+ +| HELLO | ++--------------+ +1 row in set (0.03 sec) + +mysql> select ucase('a'),ucase('b'),ucase('c'); ++----------+----------+----------+ +| ucase(a) | ucase(b) | ucase(c) | ++----------+----------+----------+ +| A | B | C | ++----------+----------+----------+ +1 row in set (0.03 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/String/upper.md b/docs/MatrixOne/Reference/Functions-and-Operators/String/upper.md new file mode 100644 index 000000000..d6425b6c2 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/String/upper.md @@ -0,0 +1,37 @@ +# **UPPER()** + +## **Function Description** + +`UPPER()` is used to convert a given string to uppercase. + +## **Function syntax** + +``` +> UPPER(str) +``` + +## **Parameter interpretation** + +| Parameters | Description | +| ---- | ---- | +| str | Required parameters, alphabetic characters. | + +## **Examples** + +```sql +mysql> select upper('hello'); ++--------------+ +| upper(hello) | ++--------------+ +| HELLO | ++--------------+ +1 row in set (0.03 sec) + +mysql> select upper('a'),upper('b'),upper('c'); ++----------+----------+----------+ +| upper(a) | upper(b) | upper(c) | ++----------+----------+----------+ +| A | B | C | ++----------+----------+----------+ +1 row in set (0.03 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/arithmetic.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/arithmetic.md index 066fdc933..92eea4b5e 100644 --- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/arithmetic.md +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/arithmetic.md @@ -1,20 +1,20 @@ -# Arithemetic Operators +# Arithmetic operators -MatrixOne supports the basic arithmetic operators like Add, Subtract, Multiply, Divide on vector. 
These operator performs element-wise arithemetic operation and return a new vector. +MatrixOne supports basic arithmetic operators such as addition, subtraction, multiplication, and division between vectors or between vectors and scalars. These operators perform element-by-element arithmetic and return a new vector. !!! note - Sub(`-`), Multipy(`*`) and Divide(`/`) are all similar to the Add example. + Subtraction (`-`), multiplication (`*`), and division (`/`) are all similar to addition examples and will not be repeated. ## Add -### **Description** +### **Function Description** -This operator is used to add two vectors element-wise by using the + operator. +`+` is used to add two elements together. -### **Syntax** +### **Function syntax** ``` -> SELECT vector1 + vector2 AS result_vector FROM table_name; +> SELECT para1 + para2 ``` ### **Examples** @@ -38,24 +38,40 @@ mysql> select b + "[1,2,3]" from vec_table; | [2, 4, 6] | +-------------+ 1 row in set (0.00 sec) + +mysql> select b + 1 from vec_table; ++-----------+ +| b + 1 | ++-----------+ +| [2, 3, 4] | ++-----------+ +1 row in set (0.01 sec) + +mysql> select cast("[1,2,3]" as vecf32(3)) + 5.0; ++----------------------------------+ +| cast([1,2,3] as vecf32(3)) + 5.0 | ++----------------------------------+ +| [6, 7, 8] | ++----------------------------------+ +1 row in set (0.00 sec) ``` -### **Constraints** +### **Restrictions** -- Both the argument vectors should be of same dimension -- If operation is performed between vecf32 and vecf64, the result is cast to vecf64 -- If one of the argument is VECTOR in textual format, then the other argument should be of VECTOR type. If both the arguments are TEXT, then Query Engine would treat it like string operation. +- When two vector type parameters are added, the dimensions of the vector should be the same. +- If two vector type parameters are added, of type vecf32 and vecf64, the result is converted to vecf64. 
+- When adding vector and scalar data, the scalar is added to each element of the vector. ## Divide -### **Description** +### **Function Description** -This operator is used to divide two vectors element-wise by using the / operator. +`/` is used to divide two vectors element-wise. -## **Syntax** +## **Function syntax** ``` -> SELECT vector1 / vector2 AS result_vector FROM table_name; +> SELECT para1 / para2 ``` ### **Examples** @@ -79,11 +95,27 @@ mysql> select b/b from vec_table; | [1, 1, 1] | +-----------+ 1 row in set (0.00 sec) + +mysql> select cast("[1,2,3]" as vecf32(3)) / b from vec_table; ++--------------------------------+ +| cast([1,2,3] as vecf32(3)) / b | ++--------------------------------+ +| [1, 1, 1] | ++--------------------------------+ +1 row in set (0.00 sec) + +mysql> select b/2 from vec_table; ++---------------+ +| b / 2 | ++---------------+ +| [0.5, 1, 1.5] | ++---------------+ +1 row in set (0.00 sec) ``` -### **Constraints** +### **Restrictions** -- If one of the element in denominator vector is zero, then it will throw Division By Zero error. --Both the argument vectors should be of same dimension -- If operation is performed between vecf32 and vecf64, the result is cast to vecf64 -- If one of the argument is VECTOR in textual format, then the other argument should be of VECTOR type. If both the arguments are TEXT, then Query Engine would treat it like string operation. +- No element of the denominator may be 0, otherwise an error is generated. +- The dimensions should be the same when both parameters are vector types. +- If two vector type parameters are divided, one of type vecf32 and the other vecf64, the result is converted to vecf64. +- When dividing vector data by a scalar, each element of the vector is divided by the scalar.
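The element-wise semantics described above can be sketched in plain Python (illustrative helpers only, not MatrixOne internals; in MatrixOne a mixed vecf32/vecf64 operation additionally promotes the result to vecf64):

```python
def vec_add(v, w):
    """Element-wise addition; w may be a scalar (applied to every element)
    or a vector of the same dimension."""
    if isinstance(w, (int, float)):
        return [x + w for x in v]
    if len(v) != len(w):
        raise ValueError("vector dimensions must match")
    return [x + y for x, y in zip(v, w)]

def vec_div(v, w):
    """Element-wise division; raises on any zero in the denominator."""
    if isinstance(w, (int, float)):
        w = [w] * len(v)
    if len(v) != len(w):
        raise ValueError("vector dimensions must match")
    if any(y == 0 for y in w):
        raise ZeroDivisionError("division by zero element")
    return [x / y for x, y in zip(v, w)]

assert vec_add([1, 2, 3], [1, 2, 3]) == [2, 4, 6]   # b + "[1,2,3]"
assert vec_add([1, 2, 3], 1) == [2, 3, 4]           # b + 1
assert vec_div([1, 2, 3], 2) == [0.5, 1.0, 1.5]     # b / 2
```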
diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cluster_centers.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cluster_centers.md new file mode 100644 index 000000000..803a161d7 --- /dev/null +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cluster_centers.md @@ -0,0 +1,79 @@ +# CLUSTER\_CENTERS + +## Function Description + +The `CLUSTER_CENTERS()` function can be used to determine the K cluster centers of a vector column. Returns a row of JSON array strings containing all cluster centers. + +## Syntax structure + +``` +SELECT cluster_centers(col kmeans 'k, op_type, init_type, normalize') FROM tbl; +``` + +## Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| col | Required. To determine the vector columns of the clustering centers.| +| k | Required. The number of clusters into which the dataset is to be divided, greater than 0 and less than or equal to the total number of rows.| +| op_type| Required. The distance function to be used during the clustering calculation. Currently vector_l2_ops is supported.| +| init_type | Required. The initialized clustering center algorithm to be used. Currently we support random and kmeansplusplus (K-means++).| +| normalize | Required. Boolean value, the clustering algorithm to use, true for Spherical Kmeans, false for Regular Kmeans.| + +## Examples + +```sql +drop table if exists points; +CREATE TABLE points (id int auto_increment PRIMARY KEY,coordinate vecf32(2)); +insert into points(coordinate) VALUES + ("[-7.68905443,6.62034649]"), + ("[-9.57651383,-6.93440446]"), + ("[6.82968177,1.1648714]"), + ("[-2.90130578,7.55077118]"), + ("[-5.67841327,-7.28818497]"), + ("[-6.04929137,-7.73619342]"), + ("[-6.27824322,7.22746302]"); +SET GLOBAL experimental_ivf_index = 1;--The parameter experimental_ivf_index needs to be set to 1 (default 0) to use vector indexes. 
+--create index idx_t1 using ivfflat on points(coordinate) lists=1 op_type "vector_l2_ops"; + +-- Each point represents its coordinates on the x and y axes, querying the clustering centers, using Regular Kmeans +--K-means++ +mysql> SELECT cluster_centers(coordinate kmeans '2,vector_l2_ops,kmeansplusplus,false') AS centers FROM points; ++----------------------------------------------------+ +| centers | ++----------------------------------------------------+ +| [ [-2.5097303, 5.640863],[-7.101406, -7.3195944] ] | ++----------------------------------------------------+ +1 row in set (0.01 sec) + +--KMeans +mysql> SELECT cluster_centers(coordinate kmeans '2,vector_l2_ops,random,false') AS centers FROM points; ++----------------------------------------------------+ +| centers | ++----------------------------------------------------+ +| [ [-6.362137, -0.09336702],[6.829682, 1.1648715] ] | ++----------------------------------------------------+ +1 row in set (0.00 sec) + +-- Each point represents latitude and longitude coordinates to query the clustering center using Spherical Kmeans +mysql> SELECT cluster_centers(coordinate kmeans '2,vector_l2_ops,kmeansplusplus,true') AS centers FROM points; ++------------------------------------------------------+ +| centers | ++------------------------------------------------------+ +| [ [0.70710677, 0.70710677],[0.83512634, 0.5500581] ] | ++------------------------------------------------------+ +1 row in set (0.00 sec) + +--Cluster centers within JSON-type data can be taken out in combination with CROSS JOIN and UNNEST syntax. 
+mysql> SELECT value FROM ( + -> SELECT cluster_centers(coordinate kmeans '2,vector_l2_ops,kmeansplusplus,false') AS centers FROM points + -> ) AS subquery + -> CROSS JOIN UNNEST(subquery.centers) AS u; ++-------------------------+ +| value | ++-------------------------+ +| [-2.5097303, 5.640863] | +| [-7.101406, -7.3195944] | ++-------------------------+ +2 rows in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cosine_distance.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cosine_distance.md index 101c58d88..040da6596 100644 --- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cosine_distance.md +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cosine_distance.md @@ -1,16 +1,16 @@ -# COSINE_DISTANCE() +# COSINE\_DISTANCE() -## Description +## Function Description The `COSINE_DISTANCE()` function is used to calculate the cosine distance between two vectors. -Cosine Distance is a measure of the directional difference between two vectors, typically defined as 1 minus the cosine similarity ([Cosine Similarity](cosine_similarity.md)). The value of cosine distance ranges from 0 to 2. A value of 0 indicates that the directions of the two vectors are exactly the same (minimum distance). A value of 2 indicates that the directions of the two vectors are exactly opposite (maximum distance). In text analysis, cosine distance can be used to measure the similarity between documents. Since it only considers the direction of the vectors and not their magnitude, it is fair for comparisons between long and short texts. +Cosine Distance is a measure of the difference in direction between two vectors, usually defined as 1 minus [Cosine Similarity](cosine_similarity.md). The value of the cosine distance ranges from 0 to 2. 0 means that both vectors are in exactly the same direction (minimum distance). 2 means that the two vectors are in exactly the opposite direction (maximum distance). 
In text analysis, cosine distance can be used to measure the similarity between documents. Since it considers only the direction of a vector and not its length, it is fair for comparisons between long and short texts.
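The definition above can be sanity-checked outside the database with a short standalone Python sketch (illustrative only, not MatrixOne code): cosine distance is 1 minus cosine similarity, so identical directions give 0 and opposite directions give 2.

```python
import math

def cosine_distance(a, b):
    # cosine distance = 1 - (a . b) / (||a|| * ||b||)
    # assumes neither vector is the zero vector (see the Limitations section)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# same direction -> distance close to 0; opposite direction -> close to 2
print(cosine_distance([1, 2, 3], [2, 4, 6]))
print(cosine_distance([1, 2, 3], [-1, -2, -3]))
```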
-## Syntax
+## Function syntax
 
 ```
 > SELECT COSINE_DISTANCE(vector1, vector2) FROM tbl;
@@ -55,6 +55,6 @@ mysql> select cosine_distance(b,"[-1,-2,-3]") from vec_table;
 1 row in set (0.00 sec)
 ```
 
-## Constraints
+## Limitations
 
-When using the `COSINE_DISTANCE()`, input vectors must not be zero vectors, as this would result in a division by zero, which is undefined in mathematics. In practical applications, we generally consider the cosine similarity between a zero vector and any other vector to be zero, because there is no directional similarity between them.
\ No newline at end of file
+When using the `COSINE_DISTANCE()` function, input vectors must not be zero vectors, as this results in a division by zero, which is mathematically undefined. In practice, the cosine similarity between a zero vector and any other vector is usually taken to be 0, because there is no directional similarity between them.
\ No newline at end of file
diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cosine_similarity.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cosine_similarity.md
index 32b473fe2..75ef4b7d6 100644
--- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cosine_similarity.md
+++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/cosine_similarity.md
@@ -1,12 +1,14 @@
-# **cosine_similarity()**
+# **cosine\_similarity()**
 
-## **Description**
+## **Function Description**
 
-Cosine similarity measures the cosine of the angle between two vectors, indicating their similarity by how closely they align in a multi-dimensional space, with 1 denoting perfect similarity and -1 indicating perfect dissimilarity. Consine similarity is calculated by dividing Inner Product of two vectors, by the product of their l2 norms.
+`cosine_similarity()` is a cosine similarity that measures the cosine value of the angle between two vectors, indicating their similarity by how close they are in multidimensional space, where 1 means exactly similar and -1 means completely different. Cosine similarity is calculated by dividing the inner product of two vectors by the product of their l2 norm. -![cosine_similarity](https://github.com/matrixorigin/artwork/blob/main/docs/reference/vector/cosine_similarity.png?raw=true) +
+ +
-## **Syntax** +## **Function syntax** ``` > SELECT cosine_similarity(vector1, vector2) AS similarity FROM table_name; @@ -35,7 +37,8 @@ mysql> select cosine_similarity(b,"[1,2,3]") from vec_table; 1 row in set (0.00 sec) ``` -## **Constraints** +## **Restrictions** -- Both the argument vectors should have same dimensions -- Cosine similarity value lies between -1 and 1. +- Two parameter vectors must have the same dimension. +- The value for cosine similarity is between -1 and 1. +- Input vectors are not allowed to be 0 vectors because this results in a division by zero, which is mathematically undefined. In practice, we usually consider the cosine similarity of a zero vector to any other vector to be 0 because there is no similarity in any direction between them. diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/inner_product.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/inner_product.md index dcc32bbcb..9c2ef20bd 100644 --- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/inner_product.md +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/inner_product.md @@ -1,12 +1,12 @@ # **inner_product()** -## **Description** +## **Function Description** -The INNER PRODUCT function is used to calculate the inner/dot product between two vectors, which is the result of multiplying corresponding elements of two vectors and then adding them together. +The `INNER PRODUCT` function is used to calculate the inner/dot product between two vectors. It is the result of multiplying the corresponding elements of two vectors and then adding them. 
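The computation can be illustrated with a tiny standalone Python sketch (not MatrixOne code; the helper name `inner_product` is only for illustration): multiply element by element, then sum.

```python
def inner_product(a, b):
    # multiply corresponding elements, then sum the products
    assert len(a) == len(b), "both vectors must have the same dimension"
    return sum(x * y for x, y in zip(a, b))

print(inner_product([1, 2, 3], [1, 2, 3]))  # 1*1 + 2*2 + 3*3 = 14
```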
-![inner_product](https://github.com/matrixorigin/artwork/blob/main/docs/reference/vector/inner_product.png?raw=true) +![inner_product](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/reference/vector/inner_product.png?raw=true) -## **Syntax** +## **Function syntax** ``` > SELECT inner_product(vector1, vector2) AS result FROM table_name; @@ -35,6 +35,6 @@ mysql> select inner_product(b,"[1,2,3]") from vec_table; 1 row in set (0.00 sec) ``` -## **Constraints** +## **Restrictions** -Both the argument vector shoulds be of same dimensions. +Two parameter vectors must have the same dimension. diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l1_norm.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l1_norm.md index 97e022701..2b8914ab7 100644 --- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l1_norm.md +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l1_norm.md @@ -1,20 +1,22 @@ # **l1_norm()** -## **Description** +## **Function Description** -The l1_norm function is used to calculate the L1/Manhattan/TaxiCab norm. The L1 norm is obtained by summing the absolute value of vector elements. +The `l1_norm` function is used to calculate the `l1`/Manhattan/TaxiCab norm. The `l1` norm is obtained by summing the absolute values of the vector elements. -![l1_normy](https://github.com/matrixorigin/artwork/blob/main/docs/reference/vector/l1_norm.png?raw=true) +
+ +
-You can use L1 Norm to calculate L1 Distance.
+You can use the `l1` norm to calculate the `l1` distance.
 
 ```
 l1_distance(a,b) = l1_norm(a-b)
 ```
 
-Same is appicable for calculating L2 distance from L2_Norm.
+The same approach applies to calculating the `l2` distance from the `l2` norm.
 
-## **Syntax**
+## **Function syntax**
 
 ```
 > SELECT l1_norm(vector) AS result FROM table_name;
diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l2_distance.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l2_distance.md
index f940a427e..8b1ea1ec8 100644
--- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l2_distance.md
+++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l2_distance.md
@@ -1,16 +1,16 @@
 # L2_DISTANCE()
 
-## Description
+## Function Description
 
-The `L2_DISTANCE()` function is used to calculate the Euclidean distance between two vectors. It returns a value of the FLOAT64 type.
+The `L2_DISTANCE()` function is used to calculate the Euclidean distance between two vectors. It returns a value of type FLOAT64.
 
-L2 distance, also known as Euclidean distance, is one of the most commonly used methods of measuring distance in vector spaces. It measures the straight-line distance between two points in multidimensional space. L2 distance has many practical applications, including fields such as machine learning, computer vision, and spatial analysis.
+L2 distance, also known as Euclidean distance, is one of the most commonly used distance measures in vector spaces. It measures the straight-line distance between two points in multidimensional space. L2 distance has many practical applications, including areas such as machine learning, computer vision, and spatial analysis.
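The straight-line distance described above can be checked with a short standalone Python sketch (illustrative only, not MatrixOne code): square the element-wise differences, sum them, and take the square root.

```python
import math

def l2_distance(a, b):
    # straight-line (Euclidean) distance between two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(l2_distance([0, 0], [3, 4]))  # 5.0 (the classic 3-4-5 triangle)
```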
-## Syntax +## Function syntax ``` > SELECT L2_DISTANCE(vector, const_vector) FROM tbl; diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l2_norm.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l2_norm.md index e8a2377ac..c198d318c 100644 --- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l2_norm.md +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/l2_norm.md @@ -1,12 +1,14 @@ # **l2_norm()** -## **Description** +## **Function Description** -The l2_norm function is used to calculate the L2/Euclidean norm. The L2 norm is obtained by taking the square root of the sum of the squares of the vector elements. +The `l2_norm` function is used to calculate the `l2`/Euclidean norm. The `l2` norm is obtained by performing a square root operation on the sum of squares of the vector elements. -![l2_normy](https://github.com/matrixorigin/artwork/blob/main/docs/reference/vector/l2_norm.png?raw=true) +
+ +
-## **Syntax** +## **Function syntax** ``` > SELECT l2_norm(vector) AS result FROM table_name; diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/misc.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/misc.md index bcace2b8b..ba7cdfcc8 100644 --- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/misc.md +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/misc.md @@ -1,14 +1,14 @@ -# Misc Function +# Mathematical class functions -Other support functions on vector include. +Vectors support the following mathematical functions: ## SQRT -### **Description** +### **Function Description** -The sqrt function is used to calculate the square root of each element in a vector. +The `sqrt` function is used to calculate the square root of each element in a vector. -### **Syntax** +### **Function syntax** ``` > SELECT sqrt(vector_column) FROM table_name; @@ -16,7 +16,7 @@ The sqrt function is used to calculate the square root of each element in a vect #### Return Type -Return a new vector of type vecf64 containing the square root of each element in the original vector. +Returns a new vector of type vecf64 containing the square root of each element in the original vector. ### **Examples** @@ -41,17 +41,17 @@ mysql> select sqrt(b) from vec_table; 1 row in set (0.00 sec) ``` -### **Constraints** +### **Restrictions** -- Elements of the vector cannot have -ve. +- Elements of a vector cannot be negative. ## ABS -### **Description** +### **Function Description** -The abs function is used to calculate the absolute value of a vector. +The `abs` function is used to calculate the absolute value of a vector. -### **Syntax** +### **Function syntax** ``` > SELECT ABS(vector_column) FROM table_name; @@ -59,7 +59,7 @@ The abs function is used to calculate the absolute value of a vector. #### Return Type -Return a new vector of same type, containing the absolute values of each element in the original vector. 
+Returns a new vector of the same type containing the absolute value of each element in the original vector.
 
 ### **Examples**
 
```
 drop table if exists vec_table;
 create table vec_table(a int, b vecf32(3), c vecf64(3));
 insert into vec_table values(1, "[-1,-2,3]", "[4,5,6]");
-mysql> select * from vec_table;
-+------+-------------+-----------+
-| a    | b           | c         |
-+------+-------------+-----------+
-| 1    | [-1, -2, 3] | [4, 5, 6] |
-+------+-------------+-----------+
-1 row in set (0.00 sec)
+mysql> select * from vec_table;
++------+-------------+-----------+
+| a    | b           | c         |
++------+-------------+-----------+
+| 1    | [-1, -2, 3] | [4, 5, 6] |
++------+-------------+-----------+
+1 row in set (0.00 sec)
 
 mysql> select abs(b) from vec_table;
 +-----------+
@@ -86,24 +86,24 @@ mysql> select abs(b) from vec_table;
 
 ## CAST
 
-### **Description**
+### **Function Description**
 
-The cast function is used to explicitly convert vector from one vector type to another.
+The cast function is used to explicitly convert a vector from one vector type to another.
 
-### **Syntax**
+### **Function syntax**
 
 ```
 > SELECT CAST(vector AS vector_type) FROM table_name;
 ```
 
-#### Arguments
+#### Parameters
 
-- `vector`: input vector
-- `vector_type`: new vector type
+- `vector`: The input vector.
+- `vector_type`: The target vector type.
 
 #### Return Type
 
-The new vector_type vector.
+A new vector of type `vector_type`.
 
 ### **Examples**
 
@@ -130,11 +130,11 @@ mysql> select b + cast("[1,2,3]" as vecf32(3)) from vec_table;
 
 ## SUMMATION
 
-### **Description**
+### **Function Description**
 
-The summation function returns the sum of all the elements in a vector.
+The `summation` function returns the sum of all the elements in the vector.
-### **Syntax** +### **Function syntax** ``` > SELECT SUMMATION(vector_column) FROM table_name; @@ -165,4 +165,4 @@ mysql> select summation(b) from vec_table; | 6 | +--------------+ 1 row in set (0.00 sec) -``` +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/normalize_l2.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/normalize_l2.md index 80848b239..1751b9390 100644 --- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/normalize_l2.md +++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/normalize_l2.md @@ -1,16 +1,16 @@ # NORMALIZE_L2() -## Description +## Function Description -The`NORMALIZE_L2()` function performs Euclidean normalization (L2 normalization) on a vector. +The `NORMALIZE_L2()` function performs Euclidean normalization on vectors. -The L2 norm is the square root of the sum of the squares of the vector's elements. Therefore, the purpose of L2 normalization is to make the length (or norm) of the vector equal to 1, which is often referred to as a unit vector. This method of normalization is particularly useful in machine learning, especially when dealing with feature vectors. It can help standardize the scale of features, thereby improving the performance of the algorithm. +The L2 norm is the square root of the sum of the squares of the vector elements, so the purpose of L2 normalization is to make the length (or norm) of the vector 1, which is often referred to as a unit vector. This normalization method is particularly useful in machine learning, especially when dealing with feature vectors. It can help standardize the scale of features and thus improve the performance of algorithms.
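What the normalization computes can be sketched in a few lines of standalone Python (illustrative only, not MatrixOne code): divide every element by the vector's l2 norm, which yields a unit vector.

```python
import math

def normalize_l2(v):
    # divide every element by the vector's l2 norm, yielding a unit vector
    # assumes v is not the zero vector (its norm would be 0)
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

u = normalize_l2([3.0, 4.0])
print(u)  # [0.6, 0.8]
```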
-## Syntax
+## Function syntax
 
 ```
 > SELECT NORMALIZE_L2(vector_column) FROM tbl;
diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/subvector.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/subvector.md
new file mode 100644
index 000000000..563ce25d6
--- /dev/null
+++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/subvector.md
@@ -0,0 +1,47 @@
+# SUBVECTOR()
+
+## Function Description
+
+The `SUBVECTOR()` function is used to extract a subvector from a vector.
+
+## Function syntax
+
+```
+> SUBVECTOR(vec, pos, len)
+```
+
+## Parameter interpretation
+
+| Parameters | Description |
+| ---- | ---- |
+|vec | Required parameter. The source vector from which the subvector is extracted.|
+|pos | Required parameter. The position at which to start the extraction. The first position in the vector is 1. If pos is positive, the function extracts from the beginning of the vector; if pos is negative, the extraction starts from the end of the vector.|
+|len | Optional parameter. The number of dimensions to extract. If omitted, the subvector extends from position pos to the end of the vector. If len is less than 1, an empty vector is returned.|
+
+## Examples
+
+```sql
+mysql> SELECT SUBVECTOR("[1,2,3]", 2);
++-----------------------+
+| subvector([1,2,3], 2) |
++-----------------------+
+| [2, 3]                |
++-----------------------+
+1 row in set (0.01 sec)
+
+mysql> SELECT SUBVECTOR("[1,2,3]",-1,1);
++---------------------------+
+| subvector([1,2,3], -1, 1) |
++---------------------------+
+| [3]                       |
++---------------------------+
+1 row in set (0.00 sec)
+
+mysql> SELECT SUBVECTOR("[1,2,3]",-1,0);
++---------------------------+
+| subvector([1,2,3], -1, 0) |
++---------------------------+
+| []                        |
++---------------------------+
+1 row in set (0.00 sec)
+```
\ No newline at end of file
diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/vector_dims.md b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/vector_dims.md
index 96713d936..87c326794 100644
--- a/docs/MatrixOne/Reference/Functions-and-Operators/Vector/vector_dims.md
+++ b/docs/MatrixOne/Reference/Functions-and-Operators/Vector/vector_dims.md
@@ -1,10 +1,10 @@
 # **vector_dims()**
 
-## **Description**
+## **Function Description**
 
-The vector_dims function is used to determine the dimension of a vector.
+The `vector_dims` function is used to determine the dimension of a vector.
 
-## **Syntax**
+## **Function syntax**
 
 ```
 > SELECT vector_dims(vector) AS dimension_count FROM table_name;
diff --git a/docs/MatrixOne/Reference/Functions-and-Operators/matrixone-function-list.md b/docs/MatrixOne/Reference/Functions-and-Operators/matrixone-function-list.md
new file mode 100644
index 000000000..2f20bfa7f
--- /dev/null
+++ b/docs/MatrixOne/Reference/Functions-and-Operators/matrixone-function-list.md
@@ -0,0 +1,197 @@
+# Summary table of functions
+
+This document lists the functions supported in the latest version of MatrixOne.
+ +## Aggregate function + +| Function name | effect | +| ------------------------------------------------------ | --------------------------------------- | +| [ANY_VALUE()](./Aggregate-Functions/any-value.md) | Returns any value in the parameter range| +| [AVG()](./Aggregate-Functions/avg.md) | Calculate the arithmetic mean of the parameter columns.| +| [BITMAP](./Aggregate-Functions/bitmap.md) | A set of built-in functions for working with bitmaps, mainly for calculating different values| +| [BIT_AND()](./Aggregate-Functions/bit_and.md) | Calculated the ratio of all the bits in the column by bit to the| +| [BIT_OR()](./Aggregate-Functions/bit_or.md) | Calculated the bitwise or of all bits in the column| +| [BIT_XOR()](./Aggregate-Functions/bit_xor.md) | Calculated the bitwise dissimilarity of all the bits in the column| +| [COUNT()](./Aggregate-Functions/count.md) | The number of records of the query result was calculated| +| [GROUP_CONCAT()](./Aggregate-Functions/group-concat.md)| Concatenates content specified by columns or expressions| +| [MAX()](./Aggregate-Functions/max.md) | Returns the maximum of a set of values | +| [MEDIAN()](./Aggregate-Functions/median.md) | Returns the median of a set of values| +| [MIN()](./Aggregate-Functions/min.md) | Returns the smallest of a set of values| +| [STDDEV_POP()](./Aggregate-Functions/stddev_pop.md) | Used to calculate the overall standard deviation| +| [SUM()](./Aggregate-Functions/sum.md) | Used to calculate the sum of a set of values| +| [VARIANCE()](./Aggregate-Functions/variance.md) | Used to calculate overall variance| +| [VAR_POP()](./Aggregate-Functions/var_pop.md) | Used to calculate overall variance| + +## Date Time Class Function + +| Function name | effect | +| ----------------------------------------------------- | --------------------------------------- | +| [CONVERT_TZ()](./Datetime/convert-tz.md) |Used to convert a given datetime from one time zone to another.| +| [CURDATE()](./Datetime/curdate.md) 
|Returns the current date in YYYY-MM-DD format.| +| [CURRENT_TIMESTAMP()](./Datetime/current-timestamp.md)|Returns the current date and time in YYYY-MM-DD hh:mm:ss or YYYYMMDDhhmmss format.| +| [DATE()](./Datetime/date.md) |Intercepts the date portion of input in DATE or DATETIME format.| +| [DATE_ADD()](./Datetime/date-add.md) |Used to perform date arithmetic: add a specified time interval to a specified date.| +| [DATE_FORMAT()](./Datetime/date-format.md) |Formatting date values from format strings| +| [DATE_SUB()](./Datetime/date-sub.md) |Used to perform date arithmetic: subtracts a specified time interval from a specified date.| +| [DATEDIFF()](./Datetime/datediff.md) |Returns the number of days between two dates| +| [DAY()](./Datetime/day.md) | Returns a date as the first of the month.| +| [DAYOFYEAR()](./Datetime/dayofyear.md) |Number of days in a year corresponding to the date of return| +| [EXTRACT()](./Datetime/extract.md) |Partial extraction from the date| +| [HOUR()](./Datetime/hour.md) |Hours of return time| +| [FROM_UNIXTIME()](./Datetime/from-unixtime.md) |Converts internal UNIX timestamp values to normal format datetime values, which are displayed in YYYY-MM-DD HH:MM:SS or YYYYMMDDHHMMSS format.| +| [MINUTE()](./Datetime/minute.md) |Returns the minutes of the time parameter| +| [MONTH()](./Datetime/month.md) |Returns the month of the date parameter| +| [NOW()](./Datetime/now.md) |Returns the current date and time in 'YYYY-MM-DD HH:MM:SS' format.| +| [SECOND()](./Datetime/second.md) |Returns the number of seconds for the time parameter| +| [STR_TO_DATE()](./Datetime/str-to-date.md) |Convert a string to a date or datetime type according to a specified date or time display format| +| [SYSDATE()](./Datetime/sysdate.md) |Returns the current date and time in 'YYYY-MM-DD HH:MM:SS' format.| +| [TIME()](./Datetime/time.md) |Extracts the time portion of a time or datetime and returns it as a string| +| [TIMEDIFF()](./Datetime/timediff.md) |Returns the 
difference between two time parameters| +| [TIMESTAMP()](./Datetime/timestamp.md) |Returns a date or datetime parameter as a datetime value| +| [TIMESTAMPDIFF()](./Datetime/timestampdiff.md) |Returns an integer representing the time interval between the first datetime expression and the second datetime expression in the given time units| +| [TO_DATE()](./Datetime/to-date.md) |Convert a string to a date or datetime type according to a specified date or time display format| +| [TO_DAYS()](./Datetime/to-days.md) |Used to calculate the difference in the number of days between a given date and the start date of the Gregorian calendar (January 1, 0000)| +| [TO_SECONDS()](./Datetime/to-seconds.md) |Used to calculate the difference in seconds between a given date or datetime expr and 0 hours, 0 minutes, 0 seconds on January 1, 0 AD.| +| [UNIX_TIMESTAMP](./Datetime/unix-timestamp.md) |Returns the number of seconds from 1970-01-01 00:00:00 UTC to the specified time.| +| [UTC_TIMESTAMP()](./Datetime/utc-timestamp.md) |Returns the current UTC time in the format YYYY-MM-DD hh:mm:ss or YYYYMMDDhhmmss| +| [WEEK()](./Datetime/week.md) |Used to calculate the number of weeks for a given date| +| [WEEKDAY()](./Datetime/weekday.md) |Returns the weekday index of the date (0 = Monday, 1 = Tuesday, ... 
6 = Sunday)| +| [YEAR()](./Datetime/year.md) |Returns the year of the given date| + +## Mathematical class functions + +| Function name | effect | +| --------------------------------- | --------------------------------------- | +| [ABS()](./Mathematical/abs.md) | Used to find the absolute value of a parameter| +| [ACOS()](./Mathematical/acos.md) | Used to find the cosine of a given value (expressed in radians) | +| [ATAN()](./Mathematical/atan.md) | Used to find the arctangent of a given value (expressed in radians)| +| [CEIL()](./Mathematical/ceil.md) | Used to find the smallest integer that is not less than the argument.| +| [COS()](./Mathematical/cos.md) | Used to find the cosine of an input parameter (expressed in radians).| +| [COT()](./Mathematical/cot.md) | Used to find the cotangent value of the input parameter (expressed in radians). | +| [EXP()](./Mathematical/exp.md) | Used to find the exponent of number with the natural constant e as the base.| +| [FLOOR()](./Mathematical/floor.md)| Used to find the number of digits not greater than the corresponding digit of a number. 
| +| [LN()](./Mathematical/ln.md) | The natural logarithm used to find the parameters| +| [LOG()](./Mathematical/log.md) | The natural logarithm used to find the parameters| +| [LOG2()](./Mathematical/log2.md) | Used to find the logarithm with 2 as the base parameter.| +| [LOG10()](./Mathematical/log10.md)| Used to find logarithms with a base argument of 10.| +| [PI()](./Mathematical/pi.md) | Used to find the mathematical constant π (pi)| +| [POWER()](./Mathematical/power.md)| POWER(X, Y) is used to find the Yth power of X.| +| [ROUND()](./Mathematical/round.md)| Used to find the value of a number rounded to a specific number of digits.| +| [RAND()](./Mathematical/rand.md) | Used to generate a random number of type Float64 between 0 and 1.| +| [SIN()](./Mathematical/sin.md) | Used to find the sine of an input parameter (expressed in radians)| +| [SINH()](./Mathematical/sinh.md) | For finding the hyperbolic sine of an input parameter (expressed in radians)| +| [TAN()](./Mathematical/tan.md) | Used to find the tangent of the input parameter (expressed in radians).| + +## String class function + +| Function name | effect | +| ---------------------------------------------- | --------------------------------------- | +| [BIN()](./String/bin.md) | Converts arguments to binary string form.| +| [BIT_LENGTH()](./String/bit-length.md) | Returns the length of the string str in bits.| +| [CHAR_LENGTH()](./String/char-length.md) | Returns the length of the string str in characters.| +| [CONCAT()](./String/concat.md) | Concatenate multiple strings (or strings containing only one character) into a single string| +| [CONCAT_WS()](./String/concat-ws.md) | Represents Concatenate With Separator, a special form of CONCAT().| +| [EMPTY()](./String/empty.md) | Determines whether the input string is empty. 
| +| [ENDSWITH()](./String/endswith.md) | Checks if it ends with the specified suffix.| +| [FIELD()](./String/field.md) | Returns the position of the first string str in the list of strings (str1,str2,str3,...). in the list of strings (str1,str2,str3...) | +| [FIND_IN_SET()](./String/find-in-set.md) | Finds the location of the specified string in a comma-separated list of strings.| +| [FORMAT()](./String/format.md) | Used to format numbers to the "#,###,###. ###" format and round to one decimal place.| +| [FROM_BASE64()](./String/from_base64.md) | Used to convert Base64 encoded strings back to raw binary data (or text data).| +| [HEX()](./String/hex.md) | Returns the hexadecimal string form of the argument| +| [INSTR()](./String/instr.md) | Returns the position of the first occurrence of the substring in the given string.| +| [LCASE()](./String/lcase.md) | Used to convert the given string to lowercase form.| +| [LEFT()](./String/left.md) | Returns the leftmost length character of the str string.| +| [LENGTH()](./String/length.md) | Returns the length of the string.| +| [LOCATE()](./String/locate.md) | Function for finding the location of a substring in a string.| +| [LOWER()](./String/lower.md) | Used to convert the given string to lowercase form.| +| [LPAD()](./String/lpad.md) | Used to fill in the left side of the string.| +| [LTRIM()](./String/ltrim.md) | Removes leading spaces from the input string and returns the processed characters.| +| [MD5()](./String/md5.md) | Generates a 32-character hexadecimal MD5 hash of the input string.| +| [OCT()](./String/oct.md) | Returns a string of the octal value of the argument| +| [REPEAT()](./String/repeat.md) | Repeats the input string n times and returns a new string.| +| [REVERSE()](./String/reverse.md) | Flips the order of the characters in the str string and outputs them.| +| [RPAD()](./String/rpad.md) | Used to fill in the right side of a string| +| [RTRIM()](./String/rtrim.md) | Remove trailing spaces from the input 
string.| +| [SHA1()/SHA()](./String/sha1.md) | Used to compute and return the SHA-1 hash of a given string.| +| [SHA2()](./String/sha2.md) | Returns the SHA2 hash of the input string.| +| [SPACE()](./String/space.md) | Returns a string of N spaces.| +| [SPLIT_PART()](./String/split_part.md) | Used to break a string into multiple parts based on a given separator character| +| [STARTSWITH()](./String/startswith.md) | The string returns 1 if it starts with the specified prefix, 0 otherwise.| +| [SUBSTRING()](./String/substring.md) | Returns a substring starting at the specified position| +| [SUBSTRING_INDEX()](./String/substring-index.md)| Get characters with different index bits, indexed by the separator.| +| [TO_BASE64()](./String/to_base64.md) | Used to convert strings to Base64 encoded strings| +| [TRIM()](./String/trim.md) | Returns a string, removing unwanted characters.| +| [UCASE()](./String/ucase.md) | Used to convert the given string to uppercase form.| +| [UNHEX()](./String/unhex.md) | Used to convert a hexadecimal string to the corresponding binary string.| +| [UPPER()](./String/upper.md) | Used to convert the given string to uppercase. 
| + +## Regular Expressions + +| Function name | effect | +| ------------------------------------------------------------------ | -------------------------------------- | +| [NOT REGEXP()](./String/Regular-Expressions/not-regexp.md) | Used to test if a string does not match a specified regular expression| +| [REGEXP_INSTR()](./String/Regular-Expressions/regexp-instr.md) | Returns the starting position in the string of the matched regular expression pattern.| +| [REGEXP_LIKE()](./String/Regular-Expressions/regexp-like.md) | Used to determine if the specified string matches the provided regular expression pattern| +| [REGEXP_REPLACE()](./String/Regular-Expressions/regexp-replace.md) | Used to replace a string matching a given regular expression pattern with a specified new string| +| [REGEXP_SUBSTR()](./String/Regular-Expressions/regexp-substr.md) | Used to return a substring of a string argument that matches a regular expression argument.| + +## Vector class functions + +| Function name | effect | +| ---------------------------------------------------- | --------------------------------------- | +| [基本操作符](./Vector/arithmetic.md) | Addition (+), subtraction (-), multiplication (*) and division (/) of vectors.| +| [SQRT()](./Vector/misc.md) | Used to calculate the square root of each element in a vector| +| [ABS()](./Vector/misc.md) | Used to calculate the absolute value of a vector| +| [CAST()](./Vector/misc.md) | Used to explicitly convert a vector from one vector type to another vector type| +| [SUMMATION()](./Vector/misc.md) | Returns the sum of all elements in the vector| +| [INNER_PRODUCT()](./Vector/inner_product.md) | Used to compute the inner product/dot product between two vectors| +| [CLUSTER_CENTERS()](./Vector/cluster_centers.md) | K clustering centers for determining vector columns | +| [COSINE_DISTANCE()](./Vector/cosine_distance.md) | Used to compute the cosine distance of two vectors.| +| [COSINE_SIMILARITY()](./Vector/cosine_similarity.md) | A 
measure of the cosine of the angle between two vectors, indicating their similarity by their proximity in a multidimensional space| +| [L2_DISTANCE()](./Vector/l2_distance.md) |Used to compute the Euclidean distance between two vectors| +| [L1_NORM()](./Vector/l1_norm.md) | Used to compute l1/Manhattan/TaxiCab norms| +| [L2_NORM()](./Vector/l2_norm.md) | For calculating l2/Euclidean paradigms| +| [NORMALIZE_L2()](./Vector/normalize_l2.md) | For performing Euclidean normalization| +| [SUBVECTOR()](./Vector/subvector.md) | For extracting subvectors from vectors| +| [VECTOR_DIMS()](./Vector/vector_dims.md) | Used to determine the dimension of the vector| + +## Table function + +| Function name | effect | +| -------------------------------- | --------------------------------------- | +| [UNNEST()](./Table/unnest.md) | Used to expand an array of columns or parameters within JSON-type data into a table.| + +## Window function + +| Function name | effect | +| --------------------------------------------------- | --------------------------------------- | +| [DENSE_RANK()](./Window-Functions/dense_rank.md) | Provide a unique ranking for each row in the dataset| +| [RANK()](./Window-Functions/rank.md) | Provide a unique ranking for each row in the dataset| +| [ROW_UNMBER()](./Window-Functions/row_number.md) | Provide a unique serial number for each row in the data set| + +## JSON functions + +| Function name | effect | +| --------------------------------------------- | --------------------------------------- | +| [JSON_EXTRACT()](./Json/json-functions.md) | Returning data from a JSON document| +| [JSON_QUOTE()](./Json/json-functions.md) | Referencing JSON Documents| +| [JSON_UNQUOTE()](./Json/json-functions.md) | Dereferencing JSON documents| + +## system operation and maintenance function + +| Function name | effect | +| ----------------------------------------------------------- | --------------------------------------- | +| 
[CURRENT_ROLE_NAME()](./system-ops/current_role_name.md) | Used to query the name of the role owned by the currently logged-in user.|
+| [CURRENT_ROLE()](./system-ops/current_role.md) | Returns the role of the current session.|
+| [CURRENT_USER_NAME()](./system-ops/current_user_name.md) | Used to look up the name of the user you are currently logged in as.|
+| [CURRENT_USER()](./system-ops/current_user.md) | Returns the current user account|
+| [PURGE_LOG()](./system-ops/purge_log.md) | Used to delete logs recorded in MatrixOne database system tables.|
+
+## Other functions
+
+| Function name | effect |
+| ------------------------------ | --------------------------------------- |
+| [SAMPLE()](./Other/sample.md) | Primarily used to quickly narrow down query results|
+| [SERIAL_EXTRACT()](./Other/serial_extract.md) | Used to extract individual elements of a sequence/tuple of values|
+| [SLEEP()](./Other/sleep.md) | Pauses (sleeps) the current query for the specified number of seconds.|
+| [UUID()](./Other/uuid.md) | Generates and returns a universally unique identifier conforming to RFC 4122.|
\ No newline at end of file
diff --git a/docs/MatrixOne/Reference/Language-Structure/keywords.md b/docs/MatrixOne/Reference/Language-Structure/keywords.md
index 253a4b89b..ab49257db 100644
--- a/docs/MatrixOne/Reference/Language-Structure/keywords.md
+++ b/docs/MatrixOne/Reference/Language-Structure/keywords.md
@@ -1,100 +1,107 @@
 # Keywords
 
-This document introduces the keywords of MatrixOne. In MatrixOne, reserved keywords and non-reserved keywords are classified. When you use SQL statements, you can check reserved keywords and non-reserved keywords.
+This chapter describes the keywords of MatrixOne. MatrixOne classifies keywords as reserved or non-reserved. When you write SQL statements, you can consult the lists of reserved and non-reserved keywords below.
 
-**Keyword** is a word with a special meaning in SQL statements, such as `SELECT`, `UPDATE`, `DELETE`, and so on. 
+**Keywords** are words in SQL statements that have special meanings, such as `SELECT`, `UPDATE`, `DELETE`, and so on.
 
-- **Reserved keyword**: A word in a keyword that requires special processing before it can be used as an identifier is called a reserved keyword.
+- **Reserved keyword**: A keyword that requires special processing before it can be used as an identifier is called a reserved keyword.
 
-    When using reserved keywords as identifiers, they must be wrapped with backticks. Otherwise, an error will be reported:
+    When using a reserved keyword as an identifier, you must wrap it in backticks; otherwise, an error will occur:
 
-```
-\\The reserved keyword select is not wrapped in backticks, resulting in an error.
-mysql> CREATE TABLE select (a INT);
-ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; check the manual that corresponds to your MatrixOne server version for the right syntax to use. syntax error at line 1 column 19 near " select (a INT)";
+    ```
+    \\Failure to wrap the reserved keyword select in backticks produces an error
+    mysql> CREATE TABLE select (a INT);
+    ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; check the manual that corresponds to your MatrixOne server version for the right syntax to use. syntax error at line 1 column 19 near " select (a INT)";
 
-\\Correctly wrap the reserved keyword select with backticks.
-mysql> CREATE TABLE `select` (a INT);
-Query OK, 0 rows affected (0.02 sec)
-```
+    \\Correctly wrap the reserved keyword select in backticks
+    mysql> CREATE TABLE `select` (a INT);
+    Query OK, 0 rows affected (0.02 sec)
+    ```
 
-- **Non-reserved keywords**: keywords can be directly used as identifiers, called non-reserved keywords.
+- **Non-reserved keywords**: Keywords that can be used directly as identifiers are called non-reserved keywords.
 
-    When using non-reserved keywords as identifiers, they can be used directly without wrapping them in backticks. 
+    When using non-reserved keywords as identifiers, you can use them directly without wrapping them in backticks.
 
-```
-\\BEGIN is not a reserved keyword and can be wrapped without backticks.
-mysql> CREATE TABLE `select` (BEGIN int);
-Query OK, 0 rows affected (0.01 sec)
-```
+    ```
+    \\ACCOUNT is a non-reserved keyword and can be used without backticks
+    mysql> CREATE TABLE `select` (ACCOUNT int);
+    Query OK, 0 rows affected (0.01 sec)
+    ```
 
 !!! note
-    Unlike MySQL, in MatrixOne, if the qualifier **.** is used, an error will be reported if the reserved keywords are not wrapped in backticks. It is recommended to avoid using reserved keywords when creating tables and databases:
+    Unlike MySQL, in MatrixOne, if the qualifier **.** is used, a reserved keyword still produces an error if it is not wrapped in backticks. It is recommended to avoid using reserved keywords when creating tables and databases:
 
-```
-mysql> CREATE TABLE test.select (BEGIN int);
-ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; check the manual that corresponds to your MatrixOne server version for the right syntax to use. syntax error at line 1 column 24 near "select (BEGIN int)";
-```
+    ```
+    mysql> CREATE TABLE test.select (ACCOUNT int);
+    ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; check the manual that corresponds to your MatrixOne server version for the right syntax to use. syntax error at line 1 column 24 near "select (ACCOUNT int)";
+    ```
 
-## Reserved keyword
+The following list shows reserved and non-reserved keywords in MatrixOne, where those that are not keywords in MySQL are marked with **(M)**. 
+ +## Reserve Keywords ### A - ADD -- ADMIN_NAME - ALL +- ALTER +- ANALYZE - AND - AS - ASC -- ASCII -- AUTO_INCREMENT ### B +- BEGIN - BETWEEN - BINARY +- BOTH - BY ### C +- CALL - CASE +- CHANGE - CHAR - CHARACTER - CHECK - COLLATE -- COLLATION -- CONVERT -- COALESCE -- COLUMN_NUMBER +- COLUMN +- CONFIG **(M)** - CONSTRAINT +- CONVERT - CREATE - CROSS -- CURRENT - CURRENT_DATE -- CURRENT_ROLE -- CURRENT_USER +- CURRENT_ROLE **(M)** - CURRENT_TIME - CURRENT_TIMESTAMP -- CIPHER +- CURRENT_USER ### D - DATABASE - DATABASES +- DAY_HOUR +- DAY_MICROSECOND +- DAY_MINUTE +- DAY_SECOND - DECLARE - DEFAULT - DELAYED - DELETE +- DENSE_RANK - DESC - DESCRIBE - DISTINCT -- DISTINCTROW - DIV - DROP ### E - ELSE +- ELSEIF - ENCLOSED - END - ESCAPE @@ -105,15 +112,13 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec ### F -- FALSE -- FAILED_LOGIN_ATTEMPTS -- FIRST -- FOLLOWING +- FALSE **(M)** - FOR - FORCE - FOREIGN - FROM - FULLTEXT +- FUNCTION ### G @@ -123,29 +128,32 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec ### H - HAVING -- HOUR - HIGH_PRIORITY +- HOUR_MICROSECOND +- HOUR_MINUTE +- HOUR_SECOND ### I -- IDENTIFIED - IF - IGNORE -- IMPORT +- ILIKE **(M)** - IN -- INFILE - INDEX +- INFILE - INNER +- INOUT - INSERT -- INTERVAL -- INTO - INT1 - INT2 - INT3 - INT4 - INT8 +- INTERSECT +- INTERVAL +- INTO - IS -- ISSUER +- ITERATE ### J @@ -154,11 +162,12 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec ### K - KEY +- KILL ### L -- LAST - LEADING +- LEAVE - LEFT - LIKE - LIMIT @@ -167,43 +176,38 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec - LOCALTIME - LOCALTIMESTAMP - LOCK -- LOCKS +- LOOP - LOW_PRIORITY ### M - MATCH - MAXVALUE -- MICROSECOND -- MINUTE +- MINUS **(M)** +- MINUTE_MICROSECOND +- MINUTE_SECOND - MOD -- MODUMP ### N - NATURAL -- NODE - NOT -- NONE - NULL -- NULLS ### O - ON -- OPTIONAL - OPTIONALLY - OR - ORDER +- 
OUT - OUTER - OUTFILE - OVER ### P -- PASSWORD_LOCK_TIME - PARTITION -- PRECEDING - PRIMARY ### Q @@ -212,60 +216,53 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec ### R -- RANDOM +- RANK +- RECURSIVE +- REFERENCES - REGEXP - RENAME +- REPEAT - REPLACE -- RETURNS -- REUSE -- RIGHT - REQUIRE -- REPEAT +- RIGHT +- RLIKE - ROW +- ROW_NUMBER - ROWS -- ROW_COUNT -- REFERENCES -- RECURSIVE -- REVERSE ### S -- SAN -- SECONDARY -- SSL -- SUBJECT - SCHEMA - SCHEMAS +- SECOND_MICROSECOND - SELECT -- SECOND - SEPARATOR - SET - SHOW -- SQL_SMALL_RESULT - SQL_BIG_RESULT -- STRAIGHT_JOIN +- SQL_BUFFER_RESULT +- SQL_SMALL_RESULT +- SSL - STARTING -- SUSPEND +- STRAIGHT_JOIN ### T - TABLE -- TABLE_NUMBER -- TABLE_SIZE -- TABLE_VALUES +- TEMPORARY - TERMINATED - THEN - TO - TRAILING -- TRUE -- TRUNCATE +- TRUE **(M)** ### U -- UNBOUNDED - UNION - UNIQUE +- UNTIL - UPDATE +- USAGE - USE - USING - UTC_DATE @@ -280,85 +277,140 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec - WHEN - WHERE -- WEEK +- WHILE - WITH -## Non reserved keyword +### X + +- XOR + +### Y + +- YEAR_MONTH + +### _ + +- _BINARY **(M)** + +## Non-Reserved Keywords ### A - ACCOUNT -- ACCOUNTS -- AGAINST -- AVG_ROW_LENGTH -- AUTO_RANDOM -- ATTRIBUTE +- ACCOUNTS **(M)** - ACTION +- ADMIN_NAME **(M)** +- AFTER +- AGAINST - ALGORITHM - ANY +- ASCII +- ATTRIBUTE +- AUTO_INCREMENT +- AUTO_RANDOM **(M)** +- AUTOEXTEND_SIZE +- AVG_ROW_LENGTH ### B -- BEGIN +- BACKEND **(M)** +- BACKUP - BIGINT +- BINDINGS **(M)** - BIT - BLOB - BOOL +- BOOLEAN +- BSI **(M)** +- BTREE ### C +- CANCEL **(M)** +- CASCADE +- CASCADED - CHAIN +- CHARSET - CHECKSUM -- CLUSTER -- COMPRESSION -- COMMENT_KEYWORD +- CIPHER +- CLIENT +- CLUSTER **(M)** +- COALESCE +- COLLATION +- COLUMN_FORMAT +- COLUMN_NUMBER **(M)** +- COLUMNS +- COMMENT - COMMIT - COMMITTED -- CHARSET -- COLUMNS +- COMPACT +- COMPRESSED +- COMPRESSION +- CONNECT **(M)** - CONNECTION +- CONNECTOR **(M)** +- 
CONNECTORS **(M)** - CONSISTENT -- COMPRESSED -- COMPACT -- COLUMN_FORMAT -- CASCADE +- COPY **(M)** +- CREDENTIALS **(M)** +- CURRENT +- CYCLE **(M)** ### D +- DAEMON **(M)** - DATA - DATE - DATETIME +- DAY +- DEALLOCATE - DECIMAL -- DYNAMIC +- DEFINER +- DELAY_KEY_WRITE +- DIRECTORY +- DISABLE +- DISCARD - DISK - DO - DOUBLE -- DIRECTORY +- DRAINER **(M)** - DUPLICATE -- DELAY_KEY_WRITE +- DYNAMIC ### E -- ENUM +- ENABLE - ENCRYPTION - ENFORCED - ENGINE +- ENGINE_ATTRIBUTE - ENGINES +- ENUM - ERRORS -- EXPANSION +- EVENT +- EVENTS +- EXCLUSIVE **(M)** +- EXECUTE +- EXPANSION **(M)** - EXPIRE - EXTENDED - EXTENSION -- EXTERNAL +- EXTERNAL **(M)** ### F +- FAILED_LOGIN_ATTEMPTS +- FIELDS +- FILE +- FILESYSTEM **(M)** +- FILL **(M)** +- FIRST +- FIXED +- FLOAT +- FOLLOWING +- FORCE_QUOTE **(M)** - FORMAT -- FLOAT_TYPE - FULL -- FIXED -- FIELDS -- FORCE_QUOTE ### G @@ -366,23 +418,38 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec - GEOMETRYCOLLECTION - GLOBAL - GRANT +- GRANTS ### H +- HANDLER - HASH -- HEADER +- HEADER **(M)** - HISTORY +- HOUR ### I +- IDENTIFIED +- IMPORT +- INCREMENT **(M)** +- INDEXES +- INLINE **(M)** +- INPLACE **(M)** +- INSERT_METHOD +- INSTANT **(M)** - INT - INTEGER -- INDEXES +- INVISIBLE +- INVOKER - ISOLATION +- ISSUER +- IVFFLAT **(M)** ### J - JSON +- JSONTYPE **(M)** ### K @@ -392,144 +459,244 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec ### L - LANGUAGE +- LAST - LESS - LEVEL -- LINESTRING - LINEAR +- LINESTRING - LIST +- LISTS **(M)** +- LOCAL +- LOCKS - LONGBLOB - LONGTEXT -- LOCAL -- LOW_CARDINALITY +- LOW_CARDINALITY **(M)** ### M +- MANAGE **(M)** +- MASTER - MAX_CONNECTIONS_PER_HOUR -- MAX_FILE_SIZE +- MAX_FILE_SIZE **(M)** - MAX_QUERIES_PER_HOUR - MAX_ROWS -- MAX_UPDATES_PER_HOUR +- MAX_UPDATE_PER_HOUR **(M)** - MAX_USER_CONNECTIONS +- MEDIAN **(M)** - MEDIUMBLOB - MEDIUMINT - MEDIUMTEXT - MEMORY +- MERGE +- MICROSECOND - MIN_ROWS +- MINUTE +- 
MINVALUE **(M)** - MODE +- MODIFY +- MODUMP **(M)** - MONTH - MULTILINESTRING - MULTIPOINT - MULTIPOLYGON +- MYSQL_COMPATIBILITY_MODE **(M)** ### N - NAMES - NCHAR -- NUMERIC - NEVER +- NEXT - NO +- NODE **(M)** +- NONE +- NULLS +- NUMERIC ### O - OFFSET - ONLY -- OPTIMIZE +- OP_TYPE **(M)** - OPEN +- OPTIMIZE - OPTION +- OPTIONAL +- OWNERSHIP **(M)** ### P - PACK_KEYS -- PASSWORD +- PARALLEL **(M)** +- PARALLELISM **(M)** +- PARSER - PARTIAL - PARTITIONS +- PASSWORD +- PASSWORD_LOCK_TIME +- PAUSE **(M)** +- PERCENT **(M)** +- PERSIST +- PLUGINS - POINT - POLYGON +- PRECEDING +- PREPARE +- PREV +- PRIVILEGES - PROCEDURE +- PROCESSLIST - PROFILES +- PROPERTIES **(M)** - PROXY +- PUBLICATION **(M)** +- PUBLICATIONS **(M)** +- PUMP **(M)** ### Q - QUARTER - QUERY +- QUERY_RESULT **(M)** ### R -- ROLE +- RANDOM - RANGE - READ - REAL -- REORGANIZE - REDUNDANT +- REFERENCE +- RELEASE +- RELOAD +- REORGANIZE - REPAIR - REPEATABLE -- RELEASE -- REVOKE - REPLICATION -- ROW_FORMAT -- ROLLBACK +- RESET - RESTRICT +- RESTRICTED **(M)** +- RESUME +- RETURNS +- REUSE +- REVERSE +- REVOKE +- ROLE +- ROLES **(M)** +- ROLLBACK +- ROUTINE +- ROW_COUNT +- ROW_FORMAT +- RTREE ### S -- SESSION +- S3OPTION **(M)** +- SAMPLE **(M)** +- SAN **(M)** +- SECOND +- SECONDARY +- SECONDARY_ENGINE_ATTRIBUTE +- SECURITY +- SEQUENCE **(M)** +- SEQUENCES **(M)** - SERIALIZABLE +- SERVERS **(M)** +- SESSION - SHARE +- SHARED **(M)** +- SHUTDOWN - SIGNED +- SIMPLE +- SLAVE +- SLIDING **(M)** - SMALLINT - SNAPSHOT - SOME +- SOURCE - SPATIAL +- SQL +- SQL_CACHE +- SQL_NO_CACHE +- SQL_TSI_DAY +- SQL_TSI_HOUR +- SQL_TSI_MINUTE +- SQL_TSI_MONTH +- SQL_TSI_QUARTER +- SQL_TSI_SECOND +- SQL_TSI_WEEK +- SQL_TSI_YEAR +- STAGE **(M)** +- STAGEOPTION **(M)** +- STAGES **(M)** - START -- STATUS -- STORAGE -- STREAM - STATS_AUTO_RECALC - STATS_PERSISTENT - STATS_SAMPLE_PAGES -- SUBPARTITIONS +- STATUS +- STORAGE +- STREAM +- SUBJECT - SUBPARTITION -- SIMPLE -- S3OPTION +- SUBPARTITIONS +- SUBSCRIPTIONS **(M)** +- 
SUPER +- SUSPEND ### T +- TABLE_NUMBER **(M)** +- TABLE_SIZE **(M)** +- TABLE_VALUES **(M)** - TABLES +- TABLESPACE +- TASK **(M)** +- TEMPTABLE - TEXT - THAN -- TINYBLOB - TIME - TIMESTAMP +- TINYBLOB - TINYINT - TINYTEXT - TRANSACTION - TRIGGER - TRIGGERS +- TRUNCATE - TYPE ### U +- UNBOUNDED - UNCOMMITTED +- UNDEFINED - UNKNOWN -- UNSIGNED -- UNUSED - UNLOCK +- UNSIGNED - URL - USER +- UUID **(M)** ### V +- VALIDATION +- VALUE - VARBINARY - VARCHAR - VARIABLES +- VECF32 **(M)** +- VECF64 **(M)** +- VERBOSE **(M)** - VIEW +- VISIBLE ### W -- WRITE - WARNINGS +- WEEK +- WITHOUT - WORK +- WRITE ### X @@ -542,3 +709,4 @@ ERROR 1064 (HY000): SQL parser error: You have an error in your SQL syntax; chec ### Z - ZEROFILL +- ZONEMAP **(M)** diff --git a/docs/MatrixOne/Reference/Limitations/1.1-mo-partition-support.md b/docs/MatrixOne/Reference/Limitations/1.1-mo-partition-support.md deleted file mode 100644 index d2e8335a3..000000000 --- a/docs/MatrixOne/Reference/Limitations/1.1-mo-partition-support.md +++ /dev/null @@ -1,220 +0,0 @@ -# MatrixOne DDL statement partitioning supported - -## 1. The partition type supported by MatrixOne - -MatrixOne DDL statements support six partition types, which are the same as the MySQL official website: - -- KEY Partitioning -- HASH Partitioning -- RANGE Partitioning -- RANGE COLUMNS partitioning -- LIST Partitioning -- LIST COLUMNS partitioning - -Subpartitioning syntax is currently supported, but plan builds are not. - -## 2. About Partition Keys - -### Partition Keys, Primary Keys and Unique Keys - -The relationship rules of Partition Keys, Primary Keys, and Unique Keys can be summarized as follows: - -- All columns used in a partitioning expression for a partitioned table must be part of every unique key that the table may have. - - !!! note - The Unique KEY includes PrimaryKey and unique key. - -- That is, each unique key on a table must use each column of the table's partitioning expression. 
A unique key also includes the primary key of a table because, by definition, a table's primary key is also a unique one. - -#### Examples - -For example, because the unique key on the table does not use every column in the table, each statement that creates the table below is invalid: - -```sql -> CREATE TABLE t1 ( - col1 INT NOT NULL, - col2 DATE NOT NULL, - col3 INT NOT NULL, - col4 INT NOT NULL, - UNIQUE KEY (col1, col2) - ) - PARTITION BY HASH(col3) - PARTITIONS 4; -ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function - -> CREATE TABLE t2 ( - col1 INT NOT NULL, - col2 DATE NOT NULL, - col3 INT NOT NULL, - col4 INT NOT NULL, - UNIQUE KEY (col1), - UNIQUE KEY (col3) - ) - PARTITION BY HASH(col1 + col3) - PARTITIONS 4; - -ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function -``` - -### About the partition KEY is NULL - -1. KEY only accepts lists with Null or more column names. In cases where the table has a primary key, any column used as a partitioning key must contain some or all of the table's primary keys. - - If no column name is specified as the partitioning key, you can ues the table's primary key. For example, the following `CREATE TABLE` statement is valid in MySQL. - -2. If there is no primary KEY but a UNIQUE KEY, the UNIQUE KEY is used as the partitioning key. - - For example, in the following table construction sentence, the KEY partition key is NULL, no primary key is defined, but the unique key is used as the partitioning key when the partition expression is constructed: - -```sql -CREATE TABLE t1 ( - col1 INT NOT NULL, - col2 DATE NOT NULL, - col3 INT NOT NULL, - col4 INT NOT NULL, - UNIQUE KEY (col1, col2) -) -PARTITION BY KEY() -PARTITIONS 4; -``` - -!!! note - Other partition rules are the same as MySQL. - -## 3. 
About MatrixOne Partition Expressions - -​When a DDL statement constructs a partitioned table, a partition expression is generated for each partition definition. The partition expression can calculate the partition to which the data belongs. - -In the plan build phase, the partition information data structure in the DDL statement is plan.PartitionInfo: - -```sql -type PartitionInfo struct { - Type PartitionType - Expr *Expr - PartitionExpression *Expr - Columns []*Expr - PartitionColumns []string - PartitionNum uint64 - Partitions []*PartitionItem - Algorithm int64 - IsSubPartition bool - PartitionMsg string -} -``` - -`PartitionExpression` is the partition expression. Partition expressions are MatrixOne's way of converting a partition clause into an expression. Each partition expression is constructed as follows: - -### KEY Partitioning - -KEY partitioning will construct a partition expression based on the partition key and the number of partitions. The result of the partition expression is an integer greater than or equal to 0, representing the partition sequence number, which increases sequentially from zero. - -SQL example is as below: - -```sql -CREATE TABLE t1 ( - col1 INT NOT NULL, - col2 DATE NOT NULL, - col3 INT NOT NULL, - col4 INT NOT NULL, - PRIMARY KEY (col1, col2) -) -PARTITION BY KEY(col1) -PARTITIONS 4; -``` - -### HASH Partitioning - -HASH partitioning will construct a partition expression based on the partition function and the number of partitions. The result of the partition expression is an integer greater than or equal to 0, representing the partition sequence number, which increases sequentially from zero. 
- -SQL example is as below: - -```sql -CREATE TABLE t1 ( - col1 INT, - col2 CHAR(5), - col3 DATE -) -PARTITION BY LINEAR HASH( YEAR(col3)) -PARTITIONS 6; -``` - -### RANGE Partitioning - -RANGE partition will construct a partition expression according to the partition function, the number of partitions, and the definition of the partitioning item. The result of the partition expression is an integer representing the partition number. The standard partition number starts from zero and increases sequentially. The calculation result is -1, indicating that the current data does not belong to any defined partition. According to MySQL syntax, the executor needs to report an error: `Table has no partition for value xxx`. - -SQL example is as below: - -```sql -CREATE TABLE employees ( - id INT NOT NULL, - fname VARCHAR(30), - lname VARCHAR(30), - hired DATE NOT NULL DEFAULT '1970-01-01', - separated DATE NOT NULL DEFAULT '9999-12-31', - job_code INT NOT NULL, - store_id INT NOT NULL -) -PARTITION BY RANGE (store_id) ( - PARTITION p0 VALUES LESS THAN (6), - PARTITION p1 VALUES LESS THAN (11), - PARTITION p2 VALUES LESS THAN (16), - PARTITION p3 VALUES LESS THAN MAXVALUE -); -``` - -### RANGE COLUMNS partitioning - -RANGE partition will construct a partition expression according to the key columns, the number of partitions, and the definition of the partitioning item. The result of the partition expression is an integer representing the partition number. The standard partition number starts from zero and increases sequentially. The calculation result is -1, indicating that the current data does not belong to any defined partition. According to MySQL syntax, the executor needs to report an error: `Table has no partition for value xxx`. 
- -SQL example is as below: - -```sql -CREATE TABLE rc ( - a INT NOT NULL, - b INT NOT NULL -) -PARTITION BY RANGE COLUMNS(a,b) ( - PARTITION p0 VALUES LESS THAN (10,5) COMMENT = 'Data for LESS THAN (10,5)', - PARTITION p1 VALUES LESS THAN (20,10) COMMENT = 'Data for LESS THAN (20,10)', - PARTITION p2 VALUES LESS THAN (50,MAXVALUE) COMMENT = 'Data for LESS THAN (50,MAXVALUE)', - PARTITION p3 VALUES LESS THAN (65,MAXVALUE) COMMENT = 'Data for LESS THAN (65,MAXVALUE)', - PARTITION p4 VALUES LESS THAN (MAXVALUE,MAXVALUE) COMMENT = 'Data for LESS THAN (MAXVALUE,MAXVALUE)' -); -``` - -### LIST Partitioning - -LIST partition will construct a partition expression according to the partition key, the number of partitions, and the definition of the partitioning item. The result of the partition expression is an integer representing the partition number. The standard partition number starts from zero and increases sequentially. The calculation result is -1, indicating that the current data does not belong to any defined partition. According to MySQL syntax, the executor needs to report an error: `Table has no partition for value xxx`. - -SQL example is as below: - -```sql -CREATE TABLE client_firms ( - id INT, - name VARCHAR(35) -) -PARTITION BY LIST (id) ( - PARTITION r0 VALUES IN (1, 5, 9, 13, 17, 21), - PARTITION r1 VALUES IN (2, 6, 10, 14, 18, 22), - PARTITION r2 VALUES IN (3, 7, 11, 15, 19, 23), - PARTITION r3 VALUES IN (4, 8, 12, 16, 20, 24) -); -``` - -### LIST COLUMNS partitioning - -LIST partition will construct a partition expression according to the list of partitioning keys, the number of partitions, and the definition of the partitioning item. The result of the partition expression is an integer representing the partition number. The standard partition number starts from zero and increases sequentially. The calculation result is -1, indicating that the current data does not belong to any defined partition. 
According to MySQL syntax, the executor needs to report an error: `Table has no partition for value xxx`. - -SQL example is as below: - -```sql -CREATE TABLE lc ( - a INT NULL, - b INT NULL -) -PARTITION BY LIST COLUMNS(a,b) ( - PARTITION p0 VALUES IN( (0,0), (NULL,NULL) ), - PARTITION p1 VALUES IN( (0,1), (0,2) ), - PARTITION p2 VALUES IN( (1,0), (2,0) ) -); -``` diff --git a/docs/MatrixOne/Reference/Operators/operators/bit-functions-and-operators/bit-functions-and-operators-overview.md b/docs/MatrixOne/Reference/Operators/operators/bit-functions-and-operators/bit-functions-and-operators-overview.md index 1459d7031..db7f812f3 100644 --- a/docs/MatrixOne/Reference/Operators/operators/bit-functions-and-operators/bit-functions-and-operators-overview.md +++ b/docs/MatrixOne/Reference/Operators/operators/bit-functions-and-operators/bit-functions-and-operators-overview.md @@ -6,7 +6,7 @@ | [>>](right-shift.md) | Right shift | | [<<](left-shift.md) |Left shift| | [^](bitwise-xor.md) |Bitwise XOR| -| [|](bitwise-or.md) |Bitwise OR| +| [\|](bitwise-or.md) |Bitwise OR| | [~](bitwise-inversion.md) |Bitwise inversion| Bit functions and operators required BIGINT (64-bit integer) arguments and returned BIGINT values, so they had a maximum range of 64 bits. Non-BIGINT arguments were converted to BIGINT prior to performing the operation and truncation could occur. 
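+
+For illustration, here is a short, hypothetical session assuming the 64-bit BIGINT evaluation rule described above (column formatting may differ by MatrixOne version):
+
+```sql
+mysql> SELECT 29 & 15, 29 | 15, 1 << 2, 16 >> 2;
++---------+---------+--------+---------+
+| 29 & 15 | 29 | 15 | 1 << 2 | 16 >> 2 |
++---------+---------+--------+---------+
+|      13 |      31 |      4 |       4 |
++---------+---------+--------+---------+
+1 row in set (0.00 sec)
+```
+
+Here `29` is `11101` in binary and `15` is `01111`, so the AND yields `01101` (13) and the OR yields `11111` (31); the shifts multiply and divide by powers of two.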
diff --git a/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/cast-functions-and-operators-overview.md b/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/cast-functions-and-operators-overview.md index 9c4f38035..e0abc4cc4 100644 --- a/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/cast-functions-and-operators-overview.md +++ b/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/cast-functions-and-operators-overview.md @@ -5,3 +5,5 @@ | [BINARY()](binary.md) | convert a value to a binary string | | [CAST()](cast.md) | Cast a value as a certain type | | [CONVERT()](convert.md) | Cast a value as a certain type | +| [SERIAL()](serial.md) | Serialize concatenation strings without handling NULL values | +| [SERIAL_FULL()](serial_full.md) | Serialize concatenation strings and handle NULL values | diff --git a/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/serial.md b/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/serial.md new file mode 100644 index 000000000..a3db71c6c --- /dev/null +++ b/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/serial.md @@ -0,0 +1,45 @@ +# SERIAL() + +## Function Description + +The `SERIAL()` function is used to serialize a concatenation string, combining single or multiple columns/values into a binary format with a return type of `VARCHAR`. It is similar to [`CONCAT()`](../../../Functions-and-Operators/String/concat.md), but type information for values cannot be captured in `CONCAT()`. Typically used with the [`SERIAL_EXTRACT()`](../../../Functions-and-Operators/Other/serial_extract.md) function. + +Returns NULL if any of the parameters in `SERIAL()` is NULL. To handle NULL values, use [`SERIAL_FULL()`](serial_full.md). 
+ +## Function syntax + +``` +> SERIAL(para) +``` + +## Parameter interpretation + +| Parameters | Description | +| ---- | ---- | +| para | Column/Value to Serialize| + +## Examples + +```sql +create table t1(a varchar(3), b int); +insert into t1 values("ABC",1); +insert into t1 values("DEF",NULL); + +mysql> select serial(a,b) from t1;--The query returns the serialized result of the combination of columns a and b. The output is NULL when there is a NULL value. ++--------------+ +| serial(a, b) | ++--------------+ +| FABC : | +| NULL | ++--------------+ +2 rows in set (0.00 sec) + +mysql> select serial(a,'hello') from t1;--The query returns the result of serializing the combination of column a and the value hello. ++------------------+ +| serial(a, hello) | ++------------------+ +| FABC Fhello | +| FDEF Fhello | ++------------------+ +2 rows in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/serial_full.md b/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/serial_full.md new file mode 100644 index 000000000..b8d14b4ab --- /dev/null +++ b/docs/MatrixOne/Reference/Operators/operators/cast-functions-and-operators/serial_full.md @@ -0,0 +1,43 @@ +# SERIAL_FULL() + +## Function Description + +`SERIAL_FULL()` is used to serialize concatenation strings and convert single or multiple column/value combinations into binary format with a return type of `VARCHAR`, generally used with the [`SERIAL_EXTRACT()`](../../../Functions-and-Operators/Other/serial_extract.md) function. `SERIAL_FULL()` is similar to [`SERIAL()`](serial.md), but `SERIAL_FULL()` retains a NULL value. 
+
+## Function syntax
+
+```
+> SERIAL_FULL(para)
+```
+
+## Parameter interpretation
+
+| Parameters | Description |
+| ---- | ---- |
+| para | Column/Value to Serialize|
+
+## Examples
+
+```sql
+create table t1(a varchar(3), b int);
+insert into t1 values("ABC",1);
+insert into t1 values("DEF",NULL);
+
+mysql> select serial_full(a,b) from t1;--The query returns the serialized result of the combination of columns a and b. NULL values are preserved.
++-------------------+
+| serial_full(a, b) |
++-------------------+
+| FABC :  |
+| FDEF |
++-------------------+
+2 rows in set (0.00 sec)
+
+mysql> select serial_full(1.2,'world') ;--The query returns the serialized result of the combination of the value 1.2 and the string 'world'.
++-------------------------+
+| serial_full(1.2, world) |
++-------------------------+
+| D?
+ Fworld |
++-------------------------+
+1 row in set (0.01 sec)
+```
\ No newline at end of file
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Control-Language/1.0-alter-account.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Control-Language/1.0-alter-account.md
deleted file mode 100644
index 6967c2cd8..000000000
--- a/docs/MatrixOne/Reference/SQL-Reference/Data-Control-Language/1.0-alter-account.md
+++ /dev/null
@@ -1,113 +0,0 @@
-# **ALTER ACCOUNT**
-
-## **Description**
-
-Modify account information.
-
-!!! note
-    1. The cluster administrator (i.e., the root user) can modify the password of the account it creates.
-    2. Accounts themselves can modify their own passwords.
-    2. Only the cluster administrator (i.e., the root user) can perform **SUSPEND** and **RECOVER (OPEN)** account operations. 
- -## **Syntax** - -``` -> ALTER ACCOUNT [IF EXISTS] -account auth_option [COMMENT 'comment_string'] - -auth_option: { -ADMIN_NAME [=] 'admin_name' -IDENTIFIED BY 'auth_string' -} - -status_option: { -OPEN -| SUSPEND -| RESTRICTED -} -``` - -### Explanations - -### auth_option - -Modifies the account's default account name and authorization mode, `auth_string` specifies the password explicitly. - -### status_option - -Set the state of the account. They are stored as VARCHAR in the mo_account table under the system database mo_catalog. - -- SUSPEND: Suspend the account's service; the account can no longer access MatrixOne after the suspension. - * When the `SUSPEND` state is enabled for the account, access behavior will be suspended even if the account is accessing it. - * To unsuspend the user's service, switch the status to `OPEN` to unsuspend the service; that is, run `ALTER ACCOUNT account_name OPEN` to unsuspend the service. - -- OPEN: Resume a suspended account, after which the account will usually access MatrixOne. -- RESTRICTED: Allows the user to access and perform limited actions. After the `RESTRICTED` state is enabled for this tenant, this tenant can only perform `SHOW`/`DELETE`/`SELECT`/`USE` operations on the database, and other operations cannot be used. - * When the `RESTRICTED` state is enabled for the tenant, access behavior will be restricted even if the account is accessing it. - * To lift the restrictions on the user, switch the status to `OPEN` to remove the restrictions. - -### comment - -Account notes are stored as VARCHAR in the table *mo_account* in the system database *mo_catalog*. - -`COMMENT` can be arbitrary quoted text, and the new `COMMENT` replaces any existing user comments. 
As follows: - -```sql -mysql> desc mo_catalog.mo_account; -+----------------+--------------+------+------+---------+-------+---------+ -| Field | Type | Null | Key | Default | Extra | Comment | -+----------------+--------------+------+------+---------+-------+---------+ -| account_id | INT | YES | | NULL | | | -| account_name | VARCHAR(300) | YES | | NULL | | | -| status | VARCHAR(300) | YES | | NULL | | | -| created_time | TIMESTAMP | YES | | NULL | | | -| comments | VARCHAR(256) | YES | | NULL | | | -| suspended_time | TIMESTAMP | YES | | null | | | -+----------------+--------------+------+------+---------+-------+---------+ -6 rows in set (0.06 sec) -``` - -## **Examples** - -- Example 1: Modify the information on the account - -```sql --- Create a account named "root1" with password "111" -mysql> create account acc1 admin_name "root1" identified by "111"; -Query OK, 0 rows affected (0.42 sec) --- Change the initial password "111" to "Abcd_1234@1234" -mysql> alter account acc1 admin_name "root1" identified by "Abcd_1234@1234"; -Query OK, 0 rows affected (0.01 sec) --- Modify the comment for account "root1" -mysql> alter account acc1 comment "new account"; -Query OK, 0 rows affected (0.02 sec) --- Check to verify that the "new account" comment has been added to the account "root1" -mysql> show accounts; -+--------------+------------+---------------------+--------+----------------+----------+-------------+-----------+-------+----------------+ -| account_name | admin_name | created | status | suspended_time | db_count | table_count | row_count | size | comment | -+--------------+------------+---------------------+--------+----------------+----------+-------------+-----------+-------+----------------+ -| acc1 | root1 | 2023-02-15 06:26:51 | open | NULL | 5 | 34 | 787 | 0.036 | new account | -| sys | root | 2023-02-14 06:58:15 | open | NULL | 8 | 57 | 3767 | 0.599 | system account | 
-+--------------+------------+---------------------+--------+----------------+----------+-------------+-----------+-------+----------------+
-2 rows in set (0.19 sec)
-```
-
-- Example 2: Modify the status of the account
-
-```sql
--- Create an account named "accx" with admin user "root1" and password "111"
-mysql> create account accx admin_name "root1" identified by "111";
-Query OK, 0 rows affected (0.27 sec)
--- Modify the account status to "suspend", that is, suspend user access to MatrixOne.
-mysql> alter account accx suspend;
-Query OK, 0 rows affected (0.01 sec)
--- Check whether the status modification succeeded.
-mysql> show accounts;
-+--------------+------------+---------------------+---------+---------------------+----------+-------------+-----------+-------+----------------+
-| account_name | admin_name | created             | status  | suspended_time      | db_count | table_count | row_count | size  | comment        |
-+--------------+------------+---------------------+---------+---------------------+----------+-------------+-----------+-------+----------------+
-| accx         | root1      | 2023-02-15 06:26:51 | suspend | 2023-02-15 06:27:15 |        5 |          34 |       787 | 0.036 | new account    |
-| sys          | root       | 2023-02-14 06:58:15 | open    | NULL                |        8 |          57 |      3767 | 0.599 | system account |
-+--------------+------------+---------------------+---------+---------------------+----------+-------------+-----------+-------+----------------+
-2 rows in set (0.15 sec)
-```
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-index.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-index.md
deleted file mode 100644
index 84cf22ab2..000000000
--- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-index.md
+++ /dev/null
@@ -1,146 +0,0 @@
-# **CREATE INDEX**
-
-## **Description**
-
-Creates an index on a table so that data can be queried more quickly and efficiently.
-
-An index is not visible to users; it is used only to speed up searches and queries.
-
-Updating a table with an index takes longer than updating a table without one, because the index also needs to be updated. Therefore, the ideal approach is to create indexes only on columns (and tables) that are frequently searched.
-
-There are two common types of indexes, namely:
-
-- Primary Key: the primary key index, that is, the index defined on the primary key column.
-- Secondary Index: the secondary index, that is, an index defined on non-primary-key columns.
-
-## **Syntax**
-
-```
-> CREATE [UNIQUE] INDEX index_name
-ON tbl_name (key_part,...)
-[COMMENT 'string']
-```
-
-### Explanations
-
-#### CREATE UNIQUE INDEX
-
-Creates a unique index on a table. Duplicate values are not allowed.
-
-#### CREATE INDEX
-
-Creates a secondary index on the table columns. Duplicate and NULL values are allowed in secondary index columns.
-
-## **Examples**
-
-- Example 1:
-
-```sql
-drop table if exists t1;
-create table t1(id int PRIMARY KEY,name VARCHAR(255),age int);
-insert into t1 values(1,"Abby", 24);
-insert into t1 values(2,"Bob", 25);
-insert into t1 values(3,"Carol", 23);
-insert into t1 values(4,"Dora", 29);
-create unique index idx on t1(name);
-mysql> select * from t1;
-+------+-------+------+
-| id   | name  | age  |
-+------+-------+------+
-|    1 | Abby  |   24 |
-|    2 | Bob   |   25 |
-|    3 | Carol |   23 |
-|    4 | Dora  |   29 |
-+------+-------+------+
-4 rows in set (0.00 sec)
-
-mysql> show create table t1;
-+-------+--------------------------------------------------------------------------------------------------------------------------------------------------+
-| Table | Create Table |
-+-------+--------------------------------------------------------------------------------------------------------------------------------------------------+
-| t1 | CREATE TABLE `t1` (
-`id` INT NOT NULL,
-`name` VARCHAR(255) DEFAULT NULL,
-`age` INT DEFAULT NULL,
-PRIMARY KEY (`id`),
-UNIQUE KEY `idx` (`name`)
-) |
-+-------+--------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - -create table t2 ( -col1 bigint primary key, -col2 varchar(25), -col3 float, -col4 varchar(50) -); -create unique index idx on t2(col2) comment 'create varchar index'; -insert into t2 values(1,"Abby", 24,'zbcvdf'); -insert into t2 values(2,"Bob", 25,'zbcvdf'); -insert into t2 values(3,"Carol", 23,'zbcvdf'); -insert into t2 values(4,"Dora", 29,'zbcvdf'); -mysql> select * from t2; -+------+-------+------+--------+ -| col1 | col2 | col3 | col4 | -+------+-------+------+--------+ -| 1 | Abby | 24 | zbcvdf | -| 2 | Bob | 25 | zbcvdf | -| 3 | Carol | 23 | zbcvdf | -| 4 | Dora | 29 | zbcvdf | -+------+-------+------+--------+ -4 rows in set (0.00 sec) -mysql> show create table t2; -+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| t2 | CREATE TABLE `t2` ( -`col1` BIGINT NOT NULL, -`col2` VARCHAR(25) DEFAULT NULL, -`col3` FLOAT DEFAULT NULL, -`col4` VARCHAR(50) DEFAULT NULL, -PRIMARY KEY (`col1`), -UNIQUE KEY `idx` (`col2`) COMMENT `create varchar index` -) | -+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) -``` - -- Example 2: Create secondary index and query the result - -```sql -CREATE TABLE Employees ( - EmployeeID INT PRIMARY KEY, - FirstName 
VARCHAR(50),
-    LastName VARCHAR(50),
-    Department VARCHAR(50),
-    Salary INT
-);
-
-INSERT INTO Employees (EmployeeID, FirstName, LastName, Department, Salary)
-VALUES (1, 'John', 'Doe', 'HR', 50000);
-
-INSERT INTO Employees (EmployeeID, FirstName, LastName, Department, Salary)
-VALUES (2, 'Jane', 'Smith', 'IT', 60000);
-
-INSERT INTO Employees (EmployeeID, FirstName, LastName, Department, Salary)
-VALUES (3, 'Mark', 'Johnson', 'IT', 55000);
-
-INSERT INTO Employees (EmployeeID, FirstName, LastName, Department, Salary)
-VALUES (4, 'Mary', 'Brown', 'Sales', 48000);
-
-mysql> CREATE INDEX DepartmentIndex ON Employees (Department);
-Query OK, 0 rows affected (0.01 sec)
-
-mysql> SELECT * FROM Employees WHERE Department = 'IT';
-+------------+-----------+----------+------------+--------+
-| employeeid | firstname | lastname | department | salary |
-+------------+-----------+----------+------------+--------+
-|          2 | Jane      | Smith    | IT         |  60000 |
-|          3 | Mark      | Johnson  | IT         |  55000 |
-+------------+-----------+----------+------------+--------+
-2 rows in set (0.00 sec)
-```
-
-## Constraints
-
-Currently, a secondary index does not provide query acceleration.
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-replace-view.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-replace-view.md
deleted file mode 100644
index dafde32c0..000000000
--- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-replace-view.md
+++ /dev/null
@@ -1,73 +0,0 @@
-# **CREATE OR REPLACE VIEW**
-
-## **Description**
-
-`CREATE OR REPLACE VIEW` creates a new view or, if a view with the same name already exists, replaces it. In other words, the view's definition is updated in place without first dropping the existing view.
-
-## **Syntax**
-
-```
-> CREATE OR REPLACE VIEW view_name AS
-SELECT column1, column2, ...
-FROM table_name
-WHERE condition;
-```
-
-### Explanations
-
-- `view_name`: The name of the view to be created or replaced. You need to specify a unique name for the view.
-
-- `AS`: Indicates that the query statement that follows is the query definition of the view.
-
-- `SELECT column1, column2, ...`: After the `AS` keyword, you specify the query definition of the view. This `SELECT` statement can select specific columns from a table and use computed fields, expressions, and more. The view uses the result of this query as its data.
-
-- `FROM table_name`: The `FROM` clause specifies the table to be queried. You can select one or more tables and perform related operations on them in the view.
-
-- `WHERE condition`: An optional `WHERE` clause used to filter the query results.
-
-## **Examples**
-
-```sql
--- Create a table t1 with two columns, a and b
-create table t1 (a int, b int);
-
--- Insert three rows of data into table t1
-insert into t1 values (1, 11), (2, 22), (3, 33);
-
--- Create a view v1 that includes all data from table t1
-create view v1 as select * from t1;
-
--- Query all data from view v1
-mysql> select * from v1;
-+------+------+
-| a    | b    |
-+------+------+
-|    1 |   11 |
-|    2 |   22 |
-|    3 |   33 |
-+------+------+
-3 rows in set (0.01 sec)
-
--- Query data from view v1 where column a is greater than 1
-mysql> select * from v1 where a > 1;
-+------+------+
-| a    | b    |
-+------+------+
-|    2 |   22 |
-|    3 |   33 |
-+------+------+
-2 rows in set (0.00 sec)
-
--- Replace view v1 with a new view that only includes data from table t1 where column a is greater than 1
-create or replace view v1 as select * from t1 where a > 1;
-
--- Query view v1 again, now containing data that meets the new condition
-mysql> select * from v1;
-+------+------+
-| a    | b    |
-+------+------+
-|    2 |   22 |
-|    3 |   33 |
-+------+------+
-2 rows in set (0.00 sec)
-```
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-stage.md
b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-stage.md deleted file mode 100644 index 99af1f6e5..000000000 --- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-stage.md +++ /dev/null @@ -1,86 +0,0 @@ -# **CREATE STAGE** - -## **Description** - -The `CREATE STAGE` command is used in the MatrixOne database to create a named internal or external data stage for **data export**. By creating a data stage and exporting data to it, you can download data files to your local system or store them in cloud storage services. - -- **Internal Stage**: Internal stages store data files within the MatrixOne system. Internal stages can be either permanent or temporary. - -- **External Stage**: External stages reference data files stored outside the MatrixOne environment. Currently, the following cloud storage services are supported: - - - Amazon S3 buckets - - Aliyun buckets - -The storage location can be private/protected or public—however, data held in archival cloud storage classes that require restoration before retrieval cannot be accessed. - -An internal or external stage can include a directory table. Directory tables maintain a catalog of staged file directories in cloud storage. - -- Configure a specified path to control the write permissions for user `SELECT INTO` operations. After creation, users can only write to the set `STAGE` path. - -- If no `STAGE` is created or all `STAGE` instances are `DISABLED`, users can write to any path permitted by the operating system or object storage permissions. - -- If not using a `STAGE`, users must forcefully include `credential` information during `SELECT INTO` operations. - -!!! note - 1. Cluster administrators (i.e., root users) and tenant administrators can create data stages. - 2. Once created, data tables can only be imported to the paths specified in the STAGE. 
-
-## **Syntax**
-
-```
-> CREATE STAGE [ IF NOT EXISTS ] { stage_name }
-   { StageParams }
-   [ directoryTableParams ]
-   [ COMMENT = '' ]
-
-StageParams (for Amazon S3) :
-URL =  "endpoint"='' CREDENTIALS = {"access_key_id"='', "secret_access_key"='', "filepath"='', "region"=''}
-
-StageParams (for Aliyun OSS) :
-URL =  "endpoint"='' CREDENTIALS = {"access_key_id"='', "secret_access_key"=''}
-
-StageParams (for File System) :
-URL= 'filepath'
-
-directoryTableParams :
-ENABLE = { TRUE | FALSE }
-```
-
-### Explanations
-
-- `IF NOT EXISTS`: An optional parameter that checks whether a stage with the same name already exists when creating the stage, avoiding duplicate creation.
-
-- `stage_name`: The name of the stage to be created.
-
-- `StageParams`: This parameter group specifies the stage's configuration parameters.
-
-    - `endpoint`: The connection URL for the stage, indicating the location of the object storage service. The content of this URL may vary for object storage services such as Amazon S3, Aliyun OSS, or a file system; for example, `s3.us-west-2.amazonaws.com`.
-
-    - `CREDENTIALS`: This JSON object contains the credentials required to connect to the object storage service.
-
-        + `access_key_id`: The access key ID used for authentication.
-        + `secret_access_key`: The key associated with the access key ID, used for authentication.
-        + `"filepath"=''`: Specifies the file path or directory in the S3 storage.
-        + `"region"=''`: Specifies the AWS region where the Amazon S3 storage is located.
-
-- `directoryTableParams`: This parameter group specifies the configuration of a directory table associated with the stage.
-
-    - `ENABLE`: Indicates whether the directory table is enabled, with values `TRUE` or `FALSE`.
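-
-Putting these parameters together, an external stage backed by an Amazon S3 bucket might be declared as in the following sketch. This is illustrative only: the endpoint, keys, bucket path, and region are placeholder values, not real resources.
-
-```
--- Illustrative sketch: all credential and location values below are placeholders.
-CREATE STAGE s3_stage
-URL = "endpoint"='s3.us-west-2.amazonaws.com'
-CREDENTIALS = {"access_key_id"='<your-access-key-id>', "secret_access_key"='<your-secret-access-key>', "filepath"='mybucket/export', "region"='us-west-2'}
-ENABLE = TRUE;
-```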
- -## **Examples** - -```sql -CREATE TABLE `user` (`id` int(11) ,`user_name` varchar(255) ,`sex` varchar(255)); -INSERT INTO user(id,user_name,sex) values('1', 'weder', 'man'), ('2', 'tom', 'man'), ('3', 'wederTom', 'man'); - --- Create internal data stage -mysql> CREATE STAGE stage1 URL='/tmp' ENABLE = TRUE; - --- Export data from the table to data stage -mysql> SELECT * FROM user INTO OUTFILE 'stage1:/user.csv'; --- You can see your exported table in your local directory - --- After setting the data stage, the data table can only be exported to the specified path, and an error will be reported when exporting to other paths -mysql> SELECT * FROM user INTO OUTFILE '~/tmp/csv2/user.txt'; -ERROR 20101 (HY000): internal error: stage exists, please try to check and use a stage instead -``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-table.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-table.md deleted file mode 100644 index 11194df61..000000000 --- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-create-table.md +++ /dev/null @@ -1,835 +0,0 @@ -# **CREATE TABLE** - -## **Description** - -Create a new table. - -## **Syntax** - -``` -> CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name - (create_definition,...) - [table_options] - [partition_options] - -create_definition: { - col_name column_definition - | [CONSTRAINT [symbol]] PRIMARY KEY - [index_type] (key_part,...) - [index_option] ... - | [CONSTRAINT [symbol]] FOREIGN KEY - [index_name] (col_name,...) - reference_definition -} - -column_definition: { - data_type [NOT NULL | NULL] [DEFAULT {literal | (expr)} ] - [AUTO_INCREMENT] [UNIQUE [KEY]] [[PRIMARY] KEY] - [COMMENT 'string'] - [reference_definition] - | data_type - [[PRIMARY] KEY] - [COMMENT 'string'] - [reference_definition] -} - -reference_definition: - REFERENCES tbl_name (key_part,...) 
-    [ON DELETE reference_option]
-    [ON UPDATE reference_option]
-
-reference_option:
-    RESTRICT | CASCADE | SET NULL | NO ACTION
-
-table_options:
-    table_option [[,] table_option] ...
-
-table_option: {
-  | AUTO_INCREMENT [=] value
-  | COMMENT [=] 'string'
-  | START TRANSACTION
-}
-
-partition_options:
-    PARTITION BY
-    { [LINEAR] HASH(expr)
-    | [LINEAR] KEY [ALGORITHM={1 | 2}] (column_list)
-    | RANGE{(expr) | COLUMNS(column_list)}
-    | LIST{(expr) | COLUMNS(column_list)} }
-    [PARTITIONS num]
-    [(partition_definition [, partition_definition] ...)]
-
-partition_definition:
-    PARTITION partition_name
-        [VALUES
-            {LESS THAN {(expr | value_list) | MAXVALUE}
-            |
-            IN (value_list)}]
-        [COMMENT [=] 'string' ]
-```
-
-### Explanations
-
-Various parameters and options that can be used when creating a table, including table creation, column definition, constraints, options, and partitioning, are explained below:
-
-- `CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name`: This is the basic table-creation syntax. The `TEMPORARY` keyword creates a temporary table, `IF NOT EXISTS` ensures the table is created only if it does not already exist, and `tbl_name` is the name of the table to be created.
-
-- `(create_definition,...)`: This section defines the table's columns and their attributes.
-
-- `[table_options]`: This is for table-level options, where you can set parameters such as the initial auto-increment value and table comments.
-
-- `[partition_options]`: This is used for partitioned tables, defining the partitioning method and keys.
-
-The `create_definition` section defines the attributes of each column, and it can contain the following:
-
-- `col_name column_definition`: This defines the column name and its attributes, including data type, nullability, default value, etc.
-
-- `[CONSTRAINT [symbol]] PRIMARY KEY`: This defines a primary key constraint; you can set a constraint name and the columns that make up the primary key.
- -- `[CONSTRAINT [symbol]] FOREIGN KEY`: This defines a foreign key constraint and can set a constraint name, columns for the foreign key, and the referenced table. - -The `column_definition` section is used to define attributes for specific columns and can include the following: - -- `data_type [NOT NULL | NULL] [DEFAULT {literal | (expr)} ]`: This defines the data type of the column, whether it can be null, and its default value. - -- `[AUTO_INCREMENT] [UNIQUE [KEY]] [[PRIMARY] KEY]`: This sets options like auto-increment, uniqueness, and primary key constraint. - -- `[COMMENT 'string']`: This sets a comment for the column. - -- `[reference_definition]`: This is an optional reference definition used to define foreign key constraints. - -The `reference_definition` section is used to define references for foreign keys and includes the following: - -- `REFERENCES tbl_name (key_part,...)`: This specifies the referenced table and columns for the foreign key. - -- `[ON DELETE reference_option]`: This sets the action to be taken when a referenced row is deleted. - -- `[ON UPDATE reference_option]`: This sets the action to be taken when a referenced row is updated. - -`reference_option` represents the options for foreign key actions, including `RESTRICT`, `CASCADE`, `SET NULL`, and `NO ACTION`. - -The `table_options` section sets table-level options, including initial auto-increment value, table comments, etc. - -The `partition_options` section defines options for partitioned tables, including partitioning methods, partition keys, and the number of partitions. - -For more detailed syntax explanations, see the following content. - -#### Temporary Tables - -You can use the `TEMPORARY` keyword when creating a table. A `TEMPORARY` table is visible only within the current session, and is dropped automatically when the session is closed. 
This means that two different sessions can use the same temporary table name without conflicting with each other or with an existing non-TEMPORARY table of the same name. (The existing table is hidden until the temporary table is dropped.)
-
-Dropping a database does not automatically drop any `TEMPORARY` tables created within that database.
-
-The creating session can perform any operation on the table, such as `DROP TABLE`, `INSERT`, `UPDATE`, or `SELECT`.
-
-#### COMMENT
-
-A comment for a column or a table can be specified with the `COMMENT` option.
-
-- Up to 1024 characters long. The comment is displayed by the `SHOW CREATE TABLE` and `SHOW FULL COLUMNS` statements. It is also shown in the `COLUMN_COMMENT` column of the `INFORMATION_SCHEMA.COLUMNS` table.
-
-#### AUTO_INCREMENT
-
-The initial `AUTO_INCREMENT` value for the table.
-
-An integer column can have the additional attribute `AUTO_INCREMENT`. When you insert a value of NULL (recommended) or 0 into an indexed `AUTO_INCREMENT` column, the column is set to the next sequence value. Typically this is value+1, where value is the largest value for the column currently in the table. `AUTO_INCREMENT` sequences begin with 1 by default.
-
-There can be only one `AUTO_INCREMENT` column per table; it must be indexed and cannot have a DEFAULT value. An `AUTO_INCREMENT` column works properly only if it contains only positive values. Inserting a negative number is regarded as inserting a very large positive number. This is done to avoid precision problems when numbers "wrap" over from positive to negative and also to ensure that you do not accidentally get an `AUTO_INCREMENT` column that contains 0.
-
-You can use the `AUTO_INCREMENT` attribute to define the starting value of an auto-increment column. For example, if you want the auto-increment column to start at 10, you can specify that starting value with the `AUTO_INCREMENT` keyword when creating the table.
-
-For example, to create a table and define an auto-increment column with a starting value of 10, you can use the following SQL statement:
-
-```sql
--- set up
-create table t1(a int auto_increment primary key) auto_increment = 10;
-```
-
-In this example, the `a` column is an auto-incrementing column with a starting value of 10. When a new record is inserted into the table, the value of the `a` column will start from 10 and increment by 1 each time. If the starting value of `AUTO_INCREMENT` is not specified, the default starting value is 1, and it is automatically incremented by 1 each time.
-
-!!! note
-    1. MatrixOne currently only supports the default increment step of 1; regardless of the initial value of the auto-increment column, each increment is 1. Setting the increment step is not supported for now.
-    2. MatrixOne supports only the syntax of using the system variable `set @@auto_increment_offset=n` to set the initial value of the auto-increment column, but it does not take effect.
-
-#### PRIMARY KEY
-
-The PRIMARY KEY constraint uniquely identifies each record in a table.
-
-Primary keys must contain UNIQUE values and cannot contain NULL values.
-
-A table can have only ONE primary key; this primary key can consist of one or multiple columns (fields).
-
-- **SQL PRIMARY KEY on CREATE TABLE**
-
-The following SQL creates a PRIMARY KEY on the "ID" column when the "Persons" table is created:
-
-```
-> CREATE TABLE Persons (
-    ID int NOT NULL,
-    LastName varchar(255) NOT NULL,
-    FirstName varchar(255),
-    Age int,
-    PRIMARY KEY (ID)
-);
-```
-
-#### FOREIGN KEY
-
-The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.
-
-A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table.
-
-The table with the foreign key is called the child table, and the table with the primary key is called the referenced or parent table.
-
-The FOREIGN KEY constraint prevents invalid data from being inserted into the foreign key column, because the value has to be one of the values contained in the parent table.
-
-When defining a FOREIGN KEY, the following rules need to be followed:
-
-- The parent table must already exist in the database or be the table currently being created. In the latter case, the parent table and the child table are the same table; such a table is called a self-referential table, and this structure is called self-referential integrity.
-
-- A primary key must be defined for the parent table.
-
-- Specify the column name or combination of column names after the table name of the parent table. This column or combination of columns must be the primary key or a candidate key of the parent table.
-
-- The number of columns in the foreign key must be the same as the number of columns in the primary key of the parent table.
-
-- The data type of each column in the foreign key must be the same as the data type of the corresponding column in the primary key of the parent table.
- -The following is an example to illustrate the association of parent and child tables through FOREIGN KEY and PRIMARY KEY: - -First, create a parent table with field a as the primary key: - -```sql -create table t1(a int primary key,b varchar(5)); -insert into t1 values(101,'abc'),(102,'def'); -mysql> select * from t1; -+------+------+ -| a | b | -+------+------+ -| 101 | abc | -| 102 | def | -+------+------+ -2 rows in set (0.00 sec) -``` - -Then create a child table with field c as the foreign key, associated with parent table field a: - -```sql -create table t2(a int ,b varchar(5),c int, foreign key(c) references t1(a)); -insert into t2 values(1,'zs1',101),(2,'zs2',102); -insert into t2 values(3,'xyz',null); -mysql> select * from t2; -+------+------+------+ -| a | b | c | -+------+------+------+ -| 1 | zs1 | 101 | -| 2 | zs2 | 102 | -| 3 | xyz | NULL | -+------+------+------+ -3 rows in set (0.00 sec) -``` - -In addition, `[ON DELETE reference_option]` and `[ON UPDATE reference_option]` are used when defining a foreign key relationship to specify actions to be taken when records in the parent table are deleted or updated. These two parameters are primarily used to maintain data integrity and consistency: - -- `ON DELETE reference_option`: This parameter specifies how to handle associated foreign key data when data in the referenced table is deleted. Common options include: - - + `RESTRICT`: If related foreign key data exists in the referenced table, deletion of data in the table is not allowed. This prevents accidental deletion of related data, ensuring data consistency. - - + `CASCADE`: When data in the referenced table is deleted, associated foreign key data is also deleted. This is used for cascading deletion of related data to maintain data integrity. - - + `SET NULL`: When data in the referenced table is deleted, the value of the foreign key column is set to NULL. 
This is used to retain foreign key data while disconnecting it from the referenced data upon deletion. - - + `NO ACTION`: Indicates no action is taken; it only checks for the existence of associated data. This is similar to `RESTRICT` but may have minor differences in some databases. - -- `ON UPDATE reference_option`: This parameter specifies how to handle associated foreign key data when data in the referenced table is updated. Common options are similar to those of `ON DELETE reference_option`, and their usage is identical, but they apply to data update operations. - -See the example below: - -Suppose there are two tables, `Orders` and `Customers`, where the `Orders` table has a foreign key column `customer_id` referencing the `id` column in the `Customers` table. If, when a customer is deleted from the `Customers` table, you also want to delete the associated order data, you can use `ON DELETE CASCADE`. - -```sql -CREATE TABLE Customers ( - id INT PRIMARY KEY, - name VARCHAR(50) -); - -CREATE TABLE Orders ( - id INT PRIMARY KEY, - order_number VARCHAR(10), - customer_id INT, - FOREIGN KEY (customer_id) REFERENCES Customers(id) ON DELETE CASCADE -); -``` - -In the above example, when a customer is deleted from the `Customers` table, the associated order data will also be deleted through cascading, maintaining data integrity. Similarly, the `ON UPDATE` parameter can handle update operations. - -For more information on data integrity constraints, see [Data Integrity Constraints Overview](../../../Develop/schema-design/data-integrity/overview-of-integrity-constraint-types.md). - -#### Cluster by - -`Cluster by` is a command used to optimize the physical arrangement of a table. When creating a table, the `Cluster by` command can physically sort the table based on a specified column for tables without a primary key. It will rearrange the data rows to match the order of values in that column. Using `Cluster by` improves query performance. 
-
-- The syntax for a single column is: `create table() cluster by col;`
-- The syntax for multiple columns is: `create table() cluster by (col1, col2);`
-
-__Note:__ `Cluster by` cannot coexist with a primary key, or a syntax error will occur. `Cluster by` can only be specified when creating a table and does not support dynamic creation.
-
-For more information on using `Cluster by` for performance tuning, see [Using `Cluster by` for performance tuning](../../../Performance-Tuning/optimization-concepts/through-cluster-by.md).
-
-#### Table PARTITION and PARTITIONS
-
-```
-partition_options:
-    PARTITION BY
-    { [LINEAR] HASH(expr)
-    | [LINEAR] KEY [ALGORITHM={1 | 2}] (column_list) }
-    [PARTITIONS num]
-    [(partition_definition [, partition_definition] ...)]
-
-partition_definition:
-    PARTITION partition_name
-        [VALUES
-            {LESS THAN {(expr | value_list) | MAXVALUE}
-            |
-            IN (value_list)}]
-        [COMMENT [=] 'string' ]
-```
-
-Partitions can be modified, merged, added to tables, and dropped from tables.
-
-- **PARTITION BY**
-
-If used, a partition_options clause begins with PARTITION BY. This clause contains the function that is used to determine the partition; the function returns an integer value ranging from 1 to num, where num is the number of partitions.
-
-- **HASH(expr)**
-
-Hashes one or more columns to create a key for placing and locating rows. expr is an expression using one or more table columns. For example, these are both valid CREATE TABLE statements using PARTITION BY HASH:
-
-```
-CREATE TABLE t1 (col1 INT, col2 CHAR(5))
-    PARTITION BY HASH(col1);
-
-CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATETIME)
-    PARTITION BY HASH ( YEAR(col3) );
-```
-
-- **KEY(column_list)**
-
-This is similar to `HASH`. The column_list argument is simply a list of 1 or more table columns (maximum: 16).
This example shows a simple table partitioned by key, with 4 partitions: - -``` -CREATE TABLE tk (col1 INT, col2 CHAR(5), col3 DATE) - PARTITION BY KEY(col3) - PARTITIONS 4; -``` - -For tables that are partitioned by key, you can employ linear partitioning by using the `LINEAR` keyword. This has the same effect as with tables that are partitioned by `HASH`. This example uses linear partitioning by `key` to distribute data between 5 partitions: - -``` -CREATE TABLE tk (col1 INT, col2 CHAR(5), col3 DATE) - PARTITION BY LINEAR KEY(col3) - PARTITIONS 5; -``` - -- **RANGE(expr)** - -In this case, expr shows a range of values using a set of `VALUES LESS THAN` operators. When using range partitioning, you must define at least one partition using `VALUES LESS THAN`. You cannot use `VALUES IN` with range partitioning. - -`PARTITION ... VALUES LESS THAN ...` statements work in a consecutive fashion. `VALUES LESS THAN MAXVALUE` works to specify "leftover" values that are greater than the maximum value otherwise specified. - -The clauses must be arranged in such a way that the upper limit specified in each successive `VALUES LESS THAN` is greater than that of the previous one, with the one referencing `MAXVALUE` coming last of all in the list. - -- **RANGE COLUMNS(column_list)** - -This variant on `RANGE` facilitates partition pruning for queries using range conditions on multiple columns (that is, having conditions such as `WHERE a = 1 AND b < 10 or WHERE a = 1 AND b = 10 AND c < 10)`. It enables you to specify value ranges in multiple columns by using a list of columns in the COLUMNS clause and a set of column values in each `PARTITION ... VALUES LESS THAN (value_list)` partition definition clause. (In the simplest case, this set consists of a single column.) The maximum number of columns that can be referenced in the column_list and value_list is 16. 
- -The column_list used in the `COLUMNS` clause may contain only names of columns; each column in the list must be one of the following MySQL data types: the integer types; the string types; and time or date column types. Columns using BLOB, TEXT, SET, ENUM, BIT, or spatial data types are not permitted; columns that use floating-point number types are also not permitted. You also may not use functions or arithmetic expressions in the `COLUMNS` clause. - -The `VALUES LESS THAN` clause used in a partition definition must specify a literal value for each column that appears in the COLUMNS() clause; that is, the list of values used for each `VALUES LESS THAN` clause must contain the same number of values as there are columns listed in the `COLUMNS` clause. An attempt to use more or fewer values in a `VALUES LESS THAN` clause than there are in the COLUMNS clause causes the statement to fail with the error Inconsistency in usage of column lists for partitioning.... You cannot use `NULL` for any value appearing in `VALUES LESS THAN`. It is possible to use MAXVALUE more than once for a given column other than the first, as shown in this example: - -``` -CREATE TABLE rc ( - a INT NOT NULL, - b INT NOT NULL -) -PARTITION BY RANGE COLUMNS(a,b) ( - PARTITION p0 VALUES LESS THAN (10,5), - PARTITION p1 VALUES LESS THAN (20,10), - PARTITION p2 VALUES LESS THAN (50,MAXVALUE), - PARTITION p3 VALUES LESS THAN (65,MAXVALUE), - PARTITION p4 VALUES LESS THAN (MAXVALUE,MAXVALUE) -); -``` - -Each value used in a `VALUES LESS THAN` value list must match the type of the corresponding column exactly; no conversion is made. For example, you cannot use the string '1' for a value that matches a column that uses an integer type (you must use the numeral 1 instead), nor can you use the numeral 1 for a value that matches a column that uses a string type (in such a case, you must use a quoted string: '1'). 
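-
-For comparison with `RANGE COLUMNS`, the plain `RANGE(expr)` variant described earlier can be sketched as follows; the table name and the boundary values here are invented for illustration. Each `VALUES LESS THAN` bound is greater than the previous one, and `MAXVALUE` comes last to catch all remaining values:
-
-```
--- Illustrative sketch: table and boundary values are invented.
-CREATE TABLE stores (
-    id INT NOT NULL,
-    store_id INT NOT NULL
-)
-PARTITION BY RANGE (store_id) (
-    PARTITION p0 VALUES LESS THAN (6),
-    PARTITION p1 VALUES LESS THAN (11),
-    PARTITION p2 VALUES LESS THAN (16),
-    PARTITION p3 VALUES LESS THAN MAXVALUE
-);
-```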
- -- **LIST(expr)** - -This is useful when assigning partitions based on a table column with a restricted set of possible values, such as a state or country code. In such a case, all rows pertaining to a certain state or country can be assigned to a single partition, or a partition can be reserved for a certain set of states or countries. It is similar to RANGE, except that only VALUES IN may be used to specify permissible values for each partition. - -`VALUES IN` is used with a list of values to be matched. For instance, you could create a partitioning scheme such as the following: - -``` -CREATE TABLE client_firms ( -    id INT, -    name VARCHAR(35) -) -PARTITION BY LIST (id) ( -    PARTITION r0 VALUES IN (1, 5, 9, 13, 17, 21), -    PARTITION r1 VALUES IN (2, 6, 10, 14, 18, 22), -    PARTITION r2 VALUES IN (3, 7, 11, 15, 19, 23), -    PARTITION r3 VALUES IN (4, 8, 12, 16, 20, 24) -); -``` - -When using list partitioning, you must define at least one partition using VALUES IN. You cannot use VALUES LESS THAN with PARTITION BY LIST. - -!!! note -    For tables partitioned by LIST, the value list used with VALUES IN must consist of integer values only. You can overcome this limitation using partitioning by LIST COLUMNS, which is described later in this section. - -- **LIST COLUMNS(column_list)** - -This variant on `LIST` facilitates partition pruning for queries using comparison conditions on multiple columns (that is, having conditions such as `WHERE a = 5 AND b = 5` or `WHERE a = 1 AND b = 10 AND c = 5`). It enables you to specify values in multiple columns by using a list of columns in the COLUMNS clause and a set of column values in each `PARTITION ... VALUES IN (value_list)` partition definition clause. 
- -The rules governing data types for the column list used in `LIST COLUMNS(column_list)` and the value list used in `VALUES IN(value_list)` are the same as those for the column list used in `RANGE COLUMNS(column_list)` and the value list used in `VALUES LESS THAN(value_list)`, respectively, except that in the `VALUES IN` clause, `MAXVALUE` is not permitted, and you may use `NULL`. - -There is one important difference between the list of values used for VALUES IN with `PARTITION BY LIST COLUMNS` as opposed to when it is used with `PARTITION BY LIST`. When used with `PARTITION BY LIST COLUMNS`, each element in the `VALUES IN` clause must be a set of column values; the number of values in each set must be the same as the number of columns used in the `COLUMNS` clause, and the data types of these values must match those of the columns (and occur in the same order). In the simplest case, the set consists of a single column. The maximum number of columns that can be used in the column_list and in the elements making up the value_list is 16. - -The table defined by the following `CREATE TABLE` statement provides an example of a table using `LIST COLUMNS` partitioning: - -``` -CREATE TABLE lc ( -    a INT NULL, -    b INT NULL -) -PARTITION BY LIST COLUMNS(a,b) ( -    PARTITION p0 VALUES IN( (0,0), (NULL,NULL) ), -    PARTITION p1 VALUES IN( (0,1), (0,2), (0,3), (1,1), (1,2) ), -    PARTITION p2 VALUES IN( (1,0), (2,0), (2,1), (3,0), (3,1) ), -    PARTITION p3 VALUES IN( (1,3), (2,2), (2,3), (3,2), (3,3) ) -); -``` - -- **PARTITIONS num** - -The number of partitions may optionally be specified with a PARTITIONS num clause, where num is the number of partitions. If both this clause and any PARTITION clauses are used, num must equal the total number of partitions declared using PARTITION clauses. 
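As a sketch of the `PARTITIONS num` rule above (with illustrative names, not taken from this document), a statement that combines `PARTITIONS num` with explicit `PARTITION` clauses must declare exactly `num` partitions:

```
CREATE TABLE tn (a INT NOT NULL, b INT NOT NULL)
PARTITION BY RANGE COLUMNS(a,b)
PARTITIONS 2 (
    PARTITION p0 VALUES LESS THAN (10,5),
    PARTITION p1 VALUES LESS THAN (MAXVALUE,MAXVALUE)
);
```

Writing `PARTITIONS 3` here while declaring only two `PARTITION` clauses would cause the statement to fail.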
- -## **Examples** - -- Example 1: Create a common table - -```sql -CREATE TABLE test(a int, b varchar(10)); -INSERT INTO test values(123, 'abc'); - -mysql> SELECT * FROM test; -+------+---------+ -| a | b | -+------+---------+ -| 123 | abc | -+------+---------+ -``` - -- Example 2: Add comments when creating a table - -```sql -create table t2 (a int, b int) comment = "fact table"; - -mysql> show create table t2; -+-------+---------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+---------------------------------------------------------------------------------------+ -| t2 | CREATE TABLE `t2` ( -`a` INT DEFAULT NULL, -`b` INT DEFAULT NULL -) COMMENT='fact table', | -+-------+---------------------------------------------------------------------------------------+ -``` - -- Example 3: Add comments to columns when creating tables - -```sql -create table t3 (a int comment 'Column comment', b int) comment = "table"; - -mysql> SHOW CREATE TABLE t3; -+-------+----------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+----------------------------------------------------------------------------------------------------------+ -| t3 | CREATE TABLE `t3` ( -`a` INT DEFAULT NULL COMMENT 'Column comment', -`b` INT DEFAULT NULL -) COMMENT='table', | -+-------+----------------------------------------------------------------------------------------------------------+ -``` - -- Example 4: Create a common partitioned table - -```sql -CREATE TABLE tp1 (col1 INT, col2 CHAR(5), col3 DATE) PARTITION BY KEY(col3) PARTITIONS 4; - -mysql> SHOW CREATE TABLE tp1; -+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | 
-+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp1 | CREATE TABLE `tp1` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL, -`col3` DATE DEFAULT NULL -) partition by key algorithm = 2 (col3) partitions 4 | -+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.00 sec) - --- do not specify the number of partitions -CREATE TABLE tp2 (col1 INT, col2 CHAR(5), col3 DATE) PARTITION BY KEY(col3); - -mysql> SHOW CREATE TABLE tp2; -+-------+---------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+---------------------------------------------------------------------------------------------------------------------------------------------+ -| tp2 | CREATE TABLE `tp2` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL, -`col3` DATE DEFAULT NULL -) partition by key algorithm = 2 (col3) | -+-------+---------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.00 sec) - --- Specify partition algorithm -CREATE TABLE tp3 -( - col1 INT, - col2 CHAR(5), - col3 DATE -) PARTITION BY KEY ALGORITHM = 1 (col3); - - -mysql> show create table tp3; -+-------+---------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+---------------------------------------------------------------------------------------------------------------------------------------------+ -| tp3 | CREATE TABLE `tp3` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL, -`col3` DATE DEFAULT NULL -) partition by key algorithm = 1 
(col3) | -+-------+---------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.00 sec) - --- Specify partition algorithm and the number of partitions -CREATE TABLE tp4 (col1 INT, col2 CHAR(5), col3 DATE) PARTITION BY LINEAR KEY ALGORITHM = 1 (col3) PARTITIONS 5; - -mysql> SHOW CREATE TABLE tp4; -+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp4 | CREATE TABLE `tp4` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL, -`col3` DATE DEFAULT NULL -) partition by linear key algorithm = 1 (col3) partitions 5 | -+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - --- Multi-column partition -CREATE TABLE tp5 -( - col1 INT, - col2 CHAR(5), - col3 DATE -) PARTITION BY KEY(col1, col2) PARTITIONS 4; - -mysql> SHOW CREATE TABLE tp5; -+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp5 | CREATE TABLE `tp5` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL, -`col3` DATE DEFAULT NULL -) partition by key algorithm = 2 (col1, col2) partitions 4 | 
-+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - --- Create a primary key column partition -CREATE TABLE tp6 -( - col1 INT NOT NULL PRIMARY KEY, - col2 DATE NOT NULL, - col3 INT NOT NULL, - col4 INT NOT NULL -) PARTITION BY KEY(col1) PARTITIONS 4; - -mysql> SHOW CREATE TABLE tp6; -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp6 | CREATE TABLE `tp6` ( -`col1` INT NOT NULL, -`col2` DATE NOT NULL, -`col3` INT NOT NULL, -`col4` INT NOT NULL, -PRIMARY KEY (`col1`) -) partition by key algorithm = 2 (col1) partitions 4 | -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - --- Create HASH partition -CREATE TABLE tp7 -( - col1 INT, - col2 CHAR(5) -) PARTITION BY HASH(col1); - -mysql> SHOW CREATE TABLE tp7; -+-------+------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+------------------------------------------------------------------------------------------------------+ -| tp7 | CREATE TABLE `tp7` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL -) partition by hash (col1) | -+-------+------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - --- Specifies the number of HASH partitions when creating hash partition -CREATE TABLE tp8 -( 
- col1 INT, - col2 CHAR(5) -) PARTITION BY HASH(col1) PARTITIONS 4; - -mysql> SHOW CREATE TABLE tp8; -+-------+-------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+-------------------------------------------------------------------------------------------------------------------+ -| tp8 | CREATE TABLE `tp8` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL -) partition by hash (col1) partitions 4 | -+-------+-------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.00 sec) - --- specify the partition granularity when creating a partition -CREATE TABLE tp9 -( - col1 INT, - col2 CHAR(5), - col3 DATETIME -) PARTITION BY HASH (YEAR(col3)); - -mysql> SHOW CREATE TABLE tp9; -+-------+------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+------------------------------------------------------------------------------------------------------------------------------------------+ -| tp9 | CREATE TABLE `tp9` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL, -`col3` DATETIME DEFAULT NULL -) partition by hash (year(col3)) | -+-------+------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.00 sec) - --- specify the partition granularity and number of partitions when creating a partition -CREATE TABLE tp10 -( - col1 INT, - col2 CHAR(5), - col3 DATE -) PARTITION BY LINEAR HASH( YEAR(col3)) PARTITIONS 6; - -mysql> SHOW CREATE TABLE tp10; -+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | 
-+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp10 | CREATE TABLE `tp10` ( -`col1` INT DEFAULT NULL, -`col2` CHAR(5) DEFAULT NULL, -`col3` DATE DEFAULT NULL -) partition by linear hash (year(col3)) partitions 6 | -+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.00 sec) - --- Use the primary key column as the HASH partition when creating a partition -CREATE TABLE tp12 (col1 INT NOT NULL PRIMARY KEY, col2 DATE NOT NULL, col3 INT NOT NULL, col4 INT NOT NULL) PARTITION BY HASH(col1) PARTITIONS 4; - -mysql> SHOW CREATE TABLE tp12; -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp12 | CREATE TABLE `tp12` ( -`col1` INT NOT NULL, -`col2` DATE NOT NULL, -`col3` INT NOT NULL, -`col4` INT NOT NULL, -PRIMARY KEY (`col1`) -) partition by hash (col1) partitions 4 | -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - --- Create a RANGE partition and divide the partition range -CREATE TABLE tp13 (id INT NOT NULL PRIMARY KEY, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT NOT NULL, store_id INT NOT NULL) PARTITION BY RANGE (id) (PARTITION p0 VALUES LESS THAN (6), PARTITION p1 VALUES LESS THAN (11), PARTITION p2 VALUES LESS THAN (16), PARTITION p3 
VALUES LESS THAN (21)); - -mysql> SHOW CREATE TABLE tp13; -+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp13 | CREATE TABLE `tp13` ( -`id` INT NOT NULL, -`fname` VARCHAR(30) DEFAULT NULL, -`lname` VARCHAR(30) DEFAULT NULL, -`hired` DATE DEFAULT '1970-01-01', -`separated` DATE DEFAULT '9999-12-31', -`job_code` INT NOT NULL, -`store_id` INT NOT NULL, -PRIMARY KEY (`id`) -) partition by range(id) (partition p0 values less than (6), partition p1 values less than (11), partition p2 values less than (16), partition p3 values less than (21)) | -+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - -CREATE TABLE tp14 (id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL 
DEFAULT '9999-12-31', job_code INT, store_id INT) PARTITION BY RANGE ( YEAR(separated) ) ( PARTITION p0 VALUES LESS THAN (1991), PARTITION p1 VALUES LESS THAN (1996), PARTITION p2 VALUES LESS THAN (2001), PARTITION p3 VALUES LESS THAN MAXVALUE); - -mysql> SHOW CREATE TABLE tp14; -+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp14 | CREATE TABLE `tp14` ( -`id` INT NOT NULL, -`fname` VARCHAR(30) DEFAULT NULL, -`lname` VARCHAR(30) DEFAULT NULL, -`hired` DATE DEFAULT '1970-01-01', -`separated` DATE DEFAULT '9999-12-31', -`job_code` INT DEFAULT NULL, -`store_id` INT DEFAULT NULL -) partition by range(year(separated)) (partition p0 values less than (1991), partition p1 values less than (1996), partition p2 values less than (2001), partition p3 values less than (MAXVALUE)) | 
-+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.00 sec) - --- Use multiple columns as RANGE partitions and specify the range of partitions -CREATE TABLE tp15 (a INT NOT NULL, b INT NOT NULL) PARTITION BY RANGE COLUMNS(a,b) PARTITIONS 4 (PARTITION p0 VALUES LESS THAN (10,5), PARTITION p1 VALUES LESS THAN (20,10), PARTITION p2 VALUES LESS THAN (50,20), PARTITION p3 VALUES LESS THAN (65,30)); - -mysql> SHOW CREATE TABLE tp15; -+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp15 | CREATE TABLE `tp15` ( -`a` INT NOT NULL, -`b` INT NOT NULL -) partition by range columns (a, b) partitions 4 (partition p0 values less than (10, 5), partition p1 values less than (20, 10), partition p2 values less than (50, 20), partition p3 values less than (65, 30)) | -+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in 
set (0.00 sec) - --- Create LIST partition -CREATE TABLE tp16 (id INT PRIMARY KEY, name VARCHAR(35), age INT unsigned) PARTITION BY LIST (id) (PARTITION r0 VALUES IN (1, 5, 9, 13, 17, 21), PARTITION r1 VALUES IN (2, 6, 10, 14, 18, 22), PARTITION r2 VALUES IN (3, 7, 11, 15, 19, 23), PARTITION r3 VALUES IN (4, 8, 12, 16, 20, 24)); - -mysql> SHOW CREATE TABLE tp16; -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp16 | CREATE TABLE `tp16` ( -`id` INT DEFAULT NULL, -`name` VARCHAR(35) DEFAULT NULL, -`age` INT UNSIGNED DEFAULT NULL, -PRIMARY KEY (`id`) -) partition by list(id) (partition r0 values in (1, 5, 9, 13, 17, 21), partition r1 values in (2, 6, 10, 14, 18, 22), partition r2 values in (3, 7, 11, 15, 19, 23), partition r3 values in (4, 8, 12, 16, 20, 24)) | -+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - -CREATE TABLE tp17 (id INT, name VARCHAR(35), age INT unsigned) PARTITION BY LIST (id) (PARTITION r0 VALUES IN (1, 5, 9, 13, 17, 21), PARTITION r1 VALUES IN 
(2, 6, 10, 14, 18, 22), PARTITION r2 VALUES IN (3, 7, 11, 15, 19, 23), PARTITION r3 VALUES IN (4, 8, 12, 16, 20, 24)); - -mysql> SHOW CREATE TABLE tp17; -+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp17 | CREATE TABLE `tp17` ( -`id` INT DEFAULT NULL, -`name` VARCHAR(35) DEFAULT NULL, -`age` INT UNSIGNED DEFAULT NULL -) partition by list(id) (partition r0 values in (1, 5, 9, 13, 17, 21), partition r1 values in (2, 6, 10, 14, 18, 22), partition r2 values in (3, 7, 11, 15, 19, 23), partition r3 values in (4, 8, 12, 16, 20, 24)) | -+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.01 sec) - --- Use multiple columns as LIST partitions -CREATE TABLE tp18 (a INT NULL,b INT NULL) PARTITION BY LIST COLUMNS(a,b) (PARTITION p0 VALUES IN( (0,0), (NULL,NULL) ), PARTITION p1 VALUES IN( (0,1), (0,2), (0,3), (1,1), (1,2) ), PARTITION p2 VALUES IN( (1,0), (2,0), (2,1), (3,0), (3,1) ), PARTITION p3 VALUES IN( (1,3), (2,2), (2,3), (3,2), (3,3) )); - -mysql> SHOW CREATE TABLE tp18; 
-+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Table | Create Table | -+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| tp18 | CREATE TABLE `tp18` ( -`a` INT DEFAULT NULL, -`b` INT DEFAULT NULL -) partition by list columns (a, b) (partition p0 values in ((0, 0), (null, null)), partition p1 values in ((0, 1), (0, 2), (0, 3), (1, 1), (1, 2)), partition p2 values in ((1, 0), (2, 0), (2, 1), (3, 0), (3, 1)), partition p3 values in ((1, 3), (2, 2), (2, 3), (3, 2), (3, 3))) | -+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -1 row in set (0.00 sec) -``` - -- Example 5: Primary key auto increment - -```sql -drop table if exists t1; -create table t1(a bigint primary key auto_increment, - b varchar(10)); -insert into t1(b) values ('bbb'); -insert into t1 values (3, 'ccc'); -insert into t1(b) values ('bbb1111'); - -mysql> select * from t1 order by a; -+------+---------+ -| a | b | -+------+---------+ -| 1 | bbb | -| 3 | ccc | -| 4 | bbb1111 | -+------+---------+ -3 rows in set (0.01 sec) - -insert into t1 values (2, 'aaaa1111'); - -mysql> select * from t1 order by 
a; -+------+----------+ -| a | b | -+------+----------+ -| 1 | bbb | -| 2 | aaaa1111 | -| 3 | ccc | -| 4 | bbb1111 | -+------+----------+ -4 rows in set (0.00 sec) - -insert into t1(b) values ('aaaa1111'); - -mysql> select * from t1 order by a; -+------+----------+ -| a | b | -+------+----------+ -| 1 | bbb | -| 2 | aaaa1111 | -| 3 | ccc | -| 4 | bbb1111 | -| 5 | aaaa1111 | -+------+----------+ -5 rows in set (0.01 sec) - -insert into t1 values (100, 'xxxx'); -insert into t1(b) values ('xxxx'); - -mysql> select * from t1 order by a; -+------+----------+ -| a | b | -+------+----------+ -| 1 | bbb | -| 2 | aaaa1111 | -| 3 | ccc | -| 4 | bbb1111 | -| 5 | aaaa1111 | -| 100 | xxxx | -| 101 | xxxx | -+------+----------+ -7 rows in set (0.00 sec) -``` - -## **Constraints** - -1. The `ALTER TABLE table_name DROP PRIMARY KEY` statement for dropping the primary key from a table is currently not supported. -2. The `ALTER TABLE table_name AUTO_INCREMENT = n;` statement for modifying the initial value of an auto-increment column is not supported. -3. MatrixOne accepts the syntax `set @@auto_increment_increment=n` for setting the increment step and `set @@auto_increment_offset=n` for setting the initial value of an auto-increment column, but neither setting takes effect. Setting the initial value of an auto-increment column with `AUTO_INCREMENT=n` is currently supported, but the step size remains 1 by default. 
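Constraint 3 can be sketched with a hypothetical table (the table name is illustrative; this assumes the default step of 1):

```sql
create table t_ai (a bigint primary key auto_increment, b varchar(10)) auto_increment = 100;
insert into t_ai(b) values ('x');
insert into t_ai(b) values ('y');
-- The generated keys are expected to start at the declared initial value 100
-- and grow by the default step of 1: rows (100, 'x') and (101, 'y').
-- Setting @@auto_increment_increment or @@auto_increment_offset is accepted
-- syntactically but does not change this behavior.
```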
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-publication.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-publication.md index 04339fb5f..b9d423051 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-publication.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-publication.md @@ -1,26 +1,27 @@ # **ALTER PUBLICATION** -## **Description** +## **Syntax Description** -`ALTER PUBLICATION` can change the attributes of a publication. +`ALTER PUBLICATION` modifies a publication. -You must own the publication to use ALTER PUBLICATION. Adding a table to a publication additionally requires owning that table. - -## **Syntax** +## **Syntax structure** ``` -ALTER PUBLICATION pubname ACCOUNT  - { ALL - | account_name, [, ... ] - | ADD account_name, [, ... ] - | DROP account_name, [, ... ] - [ COMMENT 'string'] +ALTER PUBLICATION pubname + [ACCOUNT + { ALL + | account_name, [, ... ] + | ADD account_name, [, ... ] + | DROP account_name, [, ... ]] + [COMMENT 'string'] + [DATABASE database_name] ``` -## **Explanations** +## Syntax explanation -- pubname: The name of an existing publication whose definition is to be altered. -- account_name: The user name of the owner of the publication. +- pubname: The name of an existing publication. +- account_name: The name of the tenant that can subscribe to this publication. +- database_name: The name of the database to be published, used to change the published database. 
## **Examples** @@ -30,10 +31,8 @@ create account acc1 admin_name 'root' identified by '111'; create account acc2 admin_name 'root' identified by '111'; create database t; create publication pub3 database t account acc0,acc1; -mysql> alter publication pub3 account add accx; -show create publication pub3; -Query OK, 0 rows affected (0.00 sec) +alter publication pub3 account add accx;--modify the publication scope mysql> show create publication pub3; +-------------+-----------------------------------------------------------------------+ | Publication | Create Publication | +-------------+-----------------------------------------------------------------------+ @@ -43,10 +42,29 @@ mysql> show publications; 1 row in set (0.01 sec) mysql> show publications; -+------+----------+ -| Name | Database | -+------+----------+ -| pub3 | t | -+------+----------+ ++-------------+----------+---------------------+---------------------+----------------+----------+ +| publication | database | create_time | update_time | sub_account | comments | ++-------------+----------+---------------------+---------------------+----------------+----------+ +| pub3 | t | 2024-04-24 11:17:37 | 2024-04-24 11:17:44 | acc0,acc1,accx | | ++-------------+----------+---------------------+---------------------+----------------+----------+ +1 row in set (0.01 sec) + +alter publication pub3 comment "this is pubs";--modify the publication comment +mysql> show publications; ++-------------+----------+---------------------+---------------------+----------------+--------------+ +| publication | database | create_time | update_time | sub_account | comments | ++-------------+----------+---------------------+---------------------+----------------+--------------+ +| pub3 | t | 2024-04-24 11:17:37 | 2024-04-24 11:41:43 | acc0,acc1,accx | this is pubs | ++-------------+----------+---------------------+---------------------+----------------+--------------+ +1 row in set (0.00 sec) + +create database new_pub3; +alter publication pub3 database new_pub3;--change the published database +mysql> show publications; 
++-------------+----------+---------------------+---------------------+----------------+--------------+ +| publication | database | create_time | update_time | sub_account | comments | ++-------------+----------+---------------------+---------------------+----------------+--------------+ +| pub3 | new_pub3 | 2024-04-24 11:17:37 | 2024-04-24 11:43:36 | acc0,acc1,accx | this is pubs | ++-------------+----------+---------------------+---------------------+----------------+--------------+ 1 row in set (0.00 sec) ``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-reindex.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-reindex.md new file mode 100644 index 000000000..62062d7c9 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-reindex.md @@ -0,0 +1,74 @@ +# ALTER REINDEX + +## Syntax Description + +`ALTER TABLE ... ALTER REINDEX` is used to repartition data in a vector table. + +When the number of records in a vector table grows significantly, the original set of cluster centers may no longer be suitable. In that case, the data must be re-indexed to identify new cluster centers and repartition the dataset accordingly. + +!!! note +    Data cannot be inserted into the table while the index is being rebuilt. + +The ideal values for LISTS are: + +- If total rows < 1,000,000: lists = total rows / 1000 +- If total rows > 1,000,000: lists = sqrt(total rows); for example, a table of about 4,000,000 rows gives lists = sqrt(4,000,000) = 2000 + +## Syntax structure + +``` +> ALTER TABLE table_name ALTER REINDEX index_name ivfflat LISTS=XX +``` + +## Examples + +```sql +SET GLOBAL experimental_ivf_index = 1;--The parameter experimental_ivf_index needs to be set to 1 (default 0) to use vector indexes. 
+drop table if exists t1; +create table t1(n1 int,n2 vecf32(3)); +insert into t1 values(1,"[1,2,3]"),(2,"[2,3,4]"),(3,"[3,4,5]"); +create index idx_t1 using ivfflat on t1(n2) lists=2 op_type "vector_l2_ops"; + +mysql> show create table t1; ++-------+-------------------------------------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | ++-------+-------------------------------------------------------------------------------------------------------------------------------------------------+ +| t1 | CREATE TABLE `t1` ( +`n1` INT DEFAULT NULL, +`n2` VECF32(3) DEFAULT NULL, +KEY `idx_t1` USING ivfflat (`n2`) lists = 2 op_type 'vector_l2_ops' +) | ++-------+-------------------------------------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.01 sec) + +mysql> show index from t1; ++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-----------------------------------------+---------+------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Index_params | Visible | Expression | ++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-----------------------------------------+---------+------------+ +| t1 | 1 | idx_t1 | 1 | n2 | A | 0 | NULL | NULL | YES | ivfflat | | | {"lists":"2","op_type":"vector_l2_ops"} | YES | NULL | ++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-----------------------------------------+---------+------------+ +1 row in set (0.00 sec) + +mysql> alter table t1 alter reindex 
idx_t1 ivfflat lists=100; +Query OK, 0 rows affected (0.03 sec) + +mysql> show create table t1; ++-------+---------------------------------------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | ++-------+---------------------------------------------------------------------------------------------------------------------------------------------------+ +| t1 | CREATE TABLE `t1` ( +`n1` INT DEFAULT NULL, +`n2` VECF32(3) DEFAULT NULL, +KEY `idx_t1` USING ivfflat (`n2`) lists = 100 op_type 'vector_l2_ops' +) | ++-------+---------------------------------------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) + +mysql> show index from t1; ++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-------------------------------------------+---------+------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Index_params | Visible | Expression | ++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-------------------------------------------+---------+------------+ +| t1 | 1 | idx_t1 | 1 | n2 | A | 0 | NULL | NULL | YES | ivfflat | | | {"lists":"100","op_type":"vector_l2_ops"} | YES | NULL | ++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-------------------------------------------+---------+------------+ +1 row in set (0.01 sec) +``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-alter-sequence.md 
b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-sequence.md similarity index 60% rename from docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-alter-sequence.md rename to docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-sequence.md index a57a44472..f9434c770 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/1.1-alter-sequence.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-sequence.md @@ -1,10 +1,10 @@ # **ALTER SEQUENCE** -## **Description** +## **Grammar description** `ALTER SEQUENCE` is used to modify an existing sequence. -## **Syntax** +## **Grammar structure** ``` > ALTER SEQUENCE [ IF EXISTS ] SEQUENCE_NAME @@ -14,28 +14,28 @@ [ START [ WITH ] start ] [ [ NO ] CYCLE ] ``` -### Explanations +### Grammatical interpretation -- `[ IF EXISTS ]`: An optional clause that indicates that if the specified sequence does not exist, it will not raise an error. If this clause is used, the system checks if the sequence exists; if it does not, it will ignore the modification request. +- `[IF EXISTS]`: Optional clause indicating that no error is raised if the specified sequence does not exist. If this clause is used, the system checks whether the sequence exists, and if not, ignores the modification request. -- `SEQUENCE_NAME`: The name of the sequence to be modified. +- `SEQUENCE_NAME`: The name of the sequence to modify. -- `[ AS data_type ]`: An optional clause that allows you to specify the data type for the sequence. Typically, the data type of a sequence is an integer. +- `[AS data_type]`: Optional clause that allows you to specify a data type for a sequence. Typically, the data type of a sequence is an integer. -- `[ INCREMENT [ BY ] increment ]`: This specifies the increment value for the sequence. The increment value of the sequence is the amount to be added to the current value each time it is incremented or decremented. 
If the increment value is not specified, it is typically set to 1. +- `[INCREMENT [BY] increment]`: This is the increment value of the specified sequence. The increment value of a sequence is the amount to be added to the current value each time it is incremented or decremented. If no increment value is specified, it usually defaults to 1. -- `[ MINVALUE minvalue ]`: This is the minimum value of the sequence, specifying the minimum value allowed for the sequence. If a minimum value is set, the sequence's current value cannot go below this value. +- `[MINVALUE minvalue]`: This is the minimum value of the sequence, which specifies the minimum value allowed for the sequence. If a minimum value is specified, the current value of the sequence cannot be lower than this value. -- `[ MAXVALUE maxvalue ]`: This is the maximum value of the sequence, specifying the maximum value allowed for the sequence. If a maximum value is specified, the sequence's current value cannot exceed this value. +- `[MAXVALUE maxvalue]`: This is the maximum value of the sequence, which specifies the maximum allowed for the sequence. If a maximum value is specified, the current value of the sequence cannot exceed this value. -- `[ START [ WITH ] start ]`: This is the sequence's starting value, specifying the sequence's initial value. If the starting value is not specified, it is typically set to 1. +- `[START [WITH] start]`: This is the start value of the sequence, which specifies the initial value of the sequence. If no starting value is specified, it usually defaults to 1. -- `[ [ NO ] CYCLE ]`: An optional clause used to specify whether the sequence values should cycle. If `NO CYCLE` is specified, the sequence will stop incrementing or decrementing after reaching the maximum or minimum value. If this clause is not specified, it typically defaults to not cycling. +- `[[NO] CYCLE]`: Optional clause that specifies whether sequence values should cycle.
If `NO CYCLE` is specified, the sequence stops incrementing or decrementing when the maximum or minimum value is reached. If this clause is not specified, it usually defaults to no loop. ## **Examples** ```sql --- Create a sequence named alter_seq_01 with an increment of 2, a minimum value of 30, a maximum value of 100, and enable cycling +-- Create a sequence called alter_seq_01, set the increment of the sequence to 2, set the minimum value of the sequence to 30 and the maximum value to 100, and enable the loop create sequence alter_seq_01 as smallint increment by 2 minvalue 30 maxvalue 100 cycle; mysql> show sequences; @@ -57,7 +57,7 @@ mysql> show sequences; +--------------+-----------+ 1 row in set (0.00 sec) --- Remove cycling for sequence alter_seq_01 +-- Cancel loop for sequence alter_seq_01 mysql> alter sequence alter_seq_01 no cycle; Query OK, 0 rows affected (0.01 sec) @@ -77,7 +77,7 @@ mysql> select nextval('alter_seq_01'),currval('alter_seq_01'); +-----------------------+-----------------------+ 1 row in set (0.00 sec) --- Set the starting value of sequence alter_seq_01 to 40 +-- Set the starting value of the sequence alter_seq_01 to 40 mysql> alter sequence alter_seq_01 start with 40; Query OK, 0 rows affected (0.01 sec) @@ -97,7 +97,7 @@ mysql> select nextval('alter_seq_01'),currval('alter_seq_01'); +-----------------------+-----------------------+ 1 row in set (0.00 sec) --- Set the increment value of sequence alter_seq_01 to 3 +-- Set the incremental value of the sequence alter_seq_01 to 3 mysql> alter sequence alter_seq_01 increment by 3; Query OK, 0 rows affected (0.01 sec) diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-cluster-table.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-cluster-table.md new file mode 100644 index 000000000..ac33dfa6c --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-cluster-table.md @@ -0,0 +1,108 @@ +# **CREATE 
CLUSTER TABLE** + +## **Grammar description** + +A cluster table is a table created in the system database `mo_catalog` under the system tenant that takes effect simultaneously under all other tenants. DDL and DML operations can be performed on the table under the system tenant; other tenants can only query the table or create views based on it. + +This document describes how to set up cluster tables in a MatrixOne database. + +## **Grammar structure** + +``` +> CREATE CLUSTER TABLE [IF NOT EXISTS] tbl_name + (create_definition,...) + [table_options] + [partition_options] +``` + +## **Instructions for use** + +- Creating cluster tables is limited to the sys tenant administrator role. + +- Under the sys tenant, a cluster table contains all of the data; under other tenants, only part of the data may be visible. + +- In a cluster table, the `account_id` field is generated automatically and represents the id of the tenant to which a row of data is visible, specified when inserting or loading data. Only one visible tenant can be specified per row; if you want multiple tenants to be able to view the same data, you need to insert it multiple times with different tenant ids. This field is not returned by queries under other tenants. + +- Cluster tables cannot be external or temporary tables, and they have exactly the same table structure under all tenants.
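The multi-tenant visibility note above can be sketched as follows. This is a hypothetical fragment, assuming a cluster table `t1(a int)` under `mo_catalog` and two ordinary tenants with ids 6 and 7, as in the example below:

```sql
-- Hypothetical sketch: each row names the single tenant id it is visible to,
-- so the same value must be inserted once per tenant that should see it
insert into mo_catalog.t1 values (10, 6), (10, 7);
-- Tenants 6 and 7 each see a row with a = 10; neither sees the account_id column
```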
+ +## Examples + +```sql +--Create two tenants, test1 and test2 +mysql> create account test1 admin_name = 'root' identified by '111' open comment 'tenant_test'; +Query OK, 0 rows affected (0.44 sec) + +mysql> create account test2 admin_name = 'root' identified by '111' open comment 'tenant_test'; +Query OK, 0 rows affected (0.51 sec) + +--Create a cluster table under the sys tenant +mysql> use mo_catalog; +Database changed +mysql> drop table if exists t1; +Query OK, 0 rows affected (0.00 sec) + +mysql> create cluster table t1(a int); +Query OK, 0 rows affected (0.01 sec) + +--View tenant ids +mysql> select * from mo_account; ++------------+--------------+--------+---------------------+----------------+---------+----------------+ +| account_id | account_name | status | created_time | comments | version | suspended_time | ++------------+--------------+--------+---------------------+----------------+---------+----------------+ +| 0 | sys | open | 2024-01-11 08:56:57 | system account | 1 | NULL | +| 6 | test1 | open | 2024-01-15 03:15:40 | tenant_test | 7 | NULL | +| 7 | test2 | open | 2024-01-15 03:15:48 | tenant_test | 8 | NULL | ++------------+--------------+--------+---------------------+----------------+---------+----------------+ +3 rows in set (0.01 sec) + +--Insert data into cluster table t1; it is visible only to the test1 tenant +mysql> insert into t1 values(1,6),(2,6),(3,6); +Query OK, 3 rows affected (0.01 sec) + +--Viewing the data of t1 under the sys tenant shows all the data, including the `account_id` field +mysql> select * from t1; ++------+------------+ +| a | account_id | ++------+------------+ +| 1 | 6 | +| 2 | 6 | +| 3 | 6 | ++------+------------+ +3 rows in set (0.00 sec) + +--Viewing the data of t1 under the test1 tenant shows the data but not the `account_id` field +mysql> select * from t1; ++------+ +| a | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.01 sec) + +--Viewing the data of t1 under the test2 tenant will not
show any data +mysql> select * from t1; +Empty set (0.01 sec) + +--Creating a t1-based view in the test1 tenant +mysql> create view t1_view as select * from mo_catalog.t1; +Query OK, 0 rows affected (0.01 sec) + +mysql> select * from t1_view; ++------+ +| a | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.00 sec) + +--Create a t1-based view in the test2 tenant +mysql> create view t1_view as select * from mo_catalog.t1; +Query OK, 0 rows affected (0.01 sec) + +mysql> select * from t1_view; +Empty set (0.01 sec) +``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-dynamic-table.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-dynamic-table.md new file mode 100644 index 000000000..31c380712 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-dynamic-table.md @@ -0,0 +1,29 @@ +# **CREATE DYNAMIC TABLE** + +## **Grammar description** + +`CREATE DYNAMIC TABLE` adds a new dynamic table to the current database. + +## **Grammar structure** + +```sql +CREATE DYNAMIC TABLE [IF NOT EXISTS] table_name +AS SELECT ... from stream_name ; +``` + +## Interpretation of grammar + +- table_name: Dynamic table name. The dynamic table name must be different from any existing dynamic table name in the current database. +- stream_name: The name of a SOURCE that has already been created. + +## **Examples** + +```sql +mysql> create dynamic table dt_test as select * from stream_test; +Query OK, 0 rows affected (0.01 sec) +``` + +## Limitations + +Aggregate functions, mathematical functions, string functions, date functions, and the `limit`, `offset`, `from subquery`, `not in/in subquery`, `group by`, `order by` and `having` clauses are not yet supported when creating dynamic tables. + +Joining two SOURCE tables, or joining a SOURCE table with a normal data table, is not yet supported when creating dynamic tables.
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-function-python.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-function-python.md new file mode 100644 index 000000000..3bf6043e6 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-function-python.md @@ -0,0 +1,95 @@ +# **CREATE FUNCTION...LANGUAGE PYTHON AS** + +## **Grammar description** + +`CREATE FUNCTION...LANGUAGE PYTHON AS` is used to create user-defined Python functions. Users write self-defined functions to meet customization needs and simplify query writing. UDFs can also be created by importing external Python files or external whl packages. + +In some scenarios, we would expect a Python function to receive multiple tuples at once so that it runs more efficiently, and MatrixOne provides a vector option on functions to handle this. + +MatrixOne Python UDFs do not currently support overloading, and function names are required to be unique within a MatrixOne cluster. + +## **Grammar structure** + +```sql +> CREATE [ OR REPLACE ] FUNCTION <name> ( +[ <arg_name> <arg_data_type> ] [ , ... ] ) +RETURNS <return_data_type> LANGUAGE PYTHON AS +$$ +<function_body> +[ add.vector = True ] +$$ +HANDLER = '<function_name>' +``` + +## **Structural description** + +- `<name>`: Specifies the name of the custom function. + +- `<arg_name> <arg_data_type>`: Used to specify the parameters of the custom function; only the name and type are used. + +- `RETURNS <return_data_type>`: The data type used to declare the return value of the custom function. + +- `<function_body>`: The body portion of the custom function, which must contain a RETURN statement that specifies the return value of the custom function. + +- `[ add.vector = True ]`: Flags that the Python function receives multiple tuples at once. + +- `HANDLER = '<function_name>'`: Specifies the name of the Python function called.
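The `add.vector` option above is not covered by the examples below; a minimal sketch, assuming a hypothetical function name `py_add_vec`, could look like this:

```sql
create or replace function py_add_vec(a int, b int) returns int language python as
$$
def add(a, b):
    # with add.vector = True, a and b arrive as whole batches (Python lists)
    return [a[i] + b[i] for i in range(len(a))]
add.vector = True
$$
handler 'add';
```

Each call then processes many tuples at once instead of one row per invocation.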
+ +## Type Mapping + +To ensure that the data types used in writing Python UDF are consistent with those supported by MatrixOne, you need to focus on the data type mapping relationship between the two, as follows: + +| MatrixOne Type | Python Type | +| -------------------------------------------------------- | --------------------------- | +| bool | bool | +| int8, int16, int32, int64, uint8, uint16, uint32, uint64 | int | +| float32, float64 | float | +| char, varchar, text, uuid | str | +| json | str, int, float, list, dict | +| time | datetime.timedelta | +| date | datetime.date | +| datetime, timestamp | datetime.datetime | +| decimal64, decimal128 | decimal.Decimal | +| binary, varbinary, blob | bytes | + +## **Examples** + +**Example 1** + +```sql +--Sum of Two Numbers with python UDF +create or replace function py_add(a int, b int) returns int language python as +$$ +def add(a, b): + return a + b +$$ +handler 'add'; + +--call function +mysql> select py_add(1,2); ++--------------+ +| py_add(1, 2) | ++--------------+ +| 3 | ++--------------+ +1 row in set (0.01 sec) +``` + +**Example 2** + +```sql +create or replace function py_helloworld() returns varchar(255) language python as +$$ +def helloworld(): + return "helloworld!" +$$ +handler 'helloworld'; + +mysql> select py_helloworld(); ++-----------------+ +| py_helloworld() | ++-----------------+ +| helloworld! | ++-----------------+ +1 row in set (0.01 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-function-sql.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-function-sql.md new file mode 100644 index 000000000..eed450130 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-function-sql.md @@ -0,0 +1,68 @@ +# **CREATE FUNCTION...LANGUAGE SQL AS** + +## **Grammar description** + +`CREATE FUNCTION...LANGUAGE SQL AS` is used to create SQL UDFs. 
+ +A SQL custom function is a user-written SQL function that performs custom actions based on specific needs. These functions can be used for tasks such as queries and data conversions, making SQL code more modular and maintainable. + +MatrixOne SQL UDFs do not currently support overloading, and function names are required to be unique within a MatrixOne cluster. + +## **Grammar structure** + +```sql +> CREATE [ OR REPLACE ] FUNCTION <name> ( +[ <arg_name> <arg_data_type> ] [ , ... ] ) +RETURNS <return_data_type> LANGUAGE SQL AS 'function_body' +``` + +## **Structural description** + +- `<name>`: Specifies the name of the custom function. + +- `<arg_name> <arg_data_type>`: Used to specify the parameters of the custom function; only the name and type are used. + +- `RETURNS <return_data_type>`: The data type used to declare the return value of the custom function; see [Data Type Overview](../../../Reference/Data-Types/data-types.md) for the complete list of data types. + +- `function_body`: The body part of the custom function. Users must use $1, $2, ... to reference parameters instead of the actual parameter names. The function body supports select statements and must have a unique return value. If the function body is not an expression but a select statement on a table, the query should limit its result to a single row, either with limit 1 or with an aggregate function without a group by clause.
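The single-row requirement above can also be met with an aggregate instead of `limit 1`. The following is an illustrative sketch; the function name `max_n1` is hypothetical, and the table `t1(n1 int)` matches Example 1 below:

```sql
-- An aggregate without group by always returns exactly one row,
-- so the function body needs no limit 1
CREATE FUNCTION max_n1 () RETURNS INT LANGUAGE SQL AS 'select max(n1) from t1';
select max_n1();
```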
+ +## **Examples** + +**Example 1** + +```sql +--Creating an unparameterized sql custom function + +mysql> create table t1(n1 int); +Query OK, 0 rows affected (0.02 sec) + +mysql> insert into t1 values(1),(2),(3); +Query OK, 3 rows affected (0.01 sec) + +mysql> CREATE FUNCTION t1_fun () RETURNS VARCHAR LANGUAGE SQL AS 'select n1 from t1 limit 1' ; +Query OK, 0 rows affected (0.01 sec) + +mysql> select t1_fun(); ++----------+ +| t1_fun() | ++----------+ +| 1 | ++----------+ +1 row in set (0.01 sec) +``` + +**Example 2** + +```sql +--Creating sql custom functions that return the sum of two arguments +mysql> CREATE FUNCTION twoadd (x int, y int) RETURNS int LANGUAGE SQL AS 'select $1 + $2' ; +Query OK, 0 rows affected (0.02 sec) + +mysql> select twoadd(1,2); ++--------------+ +| twoadd(1, 2) | ++--------------+ +| 3 | ++--------------+ +1 row in set (0.00 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-index-ivfflat.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-index-ivfflat.md new file mode 100644 index 000000000..48a487c85 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-index-ivfflat.md @@ -0,0 +1,90 @@ +# CREATE INDEX USING IVFFLAT + +## Syntax Description + +Vector indexes can be used to speed up KNN (K-Nearest Neighbors) queries on tables containing vector columns. Matrixone currently supports IVFFLAT vector indexes with [`l2_distance`](../../Functions-and-Operators/Vector/l2_distance.md) metric. + +We can specify PROBE_LIMIT to determine the number of cluster centers to query. PROBE_LIMIT defaults to 1, that is, only 1 cluster center is scanned. But if you set it to a higher value, it scans for a larger number of cluster centers and vectors, which may degrade performance a little but increase accuracy. We can specify the appropriate number of probes to balance query speed and recall rate. 
The ideal values for PROBE_LIMIT are: + +- If total rows < 1,000,000: PROBE_LIMIT = total rows / 10 +- If total rows > 1,000,000: PROBE_LIMIT = sqrt(total rows) + +## Syntax structure + +``` +> CREATE INDEX index_name +USING IVFFLAT +ON tbl_name (col,...) +LISTS=lists +OP_TYPE "vector_l2_ops" +``` + +### Grammatical interpretation + +- `index_name`: the index name +- `IVFFLAT`: the vector index type; currently only vector_l2_ops is supported +- `lists`: the number of partitions required in the index; must be greater than 0 +- `OP_TYPE`: the distance metric to use + +__NOTE__: + +- The ideal values for LISTS are: + - If total rows < 1,000,000: lists = total rows / 1000 + - If total rows > 1,000,000: lists = sqrt(total rows) +- It is recommended not to create the index until the data is populated. If a vector index is created on an empty table, all vectors are mapped to a single partition; as the amount of data grows over time, the index becomes larger and larger and query performance degrades. + +## Examples + +```sql +--The parameter experimental_ivf_index needs to be set to 1 (default 0) to use vector indexes. +SET GLOBAL experimental_ivf_index = 1; +drop table if exists t1; +create table t1(coordinate vecf32(2),class char); +-- There are seven points, each representing its coordinates on the x and y axes, and each point's class is labeled A or B.
+insert into t1 values("[2,4]","A"),("[3,5]","A"),("[5,7]","B"),("[7,9]","B"),("[4,6]","A"),("[6,8]","B"),("[8,10]","B"); +--Creating Vector Indexes +create index idx_t1 using ivfflat on t1(coordinate) lists=1 op_type "vector_l2_ops"; + +mysql> show create table t1; ++-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | ++-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| t1 | CREATE TABLE `t1` ( +`coordinate` VECF32(2) DEFAULT NULL, +`class` CHAR(1) DEFAULT NULL, +KEY `idx_t1` USING ivfflat (`coordinate`) lists = 1 op_type 'vector_l2_ops' +) | ++-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.01 sec) + +mysql> show index from t1; ++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-----------------------------------------+---------+------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Index_params | Visible | Expression | ++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-----------------------------------------+---------+------------+ +| t1 | 1 | idx_t1 | 1 | coordinate | A | 0 | NULL | NULL | YES | ivfflat | | | {"lists":"1","op_type":"vector_l2_ops"} | YES | NULL | 
++-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+-----------------------------------------+---------+------------+ +1 row in set (0.01 sec) + +--Set the number of clustering centers to scan +SET @PROBE_LIMIT=1; +--Now, we have a new point with coordinates (4, 4) and we want to use a KNN query to predict the class of this point. +mysql> select * from t1 order by l2_distance(coordinate,"[4,4]") asc; ++------------+-------+ +| coordinate | class | ++------------+-------+ +| [3, 5] | A | +| [2, 4] | A | +| [4, 6] | A | +| [5, 7] | B | +| [6, 8] | B | +| [7, 9] | B | +| [8, 10] | B | ++------------+-------+ +7 rows in set (0.01 sec) + +--Based on the query results the category of this point can be predicted as A +``` + +## Limitations + +Only one vector index on one vector column is supported at a time. If you need to build a vector index on multiple vector columns, you can execute the create statement multiple times. diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-publication.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-publication.md index 16171c5dd..83bdb4a61 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-publication.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-publication.md @@ -1,10 +1,10 @@ # **CREATE PUBLICATION** -## **Description** +## **Grammar description** -`CREATE PUBLICATION` adds a new publication into the current database. +`CREATE PUBLICATION` Adds a new publication to the current database. -## **Syntax** +## **Grammar structure** ``` CREATE PUBLICATION pubname @@ -14,11 +14,11 @@ CREATE PUBLICATION pubname [ COMMENT 'string'] ``` -## **Explanations** +## Interpretation of grammar -- pubname: The publication name. 
The publication name must be distinct from the name of any existing publication in the current database. -- database_name: specifies the database name that exists under the current account. -- account_name: The account name. The name of the account which obtains the publication. +- pubname: The publication name. The publication name must be different from the name of any existing publication in the current database. +- database_name: The name of a database that already exists under the current tenant. +- account_name: The name of the tenant that is granted access to this publication. ## **Examples** @@ -30,6 +30,6 @@ mysql> create publication pub1 database t account acc0,acc1; Query OK, 0 rows affected (0.01 sec) ``` -## **Constraints** +## Limitations -MatrxiOne currently only supports publishing one database at a time. +MatrixOne currently only supports publishing one database at a time. diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-snapshot.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-snapshot.md new file mode 100644 index 000000000..c591dd9a8 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-snapshot.md @@ -0,0 +1,46 @@ +# CREATE SNAPSHOT + +## Syntax Description + +The `CREATE SNAPSHOT` command is used to create a snapshot. System tenants can create snapshots for themselves or for regular tenants, but regular tenants can only create snapshots for themselves. Snapshots created by a tenant are visible only to that tenant.
+ +## Syntax structure + +```sql +> CREATE SNAPSHOT snapshot_name FOR ACCOUNT account_name +``` + +## Examples + +```sql +--Execute under system tenant sys +create snapshot sp1 for account sys; +create snapshot sp2 for account acc1; + +mysql> show snapshots; ++---------------+----------------------------+----------------+--------------+---------------+------------+ +| SNAPSHOT_NAME | TIMESTAMP | SNAPSHOT_LEVEL | ACCOUNT_NAME | DATABASE_NAME | TABLE_NAME | ++---------------+----------------------------+----------------+--------------+---------------+------------+ +| sp2 | 2024-05-10 09:49:08.925908 | account | acc1 | | | +| sp1 | 2024-05-10 09:48:50.271707 | account | sys | | | ++---------------+----------------------------+----------------+--------------+---------------+------------+ +2 rows in set (0.00 sec) + +--Executed under tenant acc1 +mysql> create snapshot sp3 for account acc2;--Regular tenants can only create snapshots for themselves +ERROR 20101 (HY000): internal error: only sys tenant can create tenant level snapshot for other tenant + +create snapshot sp3 for account acc1; + +mysql> show snapshots; ++---------------+----------------------------+----------------+--------------+---------------+------------+ +| SNAPSHOT_NAME | TIMESTAMP | SNAPSHOT_LEVEL | ACCOUNT_NAME | DATABASE_NAME | TABLE_NAME | ++---------------+----------------------------+----------------+--------------+---------------+------------+ +| sp3 | 2024-05-10 09:53:09.948762 | account | acc1 | | | ++---------------+----------------------------+----------------+--------------+---------------+------------+ +1 row in set (0.00 sec) +``` + +## Limitations + +- Currently only tenant-level snapshots are supported, not cluster-level, database-level, and table-level snapshots. 
\ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-source.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-source.md new file mode 100644 index 000000000..60551abd0 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-source.md @@ -0,0 +1,51 @@ +# **CREATE SOURCE** + +## **Grammar description** + +`CREATE SOURCE` creates a connection to streamed data and adds a new SOURCE table to the current database. + +## **Grammar structure** + +```sql +CREATE [OR REPLACE] SOURCE [IF NOT EXISTS] stream_name +( { column_name data_type [KEY | HEADERS | HEADER(key)] } [, ...] ) +WITH ( property_name = expression [, ...]); +``` + +## Interpretation of grammar + +- stream_name: The SOURCE name. The SOURCE name must be different from any existing SOURCE name in the current database. +- column_name: The column name in the SOURCE table that the streaming data maps to. +- data_type: The type of the field in the data table that column_name corresponds to. +- property_name = expression: The configuration item names of the streaming data mapping and their corresponding values; the configurable items are as follows: + +| property_name | expression Description | +| :-----------------: | :------------------------------------------------------------------------------: | +| "type" | Only 'kafka' is supported: currently, only kafka is supported as an accepted source. | +| "topic" | The corresponding topic in the kafka data source | +| "partition" | The corresponding partition in the kafka data source | +| "value" | Only 'json' is supported: currently, only json is supported as an accepted data format. | +| "bootstrap.servers" | The IP:PORT of the kafka server. | +| "sasl.username" | Specify the SASL (Simple Authentication and Security Layer) username to use when connecting to Kafka.
|
+| "sasl.password" | Used in conjunction with sasl.username, this parameter provides the corresponding password |
+| "sasl.mechanisms" | The SASL mechanism used for authentication between client and server |
+| "security.protocol" | Specifies the security protocol to use when communicating with the Kafka server |
+
+## **Examples**
+
+```sql
+create source stream_test(c1 char(25),c2 varchar(500),c3 text,c4 tinytext,c5 mediumtext,c6 longtext )with(
+    "type"='kafka',
+    "topic"= 'test',
+    "partition" = '0',
+    "value"= 'json',
+    "bootstrap.servers"='127.0.0.1:9092'
+);
+Query OK, 0 rows affected (0.01 sec)
+```
+
+## Limitations
+
+`DROP` and `ALTER` are not currently supported on SOURCE tables.
+
+Only Kafka is currently supported as the connection source when creating a SOURCE table, and only data transported in JSON format is supported.
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-subscription.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-subscription.md
index c1042ed97..7fcb63201 100644
--- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-subscription.md
+++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-subscription.md
@@ -1,10 +1,10 @@
 # **CREATE...FROM...PUBLICATION...**
 
-## **Description**
+## **Grammar description**
 
-`CREATE...FROM...PUBLICATION...` is when the subscriber subscribes to a publication created by the publisher to obtain the publisher's shared data.
+`CREATE...FROM...PUBLICATION...` creates a subscription through which a subscriber obtains the shared data of a publication created by the publisher.
 
-## **Syntax**
+## **Grammar structure**
 
 ```
 CREATE DATABASE database_name
@@ -12,19 +12,19 @@ FROM account_name PUBLICATION pubname;
 ```
 
-## **Explanations**
+## Interpretation of grammar
 
 - database_name: The name of the database created by the subscriber.
-- pubname: The name of the publication that the publisher has published.
-- account_name: The account name of the publication can be obtained.
+- pubname: The name of the publication created by the publisher.
+- account_name: The name of the tenant from which the publication is obtained.
 
 ## **Examples**
 
 ```sql
---Suppose the system administrator creates a account acc1 as the subscriber
+-- Suppose the system administrator creates a tenant, acc1, as a subscriber
 create account acc1 admin_name 'root' identified by '111';
 
---Assuming session 1 is the publisher, the publisher first publishes a database to the account
+-- Assuming session 1 is the publisher, the publisher first publishes a database to the tenant
 create database sys_db_1;
 use sys_db_1;
 create table sys_tbl_1(a int primary key );
@@ -32,39 +32,36 @@ insert into sys_tbl_1 values(1),(2),(3);
 create view v1 as (select * from sys_tbl_1);
 create publication sys_pub_1 database sys_db_1;
 mysql> show publications;
-+-----------+----------+
-| Name | Database |
-+-----------+----------+
-| sys_pub_1 | sys_db_1 |
-+-----------+----------+
++-------------+----------+---------------------+-------------+-------------+----------+
+| publication | database | create_time | update_time | sub_account | comments |
++-------------+----------+---------------------+-------------+-------------+----------+
+| sys_pub_1 | sys_db_1 | 2024-04-24 11:54:36 | NULL | * | |
++-------------+----------+---------------------+-------------+-------------+----------+
 1 row in set (0.01 sec)
 
---Open a new session again, assuming that session 2 is the subscriber and the subscriber subscribes to the published database
-mysql -h 127.0.0.1 -P 6001 -u acc1:root -p -- Log into the account
-create database sub1 from sys publication pub1;
-
-mysql> create database sub1 from sys publication sys_pub_1;
-Query OK, 1 row affected (0.02 sec)
+-- A new session is opened; assuming session 2 is the subscriber, the subscriber subscribes to the published database
+mysql -h 127.0.0.1 -P 6001 -u acc1:root -p -- Log in to the tenant account
+create database sub1 from sys publication
sys_pub_1; mysql> show databases; +--------------------+ | Database | +--------------------+ -| system | -| system_metrics | | information_schema | -| mysql | | mo_catalog | +| mysql | | sub1 | +| system | +| system_metrics | +--------------------+ -6 rows in set (0.00 sec) +6 rows in set (0.01 sec) mysql> show subscriptions; -+------+--------------+ -| Name | From_Account | -+------+--------------+ -| sub1 | sys | -+------+--------------+ ++-----------+-------------+--------------+---------------------+----------+---------------------+ +| pub_name | pub_account | pub_database | pub_time | sub_name | sub_time | ++-----------+-------------+--------------+---------------------+----------+---------------------+ +| sys_pub_1 | sys | sys_db_1 | 2024-04-24 11:54:36 | sub1 | 2024-04-24 11:56:05 | ++-----------+-------------+--------------+---------------------+----------+---------------------+ 1 row in set (0.01 sec) mysql> use sub1; @@ -98,8 +95,8 @@ mysql> select * from sys_tbl_1 order by a; | 3 | +------+ 3 rows in set (0.01 sec) --- Subscribe successfully +-- Subscription Success ``` !!! note - If you need to unsubscribe, you can directly delete the subscribed database. Refer to ['DROP DATABASE`](drop-database.md ). + If you need to unsubscribe, you can simply delete the subscribed database name and use [`DROP DATABASE`](drop-database.md). diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table-as-select.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table-as-select.md new file mode 100644 index 000000000..50023e3a7 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table-as-select.md @@ -0,0 +1,321 @@ +# CREATE TABLE AS SELECT + +## Syntax Description + +The `CREATE TABLE AS SELECT` command creates a new table by copying column definitions and column data from an existing table specified in the `SELECT` query. 
However, it does not copy constraints, indexes, views, or other non-data attributes of the original table.
+
+## Syntax structure
+
+```
+> CREATE [TEMPORARY] TABLE [ IF NOT EXISTS ] table_name
+[ (column_name [, ...] ) ] AS {query}
+
+Query can be any select statement in MO syntax.
+
+SELECT
+[ALL | DISTINCT ]
+select_expr [, select_expr] [[AS] alias] ...
+[INTO variable [, ...]]
+[FROM table_references]
+[WHERE where_condition]
+[GROUP BY {col_name | expr | position}
+[ASC | DESC]]
+[HAVING where_condition]
+[ORDER BY {col_name | expr | position}
+[ASC | DESC]] [ NULLS { FIRST | LAST } ]
+[LIMIT {[offset,] row_count | row_count OFFSET offset}]
+```
+
+## Grammatical interpretation
+
+- ALL: The default option; returns all matching rows, including duplicate rows.
+
+- DISTINCT: Returns only unique rows, i.e. duplicate rows are removed.
+
+- select_expr: The column or expression to select.
+
+- AS alias: Specifies an alias for the selected column or expression.
+
+- [INTO variable [, ...]]: Used to store query results in a variable instead of returning them to the client.
+
+- [FROM table_references]: Specifies which table or tables to retrieve data from. table_references can be a table name or a complex expression (such as a join) over multiple tables.
+
+- [WHERE where_condition]: Filters the result set to return only rows that satisfy where_condition.
+
+- [GROUP BY {col_name | expr | position} [ASC | DESC]]: Groups the result set by one or more columns or expressions; ASC and DESC specify how rows within a group are sorted.
+
+- [HAVING where_condition]: Filters groups after grouping. Usually used with GROUP BY to filter out groups that do not meet the criteria.
+
+- [ORDER BY {col_name | expr | position} [ASC | DESC] [NULLS {FIRST | LAST}]]: Sorts the result set; ASC and DESC specify the sort direction.
+
+- [NULLS {FIRST | LAST}]: Specifies where NULL values are placed in the sort order.
+
+- [LIMIT {[offset,] row_count | row_count OFFSET offset}]: Limits the number of rows returned. offset specifies which row of the result set to start returning from, with 0 being the first row; row_count specifies the number of rows returned.
+
+## Permissions
+
+In MatrixOne, executing the `CREATE TABLE AS SELECT` statement requires at least the following permissions:
+
+- `CREATE` permission: The user must be able to create tables, which is granted by the `CREATE` privilege.
+
+- `INSERT` permission: Because `CREATE TABLE AS SELECT` inserts the selected data into the new table, the user must also be able to insert data into the target table, which is granted by the `INSERT` privilege.
+
+- `SELECT` permission: The user must be able to select data from the source table, which is granted by the `SELECT` privilege.
+
+For more permission-related actions, check out the [MatrixOne permission classification](../../access-control-type.md) and [grant instructions](../Data-Control-Language/grant.md).
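+
+As a sketch, the three privileges above could be granted to an ordinary user before it runs `CREATE TABLE AS SELECT`. The role name, user name, and grant scopes here are hypothetical; adjust them to your environment and consult the grant instructions linked above for the exact object forms:
+
+```sql
+-- Hypothetical role and user; the scopes are illustrative only
+create role ctas_role;
+grant create table on database * to ctas_role; -- create the new table
+grant insert on table *.* to ctas_role;        -- write the selected rows into it
+grant select on table *.* to ctas_role;        -- read from the source table
+grant ctas_role to u1;                         -- u1 can now run CREATE TABLE AS SELECT
+```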
+ +## Examples + +- Example 1 + +```sql +create table t1(a int default 123, b char(5)); +INSERT INTO t1 values (1, '1'),(2,'2'),(0x7fffffff, 'max'); + +mysql> create table t2 as select *from t1;--Whole Table Replication +Query OK, 3 rows affected (0.02 sec) + +mysql> desc t2; ++-------+---------+------+------+---------+-------+---------+ +| Field | Type | Null | Key | Default | Extra | Comment | ++-------+---------+------+------+---------+-------+---------+ +| a | INT(32) | YES | | 123 | | | +| b | CHAR(5) | YES | | NULL | | | ++-------+---------+------+------+---------+-------+---------+ +2 rows in set (0.01 sec) + +mysql> select * from t2; ++------------+------+ +| a | b | ++------------+------+ +| 1 | 1 | +| 2 | 2 | +| 2147483647 | max | ++------------+------+ +3 rows in set (0.00 sec) +``` + +- Example 2 + +```sql +create table t1(a int default 123, b char(5)); +INSERT INTO t1 values (1, '1'),(2,'2'),(0x7fffffff, 'max'); + +mysql> CREATE table test as select a as alias_a from t1;--Specify an alias for the selection column +Query OK, 3 rows affected (0.02 sec) + +mysql> desc test; ++---------+---------+------+------+---------+-------+---------+ +| Field | Type | Null | Key | Default | Extra | Comment | ++---------+---------+------+------+---------+-------+---------+ +| alias_a | INT(32) | YES | | 123 | | | ++---------+---------+------+------+---------+-------+---------+ +1 row in set (0.01 sec) + +mysql> select * from test; ++------------+ +| alias_a | ++------------+ +| 1 | +| 2 | +| 2147483647 | ++------------+ +3 rows in set (0.01 sec) +``` + +- Example 3 + +```sql +create table t1(a int default 123, b char(5)); +INSERT INTO t1 values (1, '1'),(2,'2'),(0x7fffffff, 'max'); + +mysql> create table t3 as select * from t1 where 1=2;--Copy only the fields, not the data +Query OK, 0 rows affected (0.01 sec) + +mysql> desc t3; ++-------+---------+------+------+---------+-------+---------+ +| Field | Type | Null | Key | Default | Extra | Comment | 
++-------+---------+------+------+---------+-------+---------+ +| a | INT(32) | YES | | 123 | | | +| b | CHAR(5) | YES | | NULL | | | ++-------+---------+------+------+---------+-------+---------+ +2 rows in set (0.01 sec) + +mysql> select * from t3; +Empty set (0.00 sec) +``` + +- Example 4 + +```sql +create table t1(a int default 123, b char(5)); +INSERT INTO t1 values (1, '1'),(2,'2'),(0x7fffffff, 'max'); + +mysql> CREATE table t4(n1 int unique) as select max(a) from t1;--Use the original table data aggregation values as columns in the new table +Query OK, 1 row affected (0.03 sec) + +mysql> desc t4; ++--------+---------+------+------+---------+-------+---------+ +| Field | Type | Null | Key | Default | Extra | Comment | ++--------+---------+------+------+---------+-------+---------+ +| n1 | INT(32) | YES | UNI | NULL | | | +| max(a) | INT(32) | YES | | NULL | | | ++--------+---------+------+------+---------+-------+---------+ +2 rows in set (0.01 sec) + +mysql> select * from t4; ++------+------------+ +| n1 | max(a) | ++------+------------+ +| NULL | 2147483647 | ++------+------------+ +1 row in set (0.00 sec) +``` + +- Example 5 + +```sql +create table t5(n1 int,n2 int,n3 int); +insert into t5 values(1,1,1),(1,1,1),(3,3,3); + +mysql> create table t5_1 as select distinct n1 from t5;--Remove duplicate lines +Query OK, 2 rows affected (0.02 sec) + +mysql> select * from t5_1; ++------+ +| n1 | ++------+ +| 1 | +| 3 | ++------+ +2 rows in set (0.00 sec) +``` + +- Example 6 + +```sql +create table t6(n1 int,n2 int,n3 int); +insert into t6 values(1,1,3),(2,2,2),(3,3,1); + +mysql> create table t6_1 as select * from t6 order by n3;--Sorting the result set +Query OK, 3 rows affected (0.01 sec) + +mysql> select * from t6_1; ++------+------+------+ +| n1 | n2 | n3 | ++------+------+------+ +| 3 | 3 | 1 | +| 2 | 2 | 2 | +| 1 | 1 | 3 | ++------+------+------+ +3 rows in set (0.01 sec) +``` + +- Example 7 + +```sql +create table t7(n1 int,n2 int,n3 int); +insert into t7 
values(1,1,3),(1,2,2),(2,3,1),(2,3,1),(3,3,1); + +mysql> CREATE TABLE t7_1 AS SELECT n1 FROM t7 GROUP BY n1 HAVING count(n1)>1;--Grouping of result sets +Query OK, 2 rows affected (0.02 sec) + +mysql> +mysql> select * from t7_1; ++------+ +| n1 | ++------+ +| 1 | +| 2 | ++------+ +2 rows in set (0.01 sec) +``` + +- Example 8 + +```sql +create table t8(n1 int,n2 int,n3 int); +insert into t8 values(1,1,1),(2,2,2),(3,3,3); + +mysql> CREATE TABLE t8_1 AS SELECT * FROM t8 limit 1 offset 1;--Specifies to return from the second row of the result set, and the number of rows to return is 1. + +mysql> select * from t8_1; ++------+------+------+ +| n1 | n2 | n3 | ++------+------+------+ +| 2 | 2 | 2 | ++------+------+------+ +1 row in set (0.00 sec) +``` + +- Example 9 + +```sql +create table t9 (a int primary key, b varchar(5) unique key); +create table t9_1 ( +a int primary key, +b varchar(5) unique, +c int , +d int, +foreign key(c) references t9(a), +INDEX idx_d(d) +); +insert into t9 values (101,'abc'),(102,'def'); +insert into t9_1 values (1,'zs1',101,1),(2,'zs2',102,1); + +mysql> create table t9_2 as select * from t9_1; + +mysql> show create table t9_1; ++-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | ++-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| t9_1 | CREATE TABLE `t9_1` ( +`a` INT NOT NULL, +`b` VARCHAR(5) DEFAULT NULL, +`c` INT DEFAULT NULL, +`d` INT DEFAULT NULL, +PRIMARY KEY (`a`), +UNIQUE KEY `b` (`b`), +KEY 
`idx_d` (`d`), +CONSTRAINT `018f27eb-0b33-7289-a3c2-af479b1833b1` FOREIGN KEY (`c`) REFERENCES `t9` (`a`) ON DELETE RESTRICT ON UPDATE RESTRICT +) | ++-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.01 sec) + +mysql> show create table t9_2;--If the source table has constraints or indexes, the new table created by CTAS will not have the constraints and indexes of the original table by default. ++-------+-------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | ++-------+-------------------------------------------------------------------------------------------------------------------+ +| t9_2 | CREATE TABLE `t9_2` ( +`a` INT NOT NULL, +`b` VARCHAR(5) DEFAULT NULL, +`c` INT DEFAULT NULL, +`d` INT DEFAULT NULL +) | ++-------+-------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) + +--If you want the new table to come with the original table with constraints and indexes, you can build the table by adding the +ALTER TABLE t9_2 ADD PRIMARY KEY (a); +ALTER TABLE t9_2 ADD UNIQUE KEY (b); +ALTER TABLE t9_2 ADD FOREIGN KEY (c) REFERENCES t9 (a); +ALTER TABLE t9_2 ADD INDEX idx_d3 (d); + +mysql> show create table t9_2; ++-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | 
++-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| t9_2 | CREATE TABLE `t9_2` ( +`a` INT NOT NULL, +`b` VARCHAR(5) DEFAULT NULL, +`c` INT DEFAULT NULL, +`d` INT DEFAULT NULL, +PRIMARY KEY (`a`), +UNIQUE KEY `b` (`b`), +KEY `idx_d3` (`d`), +CONSTRAINT `018f282d-4563-7e9d-9be5-79c0d0e8136d` FOREIGN KEY (`c`) REFERENCES `t9` (`a`) ON DELETE RESTRICT ON UPDATE RESTRICT +) | ++-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) + +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table-like.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table-like.md new file mode 100644 index 000000000..4f1331e89 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table-like.md @@ -0,0 +1,72 @@ +# CREATE TABLE ... LIKE + +## Syntax Description + +`CREATE TABLE ... LIKE` Create an empty table based on the definition of another table, which copies the structure of the original table but not the data stored in the original table. 
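+
+Because only the definition is copied, a common pattern when the data is needed as well is to follow the statement with a separate `INSERT ... SELECT`, as in this sketch (using the same placeholder table names as the syntax below):
+
+```sql
+-- Copy the structure first, then copy the rows separately
+CREATE TABLE new_tbl LIKE orig_tbl;
+INSERT INTO new_tbl SELECT * FROM orig_tbl;
+```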
+ +## Syntax structure + +```sql +CREATE TABLE new_tbl LIKE orig_tbl; +``` + +## Examples + +```sql +create table test1 (a int primary key, b varchar(5) unique key); +create table test2 (a int primary key,b varchar(5) unique key,c double DEFAULT 0, d char,e int, foreign key(e) references foreign01(a), unique index(c,d)); +insert into test1 values (101,'abc'),(102,'def'); +insert into test2 values (1,'zs1',1,'a',101),(2,'zs2',2,'b',102); + +mysql> create table test3 like test2; +Query OK, 0 rows affected (0.02 sec) + +mysql> show CREATE TABLE test2; ++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | ++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| test2 | CREATE TABLE `test2` ( +`a` INT NOT NULL, +`b` VARCHAR(5) DEFAULT NULL, +`c` DOUBLE DEFAULT 0, +`d` CHAR(1) DEFAULT NULL, +`e` INT DEFAULT NULL, +PRIMARY KEY (`a`), +UNIQUE KEY `b` (`b`), +UNIQUE KEY `c` (`c`,`d`), +CONSTRAINT `018eb74f-38f3-7eb4-80c1-95d9c65de706` FOREIGN KEY (`e`) REFERENCES `foreign01` (`a`) ON DELETE RESTRICT ON UPDATE RESTRICT +) | 
++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) + +mysql> show CREATE TABLE test3; ++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | ++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| test3 | CREATE TABLE `test3` ( +`a` INT NOT NULL, +`b` VARCHAR(5) DEFAULT null, +`c` DOUBLE DEFAULT 0, +`d` CHAR(1) DEFAULT null, +`e` INT DEFAULT null, +PRIMARY KEY (`a`), +UNIQUE KEY `b` (`b`), +UNIQUE KEY `c` (`c`,`d`), +CONSTRAINT `018eb74f-38f3-7eb4-80c1-95d9c65de706` FOREIGN KEY (`e`) REFERENCES `foreign01` (`a`) ON DELETE RESTRICT ON UPDATE RESTRICT +) | ++-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) + +mysql> select * from test2; ++------+------+------+------+------+ +| a | b | c | d | e | 
++------+------+------+------+------+ +| 1 | zs1 | 1 | a | 101 | +| 2 | zs2 | 2 | b | 102 | ++------+------+------+------+------+ +2 rows in set (0.00 sec) + +mysql> select * from test3; +Empty set (0.01 sec) + +``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-view.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-view.md index a5c20ee71..ff49008ba 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-view.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-view.md @@ -13,7 +13,7 @@ A view is created with the `CREATE VIEW` statement. ## **Syntax** ``` -> CREATE VIEW view_name AS +> CREATE [OR REPLACE] VIEW view_name AS SELECT column1, column2, ... FROM table_name WHERE condition; diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-function.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-function.md new file mode 100644 index 000000000..2d61585c8 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-function.md @@ -0,0 +1,48 @@ +# **DROP FUNCTION** + +## **Grammar description** + +The `DROP FUNCTION` statement represents the deletion of a user-defined function. 
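+
+As the grammar below suggests, a function is identified by its name together with its parameter type list, so when several definitions share a name, the argument types in `DROP FUNCTION` select which one is removed. The following sketch assumes such overloads are allowed; the function name py_mul is hypothetical:
+
+```sql
+-- Two hypothetical overloads of the same name
+create or replace function py_mul(a int, b int) returns int language python as
+$$
+def mul(a, b):
+    return a * b
+$$
+handler 'mul';
+
+create or replace function py_mul(a int, b int, c int) returns int language python as
+$$
+def mul3(a, b, c):
+    return a * b * c
+$$
+handler 'mul3';
+
+-- The type list selects which overload is dropped
+drop function py_mul(int, int);
+drop function py_mul(int, int, int);
+```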
+
+## **Grammar structure**
+
+```
+> DROP FUNCTION function_name ([arg_type [, ...]]);
+```
+
+## **Examples**
+
+**Example 1**
+
+```sql
+-- Dropping a function that takes parameters
+
+create or replace function py_add(a int, b int) returns int language python as
+$$
+def add(a, b):
+    return a + b
+$$
+handler 'add';
+
+mysql> select py_add(1,2);
++--------------+
+| py_add(1, 2) |
++--------------+
+| 3 |
++--------------+
+1 row in set (0.01 sec)
+
+-- When the function is no longer needed, it can be dropped
+drop function py_add(int, int);
+```
+
+**Example 2**
+
+```sql
+-- Dropping a function with no parameters
+mysql> CREATE FUNCTION t1_fun () RETURNS VARCHAR LANGUAGE SQL AS 'select n1 from t1 limit 1';
+Query OK, 0 rows affected (0.01 sec)
+
+mysql> drop function t1_fun();
+Query OK, 0 rows affected (0.01 sec)
+```
\ No newline at end of file
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-snapshot.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-snapshot.md
new file mode 100644
index 000000000..abe9da314
--- /dev/null
+++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-snapshot.md
@@ -0,0 +1,30 @@
+# DROP SNAPSHOT
+
+## Syntax Description
+
+`DROP SNAPSHOT` is used to delete snapshots created under the current tenant.
+ +## Syntax structure + +``` +> DROP SNAPSHOT snapshot_name; +``` + +## Examples + +```sql +create snapshot sp1 for account sys; + +mysql> show snapshots; ++---------------+----------------------------+----------------+--------------+---------------+------------+ +| SNAPSHOT_NAME | TIMESTAMP | SNAPSHOT_LEVEL | ACCOUNT_NAME | DATABASE_NAME | TABLE_NAME | ++---------------+----------------------------+----------------+--------------+---------------+------------+ +| sp1 | 2024-05-10 09:55:11.601605 | account | sys | | | ++---------------+----------------------------+----------------+--------------+---------------+------------+ +1 row in set (0.01 sec) + +drop snapshot sp1; + +mysql> show snapshots; +Empty set (0.01 sec) +``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/restore-account.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/restore-account.md new file mode 100644 index 000000000..f3ad98330 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Definition-Language/restore-account.md @@ -0,0 +1,281 @@ +# RESTORE ACCOUNT + +## Syntax Description + +`RESTORE ACCOUNT` Restores a tenant/database/table to a state corresponding to a timestamp based on a snapshot created under the current tenant. 
+ +## Syntax structure + +``` +> RESTORE ACCOUNT account_name [DATABASE database_name [TABLE table_name]] FROM SNAPSHOT snapshot_name [TO ACCOUNT account_name]; +``` + +## Examples + +- Example 1: Restore tenant to this tenant + +```sql +--Executed under tenant acc1 +CREATE database db1; +CREATE database db2; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db1 | +| db2 | +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +7 rows in set (0.00 sec) + +create snapshot acc1_snap1 for account acc1;--Creating a Snapshot +drop database db1;--Delete databases db1,db2 +drop database db2; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +5 rows in set (0.01 sec) + +restore account acc1 FROM snapshot acc1_snap1;--Restore tenant-level snapshots + +mysql> show databases;--Successful recovery ++--------------------+ +| Database | ++--------------------+ +| db1 | +| db2 | +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +7 rows in set (0.01 sec) +``` + +- Example 2: Restore database to this tenant + +```sql +--Executed under tenant acc1 +CREATE database db1; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db1 | +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +7 rows in set (0.00 sec) + +create snapshot acc1_db_snap1 for account acc1;--Creating a Snapshot +drop database db1;--Delete database db1 + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +6 rows in set (0.01 sec) + +restore account acc1 database db1 FROM snapshot 
acc1_db_snap1;--Recovering database-level snapshots + +mysql> show databases;--Successful recovery ++--------------------+ +| Database | ++--------------------+ +| db1 | +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +7 rows in set (0.00 sec) +``` + +- Example 3: Restore table to this tenant + +```sql +--Executed under tenant acc1 +CREATE TABLE t1(n1 int); +INSERT INTO t1 values(1); + +mysql> SELECT * FROM t1; ++------+ +| n1 | ++------+ +| 1 | ++------+ +1 row in set (0.00 sec) + +create snapshot acc1_tab_snap1 for account acc1;--Creating a Snapshot +truncate TABLE t1;--Clear t1 + +mysql> SELECT * FROM t1; +Empty set (0.01 sec) + +restore account acc1 database db1 TABLE t1 FROM snapshot acc1_tab_snap1;--Restore Snapshot + +mysql> SELECT * FROM t1;--Successful recovery ++------+ +| n1 | ++------+ +| 1 | ++------+ +1 row in set (0.00 sec) +``` + +- Example 4: System tenant restores normal tenant to normal tenant This tenant + +```sql +--Executed under tenant acc1 +create database db1; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db1 | +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +6 rows in set (0.01 sec) + +--Execute under system tenant sys +create snapshot acc1_snap1 for account acc1;--Creating a snapshot for acc1 + +--Executed under tenant acc1 +drop database db1;--Delete database db1 + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +6 rows in set (0.01 sec) + +--Execute under system tenant sys +restore account acc1 FROM snapshot acc1_snap1 TO account acc1;--Snapshot recovery of acc1 under system tenant + +--Executed under tenant acc1 +mysql> show databases;--Successful recovery ++--------------------+ +| Database | ++--------------------+ +| db1 | +| 
information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +6 rows in set (0.01 sec) +``` + +- Example 5: System tenant restores normal tenant to new tenant + +```sql +--Executed under tenant acc1 +create database db1; + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| db1 | +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +6 rows in set (0.01 sec) + +--Execute under system tenant sys +create snapshot acc1_snap1 for account acc1;--Creating a snapshot for acc1 + +--Executed under tenant acc1 +drop database db1;--Delete db1 + +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +6 rows in set (0.01 sec) + +--Execute under system tenant sys +create account acc2 ADMIN_NAME admin IDENTIFIED BY '111';--Need to create new tenants to be targeted in advance +restore account acc1 FROM snapshot acc1_snap1 TO account acc2;--Snapshot recovery of acc1 under system tenant to acc2 + +--Executed under tenant acc1 +mysql> show databases; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +5 rows in set (0.00 sec) + +--Executed under tenant acc2 +mysql> show databases;--Revert to acc2 ++--------------------+ +| Database | ++--------------------+ +| db1 | +| information_schema | +| mo_catalog | +| mysql | +| system | +| system_metrics | ++--------------------+ +6 rows in set (0.01 sec) +``` + +## Limitations + +- Currently only tenant/database/table level recovery is supported, not clustered. + +- System tenant recovery normal tenant to new tenant allows only tenant level recovery. + +- Only system tenants can perform restore data to a new tenant, and only tenant-level restores are allowed. 
New tenants need to be created in advance, and in order to avoid object conflicts, it is best to have a new tenant. \ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/1.1-load-data.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/1.1-load-data.md deleted file mode 100644 index 0e45b972e..000000000 --- a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/1.1-load-data.md +++ /dev/null @@ -1,576 +0,0 @@ -# **LOAD DATA** - -## **Description** - -The LOAD DATA statement reads rows from a text file into a table at a very high speed. The file can be read from the server host or a [S3 compatible object storage](../../../Develop/import-data/bulk-load/load-s3.md). `LOAD DATA` is the complement of [`SELECT ... INTO OUTFILE`](../../../Develop/export-data/select-into-outfile.md). To write data from a table to a file, use `SELECT ... INTO OUTFILE`. To read the file back into a table, use LOAD DATA. The syntax of the `FIELDS` and `LINES` clauses is the same for both statements. - -## **Syntax** - -### Load external data - -``` -> LOAD DATA [LOCAL] - INFILE 'file_name' - INTO TABLE tbl_name - [{FIELDS | COLUMNS} - [TERMINATED BY 'string'] - [[OPTIONALLY] ENCLOSED BY 'char'] - ] - [LINES - [STARTING BY 'string'] - [TERMINATED BY 'string'] - ] - [IGNORE number {LINES | ROWS}] - [SET column_name_1=nullif(column_name_1, expr1), column_name_2=nullif(column_name_2, expr2)...] - [PARALLEL {'TRUE' | 'FALSE'}] -``` - -**Parameter Explanation** - -#### Input File Location - -- `LOAD DATA INFILE 'file_name'`: Indicates that the data file to be loaded is on the same machine as the MatrixOne host server. `file_name` can be the relative path name of the storage location of the file, or it can be the absolute path name. 
- -- `LOAD DATA LOCAL INFILE 'file_name'`: indicates that the data file to be loaded is not on the same machine as the MatrixOne host server; that is, the data file is on the client server. `file_name` can be the relative path name of the storage location of the file, or it can be the absolute path name. - -#### Field and Line Handling - -For both the LOAD DATA and `SELECT ... INTO OUTFILE` statements, the syntax of the FIELDS and LINES clauses is the same. Both clauses are optional, but FIELDS must precede LINES if both are specified. - -If you specify a `FIELDS` clause, each of its subclauses (`TERMINATED BY`, `[OPTIONALLY] ENCLOSED BY`) is also optional, except that you must specify at least one of them. Arguments to these clauses are permitted to contain only ASCII characters. - -If you specify no `FIELDS` or `LINES` clause, the defaults are the same as if you had written this: - -``` -FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' -``` - -!!! note - - `FIELDS TERMINATED BY ','`: with and only `,`, `|` or `\t` as delimiters. - - `ENCLOSED BY '"'`: with and only `"` as the included character. - - `LINES TERMINATED BY '\n'`: Use and only use `\n` or `\r\n` as the line separator. - -**FIELDS TERMINATED BY** - -`FIELDS TERMINATED BY` specifies the delimiter for a field. The `FIELDS TERMINATED BY` values can be more than one character. - -For example, to read the comma-delimited file, the correct statement is: - -``` -LOAD DATA INFILE 'data.txt' INTO TABLE table1 - FIELDS TERMINATED BY ','; -``` - -If instead you tried to read the file with the statement shown following, it would not work because it instructs `LOAD DATA` to look for tabs between fields: - -``` -LOAD DATA INFILE 'data.txt' INTO TABLE table1 - FIELDS TERMINATED BY '\t'; -``` - -The likely result is that each input line would be interpreted as a single field. You may encounter an error of `"ERROR 20101 (HY000): internal error: the table column is larger than input data column"`. 
- -**FIELDS ENCLOSED BY** - -`FIELDS TERMINATED BY` option specifies the character enclose the input values. `ENCLOSED BY` value must be a single character. If the input values are not necessarily enclosed within quotation marks, use `OPTIONALLY` before the `ENCLOSED BY` option. - -For example, if some input values are enclosed within quotation marks, some are not: - -``` -LOAD DATA INFILE 'data.txt' INTO TABLE table1 - FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'; -``` - -**LINES TERMINATED BY** - -`LINES TERMINATED BY` specifies the delimiter for the a line. The `LINES TERMINATED BY` values can be more than one character. - -For example, if the lines in a csv file are terminated by carriage return/newline pairs, you can load it with `LINES TERMINATED BY '\r\n'`: - -``` -LOAD DATA INFILE 'data.txt' INTO TABLE table1 - FIELDS TERMINATED BY ',' ENCLOSED BY '"' - LINES TERMINATED BY '\r\n'; -``` - -**LINE STARTING BY** - -If all the input lines have a common prefix that you want to ignore, you can use `LINES STARTING BY` 'prefix_string' to skip the prefix and anything before it. If a line does not include the prefix, the entire line is skipped. Suppose that you issue the following statement: - -``` -LOAD DATA INFILE '/tmp/test.txt' INTO TABLE table1 - FIELDS TERMINATED BY ',' LINES STARTING BY 'xxx'; -``` - -If the data file looks like this: - -``` -xxx"abc",1 -something xxx"def",2 -"ghi",3 -``` - -The resulting rows are ("abc",1) and ("def",2). The third row in the file is skipped because it does not contain the prefix. - -#### IGNORE LINES - -The IGNORE number LINES clause can be used to ignore lines at the start of the file. For example, you can use `IGNORE 1 LINES` to skip an initial header line containing column names: - -``` -LOAD DATA INFILE '/tmp/test.txt' INTO TABLE table1 IGNORE 1 LINES; -``` - -#### SET - -MatrixOne only supports `SET column_name=nullif(column_name,expr)`. 
That is, when `column_name = expr`, it returns `NULL`; otherwise, it returns the original value of `column_name`. For example, `SET a=nullif(a,1)`, if a=1, returns `NULL`; otherwise, it returns the original value of column a. - -By setting the parameter, you can use `SET column_name=nullif(column_name,"null")` to return the `NULL` value in the column when loading the file. - -**Example** - -1. The details of the local file `test.txt` are as follows: - - ``` - id,user_name,sex - 1,"weder","man" - 2,"tom","man" - null,wederTom,"man" - ``` - -2. Create a table named `user` in MatrixOne: - - ```sql - create database aaa; - use aaa; - CREATE TABLE `user` (`id` int(11) ,`user_name` varchar(255) ,`sex` varchar(255)); - ``` - -3. Load `test.txt` into the table `user`: - - ```sql - LOAD DATA INFILE '/tmp/test.txt' INTO TABLE user SET id=nullif(id,"null"); - ``` - -4. The result of the talbe is as below: - - ```sql - select * from user; - +------+-----------+------+ - | id | user_name | sex | - +------+-----------+------+ - | 1 | weder | man | - | 2 | tom | man | - | null | wederTom | man | - +------+-----------+------+ - ``` - -#### PARALLEL - -For a sizeable well-formed file, such as a *JSOLLines* file or a *CSV* file with no line breaks in a line of data, you can use `PARALLEL` to load the file in parallel to speed up the loading. - -For example, for a large file of 2 G, use two threads to load; the second thread first splits and locates the 1G position, then reads and loads backward. In this way, two threads can read large files at the same time, and each thread can read 1G of data. 
- -**Enable/Disable Parallel Loading Command Line Example**: - -```sql --- Enable Parallel Loading -load data infile 'file_name' into table tbl_name FIELDS TERMINATED BY '|' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES PARALLEL 'TRUE'; - --- Disable Parallel Loading -load data infile 'file_name' into table tbl_name FIELDS TERMINATED BY '|' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES PARALLEL 'FALSE'; - --- Parallel loading is disabled by default -load data infile 'file_name' into table tbl_name FIELDS TERMINATED BY '|' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES; -``` - -!!! note - `[PARALLEL {'TRUE' | 'FALSE'}]` currently only support `TRUE` or `FALSE` and are not case-sensitive. - -__Note:__ If the `PARALLEL` field is not added in the `LOAD` statement, for *CSV* files, parallel loading is disabled by default; for *JSOLLines* files, parallel loading is enabled by default. If there is a line terminator in the *CSV* file, such as '\n', otherwise it may cause data errors when the file is loaded. If the file is too large, manually splitting the file from the '\n' as the starting and ending point is recommended, then enabling parallel loading. - -### Load Time-Series Data - -``` -> LOAD DATA INLINE FORMAT='' DATA='' -INTO TABLE tbl_name - [{FIELDS | COLUMNS} - [TERMINATED BY 'string'] - [[OPTIONALLY] ENCLOSED BY 'char'] - ] - [LINES - [STARTING BY 'string'] - [TERMINATED BY 'string'] - ] - [IGNORE number {LINES | ROWS}] -``` - -**Parameter Explanation** - -The SQL command `LOAD DATA INLINE` for loading time-series data has the following parameter explanations: - -- `FORMAT`: Specifies the format of the time-series data being loaded; for example, `FORMAT='csv'` indicates that the data being loaded is in CSV format. It supports the same formats as `LOAD DATA INFILE`. - -- `DATA`: Specifies the actual time-series data to be loaded. 
In the example, `DATA='1\n2\n'` indicates that the data to be loaded consists of two lines containing the numbers 1 and 2. - -!!! note - Parameters such as `FIELDS`, `COLUMNS`, `TERMINATED BY`, `ENCLOSED BY`, `LINES`, `STARTING BY`, and `IGNORE` can be referred to the parameter explanations of `LOAD DATA INFILE` as mentioned earlier. - -Example command for loading time-series data: `load data inline format='csv', data='1\n2\n' into table t1;`. This command loads the specified data in CSV format (which includes two lines, namely 1 and 2) into a database table named `t1`. - -## Supported file formats - -In MatrixOne's current release, `LOAD DATA` supports CSV(comma-separated values) format and JSONLines format file. -See full tutorials for loading [csv](../../../Develop/import-data/bulk-load/load-csv.md) and [jsonline](../../../Develop/import-data/bulk-load/load-jsonline.md). - -### *CSV* format standard description - -The *CSV* format loaded by MatrixOne conforms to the RFC4180 standard, and the *CSV* format is specified as follows: - -1. Each record is on a separate line, separated by a newline character (CRLF): - - ``` - aaa,bbb,ccc CRLF - zzz,yyy,xxx CRLF - ``` - - Imported into the table as follows: - - +---------+---------+---------+ - | col1 | col2 | col3 | - +---------+---------+---------+ - | aaa | b bb | ccc | - | zzz | yyy | xxx | - +---------+---------+---------+ - -2. The last record in the file can have a terminating newline or no terminating newline (CRLF): - - ``` - aaa,bbb,ccc CRLF - zzz,yyy,xxx - ``` - - Imported into the table as follows: - - +---------+---------+---------+ - | col1 | col2 | col3 | - +---------+---------+---------+ - | aaa | b bb | ccc | - | zzz | yyy | xxx | - +---------+---------+---------+ - -3. An optional header line appears as the first line of the file and has the same format as a standard record line. 
For example: - - ``` - field_name,field_name,field_name CRLF - aaa,bbb,ccc CRLF - zzz,yyy,xxx CRLF - ``` - - Imported into the table as follows: - - +------------+------------+------------+ - | field_name | field_name | field_name | - +------------+------------+------------+ - | aaa | bbb | ccc | - | zzz | yyy | xxx | - +------------+------------+------------+ - -4. In the header and each record, there may be one or more fields separated by commas. Whitespace within a field is part of the field and should not be ignored. A comma cannot follow the last field in each record. For example: - - ``` - aaa,bbb,ccc - ``` - - Or: - - ``` - a aa, bbb,cc c - ``` - - Both examples are correct. - - Imported into the table as follows: - - +---------+---------+---------+ - | col1 | col2 | col3 | - +---------+---------+---------+ - | aaa | bbb | ccc | - +---------+---------+---------+ - - Or: - - +---------+---------+---------+ - | col1 | col2 | col3 | - +---------+---------+---------+ - | a aa | bbb | cc c | - +---------+---------+---------+ - -5. Each field can be enclosed in double quotes or not. Double quotes cannot appear inside a field if the field is not enclosed in double-quotes. For example: - - ``` - "aaa","bbb","ccc" CRLF - zzz,yyy,xxx - ``` - - Or: - - ``` - "aaa","bbb",ccc CRLF - zzz,yyy,xxx - ``` - - Both examples are correct. - - Imported into the table as follows: - - +---------+---------+---------+ - | col1 | col2 | col3 | - +---------+---------+---------+ - | aaa | bbb | ccc | - | zzz | yyy | xxx | - +---------+---------+---------+ - -6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example: - - ``` - "aaa","b CRLF - bb","ccc" CRLF - zzz,yyy,xxx - ``` - - Imported into the table as follows: - - +---------+---------+---------+ - | col1 | col2 | col3 | - +---------+---------+---------+ - | aaa | b bb | ccc | - | zzz | yyy | xxx | - +---------+---------+---------+ - -7. 
If double quotation marks are used to enclose the field, then multiple double quotation marks appearing in the field must also be enclosed in double quotation marks; otherwise, the first quotation mark of two double quotation marks in the field will be parsed as an escape character, thus keep a single, double quote. For example: - - ``` - "aaa","b","bb","ccc" - ``` - - The above *CSV* will parse `"b""bb"` into `b"bb`; if the correct field is `b""bb`, then it should be written as: - - ``` - "aaa","b""""bb","ccc" - ``` - - Or: - - ``` - "aaa",b""bb,"ccc" - ``` - - Imported into the table as follows: - - +---------+---------+---------+ - | col1 | col2 | col3 | - +---------+---------+---------+ - | aaa | b""bb | ccc | - +---------+---------+---------+ - -## **Examples** - -The SSB Test is an example of LOAD DATA syntax. [Complete a SSB Test with MatrixOne](../../../Test/performance-testing/SSB-test-with-matrixone.md) - -``` -> LOAD DATA INFILE '/ssb-dbgen-path/lineorder_flat.tbl ' INTO TABLE lineorder_flat; -``` - -The above statement means: load the *lineorder_flat.tbl* data set under the directory path */ssb-dbgen-path/* into the MatrixOne data table *lineorder_flat*. 
- -You can also refer to the following syntax examples to quickly understand `LOAD DATA`: - -### Example 1: LOAD CSV - -#### Simple example - -The data in the file locally named *char_varchar.csv* is as follows: - -``` -a|b|c|d -"a"|"b"|"c"|"d" -'a'|'b'|'c'|'d' -"'a'"|"'b'"|"'c'"|"'d'" -"aa|aa"|"bb|bb"|"cc|cc"|"dd|dd" -"aa|"|"bb|"|"cc|"|"dd|" -"aa|||aa"|"bb|||bb"|"cc|||cc"|"dd|||dd" -"aa'|'||aa"|"bb'|'||bb"|"cc'|'||cc"|"dd'|'||dd" -aa"aa|bb"bb|cc"cc|dd"dd -"aa"aa"|"bb"bb"|"cc"cc"|"dd"dd" -"aa""aa"|"bb""bb"|"cc""cc"|"dd""dd" -"aa"""aa"|"bb"""bb"|"cc"""cc"|"dd"""dd" -"aa""""aa"|"bb""""bb"|"cc""""cc"|"dd""""dd" -"aa""|aa"|"bb""|bb"|"cc""|cc"|"dd""|dd" -"aa""""|aa"|"bb""""|bb"|"cc""""|cc"|"dd""""|dd" -||| -|||| -""|""|""| -""""|""""|""""|"""" -""""""|""""""|""""""|"""""" -``` - -Create a table named t1 in MatrixOne: - -```sql -mysql> drop table if exists t1; -Query OK, 0 rows affected (0.01 sec) - -mysql> create table t1( - -> col1 char(225), - -> col2 varchar(225), - -> col3 text, - -> col4 varchar(225) - -> ); -Query OK, 0 rows affected (0.02 sec) -``` - -Load the data file into table t1: - -```sql -load data infile '/char_varchar.csv' into table t1 fields terminated by'|'; -``` - -The query result is as follows: - -``` -mysql> select * from t1; -+-----------+-----------+-----------+-----------+ -| col1 | col2 | col3 | col4 | -+-----------+-----------+-----------+-----------+ -| a | b | c | d | -| a | b | c | d | -| 'a' | 'b' | 'c' | 'd' | -| 'a' | 'b' | 'c' | 'd' | -| aa|aa | bb|bb | cc|cc | dd|dd | -| aa| | bb| | cc| | dd| | -| aa|||aa | bb|||bb | cc|||cc | dd|||dd | -| aa'|'||aa | bb'|'||bb | cc'|'||cc | dd'|'||dd | -| aa"aa | bb"bb | cc"cc | dd"dd | -| aa"aa | bb"bb | cc"cc | dd"dd | -| aa"aa | bb"bb | cc"cc | dd"dd | -| aa""aa | bb""bb | cc""cc | dd""dd | -| aa""aa | bb""bb | cc""cc | dd""dd | -| aa"|aa | bb"|bb | cc"|cc | dd"|dd | -| aa""|aa | bb""|bb | cc""|cc | dd""|dd | -| | | | | -| | | | | -| | | | | -| " | " | " | " | -| "" | "" | "" | "" | 
-+-----------+-----------+-----------+-----------+ -20 rows in set (0.00 sec) -``` - -### Add conditional Example - -Following the example above, you can modify the `LOAD DATA` statement and add `LINES STARTING BY 'aa' ignore 10 lines;` at the end of the statement to experience the difference: - -```sql -delete from t1; -load data infile '/char_varchar.csv' into table t1 fields terminated by'|' LINES STARTING BY 'aa' ignore 10 lines; -``` - -The query result is as follows: - -```sql -mysql> select * from t1; -+---------+---------+---------+---------+ -| col1 | col2 | col3 | col4 | -+---------+---------+---------+---------+ -| aa"aa | bb"bb | cc"cc | dd"dd | -| aa""aa | bb""bb | cc""cc | dd""dd | -| aa""aa | bb""bb | cc""cc | dd""dd | -| aa"|aa | bb"|bb | cc"|cc | dd"|dd | -| aa""|aa | bb""|bb | cc""|cc | dd""|dd | -| | | | | -| | | | | -| | | | | -| " | " | " | " | -| "" | "" | "" | "" | -+---------+---------+---------+---------+ -10 rows in set (0.00 sec) -``` - -As you can see, the query result ignores the first line and -and ignores the common prefix aa. - -For more information on loding *csv*, see [Import the *.csv* data](../../../Develop/import-data/bulk-load/load-csv.md). 
- -### Example 2: LOAD JSONLines - -#### Simple example - -The data in the file locally named *jsonline_array.jl* is as follows: - -``` -[true,1,"var","2020-09-07","2020-09-07 00:00:00","2020-09-07 00:00:00","18",121.11,["1",2,null,false,true,{"q":1}],"1qaz",null,null] -["true","1","var","2020-09-07","2020-09-07 00:00:00","2020-09-07 00:00:00","18","121.11",{"c":1,"b":["a","b",{"q":4}]},"1aza",null,null] -``` - -Create a table named t1 in MatrixOne: - -```sql -mysql> drop table if exists t1; -Query OK, 0 rows affected (0.01 sec) - -mysql> create table t1(col1 bool,col2 int,col3 varchar(100), col4 date,col5 datetime,col6 timestamp,col7 decimal,col8 float,col9 json,col10 text,col11 json,col12 bool); -Query OK, 0 rows affected (0.03 sec) -``` - -Load the data file into table t1: - -``` -load data infile {'filepath'='/jsonline_array.jl','format'='jsonline','jsondata'='array'} into table t1; -``` - -The query result is as follows: - -```sql -mysql> select * from t1; -+------+------+------+------------+---------------------+---------------------+------+--------+---------------------------------------+-------+-------+-------+ -| col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | col10 | col11 | col12 | -+------+------+------+------------+---------------------+---------------------+------+--------+---------------------------------------+-------+-------+-------+ -| true | 1 | var | 2020-09-07 | 2020-09-07 00:00:00 | 2020-09-07 00:00:00 | 18 | 121.11 | ["1", 2, null, false, true, {"q": 1}] | 1qaz | NULL | NULL | -| true | 1 | var | 2020-09-07 | 2020-09-07 00:00:00 | 2020-09-07 00:00:00 | 18 | 121.11 | {"b": ["a", "b", {"q": 4}], "c": 1} | 1aza | NULL | NULL | -+------+------+------+------------+---------------------+---------------------+------+--------+---------------------------------------+-------+-------+-------+ -2 rows in set (0.00 sec) -``` - -#### Add conditional Example - -Following the example above, you can modify the `LOAD DATA` statement and add 
`ignore 1 lines` at the end of the statement to experience the difference: - -``` -delete from t1; -load data infile {'filepath'='/jsonline_array.jl','format'='jsonline','jsondata'='array'} into table t1 ignore 1 lines; -``` - -The query result is as follows: - -```sql -mysql> select * from t1; -+------+------+------+------------+---------------------+---------------------+------+--------+-------------------------------------+-------+-------+-------+ -| col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | col10 | col11 | col12 | -+------+------+------+------------+---------------------+---------------------+------+--------+-------------------------------------+-------+-------+-------+ -| true | 1 | var | 2020-09-07 | 2020-09-07 00:00:00 | 2020-09-07 00:00:00 | 18 | 121.11 | {"b": ["a", "b", {"q": 4}], "c": 1} | 1aza | NULL | NULL | -+------+------+------+------------+---------------------+---------------------+------+--------+-------------------------------------+-------+-------+-------+ -1 row in set (0.00 sec) -``` - -As you can see, the query result ignores the first line. - -For more information on loding *JSONLines*, see [Import the JSONLines data](../../../Develop/import-data/bulk-load/load-jsonline.md). - -## **Constraints** - -1. The `REPLACE` and `IGNORE` modifiers control handling of new (input) rows that duplicate existing table rows on unique key values (`PRIMARY KEY` or `UNIQUE index` values) are not supported in MatrixOne yet. -2. Input pre-pressing with `SET` is supported very limitedly. Only `SET columns_name=nullif(expr1,expr2)` is supported. -3. When enabling the parallel loading, it must be ensured that each row of data in the file does not contain the specified line terminator, such as '\n'; otherwise, it will cause data errors during file loading. -4. The parallel loading of files requires that the files be in uncompressed format, and parallel loading of files in compressed form is not currently supported. -5. 
When you use `load data local`, you need to use the command line to connect to the MatrixOne service host: `mysql -h -P 6001 -uroot -p111 --local-infile`. -6. MatrixOne does not support `ESCAPED BY` currently. Writing or reading special characters differs from MySQL to some extent. diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/insert-on-duplicate.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/insert-on-duplicate.md deleted file mode 100644 index 0b3f4fbca..000000000 --- a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/insert-on-duplicate.md +++ /dev/null @@ -1,66 +0,0 @@ -# **INSERT ... ON DUPLICATE KEY UPDATE** - -## **Description** - -`INSERT ... ON DUPLICATE KEY UPDATE` is used to insert data into the database table; if the data already exists, update the data; otherwise, insert new data. - -The `INSERT INTO` statement is a standard statement used to insert data into a database table; the `ON DUPLICATE KEY UPDATE` statement performs an update operation when there are duplicate records in the table. If a record with the same unique index or primary key exists in the table, use the `UPDATE` clause to update the corresponding column value; otherwise, use the `INSERT` clause to insert a new record. - -It should be noted that the premise of using this syntax is that a primary key constraint needs to be established in the table to determine whether there are duplicate records. At the same time, both the update operation and the insert operation need to set the corresponding column value. Otherwise, a syntax error will result. - -## **Syntax** - -``` -> INSERT INTO [db.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ... 
- [ON DUPLICATE KEY UPDATE column1 = value1, column2 = value2, column3 = value3, ...]; -``` - -## **Examples** - -```sql -CREATE TABLE user ( - id INT(11) NOT NULL PRIMARY KEY, - name VARCHAR(50) NOT NULL, - age INT(3) NOT NULL -); --- Insert a new data; it does not exist, insert the new data -INSERT INTO user (id, name, age) VALUES (1, 'Tom', 18) -ON DUPLICATE KEY UPDATE name='Tom', age=18; - -mysql> select * from user; -+------+------+------+ -| id | name | age | -+------+------+------+ -| 1 | Tom | 18 | -+------+------+------+ -1 row in set (0.01 sec) - --- Increment the age field of an existing record by 1 while keeping the name field unchanged -INSERT INTO user (id, name, age) VALUES (1, 'Tom', 18) -ON DUPLICATE KEY UPDATE age=age+1; - -mysql> select * from user; -+------+------+------+ -| id | name | age | -+------+------+------+ -| 1 | Tom | 19 | -+------+------+------+ -1 row in set (0.00 sec) - --- Insert a new record, and update the name and age fields to the specified values -INSERT INTO user (id, name, age) VALUES (2, 'Lucy', 20) -ON DUPLICATE KEY UPDATE name='Lucy', age=20; - -mysql> select * from user; -+------+------+------+ -| id | name | age | -+------+------+------+ -| 1 | Tom | 19 | -| 2 | Lucy | 20 | -+------+------+------+ -2 rows in set (0.01 sec) -``` - -## **Constraints** - -Unique key are not currently supported with `INSERT ... ON DUPLICATE KEY UPDATE`, and since unique key can be null, some unknown errors can occur. diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/load-data-inline.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/load-data-inline.md new file mode 100644 index 000000000..439ca2cd3 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/load-data-inline.md @@ -0,0 +1,57 @@ +# **LOAD DATA INLINE** + +## **Overview** + +The `LOAD DATA INLINE` syntax imports strings organized in *csv* format into a data table, faster than `INSERT INTO` operations. 
`LOAD DATA INLINE` is well suited to fast streaming writes of data without a primary key, as in IoT scenarios. + +## Syntax structure + +```mysql +mysql> LOAD DATA INLINE +FORMAT='csv' , +DATA=$XXX$ +csv_string $XXX$ +INTO TABLE tbl_name; +``` + +**Parameter explanation** + +`FORMAT='csv'` indicates that the string data in the following `DATA` is organized in `csv` format. + +`$XXX$` in `DATA=$XXX$ csv_string $XXX$` marks the beginning and end of the data. `csv_string` is string data organized in `csv` format, with `\n` or `\r\n` as the newline character. + +!!! note + `$XXX$` marks the beginning and end of the data. Note that the closing `$XXX$` must be on the same line as the last line of data; placing it on a new line may cause `ERROR 20101`. + +### Example: Importing data using `LOAD DATA INLINE` + +1. Start the MySQL client and connect to MatrixOne: + + ```mysql + mysql -h 127.0.0.1 -P 6001 -uroot -p111 + ``` + + !!! note + The login account in the above code section is the initial account. Please change the initial password promptly after logging into MatrixOne; see [Password Management](../../../Security/password-mgmt.md). + +2. Before executing `LOAD DATA INLINE` in MatrixOne, you need to create the target table `user` in MatrixOne in advance: + + ```mysql + + CREATE TABLE `user` ( + `name` VARCHAR(255) DEFAULT null, + `age` INT DEFAULT null, + `city` VARCHAR(255) DEFAULT null + ) + ``` + +3.
Execute `LOAD DATA INLINE` in the MySQL client to import data in *csv* format: + + ```mysql + mysql> LOAD DATA INLINE + FORMAT='csv', + DATA=$XXX$ + Lihua,23,Shanghai + Bob,25,Beijing $XXX$ + INTO TABLE user; + ``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/insert-ignore.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/insert-ignore.md new file mode 100644 index 000000000..da7b6dbcd --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/insert-ignore.md @@ -0,0 +1,52 @@ +# INSERT IGNORE + +## Syntax Description + +`INSERT IGNORE` inserts data into a database table; if a row with the same unique index or primary key already exists, the new row is ignored instead of causing an error, otherwise the new row is inserted. + +Unlike MySQL, MatrixOne silently ignores the error when a duplicate value is inserted for a unique index or primary key, whereas MySQL emits a warning message. + +## Syntax structure + +``` +> INSERT IGNORE INTO [db.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...; +``` + +## Examples + +```sql +CREATE TABLE user ( + id INT(11) NOT NULL PRIMARY KEY, + name VARCHAR(50) NOT NULL, + age INT(3) NOT NULL +); +-- Insert a new row; the id does not exist, so the new row is inserted. +mysql> INSERT IGNORE INTO user VALUES (1, 'Tom', 18); +Query OK, 0 rows affected (0.02 sec) + +mysql> SELECT * FROM USER; ++------+------+------+ +| id | name | age | ++------+------+------+ +| 1 | Tom | 18 | ++------+------+------+ +1 row in set (0.01 sec) + +-- Insert a new row; the id already exists, so the row is ignored.
+mysql> INSERT IGNORE INTO user VALUES (1, 'Jane', 16); +Query OK, 0 rows affected (0.00 sec) + +mysql> SELECT * FROM USER; ++------+------+------+ +| id | name | age | ++------+------+------+ +| 1 | Tom | 18 | ++------+------+------+ +1 row in set (0.01 sec) +``` + +## Limitations + +- `INSERT IGNORE` does not support writing `NULL` to `NOT NULL` columns. +- `INSERT IGNORE` does not support incorrect data type conversions. +- `INSERT IGNORE` does not support handling operations where inserted data in a partition table contains mismatched partition values. \ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/insert-on-duplicate.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/insert-on-duplicate.md new file mode 100644 index 000000000..cffe0da6f --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/insert-on-duplicate.md @@ -0,0 +1,66 @@ +# **INSERT ... ON DUPLICATE KEY UPDATE** + +## **Grammar description** + +`INSERT ... ON DUPLICATE KEY UPDATE` inserts data into a database table; if the data already exists, it is updated, otherwise the new data is inserted. + +The `INSERT INTO` statement is the standard statement used to insert data into a database table; the `ON DUPLICATE KEY UPDATE` clause performs an update when there are duplicate records in the table. If a record with the same unique index or primary key exists in the table, the `UPDATE` clause updates the corresponding column values; otherwise the `INSERT` clause inserts a new record. + +Note that this syntax requires a primary key constraint on the table so that duplicate records can be detected. In addition, both the update and the insert operation must set the corresponding column values, otherwise a syntax error results.
+ +## **Grammar structure** + +``` +> INSERT INTO [db.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ... + [ON DUPLICATE KEY UPDATE column1 = value1, column2 = value2, column3 = value3, ...]; +``` + +## **Examples** + +```sql +CREATE TABLE user ( + id INT(11) NOT NULL PRIMARY KEY, + name VARCHAR(50) NOT NULL, + age INT(3) NOT NULL +); +-- Insert a new row; the id does not exist, so the new row is inserted. +INSERT INTO user (id, name, age) VALUES (1, 'Tom', 18) +ON DUPLICATE KEY UPDATE name='Tom', age=18; + +mysql> select * from user; ++------+------+------+ +| id | name | age | ++------+------+------+ +| 1 | Tom | 18 | ++------+------+------+ +1 row in set (0.01 sec) + +-- Increase the age field of the existing record by 1 while leaving the name field unchanged. +INSERT INTO user (id, name, age) VALUES (1, 'Tom', 18) +ON DUPLICATE KEY UPDATE age=age+1; + +mysql> select * from user; ++------+------+------+ +| id | name | age | ++------+------+------+ +| 1 | Tom | 19 | ++------+------+------+ +1 row in set (0.00 sec) + +-- Insert a new row, updating the name and age fields to the specified values on conflict. +INSERT INTO user (id, name, age) VALUES (2, 'Lucy', 20) +ON DUPLICATE KEY UPDATE name='Lucy', age=20; + +mysql> select * from user; ++------+------+------+ +| id | name | age | ++------+------+------+ +| 1 | Tom | 19 | +| 2 | Lucy | 20 | ++------+------+------+ +2 rows in set (0.01 sec) +``` + +## **Restrictions** + +`INSERT ... ON DUPLICATE KEY UPDATE` does not currently support unique keys; because a unique key can be `NULL`, using one may cause unknown errors.
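To make the insert-or-update behavior concrete, here is a minimal Python sketch of the same semantics. It is an illustrative model only (a dict keyed on the primary key stands in for the table; the `upsert` helper is an assumption of this sketch, not MatrixOne code):

```python
# Illustrative model of INSERT ... ON DUPLICATE KEY UPDATE semantics:
# the "table" is a dict keyed on the primary key; on a key conflict only
# the UPDATE assignments are applied, otherwise a plain insert happens.
def upsert(table, row, on_duplicate):
    key = row["id"]
    if key in table:
        # Duplicate primary key: apply the UPDATE assignments in place.
        current = table[key]
        for column, compute in on_duplicate.items():
            current[column] = compute(current)
    else:
        # No conflict: behave like a plain INSERT.
        table[key] = dict(row)

table = {}

# INSERT ... VALUES (1, 'Tom', 18) ON DUPLICATE KEY UPDATE name='Tom', age=18;
upsert(table, {"id": 1, "name": "Tom", "age": 18},
       {"name": lambda r: "Tom", "age": lambda r: 18})

# Same key again: ON DUPLICATE KEY UPDATE age=age+1 leaves name unchanged.
upsert(table, {"id": 1, "name": "Tom", "age": 18},
       {"age": lambda r: r["age"] + 1})

print(table[1])  # {'id': 1, 'name': 'Tom', 'age': 19}
```

Note how the second call mirrors the `age=age+1` example above: the row is not re-inserted, only the assignment runs against the existing row.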
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/replace.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/replace.md new file mode 100644 index 000000000..2d39bee16 --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/replace.md @@ -0,0 +1,153 @@ +# **REPLACE** + +## **Grammar description** + +`REPLACE` is not only a string function but also a data manipulation statement that performs a replacement operation. The `REPLACE` statement inserts data into a table; if a matching record already exists in the table, that record is deleted before the new data is inserted. If no matching record exists, the new data is inserted directly. + +`REPLACE` is typically used on tables with unique constraints. + +- The `REPLACE` statement requires that a primary key or unique index be present in the table to determine whether the same record already exists. +- When a new record is inserted with the `REPLACE` statement and a record with the same primary key or unique index already exists, the old record is deleted before the new one is inserted, so column values not specified in the new record may change from their previously stored values. + +## **Grammar structure** + +``` +REPLACE + [INTO] tbl_name + [(col_name [, col_name] ...)] + { VALUES(value_list) + | + VALUES row_constructor_list + } + +REPLACE + [INTO] tbl_name + SET assignment_list + +value: + {expr | DEFAULT} + +value_list: + value [, value] ... + +row_constructor_list: + ROW(value_list) + +assignment: + col_name = value + +assignment_list: + assignment [, assignment] ... +``` + +### Parameter interpretation + +The `REPLACE` statement is used to insert data into a table or to replace existing data. Its syntax takes two forms: one based on a column list with `VALUES`, and one based on `SET` assignments. + +The following is an explanation of each parameter: + +1. `INTO`: Optional keyword indicating which table to insert or update data into.
+
+2. `tbl_name`: Indicates the name of the table into which data is to be inserted or updated.
+
+3. `col_name`: Optional parameter indicating the column name to insert or update. In the insert form, you can specify which columns to insert by column name; in the update form, specify which columns to update.
+
+4. `value`: Indicates the value to insert or update. This can be a specific expression (`expr`) or `DEFAULT`.
+
+5. `value_list`: Represents a set of values to insert. Multiple values are separated by commas.
+
+6. (Not yet supported) `row_constructor_list`: Represents a row consisting of a set of values used for insertion. The values for each row are enclosed in parentheses and separated by commas.
+
+7. `assignment`: Represents the association of a column name with its corresponding value in the update form.
+
+8. `assignment_list`: Represents an association of multiple column names with their corresponding values in the update form. Multiple column names and values are separated by commas.
+
+!!! note
+    When using the insert form, you can insert data using the `VALUES` keyword followed by `value_list` or `row_constructor_list`: `VALUES` followed by `value_list` inserts a single row of data, while `VALUES` followed by `row_constructor_list` inserts multiple rows of data. When using the update form, use the `SET` keyword followed by `assignment_list` to specify the columns and corresponding values to update.
+
+## **Examples**
+
+```sql
+create table names(id int PRIMARY KEY,name VARCHAR(255),age int);
+
+-- Insert a row of data with id=1, name="Abby", age=24
+replace into names(id, name, age) values(1,"Abby", 24);
+mysql> select name, age from names where id = 1;
++------+------+
+| name | age  |
++------+------+
+| Abby |   24 |
++------+------+
+1 row in set (0.00 sec)
+
+mysql> select * from names;
++------+------+------+
+| id   | name | age  |
++------+------+------+
+|    1 | Abby |   24 |
++------+------+------+
+1 row in set (0.00 sec)
+
+-- Use the replace statement to update the name and age columns of the record with id=1 to the values "Bobby" and 25.
+replace into names(id, name, age) values(1,"Bobby", 25);
+
+mysql> select name, age from names where id = 1;
++-------+------+
+| name  | age  |
++-------+------+
+| Bobby |   25 |
++-------+------+
+1 row in set (0.00 sec)
+
+mysql> select * from names;
++------+-------+------+
+| id   | name  | age  |
++------+-------+------+
+|    1 | Bobby |   25 |
++------+-------+------+
+1 row in set (0.01 sec)
+
+-- Use the replace statement to insert a row with id=2, name="Ciro", and age NULL.
+replace into names set id = 2, name = "Ciro";
+
+mysql> select name, age from names where id = 2;
++------+------+
+| name | age  |
++------+------+
+| Ciro | NULL |
++------+------+
+1 row in set (0.01 sec)
+
+mysql> select * from names;
++------+-------+------+
+| id   | name  | age  |
++------+-------+------+
+|    1 | Bobby |   25 |
+|    2 | Ciro  | NULL |
++------+-------+------+
+2 rows in set (0.00 sec)
+
+-- Use the replace statement to update the name column of the record with id=2 to the value "Ciro" and the age column to the value 17
+replace into names set id = 2, name = "Ciro", age = 17;
+
+mysql> select name, age from names where id = 2;
++------+------+
+| name | age  |
++------+------+
+| Ciro |   17 |
++------+------+
+1 row in set (0.01 sec)
+
+mysql> select * from names;
++------+-------+------+
+| id   | name  | age  |
++------+-------+------+
+|    1 | Bobby |   25 |
+|    2 | Ciro  |   17 |
++------+-------+------+
+2 rows in set (0.01 sec)
+```
+
+## **Restrictions**
+
+MatrixOne does not currently support inserting rows constructed with the `VALUES row_constructor_list` parameter.
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/upsert.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/upsert.md
new file mode 100644
index 000000000..6ca8f5ab6
--- /dev/null
+++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/upsert.md
@@ -0,0 +1,51 @@
+# UPSERT
+
+## What is Upsert in SQL?
+
+`UPSERT` is one of the basic functions used to manage a database in a database management system. It is a combination of `UPDATE` and `INSERT` that allows a single statement either to insert a new piece of data into a table or to update existing data. An `UPSERT` triggers an `INSERT` operation when the record is new, and behaves like an `UPDATE` statement when the record already exists in the table.
+
+For example, we have a `student` table with the `id` column as the primary key:
+
+```sql
+> desc student;
++-------+-------------+------+------+---------+-------+---------+
+| Field | Type        | Null | Key  | Default | Extra | Comment |
++-------+-------------+------+------+---------+-------+---------+
+| id    | INT(32)     | NO   | PRI  | NULL    |       |         |
+| name  | VARCHAR(50) | YES  |      | NULL    |       |         |
++-------+-------------+------+------+---------+-------+---------+
+```
+
+We can use `upsert` when changing student information in this table. The logic goes like this:
+
+- If a student id exists in the table, update the row with new information.
+
+- If the student id does not exist in the table, add the student as a new row.
+
+The `UPSERT` command itself does not exist in MatrixOne, but `UPSERT` semantics can still be achieved. MatrixOne provides three ways to implement UPSERT operations:
+
+- [INSERT IGNORE](insert-ignore.md)
+
+- [INSERT ON DUPLICATE KEY UPDATE](insert-on-duplicate.md)
+
+- [REPLACE](replace.md)
+
+## INSERT IGNORE
+
+When we insert invalid rows into a table, the `INSERT IGNORE` statement ignores the execution error. For example, a primary key column does not allow duplicate values. When we insert a row using `INSERT` and the primary key of that row already exists in the table, the MatrixOne server generates an error and the statement fails. However, when we execute the same statement using `INSERT IGNORE`, the MatrixOne server does not generate an error.
+
+## REPLACE
+
+In some cases, we want to update data that already exists. You can use `REPLACE` at this point. When we use the `REPLACE` command, two things can happen:
+
+- If there is no corresponding record in the database, the standard `INSERT` statement is executed.
+
+- If there are corresponding records in the database, the `REPLACE` statement deletes those records before executing the standard `INSERT` statement (this happens when the primary key or unique index is duplicated).
+
+In a `REPLACE` statement, updating data is done in two steps: deleting the original record and then inserting the record you want to update.
+
+## INSERT ON DUPLICATE KEY UPDATE
+
+We've looked at two `UPSERT` commands so far, but both have some limitations. `INSERT IGNORE` simply ignores the duplicate error. `REPLACE` detects the duplicate, but it deletes the old data before adding the new data. So we still need a better solution.
+
+`INSERT ON DUPLICATE KEY UPDATE` is a better solution. It doesn't remove duplicate rows. When we use the `ON DUPLICATE KEY UPDATE` clause in a SQL statement and a row of data produces a duplicate error on the primary key or unique index, the existing data is updated instead.
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Query-Language/join/cross-join.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Query-Language/join/cross-join.md
new file mode 100644
index 000000000..a0e1fca61
--- /dev/null
+++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Query-Language/join/cross-join.md
@@ -0,0 +1,56 @@
+# **CROSS JOIN**
+
+## **Grammar description**
+
+`CROSS JOIN` is used to compute the Cartesian product of two tables, that is, to generate all combinations of the rows in the two tables.
+
+## **Grammar structure**
+
+```
+> SELECT column_list
+FROM table1
+CROSS JOIN table2;
+```
+
+## **Examples**
+
+```sql
+CREATE TABLE Colors (
+    color_id INT AUTO_INCREMENT,
+    color_name VARCHAR(50),
+    PRIMARY KEY (color_id)
+);
+
+CREATE TABLE Fruits (
+    fruit_id INT AUTO_INCREMENT,
+    fruit_name VARCHAR(50),
+    PRIMARY KEY (fruit_id)
+);
+
+INSERT INTO Colors (color_name) VALUES ('Red'), ('Green'), ('Blue');
+INSERT INTO Fruits (fruit_name) VALUES ('Apple'), ('Banana'), ('Cherry');
+
+mysql> SELECT c.color_name, f.fruit_name FROM Colors c CROSS JOIN Fruits f; -- Generate a result set with all combinations of colors and fruits
++------------+------------+
+| color_name | fruit_name |
++------------+------------+
+| Red        | Apple      |
+| Green      | Apple      |
+| Blue       | Apple      |
+| Red        | Banana     |
+| Green      | Banana     |
+| Blue       | Banana     |
+| Red        | Cherry     |
+| Green      | Cherry     |
+| Blue       | Cherry     |
++------------+------------+
+9 rows in set (0.00 sec)
+
+mysql> SELECT c.color_name,f.fruit_name FROM Colors c CROSS JOIN Fruits f WHERE c.color_name = 'Red' AND f.fruit_name = 'Apple'; -- Filter out the combination of a specific color and a specific fruit
++------------+------------+
+| color_name | fruit_name |
++------------+------------+
+| Red        | Apple      |
++------------+------------+
+1 row in set (0.01 sec)
+```
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Data-Query-Language/select.md b/docs/MatrixOne/Reference/SQL-Reference/Data-Query-Language/select.md
index dccdc89bd..2965e2cd4 100644
--- a/docs/MatrixOne/Reference/SQL-Reference/Data-Query-Language/select.md
+++ b/docs/MatrixOne/Reference/SQL-Reference/Data-Query-Language/select.md
@@ -169,3 +169,4 @@ mysql> select * from t1 order by spID asc nulls last;
 
 1. `SELECT...FOR UPDATE` currently only supports single-table queries.
 2. `INTO OUTFILE` is limitedly support.
+3. 
When the table name is `DUAL`, executing `SELECT xx FROM DUAL` directly after switching to the corresponding database (`USE DBNAME`) is not supported, but you can query the table `DUAL` by qualifying it with the database name, as in `SELECT xx FROM DBNAME.DUAL`.
\ No newline at end of file
diff --git a/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-analyze.md b/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-analyze.md
index 0db86cf7c..92e72488f 100644
--- a/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-analyze.md
+++ b/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-analyze.md
@@ -1,34 +1,36 @@
-# Obtaining Information with EXPLAIN ANALYZE
+# Get information with `EXPLAIN ANALYZE`
 
-EXPLAIN ANALYZE is a profiling tool for your queries that will show you where SQL spends time on your query and why. It will plan the query, instrument it and execute it while counting rows and measuring time spent at various points in the execution plan. When execution finishes, EXPLAIN ANALYZE will print the plan and the measurements instead of the query result.
+`EXPLAIN ANALYZE` is a profiling tool for queries that shows you where SQL spends time on a query and why. It plans the query, instruments it, and executes it, counting rows and measuring the time spent at various points in the execution plan. When execution is complete, `EXPLAIN ANALYZE` prints the plan and the measurements instead of the query result.
 
-`EXPLAIN ANALYZE`, which runs a statement and produces EXPLAIN output along with timing and additional, iterator-based, information about how the optimizer's expectations matched the actual execution. For each iterator, the following information is provided:
+`EXPLAIN ANALYZE` runs the SQL statement and produces `EXPLAIN` output along with timing and additional iterator-based information about how the optimizer's expectations matched the actual execution.
-- Estimated execution cost
+For each iterator, the following information is provided:
 
-  Some iterators are not accounted for by the cost model, and so are not included in the estimate.
+- Estimated execution cost
 
-- Estimated number of returned rows
+  Some iterators are not considered in the cost model and are therefore not included in the estimate.
 
-- Time to return first row
+- Estimated number of rows returned
 
-- Time spent executing this iterator (including child iterators, but not parent iterators), in milliseconds.
+- Time to return the first row
+
+- Time, in milliseconds, spent executing this iterator (including child iterators, but not parent iterators).
 
 - Number of rows returned by the iterator
 
-- Number of loops
+- Number of loops
 
-The query execution information is displayed using the TREE output format, in which nodes represent iterators. `EXPLAIN ANALYZE` always uses the TREE output format, also can optionally be specified explicitly using FORMAT=TREE; formats other than TREE remain unsupported.
+Query execution information is displayed using the `TREE` output format, where nodes represent iterators. `EXPLAIN ANALYZE` always uses the `TREE` output format.
 
-`EXPLAIN ANALYZE` can be used with `SELECT` statements, as well as with multi-table `UPDATE` and `DELETE` statements.
+`EXPLAIN ANALYZE` can be used with `SELECT` statements or with multi-table `UPDATE` and `DELETE` statements.
 
-You can terminate this statement using `KILL QUERY` or `CTRL-C`.
+You can use `KILL QUERY` or `CTRL-C` to terminate this statement.
 
-`EXPLAIN ANALYZE` cannot be used with FOR CONNECTION.
+`EXPLAIN ANALYZE` cannot be used with `FOR CONNECTION`.
-## Example
+## Examples
 
-**Create table**
+**Create a table**
 
 ```sql
 CREATE TABLE t1 (
@@ -47,57 +49,55 @@ CREATE TABLE t3 (
 );
 ```
 
-**Example output**:
+**Example output**:
 
 ```sql
-> EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON (t1.c1 = t2.c2)\G
+mysql> EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON (t1.c1 = t2.c2)\G
 *************************** 1. row ***************************
 QUERY PLAN: Project
 *************************** 2. row ***************************
-QUERY PLAN: Analyze: timeConsumed=0us inputRows=0 outputRows=0 inputSize=0bytes outputSize=0bytes memorySize=0bytes
+QUERY PLAN: Analyze: timeConsumed=0ms waitTime=0ms inputRows=0 outputRows=0 InputSize=0bytes OutputSize=0bytes MemorySize=0bytes
 *************************** 3. row ***************************
 QUERY PLAN: -> Join
 *************************** 4. row ***************************
-QUERY PLAN: Analyze: timeConsumed=5053us inputRows=0 outputRows=0 inputSize=0bytes outputSize=0bytes memorySize=0bytes
+QUERY PLAN: Analyze: timeConsumed=0ms waitTime=0ms inputRows=0 outputRows=0 InputSize=0bytes OutputSize=0bytes MemorySize=16441bytes
 *************************** 5. row ***************************
 QUERY PLAN: Join Type: INNER
 *************************** 6. row ***************************
 QUERY PLAN: Join Cond: (t1.c1 = t2.c2)
 *************************** 7. row ***************************
-QUERY PLAN: -> Table Scan on aaa.t1
+QUERY PLAN: -> Table Scan on tpch.t1
 *************************** 8. row ***************************
-QUERY PLAN: Analyze: timeConsumed=2176us inputRows=0 outputRows=0 inputSize=0bytes outputSize=0bytes memorySize=0bytes
+QUERY PLAN: Analyze: timeConsumed=0ms waitTime=0ms inputRows=0 outputRows=0 InputSize=0bytes OutputSize=0bytes MemorySize=0bytes
 *************************** 9. row ***************************
-QUERY PLAN: -> Table Scan on aaa.t2
+QUERY PLAN: -> Table Scan on tpch.t2
 *************************** 10.
row *************************** -QUERY PLAN: Analyze: timeConsumed=0us inputRows=0 outputRows=0 inputSize=0bytes outputSize=0bytes memorySize=0bytes +QUERY PLAN: Analyze: timeConsumed=0ms waitTime=0ms inputRows=0 outputRows=0 InputSize=0bytes OutputSize=0bytes MemorySize=0bytes 10 rows in set (0.00 sec) > EXPLAIN ANALYZE SELECT * FROM t3 WHERE i > 8\G *************************** 1. row *************************** QUERY PLAN: Project *************************** 2. row *************************** -QUERY PLAN: Analyze: timeConsumed=0us inputRows=0 outputRows=0 inputSize=0bytes outputSize=0bytes memorySize=0bytes +QUERY PLAN: Analyze: timeConsumed=0ms waitTime=0ms inputRows=0 outputRows=0 InputSize=0bytes OutputSize=0bytes MemorySize=0bytes *************************** 3. row *************************** -QUERY PLAN: -> Table Scan on aaa.t3 +QUERY PLAN: -> Table Scan on tpch.t3 *************************** 4. row *************************** -QUERY PLAN: Analyze: timeConsumed=154us inputRows=0 outputRows=0 inputSize=0bytes outputSize=0bytes memorySize=0bytes +QUERY PLAN: Analyze: timeConsumed=0ms waitTime=0ms inputRows=0 outputRows=0 InputSize=0bytes OutputSize=0bytes MemorySize=0bytes *************************** 5. row *************************** -QUERY PLAN: Filter Cond: (CAST(t3.i AS BIGINT) > 8) +QUERY PLAN: Filter Cond: (t3.i > 8) 5 rows in set (0.00 sec) > EXPLAIN ANALYZE SELECT * FROM t3 WHERE pk > 17\G *************************** 1. row *************************** QUERY PLAN: Project *************************** 2. row *************************** -QUERY PLAN: Analyze: timeConsumed=0us inputRows=0 outputRows=0 inputSize=0bytes outputSize=0bytes memorySize=0bytes +QUERY PLAN: Analyze: timeConsumed=0ms waitTime=0ms inputRows=0 outputRows=0 InputSize=0bytes OutputSize=0bytes MemorySize=0bytes *************************** 3. row *************************** -QUERY PLAN: -> Table Scan on aaa.t3 +QUERY PLAN: -> Table Scan on tpch.t3 *************************** 4. 
row *************************** -QUERY PLAN: Analyze: timeConsumed=309us inputRows=0 outputRows=0 inputSize=0bytes outputSize=0bytes memorySize=0bytes +QUERY PLAN: Analyze: timeConsumed=0ms waitTime=0ms inputRows=0 outputRows=0 InputSize=0bytes OutputSize=0bytes MemorySize=0bytes *************************** 5. row *************************** -QUERY PLAN: Filter Cond: (CAST(t3.pk AS BIGINT) > 17) -5 rows in set (0.00 sec) +QUERY PLAN: Filter Cond: (t3.pk > 17) +5 rows in set (0.01 sec) ``` - -Values shown for actual time in the output of this statement are expressed in milliseconds. diff --git a/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-prepared.md b/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-prepared.md new file mode 100644 index 000000000..4d7f701cf --- /dev/null +++ b/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-prepared.md @@ -0,0 +1,106 @@ +# EXPLAIN PREPARED + +## Syntax Description + +In MatrixOne, EXPLAIN is a command to get an execution plan for a SQL query, and PREPARE is a command to create a prepared statement. Using these two commands together provides the following advantages: + +- Performance tuning: By looking at the execution plan, you can understand the efficiency of queries and identify potential performance bottlenecks. + +- Security: Because PREPARE separates the structure and data of SQL statements, it helps prevent SQL injection attacks. + +- Reuse: Prepared statements can be reused, which is useful when you need to execute the same query multiple times but with different parameters. 
+ +## Syntax structure + +``` +PREPARE stmt_name FROM preparable_stmt +``` + +``` +EXPLAIN + +where option can be one of: + ANALYZE [ boolean ] + VERBOSE [ boolean ] + (FORMAT=TEXT) + +FORCE EXECUTE stmt_name +``` + +## Examples + +**Example 1** + +```sql +create table t1(n1 int); +insert into t1 values(1); +prepare st_t1 from 'select * from t1'; + +mysql> explain force execute st_t1; ++----------------------------+ +| QUERY PLAN | ++----------------------------+ +| Project | +| -> Table Scan on db1.t1 | ++----------------------------+ +2 rows in set (0.01 sec) +``` + +**Example 2** + +```sql +create table t2 (col1 int, col2 decimal); +insert into t2 values (1,2); +prepare st from 'select * from t2 where col1 = ?'; +set @A = 1; + +mysql> explain force execute st using @A; ++---------------------------------------------------+ +| QUERY PLAN | ++---------------------------------------------------+ +| Project | +| -> Table Scan on db1.t2 | +| Filter Cond: (t2.col1 = cast('1' AS INT)) | ++---------------------------------------------------+ +3 rows in set (0.00 sec) + +mysql> explain verbose force execute st using @A; ++----------------------------------------------------------------------------------------+ +| QUERY PLAN | ++----------------------------------------------------------------------------------------+ +| Project (cost=1000.00 outcnt=1000.00 selectivity=1.0000 blockNum=1) | +| Output: t2.col1, t2.col2 | +| -> Table Scan on db1.t2 (cost=1000.00 outcnt=1000.00 selectivity=1.0000 blockNum=1) | +| Output: t2.col1, t2.col2 | +| Table: 't2' (0:'col1', 1:'col2') | +| Filter Cond: (t2.col1 = cast('1' AS INT)) | ++----------------------------------------------------------------------------------------+ +6 rows in set (0.00 sec) + +mysql> explain analyze force execute st using @A; ++-----------------------------------------------------------------------------------------------------------------------------------------------+ +| QUERY PLAN | 
++-----------------------------------------------------------------------------------------------------------------------------------------------+ +| Project | +| Analyze: timeConsumed=0ms waitTime=0ms inputRows=1 outputRows=1 InputSize=20bytes OutputSize=20bytes MemorySize=0bytes | +| -> Table Scan on db1.t2 | +| Analyze: timeConsumed=0ms waitTime=0ms inputBlocks=1 inputRows=1 outputRows=1 InputSize=20bytes OutputSize=20bytes MemorySize=21bytes | +| Filter Cond: (t2.col1 = 1) | ++-----------------------------------------------------------------------------------------------------------------------------------------------+ +5 rows in set (0.00 sec) + +mysql> explain analyze verbose force execute st using @A; ++-----------------------------------------------------------------------------------------------------------------------------------------------+ +| QUERY PLAN | ++-----------------------------------------------------------------------------------------------------------------------------------------------+ +| Project (cost=1000.00 outcnt=1000.00 selectivity=1.0000 blockNum=1) | +| Output: t2.col1, t2.col2 | +| Analyze: timeConsumed=0ms waitTime=0ms inputRows=1 outputRows=1 InputSize=20bytes OutputSize=20bytes MemorySize=0bytes | +| -> Table Scan on db1.t2 (cost=1000.00 outcnt=1000.00 selectivity=1.0000 blockNum=1) | +| Output: t2.col1, t2.col2 | +| Table: 't2' (0:'col1', 1:'col2') | +| Analyze: timeConsumed=0ms waitTime=0ms inputBlocks=1 inputRows=1 outputRows=1 InputSize=20bytes OutputSize=20bytes MemorySize=21bytes | +| Filter Cond: (t2.col1 = 1) | ++-----------------------------------------------------------------------------------------------------------------------------------------------+ +8 rows in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-workflow.md b/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-workflow.md index e36172a34..74aeb7053 100644 --- 
a/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-workflow.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain-workflow.md @@ -2,122 +2,463 @@ ## Output Structure -The command's result is a textual description of the plan selected for the *`statement`*, optionally annotated with execution statistics. +The syntax structure execution result is a textual description of the plan selected for the `statement`, optionally annotated with execution statistics. -Take the following SQL as an example, we demonstrate the output structure. +The following is an example of an output structure using query analysis of a dataset in [TPCH]( ../../../../Test/performance-testing/TPCH-test-with-matrixone.md): ``` -explain select city,libname1,count(libname1) as a from t3 join t1 on libname1=libname3 join t2 on isbn3=isbn2 group by city,libname1; +explain SELECT * FROM customer WHERE c_nationkey = (SELECT n_nationkey FROM nation +WHERE customer.c_nationkey = nation.n_nationkey AND nation.n_nationkey > 5); ``` ``` -+--------------------------------------------------------------------------------------------+ -| QUERY PLAN | -+--------------------------------------------------------------------------------------------+ -| Project(cost=0.00..0.00 card=400.00 ndv=0.00 rowsize=0 | -| -> Aggregate(cost=0.00..0.00 card=400.00 ndv=0.00 rowsize=0 | -| Group Key:#[0,1], #[0,0] | -| Aggregate Functions: count(#[0,0]) | -| -> Join(cost=0.00..0.00 card=400.00 ndv=0.00 rowsize=0 | -| Join Type: INNER | -| Join Cond: (#[1,2] = #[0,0]) | -| -> Table Scan on abc.t2(cost=0.00..0.00 card=8.00 ndv=0.00 rowsize=0 | -| -> Join(cost=0.00..0.00 card=50.00 ndv=0.00 rowsize=0 | -| Join Type: INNER | -| Join Cond: (#[0,0] = #[1,1]) | -| -> Table Scan on abc.t1(cost=0.00..0.00 card=5.00 ndv=0.00 rowsize=0 | -| -> Table Scan on abc.t3(cost=0.00..0.00 card=10.00 ndv=0.00 rowsize=0 | -+--------------------------------------------------------------------------------------------+ -13 rows 
in set (0.00 sec)
+mysql> explain SELECT * FROM customer WHERE c_nationkey = (SELECT n_nationkey FROM nation
+    -> WHERE customer.c_nationkey = nation.n_nationkey AND nation.n_nationkey > 5);
++----------------------------------------------------------------------+
+| QUERY PLAN                                                           |
++----------------------------------------------------------------------+
+| Project                                                              |
+|   ->  Filter                                                         |
+|         Filter Cond: (customer.c_nationkey = nation.n_nationkey)     |
+|         ->  Join                                                     |
+|               Join Type: SINGLE   hashOnPK                           |
+|               Join Cond: (customer.c_nationkey = nation.n_nationkey) |
+|               ->  Table Scan on tpch.customer                        |
+|               ->  Table Scan on tpch.nation                          |
+|                     Filter Cond: (nation.n_nationkey > 5)            |
+|                     Block Filter Cond: (nation.n_nationkey > 5)      |
++----------------------------------------------------------------------+
+10 rows in set (0.01 sec)
 ```
 
-EXPLAIN outputs a tree structure, named as `Execution Plan Tree`. Every leaf node includes the information of node type, affected objects and other properties such as `cost`, `rowsize` and so on. We can simplify the above example only with node type information. It visualizes the whole process of a SQL query, shows which operation nodes it goes through and what are their cost estimation.
+EXPLAIN outputs a tree structure named `QUERY PLAN`, with each leaf node containing the node type and the affected objects. Below, only the node type information is used to simplify the presentation of the example above. The `QUERY PLAN` tree visualizes the entire process of a SQL query, showing the nodes it passes through.
 
 ```
 Project
-└── Aggregate
+└── Filter
     └── Join
         └── Table Scan
-        └── Join
-            └──Table Scan
-            └──Table Scan
+        └── Table Scan
 ```
 
-## Node types
+## Node Types
 
-MatrixOne supports the following node types:
+MatrixOne supports the following node types:
-| Node Type       | Name in Explain |
-| --------------- | --------------- |
-| Node_TABLE_SCAN | Table Scan      |
-| Node_VALUE_SCAN | Values Scan     |
-| Node_PROJECT    | Project         |
-| Node_AGG        | Aggregate       |
-| Node_FILTER     | Filter          |
-| Node_JOIN       | Join            |
-| Node_SORT       | Sort            |
-| Node_INSERT     | Insert          |
-| Node_UPDATE     | Update          |
-| Node_DELETE     | Delete          |
+| Node name | meaning |
+| :-------------------------- | :--------------- |
+| Values Scan | Scanning of processed values |
+| Table Scan | Scanning data from a table |
+| External Scan | Handling external data scanning |
+| Source Scan | Processing a data scan of the source table |
+| Project | Projection operations on data |
+| Sink | Distribute the same data to one or more objects |
+| Sink Scan | Read data distributed by other objects |
+| Recursive Scan | In recursive CTE syntax, the data at the end of each loop is processed to determine whether to start the next round of the loop |
+| CTE Scan | In recursive CTE syntax, read the data at the beginning of each loop |
+| Aggregate | Aggregation of data |
+| Filter | Filtering of data |
+| Join | Joining of data |
+| Sample | Sampling data with the SAMPLE function |
+| Sort | Sorting data |
+| Partition | Sorting data in the range window and slicing it by value |
+| Union | Combining the result sets of two or more queries |
+| Union All | Combining the result sets of two or more queries, including duplicate rows |
+| Window | Perform range window calculations on data |
+| Time Window | Perform time window calculations on data |
+| Fill | Handling NULL values in the time window |
+| Insert | Insertion of data |
+| Delete | Deletion of data |
+| Intersect | Combining the rows that exist in two or more queries |
+| Intersect All | Combining the rows that exist in two or more queries, including duplicate rows |
+| Minus | Compares the results of two queries and returns the rows that exist in the first query but not in the second |
+| Table Function | Reading data through table functions |
+| 
PreInsert | Organize the data to be written| +| PreInsert UniqueKey | Organize the data to be written to the unique key hidden table| +| PreInsert SecondaryKey | Organize the data to be written to the secondary index hidden table| +| PreDelete | Organize the data that needs to be deleted from the partitioned table.| +| On Duplicate Key | Updates to duplicate data| +| Fuzzy Filter for duplicate key | De-duplication of written/updated data| +| Lock |Locking the data of an operation| + +## Example + +### VALUES Scan & Project + +```sql +mysql> explain select abs(-1); ++-------------------------------+ +| QUERY PLAN | ++-------------------------------+ +| Project | +| -> Values Scan "*VALUES*" | ++-------------------------------+ +2 rows in set (0.00 sec) +``` ### Table Scan -| Property | Format | Description | -| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -| cost | cost=0.00..0.00 | The first is estimated start-up cost. This is the time expended before the output phase can begin, e.g., time to do the sorting in a sort node. The second is estimated total cost. This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved. In practice a node's parent node might stop short of reading all available rows (see the `LIMIT` example below). | -| card | card=14.00 | Estimated column cardinality. | -| ndv | ndv=0.00 | Estimated number of distinct values. | -| rowsize | rowsize=0.00 | Estimated rowsize. | -| output | Output: #[0,0], #[0,1], #[0,2], #[0,3], #[0,4], #[0,5], #[0,6], #[0,7] | Node output information. | -| Table | Table : 'emp' (0:'empno', 1:'ename', 2:'job', 3:'mgr',) | Table definition information after column pruning. | -| Filter Cond | Filter Cond: (CAST(#[0,5] AS DECIMAL128) > CAST(20 AS DECIMAL128)) | Filter condition. 
| +```sql +mysql> explain select * from customer; ++-----------------------------------+ +| QUERY PLAN | ++-----------------------------------+ +| Project | +| -> Table Scan on tpch.customer | ++-----------------------------------+ +2 rows in set (0.01 sec) +``` + +### External Scan + +```sql +mysql> create external table extable(n1 int)infile{"filepath"='yourpath/xx.csv'} ; +Query OK, 0 rows affected (0.03 sec) -### Values Scan +mysql> explain select * from extable; ++------------------------------------+ +| QUERY PLAN | ++------------------------------------+ +| Project | +| -> External Scan on db1.extable | ++------------------------------------+ +2 rows in set (0.01 sec) +``` + +### Sink & Lock & Delete & Insert & PreInsert & Sink Scan -| Property | Format | Description | -| -------- | ----------------------------------------------- | ----------------------- | -| cost | (cost=0.00..0.00 card=14.00 ndv=0.00 rowsize=0) | Estimated cost | -| output | Output: 0 | Node output information | +```sql +mysql> create table t3(n1 int); +Query OK, 0 rows affected (0.02 sec) -### Project +mysql> insert into t3 values(1); +Query OK, 1 row affected (0.01 sec) -| Property | Format | Description | -| -------- | ----------------------------------------------- | ----------------------- | -| cost | (cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | Estimated cost | -| output | Output: (CAST(#[0,0] AS INT64) + 2) | Node output information | +mysql> explain update t3 set n1=2; ++-----------------------------------------------+ +| QUERY PLAN | ++-----------------------------------------------+ +| Plan 0: | +| Sink | +| -> Lock | +| -> Project | +| -> Project | +| -> Table Scan on tpch.t3 | +| Plan 1: | +| Delete on tpch.t3 | +| -> Sink Scan | +| DataSource: Plan 0 | +| Plan 2: | +| Insert on tpch.t3 | +| -> Project | +| -> PreInsert on tpch.t3 | +| -> Project | +| -> Sink Scan | +| DataSource: Plan 0 | ++-----------------------------------------------+ +17 rows in set (0.00 sec) +``` 
-### Aggregate +### Recursive Scan & CTE Scan & Filter -| Property | Format | Description | -| ------------------- | ------------------------------------------------------------ | ----------------------- | -| cost | (cost=0.00..0.00 card=14.00 ndv=0.00 rowsize=0) | Estimated cost | -| output | Output: #[0,0], #[0,1], #[0,2], #[0,3], #[0,4], #[0,5], #[0,6], #[0,7] | Node output information | -| Group Key | Group Key:#[0,0] | Key for grouping | -| Aggregate Functions | Aggregate Functions: max(#[0,1]) | Aggregate function name | +```sql +mysql> create table t4(n1 int,n2 int); +Query OK, 0 rows affected (0.02 sec) -### Filter +mysql> insert into t4 values(1,1),(2,2),(3,3); +Query OK, 3 rows affected (0.01 sec) -| Property | Format | Description | -| ----------- | ------------------------------------------------------------ | ----------------------- | -| cost | (cost=0.00..0.00 card=14.00 ndv=0.00 rowsize=0) | Estimated cost | -| output | Output: #[0,0], #[0,1], #[0,2], #[0,3], #[0,4], #[0,5], #[0,6], #[0,7] | Node output information | -| Filter Cond | Filter Cond: (CAST(#[0,1] AS INT64) > 10) | Filter condition | +mysql> explain WITH RECURSIVE t4_1(n1_1) AS ( + -> SELECT n1 FROM t4 + -> UNION all + -> SELECT n1_1 FROM t4_1 WHERE n1_1=1 + -> ) + -> SELECT * FROM t4_1; ++---------------------------------------------------------------------------------------------------+ +| QUERY PLAN | ++---------------------------------------------------------------------------------------------------+ +| Plan 0: | +| Sink | +| -> Project | +| -> Table Scan on tpch.t4 | +| Plan 1: | +| Sink | +| -> Project | +| -> Filter | +| Filter Cond: (t4_1.n1_1 = 1), mo_check_level((t4_1.__mo_recursive_level_col < 100)) | +| -> Recursive Scan | +| DataSource: Plan 2 | +| Plan 2: | +| Sink | +| -> CTE Scan | +| DataSource: Plan 0, Plan 1 | +| Plan 3: | +| Project | +| -> Sink Scan | +| DataSource: Plan 2 | 
++---------------------------------------------------------------------------------------------------+ +19 rows in set (0.00 sec) +``` + +### Aggregate + +```sql +mysql> explain SELECT count(*) FROM NATION group by N_NAME; ++-------------------------------------------+ +| QUERY PLAN                                | ++-------------------------------------------+ +| Project                                   | +|   ->  Aggregate                           | +|         Group Key: nation.n_name          | +|         Aggregate Functions: starcount(1) | +|         ->  Table Scan on tpch.nation     | ++-------------------------------------------+ +5 rows in set (0.01 sec) +``` ### Join -| Property         | Format                                          | Description             | -| ---------------- | ----------------------------------------------- | ----------------------- | -| cost             | (cost=0.00..0.00 card=14.00 ndv=0.00 rowsize=0) | Estimated cost          | -| output           | Output: #[0,0]                                  | Node output information | -| Join Type: INNER | Join Type: INNER                                | Join type               | -| Join Cond        | Join Cond: (#[0,0] = #[1,0])                    | Join condition          | - -### Sort - -| Property | Format                                                       | Description                   | -| -------- | ------------------------------------------------------------ | ----------------------------- | -| cost     | (cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0)              | Estimated cost                | -| output   | Output: #[0,0], #[0,1], #[0,2], #[0,3], #[0,4], #[0,5], #[0,6], #[0,7] | Node output information       | -| Sort Key | Sort Key: #[0,0] DESC,  #[0,1] INTERNAL                      | Sort key                      | -| Limit    | Limit: 10                                                    | Number limit for output data  | -| Offset   | Offset: 20                                                   | Number offset for output data | +```sql +mysql> create table t5(n1 int); +Query OK, 0 rows affected (0.01 sec) + +mysql> insert into t5 values(1),(2),(3); +Query OK, 3 rows affected (0.01 sec) + +mysql> create table t6(n1 int); +Query OK, 0 rows affected (0.01 sec) + +mysql> insert into t6 values(3),(4),(5); +Query OK, 3 rows affected (0.01 sec) + +mysql> explain SELECT * FROM t5 LEFT JOIN t6 ON t5.n1 = t6.n1; ++------------------------------------+ +| QUERY PLAN                         | ++------------------------------------+ +| Project                            | +|   ->  Join                         | +|         Join Type: LEFT            | +|         Join 
Cond: (t5.n1 = t6.n1) | +|         ->  Table Scan on tpch.t5  | +|         ->  Table Scan on tpch.t6  | ++------------------------------------+ +6 rows in set (0.00 sec) +``` + +### Sample + +```sql +mysql> explain SELECT SAMPLE(c_address, 90 percent) FROM customer; ++-----------------------------------------------------+ +| QUERY PLAN                                          | ++-----------------------------------------------------+ +| Project                                             | +|   ->  Sample                                        | +|         Sample 90.00 Percent by: customer.c_address | +|         ->  Table Scan on tpch.customer             | ++-----------------------------------------------------+ +4 rows in set (0.00 sec) +``` + +### Sort + +```sql +mysql> explain select * from customer order by c_custkey; ++-----------------------------------------------+ +| QUERY PLAN                                    | ++-----------------------------------------------+ +| Project                                       | +|   ->  Sort                                    | +|         Sort Key: customer.c_custkey INTERNAL | +|         ->  Table Scan on tpch.customer       | ++-----------------------------------------------+ +4 rows in set (0.00 sec) +``` + +### Partition & Window + +```sql +mysql> CREATE TABLE t7(n1 int,n2 int); +Query OK, 0 rows affected (0.01 sec) + +mysql> INSERT INTO t7 values(1,3),(2,2),(3,1); +Query OK, 3 rows affected (0.01 sec) + +mysql> explain SELECT SUM(n1) OVER(PARTITION BY n2) AS sn1 FROM t7; ++----------------------------------------------------------+ +| QUERY PLAN                                               | ++----------------------------------------------------------+ +| Project                                                  | +|   ->  Window                                             | +|         Window Function: sum(t7.n1); Partition By: t7.n2 | +|         ->  Partition                                    | +|               Sort Key: t7.n2 INTERNAL                   | +|               ->  Table Scan on tpch.t7                  | ++----------------------------------------------------------+ +6 rows in set (0.01 sec) +``` + +### Time window & Fill + +```sql +mysql> CREATE TABLE sensor_data (ts timestamp(3) primary key, temperature FLOAT); +Query OK, 0 rows affected (0.01 sec) + +mysql> INSERT INTO sensor_data VALUES('2023-08-01 00:00:00', 25.0); +Query OK, 1 row affected (0.01 sec) + +mysql> INSERT INTO sensor_data VALUES('2023-08-01 00:05:00', 26.0); +Query OK, 1 row affected (0.01 
sec) + +mysql> explain select _wstart, _wend from sensor_data interval(ts, 10, minute) fill(prev); ++---------------------------------------------------+ +| QUERY PLAN | ++---------------------------------------------------+ +| Project | +| -> Fill | +| Fill Columns: | +| Fill Mode: Prev | +| -> Time window | +| Sort Key: sensor_data.ts | +| Aggregate Functions: _wstart, _wend | +| -> Table Scan on db2.sensor_data | ++---------------------------------------------------+ +8 rows in set (0.00 sec) +``` + +### Intersect + +```sql +mysql> explain select * from t5 intersect select * from t6; ++-----------------------------------------+ +| QUERY PLAN | ++-----------------------------------------+ +| Project | +| -> Intersect | +| -> Project | +| -> Table Scan on tpch.t5 | +| -> Project | +| -> Table Scan on tpch.t6 | ++-----------------------------------------+ +6 rows in set (0.00 sec) +``` + +### Intersect All + +```sql +mysql> explain select * from t5 intersect all select * from t6; ++-----------------------------------------+ +| QUERY PLAN | ++-----------------------------------------+ +| Project | +| -> Intersect All | +| -> Project | +| -> Table Scan on tpch.t5 | +| -> Project | +| -> Table Scan on tpch.t6 | ++-----------------------------------------+ +6 rows in set (0.00 sec) +``` + +### Minus + +```sql +mysql> explain select * from t5 minus select * from t6; ++-----------------------------------------+ +| QUERY PLAN | ++-----------------------------------------+ +| Project | +| -> Minus | +| -> Project | +| -> Table Scan on tpch.t5 | +| -> Project | +| -> Table Scan on tpch.t6 | ++-----------------------------------------+ +6 rows in set (0.00 sec) +``` + +### Table Function + +```sql +mysql> explain select * from unnest('{"a":1}') u; ++-------------------------------------+ +| QUERY PLAN | ++-------------------------------------+ +| Project | +| -> Table Function on unnest | +| -> Values Scan "*VALUES*" | ++-------------------------------------+ +3 rows in set 
(0.10 sec) +``` + +### PreInsert UniqueKey & Fuzzy Filter for duplicate key + +```sql +mysql> CREATE TABLE t8(n1 int,n2 int UNIQUE key); +Query OK, 0 rows affected (0.01 sec) + +mysql> explain INSERT INTO t8(n2) values(1); ++---------------------------------------------------------------------------------+ +| QUERY PLAN | ++---------------------------------------------------------------------------------+ +| Plan 0: | +| Sink | +| -> PreInsert on tpch.t8 | +| -> Project | +| -> Project | +| -> Values Scan "*VALUES*" | +| Plan 1: | +| Sink | +| -> Lock | +| -> PreInsert UniqueKey | +| -> Sink Scan | +| DataSource: Plan 0 | +| Plan 2: | +| Insert on tpch.__mo_index_unique_018e2d16-6629-719d-82b5-036222e9658a | +| -> Sink Scan | +| DataSource: Plan 1 | +| Plan 3: | +| Fuzzy Filter for duplicate key | +| -> Table Scan on tpch.__mo_index_unique_018e2d16-6629-719d-82b5-036222e9658a | +| Filter Cond: (__mo_index_idx_col = 1) | +| Block Filter Cond: (__mo_index_idx_col = 1) | +| -> Sink Scan | +| DataSource: Plan 1 | +| Plan 4: | +| Insert on tpch.t8 | +| -> Sink Scan | +| DataSource: Plan 0 | ++---------------------------------------------------------------------------------+ +27 rows in set (0.01 sec) +``` + +### PreInsert SecondaryKey + +```sql +mysql> CREATE TABLE t9 ( n1 int , n2 int, KEY key2 (n2) USING BTREE); +Query OK, 0 rows affected (0.02 sec) + +mysql> explain INSERT INTO t9(n2) values(2); ++--------------------------------------------------------------------------+ +| QUERY PLAN | ++--------------------------------------------------------------------------+ +| Plan 0: | +| Sink | +| -> PreInsert on tpch.t9 | +| -> Project | +| -> Project | +| -> Values Scan "*VALUES*" | +| Plan 1: | +| Insert on tpch.__mo_index_secondary_018e2d14-6f20-7db0-babb-c1fd505fd3c5 | +| -> Lock | +| -> PreInsert SecondaryKey | +| -> Sink Scan | +| DataSource: Plan 0 | +| Plan 2: | +| Insert on tpch.t9 | +| -> Sink Scan | +| DataSource: Plan 0 | 
++--------------------------------------------------------------------------+ +16 rows in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain.md b/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain.md index 6c1e3444a..201429a92 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Other/Explain/explain.md @@ -1,8 +1,8 @@ # EXPLAIN -EXPLAIN — show the execution plan of a statement. +EXPLAIN - Shows the execution plan for a statement. -## Syntax +## Syntax structure ``` EXPLAIN [ ( option [, ...] ) ] statement @@ -10,238 +10,35 @@ EXPLAIN [ ( option [, ...] ) ] statement where option can be one of: ANALYZE [ boolean ] VERBOSE [ boolean ] - FORMAT { TEXT | JSON } + (FORMAT=TEXT) ``` -## Description +## Syntax Description -This command displays the execution plan that the MatrixOne planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned — by plain sequential scan, index scan, and so on. — and if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table. +This command displays the execution plan that the MatrixOne planner generates for the supplied statement. The execution plan shows how the tables referenced by the statement will be scanned (by plain sequential scan, index scan, and so on) and, if multiple tables are referenced, which join algorithms will be used to bring together the required rows from each input table. -The most critical part of the display is the estimated statement execution cost, which is the planner's guess at how long it will take to run the statement (measured in cost units that are arbitrary, but conventionally mean disk page fetches). Actually two numbers are shown: the start-up cost before the first row can be returned, and the total cost to return all the rows. 
For most queries the total cost is what matters, but in contexts such as a subquery in `EXISTS`, the planner will choose the smallest start-up cost instead of the smallest total cost (since the executor will stop after getting one row, anyway). Also, if you limit the number of rows to return with a `LIMIT` clause, the planner makes an appropriate interpolation between the endpoint costs to estimate which plan is really the cheapest. +The most critical part of the display is the estimated statement execution cost, that is, the planner's estimate of how long the statement will take to run (measured in arbitrary cost units, conventionally disk page fetches). Two numbers are actually shown: the start-up cost before the first row is returned, and the total cost of returning all rows. For most queries the total cost is what matters, but for a subquery in `EXISTS`, the planner chooses the smallest start-up cost rather than the smallest total cost (because the executor stops after fetching one row). In addition, if you limit the number of rows returned with a `LIMIT` clause, the planner interpolates between the endpoint costs to estimate which plan is really the cheapest. -The ANALYZE option causes the statement to be actually executed, not only planned. Then actual run time statistics are added to the display, including the total elapsed time expended within each plan node (in milliseconds) and the total number of rows it actually returned. This is useful for seeing whether the planner's estimates are close to reality. +The `ANALYZE` option causes the statement to be actually executed, not just planned, and adds actual runtime statistics to the display, including the total time (in milliseconds) spent in each plan node and the total number of rows it actually returned. This helps you see whether the planner's estimates are close to reality. 
-## Parameters +## Parameter interpretation * ANALYZE: -Carry out the command and show actual run times and other statistics. This parameter defaults to `FALSE`. +Executes the command and displays actual runtime and other statistics. This parameter defaults to `FALSE`. * VERBOSE: -Display additional information regarding the plan. Specifically, include the output column list for each node in the plan tree, schema-qualify table and function names, always label variables in expressions with their range table alias, and always print the name of each trigger for which statistics are displayed. This parameter is `FALSE` by default. +`VERBOSE` displays additional information about the plan. Specifically, it includes the output column list for each node in the plan tree, schema-qualifies table and function names, always labels variables in expressions with their range table alias, and always prints the name of each trigger for which statistics are displayed. This parameter defaults to `FALSE`. * FORMAT: -Specify the output format, which can be TEXT, JSON. Non-text output contains the same information as the text output format, but is easier for programs to parse. This parameter is `TEXT` by dafault. +`FORMAT` specifies the output format; the syntax is `explain (format xx)`, and only the `TEXT` format is supported for now. Non-text output contains the same information as text output and is easier for programs to parse. This parameter defaults to `TEXT`. * BOOLEAN: -Specifies whether the selected option should be turned on or off. You can write `TRUE`to enable the option, and `FALSE` to disable it. The *`boolean`* value can also be omitted, in which case `TRUE` is assumed. +`BOOLEAN` specifies whether the selected option is on or off. You can write `TRUE` to enable the option, or `FALSE` to disable it. This parameter defaults to `TRUE`. -* STETEMENT +* STATEMENT -MatrixOne supports any `SELECT`, `UPDATE`, `DELETE` statement execution plan. 
For `INSERT` statement, only `INSERT INTO..SELECT` is supported in 0.5.1 version. `INSERT INTO...VALUES` is not supported yet. - -## Examples - -### Node_TABLE_SCAN - -```sql -mysql> explain verbose SELECT N_NAME, N_REGIONKEY a FROM NATION WHERE N_NATIONKEY > 0 OR N_NATIONKEY < 10; -+------------------------------------------------------------------------------------+ -| QUERY PLAN | -+------------------------------------------------------------------------------------+ -| Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1] | -| -> Table Scan on db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,1], #[0,2] | -| Table: 'nation' (0:'n_nationkey', 1:'n_name', 2:'n_regionkey') | -| Filter Cond: ((CAST(#[0,0] AS INT64) > 0) or (CAST(#[0,0] AS INT64) < 10)) | -+------------------------------------------------------------------------------------+ -``` - -### Node_VALUE_SCAN - -```sql -mysql> explain verbose select abs(-1); -+-----------------------------------------------------------------------------+ -| QUERY PLAN | -+-----------------------------------------------------------------------------+ -| Project(cost=0.00..0.00 card=1.00 ndv=0.00 rowsize=0) | -| Output: 1 | -| -> Values Scan "*VALUES*" (cost=0.00..0.00 card=1.00 ndv=0.00 rowsize=0) | -| Output: 0 | -+-----------------------------------------------------------------------------+ -``` - -### Node_SORT - -```sql -mysql> explain verbose SELECT N_NAME, N_REGIONKEY a FROM NATION WHERE N_NATIONKEY > 0 AND N_NATIONKEY < 10 ORDER BY N_NAME, N_REGIONKEY DESC; -+--------------------------------------------------------------------------------------------+ -| QUERY PLAN | -+--------------------------------------------------------------------------------------------+ -| Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1] | -| -> Sort(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1] | -| Sort Key: #[0,0] INTERNAL, 
#[0,1] DESC | -| -> Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1] | -| -> Table Scan on db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,1], #[0,2] | -| Table: 'nation' (0:'n_nationkey', 1:'n_name', 2:'n_regionkey') | -| Filter Cond: (CAST(#[0,0] AS INT64) > 0), (CAST(#[0,0] AS INT64) < 10) | -+--------------------------------------------------------------------------------------------+ -``` - -With limit and offset: - -```sql -mysql> explain SELECT N_NAME, N_REGIONKEY FROM NATION WHERE abs(N_REGIONKEY) > 0 AND N_NAME LIKE '%AA' ORDER BY N_NAME DESC, N_REGIONKEY limit 10; -+-------------------------------------------------------------------------------------------+ -| QUERY PLAN | -+-------------------------------------------------------------------------------------------+ -| Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| -> Sort(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Sort Key: #[0,0] DESC, #[0,1] INTERNAL | -| Limit: 10 | -| -> Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| -> Table Scan on db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Filter Cond: (abs(CAST(#[0,1] AS INT64)) > 0), (#[0,0] like '%AA') | -+-------------------------------------------------------------------------------------------+ - -``` - -```sql -mysql> explain SELECT N_NAME, N_REGIONKEY FROM NATION WHERE abs(N_REGIONKEY) > 0 AND N_NAME LIKE '%AA' ORDER BY N_NAME DESC, N_REGIONKEY LIMIT 10 offset 20; -+-------------------------------------------------------------------------------------------+ -| QUERY PLAN | -+-------------------------------------------------------------------------------------------+ -| Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| -> Sort(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Sort Key: #[0,0] DESC, #[0,1] INTERNAL | -| Limit: 10, Offset: 20 | -| -> Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| -> Table Scan on 
db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Filter Cond: (abs(CAST(#[0,1] AS INT64)) > 0), (#[0,0] like '%AA') | -+-------------------------------------------------------------------------------------------+ -``` - -### Node_AGG - -```sql -mysql> explain verbose SELECT count(*) FROM NATION group by N_NAME; -+-------------------------------------------------------------------------------------+ -| QUERY PLAN | -+-------------------------------------------------------------------------------------+ -| Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0] | -| -> Aggregate(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[-2,0] | -| Group Key:#[0,1] | -| Aggregate Functions: starcount(#[0,0]) | -| -> Table Scan on db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1] | -| Table: 'nation' (0:'n_nationkey', 1:'n_name') | -+-------------------------------------------------------------------------------------+ -``` - -### Node_JOIN - -```sql -mysql> explain verbose SELECT NATION.N_NAME, REGION.R_NAME FROM NATION join REGION on NATION.N_REGIONKEY = REGION.R_REGIONKEY WHERE NATION.N_REGIONKEY > 10 AND LENGTH(NATION.N_NAME) > LENGTH(REGION.R_NAME); -+--------------------------------------------------------------------------------------------+ -| QUERY PLAN | -+--------------------------------------------------------------------------------------------+ -| Project(cost=0.00..0.00 card=125.00 ndv=0.00 rowsize=0) | -| Output: #[0,1], #[0,0] | -| -> Filter(cost=0.00..0.00 card=125.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1] | -| Filter Cond: (length(CAST(#[0,1] AS CHAR)) > length(CAST(#[0,0] AS CHAR))) | -| -> Join(cost=0.00..0.00 card=125.00 ndv=0.00 rowsize=0) | -| Output: #[0,1], #[1,0] | -| Join Type: INNER | -| Join Cond: (#[1,1] = #[0,0]) | -| -> Table Scan on tpch.region(cost=0.00..0.00 card=5.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1] | -| Table: 'region' (0:'r_regionkey', 
1:'r_name') | -| -> Table Scan on tpch.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1] | -| Table: 'nation' (0:'n_name', 1:'n_regionkey') | -| Filter Cond: (CAST(#[0,1] AS INT64) > 10) | -+--------------------------------------------------------------------------------------------+ - -``` - -### Node_INSERT - -```sql -mysql> explain verbose INSERT NATION select * from nation; -+---------------------------------------------------------------------------------------------+ -| QUERY PLAN | -+---------------------------------------------------------------------------------------------+ -| Insert on db1.nation (cost=0.0..0.0 rows=0 ndv=0 rowsize=0) | -| Output: #[0,0], #[0,1], #[0,2], #[0,3] | -| -> Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1], #[0,2], #[0,3] | -| -> Table Scan on db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], #[0,1], #[0,2], #[0,3] | -| Table: 'nation' (0:'n_nationkey', 1:'n_name', 2:'n_regionkey', 3:'n_comment') | -+---------------------------------------------------------------------------------------------+ -7 rows in set (0.00 sec) -``` - -### Node_Update - -```sql -mysql> explain verbose UPDATE NATION SET N_NAME ='U1', N_REGIONKEY=2 WHERE N_NATIONKEY > 10 LIMIT 20; -+-------------------------------------------------------------------------------------+ -| QUERY PLAN | -+-------------------------------------------------------------------------------------+ -| Update on db1.nation (cost=0.0..0.0 rows=0 ndv=0 rowsize=0) | -| -> Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0], 'U1', CAST(2 AS INT32) | -| Limit: 20 | -| -> Table Scan on db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,1] | -| Table: 'nation' (0:'n_nationkey', 1:'PADDR') | -| Filter Cond: (CAST(#[0,0] AS INT64) > 10) | -+-------------------------------------------------------------------------------------+ -``` - -### Node_Delete - 
-```sql -mysql> explain verbose DELETE FROM NATION WHERE N_NATIONKEY > 10; -+-------------------------------------------------------------------------------------+ -| QUERY PLAN | -+-------------------------------------------------------------------------------------+ -| Delete on db1.nation (cost=0.0..0.0 rows=0 ndv=0 rowsize=0) | -| -> Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0] | -| -> Table Scan on db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,1] | -| Table: 'nation' (0:'n_nationkey', 1:'PADDR') | -| Filter Cond: (CAST(#[0,0] AS INT64) > 10) | -+-------------------------------------------------------------------------------------+ -``` - -With limit: - -```sql -mysql> explain verbose DELETE FROM NATION WHERE N_NATIONKEY > 10 LIMIT 20; -+-------------------------------------------------------------------------------------+ -| QUERY PLAN | -+-------------------------------------------------------------------------------------+ -| Delete on db1.nation (cost=0.0..0.0 rows=0 ndv=0 rowsize=0) | -| -> Project(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,0] | -| Limit: 20 | -| -> Table Scan on db1.nation(cost=0.00..0.00 card=25.00 ndv=0.00 rowsize=0) | -| Output: #[0,1] | -| Table: 'nation' (0:'n_nationkey', 1:'PADDR') | -| Filter Cond: (CAST(#[0,0] AS INT64) > 10) | -+-------------------------------------------------------------------------------------+ -``` +MatrixOne supports any `SELECT`, `INSERT`, `UPDATE`, `DELETE` statement execution plan. 
\ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-create-publication.md b/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-create-publication.md index 496cf81e1..a25128292 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-create-publication.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-create-publication.md @@ -7,7 +7,7 @@ Returns the SQL statement when PUBLICATION was created. ## **Syntax** ``` -SHOW CREATE PUBLICATION pubname; +SHOW CREATE PUBLICATION pubname; ``` ## **Examples** @@ -19,7 +19,6 @@ create account acc2 admin_name 'root' identified by '111'; create database t; create publication pub3 database t account acc0,acc1; mysql> alter publication pub3 account add accx; -show create publication pub3; Query OK, 0 rows affected (0.00 sec) mysql> show create publication pub3; diff --git a/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-function-status.md b/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-function-status.md index f778b099c..9d4e7bbc9 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-function-status.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-function-status.md @@ -26,30 +26,30 @@ The output will include the function name, database name, type, creation time, a ## **Examples** ```sql -mysql> create function twosum (x float, y float) returns float language sql as 'select $1 + $2' ; -Query OK, 0 rows affected (0.03 sec) - -mysql> create function mysumtable(x int) returns int language sql as 'select mysum(test_val, id) from tbl1 where id = $1'; -Query OK, 0 rows affected (0.02 sec) - -mysql> create function helloworld () returns int language sql as 'select id from tbl1 limit 1'; -Query OK, 0 rows affected (0.02 sec) +create or replace function py_add(a int, b int) returns int language python as +$$ +def add(a, b): + return a + b 
+$$ +handler 'add'; +create function twosum (x float, y float) returns float language sql as 'select $1 + $2' ; +create function helloworld () returns int language sql as 'select id from tbl1 limit 1'; mysql> show function status; -+------+------------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ -| Db | Name | Type | Definer | Modified | Created | Security_type | Comment | character_set_client | collation_connection | Database Collation | -+------+------------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ -| aab | twosum | FUNCTION | root | 2023-03-27 06:25:41 | 2023-03-27 06:25:41 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci | -| aab | mysumtable | FUNCTION | root | 2023-03-27 06:25:51 | 2023-03-27 06:25:51 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci | -| aab | helloworld | FUNCTION | root | 2023-03-27 06:25:58 | 2023-03-27 06:25:58 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci | -+------+------------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ -3 rows in set (0.00 sec) ++------+-------------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ +| Db | Name | Type | Definer | Modified | Created | Security_type | Comment | character_set_client | collation_connection | Database Collation | ++------+-------------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ +| db1 | py_add | FUNCTION | root | 2024-01-16 08:00:21 | 2024-01-16 08:00:21 | DEFINER | | utf8mb4 | 
utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci | +| db1 | twosum | FUNCTION | root | 2024-01-16 08:00:39 | 2024-01-16 08:00:39 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci | +| db1 | helloworld | FUNCTION | root | 2024-01-16 08:00:53 | 2024-01-16 08:00:53 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci | ++------+-------------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ +3 rows in set (0.01 sec) mysql> show function status like 'two%'; +------+--------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ | Db | Name | Type | Definer | Modified | Created | Security_type | Comment | character_set_client | collation_connection | Database Collation | +------+--------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ -| aab | twosum | FUNCTION | root | 2023-03-27 06:25:41 | 2023-03-27 06:25:41 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci | +| db1 | twosum | FUNCTION | root | 2024-01-16 08:00:39 | 2024-01-16 08:00:39 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | utf8mb4_0900_ai_ci | +------+--------+----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+ -1 row in set (0.01 sec) -``` +1 rows in set (0.01 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-publications.md b/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-publications.md index fe0c90fd7..8b9c012d9 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-publications.md +++ 
b/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-publications.md @@ -2,9 +2,9 @@ ## **Description** -Returns a list of all PUBLICATION names and corresponding database names. +Returns a list of all publication names, the published database name, the publication creation time, the last modification time, the list of tenant names the publication targets (shows `*` if all), and comments. -For more information, you need have the authority of account administrator; check the system table mo_pubs for more parameters. +To see more parameters, you need tenant administrator privileges to view the system table `mo_pubs`. ## **Syntax** @@ -22,10 +22,10 @@ create database t; create publication pub3 database t account acc0,acc1; mysql> show publications; -+------+----------+ -| Name | Database | -+------+----------+ -| pub3 | t        | -+------+----------+ ++-------------+----------+---------------------+-------------+-------------+----------+ +| publication | database | create_time         | update_time | sub_account | comments | ++-------------+----------+---------------------+-------------+-------------+----------+ +| pub3        | t        | 2024-04-23 10:10:59 | NULL        | acc0,acc1   |          | ++-------------+----------+---------------------+-------------+-------------+----------+ 1 row in set (0.00 sec) ``` diff --git a/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-subscriptions.md b/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-subscriptions.md index 28239aee2..4d06a90d0 100644 --- a/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-subscriptions.md +++ b/docs/MatrixOne/Reference/SQL-Reference/Other/SHOW-Statements/show-subscriptions.md @@ -2,27 +2,49 @@ ## **Description** -Returns a list of all subscription library names and source account names. 
+Returns each publication's name, the publishing tenant name, the published database name, the time it was published to this tenant, the subscription name, and the time the subscription was created. ## **Syntax** ``` -SHOW SUBSCRIPTIONS; +SHOW SUBSCRIPTIONS [ALL]; ``` +## **Syntax explanation** + +- With the **ALL** option, you can see every publication you have permission to subscribe to; for publications that have not yet been subscribed, sub_time and sub_name are null. Without **ALL**, only publications that have already been subscribed to are shown. + ## **Examples** ```sql -Create database sub1 from sys publication pub1; +mysql> show subscriptions all; ++----------+-------------+--------------+---------------------+----------+----------+ +| pub_name | pub_account | pub_database | pub_time | sub_name | sub_time | ++----------+-------------+--------------+---------------------+----------+----------+ +| pub3 | sys | t | 2024-04-23 11:11:06 | NULL | NULL | ++----------+-------------+--------------+---------------------+----------+----------+ +1 row in set (0.01 sec) + +mysql> show subscriptions; +Empty set (0.00 sec) -mysql> create database sub1 from sys publication sys_pub_1; +mysql> create database sub3 from sys publication pub3; Query OK, 1 row affected (0.02 sec) +mysql> show subscriptions all; ++----------+-------------+--------------+---------------------+----------+---------------------+ +| pub_name | pub_account | pub_database | pub_time | sub_name | sub_time | ++----------+-------------+--------------+---------------------+----------+---------------------+ +| pub3 | sys | t | 2024-04-23 11:11:06 | sub3 | 2024-04-23 11:12:11 | ++----------+-------------+--------------+---------------------+----------+---------------------+ +1 row in set (0.00 sec) + mysql> show subscriptions; -+------+--------------+ -| Name | From_Account | -+------+--------------+ -| sub1 | sys | -+------+--------------+ ++----------+-------------+--------------+---------------------+----------+---------------------+ +| pub_name | pub_account | pub_database | pub_time | 
sub_name | sub_time | ++----------+-------------+--------------+---------------------+----------+---------------------+ +| pub3 | sys | t | 2024-04-23 11:11:06 | sub3 | 2024-04-23 11:12:11 | ++----------+-------------+--------------+---------------------+----------+---------------------+ 1 row in set (0.01 sec) -``` + +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/System-tables.md b/docs/MatrixOne/Reference/System-tables.md index 279704c8e..2e43c6a13 100644 --- a/docs/MatrixOne/Reference/System-tables.md +++ b/docs/MatrixOne/Reference/System-tables.md @@ -1,15 +1,14 @@ -# MatrixOne System Database and Tables +# MatrixOne System Databases and Tables -MatrixOne system database and tables are where MatrixOne stores system information. We can access the system information through them. MatrixOne creates 6 system databases at initialization: `mo_catalog`, `information_schema`, `system_metrcis`, `system`, `mysql`, and `mo_task`. `mo_task` is under development and have no direct impact on users. -The other system databases and table functions are described in this document. +MatrixOne system databases and tables are where MatrixOne stores system information, which you can access through them. MatrixOne creates six system databases at initialization: `mo_catalog`, `information_schema`, `system_metrics`, `system`, `mysql`, and `mo_task`. `mo_task` is currently under development and has no direct impact on the operations you perform for the time being. The other system database and table functions are described in this document. -The system can only modify system databases and tables, and users can only read from them. +The system can only modify system databases and tables; you can only read from them. ## `mo_catalog` database -`mo_catalog` stores the metadata of MatrixOne objects: databases, tables, columns, system variables, accounts, users, and roles. 
+The `mo_catalog` database is used to store metadata about MatrixOne objects such as databases, tables, columns, system variables, tenants, users, and roles. -Start with MatrixOne 0.6 has introduced the concept of multi-account, the default `sys` account and other accounts have slightly different behaviors. The system table `mo_account`, which serves the multi-tenancy management, is only visible for the `sys` account; the other accounts don't see this table. +The concept of multi-tenancy was introduced with MatrixOne version 0.6, and the default `sys` tenant behaves slightly differently from other tenants. The system table `mo_account`, which serves multi-tenant management, is only visible to the `sys` tenant; it is not visible to other tenants. ### mo_database table @@ -175,6 +174,72 @@ Start with MatrixOne 0.6 has introduced the concept of multi-account, the defaul | created_time | TIMESTAMP(0) | creation time | | comment | TEXT(0) | comment | +### `mo_sessions` view + +| column | type | comments | +| ----------------- | ----------------- | ------------------------------------------------------------ | +| node_id | VARCHAR(65535) | Unique identifier of the MatrixOne node. Once activated, it cannot be changed. | +| conn_id | INT UNSIGNED | A unique number associated with the client TCP connection in MatrixOne, automatically generated by Hakeeper. | +| session_id | VARCHAR(65535) | A unique UUID used to identify a session. A new UUID is generated for each new session. | +| account | VARCHAR(65535) | Name of the tenant. | +| user | VARCHAR(65535) | The name of the user. | +| host | VARCHAR(65535) | The IP address and port on which the CN node receives client requests. | +| db | VARCHAR(65535) | The name of the database used when executing the SQL. | +| session_start | VARCHAR(65535) | The timestamp of the session creation. | +| command | VARCHAR(65535) | Types of MySQL commands, such as COM_QUERY, COM_STMT_PREPARE, COM_STMT_EXECUTE, and so on. 
| +| info | VARCHAR(65535) | The SQL statement to execute. A single SQL may contain multiple statements. | +| txn_id | VARCHAR(65535) | The unique identifier of the associated transaction. | +| statement_id | VARCHAR(65535) | The unique identifier (UUID) of the SQL statement. | +| statement_type | VARCHAR(65535) | Types of SQL statements, such as SELECT, INSERT, UPDATE, and so on. | +| query_type | VARCHAR(65535) | Types of SQL statements such as DQL (Data Query Language), TCL (Transaction Control Language), etc. | +| sql_source_type | VARCHAR(65535) | The source of the SQL statement, such as external or internal. | +| query_start | VARCHAR(65535) | The timestamp at which the SQL statement began execution. | +| client_host | VARCHAR(65535) | The IP address and port number of the client. | +| role | VARCHAR(65535) | The name of the user's role. | + +### `mo_configurations` view + +| column | type | comments | +| ------------- | --------------- | ------------------------------------ | +| node_type | VARCHAR(65535) | Types of nodes: cn (compute node), tn (transaction node), log (log node), proxy (proxy). | +| node_id | VARCHAR(65535) | The unique identifier of the node. | +| name | VARCHAR(65535) |The name of the configuration item, possibly accompanied by a nested structure prefix.| +| current_value | VARCHAR(65535) | The current value of the configuration item. | +| default_value | VARCHAR(65535) | The default value of the configuration item. | +| internal | VARCHAR(65535) | Indicates whether the configuration parameter is internal. | + +### `mo_locks` view + +| column | type | comments | +| ------------- | --------------- | ------------------------------------------------ | +| cn_id | VARCHAR(65535) | cn's uuid | +| txn_id | VARCHAR(65535) | The transaction holding the lock. | +| table_id | VARCHAR(65535) | Locked tables. | +| lock_key | VARCHAR(65535) | Lock type. Can be `point` or `range`. 
| +| lock_content | VARCHAR(65535) | The contents of the lock, in hexadecimal. For `range` locks, an interval; for `point` locks, a single value. | +| lock_mode | VARCHAR(65535) | Lock mode. Can be `shared` or `exclusive`. | +| lock_status | VARCHAR(65535) | Lock status, which may be `wait`, `acquired` or `none`.
wait. No transaction holds the lock, but there are transactions waiting on the lock.
acquired. A transaction holds the lock.
none. No transaction holds the lock, and no transaction is waiting on the lock. | +| lock_wait | VARCHAR(65535) | Transactions waiting on this lock. | + +### `mo_transactions` view + +| column | type | comments | +| ------------- | --------------- | ------------------------------------ | +| cn_id | VARCHAR(65535) | ID that uniquely identifies the CN (Compute Node). | +| txn_id | VARCHAR(65535) | The ID that uniquely identifies the transaction. | +| create_ts | VARCHAR(65535) | Records the transaction creation timestamp, following the RFC3339Nano format ("2006-01-02T15:04:05.999999999Z07:00"). | +| snapshot_ts | VARCHAR(65535) | Represents the snapshot timestamp of the transaction, expressed in both physical and logical time. | +| prepared_ts | VARCHAR(65535) | Indicates the prepared timestamp of the transaction, in the form of physical and logical time. | +| commit_ts | VARCHAR(65535) | Indicates the commit timestamp of the transaction, in both physical and logical time.| +| txn_mode | VARCHAR(65535) | Identifies the transaction mode, which can be either pessimistic or optimistic. | +| isolation | VARCHAR(65535) | Indicates the isolation level of the transaction, either SI (Snapshot Isolation) or RC (Read Committed). | +| user_txn | VARCHAR(65535) | Indicates a user transaction, i.e., a transaction created by a SQL operation performed by a user connecting to MatrixOne via a client. | +| txn_status | VARCHAR(65535) | Indicates the current state of the transaction, with possible values including active, committed, aborting, aborted. In the distributed transaction 2PC model, this would also include prepared and committing. | +| table_id | VARCHAR(65535) | Indicates the ID of the table involved in the transaction. | +| lock_key | VARCHAR(65535) | Indicates the type of lock, either range or point. | +| lock_content | VARCHAR(65535) | Point locks represent individual values, range locks represent ranges, usually in the form of "low - high". 
Note that transactions may involve multiple locks, but only the first lock is shown here.| +| lock_mode | VARCHAR(65535) | Indicates the mode of the lock, either exclusive or shared. | + ### mo_user_defined_function table | column | type | comments | @@ -201,10 +266,14 @@ Start with MatrixOne 0.6 has introduced the concept of multi-account, the defaul | column | type | comments | | -----------------| --------------- | ----------------- | -| configuration_id | INT(32) | Configuration item id, an auto-increment column, used as a primary key to distinguish different configurations | +| configuration_id | INT(32) | Configuration item id, an auto-increment column, used as the primary key to distinguish between different configurations | +| account_id | INT(32) | Tenant id of the configuration | | account_name | VARCHAR(300) | The name of the tenant where the configuration is located | -| dat_name | VARCHAR(5000) | The name of the database where the configuration is located | -| configuration | JSON(0) | Configuration content, saved in JSON format | +| dat_name | VARCHAR(5000) | The name of the database where the configuration resides | +| variable_name | VARCHAR(300) | The name of the variable | +| variable_value | VARCHAR(5000) | The value of the variable | +| system_variables | BOOL(0) | Whether it is a system variable (compatibility variables are added in addition to system variables) | ### mo_pubs table @@ -231,6 +300,9 @@ Start with MatrixOne 0.6 has introduced the concept of multi-account, the defaul | database_id | BIGINT UNSIGNED(64) | ID of the database where the index resides | | name | VARCHAR(64) | name of the index | | type | VARCHAR(11) | The type of index, including primary key index (PRIMARY), unique index (UNIQUE), secondary index (MULTIPLE) | +| algo | VARCHAR(11) | Algorithm used to create the index | +| algo_table_type | VARCHAR(11) | Hidden table types for multi-table indexes | +| algo_params | VARCHAR(2048) | Parameters for the index algorithm | | is_visible | TINYINT(8) | Whether the index is visible, 1 means visible, 0 means invisible (currently all MatrixOne indexes are visible indexes) | | hidden | TINYINT(8) | Whether the index is hidden, 1 is a hidden index, 0 is a non-hidden index| | comment | VARCHAR(2048) | Comment information for the index | @@ -316,11 +388,15 @@ It records user and system SQL statement with detailed information. | exec_plan | JSON | statement execution plan | | rows_read | BIGINT | rows read total | | bytes_scan | BIGINT | bytes scan total | -| stats | JSON | global stats info in exec_plan | +| stats | JSON | Global statistics in exec_plan | | statement_type | VARCHAR(1024) | statement type, val in [Insert, Delete, Update, Drop Table, Drop User, ...] 
| -| query_type | VARCHAR(1024) | query type, val in [DQL, DDL, DML, DCL, TCL] | -| role_id | BIGINT | role id | -| sql_source_type | TEXT | Type of SQL source internally generated by MatrixOne | +| query_type | VARCHAR(1024) | query type, val in [DQL, DDL, DML, DCL, TCL] | +| role_id | BIGINT | role id | +| sql_source_type | TEXT | Type of SQL source internally generated by MatrixOne | | aggr_count | BIGINT(64) | the number of statements aggregated | | result_count | BIGINT(64) | the number of rows of sql execution results | @@ -328,28 +404,31 @@ It records user and system SQL statement with detailed information. It records very detailed system logs. -| Column | Type | Comments | +| Column | Type | Comments | | -------------- | ------------- | ------------------------------------------------------------ | -| raw_item | VARCHAR(1024) | raw log item | -| node_uuid | VARCHAR(36) | node uuid, which node gen this data. | -| node_type | VARCHAR(64) | node type in MO, val in [TN, CN, LOG] | -| span_id | VARCHAR(16) | span unique id | -| statement_id | VARCHAR(36) | statement unique id | -| logger_name | VARCHAR(1024) | logger name | -| timestamp | DATETIME | timestamp of action | -| level | VARCHAR(1024) | log level, enum: debug, info, warn, error, panic, fatal | -| caller | VARCHAR(1024) | where it log, like: package/file.go:123 | -| message | TEXT | log message | -| extra | JSON | log dynamic fields | -| err_code | VARCHAR(1024) | error log | -| error | TEXT | error message | -| stack | VARCHAR(4096) | | -| span_name | VARCHAR(1024) | span name, for example: step name of execution plan, function name in code, ... | -| parent_span_id | VARCHAR(16) | parent span unique id | -| start_time | DATETIME | | -| end_time | DATETIME | | -| duration | BIGINT | exec time, unit: ns | -| resource | JSON | static resource information | +| raw_item | VARCHAR(1024) | Original log entry | +| node_uuid | VARCHAR(36) | UUID of the node that generated this entry | +| node_type | VARCHAR(64) | Node type within MatrixOne: TN, CN, or LOG | +| span_id | VARCHAR(16) | The unique ID of the span | +| trace_id | VARCHAR(36) | trace unique uuid | +| logger_name | VARCHAR(1024) | Name of the logger | +| timestamp | DATETIME | Timestamp of the action | +| level | VARCHAR(1024) | Log level, e.g. debug, info, warn, error, panic, fatal | +| caller | VARCHAR(1024) | Where the log is generated, e.g. package/file.go:123 | +| message | TEXT | log message | +| extra | JSON | Log dynamic fields | +| err_code | VARCHAR(1024) | error log | +| error | TEXT | error message | +| stack | VARCHAR(4096) | Stack information for log_info and error_info | +| span_name | VARCHAR(1024) | span name, e.g. step name of execution plan, function name in code, ... | +| parent_span_id | VARCHAR(16) | Parent span unique ID | +| start_time | DATETIME | Span start time | +| end_time | DATETIME | Span end time | +| duration | BIGINT | Execution time in ns | +| resource | JSON | Static resource information | +| span_kind | VARCHAR(1024) | span type. internal: MO internal generated trace (default); statement: trace_id==statement_id; remote: communicate via morpc| +| statement_id | VARCHAR(36) | ID of the statement | +| session_id | VARCHAR(36) | ID of the session | The other 3 tables (`log_info`, `span_info` and `error_info`) are views of `statement_info` and `rawlog` table. @@ -459,18 +538,27 @@ The description of columns in the `PARTITIONS` table is as follows: - `NODEGROUP`: This is the nodegroup to which the partition belongs. - `TABLESPACE_NAME`: The name of the tablespace to which the partition belongs. The value is always `DEFAULT`. -### `PROCESSLIST` table - -Fields in the `PROCESSLIST` table are described as follows: - -- ID: The ID of the user connection. -- USER: The name of the user who is executing `PROCESS`. -- HOST: The address that the user is connecting to. 
-- DB: The name of the currently connected default database. -- COMMAND: The command type that `PROCESS` is executing. -- TIME: The current execution duration of `PROCESS`, in seconds. -- STATE: The current connection state. -- INFO: The requested statement that is being processed. +### `PROCESSLIST` view + +Fields in the `PROCESSLIST` view are described as follows: + +- `NODE_ID`: CN node UUID +- `CONN_ID`: ID of the user connection +- `SESSION_ID`: ID of the session +- `ACCOUNT`: tenant name +- `USER`: user name +- `HOST`: the listening address of the CN node +- `DB`: the currently connected database +- `SESSION_START`: session creation time +- `COMMAND`: the MySQL protocol command for the statement +- `INFO`: SQL statement being processed +- `TXN_ID`: transaction ID +- `STATEMENT_ID`: statement ID +- `STATEMENT_TYPE`: type of statement, Select/Update/Delete, etc. +- `QUERY_TYPE`: query type, DQL/DDL/DML, etc. +- `SQL_SOURCE_TYPE`: SQL statement source type, external or internal SQL: external_sql/internal_sql +- `QUERY_START`: query start time +- `CLIENT_HOST`: client address ### `SCHEMATA` table @@ -520,7 +608,7 @@ Fields in the `USER_PRIVILEGES` table are described as follows: - `PRIVILEGE_TYPE`: The privilege type to be granted. Only one privilege type is shown in each row. - `IS_GRANTABLE`: If you have the `GRANT OPTION` privilege, the value is `YES`; otherwise, the value is `NO`. -### `VIEW` table +### `VIEW` view - `TABLE_CATALOG`: The name of the catalog the view belongs to. The value is `def`. - `TABLE_SCHEMA`: The name of the database to which the view belongs. @@ -533,7 +621,7 @@ Fields in the `USER_PRIVILEGES` table are described as follows: - `CHARACTER_SET_CLIENT`: The session value of the `character_set_client` system variable when the view was created. - `COLLATION_CONNECTION`: The session value of the `collation_connection` system variable when the view was created. 
-### `STATISTICS` Table +### `STATISTICS` view Obtain detailed information about database table indexes and statistics. For example, you can check whether an index is unique, understand the order of columns within an index, and estimate the number of unique values in an index. diff --git a/docs/MatrixOne/Reference/Variable/system-variables/foreign_key_checks.md b/docs/MatrixOne/Reference/Variable/system-variables/foreign_key_checks.md new file mode 100644 index 000000000..a52028d49 --- /dev/null +++ b/docs/MatrixOne/Reference/Variable/system-variables/foreign_key_checks.md @@ -0,0 +1,97 @@ +# Foreign key constraint checking + +In MatrixOne, `foreign_key_checks` is a system variable that controls the checking of foreign key constraints. This variable can be global or session level. When set to 1 (the default), MatrixOne checks the integrity of the foreign key constraint, ensuring the referential integrity of the data. If set to 0, these checks are skipped. + +!!! note + This behavior differs from MySQL: when foreign key constraint checking is turned off and the parent table is dropped, MySQL keeps the child table's foreign key references to the parent table, while MatrixOne drops the child table's foreign key references and re-establishes the foreign key relationship after the parent table is rebuilt. 
+ +## View foreign_key_checks + +Use the following command in MatrixOne to view foreign_key_checks: + +```sql + +--global mode +SELECT @@global.foreign_key_checks; +SHOW global VARIABLES LIKE 'foreign_key_checks'; + +--session mode +SELECT @@session.foreign_key_checks; +SHOW session VARIABLES LIKE 'foreign_key_checks'; +``` + +## Set foreign_key_checks + +Set foreign_key_checks in MatrixOne with the following command: + +```sql +--Global mode, reconnecting the database takes effect +set global foreign_key_checks = 'xxx' + +--session mode +set session foreign_key_checks = 'xxx' +``` + +## Examples + +```sql +mysql> SELECT @@session.foreign_key_checks; ++----------------------+ +| @@foreign_key_checks | ++----------------------+ +| 1 | ++----------------------+ +1 row in set (0.00 sec) + +create table t2(a int primary key,b int); +create table t1( b int, constraint `c1` foreign key `fk1` (b) references t2(a)); + +insert into t2 values(1,2); +mysql> insert into t1 values(3);--When foreign key constraint checking is turned on, values that violate the constraint cannot be inserted +ERROR 20101 (HY000): internal error: Cannot add or update a child row: a foreign key constraint fails + +mysql> drop table t2;--Parent table cannot be deleted when foreign key constraint checking is turned on +ERROR 20101 (HY000): internal error: can not drop table 't2' referenced by some foreign key constraint + +set session foreign_key_checks =0;--Turn off foreign key constraint checking +mysql> SELECT @@session.foreign_key_checks; ++----------------------+ +| @@foreign_key_checks | ++----------------------+ +| 0 | ++----------------------+ +1 row in set (0.00 sec) + +mysql> insert into t1 values(3);--When you turn off foreign key constraint checking, you can insert values that violate constraints +Query OK, 1 row affected (0.01 sec) + +mysql> drop table t2;--When you turn off foreign key constraint checking, you can delete the parent table. 
+Query OK, 0 rows affected (0.02 sec) + +mysql> show create table t1;--Delete the parent table and the foreign key constraints are also deleted ++-------+--------------------------------------------+ +| Table | Create Table | ++-------+--------------------------------------------+ +| t1 | CREATE TABLE `t1` ( +`b` INT DEFAULT NULL +) | ++-------+--------------------------------------------+ +1 row in set (0.00 sec) + +mysql> create table t2(n1 int);--Rebuilding the dropped parent table t2 without the column referenced by the child table's foreign key fails +ERROR 20101 (HY000): internal error: column 'a' no exists in table '' + +mysql> create table t2(n1 int,a int primary key);--Contains referenced primary key column a, rebuild successful +Query OK, 0 rows affected (0.01 sec) + +mysql> show create table t1;--After rebuilding t2, the foreign key relationship is automatically re-established ++-------+-------------------------------------------------------------------------------------------------------------------------------------------+ +| Table | Create Table | ++-------+-------------------------------------------------------------------------------------------------------------------------------------------+ +| t1 | CREATE TABLE `t1` ( +`b` INT DEFAULT NULL, +CONSTRAINT `c1` FOREIGN KEY (`b`) REFERENCES `t2` (`a`) ON DELETE RESTRICT ON UPDATE RESTRICT +) | ++-------+-------------------------------------------------------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Variable/system-variables/keep_user_target_list_in_result.md b/docs/MatrixOne/Reference/Variable/system-variables/keep_user_target_list_in_result.md new file mode 100644 index 000000000..3eaeef6b0 --- /dev/null +++ b/docs/MatrixOne/Reference/Variable/system-variables/keep_user_target_list_in_result.md @@ -0,0 +1,76 @@ +# keep_user_target_list_in_result Keep query result set column names consistent 
with the user-specified case + +In MatrixOne, the column names in a query result set keep the same case as specified by the user, unless aliases are used or this parameter is changed. + +`keep_user_target_list_in_result` is a global parameter that controls whether the column names in a query result set match the case of the names specified by the user. + +## View keep_user_target_list_in_result + +View `keep_user_target_list_in_result` in MatrixOne using the following command: + +```sql +--default 1 +show variables like "keep_user_target_list_in_result"; +select @@keep_user_target_list_in_result; +``` + +## Set keep_user_target_list_in_result + +Set `keep_user_target_list_in_result` in MatrixOne with the following command: + +```sql +--default is 1, reconnecting to database takes effect +set global keep_user_target_list_in_result = 0; +``` + +## Examples + +```sql +create table t1(aa int, bb int, cc int, AbC varchar(25), A_BC_d double); +insert into t1 values (1,2,3,'A',10.9); + +mysql> select * from t1; ++------+------+------+------+--------+ +| aa | bb | cc | abc | a_bc_d | ++------+------+------+------+--------+ +| 1 | 2 | 3 | A | 10.9 | ++------+------+------+------+--------+ +1 row in set (0.00 sec) + +mysql> select @@keep_user_target_list_in_result; --Query parameter values, on by default ++-----------------------------------+ +| @@keep_user_target_list_in_result | ++-----------------------------------+ +| 1 | ++-----------------------------------+ +1 row in set (0.01 sec) + +mysql> select aA, bB, CC, abc, a_Bc_D from t1;--On, the query result set column names are case sensitive as specified by the user. 
++------+------+------+------+--------+ +| aA | bB | CC | abc | a_Bc_D | ++------+------+------+------+--------+ +| 1 | 2 | 3 | A | 10.9 | ++------+------+------+------+--------+ +1 row in set (0.00 sec) + +mysql> set global keep_user_target_list_in_result =0;--Turn off the consistency between query result set column names and the user-specified case +Query OK, 0 rows affected (0.01 sec) + +mysql> exit;--Parameters take effect after exiting the database and reconnecting + +mysql> show variables like "keep_user_target_list_in_result"; ++---------------------------------+-------+ +| Variable_name | Value | ++---------------------------------+-------+ +| keep_user_target_list_in_result | 0 | ++---------------------------------+-------+ +1 row in set (0.00 sec) + +mysql> select aA, bB, CC, abc, a_Bc_D from t1;--The column names of the query result set do not match the case of the user-specified name when the setting is turned off ++------+------+------+------+--------+ +| aa | bb | cc | abc | a_bc_d | ++------+------+------+------+--------+ +| 1 | 2 | 3 | A | 10.9 | ++------+------+------+------+--------+ +1 row in set (0.00 sec) +``` diff --git a/docs/MatrixOne/Reference/Variable/system-variables/lower_case_tables_name.md b/docs/MatrixOne/Reference/Variable/system-variables/lower_case_tables_name.md index 75174c4b1..9c3116fad 100644 --- a/docs/MatrixOne/Reference/Variable/system-variables/lower_case_tables_name.md +++ b/docs/MatrixOne/Reference/Variable/system-variables/lower_case_tables_name.md @@ -1,82 +1,57 @@ -# `lower_case_table_names` support +# lower_case_table_names Case sensitivity support -There are 5 different modes for the MatrixOne case sensitivity, and the case parameter `lower_case_table_names` can be set to 0, 1, 2, 3, or 4. +`lower_case_table_names` is a global variable that controls whether identifier case is significant in MatrixOne. +!!! 
note + Unlike MySQL, MatrixOne supports only **0** and **1** modes for now, and defaults to 1 on both Linux and macOS systems. -## Parameter Explanation - -### Setting Parameter Value to 0 +## View lower_case_table_names -Setting `lower_case_table_names` to 0 stores identifiers as the original strings, and name comparisons are case sensitive. +View `lower_case_table_names` in MatrixOne using the following command: -**Examples** ```sql -set global lower_case_table_names = 0; -create table Tt (Aa int); -insert into Tt values (1), (2), (3); - -mysql> select Aa from Tt; -+------+ -| Aa | -+------+ -| 1 | -| 2 | -| 3 | -+------+ -3 rows in set (0.03 sec) +show variables like "lower_case_table_names"; -- defaults to 1 ``` -### Setting Parameter Value to 1 - -Setting `lower_case_table_names` to 1 stores identifiers as lowercase, and name comparisons are case insensitive. +## Set lower_case_table_names -**Examples** +Set `lower_case_table_names` in MatrixOne with the following command: ```sql -set global lower_case_table_names = 1; -create table Tt (Aa int); -insert into Tt values (1), (2), (3); - -mysql> select Aa from Tt; -+------+ -| aa | -+------+ -| 1 | -| 2 | -| 3 | -+------+ -3 rows in set (0.03 sec) +set global lower_case_table_names = 0; --default is 1, reconnecting to database takes effect ``` -```sql -set global lower_case_table_names = 1; -create table t(a int); -insert into t values(1), (2), (3); +## Explanation of parameters --- Column aliases display the original string when the result set is returned, but name comparisons are case insensitive, as shown in the following example: -mysql> select a as Aa from t; -+------+ -| Aa | -+------+ -| 1 | -| 2 | -| 3 | -+------+ -3 rows in set (0.03 sec) -``` +### Parameter set to 0 -### Setting Parameter Value to 2 - -Setting `lower_case_table_names` to 2 stores identifiers as the original strings, and name comparisons are case insensitive. +Set `lower_case_table_names` to 0. Identifiers are stored as the original strings, and name comparisons are case sensitive. 
**Examples** ```sql -set global lower_case_table_names = 2; +mysql> show variables like "lower_case_table_names";--Check the default parameter, the default value is 1 ++------------------------+-------+ +| Variable_name | Value | ++------------------------+-------+ +| lower_case_table_names | 1 | ++------------------------+-------+ +1 row in set (0.00 sec) + +set global lower_case_table_names = 0;--Reconnecting to the database takes effect + +mysql> show variables like "lower_case_table_names";--Reconnect to the database to view the parameters, the change was successful ++------------------------+-------+ +| Variable_name | Value | ++------------------------+-------+ +| lower_case_table_names | 0 | ++------------------------+-------+ +1 row in set (0.00 sec) + create table Tt (Aa int); -insert into tt values (1), (2), (3); +insert into Tt values (1), (2), (3); -mysql> select AA from tt; +mysql> select Aa from Tt;--Name comparison is case sensitive +------+ | Aa | +------+ @@ -87,69 +62,44 @@ mysql> select AA from tt; 3 rows in set (0.03 sec) ``` -### Setting Parameter Value to 3 +### Parameter set to 1 -Setting `lower_case_table_names` to 3 stores identifiers as uppercase, and name comparisons are case insensitive. +Set `lower_case_table_names` to 1. Identifiers are stored in lowercase, and name comparisons are case insensitive. 
-**Examples** +**Example** ```sql -set global lower_case_table_names = 3; -create table Tt (Aa int); -insert into Tt values (1), (2), (3); +set global lower_case_table_names = 1;--Reconnecting to the database takes effect -mysql> select Aa from Tt; +mysql> show variables like "lower_case_table_names";--Reconnect to the database to view the parameters, the change was successful ++------------------------+-------+ +| Variable_name | Value | ++------------------------+-------+ +| lower_case_table_names | 1 | ++------------------------+-------+ +1 row in set (0.00 sec) + +create table Tt (Aa int,Bb int); +insert into Tt values (1,2), (2,3), (3,4); + +mysql> select Aa from Tt;--Name comparison is case insensitive +------+ -| AA | +| aa | +------+ | 1 | | 2 | | 3 | +------+ 3 rows in set (0.03 sec) -``` - -### Setting Parameter Value to 4 - -Setting `lower_case_table_names` to 4 stores identifiers with `` as the original strings and case sensitive, while others are converted to lowercase. - -## Configuration Parameters - -- To configure globally, insert the following code in the cn.toml configuration file before starting MatrixOne: - -``` -[cn.frontend] -lowerCaseTableNames = "0" // default is 1 -# 0 stores identifiers as the original strings and name comparisons are case sensitive -# 1 stores identifiers as lowercase and name comparisons are case insensitive -# 2 stores identifiers as the original strings and name comparisons are case insensitive -# 3 stores identifiers as uppercase and name comparisons are case insensitive -# 4 stores identifiers with `` as the original strings and case sensitive, while others are converted to lowercase -``` - -When configuring globally, each cn needs to be configured if multiple cns are started. For configuration file parameter instructions, see[Boot Parameters for standalone installation](../../System-Parameters/system-parameter.md). - -!!! note - Currently, you can only set the parameter to 0 or 1. 
However, the parameter 2,3 or 4 is not supported. - -- To enable saving query results only for the current session: - -```sql -set global lower_case_table_names = 1; -``` - -When creating a database, MatrixOne automatically obtains the value of `lower_case_table_names` as the default value for initializing the database configuration. - -## Features that are different from MySQL - -MatrixOne lower_case_table_names is set to 1 by default and only supports setting the value to 0 or 1. - -The default value in MySQL: - -- On Linux: 0. Table and database names are stored on disk using the letter case specified in the CREATE TABLE or CREATE DATABASE statement. Name comparisons are case-sensitive. -- On Windows: 1. It means that table names are stored in lowercase on disk, and name comparisons are not case-sensitive. MySQL converts all table names to lowercase on storage and lookup. This behavior also applies to database names and table aliases. -- On macOS: 2. Table and database names are stored on disk using the letter case specified in the CREATE TABLE or CREATE DATABASE statement, but MySQL converts them to lowercase on lookup. Name comparisons are not case-sensitive. - -## **Constraints** -MatrixOne system variable `lower_case_table_names` does not currently support setting values 2, 3, or 4. 
+-- The alias of a column displays the original string when the result set is returned, but the name comparison is case insensitive, as shown in the following example: +mysql> select Aa as AA,Bb from Tt; ++------+------+ +| AA | bb | ++------+------+ +| 1 | 2 | +| 2 | 3 | +| 3 | 4 | ++------+------+ +3 rows in set (0.00 sec) +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Variable/system-variables/save_query_result.md b/docs/MatrixOne/Reference/Variable/system-variables/save_query_result.md index ef437ed55..b78b16ce4 100644 --- a/docs/MatrixOne/Reference/Variable/system-variables/save_query_result.md +++ b/docs/MatrixOne/Reference/Variable/system-variables/save_query_result.md @@ -1,100 +1,107 @@ -# save_query_result Support +# save_query_result: Save Query Result Support -After enabling `save_query_result`, MatrixOne will save the query results. +When `save_query_result` is turned on, MatrixOne saves the query results. -Three parameters affect the saving of query results: +Three parameters affect how query results are saved: -- `save_query_result`: enables/disables the saving of query results. -- `query_result_timeout`: sets the time for saving query results. -- `query_result_maxsize`: sets the maximum size of a single query result. +- `save_query_result`: Turns saving of query results on or off. -## Enable `save_query_result` +- `query_result_timeout`: Sets how long query results are kept. -Enable saving query results for the current session only: +- `query_result_maxsize`: Sets the maximum size of a single query result. + +## Limitations + +- Only statements that return results, such as `SELECT`, `SHOW`, `DESC`, and `EXECUTE`, are supported for saving. +- For `SELECT` statements, only the results of `SELECT` statements that begin with the `/* cloud_user */` or `/* save_result */` hint are saved. 
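The hint requirement in the limitations above can be illustrated with a short sketch (a minimal illustration; the table and column names here are hypothetical, and `save_query_result` is assumed to be already enabled):

```sql
create table t_demo (a int);
insert into t_demo values (1);

/* save_result */select a from t_demo;  -- hinted SELECT: result is saved and retrievable via result_scan
select a from t_demo;                   -- plain SELECT: result is NOT saved
```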
+ +## Enable saving query results + +- Enable saving query results for the current session only: ```sql --- The default is off -set global save_query_result = on -``` + -- defaults to off + set save_query_result = on + ``` -- If you need to enable it globally, you can modify the configuration file `cn.toml` before starting MatrixOne, insert the following code, and save it: +- Enable saving query results globally: -``` -[cn.frontend] -saveQueryResult = "on" // The default is off +```sql +-- defaults to off +set global save_query_result = on ``` -## Set the saving time +## Set the save time -Set the save time unit to hours. +Set the save time, in hours. -- Enable `query_result_timeout` only for the current session: +- Set the query result save time for the current session only: ```sql --- The default is 24 -set global query_result_timeout = 48 +-- defaults to 24 +set query_result_timeout = 48 ``` -- If you need to enable it globally, you can modify the configuration file `cn.toml` before starting MatrixOne, insert the following code, and save it: +- Set the query result save time globally: -``` -[cn.frontend] -queryResultTimeout = 48 //The default is 24 +```sql +-- defaults to 24 +set global query_result_timeout = 48 ``` -__Note:__ If the save time is set to a shorter value than the previous one, it will not affect the previous save results. +__Note:__ If the save time is set to a shorter value than the previous setting, it does not affect previously saved results. -## Set the maximum value of a single query result +## Set the maximum size of a single query result -Set the maximum unit of a single query result to MB. +Set the maximum size of a single query result, in MB. 
-- Set the maximum value of query results for the current session only: +- Set the maximum size of query results for the current session only: ```sql --- The default is 100 -set global query_result_maxsize = 200 +-- defaults to 100 +set query_result_maxsize = 200 ``` -- If you need to enable it globally, you can modify the configuration file `cn.toml` before starting MatrixOne, insert the following code, and save it: +- Set the maximum size of query results globally: -``` -[cn.frontend] -queryResultMaxsize = 200 // The default is 100 +```sql +-- defaults to 100 +set global query_result_maxsize = 200 ``` -__Note:__ If the maximum value of a single query result is set smaller than the previous setting, it will not affect the size of the last saved results. +__Note:__ If the maximum size of a single query result is set smaller than the previous setting, it does not affect the size of previously saved results. ### Query metadata information -You can use the following SQL statement to query metadata information: +You can query metadata information using the following SQL statement: ```sql select * from meta_scan(query_id) as u; -当前 account_id +-- current account_id select query_id from meta_scan(query_id) as u; ``` The metadata information is as follows: -| Column Name | Type | Remarks | -| ------------ | --------- | -------------------------- ------------------------------------- | -| query_id | uuid | query result ID | -| statement | text | SQL statement executed | -| account_id | uint32 | account ID | -| role_id | uint32 | role ID | -| result_path | text | The path to save the query results, the default is the `mo-data/s3` path of the matrixone folder, if you want to modify the default path, you need to modify `data-dir = "mo-data/s3"` in the configuration file . 
For a description of configuration file parameters, see [Common Parameter Configuration](../../System-Parameters/system-parameter.md) | -| created_time | timestamp | creation time | -| result_size | float | Result size in MB. | -| tables | text | tables used by SQL | -| user_id | uint32 | user ID | -| expired_time | timestamp | timeout of query result| -| column_map | text | If the query has a column result name with the same name, the result scan will remap the column name | +| column | type | comments | +| ------------ | --------- | ------------------------------------------------------------ | +| query_id | uuid | Query result ID | +| statement | text | SQL statement executed | +| account_id | uint32 | Account ID | +| role_id | uint32 | Role ID | +| result_path | text | The path where query results are saved; the default is the `mo-data/s3` path of the matrixone folder. To change the default path, change `data-dir = "mo-data/s3"` in the configuration file. For a description of the configuration file parameters, see [General Parameter Configuration](../../System-Parameters/system-parameter.md) | +| created_time | timestamp | Creation time | +| result_size | float | Result size in MB. | +| tables | text | Tables used in SQL | +| user_id | uint32 | User ID | +| expired_time | timestamp | Timeout for query results| +| column_map | text | If a query has columns with the same name, result scan remaps the columns. | ## Save query results -You can store query results locally or in S3. +You can save the query results on your local disk or S3. -### Syntax +### Syntax structure ```sql MODUMP QUERY_RESULT query_id INTO s3_path @@ -105,9 +112,9 @@ MODUMP QUERY_RESULT query_id INTO s3_path [MAX_FILE_SIZE unsigned_number] ``` -- query_id: A string of UUID. +- query_id: a UUID string. -- s3_path: the path where the query result file is saved. 
The default is the mo-data/s3 path in the matrixone folder. If you need to modify the default path, you must modify `data-dir = "mo-data/s3"` in the configuration file. For more information about configuration file parameters, see [Common Parameter Configuration](../../System-Parameters/system-parameter.md) +- s3_path: the path where the query result file is saved. The default save path is the `mo-data/s3` path of the matrixone folder. To modify the default save path, modify `data-dir = "mo-data/s3"` in the configuration file. For a description of the configuration file parameters, see [General Parameters Configuration](../../System-Parameters/system-parameter.md) ``` root@rootMacBook-Pro 02matrixone % cd matrixone/mo-data % ls tn-data etl local logservice-data s3 ``` - __Note:__ If you need to export the `csv` file. The path needs to start with `etl:`. + __Note:__ If you need to export a `csv` file, the path needs to start with `etl:`. -- [FIELDS TERMINATED BY 'char']: optional parameter. Field delimiter, the default is single quote `'`. +- [FIELDS TERMINATED BY 'char']: Optional parameter. Field delimiter, defaults to single quote `'`. -- [ENCLOSED BY 'char']: optional parameter. Fields include symbols, which default to double quotes `"`. +- [ENCLOSED BY 'char']: Optional parameter. Field enclosing symbol, defaults to double quote `"`. -- [LINES TERMINATED BY 'string']: optional parameter. The end of line symbol, the default is the newline symbol `\n`. +- [LINES TERMINATED BY 'string']: Optional parameter. End-of-line symbol, defaults to the newline symbol `\n`. -The first row of the `csv` file is a header row for each column name.- [header 'bool']: optional parameter. The bool type can choose `true` or `false`. +- [header 'bool']: Optional parameter. The bool type can be `true` or `false`. The first line of the `csv` file is a header row with each column name. 
-- [MAX_FILE_SIZE unsigned_number]: optional parameter. The maximum file size of the file is in KB. The default is 0. +- [MAX_FILE_SIZE unsigned_number]: Optional parameter. Maximum file size of the file in KB. The default is 0. -## Example +## Examples + +- Example 1 ```sql --- Enable save_query_result mysql> set global save_query_result = on; --- Set the saving time to 24 hours mysql> set global query_result_timeout = 24; --- Set the maximum value of a single query result to 100M mysql> set global query_result_maxsize = 200; --- Create a table and insert datas mysql> create table t1 (a int); mysql> insert into t1 values(1); --- You can check the table structure to confirm that the inserted data is correct -mysql> select a from t1; +mysql> /* cloud_user */select a from t1; +------+ | a | +------+ | 1 | +------+ 1 row in set (0.16 sec) --- Query the most recently executed query ID in the current session +-- Queries the ID of the most recently executed query in the current session mysql> select last_query_id(); +--------------------------------------+ | last_query_id() | +--------------------------------------+ -| c187873e-c25d-11ed-aa5a-acde48001122 | +| f005ebc6-a3dc-11ee-bb76-26dd28356ef3 | +--------------------------------------+ 1 row in set (0.12 sec) --- Get the query results for this query ID -mysql> select * from result_scan('c187873e-c25d-11ed-aa5a-acde48001122') as t; +-- Get results for this query ID +mysql> select * from result_scan('f005ebc6-a3dc-11ee-bb76-26dd28356ef3') as t; +------+ | a | +------+ | 1 | +------+ 1 row in set (0.01 sec) --- Check the metadata for this query ID -mysql> select * from meta_scan('c187873e-c25d-11ed-aa5a-acde48001122') as t; +-- View metadata for this query ID +mysql> select * from meta_scan('f005ebc6-a3dc-11ee-bb76-26dd28356ef3') as t; 
+--------------------------------------+------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+-----------+ | query_id | statement | account_id | role_id | result_path | create_time | result_size | tables | user_id | expired_time | ColumnMap | +--------------------------------------+------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+-----------+ -| c187873e-c25d-11ed-aa5a-acde48001122 | select a from t1 | 0 | 0 | SHARED:/query_result/sys_c187873e-c25d-11ed-aa5a-acde48001122_1.blk | 2023-03-14 19:45:45 | 0.000003814697265625 | t1 | 1 | 2023-03-15 19:45:45 | t1.a -> a | +| f005ebc6-a3dc-11ee-bb76-26dd28356ef3 | select a from t1 | 0 | 0 | SHARED:/query_result/sys_f005ebc6-a3dc-11ee-bb76-26dd28356ef3_1.blk | 2023-12-26 18:53:01 | 0.000003814697265625 | t1 | 0 | 2023-12-27 18:53:01 | a -> a | +--------------------------------------+------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+-----------+ -1 row in set (0.00 sec) +1 row in set (0.01 sec) -- Save query results locally -MODUMP QUERY_RESULT c187873e-c25d-11ed-aa5a-acde48001122 INTO 'etl:your_local_path'; +MODUMP QUERY_RESULT 'f005ebc6-a3dc-11ee-bb76-26dd28356ef3' INTO 'etl:your_local_path'; +``` + +- Example 2 + +```sql +mysql> set global save_query_result = on; +mysql> set global query_result_timeout = 24; +mysql> set global query_result_maxsize = 200; +mysql> create table t1 (a int); +mysql> insert into t1 values(1); +mysql> /* save_result */select a from t1; ++------+ +| a | ++------+ +| 1 | ++------+ +1 row in set (0.02 sec) + +mysql> select last_query_id(); 
++--------------------------------------+ +| last_query_id() | ++--------------------------------------+ +| afc82394-a45e-11ee-bb9a-26dd28356ef3 | ++--------------------------------------+ +1 row in set (0.00 sec) + +mysql> select * from result_scan('afc82394-a45e-11ee-bb9a-26dd28356ef3') as t; ++------+ +| a | ++------+ +| 1 | ++------+ +1 row in set (0.01 sec) + +mysql> select * from meta_scan('afc82394-a45e-11ee-bb9a-26dd28356ef3') as t; ++--------------------------------------+------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+-----------+ +| query_id | statement | account_id | role_id | result_path | create_time | result_size | tables | user_id | expired_time | ColumnMap | ++--------------------------------------+------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+-----------+ +| afc82394-a45e-11ee-bb9a-26dd28356ef3 | select a from t1 | 0 | 0 | SHARED:/query_result/sys_afc82394-a45e-11ee-bb9a-26dd28356ef3_1.blk | 2023-12-27 10:21:47 | 0.000003814697265625 | t1 | 0 | 2023-12-28 10:21:47 | a -> a | ++--------------------------------------+------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+-----------+ +1 row in set (0.00 sec) +``` + +- Example 3 + +```sql +mysql> set global save_query_result = on; +mysql> set global query_result_timeout = 24; +mysql> set global query_result_maxsize = 200; +mysql> create table t1 (a int); +mysql> insert into t1 values(1); +mysql> show create table t1; ++-------+--------------------------------------------+ +| Table | Create Table | ++-------+--------------------------------------------+ +| t1 | 
CREATE TABLE `t1` ( +`a` INT DEFAULT NULL +) | ++-------+--------------------------------------------+ +1 row in set (0.02 sec) + +mysql> select * from meta_scan(last_query_id()) as t; ++--------------------------------------+----------------------+------------+---------+---------------------------------------------------------------------+---------------------+-----------------------+--------+---------+---------------------+----------------------------------------------+ +| query_id | statement | account_id | role_id | result_path | create_time | result_size | tables | user_id | expired_time | ColumnMap | ++--------------------------------------+----------------------+------------+---------+---------------------------------------------------------------------+---------------------+-----------------------+--------+---------+---------------------+----------------------------------------------+ +| 617647f4-a45c-11ee-bb97-26dd28356ef3 | show create table t1 | 0 | 0 | SHARED:/query_result/sys_617647f4-a45c-11ee-bb97-26dd28356ef3_1.blk | 2023-12-27 10:05:17 | 0.0000858306884765625 | | 0 | 2023-12-28 10:05:17 | Table -> Table, Create Table -> Create Table | ++--------------------------------------+----------------------+------------+---------+---------------------------------------------------------------------+---------------------+-----------------------+--------+---------+---------------------+----------------------------------------------+ +1 row in set (0.00 sec) ``` -## Constraints +- Example 4 + +```sql +mysql> set global save_query_result = on; +mysql> set global query_result_timeout = 24; +mysql> set global query_result_maxsize = 200; +mysql> create table t1 (a int); +mysql> insert into t1 values(1); +mysql> desc t1; ++-------+---------+------+------+---------+-------+---------+ +| Field | Type | Null | Key | Default | Extra | Comment | ++-------+---------+------+------+---------+-------+---------+ +| a | INT(32) | YES | | NULL | | | 
++-------+---------+------+------+---------+-------+---------+ +1 row in set (0.03 sec) + +mysql> select * from meta_scan(last_query_id()) as t; ++--------------------------------------+-----------+------------+---------+---------------------------------------------------------------------+---------------------+---------------------+------------+---------+---------------------+----------------------------------------------------------------------------------------------------------------+ +| query_id | statement | account_id | role_id | result_path | create_time | result_size | tables | user_id | expired_time | ColumnMap | ++--------------------------------------+-----------+------------+---------+---------------------------------------------------------------------+---------------------+---------------------+------------+---------+---------------------+----------------------------------------------------------------------------------------------------------------+ +| 143a54b6-a45d-11ee-bb97-26dd28356ef3 | desc t1 | 0 | 0 | SHARED:/query_result/sys_143a54b6-a45d-11ee-bb97-26dd28356ef3_1.blk | 2023-12-27 10:10:17 | 0.00016021728515625 | mo_columns | 0 | 2023-12-28 10:10:17 | Field -> Field, Type -> Type, Null -> Null, Key -> Key, Default -> Default, Extra -> Extra, Comment -> Comment | ++--------------------------------------+-----------+------------+---------+---------------------------------------------------------------------+---------------------+---------------------+------------+---------+---------------------+----------------------------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) +``` + +- Example 5 + +```sql +mysql> CREATE TABLE numbers(pk INTEGER PRIMARY KEY, ui BIGINT UNSIGNED, si BIGINT); +Query OK, 0 rows affected (0.02 sec) + +mysql> INSERT INTO numbers VALUES (0, 0, -9223372036854775808), (1, 18446744073709551615, 9223372036854775807); +Query OK, 2 rows affected (0.01 sec) + +mysql> SET 
@si_min = -9223372036854775808; +Query OK, 0 rows affected (0.00 sec) + +mysql> PREPARE s2 FROM 'SELECT * FROM numbers WHERE si=?'; +Query OK, 0 rows affected (0.01 sec) + +mysql> EXECUTE s2 USING @si_min; ++------+------+----------------------+ +| pk | ui | si | ++------+------+----------------------+ +| 0 | 0 | -9223372036854775808 | ++------+------+----------------------+ +1 row in set (0.02 sec) + +mysql> select * from meta_scan(last_query_id()) as t; ++--------------------------------------+---------------------------------------------------------------------------------------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+------------------------------+ +| query_id | statement | account_id | role_id | result_path | create_time | result_size | tables | user_id | expired_time | ColumnMap | ++--------------------------------------+---------------------------------------------------------------------------------------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+------------------------------+ +| e83b8df2-a45d-11ee-bb98-26dd28356ef3 | EXECUTE s2 USING @si_min // SELECT * FROM numbers WHERE si=? 
; SET @si_min = -9223372036854775808 | 0 | 0 | SHARED:/query_result/sys_e83b8df2-a45d-11ee-bb98-26dd28356ef3_1.blk | 2023-12-27 10:16:13 | 0.000019073486328125 | | 0 | 2023-12-28 10:16:13 | pk -> pk, ui -> ui, si -> si | ++--------------------------------------+---------------------------------------------------------------------------------------------------+------------+---------+---------------------------------------------------------------------+---------------------+----------------------+--------+---------+---------------------+------------------------------+ +1 row in set (0.00 sec) +``` + +- Example 6 + +```sql +mysql> set global save_query_result = on; +mysql> set global query_result_timeout = 24; +mysql> set global query_result_maxsize = 200; +mysql> create table t1 (a int); +mysql> insert into t1 values(1); +mysql> select * from t1; ++------+ +| a | ++------+ +| 1 | ++------+ +1 row in set (0.00 sec) -MatrixOne only supports on saving the query results of `SELECT` and `SHOW`. +mysql> select * from meta_scan(last_query_id()) as t; +ERROR 20405 (HY000): file query_result_meta/sys_c16859e4-a462-11ee-bba0-26dd28356ef3.blk is not found +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/Variable/system-variables/sql-mode.md b/docs/MatrixOne/Reference/Variable/system-variables/sql-mode.md index 5ac94e49a..11caac45a 100644 --- a/docs/MatrixOne/Reference/Variable/system-variables/sql-mode.md +++ b/docs/MatrixOne/Reference/Variable/system-variables/sql-mode.md @@ -1,202 +1,72 @@ # SQL Mode -`sql_mode` is a system parameter in MatrixOne, which specifies the mode in which MatrixOne executes queries and operations. `sql_mode` can affect the syntax and semantic rules of MatrixOne, thus altering the behavior of MatrixOne's SQL queries. This article will introduce the purpose of `sql_mode`, standard modes, and how to set `sql_mode`. 
- -## Why set `sql_mode` - -`sql_mode` can control the behavior of MatrixOne, including how to handle NULL values, perform insert operations, and sort and compare strings. It can ensure strict compliance with SQL standards and avoid non-standard behavior. In addition, `sql_mode` can help developers better identify errors and potential issues in SQL statements. - -## Default modes of `sql_mode` - -The following are the standard modes of `sql_mode`, which are also the default modes in MatrixOne: - -- `ONLY_FULL_GROUP_BY`: The `GROUP BY` clause is used to group query results and perform aggregate calculations on each group, such as `COUNT`, `SUM`, `AVG`, etc. In the `GROUP BY` clause, the specified columns are the grouping columns. Other columns can be identified in the `SELECT` list, including aggregate or non-aggregate function columns. Without the `ONLY_FULL_GROUP_BY` mode, if a non-aggregate function column is set in the `SELECT` list, MatrixOne will select any value that matches the `GROUP BY` column use it to calculate the aggregate function by default. +`sql_mode` is a system parameter in MatrixOne that specifies the mode in which MatrixOne executes queries and operations. `sql_mode` affects the syntax and semantic rules of MatrixOne, changing the behavior of MatrixOne's SQL queries. This article introduces what `sql_mode` does and how to set it. !!! note 
The purpose of this mode is to avoid inserting invalid or illegal values into a date or datetime field and require the use of a valid date or datetime values. If such an operation is performed, an error will be reported. It should be noted that the `NO_ZERO_DATE` mode is only effective for insert or update operations. For existing `0000-00-00` values, they can still be queried and used. - -- `ERROR_FOR_DIVISION_BY_ZERO`: This mode throws an error when dividing by zero. - -- `NO_ENGINE_SUBSTITUTION`: This mode throws an error when executing `ALTER TABLE` or `CREATE TABLE` statements. The purpose of this mode is to force the use of the specified storage engine, preventing data inconsistencies or performance issues. The specified storage engine is unavailable or does not exist instead of automatically substituting it with another available storage engine. If automatic substitution of storage engines is desired, this mode can be removed from the sql_mode or set to other supported sql_mode methods. It is important to note that this mode only applies to `ALTER TABLE` or `CREATE TABLE` statements and does not affect the storage engine of existing tables. - -## Optional modes for sql_mode - -- `ANSI`: ANSI is a standard SQL language specification developed by `ANSI` (American National Standards Institute). In `ANSI` mode, SQL statements must comply with the `ANSI` SQL standard, which means that specific SQL language extensions or features specific to a particular database cannot be used. - -- `ALLOW_INVALID_DATES`: `ALLOW_INVALID_DATES`, also known as "loose mode" in MatrixOne SQL mode, allows the insertion of invalid dates in standard date format, such as '0000-00-00' or '2000-00-00'. This mode exists to be compatible with some earlier versions of MySQL and non-standard date formats. It is important to note that inserting invalid dates in `ALLOW_INVALID_DATES` mode can cause unexpected behavior, as invalid dates will not be handled appropriately. 
Therefore, it is always recommended to use the standard date format. - -- `ANSI_QUOTES`: `ANSI_QUOTES` is a strict mode in SQL mode, used to enforce SQL standards more strictly. In `ANSI_QUOTES` mode, MatrixOne treats double quotes as identifier quotes instead of string quotes. If you want to use double quotes to quote an identifier such as a table name or column name, you must use double quotes instead of single quotes. For example, the following SQL statement is correct in `ANSI_QUOTES` mode: - - ```sql - SELECT "column_name" FROM "table_name"; - ``` - - In the default SQL mode, double quotes will be interpreted as string quotes, resulting in incorrect syntax. Therefore, to use double quotes to quote identifiers, you must set MatrixOne to the `ANSI_QUOTES` mode. - - It should be noted that using the `ANSI_QUOTES` mode may cause SQL syntax incompatibility with other database systems because most other database systems use double quotes as string quotes rather than identifier quotes. Therefore, `ANSI_QUOTES` mode should be used cautiously when writing portable SQL statements. - -- `HIGH_NOT_PRECEDENCE`: `HIGH_NOT_PRECEDENCE` is called the "high-priority `NOT` operator" mode in MatrixOne SQL mode. In `HIGH_NOT_PRECEDENCE` mode, MatrixOne treats the `NOT` operator as a high-priority operator, meaning its priority is higher than most other operators. This means that if you use both the `NOT` operator and other operators in an SQL statement, MatrixOne will first calculate the result of the `NOT` operator and then calculate the results of the other operators. For example: - - ```sql - SELECT * FROM table WHERE NOT column = 1 AND column2 = 'value'; - ``` - - In `HIGH_NOT_PRECEDENCE` mode, MatrixOne will first calculate the result of `NOT` column = 1, and then calculate the result of column2 = 'value'. If the `NOT` operator is not correctly placed in the statement, it may result in unexpected results. 
- - It should be noted that in MatrixOne's default SQL mode, the `NOT` operator has the same priority as other operators. If you need to use the `HIGH_NOT_PRECEDENCE` mode, make sure to use parentheses in your SQL statements to clarify the priority. - -- `IGNORE_SPACE`: `IGNORE_SPACE` is referred to as the "ignore space" mode in MatrixOne SQL mode. In `IGNORE_SPACE` mode, MatrixOne ignores multiple spaces or tabs in an SQL statement and only considers one space or tab as a delimiter. This means that the following two SQL statements are equivalent in `IGNORE_SPACE` mode: - - ```sql - SELECT * FROM my_table; - SELECT*FROM my_table; - ``` - - The purpose of this mode is to make SQL statements more flexible and readable by allowing any number of spaces or tabs between keywords. However, it should be noted that in some cases, this mode may cause unexpected behavior, such as syntax errors when spaces or tabs are incorrectly placed in SQL functions or column names. - - By default, MatrixOne does not enable the `IGNORE_SPACE` mode. To enable this mode, you can use the SQL command `SET sql_mode='IGNORE_SPACE'` when connecting to MatrixOne. - -- `NO_AUTO_VALUE_ON_ZERO`: `NO_AUTO_VALUE_ON_ZERO` is called the "no auto value on zero" mode in MatrixOne SQL mode. In `NO_AUTO_VALUE_ON_ZERO` mode, when you insert a value of 0 into an auto-increment column, MatrixOne does not treat it as an auto-increment value but as a regular 0 value. This means that if you insert a value of 0 into an auto-increment column, the value of that column will not be automatically incremented but will remain 0 in `NO_AUTO_VALUE_ON_ZERO` mode. 
For example, the following SQL statement will not auto-increment the id column in `NO_AUTO_VALUE_ON_ZERO` mode: - - ```sql - CREATE TABLE my_table ( - id INT(11) NOT NULL AUTO_INCREMENT, - name VARCHAR(255) NOT NULL, - PRIMARY KEY (id) - ); - - INSERT INTO my_table (id, name) VALUES (0, 'John'); - ``` - - In the default SQL mode, when you insert a value of 0 into an auto-increment column, MatrixOne treats it as an auto-increment value and automatically increases it to the next available one. However, this may not be the desired behavior in some cases, so you can use the `NO_AUTO_VALUE_ON_ZERO` mode to disable it. - - If you use the `NO_AUTO_VALUE_ON_ZERO` mode, inserting data with a value of 0 may cause primary key duplicates or unique vital conflicts. So, extra attention is needed when you insert data. - -- `NO_BACKSLASH_ESCAPES`: `NO_BACKSLASH_ESCAPES` is also known as "no backslash escapes" mode in MatrixOne SQL mode. In `NO_BACKSLASH_ESCAPES` mode, MatrixOne does not treat the backslash as an escape character. This means that you cannot use the backslash to escape special characters, such as quotes or percent signs, in SQL statements. Instead, if you need to use these special characters in SQL statements, you must use other methods to escape them, such as using single quotes to represent double quotes in strings. For example, the following SQL statement will cause a syntax error in `NO_BACKSLASH_ESCAPES` mode: - - ```sql - SELECT 'It's a nice day' FROM my_table; - ``` - - In the default SQL mode, MatrixOne allows backslashes to escape special characters, so backslashes can be used in SQL statements to run characters such as quotes and percent signs. However, in some cases, using backslash escapes may result in confusion or incorrect results, so the `NO_BACKSLASH_ESCAPES` mode can be used to prohibit this behavior. 
- - If you use the `NO_BACKSLASH_ESCAPES` mode, you must use other ways to escape special characters, which may make SQL statements more complex and difficult to understand. Therefore, it's necessary to consider when using this mode carefully. + MatrixOne currently supports only the `ONLY_FULL_GROUP_BY` mode. Other modes are syntax-only. `ONLY_FULL_GROUP_BY` is used to control the behavior of the GROUP BY statement. When `ONLY_FULL_GROUP_BY` mode is enabled, MatrixOne requires that the columns in the GROUP BY clause in the SELECT statement must be aggregate functions (such as SUM, COUNT, etc.) or columns that appear in the GROUP BY clause. If there are columns in the SELECT statement that do not meet this requirement, an error will be thrown. If your table structure is complex, you can choose to turn `ONLY_FULL_GROUP_BY` mode off for easy querying. -- `NO_DIR_IN_CREATE`: known as "no directory in create" mode in MatrixOne SQL mode, prohibits directory paths in `CREATE TABLE` statements. In the `NO_DIR_IN_CREATE` mode, MatrixOne will report an error when a directory path is used in the column definition of a `CREATE TABLE` statement, which includes a way that contains a file name. For example: +## View sql_mode - ```sql - CREATE TABLE my_table ( - id INT(11) NOT NULL AUTO_INCREMENT, - name VARCHAR(255) NOT NULL, - datafile '/var/lib/MatrixOne/my_table_data.dat', - PRIMARY KEY (id) - ); - ``` - - In the SQL statement above, the datafile column defines a path containing a file name, specifying the file storing table data. 
In the `NO_DIR_IN_CREATE` mode, MatrixOne does not allow the use of such directory paths in `CREATE TABLE` statements and requires that the file path and file name be defined separately, for example: - - ```sql - CREATE TABLE my_table ( - id INT(11) NOT NULL AUTO_INCREMENT, - name VARCHAR(255) NOT NULL, - datafile VARCHAR(255) NOT NULL, - PRIMARY KEY (id) - ) DATA DIRECTORY '/var/lib/MatrixOne/' INDEX DIRECTORY '/var/lib/MatrixOne/'; - ``` - - The data file column in the SQL statement above only defines the file name. In contrast, the file path is defined separately in the `DATA DIRECTORY` and `INDEX DIRECTORY` clauses of the `CREATE TABLE` statement. - - It should be noted that the `NO_DIR_IN_CREATE` mode does not affect column definitions in already created tables but only affects column definitions in `CREATE TABLE` statements. Therefore, when using this mode, you'll need careful consideration to ensure your SQL statements meet its requirements. - -- `NO_UNSIGNED_SUBTRACTION`: `NO_UNSIGNED_SUBTRACTION` is also also known as "no unsigned subtraction" mode in MatrixOne SQL mode, treats the result of the subtraction of unsigned integers with the subtraction operator (-) as a signed integer instead of an unsigned integer. This means that if the value of the unsigned integer is smaller than the subtrahend, the result will be a negative number instead of an unsigned integer. For example: - - ```sql - SET SQL_MODE = 'NO_UNSIGNED_SUBTRACTION'; - SELECT CAST(1 AS UNSIGNED) - CAST(2 AS UNSIGNED); - ``` - - In the SQL statement above, the `NO_UNSIGNED_SUBTRACTION` mode treats `CAST(1 AS UNSIGNED) - CAST(2 AS UNSIGNED)` as a signed integer operation, so the result is -1 instead of the result of an unsigned integer operation, which is 4294967295. - - It should be noted that the `NO_UNSIGNED_SUBTRACTION` mode only affects unsigned integers that are subtracted using the subtraction operator (-), and other operations that use unsigned integers are not affected. 
If you need to perform many unsigned integer operations in MatrixOne, using appropriate type conversions in your code is recommended to avoid potential errors. - -- `PAD_CHAR_TO_FULL_LENGTH`: `PAD_CHAR_TO_FULL_LENGTH` is called the "pad CHAR to full length" mode in MatrixOne SQL mode. - - In the `PAD_CHAR_TO_FULL_LENGTH` mode, when you define a column of `CHAR` type, MatrixOne pads the column's value with spaces to make its length equal to the length specified for the column. This is because in MatrixOne, a column of `CHAR` type always occupies the defined length when stored, and any shortfall is filled with spaces. However, by default, the character set used by MatrixOne may be a multi-byte character set, so if spaces are used for padding, it may lead to incorrect length calculation. - - In the `PAD_CHAR_TO_FULL_LENGTH` mode, MatrixOne uses the maximum character length of the character set to pad the column of `CHAR` type to ensure that the length it occupies matches the defined size. This can avoid the problem of length calculation errors when using multi-byte character sets, but it also increases the use of storage space. - - It should be noted that the `PAD_CHAR_TO_FULL_LENGTH` mode only affects columns of `CHAR` type and does not affect columns of other varieties. If you need to use `CHAR` type columns in MatrixOne and correctly calculate the length of column values in a multi-byte character set, you can consider using the `PAD_CHAR_TO_FULL_LENGTH` mode. - -- `PIPES_AS_CONCAT`: `PIPES_AS_CONCAT` is called the "pipes as concatenation" mode in MatrixOne SQL mode. In the `PIPES_AS_CONCAT` mode, MatrixOne treats the vertical bar symbol (|) as a string concatenation rather than a bitwise operator. If you use the standing bar symbol to concatenate two strings, MatrixOne will treat them as one string instead of interpreting them as a binary bit operation. 
- - For example, the following SQL statement will return an error in the default mode because MatrixOne treats the vertical bar symbol as a bitwise operator: - - ```sql - SELECT 'abc' | 'def'; - ``` - - However, if the SQL mode is set to `PIPES_AS_CONCAT`, the above SQL statement will return the string 'abcdef'. - - Note that if your SQL statement contains the vertical bar symbol and it should be treated as a bitwise operator, do not use the `PIPES_AS_CONCAT` mode. Conversely, if you need to treat the vertical bar symbol as a string concatenation operator, use the `PIPES_AS_CONCAT` mode. - -- `REAL_AS_FLOAT`: `REAL_AS_FLOAT` is known as "treat REAL type as FLOAT type" mode in MatrixOne SQL mode. - - In `REAL_AS_FLOAT` mode, MatrixOne treats data of the `REAL` type as data of the `FLOAT` type. This means that MatrixOne uses the storage format of the `FLOAT` type to store data of the `REAL` type, rather than the more precise but also more space-consuming `DOUBLE` type storage format. - - Note that since the storage format of `FLOAT` type data occupies less space than DOUBLE type data, treating data of the `REAL` type as data of the `FLOAT` type can save storage space in some cases. However, doing so will also reduce the precision of the data, as `FLOAT` type data can only provide about 7 significant digits of precision, while `DOUBLE` type data can provide about 15 significant digits of precision. - - If you need to store high-precision floating-point data in MatrixOne, it is recommended not to use the `REAL_AS_FLOAT` mode and use DOUBLE type data to store it. If you do not require high data precision, you may consider using the `REAL_AS_FLOAT` mode to save storage space. - -- `STRICT_ALL_TABLES`: `STRICT_ALL_TABLES` is known as "enable strict mode" mode in MatrixOne SQL mode. 
In `STRICT_ALL_TABLES` mode, MatrixOne enables a series of strict checks to ensure that insert, update, and delete operations comply with constraints such as data types, NULL values, and foreign keys. Specifically, `STRICT_ALL_TABLES` mode performs the following operations: - - a. Rejects illegal values from being inserted into any column. - b. Rejects NULL values from being inserted into non-NULL columns. - c. Rejects values outside the allowed range from being inserted into any column. - d. Rejects strings from being inserted into numeric type columns. - e. Rejects date or time strings from being inserted into non-date or time type columns. - f. Rejects values that exceed the length defined for `CHAR`, `VARCHAR`, and `TEXT` type columns from being inserted. - g. Rejects values with mismatched data types from being inserted into foreign key columns. - - Note that enabling strict mode may cause problems for some old applications as they may assume that MatrixOne does not perform mandatory constraint checks. If you encounter problems when updating or migrating applications, consider disabling strict mode or modifying the application to comply with strict mode requirements. - -- `TIME_TRUNCATE_FRACTIONAL`: `TIME_TRUNCATE_FRACTIONAL` is known as "truncate fractional part of time" mode in MatrixOne SQL mode. In `TIME_TRUNCATE_FRACTIONAL` mode, MatrixOne truncates the fractional part of data of the `TIME`, `DATETIME`, and `TIMESTAMP` types, retaining only the integer part. This means that if you insert time data with a fractional part into a column of the `TIME`,`DATETIME`, or `TIMESTAMP` type, MatrixOne will truncate the fractional part and set it to 0. - - Note that enabling the `TIME_TRUNCATE_FRACTIONAL` mode may cause some loss of data precision, as truncating the fractional part may lose some critical time information. If you need to store and manipulate accurate time data, it is recommended not to use the `TIME_TRUNCATE_FRACTIONAL` mode. 
- -- `TRADITIONAL`: `TRADITIONAL` is a type of schema in the MatrixOne SQL mode, also known as the "traditional" mode. In `TRADITIONAL` mode, MatrixOne enables a series of strict checks to ensure that insert, update, and delete operations conform to SQL standard constraints. Specifically, the `TRADITIONA`L mode performs the following operations: - - a. Enables `STRICT_TRANS_TABLES` and `STRICT_ALL_TABLES` modes. - b. Rejects `INSERT` statements that omit column names to ensure all columns are explicitly assigned values. - c. Rejects inserting values with unclear data types into foreign key columns. - d. Rejects inserting strings into numeric columns. - e. Rejects inserting date or time strings into non-date or time-type columns. - f. Rejects inserting values that exceed the defined length of `CHAR`, `VARCHAR`, and `TEXT` columns. - g. Rejects using non-aggregate columns in the `GROUP BY` clause. - h. Rejects using non-listed non-aggregate columns in the `SELECT` statement. - i. It should be noted that enabling traditional mode may cause issues with some older applications that assume MatrixOne will not perform mandatory constraint checks. If you encounter problems when updating or migrating applications, consider the traditional disabling mode or modifying the applications to comply with traditional mode requirements. 
- -## How to set sql_mode - -The sql_mode can be set using the `SET` statement, for example: +View sql_mode in MatrixOne using the following command: ```sql -SET sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY'; +SELECT @@global.sql_mode; -- global mode +SELECT @@session.sql_mode; -- session mode ``` -The sql_mode can also be set in the configuration file of MatrixOne, for example: +## Set sql_mode + +Set sql_mode in MatrixOne using the following command: ```sql -sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY; +set global sql_mode = 'xxx'; -- global mode, takes effect after reconnecting to the database +set session sql_mode = 'xxx'; -- session mode ``` -In the example settings above, MatrixOne will use the `STRICT_TRANS_TABLES`, `NO_ZERO_IN_DATE`, and `ONLY_FULL_GROUP_BY` modes. +## Examples -## Constraints - -MatrixOne is compatible with MySQL, except the `ONLY_FULL_GROUP_BY` mode; other modes of sql_mode only implement syntax support. +```sql +CREATE TABLE student( +id int, +name char(20), +age int, +nation char(20) +); + +INSERT INTO student values(1,'tom',18,'上海'),(2,'jan',19,'上海'),(3,'jen',20,'北京'),(4,'bob',20,'北京'),(5,'tim',20,'广州'); + +mysql> select * from student group by nation; -- This operation is not supported in `ONLY_FULL_GROUP_BY` mode +ERROR 1149 (HY000): SQL syntax error: column "student.id" must appear in the GROUP BY clause or be used in an aggregate function + +mysql> SET session sql_mode='ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,STRICT_TRANS_TABLES'; -- Turns off ONLY_FULL_GROUP_BY mode for the current session +Query OK, 0 rows affected (0.02 sec) + +mysql> select * from student group by nation; -- The query succeeds because `ONLY_FULL_GROUP_BY` is now off in the current session ++------+------+------+--------+ +| id | name | age | nation | ++------+------+------+--------+ +| 1 | tom | 18 | 上海 | +| 3 | jen | 20 | 北京 | +| 5 | tim | 20 | 广州 | ++------+------+------+--------+ +3 rows in set (0.00 sec) + 
+mysql> SET global sql_mode='ONLY_FULL_GROUP_BY'; -- Set the global ONLY_FULL_GROUP_BY mode on. +Query OK, 0 rows affected (0.02 sec) + +mysql> select * from student group by nation; -- ONLY_FULL_GROUP_BY does not take effect yet, because global mode only takes effect after reconnecting to the database. ++------+------+------+--------+ +| id | name | age | nation | ++------+------+------+--------+ +| 1 | tom | 18 | 上海 | +| 3 | jen | 20 | 北京 | +| 5 | tim | 20 | 广州 | ++------+------+------+--------+ +3 rows in set (0.00 sec) + +mysql> exit -- Exit the current session + +mysql> select * from student group by nation; -- After reconnecting to the database and executing the query, ONLY_FULL_GROUP_BY mode is successfully enabled. +ERROR 1149 (HY000): SQL syntax error: column "student.id" must appear in the GROUP BY clause or be used in an aggregate function +``` \ No newline at end of file diff --git a/docs/MatrixOne/Reference/mo-tools/mo_ctl.md b/docs/MatrixOne/Reference/mo-tools/mo_ctl.md new file mode 100644 index 000000000..ac3150ec9 --- /dev/null +++ b/docs/MatrixOne/Reference/mo-tools/mo_ctl.md @@ -0,0 +1,308 @@ +# mo_ctl Distributed Edition Tool Guide + +`mo_ctl` Distributed Edition is a command-line tool for business users that assists them in deploying MatrixOne distributed clusters, installing related components, and ultimately providing MatrixOne services to users. + +!!! note + mo_ctl Distributed Edition is an efficient database cluster management tool designed for enterprise users. To get a download path for the tool, contact your MatrixOne account manager. 
+ +## Overview of features + +`mo_ctl` has been adapted to the operating systems shown in the following table: + +| Operating System | Version | +| ------------- | ----------| +| Debian | 11 and above | +| Ubuntu | 20.04 and above | +| UOS | 20.0.0 | +| Open EulerOS | 20.3.0 | + +`mo_ctl`'s current list of features is shown in the following table: + +| Command | Features | +| ------------------ | -------------------------- | +| `mo_ctl help` | See a list of statements and functions for the `mo_ctl` tool itself. | +| `mo_ctl precheck` | Check the dependencies needed to install the cluster, e.g. CPU, memory, etc. | +| `mo_ctl install` | Create the cluster, install the appropriate plug-ins, and initialize the matrixone cluster according to the configuration file. | +| `mo_ctl registry` | Operate on the highly available image registry created in the cluster, e.g., add, delete, update, and query images. | +| `mo_ctl node` | Manage nodes in the cluster, add nodes, delete nodes, etc. | +| `mo_ctl matrixone` | Manages matrixone clusters in a cluster, creating, starting, stopping, and deleting them. | +| `mo_ctl s3` | Manage distributed minio in the cluster, check status, expand capacity, and more. | +| `mo_ctl backup` | Perform backups and restores of matrixone clusters in the cluster. | +| `mo_ctl destroy` | Destroy the matrixone service and wipe the cluster. | + +## Get started quickly + +1. Use the command `mo_ctl help` to view the tool guide. + +2. Use the command `mo_ctl precheck` to see if the predependencies are met. + + ``` + mo_ctl precheck --config xxx.yaml + ``` + +3. Deploy the MatrixOne cluster with the command `mo_ctl install`: + + ``` + mo_ctl install --config xxx.yaml + ``` + +4. Use the command `mo_ctl matrixone list` to check the status of MatrixOne. + + ``` + mo_ctl matrixone list --type cluster + ``` + +## Reference Command Guide + +### help + +Use `mo_ctl help` to print the reference guide. 
+ +``` +./mo_ctl help +Install, destroy, and operation matrixone cluster + +Usage: + mo_ctl [command] + +Available Commands: + backup backup matrixone cluster + completion Generate the autocompletion script for the specified shell + destroy destroy k8s cluster and apps on it + help Help about any command + install Install k8s, matrixone, minio, and other apps + matrixone matrixone operation cmd + precheck precheck cluster machine environment before install + registry registry operations + +Flags: + --config string Specify the mo_ctl config file + -d, --debug turn on debug mode + -h, --help help for mo_ctl + --kubeconfig string Path to the kubeconfig file to use (default "/root/.kube/config") + --logfile string Specify the log file + +Use "mo_ctl [command] --help" for more information about a command. +``` + +### precheck + +Use `mo_ctl precheck` to precheck whether the hardware and software environments are suitable for installing MatrixOne. + +``` +./mo_ctl precheck --help +precheck cluster machine environment before install + +Usage: + mo_ctl precheck [flags] + +Flags: + -h, --help help for precheck +``` + +### install + +Use `mo_ctl install` to install k8s, matrixone, minio, and other applications on your computer (machine or virtual machine). You will need to contact your account manager for the download path to the image packages before executing this command. 
+ +- clusterimage.tar: for creating a new cluster with `mo_ctl`; contains the base k8s components and the corresponding matrixone app components +- moappdistro.tar: for existing k8s clusters that need component management with `mo_ctl`; contains matrixone and the corresponding components + +``` +./mo_ctl install --help +Install k8s, matrixone, minio, and other apps + +Usage: + mo_ctl install [flags] + +Flags: + --app only install k8s app + --dry-run dry run + -h, --help help for install +``` + +### destroy + +Use `mo_ctl destroy` to destroy the k8s cluster and the applications on it. + +``` +./mo_ctl destroy --help +destroy k8s cluster and apps on it + +Usage: + mo_ctl destroy [flags] + +Flags: + --configmap get clusterfile from k8s configmap + --dry-run dry run + --force force destroy, no notice + -h, --help help for destroy +``` + +### registry + +Use `mo_ctl registry` to operate on the highly available image registry created in the cluster, for example: add, delete, and list images. + +``` + mo_ctl registry --help +Usage: + mo_ctl registry [flags] + mo_ctl registry [command] + +Aliases: + registry, reg + +Available Commands: + delete delete (image) + list list (image | chart) + push push (image | chart) + +Flags: + -h, --help help for registry + --type string registry type (image | chart) (default "image") +``` + +### backup + + Use `mo_ctl backup` to back up and restore matrixone clusters in the cluster. + +``` + ./mo_ctl backup --help +backup matrixone cluster + +Usage: + mo_ctl backup [flags] + mo_ctl backup [command] + +Available Commands: + list list matrixone cluster backup revison + restore restore backup matrixone cluster + start start backup matrixone cluster + +Flags: + -h, --help help for backup +``` + +- **start** + + 1. First, prepare a yaml file that describes the backup job; in this example it is named backup.yaml. 
+ + ``` + apiVersion: core.matrixorigin.io/v1alpha1 + kind: BackupJob + metadata: + # Specify the name of the job here + name: backupjob + # This specifies the namespace to which the job belongs + # Note: this must be the same namespace as the mo cluster to be backed up + namespace: mocluster1 + spec: + source: + # The name of the mo cluster, available via the mo_ctl matrixone list command. + clusterRef: mocluster-mocluster1 + # Configure the backup storage location, either object storage or local path storage. For details, refer to https://github.com/matrixorigin/matrixone-operator/blob/main/docs/reference/api-reference.md#backupjob + target: + s3: + type: minio + endpoint: http://minio.s3-minio-tenant-test1 + path: mo-test/backup-01 + secretRef: + name: minio + ``` + + 2. Create the backup job with the following command: + + ``` + # An exit code of 0 proves that the backup job was created successfully + sudo ./mo_ctl backup start --values backup.yaml + ``` + + 3. After successful creation, you can wait for the backup to complete with the following command: + + ``` + # The backupjob here is the name defined in step one + sudo kubectl wait --for=condition=ended backupjob --all -A --timeout=5m + ``` + +- **restore** + + 1. Get the name (ID) of the backup job with the following command: + + ``` + sudo ./mo_ctl backup list + ``` + + 2. Prepare a yaml file that describes the restore job; in this example it is named restore.yaml. 
+ + ``` + # In addition to restoreFrom, other fields can be found at https://github.com/matrixorigin/matrixone-operator/blob/main/docs/reference/api-reference.md#matrixonecluster + apiVersion: core.matrixorigin.io/v1alpha1 + kind: MatrixOneCluster + metadata: + name: morestore + namespace: mocluster1 + spec: + # Here you need to fill in the name of the backup job you got in step 1 + restoreFrom: #BackupName + # Here you need to fill in the actual image repository information + imageRepository: sea.hub:5000/matrixorigin/matrixone + version: 1.1.0 + logService: + replicas: 1 + sharedStorage: + # Here you need to fill in the actual object storage information + s3: + type: minio + path: mo-test/backup-01 + endpoint: http://minio.s3-minio-tenant-test1 + secretRef: + name: minio + volume: + size: 10Gi + tn: + replicas: 1 + cacheVolume: + size: 10Gi + cnGroups: + - name: tp + replicas: 1 + cacheVolume: + size: 10Gi + ``` + + 3. Run the restore command: + + ``` + sudo ./mo_ctl backup restore --values restore.yaml + ``` + +### matrixone + +Use `mo_ctl matrixone` to manage matrixone clusters in the cluster: create, start, stop, delete, and more. + +``` +./mo_ctl matrixone --help +Used for matrixone operation cmd + +Usage: + mo_ctl matrixone [flags] + mo_ctl matrixone [command] + +Aliases: + matrixone, mo + +Available Commands: + history history all matrixone (cluster | operator) + list list matrixone (cluster | operator) + remove remove matrixone (cluster) + rollback rollback depoly of matrixone (cluster | operator) + setup setup matrixone (cluster) + start start matrixone (cluster) + stop stop matrixone (cluster) + upgrade upgrade matrixone (cluster | operator) + +Flags: + --dry-run dry run + -h, --help help for matrixone + --name string Specify matrixorigin cluster name + --type string Specify a type (cluster | operator) (default "cluster") +``` diff --git a/docs/MatrixOne/Reference/mo-tools/mo_ctl_standalone.md b/docs/MatrixOne/Reference/mo-tools/mo_ctl_standalone.md 
new file mode 100644 index 000000000..5ceb63e97 --- /dev/null +++ b/docs/MatrixOne/Reference/mo-tools/mo_ctl_standalone.md @@ -0,0 +1,372 @@ +# mo_ctl Standalone Tools Guide + +`mo_ctl` Standalone is a command-line tool that helps you with deployment and installation, start/stop control, and database connectivity for Standalone MatrixOne. + +## Overview of features + +`mo_ctl`'s currently adapted operating systems are shown in the following table: + +| Operating System | Version | +| -------- | -------------------- | +| Debian | 11 and above | +| Ubuntu | 20.04 and above | +| macOS | Monterey 12.3 and above | +|OpenCloudOS| v8.0 / v9.0 | +|Open EulerOS | 20.03 | +|TencentOS Server | v2.4 / v3.1 | +|UOS | V20 | +|KylinOS | V10 | +|KylinSEC | v3.0 | + +`mo_ctl`'s current list of features is shown in the following table. + +| Command | Description | +| -------------------- | ---------------------------------| +| `mo_ctl help` | View a list of statements and functions for the `mo_ctl` tool itself | +| `mo_ctl precheck` | Check the dependencies required for MatrixOne source code installation, which are golang, gcc, git, MySQL Client. | +| `mo_ctl deploy` | Download, install and compile the appropriate version of MatrixOne, the latest stable version is installed by default. 
| `mo_ctl start` | Starting the MatrixOne Service | +| `mo_ctl status` | Check if the MatrixOne service is running | +| `mo_ctl stop` | Stop all MatrixOne service processes | +| `mo_ctl restart` | Restarting the MatrixOne Service | +| `mo_ctl connect` | Calling MySQL Client to Connect to MatrixOne Service | +| `mo_ctl upgrade` | Upgrading/downgrading MatrixOne from the current version to a release or commit id version | +| `mo_ctl set_conf` | Set the various configuration parameters | +| `mo_ctl get_conf` | View currently used parameters | +| `mo_ctl uninstall` | Uninstall MatrixOne from MO_PATH path | +| `mo_ctl watchdog` | Set up a timed task to ensure MatrixOne service availability, checking the status of MatrixOne every minute and automatically restarting the service if it is found to have stopped | +| `mo_ctl sql` | Execute SQL statements, or a text file of SQL statements, directly from the command line. | +| `mo_ctl ddl_convert` | Tool for converting MySQL DDL statements to MatrixOne statements | +| `mo_ctl get_cid` | View the commit id of the MatrixOne source code currently downloaded | +| `mo_ctl get_branch` | View the branch of the MatrixOne repository currently downloaded | +| `mo_ctl pprof` | Used to collect MatrixOne performance analysis data | + +## Install mo_ctl + +Depending on whether you have Internet access, you can choose to install the `mo_ctl` tool online or offline. Note that you must always execute commands as root or with sudo privileges (prefix each command with sudo). Meanwhile, `install.sh` will use the `unzip` command to extract the `mo_ctl` package. Make sure the `unzip` command is installed. 
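Before running `install.sh`, it can help to verify that the required commands are present. The sketch below is illustrative only; the `check_deps` helper is a name chosen here and is not part of `mo_ctl` or `install.sh`:

```bash
#!/bin/sh
# check_deps: report any of the given commands that are not on PATH.
# Illustrative pre-flight helper, not part of the mo_ctl tooling.
check_deps() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing dependency: $cmd"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all dependencies found"
  fi
  return "$missing"
}

# wget fetches install.sh; unzip extracts the mo_ctl package.
check_deps wget unzip || echo "install the missing tools first"
```

Running the check before `install.sh` avoids a half-finished installation caused by a missing `unzip`.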
+ +### Install online + +``` +wget https://raw.githubusercontent.com/matrixorigin/mo_ctl_standalone/main/install.sh && sudo bash +x ./install.sh + +# Alternate address +wget https://ghproxy.com/https://github.com/matrixorigin/mo_ctl_standalone/blob/main/install.sh && sudo bash +x install.sh +``` + +For users running this command in a macOS environment, if you are a non-root user, run `install.sh` with the following statement: + +``` +sudo -u $(whoami) bash +x ./install.sh +``` + +### Offline installation + +``` +# 1. Download the installation script to your local computer before uploading it to the installation machine +wget https://raw.githubusercontent.com/matrixorigin/mo_ctl_standalone/main/install.sh +wget https://github.com/matrixorigin/mo_ctl_standalone/archive/refs/heads/main.zip -O mo_ctl.zip + +# If the original github address downloads too slowly, you can try downloading from the following mirror address: +wget https://mirror.ghproxy.com/https://github.com/matrixorigin/mo_ctl_standalone/blob/main/install.sh +wget https://githubfast.com/matrixorigin/mo_ctl_standalone/archive/refs/heads/main.zip -O mo_ctl.zip + +# 2. Install from offline package +bash +x ./install.sh mo_ctl.zip +``` + +## Get started quickly + +You can quickly install the standalone edition of MatrixOne by following the steps below; for a detailed guide, see [Standalone Deployment MatrixOne](../Get-Started/install-standalone-matrixone.md). + +1. Use the command `mo_ctl help` to view the tool guide. + +2. Use the command `mo_ctl precheck` to see if the predependencies are met. + +3. Use the command `mo_ctl set_conf` to set the relevant parameters, which may be configured as follows: + + ``` + mo_ctl set_conf MO_PATH="/data/mo/matrixone" # Set a custom MatrixOne download path + mo_ctl set_conf MO_GIT_URL="https://githubfast.com/matrixorigin/matrixone.git" # Set a mirror download address if the original github address is slow + ``` + +4. 
Install and deploy the latest stable version of MatrixOne using the command `mo_ctl deploy`. + +5. Start the MatrixOne service with the command `mo_ctl start`. + +6. Use the command `mo_ctl connect` to connect to the MatrixOne service. + +## Reference Command Guide + +### help - Print Reference Guide + +``` +mo_ctl help +Usage : mo_ctl [option_1] [option_2] + + [option_1] : available: connect | ddl_connect | deploy | get_branch | get_cid | get_conf | help | pprof | precheck | query | restart | set_conf | sql | start | status | stop | uninstall | upgrade | watchdog + 1) connect : connect to mo via mysql client using connection info configured + 2) ddl_convert : convert ddl file to mo format from other types of database + 3) deploy : deploy mo onto the path configured + 4) get_branch : upgrade or downgrade mo from current version to a target commit id or stable version + 5) get_cid : print mo git commit id from the path configured + 6) get_conf : get configurations + 7) help : print help information + 8) pprof : collect pprof information + 9) precheck : check pre-requisites for mo_ctl + 10) restart : a combination operation of stop and start + 11) set_conf : set configurations + 12) sql : execute sql from string, or a file or a path containg multiple files + 13) start : start mo-service from the path configured + 14) status : check if there's any mo process running on this machine + 15) stop : stop all mo-service processes found on this machine + 16) uninstall : uninstall mo from path MO_PATH=/data/mo/20230712_1228//matrixone + 17) upgrade : upgrade or downgrade mo from current version to a target commit id or stable version + 18) watchdog : setup a watchdog crontab task for mo-service to keep it alive + e.g. : mo_ctl status + + [option_2] : Use " mo_ctl [option_1] help " to get more info + e.g. : mo_ctl deploy help +``` + +Use `mo_ctl [option_1] help` to get a guide to the next level of `mo_ctl [option_1]` functionality. 
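Among the subcommands listed above, `watchdog` (item 18) sets up a crontab task that checks the MatrixOne service every minute and restarts it if it has stopped. The check-and-restart idea can be sketched as follows; the `ensure_running` helper and its arguments are illustrative and not mo_ctl's actual implementation:

```bash
#!/bin/sh
# ensure_running: if no process matches the pattern, run the given start command.
# A sketch of the watchdog idea only; names here are illustrative.
ensure_running() {
  pattern=$1
  start_cmd=$2
  if pgrep -f "$pattern" >/dev/null 2>&1; then
    echo "$pattern is running"
  else
    echo "$pattern not running, starting it"
    $start_cmd
  fi
}

# A crontab entry installed by the watchdog would invoke a script like this
# every minute, e.g.: * * * * * /path/to/watchdog.sh
ensure_running "mo-service" "echo mo_ctl start would run here"
```

The real `mo_ctl watchdog` takes care of installing the crontab entry for you; the sketch only shows why a one-minute check is enough to keep the service alive.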
+ +### precheck - check for predependencies + +Use `mo_ctl precheck` to check for pre-dependencies before installing MatrixOne from source code; it currently checks for `go`/`gcc`/`git`/`mysql (client)`. + +``` +mo_ctl precheck help +Usage : mo_ctl precheck # check pre-requisites for mo_ctl + Check list : go gcc git mysql +``` + +### deploy - Install MatrixOne + +Use `mo_ctl deploy [mo_version] [force]` to install the stable version of MatrixOne, or a specified version. The `force` option allows you to remove an existing version of MatrixOne in the same directory and force a new installation. + +``` +mo_ctl deploy help +Usage : mo_ctl deploy [mo_version] [force] # deploy mo onto the path configured + [mo_version]: optional, specify an mo version to deploy + [force] : optional, if specified will delete all content under MO_PATH and deploy from beginning + e.g. : mo_ctl deploy # default, same as mo_ctl deploy 1.2.2 + : mo_ctl deploy main # deploy development latest version + : mo_ctl deploy d29764a # deploy development version d29764a + : mo_ctl deploy 1.2.2 # deploy stable verson 1.2.2 + : mo_ctl deploy force # delete all under MO_PATH and deploy verson 1.2.2 + : mo_ctl deploy 1.2.2 force # delete all under MO_PATH and deploy stable verson 1.2.2 from beginning +``` + +### start - Starts the MatrixOne service + +Start the MatrixOne service with `mo_ctl start`, with the startup file path under `MO_PATH`. + +``` +mo_ctl start help +Usage : mo_ctl start # start mo-service from the path configured +``` + +### stop - stop the MatrixOne service + +Use `mo_ctl stop [force]` to stop all MatrixOne services on this machine; if multiple MatrixOne services are running, they are all stopped. + +``` + mo_ctl stop help +Usage : mo_ctl stop [force] # stop all mo-service processes found on this machine + [force] : optional, if specified, will try to kill mo-services with -9 option, so be very carefully + e.g. 
: mo_ctl stop # default, stop all mo-service processes found on this machine + : mo_ctl stop force # stop all mo-services with kill -9 command +``` + +### restart - restart the MatrixOne service + +Use `mo_ctl restart [force]` to stop all MatrixOne services on this machine and restart the MatrixOne service located under the `MO_PATH` path. + +``` +mo_ctl restart help +Usage : mo_ctl restart [force] # a combination operation of stop and start + [force] : optional, if specified, will try to kill mo-services with -9 option, so be very carefully + e.g. : mo_ctl restart # default, stop all mo-service processes found on this machine and start mo-serivce under path of conf MO_PATH + : mo_ctl restart force # stop all mo-services with kill -9 command and start mo-serivce under path of conf MO_PATH +``` + +### connect - connect to the MatrixOne service via mysql-client + +Use `mo_ctl connect` to connect to the MatrixOne service; the connection parameters are set in the `mo_ctl` tool. + +``` +mo_ctl connect help +Usage : mo_ctl connect # connect to mo via mysql client using connection info configured +``` + +### status - Checks the status of MatrixOne + +Use `mo_ctl status` to check whether MatrixOne is running or not. + +``` +mo_ctl status help +Usage : mo_ctl status # check if there's any mo process running on this machine +``` + +### get_cid - Print MatrixOne code commit id + +Use `mo_ctl get_cid` to print the MatrixOne codebase commit id under the current `MO_PATH` path. + +``` +mo_ctl get_cid help +Usage : mo_ctl get_cid # print mo commit id from the path configured +``` + +### get_branch - print MatrixOne code branch + +Use `mo_ctl get_branch` to print the MatrixOne codebase branch under the current `MO_PATH` path. 
+
+```
+mo_ctl get_branch help
+Usage : mo_ctl get_branch # print which git branch mo is currently on
+```
+
+### pprof - Collect performance information
+
+Use `mo_ctl pprof [item] [duration]` to gather performance information about MatrixOne, primarily for debugging use by developers.
+
+```
+mo_ctl pprof help
+Usage : mo_ctl pprof [item] [duration] # collect pprof information
+ [item] : optional, specify what pprof to collect, available: profile | heap | allocs
+ 1) profile : default, collect profile pprof for 30 seconds
+ 2) heap : collect heap pprof at current moment
+ 3) allocs : collect allocs pprof at current moment
+ [duration] : optional, only valid when [item]=profile, specify duration to collect profile
+ e.g. : mo_ctl pprof
+ : mo_ctl pprof profile # collect duration will use conf value PPROF_PROFILE_DURATION from conf file or 30 if it's not set
+ : mo_ctl pprof profile 30
+ : mo_ctl pprof heap
+```
+
+### set_conf - Set configuration parameters
+
+Use `mo_ctl set_conf [conf_list]` to set one or more configuration parameters.
+
+```
+mo_ctl set_conf help
+Usage : mo_ctl setconf [conf_list] # set configurations
+ [conf_list] : configuration list in key=value format, separated by comma
+ e.g. : mo_ctl setconf MO_PATH=/data/mo/matrixone,MO_PW=M@trix0riginR0cks,MO_PORT=6101 # set multiple configurations
+ : mo_ctl setconf MO_PATH=/data/mo/matrixone # set single configuration
+```
+
+!!! note
+    When a value passed to set_conf contains variables such as `${MO_PATH}`, the `$` needs to be escaped with `\`, for example:
+
+    ```bash
+    mo_ctl set_conf MO_CONF_FILE="\${MO_PATH}/matrixone/etc/launch/launch.toml"
+    ```
+
+### get_conf - Get the list of parameters
+
+Use `mo_ctl get_conf [conf_list]` to get one or more current configuration items.
+
+```
+mo_ctl get_conf help
+Usage : mo_ctl getconf [conf_list] # get configurations
+ [conf_list] : optional, configuration list in key, separated by comma.
+ : use 'all' or leave it as blank to print all configurations
+ e.g. 
: mo_ctl getconf MO_PATH,MO_PW,MO_PORT # get multiple configurations
+ : mo_ctl getconf MO_PATH # get single configuration
+ : mo_ctl getconf all # get all configurations
+ : mo_ctl getconf # get all configurations
+```
+
+#### mo_ctl get_conf - Detailed parameter list
+
+Running `mo_ctl get_conf` prints all parameters used by the tool; their meanings and value ranges are listed in the following table.
+
+| Parameter name | Function | Value range |
+| ---------------------- | ------------------------ | -------------------------|
+| MO_PATH | Path where MatrixOne's codebase and executables are located | Folder path |
+| MO_LOG_PATH | Where MatrixOne's logs are stored | Folder path, default is ${MO_PATH}/matrixone/logs |
+| MO_HOST | IP address to which the MatrixOne service is connected | IP address, default is 127.0.0.1 |
+| MO_PORT | Port number to which the MatrixOne service is connected | Port number, default is 6001 |
+| MO_USER | User name for connecting to the MatrixOne service | Username, default is root |
+| MO_PW | Password for connecting to the MatrixOne service | Password, default is 111 |
+| CHECK_LIST | Dependencies checked by precheck | The default is ("go" "gcc" "git" "mysql") |
+| GCC_VERSION | gcc version checked by precheck | default 8.5.0 |
+| GO_VERSION | go version checked by precheck | default 1.22.3 |
+| MO_GIT_URL | URL from which the MatrixOne source code is pulled | default |
+| MO_DEFAULT_VERSION | The version of MatrixOne that is pulled by default | default 1.2.2 |
+| GOPROXY | GOPROXY address, typically used to accelerate pulling Go dependencies from within mainland China | default ,direct |
+| STOP_INTERVAL | Stop interval, wait time to detect service status after stopping the service | default 5 seconds |
+| START_INTERVAL | Startup interval, wait time to detect service status after starting the service | default 2 seconds |
+| MO_DEBUG_PORT | MatrixOne's debug port, typically used by developers. 
| default 9876 |
+| MO_CONF_FILE | MatrixOne startup configuration file | default ${MO_PATH}/matrixone/etc/launch/launch.toml |
+| RESTART_INTERVAL | Restart interval, wait time to detect service status after restarting the service | default 2 seconds |
+| PPROF_OUT_PATH | Output path for golang pprof performance data | default /tmp/pprof-test/ |
+| PPROF_PROFILE_DURATION | Duration of golang profile collection | default 30 seconds |
+
+### ddl_convert - DDL format conversion
+
+Use `mo_ctl ddl_convert [options] [src_file] [tgt_file]` to convert a DDL file from another database's syntax to MatrixOne's DDL format; currently only `mysql_to_mo` mode is supported.
+
+```
+mo_ctl ddl_convert help
+Usage : mo_ctl ddl_convert [options] [src_file] [tgt_file] # convert a ddl file to mo format from other types of database
+ [options] : available: mysql_to_mo
+ [src_file] : source file to be converted, will use env DDL_SRC_FILE from conf file by default
+ [tgt_file] : target file of converted output, will use env DDL_TGT_FILE from conf file by default
+ e.g. : mo_ctl ddl_convert mysql_to_mo /tmp/mysql.sql /tmp/mo.sql
+```
+
+### sql - Execute SQL
+
+Use `mo_ctl sql [sql]` to execute SQL text or SQL files.
+
+```
+mo_ctl sql help
+Usage : mo_ctl sql [sql] # execute sql from string, or a file or a path containing multiple files
+ [sql] : a string quoted by "", or a file, or a path
+ e.g. : mo_ctl sql "use test;select 1;" # execute sql "use test;select 1"
+ : mo_ctl sql /data/q1.sql # execute sql in file /data/q1.sql
+ : mo_ctl sql /data/ # execute all sql files with .sql postfix in /data/
+```
+
+### uninstall - Uninstall MatrixOne
+
+Use `mo_ctl uninstall` to uninstall MatrixOne from `MO_PATH`. 
+ +``` +mo_ctl uninstall help +Usage : mo_ctl uninstall # uninstall mo from path MO_PATH=/data/mo//matrixone + # note: you will need to input 'Yes/No' to confirm before uninstalling +``` + +### upgrade - upgrade/downgrade MatrixOne version + +MatrixOne 0.8 and later can use `mo_ctl upgrade version` or `mo_ctl upgrade commitid` to upgrade or downgrade MatrixOne from the current version to a stable version or a commit id version. + +``` +mo_ctl upgrade help +Usage : mo_ctl upgrade [version_commitid] # upgrade or downgrade mo from current version to a target commit id or stable version + [commitid] : a commit id such as '38888f7', or a stable version such as '1.2.2' + : use 'latest' to upgrade to latest commit on main branch if you don't know the id + e.g. : mo_ctl upgrade 38888f7 # upgrade/downgrade to commit id 38888f7 on main branch + : mo_ctl upgrade latest # upgrade/downgrade to latest commit on main branch + : mo_ctl upgrade 1.2.2 # upgrade/downgrade to stable version 1.2.2 +``` + +### watchdog - Keep MatrixOne alive + +Use `mo_ctl watchdog [options]` to set a scheduled task to guarantee MatrixOne service availability, check the status of MatrixOne every minute, and automatically pull up the service if it is found to be aborted. + +``` +mo_ctl watchdog help +Usage : mo_ctl watchdog [options] # setup a watchdog crontab task for mo-service to keep it alive + [options] : available: enable | disable | status + e.g. 
: mo_ctl watchdog enable # enable watchdog service for mo, by default it will check if mo-service is alive and pull it up if it's dead every one minute
+ : mo_ctl watchdog disable # disable watchdog
+ : mo_ctl watchdog status # check if watchdog is enabled or disabled
+ : mo_ctl watchdog # same as mo_ctl watchdog status
+```
+
+
diff --git a/docs/MatrixOne/Reference/mo-tools/mo_datax_writer.md b/docs/MatrixOne/Reference/mo-tools/mo_datax_writer.md
new file mode 100644
index 000000000..e8f7386a5
--- /dev/null
+++ b/docs/MatrixOne/Reference/mo-tools/mo_datax_writer.md
@@ -0,0 +1,202 @@
+# mo_datax_writer Tool Guide
+
+`mo_datax_writer` is a tool that helps you migrate data from MySQL to MatrixOne.
+
+!!! Note
+    The `mo_datax_writer` tool is currently only supported for deployment on Linux system x86 architectures.
+
+## Pre-dependency
+
+- Finished [installing and starting](../../Get-Started/install-standalone-matrixone.md) MatrixOne
+- Download [DataX Tools](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz)
+- Download and install [MySQL]()
+- Finished installing [Python 3.8 (or later)](https://www.python.org/downloads/)
+- Installed wget
+- Set environment encoding to UTF-8
+
+## Install mo_datax_writer
+
+```bash
+wget https://github.com/matrixorigin/mo_datax_writer/archive/refs/tags/v1.0.1.zip
+unzip v1.0.1.zip
+cd mo_datax_writer-1.0.1/
+# Extract mo_datax_writer into the datax/plugin/writer/ directory
+unzip matrixonewriter.zip -d ../datax/plugin/writer/
+```
+
+## Initialize the MatrixOne data table
+
+### Creating a database
+
+```sql
+create database test;
+```
+
+### Creating a table
+
+```sql
+use test;
+
+CREATE TABLE `user` (
+`name` VARCHAR(255) DEFAULT null,
+`age` INT DEFAULT null,
+`city` VARCHAR(255) DEFAULT null
+);
+```
+
+## Initialize the MySQL data table
+
+### Creating a database
+
+```sql
+create database test;
+```
+
+### Creating a table
+
+```sql
+use test;
+
+CREATE TABLE `user` (
+  `name` varchar(255) COLLATE 
utf8mb4_general_ci DEFAULT NULL, + `age` int DEFAULT NULL, + `city` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci; +``` + +### Import Data + +```sql +insert into user values('zhangsan',26,'Shanghai'),('lisi',24,'Chengdu'),('wangwu',28,'Xian'),('zhaoliu',22,'Beijing'),('tianqi',26,'Shenzhen'); + +mysql> select * from user; ++----------+------+----------+ +| name | age | city | ++----------+------+----------+ +| zhangsan | 26 | Shanghai | +| lisi | 24 | Chengdu | +| wangwu | 28 | Xian | +| zhaoliu | 22 | Beijing | +| tianqi | 26 | Shenzhen | ++----------+------+----------+ +5 rows in set (0.00 sec) +``` + +## Importing data using DataX + +### Writing configuration files + +Add the datax configuration file **mysql2mo.json** to the datax/job directory as follows: + +```json +{ + "job": { + "setting": { + "speed": { + "channel": 1 + }, + "errorLimit": { + "record": 0, + "percentage": 0 + } + }, + "content": [ + { + "reader": { + "name": "mysqlreader", + "parameter": { + // MySQL Database User Name + "username": "root", + // MySQL Database Password + "password": "111", + // Column Names for MySQL Data Table Reads + "column": ["name","age","city"], + "splitPk": "", + "connection": [ + { + // MySQL Data Tables + "table": ["user"], + // MySQL Connection Information + "jdbcUrl": [ + "jdbc:mysql://127.0.0.1:3306/test?useSSL=false" + ] + } + ] + } + }, + "writer": { + "name": "matrixonewriter", + "parameter": { + // Database User Name + "username": "root", + // Database Password + "password": "111", + // Column names of tables to be imported + "column": ["name","age","city"], + // SQL statements that need to be executed before the import task starts + "preSql": [], + // SQL statement to execute after the import task is complete + "postSql": [], + // Batch write count, i.e., how many pieces of data to read and then execute load data inline import task + "maxBatchRows": 60000, + // Batch write size, i.e. 
how much data to read before performing a load data inline import task
+ "maxBatchSize": 5242880,
+ // Import task execution interval, i.e. how long to wait before the load data inline import task is executed
+ "flushInterval": 300000,
+ "connection": [
+ {
+ // Database Connection Information
+ "jdbcUrl": "jdbc:mysql://127.0.0.1:6001/test?useUnicode=true&useSSL=false",
+ // database name
+ "database": "test",
+ // database table
+ "table": ["user"]
+ }
+ ]
+ }
+ }
+ }
+ ]
+ }
+}
+```
+
+### Run the DataX job
+
+Go to the DataX installation directory and execute the following command:
+
+```bash
+python bin/datax.py job/mysql2mo.json
+```
+
+When the execution is complete, the output is as follows:
+
+```bash
+2024-06-06 06:26:52.145 [job-0] INFO StandAloneJobContainerCommunicator - Total 5 records, 75 bytes | Speed 7B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.012s | Percentage 100.00%
+2024-06-06 06:26:52.147 [job-0] INFO JobContainer -
+Job start time           : 2024-06-06 14:26:41
+Job end time             : 2024-06-06 14:26:52
+Total job time           : 10s
+Average job traffic      : 7B/s
+Record write speed       : 0rec/s
+Total records read       : 5
+Total read/write failures: 0
+```
+
+### View Results
+
+Query the MatrixOne database to confirm that the data has been synchronized from MySQL into MatrixOne:
+
+```sql
+mysql> select * from user;
++----------+------+----------+
+| name     | age  | city     |
++----------+------+----------+
+| zhangsan |   26 | Shanghai |
+| lisi     |   24 | Chengdu  |
+| wangwu   |   28 | Xian     |
+| zhaoliu  |   22 | Beijing  |
+| tianqi   |   26 | Shenzhen |
++----------+------+----------+
+5 rows in set (0.01 sec)
+```
\ No newline at end of file
diff --git a/docs/MatrixOne/Reference/mo-tools/mo_ssb_open.md b/docs/MatrixOne/Reference/mo-tools/mo_ssb_open.md
new file mode 100644
index 000000000..b8a086b87
--- /dev/null
+++ b/docs/MatrixOne/Reference/mo-tools/mo_ssb_open.md
@@ -0,0 +1,166 @@
+# mo_ssb_open Tool Guide
+
+`mo_ssb_open` is a tool that implements the SSB (Star Schema Benchmark) test for MatrixOne.
+
+!!! 
Note
+    The `mo_ssb_open` tool is currently only supported for deployment on Linux system x86 architectures.
+
+## Pre-dependency
+
+- Finished [installing and starting](../../Get-Started/install-standalone-matrixone.md) MatrixOne
+- Environment encoding set to UTF-8
+- Installed wget
+- The bc command is installed
+
+## Install mo_ssb_open
+
+```bash
+wget https://github.com/matrixorigin/mo_ssb_open/archive/refs/tags/v1.0.1.zip
+unzip v1.0.1.zip
+```
+
+## Generate Data Set
+
+```bash
+cd mo_ssb_open-1.0.1
+./bin/gen-ssb-data.sh -s 1 -c 5
+```
+
+**-s**: Specifies the data scale; `-s 1` generates a dataset of about 1 GB. If not specified, about 100 GB of data is generated by default.
+
+**-c**: Specifies the number of threads generating lineorder table data. The default is 10 threads.
+
+Generating the complete data set may take some time. When it finishes, you can find the result files in the mo_ssb_open-1.0.1/bin/ssb-data/ directory.
+
+```bash
+root@host-10-222-4-8:~/soft/ssb/mo_ssb_open-1.0.1/bin/ssb-data# ls -l
+total 604976
+-rwS--S--T 1 root root 2837046 Jun 7 03:31 customer.tbl
+-rw-r--r-- 1 root root 229965 Jun 7 03:31 date.tbl
+-rw-r--r-- 1 root root 118904702 Jun 7 03:31 lineorder.tbl.1
+-rw-r--r-- 1 root root 119996341 Jun 7 03:31 lineorder.tbl.2
+-rw-r--r-- 1 root root 120146777 Jun 7 03:31 lineorder.tbl.3
+-rw-r--r-- 1 root root 120000311 Jun 7 03:31 lineorder.tbl.4
+-rw-r--r-- 1 root root 120057972 Jun 7 03:31 lineorder.tbl.5
+-rw-r--r-- 1 root root 17139259 Jun 7 03:31 part.tbl
+-rw-r--r-- 1 root root 166676 Jun 7 03:31 supplier.tbl
+```
+
+## Building tables in MatrixOne
+
+Modify the configuration file conf/matrixone.conf to specify the address, username, and password for MatrixOne. 
An example configuration file is shown below:
+
+```conf
+# MatrixOne host
+export HOST='127.0.0.1'
+# MatrixOne port
+export PORT=6001
+# MatrixOne username
+export USER='root'
+# MatrixOne password
+export PASSWORD='111'
+# The database where SSB tables located
+export DB='ssb'
+```
+
+Then execute the following script to create the tables:
+
+```bash
+./bin/create-ssb-tables.sh
+```
+
+Connect to MatrixOne to verify that the tables were created successfully:
+
+```sql
+mysql> show tables;
++----------------+
+| Tables_in_ssb |
++----------------+
+| customer |
+| dates |
+| lineorder |
+| lineorder_flat |
+| part |
+| supplier |
++----------------+
+6 rows in set (0.01 sec)
+```
+
+## Import Data
+
+Execute the following script to import the data required for the SSB test:
+
+```bash
+./bin/load-ssb-data.sh -c 10
+```
+
+**Parameter interpretation**
+
+**-c**: Specifies the number of threads performing the import; the default is 5.
+
+Once loaded, you can query the data in MatrixOne using the created tables.
+
+## Run the query command
+
+In the query output below, the first column of each line is the query number.
+
+- Multiple table queries
+
+```bash
+root@host-10-222-4-8:~/soft/ssb/mo_ssb_open-1.0.1# ./bin/run-ssb-queries.sh
+mysqlslap Ver 8.0.37 for Linux on x86_64 (MySQL Community Server - GPL)
+mysql Ver 8.0.37 for Linux on x86_64 (MySQL Community Server - GPL)
+bc 1.07.1
+Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
+HOST: 127.0.0.1
+PORT: 6001
+USER: root
+DB: ssb
+q1.1: 0.22 0.16 0.13 fast:0.13
+q1.2: 0.17 0.17 0.17 fast:0.17
+q1.3: 0.15 0.19 0.18 fast:0.15
+q2.1: 0.22 0.21 0.23 fast:0.21
+q2.2: 0.18 0.17 0.16 fast:0.16
+q2.3: 0.15 0.16 0.17 fast:0.15
+q3.1: 0.24 0.23 0.23 fast:0.23
+q3.2: 0.16 0.16 0.20 fast:0.16
+q3.3: 0.16 0.14 0.13 fast:0.13
+q3.4: 0.12 0.11 0.11 fast:0.11
+q4.1: 0.24 0.22 0.30 fast:0.22
+q4.2: 0.22 0.21 0.22 fast:0.21
+q4.3: 0.20 0.21 0.20 fast:0.20
+total time: 2.23 seconds
+Finish ssb queries. 
+``` + +- Single table query + +```bash +root@host-10-222-4-8:~/soft/ssb/mo_ssb_open-1.0.1# ./bin/run-ssb-flat-queries.sh +mysqlslap Ver 8.0.37 for Linux on x86_64 (MySQL Community Server - GPL) +mysql Ver 8.0.37 for Linux on x86_64 (MySQL Community Server - GPL) +bc 1.07.1 +Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc. +HOST: 127.0.0.1 +PORT: 6001 +USER: root +DB: ssb +q1.1: 0.21 0.13 0.14 fast:0.13 +q1.2: 0.15 0.13 0.15 fast:0.13 +q1.3: 0.16 0.21 0.22 fast:0.16 +q2.1: 0.36 0.34 0.38 fast:0.34 +q2.2: 0.36 0.34 0.32 fast:0.32 +q2.3: 0.25 0.26 0.22 fast:0.22 +q3.1: 0.39 0.39 0.30 fast:0.30 +q3.2: 0.32 0.33 0.29 fast:0.29 +q3.3: 0.22 0.23 0.29 fast:0.22 +q3.4: 0.32 0.28 0.31 fast:0.28 +q4.1: 0.42 0.38 0.38 fast:0.38 +q4.2: 0.42 0.48 0.45 fast:0.42 +q4.3: 0.35 0.34 0.29 fast:0.29 +total time: 3.48 seconds +Finish ssb-flat queries. +``` + +The query results correspond to: query statement, first query result, second query result, third query result, and fastest result in s. + +!!! note + You can view specific query statements in the mo_ssb_open-1.0.1/ssb-queries directory. \ No newline at end of file diff --git a/docs/MatrixOne/Reference/mo-tools/mo_tpch_open.md b/docs/MatrixOne/Reference/mo-tools/mo_tpch_open.md new file mode 100644 index 000000000..49e697145 --- /dev/null +++ b/docs/MatrixOne/Reference/mo-tools/mo_tpch_open.md @@ -0,0 +1,162 @@ +# mo_tpch_open Tool Guide + +`mo_tpch_open` is a tool that implements TPCH testing for MatrixOne. + +!!! Note + The `mo_tpch_open` tool is currently only supported for deployment on Linux system x86 architectures. 
+
+## Pre-dependency
+
+- Finished [installing and starting](../../Get-Started/install-standalone-matrixone.md) MatrixOne
+- Set environment encoding to UTF-8
+- Installed wget
+- The bc command is installed
+
+## Install mo_tpch_open
+
+```bash
+wget https://github.com/matrixorigin/mo_tpch_open/archive/refs/tags/v1.0.1.zip
+unzip v1.0.1.zip
+```
+
+## Generate Data Set
+
+Generate the dataset using the following command:
+
+```bash
+cd mo_tpch_open-1.0.1
+./bin/gen-tpch-data.sh -s 1 -c 5
+```
+
+**Parameter interpretation**
+
+**-s**: Specifies the data scale; `-s 1` generates a dataset of about 1 GB. If not specified, about 100 GB of data is generated by default.
+
+**-c**: Specifies the number of threads generating table data. The default is 10 threads.
+
+Generating the complete data set may take some time. When it finishes, you can find the result files in the mo_tpch_open-1.0.1/bin/tpch-data directory.
+
+```bash
+root@host-10-222-4-8:~/soft/tpch/mo_tpch_open-1.0.1/bin/tpch-data# ls -l
+total 1074936
+-rw-r--r-- 1 root root 24346144 Jun 7 03:16 customer.tbl
+-rw-r--r-- 1 root root 151051198 Jun 7 03:16 lineitem.tbl.1
+-rw-r--r-- 1 root root 152129724 Jun 7 03:16 lineitem.tbl.2
+-rw-r--r-- 1 root root 152344710 Jun 7 03:16 lineitem.tbl.3
+-rw-r--r-- 1 root root 152123661 Jun 7 03:16 lineitem.tbl.4
+-rw-r--r-- 1 root root 152213994 Jun 7 03:16 lineitem.tbl.5
+-rw-r--r-- 1 root root 2224 Jun 7 03:16 nation.tbl
+-rw-r--r-- 1 root root 34175478 Jun 7 03:16 orders.tbl.1
+-rw-r--r-- 1 root root 34463858 Jun 7 03:16 orders.tbl.2
+-rw-r--r-- 1 root root 34437453 Jun 7 03:16 orders.tbl.3
+-rw-r--r-- 1 root root 34445732 Jun 7 03:16 orders.tbl.4
+-rw-r--r-- 1 root root 34429640 Jun 7 03:16 orders.tbl.5
+-rw-r--r-- 1 root root 24135125 Jun 7 03:16 part.tbl
+-rw-r--r-- 1 root root 23677134 Jun 7 03:16 partsupp.tbl.1
+-rw-r--r-- 1 root root 23721079 Jun 7 03:16 partsupp.tbl.2
+-rw-r--r-- 1 root root 23808550 Jun 7 03:16 partsupp.tbl.3
+-rw-r--r-- 1 
root root 23894802 Jun 7 03:16 partsupp.tbl.4
+-rw-r--r-- 1 root root 23883051 Jun 7 03:16 partsupp.tbl.5
+-rw-r--r-- 1 root root 389 Jun 7 03:16 region.tbl
+-rw-r--r-- 1 root root 1409184 Jun 7 03:16 supplier.tbl
+```
+
+## Building tables in MatrixOne
+
+Modify the configuration file conf/matrixone.conf to specify the address, username, and password for MatrixOne. An example configuration file is shown below:
+
+```conf
+# MatrixOne host
+export HOST='127.0.0.1'
+# MatrixOne port
+export PORT=6001
+# MatrixOne username
+export USER='root'
+# MatrixOne password
+export PASSWORD='111'
+# The database where TPC-H tables located
+export DB='tpch'
+```
+
+Then execute the following script to create the tables:
+
+```bash
+./bin/create-tpch-tables.sh
+```
+
+Connect to MatrixOne to verify that the tables were created successfully:
+
+```sql
+mysql> show tables;
++----------------+
+| Tables_in_tpch |
++----------------+
+| customer |
+| lineitem |
+| nation |
+| orders |
+| part |
+| partsupp |
+| region |
+| revenue0 |
+| supplier |
++----------------+
+9 rows in set (0.00 sec)
+```
+
+## Import Data
+
+Execute the following script to import the data required for the TPC-H test:
+
+```bash
+./bin/load-tpch-data.sh -c 10
+```
+
+**Parameter interpretation**
+
+**-c**: Specifies the number of threads performing the import; the default is 5.
+
+Once loaded, you can query the data in MatrixOne using the created tables. 
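+
+If the full scale-factor-1 dataset loaded cleanly, a quick row-count check is a useful sanity test. At SF1 the TPC-H table sizes are fixed by the specification, so the counts below are the standard SF1 values rather than output captured from this environment:
+
+```sql
+-- Standard TPC-H row counts at scale factor 1:
+select count(*) from lineitem; -- 6001215
+select count(*) from orders;   -- 1500000
+select count(*) from customer; -- 150000
+select count(*) from supplier; -- 10000
+select count(*) from nation;   -- 25
+select count(*) from region;   -- 5
+```
+
+A shortfall here usually means one of the per-thread .tbl chunks failed to load.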
+
+## Run the query command
+
+Execute the following command to run the queries:
+
+```bash
+root@host-10-222-4-8:~/soft/tpch/mo_tpch_open-1.0.1# ./bin/run-tpch-queries.sh
+mysql Ver 8.0.37 for Linux on x86_64 (MySQL Community Server - GPL)
+HOST: 127.0.0.1
+PORT: 6001
+USER: root
+DB: tpch
+Time Unit: ms
+q1 836 715 691 691
+q2 111 80 88 80
+q3 325 235 212 212
+q4 221 181 177 177
+q5 240 236 295 236
+q6 215 292 350 292
+q7 373 327 299 299
+q8 236 238 243 238
+q9 443 406 413 406
+q10 375 390 422 390
+q11 201 237 231 231
+q12 461 460 400 400
+q13 321 294 301 294
+q14 289 261 282 261
+q15 391 285 294 285
+q16 222 288 255 255
+q17 333 247 243 243
+q18 275 262 317 262
+q19 513 479 511 479
+q20 240 244 198 198
+q21 1503 1746 1786 1746
+q22 138 122 126 122
+Total cold run time: 8262 ms
+Total hot run time: 7797 ms
+Finish tpch queries.
+```
+
+Each line of the query output lists the query name, the first, second, and third run times, and the fastest time, all in ms.
+
+!!! note
+    You can view the specific query statements in the mo_tpch_open-1.0.1/queries directory.
diff --git a/docs/MatrixOne/Reference/mo-tools/mo_ts_perf_test.md b/docs/MatrixOne/Reference/mo-tools/mo_ts_perf_test.md
new file mode 100644
index 000000000..9664f39f7
--- /dev/null
+++ b/docs/MatrixOne/Reference/mo-tools/mo_ts_perf_test.md
@@ -0,0 +1,230 @@
+# mo_ts_perf_test Tool Guide
+
+`mo_ts_perf_test` is a time-series write and query performance test tool for MatrixOne.
+
+!!! Note
+    The `mo_ts_perf_test` tool is currently only supported for deployment on Linux system x86 architectures.
+
+## Pre-dependency
+
+- Finished [installing and starting](../../Get-Started/install-standalone-matrixone.md) MatrixOne. 
+- Installed wget
+
+## Install mo_ts_perf_test
+
+```bash
+wget https://github.com/matrixorigin/mo_ts_perf_test/archive/refs/tags/v1.0.1.zip
+unzip v1.0.1.zip
+```
+
+## Configuration
+
+Modify the db.conf configuration file in the matrixone/conf directory as appropriate:
+
+```bash
+root@host-10-222-4-8:~/soft/perf/mo_ts_perf_test-1.0.0/matrixone/conf# cat db.conf
+[dbInfo]
+host = 127.0.0.1
+port = 6001
+user = root
+password = 111
+tablePrefix = d
+point_query_ts_condition = '2017-07-14 10:40:06.379'
+loadFilePath = /root/soft/perf/
+```
+
+**Configuration instructions:**
+
+- tablePrefix: The table name prefix used when writing to or querying multiple tables; for example, with a value of d, the tables d0, d1, d2 are created automatically;
+- point_query_ts_condition: The filter value for the ts field used in point queries;
+- loadFilePath: The directory of the local csv file to import when using load data infile; note that loadFilePath must be local to the MatrixOne database, that is, the csv file must be placed on the server where the MatrixOne database is located.
+
+## Perform write tests with mo-write
+
+mo-write is the MatrixOne write test tool. Its command is (all options are optional):
+
+```bash
+mo-write -T <threads> -r <rows> -n <total_rows> -retry <times> -mode <mode> -txc <count> -tType <table_type> -wType <write_type>
+```
+
+!!! note
+    All writes go to tables under the test database; single-table writes go to the d0 table, and multi-table writes go to d0, d1, d2, and so on (the number of tables is determined by the number of clients). 
+
+**Parameter description**
+
+- -T: The number of clients writing concurrently; defaults to 7;
+- -r: The number of data rows submitted per write; defaults to 10000;
+- -n: The total number of data rows to import per client; defaults to 500,000;
+- -retry: The number of test rounds (the average write speed is calculated automatically at the end); defaults to 1;
+- -mode: The write mode, single for single-table writes and multi for multi-table writes; defaults to multi;
+- -txc: The number of writes per transaction commit, value >=0; defaults to 0 (0 means transactions are not enabled);
+- -tType: The type of table written to, one of ts, tsPK, intPK; defaults to ts. ts is a time-series table without a primary key, tsPK is a time-series table with a primary key, and intPK is a normal table with a primary key of type int;
+- -wType: The type of write, one of insert, loadLine, loadFile. insert means writing data with insert into values, loadLine means writing with load data inline, and loadFile means importing a local csv file with load data infile (the local csv data file can be generated automatically in its parent data directory via sr-write).
+
+### Examples
+
+- **Example 1**
+
+Run a write test with all default parameters:
+
+```bash
+root@host-10-222-4-8:~/soft/perf/mo_ts_perf_test-1.0.0/matrixone/mo-write# ./mo-write
+r=10000, T=7, n=500000, mode=multi, retry=1, txc=0, tType=ts, wType=loadLine
+dbConfig:{127.0.0.1 6001 root 111 d '2017-07-14 10:40:06.379' /root/soft/perf/}
+start create db conn, count:7
+db connection[1] created.
+db connection[2] created.
+db connection[3] created.
+db connection[4] created.
+db connection[5] created.
+db connection[6] created.
+db connection[7] created.
+mo-data of all clinet(7 thread) has ready!
+Initialize database and table completed.
+start preparing test data.
+spend time of prepare testing data:7.255468 s
+Press Y or Enter to start inserting data, or N to exit; starting test 1, txc=0
+
+start test 1 ……. 
+spend time:7.405524 s
+1 test: 3500000/7.405524 = 472620.159086 records/second
+======== avg test: 472620.159086/1 = 472620.159086 records/second txc=0 ===========
+```
+
+- **Example 2**
+
+Each of 2 clients uses insert into to write 100,000 rows to a time-series table (d0) with a primary key:
+
+```bash
+root@host-10-222-4-8:~/soft/perf/mo_ts_perf_test-1.0.1/matrixone/mo-write# ./mo-write -T 2 -n 100000 -mode single -tType tsPK -wType insert
+r=10000, T=2, n=100000, mode=single, retry=1, txc=0, tType=tsPK, wType=insert
+dbConfig:{127.0.0.1 6001 root 111 d '2017-07-14 10:40:06.379' /root/soft/perf/}
+start create db conn, count:2
+db connection[1] created.
+db connection[2] created.
+mo-data of all clinet(2 thread) has ready!
+Initialize database and table completed.
+start preparing test data.
+spend time of prepare testing data:0.819425 s
+Press Y or Enter to start inserting data, or N to exit; starting test 1, txc=0
+
+start test 1 …….
+spend time:11.388648 s
+1 test: 200000/11.388648 = 17561.347089 records/second
+======== avg test: 17561.347089/1 = 17561.347089 records/second txc=0 ===========
+```
+
+- **Example 3**
+
+A single client runs one test round, writing 500,000 rows to a normal table (d0) with an int primary key using load data inline:
+
+```bash
+root@host-10-222-4-8:~/soft/perf/mo_ts_perf_test-1.0.1/matrixone/mo-write# ./mo-write -T 1 -tType=intPK -retry 1
+r=10000, T=1, n=500000, mode=multi, retry=1, txc=0, tType=intPK, wType=loadLine
+dbConfig:{127.0.0.1 6001 root 111 d '2017-07-14 10:40:06.379' /root/soft/perf/}
+start create db conn, count:1
+db connection[1] created.
+mo-data of all clinet(1 thread) has ready!
+Initialize database and table completed.
+start preparing test data.
+spend time of prepare testing data:1.583363 s
+Press Y or Enter to start inserting data, or N to exit; starting test 1, txc=0
+
+start test 1 ……. 
+spend time:5.062582 s
+1 test: 500000/5.062582 = 98763.826906 records/second
+======== avg test: 98763.826906/1 = 98763.826906 records/second txc=0 ===========
+```
+
+- **Example 4**
+
+Using load data inline, 8 clients each write 500,000 rows to time-series tables without a primary key (d0 ... d7) with transactional commits (10 writes per commit), automatically running 3 test rounds and averaging:
+
+```bash
+root@host-10-222-4-8:~/soft/perf/mo_ts_perf_test-1.0.1/matrixone/mo-write# ./mo-write -T 8 -txc 10 -retry 3
+r=10000, T=8, n=500000, mode=multi, retry=3, txc=10, tType=ts, wType=loadLine
+dbConfig:{127.0.0.1 6001 root 111 d '2017-07-14 10:40:06.379' /root/soft/perf/}
+start create db conn, count:8
+db connection[1] created.
+db connection[2] created.
+db connection[3] created.
+db connection[4] created.
+db connection[5] created.
+db connection[6] created.
+db connection[7] created.
+db connection[8] created.
+mo-data of all clinet(8 thread) has ready!
+Initialize database and table completed.
+start preparing test data.
+spend time of prepare testing data:7.854798 s
+Press Y or Enter to start inserting data, or N to exit; starting test 1, txc=10
+
+start test 1 …….
+Starting transactional writes; writes per transaction commit: 10
+spend time:9.482012 s
+1 test: 4000000/9.482012 = 421851.388088 records/second
+Press Y or Enter to start inserting data, or N to exit; starting test 2, txc=10
+
+start test 2 …….
+tables has truncated and start insert data ……
+Starting transactional writes; writes per transaction commit: 10
+spend time:10.227261 s
+2 test: 4000000/10.227261 = 391111.576833 records/second
+Press Y or Enter to start inserting data, or N to exit; starting test 3, txc=10
+
+start test 3 …….
+tables has truncated and start insert data ……
+Starting transactional writes; writes per transaction commit: 10
+spend time:8.994586 s
+3 test: 4000000/8.994586 = 444711.979564 records/second
+======== avg test: 1257674.944485/3 = 419224.981495 records/second txc=10 ===========
+```
+
+## Perform query testing with mo-query
+
+mo-query is a query test tool that times queries such as select \*, point queries, common aggregate queries, time windows, etc. 
All queries query only the table d0 in the test library. The command is: + +```bash +mo-query -T +``` + +**Parameter description** + +**-T:** Indicates the number of clients executing select * queries concurrently. Defaults to 1. + +### Examples + +```bash +root@host-10-222-4-8:~/soft/perf/mo_ts_perf_test-1.0.1/matrixone/mo-query# ./mo-query -T 5 +T=5 +dbConfig:{127.0.0.1 6001 root 111 d '2017-07-14 10:40:06.379' /root/soft/perf/} +start create db conn, count:5 +db connection[1] created. +db connection[2] created. +db connection[3] created. +db connection[4] created. +db connection[5] created. +mo all clinet(5 thread) has ready! + + count value is:200000 +'count(*)' query spend time:0.062345 s + +'select *' (5 client concurrent query) spend time:0.850350 s +query speed: 1000000/0.850350 = 1175985.806764 records/second + + point query sql: select * from test.d0 where ts='2017-07-14 10:40:06.379' +'point query' spend time:0.001589 s + + avg value is: 0.07560730761790913 +'avg(current)' query spend time:0.026116 s + + sum value is: 15121.461523581824 +'sum(current)' query spend time:0.023109 s + + max value is: 3.9999022 +'max(current)' query spend time:0.054021 s + + min value is: -3.9999993 +'min(current)' query spend time:0.035809 s + +TimeWindow query sql:select _wstart, _wend, max(current), min(current) from d0 interval(ts, 60, minute) sliding(60, minute) +2017-07-14 02:00:00 +0000 UTC 2017-07-14 03:00:00 +0000 UTC 3.9999022 -3.9999993 +TimeWindow query spend time:0.180333 s +``` \ No newline at end of file diff --git a/docs/MatrixOne/Tutorial/c-net-crud-demo.md b/docs/MatrixOne/Tutorial/c-net-crud-demo.md new file mode 100644 index 000000000..05d8530ac --- /dev/null +++ b/docs/MatrixOne/Tutorial/c-net-crud-demo.md @@ -0,0 +1,132 @@ +# C# Base Example + +This document will guide you through how to build a simple application using C# and implement CRUD (Create, Read, Update, Delete) functionality. 
+
+## Prepare before you start
+
+- Finished [installing and starting](../Get-Started/install-standalone-matrixone.md) MatrixOne
+
+- Installed [.NET Core SDK](https://dotnet.microsoft.com/zh-cn/download)
+
+- [MySQL Client](https://dev.mysql.com/downloads/installer/) installed
+
+## Steps
+
+### Step One: Create a C# app
+
+Create an app using the dotnet command. For example, create a new app called myapp:
+
+```
+dotnet new console -o myapp
+```
+
+Then switch to the myapp directory.
+
+### Step Two: Add the MySQL Connector/NET NuGet Package
+
+Install the MySql.Data package using the NuGet package manager:
+
+```
+dotnet add package MySql.Data
+```
+
+### Step Three: Connect to MatrixOne and Perform Operations
+
+Write code to connect to MatrixOne, create a Student table, and perform insert, update, delete, and query operations. Write the following code in the Program.cs file:
+
+```
+using System;
+using MySql.Data.MySqlClient;
+
+class Program {
+
+    static void ExecuteSQL(MySqlConnection connection, string query)
+    {
+        using (MySqlCommand command = new MySqlCommand(query, connection))
+        {
+            command.ExecuteNonQuery();
+        }
+    }
+
+    static void Main(string[] args)
+    {
+        string connectionString = "server=127.0.0.1;user=root;database=test;port=6001;password=111";
+        using (MySqlConnection connection = new MySqlConnection(connectionString))
+        {
+            try
+            {
+                connection.Open();
+                Console.WriteLine("Connection already established");
+                // Create the table
+                ExecuteSQL(connection, "CREATE TABLE IF NOT EXISTS Student (id INT auto_increment PRIMARY KEY, name VARCHAR(255), age int, remark VARCHAR(255))");
+                Console.WriteLine("Build table succeeded!");
+                // Insert data
+                ExecuteSQL(connection, "INSERT INTO Student (name,age) VALUES ('Zhang San',22), ('Li Si',25), ('Zhao Wu',30)");
+                Console.WriteLine("Data inserted successfully!");
+                // Update data
+                ExecuteSQL(connection, "UPDATE Student SET remark = 'Updated' WHERE id = 1");
+                Console.WriteLine("Update data successfully!");
+                // Delete data
+
                ExecuteSQL(connection, "DELETE FROM Student WHERE id = 2");
+                Console.WriteLine("Data deleted successfully!");
+                // Query data
+                MySqlCommand command = new MySqlCommand("SELECT * FROM Student", connection);
+                using (MySqlDataReader reader = command.ExecuteReader())
+                {
+                    while (reader.Read())
+                    {
+                        Console.WriteLine($"Name: {reader["name"]}, age: {reader["age"]}, notes: {reader["remark"]}");
+                    }
+                }
+                Console.WriteLine("Data query succeeded!");
+            }
+            catch (MySqlException ex)
+            {
+                Console.WriteLine(ex.Message);
+            }
+            finally
+            {
+                Console.WriteLine("Ready to Disconnect");
+                connection.Close();
+                Console.WriteLine("Disconnect succeeded!");
+            }
+        }
+    }
+}
+```
+
+### Step Four: Run the Program
+
+Execute the command `dotnet run` in the terminal:
+
+```
+(base) admin@admindeMacBook-Pro myapp % dotnet run
+Connection already established
+Build table succeeded!
+Data inserted successfully!
+Update data successfully!
+Data deleted successfully!
+Name: Zhao Wu, age: 30, notes:
+Name: Zhang San, age: 22, notes: Updated
+Data query succeeded!
+Ready to Disconnect
+Disconnect succeeded!
+```
+
+### Step Five: Check the data
+
+Use the MySQL client to connect to MatrixOne and query the Student table:
+
+```sql
+mysql> select * from student;
++------+-----------+------+---------+
+| id   | name      | age  | remark  |
++------+-----------+------+---------+
+|    3 | Zhao Wu   |   30 | NULL    |
+|    1 | Zhang San |   22 | Updated |
++------+-----------+------+---------+
+2 rows in set (0.00 sec)
+```
+
+As you can see, the data is returned correctly.
\ No newline at end of file
diff --git a/docs/MatrixOne/Tutorial/django-python-crud-demo.md b/docs/MatrixOne/Tutorial/django-python-crud-demo.md
new file mode 100644
index 000000000..34f23ea7e
--- /dev/null
+++ b/docs/MatrixOne/Tutorial/django-python-crud-demo.md
@@ -0,0 +1,351 @@
+# Django Basics Example
+
+This document will guide you through building a simple application with **Django** and implementing CRUD (Create, Read, Update, Delete) functionality.
+
+**Django** is an open source web application framework written in Python.
+
+## Prepare before you start
+
+A brief introduction to the related software:
+
+* Django is a high-level Python web framework for the rapid development of maintainable and scalable web applications. With Django, Python developers can implement most of what a full-fledged website needs with very little code, and further develop full-featured web services.
+
+### Software Installation
+
+Before you begin, confirm that you have downloaded and installed the following software:
+
+- Verify that you have completed the [standalone deployment of](../Get-Started/install-standalone-matrixone.md) MatrixOne.
+
+- Verify that you have finished installing [Python 3.8 (or later)](https://www.python.org/downloads/). Verify that the installation was successful by checking the Python version with the following code:
+
+    ```
+    python3 -V
+    ```
+
+- Verify that you have completed installing the MySQL client.
+
+- Verify that you have finished installing [Django](https://www.djangoproject.com/download/). Verify that the installation was successful by checking the Django version with the following code:
+
+    ```
+    python3 -m django --version
+    ```
+
+- Download and install the `pymysql` tool using the following command:
+
+    ```
+    pip3 install pymysql
+
+    # If you are in mainland China and the download is slow, you can speed it up with a mirror:
+    pip3 install pymysql -i https://pypi.tuna.tsinghua.edu.cn/simple
+    ```
+
+### Environment Configuration
+
+1. Connect to MatrixOne through a MySQL client. Create a new database named *test*.
+
+    ```
+    mysql> create database test;
+    ```
+
+2. Create the project `django_crud_matrixone`.
+
+    ```
+    django-admin startproject django_crud_matrixone
+    ```
+
+    Once created, we can look at the directory structure of the project:
+
+    ```bash
+    cd django_crud_matrixone/
+
+    django_crud_matrixone/
+    ├── django_crud_matrixone/
+    │   ├── __init__.py
+    │   ├── asgi.py
+    │   ├── settings.py
+    │   ├── urls.py
+    │   └── wsgi.py
+    └── manage.py
+    ```
+
+3. Next we start the server by entering the following command in the django\_crud\_matrixone directory:
+
+    ```
+    python3 manage.py runserver 0.0.0.0:8000
+    ```
+
+    0.0.0.0 lets other computers connect to the development server; 8000 is the port number. If omitted, the port defaults to 8000.
+
+    Enter your server's IP address and port in the browser (here we use the local address 127.0.0.1:8000). If it starts normally, the output is as follows:
+
+    ![](https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/artwork/docs/tutorial/django/django-1.png)
+
+4. Find the DATABASES configuration item in the project's settings.py file and modify it as follows:
+
+    ```python
+    DATABASES = {
+        'default': {
+            'ENGINE': 'django.db.backends.mysql',  # database engine
+            'NAME': 'test',  # database name
+            'HOST': '127.0.0.1',  # database address, local IP address 127.0.0.1
+            'PORT': 6001,  # port
+            'USER': 'root',  # database username
+            'PASSWORD': '111',  # database password
+        }
+    }
+    ```
+
+5. Next, tell Django to connect to the database through the pymysql module. Import and configure it in the __init__.py file in the same directory as settings.py:
+
+    ```python
+    import pymysql
+    pymysql.install_as_MySQLdb()
+    ```
+
+6. Create an app. Django requires that models be defined inside an app, so you must create one first.
We create an app named TestModel, using the following command:
+
+    ```
+    django-admin startapp TestModel
+    ```
+
+    The directory structure is as follows:
+
+    ```bash
+    django_crud_matrixone/
+    ├── __init__.py
+    └── asgi.py
+    ...
+    TestModel
+    ├── migrations
+    │   └── __init__.py
+    ├── admin.py
+    ├── apps.py
+    ├── models.py
+    ├── tests.py
+    └── views.py
+    ```
+
+7. Next, find the INSTALLED\_APPS entry in settings.py and add the app, as follows:
+
+    ```python
+    INSTALLED_APPS = [
+        "django.contrib.admin",
+        "django.contrib.auth",
+        "django.contrib.contenttypes",
+        "django.contrib.sessions",
+        "django.contrib.messages",
+        "django.contrib.staticfiles",
+        "TestModel",  # add this
+    ]
+    ```
+
+## New Table
+
+- Modify the TestModel/models.py file to define the *book table*, with code as follows:
+
+```python
+from django.db import models
+class Book(models.Model):
+    id = models.AutoField(primary_key=True)  # id is created automatically and can also be written manually
+    title = models.CharField(max_length=32)  # book name
+    price = models.DecimalField(max_digits=5, decimal_places=2)  # book price
+    publish = models.CharField(max_length=32)  # publisher name
+    pub_date = models.DateField()  # publication date
+```
+
+Django models use Django's own ORM. The class name above maps to the database table name (*testmodel_book*), and the class inherits from models.Model; the fields inside the class map to columns in the data table, with the max_length parameter limiting the length.
+
+ORM correspondence table:
+
+| Model field | Equivalent MySQL type |
+| :---------- | :-------------------- |
+| AutoField | int |
+| CharField | varchar |
+| DecimalField | decimal |
+| DateField | date |
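To see the mapping concretely, here is a rough, hand-written sketch of the DDL this model would produce (illustrative only; Django's authoritative output comes from `python3 manage.py sqlmigrate TestModel 0001_initial`, and details may differ):

```python
# Hand-written sketch of the column mapping for the Book model above.
# The types are illustrative; run `sqlmigrate` for Django's real output.
fields = [
    ("id", "integer AUTO_INCREMENT NOT NULL PRIMARY KEY"),   # AutoField
    ("title", "varchar(32) NOT NULL"),                       # CharField(max_length=32)
    ("price", "numeric(5, 2) NOT NULL"),                     # DecimalField(max_digits=5, decimal_places=2)
    ("publish", "varchar(32) NOT NULL"),                     # CharField(max_length=32)
    ("pub_date", "date NOT NULL"),                           # DateField
]

# Assemble the CREATE TABLE statement from the field list
ddl = "CREATE TABLE `testmodel_book` (\n    " + ",\n    ".join(
    f"`{name}` {coltype}" for name, coltype in fields) + "\n);"
print(ddl)
```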
+Refer to the Django documentation for more model field types.
+
+- Run from the command line:
+
+```bash
+python3 manage.py makemigrations TestModel  # Generate the migration files and put them in the migrations directory under the app
+python3 manage.py migrate TestModel  # Generate and run the appropriate SQL statements based on the migration files
+```
+
+Go into the *test* database and you can see that the *testmodel\_book* table has been generated; a record of the operations performed is stored in the *django\_migrations* table.
+
+```sql
+mysql> show tables;
++-------------------+
+| Tables_in_test    |
++-------------------+
+| django_migrations |
+| testmodel_book    |
++-------------------+
+2 rows in set (0.01 sec)
+```
+
+## Insert Data
+
+- Adding data requires first creating an object and then saving it through the ORM-supplied `create` method. Create a new views.py file in the inner django\_crud\_matrixone directory (under the previously created project directory) and enter the code:
+
+```
+from django.shortcuts import render,HttpResponse
+from TestModel import models
+def add_book(request):
+    books = models.Book.objects.create(title="白夜行",price=39.50,publish="南海出版公司",pub_date="2010-10-10")
+    return HttpResponse("Data added successfully!")
+```
+
+- Next, bind the URL to the view function. Open the urls.py file, delete the original code, and copy the following code into it:
+
+```
+from django.contrib import admin
+from django.urls import path
+from . import views
+
+urlpatterns = [
+    path('', views.add_book),
+]
+```
+
+- Next we start the server by entering the following command in the django\_crud\_matrixone directory:
+
+```
+python3 manage.py runserver 0.0.0.0:8000
+```
+
+Enter your server's IP address and port in the browser (here we use the local address 127.0.0.1:8000). If it starts normally, the output is as follows:
+
+- Connect to the database and query the data; you can see that the data was successfully inserted:
+
+```sql
+mysql> select * from testmodel_book;
++------+-----------+-------+--------------------+------------+
+| id   | title     | price | publish            | pub_date   |
++------+-----------+-------+--------------------+------------+
+|    1 | 白夜行    | 39.50 | 南海出版公司       | 2010-10-10 |
++------+-----------+-------+--------------------+------------+
+1 row in set (0.00 sec)
+```
+
+## Query Data
+
+- Modify the views.py file in the django\_crud\_matrixone directory and add the code:
+
+```python
+def src_book(request):
+    books = models.Book.objects.all()  # Use the all() method to query everything
+    for i in books:
+        print(i.id,i.title,i.price,i.publish,i.pub_date)
+    return HttpResponse("Query succeeded!")
+```
+
+Refer to the Django documentation for more query-related methods.
+
+- Modify the urls.py file:
+
+    ```
+    urlpatterns = [
+        path('', views.src_book),
+    ]
+    ```
+
+- Next we start the server by entering the following command in the django\_crud\_matrixone directory:
+
+```
+python3 manage.py runserver 0.0.0.0:8000
+```
+
+Enter your server's IP address and port in the browser (here we use the local address 127.0.0.1:8000). If it starts normally, the output is as follows:
+ +The command line results are: + +
+
+## Update Data
+
+- Updating data uses the QuerySet method `.update()`. The following example updates the price of the record whose id is 1 to 50. Modify the views.py file in the django\_crud\_matrixone directory and add the code:
+
+```python
+def upd_book(request):
+    books = models.Book.objects.filter(pk=1).update(price=50)
+    return HttpResponse("Update succeeded!")
+```
+
+pk=1 means primary key=1, which is equivalent to id=1.
+
+- Modify the urls.py file:
+
+```
+urlpatterns = [
+    path('', views.upd_book),
+]
+```
+
+- Next we start the server by entering the following command in the django\_crud\_matrixone directory:
+
+```
+python3 manage.py runserver 0.0.0.0:8000
+```
+
+Enter your server's IP address and port in the browser (here we use the local address 127.0.0.1:8000). If it starts normally, the output is as follows:
+
+- Looking at the *testmodel\_book* table, you can see that the data was updated successfully:
+
+```sql
+mysql> select * from testmodel_book;
++------+-----------+-------+--------------------+------------+
+| id   | title     | price | publish            | pub_date   |
++------+-----------+-------+--------------------+------------+
+|    1 | 白夜行    | 50.00 | 南海出版公司       | 2010-10-10 |
++------+-----------+-------+--------------------+------------+
+1 row in set (0.00 sec)
+```
+
+## Delete Data
+
+- Deleting data uses the QuerySet method `.delete()`. The following example deletes the record with price 50. Modify the views.py file in the django\_crud\_matrixone directory and add the code:
+
+```python
+def del_book(request):
+    books=models.Book.objects.filter(price=50).delete()
+    return HttpResponse("Delete succeeded!")
+```
+
+- Modify the urls.py file:
+
+```
+urlpatterns = [
+    path('', views.del_book),
+]
+```
+
+- Next we start the server by entering the following command in the django\_crud\_matrixone directory:
+
+```
+python3 manage.py runserver 0.0.0.0:8000
+```
+
+Enter your server's IP address and port in the browser (here we use the local address 127.0.0.1:8000). If it starts normally, the output is as follows:
+
+- Looking at the *testmodel\_book* table, you can see that the data was successfully deleted.
+
+```sql
+mysql> select * from testmodel_book;
+Empty set (0.00 sec)
+```
diff --git a/docs/MatrixOne/Tutorial/rag-demo.md b/docs/MatrixOne/Tutorial/rag-demo.md
new file mode 100644
index 000000000..356576fa2
--- /dev/null
+++ b/docs/MatrixOne/Tutorial/rag-demo.md
@@ -0,0 +1,207 @@
+# RAG Application Basics Example
+
+## What is RAG?
+
+RAG, or Retrieval-Augmented Generation, is a technique that combines information retrieval with text generation to improve the accuracy and relevance of text generated by large language models (LLMs). Because of the limitations of its training data, an LLM may be unable to provide up-to-date information.
+
+For example, when I asked GPT about the latest version of MatrixOne, it didn't give an answer.
+
+In addition, these models can sometimes produce misleading and factually incorrect content. For example, when I asked GPT about the relationship between Lu Xun and Zhou Shuren, it answered with complete nonsense.
+
+To solve the above problem, we could retrain the LLM, but at a high cost. The main advantage of RAG is that it avoids retraining for specific tasks. Its high availability and low barrier to entry make it one of the most popular patterns for LLM systems, and many LLM applications are built on it. The core idea of RAG is that, when generating responses, the model does not rely only on what it learned during training, but also utilizes external, up-to-date, proprietary sources of information; users can thus optimize the model's output by enriching the input with additional external knowledge bases as the situation requires.
+
+RAG's workflow typically consists of the following steps:
+
+- Retrieve: Find and extract the information most relevant to the current query from a large data set or knowledge base.
+- Augment: Combine the retrieved information or data sets with the LLM to enhance the performance of the LLM and the accuracy of the output.
+- Generate: Utilize the LLM to generate new text or responses using the retrieved information.
+
+The following is a flow chart for Native RAG:
+
+As you can see, the retrieval step plays a crucial role in the RAG architecture, and MatrixOne's vector retrieval capability provides powerful data retrieval support for building RAG applications.
+
+## The Role of MatrixOne in RAG
+
+As a hyperconverged database, MatrixOne comes with built-in vector capabilities, which play an important role in RAG applications in the following ways:
+
+- Efficient information retrieval: MatrixOne has vector data types specifically designed to process and store high-dimensional vector data. It uses special data structures and indexing strategies, such as KNN search, to quickly find the data items most similar to the query vector.
+
+- Support for large-scale data processing: MatrixOne's ability to effectively manage and process large-scale vector data is a core requirement of the retrieval component of a RAG system, enabling it to quickly retrieve the information most relevant to user queries from vast amounts of data.
+
+- Improved generation quality: Through MatrixOne's vector retrieval capabilities, RAG can introduce information from an external knowledge base to produce more accurate, rich, and contextualized text, improving the quality of the generated output.
+
+- Security and privacy protection: MatrixOne can also protect data with security measures such as encrypted storage and access control, which is particularly important for RAG applications that handle sensitive data.
+
+- A simplified development process: Using MatrixOne simplifies the development of RAG applications because it provides an efficient mechanism for storing and retrieving vectorized data, reducing the data-management burden on developers.
+
+Based on Ollama, this document combines Llama2 and Mxbai-embed-large to quickly build a native RAG application using MatrixOne's vector capabilities.
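The `l2_distance` function used later in this document is simply the Euclidean distance between embedding vectors; the vectors closest to the query are the most similar. A minimal plain-Python illustration of the idea (illustrative only, not MatrixOne code; the vectors and names are made up):

```python
import math

def l2_distance(a, b):
    # Euclidean (L2) distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0, 0.0]
docs = {"doc_a": [0.9, 0.1, 0.0], "doc_b": [0.0, 1.0, 0.0]}

# The closest vector wins: the SQL equivalent is ORDER BY l2_distance(...) ASC LIMIT k
nearest = min(docs, key=lambda name: l2_distance(query, docs[name]))
print(nearest)  # doc_a
```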
+
+## Prepare before you start
+
+### Relevant knowledge
+
+**Ollama**: Ollama is an open source large language model serving tool that allows users to easily deploy and use large pre-trained models in their own hardware environment. Its primary function is to deploy and manage large language models (LLMs) within Docker containers, enabling users to quickly run them locally. Ollama simplifies the deployment process: with simple installation instructions, users can run an open source large language model locally with a single command.
+
+**Llama2**: Llama2 is an open source large language model for understanding and generating long text that can be used for research and commercial purposes.
+
+**Mxbai-embed-large**: mxbai-embed-large is an open source embedding model designed for text embedding and retrieval tasks. The model generates embedding vectors of size 1024.
+
+### Software Installation
+
+Before you begin, confirm that you have downloaded and installed the following software:
+
+- Verify that you have completed the [standalone deployment of](../Get-Started/install-standalone-matrixone.md) MatrixOne.
+
+- Verify that you have finished installing [Python 3.8 (or later)](https://www.python.org/downloads/). Verify that the installation was successful by checking the Python version with the following code:
+
+```
+python3 -V
+```
+
+- Verify that you have completed installing the MySQL client.
+
+- Download and install the `pymysql` tool using the following command:
+
+```
+pip3 install pymysql
+```
+
+- Verify that you have finished installing [ollama](https://ollama.com/download).
Verify that the installation was successful by checking the ollama version with the following code:
+
+```
+ollama -v
+```
+
+- Download the LLM model `llama2` and the embedding model `mxbai-embed-large`:
+
+```
+ollama pull llama2
+ollama pull mxbai-embed-large
+```
+
+## Build your app
+
+### Create the table
+
+Connect to MatrixOne and create a table called `rag_tab` to store the text and the corresponding vectors.
+
+```sql
+create table rag_tab(content text,embedding vecf32(1024));
+```
+
+### Vectorize text and store it in MatrixOne
+
+Create a Python file rag\_example.py that slices and vectorizes the text with the mxbai-embed-large embedding model and saves it to MatrixOne's `rag_tab` table.
+
+```python
+import ollama
+import pymysql.cursors
+
+conn = pymysql.connect(
+        host='127.0.0.1',
+        port=6001,
+        user='root',
+        password = "111",
+        db='db1',
+        autocommit=True
+        )
+cursor = conn.cursor()
+
+# Generate embeddings
+documents = [
+"MatrixOne is a hyper-converged cloud & edge native distributed database with a structure that separates storage, computation, and transactions to form a consolidated HSTAP data engine. This engine enables a single database system to accommodate diverse business loads such as OLTP, OLAP, and stream computing. It also supports deployment and utilization across public, private, and edge clouds, ensuring compatibility with diverse infrastructures.",
+"MatrixOne touts significant features, including real-time HTAP, multi-tenancy, stream computation, extreme scalability, cost-effectiveness, enterprise-grade availability, and extensive MySQL compatibility. MatrixOne unifies tasks traditionally performed by multiple databases into one system by offering a comprehensive ultra-hybrid data solution.
This consolidation simplifies development and operations, minimizes data fragmentation, and boosts development agility.",
+"MatrixOne is optimally suited for scenarios requiring real-time data input, large data scales, frequent load fluctuations, and a mix of procedural and analytical business operations. It caters to use cases such as mobile internet apps, IoT data applications, real-time data warehouses, SaaS platforms, and more.",
+"Matrix is a collection of complex or real numbers arranged in a rectangular array.",
+"The latest version of MatrixOne is 1.2.0, released on 20th May, 2024.",
+"We are excited to announce MatrixOne 0.8.0 release on 2023/6/30."
+]
+
+for i,d in enumerate(documents):
+    response = ollama.embeddings(model="mxbai-embed-large", prompt=d)
+    embedding = response["embedding"]
+    insert_sql = "insert into rag_tab(content,embedding) values (%s, %s)"
+    data_to_insert = (d, str(embedding))
+    cursor.execute(insert_sql, data_to_insert)
+```
+
+### Check the row count in the `rag_tab` table
+
+```sql
+mysql> select count(*) from rag_tab;
++----------+
+| count(*) |
++----------+
+|        6 |
++----------+
+1 row in set (0.00 sec)
+```
+
+As you can see, the data was successfully stored in the database.
+
+- Indexing (optional)
+
+In large-scale, high-dimensional retrieval, a full scan computes the similarity between the query and every vector in the data set for each query, which incurs significant performance overhead and latency. A vector index effectively solves this problem by building efficient data structures and algorithms that optimize the search process, improving retrieval performance, reducing computing and storage costs, and enhancing the user experience.
Therefore, we build an IVF-FLAT vector index on the vector column:
+
+```sql
+SET GLOBAL experimental_ivf_index = 1; -- enable vector indexing
+create index idx_rag using ivfflat on rag_tab(embedding) lists=1 op_type "vector_l2_ops";
+```
+
+### Vector retrieval
+
+Once the data is ready, we can search the database for the content most similar to our question. This step relies on MatrixOne's vector retrieval capability, which supports multiple similarity measures; here we use `l2_distance` and set the number of returned results to 3.
+
+```python
+prompt = "What is the latest version of MatrixOne?"
+
+response = ollama.embeddings(
+  prompt=prompt,
+  model="mxbai-embed-large"
+)
+query_embedding = response["embedding"]
+query_sql = "select content from rag_tab order by l2_distance(embedding,%s) asc limit 3"
+data_to_query = str(query_embedding)
+cursor.execute(query_sql, data_to_query)
+data = cursor.fetchall()
+```
+
+### Enhanced generation
+
+We combine what was retrieved in the previous step with the LLM to generate an answer.
+
+```python
+# Enhanced generation
+output = ollama.generate(
+  model="llama2",
+  prompt=f"Using this data: {data}. Respond to this prompt: {prompt}"
+)
+
+print(output['response'])
+```
+
+The console outputs the answer:
+
+```
+Based on the provided data, the latest version of MatrixOne is 1.2.0, which was released on 20th May, 2024.
+```
+
+After enhancement, the model generates the correct answer.
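A side note on preparing the documents: the example above embeds each document whole, which works for short snippets. Longer sources are usually split into overlapping chunks before embedding, so that retrieval returns focused passages. A minimal sketch of such a splitter (the `chunk_size` and `overlap` values are illustrative, not values used in the example):

```python
def split_text(text, chunk_size=200, overlap=50):
    # Slide a fixed-size window over the text with the given overlap
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = split_text("MatrixOne is a hyper-converged cloud & edge native distributed database." * 10,
                    chunk_size=100, overlap=20)
print(len(chunks), len(chunks[0]))
```

Each chunk would then be embedded and inserted into `rag_tab` exactly as in the loop above.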
+
+## Reference Documents
+
+- [Vector Type](../Develop/Vector/vector_type.md)
+- [Vector retrieval](../Develop/Vector/vector_search.md)
+- [CREATE INDEX...USING IVFFLAT](../Reference/SQL-Reference/Data-Definition-Language/create-index-ivfflat.md)
+- [L2_DISTANCE()](../Reference/Functions-and-Operators/Vector/l2_distance.md)
\ No newline at end of file
diff --git a/docs/MatrixOne/Tutorial/search-picture-demo.md b/docs/MatrixOne/Tutorial/search-picture-demo.md
new file mode 100644
index 000000000..401ee3de5
--- /dev/null
+++ b/docs/MatrixOne/Tutorial/search-picture-demo.md
@@ -0,0 +1,242 @@
+# Image Search Application Basics Example
+
+Currently, image and text search applications cover a wide range of areas. In e-commerce, users can search for goods by uploading images or text descriptions; on social media platforms, content can be found quickly through images or text, enhancing the user experience; and in copyright detection, image copyrights can be identified and protected. In addition, text-based image search is widely used in search engines to help users find specific images through keywords, while image-based search is used in machine learning and artificial intelligence for image recognition and classification tasks.
+
+The following is a flow chart of an image search:
+
+As you can see, building an image and text search application involves storing and retrieving vectorized images, and MatrixOne's vector capabilities and multiple retrieval methods provide critical technical support for building such applications.
+
+In this chapter, we'll build a simple image (and text) search application based on MatrixOne's vector capabilities.
+
+## Prepare before you start
+
+### Relevant knowledge
+
+**Transformers**: Transformers is an open source natural language processing library that provides a wide range of pre-trained models; through it, researchers and developers can easily use and integrate CLIP models into their projects.
+
+**CLIP**: The CLIP model is a deep learning model published by OpenAI. At its core, it unifies the processing of text and images through contrastive learning, enabling tasks such as image classification to be accomplished through text-image similarity without directly optimizing for the task. It can be combined with a vector database to build image search tools: high-dimensional vector representations of images are extracted through CLIP models to capture their semantic and perceptual features and encoded into an embedding space. At query time, a sample image is passed through the same CLIP encoder to get its embedding, and a vector similarity search efficiently finds the top k closest database image vectors.
+
+### Software Installation
+
+Before you begin, confirm that you have downloaded and installed the following software:
+
+- Verify that you have completed the [standalone deployment of](../Get-Started/install-standalone-matrixone.md) MatrixOne.
+
+- Verify that you have finished installing [Python 3.8 (or later)](https://www.python.org/downloads/). Verify that the installation was successful by checking the Python version with the following code:
+
+```
+python3 -V
+```
+
+- Verify that you have completed installing the MySQL client.
+
+- Download and install the `pymysql` tool using the following command:
+
+```
+pip install pymysql
+```
+
+- Download and install the `transformers` library using the following command:
+
+```
+pip install transformers
+```
+
+- Download and install the `Pillow` library using the following command:
+
+```
+pip install pillow
+```
+
+## Build your app
+
+### Create the table
+
+Connect to MatrixOne and create a table called `pic_tab` to store picture paths and the corresponding vectors.
+
+```sql
+create table pic_tab(pic_path varchar(200), embedding vecf64(512));
+```
+
+### Load the model
+
+```python
+from transformers import CLIPProcessor, CLIPModel
+
+# Load model from HuggingFace
+model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
+processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+```
+
+### Traverse the picture paths
+
+The method `find_img_files` traverses the local images folder, where images of fruit in five categories (apple, banana, blueberry, cherry, and apricot) were stored in advance, several images per category, all in `.jpg` format.
+
+```python
+import os
+
+def find_img_files(directory):
+    img_files = []  # Used to store found .jpg file paths
+    for root, dirs, files in os.walk(directory):
+        for file in files:
+            if file.lower().endswith('.jpg'):
+                full_path = os.path.join(root, file)  # Build the full file path
+                img_files.append(full_path)
+    return img_files
+```
+
+- Vectorize the images and store them in MatrixOne
+
+Define the method `storage_img` to map each picture to a vector, normalize it (optional), and store it in MatrixOne. MatrixOne supports L2 normalization of vectors with the `NORMALIZE_L2()` function. In some cases, the features of the data may be distributed at different scales, which can give some features a disproportionate effect on distance calculations.
By normalizing, this effect is reduced and the contributions of different features to the result become more balanced. When using the L2 distance measure, normalization also prevents vectors of different lengths from skewing the distance calculation.
+
+```python
+import pymysql
+from PIL import Image
+
+conn = pymysql.connect(
+        host = '127.0.0.1',
+        port = 6001,
+        user = 'root',
+        password = "111",
+        db = 'db1',
+        autocommit = True
+        )
+
+cursor = conn.cursor()
+
+# Map the images to vectors and store them in MatrixOne
+def storage_img():
+    for file_path in jpg_files:
+        image = Image.open(file_path)
+        if image.mode != 'RGBA':
+            image = image.convert('RGBA')
+        inputs = processor(images=image, return_tensors="pt", padding=True)
+        img_features = model.get_image_features(inputs["pixel_values"])  # Use the model to extract image features
+        img_features = img_features.detach().tolist()  # Detach the tensor and convert it to a list
+        embeddings = img_features[0]
+        insert_sql = "insert into pic_tab(pic_path,embedding) values (%s, normalize_l2(%s))"
+        data_to_insert = (file_path, str(embeddings))
+        cursor.execute(insert_sql, data_to_insert)
+        image.close()
+```
+
+### Check the row count in the `pic_tab` table
+
+```sql
+mysql> select count(*) from pic_tab;
++----------+
+| count(*) |
++----------+
+|     4801 |
++----------+
+1 row in set (0.00 sec)
+```
+
+As you can see, the data was successfully stored in the database.
+
+### Build a vector index
+
+MatrixOne supports vector indexing with IVF-FLAT. Without an index, each search must recompute the similarity between the query image and every image in the database; with an index, the similarity calculation is performed only on the images the index marks as "relevant", which greatly reduces the amount of computation.
+
+```python
+def create_idx(n):
+    # Enable the experimental IVF index feature before creating the index
+    cursor.execute('SET GLOBAL experimental_ivf_index = 1')
+    create_sql = 'create index idx_pic using ivfflat on pic_tab(embedding) lists=%s op_type "vector_l2_ops"'
+    cursor.execute(create_sql, (n,))
+```
+
+### Search by Image or Text
+
+Next, we define the methods `img_search_img` and `text_search_img` to implement image-to-image and text-to-image search. MatrixOne provides vector retrieval capabilities and supports several similarity metrics; here we retrieve with `l2_distance`.
+
+```python
+# Search by image
+def img_search_img(img_path, k):
+    image = Image.open(img_path)
+    inputs = processor(images=image, return_tensors="pt")
+    img_features = model.get_image_features(**inputs)
+    img_features = img_features.detach().tolist()
+    img_features = img_features[0]
+    query_sql = "select pic_path from pic_tab order by l2_distance(embedding, normalize_l2(%s)) asc limit %s"
+    data_to_query = (str(img_features), k)
+    cursor.execute(query_sql, data_to_query)
+    global data  # Store the result set for show_img()
+    data = cursor.fetchall()
+
+# Search by text
+def text_search_img(text, k):
+    inputs = processor(text=text, return_tensors="pt", padding=True)
+    text_features = model.get_text_features(inputs["input_ids"], inputs["attention_mask"])
+    embeddings = text_features.detach().tolist()
+    embeddings = embeddings[0]
+    query_sql = "select pic_path from pic_tab order by l2_distance(embedding, normalize_l2(%s)) asc limit %s"
+    data_to_query = (str(embeddings), k)
+    cursor.execute(query_sql, data_to_query)
+    global data
+    data = cursor.fetchall()
+```
+
+### Search Results Showcase
+
+After retrieving relevant images from a query image or a piece of text, we need to display the results; here we use Matplotlib to present them. 
+
+```python
+import matplotlib.pyplot as plt
+import matplotlib.image as mpimg
+
+def show_img(img_path, rows, cols):
+    if img_path:
+        result_paths = [img_path] + [path for path_tuple in data for path in path_tuple]
+    else:
+        result_paths = [path for path_tuple in data for path in path_tuple]
+    # Create a new figure and axes
+    fig, axes = plt.subplots(nrows=rows, ncols=cols, figsize=(10, 10))
+    # Loop over the image paths and axes
+    for i, (path, ax) in enumerate(zip(result_paths, axes.ravel())):
+        image = mpimg.imread(path)  # Read the image
+        ax.imshow(image)  # Show the image
+        ax.axis('off')  # Hide the axes
+        ax.set_title(f'image{i + 1}')  # Set the subplot title
+    plt.tight_layout()  # Adjust subplot spacing
+    plt.show()  # Display the whole figure
+```
+
+### View Results
+
+Run the program by entering the following code in the main program:
+
+```python
+if __name__ == "__main__":
+    directory_path = '/Users/admin/Downloads/fruit01'  # Replace with the actual directory path
+    jpg_files = find_img_files(directory_path)
+    storage_img()
+    create_idx(4)
+    img_path = '/Users/admin/Downloads/fruit01/blueberry/f_01_04_0450.jpg'
+    img_search_img(img_path, 3)  # Search by image
+    show_img(img_path, 1, 4)
+    text = ["Banana"]
+    text_search_img(text, 3)  # Search by text
+    show_img(None, 1, 3)
+```
+
+In the image-to-image search results, the first image on the left is the query image. As you can see, the retrieved images are very similar to it:
+
+ +
+ +As you can see from the text search results, the searched image matches the input text: + +
+ +
+ +## Reference Documents + +- [Vector Type](../Develop/Vector/vector_type.md) +- [Vector retrieval](../Develop/Vector/vector_search.md) +- [CREATE INDEX...USING IVFFLAT](../Reference/SQL-Reference/Data-Definition-Language/create-index-ivfflat.md) +- [L2_DISTANCE()](../Reference/Functions-and-Operators/Vector/l2_distance.md) +- [NORMALIZE_L2()](../Reference/Functions-and-Operators/Vector/normalize_l2.md) \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 0e900a347..32bc58a19 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -56,6 +56,9 @@ nav: - Scalability: MatrixOne/Overview/feature/scalability.md - Cost-Effective: MatrixOne/Overview/feature/cost-effective.md - High Availability: MatrixOne/Overview/feature/high-availability.md + - Timing: MatrixOne/Overview/feature/time-series.md + - Streams: MatrixOne/Overview/feature/stream.md + - User-defined functions: MatrixOne/Overview/feature/udf.md - MySQL Compatibility: MatrixOne/Overview/feature/mysql-compatibility.md - MatrixOne Architecture Design: - Architecture Design Overview: MatrixOne/Overview/architecture/matrixone-architecture-design.md @@ -64,8 +67,13 @@ nav: - Logtail Protocol Architecture: MatrixOne/Overview/architecture/architecture-logtail.md - Transaction and Lock Mechanisms Architecture: MatrixOne/Overview/architecture/architecture-transaction-lock.md - Detailed Proxy Architecture: MatrixOne/Overview/architecture/architecture-proxy.md + - WAL Technology Explained: MatrixOne/Overview/architecture/architecture-wal.md - Detailed Caching and Hot-Cold Data Separation Architecture: MatrixOne/Overview/architecture/architecture-cold-hot-data-separation.md - Detailed Stream Engine Architecture: MatrixOne/Overview/architecture/streaming.md + - MatrixOne-Operator design and implementation: MatrixOne/Overview/architecture/architecture-matrixone-operator.md + - MatrixOne vs. 
other databases: + - MatrixOne Positioning: MatrixOne/Overview/matrixone-vs-other_databases/matrixone-positioning.md + - MatrixOne vs. common OLTP databases: MatrixOne/Overview/matrixone-vs-other_databases/matrixone-vs-oltp.md - What's New: MatrixOne/Overview/whats-new.md - Getting Started: - Deploy standalone MatrixOne: @@ -86,12 +94,14 @@ nav: - Connect MatrixOne with JDBC: MatrixOne/Develop/connect-mo/java-connect-to-matrixone/connect-mo-with-jdbc.md - Connect MatrixOne with Java ORMs: MatrixOne/Develop/connect-mo/java-connect-to-matrixone/connect-mo-with-orm.md - Python connect to MatrixOne: MatrixOne/Develop/connect-mo/python-connect-to-matrixone.md + - C# connect to MatrixOne: MatrixOne/Develop/connect-mo/connect-to-matrixone-with-c#.md - Connecting to MatrixOne with Golang: MatrixOne/Develop/connect-mo/connect-to-matrixone-with-go.md - MatrixOne SSL connection: MatrixOne/Develop/connect-mo/configure-mo-ssl-connection.md - Schema Design: - Overview: MatrixOne/Develop/schema-design/overview.md - Create Database: MatrixOne/Develop/schema-design/create-database.md - Create Table: MatrixOne/Develop/schema-design/create-table.md + - Replication table: MatrixOne/Develop/schema-design/create-table-as-select.md - Create View: MatrixOne/Develop/schema-design/create-view.md - Create Temporary Table: MatrixOne/Develop/schema-design/create-temporary-table.md - Create Secondary Index: MatrixOne/Develop/schema-design/create-secondary-index.md @@ -124,7 +134,12 @@ nav: - Subquery: MatrixOne/Develop/read-data/subquery.md - Views: MatrixOne/Develop/read-data/views.md - Common Table Expression: MatrixOne/Develop/read-data/cte.md - - Window Function.: MatrixOne/Develop/read-data/window-function.md + - Window Function: + - Standard Window: MatrixOne/Develop/read-data/window-function/window-function.md + - Time Window: MatrixOne/Develop/read-data/window-function/time-window.md + - Data de-duplication: + - COUNT(DISTINCT): MatrixOne/Develop/distinct-data/count-distinct.md + - 
BITMAP: MatrixOne/Develop/distinct-data/bitmap.md - Account Design: - Multi-Account Overview: MatrixOne/Develop/Publish-Subscribe/multi-account-overview.md - Publish-Subscribe: MatrixOne/Develop/Publish-Subscribe/pub-sub-overview.md @@ -141,14 +156,25 @@ nav: - How to use transaction: - User Guide: MatrixOne/Develop/Transactions/matrixone-transaction-overview/how-to-use.md - Scenario: MatrixOne/Develop/Transactions/matrixone-transaction-overview/scenario.md + - User-defined function: + - UDF python: MatrixOne/Develop/udf/udf-python.md + - UDF python advanced: MatrixOne/Develop/udf/udf-python-advanced.md + - Vector: + - Vector Type: MatrixOne/Develop/Vector/vector_type.md + - Vector Search: MatrixOne/Develop/Vector/vector_search.md + - Cluster Centers: MatrixOne/Develop/Vector/cluster_centers.md - Application Developing Tutorials: - Java CRUD demo: MatrixOne/Tutorial/develop-java-crud-demo.md - SpringBoot and JPA CRUD demo: MatrixOne/Tutorial/springboot-hibernate-crud-demo.md - SpringBoot and MyBatis CRUD demo: MatrixOne/Tutorial/springboot-mybatis-crud-demo.md - Python CRUD demo: MatrixOne/Tutorial/develop-python-crud-demo.md - SQLAlchemy CRUD demo: MatrixOne/Tutorial/sqlalchemy-python-crud-demo.md + - Django CRUD demo: MatrixOne/Tutorial/django-python-crud-demo.md - Golang CRUD demo: MatrixOne/Tutorial/develop-golang-crud-demo.md - Gorm CRUD demo: MatrixOne/Tutorial/gorm-golang-crud-demo.md + - C# CRUD demo: MatrixOne/Tutorial/c-net-crud-demo.md + - Rag Application demo: MatrixOne/Tutorial/rag-demo.md + - Picture(Text)-to-Picture Search Application demo: MatrixOne/Tutorial/search-picture-demo.md - Ecological Tools: - BI Tools: - Visualizing MatrixOne Data with FineBI: MatrixOne/Develop/Ecological-Tools/BI-Connection/FineBI-connection.md @@ -168,8 +194,10 @@ nav: - Experience Environment Deployment Plan: MatrixOne/Deploy/deployment-topology/experience-deployment-topology.md - Minimum Production Environment Deployment Plan: 
MatrixOne/Deploy/deployment-topology/minimal-deployment-topology.md - Recommended Production Environment Deployment Plan: MatrixOne/Deploy/deployment-topology/recommended-prd-deployment-topology.md - - MatrixOne distributed cluster deployment: MatrixOne/Deploy/deploy-MatrixOne-cluster.md - - Maintenance: + - Cluster Deployment Guide: + - Kubernetes and object storage environment not deployed: MatrixOne/Deploy/deploy-Matrixone-cluster/deploy-MatrixOne-cluster-without-k8.md + - Deployed Kubernetes and object storage environment: MatrixOne/Deploy/deploy-Matrixone-cluster/deploy-MatrixOne-cluster-with-k8.md + - Cluster Operations Management: - Starting and stopping: MatrixOne/Deploy/MatrixOne-start-stop.md - Updating: MatrixOne/Deploy/update-MatrixOne-cluster.md - Health check and resource monitoring: MatrixOne/Deploy/health-check-resource-monitoring.md @@ -182,6 +210,10 @@ nav: - MatrixOne Backup and Recovery Overview: MatrixOne/Maintain/backup-restore/backup-restore-overview.md - Backup and Recovery Concepts: MatrixOne/Maintain/backup-restore/key-concepts.md - Backup and Restore by using mo-dump: MatrixOne/Maintain/backup-restore/modump-backup-restore.md + - mo_br Backup and Recovery: + - mo_br guidelines for use: MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr.md + - mo_br regular physical backup recovery: MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr-physical-backup-restore.md + - mo_br snapshot backup recovery: MatrixOne/Maintain/backup-restore/mobr-backup-restore/mobr-snapshot-backup-restore.md - Mount Data: - Mount directory to Docker container: MatrixOne/Maintain/mount-data-by-docker.md - Migrating: @@ -232,6 +264,8 @@ nav: - Save query result support: MatrixOne/Reference/Variable/system-variables/save_query_result.md - Timezone support: MatrixOne/Reference/Variable/system-variables/timezone.md - Lowercase table names support: MatrixOne/Reference/Variable/system-variables/lower_case_tables_name.md + - Foreign key checking support: 
MatrixOne/Reference/Variable/system-variables/foreign_key_checks.md + - User-specified case consistency support for query result set column names: MatrixOne/Reference/Variable/system-variables/keep_user_target_list_in_result.md - Custom variable: MatrixOne/Reference/Variable/custom-variable.md - SQL Language Structure: - Keywords: MatrixOne/Reference/Language-Structure/keywords.md @@ -246,21 +280,33 @@ nav: - BLOB and TEXT Type: MatrixOne/Reference/Data-Types/blob-text-type.md - ENUM Type: MatrixOne/Reference/Data-Types/enum-type.md - UUID Type: MatrixOne/Reference/Data-Types/uuid-type.md + - VECTOR Type: MatrixOne/Reference/Data-Types/vector-type.md - Fixed-Point Types (Exact Value) - DECIMAL: MatrixOne/Reference/Data-Types/fixed-point-types.md - SQL Statements: - Type of SQL Statements: MatrixOne/Reference/SQL-Reference/SQL-Type.md - Data Definition Language: - CREATE DATABASE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-database.md - CREATE INDEX: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-index.md + - CREATE INDEX...USING IVFFLAT: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-index-ivfflat.md - CREATE TABLE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table.md + - CREATE TABLE AS SELECT: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table-as-select.md + - CREATE TABLE ... 
LIKE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-table-like.md - CREATE EXTERNAL TABLE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-external-table.md + - CREATE CLUSTER TABLE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-cluster-table.md - CREATE PUBLICATION: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-publication.md - CREATE SEQUENCE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-sequence.md - CREATE STAGE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-stage.md - CREATE...FROM...PUBLICATION...: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-subscription.md - CREATE VIEW: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-view.md + - CREATE FUNCTION...LANGUAGE SQL AS: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-function-sql.md + - CREATE FUNCTION...LANGUAGE PYTHON AS: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-function-python.md + - CREATE SOURCE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-source.md + - CREATE DYNAMIC TABLE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-dynamic-table.md + - CREATE SNAPSHOT: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/create-snapshot.md - ALTER TABLE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-table.md + - ALTER TABLE ... 
ALTER REINDEX: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-reindex.md - ALTER PUBLICATION: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-publication.md + - ALTER SEQUENCE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-sequence.md - ALTER STAGE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-stage.md - ALTER VIEW: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/alter-view.md - DROP DATABASE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-database.md @@ -269,19 +315,26 @@ nav: - DROP PUBLICATION: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-publication.md - DROP SEQUENCE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-sequence.md - DROP STAGE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-stage.md + - DROP SNAPSHOT: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-snapshot.md - DROP VIEW: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-view.md + - DROP FUNCTION: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/drop-function.md - TRUNCATE TABLE: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/truncate-table.md + - RESTORE ACCOUNT: MatrixOne/Reference/SQL-Reference/Data-Definition-Language/restore-account.md - Data Manipulation Language: - - INSERT: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/insert.md - - INSERT INTO SELECT: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/insert-into-select.md - - INSERT ON DUPLICATE KEY UPDATE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/insert-on-duplicate.md - - DELETE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/delete.md - - UPDATE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/update.md - - LOAD DATA INFILE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/load-data-infile.md - - REPLACE: 
MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/replace.md - - Information Functions: - - LAST_QUERY_ID(): MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/information-functions/last-query-id.md - - LAST_INSERT_ID(): MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/information-functions/last-insert-id.md + - INSERT: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/insert.md + - INSERT INTO SELECT: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/insert-into-select.md + - DELETE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/delete.md + - UPDATE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/update.md + - LOAD DATA INFILE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/load-data-infile.md + - LOAD DATA INLINE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/load-data-inline.md + - UPSERT: + - UPSERT Overview: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/upsert.md + - INSERT ON DUPLICATE KEY UPDATE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/insert-on-duplicate.md + - INSERT IGNORE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/insert-ignore.md + - REPLACE: MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/upsert/replace.md + - Information Functions: + - LAST_QUERY_ID(): MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/information-functions/last-query-id.md + - LAST_INSERT_ID(): MatrixOne/Reference/SQL-Reference/Data-Manipulation-Language/information-functions/last-insert-id.md - Data Query Language: - SELECT: MatrixOne/Reference/SQL-Reference/Data-Query-Language/select.md - SUBQUERY: @@ -294,6 +347,7 @@ nav: - SUBQUERY with IN: MatrixOne/Reference/SQL-Reference/Data-Query-Language/subqueries/subquery-with-in.md - JOIN: - JOIN Overview: MatrixOne/Reference/SQL-Reference/Data-Query-Language/join/join.md + - CROSS JOIN: 
MatrixOne/Reference/SQL-Reference/Data-Query-Language/join/inner-join.md - INNER JOIN: MatrixOne/Reference/SQL-Reference/Data-Query-Language/join/inner-join.md - LEFT JOIN: MatrixOne/Reference/SQL-Reference/Data-Query-Language/join/left-join.md - RIGHT JOIN: MatrixOne/Reference/SQL-Reference/Data-Query-Language/join/right-join.md @@ -350,6 +404,7 @@ nav: - EXPLAIN: MatrixOne/Reference/SQL-Reference/Other/Explain/explain.md - EXPLAIN Output Format: MatrixOne/Reference/SQL-Reference/Other/Explain/explain-workflow.md - Explain Analyze: MatrixOne/Reference/SQL-Reference/Other/Explain/explain-analyze.md + - Explain Prepared: MatrixOne/Reference/SQL-Reference/Other/Explain/explain-prepared.md - Functions and Operators: - Operators: - INTERVAL: MatrixOne/Reference/Operators/interval.md @@ -381,6 +436,8 @@ nav: - BINARY: MatrixOne/Reference/Operators/operators/cast-functions-and-operators/binary.md - CAST: MatrixOne/Reference/Operators/operators/cast-functions-and-operators/cast.md - CONVERT: MatrixOne/Reference/Operators/operators/cast-functions-and-operators/convert.md + - SERIAL: MatrixOne/Reference/Operators/operators/cast-functions-and-operators/serial.md + - SERIAL_FULL: MatrixOne/Reference/Operators/operators/cast-functions-and-operators/serial_full.md - Comparison Functions and Operators: - Comparison Functions and Operators Overview: MatrixOne/Reference/Operators/operators/comparison-functions-and-operators/comparison-functions-and-operators-overview.md - '>': MatrixOne/Reference/Operators/operators/comparison-functions-and-operators/greater-than.md @@ -415,9 +472,11 @@ nav: - OR: MatrixOne/Reference/Operators/operators/logical-operators/or.md - XOR: MatrixOne/Reference/Operators/operators/logical-operators/xor.md - Functions: + - Summary Table of Functions: MatrixOne/Reference/Functions-and-Operators/matrixone-function-list.md - Aggregate Functions: - ANY_VALUE: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/any-value.md - AVG: 
MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/avg.md + - BITMAP: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/bitmap.md - BIT_AND: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/bit_and.md - BIT_OR: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/bit_or.md - BIT_XOR: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/bit_xor.md @@ -429,7 +488,9 @@ nav: - STDDEV_POP: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/stddev_pop.md - SUM: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/sum.md - VARIANCE: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/variance.md + - VAR_POP: MatrixOne/Reference/Functions-and-Operators/Aggregate-Functions/var_pop.md - Datetime: + - CONVERT_TZ(): MatrixOne/Reference/Functions-and-Operators/Datetime/convert-tz.md - CURDATE(): MatrixOne/Reference/Functions-and-Operators/Datetime/curdate.md - CURRENT_TIMESTAMP(): MatrixOne/Reference/Functions-and-Operators/Datetime/current-timestamp.md - DATE(): MatrixOne/Reference/Functions-and-Operators/Datetime/date.md @@ -444,14 +505,17 @@ nav: - FROM_UNIXTIME: MatrixOne/Reference/Functions-and-Operators/Datetime/from-unixtime.md - MINUTE(): MatrixOne/Reference/Functions-and-Operators/Datetime/minute.md - MONTH(): MatrixOne/Reference/Functions-and-Operators/Datetime/month.md + - NOW(): MatrixOne/Reference/Functions-and-Operators/Datetime/now.md - SECOND(): MatrixOne/Reference/Functions-and-Operators/Datetime/second.md + - STR_TO_DATE(): MatrixOne/Reference/Functions-and-Operators/Datetime/str-to-date.md + - SYSDATE(): MatrixOne/Reference/Functions-and-Operators/Datetime/sysdate.md - TIME(): MatrixOne/Reference/Functions-and-Operators/Datetime/time.md - TIMEDIFF(): MatrixOne/Reference/Functions-and-Operators/Datetime/timediff.md - TIMESTAMP(): MatrixOne/Reference/Functions-and-Operators/Datetime/timestamp.md - TIMESTAMPDIFF(): 
MatrixOne/Reference/Functions-and-Operators/Datetime/timestampdiff.md - TO_DATE(): MatrixOne/Reference/Functions-and-Operators/Datetime/to-date.md - - TO_DAYS(): MatrixOne/Reference/Functions-and-Operators/Datetime/to-seconds.md - - TO_SECONDS(): MatrixOne/Reference/Functions-and-Operators/Datetime/to-days.md + - TO_DAYS(): MatrixOne/Reference/Functions-and-Operators/Datetime/to-days.md + - TO_SECONDS(): MatrixOne/Reference/Functions-and-Operators/Datetime/to-seconds.md - UNIX_TIMESTAMP: MatrixOne/Reference/Functions-and-Operators/Datetime/unix-timestamp.md - UTC_TIMESTAMP(): MatrixOne/Reference/Functions-and-Operators/Datetime/utc-timestamp.md - WEEK(): MatrixOne/Reference/Functions-and-Operators/Datetime/week.md @@ -472,8 +536,8 @@ nav: - LOG10(): MatrixOne/Reference/Functions-and-Operators/Mathematical/log10.md - PI(): MatrixOne/Reference/Functions-and-Operators/Mathematical/pi.md - POWER(): MatrixOne/Reference/Functions-and-Operators/Mathematical/power.md - - RAND(): MatrixOne/Reference/Functions-and-Operators/Mathematical/rand.md - ROUND(): MatrixOne/Reference/Functions-and-Operators/Mathematical/round.md + - RAND(): MatrixOne/Reference/Functions-and-Operators/Mathematical/rand.md - SIN(): MatrixOne/Reference/Functions-and-Operators/Mathematical/sin.md - SINH(): MatrixOne/Reference/Functions-and-Operators/Mathematical/sinh.md - TAN(): MatrixOne/Reference/Functions-and-Operators/Mathematical/tan.md @@ -488,24 +552,34 @@ nav: - FIELD(): MatrixOne/Reference/Functions-and-Operators/String/field.md - FIND_IN_SET(): MatrixOne/Reference/Functions-and-Operators/String/find-in-set.md - FORMAT(): MatrixOne/Reference/Functions-and-Operators/String/format.md + - FROM_BASE64(): MatrixOne/Reference/Functions-and-Operators/String/from_base64.md - HEX(): MatrixOne/Reference/Functions-and-Operators/String/hex.md - INSTR(): MatrixOne/Reference/Functions-and-Operators/String/instr.md + - LCASE(): MatrixOne/Reference/Functions-and-Operators/String/lcase.md - LEFT(): 
MatrixOne/Reference/Functions-and-Operators/String/left.md - LENGTH(): MatrixOne/Reference/Functions-and-Operators/String/length.md + - LOCATE(): MatrixOne/Reference/Functions-and-Operators/String/locate.md + - LOWER(): MatrixOne/Reference/Functions-and-Operators/String/lower.md - LPAD(): MatrixOne/Reference/Functions-and-Operators/String/lpad.md - LTRIM(): MatrixOne/Reference/Functions-and-Operators/String/ltrim.md + - MD5(): MatrixOne/Reference/Functions-and-Operators/String/md5.md - OCT(): MatrixOne/Reference/Functions-and-Operators/String/oct.md - REPEAT(): MatrixOne/Reference/Functions-and-Operators/String/repeat.md - REVERSE(): MatrixOne/Reference/Functions-and-Operators/String/reverse.md - RPAD(): MatrixOne/Reference/Functions-and-Operators/String/rpad.md - RTRIM(): MatrixOne/Reference/Functions-and-Operators/String/rtrim.md + - SHA1()/SHA(): MatrixOne/Reference/Functions-and-Operators/String/sha1.md + - SHA2(): MatrixOne/Reference/Functions-and-Operators/String/sha2.md - SPACE(): MatrixOne/Reference/Functions-and-Operators/String/space.md - SPLIT_PART(): MatrixOne/Reference/Functions-and-Operators/String/split_part.md - STARTSWITH(): MatrixOne/Reference/Functions-and-Operators/String/startswith.md - SUBSTRING(): MatrixOne/Reference/Functions-and-Operators/String/substring.md - SUBSTRING_INDEX(): MatrixOne/Reference/Functions-and-Operators/String/substring-index.md + - TO_BASE64(): MatrixOne/Reference/Functions-and-Operators/String/to_base64.md - TRIM(): MatrixOne/Reference/Functions-and-Operators/String/trim.md + - UCASE(): MatrixOne/Reference/Functions-and-Operators/String/ucase.md - UNHEX(): MatrixOne/Reference/Functions-and-Operators/String/unhex.md + - UPPER(): MatrixOne/Reference/Functions-and-Operators/String/upper.md - Regular Expressions: - Regular Expressions Overview: MatrixOne/Reference/Functions-and-Operators/String/Regular-Expressions/Regular-Expression-Functions-Overview.md - NOT REGEXP: 
MatrixOne/Reference/Functions-and-Operators/String/Regular-Expressions/not-regexp.md @@ -513,6 +587,19 @@ nav: - REGEXP_LIKE(): MatrixOne/Reference/Functions-and-Operators/String/Regular-Expressions/regexp-like.md - REGEXP_REPLACE(): MatrixOne/Reference/Functions-and-Operators/String/Regular-Expressions/regexp-replace.md - REGEXP_SUBSTR(): MatrixOne/Reference/Functions-and-Operators/String/Regular-Expressions/regexp-substr.md + - Vector: + - Basic Operator: MatrixOne/Reference/Functions-and-Operators/Vector/arithmetic.md + - Mathematical Calculations: MatrixOne/Reference/Functions-and-Operators/Vector/misc.md + - CLUSTER_CENTERS(): MatrixOne/Reference/Functions-and-Operators/Vector/cluster_centers.md + - COSINE_SIMILARITY(): MatrixOne/Reference/Functions-and-Operators/Vector/cosine_similarity.md + - COSINE_DISTANCE(): MatrixOne/Reference/Functions-and-Operators/Vector/cosine_distance.md + - INNER_PRODUCT(): MatrixOne/Reference/Functions-and-Operators/Vector/inner_product.md + - L1_NORM(): MatrixOne/Reference/Functions-and-Operators/Vector/l1_norm.md + - L2_NORM(): MatrixOne/Reference/Functions-and-Operators/Vector/l2_norm.md + - L2_DISTANCE(): MatrixOne/Reference/Functions-and-Operators/Vector/l2_distance.md + - NORMALIZE_L2(): MatrixOne/Reference/Functions-and-Operators/Vector/normalize_l2.md + - SUBVECTOR(): MatrixOne/Reference/Functions-and-Operators/Vector/subvector.md + - VECTOR_DIMS(): MatrixOne/Reference/Functions-and-Operators/Vector/vector_dims.md - Table: - UNNEST(): MatrixOne/Reference/Functions-and-Operators/Table/unnest.md - Vector: @@ -532,6 +619,8 @@ nav: - ROW_NUMBER(): MatrixOne/Reference/Functions-and-Operators/Window-Functions/row_number.md - JSON Functions: MatrixOne/Reference/Functions-and-Operators/Json/json-functions.md - Other Functions: + - SAMPLE: MatrixOne/Reference/Functions-and-Operators/Other/sample.md + - SERIAL_EXTRACT: MatrixOne/Reference/Functions-and-Operators/Other/serial_extract.md - SLEEP: 
MatrixOne/Reference/Functions-and-Operators/Other/sleep.md - UUID(): MatrixOne/Reference/Functions-and-Operators/Other/uuid.md - System OPS Functions: @@ -551,7 +640,12 @@ nav: - Partitioning supported features list: MatrixOne/Reference/Limitations/mo-partition-support.md - MatrixOne Directory Structure: MatrixOne/Maintain/mo-directory-structure.md - MatrixOne Tools: - - mo_ctl Tool: MatrixOne/Maintain/mo_ctl.md + - mo_ctl stand-alone tool: MatrixOne/Reference/mo-tools/mo_ctl_standalone.md + - mo_ctl distributed Tools: MatrixOne/Reference/mo-tools/mo_ctl.md + - mo_datax_writer tool: MatrixOne/Reference/mo-tools/mo_datax_writer.md + - mo_ssb_open tool: MatrixOne/Reference/mo-tools/mo_ssb_open.md + - mo_tpch_open tool: MatrixOne/Reference/mo-tools/mo_tpch_open.md + - mo_ts_perf_test tool: MatrixOne/Reference/mo-tools/mo_ts_perf_test.md - Troubleshooting: - Slow Queries: MatrixOne/Troubleshooting/slow-queries.md - Common statistic data query: MatrixOne/Troubleshooting/common-statistics-query.md