fix tugraph datax doc (TuGraph-family#629)
* fix tugraph datax

* fix tugraph datax

* fix tugraph datax

* fix tugraph datax doc
lipanpan03 authored Aug 13, 2024
1 parent 547ab54 commit 48a1e20
Showing 4 changed files with 183 additions and 109 deletions.
2 changes: 1 addition & 1 deletion deps/tugraph-db-browser
78 changes: 76 additions & 2 deletions docs/en-US/source/6.utility-tools/7.tugraph-datax.md
@@ -22,7 +22,9 @@ mvn -U clean package assembly:assembly -Dmaven.test.skip=true

The compiled DataX files are in the `target` directory.

## 3.Text data imported into TuGraph with DataX
## 3.Import TuGraph

### 3.1.Text data imported into TuGraph with DataX

Taking the data from the `lgraph_import` section of the TuGraph manual as an example, there are three CSV data files:
`actors.csv`
@@ -278,7 +280,7 @@ python3 datax/bin/datax.py job_movies.json
python3 datax/bin/datax.py job_roles.json
```
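
Because edges refer to vertices that already exist, the vertex jobs presumably need to finish before the edge job starts. The following is a minimal Python sketch of running the jobs in order and stopping at the first failure. The file name `job_actors.json` and the helpers `build_command` and `run_jobs` are assumptions made for illustration; they are not part of DataX.

```python
import subprocess
import sys

# Assumed job order: vertex jobs first, then the edge job.
# job_actors.json is a hypothetical name; only job_movies.json and
# job_roles.json are visible in the commands above.
JOBS = ["job_actors.json", "job_movies.json", "job_roles.json"]


def build_command(job_file, datax_script="datax/bin/datax.py"):
    """Return the argv list that launches one DataX job."""
    return ["python3", datax_script, job_file]


def run_jobs(jobs, runner=subprocess.run):
    """Run jobs one after another and return the jobs that completed.

    Stops at the first job whose process exits with a non-zero code.
    """
    completed = []
    for job in jobs:
        result = runner(build_command(job))
        if result.returncode != 0:
            print(f"{job} failed with exit code {result.returncode}",
                  file=sys.stderr)
            break
        completed.append(job)
    return completed
```

Calling `run_jobs(JOBS)` from the directory that contains `datax/` would launch the three jobs in sequence; the `runner` argument exists only so the order logic can be exercised without starting DataX.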

## MySQL's data imported into TuGraph with DataX
### 3.2.MySQL's data imported into TuGraph with DataX

We create the following `movies` table in the `test` database

@@ -438,3 +440,75 @@ Create a DataX job configuration file
```shell
python3 datax/bin/datax.py job_mysql_to_tugraph.json
```

## 4.Export TuGraph

### 4.1. Configuration example

TuGraph supports exporting data with DataX. The following configuration exports data to a text file:

```json
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "tugraphreader",
"parameter": {
"username": "admin",
"password": "73@TuGraph",
"graphName": "Movie_8C5C",
"queryCypher": "match (n:person) return n.id,n.name,n.born;",
"url": "bolt://127.0.0.1:27687"
}
},
"writer": {
"name": "txtfilewriter",
"parameter": {
"path": "./result",
"fileName": "luohw",
"writeMode": "truncate"
}
}
}
]
}
}
```

With this configuration file, you can export the id, name and born attributes of all person nodes in the TuGraph subgraph Movie_8C5C.
The data is written to the result directory under the current directory, and the file name is luohw plus a random suffix.
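
When several exports differ only in the query or the target file, the job file can be generated instead of edited by hand. The sketch below builds the same structure as the configuration above using only the keys shown there; the function name `build_export_job` and its argument names are assumptions made for illustration.

```python
import json


def build_export_job(url, username, password, graph_name, query_cypher,
                     out_dir="./result", file_name="luohw"):
    """Build a DataX job dict that reads from tugraphreader
    and writes to txtfilewriter."""
    return {
        "job": {
            "setting": {"speed": {"channel": 1}},
            "content": [
                {
                    "reader": {
                        "name": "tugraphreader",
                        "parameter": {
                            "url": url,
                            "username": username,
                            "password": password,
                            "graphName": graph_name,
                            "queryCypher": query_cypher,
                        },
                    },
                    "writer": {
                        "name": "txtfilewriter",
                        "parameter": {
                            "path": out_dir,
                            "fileName": file_name,
                            "writeMode": "truncate",
                        },
                    },
                }
            ]
        }
    }


job = build_export_job(
    url="bolt://127.0.0.1:27687",
    username="admin",
    password="73@TuGraph",
    graph_name="Movie_8C5C",
    query_cypher="match (n:person) return n.id,n.name,n.born;",
)
# JSON text that could be saved as a job file and passed to datax.py.
job_json = json.dumps(job, indent=2)
```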

### 4.2. Parameter Description

To export TuGraph data with DataX, set the reader to `tugraphreader` and configure the following five parameters:

* **url**
    * Description: Address of the TuGraph Bolt server <br />
    * Required: Yes <br />
    * Default value: None <br />

* **username**
    * Description: TuGraph username <br />
    * Required: Yes <br />
    * Default value: None <br />

* **password**
    * Description: TuGraph password <br />
    * Required: Yes <br />
    * Default value: None <br />

* **graphName**
    * Description: The TuGraph subgraph to be synchronized <br />
    * Required: Yes <br />
    * Default value: None <br />

* **queryCypher**
    * Description: Cypher statement used to read data from TuGraph <br />
    * Required: No <br />
    * Default value: None <br />
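
The parameter list above can be turned into a small pre-flight check that runs before a job is submitted. This is only a sketch: the function `check_reader_parameters` is hypothetical, and both the `bolt://` prefix test and the warning about unknown keys are assumptions drawn from the examples in this document, not documented reader behaviour.

```python
REQUIRED = ("url", "username", "password", "graphName")
OPTIONAL = ("queryCypher",)


def check_reader_parameters(parameter):
    """Return a list of problems found in a tugraphreader parameter dict.

    An empty list means the dict satisfies the table above.
    """
    problems = []
    for key in REQUIRED:
        if not parameter.get(key):
            problems.append(f"missing required parameter: {key}")
    known = set(REQUIRED) | set(OPTIONAL)
    for key in parameter:
        if key not in known:
            # Assumption: only the five listed parameters are expected.
            problems.append(f"unknown parameter: {key}")
    url = parameter.get("url", "")
    if url and not url.startswith("bolt://"):
        # Assumption: every example in this document uses a bolt:// address.
        problems.append("url should be a bolt address such as bolt://host:port")
    return problems
```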
209 changes: 103 additions & 106 deletions docs/zh-CN/source/6.utility-tools/7.tugraph-datax.md
@@ -23,7 +23,9 @@ mvn -U clean package assembly:assembly -Dmaven.test.skip=true

The compiled DataX files are in the target directory

## 3.Text data imported into TuGraph with DataX
## 3. Import TuGraph

### 3.1.Text data imported into TuGraph with DataX

Taking the data from the lgraph_import import tool section of the TuGraph manual as an example, there are three csv data files, as follows:
`actors.csv`
@@ -88,29 +90,14 @@ nm2514879,Ruolan Li,tt4701660
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"url": "bolt://127.0.0.1:27687",
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "actor",
"type": "VERTEX",
"properties": [
{ "name": "aid", "type": "STRING" },
{ "name": "name", "type": "STRING" }
],
"primary": "aid"
}
],
"files": [
{
"label": "actor",
"format": "JSON",
"columns": ["aid", "name"]
}
]
"labelType": "VERTEX",
"labelName": "actor",
"batchNum": 1000,
"properties": ["aid", "name"]
}
}
}
@@ -160,31 +147,14 @@ nm2514879,Ruolan Li,tt4701660
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"url": "bolt://127.0.0.1:27687",
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "movie",
"type": "VERTEX",
"properties": [
{ "name": "mid", "type": "STRING" },
{ "name": "name", "type": "STRING" },
{ "name": "year", "type": "STRING" },
{ "name": "rate", "type": "FLOAT", "optional": true }
],
"primary": "mid"
}
],
"files": [
{
"label": "movie",
"format": "JSON",
"columns": ["mid", "name", "year", "rate"]
}
]
"labelType": "VERTEX",
"labelName": "movie",
"batchNum": 1000,
"properties": ["mid", "name", "year", "rate"]
}
}
}
@@ -230,27 +200,16 @@ nm2514879,Ruolan Li,tt4701660
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"url": "bolt://127.0.0.1:27687",
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "play_in",
"type": "EDGE",
"properties": [{ "name": "role", "type": "STRING" }]
}
],
"files": [
{
"label": "play_in",
"format": "JSON",
"SRC_ID": "actor",
"DST_ID": "movie",
"columns": ["SRC_ID", "role", "DST_ID"]
}
]
"labelType": "EDGE",
"labelName": "play_in",
"batchNum": 1000,
"properties": ["SRC_ID", "role", "DST_ID"],
"startLabel": {"type": "actor", "key": "SRC_ID"},
"endLabel": {"type": "movie", "key": "DST_ID"}
}
}
}
@@ -273,7 +232,7 @@ python3 datax/bin/datax.py job_movies.json
python3 datax/bin/datax.py job_roles.json
```
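
The hunks above replace the writer's `host`, `port`, `schema` and `files` settings with a single `url` plus `labelType`, `labelName`, `batchNum` and `properties`, and add `startLabel` and `endLabel` for edges. A minimal sketch of producing such writer blocks in Python follows. The helper names are hypothetical, and the check that each edge key appears in `properties` is an assumption inferred from the `play_in` example, not documented DataX behaviour.

```python
def vertex_writer(url, username, password, graph_name, label, properties,
                  batch_num=1000):
    """Build a tugraphwriter block for a vertex label."""
    return {
        "name": "tugraphwriter",
        "parameter": {
            "url": url,
            "username": username,
            "password": password,
            "graphName": graph_name,
            "labelType": "VERTEX",
            "labelName": label,
            "batchNum": batch_num,
            "properties": list(properties),
        },
    }


def edge_writer(url, username, password, graph_name, label, properties,
                start, end, batch_num=1000):
    """Build a tugraphwriter block for an edge label.

    start and end are (vertex_label, key_column) pairs.
    """
    for _, key in (start, end):
        if key not in properties:
            # Assumed consistency check, mirroring the play_in example.
            raise ValueError(f"{key!r} is not listed in properties")
    writer = vertex_writer(url, username, password, graph_name, label,
                           properties, batch_num)
    writer["parameter"]["labelType"] = "EDGE"
    writer["parameter"]["startLabel"] = {"type": start[0], "key": start[1]}
    writer["parameter"]["endLabel"] = {"type": end[0], "key": end[1]}
    return writer


play_in = edge_writer(
    "bolt://127.0.0.1:27687", "admin", "73@TuGraph", "default",
    "play_in", ["SRC_ID", "role", "DST_ID"],
    start=("actor", "SRC_ID"), end=("movie", "DST_ID"),
)
```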

## 4.MySQL's data imported into TuGraph with DataX
### 3.2.MySQL's data imported into TuGraph with DataX

We create the following `movies` table in the `test` database

@@ -332,31 +291,14 @@ values
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"url": "bolt://127.0.0.1:27687",
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "movie",
"type": "VERTEX",
"properties": [
{ "name": "mid", "type": "STRING" },
{ "name": "name", "type": "STRING" },
{ "name": "year", "type": "STRING" },
{ "name": "rate", "type": "FLOAT", "optional": true }
],
"primary": "mid"
}
],
"files": [
{
"label": "movie",
"format": "JSON",
"columns": ["mid", "name", "year", "rate"]
}
]
"labelType": "VERTEX",
"labelName": "movie",
"batchNum": 1000,
"properties": ["mid", "name", "year", "rate"]
}
}
}
@@ -395,31 +337,14 @@ values
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"url": "bolt://127.0.0.1:27687",
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "movie",
"type": "VERTEX",
"properties": [
{ "name": "mid", "type": "STRING" },
{ "name": "name", "type": "STRING" },
{ "name": "year", "type": "STRING" },
{ "name": "rate", "type": "FLOAT", "optional": true }
],
"primary": "mid"
}
],
"files": [
{
"label": "movie",
"format": "JSON",
"columns": ["mid", "name", "year", "rate"]
}
]
"labelType": "VERTEX",
"labelName": "movie",
"batchNum": 1000,
"properties": ["mid", "name", "year", "rate"]
}
}
}
@@ -433,3 +358,75 @@ values
```shell
python3 datax/bin/datax.py job_mysql_to_tugraph.json
```

## 4.Export TuGraph

### 4.1.Configuration example

TuGraph supports exporting data with DataX. The following configuration exports data to a text file:

```json
{
"job": {
"setting": {
"speed": {
"channel":1
}
},
"content": [
{
"reader": {
"name": "tugraphreader",
"parameter": {
"username": "admin",
"password": "73@TuGraph",
"graphName": "Movie_8C5C",
"queryCypher": "match (n:person) return n.id,n.name,n.born;",
"url": "bolt://127.0.0.1:27687"
}
},
"writer": {
"name": "txtfilewriter",
"parameter": {
"path": "./result",
"fileName": "luohw",
"writeMode": "truncate"
}
}
}
]
}
}
```

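With this configuration file, you can export the id, name and born attributes of all person nodes in the TuGraph subgraph Movie_8C5C.
The data is written to the result directory under the current directory, and the file name is luohw plus a random suffix.

### 4.2.Parameter description

To export TuGraph data with DataX, set the reader to tugraphreader and configure the following five parameters: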

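* **url**
    * Description: Address of the TuGraph Bolt server <br />
    * Required: Yes <br />
    * Default value: None <br />

* **username**
    * Description: TuGraph username <br />
    * Required: Yes <br />
    * Default value: None <br />

* **password**
    * Description: TuGraph password <br />
    * Required: Yes <br />
    * Default value: None <br />

* **graphName**
    * Description: The TuGraph subgraph to be synchronized <br />
    * Required: Yes <br />
    * Default value: None <br />

* **queryCypher**
    * Description: Cypher statement used to read data from TuGraph <br />
    * Required: No <br />
    * Default value: None <br />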
3 changes: 3 additions & 0 deletions docs/zh-CN/source/development_guide.md
@@ -239,6 +241,8 @@ CALL db.upsertVertex('node1', [{id:1, name:'name1'},{id:2, name:'name2'}])

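The second and third parameters serve the fourth parameter. They specify the types of the start and end vertices, and which field in the fourth parameter holds the start vertex's primary key value and which holds the end vertex's primary key value.

Note: the start and end primary key fields configured in the second and third parameters are not the primary key field names from the start and end vertex schemas. They only act as placeholders and distinguishing markers, making it easy to tell which fields in the fourth parameter carry the primary keys of the start and end vertices.

It is recommended to use the driver's parameterization feature rather than constructing statements yourself.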
```
CALL db.upsertEdge('edge1',{type:'node1',key:'node1_id'}, {type:'node2',key:'node2_id'}, [{node1_id:1,node2_id:2,score:10},{node1_id:3,node2_id:4,score:20}])
@@ -249,6 +251,7 @@ CALL db.upsertEdge('edge1',{type:'node1',key:'node1_id'}, {type:'node2',key:'nod
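https://github.com/ljcui/DataX/tree/bolt and compile it yourself.

The tugraph writer in this DataX implementation internally calls `db.upsertVertex` and `db.upsertEdge` described above.
The tugraph reader in this DataX implementation internally uses TuGraph's bolt client and supports streaming reads.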
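
To make the link between the writer settings and `db.upsertEdge` concrete, the sketch below builds a call in the same shape as the `db.upsertEdge` example above, with the row data passed as a query parameter. It rests on several assumptions: the helper `build_upsert_edge` is hypothetical, the identifier check is a deliberately conservative guess at valid label names, and whether the server accepts a `$rows` parameter in this position should be verified against your TuGraph version.

```python
import re

# Conservative identifier pattern (an assumption, not TuGraph's actual rule).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name):
    """Reject names that are unsafe to embed in a statement."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def build_upsert_edge(label, start, end, rows):
    """Return (statement, parameters) for a db.upsertEdge call.

    start and end are (vertex_label, placeholder_key) pairs; every row
    must contain both placeholder keys.
    """
    start_type, start_key = start
    end_type, end_key = end
    for row in rows:
        if start_key not in row or end_key not in row:
            raise ValueError(f"row is missing an endpoint key: {row!r}")
    statement = (
        f"CALL db.upsertEdge('{_ident(label)}', "
        f"{{type:'{_ident(start_type)}', key:'{_ident(start_key)}'}}, "
        f"{{type:'{_ident(end_type)}', key:'{_ident(end_key)}'}}, $rows)"
    )
    return statement, {"rows": rows}


statement, parameters = build_upsert_edge(
    "edge1", ("node1", "node1_id"), ("node2", "node2_id"),
    [{"node1_id": 1, "node2_id": 2, "score": 10}],
)
```

With a driver that follows the Neo4j Python driver API, the pair would typically be passed as `session.run(statement, parameters)`, so the row data never has to be formatted into the statement text.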

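### Offline data import
If you have the subgraph's schema and all of its vertex and edge data (in csv or json format), you can use the `lgraph_import` tool to generate graph data from them offline.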
