docs: chinese doc (#1)
siam-ese authored Dec 20, 2024
1 parent b14df84 commit 483c308
Showing 68 changed files with 333 additions and 82 deletions.
File renamed without changes
Binary file added docs/assets/en-US/workflow/data_source_form.png
Binary file added docs/assets/en-US/workflow/timer_form.png
Binary file added docs/assets/zh-CN/clipsheet_popup_home.png
Binary file added docs/assets/zh-CN/workflow/data_source_form.png
Binary file added docs/assets/zh-CN/workflow/timer_form.png
2 changes: 1 addition & 1 deletion docs/en-US/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Univer Clipsheet

<img src="../assets/clipsheet_popup_home.png" style="width: 600px; height: 400px; object-fit: contain;"/>
<img src="../assets/en-US/clipsheet_popup_home.png" style="width: 600px; height: 400px; object-fit: contain;"/>

Univer Clipsheet is a powerful Chrome extension for web scraping and data automation. It simplifies the process of extracting, organizing, and managing web data with powerful scraping capabilities and workflow integration.
1 change: 1 addition & 0 deletions docs/en-US/_sidebar.md
@@ -5,3 +5,4 @@
* [3. Scraper](/en-US/scraper.md)
* [4. Data management](/en-US/data-management.md)
* [5. Data Drill Down](/en-US/data-drill-down.md)
* [6. Workflow](/en-US/workflow.md)
8 changes: 4 additions & 4 deletions docs/en-US/data-drill-down.md
@@ -3,24 +3,24 @@

When a scraped table contains columns of the URL type, a **drill-down** icon will appear in the corresponding column.

<img src="../assets/data-drill-down/column_drill_down_example.png" style="width: 280px; height: 500px; object-fit: contain;" />
<img src="../assets/en-US/data-drill-down/column_drill_down_example.png" style="width: 280px; height: 500px; object-fit: contain;" />

## 5.1 Create Drill-down columns

Clicking the button will navigate to the URL as a detail page.

As shown in the image below, you can select three highlighted blocks on the detail page. These selected blocks can be saved as your drill-down configuration for the URL column.

<img src="../assets/data-drill-down/drill_down_detail_page.png" style="width: 600px; height: 400px; object-fit: contain;" />
<img src="../assets/en-US/data-drill-down/drill_down_detail_page.png" style="width: 600px; height: 400px; object-fit: contain;" />

Returning to the `Scraper` form, you will see the **drill-down** columns created beneath the URL column.

<img src="../assets/data-drill-down/table_drill_down_columns.png" style="width: 260px; height: 400px; object-fit: contain;" />
<img src="../assets/en-US/data-drill-down/table_drill_down_columns.png" style="width: 260px; height: 400px; object-fit: contain;" />

## 5.2 Get Drill-down data

When you run the `Scraper` with **drill-down** columns in the scraped table, it will automatically visit each URL in the URL column. For every visited URL, it will scrape the data based on the blocks you selected in the **drill-down** column configuration.

As shown in the picture below, you can see that Clipsheet has scraped the three **drill-down** columns we selected earlier.

<img src="../assets/data-drill-down/data_with_drill_down_columns.png" style="width: 800px; height: 380px; object-fit: contain;" />
<img src="../assets/en-US/data-drill-down/data_with_drill_down_columns.png" style="width: 800px; height: 380px; object-fit: contain;" />
4 changes: 2 additions & 2 deletions docs/en-US/data-management.md
@@ -3,15 +3,15 @@

All scraped table data is displayed in the list under the `Data` tab.

<img src="../assets/data-management/clipsheet_popup_data_list.png" style="width: 600px; height: 400px; object-fit: contain;" />
<img src="../assets/en-US/data-management/clipsheet_popup_data_list.png" style="width: 600px; height: 400px; object-fit: contain;" />

You can view your tables in this list, whether they were scraped using `Scraper`, `Workflow`, or `Quick Scraping`.

## 4.1 View Table Details

Click on an item in the list to view the table details.

<img src="../assets/data-management/clipsheet_preview_table_dialog.png" style="width: 800px; height: 400px; object-fit: contain;" />
<img src="../assets/en-US/data-management/clipsheet_preview_table_dialog.png" style="width: 800px; height: 400px; object-fit: contain;" />

## 4.2 Export Table as CSV
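An exported table is a plain CSV file whose header row holds the column names defined in the scraper, so it can be consumed by any CSV-aware tool. The sketch below reads such a file with Python's standard library; the column names and values are hypothetical, not actual Clipsheet output.

```python
import csv
import io

# Hypothetical contents of a CSV exported from Clipsheet; the header
# row carries the column names you defined in the scraper.
sample = """title,price,url
Widget A,9.99,https://example.com/a
Widget B,19.99,https://example.com/b
"""


def load_rows(text: str) -> list[dict]:
    # DictReader maps each row onto the header, so downstream code can
    # address cells by column name instead of by position.
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(sample)
print(rows[0]["title"])  # → Widget A
```

In practice you would pass `open("my_table.csv", newline="")` instead of the in-memory sample.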

2 changes: 1 addition & 1 deletion docs/en-US/getting-started.md
@@ -14,7 +14,7 @@ The first step is to unzip the compressed file you just downloaded.

## 1.3 Open developer mode

![Developer mode](../assets/getting-started/chrome_extensions_developer_mode.png)
![Developer mode](../assets/en-US/getting-started/chrome_extensions_developer_mode.png)

Locate the developer mode toggle control on the right side of the navigation bar and open it.

8 changes: 4 additions & 4 deletions docs/en-US/hello-world.md
@@ -3,11 +3,11 @@

You will find the extension icon on the Chrome `Navbar` after installing **Clipsheet**.

<img src="../assets/hello-world/chrome_extensions_navbar.png" style="width: 280px; height: 420px; object-fit: contain;" />
<img src="../assets/en-US/hello-world/chrome_extensions_navbar.png" style="width: 280px; height: 420px; object-fit: contain;" />

Navigate to a webpage you're interested in and open **Clipsheet**. You will then see the main interface of **Clipsheet**.

<img src="../assets/hello-world/clipsheet_popup_detected_dropdown.png" style="width: 600px; height: 400px; object-fit: contain;" />
<img src="../assets/en-US/hello-world/clipsheet_popup_detected_dropdown.png" style="width: 600px; height: 400px; object-fit: contain;" />

You can see that **Clipsheet** has automatically detected the tables on this page. Browse through the list of tables to find the one with the data you need.

@@ -17,10 +17,10 @@ Click the `Quick Scraping` button.

You will see a panel at the top left of the webpage you are browsing.

<img src="../assets/shared/clipsheet_table_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />
<img src="../assets/en-US/shared/clipsheet_table_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />

Click the `Confirm` button.

<img src="../assets/hello-world/clipsheet_success_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />
<img src="../assets/en-US/hello-world/clipsheet_success_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />

You have successfully scraped your first table with **Clipsheet**. You can view the scraped table data by clicking the `View Scraped Table Data` button.
18 changes: 9 additions & 9 deletions docs/en-US/scraper.md
@@ -8,50 +8,50 @@ In the previous chapters, we learned how to quickly scrape the tables we need us

However, this is not sufficient for large-scale web scraping requirements. To handle such scenarios, we need to automate the data collection process. What we need is a **Scraper**!

<img src="../assets/shared/clipsheet_table_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />
<img src="../assets/en-US/shared/clipsheet_table_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />

In the `Confirm Selection` panel, you’ll notice a `Create **Scraper**` button located in the footer.
In the `Confirm Selection` panel, you’ll notice a `Create Scraper` button located in the footer.

## 3.1 Web Scraping Automation

Clicking this button will open the **Scraper** configuration form in the side panel.

<img src="../assets/scraper/clipsheet_create_scraper_form.png" style="width: 340px; height: 600px; object-fit: contain;" />
<img src="../assets/en-US/scraper/clipsheet_create_scraper_form.png" style="width: 340px; height: 600px; object-fit: contain;" />

We provide three adaptable scraping methods: `Scroll`, `Click`, and `Page`. These configurations allow you to extract more data from a webpage by customizing the scraping process.

### 3.1.1 Infinite Scroll

For pages with infinite scrolling to load more data, you can use the `Scroll` **Scraper** to extract the entire list.

<img src="../assets/scraper/clipsheet_scraper_scroll_form.png" style="width: 300px; height: 200px; object-fit: contain;" />
<img src="../assets/en-US/scraper/clipsheet_scraper_scroll_form.png" style="width: 300px; height: 200px; object-fit: contain;" />

### 3.1.2 Click to Load More Data or Navigate to the Next Page

<img src="../assets/scraper/clipsheet_scraper_click_form.png" style="width: 300px; height: 200px; object-fit: contain;" />
<img src="../assets/en-US/scraper/clipsheet_scraper_click_form.png" style="width: 300px; height: 200px; object-fit: contain;" />

For pages that require clicking a button to load more data or navigate to the next page, you can use the `Click` **Scraper** to retrieve all the data.

### 3.1.3 Pagination

<img src="../assets/scraper/clipsheet_scraper_page_form.png" style="width: 300px; height: 200px; object-fit: contain;" />
<img src="../assets/en-US/scraper/clipsheet_scraper_page_form.png" style="width: 300px; height: 200px; object-fit: contain;" />

The `Page` configuration makes it easy to scrape data that is divided across multiple pages.

## 3.2 Columns of the Table

<img src="../assets/scraper/clipsheet_scraper_columns_of_table.png" style="width: 300px; height: 400px; object-fit: contain;" />
<img src="../assets/en-US/scraper/clipsheet_scraper_columns_of_table.png" style="width: 300px; height: 400px; object-fit: contain;" />

In this table, we list the columns from the scraped table, allowing you to customize them. You can rename columns, delete columns, and preview the table data by clicking the `View Table` button.

## 3.3 Save your scraper

After configuring, you can save the scraper, and it will appear in the scraper list in the popup. You can then `start` the scraper to begin web scraping and collect the data.

<img src="../assets/scraper/clipsheet_popup_scraper_list.png" style="width: 600px; height: 400px; object-fit: contain;" />
<img src="../assets/en-US/scraper/clipsheet_popup_scraper_list.png" style="width: 600px; height: 400px; object-fit: contain;" />

## 3.4 Upload & Export

You can export your scraper as a JSON file and import scraper JSON files shared by others, making scrapers easy to share and reuse.

<img src="../assets/scraper/clipsheet_popup_scraper_upload_and_export.png" style="width: 600px; height: 400px; object-fit: contain;" />
<img src="../assets/en-US/scraper/clipsheet_popup_scraper_upload_and_export.png" style="width: 600px; height: 400px; object-fit: contain;" />
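Because the exchange format is JSON, a scraper configuration round-trips cleanly through export and import. The sketch below illustrates the idea in Python; the field names (`name`, `mode`, `columns`) are assumptions for illustration, not Clipsheet's actual schema.

```python
import json

# Hypothetical shape of an exported scraper file -- the real Clipsheet
# schema may differ; the field names here are illustrative only.
scraper = {
    "name": "Amazon Scraper",
    "mode": "scroll",  # one of the three methods: scroll / click / page
    "columns": [
        {"name": "title", "type": "text"},
        {"name": "link", "type": "url"},
    ],
}

exported = json.dumps(scraper, indent=2)  # the file you would share
imported = json.loads(exported)           # re-importing restores the config
assert imported == scraper                # nothing is lost in the round trip
```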
43 changes: 43 additions & 0 deletions docs/en-US/workflow.md
@@ -0,0 +1,43 @@
# 6. Workflow

Although web scraping can usually be executed with a single click in most scenarios, there are times when we prefer to schedule or run specific scraping tasks periodically.

In such cases, we need a **workflow** to automate these tasks.

A **workflow** is a combination of multiple scrapers. Running a **workflow** will execute all the scrapers it contains, and you will be able to **integrate** the data from all scrapers for further processing.

Below, we will introduce several key features of workflows, including scheduled execution, incremental data updates to bound data sources, and integration of multiple scrapers.

## 6.1 Bind Workflow Data Source

A **workflow** can be linked to a data source. If no data source is defined, a new data source will be automatically created and bound after the **workflow**'s first execution.

<img src="../assets/en-US/workflow/data_source_form.png" style="width: 400px; height: 300px; object-fit: contain;" />

> **Note:** When a **workflow** is bound to a data source, the data source's columns are set from the **workflow** and cannot be modified. (This is because once a data source is created, its table structure is immutable.)

Once a **workflow** is bound to a data source, the workflow’s output data will be appended to this data source, and duplicate rows will be automatically removed.

You can customize the configuration for removing duplicates.

<img src="../assets/en-US/workflow/remove_duplicates_form.png" style="width: 400px; height: 300px; object-fit: contain;" />
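Conceptually, removing duplicates means treating a configurable subset of columns as each row's identity and keeping only the first row seen for each identity. A minimal sketch of that idea, with hypothetical column names:

```python
def remove_duplicates(rows: list[dict], keys: list[str]) -> list[dict]:
    """Keep the first occurrence of each row, where row identity is
    the tuple of values in the configured key columns."""
    seen = set()
    out = []
    for row in rows:
        marker = tuple(row.get(k) for k in keys)
        if marker not in seen:
            seen.add(marker)
            out.append(row)
    return out


existing = [{"url": "https://example.com/a", "price": "9.99"}]
new_run = [
    {"url": "https://example.com/a", "price": "9.99"},   # duplicate, dropped
    {"url": "https://example.com/b", "price": "19.99"},  # new row, kept
]
merged = remove_duplicates(existing + new_run, keys=["url"])
print(len(merged))  # → 2
```

Choosing fewer key columns (here, just `url`) makes the deduplication more aggressive; including every column keeps rows that differ in any cell.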

## 6.2 Combine with Scrapers

A **workflow** is a collection of multiple scrapers working together.

You define a **workflow** by choosing the scrapers it is composed of. For example, in the following setup, we configure the **workflow** to include the `Amazon Scraper` and `Google Maps Scraper`.

<img src="../assets/en-US/workflow/data_merge_form_scraper.png" style="width: 400px; height: 300px; object-fit: contain;" />

Next, we need to determine how each column in the **workflow** is mapped to the corresponding column from each **scraper**.

<img src="../assets/en-US/workflow/data_merge_form_column.png" style="width: 400px; height: 300px; object-fit: contain;" />

## 6.3 Scheduled Execution

For scheduled execution, we provide a highly customizable scheduling configuration, allowing you to set the exact times for your **workflow** to run.

<img src="../assets/en-US/workflow/timer_form.png" style="width: 400px; height: 300px; object-fit: contain;" />

Scheduled execution lets workflows collect data continuously without manual intervention, which is essential for keeping scraped data up to date.
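At its core, a scheduler only has to compute when the next run is due from the last run time and the configured interval. The sketch below shows that calculation; how Clipsheet actually triggers runs (e.g. via Chrome alarms) is an assumption not confirmed by this document.

```python
from datetime import datetime, timedelta


def next_run(last: datetime, every: timedelta, now: datetime) -> datetime:
    """Return the next due moment strictly after `now`, stepping
    forward from the last run in whole intervals (missed runs are
    skipped rather than replayed)."""
    due = last + every
    while due <= now:
        due += every
    return due


last = datetime(2024, 12, 20, 9, 0)
print(next_run(last, timedelta(hours=24), datetime(2024, 12, 20, 10, 0)))
# → 2024-12-21 09:00:00
```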
4 changes: 2 additions & 2 deletions docs/index.html
@@ -9,10 +9,10 @@
<link rel="stylesheet" href="//cdn.jsdelivr.net/npm/docsify/lib/themes/buble.css" />
</head>
<body>
<!-- <nav>
<nav>
<a href="#/">EN</a>
<a href="#/zh-CN/">简体中文</a>
</nav> -->
</nav>
<div id="app"></div>
<script>
window.$docsify = {
4 changes: 2 additions & 2 deletions docs/zh-CN/README.md
@@ -1,5 +1,5 @@
# Univer Clipsheet

![Clipsheet App](../assets/clipsheet_popup_home.png)
<img src="../assets/zh-CN/clipsheet_popup_home.png" style="width: 600px; height: 400px; object-fit: contain;"/>

**Univer Clipsheet** is a powerful Chrome browser extension for web scraping and data automation. Through its powerful scraping capabilities and workflow integration, it simplifies the process of extracting, organizing, and managing data
Univer Clipsheet is a powerful Chrome extension for web scraping and data automation. It simplifies the process of extracting, organizing, and managing web data, with powerful scraping capabilities and workflow integration
9 changes: 7 additions & 2 deletions docs/zh-CN/_sidebar.md
@@ -1,3 +1,8 @@

* [Introduction](/)
* [Quick Started](quick-start.md)
* [Introduction](/)
* [1. Getting Started](/zh-CN/getting-started)
* [2. Your First Table](/zh-CN/hello-world)
* [3. Scraper](/zh-CN/scraper.md)
* [4. Data Management](/zh-CN/data-management.md)
* [5. Data Drill-Down](/zh-CN/data-drill-down.md)
* [6. Workflow](/zh-CN/workflow.md)
25 changes: 25 additions & 0 deletions docs/zh-CN/data-drill-down.md
@@ -0,0 +1,25 @@
# 5. Data Drill-Down

When a scraped table contains columns of the URL type, a **drill-down** icon will appear in the corresponding column.

<img src="../assets/zh-CN/data-drill-down/column_drill_down_example.png" style="width: 280px; height: 500px; object-fit: contain;" />

## 5.1 Create Drill-down Columns

Clicking the button will navigate to the URL as a detail page.

As shown in the image below, you can select three highlighted blocks on the detail page. These selected blocks can be saved as the drill-down configuration for the URL column.

<img src="../assets/zh-CN/data-drill-down/drill_down_detail_page.png" style="width: 600px; height: 400px; object-fit: contain;" />

Returning to the `Scraper` form, you will see the **drill-down** columns created beneath the URL column.

<img src="../assets/zh-CN/data-drill-down/table_drill_down_columns.png" style="width: 260px; height: 400px; object-fit: contain;" />

## 5.2 Get Drill-down Data

When you run a `Scraper` whose table contains **drill-down** columns, it will automatically visit each URL in the URL column. For every visited URL, it scrapes data based on the blocks you selected in the **drill-down** configuration.

As shown in the picture below, Clipsheet has successfully scraped the data for the three **drill-down** columns we selected earlier.

<img src="../assets/zh-CN/data-drill-down/data_with_drill_down_columns.png" style="width: 800px; height: 380px; object-fit: contain;" />
17 changes: 17 additions & 0 deletions docs/zh-CN/data-management.md
@@ -0,0 +1,17 @@
# 4. Data Management

All scraped table data is displayed in the list under the `Data` tab.

<img src="../assets/zh-CN/data-management/clipsheet_popup_data_list.png" style="width: 600px; height: 400px; object-fit: contain;" />

You can view your tables in this list, whether they were scraped using `Scraper`, `Workflow`, or `Quick Scraping`.

## 4.1 View Table Details

Click on an item in the list to view the table details.

<img src="../assets/zh-CN/data-management/clipsheet_preview_table_dialog.png" style="width: 800px; height: 400px; object-fit: contain;" />

## 4.2 Export Table as CSV

We also support exporting tables as **CSV** files for easy data sharing and use.
31 changes: 28 additions & 3 deletions docs/zh-CN/getting-started.md
@@ -1,5 +1,30 @@
# Getting Started
# Getting Started

## 1. Installation
## 1. Installation

xxxx
To try out Clipsheet, visit [Clipsheet Releases](https://github.com/dream-num/univer-clipsheet/releases) to get the latest version, then follow the steps below to load it as an unpacked extension.

## 1.1 Unzip

The first step is to unzip the compressed file you just downloaded.

## 1.2 Open the Extensions Page

Open the Chrome browser and enter `chrome://extensions/` in the address bar.

## 1.3 Open Developer Mode

![Developer mode](../assets/en-US/getting-started/chrome_extensions_developer_mode.png)

Locate the developer mode toggle on the right side of the navigation bar and turn it on.

## 1.4 Load the Folder as an Extension

Click the `Load unpacked` button and select the Clipsheet folder to load the extension.

## 1.5 Tutorials

If you still cannot install Clipsheet correctly, we have prepared documentation and videos to help you through the installation process.

Read the [Load unpacked extension documentation](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked) for more information.
Watch this [video tutorial](https://www.youtube.com/watch?v=oswjtLwCUqg) for a detailed explanation of how to load the extension.
25 changes: 25 additions & 0 deletions docs/zh-CN/hello-world.md
@@ -0,0 +1,25 @@
# Scrape Your First Table with **Clipsheet**

After installing **Clipsheet**, click the extension icon on the Chrome `Navbar`.

<img src="../assets/en-US/hello-world/chrome_extensions_navbar.png" style="width: 280px; height: 420px; object-fit: contain;" />

Then navigate to a webpage you're interested in and open **Clipsheet**. You will see the main interface of **Clipsheet**.

<img src="../assets/zh-CN/hello-world/clipsheet_popup_detected_dropdown.png" style="width: 600px; height: 400px; object-fit: contain;" />

You can see that **Clipsheet** has automatically detected the tables on the page. Browse through the list of tables to find the one with the data you need.

If the detected tables are not what you want, click the `Manual Scraping` button to select the elements you want to scrape.

Click the `Quick Scraping` button.

You will see a panel at the top left of the webpage.

<img src="../assets/zh-CN/shared/clipsheet_table_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />

Click the `Confirm` button.

<img src="../assets/zh-CN/hello-world/clipsheet_success_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />

You have successfully scraped your first table. Click the `View Scraped Table Data` button to view the scraped data.