Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 68 changed files with 333 additions and 82 deletions.
19 files renamed without changes.
Binary file added: `docs/assets/zh-CN/scraper/clipsheet_popup_scraper_upload_and_export.png` (+106 KB)
@@ -1,5 +1,5 @@
# Univer Clipsheet

<img src="../assets/clipsheet_popup_home.png" style="width: 600px; height: 400px; object-fit: contain;"/>
<img src="../assets/en-US/clipsheet_popup_home.png" style="width: 600px; height: 400px; object-fit: contain;"/>

Univer Clipsheet is a powerful Chrome extension for web scraping and data automation. It simplifies the process of extracting, organizing, and managing web data with powerful scraping capabilities and workflow integration.
@@ -0,0 +1,43 @@
# 6. Workflow

Although web scraping can usually be done with a single click, there are times when we prefer to schedule specific scraping tasks or run them periodically.

In such cases, we need a **workflow** to automate these tasks.

A **workflow** is a combination of multiple scrapers. Running a **workflow** executes all the scrapers it contains, and you can then **integrate** the data from all of them for further processing.

Below, we introduce several key features of workflows: scheduled execution, incremental data updates to a bound data source, and integration of multiple scrapers.

## 6.1 Bind Workflow Data Source

A **workflow** can be linked to a data source. If no data source is defined, a new data source is automatically created and bound after the **workflow**'s first execution.

<img src="../assets/en-US/workflow/data_source_form.png" style="width: 400px; height: 300px; object-fit: contain;" />

> **Note:** When a **workflow** is bound to a data source, the **workflow**'s columns are taken from the data source and cannot be modified. (This is because a data source's table structure is immutable once it has been created.)

Once a **workflow** is bound to a data source, the workflow's output data is appended to that data source, and duplicate rows are removed automatically.

You can customize the configuration used for removing duplicates.

<img src="../assets/en-US/workflow/remove_duplicates_form.png" style="width: 400px; height: 300px; object-fit: contain;" />
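The append-with-duplicate-removal behavior described above can be pictured as a keyed append. The sketch below is a minimal illustration, not Clipsheet's implementation: it assumes rows are plain objects keyed by column name and that a duplicate-removal configuration boils down to a list of key columns. `Row`, `rowKey`, and `appendDeduplicated` are hypothetical names.

```typescript
// Hypothetical row shape: column name -> cell text.
type Row = Record<string, string>;

// Build a comparison key from the chosen columns. JSON.stringify keeps
// values such as ["a|b", "c"] and ["a", "b|c"] from colliding.
function rowKey(row: Row, keyColumns: string[]): string {
  return JSON.stringify(keyColumns.map((col) => row[col] ?? ""));
}

// Append incoming rows to the existing data source rows, skipping any row
// whose key is already present (in the data source or earlier in this batch).
// Existing rows always win over newly scraped ones.
function appendDeduplicated(existing: Row[], incoming: Row[], keyColumns: string[]): Row[] {
  const seen = new Set(existing.map((row) => rowKey(row, keyColumns)));
  const result = [...existing];
  for (const row of incoming) {
    const key = rowKey(row, keyColumns);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(row);
    }
  }
  return result;
}
```

Choosing the key columns is the main design decision here: keying on `title` alone drops a re-scraped row whose price changed, while keying on every column keeps it as a new row.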

## 6.2 Combine with Scrapers

A **workflow** is a collection of multiple scrapers working together.

To define a **workflow**, choose the scrapers that make it up. For example, in the following setup, we configure the **workflow** to include the `Amazon Scraper` and `Google Maps Scraper`.

<img src="../assets/en-US/workflow/data_merge_form_scraper.png" style="width: 400px; height: 300px; object-fit: contain;" />

Next, specify how each column in the **workflow** maps to the corresponding column of each **scraper**.

<img src="../assets/en-US/workflow/data_merge_form_column.png" style="width: 400px; height: 300px; object-fit: contain;" />
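Conceptually, the column mapping reshapes each scraper's rows into the workflow's columns before they are combined. The TypeScript sketch below assumes a per-scraper mapping from workflow column to scraper column and leaves unmapped cells empty; the types and the `mergeScraperOutputs` function are hypothetical and do not reflect Clipsheet's internal data model.

```typescript
type Row = Record<string, string>;

// Hypothetical mapping: scraper name -> (workflow column -> scraper column).
type ColumnMapping = Record<string, Record<string, string>>;

// Reshape every scraper's rows into the workflow's columns and concatenate them.
function mergeScraperOutputs(
  workflowColumns: string[],
  outputs: Record<string, Row[]>, // scraper name -> rows it produced
  mapping: ColumnMapping,
): Row[] {
  const merged: Row[] = [];
  for (const [scraper, rows] of Object.entries(outputs)) {
    const columnMap: Record<string, string> = mapping[scraper] ?? {};
    for (const row of rows) {
      const out: Row = {};
      for (const col of workflowColumns) {
        const source = columnMap[col];
        // Columns this scraper does not map to stay empty for its rows.
        out[col] = source !== undefined ? (row[source] ?? "") : "";
      }
      merged.push(out);
    }
  }
  return merged;
}
```

For instance, an Amazon product title and a Google Maps place name could both feed a single `name` column, while a `link` column that only the Amazon scraper provides stays blank for the Maps rows.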

## 6.3 Scheduled Execution

For scheduled execution, we provide a highly customizable scheduling configuration that lets you set exactly when your **workflow** runs.

<img src="../assets/en-US/workflow/timer_form.png" style="width: 400px; height: 300px; object-fit: contain;" />

With scheduled execution, a **workflow** can collect data periodically without you having to start each run by hand.
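To illustrate what a schedule configuration ultimately resolves to, the sketch below computes the next run time for a simple weekly schedule (selected weekdays at a fixed local time). The `WeeklySchedule` shape and `nextRun` function are hypothetical; Clipsheet's timer form may offer different options. In a Chrome extension, a time computed this way could be registered with `chrome.alarms.create(name, { when })`, but that wiring is also an assumption, not a description of Clipsheet.

```typescript
// Hypothetical schedule shape: run on the given weekdays at a fixed local time.
interface WeeklySchedule {
  weekdays: number[]; // 0 = Sunday ... 6 = Saturday, matching Date.getDay()
  hour: number; // 0-23, local time
  minute: number; // 0-59
}

// Return the first scheduled time strictly after `from`, or null if no valid
// weekday is selected.
function nextRun(schedule: WeeklySchedule, from: Date): Date | null {
  if (schedule.weekdays.length === 0) return null;
  // Offsets 0..7 cover today plus one full week, so both "later today" and
  // "same weekday next week" are reachable.
  for (let offset = 0; offset <= 7; offset++) {
    // The Date constructor rolls over month and year boundaries for us.
    const candidate = new Date(
      from.getFullYear(),
      from.getMonth(),
      from.getDate() + offset,
      schedule.hour,
      schedule.minute,
    );
    if (candidate.getTime() > from.getTime() && schedule.weekdays.includes(candidate.getDay())) {
      return candidate;
    }
  }
  return null;
}
```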
@@ -1,5 +1,5 @@
# Univer Clipsheet

![Clipsheet App](../assets/clipsheet_popup_home.png)
<img src="../assets/zh-CN/clipsheet_popup_home.png" style="width: 600px; height: 400px; object-fit: contain;"/>

**Univer Clipsheet** is a powerful Chrome browser extension for web scraping and data automation. Through powerful scraping features and workflow integration, it simplifies extracting, organizing, and managing data.
Univer Clipsheet is a powerful Chrome extension for web scraping and data automation. It simplifies the process of extracting, organizing, and managing web data, with powerful scraping capabilities and workflow integration.
@@ -1,3 +1,8 @@

* [Introduction](/)
* [Quick Started](quick-start.md)
* [Introduction](/)
* [1. Getting Started](/zh-CN/getting-started)
* [2. "Your First Table"](/zh-CN/hello-world)
* [3. Scraper](/zh-CN/scraper.md)
* [4. Scraped Data Management](/zh-CN/data-management.md)
* [5. Data Drill-Down Scraping](/zh-CN/data-drill-down.md)
* [6. Workflow](/zh-CN/workflow.md)
@@ -0,0 +1,25 @@
# 5. Data Drill-Down

When a scraped table contains a URL-type column, that column displays a **Data Drill-Down** icon.

<img src="../assets/zh-CN/data-drill-down/column_drill_down_example.png" style="width: 280px; height: 500px; object-fit: contain;" />

## 5.1 Create Drill-Down Columns

Clicking the icon navigates to the detail page for that URL.

As shown in the image below, you can select highlighted blocks on the detail page (three in this example). The selected blocks can be saved as the drill-down configuration for the URL column.

<img src="../assets/zh-CN/data-drill-down/drill_down_detail_page.png" style="width: 600px; height: 400px; object-fit: contain;" />

Return to the `Scraper` form, and you will see the **Data Drill-Down** columns created under the URL column.

<img src="../assets/zh-CN/data-drill-down/table_drill_down_columns.png" style="width: 260px; height: 400px; object-fit: contain;" />

## 5.2 Scrape Drill-Down Data

When you run a `Scraper` that contains **Data Drill-Down** columns, it automatically visits each URL in the URL column. For each visited URL, it scrapes data from the blocks you selected in the **Data Drill-Down** configuration.

As the image below shows, Clipsheet has successfully scraped data for the three **Data Drill-Down** columns we selected earlier.

<img src="../assets/zh-CN/data-drill-down/data_with_drill_down_columns.png" style="width: 800px; height: 380px; object-fit: contain;" />
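A drill-down run amounts to a loop: for each row, read the URL column, extract the selected blocks from that page, and add them to the row as new columns. The TypeScript sketch below assumes the page-extraction step is supplied as an async function (`DetailExtractor`) so the loop can be shown without a browser; `drillDown` and its parameters are hypothetical names, not Clipsheet's API.

```typescript
type Row = Record<string, string>;

// Hypothetical extractor: given a detail-page URL, resolves to the values of
// the blocks selected in the drill-down configuration (column name -> text).
type DetailExtractor = (url: string) => Promise<Row>;

// For every row, visit the URL in `urlColumn` and append the extracted
// drill-down values as extra columns. Rows without a URL are kept unchanged.
async function drillDown(rows: Row[], urlColumn: string, extract: DetailExtractor): Promise<Row[]> {
  const result: Row[] = [];
  // Visit pages one at a time rather than in parallel to keep the load on
  // the target site low.
  for (const row of rows) {
    const url = row[urlColumn];
    const details: Row = url ? await extract(url) : {};
    result.push({ ...row, ...details });
  }
  return result;
}
```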
@@ -0,0 +1,17 @@
# 4. Data Management

All scraped table data appears in the list under the `Data` tab.

<img src="../assets/zh-CN/data-management/clipsheet_popup_data_list.png" style="width: 600px; height: 400px; object-fit: contain;" />

This list shows your tables regardless of whether they were scraped by a `Scraper`, a `Workflow`, or `Quick Scraping`.

## 4.1 View Table Details

Click an item in the list to view the table's details.

<img src="../assets/zh-CN/data-management/clipsheet_preview_table_dialog.png" style="width: 800px; height: 400px; object-fit: contain;" />

## 4.2 Export Tables as CSV

We also support exporting tables as **CSV** files for easy data sharing and reuse.
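A minimal CSV serializer that follows the quoting rules of RFC 4180 (fields containing commas, double quotes, or line breaks are wrapped in double quotes, and embedded quotes are doubled) might look like the sketch below. It is illustrative only, with hypothetical function names; Clipsheet's exporter may differ, for example in line endings or in adding a byte-order mark for spreadsheet applications.

```typescript
// Quote a field only when needed: commas, double quotes, CR or LF inside the value.
function escapeCsvField(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}

// Serialize a header row plus data rows; missing cells become empty fields.
function toCsv(columns: string[], rows: Record<string, string>[]): string {
  const lines = [columns.map(escapeCsvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((col) => escapeCsvField(row[col] ?? "")).join(","));
  }
  // RFC 4180 separates records with CRLF.
  return lines.join("\r\n");
}
```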
@@ -1,5 +1,30 @@
# Getting Started
# Getting Started

## 1. Installation
## 1. Installation

xxxx
To try out Clipsheet, download the latest version from [Clipsheet Releases](https://github.com/dream-num/univer-clipsheet/releases) and follow the steps below to install it with "Load unpacked".

## 1.1 Unzip

First, unzip the archive you just downloaded.

## 1.2 Open the Extensions Page

Open Chrome and enter `chrome://extensions/` in the address bar.

## 1.3 Turn On Developer Mode

![Developer mode](../assets/en-US/getting-started/chrome_extensions_developer_mode.png)

Find the **Developer mode** toggle on the right side of the top bar and switch it on.

## 1.4 Load the Folder as an Extension

Click the `Load unpacked` button and select the Clipsheet folder to load the extension.
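Chrome's "Load unpacked" expects the selected folder to contain `manifest.json` at its top level, so a common pitfall is choosing a parent folder (for example, one created by your unzip tool) instead. The hypothetical helper below is only a sketch of that rule and not part of Clipsheet: given the file paths listed in the downloaded archive, it returns the folder that directly holds the shallowest `manifest.json`.

```typescript
const MANIFEST = "manifest.json";

// Folder depth of a relative path; "" (the archive root) has depth 0.
function depth(folder: string): number {
  return folder === "" ? 0 : folder.split("/").length;
}

// Given relative paths inside the archive (using "/" as the separator),
// return the folder to choose in "Load unpacked": the one that directly
// contains manifest.json. Returns null if no manifest is present.
function findExtensionRoot(paths: string[]): string | null {
  const roots = paths
    .filter((p) => p === MANIFEST || p.endsWith("/" + MANIFEST))
    .map((p) => p.slice(0, p.length - MANIFEST.length).replace(/\/$/, ""));
  if (roots.length === 0) return null;
  // Prefer the shallowest folder in case bundled files contain other manifests.
  roots.sort((a, b) => depth(a) - depth(b));
  return roots[0];
}
```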
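
## 1.5 Tutorials

If you still cannot install Clipsheet correctly, we have prepared documentation and a video to help you through the installation process.

Read the [Load unpacked documentation](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked) for more information.
Watch this [video tutorial](https://www.youtube.com/watch?v=oswjtLwCUqg) for a detailed walkthrough of how to load an extension.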
@@ -0,0 +1,25 @@
# Scrape Your First Table with **Clipsheet**

After installing **Clipsheet**, click the extension icon in Chrome's toolbar at the top of the browser.

<img src="../assets/en-US/hello-world/chrome_extensions_navbar.png" style="width: 280px; height: 420px; object-fit: contain;" />

Then navigate to a web page you are interested in and open **Clipsheet**. You will see the main **Clipsheet** interface.

<img src="../assets/zh-CN/hello-world/clipsheet_popup_detected_dropdown.png" style="width: 600px; height: 400px; object-fit: contain;" />

**Clipsheet** automatically detects the tables on the page. Browse the list of tables to find the one that contains the data you need.

If none of the detected tables is what you want, click the `Manual Scraping` button to select the elements you want to scrape.

Click the `Quick Scraping` button.

A panel appears in the top-left corner of the web page.

<img src="../assets/zh-CN/shared/clipsheet_table_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />

Click the `Confirm` button.

<img src="../assets/zh-CN/hello-world/clipsheet_success_scraping_dialog.png" style="width: 400px; height: 200px; object-fit: contain;" />

You have successfully scraped your first table. Click the `View scraped table data` button to view it.