Update README and documentation: streamline features, enhance deployment instructions, and clarify plugin installation process for both English and Chinese versions.
Ray-D-Song committed Dec 15, 2024
1 parent d376e3f commit e5b9f77
Showing 4 changed files with 32 additions and 281 deletions.
151 changes: 12 additions & 139 deletions README.md
@@ -15,146 +15,19 @@ Web Archive is a free web archiving and sharing service based on Cloudflare, inc

The server is based on the full set of services of Cloudflare Worker, including D1 database and R2 storage bucket.

## Why
Most web archiving tools, like archivebox, are based on server-side calls to headless browsers to capture pages.
This approach has the following drawbacks:
- It is difficult to archive websites that require login, such as zhihu and Medium, because tokens or cookies must be configured.
- Headless browsers place higher demands on the server, and most users are NAS users.
Web Archive is a completely free and zero-threshold solution, and Cloudflare makes it easy to migrate the data back to a local host for self-hosting.
## Features

## Feat & Roadmap
- [x] Folder classification
- [x] Page preview image
- [x] Title keyword search
- [x] Showcase, share the pages you captured
- [x] Mobile support
- [x] Tag classification system
- [x] ~~Save the page as markdown~~ Read mode
- [ ] Highlight the text?
- Web archiving, search, sharing
- Folder classification
- Mobile adaptation
- AI generated tag classification
- Reading mode

## Deploy Guide
Github Actions (Recommended)
## Deploy
You can refer to the [deploy document](https://web-archive-docs.pages.dev/en/deploy.html) to deploy.

[![Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/ray-d-song/web-archive)
After deployment, in the browser plugin, enter the service address and key to use.

Click the button above, follow the instructions of Cloudflare to complete the deployment.

> [!IMPORTANT]
> R2 storage bucket is a feature that needs to be manually enabled in the Cloudflare panel. Please enable it before deployment, or re-run GitHub Actions after a failure.
> You only need to enable the R2 feature; there is no need to create a storage bucket, because it is created automatically during deployment.
> [!NOTE]
> When creating a token, select the `Edit Cloudflare Workers` template directly, and then manually add the `D1 Edit` permission.
![permissions](https://raw.githubusercontent.com/ray-d-song/web-archive/main/docs/imgs/perm.png)

Once deployed, please log in as soon as possible; the first user to log in will be set as the administrator.

---

<details>
<summary>Command Deploy</summary>

Requires a local Node environment.
Updating is more troublesome with command deployment, so deploying with GitHub Actions is recommended.
### 0. Download the code
Download the latest service.zip from the release page, unzip it, and execute the following commands in the root directory.

### 1. Login
```bash
npx wrangler login
```

### 2. Create r2 bucket
```bash
npx wrangler r2 bucket create web-archive
```
Output:
```bash
⛅️ wrangler 3.78.10 (update available 3.80.4)
--------------------------------------------------------

Creating bucket web-archive with default storage class set to Standard.
Created bucket web-archive with default storage class set to Standard.
```

### 3. Create d1 database
```bash
npx wrangler d1 create web-archive
```

Output:

```bash
⛅️ wrangler 3.78.10 (update available 3.80.4)
--------------------------------------------------------

✅ Successfully created DB 'web-archive' in region UNKNOWN
Created your new D1 database.

[[d1_databases]]
binding = "DB" # i.e. available in your Worker on env.DB
database_name = "web-archive"
database_id = "xxxx-xxxx-xxxx-xxxx-xxxx"
```

Copy the last line of the output, and replace the `database_id` value in the `wrangler.toml` file.
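
The replacement described above can also be scripted. The following is a minimal sketch and not part of the project: `set_database_id` is a hypothetical helper name, and it assumes `wrangler.toml` contains a single `database_id = "..."` line like the one in the output shown above.

```bash
# Hypothetical helper (not shipped with web-archive):
# write a new database_id value into a wrangler.toml file.
# Usage: set_database_id <database_id> [path-to-wrangler.toml]
set_database_id() {
  local id="$1"
  local file="${2:-wrangler.toml}"
  local tmp
  tmp="$(mktemp)" || return 1
  # Rewrite only the line that starts with `database_id =`; all other lines
  # are copied unchanged. Assumes the id contains no `|`, `&` or `\` characters,
  # which holds for the UUID-style ids that wrangler prints.
  sed -E "s|^([[:space:]]*database_id[[:space:]]*=[[:space:]]*).*|\1\"${id}\"|" "$file" > "$tmp" \
    && cat "$tmp" > "$file" \
    && rm -f "$tmp"
}
```

For example, `set_database_id "xxxx-xxxx-xxxx-xxxx-xxxx"` run in the root directory would update the `wrangler.toml` found there. Editing the file by hand is equivalent.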

Then execute the initialization sql:
```bash
npx wrangler d1 migrations apply web-archive --remote
```

Output:
```bash
🌀 Executing on remote database web-archive (7fd5a5ce-79e7-4519-a5fb-2f9a3af71064):
🌀 To execute on your local development database, remove the --remote flag from your wrangler command.
Note: if the execution fails to complete, your DB will return to its original state and you can safely retry.
├ 🌀 Uploading 7fd5a5ce-79e7-4519-a5fb-2f9a3af71064.0a40ff4fc67b5bdf.sql
│ 🌀 Uploading complete.
🌀 Starting import...
🌀 Processed 9 queries.
🚣 Executed 9 queries in 0.00 seconds (13 rows read, 13 rows written)
Database is currently at bookmark 00000001-00000005-00004e2b-c977a6f2726e175274a1c75055c23607.
┌────────────────────────┬───────────┬──────────────┬────────────────────┐
│ Total queries executed │ Rows read │ Rows written │ Database size (MB) │
├────────────────────────┼───────────┼──────────────┼────────────────────┤
│ 9 │ 13 │ 13 │ 0.04 │
└────────────────────────┴───────────┴──────────────┴────────────────────┘
```
### 4. Deploy
```bash
npx wrangler pages deploy
```
Output:
```bash
The project you specified does not exist: "web-archive". Would you like to create it?
❯ Create a new project
✔ Enter the production branch name: … dev
✨ Successfully created the 'web-archive' project.
▲ [WARNING] Warning: Your working directory is a git repo and has uncommitted changes

To silence this warning, pass in --commit-dirty=true

🌎 Uploading... (3/3)

✨ Success! Uploaded 3 files (3.29 sec)

✨ Compiled Worker successfully
✨ Uploading Worker bundle
✨ Uploading _routes.json
🌎 Deploying...
✨ Deployment complete! Take a peek over at https://web-archive-xxxx.pages.dev
```
</details>
## Usage Guide
Download the latest extension.zip from the release page, unzip it, and install it to the browser.
After the first installation, you need to enter the API address and key. The API address is the service address, and the key is the password of the first user (administrator).
In the folder page, you can set whether a page is displayed in the showcase.
Showcase address: /#/showcase/folder
Plugin download:
- [Chrome](https://chromewebstore.google.com/detail/web-archive/dfigobdhnhkkdniegjdagofhhhopjajb?hl=zh-CN&utm_source=ext_sidebar)
- [Firefox](https://addons.mozilla.org/zh-CN/firefox/addon/web-archive-ray-banzhe/)
152 changes: 12 additions & 140 deletions docs/README_zh.md
@@ -10,146 +10,18 @@ Web Archive is a web archiving tool that consists of the following parts:

The server is built on the full Cloudflare Worker service suite, including the D1 database and R2 storage bucket.

## Why
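Most web archiving tools, such as archivebox, archive pages by having a server call a headless browser to capture them.
The drawback of this approach is that sites requiring login, such as Zhihu and Medium, are cumbersome to handle and need a token or cookie to be configured.
Headless browsers also place higher demands on the server, and most users are NAS users.
web-archive is a completely free, zero-threshold solution, and Cloudflare makes it easy to migrate the data back to a local machine and switch to self-hosting.
## Features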

## feat & roadmap
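- [x] Folder classification
- [x] Page preview image
- [x] Title keyword search
- [x] Showcase, for sharing the pages you have captured
- [x] Mobile support
- [x] Tag classification
- [x] ~~Save the page as markdown~~ Reading mode
- [ ] Highlight selected text?
- Web archiving, search, sharing
- Folder classification
- Mobile adaptation
- AI-generated tag classification
- Reading mode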

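## Deploy Guide
GitHub Actions one-click deployment (recommended)
## Deploy
You can refer to the [deploy document](https://web-archive-docs.pages.dev/deploy.html) to deploy.
After deployment, enter the service address and key in the browser plugin to start using it.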

[![Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/ray-d-song/web-archive)

Click the button above and follow Cloudflare's instructions to complete the deployment.

> [!IMPORTANT]
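> The R2 storage bucket is a feature that must be enabled manually in the Cloudflare panel. Enable it before deploying, or re-run GitHub Actions after a failure.
> You only need to enable the R2 feature; there is no need to create a storage bucket, because it is created automatically during deployment.
> [!NOTE]
> When creating a token, select the `Edit Cloudflare Workers` template directly, and then manually add the `D1 Edit` permission.
![permissions](https://raw.githubusercontent.com/ray-d-song/web-archive/main/docs/imgs/perm_zh.png)

After deployment, please log in as soon as possible; the first user to log in will be set as the administrator.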

---

<details>
<summary>Command Deploy</summary>

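Requires a local Node environment.
Updating is more troublesome with command deployment, so deploying with GitHub Actions is recommended.
### 0. Download the code
Download the latest service.zip from the release page, unzip it, and run the following steps in the root directory.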

### 1. Login
```bash
npx wrangler login
```

### 2. Create r2 bucket
```bash
npx wrangler r2 bucket create web-archive
```
Output on success:
```bash
⛅️ wrangler 3.78.10 (update available 3.80.4)
--------------------------------------------------------

Creating bucket web-archive with default storage class set to Standard.
Created bucket web-archive with default storage class set to Standard.
```

### 3. Create d1 database
```bash
# Create the database
npx wrangler d1 create web-archive
```

Output:

```bash
⛅️ wrangler 3.78.10 (update available 3.80.4)
--------------------------------------------------------

✅ Successfully created DB 'web-archive' in region UNKNOWN
Created your new D1 database.

[[d1_databases]]
binding = "DB" # i.e. available in your Worker on env.DB
database_name = "web-archive"
database_id = "xxxx-xxxx-xxxx-xxxx-xxxx"
```
Copy the last line and use it to replace the `database_id` value in the `wrangler.toml` file.

Then run the initialization SQL:
```bash
npx wrangler d1 migrations apply web-archive --remote
```

Output on success:
```bash
🌀 Executing on remote database web-archive (7fd5a5ce-79e7-4519-a5fb-2f9a3af71064):
🌀 To execute on your local development database, remove the --remote flag from your wrangler command.
Note: if the execution fails to complete, your DB will return to its original state and you can safely retry.
├ 🌀 Uploading 7fd5a5ce-79e7-4519-a5fb-2f9a3af71064.0a40ff4fc67b5bdf.sql
│ 🌀 Uploading complete.
🌀 Starting import...
🌀 Processed 9 queries.
🚣 Executed 9 queries in 0.00 seconds (13 rows read, 13 rows written)
Database is currently at bookmark 00000001-00000005-00004e2b-c977a6f2726e175274a1c75055c23607.
┌────────────────────────┬───────────┬──────────────┬────────────────────┐
│ Total queries executed │ Rows read │ Rows written │ Database size (MB) │
├────────────────────────┼───────────┼──────────────┼────────────────────┤
│ 9 │ 13 │ 13 │ 0.04 │
└────────────────────────┴───────────┴──────────────┴────────────────────┘
```
### 4. Deploy the service
```bash
# Deploy the service
npx wrangler pages deploy
```
Output on success:
```bash
The project you specified does not exist: "web-archive". Would you like to create it?
❯ Create a new project
✔ Enter the production branch name: … dev
✨ Successfully created the 'web-archive' project.
▲ [WARNING] Warning: Your working directory is a git repo and has uncommitted changes

To silence this warning, pass in --commit-dirty=true

🌎 Uploading... (3/3)

✨ Success! Uploaded 3 files (3.29 sec)

✨ Compiled Worker successfully
✨ Uploading Worker bundle
✨ Uploading _routes.json
🌎 Deploying...
✨ Deployment complete! Take a peek over at https://web-archive-xxxx.pages.dev
```
</details>
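## Usage Guide
Download the latest extension.zip from the release page, unzip it, and install it in the browser.
After the first installation, you need to enter the API address and key. The API address is the service address, and the key is the password of the first user (administrator).
In the folder page, you can set whether a page is displayed in the showcase.
Showcase address: /#/showcase/folder
Plugin download: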
- [Chrome](https://chromewebstore.google.com/detail/web-archive/dfigobdhnhkkdniegjdagofhhhopjajb?hl=zh-CN&utm_source=ext_sidebar)
- [Firefox](https://addons.mozilla.org/zh-CN/firefox/addon/web-archive-ray-banzhe/)
5 changes: 4 additions & 1 deletion docs/en/usage.md
@@ -1,6 +1,9 @@
# Usage

Download the latest extension.zip from the release page, unzip and install it into the browser.
Download the plugin from the plugin store:
- [Chrome](https://chromewebstore.google.com/detail/web-archive/dfigobdhnhkkdniegjdagofhhhopjajb?hl=zh-CN&utm_source=ext_sidebar)
- [Firefox](https://addons.mozilla.org/zh-CN/firefox/addon/web-archive-ray-banzhe/)

After the first installation, you need to enter the API address and key. The API address is the service address, and the key is the password of the first user (administrator).

In the folder page, you can set whether to display a page in the showcase.
5 changes: 4 additions & 1 deletion docs/usage.md
@@ -1,6 +1,9 @@
# Usage Guide

Download the latest extension.zip from the release page, unzip it, and install it in the browser.
Download the plugin from the browser plugin store:
- [Chrome](https://chromewebstore.google.com/detail/web-archive/dfigobdhnhkkdniegjdagofhhhopjajb?hl=zh-CN&utm_source=ext_sidebar)
- [Firefox](https://addons.mozilla.org/zh-CN/firefox/addon/web-archive-ray-banzhe/)

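After the first installation, you need to enter the API address and key. The API address is the service address, and the key is the password of the first user (administrator).

In the folder page, you can set whether a page is displayed in the showcase.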