update docs
zhendi committed Sep 14, 2024
1 parent f194bec commit b44f009
Showing 25 changed files with 1,144 additions and 111 deletions.
4 changes: 2 additions & 2 deletions .envrc.example
@@ -2,8 +2,8 @@ export CSGHUB_PORTAL_ON_PREMISE=true
export CSGHUB_PORTAL_SENSITIVE_CHECK=false
export CSGHUB_PORTAL_STARHUB_BASE_URL=http://localhost:8080
export CSGHUB_PORTAL_STARHUB_API_KEY=f3a7b9c1d6e5f8e2a1b5d4f9e6a2b8d7c3a4e2b1d9f6e7a8d2c5a7b4c1e3f5b8a1d4f9b7d6e2f8a5d3b1e7f9c6a8b2d1e4f7d5b6e9f2a4b3c8e1d7f995hd82hf
export ENABLE_HTTPS=true
export CSGHUB_PORTAL_DATABASE_DSN=postgresql://postgres:postgres@localhost:5432/starhub_portal?sslmode=disable
export CSGHUB_PORTAL_ENABLE_HTTPS=true
export CSGHUB_PORTAL_DATABASE_DSN=postgresql://postgres:postgres@localhost:5432/csghub_development?sslmode=disable
export CSGHUB_PORTAL_DATABASE_DIALECT=pg
export CSGHUB_PORTAL_SIGNUP_URL=
export CSGHUB_PORTAL_LOGIN_URL=
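
The hunk above moves `ENABLE_HTTPS` under the `CSGHUB_PORTAL_` prefix and points the DSN at `csghub_development`. A minimal sketch for spotting leftover un-prefixed exports in a local copy, assuming every portal variable is meant to carry that prefix. The function name `find_unprefixed_exports` is hypothetical and not part of the repository:

```shell
# Print exported variable names in an env file that lack the CSGHUB_PORTAL_ prefix.
# Assumption: the file uses plain `export NAME=value` lines, as in .envrc.example.
find_unprefixed_exports() {
  # $1: path to the env file to inspect
  sed -n -E 's/^export ([A-Za-z_][A-Za-z0-9_]*)=.*/\1/p' "$1" \
    | grep -v '^CSGHUB_PORTAL_' || true
}

# Example: inspect a local .envrc if one exists.
if [ -f .envrc ]; then
  find_unprefixed_exports .envrc
fi
```

An empty result means every exported name in the file already uses the prefix.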
210 changes: 101 additions & 109 deletions README.md
@@ -1,109 +1,101 @@
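## Installation

### Set the Go Path (skip this step if you have already done so)

Go uses the `GOPATH` environment variable to specify the workspace location. Set it up as follows:

#### 1. Choose your workspace

Choose a directory to serve as your Go workspace, for example `~/go`.

#### 2. Set `GOPATH`

Add the following to your shell configuration file (for example `.bashrc` or `.zshrc`):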

```bash
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
```

#### 3. Apply the changes

Reload the configuration file so that the changes take effect:

```shell
source ~/.bashrc
```


```shell
source ~/.zshrc
```

#### 4. Verify the setup

Run the following command to verify that `GOPATH` is set correctly:

```shell
go env GOPATH
```

The output should be your workspace directory, for example `/home/username/go`.
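
Beyond checking `GOPATH` itself, it can help to confirm that `$GOPATH/bin` is actually on `PATH`, since tools installed with `go install` land there. A minimal sketch; the helper name `path_contains` is made up for illustration:

```shell
# Return success if directory $1 appears as an entry in the colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the Go toolchain for GOPATH; the value stays empty if `go` is not installed.
gopath="$(go env GOPATH 2>/dev/null || true)"
if [ -n "$gopath" ] && path_contains "$gopath/bin" "$PATH"; then
  echo "GOPATH/bin is on PATH: $gopath/bin"
else
  echo "GOPATH/bin is not on PATH (or go is not installed)"
fi
```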

### Set environment variables

In the project root, create a `.env` file from `.env.example` and adjust the configuration as needed.

```shell
cp .env.example .env
```

### Get Go dependencies

To install the required Go modules, run:

```shell
go mod tidy
```

### Install Air for live reloading

Air is a tool that enables live reloading for Go applications. Install it with:

```shell
go install github.com/cosmtrek/air@latest
```

### Get frontend dependencies

Go to the `frontend` directory and install the dependencies with Yarn:

```shell
cd frontend
yarn install
```

## Startup

### Start the frontend and backend together

To start both the frontend and backend services at once, run:

```shell
make
```

### Start only the frontend

To start only the frontend service, run:

```shell
make run-frontend
```

### Start only the Go service

To start only the Go service, run:

```shell
make run-backend
```

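### Access the application

Once both services are running, open your web browser and visit:

[http://localhost:8090](http://localhost:8090)

From there you can view and interact with the application.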
**[简体中文](/docs/readme_cn.md) | [English](/docs/readme_en.md) | [日本語](/docs/readme_ja.md) | [한국어](/docs/readme_kr.md)**
## CSGHub README

CSGHub is an open-source, trustworthy large model asset management platform that helps users govern the assets involved in the lifecycle of LLMs and LLM applications (datasets, model files, code, etc.).

With CSGHub, users can upload, download, store, verify, and distribute LLM assets through a web interface, the Git command line, or a natural language chatbot. The platform also provides microservice submodules and standardized OpenAPIs that can be integrated with users' own systems.

CSGHub aims to give users an asset management platform that is natively designed for large models and can be deployed on-premise for fully offline operation. It offers functionality similar to a private, on-premise Huggingface, managing LLM assets much as OpenStack Glance manages virtual machine images, Harbor manages container images, and Sonatype Nexus manages artifacts.

You can try the free SaaS version of CSGHub on the OpenCSG Community official website: https://opencsg.com/models <br>You can also jump to the [Quick Start](#quick-start) section to launch a local instance and explore all the features of CSGHub.
<kbd>
<img src="/docs/images/project_intro.jpg" width='800' />
</kbd>
### UPDATES
- [2024.08.15] v0.8.0 Big release: introduces a standalone `user_server` service and moves user, org, and token management from `CSGHub` to the backend server `CSGHub-server`; introduces a standalone `starhub_server_runner` service for unified deployment of Application Space, Model Inference, and Fine-tune. Resource management enhanced; multiple k8s clusters in different regions are not supported.
- [2024.07.15] v0.7.0 Big release: supports `Multiple Resource Sync` for models and datasets, One-Click Fine-tune, and Resource Usage Metering.
- [2024.06.21] v0.6.1 Bug fixes and user experience enhancements.
- [2024.06.18] v0.6.0 Big release: supports `dedicated` model inference endpoints and `streamlit` Spaces, allows users to `like` repos and to set a repo's `industry tag`, and enhances git history and commit diff details.
- [2024.05.14] v0.5.0 Enhances the Space user experience, automatically builds `relations` between repos (model, dataset, code, and spaces), and supports uploading multiple files.
- [2024.04.18] v0.4.0 Allows running an `Application Space` (gradio app), adds a widget to try model inference, supports the new repo type `Code`, supports organization member management, and supports WeChat login.
- [2024.03.15] v0.3.0 Plan: online file editing, organization editing, dataset preview.
- [2024.02.15] v0.2.0 Improves model and dataset hosting, and adds the ability to invite new organization members.
- [2024.01.15] v0.1.0 CSGHub Alpha release, supporting model and dataset management; detailed functions are listed below.

### CORE FUNCTIONS
In the era of LLM, data and models are increasingly becoming the most important digital assets for businesses and individual users. However, there are currently issues such as fragmented management tools, limited management methods, and localization, which not only pose potential threats to secure operations but also might hinder the updating and iteration of enterprise-scale models. If you believe that large models will become a major driving force in the upcoming revolution, you may also be considering how to manage core assets — models, data, and large model application code — more efficiently and securely. CSGHub is an open-source project designed to address these issues.

CSGHub's core functions (updated regularly):
- **Unified Management of LLM Assets**: A one-stop Hub for unified management of model files, datasets, and large model application code.
- **Development Ecosystem Compatibility**: Supports both HTTPS and SSH protocols for Git commands and web interface operations, ensuring convenient usage for different users.
- **Large Model Capability Expansion**: Natively supports version management, model format conversion, automatic data processing, and dataset preview functions.
- **Permissions and Security**: Supports integration with corporate user systems, setting of asset visibility, and zero-trust authentication interface design for both external and internal users, maximizing security.
- **Support for Private Deployment**: Independent of internet and cloud vendors, enabling one-click initiation of private deployment.
- **Native Design for Large Models**: Supports natural language interaction, one-click model deployment, and asset management for Agent and Copilot App.

### TECH DESIGN
The technical design of CSGHub is as follows:
- CSGHub integrates multiple technologies including Git Servers, Git LFS (Large File Storage) protocol, and Object Storage Service (OSS), providing a reliable data storage layer, a flexible infrastructure access layer, and extensive support for development tools.
- Utilizing a service-oriented architecture, CSGHub offers backend services through CSGHub Server and a management interface via CSGHub Web Service. Ordinary users can quickly start the services using Docker Compose or a Kubernetes Helm Chart for enterprise-level asset management. Users with in-house development capabilities can build on CSGHub Server to integrate management functions into external systems or to customize advanced features.
- Leveraging outstanding open-source projects like Apache Arrow and DuckDB, CSGHub supports previewing of Parquet data file formats, facilitating localized dataset management for researchers and common users.
- CSGHub provides an intuitive web interface and permission design for enterprise organization structure. Users can realize version control management, online browsing and downloading through the web UI, as well as set the visibility scope of datasets and model files to realize data security isolation, and can also initiate topic discussions on models and datasets.

Our R&D team has focused on AI + DevOps for a long time, and we hope to solve the pain points in the development process of large models through the CSGHub project. We encourage everyone to contribute high-quality development, operation, and maintenance documents, and to work together to improve the platform so that large model assets become more traceable and efficient to manage.

### DEMO VIDEO
To help users quickly understand the features and usage of CSGHub, we have recorded a demo video that walks through the main features and operating procedures.
- The CSGHub demo video is embedded below; you can also watch it on [YouTube](https://www.youtube.com/watch?v=SFDISpqowXs) or [Bilibili](https://www.bilibili.com/video/BV1wk4y1X7G7/).
<video width="658" height="432" src="https://github-production-user-asset-6210df.s3.amazonaws.com/3232817/296556812-205d07f2-de9d-4a7f-b3f5-83514a71453e.mp4"></video>

### ROADMAP
- **Asset Management**
- [x] Built-in Code Repo: Built-in Code Repo management function to associate the code of models, datasets, and Space applications.
- [x] Multi-source data synchronization: Supports configuring and enabling remote repositories and automatic data synchronization from the OpenCSG community, Huggingface, and other remote sources.
- **AI Enhancement**
- [x] One-Click Fine-Tuning: Support integration with OpenCSG llm-finetune tool to start model fine-tuning training with one click.
- [x] One-Click Inference: Support integration with OpenCSG llm-inference tool to start a model inference service with one click.
- **LLM App and Enterprise Features**
- [x] App Space: Support hosting Gradio/Streamlit applications and publishing them to App Space.
- [x] Fine-grained Permission Control: Fine-grained permission and access control settings for enterprise architecture.
- **Security Compliance**
- [ ] GitServer Adapter: Generic GitServer adapter to support multiple major Git repository types through the adapter pattern.
- [x] Asset Metadata: Asset metadata management mechanism, supporting customized metadata types and corresponding AutoTag rules.

The detailed roadmap is designed as follows: [full roadmap](/docs/roadmap_en.md)

### ARCHITECTURE
CSGHub consists of two main parts: Portal and Server. This repo corresponds to CSGHub Portal, while CSGHub Server is a separate high-performance backend project implemented in Golang.

If you want to dive into the details of CSGHub Server or integrate the Server with your own frontend system, see the [CSGHub Server open-source project](https://github.com/OpenCSGs/csghub-server).

#### CSGHub Portal Architecture
<img src="/docs/images/portal_tech_graph.png" width='800'>

#### CSGHub Server Architecture
<img src="/docs/images/server_tech_graph.png" width='800'>

### QUICK START
See the [all-in-one deployment guide](/deploy/all_in_one/README.md) to quickly deploy a basic CSGHub instance.

#### Tech docs in detail
- [setup development env](/docs/setup_en.md)

### Contributing
We welcome developers of all levels to contribute to our open-source project, CSGHub. If you would like to get involved, please refer to our [contributing guidelines](/docs/CONTRIBUTING_en.md). We look forward to your participation and suggestions.

### ACKNOWLEDGEMENTS
This project is based on Rails, Vue3, Tailwind CSS, Administrate, Postgresql, Apache Arrow, DuckDB and GoGin, whose open source contributions are deeply appreciated!

### CONTACT US
If you run into any problems, you can contact us in any of the following ways:
1. Open an issue on GitHub.
2. Join our WeChat group by scanning the WeChat assistant QR code.
3. Join our official Discord channel: [OpenCSG Discord Channel](https://discord.gg/bXnu4C9BkR)
4. Join our Slack workspace: [OpenCSG Slack Channel](https://join.slack.com/t/opencsghq/shared_invite/zt-2fmtem7hs-s_RmMeoOIoF1qzslql2q~A)
<div style="display:inline-block">
<img src="/docs/images/wechat-assistant-new.png" width='200'>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="/docs/images/discord-qrcode.png" width='200'>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="/docs/images/slack-qrcode.png" width='200'>
</div>
50 changes: 50 additions & 0 deletions docs/CONTRIBUTING_en.md
@@ -0,0 +1,50 @@
Contributing to CSGHub
=======================

[中文版本](/CONTRIBUTING.md)

Welcome to CSGHub and thank you for your interest in contributing to this project!

CSGHub is an open-source and trusted asset management platform for large models. It aims to assist users in governing assets, such as datasets, model files, and code, throughout the lifecycle of LLM (Large Language Models) and LLM applications. CSGHub provides functionality similar to a private version of Huggingface and manages LLM assets in a way that is similar to OpenStack Glance for managing virtual machine images, Harbor for managing container images, and Sonatype Nexus for managing artifacts.

Contribution Workflow
----------------------

To contribute to the project, please follow the "fork and pull request" workflow. Please refrain from pushing directly to the main repository unless you are a maintainer.

1. Fork the repository from GitHub (https://github.com/OpenCSGs/csghub) to your own GitHub account.
2. Make the desired changes and improvements in your forked repository.
3. Create a new branch in your forked repository to accommodate your modifications. It is recommended to base your branch on the `main` branch.
4. Make the necessary modifications and improvements in your new branch.
5. Once you have completed your changes, submit a Pull Request (PR) to the `main` branch of the original repository.
6. Maintainers will review your PR, provide feedback, and engage in discussions.
7. After necessary modifications and discussions, your PR will be merged into the `main` branch.
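
The fork-and-pull-request steps above can be sketched as git commands. This is a dry run that only prints the commands; the username and branch name are placeholders, and the helper `print_fork_workflow` is hypothetical:

```shell
# Print the git commands for one fork-and-pull-request cycle without running them.
# $1: your GitHub username (placeholder), $2: branch name for the change.
print_fork_workflow() {
  cat <<EOF
git clone https://github.com/$1/csghub.git
cd csghub
git remote add upstream https://github.com/OpenCSGs/csghub.git
git fetch upstream
git checkout -b $2 upstream/main
git push -u origin $2
EOF
}

print_fork_workflow your-username docs-fix
```

After the push, the Pull Request against `main` is opened from the GitHub web UI.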

Make sure your contributions adhere to the following guidelines:

- Maintain consistent code style with the project.
- New features or improvements should have appropriate tests.
- Document additions or modifications should be clear and understandable to facilitate usage by other developers.

Reporting Issues and Making Suggestions
----------------------

If you encounter any issues, have suggestions for improvements, or want to request new features, please report them on our [Issues](https://github.com/OpenCSGs/csghub/issues) page. We regularly review and respond to your feedback.

When reporting issues or making suggestions, please follow these guidelines:

- Provide as much detail as possible. Clearly describe what is going wrong, how it is failing, and if there are any error messages. A description like "XY doesn't work" is not helpful for troubleshooting. Always include the code you ran and, if possible, extract the relevant parts instead of including the entire script. This helps us reproduce the error.
- If you need to include long code blocks, logs, or tracebacks, wrap them in `<details>` and `</details>` tags. This collapses the content, making the issue easier to read and follow. Refer to [this link](https://developer.mozilla.org/en/docs/Web/HTML/Element/details) for more information on collapsing content.
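
As an illustration of the `<details>` advice above, a small shell helper can wrap a log or traceback before it is pasted into an issue. This is a sketch; the function name `wrap_details` is made up for this example:

```shell
# Wrap text from stdin in a collapsible <details> block for a GitHub issue.
# $1: one-line summary shown while the block is collapsed.
wrap_details() {
  printf '<details>\n<summary>%s</summary>\n\n' "$1"
  # \140 is the octal escape for a backtick; three of them open a code fence.
  printf '\140\140\140\n'
  cat
  printf '\140\140\140\n\n</details>\n'
}

# Example: wrap two sample log lines.
printf 'line 1\nline 2\n' | wrap_details "sample log"
```

The input is expected to end with a newline so that the closing fence starts on its own line.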

Issue Labels
----------------------

For an overview of the labeling system we use to tag issues and pull requests, please refer to [this page](https://github.com/OpenCSGs/csghub/labels).


Local Development
----------------------

You can develop CSGHub using [Docker Compose](docs/all_in_one_readme_en.md) or your [local environment](docs/setup_en.md).

Thank you for contributing to the CSGHub project! We look forward to your involvement and suggestions.
51 changes: 51 additions & 0 deletions docs/all_in_one_readme_en.md
@@ -0,0 +1,51 @@
## CSGHub All-in-One Deployment Guide

This script enables the one-click deployment of an all-in-one CSGHub instance, including all related components:

* csghub-portal
* csghub-server
* nginx
* postgresql
* git-server
* minio
* casdoor
* account server
* user server
* space builder
* registry


**Notice:**
CSGHub v0.4.0 added the Space feature, and v0.7.0 added model fine-tuning and inference. All three require Kubernetes and related environments and configuration. Because Kubernetes is not included here, this all-in-one deployment `does not include Space, model fine-tuning, or inference functions`.

### Prerequisites
* Hardware

Minimum Configuration: 2c CPU / 6GB RAM / 50GB Hard Disk

Recommended Configuration: 4c CPU / 16GB RAM / 500GB Hard Disk

* Software

Any Linux OS with x86_64 architecture

Docker Engine (>=5:20.10.24)
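
A quick way to check the Docker requirement is to compare the engine version against the minimum. The sketch below assumes the leading `5:` in the requirement is a package epoch and compares only the `20.10.24` part; `version_ge` is a hypothetical helper:

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="20.10.24"
if command -v docker >/dev/null 2>&1; then
  # The value stays empty if the Docker daemon is not reachable.
  current="$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)"
  if [ -n "$current" ] && version_ge "$current" "$required"; then
    echo "Docker Engine $current meets the >= $required requirement"
  else
    echo "Docker Engine version '${current:-unknown}' does not meet >= $required"
  fi
else
  echo "docker not found on PATH"
fi
```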

### Usage
1. Navigate to the `all_in_one` directory.
2. Edit the `.env` file and set `SERVER_DOMAIN` to the current host's IP address or domain name. DO NOT use `127.0.0.1` or `localhost`.
3. The space- and registry-related settings in `.env` can be ignored if you have no Kubernetes cluster.
4. Run the `startup.sh` script. Once all services are started, you can visit the self-deployed CSGHub service at `http://[SERVER_DOMAIN]`.
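
Before running `startup.sh`, the `SERVER_DOMAIN` rule from the steps above can be checked with a short script. This is a sketch that assumes `.env` holds a plain `SERVER_DOMAIN=value` line; `check_server_domain` is a hypothetical helper, not part of the deployment scripts:

```shell
# Read SERVER_DOMAIN from an env file and reject empty or loopback values.
# Assumption: the file contains a plain `SERVER_DOMAIN=value` line.
check_server_domain() {
  # $1: path to the .env file; surrounding quotes around the value are stripped.
  domain="$(sed -n -E 's/^SERVER_DOMAIN=//p' "$1" | tail -n 1 | tr -d "\"'")"
  case "$domain" in
    ""|127.0.0.1|localhost)
      echo "invalid SERVER_DOMAIN: '$domain'" >&2
      return 1 ;;
    *)
      echo "SERVER_DOMAIN=$domain"
      return 0 ;;
  esac
}

if [ -f .env ]; then
  check_server_domain .env || echo "fix SERVER_DOMAIN before running startup.sh"
fi
```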

### Notes
1. Self-deployed CSGHub uses local-type Docker volumes for persistence, such as for PostgreSQL and Minio. Ensure that Docker local volumes have sufficient disk space.
1. Ensure that the external port `2222` of the host is accessible, as Git operations via the SSH protocol depend on it.
1. Ensure that the external port `31001` of the host is accessible, as the casdoor service uses it for user registration and login.
1. The Minio console is available on port `9001`. If the Minio console is not required, this port can be closed.
1. By default, only HTTP protocol is supported for CSGHub services. If HTTPS is required, configure it accordingly.
1. Do not arbitrarily modify or delete the `gitdata` and `gitlog` folders. These are runtime folders mounted into relevant container services, and the owner of these folders must be `1001`. Changing file owner or deleting these directories will result in startup errors.
1. To completely remove the CSGHub instance, run the commands below:
```
docker compose -f docker-compose.yml down -v
rm -rf gitdata gitlog
```
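
The ownership requirement on `gitdata` and `gitlog` noted above can be checked before startup. A minimal sketch, assuming it is run from the directory that contains both folders; `owned_by_uid` is a hypothetical helper:

```shell
# Return success if path $1 exists and is owned by numeric user ID $2.
# Uses GNU `stat -c %u`, which matches the Linux hosts this guide targets.
owned_by_uid() {
  [ -e "$1" ] && [ "$(stat -c %u "$1")" = "$2" ]
}

for dir in gitdata gitlog; do
  if owned_by_uid "$dir" 1001; then
    echo "$dir: owner is 1001"
  else
    echo "$dir: missing or not owned by 1001"
  fi
done
```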
Binary file added docs/images/demo-cover.jpg
Binary file added docs/images/discord-qrcode.png
Binary file added docs/images/functions.jpeg
Binary file added docs/images/little_asistant.jpeg
Binary file added docs/images/portal_tech_graph.png
Binary file added docs/images/project_intro.jpg
Binary file added docs/images/server_tech_graph.png
Binary file added docs/images/slack-qrcode.png
Binary file added docs/images/wechat-assistant-new.png
Binary file added docs/images/wechat-group-new.png
Binary file added docs/images/wechat-group.jpeg