Commit 47744fe — Update README
bdashore3 committed Dec 31, 2023
Signed-off-by: kingbri <[email protected]>

Please check the issues page for issues that contributors can help on.

If you want to add samplers, add them in the [exllamav2 library](https://github.com/turboderp/exllamav2) and then link them to tabbyAPI.

## Prerequisites

To get started, make sure you have the following installed on your system:

- Python 3.x (preferably 3.11) with pip

- CUDA 12.x (you can also use CUDA 11.8 or ROCm 5.6, but there will be more work required to install dependencies such as Flash Attention 2)

> [!NOTE]
>
> For Flash Attention 2 to work on Windows, CUDA 12.x **must** be installed!

## Installing

1. Clone this repository to your machine: `git clone https://github.com/theroyallab/tabbyAPI`

2. Navigate to the project directory: `cd tabbyAPI`

3. Create a Python environment:

   1. Through venv (recommended):

      1. `python -m venv venv`

      2. On Windows (using PowerShell or Windows Terminal): `.\venv\Scripts\activate`. On Linux: `source venv/bin/activate`

   2. Through conda:

      1. `conda create -n tabbyAPI python=3.11`

      2. `conda activate tabbyAPI`

4. Install from the requirements file that matches your system:

   1. CUDA 12.x: `pip install -r requirements.txt`

   2. CUDA 11.8: `pip install -r requirements-cu118.txt`

   3. ROCm 5.6: `pip install -r requirements-amd.txt`

## Configuration

A `config.yml` file is only needed to override project defaults. If you are okay with the defaults, you don't need a config file!

If you do want a config file, copy over `config_sample.yml` to `config.yml`. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.
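As an illustration, a minimal override might look like the following. The field names here are assumptions for illustration only; always copy the real keys and comments from `config_sample.yml`:

```yaml
# Hypothetical example — verify key names against config_sample.yml
network:
  host: 0.0.0.0   # interface to bind the API server to
  port: 5000      # default port used in the docs URL
```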

## Launching the Application

1. Make sure you are in the project directory and have activated your virtual environment

2. Run the tabbyAPI application: `python main.py`

## Updating

To update tabbyAPI, run `pip install --upgrade -r requirements.txt`, substituting the requirements file for your configuration (e.g. `requirements-cu118.txt` for CUDA 11.8 or `requirements-amd.txt` for ROCm 5.6).

### Update Exllamav2

> [!WARNING]
>
> These instructions are meant for advanced users.

If the version of exllamav2 doesn't meet your specifications, you can install the dependency from various sources.

> [!NOTE]
>
> - TabbyAPI will print a warning if a sampler isn't found because the installed exllamav2 version is too old.
>
> - Any upgrade using a requirements file will overwrite your installed wheel. To avoid this, change `requirements.txt` locally, create an issue or PR, or reinstall your version of exllamav2 after upgrading.

Here are ways to install exllamav2:

1. From a [wheel/release](https://github.com/turboderp/exllamav2#method-2-install-from-release-with-prebuilt-extension) (recommended)

   1. Find the version that corresponds with your CUDA and Python versions. For example, a wheel with `cu121` and `cp311` corresponds to CUDA 12.1 and Python 3.11.

2. From [pip](https://github.com/turboderp/exllamav2#method-3-install-from-pypi): `pip install exllamav2`

   1. This is a JIT-compiled extension, so the initial launch of tabbyAPI will take some time. The build may also fail due to improper environment configuration.

3. From [source](https://github.com/turboderp/exllamav2#method-1-install-from-source)
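The wheel-matching rule from option 1 can be sketched with a small helper. This is illustrative only and not part of TabbyAPI:

```python
import sys

def wheel_tags(cuda_version: str, py_version: tuple = sys.version_info[:2]) -> tuple:
    """Return the (CUDA, Python) tag pair to look for in an exllamav2 wheel name.

    For example, CUDA 12.1 with Python 3.11 maps to ("cu121", "cp311").
    """
    cu = "cu" + cuda_version.replace(".", "")
    cp = f"cp{py_version[0]}{py_version[1]}"
    return cu, cp

# A wheel tagged cu121 and cp311 matches CUDA 12.1 and Python 3.11
print(wheel_tags("12.1", (3, 11)))  # → ('cu121', 'cp311')
```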

## API Documentation

Once you launch the API, docs can be accessed at `http://<your-IP>:<your-port>/docs`

If you use the default YAML config, they are accessible at `http://localhost:5000/docs`

## Authentication

TabbyAPI uses an API key and admin key to authenticate a user's request. On first launch of the API, a file called `api_tokens.yml` will be generated with fields for the admin and API keys.

If you feel that the keys have been compromised, delete `api_tokens.yml` and the API will generate new keys for you.

API keys and admin keys can be provided via the following request headers:

- `x-api-key` and `x-admin-key` respectively

- `Authorization` with the `Bearer ` prefix

DO NOT share your admin key unless you want someone else to load/unload a model from your system!
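The two header styles can be sketched with a hypothetical helper (not part of TabbyAPI):

```python
def auth_headers(key: str, admin: bool = False, bearer: bool = False) -> dict:
    """Build request headers for TabbyAPI authentication.

    Either the x-api-key / x-admin-key header or an Authorization
    header with the `Bearer ` prefix may be used.
    """
    if bearer:
        return {"Authorization": f"Bearer {key}"}
    return {"x-admin-key" if admin else "x-api-key": key}

print(auth_headers("secret"))               # {'x-api-key': 'secret'}
print(auth_headers("secret", admin=True))   # {'x-admin-key': 'secret'}
print(auth_headers("secret", bearer=True))  # {'Authorization': 'Bearer secret'}
```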

#### Authentication Requirements

All routes require an API key, except for the following, which require an **admin** key:

- `/v1/model/load`

- `/v1/model/unload`

## Chat Completions

`/v1/chat/completions` now uses Jinja2 for templating. Please read [Huggingface's documentation](https://huggingface.co/docs/transformers/main/chat_templating) for more information on how chat templates work.

Also make sure to set the template name in `config.yml` to the template's filename.
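As a sketch, an OpenAI-style request body for this endpoint might be built as follows. The field values are illustrative, and sampler parameters such as `max_tokens` depend on your setup:

```python
import json

def chat_payload(messages: list, **sampler_overrides) -> str:
    """Serialize a minimal /v1/chat/completions request body."""
    body = {"messages": messages, **sampler_overrides}
    return json.dumps(body)

payload = chat_payload(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=200,
)
print(payload)
```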

## Common Issues

- AMD cards will error out with flash attention installed, even if the config option is set to `False`. Run `pip uninstall flash_attn` to remove the wheel from your system.

- See [#5](https://github.com/theroyallab/tabbyAPI/issues/5)

- Exllamav2 may error with the following exception: `ImportError: DLL load failed while importing exllamav2_ext: The specified module could not be found.`

  - First, check that the wheel matches your Python and CUDA versions. Also make sure you're in a venv or conda environment.

  - If those prerequisites are correct, the torch cache may need to be cleared. This is due to a mismatched `exllamav2_ext` build.

    - On Windows: find the cache at `C:\Users\<User>\AppData\Local\torch_extensions\torch_extensions\Cache`, where `<User>` is your Windows username.

    - On Linux: find the cache at `~/.cache/torch_extensions`.

    - Look for any folders named `exllamav2_ext` in the Python subdirectories and delete them.

    - Restart TabbyAPI; launching should work again.
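The cache-clearing steps above can be sketched as a small script. This is illustrative only; double-check the paths before deleting anything:

```python
from pathlib import Path

def find_ext_caches(cache_root: Path) -> list:
    """Recursively find stale `exllamav2_ext` build folders under a
    torch_extensions cache directory."""
    return [p for p in cache_root.rglob("exllamav2_ext") if p.is_dir()]

# Example: inspect (but do not yet delete) the Linux cache location
cache_root = Path.home() / ".cache" / "torch_extensions"
if cache_root.exists():
    for folder in find_ext_caches(cache_root):
        print("stale build:", folder)
```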

## Supported Model Types

TabbyAPI uses Exllamav2 as a fast, powerful backend for model loading and inference, so the following model types are supported:

- Exl2 (Highly recommended)

- GPTQ

- FP16 (using Exllamav2's loader)

#### Alternative Loaders/Backends

If you want to use a different model type than the ones listed above, here are some alternative backends with their own APIs:

- GGUF + GGML - [KoboldCPP](https://github.com/lostruins/KoboldCPP)

- AWQ - [Aphrodite Engine](https://github.com/PygmalionAI/Aphrodite-engine)

- [Text Generation WebUI](https://github.com/oobabooga/text-generation-webui)

Read the [Wiki](https://github.com/theroyallab/tabbyAPI/wiki) for more information. It contains user-facing documentation for installation, configuration, sampling, API usage, and much more.

## Contributing
