- Clone the repository:

```bash
git clone https://github.com/avijra/redhatai.git
```
- In the root folder, create a new directory:

```bash
mkdir SOURCE_DOCUMENTS
```
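Presumably this folder is where the app looks for the documents it indexes; if so, copy your source files into it before starting the app (the paths below are hypothetical):

```bash
# Hypothetical example: stage the documents the app should index
cp ~/docs/manual.pdf ~/docs/notes.txt SOURCE_DOCUMENTS/
```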
- Install the required libraries:

```bash
pip install -r requirements.txt
```
- Start the app:

```bash
streamlit run redhat_ai.py
```
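Streamlit serves on port 8501 by default, which is why the Docker commands below publish that port. If you need a different port, or want the app reachable from other hosts, Streamlit's standard server flags apply (optional sketch):

```bash
# Optional: bind address and port explicitly (8501 is Streamlit's default)
streamlit run redhat_ai.py --server.address 0.0.0.0 --server.port 8501
```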
- Build the Docker image:

```bash
docker build . -t name_of_image:version
```
- Running with GPUs:
  - Ensure you have the NVIDIA container utilities (see the runtime note after these steps):

    ```bash
    yum install nvidia-container-toolkit
    ```

  - Start the container:

    ```bash
    docker run -p 8501:8501 -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all name_of_image:version
    ```
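Note: on some systems, installing the package alone does not register the NVIDIA runtime with Docker. If the GPU container fails to start, the following sequence usually helps (assuming systemd and a stock NVIDIA Container Toolkit install; the CUDA image tag is just an example):

```bash
# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: a CUDA base image should be able to list your GPUs
docker run --rm --gpus=all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```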
- If not using GPUs:

```bash
docker run -p 8501:8501 -it --mount src="$HOME/.cache",target=/root/.cache name_of_image:version
```
- Alternatively, pull the images directly:

```bash
docker pull avijra/redhatai_vicuna-13b-gptq:1.0
docker pull avijra/redhatai_vicuna-7b-gptq:1.0
```
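A pulled image should run the same way as a locally built one; just substitute its name in the `docker run` commands above. For example, with the GPU setup described earlier:

```bash
# Run the pulled 13B image with the same flags as a local build
docker run -p 8501:8501 -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all avijra/redhatai_vicuna-13b-gptq:1.0
```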
- Building or downloading the image takes time due to its size (~20 GB). Relax and sip your coffee ☕.
- The first app run will be slower because the model is downloaded on first use.
- App response time varies with available GPU VRAM; the estimated requirement is 48 GiB.
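To check how much VRAM your GPUs actually have, `nvidia-smi` can report it directly:

```bash
# Report each GPU's total memory to compare against the ~48 GiB estimate
nvidia-smi --query-gpu=name,memory.total --format=csv
```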
- The code currently supports only 2 CUDA devices. To accommodate more, adjust the `max_memory` map in run_redhatai.py. For example:

```python
# Current setup (2 devices)
model = AutoGPTQForCausalLM.from_quantized(
    ...
    max_memory={0: "15GIB", 1: "15GIB"},
    ...
)

# Adjusted for 3 devices
model = AutoGPTQForCausalLM.from_quantized(
    ...
    max_memory={0: "15GIB", 1: "15GIB", 2: "15GIB"},
    ...
)
```
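A quick way to see how many CUDA devices are visible, and therefore how many entries `max_memory` needs, assuming PyTorch is already installed from requirements.txt:

```bash
# Print the number of CUDA devices visible to PyTorch
python -c "import torch; print(torch.cuda.device_count())"
```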