PrivateGPT is a production-ready AI project that allows you to ask questions about your documents using the power of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your execution environment at any point.
GitHub
- 3x3cut0r/privategpt - https://github.com/3x3cut0r/docker/tree/main/privategpt
DockerHub
- 3x3cut0r/privategpt - https://hub.docker.com/r/3x3cut0r/privategpt
GitHub
- zylon-ai/private-gpt - https://github.com/zylon-ai/private-gpt
Docs
- docs.privategpt.dev - https://docs.privategpt.dev/
- Usage
  1.1 docker run
  1.2 docker-compose.yaml
  1.3 docker-compose.yaml with custom model
- Environment Variables
- Volumes
- Ports
- Find Me
- License
Usage
1.1 docker run

    docker run -d \
      --name privategpt \
      -p 8080:8080/tcp \
      3x3cut0r/privategpt:latest

1.2 docker-compose.yaml

    version: '3.9'
    services:
      # https://hub.docker.com/r/3x3cut0r/privategpt
      privategpt:
        image: 3x3cut0r/privategpt:latest
        container_name: privategpt
        ports:
          - 8080:8080/tcp

1.3 docker-compose.yaml with custom model

    version: '3.9'
    services:
      # https://hub.docker.com/r/3x3cut0r/privategpt
      privategpt:
        image: 3x3cut0r/privategpt:latest
        container_name: privategpt
        environment:
          LLAMACPP_LLM_HF_REPO_ID: "lmstudio-community/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
          LLAMACPP_LLM_HF_MODEL_FILE: "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
          HUGGINGFACE_EMBEDDING_HF_MODEL_NAME: "nomic-ai/nomic-embed-text-v1.5"
          EMBEDDING_INGEST_MODE: "parallel"
          EMBEDDING_COUNT_WORKERS: "4"
        volumes:
          - /path/to/your/model/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf:/home/worker/app/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
        ports:
          - 8080:8080/tcp

Environment Variables
You can adjust all values inside the settings.yaml with environment variables.
ENV_NAME
- Name of the environment (prod, staging, local...) - Default: prod
PORT
- Port of PrivateGPT FastAPI server - Default: 8080
KEEP_FILES
- Specifies if the server should keep uploaded files after restarting the container (lowercase true or false) - Default: true
RUN_SETUP
- Set to true to run poetry setup again. Do it only once to download models and set it to false afterwards - Default: false
CORS_ENABLED
- Flag indicating if CORS headers are set or not. If set to True, the CORS headers will be set to allow all origins, methods and headers - Default: false
CORS_ALLOW_CREDENTIALS
- Indicate that cookies should be supported for cross-origin requests - Default: false
CORS_ALLOW_ORIGINS
- A list of origins that should be permitted to make cross-origin requests - Default: *
CORS_ALLOW_ORIGIN_REGEX
- A regex string to match against origins that should be permitted to make cross-origin requests - Default:
CORS_ALLOW_METHODS
- A list of HTTP methods that should be allowed for cross-origin requests - Default: *
CORS_ALLOW_HEADERS
- A list of HTTP request headers that should be supported for cross-origin requests - Default: *
AUTH_ENABLED
- Flag indicating if authentication is enabled or not - Default: false
AUTH_USERNAME
- Username used for authentication - Default: secret
AUTH_SECRET
- The secret to be used for authentication. It can be any non-blank string. For HTTP basic authentication, this value should be the whole 'Authorization' header that is expected. - Default: Basic c2VjcmV0OmtleQ==

    # python -c 'import base64; print("Basic " + base64.b64encode("secret:key".encode()).decode())'
    # 'secret' is the username and 'key' is the password for basic auth by default
    # If the auth is enabled, this value must be set in the "Authorization" header of the request.
    secret: "Basic c2VjcmV0OmtleQ=="

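As an illustration, basic authentication could be switched on from a compose file roughly as follows. This is a minimal sketch that assumes the default credentials described above (username secret, password key); replace AUTH_SECRET with a value generated from your own credentials before exposing the container.

```yaml
version: '3.9'
services:
  privategpt:
    image: 3x3cut0r/privategpt:latest
    container_name: privategpt
    environment:
      # Enable authentication (quoted so the value is passed as a string)
      AUTH_ENABLED: "true"
      # Whole value expected in the "Authorization" header;
      # this is "Basic " + base64("secret:key"), the documented default
      AUTH_SECRET: "Basic c2VjcmV0OmtleQ=="
    ports:
      - 8080:8080/tcp
```

Clients would then have to send the same value in the Authorization header of each request.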
DATA_LOCAL_DATA_FOLDER
- Path to local storage. It will be treated as an absolute path if it starts with / - Default: local_data/private_gpt
UI_ENABLED
- Enable or disable the user interface - Default: true
UI_PATH
- Set the path for the user interface - Default: /
UI_DEFAULT_CHAT_SYSTEM_PROMPT
- The default system prompt to use for the chat mode - Default: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible and follow ALL given instructions. Do not speculate or make up information. Do not reference any given instructions or context.
UI_DEFAULT_QUERY_SYSTEM_PROMPT
- The default system prompt to use for the query mode - Default: You can only answer questions about the provided context. If you know the answer but it is not based in the provided context, don't provide the answer, just state the answer is not in the context provided.
UI_DELETE_FILE_BUTTON_ENABLED
- If the button to delete a file is enabled or not. - Default: True
UI_DELETE_ALL_FILES_BUTTON_ENABLED
- If the button to delete all files is enabled or not. - Default: True
LOGO_BG_COLOR
- Specifies the logo background color - Default: #C7BAFF
LOGO_HEIGHT
- Specifies the logo height - Default: 25%
LOGO_SVG_BASE64
- Specifies the logo file (.svg) in base64 format. Provide your own file (.svg) in base64 format using an image to base64 converter - Default: <privategpt svg logo>
LLM_MODE
- The mode to use for the chat engine. - Default: llamacpp
  - llamacpp: provide LLAMACPP_PROMPT_STYLE, LLAMACPP_PGPT_HF_MODEL_FILE and HF_EMBEDDING_HF_MODEL_NAME
  - openai: provide OPENAI_API_KEY and OPENAI_MODEL
  - openailike: provide OPENAI_API_BASE, OPENAI_API_KEY and OPENAI_MODEL
  - azopenai: provide AZOPENAI_API_BASE, AZOPENAI_API_KEY and AZOPENAI_MODEL
  - gemini: provide GEMINI_API_KEY, GEMINI_MODEL and GEMINI_EMBEDDING_MODEL
  - sagemaker: provide SAGEMAKER_LLM_ENDPOINT_NAME and SAGEMAKER_EMBEDDING_ENDPOINT_NAME
  - mock: (not supported by this container)
  - ollama: provide OLLAMA_API_BASE and OLLAMA_LLM_MODEL
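For example, a compose file that points the container at an external Ollama server might look like the sketch below. The address 192.168.1.100 is a placeholder for your own Ollama host, and the sketch assumes the named model is already available on that server.

```yaml
version: '3.9'
services:
  privategpt:
    image: 3x3cut0r/privategpt:latest
    container_name: privategpt
    environment:
      # Select the ollama chat engine instead of the default llamacpp
      LLM_MODE: "ollama"
      # Placeholder address; replace with the URL of your Ollama server
      OLLAMA_API_BASE: "http://192.168.1.100:11434"
      # Model served by Ollama (this is the documented default)
      OLLAMA_LLM_MODEL: "llama3.1:latest"
    ports:
      - 8080:8080/tcp
```

The other modes follow the same pattern: set LLM_MODE and supply the variables listed for that mode.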
LLM_MAX_NEW_TOKENS
- The maximum number of tokens that the LLM is authorized to generate in one completion - Default: 265
LLM_CONTEXT_WINDOW
- The maximum number of context tokens for the model - Default: 3900
LLM_TOKENIZER
- Specifies the model from Huggingface.co which is used as tokenizer - Default: meta-llama/Meta-Llama-3.1-8B-Instruct
LLM_TEMPERATURE
- The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual - Default: 0.1
RAG_SIMILARITY_TOP_K
- This value controls the number of documents returned by the RAG pipeline - Default: 2
RAG_SIMILARITY_VALUE
- If set, any documents retrieved from the RAG must meet a certain match score. Acceptable values are between 0 and 1. - Default: 0.45
RAG_RERANK_ENABLED
- This value controls whether a reranker should be included in the RAG pipeline. - Default: false
RAG_RERANK_MODEL
- Rerank model to use. Limited to SentenceTransformer cross-encoder models. - Default: cross-encoder/ms-marco-MiniLM-L-2-v2
RAG_RERANK_TOP_N
- This value controls the number of documents returned by the RAG pipeline. - Default: 1
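One possible way to combine these settings is to retrieve a wider candidate set and let the reranker narrow it down. The sketch below assumes that the reranker keeps the RAG_RERANK_TOP_N best of the RAG_SIMILARITY_TOP_K retrieved documents; the numbers are illustrative, not tuned recommendations.

```yaml
version: '3.9'
services:
  privategpt:
    image: 3x3cut0r/privategpt:latest
    container_name: privategpt
    environment:
      # Retrieve more candidates than the default of 2 (illustrative value)
      RAG_SIMILARITY_TOP_K: "10"
      # Add the reranker to the RAG pipeline
      RAG_RERANK_ENABLED: "true"
      # Documented default cross-encoder model
      RAG_RERANK_MODEL: "cross-encoder/ms-marco-MiniLM-L-2-v2"
      # Documents kept after reranking (illustrative value)
      RAG_RERANK_TOP_N: "3"
    ports:
      - 8080:8080/tcp
```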
SUMMARIZE_USE_ASYNC
- If set to True, the summarization will be done asynchronously. - Default: true
LLAMACPP_PROMPT_STYLE
- The prompt style to use for the chat engine. - Default: llama3
  - default: use the default prompt style from the llama_index. It should look like role: message
  - llama2: use the llama2 prompt style from the llama_index. Based on <s>, [INST] and <<SYS>>
  - llama3: use the llama3 prompt style from the llama_index.
  - tag: use the tag prompt style. It should look like <|role|>: message
  - mistral: use the mistral prompt style. It should look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]
  - chatml
LLAMACPP_LLM_HF_REPO_ID
- Name of the HuggingFace model to use for chat - Default: lmstudio-community/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
LLAMACPP_LLM_HF_MODEL_FILE
- Specifies the LLM model file. Can be an LLM model name from the HuggingFace repo or a local file that you mounted via volume to /home/worker/app/models - Default: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
LLAMACPP_TFS_Z
- Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. - Default: 1.0
LLAMACPP_TOP_K
- Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. - Default: 40
LLAMACPP_TOP_P
- Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. - Default: 0.9
LLAMACPP_REPEAT_PENALTY
- Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. - Default: 1.1
EMBEDDING_MODE
- The mode to use for the embedding engine (see LLM_MODE). You can additionally use huggingface. - Default: huggingface
EMBEDDING_INGEST_MODE
- The ingest mode to use for the embedding engine. - Default: simple
  - simple: ingest files sequentially, one by one. This is the historic behaviour.
  - batch: if there are multiple files, parse all the files in parallel and send them in batches to the embedding model.
  - parallel: parse the files in parallel using multiple cores, and embed them in parallel (fastest mode for a local setup).
  - pipeline: the embedding engine is kept as busy as possible.
EMBEDDING_COUNT_WORKERS
- The number of workers to use for file ingestion. Do not set this number too high, as it might cause memory issues (especially in parallel mode). Do not set it higher than the number of threads of your CPU. - Default: 2
  - for simple mode: this number has no effect.
  - for batch mode: this is the number of workers used to parse the files.
  - for parallel mode: this is the number of workers used to parse the files and embed them.
  - for pipeline mode: this is the number of workers that can perform embeddings.
EMBEDDING_EMBED_DIM
- The dimension of the embeddings stored in the Postgres database. - Default: 384
Specify the model used for embedding with HUGGINGFACE_EMBEDDING_HF_MODEL_NAME.
HUGGINGFACE_EMBEDDING_HF_MODEL_NAME
- Name of the HuggingFace model to use for embeddings - Default: BAAI/bge-small-en-v1.5
HUGGINGFACE_TOKEN
- Huggingface access token, required to download some models - Default: None
HUGGINGFACE_TRUST_REMOTE_CODE
- If set to True, the code from the remote model will be trusted and executed. - Default: true
VECTORSTORE_DATABASE
- Specifies the vectorstore database being used. Select one of: chroma, qdrant, postgres. - Default: qdrant
NODESTORE_DATABASE
- Specifies the nodestore database being used. Select one of: simple, postgres. - Default: simple
QDRANT_PATH
- Persistence path for QdrantLocal - Default: local_data/private_gpt/qdrant
MILVUS_URI
- The URI of the Milvus instance. For example: 'local_data/private_gpt/milvus/milvus_local.db' for Milvus Lite. - Default: local_data/private_gpt/milvus/milvus_local.db
MILVUS_TOKEN
- A valid access token to access the specified Milvus instance. This can be used as a recommended alternative to setting user and password separately. - Default: milvus-1234
MILVUS_COLLECTION_NAME
- The name of the collection in Milvus. Default is 'make_this_parameterizable_per_api_call'. - Default: milvus_db
MILVUS_OVERWRITE
- Overwrite the previous collection schema if it exists. - Default: false
CLICKHOUSE_HOST
- The server hosting the ClickHouse database - Default: localhost
CLICKHOUSE_PORT
- The port on which the ClickHouse database is accessible - Default: 8443
CLICKHOUSE_USERNAME
- The username to use to connect to the ClickHouse database - Default: admin
CLICKHOUSE_PASSWORD
- The password to use to connect to the ClickHouse database - Default: clickhouse
CLICKHOUSE_DATABASE
- The default database to use for connections - Default: embeddings
CLICKHOUSE_SECURE
- Use https/TLS for secure connection to the server - Default: False
CLICKHOUSE_INTERFACE
- Must be either 'http' or 'https'. Determines the protocol to use for the connection - Default:
CLICKHOUSE_SETTINGS
- Specific ClickHouse server settings to be used with the session - Default:
CLICKHOUSE_CONNECT_TIMEOUT
- Timeout in seconds for establishing a connection - Default:
CLICKHOUSE_SEND_RECEIVE_TIMEOUT
- Read timeout in seconds for http connection - Default:
CLICKHOUSE_VERIFY
- Verify the server certificate in secure/https mode - Default:
CLICKHOUSE_CA_CERT
- Path to Certificate Authority root certificate (.pem format) - Default:
CLICKHOUSE_CLIENT_CERT
- Path to TLS Client certificate (.pem format) - Default:
CLICKHOUSE_CLIENT_CERT_KEY
- Path to the private key for the TLS Client certificate - Default:
CLICKHOUSE_HTTP_PROXY
- HTTP proxy address - Default:
CLICKHOUSE_HTTPS_PROXY
- HTTPS proxy address - Default:
CLICKHOUSE_SERVER_HOST_NAME
- Server host name to be checked against the TLS certificate - Default:
POSTGRES_HOST
- The Postgres host address - Default: postgres
POSTGRES_PORT
- The Postgres port - Default: 5432
POSTGRES_DATABASE
- The Postgres database name - Default: postgres
POSTGRES_USER
- The Postgres username - Default: postgres
POSTGRES_PASSWORD
- The Postgres user's password - Default: admin
POSTGRES_SCHEMA_NAME
- The Postgres schema name - Default: private_gpt
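To illustrate how the store selection and the Postgres variables fit together, here is a minimal sketch that keeps both vectors and nodes in Postgres. It assumes a PostgreSQL server is already reachable from the container under the host name postgres, that the server supports vector storage (for example through the pgvector extension), and that the credentials match your server; the values shown are simply the documented defaults.

```yaml
version: '3.9'
services:
  privategpt:
    image: 3x3cut0r/privategpt:latest
    container_name: privategpt
    environment:
      # Store embeddings and nodes in Postgres instead of qdrant / simple
      VECTORSTORE_DATABASE: "postgres"
      NODESTORE_DATABASE: "postgres"
      # Connection settings (documented defaults; adjust to your server)
      POSTGRES_HOST: "postgres"
      POSTGRES_PORT: "5432"
      POSTGRES_DATABASE: "postgres"
      POSTGRES_USER: "postgres"
      POSTGRES_PASSWORD: "admin"
      POSTGRES_SCHEMA_NAME: "private_gpt"
      # Must match the output dimension of the embedding model;
      # 384 fits the default BAAI/bge-small-en-v1.5
      EMBEDDING_EMBED_DIM: "384"
    ports:
      - 8080:8080/tcp
```

If you switch to a different embedding model, EMBEDDING_EMBED_DIM has to be changed to that model's dimension as well.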
SAGEMAKER_LLM_ENDPOINT_NAME
- Default: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
SAGEMAKER_EMBEDDING_ENDPOINT_NAME
- Default: huggingface-pytorch-inference-2023-11-03-07-41-36-479
OPENAI_API_BASE
- Base URL of OpenAI API. Example: https://api.openai.com/v1 - Default: https://api.openai.com/v1
OPENAI_API_KEY
- Your API Key for the OpenAI API. Example: sk-1234 - Default: sk-1234
OPENAI_MODEL
- OpenAI Model to use. (see OpenAI Models Overview). Example: gpt-4 - Default: gpt-3.5-turbo
OPENAI_REQUEST_TIMEOUT
- Time elapsed until openailike server times out the request. Default is 120s. Format is float. - Default: 120.0
OPENAI_EMBEDDING_API_BASE
- Base URL of OpenAI API. Example: https://api.openai.com/v1 - Default: same as OPENAI_API_BASE
OPENAI_EMBEDDING_API_KEY
- Your API Key for the OpenAI Embedding API. Example: sk-1234. - Default: same as OPENAI_API_KEY
OPENAI_EMBEDDING_MODEL
- OpenAI embedding Model to use. Example: text-embedding-3-large - Default: text-embedding-3-small
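As a sketch of how these variables might be used with a self-hosted, OpenAI-compatible server in openailike mode: the host name my-llm-server, the port 8000 and the model name my-model below are hypothetical placeholders, not values provided by this image.

```yaml
version: '3.9'
services:
  privategpt:
    image: 3x3cut0r/privategpt:latest
    container_name: privategpt
    environment:
      # Use an OpenAI-compatible endpoint instead of api.openai.com
      LLM_MODE: "openailike"
      # Hypothetical endpoint; replace with the base URL of your server
      OPENAI_API_BASE: "http://my-llm-server:8000/v1"
      # Placeholder key in the documented example format
      OPENAI_API_KEY: "sk-1234"
      # Hypothetical model name as exposed by your server
      OPENAI_MODEL: "my-model"
      # Allow slower local inference more time than the 120.0 default
      OPENAI_REQUEST_TIMEOUT: "300.0"
    ports:
      - 8080:8080/tcp
```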
GEMINI_API_KEY
- Your Google API Key for the Gemini API. Example: AI1234 - Default: AI1234
GEMINI_MODEL
- Google Model to use. Example: models/gemini-pro - Default: models/gemini-pro
GEMINI_EMBEDDING_MODEL
- Google Embedding Model to use. Example: models/embedding-001 - Default: models/embedding-001
OLLAMA_API_BASE
- Base URL of Ollama API. Example: http://192.168.1.100:11434 - Default: http://localhost:11434
OLLAMA_EMBEDDING_API_BASE
- Base URL of Ollama Embedding API. Example: http://192.168.1.100:11434 - Default: same as OLLAMA_API_BASE
OLLAMA_LLM_MODEL
- Ollama model to use. (see Ollama Library). Example: 'llama2-uncensored' - Default: llama3.1:latest
OLLAMA_EMBEDDING_MODEL
- Model to use. Example: 'nomic-embed-text'. - Default: nomic-embed-text
OLLAMA_KEEP_ALIVE
- Time the model will stay loaded in memory after a request. Examples: 5m, 5h, '-1' - Default: 5m
OLLAMA_TFS_Z
- Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. - Default: 1.0
OLLAMA_NUM_PREDICT
- Maximum number of tokens to predict when generating text (-1 = infinite generation, -2 = fill context). - Default: 128
OLLAMA_TOP_K
- Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. - Default: 40
OLLAMA_TOP_P
- Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. - Default: 0.9
OLLAMA_REPEAT_LAST_N
- Sets how far back the model looks to prevent repetition (0 = disabled, -1 = num_ctx). - Default: 64
OLLAMA_REPEAT_PENALTY
- Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. - Default: 1.1
OLLAMA_REQUEST_TIMEOUT
- Time elapsed until ollama times out the request. Default is 120s. Format is float. - Default: 120.0
AZOPENAI_API_KEY
- Your API Key for the OpenAI API. Example: sk-1234 - Default: sk-1234
AZOPENAI_ENDPOINT
- Base URL of Azure OpenAI Endpoint. Example: https://api.myazure.com/v1 - Default: https://api.myazure.com/v1
AZOPENAI_API_VERSION
- The API version to use for this operation. This follows the YYYY-MM-DD format. - Default: 2023_05_15
AZOPENAI_EMBEDDING_DEPLOYMENT_NAME
- Embedding deployment name in str format - Default: None
AZOPENAI_EMBEDDING_MODEL
- OpenAI Model to use. Example: 'text-embedding-ada-002'. - Default: text-embedding-3-small
AZOPENAI_LLM_DEPLOYMENT_NAME
- LLM deployment name in str format - Default: None
AZOPENAI_LLM_MODEL
- OpenAI Model to use. (see OpenAI Models Overview). Example: gpt-4 - Default: gpt-4
Volumes
/home/worker/app/local_data
- Directory for uploaded files. Contains private data! Will be deleted after every restart if KEEP_FILES=false
/home/worker/app/models
- Directory for custom LLM models. Mount your own model here and set the environment variable LLAMACPP_LLM_HF_MODEL_FILE
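A minimal sketch of mounting both directories from the host could look like this. The host paths ./local_data and ./models are hypothetical; any host directory that the container user can read and write should work.

```yaml
version: '3.9'
services:
  privategpt:
    image: 3x3cut0r/privategpt:latest
    container_name: privategpt
    environment:
      # Keep uploaded files across container restarts
      KEEP_FILES: "true"
    volumes:
      # Hypothetical host paths mapped to the documented container paths
      - ./local_data:/home/worker/app/local_data
      - ./models:/home/worker/app/models
    ports:
      - 8080:8080/tcp
```

With local_data on the host, uploaded documents live outside the container, so treat that directory as private data.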
Ports
8080/tcp
- HTTP Port
License
- This project is licensed under the GNU General Public License - see the gpl-3.0 for details.