magda-csv-semantic-indexer

Version: 1.0.0-alpha.0

A Helm chart for Magda CSV Semantic Indexer

Homepage: https://github.com/magda-io/magda-csv-semantic-indexer

Source Code

Requirements

Kubernetes: >= 1.14.0-0

| Repository | Name | Version |
|------------|------|---------|
| oci://ghcr.io/magda-io/charts | magda-common | 5.2.0 |
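
In a Helm 3 chart, a requirement like the one above is normally declared in the `dependencies` section of `Chart.yaml`. The excerpt below is a sketch reconstructed from the listed repository, name, and version. It is not copied from this repository, and the actual file may contain further fields.

```yaml
# Chart.yaml (excerpt): reconstructed from the Requirements listing,
# not taken from the repository itself.
dependencies:
  - name: magda-common
    version: "5.2.0"
    repository: "oci://ghcr.io/magda-io/charts"
```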

Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| defaultAdminUserId | string | `"00000000-0000-4000-8000-000000000000"` |  |
| defaultImage.imagePullSecret | bool | `false` |  |
| defaultImage.pullPolicy | string | `"IfNotPresent"` |  |
| defaultImage.repository | string | `"ghcr.io/magda-io"` |  |
| defaultSemanticIndexerConfig.bulkEmbeddingsSize | int | `1` |  |
| defaultSemanticIndexerConfig.bulkIndexSize | int | `50` |  |
| defaultSemanticIndexerConfig.chunkSizeLimit | int | `512` |  |
| defaultSemanticIndexerConfig.id | string | `"csv-semantic-indexer"` |  |
| defaultSemanticIndexerConfig.indexName | string | `"semantic-index"` |  |
| defaultSemanticIndexerConfig.indexVersion | int | `1` |  |
| defaultSemanticIndexerConfig.overlap | int | `50` |  |
| embeddingApiURL | string | `"http://magda-embedding-api"` |  |
| global | object | `{"image":{},"rollingUpdate":{},"searchEngine":{"defaultDatasetBucket":"magda-datasets","semanticIndexer":{"indexName":null,"indexVersion":null,"knnVectorFieldConfig":{"compressionLevel":"32x","dimension":768,"efConstruction":100,"efSearch":100,"m":16,"mode":"on_disk","spaceType":"l2"},"numberOfReplicas":0,"numberOfShards":1}}}` | Present only to provide appropriate default values for `helm lint`. |
| global.searchEngine.semanticIndexer.knnVectorFieldConfig.compressionLevel | string | `"32x"` | The `compression_level` mapping parameter selects a quantization encoder that reduces vector memory consumption by the given factor. |
| global.searchEngine.semanticIndexer.knnVectorFieldConfig.dimension | int | `768` | Dimension of the embedding vectors. |
| global.searchEngine.semanticIndexer.knnVectorFieldConfig.efConstruction | int | `100` | Similar to `efSearch`, but used during index construction. Higher values improve search quality but increase index build time. |
| global.searchEngine.semanticIndexer.knnVectorFieldConfig.efSearch | int | `100` | The size of the candidate queue during search. Larger values may improve search quality but increase search latency. |
| global.searchEngine.semanticIndexer.knnVectorFieldConfig.m | int | `16` | The maximum number of graph edges per vector. Higher values increase memory usage but may improve search quality. |
| global.searchEngine.semanticIndexer.knnVectorFieldConfig.mode | string | `"on_disk"` | Vector workload mode: `on_disk` or `in_memory`. |
| image.name | string | `"magda-csv-semantic-indexer"` |  |
| minioConfig.defaultDatasetBucket | string | `""` |  |
| minioConfig.endPoint | string | `"magda-minio"` |  |
| minioConfig.port | int | `9000` |  |
| minioConfig.region | string | `""` |  |
| minioConfig.useSSL | bool | `false` |  |
| opensearchURL | string | `"http://opensearch:9200"` |  |
| port | int | `6305` | Service port configuration. |
| resources.limits.cpu | string | `"100m"` |  |
| resources.requests.cpu | string | `"50m"` |  |
| resources.requests.memory | string | `"200Mi"` |  |
| semanticIndexer.bulkEmbeddingsSize | int | `nil` | Number of strings sent to the embedding API in a single request. |
| semanticIndexer.bulkIndexSize | int | `nil` | Number of documents sent to OpenSearch for bulk processing in a single request. |
| semanticIndexer.chunkSizeLimit | int | `nil` | The maximum number of tokens in a single chunk. |
| semanticIndexer.id | string | `""` | Semantic indexer ID. |
| semanticIndexer.indexName | string | `nil` | Index name. |
| semanticIndexer.indexVersion | int | `nil` | Index version. |
| semanticIndexer.overlap | int | `nil` | The number of overlapping tokens between chunks. |
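
As an illustration of how the values above might be overridden, here is a minimal sketch of a custom values file. The file name `my-values.yaml` and the numbers chosen are hypothetical. The comment about `semanticIndexer.*` falling back to `defaultSemanticIndexerConfig.*` is an assumption based on the `nil` defaults and the key names, not something stated in this chart's documentation.

```yaml
# my-values.yaml (hypothetical file name; values are illustrative only)

# Endpoints of the services the indexer talks to (defaults shown).
opensearchURL: "http://opensearch:9200"
embeddingApiURL: "http://magda-embedding-api"

minioConfig:
  endPoint: "magda-minio"
  port: 9000
  useSSL: false

# Assumption: unset (nil) keys here fall back to the matching
# defaultSemanticIndexerConfig.* values.
semanticIndexer:
  chunkSizeLimit: 256   # max tokens per chunk (chart default: 512)
  overlap: 32           # overlapping tokens between chunks (chart default: 50)
  bulkIndexSize: 100    # documents per OpenSearch bulk request (chart default: 50)

resources:
  requests:
    cpu: "50m"
    memory: "200Mi"
  limits:
    cpu: "200m"         # raised from the chart default of 100m
```

A file like this would typically be passed to `helm install` or `helm upgrade` with the `-f` flag.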

Autogenerated from chart metadata using helm-docs v1.11.0

About

A Magda semantic indexer can index CSV files
