feat(search): Qdrant vector store for semantic search #3037

Open
flash7777 wants to merge 3 commits into opencloud-eu:main from flash7777:feat/qdrant-vector-search

@flash7777

Summary

Optional Qdrant integration for semantic (vector) search alongside the existing keyword index (Bleve/OpenSearch). When enabled, document embeddings from open_taki v2 are stored in Qdrant and freetext queries automatically search both backends.

Depends on PR #3036 (open_taki v2 protocol support).

How it works

  • Indexing: open_taki v2 returns embeddings per document → stored in Qdrant with metadata (name, path, summary, entities)
  • Searching: freetext queries → Bleve (keyword) + Qdrant (semantic); results merged and deduplicated
  • Structured queries (name:, tag:, mtime:): Bleve only, no vector overhead
  • Point IDs: OpaqueId (native UUID), stable across file moves within a space
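A minimal sketch of the routing rule above, assuming a query counts as structured as soon as any term starts with a known field prefix. The `isFreetext` name is taken from this PR's commit message; the prefix list is limited to the three fields named here, and `backendsFor` with its backend labels is hypothetical. The service's real KQL parsing is not reproduced.

```go
package main

import "strings"

// structuredPrefixes lists field prefixes that mark a query as structured.
// Assumption: the real field list is longer; these are the ones named in the PR.
var structuredPrefixes = []string{"name:", "tag:", "mtime:"}

// isFreetext reports whether no term carries a field prefix, i.e. whether
// the query should go to both Bleve and Qdrant.
func isFreetext(query string) bool {
	for _, term := range strings.Fields(strings.ToLower(query)) {
		// Ignore grouping and boolean markers in front of a field, e.g. "(tag:x" or "-name:y".
		term = strings.TrimLeft(term, "(-+")
		for _, p := range structuredPrefixes {
			if strings.HasPrefix(term, p) {
				return false
			}
		}
	}
	return true
}

// backendsFor returns the (hypothetical) backend labels a query is routed to.
func backendsFor(query string) []string {
	if isFreetext(query) {
		return []string{"bleve", "qdrant"}
	}
	return []string{"bleve"}
}
```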

Config

SEARCH_VECTOR_ENABLED=true
SEARCH_VECTOR_URL=http://qdrant:6333
SEARCH_VECTOR_COLLECTION=opencloud

Or YAML:

vector:
  enabled: true
  url: "http://qdrant:6333"
  collection: "opencloud"
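For illustration, a sketch of how the three settings and their documented defaults could map onto a Go struct with environment overrides. The `VectorStore` field layout and the `applyEnv`/`loadVectorStore` helpers are hypothetical, the `http://` scheme on the default URL is an assumption, and OpenCloud's own config loading machinery is not reproduced here.

```go
package main

import (
	"os"
	"strconv"
)

// VectorStore mirrors the three settings from the PR; yaml tags follow the YAML example.
type VectorStore struct {
	Enabled    bool   `yaml:"enabled"`
	URL        string `yaml:"url"`
	Collection string `yaml:"collection"`
}

// defaultVectorStore returns the documented defaults: disabled,
// localhost:6333, collection "opencloud".
func defaultVectorStore() VectorStore {
	return VectorStore{Enabled: false, URL: "http://localhost:6333", Collection: "opencloud"}
}

// applyEnv overrides cfg with SEARCH_VECTOR_* variables that are set.
// lookup is injected so tests need not touch the process environment.
func applyEnv(cfg VectorStore, lookup func(string) (string, bool)) (VectorStore, error) {
	if v, ok := lookup("SEARCH_VECTOR_ENABLED"); ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, err
		}
		cfg.Enabled = b
	}
	if v, ok := lookup("SEARCH_VECTOR_URL"); ok {
		cfg.URL = v
	}
	if v, ok := lookup("SEARCH_VECTOR_COLLECTION"); ok {
		cfg.Collection = v
	}
	return cfg, nil
}

// loadVectorStore applies the real process environment on top of the defaults.
func loadVectorStore() (VectorStore, error) {
	return applyEnv(defaultVectorStore(), os.LookupEnv)
}
```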

Changes

  • qdrant/client.go: lightweight REST client (upsert, search, auto-create collection)
  • config/content.go: VectorStore config (enabled, url, collection)
  • config/config.go: Vector field in main config
  • config/defaults/defaultconfig.go: defaults (disabled, localhost:6333, "opencloud")
  • content/tika.go: GetEmbedding() for query-time embedding via open_taki
  • search/service.go: Qdrant ingest in doUpsertItem + semantic search merge in Search()

Backward compatible

  • Disabled by default (SEARCH_VECTOR_ENABLED=false)
  • No impact when Qdrant is unavailable (graceful fallback with a warning)
  • Works without open_taki (no embeddings = no vector ingest)

Tested

Deployed on cloud.brandis.eu: 34+ documents in Qdrant, 0 upsert errors.

flash and others added 3 commits June 28, 2026 16:28
When open_taki is detected, IndexSpace uses a configurable worker pool
for parallel extraction instead of sequential processing. This leverages
the LLM backend's batch capacity (vLLM max-num-seqs=16).

- 8 workers by default (configurable via SEARCH_EXTRACTOR_TIKA_MAX_WORKERS)
- Only active with open_taki v2 (classic Tika stays sequential)
- Workers use direct upsert (no batch, thread-safe)
- IsTaki() exported on Tika extractor for runtime detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store document embeddings from open_taki v2 in Qdrant alongside the
keyword index (bleve/opensearch). Enables semantic search over all
indexed documents.

- VectorStore config: SEARCH_VECTOR_ENABLED, SEARCH_VECTOR_URL, SEARCH_VECTOR_COLLECTION
- Qdrant REST client (upsert, search, auto-create collection)
- Embedding + metadata stored per document (name, title, path, summary, entities)
- Graceful: disabled by default, no impact when Qdrant unavailable
…sults

Freetext queries automatically search both Bleve (keyword) and Qdrant
(semantic). Results are merged and deduplicated.

Structured queries (name:, tag:, mtime:, etc.) go to Bleve only.

- isFreetext() detects query type by checking for field prefixes
- GetEmbedding() on Tika extractor: sends query to open_taki for embedding
- searchVector(): Qdrant search with score threshold (0.3)
- Results merged: Qdrant hits added if not already in Bleve results
- Stat each Qdrant hit to verify access permissions
@codacy-production

Up to standards ✅

Issues: 0 new issues

Metrics: 12 complexity · 2 duplication
