feat(search): Qdrant vector store for semantic search by flash7777 · Pull Request #3037 · opencloud-eu/opencloud

flash7777 · 2026-06-28T14:30:15Z

Summary

Optional Qdrant integration for semantic (vector) search alongside the existing keyword index (Bleve/OpenSearch). When enabled, document embeddings from open_taki v2 are stored in Qdrant and freetext queries automatically search both backends.

Depends on PR #3036 (open_taki v2 protocol support).

How it works

Indexing: open_taki v2 returns embeddings per document → stored in Qdrant with metadata (name, path, summary, entities)
Searching: freetext queries → Bleve (keyword) + Qdrant (semantic), results merged and deduplicated
Structured queries (name:, tag:, mtime:): Bleve only — no vector overhead
Point IDs: OpaqueId (native UUID), stable across file moves within a space
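
The merge step for freetext queries can be sketched as follows. This is a minimal illustration, not the code from search/service.go: the `Hit` type and the `mergeHits` function are hypothetical names, and it assumes both backends identify a document by the same OpaqueId.

```go
package main

import "fmt"

// Hit is a hypothetical, simplified search result; the real service types differ.
type Hit struct {
	ID     string  // OpaqueId of the resource, assumed to be shared by both backends
	Score  float64 // backend-specific score (scores are not comparable across backends)
	Source string  // "bleve" or "qdrant", for illustration only
}

// mergeHits keeps all keyword hits in order and appends semantic hits
// whose ID has not been seen yet.
func mergeHits(keyword, semantic []Hit) []Hit {
	seen := make(map[string]struct{}, len(keyword)+len(semantic))
	merged := make([]Hit, 0, len(keyword)+len(semantic))
	for _, list := range [][]Hit{keyword, semantic} {
		for _, h := range list {
			if _, dup := seen[h.ID]; dup {
				continue
			}
			seen[h.ID] = struct{}{}
			merged = append(merged, h)
		}
	}
	return merged
}

func main() {
	keyword := []Hit{{ID: "a", Score: 2.1, Source: "bleve"}, {ID: "b", Score: 1.4, Source: "bleve"}}
	semantic := []Hit{{ID: "b", Score: 0.82, Source: "qdrant"}, {ID: "c", Score: 0.47, Source: "qdrant"}}
	for _, h := range mergeHits(keyword, semantic) {
		fmt.Println(h.ID, h.Source)
	}
}
```

Keyword hits come first in this sketch because their scores and the vector similarity scores live on different scales; any re-ranking across the two would be a separate design decision.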

Config

SEARCH_VECTOR_ENABLED=true
SEARCH_VECTOR_URL=http://qdrant:6333
SEARCH_VECTOR_COLLECTION=opencloud

Or YAML:

vector:
  enabled: true
  url: "http://qdrant:6333"
  collection: "opencloud"
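
For illustration, the three settings could map onto a small Go struct as sketched below. The `VectorStore` field names and the `loadVectorStore` helper are assumptions for this example and do not reproduce the project's actual config loading; the default URL scheme (`http://`) is also assumed.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// VectorStore mirrors the three settings from the PR description.
// Field names are hypothetical.
type VectorStore struct {
	Enabled    bool
	URL        string
	Collection string
}

// loadVectorStore starts from the documented defaults (disabled,
// localhost:6333, "opencloud") and overrides them from the environment.
// getenv is injected so the function can be exercised without touching
// the real process environment.
func loadVectorStore(getenv func(string) string) VectorStore {
	cfg := VectorStore{
		Enabled:    false,
		URL:        "http://localhost:6333", // scheme is an assumption
		Collection: "opencloud",
	}
	if v := getenv("SEARCH_VECTOR_ENABLED"); v != "" {
		// Invalid values are ignored and the default (disabled) is kept.
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.Enabled = b
		}
	}
	if v := getenv("SEARCH_VECTOR_URL"); v != "" {
		cfg.URL = v
	}
	if v := getenv("SEARCH_VECTOR_COLLECTION"); v != "" {
		cfg.Collection = v
	}
	return cfg
}

func main() {
	cfg := loadVectorStore(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```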

Changes

qdrant/client.go — Lightweight REST client (upsert, search, auto-create collection)
config/content.go — VectorStore config (enabled, url, collection)
config/config.go — Vector field in main config
config/defaults/defaultconfig.go — Defaults (disabled, localhost:6333, "opencloud")
content/tika.go — GetEmbedding() for query-time embedding via open_taki
search/service.go — Qdrant ingest in doUpsertItem + semantic search merge in Search()
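
To give a rough idea of what the REST client sends, the sketch below builds an upsert and a search request against Qdrant's HTTP API (`PUT /collections/{name}/points` and `POST /collections/{name}/points/search`). The `point` type and the two helper functions are hypothetical and are not taken from qdrant/client.go; the collection name is assumed to need no URL escaping.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// point is a hypothetical representation of one Qdrant point.
// Qdrant accepts UUID strings as point IDs, which matches the OpaqueId choice.
type point struct {
	ID      string         `json:"id"`
	Vector  []float32      `json:"vector"`
	Payload map[string]any `json:"payload,omitempty"`
}

func newJSONRequest(method, endpoint string, body any) (*http.Request, error) {
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(method, endpoint, bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// newUpsertRequest stores one embedding plus metadata.
func newUpsertRequest(baseURL, collection string, p point) (*http.Request, error) {
	endpoint := fmt.Sprintf("%s/collections/%s/points?wait=true", baseURL, collection)
	return newJSONRequest(http.MethodPut, endpoint, map[string]any{"points": []point{p}})
}

// newSearchRequest asks for the nearest neighbours of a query embedding,
// dropping hits below the given score threshold.
func newSearchRequest(baseURL, collection string, vector []float32, limit int, threshold float64) (*http.Request, error) {
	endpoint := fmt.Sprintf("%s/collections/%s/points/search", baseURL, collection)
	return newJSONRequest(http.MethodPost, endpoint, map[string]any{
		"vector":          vector,
		"limit":           limit,
		"score_threshold": threshold,
		"with_payload":    true,
	})
}

func main() {
	req, err := newSearchRequest("http://qdrant:6333", "opencloud", []float32{0.1, 0.2, 0.3}, 10, 0.3)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

The requests are only constructed here, not sent; passing them to an `http.Client` with a timeout would be the next step in a real client.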

Backward compatible

Disabled by default (SEARCH_VECTOR_ENABLED=false)
No impact when Qdrant unavailable (graceful fallback with warning)
Works without open_taki (no embeddings = no vector ingest)

Tested

Deployed on cloud.brandis.eu: 34+ documents in Qdrant, 0 upsert errors.

When open_taki is detected, IndexSpace uses a configurable worker pool for parallel extraction instead of sequential processing. This leverages the LLM backend's batch capacity (vLLM max-num-seqs=16).

- 8 workers by default (configurable via SEARCH_EXTRACTOR_TIKA_MAX_WORKERS)
- Only active with open_taki v2 (classic Tika stays sequential)
- Workers use direct upsert (no batch, thread-safe)
- IsTaki() exported on Tika extractor for runtime detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
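
A bounded worker pool of the kind this commit message describes might look like the sketch below. It is an illustration under assumptions: `processAll`, `extractStub`, and `errEmptyPath` are hypothetical names, and the real IndexSpace walks a space rather than a slice of paths.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errEmptyPath = errors.New("empty path")

// extractStub stands in for extraction plus direct upsert of one document.
func extractStub(path string) error {
	if path == "" {
		return errEmptyPath
	}
	return nil
}

// processAll runs fn over items with at most `workers` goroutines
// and returns the number of items for which fn failed.
func processAll(items []string, workers int, fn func(string) error) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range jobs {
				if err := fn(item); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}
	for _, item := range items {
		jobs <- item
	}
	close(jobs) // lets the workers drain the channel and exit
	wg.Wait()
	return failed
}

func main() {
	paths := []string{"a.pdf", "", "b.docx"}
	fmt.Println("failed:", processAll(paths, 8, extractStub))
}
```

Each worker handles one document at a time, so the worker count caps the number of concurrent requests sent to the extraction backend.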

Store document embeddings from open_taki v2 in Qdrant alongside the keyword index (bleve/opensearch). Enables semantic search over all indexed documents.

- VectorStore config: SEARCH_VECTOR_ENABLED, SEARCH_VECTOR_URL, SEARCH_VECTOR_COLLECTION
- Qdrant REST client (upsert, search, auto-create collection)
- Embedding + metadata stored per document (name, title, path, summary, entities)
- Graceful: disabled by default, no impact when Qdrant unavailable

…sults

Freetext queries automatically search both Bleve (keyword) and Qdrant (semantic). Results are merged and deduplicated. Structured queries (name:, tag:, mtime:, etc.) go to Bleve only.

- isFreetext() detects query type by checking for field prefixes
- GetEmbedding() on Tika extractor: sends query to open_taki for embedding
- searchVector(): Qdrant search with score threshold (0.3)
- Results merged: Qdrant hits added if not already in Bleve results
- Stat each Qdrant hit to verify access permissions
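
The prefix check could be approximated as in the sketch below. Only the function name `isFreetext` comes from the commit message; the body, the `fieldPrefixes` list (limited to the three prefixes named in this PR), and the handling of empty queries are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldPrefixes lists only the prefixes mentioned in the PR description;
// the real implementation likely knows more fields.
var fieldPrefixes = []string{"name:", "tag:", "mtime:"}

// isFreetext reports whether a query has no field-prefixed term.
// An empty query is treated as not freetext, since there is nothing to embed.
func isFreetext(query string) bool {
	terms := strings.Fields(query)
	if len(terms) == 0 {
		return false
	}
	for _, term := range terms {
		lower := strings.ToLower(term)
		for _, prefix := range fieldPrefixes {
			if strings.HasPrefix(lower, prefix) {
				return false
			}
		}
	}
	return true
}

func main() {
	for _, q := range []string{"quarterly budget report", "name:report.pdf", "budget tag:finance"} {
		fmt.Printf("%q freetext=%v\n", q, isFreetext(q))
	}
}
```

In this sketch a single field-prefixed term makes the whole query structured, which keeps mixed queries on the keyword backend only.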

codacy-production · 2026-06-28T14:32:14Z

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: 12 complexity · 2 duplication


flash and others added 3 commits June 28, 2026 16:28

flash7777 wants to merge 3 commits into opencloud-eu:main from flash7777:feat/qdrant-vector-search
