
feat: add optional BM25 retrieval confidence metadata #1

Open
SeCuReDmE-main-dev wants to merge 1 commit into feature/haystack-evaluator-uncertainty-phase1 from feature/haystack-retrieval-confidence-phase2


@SeCuReDmE-main-dev

Owner

Summary

  • add an opt-in include_confidence parameter to InMemoryBM25Retriever
  • attach BM25 confidence metadata via Document.meta (retrieval_confidence and retrieval_confidence_source) only when include_confidence=True and scale_score=True
  • cover sync, async, and serialization behavior with targeted tests
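A minimal sketch of the gating rule in the summary, assuming the confidence value is simply the already-scaled score and that the source tag is a fixed string. `Doc`, `attach_confidence`, and the `"bm25_scaled_score"` tag are hypothetical illustrations, not Haystack's `Document` or this PR's code; only the two meta key names come from the review comments on this PR.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.Document (only the fields used here)."""
    content: str
    score: Optional[float] = None
    meta: dict[str, Any] = field(default_factory=dict)


def attach_confidence(docs: list[Doc], include_confidence: bool, scale_score: bool) -> list[Doc]:
    """Add confidence metadata only when both flags are True.

    Assumption: the confidence equals the scaled score; the source tag is illustrative.
    """
    if not (include_confidence and scale_score):
        # Default path: the documents are returned exactly as they came in.
        return docs
    out = []
    for doc in docs:
        meta = dict(doc.meta)  # copy so the caller's meta dict is not mutated
        if doc.score is not None:
            meta["retrieval_confidence"] = doc.score
            meta["retrieval_confidence_source"] = "bm25_scaled_score"  # hypothetical tag
        # doc.score itself is never modified, mirroring "preserves Document.score semantics"
        out.append(replace(doc, meta=meta))
    return out


docs = [Doc("a", score=0.91), Doc("b", score=0.42)]
print(attach_confidence(docs, include_confidence=True, scale_score=False)[0].meta)
print(attach_confidence(docs, include_confidence=True, scale_score=True)[0].meta)
```

Copying `meta` instead of mutating it is a choice made for this sketch; the PR description does not say which approach the actual change takes.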

Compatibility

  • preserves existing Document.score semantics
  • preserves default retriever behavior when include_confidence=False
  • does not change the Document dataclass or global retriever contracts

Not included

  • no cross-retriever confidence normalization
  • no new shared retriever helpers
  • no runtime/router/agent consumption changes

Validation

  • .venv\Scripts\python.exe -m pytest test/components/retrievers/test_in_memory_bm25_retriever.py
  • .venv\Scripts\ruff.exe check haystack/components/retrievers/in_memory/bm25_retriever.py test/components/retrievers/test_in_memory_bm25_retriever.py
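The serialization coverage mentioned in the summary presumably checks that the new flag survives a `to_dict`/`from_dict` round trip and that dicts saved before the flag existed still load with the old default. A hedged sketch of those two properties, using a hypothetical `ToyBM25Retriever` that only imitates the `{"type": ..., "init_parameters": {...}}` layout of Haystack component dicts; it is not the test file run above.

```python
from typing import Any


class ToyBM25Retriever:
    """Hypothetical stand-in for the retriever's serialization, not Haystack code."""

    def __init__(self, top_k: int = 10, scale_score: bool = False, include_confidence: bool = False):
        self.top_k = top_k
        self.scale_score = scale_score
        self.include_confidence = include_confidence

    def to_dict(self) -> dict[str, Any]:
        # Imitates the {"type": ..., "init_parameters": {...}} shape of component dicts.
        return {
            "type": f"{type(self).__module__}.{type(self).__qualname__}",
            "init_parameters": {
                "top_k": self.top_k,
                "scale_score": self.scale_score,
                "include_confidence": self.include_confidence,
            },
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ToyBM25Retriever":
        # A dict saved before include_confidence existed omits the key,
        # so the constructor default (False) restores the previous behavior.
        return cls(**data["init_parameters"])


restored = ToyBM25Retriever.from_dict(ToyBM25Retriever(scale_score=True, include_confidence=True).to_dict())
legacy = ToyBM25Retriever.from_dict({"type": "x", "init_parameters": {"top_k": 5, "scale_score": True}})
print(restored.include_confidence, legacy.include_confidence)
```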


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces an include_confidence parameter to the InMemoryBM25Retriever component, allowing users to include retrieval confidence metadata when scores are scaled. The implementation covers the constructor, serialization, and both synchronous and asynchronous execution paths, supported by new unit tests. Feedback from the reviewer focused on improving documentation consistency by explicitly detailing the metadata keys in the docstrings of the run and run_async methods.

Comment on lines +148 to +150
:param include_confidence:
When `True`, adds optional retrieval confidence metadata to returned documents when `scale_score` is also
`True`. When `False`, no retrieval confidence metadata is added.


medium

The docstring for include_confidence in the run method should be consistent with the one in __init__ by explicitly mentioning the metadata keys. This improves clarity for users interacting with the component's run method.

Suggested change
- :param include_confidence:
- When `True`, adds optional retrieval confidence metadata to returned documents when `scale_score` is also
- `True`. When `False`, no retrieval confidence metadata is added.
+ :param include_confidence:
+ When `True`, adds retrieval confidence metadata to returned documents when `scale_score` is also
+ `True`. The metadata is exposed via `Document.meta["retrieval_confidence"]` and
+ `Document.meta["retrieval_confidence_source"]`. When `False`, no retrieval confidence metadata is added.
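Naming the keys matters because callers have to read them from `Document.meta` and cope with their absence whenever either flag is off. The PR explicitly leaves consumption to callers, so the following is a hypothetical user-side sketch operating on plain `meta` dicts; `read_confidence` and `filter_confident` are illustrative names, not part of Haystack.

```python
from typing import Any, Optional


def read_confidence(meta: dict[str, Any]) -> Optional[float]:
    """Return the attached confidence, or None when the retriever did not add one
    (include_confidence=False or scale_score=False)."""
    value = meta.get("retrieval_confidence")
    return float(value) if value is not None else None


def filter_confident(metas: list[dict[str, Any]], threshold: float) -> list[dict[str, Any]]:
    """Hypothetical downstream filter: keep entries at or above `threshold`,
    and keep entries without a confidence instead of silently dropping them."""
    kept = []
    for meta in metas:
        conf = read_confidence(meta)
        if conf is None or conf >= threshold:
            kept.append(meta)
    return kept


metas = [{"retrieval_confidence": 0.9}, {"retrieval_confidence": 0.2}, {}]
print(filter_confident(metas, 0.5))
```

Treating a missing key as "unknown" rather than "low" keeps this filter safe to run against output from the default `include_confidence=False` path.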

Comment on lines +194 to +196
:param include_confidence:
When `True`, adds optional retrieval confidence metadata to returned documents when `scale_score` is also
`True`. When `False`, no retrieval confidence metadata is added.


medium

The docstring for include_confidence in the run_async method should be consistent with the one in __init__ by explicitly mentioning the metadata keys. This improves clarity for users interacting with the component's run_async method.

Suggested change
- :param include_confidence:
- When `True`, adds optional retrieval confidence metadata to returned documents when `scale_score` is also
- `True`. When `False`, no retrieval confidence metadata is added.
+ :param include_confidence:
+ When `True`, adds retrieval confidence metadata to returned documents when `scale_score` is also
+ `True`. The metadata is exposed via `Document.meta["retrieval_confidence"]` and
+ `Document.meta["retrieval_confidence_source"]`. When `False`, no retrieval confidence metadata is added.
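Keeping the `run` and `run_async` docstrings identical is easiest when both paths call one shared post-processing step. A hedged sketch of that structure, with hypothetical names throughout (`ToyRetriever`, `_apply_confidence`) and a precomputed list of `(content, scaled_score)` pairs standing in for the real BM25 query. It also assumes that a per-call `include_confidence=None` falls back to the constructor value, the way `scale_score` does.

```python
import asyncio
from typing import Any, Optional


def _apply_confidence(
    scored: list[tuple[str, float]], include_confidence: bool, scale_score: bool
) -> list[dict[str, Any]]:
    """Shared post-processing for both entry points (hypothetical helper)."""
    documents = []
    for content, score in scored:
        meta: dict[str, Any] = {}
        if include_confidence and scale_score:
            meta["retrieval_confidence"] = score
            meta["retrieval_confidence_source"] = "bm25_scaled_score"  # hypothetical tag
        documents.append({"content": content, "score": score, "meta": meta})
    return documents


class ToyRetriever:
    """Hypothetical stand-in; `scored` holds already-scaled scores and is never rescaled."""

    def __init__(self, scored: list[tuple[str, float]], scale_score: bool = False, include_confidence: bool = False):
        self._scored = scored
        self.scale_score = scale_score
        self.include_confidence = include_confidence

    def _resolve(self, scale_score: Optional[bool], include_confidence: Optional[bool]) -> tuple[bool, bool]:
        # Per-call values override the constructor defaults only when not None.
        sc = self.scale_score if scale_score is None else scale_score
        inc = self.include_confidence if include_confidence is None else include_confidence
        return sc, inc

    def run(self, scale_score: Optional[bool] = None, include_confidence: Optional[bool] = None) -> dict[str, Any]:
        sc, inc = self._resolve(scale_score, include_confidence)
        return {"documents": _apply_confidence(self._scored, inc, sc)}

    async def run_async(
        self, scale_score: Optional[bool] = None, include_confidence: Optional[bool] = None
    ) -> dict[str, Any]:
        sc, inc = self._resolve(scale_score, include_confidence)
        await asyncio.sleep(0)  # placeholder for an awaited document-store query
        return {"documents": _apply_confidence(self._scored, inc, sc)}


retriever = ToyRetriever([("a", 0.8)], scale_score=True, include_confidence=True)
print(retriever.run() == asyncio.run(retriever.run_async()))
```

Because both methods delegate to `_apply_confidence`, a single docstring description of the metadata keys stays accurate for both paths.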
