feat: add optional BM25 retrieval confidence metadata #1
Code Review
This pull request introduces an include_confidence parameter to the InMemoryBM25Retriever component, allowing users to include retrieval confidence metadata when scores are scaled. The implementation covers the constructor, serialization, and both synchronous and asynchronous execution paths, supported by new unit tests. Feedback from the reviewer focused on improving documentation consistency by explicitly detailing the metadata keys in the docstrings of the run and run_async methods.
    :param include_confidence:
        When `True`, adds optional retrieval confidence metadata to returned documents when `scale_score` is also
        `True`. When `False`, no retrieval confidence metadata is added.
The docstring for include_confidence in the run method should be consistent with the one in __init__ by explicitly mentioning the metadata keys. This improves clarity for users interacting with the component's run method.
Suggested change
-   :param include_confidence:
-       When `True`, adds optional retrieval confidence metadata to returned documents when `scale_score` is also
-       `True`. When `False`, no retrieval confidence metadata is added.
+   :param include_confidence:
+       When `True`, adds retrieval confidence metadata to returned documents when `scale_score` is also
+       `True`. The metadata is exposed via `Document.meta["retrieval_confidence"]` and
+       `Document.meta["retrieval_confidence_source"]`. When `False`, no retrieval confidence metadata is added.
    :param include_confidence:
        When `True`, adds optional retrieval confidence metadata to returned documents when `scale_score` is also
        `True`. When `False`, no retrieval confidence metadata is added.
The docstring for include_confidence in the run_async method should be consistent with the one in __init__ by explicitly mentioning the metadata keys. This improves clarity for users interacting with the component's run_async method.
Suggested change
-   :param include_confidence:
-       When `True`, adds optional retrieval confidence metadata to returned documents when `scale_score` is also
-       `True`. When `False`, no retrieval confidence metadata is added.
+   :param include_confidence:
+       When `True`, adds retrieval confidence metadata to returned documents when `scale_score` is also
+       `True`. The metadata is exposed via `Document.meta["retrieval_confidence"]` and
+       `Document.meta["retrieval_confidence_source"]`. When `False`, no retrieval confidence metadata is added.
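The gating described in these docstrings can be sketched in a few lines. This is only an illustration and does not use Haystack: `Doc` and `attach_confidence` are hypothetical stand-ins, and the values written to the two metadata keys (the scaled score and the string `"bm25"`) are assumptions, since the thread does not say what the PR actually stores there.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not Haystack's Document)."""

    content: str
    score: Optional[float] = None
    meta: dict[str, Any] = field(default_factory=dict)


def attach_confidence(docs: list[Doc], scale_score: bool, include_confidence: bool) -> list[Doc]:
    """Add confidence metadata only when both flags are True."""
    if not (scale_score and include_confidence):
        return docs  # metadata is left untouched
    for doc in docs:
        # Assumption: the confidence mirrors the already-scaled score,
        # and the source label is an illustrative placeholder.
        doc.meta["retrieval_confidence"] = doc.score
        doc.meta["retrieval_confidence_source"] = "bm25"
    return docs


# Both flags set: the two keys named in the suggested docstring appear in meta.
docs = attach_confidence([Doc("hello", score=0.73)], scale_score=True, include_confidence=True)

# scale_score=False: no metadata is added even though include_confidence=True.
untouched = attach_confidence([Doc("hello", score=0.73)], scale_score=False, include_confidence=True)
```

In this sketch `score` is never modified, so callers that ignore `meta` see no difference whether the flag is on or off.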
Summary
- Adds an `include_confidence` parameter to `InMemoryBM25Retriever`
- Adds retrieval confidence metadata to `Document.meta` only when `include_confidence=True` and `scale_score=True`
Compatibility
- `Document.score` semantics
- `include_confidence=False`
- `Document` dataclass or global retriever contracts
Not included
Validation
- `.venv\\Scripts\\python.exe -m pytest test/components/retrievers/test_in_memory_bm25_retriever.py`
- `.venv\\Scripts\\ruff.exe check haystack/components/retrievers/in_memory/bm25_retriever.py test/components/retrievers/test_in_memory_bm25_retriever.py`