microsoft/DiskANN

DiskANN3: A Composable Vector Indexing Library

DiskANN3 is a composable library for bringing scalable, accurate and cost-effective vector indexing to multiple databases. It draws on research from the DiskANN project. See the research overview page for more details and references.

To use DiskANN3 in your system, implement the DataProvider trait for your store to describe how index terms such as vectors and adjacency lists should be stored and retrieved. DiskANN3 exposes vector update and query APIs to users and internally uses your implementation of the DataProvider trait to serve these requests.
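
As a rough illustration of that division of labor, the sketch below defines a hypothetical `GraphStore` trait with a volatile in-memory implementation, plus a greedy graph walk that reaches the data only through the trait. This is not the actual DataProvider trait, whose methods are not listed in this README; every name and signature here (`GraphStore`, `MemStore`, `greedy_search`, `l2_sq`) is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a provider trait. NOT the real DiskANN3
/// `DataProvider`: it only captures the idea that the index reads and writes
/// vectors and adjacency lists through an interface the host store implements.
pub trait GraphStore {
    fn get_vector(&self, id: u32) -> Option<&[f32]>;
    fn get_neighbors(&self, id: u32) -> &[u32];
    fn put(&mut self, id: u32, vector: Vec<f32>, neighbors: Vec<u32>);
}

/// Volatile in-memory implementation backed by hash maps.
#[derive(Default)]
pub struct MemStore {
    vectors: HashMap<u32, Vec<f32>>,
    edges: HashMap<u32, Vec<u32>>,
}

impl GraphStore for MemStore {
    fn get_vector(&self, id: u32) -> Option<&[f32]> {
        self.vectors.get(&id).map(|v| v.as_slice())
    }

    fn get_neighbors(&self, id: u32) -> &[u32] {
        // Unknown ids simply have no out-edges.
        self.edges.get(&id).map(|v| v.as_slice()).unwrap_or(&[])
    }

    fn put(&mut self, id: u32, vector: Vec<f32>, neighbors: Vec<u32>) {
        self.vectors.insert(id, vector);
        self.edges.insert(id, neighbors);
    }
}

/// Squared Euclidean distance between two equal-length vectors.
pub fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Greedy walk: move to the closest neighbor of the current node until no
/// neighbor is closer to the query. Graph indices typically keep a candidate
/// list rather than a single node; this single-path walk is a simplification.
/// Returns `None` if the start node has no stored vector.
pub fn greedy_search<S: GraphStore>(store: &S, start: u32, query: &[f32]) -> Option<(u32, f32)> {
    let mut best = start;
    let mut best_dist = l2_sq(store.get_vector(start)?, query);
    loop {
        let mut improved = false;
        for &n in store.get_neighbors(best) {
            if let Some(v) = store.get_vector(n) {
                let d = l2_sq(v, query);
                if d < best_dist {
                    best = n;
                    best_dist = d;
                    improved = true;
                }
            }
        }
        if !improved {
            return Some((best, best_dist));
        }
    }
}
```

Because `greedy_search` is generic over the trait, swapping `MemStore` for a disk-backed or database-backed store would not change the search code, which is the point of the provider abstraction.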

This repo offers the following Provider implementations as illustrative examples:

  • In-memory providers, for maximum performance. These are volatile and not intended for use in databases. DiskANN3 with in-memory providers outperforms HNSWlib on throughput.
  • Disk provider, for larger-than-memory support. This is intended to match the performance of the first version of DiskANN reported in the NeurIPS'19 paper.
  • Garnet-based provider for high-throughput scale-up vector search, and as an example of mapping to a key-value store. This outperforms all vector DBs on throughput, latency and recall.
  • Bf-tree provider as an illustration of how to connect to a B-tree in your database.
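
To make the key-value and B-tree mappings more concrete, the sketch below stores vectors and adjacency lists as byte values in an ordered map, with `BTreeMap` standing in for the host store. The key layout (one tag byte followed by a big-endian id) and all names (`KvIndexStore`, `key`, the encode/decode helpers) are assumptions for illustration, not the layout used by the Garnet or Bf-tree providers.

```rust
use std::collections::BTreeMap;

const TAG_EDGES: u8 = b'e';
const TAG_VECTOR: u8 = b'v';

/// Key = 1 tag byte + big-endian id, so keys of one kind sort by id.
fn key(tag: u8, id: u32) -> [u8; 5] {
    let mut k = [0u8; 5];
    k[0] = tag;
    k[1..].copy_from_slice(&id.to_be_bytes());
    k
}

fn encode_f32s(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

fn decode_f32s(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn encode_u32s(v: &[u32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

fn decode_u32s(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Hypothetical index storage on top of an ordered byte-keyed map.
pub struct KvIndexStore {
    kv: BTreeMap<[u8; 5], Vec<u8>>,
}

impl KvIndexStore {
    pub fn new() -> Self {
        KvIndexStore { kv: BTreeMap::new() }
    }

    pub fn put_vector(&mut self, id: u32, v: &[f32]) {
        self.kv.insert(key(TAG_VECTOR, id), encode_f32s(v));
    }

    pub fn put_neighbors(&mut self, id: u32, neighbors: &[u32]) {
        self.kv.insert(key(TAG_EDGES, id), encode_u32s(neighbors));
    }

    pub fn get_vector(&self, id: u32) -> Option<Vec<f32>> {
        self.kv.get(&key(TAG_VECTOR, id)).map(|b| decode_f32s(b))
    }

    pub fn get_neighbors(&self, id: u32) -> Option<Vec<u32>> {
        self.kv.get(&key(TAG_EDGES, id)).map(|b| decode_u32s(b))
    }

    /// Ids that have a stored vector, in ascending order, via one range scan
    /// over the vector key prefix.
    pub fn vector_ids(&self) -> Vec<u32> {
        self.kv
            .range(key(TAG_VECTOR, 0)..=key(TAG_VECTOR, u32::MAX))
            .map(|(k, _)| u32::from_be_bytes([k[1], k[2], k[3], k[4]]))
            .collect()
    }
}
```

Keeping vectors and adjacency lists under separate key prefixes lets each kind be read or scanned independently. A real provider would also need to handle concurrency, durability and quantized representations, all of which this sketch ignores.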

The provider for Cosmos DB NoSQL Vector Search is not included here but documented in the VLDB'25 paper.

The library supports the following algorithmic features:

  • Real-time updates (using logic from IP-DiskANN and Fresh-DiskANN) that support stable recall under long update streams, with no merges, rebuilds, or patches needed.
  • A diverse set of distance functions and quantizers (PQ, MinMax, Scalar, Spherical) implemented for x86 and aarch64.
  • Choice of memory tiers to allow operation at different price-performance points.
  • Vector search interfaces that allow pagination, range filters (e.g., dist<0.5), and diversity-aware top-k search.
  • Hooks for attribute filter (predicate) processing alongside vector search.
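
The search modes in this list differ mainly in what they return. As a reference for those semantics only, the following brute-force sketch shows a range query such as dist<0.5 and a paged top-k query restricted by an attribute predicate. It scans every point rather than using a graph index, and all function names (`l2`, `ranked`, `range_search`, `paged_search`) are hypothetical rather than part of the DiskANN3 API.

```rust
/// Euclidean distance between two equal-length vectors.
pub fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// All (id, distance) pairs whose id passes `keep`, sorted by ascending distance.
fn ranked<F: Fn(u32) -> bool>(data: &[(u32, Vec<f32>)], query: &[f32], keep: F) -> Vec<(u32, f32)> {
    let mut hits: Vec<(u32, f32)> = data
        .iter()
        .filter(|entry| keep(entry.0))
        .map(|(id, v)| (*id, l2(v, query)))
        .collect();
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    hits
}

/// Range search: every point whose distance to the query is strictly below `radius`.
pub fn range_search(data: &[(u32, Vec<f32>)], query: &[f32], radius: f32) -> Vec<(u32, f32)> {
    ranked(data, query, |_| true)
        .into_iter()
        .take_while(|hit| hit.1 < radius)
        .collect()
}

/// Filtered, paged top-k: the `page`-th page (0-based) of `k` results among
/// the points whose id satisfies the predicate `keep`.
pub fn paged_search<F: Fn(u32) -> bool>(
    data: &[(u32, Vec<f32>)],
    query: &[f32],
    keep: F,
    page: usize,
    k: usize,
) -> Vec<(u32, f32)> {
    ranked(data, query, keep)
        .into_iter()
        .skip(page * k)
        .take(k)
        .collect()
}
```

Consecutive pages never overlap and together enumerate the filtered results in distance order. An index-backed implementation would aim for the same contract without scanning and sorting the whole dataset.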

Getting Started

  • Start with diskann-benchmarks to benchmark this library and its concrete implementations. This also allows you to build, store and load indices.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

See guidelines for contributing to this project.

Legacy C++ Code

License: MIT

Older C++ code is retained on the cpp_main branch but is not actively developed or maintained. It implements three DiskANN papers and was the second rewrite of the DiskANN algorithms.

The legacy C++ code was forked from the code for the NSG algorithm.

If you use the C++ version in your software please cite the following:

@misc{diskann-github,
   author = {Simhadri, Harsha Vardhan and Krishnaswamy, Ravishankar and Srinivasa, Gopal and Subramanya, Suhas Jayaram and Antonijevic, Andrija and Pryce, Dax and Kaczynski, David and Williams, Shane and Gollapudi, Siddarth and Sivashankar, Varun and Karia, Neel and Singh, Aditi and Jaiswal, Shikhar and Mahapatro, Neelam and Adams, Philip and Tower, Bryan and Patel, Yash},
   title = {{DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search}},
   url = {https://github.com/Microsoft/DiskANN},
   version = {0.6.1},
   year = {2023}
}

Note

Trademarks: This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
