
Add AudiencePro, statguard, and StreamXL #2

Open

Mullassery wants to merge 2 commits into rp-libs:main from Mullassery:main



@Mullassery


New libraries

Three Rust-powered Python libraries for data engineering and ML:

Data Processing

  • statguard — Declarative data quality and validation library with a contract DSL compiled to a columnar execution plan. Schema checks, drift detection (PSI + KS), anomaly detection, and native Delta Lake/Iceberg/Parquet support. 13–25× faster than pandera and Great Expectations.
  • StreamXL — Streaming XLSX reader that processes large Excel files row by row at constant memory. Reads ~27k rows/sec and uses 68 MB of memory at 250k rows, versus 911 MB for openpyxl. 4–5× faster overall.

Machine Learning & AI

  • AudiencePro — Customer segmentation library combining RFM analysis, KMeans/K-Prototypes clustering, drift detection, and streaming incremental updates in one package. 10–25× faster than an equivalent scikit-learn + pandas pipeline.

All three use PyO3 bindings, are pip/uv installable, and support Python 3.8+.

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds three new libraries (statguard, StreamXL, and AudiencePro) to the README.md file. The review feedback points out stylistic inconsistencies regarding the use of em-dashes in the descriptions of statguard and AudiencePro, suggesting alternative phrasing to align with the rest of the document.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread README.md
- [polars](https://github.com/pola-rs/polars) - DataFrame library with a Pandas-like API.
- [pydantic-core](https://github.com/pydantic/pydantic-core) - Core validation logic for Pydantic v2.
- [rustworkx](https://github.com/Qiskit/rustworkx) - High-performance Python graph library implemented in Rust.
- [statguard](https://github.com/Mullassery/statguard) - Declarative data quality and validation library — schema checks, drift detection (PSI + KS), anomaly detection, and native Delta Lake/Iceberg support. 13–25× faster than pandera and Great Expectations.


medium

The spaced em-dash is inconsistent with the rest of the README.md file, which uses standard hyphens, colons, or prepositions for punctuation. Consider rephrasing with a preposition such as 'supporting' to keep the list stylistically consistent.

Suggested change
- [statguard](https://github.com/Mullassery/statguard) - Declarative data quality and validation library supporting schema checks, drift detection (PSI + KS), anomaly detection, and native Delta Lake/Iceberg support. 13–25× faster than pandera and Great Expectations.

Comment thread README.md

Tools for machine learning, NLP, and AI applications.

- [AudiencePro](https://github.com/Mullassery/AudiencePro) - Python library for customer segmentation — RFM analysis, KMeans/K-Prototypes clustering, drift detection, and streaming updates at 10–25× the speed of scikit-learn + pandas.


medium

The spaced em-dash is inconsistent with the rest of the README.md file, which uses standard hyphens, colons, or prepositions for punctuation. Consider rephrasing with a preposition such as 'featuring' to keep the list stylistically consistent.

Suggested change
- [AudiencePro](https://github.com/Mullassery/AudiencePro) - Python library for customer segmentation featuring RFM analysis, KMeans/K-Prototypes clustering, drift detection, and streaming updates at 10–25× the speed of scikit-learn + pandas.

