Highlights
Paguro is an open-source Python library. Its features include:
Data Validation
- Paguro introduces a new expressive API for data validation, which allows you to:
Compose a validation tree with validators for single-column, cross-column, cross-frame, nested, transformations…
Validate schema and data content with multiple configurations
Serialization/deserialization of validators
Fast and efficient filtering of valid/invalid rows
Automatic validation at each step of data manipulation with Dataset/LazyDataset
…and much more!
Paguro also offers many more features:
Data(Lazy)Frame-like structures with persistent user-defined information
Structures for deferred frame construction
Configurable exploratory analysis
Beautiful terminal and HTML outputs
Paguro Design Principles
Built to complement Polars
You can see Paguro as an extension of the Polars API. Use Polars structures alongside, and within, Paguro’s objects.
Lazy
We compose with Polars LazyFrame so your transformations and validation remain fully optimized.
Ease of use
Intuitive and expressive API.
Quick examples
In [1]: import paguro as pg
   ...: import polars as pl
Validation
Validators
In [2]: valid_amount = pg.vcol("total_amount", ge=0)
try:
    valid_amount.validate(orders)
except pg.exceptions.ValidationError as e:
    print(e)
In [3]: valid_frame = pg.vframe(
   ...:     pg.vcol("total_amount", ge=0),
   ...:     delivery_after_order=pl.col("delivery_date") >= pl.col("order_date")
   ...: )
   ...:
try:
    valid_frame.validate(orders)
except pg.exceptions.ValidationError as e:
    print(e)
In [4]: valid_frame = pg.vframe(
   ...:     pg.vcol("total_amount", ge=0),
   ...:     name="orders",
   ...:     delivery_after_order=pl.col("delivery_date") >= pl.col("order_date")
   ...: )
   ...:
In [5]: valid_relations = pg.vrelations(
   ...:     valid_frame,
   ...:     relations="orders[customer_id] < customers[id]"
   ...: )
   ...:
try:
    valid_relations.validate({"orders": orders, "customers": customers})
except pg.exceptions.RelationValidationError as e:
    print(e)
License
Paguro is distributed under the Apache License, Version 2.0.
© 2025 Bernardo Dionisi | SPDX-License-Identifier: Apache-2.0
Acknowledgements
The open-source community continues to amaze us with its creativity and generosity in sharing knowledge. We’re thrilled to be part of this ecosystem and hope Paguro contributes something meaningful to it.
First and foremost, massive thanks to the incredible team behind Polars.
The Polars team's dedication to building a lightning-fast query engine with an elegant DataFrame API made developing Paguro genuinely enjoyable.
Special appreciation goes to the many libraries that inspired parts of Paguro, some of which are:
Rich showed us how beautiful terminal output can be;
skimpy demonstrated the power of intuitive data summarization;
pandera pushed us to think about statistical validation;
dataframely encouraged us to integrate cross-frame validation;
pydantic set the pace for powerful and intuitive validation.