Polars frames in a shell.

Highlights 🚀

Paguro is an open-source Python library. Its features include:

Data Validation

  • Paguro introduces a new, expressive API for data validation, which allows you to:
    • Compose a validation tree with validators for single-column, cross-column, cross-frame, nested, transformations…

    • Validate schema and data content with multiple configurations

  • Serialization/deserialization of validators

  • Fast and efficient filtering of valid/invalid rows

  • Automatic validation at each step of data manipulation with Dataset/LazyDataset

  • …and much more!

Paguro offers many more features:

  • Data(Lazy)Frame-like structures with persistent user-defined information

  • Structures for deferred frame construction

  • Configurable exploratory analysis

  • Beautiful terminal and HTML outputs

Paguro Design Principles

Built to complement Polars

You can see Paguro as an extension of the Polars API. Use Polars structures alongside, and within, Paguro's objects.

Lazy

We compose with Polars LazyFrame so your transformations and validation remain fully optimized.

Ease of use

Intuitive and expressive API.

Quick examples

In [1]: import paguro as pg
   ...: import polars as pl

Validation

Validators

In [2]: valid_amount = pg.vcol("total_amount", ge=0)
[Figure: orders dataset]
try:
    valid_amount.validate(orders)
except pg.exceptions.ValidationError as e:
    print(e)
In [3]: valid_frame = pg.vframe(
   ...:     pg.vcol("total_amount", ge=0),
   ...:     delivery_after_order=pl.col("delivery_date") >= pl.col("order_date")
   ...: )
   ...: 
[Figure: orders dataset]
try:
    valid_frame.validate(orders)
except pg.exceptions.ValidationError as e:
    print(e)
In [4]: valid_frame = pg.vframe(
   ...:     pg.vcol("total_amount", ge=0),
   ...:     name="orders",
   ...:     delivery_after_order=pl.col("delivery_date") >= pl.col("order_date")
   ...: )
   ...: 

In [5]: valid_relations = pg.vrelations(
   ...:     valid_frame,
   ...:     relations="orders[customer_id] < customers[id]"
   ...: )
   ...: 
[Figure: orders and customers datasets]
try:
    valid_relations.validate({"orders": orders, "customers": customers})
except pg.exceptions.RelationValidationError as e:
    print(e)

License

Paguro is distributed under the Apache License, Version 2.0.

© 2025 Bernardo Dionisi | SPDX-License-Identifier: Apache-2.0

Acknowledgements

The open-source community continues to amaze us with its creativity and generosity in sharing knowledge. We're thrilled to be part of this ecosystem and hope Paguro contributes something meaningful to it.

First and foremost, massive thanks to the incredible team behind Polars.

The Polars team's dedication to building a lightning-fast query engine with an elegant DataFrame API made developing Paguro genuinely enjoyable.

Special appreciation goes to the many libraries that inspired parts of Paguro, including: Rich, which showed us how beautiful terminal output can be; skimpy, which demonstrated the power of intuitive data summarization; pandera, which pushed us to think about statistical validation; dataframely, which encouraged us to integrate cross-frame validation; and pydantic, which set the pace for powerful and intuitive validation.