class paguro.LazyDataset[VFM: paguro.models.vfm.VFrameModel]

A LazyFrame-like structure with validation and other extensions.

Constructors

LazyDataset(...)

Initialize the LazyDataset.

Validation + Model

Validation

with_validation(*validators, ...) Self

Add validation to the dataset.

property validation : Validation | None

Retrieves the validation that has been added to the dataset.

validate(*validators, ...) Self

Validate the dataset.

Model

without_model() LazyDataset[Any]
collect_model_blueprint(...) str | None

Generate a blueprint for a model of the dataset based on VFrameModel.

Information

EDA

skim(config: list[tuple] | None = None, *, ...) Collection

Generate a summary of the dataset based on specified configurations.

Export

to_polars() polars.LazyFrame
to_dataframe() polars.DataFrame

To Polars DataFrame. Collects if the underlying dataset is a LazyFrame.

to_lazyframe() polars.LazyFrame

To Polars LazyFrame.

Polars Methods

Adapted

Some polars methods have been adapted to manage model/validation/info or to accept paguro’s types as arguments. The computation over the data is still handled with polars.
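
To make the adapted/delegated distinction concrete, here is a minimal pure-Python sketch. It is not paguro's implementation: `FakeFrame`, `Wrapper`, and the `_info` attribute are hypothetical stand-ins. A delegated method simply forwards to the wrapped frame and re-wraps the result, while an adapted method also updates the wrapper's own metadata.

```python
from __future__ import annotations

from typing import Any


class FakeFrame:
    """Hypothetical stand-in for a polars LazyFrame: it only records operations."""

    def __init__(self, ops: tuple = ()) -> None:
        self.ops = tuple(ops)

    def filter(self, predicate: str) -> FakeFrame:
        return FakeFrame(self.ops + (("filter", predicate),))

    def rename(self, mapping: dict[str, str]) -> FakeFrame:
        return FakeFrame(self.ops + (("rename", tuple(mapping.items())),))


class Wrapper:
    """Hypothetical wrapper carrying per-column info next to the data."""

    def __init__(self, data: FakeFrame, info: dict[str, Any] | None = None) -> None:
        self._data = data
        self._info = dict(info or {})

    def _rewrap(self, data: FakeFrame, info: dict[str, Any] | None = None) -> Wrapper:
        # Keep the existing info unless the caller supplies an updated copy.
        return Wrapper(data, self._info if info is None else info)

    def filter(self, predicate: str) -> Wrapper:
        # Delegated: forward the call and re-wrap; the metadata is untouched.
        return self._rewrap(self._data.filter(predicate))

    def rename(self, mapping: dict[str, str]) -> Wrapper:
        # Adapted: forward the call and also rename the keys of the metadata.
        info = {mapping.get(k, k): v for k, v in self._info.items()}
        return self._rewrap(self._data.rename(mapping), info)


w = Wrapper(FakeFrame(), {"price": "in USD"})
renamed = w.filter("price > 0").rename({"price": "cost"})
```

In this sketch the data operations are recorded by the wrapped object, and only `rename` needs extra bookkeeping, which mirrors why a method such as rename would be listed under Adapted rather than Delegated.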

sink_parquet(path, ...) None

Polars’ sink_parquet.

collect(**kwargs: Any) Dataset[VFM]

Polars’ collect.

group_by(*by, ...) _GroupBy[LazyDataset]

Polars’ group_by.

group_by_dynamic(index_column, ...) _GroupBy[LazyDataset]

Polars’ group_by_dynamic.

rolling(index_column: IntoExpr, *, ...) _GroupBy[LazyDataset]

Polars’ rolling.

join(other: LazyDataset[U] | polars.LazyFrame, ...) Self

Polars’ join.

join_asof(other: LazyDataset[U] | polars.LazyFrame, *, ...) Self

Polars’ join_asof.

join_where(other: LazyDataset[U] | polars.LazyFrame, ...) Self

Polars’ join_where.

merge_sorted(other: LazyDataset[U] | polars.LazyFrame, key) Self

Polars’ merge_sorted.

rename(mapping, ...) Self

Polars’ rename.

Delegated

set_sorted(column: str, *, descending: bool = False) Self

See set_sorted

lazy() Self

Polars’ lazy.

approx_n_unique() Self

See approx_n_unique

bottom_k(k: int, *, by: IntoExpr | Iterable[IntoExpr], ...) Self

See bottom_k

cache() Self

See cache

cast(dtypes, ...) Self

See cast

clear(n: int = 0) Self

See clear

clone() Self

See clone

collect_async(...) Awaitable[polars.DataFrame] | _GeventDataFrameResult[polars.DataFrame]

See collect_async

collect_batches(*, ...) Iterator[polars.DataFrame]

See collect_batches

collect_schema() Schema

See collect_schema

property columns : Any

See columns

count() Self

See count

describe(...) polars.DataFrame

See describe

drop(*columns, ...) Self

See drop

drop_nans(...) Self

See drop_nans

drop_nulls(...) Self

See drop_nulls

property dtypes : Any

See dtypes

explain(*, format: ExplainFormat = 'plain', ...) str

See explain

explode(columns, ...) Self

See explode

fetch(n_rows: int = 500, **kwargs: Any) polars.DataFrame

See fetch

fill_nan(value: int | float | Expr | None) Self

See fill_nan

fill_null(value: Any | Expr | None = None, ...) Self

See fill_null

filter(*predicates, ...) Self

See filter

first() Self

See first

gather_every(n: int, offset: int = 0) Self

See gather_every

head(n: int = 5) Self

See head

inspect(fmt: str = '{}') Self

See inspect

interpolate() Self

See interpolate

last() Self

See last

limit(n: int = 5) Self

See limit

map_batches(function, ...) Self

See map_batches

match_to_schema(schema: SchemaDict | Schema, *, ...) Self

See match_to_schema

max() Self

See max

mean() Self

See mean

median() Self

See median

melt(...) Self

See melt

min() Self

See min

null_count() Self

See null_count

pipe(function, ...) T

See pipe

pipe_with_schema(function) Self

See pipe_with_schema

profile(*, ...) tuple[polars.DataFrame, polars.DataFrame]

See profile

quantile(quantile: float | Expr, ...) Self

See quantile

remote(...) pc.LazyFrameRemote

See remote

remove(*predicates, ...) Self

See remove

reverse() Self

See reverse

property schema : Any

See schema

select(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs) Self

See select

select_seq(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self

See select_seq

shift(n: int | IntoExprColumn = 1, *, ...) Self

See shift

show_graph(*, optimized: bool = True, ...) str | None

See show_graph

sink_batches(...) polars.LazyFrame | None

See sink_batches

sink_csv(...) polars.LazyFrame | None

See sink_csv

sink_ipc(...) polars.LazyFrame | None

See sink_ipc

sink_ndjson(...) polars.LazyFrame | None

See sink_ndjson

slice(offset: int, length: int | None = None) Self

See slice

sort(by: IntoExpr | Iterable[IntoExpr], *more_by, ...) Self

See sort

sql(query: str, *, table_name: str = 'self') Self

See sql

std(ddof: int = 1) Self

See std

sum() Self

See sum

tail(n: int = 5) Self

See tail

top_k(k: int, *, by: IntoExpr | Iterable[IntoExpr], ...) Self

See top_k

unique(...) Self

See unique

unnest(columns, ...) Self

See unnest

unpivot(...) Self

See unpivot

update(other: polars.LazyFrame, ...) Self

See update

var(ddof: int = 1) Self

See var

property width : Any

See width

with_columns(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self

See with_columns

with_columns_seq(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self

See with_columns_seq

with_context(other: Self | list[Self]) Self

See with_context

with_row_count(name: str = 'row_nr', offset: int = 0) Self

See with_row_count

with_row_index(name: str = 'index', offset: int = 0) Self

See with_row_index

Properties

property vcol : VFM
property model : VFM | None

Methods

with_model[U](model: type[U], *, ...) LazyDataset[U]
with_info(k: str, /, **mapping: Any) Self
with_name(name: str | None) Self