- class paguro.LazyDataset[VFM: paguro.models.vfm.VFrameModel]
A LazyFrame-like structure with validation and other extensions.
Constructors
- LazyDataset(...)
Initialize the LazyDataset.
Validation + Model
Validation
- with_validation(*validators, ...) Self
Add validation to the dataset.
- property validation : Validation | None
Retrieves the validation that has been added to the dataset.
- validate(*validators, ...) Self
Validate the dataset.
Model
- collect_model_blueprint(...) str | None
Generate a blueprint for a model of the dataset based on VFrameModel.
Information
EDA
- skim(config: list[tuple] | None = None, *, ...) Collection
Generate a summary of the dataset based on specified configurations.
Export
- to_polars() polars.LazyFrame
- to_dataframe() polars.DataFrame
To Polars DataFrame. Collects if the underlying dataset is a LazyFrame.
- to_lazyframe() polars.LazyFrame
To Polars LazyFrame.
Polars Methods
Adapted
Some polars methods have been adapted to manage model/validation/info or to accept
paguro’s types as arguments. The computation over the data is still handled with polars.
- sink_parquet(path, ...) None
Sink parquet.
- group_by(*by, ...) _GroupBy[LazyDataset]
Polars’ group_by.
- group_by_dynamic(index_column, ...) _GroupBy[LazyDataset]
Polars’ group_by_dynamic.
- rolling(index_column: IntoExpr, *, ...) _GroupBy[LazyDataset]
Polars’ rolling.
- join(other: LazyDataset[U] | polars.LazyFrame, ...) Self
Polars’ join.
- join_asof(other: LazyDataset[U] | polars.LazyFrame, *, ...) Self
Polars’ join_asof.
- join_where(other: LazyDataset[U] | polars.LazyFrame, ...) Self
Polars’ join_where.
- merge_sorted(other: LazyDataset[U] | polars.LazyFrame, key) Self
Polars’ merge_sorted.
Delegated
- set_sorted(column: str, *, descending: bool = False) Self
See set_sorted
- lazy() Self
Polars’ .lazy.
- approx_n_unique() Self
See approx_n_unique
- collect_async(...) Awaitable[polars.DataFrame] | _GeventDataFrameResult[polars.DataFrame]
See collect_async
- collect_batches(*, ...) Iterator[polars.DataFrame]
See collect_batches
- collect_schema() Schema
See collect_schema
- drop_nulls(...) Self
See drop_nulls
- explain(*, format: ExplainFormat = 'plain', ...) str
See explain
- filter(*predicates, ...) Self
See filter
- gather_every(n: int, offset: int = 0) Self
See gather_every
- interpolate() Self
See interpolate
- map_batches(function, ...) Self
See map_batches
- match_to_schema(schema: SchemaDict | Schema, *, ...) Self
See match_to_schema
- null_count() Self
See null_count
- pipe_with_schema(function) Self
See pipe_with_schema
- remove(*predicates, ...) Self
See remove
- select(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs) Self
See select
- select_seq(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self
See select_seq
- show_graph(*, optimized: bool = True, ...) str | None
See show_graph
- sink_batches(...) polars.LazyFrame | None
See sink_batches
- sink_ndjson(...) polars.LazyFrame | None
See sink_ndjson
- with_columns(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self
See with_columns
- with_columns_seq(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self
See with_columns_seq
- with_context(other: Self | list[Self]) Self
See with_context
- with_row_count(name: str = 'row_nr', offset: int = 0) Self
See with_row_count
- with_row_index(name: str = 'index', offset: int = 0) Self
See with_row_index
Properties
Methods
- with_model[U](model: type[U], *, ...) LazyDataset[U]