- class paguro.Dataset[VFM: paguro.models.vfm.VFrameModel]
A DataFrame-like structure with validation and other extensions.
Constructors
- Dataset(...)
Initializes the Dataset class.
Validation + Model
Validation
- with_validation(*validators, ...) Self
Add validation to the dataset.
- property validation : Validation | None
Retrieves the validation that has been added to the dataset.
- validate(*validators, ...) Self
Validate the dataset.
Model
- without_model() Dataset[Any]
Remove the model and validation from the dataset.
- collect_model_blueprint(...) str | None
Generate a blueprint for a model of the dataset based on VFrameModel.
Information
EDA
- skim(config: list[tuple] | None = None, *, ...) Collection
Generate a summary of the dataset based on specified configurations.
Export
- to_polars() polars.DataFrame
- to_dataframe() polars.DataFrame
To Polars DataFrame. Collects if the underlying dataset is a LazyFrame.
- to_lazyframe() polars.LazyFrame
To Polars LazyFrame.
Polars Methods
Adapted
Some polars methods have been adapted to manage model/validation/info or to accept
paguro’s types as arguments. The computation over the data is still handled with polars.
- write_parquet(file: str | Path | IO[bytes], *, ...) None
Write parquet.
- lazy() LazyDataset[VFM]
Polars’ .lazy.
- group_by_dynamic(index_column: IntoExpr, ...) _GroupBy[Dataset]
Polars’ group_by_dynamic.
- rolling(index_column: IntoExpr, *, period, ...) _GroupBy[Dataset]
Polars’ rolling.
- join_where(other: Dataset[U] | polars.DataFrame, ...) Self
Polars’ join_where.
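The adapted methods listed above keep the dataset's model/validation attached while polars performs the computation. The following is a minimal, plain-Python sketch of that general wrapping pattern, not paguro's actual implementation. `PlainFrame` and `WrappedDataset` are hypothetical stand-ins invented for this illustration (for `polars.DataFrame` and `Dataset` respectively), and the `validation` value is just an opaque label here.

```python
from __future__ import annotations

from typing import Any, Callable


class PlainFrame:
    """Hypothetical stand-in for a polars.DataFrame: a dict of equal-length columns."""

    def __init__(self, columns: dict[str, list[Any]]) -> None:
        self.columns = columns

    def filter(self, predicate: Callable[[dict[str, Any]], bool]) -> PlainFrame:
        # Rebuild the rows, keep those matching the predicate, return a new frame.
        names = list(self.columns)
        rows = [dict(zip(names, values)) for values in zip(*self.columns.values())]
        kept = [row for row in rows if predicate(row)]
        return PlainFrame({name: [row[name] for row in kept] for name in names})

    def height(self) -> int:
        # Number of rows (0 for a frame without columns).
        return len(next(iter(self.columns.values()), []))


class WrappedDataset:
    """Hypothetical wrapper: a frame plus metadata that should survive method calls."""

    def __init__(self, frame: PlainFrame, validation: Any = None) -> None:
        self._frame = frame
        self.validation = validation

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes that are not defined on the wrapper itself.
        if name.startswith("_"):
            raise AttributeError(name)
        attr = getattr(self._frame, name)
        if not callable(attr):
            return attr

        def delegated(*args: Any, **kwargs: Any) -> Any:
            result = attr(*args, **kwargs)
            # Frame-returning calls are re-wrapped so the metadata is carried over;
            # any other result (int, list, ...) is passed through unchanged.
            if isinstance(result, PlainFrame):
                return WrappedDataset(result, self.validation)
            return result

        return delegated


ds = WrappedDataset(
    PlainFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}),
    validation="a > 0",  # placeholder label, not a real validator
)
filtered = ds.filter(lambda row: row["a"] >= 2)
```

In this sketch, `filtered` is again a `WrappedDataset` carrying the same `validation` label, while `filtered.height()` returns a plain integer. This mirrors the listing's distinction between methods returning `Self` and those returning polars or Python objects.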
Delegated
- set_sorted(column: str, *, descending: bool = False) Self
See set_sorted
- approx_n_unique() Self
See approx_n_unique
- collect_schema() Schema
See collect_schema
- drop_in_place(name: str) polars.Series
See drop_in_place
- drop_nulls(...) Self
See drop_nulls
- estimated_size(unit: SizeUnit = 'b') int | float
See estimated_size
- filter(*predicates, ...) Self
See filter
- gather_every(n: int, offset: int = 0) Self
See gather_every
- get_column(name: str, *, ...) polars.Series | Any
See get_column
- get_columns() list[polars.Series]
See get_columns
- insert_column(index: int, column: IntoExprColumn) Self
See insert_column
- interpolate() Self
See interpolate
- is_duplicated() polars.Series
See is_duplicated
- iter_columns() Iterator[polars.Series]
See iter_columns
- iter_slices(n_rows: int = 10000) Iterator[polars.DataFrame]
See iter_slices
- map_columns(column_names, ...) Self
See map_columns
- match_to_schema(schema: SchemaDict | Schema, *, ...) Self
See match_to_schema
- max_horizontal() polars.Series
See max_horizontal
- mean_horizontal(*, ignore_nulls: bool = True) polars.Series
See mean_horizontal
- min_horizontal() polars.Series
See min_horizontal
- null_count() Self
See null_count
- partition_by(...) list[polars.DataFrame] | dict[tuple[Any, ...], polars.DataFrame]
See partition_by
- remove(*predicates, ...) Self
See remove
- replace_column(index: int, column: polars.Series) Self
See replace_column
- rows_by_key(key, ...) dict[Any, Any]
See rows_by_key
- select(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs) Self
See select
- select_seq(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self
See select_seq
- shrink_to_fit(*, in_place: bool = False) Self
See shrink_to_fit
- sum_horizontal(*, ignore_nulls: bool = True) polars.Series
See sum_horizontal
- transpose(*, include_header: bool = False, ...) Self
See transpose
- with_columns(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self
See with_columns
- with_columns_seq(*exprs: IntoExpr | Iterable[IntoExpr], ...) Self
See with_columns_seq
- with_row_count(name: str = 'row_nr', offset: int = 0) Self
See with_row_count
- with_row_index(name: str = 'index', offset: int = 0) Self
See with_row_index
- write_avro(file: str | Path | IO[bytes], ...) None
See write_avro
- write_clipboard(*, separator: str = '\t', **kwargs: Any) None
See write_clipboard
- write_database(table_name: str, connection, ...) int
See write_database
- write_delta(target, ...) deltalake.table.TableMerger | None
See write_delta
- write_excel(...) Workbook
See write_excel
- write_iceberg(target: str | pyiceberg.table.Table, mode) None
See write_iceberg
- write_ipc_stream(file, ...) BytesIO | None
See write_ipc_stream
- write_json(file: IOBase | str | Path | None = None) str | None
See write_json
- write_ndjson(...) str | None
See write_ndjson
- to_arrow(*, compat_level: CompatLevel | None = None) pa.Table
See to_arrow
- to_dummies(...) Self
See to_dummies
- to_init_repr(n: int = 1000) str
See to_init_repr