differences.ATTgt.fit(formula: str, weights_name: str = None, control_group: str = 'never_treated', base_delta: str | list | dict = 'base', est_method: str | Callable = 'dr', as_repeated_cross_section: bool = None, boot_iterations: int = 0, random_state: int = None, alpha: float = 0.05, cluster_var: list | str = None, split_sample_by: Callable | str | dict = None, n_jobs: int = 1, backend: str = 'loky', progress_bar: bool = True) -> DataFrame

Computes the cohort-time-(stratum) average treatment effects: one effect for each cohort, in each time period (and, if the sample is split, for each stratum).

Parameters:
formula : str

Wilkinson formula for the outcome variable and covariates

If there are no covariates, the formula must contain only the name of the outcome variable.

# example with covariates
formula = 'y ~ a + b + a:b'

# example without covariates
formula = 'y'

Formulas are implemented using formulaic; refer to its documentation for additional details.
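To make the formula terms concrete, the sketch below spells out in plain Python the covariate columns that 'a + b + a:b' describes: the two main effects plus their interaction, which for numeric covariates is the elementwise product. The helper name and the sample rows are hypothetical. In practice formulaic builds the model matrix (and may add an intercept), so this only illustrates what the terms mean.

```python
# Hypothetical rows; 'a' and 'b' are numeric covariates, 'y' is the outcome.
rows = [
    {"y": 1.0, "a": 2.0, "b": 3.0},
    {"y": 0.5, "a": 1.0, "b": 4.0},
]


def expand_terms(row):
    """Return the covariate columns described by 'a + b + a:b' for one row."""
    return {"a": row["a"], "b": row["b"], "a:b": row["a"] * row["b"]}


# One dict of covariate columns per observation.
design = [expand_terms(r) for r in rows]
```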

weights_name: str = None

The name of the column containing the sampling weights. If None, all observations have the same weight.

control_group: str = 'never_treated'

  • "never_treated"

  • "not_yet_treated"

base_delta: str | list | dict = 'base'

Use base period values for covariates and/or delta values, i.e. the change between a covariate's value at a given time and its value at the base period.

Available options are:

  • "base"

    the value of each covariate is set to its base period value

  • "delta"

    the value of each time-varying covariate is set to the delta. Time-constant covariates included through x_formula are dropped, and a warning is issued.

  • ["base", "delta"] or "base_delta"

    the value of each covariate is set to its base period value, and the value of each time-varying covariate is set to the delta.

  • {'base': ['a', 'b', ..]}

    the value of the specified covariates is set to its base period value, and the value of each time-varying covariate is set to the delta. A warning is issued if x_formula included time-constant covariates that are not included in base_delta.

  • {'delta': ['c', 'd', ..]}

    the value of each covariate is set to its base period value, and the value of the specified time-varying covariates is set to the delta. If the covariates included in ‘delta’ are not time-varying they will be removed from the list.

  • {'base': ['a', 'b', ..], 'delta': ['c', 'd', ..]}

    the value of the specified covariates is set to its base period value, and the value of the specified time-varying covariates is set to the delta. A warning is issued if x_formula included time-constant covariates that are not included in ‘delta’. If the covariates included in ‘delta’ are not time-varying they will be removed from the list.
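The difference between "base" and "delta" values can be sketched in plain Python. The example below mimics the {'delta': ['c']} case for a single unit: every covariate is kept at its base period value, and the time-varying covariate c additionally gets a delta column. The records, the choice of base period, the helper name and the _base/_delta column suffixes are all assumptions made for illustration; they are not taken from the library.

```python
# Hypothetical long-format records for one unit: covariate 'c' varies over
# time, covariate 'a' is constant. The base period is assumed to be time 1.
records = [
    {"time": 1, "a": 5.0, "c": 10.0},
    {"time": 2, "a": 5.0, "c": 12.0},
    {"time": 3, "a": 5.0, "c": 15.0},
]
BASE_PERIOD = 1  # assumption for this sketch


def base_and_delta(records, base_period, base_cols, delta_cols):
    """Attach base period values and deltas (value minus base value)."""
    base_row = next(r for r in records if r["time"] == base_period)
    out = []
    for r in records:
        row = {"time": r["time"]}
        for col in base_cols:
            row[f"{col}_base"] = base_row[col]  # frozen at the base period
        for col in delta_cols:
            row[f"{col}_delta"] = r[col] - base_row[col]  # change since base
        out.append(row)
    return out


result = base_and_delta(
    records, BASE_PERIOD, base_cols=["a", "c"], delta_cols=["c"]
)
```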

est_method: str | Callable = 'dr'

  • "dr-mle" or "dr"

    for locally efficient doubly robust DiD estimator, with logistic propensity score model for the probability of being treated

  • "dr-ipt"

    for locally efficient doubly robust DiD estimator, with propensity score estimated using inverse probability tilting

  • "reg"

    for outcome regression DiD estimator

  • "std_ipw-mle" or "std_ipw"

    for standardized inverse probability weighted DiD estimator, with logistic propensity score model for the probability of being treated
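As a rough illustration of what "standardized" means for the IPW option, the sketch below computes a standardized (Hajek-type) IPW DiD estimate for a simple two-period panel. It follows the textbook form of the estimator and is not necessarily how the library computes it. The function name and data are hypothetical, and the propensity scores are taken as given rather than fitted with a logistic model.

```python
def std_ipw_att(delta_y, treated, pscore):
    """Standardized IPW DiD estimate for a two-period panel.

    delta_y : outcome change (post minus pre) per unit
    treated : 1 if the unit is treated, 0 if it is a control
    pscore  : probability of treatment per unit (assumed given here)
    """
    # Treated units get weight 1; controls are reweighted by the odds p/(1-p).
    w_treat = [float(d) for d in treated]
    w_ctrl = [p * (1 - d) / (1 - p) for d, p in zip(treated, pscore)]
    # Dividing by the sum of the weights is the "standardized" part.
    mean_treat = sum(w * dy for w, dy in zip(w_treat, delta_y)) / sum(w_treat)
    mean_ctrl = sum(w * dy for w, dy in zip(w_ctrl, delta_y)) / sum(w_ctrl)
    return mean_treat - mean_ctrl


# Hypothetical data: two treated and two control units.
delta_y = [3.0, 2.0, 1.0, 1.0]
treated = [1, 1, 0, 0]
pscore = [0.5, 0.5, 0.5, 0.5]
att = std_ipw_att(delta_y, treated, pscore)
```

With constant propensity scores the weights cancel and the estimate reduces to the plain difference in mean outcome changes between treated and control units.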

as_repeated_cross_section: bool = None

boot_iterations: int = 0

random_state: int = None

alpha: float = 0.05

The significance level.

cluster_var: list | str = None

split_sample_by: Callable | str | dict = None

The name of the column along which to split the data, or a function which takes the data and returns a sample mask for a binary split, for example:

lambda x: x['column name'] >= x['column name'].median()

The estimation of the ATT will be run separately for each specified sample; used for heterogeneity analysis.
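A minimal sketch of such a mask function, assuming the data arrives as a pandas DataFrame. The column name income, the toy data and the function name are hypothetical. The function returns a boolean Series that is True for the upper half of the sample and False for the lower half.

```python
import pandas as pd

# Hypothetical data with a column to split on.
df = pd.DataFrame({"income": [10, 20, 30, 40], "y": [1.0, 2.0, 3.0, 4.0]})


def above_median_income(x: pd.DataFrame) -> pd.Series:
    """Boolean mask: True where income is at or above the sample median."""
    return x["income"] >= x["income"].median()


mask = above_median_income(df)
# The two subsamples implied by the mask.
upper, lower = df[mask], df[~mask]
```

A named function like this could be passed in place of the lambda above; it also makes the split easy to inspect before running the estimation.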

n_jobs: int = 1

The maximum number of concurrently running jobs. If -1 all CPUs are used.

If n_jobs ≠ 1, concurrent jobs will be run for two separate tasks:

  • computing the cohort-time ATT; each cohort-time is assigned to a job

  • computing the bootstrap; the influence function is split into n_jobs parts and the bootstrap is computed concurrently for each part

Parallelization is implemented using joblib; refer to its documentation for additional details on n_jobs.
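The "one job per cohort-time" idea can be sketched as follows. To stay dependency-free, this example swaps in the standard library's ThreadPoolExecutor for joblib, so it only illustrates the task assignment and not the library's actual code. The cohort-time pairs and the placeholder function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (cohort, time) pairs; each one is an independent task.
cohort_times = [(2005, 2004), (2005, 2005), (2006, 2005), (2006, 2006)]


def estimate_cell(cell):
    """Placeholder for the per-cell estimation; returns a dummy record."""
    cohort, time = cell
    return {"cohort": cohort, "time": time, "post": time >= cohort}


N_JOBS = 2  # plays the role of n_jobs=2
with ThreadPoolExecutor(max_workers=N_JOBS) as pool:
    # map() hands each cohort-time to a worker and preserves input order.
    results = list(pool.map(estimate_cell, cohort_times))
```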

backend: str = 'loky'

Parallelization backend implementation.

Parallelization is implemented using joblib; refer to its documentation for additional details on backend.

progress_bar: bool = True

If True, a progress bar displays the progress over the cohort-time iterations and/or over the concurrent bootstrap splits (not the bootstrap iterations).

Return type:

A DataFrame with the group-time ATTs