infovar.handlers package

Submodules

infovar.handlers.continuous_handler module

class infovar.handlers.continuous_handler.ContinuousHandler[source]

Bases: Handler

Class for easily calculating, manipulating and saving calculations of statistical relationships between variables and targets estimated over sliding windows. The term continuous means that the calculation is performed for a large number of windows in order to approach a continuous result.

check_settings(settings: Dict[str, Any]) Dict[str, Any][source]

Verifies the validity of settings and, if necessary, returns a modified version. The dictionary is not modified in place.

Parameters:

settings (Dict[str, Any]) –

Settings dictionary for statistics computation. Format:

  • statistics: List[str] – names of statistics to compute. If you want to use a custom statistic, consider calling set_additional_stats before.

  • windows:

    • features: str or List[str] – sliding window features. These may be targets other than those for which the statistics are calculated.

    • bounds: List[int, int] or List[List[int, int]] – bounds [start, stop] of sliding windows. The order is the same as for the features list.

    • bounds_include_windows: bool or List[bool], optional – whether the bounds correspond to the limits for the center of the sliding windows (False) or to the side of the windows (True).

    • scale: Literal[linear, log] | List[Literal[linear, log]], optional – scale of sliding windows. Note that the scale can differ between two sliding windows.

    • length: float | str | List[float or str], optional – length of the window. If scale is linear, it corresponds to an additive offset. If scale is log, it corresponds to a multiplicative factor between the two extremities of the window. One, and only one, field among length and num_windows has to be provided.

    • num_windows: float or List[float], optional – number of non-overlapping sliding windows for each feature. One, and only one, field among length and num_windows has to be provided.

    • points: int or List[int], optional – number of sliding windows. One, and only one, field among points and overlap has to be provided.

    • overlap: float or str or List[float or str], optional – percentage of overlap between two consecutive sliding windows (the interpretation depends on the chosen scale). This constrains the number of windows. One, and only one, field among points and overlap has to be provided.

  • min_samples: int, optional – minimum number of samples to use for computation. If the actual number of available samples is lower, the result is set to NaN.

  • max_samples: int, optional – maximum number of samples to use for computation. If the actual number of available samples is higher, max_samples random samples are drawn.

  • uncertainty: Dict[key, entry], optional – key is a statistic name and entry is a Dict with keys “name” (field to provide the name of the Resampler to use) and “args” (field to provide keyword arguments for Resampler). If you want to use a custom resampling, consider calling set_additional_resamplings before.
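The difference between a linear and a log window length can be illustrated with a short sketch. The window_extent helper below is hypothetical and not part of infovar. It assumes that a window is described by its center, and that the center of a log-scale window is the geometric mean of its two ends; the library may use a different convention.

```python
import math
from typing import Tuple


def window_extent(center: float, length: float, scale: str = "linear") -> Tuple[float, float]:
    """Return the (low, high) ends of a window around `center`.

    Hypothetical helper: the centering convention is an assumption.
    """
    if scale == "linear":
        # Additive offset: high - low == length.
        half = length / 2
        return center - half, center + half
    if scale == "log":
        # Multiplicative factor: high / low == length.
        root = math.sqrt(length)
        return center / root, center * root
    raise ValueError(f"Unknown scale: {scale!r}")


lin_low, lin_high = window_extent(10.0, 4.0, scale="linear")
log_low, log_high = window_extent(10.0, 4.0, scale="log")
```

With the same length value, the linear window spans a fixed difference while the log window spans a fixed ratio, which is why the interpretation of length depends on scale.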

Returns:

Potentially amended settings dictionary.

Return type:

Dict[str, Any]
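As an illustration, a settings dictionary following the format above might look like the sketch below. The statistic, feature and resampler names are placeholders, and the exactly_one helper is a hypothetical stand-in for the "one, and only one" rules; it is not the validation performed by check_settings.

```python
from typing import Any, Dict

# Hypothetical settings for a single sliding-window feature named "feature_a".
# "stat_a", "feature_a" and "resampler_a" are placeholders, not infovar names.
settings: Dict[str, Any] = {
    "statistics": ["stat_a"],
    "windows": {
        "features": "feature_a",
        "bounds": [0, 100],
        "bounds_include_windows": True,
        "scale": "linear",
        "length": 10.0,  # mutually exclusive with "num_windows"
        "points": 50,    # mutually exclusive with "overlap"
    },
    "min_samples": 200,
    "max_samples": 10_000,
    "uncertainty": {"stat_a": {"name": "resampler_a", "args": {}}},
}


def exactly_one(d: Dict[str, Any], key_a: str, key_b: str) -> bool:
    """Return True if exactly one of the two keys is present in d."""
    return (key_a in d) != (key_b in d)


windows = settings["windows"]
length_ok = exactly_one(windows, "length", "num_windows")
points_ok = exactly_one(windows, "points", "overlap")
```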

create(x_names: str | Sequence[str], y_names: str | Sequence[str]) None[source]

Creates the statistics directory if it does not exist, as well as the pickle files for features in x_names and y_names.

Parameters:
  • x_names (Union[str, Sequence[str]]) – Variable names.

  • y_names (Union[str, Sequence[str]]) – Target names.

delete_stats(x_names: List[str] | str | None, y_names: str | List[str], stats: str | List[str]) None[source]

Removes the statistics stats for targets y_names and variables x_names. If x_names is None, the statistics are removed for every variable associated with the specified targets.

Parameters:
  • x_names (Optional[Union[str, List[str]]]) – Variables. If None, the statistics are removed for all variables.

  • y_names (Union[str, List[str]]) – Targets.

  • stats (Union[str, List[str]]) – Statistic names.

ext: str = '.pickle'

File extension

filename_main_sep = '___'

Separator between targets and variables

filename_secondary_sep = '+'

Separator between individual targets or variables

get_available_stats(x_names: str | List[str], y_names: str | List[str], window_features: str | List[str]) List[str][source]

Returns all available statistics for targets y_names, variables x_names and sliding window over window_features in saves.

Parameters:
  • x_names (Union[str, List[str]]) – Variables.

  • y_names (Union[str, List[str]]) – Targets.

  • window_features (Union[str, List[str]]) – Features for sliding window.

Returns:

Available statistics.

Return type:

List[str]

get_available_targets() List[List[str]][source]

Returns all available targets in saves.

Returns:

Available targets in saves.

Return type:

List[List[str]]

get_available_variables(y_names: None | str | List[str]) List[List[str]][source]

Returns all available variables for targets y_names in saves.

Parameters:

y_names (Union[None, str, List[str]]) – Targets.

Returns:

Available variables in saves.

Return type:

List[List[str]]

get_available_window_features(x_names: str | List[str], y_names: str | List[str]) List[List[str]][source]

Returns all available sliding window features for targets y_names and variables x_names in saves.

Parameters:
  • x_names (Union[str, List[str]]) – Variables.

  • y_names (Union[str, List[str]]) – Targets.

Returns:

Available sliding window features.

Return type:

List[List[str]]

get_filename(x_names: str | Sequence[str], y_names: str | Sequence[str]) str[source]

Builds a save filename from variable and target names.

Parameters:
  • x_names (Union[str, Sequence[str]]) – Variable names.

  • y_names (Union[str, Sequence[str]]) – Target names.

Returns:

Filename.

Return type:

str
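The role of the filename_main_sep, filename_secondary_sep and ext attributes can be sketched as follows. This is a hypothetical reimplementation for illustration only: the build_filename and split_filename helpers are not infovar functions, and the ordering (variables first, then targets) is an assumption that the documentation does not confirm.

```python
from typing import List, Sequence, Tuple, Union

MAIN_SEP = "___"     # documented value of filename_main_sep
SECONDARY_SEP = "+"  # documented value of filename_secondary_sep
EXT = ".pickle"      # documented value of ext

Names = Union[str, Sequence[str]]


def as_list(names: Names) -> List[str]:
    """Wrap a single name in a list; copy a sequence of names."""
    return [names] if isinstance(names, str) else list(names)


def build_filename(x_names: Names, y_names: Names) -> str:
    # Assumption: variables come first, then targets.
    x_part = SECONDARY_SEP.join(as_list(x_names))
    y_part = SECONDARY_SEP.join(as_list(y_names))
    return x_part + MAIN_SEP + y_part + EXT


def split_filename(filename: str) -> Tuple[List[str], List[str]]:
    # Drop the extension, then split on the main and secondary separators.
    stem = filename[: -len(EXT)] if filename.endswith(EXT) else filename
    x_part, y_part = stem.split(MAIN_SEP)
    return x_part.split(SECONDARY_SEP), y_part.split(SECONDARY_SEP)


name = build_filename(["var_a", "var_b"], "target_a")
parsed = split_filename(name)
```

Under these assumptions, building and then splitting a filename recovers the original names, provided that no name contains one of the separators.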

parse_filename(filename: str) Tuple[Sequence[str], Sequence[str]][source]

Identifies variables and targets from formatted save filename.

Parameters:

filename (str) – Save filename.

Returns:

  • Sequence[str] – Identified variables.

  • Sequence[str] – Identified targets.

read(x_names: str | Sequence[str], y_names: str | Sequence[str], wins_features: str | Sequence[str]) Dict[str, Any][source]

Returns entries for variables x_names, targets y_names and sliding window features wins_features.

Parameters:
  • x_names (Union[str, Sequence[str]]) – Variable names.

  • y_names (Union[str, Sequence[str]]) – Target names.

  • wins_features (Union[str, Sequence[str]]) – Sliding window feature names.

Returns:

Saved entries for the requested variables, targets and sliding window features.

Return type:

Dict[str, Any]

remove(x_names: Sequence[str | Sequence[str]] | None, y_names: Sequence[str | Sequence[str]] | None) None[source]

Removes saved results. If x_names and y_names are both None, removes the whole directory if it exists. If only x_names is None, removes all pickle files that match targets y_names. If only y_names is None, removes all pickle files that match variables x_names. If neither is None, removes the corresponding pickle file.

Parameters:
  • x_names (Optional[Sequence[Union[str, Sequence[str]]]]) – Variable names. If None, remove all files that match targets y_names.

  • y_names (Optional[Sequence[Union[str, Sequence[str]]]]) – Target names. If None, remove all files that match variables x_names.

store(x_names: str | List[str], y_names: str | List[str], settings: Dict[str, Any], overwrite: bool = False, raise_error: bool = False) None[source]

Computes and saves statistics. Detailed instructions are provided by settings. If overwrite is True, existing results are overwritten. Else, they are kept.

Parameters:
  • x_names (Union[str, List[str]]) – Variable or set of variable names.

  • y_names (Union[str, List[str]]) – Target or set of target names.

  • settings (Dict[str, Any]) – Instructions for computation. More details on the dictionary format are given in the check_settings documentation.

  • overwrite (bool, optional) – Whether existing results must be overwritten, by default False (existing results kept).

  • raise_error (bool, optional) – Whether the function should propagate errors that occur during the calculation of statistics. If False, the corresponding entries are set to None. By default False.

infovar.handlers.discrete_handler module

class infovar.handlers.discrete_handler.DiscreteHandler[source]

Bases: Handler

Class for easily calculating, manipulating and saving calculations of statistical relationships between variables and targets according to predefined situations (restrictions). The term “discrete” means that the calculation is performed for a finite number of independent restrictions.

static check_settings(settings: Dict[str, Any]) Dict[str, Any][source]

Verifies the validity of settings and, if necessary, returns a modified version. The dictionary is not modified in place.

Parameters:

settings (Dict[str, Any]) –

Settings dictionary for statistics computation. Format:

  • statistics: List[str] – names of statistics to compute. If you want to use a custom statistic, consider calling set_additional_stats before.

  • restrictions: List[str], optional – names of restrictions to use. The definitions of the restrictions are provided through the set_restrictions method.

  • uncertainty: Dict[key, entry], optional – key is a statistic name and entry is a Dict with keys “name” (field to provide the name of the Resampler to use) and “args” (field to provide keyword arguments for Resampler). If you want to use a custom resampling, consider calling set_additional_resamplings before.

  • min_samples: int, optional – minimum number of samples to use for computation. If the actual number of available samples is lower, the result is set to None.

  • max_samples: int, optional – maximum number of samples to use for computation. If the actual number of available samples is higher, max_samples random samples are drawn.

Returns:

Potentially amended settings dictionary.

Return type:

Dict[str, Any]
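A settings dictionary in this format, together with the min_samples and max_samples policy described above, could be sketched as follows. All names are placeholders, and select_samples is a hypothetical helper, not an infovar function; drawing without replacement and the seed argument are assumptions.

```python
import random
from typing import Any, Dict, List, Optional

# Hypothetical settings: statistic, restriction and resampler names are placeholders.
settings: Dict[str, Any] = {
    "statistics": ["stat_a"],
    "restrictions": ["restriction_a"],
    "uncertainty": {"stat_a": {"name": "resampler_a", "args": {}}},
    "min_samples": 3,
    "max_samples": 5,
}


def select_samples(
    samples: List[float], min_samples: int, max_samples: int, seed: int = 0
) -> Optional[List[float]]:
    """Apply the min_samples / max_samples policy to a list of samples."""
    if len(samples) < min_samples:
        return None  # too few samples: the result is set to None
    if len(samples) > max_samples:
        # Too many samples: draw max_samples of them at random
        # (without replacement, which is an assumption).
        return random.Random(seed).sample(samples, max_samples)
    return list(samples)


lo, hi = settings["min_samples"], settings["max_samples"]
too_few = select_samples([1.0, 2.0], lo, hi)
kept = select_samples([1.0, 2.0, 3.0, 4.0], lo, hi)
drawn = select_samples([float(i) for i in range(20)], lo, hi)
```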

create(y_names: str | Sequence[str])[source]

Creates the statistics directory if it does not exist, as well as the JSON files for features in y_names.

Parameters:

y_names (Union[str, Sequence[str]]) – Target names.

delete_stats(x_names: List[str] | str | None, y_names: str | List[str], stats: str | List[str]) None[source]

Removes the statistics stats for variables x_names and targets y_names. If x_names is None, the statistics are removed for every variable associated with the specified targets.

Parameters:
  • x_names (Optional[Union[str, List[str]]], optional) – Variable names. If None, the statistics are removed for all variables.

  • y_names (Union[str, List[str]]) – Target names.

  • stats (Union[str, List[str]]) – Statistic names.

ext: str = '.json'

File extension

filename_sep: str = '_'

Separator between targets

get_available_restrictions(x_names: str | List[str], y_names: str | List[str]) List[str][source]

Returns all available restrictions for variables x_names and targets y_names in saves.

Parameters:
  • x_names (Union[str, List[str]]) – Variables.

  • y_names (Union[str, List[str]]) – Targets.

Returns:

Available restrictions.

Return type:

List[str]

get_available_stats(x_names: str | List[str], y_names: str | List[str], restriction: str) List[str][source]

Returns all available statistics for variables x_names, targets y_names, and restriction restriction in saves.

Parameters:
  • x_names (Union[str, List[str]]) – Variables.

  • y_names (Union[str, List[str]]) – Targets.

  • restriction (str) – Restriction.

Returns:

Available statistics.

Return type:

List[str]

get_available_targets() List[List[str]][source]

Returns all available targets in saves.

Returns:

Available targets in saves.

Return type:

List[List[str]]

get_available_variables(y_names: str | List[str]) List[List[str]][source]

Returns all available variables for targets y_names in saves.

Parameters:

y_names (Union[None, str, List[str]]) – Targets.

Returns:

Available variables in saves.

Return type:

List[List[str]]

get_filename(y_names: str | Sequence[str]) str[source]

Builds a save filename from target names.

Parameters:

y_names (Union[str, Sequence[str]]) – Target names.

Returns:

Filename.

Return type:

str

parse_filename(filename: str) Sequence[str][source]

Identifies data names from save filename.

Parameters:

filename (str) – Save filename.

Returns:

Target names.

Return type:

Sequence[str]

read(x_names: str | List[str] | Iterable[List[str]], y_names: str | List[str], restr: str, iterable_x: bool = False, default: str = 'raise') Dict[str, Any] | List[Dict[str, Any]][source]

Returns entries for variables x_names and targets y_names.

Parameters:
  • x_names (Union[str, List[str], Iterable[List[str]]]) – Variables. If Iterable, you must set the iterable_x argument to True.

  • y_names (Union[str, List[str]]) – Targets.

  • restr (str) – Restriction.

  • iterable_x (bool, optional) – If True, the x_names argument is considered as a list of different variables. Default False.

  • default (Any, optional) – Default behavior if the entry does not exist. If “raise”, an error is raised. Otherwise, default is returned instead. By default “raise”.

Returns:

Dictionary corresponding to the variables, targets and restriction. If iterable_x is True, a list of such dictionaries.

Return type:

Union[Dict[str, Any], List[Dict[str, Any]]]
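The behavior of the default argument follows a common sentinel pattern, sketched below on a plain dictionary. The read_entry function and the _saves mapping are hypothetical stand-ins, and the exception type (KeyError) is an assumption; the documentation only states that an error is raised.

```python
from typing import Any, Dict

# Stand-in for the content of a save file (hypothetical data).
_saves: Dict[str, Dict[str, Any]] = {"var_a": {"stat_a": 0.5}}


def read_entry(x_name: str, default: Any = "raise") -> Any:
    """Return the entry for x_name, or fall back according to `default`."""
    if x_name in _saves:
        return _saves[x_name]
    if isinstance(default, str) and default == "raise":
        # The sentinel value "raise" requests an error for missing entries.
        raise KeyError(f"No entry for {x_name!r}")
    return default


found = read_entry("var_a")
missing = read_entry("var_b", default=None)
```

Passing default=None is convenient when reading many variables at once, since missing entries can then be filtered out afterwards instead of interrupting the loop.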

remove(y_names: List[str] | str | None) None[source]

Removes saved results. If y_names is None, removes the entire self.save_path directory. Otherwise, removes only the corresponding JSON file if it exists, and raises an error if it does not.

Parameters:

y_names (Optional[Union[List[str], str]]) – Name of target file to remove. If None, all saves are deleted.

restrictions: Dict[str, Dict] | None = None

Dict of current restrictions

set_restrictions(d: Dict[str, Dict[str, Tuple[float, float]]]) None[source]

Set new restrictions, i.e., the constraints on one or more targets that reduce the number of data samples that can be used in the calculation.

Parameters:

d (Dict[str, Dict[str, Tuple[float, float]]]) – New restrictions.
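A restrictions dictionary matching this type could look like the sketch below, where each restriction name maps target names to a pair of bounds. The names and values are placeholders, and the satisfies helper is hypothetical. Reading each tuple as (lower, upper) with inclusive ends is an assumption.

```python
from typing import Dict, Tuple

Restrictions = Dict[str, Dict[str, Tuple[float, float]]]

# Hypothetical restrictions: names and bounds are placeholders.
restrictions: Restrictions = {
    "low_a": {"target_a": (0.0, 1.0)},
    "mid_a_high_b": {"target_a": (1.0, 10.0), "target_b": (5.0, float("inf"))},
}


def satisfies(sample: Dict[str, float], restriction: Dict[str, Tuple[float, float]]) -> bool:
    """Return True if every constrained feature of `sample` lies within its bounds.

    Assumption: each tuple is (lower, upper) and both ends are inclusive.
    """
    return all(low <= sample[name] <= high for name, (low, high) in restriction.items())


sample = {"target_a": 2.0, "target_b": 7.5}
in_low = satisfies(sample, restrictions["low_a"])
in_mid = satisfies(sample, restrictions["mid_a_high_b"])
```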

store(x_names: str | List[str] | Iterable[List[str]], y_names: str | List[str], settings: Dict[str, Any], overwrite: bool = False, iterable_x: bool = False, save_every: int = 1, progress_bar: bool = True, total_iter: int | None = None, raise_error: bool = True) None[source]

Computes and saves statistics. Detailed instructions are provided by settings. If overwrite is True, existing results are overwritten. Otherwise, they are kept. If iterable_x is True, the function assumes that x_names is a list of variables or sets of variables.

Parameters:
  • x_names (Union[str, List[str], Iterable[List[str]]]) – Variable or set of variable names. If iterable_x is True, list of variable or set of variable names.

  • y_names (Union[str, List[str]]) – Target or set of target names.

  • settings (Dict[str, Any]) – Instructions for computation. More details on the dictionary format are given in the check_settings documentation.

  • overwrite (bool, optional) – Whether existing results must be overwritten, by default False (existing results kept).

  • iterable_x (bool, optional) – Whether x_names is a list of variables or sets of variables, by default False.

  • save_every (int, optional) – Number of variables (or sets of variables) processed between two updates of the backup file. Increasing this value speeds up the program by reducing the number of times the backup file is written (ignored if iterable_x is False), by default 1.

  • progress_bar (bool, optional) – Whether a progress bar has to be displayed (ignored if iterable_x is False), by default True.

  • total_iter (int, optional) – Number of elements in iterable. Useful when the iterable is not a Sequence, by default None.

  • raise_error (bool, optional) – Whether the function should propagate errors that occur during the calculation of statistics. If False, the entries are set to None, by default True.
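The effect of save_every can be sketched with a loop that counts how often results would be written. The process_with_periodic_saves function is hypothetical and performs no real computation or file access; the final write for leftover entries is an assumption about how such batching is usually completed.

```python
from typing import Dict, Iterable, List


def process_with_periodic_saves(x_sets: Iterable[List[str]], save_every: int = 1) -> int:
    """Process each variable set and return how many times results were written."""
    results: Dict[str, float] = {}
    writes = 0
    pending = 0
    for x_names in x_sets:
        results["+".join(x_names)] = 0.0  # placeholder for a computed statistic
        pending += 1
        if pending == save_every:
            writes += 1  # stand-in for writing the backup file
            pending = 0
    if pending:
        writes += 1  # final write for the remaining entries
    return writes


x_sets = [["var_a"], ["var_b"], ["var_a", "var_b"], ["var_c"], ["var_d"]]
writes_each = process_with_periodic_saves(x_sets, save_every=1)
writes_batched = process_with_periodic_saves(x_sets, save_every=2)
```

The trade-off is that a larger save_every means fewer writes, but more results are lost if the program is interrupted between two writes.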

infovar.handlers.getters module

class infovar.handlers.getters.StandardGetter(x_names: List[str], y_names: List[str], x: ndarray, y: ndarray)[source]

Bases: object

Class implementing a get function for handlers.

Initializer.

Parameters:
  • x_names (List[str]) – Variable names.

  • y_names (List[str]) – Target names.

  • x (np.ndarray) – Variable data. Must have x.shape[0] == len(x_names).

  • y (np.ndarray) – Target data. Must have y.shape[0] == len(y_names).

get(x_features: List[str], y_features: List[str], restrictions: Dict[str, Tuple[float]], max_samples: int | None = None) Tuple[ndarray, ndarray][source]

Returns the variable and target data that satisfy the restrictions given in the restrictions dictionary. If max_samples is not None, it specifies the maximum number of random samples to draw.

Parameters:
  • x_features (List[str]) – Names of features to return.

  • y_features (List[str]) – Names of targets to return.

  • restrictions (Dict[str, Tuple[float]]) – Dictionary of restrictions on variable or target values.

  • max_samples (Optional[int], optional) – If not None, maximum number of random samples to draw, by default None.

Returns:

  • np.ndarray – Selected variable data.

  • np.ndarray – Selected target data.
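The selection logic described above can be sketched in pure Python. This is not the StandardGetter implementation: make_getter is a hypothetical helper, plain lists stand in for NumPy arrays, data is stored as one list per name, and inclusive (low, high) bounds as well as the seed argument are assumptions.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Column = List[float]
Rows = List[List[float]]


def make_getter(
    x_names: List[str], y_names: List[str], x: Sequence[Column], y: Sequence[Column]
) -> Callable[..., Tuple[Rows, Rows]]:
    """Build a getter over data stored as one column (list of floats) per name."""
    columns = dict(zip(x_names + y_names, list(x) + list(y)))
    n_samples = len(next(iter(columns.values())))

    def get(
        x_features: List[str],
        y_features: List[str],
        restrictions: Dict[str, Tuple[float, float]],
        max_samples: Optional[int] = None,
        seed: int = 0,
    ) -> Tuple[Rows, Rows]:
        # Keep the indices of samples whose restricted features lie in [low, high].
        keep = [
            i for i in range(n_samples)
            if all(low <= columns[name][i] <= high
                   for name, (low, high) in restrictions.items())
        ]
        # If too many samples remain, draw max_samples of them at random.
        if max_samples is not None and len(keep) > max_samples:
            keep = sorted(random.Random(seed).sample(keep, max_samples))
        xs = [[columns[name][i] for name in x_features] for i in keep]
        ys = [[columns[name][i] for name in y_features] for i in keep]
        return xs, ys

    return get


get = make_getter(["var_a"], ["target_a"],
                  x=[[1.0, 2.0, 3.0, 4.0]], y=[[10.0, 20.0, 30.0, 40.0]])
xs, ys = get(["var_a"], ["target_a"], {"target_a": (15.0, 35.0)})
xs_capped, ys_capped = get(["var_a"], ["target_a"], {}, max_samples=2)
```

The same indices are applied to variables and targets, so each returned variable row stays paired with its target row.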

x

Variable data

x_names

Variable names.

y

Target data

y_names

Target names.

infovar.handlers.handler module

class infovar.handlers.handler.Handler[source]

Bases: ABC

Abstract class for handlers.

create()[source]

Creates the self.save_path directory if it does not exist.

abstract delete_stats(*args, **kwargs) None[source]

Removes saved results for the given statistics.

static drop_duplicates(ls: List[List[str]])[source]

ext: str

Save files extension.

get_existing_saves() List[str][source]

Returns the filenames (basenames) of any existing saves at self.save_path. Any file ending with “cls.ext” is considered a valid save.

Returns:

Existing saves.

Return type:

List[str]
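The extension-based filtering described above can be sketched with the standard library. The list_saves function is a hypothetical stand-in, not the method's implementation, and sorting the result is an addition made here for reproducibility.

```python
import os
import tempfile
from typing import List


def list_saves(save_path: str, ext: str) -> List[str]:
    """Return the basenames of files in save_path whose name ends with ext."""
    return sorted(name for name in os.listdir(save_path) if name.endswith(ext))


with tempfile.TemporaryDirectory() as save_path:
    # Create two files with the expected extension and one unrelated file.
    for name in ("target_a.json", "target_b.json", "notes.txt"):
        with open(os.path.join(save_path, name), "w"):
            pass
    saves = list_saves(save_path, ".json")
```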

abstract get_filename(*args, **kwargs) str[source]

Builds a save filename from data names.

Returns:

Filename.

Return type:

str

getter: Callable[[List[str], List[str], Dict[str, Tuple[float, float]]], Tuple[ndarray, ndarray]]

Function providing samples for further computation.

overview()[source]

Describes the handler and existing backups.

overwrite(*args, **kwargs) None[source]

Calls self.store method with overwrite=True.

abstract parse_filename(filename: str) Any[source]

Identifies data names from save filename.

Parameters:

filename (str) – Save filename.

Returns:

Data names.

Return type:

Any

abstract read(**kwargs) Dict[str, Any][source]

Accesses saved values.

Returns:

Read entries.

Return type:

Dict[str, Any]

abstract remove(*args, **kwargs) None[source]

Removes the self.save_path directory if it exists.

resamplings: Dict[str, Resampling]

Dictionary of available resamplings.

save_path: str

Save directory.

set_additional_resamplings(additional_resamplings: Dict[str, Resampling] = {}) None[source]

Add new resamplings (instances of Resampling) to estimate the variance of some estimators. Each resampling has a user-defined name. This name will then be reused, for example in the store function and its variants.

Parameters:

additional_resamplings (Dict[str, Resampling], optional) – Additional resamplings to be used. Default: {}.

set_additional_stats(additional_stats: Dict[str, Statistic] = {}) None[source]

Adds new statistics (instances of Statistic) to estimate the informativity of variables. Each statistic has a user-defined name. This name will then be reused, for example in the store function and its variants.

Parameters:

additional_stats (Dict[str, Statistic], optional) – Additional statistics to be used. Default {}.

set_getter(getter: Callable[[List[str], List[str], Dict[str, Tuple[float, float]]], Tuple[ndarray, ndarray]]) None[source]

Defines the function (getter) that provides samples for statistical relationship calculations. In most cases, this will correspond to the get method of the StandardGetter, but users can define their own implementation.

Parameters:

getter (Callable[ [List[str], List[str], Dict[str, Tuple[float, float]]], Tuple[np.ndarray, np.ndarray] ]) – Function providing samples for further computation.

set_path(save_path: str | None = None) None[source]

Defines a new path to a save directory. Must be called at least once before calling other functions such as store.

Parameters:

save_path (Optional[str], optional) – New save path. Default None.

stats: Dict[str, Callable]

Dictionary of available statistics.

abstract store(overwrite: bool = False, **kwargs) None[source]

Computes and saves values. The behavior depends on the value of the overwrite argument.

Parameters:

overwrite (bool, optional) – If True, overwrite the current computed value, if exists. Default: False.

update(*args, **kwargs) None[source]

Calls self.store method with overwrite=False.
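The relationship between store, update and overwrite can be sketched with a minimal abstract base class. BaseStore and DictStore are hypothetical stand-ins that keep values in a dictionary; they only illustrate the wrapper pattern and are not the infovar Handler.

```python
from abc import ABC, abstractmethod
from typing import Dict


class BaseStore(ABC):
    """Hypothetical stand-in for the pattern, not the infovar Handler."""

    @abstractmethod
    def store(self, key: str, value: float, overwrite: bool = False) -> None:
        ...

    def update(self, key: str, value: float) -> None:
        # Thin wrapper: existing entries are kept.
        self.store(key, value, overwrite=False)

    def overwrite(self, key: str, value: float) -> None:
        # Thin wrapper: existing entries are replaced.
        self.store(key, value, overwrite=True)


class DictStore(BaseStore):
    def __init__(self) -> None:
        self.entries: Dict[str, float] = {}

    def store(self, key: str, value: float, overwrite: bool = False) -> None:
        # Write only if the entry is new or overwriting was requested.
        if overwrite or key not in self.entries:
            self.entries[key] = value


saves = DictStore()
saves.update("stat_a", 1.0)
saves.update("stat_a", 2.0)      # ignored: the entry already exists
after_update = saves.entries["stat_a"]
saves.overwrite("stat_a", 3.0)   # replaces the existing entry
after_overwrite = saves.entries["stat_a"]
```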

Module contents