infovar.stats package

Submodules

infovar.stats.canonical_estimators module

infovar.stats.canonical_estimators.canonical_corr(X: ndarray, Y: ndarray, max: bool = True) → float | ndarray[source]

Returns the main (largest) canonical correlation coefficient of data X and Y. If max is False, returns all the singular values in decreasing order. These coefficients can be used, for example, to compute mutual information in the multivariate Gaussian case.

Parameters:

X (np.ndarray) – Data.
Y (np.ndarray) – Other data.
max (bool, optional) – If True, the function returns the main canonical correlation coefficient. Otherwise, it returns all the coefficients in decreasing order. Default True.

Returns:

Main canonical correlation coefficient or all singular values in decreasing order.

Return type:

Union[float, np.ndarray]
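As an illustration of the quantities described above, the following is a minimal NumPy sketch, not the package's own implementation. The function name canonical_correlations is hypothetical, and the sketch assumes that X and Y are 2D arrays with samples in rows and full column rank. It obtains the canonical correlations as the singular values of the product of two orthonormal bases, then applies the standard Gaussian relation between these coefficients and mutual information.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations of X (n, p) and Y (n, q), in decreasing order.

    Hypothetical helper for illustration; assumes full column rank.
    """
    # Center each column so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the two centered column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations;
    # NumPy returns singular values sorted in decreasing order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y = X @ np.array([[1.0], [0.5]]) + rng.normal(size=(500, 1))

rho = canonical_correlations(X, Y)
# Mutual information in bits under a multivariate Gaussian assumption.
mi_bits = -0.5 * np.sum(np.log2(1.0 - rho**2))
```

Taking rho[0] corresponds to the max=True behaviour described above, while the full array corresponds to max=False.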

infovar.stats.canonical_estimators.cca(X: ndarray, Y: ndarray) → Tuple[ndarray, ndarray][source]

Canonical correlation analysis.

Parameters:

X (np.ndarray) – Data.
Y (np.ndarray) – Other data.

Returns:

np.ndarray – X linear combination coefficients.
np.ndarray – Y linear combination coefficients.
np.ndarray – Main canonical correlation coefficient.

infovar.stats.canonical_estimators.contraction_matrix(X: ndarray, Y: ndarray) → Tuple[ndarray, ndarray, ndarray][source]

Returns the contraction matrix as well as the matrix square roots of the covariance matrices of data X and Y.

Parameters:

X (np.ndarray) – Data.
Y (np.ndarray) – Other data.

Returns:

np.ndarray – Contraction matrix.
np.ndarray – Matrix square root of the X covariance matrix.
np.ndarray – Matrix square root of the Y covariance matrix.

infovar.stats.entropy_estimators module

infovar.stats.entropy_estimators.centropy(x: ndarray, y: ndarray, k: int = 3, base: float = 2) → float[source]

The classic Kozachenko-Leonenko (K-L) k-nearest neighbor continuous entropy estimator, applied to the entropy of X conditioned on Y.

infovar.stats.entropy_estimators.entropy(x: ndarray, k: int = 3, base: float = 2) → float[source]

The classic Kozachenko-Leonenko (K-L) k-nearest neighbor continuous entropy estimator. x should be a list of vectors, e.g. x = [[1.3], [3.7], [5.1], [2.4]] for four samples of a one-dimensional variable.
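To make the idea of a k-nearest neighbor entropy estimator concrete, here is a simplified standard-library sketch restricted to one-dimensional data. It is not the package's code: the names digamma_int and kl_entropy_1d are hypothetical, samples are passed as a flat list of floats rather than a list of vectors, and the sketch assumes that no two samples are equal (a zero neighbor distance would make the logarithm undefined).

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function evaluated at a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy_1d(samples, k=3, base=2.0):
    """k-nearest neighbor entropy estimate for 1D samples (hypothetical helper)."""
    xs = sorted(samples)
    n = len(xs)
    log_dist_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1D data, the k nearest neighbors of xs[i] lie among
        # the k points on each side of it.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(x - xs[j]) for j in range(lo, hi) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    # log(2) is the log-volume of the unit ball in one dimension.
    h_nats = digamma_int(n) - digamma_int(k) + math.log(2.0) + log_dist_sum / n
    return h_nats / math.log(base)


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]
h_bits = kl_entropy_1d(data, k=3, base=2.0)
```

For a standard normal variable the exact differential entropy is 0.5 * log2(2 * pi * e) bits, which gives a reference value to compare h_bits against.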

infovar.stats.entropy_estimators.mi(x: ndarray, y: ndarray, k: int = 3, base: float = 2) → float[source]

Mutual information of x and y (conditioned on z if z is not None). x and y should be lists of vectors, e.g. x = [[1.3], [3.7], [5.1], [2.4]] for four samples of a one-dimensional variable.

infovar.stats.info_theory module

infovar.stats.info_theory.condh_to_mse_gaussian(condh: float | ndarray, dim: int = 1, base: float = 2) → float | ndarray[source]

Converts conditional differential entropy into estimation mean squared error (MSE) under multivariate Gaussian assumption.

Parameters:

condh (Union[float, np.ndarray]) – Conditional differential entropy.
dim (int, optional) – Dimension of multivariate Gaussian variable, by default 1 (univariate case).
base (float, optional) – Base of differential entropy, by default 2 (bits).

Returns:

Estimation mean squared error.

Return type:

Union[float, np.ndarray]
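The conversion can be illustrated with the closed form of the Gaussian differential entropy. The sketch below is an assumption about the underlying relation, not the package's code: the function name gaussian_condh_to_mse is hypothetical, and for dim > 1 it assumes an isotropic Gaussian whose joint entropy is condh, so that the returned value is a per-component variance.

```python
import math


def gaussian_condh_to_mse(condh, dim=1, base=2.0):
    """Variance implied by a Gaussian differential entropy (hypothetical helper).

    Univariate case: h = 0.5 * log_base(2 * pi * e * sigma^2).
    For dim > 1 this ASSUMES an isotropic Gaussian with joint entropy condh.
    """
    return base ** (2.0 * condh / dim) / (2.0 * math.pi * math.e)


# Entropy in bits of a univariate Gaussian with variance 4.
h = 0.5 * math.log2(2.0 * math.pi * math.e * 4.0)
mse = gaussian_condh_to_mse(h)   # close to 4.0
rmse = math.sqrt(mse)            # close to 2.0
```

Under the same assumptions, the root mean squared error is simply the square root of this value.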

infovar.stats.info_theory.condh_to_rmse_gaussian(condh: float | ndarray, dim: int = 1, base: float = 2) → float | ndarray[source]

Converts conditional differential entropy into estimation root mean squared error (RMSE) under multivariate Gaussian assumption.

Parameters:

condh (Union[float, np.ndarray]) – Conditional differential entropy.
dim (int, optional) – Dimension of multivariate Gaussian variable, by default 1 (univariate case).
base (float, optional) – Base of differential entropy, by default 2 (bits).

Returns:

Estimation root mean squared error.

Return type:

Union[float, np.ndarray]

infovar.stats.info_theory.corr_to_info_gaussian_1d(rho: float | ndarray, base: float = 2) → float[source]

Converts Pearson correlation coefficient into mutual information under univariate Gaussian assumption.

Parameters:

rho (Union[float, np.ndarray]) – Pearson correlation coefficient or array of correlation coefficients.
base (float, optional) – Base of mutual information, by default 2 (bits).

Returns:

Mutual information between the two variables.

Return type:

float
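For two jointly Gaussian scalar variables with correlation rho, the mutual information has the well-known closed form -0.5 * log(1 - rho^2). The sketch below evaluates it with the standard library; the function name gaussian_corr_to_mi is hypothetical and the code is only an illustration of the relation, not the package's implementation.

```python
import math


def gaussian_corr_to_mi(rho, base=2.0):
    """Mutual information of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho, base)


mi_half = gaussian_corr_to_mi(0.5)               # about 0.2075 bits
mi_one_bit = gaussian_corr_to_mi(math.sqrt(0.75))  # about 1 bit
```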

infovar.stats.info_theory.corr_to_info_gaussian_nd(C: ndarray, I1: List[int], I2: List[int], base: float = 2) → float[source]

Converts covariance matrix into mutual information under multivariate Gaussian assumption.

Parameters:

C (np.ndarray) – Full covariance matrix of multivariate normal variable.
I1 (List[int]) – Indices of first subset of variables.
I2 (List[int]) – Indices of second subset of variables.
base (float, optional) – Base of mutual information, by default 2 (bits).

Returns:

Mutual information between the two subsets of variables.

Return type:

float
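In the multivariate Gaussian case, the mutual information between two subsets of variables can be written with determinants of covariance sub-blocks: 0.5 * log(det(C11) * det(C22) / det(C12)), where C12 is the covariance restricted to both subsets. The NumPy sketch below illustrates this standard formula. The function name gaussian_mi is hypothetical and this is not necessarily how the package computes the value.

```python
import numpy as np


def gaussian_mi(C, I1, I2, base=2.0):
    """Gaussian mutual information between variables I1 and I2 (hypothetical helper)."""
    C = np.asarray(C, dtype=float)
    joint = list(I1) + list(I2)
    # slogdet returns (sign, log|det|); log-determinants avoid overflow.
    _, logdet_1 = np.linalg.slogdet(C[np.ix_(I1, I1)])
    _, logdet_2 = np.linalg.slogdet(C[np.ix_(I2, I2)])
    _, logdet_joint = np.linalg.slogdet(C[np.ix_(joint, joint)])
    return 0.5 * (logdet_1 + logdet_2 - logdet_joint) / np.log(base)


C = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
mi_01 = gaussian_mi(C, [0], [1])   # correlated pair
mi_02 = gaussian_mi(C, [0], [2])   # uncorrelated pair
```

With a single variable in each subset, the result reduces to the univariate expression -0.5 * log(1 - rho^2).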

infovar.stats.info_theory.info_to_corr_gaussian(mi: float, base: float = 2) → float[source]

Converts mutual information into a Pearson correlation coefficient under multivariate Gaussian assumption.

Parameters:

mi (float) – Mutual information.
base (float, optional) – Base of mutual information, by default 2 (bits).

Returns:

Correlation coefficient.

Return type:

float
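Inverting the bivariate Gaussian relation I = -0.5 * log(1 - rho^2) gives rho = sqrt(1 - base^(-2 I)). The sketch below shows this inversion; the function name gaussian_mi_to_corr is hypothetical, and the code assumes this bivariate relation rather than reproducing the package's implementation. Only the magnitude of the correlation can be recovered, since mutual information does not depend on its sign.

```python
import math


def gaussian_mi_to_corr(mi, base=2.0):
    """Non-negative correlation equivalent to a mutual information value."""
    return math.sqrt(1.0 - base ** (-2.0 * mi))


rho_one_bit = gaussian_mi_to_corr(1.0)   # sqrt(0.75), about 0.866
# Round trip back to mutual information in bits.
mi_back = -0.5 * math.log2(1.0 - rho_one_bit ** 2)
```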

infovar.stats.preprocessing module

infovar.stats.preprocessing.break_degeneracy(data: ndarray) → ndarray[source]

Measures the sample step and adds suitable noise to break degeneracy (i.e., eliminate duplicates). This allows k-nearest neighbor estimators (e.g., entropy) to be used with data that, without processing, would cause the algorithms to fail. Note: this function does not work in all situations (for instance, when applying a logarithm).

Parameters:

data (np.ndarray) – Data with potential duplicates.

Returns:

Data without duplicates. If no duplicates are found, no changes are made.

Return type:

np.ndarray
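One possible reading of this description is sketched below for a flat list of values. It is an assumption, not the package's algorithm: the function name jitter_duplicates is hypothetical, the "sample step" is taken to be the smallest gap between distinct values, and the noise is drawn uniformly within half a step on either side.

```python
import random


def jitter_duplicates(values, rng=None):
    """Add sub-step uniform noise when duplicates are present (hypothetical helper)."""
    rng = rng or random.Random()
    if len(set(values)) == len(values):
        # No duplicates: return the data unchanged.
        return list(values)
    distinct = sorted(set(values))
    # Smallest gap between distinct values; falls back to 1.0 if only one value exists.
    step = min((b - a for a, b in zip(distinct, distinct[1:])), default=1.0)
    return [v + rng.uniform(-step / 2.0, step / 2.0) for v in values]


quantized = [0.0, 0.5, 0.5, 1.0, 1.0, 1.0]
jittered = jitter_duplicates(quantized, random.Random(0))
```

Because the noise stays below half a step, each jittered value remains closer to its original level than to any neighboring level.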

infovar.stats.ranking module

infovar.stats.ranking.prob_higher(mus: ndarray, sigmas: ndarray, idx: int | None = None, approx: bool = True, pbar: bool = False) → ndarray | float[source]

Returns the probability that a given estimate (described by an estimated value and a standard deviation) is the highest among all provided estimates. The argument idx specifies the index of the estimate for which this probability is computed. If None, the probability is returned for every provided estimate. Source: https://stats.stackexchange.com/questions/44139/what-is-px-1x-2-x-1x-3-x-1x-n

Parameters:

mus (np.ndarray) – Estimates.
sigmas (np.ndarray) – Uncertainty of estimates (1 sigma).
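idx (Optional[int], optional) – Index of the estimate whose probability of being the highest is computed. If None, the probability is computed for every estimate. Default None.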
approx (bool, optional) – If True, neglects estimates above three sigma. Default: True.
pbar (bool, optional) – If True, displays a progress bar. Default: False

Returns:

If idx is an integer, probability that the estimate at index idx is the highest. If idx is None, array of probabilities for each estimate.

Return type:

Union[np.ndarray, float]
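If each estimate is modeled as an independent Gaussian variable, the probability that estimate i is the highest equals the integral of its density multiplied by the cumulative distribution functions of all other estimates. The sketch below evaluates this integral with a trapezoidal rule using only the standard library. The function name prob_highest and the parameters n_grid and width are hypothetical, the independence and Gaussian assumptions are mine, and the sketch does not reproduce the approx or pbar options.

```python
from statistics import NormalDist


def prob_highest(mus, sigmas, idx, n_grid=4001, width=8.0):
    """P(estimate idx is the largest), ASSUMING independent Gaussian estimates."""
    dists = [NormalDist(m, s) for m, s in zip(mus, sigmas)]
    # Integrate over +/- width standard deviations around estimate idx.
    lo = mus[idx] - width * sigmas[idx]
    hi = mus[idx] + width * sigmas[idx]
    dx = (hi - lo) / (n_grid - 1)
    total = 0.0
    for step in range(n_grid):
        x = lo + step * dx
        # Density of estimate idx times P(every other estimate < x).
        f = dists[idx].pdf(x)
        for j, d in enumerate(dists):
            if j != idx:
                f *= d.cdf(x)
        weight = 0.5 if step in (0, n_grid - 1) else 1.0
        total += weight * f * dx
    return total


mus = [1.0, 0.0, -1.0]
sigmas = [0.5, 1.0, 2.0]
probs = [prob_highest(mus, sigmas, i) for i in range(len(mus))]
```

Calling the function for every index mirrors the idx=None behaviour described above; the resulting probabilities should sum to approximately one.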

infovar.stats.resampling module

class infovar.stats.resampling.Bootstrapping[source]

Bases: Resampling

compute_sigma(variables: ndarray, targets: ndarray, stat: Statistic, n: int = 10) → float[source]

Estimates the standard deviation of the estimator stat using bootstrap. This method makes it possible to estimate the variance of an estimator for a given data distribution. It consists of creating new datasets from the same distribution by drawing samples with replacement from the existing data.

Parameters:

variables (np.ndarray) – Variable data. Must be a 2D array.
targets (np.ndarray) – Target data. Must be a 2D array with the same number of rows as variables.
stat (Statistic) – Estimator whose variance is to be estimated.
n (int, optional) – Number of bootstrap samples, by default 10

Returns:

Estimate of estimator standard deviation.

Return type:

float
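The resampling idea can be sketched as follows with the standard library. This is not the class's implementation: bootstrap_sigma is a hypothetical function, data are lists of rows instead of arrays, and stat is assumed to be a plain callable taking (variables, targets) and returning a float, which may differ from the actual Statistic interface.

```python
import random
import statistics


def bootstrap_sigma(variables, targets, stat, n=10, rng=None):
    """Bootstrap estimate of the standard deviation of stat (hypothetical helper)."""
    rng = rng or random.Random()
    n_rows = len(variables)
    estimates = []
    for _ in range(n):
        # Draw row indices with replacement, keeping variables and targets paired.
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        estimates.append(
            stat([variables[r] for r in rows], [targets[r] for r in rows])
        )
    return statistics.stdev(estimates)


def mean_target(variables, targets):
    """Toy statistic used for illustration: mean of the first target column."""
    return statistics.fmean(row[0] for row in targets)


data_rng = random.Random(0)
variables = [[data_rng.gauss(0.0, 1.0)] for _ in range(400)]
targets = [[data_rng.gauss(0.0, 1.0)] for _ in range(400)]
sigma = bootstrap_sigma(variables, targets, mean_target, n=200, rng=random.Random(1))
```

For the toy statistic used here, the result should be close to the usual standard error of the mean, 1 / sqrt(400) = 0.05.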

class infovar.stats.resampling.Resampling[source]

Bases: ABC

abstract compute_sigma(variables: ndarray, targets: ndarray, stat: Statistic, **kwargs) → float[source]

Estimates the standard deviation of the estimator stat.

Parameters:

variables (np.ndarray) – Variable data. Must be a 2D array.
targets (np.ndarray) – Target data. Must be a 2D array with the same number of rows as variables.
stat (Statistic) – Estimator whose variance is to be estimated.

Returns:

Estimate of estimator standard deviation.

Return type:

float

class infovar.stats.resampling.Subsampling[source]

Bases: Resampling

compute_sigma(variables: ndarray, targets: ndarray, stat: Statistic, n: int = 5, min_samples: int = 20, min_subsets: int = 5, decades: float = 2) → float[source]

Estimates the standard deviation of the estimator stat using the approach proposed in Holmes, C. M., & Nemenman, I. (2019). It assumes that the variance of the estimator depends on the number of samples N as Var[stat](N) = B/N, where B is a parameter to be estimated that depends on the data distribution. This function assumes that this relation holds for the given estimator and computes its variance for several numbers of samples N by subsampling the dataset. This makes it possible to estimate the value of B.

Parameters:

variables (np.ndarray) – Variable data. Must be a 2D array.
targets (np.ndarray) – Target data. Must be a 2D array with the same number of rows as variables.
stat (Statistic) – Estimator whose variance is to be estimated.
n (int, optional) – Number of different subset sizes, by default 5.
min_samples (int, optional) – Minimum number of samples required for a subset, by default 20.
min_subsets (int, optional) – Minimum number of subsets for a given subset size, by default 5.
decades (float, optional) – Maximum orders of magnitude between the largest and smallest subset sizes, by default 2.

Returns:

Estimate of estimator standard deviation.

Return type:

float
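The relation Var[stat](N) = B/N suggests the following simplified procedure: for each subset size, split the shuffled data into disjoint subsets, measure the variance of the statistic across subsets, multiply by the subset size to get an estimate of B, then extrapolate to the full sample size. The sketch below follows that outline. It is not the class's implementation: subsampling_sigma is a hypothetical function, the subset sizes are passed explicitly instead of being derived from n, min_samples, min_subsets and decades, and stat is assumed to be a plain callable taking (variables, targets).

```python
import math
import random
import statistics


def subsampling_sigma(variables, targets, stat, sizes, rng=None):
    """Standard deviation of stat at full sample size, ASSUMING Var = B / N."""
    rng = rng or random.Random()
    n_rows = len(variables)
    b_estimates = []
    for size in sizes:
        order = list(range(n_rows))
        rng.shuffle(order)
        # Disjoint subsets of the requested size (at least two are required).
        values = []
        for s in range(n_rows // size):
            rows = order[s * size:(s + 1) * size]
            values.append(
                stat([variables[r] for r in rows], [targets[r] for r in rows])
            )
        # Var(size) = B / size, hence B is approximately size * Var(size).
        b_estimates.append(size * statistics.variance(values))
    b = statistics.fmean(b_estimates)
    return math.sqrt(b / n_rows)


def mean_target(variables, targets):
    """Toy statistic used for illustration: mean of the first target column."""
    return statistics.fmean(row[0] for row in targets)


data_rng = random.Random(1)
targets = [[data_rng.gauss(0.0, 1.0)] for _ in range(2000)]
variables = [[0.0] for _ in range(2000)]
sigma = subsampling_sigma(
    variables, targets, mean_target, sizes=[20, 50, 100, 200], rng=random.Random(2)
)
```

For the sample mean of unit-variance data, B is close to 1, so the result should be near 1 / sqrt(2000), roughly 0.022.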

infovar.stats.statistics module

class infovar.stats.statistics.Condh[source]

Bases: Statistic

Conditional differential entropy.

class infovar.stats.statistics.Corr[source]

Bases: Statistic

Canonical correlation.

class infovar.stats.statistics.GaussInfo[source]

Bases: Statistic

Mutual information under multivariate Gaussian assumption.

class infovar.stats.statistics.GaussInfoReparam[source]

Bases: Statistic

Mutual information under multivariate Gaussian assumption after Gaussian reparameterization of marginals.

class infovar.stats.statistics.MI[source]

Bases: Statistic

Mutual information estimator.

static gaussianize(x: ndarray) → ndarray[source]

Reparameterization of a univariate distribution into a Gaussian one. Raises an error if x does not have one dimension.

Parameters:

x (np.ndarray) – Univariate data to reparameterize.

Returns:

Reparameterized data.

Return type:

np.ndarray
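A common way to reparameterize univariate data into a Gaussian is the rank-based transform: each sample is replaced by the standard normal quantile of its normalized rank. The sketch below shows this technique with the standard library. It is an assumption about the general approach, not the method's actual code, and the function name gaussianize_ranks is hypothetical. Ties are ordered arbitrarily here.

```python
from statistics import NormalDist


def gaussianize_ranks(x):
    """Map 1D samples to standard normal quantiles of their ranks (hypothetical helper)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) lies strictly between 0 and 1, so inv_cdf is finite.
        out[i] = std_normal.inv_cdf(rank / (n + 1))
    return out


skewed = [3.0, 1.0, 2.0, 50.0, 0.5]
gaussianized = gaussianize_ranks(skewed)
```

The transform preserves the ordering of the samples and produces values that are symmetric around zero. Applying it to each column of a 2D dataset independently gives a Gaussianization of the marginals.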

static marginaly_gaussianize(x: ndarray) → ndarray[source]

Reparameterization of marginal distributions of multivariate data into Gaussian ones.

Parameters:

x (np.ndarray) – Multidimensional data to reparameterize.

Returns:

Reparameterized data.

Return type:

np.ndarray

class infovar.stats.statistics.Statistic[source]

Bases: ABC

Abstract class for all statistics used in the procedure. Can be used to define your own.
