infovar.stats package
Submodules
infovar.stats.canonical_estimators module
- infovar.stats.canonical_estimators.canonical_corr(X: ndarray, Y: ndarray, max: bool = True) -> float | ndarray
Returns the canonical correlation coefficient of data X and Y. If max is False, returns all the singular values in decreasing order. These coefficients can be used, for example, to compute mutual information in the multivariate Gaussian case.
- Parameters:
X (np.ndarray) – Data.
Y (np.ndarray) – Other data.
max (bool, optional) – If True, the function returns the main canonical correlation coefficient. Otherwise, it returns all the coefficients in decreasing order. Default True.
- Returns:
Main canonical correlation coefficient or all singular values in decreasing order.
- Return type:
Union[float, np.ndarray]
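To illustrate the Gaussian use case mentioned above, the following sketch converts a set of canonical correlation coefficients into mutual information with the standard identity I = -1/2 Σ log(1 - ρ_i²). The helper name `gaussian_info_from_canonical` is hypothetical and is not part of infovar.

```python
import numpy as np


def gaussian_info_from_canonical(rhos, base=2.0):
    """Mutual information of jointly Gaussian X and Y from canonical correlations.

    Hypothetical helper for illustration only (not an infovar function).
    """
    rhos = np.atleast_1d(np.asarray(rhos, dtype=float))
    # Each canonical pair contributes -0.5 * log(1 - rho^2); the contributions add up.
    info_nats = -0.5 * np.sum(np.log(1.0 - rhos**2))
    # Convert from nats to the requested base (2 gives bits).
    return float(info_nats / np.log(base))


# A single canonical correlation of sqrt(0.75) corresponds to 1 bit.
one_bit = gaussian_info_from_canonical([np.sqrt(0.75)])
```

Uncorrelated data (all coefficients equal to zero) gives zero information, and the result diverges as any coefficient approaches 1.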
- infovar.stats.canonical_estimators.cca(X: ndarray, Y: ndarray) -> Tuple[ndarray, ndarray]
Canonical correlation analysis.
- Parameters:
X (np.ndarray) – Data.
Y (np.ndarray) – Other data.
- Returns:
np.ndarray – X linear combination coefficients.
np.ndarray – Y linear combination coefficients.
np.ndarray – Main canonical correlation coefficient.
- infovar.stats.canonical_estimators.contraction_matrix(X: ndarray, Y: ndarray) -> Tuple[ndarray, ndarray, ndarray]
Returns the contraction matrix as well as the matrix square roots of the covariance matrices of data X and Y.
- Parameters:
X (np.ndarray) – Data.
Y (np.ndarray) – Other data.
- Returns:
np.ndarray – Contraction matrix.
np.ndarray – Matrix square root of X covariance matrix.
np.ndarray – Matrix square root of Y covariance matrix.
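As a rough illustration of how these quantities relate, the sketch below builds a contraction matrix from data and takes its singular values as canonical correlations. It assumes the usual definition K = Sxx^(-1/2) Sxy Syy^(-1/2) and a layout with one sample per row; the infovar implementation may differ in both respects, and the helper names are hypothetical.

```python
import numpy as np


def inv_sqrtm(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    # Divide each eigenvector (column) by the square root of its eigenvalue.
    return (V / np.sqrt(w)) @ V.T


def contraction_and_corr(X, Y):
    """Hypothetical helper: contraction matrix and canonical correlations."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)  # covariance of X
    Syy = Yc.T @ Yc / (n - 1)  # covariance of Y
    Sxy = Xc.T @ Yc / (n - 1)  # cross-covariance
    K = inv_sqrtm(Sxx) @ Sxy @ inv_sqrtm(Syy)
    # Singular values come back sorted in decreasing order.
    rhos = np.linalg.svd(K, compute_uv=False)
    return K, rhos


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y = X @ np.array([[1.0], [0.5]]) + rng.normal(size=(500, 1))
K, rhos = contraction_and_corr(X, Y)
```

With one-dimensional X and Y, the single singular value reduces to the absolute value of the Pearson correlation coefficient.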
infovar.stats.entropy_estimators module
- infovar.stats.entropy_estimators.centropy(x: ndarray, y: ndarray, k: int = 3, base: float = 2) -> float
The classic K-L k-nearest neighbor continuous entropy estimator for the entropy of X conditioned on Y.
infovar.stats.info_theory module
- infovar.stats.info_theory.condh_to_mse_gaussian(condh: float | ndarray, dim: int = 1, base: float = 2) -> float | ndarray
Converts conditional differential entropy into estimation mean squared error (MSE) under multivariate Gaussian assumption.
- Parameters:
condh (Union[float, np.ndarray]) – Conditional differential entropy.
dim (int, optional) – Dimension of multivariate Gaussian variable, by default 1 (univariate case).
base (float, optional) – Base of differential entropy, by default 2 (bits).
- Returns:
Estimation mean squared error.
- Return type:
Union[float, np.ndarray]
- infovar.stats.info_theory.condh_to_rmse_gaussian(condh: float | ndarray, dim: int = 1, base: float = 2) -> float | ndarray
Converts conditional differential entropy into estimation root mean squared error (RMSE) under multivariate Gaussian assumption.
- Parameters:
condh (Union[float, np.ndarray]) – Conditional differential entropy.
dim (int, optional) – Dimension of multivariate Gaussian variable, by default 1 (univariate case).
base (float, optional) – Base of differential entropy, by default 2 (bits).
- Returns:
Estimation root mean squared error.
- Return type:
Union[float, np.ndarray]
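The conversion between entropy and error can be sketched from the Gaussian differential entropy h = (d/2) log(2πe σ²). The code below inverts that formula. It assumes an isotropic error covariance σ²I in the multivariate case, which is an assumption of this sketch and may not match how infovar treats `dim`; the function names are hypothetical.

```python
import math


def condh_to_mse_sketch(condh, dim=1, base=2.0):
    """Per-component error variance of a Gaussian with the given entropy.

    Assumes an isotropic covariance sigma^2 * I (assumption of this sketch).
    """
    return base ** (2.0 * condh / dim) / (2.0 * math.pi * math.e)


def mse_to_condh_sketch(mse, dim=1, base=2.0):
    """Inverse mapping: differential entropy of an isotropic Gaussian."""
    return 0.5 * dim * math.log(2.0 * math.pi * math.e * mse, base)


# A unit-variance univariate Gaussian has an entropy of about 2.05 bits.
h_unit = mse_to_condh_sketch(1.0)
# The RMSE is simply the square root of the MSE.
rmse = math.sqrt(condh_to_mse_sketch(h_unit))
```

Under this relation, each bit of conditional entropy removed divides the univariate MSE by four, hence the RMSE by two.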
- infovar.stats.info_theory.corr_to_info_gaussian_1d(rho: float | ndarray, base: float = 2) -> float
Converts Pearson correlation coefficient into mutual information under univariate Gaussian assumption.
- Parameters:
rho (Union[float, np.ndarray]) – Pearson correlation coefficient or array of correlation coefficients.
base (float, optional) – Base of mutual information, by default 2 (bits).
- Returns:
Mutual information between the two variables.
- Return type:
float
- infovar.stats.info_theory.corr_to_info_gaussian_nd(C: ndarray, I1: List[int], I2: List[int], base: float = 2) -> float
Converts covariance matrix into mutual information under multivariate Gaussian assumption.
- Parameters:
C (np.ndarray) – Full covariance matrix of multivariate normal variable.
I1 (List[int]) – Indices of first subset of variables.
I2 (List[int]) – Indices of second subset of variables.
base (float, optional) – Base of mutual information, by default 2 (bits).
- Returns:
Mutual information between the two subsets of variables.
- Return type:
float
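For reference, the Gaussian mutual information between two subsets of variables can be written as I = 1/2 log(det C11 · det C22 / det C), where C11 and C22 are the covariance blocks of each subset and C is their joint covariance. The sketch below implements this identity with a hypothetical helper name; it is not the infovar source.

```python
import numpy as np


def gaussian_info_from_cov(C, I1, I2, base=2.0):
    """Hypothetical helper: Gaussian mutual information between two index sets."""
    C = np.asarray(C, dtype=float)
    joint = list(I1) + list(I2)
    # slogdet returns (sign, log|det|); log-determinants avoid overflow and underflow.
    _, logdet_1 = np.linalg.slogdet(C[np.ix_(I1, I1)])
    _, logdet_2 = np.linalg.slogdet(C[np.ix_(I2, I2)])
    _, logdet_joint = np.linalg.slogdet(C[np.ix_(joint, joint)])
    return float(0.5 * (logdet_1 + logdet_2 - logdet_joint) / np.log(base))


# With two scalar variables, the result matches the univariate formula
# -0.5 * log(1 - rho^2).
rho = 0.8
C = np.array([[1.0, rho], [rho, 1.0]])
mi_nd = gaussian_info_from_cov(C, [0], [1])
mi_1d = -0.5 * np.log2(1.0 - rho**2)
```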
- infovar.stats.info_theory.info_to_corr_gaussian(mi: float, base: float = 2) -> float
Converts mutual information into a Pearson correlation coefficient under multivariate Gaussian assumption.
- Parameters:
mi (float) – Mutual information.
base (float, optional) – Base of mutual information, by default 2 (bits).
- Returns:
Correlation coefficient.
- Return type:
float
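Inverting the univariate Gaussian relation I = -1/2 log(1 - ρ²) gives ρ = sqrt(1 - base^(-2I)). The following sketch shows the round trip with hypothetical function names. Mutual information does not carry the sign of the correlation, so only a non-negative coefficient can be recovered.

```python
import math


def info_to_corr_sketch(mi, base=2.0):
    """Hypothetical helper: non-negative correlation equivalent to a given MI."""
    return math.sqrt(1.0 - base ** (-2.0 * mi))


def corr_to_info_sketch(rho, base=2.0):
    """Hypothetical helper: Gaussian MI for a Pearson correlation rho."""
    return -0.5 * math.log(1.0 - rho**2, base)


rho_one_bit = info_to_corr_sketch(1.0)  # sqrt(0.75), about 0.866
round_trip = corr_to_info_sketch(info_to_corr_sketch(0.7))
```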
infovar.stats.preprocessing module
- infovar.stats.preprocessing.break_degeneracy(data: ndarray) -> ndarray
Measures the sample step and adds adequate noise to break degeneracy (i.e., eliminate duplicates). This allows k-nearest neighbor estimators (e.g., of entropy) to be used with data that would otherwise cause the algorithms to fail. Note: this function does not work in all situations (for instance, when applying a logarithm).
- Parameters:
data (np.ndarray) – Data with potential duplicates.
- Returns:
Data without duplicates. If no duplicates are found, no changes are made.
- Return type:
np.ndarray
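One possible way to implement this idea is sketched below: for each column, the sample step is taken as the smallest gap between distinct values, and uniform noise of at most half a step is added. This is an illustrative guess at the technique, not the infovar source; the name `break_ties` and the `seed` argument are hypothetical.

```python
import numpy as np


def break_ties(data, seed=None):
    """Hypothetical sketch: add sub-step noise to columns containing duplicates."""
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)  # work on a copy
    cols = out.reshape(len(out), -1)   # view on `out`; 1D input becomes one column
    for j in range(cols.shape[1]):
        values = np.unique(cols[:, j])
        if values.size == cols.shape[0] or values.size < 2:
            # No duplicates, or a constant column with no step to measure.
            continue
        step = np.min(np.diff(values))  # smallest gap between distinct values
        cols[:, j] += rng.uniform(-step / 2, step / 2, size=cols.shape[0])
    return out


quantized = np.array([0.0, 0.0, 1.0, 1.0, 2.0])
jittered = break_ties(quantized, seed=0)
```

Because the noise amplitude is derived from the smallest gap, data whose step is not uniform (for example after a logarithm) would receive noise that is too small for the widely spaced values, which is consistent with the limitation noted above.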
infovar.stats.ranking module
- infovar.stats.ranking.prob_higher(mus: ndarray, sigmas: ndarray, idx: int | None = None, approx: bool = True, pbar: bool = False) -> ndarray | float
Returns the probability that a given estimate (described by an estimated value and a standard deviation) is the highest among all provided estimates. The argument idx specifies the index of the estimate for which this probability is computed. If None, returns the probability for every provided estimate. Source: https://stats.stackexchange.com/questions/44139/what-is-px-1x-2-x-1x-3-x-1x-n
- Parameters:
mus (np.ndarray) – Estimates.
sigmas (np.ndarray) – Uncertainty of estimates (1 sigma).
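idx (Optional[int], optional) – Index of the estimate whose probability of being the highest is computed. If None, the probability is computed for every estimate. By default None.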
approx (bool, optional) – If True, neglects estimates above three sigma. Default: True.
pbar (bool, optional) – If True, displays a progress bar. Default: False
- Returns:
If idx is an integer, probability that the estimate at index idx is the highest. If idx is None, array of probabilities for each estimate.
- Return type:
Union[np.ndarray, float]
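If the estimates are modeled as independent Gaussian variables (an assumption of this sketch), the probability that estimate i is the highest is the integral of its density times the cumulative distribution functions of all the others. The code below evaluates that integral on a grid with the trapezoidal rule. The name `prob_highest` and the `n_grid` parameter are hypothetical, and the `approx` shortcut of the documented function is not reproduced.

```python
import math

import numpy as np

# Element-wise error function built from the standard library.
_erf = np.vectorize(math.erf, otypes=[float])


def prob_highest(mus, sigmas, idx, n_grid=4001):
    """Hypothetical sketch: P(estimate idx is the largest), independent Gaussians."""
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    mu, sigma = mus[idx], sigmas[idx]
    # Integration grid covering +/- 8 sigma around the estimate of interest.
    x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, n_grid)
    pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    others = np.delete(np.arange(mus.size), idx)
    # Probability that each other estimate falls below x.
    z = (x[:, None] - mus[others]) / sigmas[others]
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    integrand = pdf * np.prod(cdf, axis=1)
    # Trapezoidal rule on the uniform grid.
    dx = x[1] - x[0]
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dx)


p_tie = prob_highest([0.0, 0.0], [1.0, 1.0], 0)
```

Two identical estimates each have a probability of one half, and the probabilities over all indices sum to one.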
infovar.stats.resampling module
- class infovar.stats.resampling.Bootstrapping
Bases: Resampling
- compute_sigma(variables: ndarray, targets: ndarray, stat: Statistic, n: int = 10) -> float
Estimates the standard deviation of the estimator stat using the bootstrap. This method estimates the variance of an estimator for a given data distribution. It creates new datasets from the same distribution by drawing samples with replacement from the existing data.
- Parameters:
variables (np.ndarray) – Variable data. Must be a 2D array.
targets (np.ndarray) – Target data. Must be a 2D array with the same number of rows as variables.
stat (Statistic) – Estimator whose variance is to be estimated.
n (int, optional) – Number of bootstrap samples, by default 10
- Returns:
Estimate of estimator standard deviation.
- Return type:
float
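The bootstrap procedure described above can be sketched in a few lines. The interface of infovar's Statistic class is not shown in this section, so the sketch assumes a plain callable `stat(variables, targets)` returning a float; that callable, the function name, and the `seed` argument are hypothetical.

```python
import numpy as np


def bootstrap_sigma(variables, targets, stat, n=10, seed=None):
    """Hypothetical sketch: bootstrap estimate of an estimator's standard deviation."""
    rng = np.random.default_rng(seed)
    n_samples = variables.shape[0]
    estimates = []
    for _ in range(n):
        # Draw row indices with replacement to build a resampled dataset.
        rows = rng.integers(0, n_samples, size=n_samples)
        estimates.append(stat(variables[rows], targets[rows]))
    return float(np.std(estimates, ddof=1))


def pearson(variables, targets):
    """Example statistic: Pearson correlation of the first columns."""
    return np.corrcoef(variables[:, 0], targets[:, 0])[0, 1]


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = x + rng.normal(size=(200, 1))
sigma = bootstrap_sigma(x, y, pearson, n=50, seed=1)
```

The same row indices are applied to both arrays so that each resampled pair stays aligned.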
- class infovar.stats.resampling.Resampling
Bases: ABC
- abstract compute_sigma(variables: ndarray, targets: ndarray, stat: Statistic, **kwargs) -> float
Estimates the standard deviation of the estimator stat.
- Parameters:
variables (np.ndarray) – Variable data. Must be a 2D array.
targets (np.ndarray) – Target data. Must be a 2D array with the same number of rows as variables.
stat (Statistic) – Estimator whose variance is to be estimated.
- Returns:
Estimate of estimator standard deviation.
- Return type:
float
- class infovar.stats.resampling.Subsampling
Bases: Resampling
- compute_sigma(variables: ndarray, targets: ndarray, stat: Statistic, n: int = 5, min_samples: int = 20, min_subsets: int = 5, decades: float = 2) -> float
Estimates the standard deviation of the estimator stat using the approach proposed in Holmes, C. M., & Nemenman, I. (2019). It assumes that the variance of the estimator depends on the number of samples N as Var[stat](N) = B/N, where B is a parameter to be estimated that depends on the data distribution. Assuming this relation holds for the given estimator, the function computes the estimator's variance for several numbers of samples N by subsampling the dataset, which makes it possible to estimate B.
- Parameters:
variables (np.ndarray) – Variable data. Must be a 2D array.
targets (np.ndarray) – Target data. Must be a 2D array with the same number of rows as variables.
stat (Statistic) – Estimator whose variance is to be estimated.
n (int, optional) – Number of different subset sizes, by default 5.
min_samples (int, optional) – Minimum number of samples required for a subset, by default 20.
min_subsets (int, optional) – Minimum number of subsets for a given subset size, by default 5.
decades (float, optional) – Maximum orders of magnitude between the largest and smallest subset sizes, by default 2.
- Returns:
Estimate of estimator standard deviation.
- Return type:
float
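A minimal sketch of the Var = B/N idea follows. It splits a shuffled dataset into non-overlapping subsets of several sizes, measures the variance of the statistic across subsets of each size, averages size × variance to estimate B, and extrapolates to the full dataset. How infovar actually picks subset sizes and fits B is not stated here, so those details, the plain callable `stat(variables, targets)`, and the `seed` argument are assumptions of this sketch.

```python
import numpy as np


def subsampling_sigma(variables, targets, stat, n=5, min_samples=20,
                      min_subsets=5, decades=2.0, seed=None):
    """Hypothetical sketch: estimate sigma at full size assuming Var = B / N."""
    rng = np.random.default_rng(seed)
    n_total = variables.shape[0]
    largest = n_total // min_subsets  # keeps at least min_subsets subsets
    smallest = max(min_samples, int(np.ceil(largest / 10**decades)))
    if smallest > largest:
        raise ValueError("not enough samples for the requested subset sizes")
    # Log-spaced subset sizes, duplicates removed after rounding.
    sizes = np.unique(np.round(np.geomspace(smallest, largest, n)).astype(int))
    b_estimates = []
    for m in sizes:
        perm = rng.permutation(n_total)
        chunks = [perm[i * m:(i + 1) * m] for i in range(n_total // m)]
        values = [stat(variables[c], targets[c]) for c in chunks]
        # Var(m) = B / m, so each subset size yields one estimate of B.
        b_estimates.append(m * np.var(values, ddof=1))
    b_hat = float(np.mean(b_estimates))
    return float(np.sqrt(b_hat / n_total))


def target_mean(variables, targets):
    """Example statistic with known variance 1/N for unit-variance targets."""
    return float(np.mean(targets))


rng = np.random.default_rng(0)
v = rng.normal(size=(2000, 1))
t = rng.normal(size=(2000, 1))
sigma_full = subsampling_sigma(v, t, target_mean, seed=1)
```

For the sample mean of unit-variance data, the exact answer is 1/sqrt(N), which gives a convenient sanity check for the estimate.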
infovar.stats.statistics module
- class infovar.stats.statistics.GaussInfo
Bases: Statistic
Mutual information under multivariate Gaussian assumption.
- class infovar.stats.statistics.GaussInfoReparam
Bases: Statistic
Mutual information under multivariate Gaussian assumption after Gaussian reparameterization of marginals.
- class infovar.stats.statistics.MI
Bases: Statistic
Mutual information estimator.