Introduction to ContinuousHandler

[1]:
import os
import sys
from typing import Tuple

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

sys.path.append(os.path.join(os.path.abspath(""), ".."))

from infovar import ContinuousHandler, StandardGetter, ContinuousHelper

Context

Imagine you receive a box with two displays, each showing a numerical value. The box also has three knobs, each of which can be turned up or down.

As it happens, you’re not the only one to have received such a box: six of your colleagues have each received a similar one. One detail sets their boxes apart: on each of them, one or more knobs are hidden, so the values of these knobs can be neither read nor changed. No two boxes are identical:

- 3 boxes with one of the three knobs hidden,
- 3 boxes with two of the three knobs hidden,
- your box, with all knobs visible.
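The seven boxes correspond to the seven non-empty subsets of visible knobs. A minimal sketch of that enumeration, where the names `knobs` and `boxes` are purely illustrative:

```python
from itertools import combinations

knobs = ("x1", "x2", "x3")

# Every non-empty subset of visible knobs corresponds to one box.
boxes = [
    visible
    for size in range(1, len(knobs) + 1)
    for visible in combinations(knobs, size)
]

for visible in boxes:
    hidden = tuple(k for k in knobs if k not in visible)
    print(f"visible: {visible}, hidden: {hidden}")

print(len(boxes))  # 3 + 3 + 1 = 7 boxes in total
```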

When you turn one of the knobs and put your ear to the box, you’ll notice that the hidden knobs also turn, apparently at random. Another important detail: even on the box with all knobs visible, setting the same configuration twice never gives exactly the same values on the displays, although they are generally quite close.

Boxes

What you don’t know is that these boxes have been sent to you by an impish statistician. The behavior of these boxes is actually governed by a simple non-deterministic mathematical formula:

\[\begin{split}\begin{array}{ll}y_1 = (x_1-x_2)^2 + x_3 + \varepsilon_1, & \quad\varepsilon_1\sim\mathcal{N}(0,\, 0.05^2)\\y_2 = x_3^2 + \varepsilon_2, & \quad\varepsilon_2\sim\mathcal{N}(0,\, 0.1^2)\end{array}\end{split}\]

where \(x_i\) is knob number \(i\) and \(y_j\) is display number \(j\).

[2]:
def function(
    x1: np.ndarray, x2: np.ndarray, x3: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """
    xi = knob number i (between -1 and 1)
    """
    assert x1.shape == x2.shape == x3.shape
    assert (
        (np.abs(x1) <= 1).all() and (np.abs(x2) <= 1).all() and (np.abs(x3) <= 1).all()
    )

    y1 = (x1 - x2) ** 2 + x3 + np.random.normal(0, 0.05, x1.shape)
    y2 = x3 ** 2 + np.random.normal(0, 0.1, x1.shape)
    return y1, y2
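
To see the non-determinism in action, the sketch below reads one fixed knob configuration five times. It restates the formula of `function` in a helper named `noisy_box` (a name introduced here only for illustration) that takes an explicit, seeded random generator so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make this sketch reproducible


def noisy_box(x1, x2, x3, rng):
    """Same formula as `function` above, with an explicit random generator."""
    y1 = (x1 - x2) ** 2 + x3 + rng.normal(0, 0.05, np.shape(x1))
    y2 = x3 ** 2 + rng.normal(0, 0.1, np.shape(x1))
    return y1, y2


# One fixed knob configuration, read 5 times in a row.
x1 = np.full(5, 0.5)
x2 = np.full(5, -0.25)
x3 = np.full(5, 0.1)

y1, y2 = noisy_box(x1, x2, x3, rng)
print(y1)  # scattered around (0.5 + 0.25)**2 + 0.1 = 0.6625
print(y2)  # scattered around 0.1**2 = 0.01
```

The five readings differ from one another but stay close to the noiseless values, which is the behavior described for the boxes.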

Your goal, and that of each of your colleagues, is to quantify the influence of the knobs on the displayed values. To do this, each of you will record the values shown on the displays together with the values of the knobs you can see. Knob values are sampled uniformly between -1 and 1.
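
One common way to quantify such an influence is mutual information. The sketch below uses a plain histogram (plug-in) estimator written with NumPy only. It is an assumption made for illustration and not necessarily the estimator used by the package; the helper name `histogram_mi` and the choice of 20 bins are hypothetical.

```python
import numpy as np


def histogram_mi(x, y, bins=20):
    """Plug-in mutual information estimate (in nats) from a 2D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    mask = p_xy > 0  # empty cells contribute nothing to the sum
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))


rng = np.random.default_rng(0)
n = 50_000
x1, x2, x3 = rng.uniform(-1, 1, (3, n))
y1 = (x1 - x2) ** 2 + x3 + rng.normal(0, 0.05, n)
y2 = x3 ** 2 + rng.normal(0, 0.1, n)

for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    print(name, histogram_mi(x, y1), histogram_mi(x, y2))
```

Since \(y_2\) depends only on \(x_3\), the estimates for \(x_1\) and \(x_2\) against \(y_2\) should stay near zero, up to the small positive bias of histogram estimators.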

Getter

In this package, a “getter” is a function that supplies data meeting certain constraints. The StandardGetter class wraps samples that are already available and returns a given number of them through its get method. This method is then passed to the handler.

[3]:
n_samples = 500_000
x1 = np.random.uniform(-1, 1, n_samples)
x2 = np.random.uniform(-1, 1, n_samples)
x3 = np.random.uniform(-1, 1, n_samples)

y1, y2 = function(x1, x2, x3)

getter = StandardGetter(
    ["x1", "x2", "x3"],
    ["y1", "y2"],
    np.column_stack((x1, x2, x3)),
    np.column_stack((y1, y2)),
)
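
To make the idea of "data meeting certain constraints" concrete, here is a hypothetical stand-in for a getter. It is not the signature of `StandardGetter.get`, which is not shown in this notebook: the function `toy_get` and its parameters are assumptions made for illustration. It keeps the samples whose display values fall inside given bounds and returns at most `max_samples` of them.

```python
import numpy as np


def toy_get(x, y, y_bounds, max_samples, rng):
    """Hypothetical getter: rows whose targets fall inside `y_bounds`.

    `y_bounds` holds one (low, high) pair per column of `y`.
    """
    low, high = np.asarray(y_bounds, dtype=float).T
    inside = np.all((y >= low) & (y <= high), axis=1)
    idx = np.flatnonzero(inside)
    if idx.size > max_samples:
        # Too many matches: keep a random subset without replacement.
        idx = rng.choice(idx, size=max_samples, replace=False)
    return x[idx], y[idx]


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (10_000, 3))
y = np.column_stack((
    (x[:, 0] - x[:, 1]) ** 2 + x[:, 2] + rng.normal(0, 0.05, 10_000),
    x[:, 2] ** 2 + rng.normal(0, 0.1, 10_000),
))

x_win, y_win = toy_get(x, y, [(0.0, 0.2), (0.0, 0.2)], max_samples=1_000, rng=rng)
print(x_win.shape, y_win.shape)
```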

Continuous handler

The ContinuousHandler is a tool for statistically analyzing the influence of the knobs on the values shown on the displays. This class also manages the storage of, and access to, these results.

[ ]:
handler = ContinuousHandler()

handler.set_path(os.path.join("handlers", "data"))
handler.set_getter(getter.get)

handler.overview()
[ ]:
# Remove existing saves if any
handler.remove(None, "y1")
handler.remove(None, "y2")

handler.overview()

First results

[ ]:
a1 = np.min(getter.y[:, 0])
b1 = np.max(getter.y[:, 0])

a2 = np.min(getter.y[:, 1])
b2 = np.max(getter.y[:, 1])

print(f"[{a1}, {b1}], [{a2}, {b2}]")
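
As a sanity check on the printed bounds, the noiseless ranges of the two displays follow directly from the formula, assuming every knob lies in \([-1, 1]\):

\[0 \le (x_1 - x_2)^2 \le 4 \quad\text{and}\quad -1 \le x_3 \le 1 \quad\Longrightarrow\quad -1 \le y_1 - \varepsilon_1 \le 5, \qquad 0 \le y_2 - \varepsilon_2 \le 1\]

The observed bounds should therefore fall close to \([-1, 5]\) and \([0, 1]\), typically extending no more than a few noise standard deviations beyond them. The upper bound of \(y_1\) may fall short of 5, since reaching it requires all three knobs to sit at an extreme at once.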
[ ]:
settings = {
    "statistics": ["mi"],
    "windows": {
        "features": ["y1", "y2"],         # Sliding window features
        "bounds": [[a1, b1], [a2, b2]],   # Sliding window bounds
        "bounds_include_windows": True,   # The bounds enclose the extreme windows (they are not their centers)
        "scale": "linear",                # No logarithmic scale
        "length": [0.2, 0.2],             # A sliding window has a length of 0.2
        "points": 25,                     # Number of sliding windows used
    },
    "min_samples": 200,
    "max_samples": 1_000
}

for t in ["y1", "y2"]:
    for v in ["x1", "x2", "x3"]:
        handler.overwrite(v, t, settings)
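
The sketch below shows one plausible reading of the window settings along a single feature. The placement rule is an assumption inferred from the comments in the settings, not taken from the package's code, and the helper `window_centers` is hypothetical. With `bounds_include_windows` set to `True`, the outermost windows are assumed to end exactly at the bounds, so their centers sit half a window length inside.

```python
import numpy as np


def window_centers(low, high, length, points, bounds_include_windows=True):
    """Assumed linear placement of sliding-window centers (not infovar's code)."""
    if bounds_include_windows:
        # The outermost windows end exactly at the bounds.
        return np.linspace(low + length / 2, high - length / 2, points)
    # Otherwise the bounds are the centers of the outermost windows.
    return np.linspace(low, high, points)


# Illustrative bounds; the notebook uses the observed [a1, b1] and [a2, b2].
centers = window_centers(-1.0, 5.0, length=0.2, points=25)
edges = np.column_stack((centers - 0.1, centers + 0.1))
print(edges[0], edges[-1])
```

Under this reading, 25 windows of length 0.2 spread over a range of about 6 do not overlap, so a larger `length` or more `points` would be needed for a smoother map.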
[ ]:
data = handler.read("x1", "y1", ["y1", "y2"])
print(data.keys())
print(data["mi"].keys())
[ ]:
plt.figure()

y1, y2 = data["mi"]["coords"]
samples = data["mi"]["samples"]

plt.pcolormesh(y1, y2, samples.T, cmap="Oranges", norm=LogNorm(1, None))
plt.colorbar()

plt.xlabel("$y_1$")
plt.ylabel("$y_2$")
plt.title("Number of samples per sliding window")

plt.show()
[10]:
vmax = {
    "y1": 0,
    "y2": 0
}
for t in ["y1", "y2"]:
    for v in ["x1", "x2", "x3"]:
        data = handler.read(v, t, ["y1", "y2"])["mi"]
        vmax[t] = max(vmax[t], np.nanmax(data["data"]))
[ ]:
for t in ["y1", "y2"]:

    plt.figure(figsize=(3*6.4, 4.8))
    for i, v in enumerate(["x1", "x2", "x3"], 1):
        data = handler.read(v, t, ["y1", "y2"])["mi"]

        y1, y2 = data["coords"]
        mi = data["data"]

        plt.subplot(1, 3, i)

        plt.pcolormesh(y1, y2, mi.T, cmap="inferno", vmin=0, vmax=vmax[t])
        plt.colorbar()

        plt.xlabel("$y_1$")
        plt.ylabel("$y_2$")
        plt.title(f"Information map for {v} and {t}")

    plt.show()

Comparison with other metrics

[12]:
# Work in progress

Influence of combined knobs

[13]:
# Work in progress

Most informative knob selection

[14]:
# Work in progress