sispca.utils

Classes

Kernel

Custom data class for more efficient storage of kernels.

Functions

normalize_col(x[, center, scale])

Z-score normalization.

tr_cov(x)

Calculate the trace of the covariance matrix of the hidden representation.

gaussian_kernel(x[, bw])

Calculate the Gaussian kernel matrix.

delta_kernel(x)

Calculate the delta kernel matrix.

hsic_gaussian(x, y[, bw])

Calculate the HSIC between two tensors using Gaussian kernel.

hsic_linear(x, y)

Calculate the HSIC between two tensors using linear kernel.

gram_schmidt(x)

Project the data to an orthogonal space using Gram-Schmidt process.

slice_sparse_matrix(K, index)

Slice a PyTorch sparse matrix K by the indices in index such that the result is K[index, :][:, index].

Module Contents

sispca.utils.normalize_col(x, center=True, scale=True)

Z-score normalization.

Parameters:

x (2D tensor) – (n_sample, n_feature).

Returns:

(x - x.mean(dim=0)) / x.std(dim=0)
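The return expression above can be sketched as a standalone function. This is a minimal illustration, not the library's implementation: the name `zscore_columns` is hypothetical, and the way `center` and `scale` interact is an assumption. Note that `torch.Tensor.std` uses the unbiased (n - 1) estimator by default.

```python
import torch


def zscore_columns(x, center=True, scale=True):
    """Hypothetical stand-in for a column-wise z-score (not sispca's code)."""
    if center:
        # Subtract the per-feature mean (computed over samples, dim=0).
        x = x - x.mean(dim=0, keepdim=True)
    if scale:
        # Divide by the per-feature standard deviation (unbiased by default).
        x = x / x.std(dim=0, keepdim=True)
    return x


x = torch.tensor([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
z = zscore_columns(x)
```

After this transformation every column has mean 0 and unit standard deviation, regardless of the original scale of the feature.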

sispca.utils.tr_cov(x)

Calculate the trace of the covariance matrix of the hidden representation.

Parameters:

x (2D tensor) – (n_sample, n_feature).

Returns:

tr(x @ x.T)
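The documented return value `tr(x @ x.T)` does not require building the (n_sample, n_sample) product. The sketch below checks, on random data, that it equals `tr(x.T @ x)` and the sum of squared entries. Whether the library applies any normalization (such as dividing by the number of samples) is not stated here, so none is assumed.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3, dtype=torch.float64)
x = x - x.mean(dim=0)  # center the columns, as a covariance would require

trace_big = torch.trace(x @ x.T)    # trace of an (n_sample, n_sample) matrix
trace_small = torch.trace(x.T @ x)  # trace of an (n_feature, n_feature) matrix
sum_squares = (x ** 2).sum()        # no matrix product at all
```

All three values agree because the trace is invariant under cyclic permutation of a matrix product.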

sispca.utils.gaussian_kernel(x, bw=None)

Calculate the Gaussian kernel matrix.

Parameters:
  • x (2D tensor) – (n_sample, n_feature).

  • bw – Bandwidth of the Gaussian kernel. If None, it is set to the median distance.

Returns:

K (2D tensor): (n_sample, n_sample).
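As an illustration of what a Gaussian kernel with a median-distance bandwidth might look like, here is a minimal sketch. The function name `gaussian_kernel_sketch`, the exact form `exp(-d^2 / (2 * bw^2))`, and the choice to take the median over non-zero pairwise distances are all assumptions; the library may use a different convention.

```python
import torch


def gaussian_kernel_sketch(x, bw=None):
    """Hypothetical Gaussian kernel; conventions here are assumptions."""
    # Squared Euclidean distance between every pair of rows, shape (n, n).
    diff = x[:, None, :] - x[None, :, :]
    d2 = (diff ** 2).sum(dim=-1)
    if bw is None:
        # Median heuristic over the non-zero pairwise distances (assumed).
        d = d2.sqrt()
        bw = d[d > 0].median()
    return torch.exp(-d2 / (2 * bw ** 2))


x = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
K = gaussian_kernel_sketch(x)
```

The result is symmetric with ones on the diagonal, and entries decay toward zero as samples move farther apart.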

sispca.utils.delta_kernel(x)

Calculate the delta kernel matrix.

Parameters:

x (2D array) – Category labels. (n_sample, n_feature).

Returns:

K (2D tensor): (n_sample, n_sample).
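A delta kernel assigns 1 to pairs of samples that share a label and 0 otherwise. The sketch below illustrates this for a single label column. Treating two samples as matching only when all label columns agree is an assumption about the multi-column case, not confirmed behavior of `delta_kernel`.

```python
import torch

# Four samples with one categorical feature each, shape (n_sample, 1).
labels = torch.tensor([[0], [1], [0], [2]])

# Compare every pair of rows; a pair matches only if all columns agree (assumed).
K = (labels[:, None, :] == labels[None, :, :]).all(dim=-1).float()
```

Here samples 0 and 2 share the label 0, so `K[0, 2]` and `K[2, 0]` are 1 while every other off-diagonal entry is 0.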

sispca.utils.hsic_gaussian(x, y, bw=None)

Calculate the HSIC between two tensors using Gaussian kernel.

Parameters:
  • x (2D tensor) – (n_sample, n_feature_1).

  • y (2D tensor) – (n_sample, n_feature_2).

  • bw – Bandwidth of the Gaussian kernel. If None, it is set to the median distance.

Returns:

HSIC between x and y.

Return type:

HSIC (float)

sispca.utils.hsic_linear(x, y)

Calculate the HSIC between two tensors using linear kernel.

Parameters:
  • x (2D tensor) – (n_sample, n_feature_1).

  • y (2D tensor) – (n_sample, n_feature_2).

Returns:

HSIC between x and y.

Return type:

HSIC (float)
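For context, a common empirical HSIC estimator is tr(K H L H) / (n - 1)^2, where K and L are kernel matrices for x and y and H is the centering matrix. The sketch below uses that estimator with linear kernels and shows that it reduces to a squared Frobenius norm of the cross-product of centered data. The helper name `hsic_from_kernels` and the (n - 1)^2 normalization are assumptions; the library's scaling may differ.

```python
import torch


def hsic_from_kernels(K, L):
    """Hypothetical empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    # Centering matrix H = I - (1/n) * ones.
    H = torch.eye(n, dtype=K.dtype) - torch.full((n, n), 1.0 / n, dtype=K.dtype)
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2


torch.manual_seed(0)
x = torch.randn(50, 3, dtype=torch.float64)
y = torch.randn(50, 2, dtype=torch.float64)

# Linear kernels: K = x @ x.T and L = y @ y.T, both of shape (n, n).
hsic_full = hsic_from_kernels(x @ x.T, y @ y.T)

# Equivalent shortcut that never forms an (n, n) matrix.
xc = x - x.mean(dim=0)
yc = y - y.mean(dim=0)
hsic_fast = (xc.T @ yc).pow(2).sum() / (x.shape[0] - 1) ** 2
```

The shortcut works only for linear kernels. A Gaussian kernel has no finite-dimensional factorization of this kind, so the full kernel matrices are needed there.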

sispca.utils.gram_schmidt(x)

Project the data to an orthogonal space using Gram-Schmidt process.

Parameters:

x (2D tensor)

Returns:

data with orthonormal columns.

Return type:

x_new (2D tensor)
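The Gram-Schmidt process can be sketched as follows: each column has its projections onto the previously processed columns removed, and is then normalized. This is a textbook (modified) Gram-Schmidt under the assumption that the columns of x are linearly independent. The name `gram_schmidt_sketch` is hypothetical and the code is not taken from the library.

```python
import torch


def gram_schmidt_sketch(x):
    """Hypothetical modified Gram-Schmidt over the columns of x."""
    basis = []
    for j in range(x.shape[1]):
        v = x[:, j].clone()
        for q in basis:
            # Remove the component of v along the already-orthonormal column q.
            v = v - (q @ v) * q
        basis.append(v / v.norm())
    return torch.stack(basis, dim=1)


torch.manual_seed(0)
x = torch.randn(6, 3, dtype=torch.float64)
x_new = gram_schmidt_sketch(x)
gram = x_new.T @ x_new  # should be close to the identity matrix
```

The output spans the same column space as the input, so projecting x onto the new columns reproduces x.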

class sispca.utils.Kernel(target_type, Q=None, target_kernel=None)

Custom data class for more efficient storage of kernels.

Usage:

kernel = Kernel('continuous', Q=target_data)  # K = Q @ Q.T
kernel.realization()  # return the (n, n) kernel matrix
kernel.subset(idx)  # return the sub-kernel matrix of shape (m, m) where m = len(idx)
kernel.xtKx(x)  # return x.T @ K @ x

Parameters:
  • target_type (str) – One of ['continuous', 'categorical', 'identity', 'custom']. The type of the target data. If 'custom', target_kernel must be provided.

  • Q (int or 2D tensor) – If int, Q is the dimension of the identity matrix. If 2D tensor, Q is the decomposed matrix (n_obs, n_var) where K = Q @ Q.T.

  • target_kernel (2D tensor) – The pre-calculated kernel matrix of shape (n_obs, n_obs). Applied when target_type is ‘custom’. Will be stored as a sparse tensor.

target_type
Q = None
target_kernel = None
shape
_rank = None
_sanity_check()
_shape()
realization()
xtKx(x)
subset(idx)

Helper function to extract batched inputs for training. idx (tensor) is the index of the batch.

rank()

Calculate the rank of the kernel matrix.
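To illustrate why storing the factor Q instead of the full kernel can be more efficient, the sketch below compares the dense and factored computations for K = Q @ Q.T. It uses plain tensors rather than the `Kernel` class, and it is an assumption that the class exploits the factorization in this way.

```python
import torch

torch.manual_seed(0)
n, d, p = 200, 4, 3
Q = torch.randn(n, d, dtype=torch.float64)  # factor with K = Q @ Q.T
x = torch.randn(n, p, dtype=torch.float64)

# Dense route: materialize the (n, n) kernel first.
K = Q @ Q.T
dense = x.T @ K @ x

# Factored route: x.T @ K @ x = (Q.T @ x).T @ (Q.T @ x); only (d, p) intermediates.
QtX = Q.T @ x
factored = QtX.T @ QtX

# A batch sub-kernel follows the same pattern: K[idx][:, idx] = Q[idx] @ Q[idx].T.
idx = torch.tensor([3, 10, 42])
sub_dense = K[idx][:, idx]
sub_factored = Q[idx] @ Q[idx].T
```

The factored route needs O(n * d) memory instead of O(n^2), which matters when n_obs is large and the target has few variables.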

sispca.utils.slice_sparse_matrix(K: torch.sparse_coo_tensor, index: torch.Tensor)

Slice a PyTorch sparse matrix K by the indices in index such that the result is K[index, :][:, index].

Parameters:
  • K – Input sparse matrix (torch.sparse_coo_tensor).

  • index – 1D tensor of row/column indices to slice (torch.Tensor).

Returns:

A new sparse matrix (torch.sparse_coo_tensor) corresponding to K[index, :][:, index].
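One way to slice a square sparse COO matrix by the same indices on both axes is sketched below. It assumes `index` contains unique entries and is not the library's implementation; the name `slice_sparse_sketch` is hypothetical.

```python
import torch


def slice_sparse_sketch(K, index):
    """Hypothetical K[index, :][:, index] for a square sparse COO tensor."""
    K = K.coalesce()
    rows, cols = K.indices()
    vals = K.values()
    # position[i] is the new location of old index i, or -1 if i is dropped.
    position = torch.full((K.shape[0],), -1, dtype=torch.long)
    position[index] = torch.arange(index.numel())
    new_rows, new_cols = position[rows], position[cols]
    # Keep only entries whose row and column both survive the slice.
    keep = (new_rows >= 0) & (new_cols >= 0)
    m = index.numel()
    return torch.sparse_coo_tensor(
        torch.stack([new_rows[keep], new_cols[keep]]), vals[keep], (m, m)
    )


dense = torch.tensor([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [4.0, 0.0, 5.0]])
index = torch.tensor([2, 0])
sliced = slice_sparse_sketch(dense.to_sparse(), index)
```

Remapping indices this way avoids converting the matrix to dense form, so the cost scales with the number of stored entries rather than with n^2.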