sispca.utils

Classes

Kernel

Custom data class for more efficient storage of kernels.

Functions

`normalize_col`(x[, center, scale])	Z-score normalization.
`tr_cov`(x)	Calculate the trace of the covariance matrix of the hidden representation.
`gaussian_kernel`(x[, bw])	Calculate the Gaussian kernel matrix.
`delta_kernel`(x)	Calculate the delta kernel matrix.
`hsic_gaussian`(x, y[, bw])	Calculate the HSIC between two tensors using Gaussian kernel.
`hsic_linear`(x, y)	Calculate the HSIC between two tensors using linear kernel.
`gram_schmidt`(x)	Project the data to an orthogonal space using Gram-Schmidt process.
`slice_sparse_matrix`(K, index)	Slice a PyTorch sparse matrix K by the indices in index such that the result is K[index, :][:, index].

Module Contents

sispca.utils.normalize_col(x, center=True, scale=True)

Z-score normalization.

Parameters: x (2D tensor) – (n_sample, n_feature).
Returns: (x - x.mean(dim=0)) / x.std(dim=0)
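The formula above can be sketched in plain Python so that it runs without PyTorch. The helper name `zscore_columns` is hypothetical, and the handling of `center` and `scale` is an assumption based only on the signature; the sample standard deviation (dividing by n - 1) is used because that is the default of `torch.Tensor.std`.

```python
from statistics import mean, stdev


def zscore_columns(rows, center=True, scale=True):
    """Z-score each column of a list-of-rows matrix (hypothetical helper)."""
    cols = list(zip(*rows))  # transpose: one tuple per feature column
    mus = [mean(c) if center else 0.0 for c in cols]
    # stdev() divides by n - 1, like the default of torch.Tensor.std.
    # A constant column has zero spread and would raise ZeroDivisionError here.
    sds = [stdev(c) if scale else 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
z = zscore_columns(x)  # every column now has mean 0 and sample std 1
```

After normalization both columns carry the same values even though their original scales differ by a factor of ten.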

sispca.utils.tr_cov(x)

Calculate the trace of the covariance matrix of the hidden representation.

Parameters: x (2D tensor) – (n_sample, n_feature).
Returns: tr(x @ x.T)
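As a small illustration of the returned quantity, tr(x @ x.T) equals the sum of the squared entries of x, so the (n_sample, n_sample) product never has to be formed. The sketch below checks this identity in plain Python; both function names are hypothetical, and whether the library centers or rescales x first is not stated here, so neither is done.

```python
def trace_via_gram(rows):
    """Form the full Gram matrix x @ x.T, then sum its diagonal."""
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]
    return sum(gram[i][i] for i in range(len(rows)))


def trace_via_squares(rows):
    """Same value, computed as the sum of squared entries of x."""
    return sum(v * v for row in rows for v in row)


x = [[1.0, 2.0], [3.0, 4.0]]
slow = trace_via_gram(x)
fast = trace_via_squares(x)
```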

sispca.utils.gaussian_kernel(x, bw=None)

Calculate the Gaussian kernel matrix.

Parameters:

x (2D tensor) – (n_sample, n_feature).
bw – Bandwidth of the Gaussian kernel. If None, it is set to the median distance.

Returns: K (2D tensor): (n_sample, n_sample).
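A minimal sketch of a Gaussian kernel with a median-distance bandwidth, written in plain Python. The function name `gaussian_kernel_sketch` is hypothetical. Two details are assumptions, since the text above does not specify them: the kernel is taken as exp(-d^2 / (2 * bw^2)), and the median is taken over distinct pairs only.

```python
import math
from statistics import median


def gaussian_kernel_sketch(rows, bw=None):
    """Return the (n_sample, n_sample) Gaussian kernel as nested lists."""
    n = len(rows)
    # Pairwise Euclidean distances between samples.
    dist = [[math.dist(a, b) for b in rows] for a in rows]
    if bw is None:
        # Median heuristic over distinct pairs (assumed convention).
        bw = median(dist[i][j] for i in range(n) for j in range(i + 1, n))
    # Assumed parameterization: exp(-d^2 / (2 * bw^2)).
    return [[math.exp(-d * d / (2.0 * bw * bw)) for d in row] for row in dist]


x = [[0.0], [1.0], [3.0]]
K = gaussian_kernel_sketch(x)  # pairwise distances are 1, 2, 3, so bw = 2
```

The result is symmetric with ones on the diagonal, and entries shrink as samples move apart.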

sispca.utils.delta_kernel(x)

Calculate the delta kernel matrix.

Parameters: x (2D array) – Category labels. (n_sample, n_feature).

Returns: K (2D tensor): (n_sample, n_sample).
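The idea of a delta kernel can be sketched as follows: entry (i, j) is 1 when samples i and j carry the same label and 0 otherwise. The function name is hypothetical, and treating a row with several label columns as a single combined category is an assumption, not documented behavior.

```python
def delta_kernel_sketch(label_rows):
    """Return K with K[i][j] = 1.0 if rows i and j have identical labels."""
    keys = [tuple(row) for row in label_rows]  # one hashable key per sample
    return [[1.0 if a == b else 0.0 for b in keys] for a in keys]


labels = [["a"], ["b"], ["a"]]
K = delta_kernel_sketch(labels)
```

Samples 0 and 2 share the label "a", so they are the only off-diagonal pair with a nonzero entry.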

sispca.utils.hsic_gaussian(x, y, bw=None)

Calculate the HSIC between two tensors using Gaussian kernel.

Parameters:

x (2D tensor) – (n_sample, n_feature_1).
y (2D tensor) – (n_sample, n_feature_2).
bw – Bandwidth of the Gaussian kernel. If None, it is set to the median distance.

Returns:

HSIC between x and y.

Return type:

HSIC (float)
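A rough sketch of how an HSIC value can be computed from two kernel matrices, in plain Python. It uses the common biased estimator tr(K H L H) / (n - 1)^2, where H is the centering matrix; the normalization used by the library is not stated above, so that constant is an assumption, as are the helper names and the fixed bandwidth of 1.0.

```python
import math


def rbf(rows, bw):
    """Gaussian kernel with a fixed bandwidth (assumed parameterization)."""
    return [[math.exp(-math.dist(a, b) ** 2 / (2.0 * bw * bw)) for b in rows]
            for a in rows]


def center(K):
    """Double-center a kernel matrix, i.e. compute H @ K @ H."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]


def hsic_from_kernels(K, L):
    """Biased HSIC estimate: tr(K H L H) / (n - 1)^2 (assumed normalization)."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    # tr(K H L H) equals the elementwise inner product of the centered kernels.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


x = [[0.0], [1.0], [2.0], [3.0]]
y_dep = [[2.0 * v[0]] for v in x]  # deterministic function of x
y_const = [[5.0] for _ in x]       # carries no information about x
h_dep = hsic_from_kernels(rbf(x, 1.0), rbf(y_dep, 1.0))
h_const = hsic_from_kernels(rbf(x, 1.0), rbf(y_const, 1.0))
```

The dependent pair yields a strictly positive value, while the constant target yields zero because its centered kernel vanishes.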

sispca.utils.hsic_linear(x, y)

Calculate the HSIC between two tensors using linear kernel.

Parameters:

x (2D tensor) – (n_sample, n_feature_1).
y (2D tensor) – (n_sample, n_feature_2).

Returns:

HSIC between x and y.

Return type:

HSIC (float)
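With linear kernels K = x @ x.T and L = y @ y.T, the trace form tr(K H L H) reduces to the squared Frobenius norm of the cross-product of the column-centered inputs, so no (n_sample, n_sample) matrix is needed. The sketch below illustrates that shortcut in plain Python; the function names and the 1 / (n - 1)^2 scaling are assumptions rather than the library's documented behavior.

```python
def center_columns(rows):
    """Subtract each column's mean."""
    n = len(rows)
    mus = [sum(c) / n for c in zip(*rows)]
    return [[v - m for v, m in zip(r, mus)] for r in rows]


def hsic_linear_sketch(x, y):
    """||xc.T @ yc||_F^2 / (n - 1)^2 for column-centered xc, yc."""
    n = len(x)
    xc, yc = center_columns(x), center_columns(y)
    # Cross-product matrix of shape (n_feature_1, n_feature_2).
    cross = [[sum(a * b for a, b in zip(cx, cy)) for cy in zip(*yc)]
             for cx in zip(*xc)]
    return sum(v * v for row in cross for v in row) / (n - 1) ** 2


x = [[1.0], [2.0], [3.0]]
y = [[2.0], [4.0], [6.0]]
h = hsic_linear_sketch(x, y)  # sample covariance is 2, so h is 2 squared

y_uncorr = [[1.0], [-2.0], [1.0]]  # zero sample covariance with x
h0 = hsic_linear_sketch(x, y_uncorr)
```

With one feature on each side, the value is simply the squared sample covariance, which is why uncorrelated inputs score zero under a linear kernel even if they are dependent in some nonlinear way.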

sispca.utils.gram_schmidt(x)

Project the data to an orthogonal space using Gram-Schmidt process.

Parameters: x (2D tensor)
Returns: data with orthonormal columns.
Return type: x_new (2D tensor)
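For illustration, the orthonormalization can be sketched in plain Python. This version uses the modified Gram-Schmidt variant (projecting out of the running residual) for better numerical behavior. It assumes the columns are linearly independent, and it is not necessarily the procedure the library follows; the function name is hypothetical.

```python
import math


def gram_schmidt_sketch(rows):
    """Return a matrix whose columns are an orthonormal basis of x's columns."""
    cols = [list(c) for c in zip(*rows)]
    basis = []
    for v in cols:
        w = v[:]
        for q in basis:
            # Remove the component of the running residual along q.
            coef = sum(a * b for a, b in zip(w, q))
            w = [a - coef * b for a, b in zip(w, q)]
        # Assumes independent columns; a dependent column gives norm == 0.
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return [list(r) for r in zip(*basis)]  # back to (n_sample, n_feature)


x = [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
q = gram_schmidt_sketch(x)
```

The first column is only rescaled; each later column keeps just the part that is orthogonal to the ones before it.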

class sispca.utils.Kernel(target_type, Q=None, target_kernel=None)

Custom data class for more efficient storage of kernels.

Usage:

    kernel = Kernel('continuous', Q=target_data)  # K = Q @ Q.T
    kernel.realization()  # return the (n, n) kernel matrix
    kernel.subset(idx)    # return the sub-kernel matrix of shape (m, m), where m = len(idx)
    kernel.xtKx(x)        # return x.T @ K @ x

Parameters:

target_type (str) – One of ['continuous', 'categorical', 'identity', 'custom']. The type of the target data. If 'custom', the target_kernel should be provided.
Q (int or 2D tensor) – If int, Q is the dimension of the identity matrix. If 2D tensor, Q is the decomposed matrix (n_obs, n_var) where K = Q @ Q.T.
target_kernel (2D tensor) – The pre-calculated kernel matrix of shape (n_obs, n_obs). Applied when target_type is ‘custom’. Will be stored as a sparse tensor.

target_type

Q = None

target_kernel = None

shape

_rank = None

_sanity_check()

_shape()

realization()

xtKx(x)

subset(idx): Helper function to extract batched inputs for training. idx (tensor) is the index of the batch.

rank(): Calculate the rank of the kernel matrix

sispca.utils.slice_sparse_matrix(K: torch.sparse_coo_tensor, index: torch.Tensor)

Slice a PyTorch sparse matrix K by the indices in index such that the result is K[index, :][:, index].

Parameters:

K – Input sparse matrix (torch.sparse_coo_tensor).
index – 1D tensor of row/column indices to slice (torch.Tensor).

Returns:

A new sparse matrix corresponding to K[index, :][:, index].

Return type:

torch.sparse_coo_tensor
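The slicing rule can be sketched on a coordinate-format (COO) matrix held as plain Python lists: keep only entries whose row and column both appear in index, then renumber them by their position in index. This is an illustration of the idea, not the library's implementation; the function name and list-based representation are hypothetical, and the entries of index are assumed to be unique.

```python
def slice_coo(rows, cols, vals, index):
    """Return the COO entries and shape of K[index, :][:, index]."""
    # Map each kept original index to its position in the sliced matrix.
    pos = {old: new for new, old in enumerate(index)}
    entries = [(pos[r], pos[c], v)
               for r, c, v in zip(rows, cols, vals)
               if r in pos and c in pos]
    return entries, (len(index), len(index))


# 4 x 4 sparse matrix with four stored entries.
rows = [0, 1, 2, 3]
cols = [0, 2, 1, 3]
vals = [1.0, 5.0, 5.0, 7.0]

entries, shape = slice_coo(rows, cols, vals, [2, 1])
```

Here the original entry at (1, 2) lands at (1, 0) and the one at (2, 1) lands at (0, 1), because index lists 2 before 1; the entries at (0, 0) and (3, 3) are dropped.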