Model
- class dataset2vec.model.Dataset2Vec(config: ~dataset2vec.config.Dataset2VecConfig = Dataset2VecConfig(activation_cls=<class 'torch.nn.modules.activation.ReLU'>, f_dense_hidden_size=32, f_res_hidden_size=32, f_res_n_layers=3, f_block_repetitions=7, f_out_size=32, g_layers_sizes=[32, 16, 8], h_dense_hidden_size=16, h_res_hidden_size=16, h_res_n_layers=3, h_block_repetitions=3, output_size=16), optimizer_config: ~dataset2vec.config.OptimizerConfig = OptimizerConfig(gamma=1, optimizer_cls=<class 'torch.optim.adam.Adam'>, learning_rate=0.0001, weight_decay=0.0001))
Bases: LightningBase
Dataset2Vec meta-feature extractor implemented using torch.
- calculate_loss(labels: Tensor, similarities: Tensor) Tensor
Calculates the loss, which corresponds to the cross-entropy of classifying whether two datasets originate from the same source.
- Parameters:
labels (Tensor) – True labels of the data. Can be either discrete or continuous.
similarities (Tensor) – Similarities predicted by the model, which are compared against the labels.
- Returns:
value of the loss function.
- Return type:
Tensor
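As a rough illustration of the cross-entropy described above, the following sketch computes a mean binary cross-entropy over pairs of datasets. It is not the library's implementation. The function name `pairwise_cross_entropy`, the `eps` clipping, and the mean reduction are assumptions made for this example, and plain Python floats stand in for tensors to keep it dependency-free.

```python
import math


def pairwise_cross_entropy(labels, similarities, eps=1e-12):
    """Mean binary cross-entropy between labels and predicted similarities.

    Hypothetical helper: the clipping constant and the mean reduction are
    assumptions, not taken from Dataset2Vec.calculate_loss.
    """
    if not labels or len(labels) != len(similarities):
        raise ValueError("labels and similarities must be non-empty and equally long")
    total = 0.0
    for label, sim in zip(labels, similarities):
        # Clip so that log() never receives exactly 0.
        sim = min(max(sim, eps), 1.0 - eps)
        total -= label * math.log(sim) + (1.0 - label) * math.log(1.0 - sim)
    return total / len(labels)


# First pair comes from the same source (label 1), second pair does not (label 0).
confident = pairwise_cross_entropy([1.0, 0.0], [0.9, 0.1])
mistaken = pairwise_cross_entropy([1.0, 0.0], [0.1, 0.9])
```

Here `confident` is lower than `mistaken`, since the loss grows when the predicted similarity disagrees with the label. Labels need not be exactly 0 or 1; the same expression accepts continuous values in between.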
- forward(X: Tensor, y: Tensor) Any
Generates an encoding of the dataset. The size of the output does not depend on the dimensionality of the data. The formula for the encoding is the following:
\[\varphi(x) = h\left( \frac{1}{|M||T|}\sum_{m \in M, t \in T} g\left( \frac{1}{N}\sum_{i=1, \dots, N}f(X_{i, m}, y_{i, t}) \right) \right)\]
\(f\) is the network responsible for the interdependency encoding, \(g\) generates joint distribution representations, and \(h\) generates the final encoding of the dataset. \(X_{i, m}\) and \(y_{i, t}\) are the \(m\)-th feature and \(t\)-th target of the \(i\)-th observation of the dataset. \(M\) and \(T\) are the sets of feature and target columns, \(|M|\) and \(|T|\) are their cardinalities, and \(N\) is the number of observations.
- Parameters:
X (Tensor) – Feature matrix
y (Tensor) – Targets matrix
- Returns:
Encoding of the input dataset with output_size dimensionality
- Return type:
Tensor
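The nested averaging in the encoding formula can be sketched in plain Python as follows. This is only an illustration of the aggregation structure, not the library's code: `f`, `g`, and `h` below are hand-written stand-ins for the learned networks, and `mean_vectors` and `encode` are hypothetical helper names.

```python
from statistics import fmean


def f(x, y):
    # Stand-in for the interdependency network: one (feature, target) value pair -> vector.
    return [x * y, x + y]


def g(v):
    # Stand-in for the network acting on each per-column-pair average.
    return [v[0] + v[1], v[0] - v[1]]


def h(v):
    # Stand-in for the network producing the final encoding.
    return [2.0 * v[0], 0.5 * v[1]]


def mean_vectors(vectors):
    # Element-wise mean of equally sized vectors.
    return [fmean(column) for column in zip(*vectors)]


def encode(X, Y):
    """Apply h(mean over (m, t) of g(mean over i of f(X[i][m], Y[i][t])))."""
    n_rows, n_features, n_targets = len(X), len(X[0]), len(Y[0])
    pair_codes = []
    for m in range(n_features):
        for t in range(n_targets):
            inner = mean_vectors([f(X[i][m], Y[i][t]) for i in range(n_rows)])
            pair_codes.append(g(inner))
    return h(mean_vectors(pair_codes))


# Datasets with different numbers of rows, features, and targets.
narrow = encode([[1.0], [2.0], [3.0]], [[0.0], [1.0], [1.0]])
wide = encode([[1.0, 4.0, 0.5], [2.0, 5.0, 0.5]], [[0.0, 1.0], [1.0, 0.0]])
```

Both `narrow` and `wide` have the same length, because every average collapses a dimension of the input (rows, then feature-target pairs) before `h` is applied. This is why the size of the encoding does not depend on the shape of the dataset.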