#81: Decoupling the dataset from the ML logic and a lot more (!47) · Merge requests · AI-Augmented-Design / aixd

Alessandro Maissen requested to merge 81-decouple-ml-and-dataset into master Oct 09, 2023

This MR aims to decouple the Dataset from the ML-related functionality. To achieve this, major refactoring and enhancements were necessary in several places. These are:

Complete revision of the DataModule
Complete revision of the checkpoint logic in CondAEModel to store datamodule parameters (e.g., batch_size) and other extra parameters (e.g., fitting parameters such as the maximum number of epochs)
Created a dependency between DataModule and CondAEModel: if the model was created from the DataModule (in particular with CondAEModel.from_datamodule(...)), the DataModule can be restored from the model, i.e., from the checkpoint, with its fitted transformations and normalisation but without the training data
Complete revision of the per-data-block normalisation, as this normalisation needs to be picklable
New DataBlock called TransformableDataBlock to keep track of transformed data objects and their dimensions. This should finally help to solve many of the problems we had with categorical variables.
Added tests for the ML model and the DataModule. These are far from complete, but better than having no tests.
Adjusted the Semiramis example to the new workflow
Other stuff: Removed a lot of unused state, fixed some minor bugs in the categorical encoder
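
To illustrate why the normalisation has to be picklable, here is a minimal sketch of the checkpoint round trip described above. It is not the aixd implementation: MinMaxNormaliser, save_checkpoint and restore_from_checkpoint are hypothetical names, and the checkpoint is modelled as a pickled dict rather than an actual model checkpoint file. The point is only that the fitted normalisation state and the datamodule parameters travel with the checkpoint, so new data can be normalised without access to the training data.

```python
import pickle


class MinMaxNormaliser:
    """Hypothetical per-block normaliser; its fitted state is two plain floats."""

    def __init__(self, low=0.0, high=1.0):
        self.low = low
        self.high = high

    def fit(self, values):
        # Learn the range from the training data.
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.high - self.low) or 1.0  # guard against constant data
        return [(v - self.low) / span for v in values]

    def state(self):
        # Plain numbers only, so the state is trivially picklable.
        return {"low": self.low, "high": self.high}


def save_checkpoint(normalisers, datamodule_params, extra_params):
    """Bundle fitted state and parameters into bytes (stand-in for a checkpoint file)."""
    payload = {
        "normalisers": {name: n.state() for name, n in normalisers.items()},
        "datamodule_params": datamodule_params,
        "extra_params": extra_params,
    }
    return pickle.dumps(payload)


def restore_from_checkpoint(blob):
    """Rebuild the normalisers and parameters without touching the training data."""
    payload = pickle.loads(blob)
    normalisers = {
        name: MinMaxNormaliser(**state)
        for name, state in payload["normalisers"].items()
    }
    return normalisers, payload["datamodule_params"], payload["extra_params"]


# Fit on training data, then write the checkpoint.
training_data = {"height": [2.0, 4.0, 6.0]}
fitted = {name: MinMaxNormaliser().fit(vals) for name, vals in training_data.items()}
blob = save_checkpoint(fitted, {"batch_size": 32}, {"max_epochs": 100})

# Later, restore from the checkpoint alone and normalise unseen values.
restored, dm_params, extra = restore_from_checkpoint(blob)
new_values = restored["height"].transform([3.0, 5.0])
```

Keeping the fitted state as plain numbers is one way to satisfy the picklability requirement; closures or lambdas captured inside a normaliser are a typical reason pickling fails.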

Deferred

Some things are not tackled in this part of the MR. This includes adjustments to the sampler and the plotter, so they might be in a buggy state after the merge. In a second step, @sluis will take care of the sampler, while @alessandro.maissen revises the plotter. This MR already adds comments at locations where further revision is required.

Breaking Changes

Many. Existing examples need to be updated to the new workflow; see the Semiramis example.

Edited Oct 13, 2023 by Alessandro Maissen
