class Dataset - revise/clarify mechanisms between the data object, the data files and in-memory instances
Issues:

- The `Dataset` class seems to have two places where the data (design parameters and/or performance attributes) are stored or loaded to, e.g. `Dataset.data['design_parameters']` and `Dataset.design_par.data`. What is the difference between them? (See the sketch after this list for one possible way to keep a single source of truth.)
- Currently `dataset.import_data(flag_fromscratch=True)` deletes the data files. This is counterintuitive: when I call `import_data`, there should be no option to delete data. Maybe these two functionalities should be separated.
- Throughout the toolkit, data is kept in many different formats: a dictionary (in `Dataset.data`), a `pandas.DataFrame` (), and a matrix / `ndarray`. Can any of these be replaced with the others to reduce the need for converting back and forth? (Also touched on in the sketch below.)
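The first and third points might be addressed together. Below is a minimal sketch, not the current implementation, of how a single canonical storage format could look: each block keeps one `pandas.DataFrame`, `Dataset.data` becomes a read-only view onto the blocks, and the `ndarray` form is derived on demand. Only `Dataset`, `Dataset.data['design_parameters']` and `Dataset.design_par.data` are taken from the toolkit; `DataBlock`, `perf_attributes` and `to_matrix()` are hypothetical names used for illustration.

```python
# Sketch only: one possible single-source-of-truth layout, not the current code.
import numpy as np
import pandas as pd


class DataBlock:
    """Holds one group of columns (e.g. design parameters) as a DataFrame."""

    def __init__(self, frame: pd.DataFrame):
        self.data = frame  # canonical storage: a single DataFrame

    def to_matrix(self) -> np.ndarray:
        # derive the ndarray on demand instead of storing a second copy
        return self.data.to_numpy()


class Dataset:
    def __init__(self, design_par: pd.DataFrame, perf_attr: pd.DataFrame):
        self.design_par = DataBlock(design_par)
        self.perf_attributes = DataBlock(perf_attr)

    @property
    def data(self) -> dict:
        # dict-style access stays available but is only a view onto the blocks,
        # so Dataset.data['design_parameters'] and Dataset.design_par.data
        # always refer to the same DataFrame and cannot drift apart
        return {
            'design_parameters': self.design_par.data,
            'performance_attributes': self.perf_attributes.data,
        }


# Usage:
# ds = Dataset(pd.DataFrame({'x': [1, 2]}), pd.DataFrame({'y': [3.0, 4.0]}))
# ds.data['design_parameters'] is ds.design_par.data   # -> True
```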
Features required (new and existing):

- a method to instantiate a `Dataset` object from file (i.e. the definitions of the data objects, input/outputML, normalizations etc.) --> existing `Dataset.load_dataset_obj()`. Check whether input/outputML and the normalizations are being saved and restored; this would allow resuming work on a project with all settings in place.
- a separate method to load the actual data points/samples from files --> existing `Dataset.load()`
- a separate method to clear data files (dataset obj, data files, checkpoints, logs?) (one way all three methods could fit together is sketched after this list)
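A rough sketch of how these responsibilities could be separated is below. `Dataset.load_dataset_obj()` and `Dataset.load()` exist already according to the points above; their bodies, the `clear_files()` method, the `import_data()` signature and the pickle/CSV file layout shown here are assumptions for illustration only.

```python
# Hedged sketch of separating definition loading, sample loading and file
# clearing. File names, formats and method bodies are assumptions.
import os
import pickle

import pandas as pd


class Dataset:
    def __init__(self, name: str, root: str):
        self.name = name
        self.root = root
        self.design_par = None  # filled by load() / import_data()

    # restore the dataset *definition* (input/outputML, normalizations, ...)
    def load_dataset_obj(self):
        with open(os.path.join(self.root, self.name + '.pkl'), 'rb') as fh:
            state = pickle.load(fh)      # assumed to hold a dict of settings
        self.__dict__.update(state)      # resume a project with all settings

    # load the actual samples, never touching the definition files
    def load(self):
        self.design_par = pd.read_csv(
            os.path.join(self.root, self.name + '_design_par.csv'))

    # import new samples; no flag_fromscratch, importing never deletes files
    def import_data(self, frame: pd.DataFrame):
        self.design_par = frame

    # the only place where files are removed
    def clear_files(self, checkpoints: bool = False):
        for suffix in ('.pkl', '_design_par.csv'):
            path = os.path.join(self.root, self.name + suffix)
            if os.path.exists(path):
                os.remove(path)
        if checkpoints:
            pass  # checkpoints / logs could be handled here as well
```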
Documentation:

- explain what happens when a new `Dataset` instance is created with the same name and file location as a previous/existing one: is it overwritten? What happens to the data files? (An illustrative sketch follows below.)
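Purely as an illustration of behaviour that could be documented (not claimed to match the current code), the constructor could refuse to silently reuse an existing name unless explicitly asked to; the `overwrite` flag and the `.pkl` file name are assumptions for this sketch.

```python
# Illustrative sketch: make overwrite behaviour explicit and documentable.
import os


class Dataset:
    def __init__(self, name: str, root: str, overwrite: bool = False):
        obj_file = os.path.join(root, name + '.pkl')
        if os.path.exists(obj_file) and not overwrite:
            raise FileExistsError(
                f"A Dataset named '{name}' already exists in {root}; "
                "pass overwrite=True to replace it (data files are kept).")
        self.name = name
        self.root = root
```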