#59: Revise DataObjects
This MR makes the following changes and tackles the problems mentioned in #59 (closed).
In domain_def.py
- Add an abstract base class `Domain` to act as a superclass for `Options` and `Interval`
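As a rough illustration of the shared base class idea, the sketch below shows how `Options` and `Interval` could derive from an abstract `Domain`. Only the three class names come from this MR; the `contains` method and the constructor arguments are assumptions made for the example, not the actual interface in domain_def.py.

```python
from abc import ABC, abstractmethod


class Domain(ABC):
    """Hypothetical base class: a set of admissible values."""

    @abstractmethod
    def contains(self, value) -> bool:
        """Return True if `value` lies inside the domain (assumed method)."""


class Interval(Domain):
    """Closed interval [vmin, vmax]."""

    def __init__(self, vmin: float, vmax: float) -> None:
        if vmin > vmax:
            raise ValueError("vmin must not exceed vmax")
        self.vmin = vmin
        self.vmax = vmax

    def contains(self, value) -> bool:
        return self.vmin <= value <= self.vmax


class Options(Domain):
    """Finite list of allowed categorical values."""

    def __init__(self, options) -> None:
        self.options = list(options)

    def contains(self, value) -> bool:
        return value in self.options
```

With a common superclass, code that only needs to validate values can accept any `Domain` without checking which concrete type it received.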
New normalization.py
- Add a class `DataObjectNormalization`, which can be subclassed to implement per-DataObject normalizations
- Implement each existing normalization as its own subclass
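The subclassing pattern could look roughly like the sketch below. The class names `DataObjectNormalization` and `ZeroToOne` and the `per_column` flag appear in this MR; the method names `fit`, `normalize`, and `denormalize`, and the use of plain nested lists, are assumptions chosen to keep the example self-contained.

```python
from abc import ABC, abstractmethod


class DataObjectNormalization(ABC):
    """Illustrative base class; the real one in normalization.py may differ."""

    @abstractmethod
    def fit(self, rows): ...

    @abstractmethod
    def normalize(self, rows): ...

    @abstractmethod
    def denormalize(self, rows): ...


class ZeroToOne(DataObjectNormalization):
    """Min-max scaling to [0, 1], globally or per column."""

    def __init__(self, per_column: bool = False) -> None:
        self.per_column = per_column
        self._bounds = None  # one (min, max) pair per column

    def fit(self, rows):
        n_cols = len(rows[0])
        if self.per_column:
            cols = [[r[j] for r in rows] for j in range(n_cols)]
            self._bounds = [(min(c), max(c)) for c in cols]
        else:
            flat = [v for r in rows for v in r]
            self._bounds = [(min(flat), max(flat))] * n_cols
        return self

    def normalize(self, rows):
        # Constant columns map to 0.0 to avoid dividing by zero.
        return [
            [(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(r, self._bounds)]
            for r in rows
        ]

    def denormalize(self, rows):
        return [
            [v * (hi - lo) + lo for v, (lo, hi) in zip(r, self._bounds)]
            for r in rows
        ]
```

For `data = [[0.0, 10.0], [5.0, 30.0]]`, fitting with `per_column=True` scales each column by its own minimum and maximum, while the default uses a single minimum and maximum over all values.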
New transform.py
- Add a class `DataObjectTransform`, which can be subclassed to implement per-DataObject transformations
- Implement each existing transformation as its own subclass
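A minimal sketch of such a transform hierarchy follows. `DataObjectTransform` is named in this MR, and `transform(...)` / `inverse_transform(...)` are the method names it introduces on the DataObject; placing them on the transform class as well, and the `Log10Transform` subclass itself, are assumptions for illustration only.

```python
import math
from abc import ABC, abstractmethod


class DataObjectTransform(ABC):
    """Illustrative base class; the real one in transform.py may differ."""

    @abstractmethod
    def transform(self, values): ...

    @abstractmethod
    def inverse_transform(self, values): ...


class Log10Transform(DataObjectTransform):
    """Hypothetical subclass: base-10 logarithm and its inverse."""

    def transform(self, values):
        return [math.log10(v) for v in values]

    def inverse_transform(self, values):
        return [10.0 ** v for v in values]
```

Keeping the forward and inverse operations on the same object is what makes back-transforming predictions straightforward: whoever holds the transform can always undo it.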
In data_types.py
- Use the new classes for transformations and normalization
- Replace `_apply_transf(...)` with `transform(...)` and `inverse_transform(...)`
- Add setters and getters to handle DataObject name
- Add proper type annotations and documentation
- Remove the property `value()`; data can no longer be set on the DataObject
- Add TODOs for further revision and simplification
New test_data_objects.py
- Add tests to validate implementation
Other
- Adjust the code base to fit the new DataObject interfaces, including the toy example.
Breaking changes:
- There is no `flag_norm_perfeat` flag on the `DataObject` anymore. However, you can use the `per_column` flag on the normalization. To initialize a `DataObject` with per-feature normalization, use one of the following:
dobj_a = DataObject("test_a", dim=1, domain=Interval(0,1), normalization="norm_0to1", norm_arg_dict={"per_column": True})
dobj_b = DataObject("test_b", dim=1, domain=Interval(0,1), normalization=ZeroToOne(per_column=True))
dobj_c = DataObject("test_c", dim=1, domain=Interval(0,1))
dobj_c.normalization = ZeroToOne(per_column=True)
- There is no `DataObject._apply_transf(...)` anymore; use `transform(...)` and `inverse_transform(...)` instead.
- The masked normalization, used for instance when the domain is an `IntervalMasked`, must be set explicitly:
dobj_a = DataReal("test_a", dim=1, domain=IntervalMasked(1,2), normalization="masked_norm_0to1")
dobj_b = DataReal("test_b", dim=1, domain=IntervalMasked(1,2), normalization=MaskedZeroToOne())
dobj_c = DataReal("test_c", dim=1, domain=IntervalMasked(1,2))
dobj_c.normalization = "masked_norm_0to1"
- Previously, it was possible to initialize a `DataReal` as `dobj = DataReal("name", domain=Interval(0,1))` or `dobj = DataReal("name", range=[0,1])`. The latter is no longer possible. However, you can use the new classmethod for that, i.e., `dobj = DataReal.from_range("name", vmin=0, vmax=1)`. Similarly, to initialize categorical variables from an option list, use `DataCategorical.from_options(...)`.
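The alternative-constructor pattern behind `from_range` can be sketched as follows. The `Interval` and `DataReal` classes here are simplified stand-ins written for this example, not the project's implementations; only the call signature `DataReal.from_range("name", vmin=0, vmax=1)` is taken from this MR.

```python
class Interval:
    """Stand-in for the project's Interval domain."""

    def __init__(self, vmin: float, vmax: float) -> None:
        self.vmin = vmin
        self.vmax = vmax


class DataReal:
    """Stand-in for the project's DataReal; fields are assumptions."""

    def __init__(self, name: str, domain: Interval, dim: int = 1) -> None:
        self.name = name
        self.domain = domain
        self.dim = dim

    @classmethod
    def from_range(cls, name: str, vmin: float, vmax: float, **kwargs):
        # Build the domain here so __init__ accepts exactly one way
        # of specifying it.
        return cls(name, domain=Interval(vmin, vmax), **kwargs)


dobj = DataReal.from_range("name", vmin=0, vmax=1)
```

Moving the range shortcut into a classmethod keeps `__init__` unambiguous: it always receives a domain object, and convenience forms live in clearly named constructors.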
In a nutshell, this MR simplifies the back-transformation of predictions, resolves problems when copying DataObjects, and partially tackles #49 (closed) by documenting the per-DataObject normalizations.
Edited by Alessandro Maissen