
MinMaxScaler error

What's happening:

MinMaxScaler emits a RuntimeWarning ("invalid value encountered in divide") when transforming a multidimensional variable in which some, but not all, columns are constant, and the affected values become NaN. This does not happen for a 1-dim variable with constant values, nor for a multi-dim variable in which every column is constant (whether the constants are the same or different).

C:\Users\aapolina\CODE\aixd\src\aixd\data\transform.py:295: RuntimeWarning: invalid value encountered in divide
  data_mat_std = centered / span

Triggered by DataModule.from_dataset().
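The warning points at `centered / span`. The following NumPy sketch of column-wise min-max scaling shows how a constant column yields 0 / 0 = NaN, and how replacing a zero span with 1 per column avoids it. The function names `minmax_naive` and `minmax_guarded` are hypothetical and are not aixd's code. The guard follows the same convention scikit-learn's MinMaxScaler uses for constant features.

```python
import numpy as np


def minmax_naive(data):
    """Column-wise min-max scaling to [0, 1] with no guard for constant columns (hypothetical)."""
    data_min = data.min(axis=0)
    span = data.max(axis=0) - data_min
    centered = data - data_min
    return centered / span  # constant column: 0 / 0 -> nan (RuntimeWarning)


def minmax_guarded(data):
    """Same scaling, but a zero span is replaced by 1 column by column (hypothetical)."""
    data_min = data.min(axis=0)
    span = data.max(axis=0) - data_min
    span = np.where(span == 0, 1.0, span)  # constant column is mapped to 0 instead of nan
    return (data - data_min) / span


# Two-column variable like 'x' in the reproduction: column 0 varies, column 1 is constant
data = np.column_stack([np.array([0.0, 5.0, 10.0]), np.ones(3)])

with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for this demo
    naive = minmax_naive(data)
guarded = minmax_guarded(data)

print(naive)
print(guarded)
```

With the guard, the constant column maps to 0 rather than NaN. Whether 0 is the most suitable target value for aixd's scaler is a design choice this sketch leaves open.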

Although it does not stop execution, the NaN values propagate downstream and corrupt later results. For example, during training all losses are NaN, e.g. train/loss=nan.0
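The following toy NumPy sketch shows why one NaN column in the scaled inputs is enough to make the loss NaN. The linear "model" (`weights`) and the `target` values are made up for illustration and only stand in for the actual network.

```python
import numpy as np

# Hypothetical scaled inputs: the constant second column came out of the scaler as NaN
scaled_x = np.array([[0.0, np.nan], [0.5, np.nan], [1.0, np.nan]])
weights = np.array([0.3, 0.7])  # stand-in for a linear layer's weights
target = np.array([0.2, 0.4, 0.6])

pred = scaled_x @ weights  # any NaN input makes the weighted sum NaN
loss = np.mean((pred - target) ** 2)  # mean squared error over the batch

print(np.isnan(scaled_x).any())  # a cheap check that would catch the problem early
print(loss)
```

Because every prediction depends on the NaN column, every prediction is NaN, and so is any reduction over them, such as the mean used for the loss.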

To reproduce:

import numpy as np
import pandas as pd
from aixd.data import DataReal, DesignParameters, PerformanceAttributes, Dataset, Interval
from aixd.mlmodel.data.data_loader import DataModule

n = 7
# 'x' is 2-dimensional: column 0 varies, column 1 is constant (all ones)
x = np.hstack([np.random.uniform(low=0, high=10, size=(n, 1)), np.ones((n, 1))]).tolist()
# 'y' is 1-dimensional and varies
y = np.random.uniform(low=0, high=10, size=(n, 1)).tolist()
df = pd.DataFrame({'x': x, 'y': y})

dp = DesignParameters(name='DP', dobj_list=[DataReal(name='x', dim=2, domain=Interval(0, 10))])
pa = PerformanceAttributes(name='PA', dobj_list=[DataReal(name='y', dim=1)])

dataset = Dataset(name='test', design_par=dp, perf_attributes=pa, overwrite=True)
dataset.import_data_from_df(df)

# Emits the RuntimeWarning shown above
datamodule = DataModule.from_dataset(dataset, input_ml_names=['x'], output_ml_names=['y'], batch_size=2)
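Until the scaler handles this case, a quick check can flag constant columns before the data reaches DataModule.from_dataset(). The helper `constant_columns` below is hypothetical (not part of aixd). It assumes that a multidimensional variable is stored as a DataFrame column of lists, as in the reproduction above.

```python
import numpy as np
import pandas as pd


def constant_columns(df, name):
    """Return indices of the columns of variable `name` whose values never change (hypothetical helper)."""
    arr = np.asarray(df[name].tolist(), dtype=float)  # shape (n_rows, dim) for a list column
    if arr.ndim == 1:  # 1-dim variable stored as plain scalars
        arr = arr[:, None]
    return np.flatnonzero(np.ptp(arr, axis=0) == 0).tolist()


# Same layout as the reproduction: column 0 random, column 1 all ones
n = 7
rng = np.random.default_rng(0)
x = np.hstack([rng.uniform(0, 10, size=(n, 1)), np.ones((n, 1))]).tolist()
df = pd.DataFrame({'x': x})

print(constant_columns(df, 'x'))  # [1]
```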
Edited by Ania Apolinarska