At the moment, there is no built-in way to encode categorical features/targets, which means the burden is on the user to manually construct their data set in a way that will work with the system.
One option is the approach presented in PR #47, where the labels (categories) are encoded into a vector. For example, the Iris data set is as follows:
| sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- |
| 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | Iris-versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | Iris-virginica |
This is transformed into the following encoding:
| sepal_length | sepal_width | petal_length | petal_width | species_being_Iris-setosa | species_being_Iris-versicolor | species_being_Iris-virginica |
| --- | --- | --- | --- | --- | --- | --- |
| 5.1 | 3.5 | 1.4 | 0.2 | 1.0 | 0.0 | 0.0 |
| 7.0 | 3.2 | 4.7 | 1.4 | 0.0 | 1.0 | 0.0 |
| 6.3 | 3.3 | 6.0 | 2.5 | 0.0 | 0.0 | 1.0 |
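The vector encoding above can be sketched in a few lines of Kotlin. This is only an illustration of the idea, not the code from PR #47: the names `oneHotEncode` and `oneHotHeader` are hypothetical, and the sketch assumes the full list of categories is known up front.

```kotlin
// Hypothetical helper (not part of the library): encodes a label as a vector
// with 1.0 in the position of the matching category and 0.0 elsewhere.
fun oneHotEncode(label: String, categories: List<String>): List<Double> {
    require(label in categories) { "Unknown category: $label" }
    return categories.map { if (it == label) 1.0 else 0.0 }
}

// Hypothetical helper: builds column names in the "species_being_<label>" style.
fun oneHotHeader(column: String, categories: List<String>): List<String> =
    categories.map { "${column}_being_$it" }

fun main() {
    val species = listOf("Iris-setosa", "Iris-versicolor", "Iris-virginica")

    println(oneHotHeader("species", species))
    println(oneHotEncode("Iris-versicolor", species)) // prints [0.0, 1.0, 0.0]
}
```

Rejecting unknown labels with `require` is a design choice for the sketch; an alternative would be to emit an all-zero vector.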
Another option would be to provide a parsing function that can do this automatically, similar to what is done in lgp.examples.Iris:
    val targetLabels = setOf("Iris-setosa", "Iris-versicolor", "Iris-virginica")
    val featureIndices = 0..3
    val targetIndex = 4

    val datasetLoader = CsvDatasetLoader(
        reader = BufferedReader(
            // Load from the resource file.
            InputStreamReader(this.datasetStream)
        ),
        featureParseFunction = { header: Header, row: Row ->
            val features = row.zip(header)
                .slice(featureIndices)
                .map { (featureValue, featureName) ->
                    Feature(
                        name = featureName,
                        value = featureValue.toDouble()
                    )
                }

            Sample(features)
        },
        targetParseFunction = { _: Header, row: Row ->
            val target = row[targetIndex]

            // ["Iris-setosa", "Iris-versicolor", "Iris-virginica"] -> [0.0, 1.0, 2.0]
            Targets.Single(targetLabels.indexOf(target).toDouble())
        }
    )
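One thing to note about the snippet above is that `indexOf` returns -1 for a label that is not in `targetLabels`, so an unexpected value would silently become a target of -1.0. A built-in parser could instead derive the label set from the data and fail on unknown labels. The following is a minimal sketch of that idea; `buildLabelIndex` and `encodeLabel` are hypothetical names and not part of the existing API.

```kotlin
// Hypothetical helper: maps each distinct label to an index, in order of
// first appearance, so the label set does not need to be hard-coded.
fun buildLabelIndex(column: List<String>): Map<String, Double> =
    column.distinct().withIndex().associate { (i, label) -> label to i.toDouble() }

// Hypothetical helper: looks up a label and throws on unknown values
// instead of returning -1.0.
fun encodeLabel(label: String, index: Map<String, Double>): Double =
    index.getValue(label)

fun main() {
    val column = listOf("Iris-setosa", "Iris-versicolor", "Iris-setosa", "Iris-virginica")
    val index = buildLabelIndex(column)

    println(index) // prints {Iris-setosa=0.0, Iris-versicolor=1.0, Iris-virginica=2.0}
    println(column.map { encodeLabel(it, index) }) // prints [0.0, 1.0, 0.0, 2.0]
}
```

Because the indices follow the order of first appearance, the same labels could receive different indices for differently ordered files; sorting the distinct labels first would make the mapping stable.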