scala - Categorical Variables in Apache Spark using MLlib
I am relatively new to the world of Apache Spark. I am trying to estimate a large-scale model using LinearRegressionWithSGD(), to estimate fixed effects and interaction terms without having to create a huge design matrix.
I noticed there is an implementation supporting categorical variables in DecisionTree
that creates a hash map from strings to integers and feeds it to the model. Has anyone attempted a similar exercise for linear models in Spark?
Thanks.
You can use one-hot encoding to convert a categorical variable into a feature space that you can feed to a linear regression model.
For instance, if you have a categorical variable with the values low, medium and high, you can encode it as 3 different integer features, as below:
    category  low  medium  high
    low       1    0       0
    medium    0    1       0
    high      0    0       1

This is just one method; there are other approaches. But if the number of categorical values isn't large, one-hot encoding is a good fit.
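As a rough illustration of the encoding above, here is a minimal sketch in plain Scala with no Spark dependency. The names OneHotSketch, buildIndex and encode are made up for this example and are not part of MLlib. It assumes the categories are indexed in the order they first appear, much like the string-to-integer map mentioned in the question.

```scala
// A sketch only: OneHotSketch, buildIndex and encode are hypothetical names, not MLlib API.
object OneHotSketch {
  // Map each distinct category to a column index, in first-seen order.
  def buildIndex(values: Seq[String]): Map[String, Int] =
    values.distinct.zipWithIndex.toMap

  // Encode one value as a 0/1 array with a single 1.0 at the category's column.
  def encode(value: String, index: Map[String, Int]): Array[Double] = {
    val vec = Array.fill(index.size)(0.0)
    // Assumption: a category missing from the index is left as all zeros.
    index.get(value).foreach(i => vec(i) = 1.0)
    vec
  }
}
```

With the values low, medium and high indexed in that order, encode("medium", index) gives the second row of the table. To use this with MLlib you would presumably wrap each array with Vectors.dense and pair it with the label in a LabeledPoint before training. For a variable with many levels, a sparse vector would likely be a better choice than a dense array.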