python - scikit-learn SelectPercentile TFIDF data feature reduction
I am using various mechanisms in scikit-learn to create a TF-IDF representation of a training data set and a test set consisting of text features. Both data sets are preprocessed to use the same vocabulary, so the features and the number of features are the same. I can create a model on the training data and assess its performance on the test data. I am wondering: if I use SelectPercentile to reduce the number of features in the training set after transformation, how can I identify the same features in the test set to utilise in prediction?
    from sklearn.feature_selection import SelectPercentile, f_regression

    traindensedata = traintransformeddata.toarray()
    testdensedata = testtransformeddata.toarray()
    if usefeaturereduction:
        reducedtraindata = SelectPercentile(f_regression, percentile=10).fit_transform(traindensedata, trainyarray)
        clf.fit(reducedtraindata, trainyarray)
        # apply feature reduction to test data
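You should store the SelectPercentile object and use its transform method on the test data. The fitted selector remembers which columns it kept, so transform applies the same selection to any matrix with the same columns: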
    select = SelectPercentile(f_regression, percentile=10)
    reducedtraindata = select.fit_transform(traindensedata, trainyarray)
    reducedtestdata = select.transform(testdensedata)
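For reference, here is a minimal end-to-end sketch of the same idea. The toy texts, target values, the Ridge regressor, and percentile=50 are all assumptions made up for illustration and are not from the question; only the fit-on-train, transform-on-test pattern is the point. The get_support() call returns a boolean mask over the original columns, which is one way to see which features were kept.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import Ridge

# Hypothetical toy data, invented for this sketch.
train_texts = [
    "cheap flights to paris",
    "paris hotel deals cheap",
    "quarterly earnings report released",
    "earnings beat analyst forecast",
    "cheap hotel near airport",
    "analyst report on earnings",
]
train_y = [1.0, 1.2, 4.0, 4.5, 0.8, 4.2]
test_texts = ["cheap paris hotel", "earnings forecast report"]

# Fit the vocabulary on the training texts only, then reuse it for the
# test texts so both matrices share the same columns.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts).toarray()
X_test = vectorizer.transform(test_texts).toarray()

# Fit the selector on the training data and keep the fitted object.
select = SelectPercentile(f_regression, percentile=50)
X_train_reduced = select.fit_transform(X_train, train_y)

# Apply the already-fitted selector to the test data (no refitting).
X_test_reduced = select.transform(X_test)

# Boolean mask over the original columns: True where a feature was kept.
mask = select.get_support()

# Any estimator would do here; Ridge is just a placeholder for clf.
clf = Ridge().fit(X_train_reduced, train_y)
predictions = clf.predict(X_test_reduced)
```

Calling fit_transform on the test data instead would score the features again using test values and could select different columns, so the reduced test matrix would no longer line up with what the model was trained on.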