Python - scikit-learn SelectPercentile TF-IDF data feature reduction


I am using various mechanisms in scikit-learn to create a TF-IDF representation of a training data set and a test set consisting of text features. Both data sets are preprocessed to use the same vocabulary, so the features and the number of features are the same. I can create a model on the training data and assess its performance on the test data. I am wondering: if I use SelectPercentile to reduce the number of features in the training set after the transformation, how can I identify the same features in the test set to utilise in prediction?

    from sklearn.feature_selection import SelectPercentile, f_regression

    # traintransformeddata / testtransformeddata are the sparse TF-IDF matrices
    traindensedata = traintransformeddata.toarray()
    testdensedata = testtransformeddata.toarray()

    if usefeaturereduction:
        reducedtraindata = SelectPercentile(f_regression, percentile=10).fit_transform(traindensedata, trainyarray)

    clf.fit(reducedtraindata, trainyarray)

    # apply feature reduction to test data

You should store the SelectPercentile object and use it to transform the test data:

    select = SelectPercentile(f_regression, percentile=10)
    reducedtraindata = select.fit_transform(traindensedata, trainyarray)
    reducedtestdata = select.transform(testdensedata)
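To see which features the fitted selector keeps, and that `transform` picks exactly those columns from the test matrix, here is a minimal self-contained sketch. The toy corpus, the target values, and `percentile=50` are made up for illustration (they are not from the question); only `SelectPercentile`, `f_regression`, `fit_transform`, `transform`, and `get_support` are the actual scikit-learn API.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, f_regression

# Hypothetical toy data: short reviews with a numeric rating as the target.
train_texts = [
    "great film with a superb cast",
    "terrible plot and dull acting",
    "superb acting and a great plot",
    "dull film with a terrible cast",
    "great cast but a dull plot",
    "terrible film",
]
train_y = np.array([9.0, 2.0, 10.0, 1.0, 6.0, 1.0])
test_texts = ["a great film", "dull and terrible acting"]

# Fit the vocabulary on the training text only, then reuse it for the test text
# so that both matrices have the same columns in the same order.
vectorizer = TfidfVectorizer()
train_dense = vectorizer.fit_transform(train_texts).toarray()
test_dense = vectorizer.transform(test_texts).toarray()

# Fit the selector on the training data and keep the fitted object.
select = SelectPercentile(f_regression, percentile=50)
reduced_train = select.fit_transform(train_dense, train_y)

# transform() applies the column mask learned during fit; it needs no labels.
reduced_test = select.transform(test_dense)

# get_support() exposes that mask, here as column indices.
kept = select.get_support(indices=True)

# Map the kept column indices back to vocabulary terms.
index_to_term = {idx: term for term, idx in vectorizer.vocabulary_.items()}
kept_terms = [index_to_term[i] for i in kept]

print(reduced_train.shape, reduced_test.shape)
print(kept_terms)
```

Because the mask is stored on the fitted object, `select.transform(test_dense)` is equivalent to `test_dense[:, kept]`, which is why fitting a second selector on the test set is neither needed nor correct.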
