python - scikit-learn SelectPercentile TFIDF data feature reduction -

September 15, 2014

I am using various mechanisms in scikit-learn to create a TF-IDF representation of a training data set and a test set consisting of text features. Both data sets are preprocessed to use the same vocabulary, so the features and the number of features are the same. I can create a model on the training data and assess its performance on the test data. If I use SelectPercentile to reduce the number of features in the training set after the transformation, how can I identify the same features in the test set so that I can use them in prediction?

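    from sklearn.feature_selection import SelectPercentile, f_regression

    traindensedata = traintransformeddata.toarray()
    testdensedata = testtransformeddata.toarray()

    if usefeaturereduction:
        reducedtraindata = SelectPercentile(f_regression, percentile=10).fit_transform(traindensedata, trainyarray)
    else:
        # without feature reduction, train on the full feature set
        reducedtraindata = traindensedata

    clf.fit(reducedtraindata, trainyarray)

    # how to apply the same feature reduction to the test data?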

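You should store the SelectPercentile object and use its transform method on the test data. The selector learns which columns to keep when it is fitted on the training data, and transform then keeps exactly those columns in any other matrix with the same feature layout: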

    select = SelectPercentile(f_regression, percentile=10)
    reducedtraindata = select.fit_transform(traindensedata, trainyarray)
    reducedtestdata = select.transform(testdensedata)
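To see which features were kept, the fitted selector's get_support() method returns a boolean mask over the original columns. The following is a minimal, self-contained sketch of the whole flow, not the asker's actual setup: the toy texts, target values, snake_case variable names, and percentile=50 are all made up for illustration. It also passes the sparse TF-IDF matrices straight to the selector, on the assumption that f_regression accepts sparse input, so the toarray() calls are skipped.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, f_regression

# Hypothetical toy data, standing in for the real training and test sets.
train_texts = [
    "cheap flights to paris",
    "cheap hotels in paris",
    "expensive flights to tokyo",
    "expensive hotels in tokyo",
]
train_y = np.array([1.0, 1.2, 3.0, 3.1])
test_texts = ["cheap flights to tokyo", "expensive hotels in paris"]

# Fit the vocabulary on the training texts only, then reuse it for the test texts.
vectorizer = TfidfVectorizer()
train_tfidf = vectorizer.fit_transform(train_texts)
test_tfidf = vectorizer.transform(test_texts)

# Fit the selector on the training data and keep the fitted object.
select = SelectPercentile(f_regression, percentile=50)
reduced_train = select.fit_transform(train_tfidf, train_y)

# The same fitted object picks the same columns from the test data.
reduced_test = select.transform(test_tfidf)

# Boolean mask over the original columns: True where a feature was kept.
mask = select.get_support()

# Map column indices back to terms via the fitted vocabulary.
terms = np.array(sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get))
selected_terms = terms[mask]
print(selected_terms)
```

Because the selection is learned once and stored in the select object, the reduced training and test matrices always have matching columns, which is what the classifier needs at prediction time.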
