python - sklearn LinearRegression.predict() issue
I am trying to predict call volume for a call center based on various other factors. I have a clean dataset; it is small, but it should be enough. I am able to train and test on the historical data and get a score, summary, etc. For the life of me, though, I am unable to figure out how to predict future calls using forecasted factor data. My data is below:
date       daynum  factor1  factor2   factor3  factor4  factor5  factor6   factor7  factor8    factor9  variabletopredict
9/17/2014  1       592      83686.46  0        0        250      15911.8   832      99598.26   177514   72
9/18/2014  2       1044     79030.09  0        0        203      23880.55  1238     102910.64  205064   274
9/19/2014  3       707      84207.27  0        0        180      8143.32   877      92350.59   156360   254
9/20/2014  4       707      97577.78  0        0        194      16688.95  891      114266.73  196526   208
9/21/2014  5       565      83084.57  0        0        153      13097.04  713      96181.61   143678   270
The code I have so far is below:
    from sklearn import metrics
    from sklearn.preprocessing import StandardScaler
    from sklearn.cross_validation import KFold, cross_val_score
    from sklearn.linear_model import LinearRegression
    import pandas as pd

    d = pd.read_csv("h://my documents//python scripts//rawdata//q2917.csv", "r", delimiter=",")
    e = pd.read_csv("h://my documents//python scripts//rawdata//fy16q2917test.csv", "r", delimiter=",")

    #print(d)
    #b = pd.DataFrame.as_matrix(d)
    #print(b)

    x = d.as_matrix(['factor2', 'factor4', 'factor5', 'factor6'])
    y = d.as_matrix(['variabletopredict'])
    x1 = e.as_matrix(['factor2', 'factor4', 'factor5', 'factor6'])
    y1 = e.as_matrix(['variabletopredict'])

    #print(len(train))
    #print(target)

    #use scaler
    scalerx = StandardScaler()
    train = scalerx.fit_transform(x1)
    scalery = StandardScaler()
    target = scalery.fit_transform(y1)

    clf = LinearRegression(fit_intercept=True)
    cv = KFold(len(train), 10, shuffle=True, random_state=33)

    #decf = LinearRegression.decision_function(train, target)
    test = LinearRegression.predict(train, target)

    score = cross_val_score(clf,train, target,cv=cv )
    print("score: {}".format(score.mean()))
This of course gives me an error saying there are nulls in the y values, which are there because that column is blank and I am trying to predict it. The problem here is that I am new enough to Python that I am fundamentally misunderstanding how this should be built. Even if it worked this way, it wouldn't be correct, because it isn't taking the past data into account when building the model to predict the future. Do I need to have these in the same file, possibly? If so, how do I tell it to consider these 3 columns from row A to row B, predict the dependent column for the same rows, then apply that model to analyze those 3 columns for the future data and predict the future calls? I don't expect the whole answer here, since this is my job to do, but any small clues would be appreciated.
In order to build a regression model, you need training data and training scores. These allow you to fit a set of regression parameters to your problem.
Then, to predict, you need prediction data, but not prediction scores, because you don't have these - you're trying to predict them!
The code below, for example, will run:
    from sklearn.linear_model import LinearRegression
    import numpy as np

    trainingdata = np.array([ [2.3,4.3,2.5], [1.3,5.2,5.2], [3.3,2.9,0.8], [3.1,4.3,4.0] ])
    trainingscores = np.array([3.4,7.5,4.5,1.6])

    clf = LinearRegression(fit_intercept=True)
    clf.fit(trainingdata,trainingscores)

    predictiondata = np.array([ [2.5,2.4,2.7], [2.7,3.2,1.2] ])
    clf.predict(predictiondata)
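It looks as though you're putting the wrong number of arguments into the predict() call, and calling it on the LinearRegression class rather than on a fitted model - have a look at my snippet here and you should be able to work out how to change it.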
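Applied to the layout in the question, the same fit-then-predict pattern might look like the sketch below. It is only an illustration under some assumptions: the first four rows of the posted sample stand in for the historical file, the fifth row stands in for a future day whose target is unknown, and the names `history`, `future` and `features` are made up for this example. In practice the two frames would come from the two CSV files.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Columns used as predictors in the question's code.
features = ["factor2", "factor4", "factor5", "factor6"]

# Stand-in for the historical file: first four sample rows from the question.
history = pd.DataFrame({
    "factor2": [83686.46, 79030.09, 84207.27, 97577.78],
    "factor4": [0, 0, 0, 0],
    "factor5": [250, 203, 180, 194],
    "factor6": [15911.8, 23880.55, 8143.32, 16688.95],
    "variabletopredict": [72, 274, 254, 208],
})

# Stand-in for the forecast file: factors are known, the target column is absent.
future = pd.DataFrame({
    "factor2": [83084.57],
    "factor4": [0],
    "factor5": [153],
    "factor6": [13097.04],
})

# Fit on past data only (features and known targets).
clf = LinearRegression(fit_intercept=True)
clf.fit(history[features], history["variabletopredict"])

# Predict takes the feature matrix only; no target values are passed.
predicted_calls = clf.predict(future[features])
print(predicted_calls)
```

If the features are scaled, the scaler should be fitted on the historical features and then applied to the future features with transform(), not fitted again on the future data. The target column does not need to exist in the future data at all, which avoids the null-value error.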
Just out of interest, you can run the following line afterwards to access the parameters that the regression fits to the data:

    print(repr(clf.coef_))
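To see what those parameters mean, the short sketch below refits the small example and rebuilds the predictions by hand from coef_ and intercept_. The variable names are chosen for this illustration; the only point is that a linear regression prediction is the dot product of the features with the coefficients, plus the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy numbers as the example in the answer.
training_data = np.array([
    [2.3, 4.3, 2.5],
    [1.3, 5.2, 5.2],
    [3.3, 2.9, 0.8],
    [3.1, 4.3, 4.0],
])
training_scores = np.array([3.4, 7.5, 4.5, 1.6])

clf = LinearRegression(fit_intercept=True)
clf.fit(training_data, training_scores)

prediction_data = np.array([
    [2.5, 2.4, 2.7],
    [2.7, 3.2, 1.2],
])

# Rebuild the predictions manually: X . coef_ + intercept_
manual = prediction_data @ clf.coef_ + clf.intercept_
print(clf.coef_, clf.intercept_)
print(np.allclose(manual, clf.predict(prediction_data)))
```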