python - sklearn LinearRegression.Predict() issue


I am trying to predict call volume for a call center based on various other factors. I have a pretty clean dataset, fairly small as well, but enough. I am able to train and test on historical data and get a score, summary, etc. I cannot for the life of me figure out how to predict future calls using forecasted factor data. My data is below:

    date       daynum  factor1  factor2   factor3  factor4  factor5  factor6   factor7  factor8    factor9  variabletopredict
    9/17/2014  1       592      83686.46  0        0        250      15911.8   832      99598.26   177514   72
    9/18/2014  2       1044     79030.09  0        0        203      23880.55  1238     102910.64  205064   274
    9/19/2014  3       707      84207.27  0        0        180      8143.32   877      92350.59   156360   254
    9/20/2014  4       707      97577.78  0        0        194      16688.95  891      114266.73  196526   208
    9/21/2014  5       565      83084.57  0        0        153      13097.04  713      96181.61   143678   270

The code I have so far is below:

    from sklearn import metrics
    from sklearn.preprocessing import StandardScaler
    from sklearn.cross_validation import KFold, cross_val_score
    from sklearn.linear_model import LinearRegression
    import pandas as pd

    d = pd.read_csv("h://my documents//python scripts//rawdata//q2917.csv", "r", delimiter=",")
    e = pd.read_csv("h://my documents//python scripts//rawdata//fy16q2917test.csv", "r", delimiter=",")
    #print(d)
    #b = pd.DataFrame.as_matrix(d)
    #print(b)
    x = d.as_matrix(['factor2', 'factor4', 'factor5', 'factor6'])
    y = d.as_matrix(['variabletopredict'])
    x1 = e.as_matrix(['factor2', 'factor4', 'factor5', 'factor6'])
    y1 = e.as_matrix(['variabletopredict'])
    #print(len(train))
    #print(target)
    #use scaler
    scalerx = StandardScaler()
    train = scalerx.fit_transform(x1)
    scalery = StandardScaler()
    target = scalery.fit_transform(y1)

    clf = LinearRegression(fit_intercept=True)
    cv = KFold(len(train), 10, shuffle=True, random_state=33)

    #decf = LinearRegression.decision_function(train, target)
    test = LinearRegression.predict(train, target)
    score = cross_val_score(clf, train, target, cv=cv)

    print("score: {}".format(score.mean()))

This of course gives me an error saying there are nulls in the y values, which there are, because that column is blank and I am trying to predict it. The problem here is that I am new enough to Python that I am fundamentally misunderstanding how this should be built. Even if it worked this way, it wouldn't be correct, because it isn't taking the past data into account when building the model to predict the future. Do I need to have these in the same file, possibly? If so, how do I tell it to consider these 3 columns from row A to row B, predict the dependent column for the same rows, then apply that model to analyze those 3 columns for future data and predict the future calls? I don't expect the whole answer here, this is my job to do, but any small clues would be appreciated.

In order to build a regression model, you need training data and training scores. These allow you to fit a set of regression parameters to your problem.

Then, to predict, you need prediction data, but not prediction scores, because you don't have these - you're trying to predict them!

The code below, for example, will run:

    from sklearn.linear_model import LinearRegression
    import numpy as np

    trainingdata = np.array([ [2.3,4.3,2.5], [1.3,5.2,5.2], [3.3,2.9,0.8], [3.1,4.3,4.0] ])
    trainingscores = np.array([3.4,7.5,4.5,1.6])

    clf = LinearRegression(fit_intercept=True)
    clf.fit(trainingdata, trainingscores)

    predictiondata = np.array([ [2.5,2.4,2.7], [2.7,3.2,1.2] ])
    clf.predict(predictiondata)

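It looks as though you're passing the wrong number of arguments to the predict() call: it is called on the fitted model (clf here) and takes only the prediction data, not the scores. Have a look at my snippet here and you should be able to work out how to change it.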
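Applied to the question's setup, the same fit-then-predict pattern might look like the sketch below. The historical rows are copied from the sample data in the question; the `future` frame holds made-up forecast values purely for illustration, and wrapping the scaler and regression in a pipeline is my own choice, not something the question's code does. The target is left unscaled so that predictions come out directly as call counts.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["factor2", "factor4", "factor5", "factor6"]

# Historical rows from the question's sample data (target is known).
history = pd.DataFrame({
    "factor2": [83686.46, 79030.09, 84207.27, 97577.78, 83084.57],
    "factor4": [0, 0, 0, 0, 0],
    "factor5": [250, 203, 180, 194, 153],
    "factor6": [15911.8, 23880.55, 8143.32, 16688.95, 13097.04],
    "variabletopredict": [72, 274, 254, 208, 270],
})

# Hypothetical forecasted factors for future days (made-up values).
# There is no target column here: that is what we want to predict.
future = pd.DataFrame({
    "factor2": [90000.0, 85000.0],
    "factor4": [0, 0],
    "factor5": [200, 170],
    "factor6": [15000.0, 12000.0],
})

# The pipeline learns the scaling from the historical data only,
# then reuses exactly that scaling when predicting on future data.
model = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=True))
model.fit(history[features], history["variabletopredict"])

# predict() takes only the feature data, one row per future day.
predicted_calls = model.predict(future[features])
print(predicted_calls)
```

The historical and future data can live in separate files, as in the question; what matters is that the model is fitted on rows where the target is known and that predict() only ever sees the feature columns.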

Just out of interest, you can run the following line afterwards to access the parameters that the regression fits to the data:

    print(repr(clf.coef_))
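To see what those parameters mean, here is a small sketch that reuses the toy arrays from the snippet above (with variable names of my own) and checks that predict() is just the dot product of the data with coef_ plus intercept_:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

training_data = np.array([[2.3, 4.3, 2.5],
                          [1.3, 5.2, 5.2],
                          [3.3, 2.9, 0.8],
                          [3.1, 4.3, 4.0]])
training_scores = np.array([3.4, 7.5, 4.5, 1.6])
prediction_data = np.array([[2.5, 2.4, 2.7],
                            [2.7, 3.2, 1.2]])

clf = LinearRegression(fit_intercept=True).fit(training_data, training_scores)

# One coefficient per input column, plus a single intercept.
print(clf.coef_, clf.intercept_)

# Reproduce predict() by hand: X @ coef_ + intercept_.
manual = prediction_data @ clf.coef_ + clf.intercept_
print(np.allclose(manual, clf.predict(prediction_data)))
```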

