python - Loss of strings when creating a NumPy array from a Pandas DataFrame
I'm sorry if this is basic... Essentially, I'm using pandas to load a huge CSV file and then converting it to a numpy array for post-processing. I'd appreciate any help!
The issue is that strings go missing during the transformation from the pandas DataFrame to the numpy array. For example, the strings in the column "abstract" are complete; see the output of print datafile["abstract"][0] below. However, once I convert them to a numpy array, only part of each string is left; see the output of print df_all[0,3] below.
    import pandas as pd
    import csv
    import numpy as np

    datafile = pd.read_csv(path, header=0)
    df_all = pd.np.array(datafile, dtype='string')
    header_t = list(datafile.columns.values)
Strings are complete in the pandas DataFrame:

    print datafile["abstract"][0]
    In order to test the held assumption that homeopathic medicines contain negligible quantities of their major ingredients, six such medicines labeled in Latin as containing arsenic were purchased over the counter and by mail order, and their arsenic contents were measured. The values determined were similar to the expected label information in 2 of the 6, and markedly at variance in the remaining four. Arsenic was present in notable quantities in 2 preparations. Sales personnel interviewed could not identify arsenic as being an ingredient in these preparations and were therefore incapable of warning the general public of possible dangers from ingestion. No such warnings appeared on the labels.
Strings are incomplete in the numpy array:

    print df_all[0,3]
    In order to test the held assumption that homeopathic me
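For reference, the same kind of loss shows up in a minimal self-contained sketch (the frame, column name, and cell text here are made up, and 'S64' is an explicit fixed-width string dtype used only for illustration):

    import numpy as np
    import pandas as pd

    # Hypothetical one-column frame whose cell is longer than 64 characters
    df = pd.DataFrame({"abstract": ["x" * 100]})

    # Casting to a fixed-width 64-byte string dtype silently truncates each cell
    arr = np.array(df, dtype='S64')
    print(len(arr[0, 0]))  # 64 -- the remaining 36 characters are lost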
I think that when you specify dtype='string', you are in effect asking for the default S64 type, which truncates strings to 64 characters. Skip the dtype='string' part and you should be fine (the dtype will become object).
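A minimal sketch of that fix, using the same kind of made-up frame as above: leaving dtype out gives an object array that keeps each string whole.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"abstract": ["x" * 100]})  # hypothetical data

    arr = np.array(df)       # no dtype given, so numpy falls back to object
    print(arr.dtype)         # object
    print(len(arr[0, 0]))    # 100 -- nothing truncated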
Better yet, don't convert the DataFrame to an array by hand; use the built-in df.values.
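A sketch of that approach, again with made-up data: df.values hands back the frame's underlying numpy array directly, with object dtype for string columns, so nothing is cut off.

    import pandas as pd

    df = pd.DataFrame({"abstract": ["x" * 100]})  # hypothetical data

    arr = df.values          # object-dtype numpy array, strings intact
    print(arr.dtype)         # object
    print(len(arr[0, 0]))    # 100

(In later pandas versions, df.to_numpy() is the recommended spelling for the same thing.)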