python - Loss of strings when creating a NumPy array from a Pandas DataFrame
I'm sorry if this is basic... Essentially, I'm using pandas to load a huge CSV file and then converting it to a numpy array for post-processing. I'd appreciate any help!
The issue is that strings go missing during the transformation from the pandas DataFrame to the numpy array. For example, the strings in the column "abstract" are complete; see the output of print datafile["abstract"][0] below. However, once I convert them to a numpy array, only part of each string is left; see the output of print df_all[0,3] below.
    import pandas as pd
    import csv
    import numpy as np

    datafile = pd.read_csv(path, header=0)
    df_all = pd.np.array(datafile, dtype='string')
    header_t = list(datafile.columns.values)
Strings are complete in the pandas DataFrame:

    print datafile["abstract"][0]
    In order to test the held assumption that homeopathic medicines contain negligible quantities of their major ingredients, six such medicines labeled in Latin as containing arsenic were purchased over the counter and by mail order, and their arsenic contents were measured. The values determined were similar to the expected label information in 2 of the 6, and markedly at variance in the remaining four. Arsenic was present in notable quantities in 2 preparations. Sales personnel interviewed could not identify arsenic as being an ingredient in these preparations and were therefore incapable of warning the general public of possible dangers from ingestion. No such warnings appeared on the labels.
Strings are incomplete in the numpy array:

    print df_all[0,3]
    In order to test the held assumption that homeopathic me
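For reference, the same kind of loss shows up in a minimal self-contained sketch (the frame, column name, and cell text here are made up, and 'S64' is an explicit fixed-width string dtype used only for illustration):

    import numpy as np
    import pandas as pd

    # Hypothetical one-column frame whose cell is longer than 64 characters
    df = pd.DataFrame({"abstract": ["x" * 100]})

    # Casting to a fixed-width 64-byte string dtype silently truncates each cell
    arr = np.array(df, dtype='S64')
    print(len(arr[0, 0]))  # 64 -- the remaining 36 characters are lost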
I think that when you specify dtype='string', you are in effect asking for the default S64 type, which truncates strings to 64 characters. Skip the dtype='string' part and you should be fine (the dtype will become object).
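A minimal sketch of that fix, using the same kind of made-up frame as above: leaving dtype out gives an object array that keeps each string whole.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"abstract": ["x" * 100]})  # hypothetical data

    arr = np.array(df)       # no dtype given, so numpy falls back to object
    print(arr.dtype)         # object
    print(len(arr[0, 0]))    # 100 -- nothing truncated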
Better yet, don't convert the DataFrame to an array by hand; use the built-in df.values.
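A sketch of that approach, again with made-up data: df.values hands back the frame's underlying numpy array directly, with object dtype for string columns, so nothing is cut off.

    import pandas as pd

    df = pd.DataFrame({"abstract": ["x" * 100]})  # hypothetical data

    arr = df.values          # object-dtype numpy array, strings intact
    print(arr.dtype)         # object
    print(len(arr[0, 0]))    # 100

(In later pandas versions, df.to_numpy() is the recommended spelling for the same thing.)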