datetime - R - subsetting by date
I'm trying to subset a large data frame by a date field and am facing strange behaviour:
1) Find an interesting time interval:
> ld[ld$bps>30000000,]
            date.first.seen duration proto src.ip.addr src.pt dst.ip.addr dst.pt tos packets bytes      bps
1400199 2015-03-31 13:52:24    0.008   tcp     3.3.3.3   3128     4.4.4.4  65115   0      39 32507 32500000
1711899 2015-03-31 14:58:10    0.004   tcp     3.3.3.3   3128     4.4.4.7  49357   0      29 23830 47700000
2) Then try to see what is happening in that second:
> ld[ld$date.first.seen=="2015-03-31 13:52:24",]
            date.first.seen duration proto src.ip.addr src.pt dst.ip.addr dst.pt tos packets bytes bps
1401732 2015-03-31 13:52:24   17.436   tcp     3.3.3.3   3128     6.6.6.6  51527   0       3  1608 737
I don't understand this behaviour; there should be many more results. For example:
> ld[1399074,]
            date.first.seen duration proto src.ip.addr src.pt dst.ip.addr dst.pt tos packets bytes    bps
1399074 2015-03-31 13:52:24    0.152   tcp 10.10.10.10   3128 11.11.11.11  62375   0       8  3910 205789
For the date I use POSIXlt:
> str(ld)
'data.frame': 2657583 obs. of 11 variables:
 $ date.first.seen: POSIXlt, format: "2015-03-31 06:00:00" "2015-03-31 06:00:00" "2015-03-31 06:00:00" "2015-03-31 06:00:01" ...
 ...
I would appreciate any assistance. Thanks!
POSIXlt may carry additional information that is suppressed when printing the entire data.frame, such as the timezone, daylight saving time, etc. Have a look at https://stat.ethz.ch/r-manual/r-devel/library/base/html/datetimeclasses.html.
Printing just the POSIXlt variable (ld$date.first.seen) should supply at least some of that additional information.
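As a rough illustration of that point, the sketch below builds two POSIXlt values that print identically but carry different timezone information. The values and the timezone names are made up for the example and are not taken from the original data; whether this is what actually happens in `ld` is an assumption that would need checking.

```r
# Hypothetical values: same wall-clock time, different timezones.
a <- as.POSIXlt("2015-03-31 13:52:24", tz = "UTC")
b <- as.POSIXlt("2015-03-31 13:52:24", tz = "Europe/Berlin")

# The default formatted output hides the difference.
same_print <- identical(format(a), format(b))

# The comparison works on the underlying instants, so the values differ.
same_instant <- isTRUE(a == b)

# A POSIXlt object is a list of components plus a "tzone" attribute;
# these can be inspected directly.
names(unclass(b))   # component names such as sec, min, hour, isdst
attr(a, "tzone")    # timezone information stored on the object
attr(b, "tzone")
b$isdst             # daylight saving flag

same_print
same_instant
```

Running the same inspection on `ld$date.first.seen` (for instance `attr(ld$date.first.seen, "tzone")`) may show whether something similar is hiding behind the printed timestamps.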
If there is no particular reason you need to keep the variable as POSIXlt, and you don't need the functionality that format enables, then a simple:
ld$date.first.seen = as.character(ld$date.first.seen)
added before your subset statement should solve the problem.
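A minimal sketch of that workaround, using a tiny made-up data frame in place of the real `ld` (the rows, the `bps` values, and the UTC timezone are assumptions for illustration only):

```r
# Hypothetical stand-in for the much larger data frame `ld`.
ld <- data.frame(bps = c(32500000, 737, 205789, 47700000))

# Assigning with `$<-` keeps the column as POSIXlt, as in the question.
ld$date.first.seen <- as.POSIXlt(
  c("2015-03-31 13:52:24", "2015-03-31 13:52:24",
    "2015-03-31 13:52:24", "2015-03-31 14:58:10"),
  tz = "UTC"
)

# Convert to character so the comparison becomes a plain string match.
ld$date.first.seen <- as.character(ld$date.first.seen)

# Subset on the string; all three rows from that second are returned.
hits <- ld[ld$date.first.seen == "2015-03-31 13:52:24", ]
nrow(hits)
```

The trade-off is that the column no longer behaves as a date-time, so range comparisons and date arithmetic would need it converted back first.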