Subsetting in R, joining and calculating multiple repetitions
Here is a sample:
    > tmp
       label value1 value2
    1 aa_x_x     xx     xx
    2 bc_x_x     xx     xx
    3 aa_x_x     xx     xx
    4 bc_x_x     xx     xx
How can I calculate the median of repeated labels (or, more precisely, of the corresponding values in the other data frame columns), taking into account only the first 2 letters of each label (i.e. "aa_1_1" and "aa_s_3" count as the same label)? The list of labels is finite and usable.
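To make the question reproducible, here is a minimal sketch of the sample data. The numbers are made-up stand-ins for the "xx" placeholders, and the labels reuse "aa_1_1" and "aa_s_3" from the question; substr(label, 1, 2) then yields the shared two-letter grouping key.

```r
# Hypothetical numeric values standing in for the "xx" placeholders
tmp <- data.frame(
  label  = c("aa_1_1", "bc_x_x", "aa_s_3", "bc_x_x"),
  value1 = c(1, 4, 3, 8),
  value2 = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# The grouping key: only the first two letters of each label
key <- substr(tmp$label, 1, 2)
print(key)  # "aa" "bc" "aa" "bc"
```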
I have read about aggregate(), %in%, subset() and substr(), but I was unable to put together anything useful and simple.
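Since aggregate() and substr() are mentioned, here is a minimal base R sketch that combines them. It assumes numeric values (made up here) and uses mean() as a stand-in for the unspecified second calculation. Note that it returns one row per prefix, whereas the desired output repeats the result for every original row.

```r
# Hypothetical data; the question only shows "xx" placeholders
tmp <- data.frame(
  label  = c("aa_1_1", "bc_x_x", "aa_s_3", "bc_x_x"),
  value1 = c(1, 4, 3, 8),
  value2 = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Grouping key: first two letters of the label
tmp$prefix <- substr(tmp$label, 1, 2)

# One aggregate() call per statistic, then join the results on the prefix
med <- aggregate(value1 ~ prefix, data = tmp, FUN = median)
avg <- aggregate(value2 ~ prefix, data = tmp, FUN = mean)
res <- merge(med, avg, by = "prefix")
names(res) <- c("label", "median1", "mean1")
print(res)
```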
Here is what I hope to get:
    > tmp.result
      label median1 some.calculation2
    1    aa      xx                xx
    2    bc      xx                xx
    3    aa      xx                xx
    4    bc      xx                xx
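The desired output keeps one row per original row, with each row carrying its group's statistic. A minimal base R sketch of that shape uses ave(), which applies a function within groups and returns a vector as long as the input. The numbers are hypothetical, and mean() is only an assumed stand-in for "some.calculation2".

```r
# Hypothetical data matching the sample's labels
tmp <- data.frame(
  label  = c("aa_x_x", "bc_x_x", "aa_x_x", "bc_x_x"),
  value1 = c(1, 4, 3, 8),
  value2 = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

prefix <- substr(tmp$label, 1, 2)

# ave() returns one value per input row, so the group result is repeated
tmp.result <- data.frame(
  label             = prefix,
  median1           = ave(tmp$value1, prefix, FUN = median),
  some.calculation2 = ave(tmp$value2, prefix, FUN = mean),  # assumed: mean
  stringsAsFactors  = FALSE
)
print(tmp.result)
```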
Thanks very much.
Have you tried making a new data frame (I'll call it tmp2) in which tmp2$label <- substr(tmp$label, 1, 2)? From there you can, for example, use tapply(tmp2$value1, tmp2$label, mean) to get the average of value1 aggregated by tmp2$label.
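The tmp2 suggestion can be sketched in a few lines of base R. The values are hypothetical; tapply() returns a named vector with one entry per two-letter prefix.

```r
# Hypothetical data; the question only shows "xx" placeholders
tmp <- data.frame(
  label  = c("aa_x_x", "bc_x_x", "aa_x_x", "bc_x_x"),
  value1 = c(1, 4, 3, 8),
  value2 = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Copy the data and shorten each label to its first two letters
tmp2 <- tmp
tmp2$label <- substr(tmp$label, 1, 2)

# Average of value1 per prefix
avg1 <- tapply(tmp2$value1, tmp2$label, mean)
print(avg1)
```

Swapping mean for median gives the statistic asked for in the question.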
An alternative using dplyr:
    library(dplyr)
    tmp %>%
      group_by(label = sub('_.*$', '', label)) %>%
      transmute(median1 = median(value1), mean1 = mean(value2))
Or with data.table:
    library(data.table)
    setDT(tmp)[, label := sub('_.*$', '', label)]  # keep only the prefix, e.g. "aa"
    tmp[, c('median1', 'mean1') := list(median(value1), mean(value2)), by = label]
    tmp[, .(label, median1, mean1)]
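Both answers build the group key with sub('_.*$', '', label) instead of substr(). A short base R sketch shows the difference: the regex drops everything from the first underscore onward, so it also handles prefixes that are not exactly two letters long. The label "abc_2_2" is hypothetical and only there to show where the two approaches diverge.

```r
labels <- c("aa_1_1", "aa_s_3", "bc_x_x", "abc_2_2")  # "abc_2_2" is hypothetical

by_regex  <- sub('_.*$', '', labels)  # drop everything from the first "_" on
by_substr <- substr(labels, 1, 2)     # keep exactly the first two characters

print(by_regex)   # "aa"  "aa"  "bc"  "abc"
print(by_substr)  # "aa"  "aa"  "bc"  "ab"
```

If every prefix really is two letters, as in the question, both give the same grouping.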