Subsetting in R, joining and calculating multiple repetitions

June 15, 2013

Here is a sample:

    > tmp
        label   value1  value2
    1   aa_x_x  xx      xx
    2   bc_x_x  xx      xx
    3   aa_x_x  xx      xx
    4   bc_x_x  xx      xx

How can I calculate the median of repeated labels (or, more generally, of the corresponding values in other data frame columns), taking into account only the first two letters (i.e. "aa_1_1" and "aa_s_3" count as the same label)? The list of labels is finite and usable.

I have read about aggregate, %in%, subset and substr, but I was unable to put together something useful and simple.

Here is what I hope to get:

    > tmp.result
        label   median1 some.calculation2
    1   aa      xx      xx
    2   bc      xx      xx
    3   aa      xx      xx
    4   bc      xx      xx

Thank you very much.

Have you tried making a new data frame (I'll call it tmp2) where tmp2$label is substr(tmp$label, 1, 2)? From there you can, for example, use tapply(tmp2$value1, tmp2$label, mean) to get the average of value1 aggregated on tmp2$label.
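The substr and tapply suggestion above can be sketched in base R as follows. This is only an illustration: the numbers in tmp are invented stand-ins for the question's "xx" placeholders, median replaces mean because the question asks for a median, and the aggregate call is one possible way to cover several columns at once.

```r
# Toy data standing in for the question's tmp; the values are made up.
tmp <- data.frame(
  label  = c("aa_1_1", "bc_2_1", "aa_s_3", "bc_x_x"),
  value1 = c(1, 10, 3, 20),
  value2 = c(2, 4, 6, 8),
  stringsAsFactors = FALSE
)

# Copy the data and reduce each label to its first two characters.
tmp2 <- tmp
tmp2$label <- substr(tmp2$label, 1, 2)

# tapply returns a named vector with one median per two-letter prefix.
med1 <- tapply(tmp2$value1, tmp2$label, median)

# aggregate handles several value columns at once and returns a data frame
# with one row per prefix.
tmp.result <- aggregate(cbind(value1, value2) ~ label, data = tmp2, FUN = median)
print(tmp.result)
```

Unlike the four-row layout shown in the question, this yields one row per prefix; merging tmp.result back onto tmp2 by label would repeat the group values on every original row.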

An alternative using dplyr:

    library(dplyr)
    tmp %>%
      # group on everything before the first underscore
      group_by(label = sub('_.*$', '', label)) %>%
      transmute(median1 = median(value1), mean1 = mean(value2))

Or with data.table:

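    library(data.table)
    # add the per-group columns by reference, then keep label, median1 and mean1
    # (columns 1, 4 and 5, assuming tmp starts with exactly three columns)
    setDT(tmp)[, c('median1', 'mean1') := list(median(value1), mean(value2)),
               by = .(label = sub('_.*$', '', label))][, c(1, 4:5), with = FALSE]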

