performance - Real time text processing using Python




I need to do real-time text processing using Python. For example, consider the sentence:

going schol today

I want to do the following (in real time):

1) Tokenize
2) Check spellings
3) Stem (nltk.PorterStemmer())
4) Lemmatize (nltk.WordNetLemmatizer())

Currently I am using the NLTK library for these operations, but it is not real time (meaning it takes a few seconds to finish them). I am processing one sentence at a time. Is it possible to make this more efficient?

Update: profiling:

Fri Jul  8 17:59:32 2011    srj.profile

         105503 function calls (101919 primitive calls) in 1.743 CPU seconds

   Ordered by: internal time
   List reduced from 1797 to 10 due to restriction

        ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
          7450    0.136    0.000    0.208    0.000  sre_parse.py:182(__next)
       602/179    0.130    0.000    0.583    0.003  sre_parse.py:379(_parse)
   23467/22658    0.122    0.000    0.130    0.000  {len}
      1158/142    0.092    0.000    0.313    0.002  sre_compile.py:32(_compile)
         16152    0.081    0.000    0.081    0.000  {method 'append' of 'list' objects}
          6365    0.070    0.000    0.249    0.000  sre_parse.py:201(get)
          4947    0.058    0.000    0.086    0.000  sre_parse.py:130(__getitem__)
      1641/639    0.039    0.000    0.055    0.000  sre_parse.py:140(getwidth)
           457    0.035    0.000    0.103    0.000  sre_compile.py:207(_optimize_charset)
          6512    0.034    0.000    0.034    0.000  {isinstance}
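For readers who want to produce a report like this for their own pipeline, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The helper name `profile_top` and the sample workload are made up for illustration; they are not from the original post. Note that `sre_parse.py` and `sre_compile.py` are the standard library's regular-expression parser and compiler, so a listing dominated by them suggests (but does not prove) that much of the measured time is spent compiling patterns rather than processing text.

```python
import cProfile
import io
import pstats
import re


def profile_top(func, limit=10):
    """Profile one call to func and return the top `limit` rows as text.

    profile_top is a hypothetical helper, not part of any library.
    """
    profiler = cProfile.Profile()
    profiler.runcall(func)

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" is the internal-time sort key, matching
    # the "Ordered by: internal time" header in such reports.
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    # Illustrative workload only: compile a small pattern.
    print(profile_top(lambda: re.compile(r"(foo|bar)+\d{2,3}")))
```

Sorting by internal time shows where the interpreter actually spends its cycles; sorting by `"cumulative"` instead would show which high-level calls are responsible.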

timeit:

import timeit

t = timeit.Timer(main)
print(t.timeit(1000))
=> 3.7256231308
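One thing worth keeping in mind when reading this number: `Timer.timeit(number)` returns the total time for all `number` executions, not the time per call. Under that reading, the figure above would correspond to a few milliseconds per call to `main`. The sketch below just makes the arithmetic explicit; `per_call_seconds` is a hypothetical helper name.

```python
import timeit


def per_call_seconds(func, number=1000):
    """Return the average time of one call to func, in seconds.

    per_call_seconds is a hypothetical helper, not a timeit API.
    """
    total = timeit.Timer(func).timeit(number)  # total over all runs
    return total / number


# The total reported in the post, divided by the number of runs.
reported_total = 3.7256231308
reported_per_call = reported_total / 1000  # about 0.0037 s per call
```

If the per-call figure is that small, the "few seconds" seen in practice may come from one-time startup work (imports, loading corpora) rather than from the per-sentence processing itself, though the post does not contain enough detail to confirm this.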

I know NLTK is slow, but I can hardly believe it's that slow. In any case, first stemming and then lemmatizing is a bad idea, since these operations serve the same purpose, and feeding the output of a stemmer to a lemmatizer is bound to give worse results than just lemmatizing. So skip the stemmer for an increment in both performance and accuracy.
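A minimal sketch of what "skip the stemmer" could look like in practice, under some assumptions: the tokenizer is a simple regular expression compiled once up front, the lemmatizer is passed in as a plain callable so that expensive objects are built once and reused across sentences, and spell checking is left out. The names `make_pipeline` and `process` are hypothetical, not from the post or from NLTK.

```python
import re
from functools import lru_cache

# Compile the tokenizer pattern once, not once per sentence.
# A plain regex tokenizer is an assumption; the post does not say
# which tokenizer was used.
_TOKEN_RE = re.compile(r"[A-Za-z']+")


def make_pipeline(lemmatize):
    """Build a function that tokenizes and lemmatizes one sentence.

    `lemmatize` is any callable mapping a word to its lemma, for
    example nltk.WordNetLemmatizer().lemmatize.
    """
    # Cache lemmas so repeated words cost one lookup. The cache is
    # unbounded here; a long-running service may want a maxsize.
    cached_lemmatize = lru_cache(maxsize=None)(lemmatize)

    def process(sentence):
        tokens = _TOKEN_RE.findall(sentence.lower())
        # No stemming step: lemmatize the raw tokens directly.
        return [cached_lemmatize(token) for token in tokens]

    return process
```

With NLTK installed and the WordNet data available, this would be wired up once at startup as `process = make_pipeline(nltk.WordNetLemmatizer().lemmatize)` and then called per sentence.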

python performance nlp text-processing nltk
