performance - Real time text processing using Python -
I need to do real-time text processing using Python. For example, consider the sentence:
"going schol today"
I want to do the following (in real time):
1) tokenize
2) check spellings
3) stem (nltk.PorterStemmer())
4) lemmatize (nltk.WordNetLemmatizer())
I am currently using the NLTK library for these operations, but it is not real time (meaning it takes a few seconds to finish them). I am processing one sentence at a time. Is it possible to make this more efficient?
Update: profiling:

    Fri Jul  8 17:59:32 2011    srj.profile

             105503 function calls (101919 primitive calls) in 1.743 CPU seconds

       Ordered by: internal time
       List reduced from 1797 to 10 due to restriction <10>

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           7450    0.136    0.000    0.208    0.000 sre_parse.py:182(__next)
        602/179    0.130    0.000    0.583    0.003 sre_parse.py:379(_parse)
    23467/22658    0.122    0.000    0.130    0.000 {len}
       1158/142    0.092    0.000    0.313    0.002 sre_compile.py:32(_compile)
          16152    0.081    0.000    0.081    0.000 {method 'append' of 'list' objects}
           6365    0.070    0.000    0.249    0.000 sre_parse.py:201(get)
           4947    0.058    0.000    0.086    0.000 sre_parse.py:130(__getitem__)
       1641/639    0.039    0.000    0.055    0.000 sre_parse.py:140(getwidth)
            457    0.035    0.000    0.103    0.000 sre_compile.py:207(_optimize_charset)
           6512    0.034    0.000    0.034    0.000 {isinstance}

timeit:
    import timeit
    t = timeit.Timer(main)
    print(t.timeit(1000))  # => 3.7256231308
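A note on reading these two measurements: `Timer.timeit(number)` returns the total time for all `number` runs, so 3.7256231308 seconds for 1000 runs works out to roughly 3.7 ms per call of `main`. The profile, in contrast, is dominated by `sre_parse` and `sre_compile`, which are the regular-expression compilation modules, so a good part of the perceived delay may be one-time setup rather than per-sentence work. The sketch below (Python 3) shows one way to collect both numbers. The `main` here is only a hypothetical stand-in for the real pipeline, and `profile_report` and `per_call_seconds` are illustrative helper names, not part of any library.

```python
import cProfile
import io
import pstats
import timeit


def main():
    # Hypothetical stand-in for the real per-sentence pipeline.
    return "going schol today".split()


def profile_report(func, top=10):
    """Profile one call of func; return the `top` entries by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    # "tottime" is the "internal time" ordering shown in the report above.
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(top)
    return out.getvalue()


def per_call_seconds(total_seconds, runs):
    """Timer.timeit(n) returns the total for n runs; divide for one call."""
    return total_seconds / runs


if __name__ == "__main__":
    print(profile_report(main))
    runs = 1000
    total = timeit.Timer(main).timeit(runs)
    print("measured: %.6f s per call" % per_call_seconds(total, runs))
    # Figure quoted in the question: 3.7256231308 s for 1000 runs.
    print("question: %.6f s per call" % per_call_seconds(3.7256231308, 1000))
```

If the per-call figure is small while a single profiled run is slow, it is worth checking whether imports, pattern compilation, or corpus loading happen inside the measured call.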
I know NLTK is slow, but I can hardly believe it's that slow. In any case, stemming first and then lemmatizing is a bad idea: these operations serve the same purpose, and feeding the output of a stemmer to a lemmatizer is bound to give worse results than just lemmatizing. So skip the stemmer for an improvement in both performance and accuracy.
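As a rough illustration of that advice, here is a minimal sketch (Python 3) of a per-sentence function that tokenizes, optionally corrects spelling, and lemmatizes, with no stemming step. Everything named here is an assumption for illustration: `make_pipeline` is a hypothetical helper, and the `lemmatize` and `correct` arguments are placeholders for whatever word-level functions the application uses (for instance the `lemmatize` method of an NLTK lemmatizer object created once at startup). The toy dictionaries only stand in for real components. Caching per-word results is an extra suggestion, not something the answer above proposes.

```python
import re
from functools import lru_cache

# Compile the tokenizer pattern once at import time, not once per sentence.
_TOKEN_RE = re.compile(r"[A-Za-z']+")


def make_pipeline(lemmatize, correct=None, cache_size=10000):
    """Build a sentence processor from word-level callables (no stemmer)."""

    @lru_cache(maxsize=cache_size)
    def normalize(word):
        # Repeated words are answered from the cache.
        if correct is not None:
            word = correct(word)
        return lemmatize(word)

    def process(sentence):
        return [normalize(tok.lower()) for tok in _TOKEN_RE.findall(sentence)]

    return process


# Toy stand-ins for a real spell checker and lemmatizer (illustration only).
TOY_FIXES = {"schol": "school"}
TOY_LEMMAS = {"going": "go"}

process = make_pipeline(
    lemmatize=lambda w: TOY_LEMMAS.get(w, w),
    correct=lambda w: TOY_FIXES.get(w, w),
)

print(process("going schol today"))  # ['go', 'school', 'today']
```

The point of the design is that all expensive objects are built once, outside `process`, so each sentence only pays for tokenizing and dictionary-style lookups.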
python performance nlp text-processing nltk