Is it possible to reserve the size of a hash table in Perl?
I'm creating a hash table in Perl, of unknown size.
The hash table maps strings to references to arrays.
The main loop of the application adds 5-10 elements to the hash table in each iteration. As the hash table fills up, things start to slow down drastically. From observation, when there are ~50k keys in the hash table, adding keys slows by a magnitude of 20x.
I postulate that the hash table has become full and collisions are occurring. I would like to 'reserve' the size of the hash table, but I'm unsure how.
The hash in question is %hngramstoword.
For each word, all the 1..len n-grams of the word are added as keys, each mapping to a reference to an array of the words that contain that n-gram.
for example:
addtongramhash("hello");
[h, e, l, l, o, he, el, ll, lo, hel, ell, llo, hell, ello, hello] are added as keys, each mapping to "hello".
sub addtongramhash($) {
    my $word = shift;
    my @angrams = makengrams($word);

    foreach my $ngram (@angrams) {
        my @awords;
        if (defined($hngramstoword{$ngram})) {
            @awords = @{$hngramstoword{$ngram}};
        }
        push(@awords, $word);
        $hngramstoword{$ngram} = \@awords;
    }
    return scalar keys %hngramstoword;
}

sub makengrams($) {
    my $word = shift;
    my $len  = length($word);
    my @angrams;

    for (1 .. $len) {
        my $ngs = $_;
        for (0 .. $len - $ngs) {
            my $ngram = substr($word, $_, $ngs);
            push(@angrams, $ngram);
        }
    }
    return @angrams;
}
You can set the number of buckets for a hash like so:

keys(%hash) = 128;

The number will be rounded up to a power of two.
That said, it is unlikely that the slowdown you see is due to excess hash collisions, since Perl will dynamically expand the number of buckets as needed. And since 5.8.2, it will detect pathological data that results in a given bucket being overused and reconfigure the hashing function for that hash.
Show your code, and we will be able to help you find the real problem.
A demonstration of a large number of hash keys (don't let it go on until you run out of memory...):

use strict;
use warnings;

my $start = time();
my %hash;
$SIG{ALRM} = sub {
    alarm 1;
    printf(
        "%.0f keys/s; %d keys, %s buckets used\n",
        keys(%hash) / (time() - $start),
        scalar(keys(%hash)),
        scalar(%hash)
    );
};
alarm 1;
$hash{rand()}++ while 1;

Once there are a lot of keys, you will notice a perceptible slowdown whenever it needs to expand the number of buckets, but it still maintains a pretty good pace.
Looking at your code, the more words that are loaded, the more work it has to do for each word.
You can fix that by changing this:
my @awords;
if (defined($hngramstoword{$ngram})) {
    @awords = @{$hngramstoword{$ngram}};
}
push(@awords, $word);
$hngramstoword{$ngram} = \@awords;

to this:
push @{ $hngramstoword{$ngram} }, $word;

There is no need to copy the array contents twice, and no need to check whether the ngram already has an entry: autovivification will create the array reference for you.
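Putting it together, a sketch of the simplified sub, keeping the original names (the compact `makengrams` here is just a self-contained stand-in for the original):

```perl
use strict;
use warnings;

my %hngramstoword;

sub makengrams {
    my ($word) = @_;
    my @angrams;
    for my $ngs (1 .. length($word)) {
        push @angrams, substr($word, $_, $ngs) for 0 .. length($word) - $ngs;
    }
    return @angrams;
}

sub addtongramhash {
    my ($word) = @_;
    # Autovivification creates each array ref on first use, so there is
    # no defined() check and no copying of the array contents.
    push @{ $hngramstoword{$_} }, $word for makengrams($word);
    return scalar keys %hngramstoword;
}

# "hello" yields 15 n-grams, 14 of them distinct ("l" repeats).
print addtongramhash("hello"), "\n";    # prints 14
```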
perl hash hashmap hashtable