Is it possible to reserve the size of a hash table in Perl?

I'm creating a hash table in Perl, of unknown size.

The hash table maps strings to references to arrays of words.

The main loop of the application adds 5-10 elements to the hash table in each iteration. As the hash table fills up, things start to slow down drastically. From observation, when there are ~50k keys in the hash table, adding keys slows down by a factor of 20x.

I postulate that the hash table has become full and collisions are occurring. I would like to 'reserve' the size of the hash table, but I'm unsure how.

The hash in question is %hngramstoword.

For each word, the 1..length(word) n-grams of the word are added as keys, each mapping to a reference to an array of the words that contain that n-gram.

for example:

addtongramhash("hello");

[ h, e, l, l, o, he, el, ll, lo, hel, ell, llo, hell, ello, hello ] are added as keys, each mapping to "hello"
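Note that a word of length n produces n + (n-1) + ... + 1 = n(n+1)/2 n-grams (15 for "hello", including the duplicate "l" entries), so the number of keys grows quadratically with word length. A minimal standalone sketch of such a generator (the name make_ngrams is illustrative):

```perl
use strict;
use warnings;

# Generate every substring (n-gram) of $word, from length 1
# up to the full word, in order of increasing length.
sub make_ngrams {
    my ($word) = @_;
    my $len = length $word;
    my @ngrams;
    for my $size (1 .. $len) {
        for my $start (0 .. $len - $size) {
            push @ngrams, substr($word, $start, $size);
        }
    }
    return @ngrams;
}

my @grams = make_ngrams("hello");
print scalar(@grams), "\n";   # 15 n-grams for a 5-letter word
print "@grams\n";
```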

sub addtongramhash($) {
    my $word = shift;
    my @angrams = makengrams($word);
    foreach my $ngram (@angrams) {
        my @awords;
        if (defined($hngramstoword{$ngram})) {
            @awords = @{$hngramstoword{$ngram}};
        }
        push(@awords, $word);
        $hngramstoword{$ngram} = \@awords;
    }
    return scalar keys %hngramstoword;
}

sub makengrams($) {
    my $word = shift;
    my $len = length($word);
    my @angrams;
    for (1..$len) {
        my $ngs = $_;
        for (0..$len-$ngs) {
            my $ngram = substr($word, $_, $ngs);
            push(@angrams, $ngram);
        }
    }
    return @angrams;
}

You can set the number of buckets for a hash like so:

keys(%hash) = 128;

The number will be rounded up to a power of two.
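For example (a minimal sketch; to be of any use, the assignment must happen before the hash is populated, and presizing only reserves buckets, it creates no keys):

```perl
use strict;
use warnings;

my %hash;
keys(%hash) = 100_000;   # reserve buckets up front; rounded up to a power of two

# Presizing creates no keys.
print scalar(keys %hash), "\n";   # 0

# Insertions now proceed without intermediate bucket expansions.
$hash{$_} = 1 for 1 .. 50_000;
print scalar(keys %hash), "\n";   # 50000
```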

That said, it is unlikely that the slowdown you see is due to excess hash collisions, since Perl will dynamically expand the number of buckets as needed. And since 5.8.2, it will detect pathological data that results in a given bucket being overused and reconfigure the hashing function for that hash.

Show your code, and we'll be able to help you find the real problem.

A demonstration of a large number of hash keys (don't let it run until you are out of memory...):

use strict;
use warnings;

my $start = time();
my %hash;
$SIG{ALRM} = sub {
    alarm 1;
    printf(
        "%.0f keys/s; %d keys, %s buckets used\n",
        keys(%hash) / (time() - $start),
        scalar(keys(%hash)),
        scalar(%hash)
    );
};
alarm 1;
$hash{rand()}++ while 1;

Once there are a lot of keys, you will notice a perceptible slowdown when it needs to expand the number of buckets, but it still maintains a pretty good pace.

Looking at your code, the more words that are loaded, the more work it has to do for each word.

You can fix it by changing this:

my @awords;
if (defined($hngramstoword{$ngram})) {
    @awords = @{$hngramstoword{$ngram}};
}
push(@awords, $word);
$hngramstoword{$ngram} = \@awords;

to this:

push @{ $hngramstoword{$ngram} }, $word;

There is no need to copy the array twice, and no need to check whether the ngram already has an entry - Perl will autovivify the array reference for you.
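A small standalone demonstration of that autovivification (%index and the sample words here are made up for illustration):

```perl
use strict;
use warnings;

my %index;

# Dereferencing a nonexistent hash entry in lvalue context
# autovivifies it as an empty array reference, so push can
# append in place without any existence check or array copy.
push @{ $index{'ll'} }, 'hello';
push @{ $index{'ll'} }, 'llama';

print scalar @{ $index{'ll'} }, "\n";   # 2
print "@{ $index{'ll'} }\n";            # hello llama
```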

perl hash hashmap hashtable
