hash - hashing strings -




I have streaming strings (text containing words and numbers).

Taking one line at a time from the stream, I want to assign a value to each string.

Some example strings and their scores/hashes:

user1 logged in comp1 port8087    1109
user2 logged in comp2             1135
user3 logged in port8080          1098
user1 logged in comp2 port8080    1178

These strings should end up in the same cluster. I have thought of a mapping (a bad kind of hashing) of the strings such that a small change in a string won't affect its score much.

One simple way of doing this may be: take ulicp8, ulic .... (i.e. the first letter of each word in the sentence) and find a way of scoring that. Strings with similar scores are then kept in the same bucket and sub-grouped later on.
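The first-letter idea above could be sketched roughly as follows. The function names are made up for illustration, and the sketch keeps only the first character of each word (so it yields "ulicp" rather than "ulicp8"); lines with an identical signature land in the same bucket, which is only the coarse first pass, not the later sub-grouping.

```python
from collections import defaultdict


def first_letter_signature(line: str) -> str:
    """Join the first character of every whitespace-separated word."""
    return "".join(word[0] for word in line.split())


def bucket_by_signature(lines):
    """Group lines whose signatures are identical (a coarse first pass)."""
    buckets = defaultdict(list)
    for line in lines:
        buckets[first_letter_signature(line)].append(line)
    return dict(buckets)


lines = [
    "user1 logged in comp1 port8087",
    "user2 logged in comp2",
    "user3 logged in port8080",
    "user1 logged in comp2 port8080",
]
buckets = bucket_by_signature(lines)
for signature, members in buckets.items():
    print(signature, members)
```

Here the first and last lines share the signature "ulicp", while the other two get "ulic" and "ulip", so exact-match bucketing alone does not yet put all four lines together.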

An improved method would be: instead of taking just the first letter of each word of the string, find a way to take a representative value of each word, so that the string's representation is suitable for the score/hash mapping I mentioned.
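One possible reading of "a representative value of each word", purely as an assumption, is to drop the digits from every word so that user1 and user2 become the same token. The sketch below does that and turns the resulting template into a deterministic number with CRC32. The helper names are hypothetical, and this gives equal scores for equal templates rather than nearby scores for nearby strings.

```python
import re
import zlib


def word_template(line: str) -> tuple:
    """Replace each word by a representative: the word with its digits removed."""
    return tuple(re.sub(r"\d+", "", word) for word in line.split())


def template_score(line: str) -> int:
    """Deterministic 32-bit score of the template (CRC32, not cryptographic)."""
    return zlib.crc32(" ".join(word_template(line)).encode("utf-8"))


a = "user1 logged in comp1 port8087"
b = "user1 logged in comp2 port8080"
c = "user2 logged in comp2"

print(word_template(a))
print(template_score(a) == template_score(b))  # same template, same score
print(word_template(a) == word_template(c))    # c has no port word
```

Lines that differ only in their numbers collapse onto one score, but a line with a missing word (such as the one without a port) still gets a different template, so some later sub-grouping or merging step would still be needed.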

Levenshtein distance, the Jaccard index, and other similarity or distance metrics all require two input strings to compare. Isn't there a method to hash/score a string as stated above without going through comparisons? (POS tagging and pairwise comparing look inefficient for this purpose, since the data is streaming, huge in number, and unstructured.)
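To make the point about pairwise input concrete, here is a minimal sketch of the Jaccard index over word sets. Splitting on whitespace is an assumption of this sketch; the function always needs two strings, which is exactly the comparison cost the question wants to avoid on a large stream.

```python
def jaccard_index(a: str, b: str) -> float:
    """Size of the shared word set divided by the size of the combined word set."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


x = "user1 logged in comp1 port8087"
y = "user1 logged in comp2 port8080"
z = "user2 logged in comp2"

print(jaccard_index(x, y))  # 3 shared words out of 7 distinct words
print(jaccard_index(z, y))  # 3 shared words out of 6 distinct words
```

Scoring every new line against every earlier line this way grows quadratically with the number of lines, which is why it looks unattractive for streaming data.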

I hope you understand what I want to accomplish; please help me out. Forget the comments below, let's restart.

"At least 2 similar words (not considering length) should have similar hash values"

This goes against the basic requirements of a hash function. With a hash function, minimal changes to the input should produce drastic changes in the bucket the hash falls into.

You are looking for an algorithm that calculates the similarity or distance between two inputs.
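This behaviour can be observed with a small sketch. SHA-256 is chosen here only as an example of a conventional hash, and the helper names are made up; the code counts how many of the 256 output bits differ between two inputs that differ in a single character.

```python
import hashlib


def sha256_bits(text: str) -> int:
    """Return the SHA-256 digest of the text as one 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count the bit positions in which the two digests disagree."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


print(differing_bits("user1 logged in comp1", "user1 logged in comp2"))
print(differing_bits("user1 logged in comp1", "user1 logged in comp1"))
```

Identical inputs differ in zero bits, while the one-character change typically flips a large share of the output bits, so closeness of the inputs is not reflected in the hash values.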
