python - Method for indexing an object database -

September 15, 2015

I'm using an object database (ZODB) to store complex relationships between many objects, but I'm running into performance issues. As a result, I started to build indexes to speed up object retrieval and insertion. Here is my story; I hope you can help.

Initially, when I added an object to the database, I inserted it in a branch dedicated to that object type. To prevent multiple objects representing the same entity, I added a method that iterates over the existing objects in the branch to find duplicates. This worked at first, but as the database grew in size, the time it took to load each object into memory and check its attributes grew exponentially, and unacceptably.

To solve this issue, I started to create indexes based on the attributes of the object, so that when an object is added it is saved in the type branch and also within an attribute value index branch. For example, when saving a person object with attributes firstname = 'john' and lastname = 'smith', the object is appended to the person object type branch and also appended to lists within the attribute index branch under the keys 'john' and 'smith'.

This saved a lot of time with duplicate checking, since the new object can be analysed and only the set of objects that intersect within the attribute indexes needs to be checked.
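A minimal sketch of that attribute index, assuming plain in-memory dicts and sets stand in for the persistent structures in the database. The class name `AttributeIndex` and its methods are hypothetical, and the keys here pair the attribute name with its value (rather than using the bare value) so that a first name and a last name with the same spelling do not collide.

```python
from collections import defaultdict


class AttributeIndex:
    """Maps (attribute name, value) pairs to the ids of objects holding them."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, obj_id, attrs):
        # Record the object id under every attribute value it carries.
        for name, value in attrs.items():
            self._postings[(name, value)].add(obj_id)

    def candidates(self, attrs):
        """Return ids of objects that match every given attribute value."""
        id_sets = [self._postings.get((name, value), set())
                   for name, value in attrs.items()]
        if not id_sets:
            return set()
        return set.intersection(*id_sets)


index = AttributeIndex()
index.add(1, {"firstname": "john", "lastname": "smith"})
index.add(2, {"firstname": "john", "lastname": "doe"})

# Only object 1 shares both values, so only it needs a full duplicate check.
duplicates = index.candidates({"firstname": "john", "lastname": "smith"})
```

The intersection keeps the duplicate check proportional to the number of objects sharing the indexed values, not to the size of the whole branch.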

However, I quickly ran into another issue with regard to updating objects. The indexes need to be updated to reflect the fact that they may no longer be accurate. This requires either remembering the old values so that they can be accessed directly and the object removed, or iterating over all values of an attribute type to find and then remove the object. Either way, performance is beginning to degrade once again and I can't figure out a way to solve it.
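One way to make the "remember the old values" option cheap is to keep a reverse map next to the index, so an update touches only the entries that actually changed. The sketch below assumes in-memory dicts and hashable attribute values; `UpdatableIndex` and its method names are hypothetical, not part of any library.

```python
from collections import defaultdict


class UpdatableIndex:
    """Forward index plus a reverse map that remembers what was indexed."""

    def __init__(self):
        self._forward = defaultdict(set)  # (attr, value) -> object ids
        self._reverse = {}                # object id -> set of (attr, value)

    def _remove(self, obj_id, key):
        ids = self._forward[key]
        ids.discard(obj_id)
        if not ids:
            del self._forward[key]  # drop empty entries so the index stays small

    def index(self, obj_id, attrs):
        """Index or reindex an object, touching only the entries that changed."""
        new_keys = set(attrs.items())
        old_keys = self._reverse.get(obj_id, set())
        for key in old_keys - new_keys:
            self._remove(obj_id, key)
        for key in new_keys - old_keys:
            self._forward[key].add(obj_id)
        self._reverse[obj_id] = new_keys

    def unindex(self, obj_id):
        """Forget an object without scanning the forward index."""
        for key in self._reverse.pop(obj_id, set()):
            self._remove(obj_id, key)

    def lookup(self, name, value):
        return set(self._forward.get((name, value), ()))


idx = UpdatableIndex()
idx.index(1, {"firstname": "john", "lastname": "smith"})
idx.index(1, {"firstname": "john", "lastname": "jones"})  # lastname edited
```

The cost of an update then depends on the number of attributes of the one object being edited, not on the number of values in the index.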

Has anyone had this kind of issue before? What did you do to solve it, or is this just something I have to deal with when using OODBMSs?

Thank you in advance for the help.

Yes, repoze.catalog is nice, and well documented.

In short: don't create indexing as part of your site structure!

Look at using a container/item hierarchy to store and traverse content item objects; plan to be able to traverse content either by (a) path (graph edges look like a filesystem) or (b) identifying singleton containers at some distinct location.

Identify your content using either RFC 4122 UUIDs (uuid.UUID type) or 64-bit integers.
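For illustration, a short sketch of minting and round-tripping such an identifier with the standard library `uuid` module. Choosing a random (version 4) UUID is an assumption here; the variable names are only for the example.

```python
import uuid

# Mint a random RFC 4122 (version 4) identifier for a new content object.
content_id = uuid.uuid4()

# The text form is convenient as a dictionary or index key.
as_text = str(content_id)

# Parsing the text form gives back an equal uuid.UUID value.
restored = uuid.UUID(as_text)
```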

Use a central catalog to index (e.g. repoze.catalog); the catalog should be at a known location relative to the root application object of your ZODB. Your catalog will likely index attributes of objects and return record IDs (usually integers) on query. Your job is to map those integer IDs (perhaps indirecting via UUIDs) to some physical traversal path in the database where you are storing content. It helps if you use zope.location and zope.container for common interfaces for traversal of your object graph from the root/application downward.
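The mapping step can be sketched roughly as follows, assuming nested dicts stand in for persistent containers. `PathMap` and `traverse` are hypothetical helpers written for this example; they only illustrate the role a document map plays and are not the repoze.catalog API.

```python
class PathMap:
    """Two-way mapping between integer record ids and traversal paths."""

    def __init__(self):
        self._next_id = 1
        self._path_for = {}
        self._id_for = {}

    def add(self, path):
        # Reuse the existing id if this path is already registered.
        if path in self._id_for:
            return self._id_for[path]
        docid = self._next_id
        self._next_id += 1
        self._path_for[docid] = path
        self._id_for[path] = docid
        return docid

    def path_for(self, docid):
        return self._path_for[docid]


def traverse(root, path):
    """Walk container keys from the root, like segments of a filesystem path."""
    node = root
    for name in (segment for segment in path.split("/") if segment):
        node = node[name]
    return node


# Content lives in the container hierarchy; the map lives beside it.
root = {"people": {"john-smith": {"firstname": "john", "lastname": "smith"}}}
docmap = PathMap()
docid = docmap.add("/people/john-smith")

# A search would hand back docid; resolving it is a lookup plus a traversal.
person = traverse(root, docmap.path_for(docid))
```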

Use zope.lifecycleevent handlers to index content and keep things fresh.

The problem -- generalized

ZODB is too flexible: it is just a persistent object graph with transactions, but this leaves room for you to sink or swim in your own data structures and interfaces.

The solution -- generalized

Usually, just picking pre-existing idioms from the community around ZODB will work: zope.lifecycleevent handlers, "containerish" traversal using zope.container and zope.location, and something like repoze.catalog.

More particular

Only when you exhaust the generalized idioms and know why they won't work should you try to build your own indexes using the various flavors of BTrees in ZODB. I actually do this more than I care to admit, but I usually have good cause.

In all cases, keep your indexes (search, discovery) and site (traversal and storage) structure distinct.

The idioms for the problem domain

Master ZODB BTrees: you likely want:

To store content objects as subclasses of Persistent in containers that are subclasses of OOBTree providing container interfaces (see below).

To store BTrees for your catalog or global indexes, or use packages like repoze.catalog and zope.index to abstract that detail away (hint: catalog solutions typically store indexes as OIBTrees that will yield integer record IDs for search results; you then typically have some sort of document mapper utility that translates those record IDs into something resolvable in your application, like a UUID (provided you can traverse the graph to the UUID) or a path (the way the Zope2 catalog does)).

IMHO, don't bother working with intids and key-references and such (these are less idiomatic and more difficult if you don't need them). Just use a Catalog and DocumentMap from repoze.catalog to get results in integer-to-UUID or path form, and then figure out how to get your object. Note, you likely want some utility/singleton that has the job of retrieving your object given an ID or UUID returned from a search.

Use zope.lifecycleevent or a similar package that provides synchronous event callback (handler) registrations. These handlers are what you should call whenever an atomic edit is made on your object (likely once per transaction, but not in transaction machinery).
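To show the shape of that idiom without pulling in any package, here is a rough sketch of a synchronous handler registry that keeps one index fresh. `ModifiedEvent`, `EventBus`, `Person` and `reindex_person` are hypothetical stand-ins written for this example; they are not the zope.lifecycleevent API.

```python
class ModifiedEvent:
    """Hypothetical event; carries the object that was just edited."""

    def __init__(self, obj):
        self.object = obj


class EventBus:
    """Calls every registered handler synchronously, in registration order."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def notify(self, event):
        for handler in list(self._handlers):
            handler(event)


class Person:
    def __init__(self, uid, lastname):
        self.uid = uid
        self.lastname = lastname


lastname_index = {}  # lastname -> set of uids
indexed_as = {}      # uid -> lastname currently stored in the index


def reindex_person(event):
    """Handler: move the person from the old index entry to the new one."""
    person = event.object
    old = indexed_as.get(person.uid)
    if old == person.lastname:
        return  # nothing changed for this index
    if old is not None:
        lastname_index[old].discard(person.uid)
    lastname_index.setdefault(person.lastname, set()).add(person.uid)
    indexed_as[person.uid] = person.lastname


bus = EventBus()
bus.subscribe(reindex_person)

john = Person(uid=1, lastname="smith")
bus.notify(ModifiedEvent(john))  # initial indexing

john.lastname = "jones"          # one atomic edit...
bus.notify(ModifiedEvent(john))  # ...followed by one notification
```

The editing code only has to announce that something changed; the knowledge of which indexes exist stays in the handlers.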

Learn the Zope Component Architecture; it is not an absolute requirement, but surely helpful, even if just to understand the zope.interface interfaces of upstream packages like zope.container.

An understanding of how Zope2 (ZCatalog) does this helps: a catalog fronts for multiple indexes of various sorts, which each search for a query, each have specialized data structures, and each return integer record ID sequences. These are merged across indexes by the catalog doing set intersections and returned as a lazy mapping of "brain" objects containing metadata stubs (each brain has a getObject() method to get the actual content object). Getting actual objects from a catalog search relies upon the Zope2 idiom of using paths from the root application object to identify the location of the item cataloged.
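A toy version of that flow, to make the moving parts concrete. This is a sketch under stated assumptions, not ZCatalog's implementation: `ValueIndex`, `Brain` and `MiniCatalog` are hypothetical classes, a plain dict keyed by path stands in for traversal from the application root, and only the `getObject()` name is taken from the description above.

```python
class ValueIndex:
    """One index over one attribute; answers queries with integer record ids."""

    def __init__(self, attr):
        self.attr = attr
        self._ids = {}

    def index_doc(self, docid, obj):
        self._ids.setdefault(getattr(obj, self.attr), set()).add(docid)

    def apply(self, value):
        return set(self._ids.get(value, ()))


class Brain:
    """Metadata stub; the real object is fetched only when asked for."""

    def __init__(self, docid, path, resolver):
        self.docid = docid
        self.path = path
        self._resolver = resolver

    def getObject(self):
        return self._resolver(self.path)


class MiniCatalog:
    def __init__(self, resolver):
        self._indexes = {}
        self._paths = {}
        self._resolver = resolver

    def add_index(self, name, index):
        self._indexes[name] = index

    def catalog_object(self, docid, path, obj):
        self._paths[docid] = path
        for index in self._indexes.values():
            index.index_doc(docid, obj)

    def search(self, **query):
        """Intersect per-index results and lazily yield brains."""
        result = None
        for name, value in query.items():
            ids = self._indexes[name].apply(value)
            result = ids if result is None else result & ids
        for docid in sorted(result or ()):
            yield Brain(docid, self._paths[docid], self._resolver)


class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


# A dict keyed by path stands in for traversal from the root object.
storage = {"/people/1": Person("john", "smith"),
           "/people/2": Person("john", "doe")}

catalog = MiniCatalog(resolver=storage.__getitem__)
catalog.add_index("firstname", ValueIndex("firstname"))
catalog.add_index("lastname", ValueIndex("lastname"))
for docid, (path, obj) in enumerate(sorted(storage.items()), start=1):
    catalog.catalog_object(docid, path, obj)

brains = list(catalog.search(firstname="john", lastname="smith"))
```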

python indexing zodb oodbms
