Extracting the actual in-text title from a PDF

July 15, 2011

There seem to be a lot of questions about extracting the title of a PDF using its metadata. However, a large share of titles do not seem to exist in the metadata at all. I found this out when using pyPdf (http://pybrary.net/pypdf/pythondoc-pypdf.pdf.html).
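To illustrate what "the title does not exist in the metadata" means in practice, here is a minimal sketch of the check. It assumes the document information has already been read into a plain mapping keyed by PDF names such as "/Title"; the function name `metadata_title` and that input shape are assumptions made for illustration, not part of any particular library.

```python
from typing import Mapping, Optional


def metadata_title(info: Optional[Mapping[str, object]]) -> Optional[str]:
    """Return a usable title from a metadata mapping, or None.

    `info` is assumed to be a dict-like object keyed by PDF names
    (for example "/Title", "/Author"). This shape is an assumption.
    """
    if not info:
        # No metadata at all.
        return None
    value = info.get("/Title")
    if not isinstance(value, str):
        # Field missing, or stored in a form this sketch does not handle.
        return None
    title = value.strip()
    # An empty or whitespace-only title is treated as missing.
    return title or None


if __name__ == "__main__":
    print(metadata_title({"/Title": "  My Paper "}))
    print(metadata_title({"/Author": "Someone"}))
```

A `None` result is the signal to fall back to inspecting the document text itself.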

Is there any way to retrieve the in-text title of a PDF? I tried exporting to a text file and searching there, but there is no consistent formatting. Is there a way to export the PDF document with its formatting and then check for font size >= 14?
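The font-size idea can be sketched as follows. This assumes some layout-aware extraction step has already produced text runs with their font sizes in reading order; the `Span` type, the `guess_title` name, and the 0.5 pt tolerance are all hypothetical choices for illustration, and the 14 pt threshold is taken from the question. It is a heuristic, not a reliable method.

```python
from typing import NamedTuple, Optional, Sequence


class Span(NamedTuple):
    """One run of text with its font size (hypothetical input shape)."""
    text: str
    font_size: float
    page: int = 0


def guess_title(spans: Sequence[Span], min_size: float = 14.0) -> Optional[str]:
    """Guess the title as the largest text on the first page.

    Only spans on page 0 with font_size >= min_size are candidates.
    Consecutive spans at the largest size are joined, so a title that
    wraps over several lines is kept together.
    """
    first_page = [s for s in spans if s.page == 0 and s.text.strip()]
    candidates = [s for s in first_page if s.font_size >= min_size]
    if not candidates:
        return None
    largest = max(s.font_size for s in candidates)
    parts = []
    for s in first_page:
        if abs(s.font_size - largest) < 0.5:  # tolerance is an assumption
            parts.append(s.text.strip())
        elif parts:
            # The first run of largest-size text has ended.
            break
    return " ".join(parts)


if __name__ == "__main__":
    sample = [
        Span("Journal of Examples", 9.0),
        Span("Extracting Titles", 18.0),
        Span("from PDF Files", 18.0),
        Span("A. Author", 11.0),
    ]
    print(guess_title(sample))
```

Taking the largest size rather than the first span above the threshold avoids picking up a section heading or journal banner that merely happens to be 14 pt or more.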

Good question. Applications that create PDFs don't seem to make useful use of the available metadata fields.

Take pdflatex as an example: when one sets \title{...} and \author{...} in the preamble, this information is not reflected in the metadata. After a quick search, the solution appears to be to introduce a block in the preamble that is read by pdflatex [1]:

\pdfinfo{ /Title (...) /Author (...) ... }

...which are then placed in the relevant metadata fields of the PDF. It is unusual that this is necessary, though.

I cannot speak for word processors such as Word or Writer. One presumes such metadata fields have to be set manually by the user.

Perhaps a heuristic approach is the only way to tackle this problem if the PDFs are not generated by you. [2] seems similar to what you want, but I guess it depends on how the PDFs were published; the tool seems oriented toward scientific papers.

I hope this at least helps.

[1] http://wlug.org.nz/pdflatexnotes
[2] http://www.molspaces.com/d_cb2bib-metadata.php

pdf title extraction
