Question : Searching Content of PDF Files On Our Linux Based Intranet Web Site

Hello,

I'm trying to find a way for our users to search the contents of our PDF directory and get back results with links to the PDFs that match their search terms, much like searching web pages.
We are currently using HtDig with XPDF as its PDF parser, but the results are not very accurate.
Adobe's iFilter is not an option as our Intranet web server is Linux.
Any help would be greatly appreciated.
Thank You
Mack

Answer : Searching Content of PDF Files On Our Linux Based Intranet Web Site

The pdftotext binary may report a lower version number than the package that provides it.

On my system:
   # pdftotext -v
   pdftotext version 3.00
   Copyright 1996-2004 Glyph & Cog, LLC

   # rpm -qvilf /usr/bin/pdftotext
   Name        : poppler-utils                Relocations: (not relocatable)
   Version     : 0.5.4                             Vendor: CentOS
   Release     : 4.4.el5_1                     Build Date: Fri 18 Apr 2008 05:45:45 PM BST
   

Notice that my pdftotext file is provided by the poppler package, not the xpdf package.

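On my system the man page for pdftotext gives this synopsis:
   pdftotext [options] [PDF-file [text-file]]

If you specify only the PDF file, the output goes to a file with the same name and a .txt extension (file.pdf becomes file.txt). If you specify a text file, the output goes to that file, and if you give '-' as the text file, the output goes to the screen (stdout).
See the man page for the available options.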
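
As a rough sketch of how the synopsis above could be used for searching, the script below converts every PDF in a directory to a text file and then looks up a term in those text files with grep. The function names (convert_pdfs, search_pdfs), the PDFTOTEXT override variable and the directory arguments are hypothetical names chosen for illustration, not part of pdftotext or HtDig, and plain grep is only a stand-in for a real indexer.

```shell
#!/bin/sh
# Sketch only: convert each PDF to text, then search the text files.
# Function names, PDFTOTEXT and the directory arguments are assumptions.

# Converter command; can be overridden, defaults to pdftotext.
PDFTOTEXT="${PDFTOTEXT:-pdftotext}"

# convert_pdfs PDF_DIR TEXT_DIR
# Writes TEXT_DIR/<name>.txt for each PDF_DIR/<name>.pdf.
convert_pdfs() {
    pdf_dir=$1
    text_dir=$2
    mkdir -p "$text_dir" || return 1
    for pdf in "$pdf_dir"/*.pdf; do
        [ -e "$pdf" ] || continue          # glob matched nothing
        name=$(basename "$pdf" .pdf)
        # Give the text-file argument explicitly, as in the synopsis.
        "$PDFTOTEXT" "$pdf" "$text_dir/$name.txt" || echo "failed: $pdf" >&2
    done
}

# search_pdfs TEXT_DIR TERM
# Prints the PDF name for each text file containing TERM (case-insensitive,
# fixed-string match).
search_pdfs() {
    grep -l -i -F -- "$2" "$1"/*.txt 2>/dev/null | while IFS= read -r txt; do
        echo "$(basename "$txt" .txt).pdf"
    done
}
```

For example, calling convert_pdfs with your PDF directory and an output directory, then search_pdfs with that output directory and a search term, would list the names of the matching PDFs, which a web page could turn into links.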



