Question : Searching Content of PDF Files On Our Linux Based Intranet Web Site

Hello,

I'm trying to find a way for our users to search the contents of our PDF directory and get back results with links to the PDFs that match their search terms, much like searching web pages.
We are currently using HtDig with XPDF as its PDF parser, but the results are not very accurate.
Adobe's iFilter is not an option as our Intranet web server is Linux.
Any help would be greatly appreciated.
Thank You
Mack

Answer : Searching Content of PDF Files On Our Linux Based Intranet Web Site

The pdftotext binary may report a lower version number than the package that provides it.

On my system:
# pdftotext -v
pdftotext version 3.00
Copyright 1996-2004 Glyph & Cog, LLC

# rpm -qvilf /usr/bin/pdftotext
Name : poppler-utils Relocations: (not relocatable)
Version : 0.5.4 Vendor: CentOS
Release : 4.4.el5_1 Build Date: Fri 18 Apr 2008 05:45:45 PM BST

Notice that my pdftotext file is provided by the poppler package, not the xpdf package.

On my system the man page for pdftotext gives the usage as:
# pdftotext [pdf-file [text-file]]

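If you specify only the pdf-file, the output is written to a file with the same name and a .txt extension (file.pdf becomes file.txt). If you specify a text-file, the output goes to that file; use - as the text-file to send the text to the screen (stdout).
See the man page for the available options.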
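
As a rough sketch of how this could be applied to a whole PDF directory: the function below runs pdftotext on every .pdf file in one directory and writes a matching .txt file into another, which your indexer should then be able to treat like any other plain-text document. The function name extract_all, the PDFTOTEXT override variable and the example paths are my own assumptions for illustration, not something from your setup.

```shell
#!/bin/sh
# Sketch only: convert every PDF in one directory to a plain-text file.
# extract_all and the PDFTOTEXT variable are hypothetical names.

extract_all() {
    ea_pdf_dir=$1      # directory holding the PDFs
    ea_text_dir=$2     # directory that receives the .txt files
    # Allow a different binary to be used; defaults to pdftotext on the PATH.
    ea_cmd=${PDFTOTEXT:-pdftotext}

    mkdir -p "$ea_text_dir" || return 1

    for ea_pdf in "$ea_pdf_dir"/*.pdf; do
        # If no PDF matched, the glob stays literal; skip it.
        [ -e "$ea_pdf" ] || continue
        ea_base=$(basename "$ea_pdf" .pdf)
        # pdftotext pdf-file text-file
        "$ea_cmd" "$ea_pdf" "$ea_text_dir/$ea_base.txt" \
            || echo "failed to convert: $ea_pdf" >&2
    done
}

# Run only when two arguments are given, e.g.
#   sh extract.sh /path/to/pdf /path/to/pdf-text
if [ "$#" -eq 2 ]; then
    extract_all "$1" "$2"
fi
```

Running this from cron after new PDFs are uploaded would keep the text copies current. Looking at a few of the generated .txt files is also a quick way to see whether the poor search results come from the text extraction or from the indexer.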
