Document formats Sphinx supports directly: Text/HTML/XML.
- Document search (Text/PPT/XML/PDF/XLS/DOC/Images).
- XML search using the Sphinx data sources xmlpipe/xmlpipe2.
Text/DOC/XML/PPT/XLS/PDF/Images file search example
For Text/DOC/XML: no other tools are needed. Sphinx supports these directly.
For PPT/PDF/XLS: Sphinx does not support these directly. They have to be converted first with third-party conversion tools:
- PDF to text/HTML/XML
- PPT to text/HTML/XML
- XLS to text/HTML/XML
Required Tools:
For PDF – pdftohtml
Download:
(Linux)
(Windows)
Usage:
(Linux)
$ pdftohtml /path/to/PDFfile.pdf
(Windows)
"C:\path\to\pdftohtml.exe" "C:\path\to\PDFfile.pdf"
Further information: http://spblinux.de/2.0/doc/pdftohtml.html
For PPT – ppthtml
Download:
(Linux)
(Windows)
Usage:
(Linux)
$ ppthtml /path/to/PPTfile.ppt
(Windows)
"C:\path\to\ppthtml.exe" "C:\path\to\PPTfile.ppt"
Further information: http://man.cx/ppthtml%281%29
For XLS – xlhtml
Download:
(Linux)
(Windows)
Usage:
(Linux)
$ xlhtml /path/to/XLSfile.xls
(Windows)
"C:\path\to\xlhtml.exe" "C:\path\to\XLSfile.xls"
Further information: http://man.cx/xlhtml%281%29
Here is a complete example:
1. Create the table sphinx_data:
CREATE TABLE IF NOT EXISTS `sphinx_data` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `file_name` varchar(50) NOT NULL,
  `path` varchar(100) NOT NULL,
  `convertion_path` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
);
file_name: document name.
path: path where the original document is saved.
convertion_path: path where the converted document is saved (only PPT/PDF/XLS are converted).
2. Format conversion.
Text/DOC/XML: no conversion needed.
PDF: convert to HTML format using the pdftohtml tool.
PPT: convert to HTML format using the ppthtml tool.
XLS: convert to HTML format using the xlhtml tool.
The paths of the converted files are stored in the convertion_path column.
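The conversion rules in step 2 can be sketched in Python as a lookup from file extension to tool. This is only an illustration: the function name converter_command and the sample file names are made up here, and it assumes the three tools are on the PATH under the names used in this post.

```python
from pathlib import Path

# Converter executable per extension, taken from the tools listed in this post.
CONVERTERS = {
    ".pdf": "pdftohtml",
    ".ppt": "ppthtml",
    ".xls": "xlhtml",
}


def converter_command(doc_path):
    """Return the argv list that converts doc_path, or None when no tool applies."""
    tool = CONVERTERS.get(Path(doc_path).suffix.lower())
    if tool is None:
        return None  # e.g. Text/DOC/XML: no conversion step
    return [tool, str(doc_path)]


# Hypothetical file names, only to show the mapping.
for name in ["report.pdf", "slides.PPT", "notes.txt"]:
    print(name, "->", converter_command(name))
```

The returned list can be handed to subprocess.run. How each tool delivers its HTML (a file next to the input, or standard output) is not covered here, so check the tool's documentation before wiring this into a real script.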
3. Insert the document data into the db table sphinx_data, like:
id  file_name  path (file location)      convertion_path
1   test.txt   /<file_path>/test.txt     /<file_path>/test.txt
2   test.doc   /<file_path>/test.doc     /<file_path>/test.doc
3   test.xml   /<file_path>/test.xml     /<file_path>/test.xml
4   test.ppt   /<file_path>/test.ppt     /<file_path>/test.html
5   test.xls   /<file_path>/test.xls     /<file_path>/test.html
6   test.pdf   /<file_path>/test.pdf     /<file_path>/test.html
7   images     /<file_path>/test.image   /<file_path>/test.images
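A row like the ones above could be built in Python as follows. This is a rough sketch, not the only way to do it: document_row and the /docs paths are hypothetical, it assumes the converted HTML file sits next to the original with the same base name, and the %s placeholders assume a MySQL driver that uses that parameter style.

```python
from pathlib import PurePosixPath

# Extensions that get converted to HTML before indexing (PPT/PDF/XLS).
CONVERTED_TYPES = {".pdf", ".ppt", ".xls"}


def document_row(doc_path):
    """Return (file_name, path, convertion_path) for one document."""
    p = PurePosixPath(doc_path)
    if p.suffix.lower() in CONVERTED_TYPES:
        converted = p.with_suffix(".html")  # assumption: same folder, same base name
    else:
        converted = p  # Text/DOC/XML: searched in place
    return (p.name, str(p), str(converted))


# Parameterized statement for a MySQL driver (placeholder style is an assumption).
INSERT_SQL = (
    "INSERT INTO sphinx_data (file_name, path, convertion_path) "
    "VALUES (%s, %s, %s)"
)

# Hypothetical paths, only for illustration.
rows = [document_row(p) for p in ["/docs/test.txt", "/docs/test.pdf"]]
for row in rows:
    print(row)
```

With a real connection, the rows would be passed to the cursor's executemany together with INSERT_SQL.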
4. For searching, use the convertion_path column.
For viewing or downloading documents, use the path column (the original document file path).
5. Configuring sphinx.conf
source se
{
    type = mysql
    # MySQL socket path (default given here: /etc/var/mysql/mysql.sock)
    sql_sock = <sql socket path>
    sql_host = localhost
    # user name and password as defined for your database
    sql_user = username
    sql_pass = password
    sql_db = databaseName
    # convertion_path has to be selected so that sql_file_field can use it
    sql_query = select id, file_name, path, convertion_path from sphinx_data
    # search column: Sphinx reads and indexes the file stored at this path
    sql_file_field = convertion_path
    sql_query_info = select * from sphinx_data where id=$id
}
index se
{
    path = idx
    # must be the name of the source block above
    source = se
    html_strip = 1
}
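Since sql_file_field makes the indexer read each document from the path stored in convertion_path, it can help to check that those files really exist before indexing. A minimal sketch, assuming the paths have already been fetched from the table; missing_files is a made-up helper name and the demo uses temporary files only.

```python
import os
import tempfile


def missing_files(paths):
    """Return the paths that do not point to a readable regular file."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]


# Demo with one real temporary file and one path that does not exist.
with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
    tmp.write(b"<html><body>sample</body></html>")
existing = tmp.name
absent = existing + ".missing"

print(missing_files([existing, absent]))
os.remove(existing)  # clean up the demo file
```

Any path this reports is a row whose converted file should be produced again before the indexer is run.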
6. Run the indexer to create the full-text index from your data:
$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/indexer --all
7. Search:
$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/search promedik
8. The search returns the ids of the matching documents.
9. Display the search results using any language (PHP, Java, Python, Perl).
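As one possible way to do the display step in Python, the ids returned by the search can be mapped back to the file_name and path columns. The sketch below uses an in-memory sqlite3 table as a stand-in for the MySQL table so that it runs anywhere. paths_for_ids, the sample rows and matched_ids are all hypothetical, and with a MySQL driver the placeholder style would differ.

```python
import sqlite3


def paths_for_ids(conn, ids):
    """Return {id: (file_name, path)} for the given document ids."""
    if not ids:
        return {}
    marks = ",".join("?" for _ in ids)  # sqlite3 placeholder style
    cur = conn.execute(
        f"SELECT id, file_name, path FROM sphinx_data WHERE id IN ({marks})",
        list(ids),
    )
    return {row[0]: (row[1], row[2]) for row in cur}


# In-memory stand-in for the MySQL table, filled with two sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sphinx_data (id INTEGER PRIMARY KEY, file_name TEXT, "
    "path TEXT, convertion_path TEXT)"
)
conn.executemany(
    "INSERT INTO sphinx_data VALUES (?, ?, ?, ?)",
    [
        (1, "test.txt", "/docs/test.txt", "/docs/test.txt"),
        (6, "test.pdf", "/docs/test.pdf", "/docs/test.html"),
    ],
)

matched_ids = [6]  # hypothetical ids returned by a search
for doc_id, (name, path) in paths_for_ids(conn, matched_ids).items():
    print(doc_id, name, path)
```

The lookup returns path, not convertion_path, because the original file is what the user should view or download.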
-PAVANKUMAR JOSHI