Thursday, 5 July 2012

XML FILE SEARCH USING SPHINX DATASOURCE LIKE XMLPIPE/XMLPIPE2


Sphinx can also index data that is not stored in an SQL database. Data can be streamed to the batch indexer in a simple XML format called xmlpipe (or its successor, xmlpipe2), or inserted directly into an incremental RT index.
To use xmlpipe, configure the data source in your configuration file as follows:
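source example_xmlpipe_source
{
    # Set type = xmlpipe2 when the stream uses the XmlPipe2 structure.
    type = xmlpipe
    # Use exactly one xmlpipe_command; the alternatives are commented out.
    # Perl
    xmlpipe_command = perl /www/mysite.com/bin/sphinxpipe.pl
    # PHP
    # xmlpipe_command = php /www/mysite.com/bin/sphinxpipe.php
    # Direct file
    # xmlpipe_command = cat /www/mysite.com/bin/sphinxpipe.xml
}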
The indexer runs the command given in xmlpipe_command, then reads, parses and indexes the data that the command prints to stdout. More formally, it opens a pipe to the given command and reads from that pipe.
The indexer expects one or more documents in a custom XML format.
XmlPipe2 structure:
/****************XML File Format*************************************/
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>

<sphinx:schema>
<sphinx:field name="subject"/> 
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>

<sphinx:document id="1234">
<content>this is the main content</content>
<published>1012325463</published>
<subject>note how field/attr tags can be
in <b class="red">randomized</b> order</subject>
<misc>some undeclared element</misc>
</sphinx:document>

<!-- ... even more sphinx:document entries here ... -->
<sphinx:killlist>
<id>1234</id>
</sphinx:killlist>
</sphinx:docset>
/*********************************************************************/
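Any script that prints this structure to stdout can serve as the xmlpipe_command. Below is a minimal sketch in Python, not a definitive implementation: the sample rows in DOCUMENTS and the function name build_docset are hypothetical, and the schema simply mirrors the one shown above.

```python
import sys
from xml.sax.saxutils import escape, quoteattr

# Hypothetical sample rows; a real script would read them from files or another store.
DOCUMENTS = [
    {"id": 1234, "subject": "first note", "content": "this is the main content",
     "published": 1012325463, "author_id": 7},
    {"id": 1235, "subject": "R&D <update>", "content": "text with special characters",
     "published": 1012325500, "author_id": 1},
]


def build_docset(documents, killed_ids=()):
    """Return an xmlpipe2 docset string that follows the structure shown above."""
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="subject"/>',
        '<sphinx:field name="content"/>',
        '<sphinx:attr name="published" type="timestamp"/>',
        '<sphinx:attr name="author_id" type="int" bits="16" default="1"/>',
        "</sphinx:schema>",
    ]
    for doc in documents:
        out.append("<sphinx:document id=%s>" % quoteattr(str(doc["id"])))
        # escape() turns &, < and > into entities so the stream stays well-formed.
        out.append("<subject>%s</subject>" % escape(doc["subject"]))
        out.append("<content>%s</content>" % escape(doc["content"]))
        out.append("<published>%d</published>" % doc["published"])
        out.append("<author_id>%d</author_id>" % doc["author_id"])
        out.append("</sphinx:document>")
    if killed_ids:
        out.append("<sphinx:killlist>")
        out.extend("<id>%d</id>" % doc_id for doc_id in killed_ids)
        out.append("</sphinx:killlist>")
    out.append("</sphinx:docset>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # The indexer reads whatever this script prints to stdout.
    sys.stdout.write(build_docset(DOCUMENTS))
```

Saved under a path of your choice, the script would be wired in the same way as the Perl and PHP commands, by pointing xmlpipe_command at the python interpreter followed by the script path.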



Required Tools:
    1. expat
    Expat is an XML parser library written in C. It is a stream-oriented parser in which an application registers handlers for things the parser might find in the XML document (like start tags).
expat-1.95.8-8.3.el5_5.3

    2. expat-devel
expat-devel-1.95.8-8.3.el5_5.3

expat installation using a package manager:
1. Linux
CentOS/Redhat/Fedora - $ sudo yum install expat expat-devel
Ubuntu/Debian - $ sudo apt-get install libexpat1 libexpat1-dev

Manual Installation:

Download expat as a tar.gz package or as an RPM package.


Installation Steps:
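1. Extract or install the downloaded expat file
tar file : $ tar -xzvf expat-2.0.1.tar.gz && cd expat-2.0.1
rpm file : $ sudo rpm -ivh <expat_package>.rpm (installs the package directly; steps 2-4 apply only to the tar file)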

2. $ ./configure --prefix=/<installation_path>/

3. $ make

4. $ make install
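After installing, it can help to confirm that both the library and the development header are visible. The following is a rough sanity check in Python, offered as a sketch only: the header paths in HEADER_CANDIDATES are assumptions that vary by distribution and by the --prefix used above, and find_expat is a hypothetical helper name.

```python
import ctypes.util
import os

# Assumed header locations; they vary by distribution and by --prefix.
HEADER_CANDIDATES = ["/usr/include/expat.h", "/usr/local/include/expat.h"]


def find_expat():
    """Return (library_name_or_None, header_path_or_None) for a quick sanity check."""
    library = ctypes.util.find_library("expat")
    header = next((p for p in HEADER_CANDIDATES if os.path.exists(p)), None)
    return library, header


if __name__ == "__main__":
    library, header = find_expat()
    print("expat library:", library or "not found")
    print("expat.h header:", header or "not found in the assumed locations")
```

A missing header usually means the expat-devel (or libexpat1-dev) package is not installed, or that expat was installed under a prefix not listed here.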



Complete example:

1. Sphinx configuration

2. Generate an XML file in the XMLPIPE2 format.
XML creation tools:
In PHP:
      1. XMLWriter API
In Java:
      1. Apache Xerces
      2. DOM XML parser
      3. JDOM XML parser
In Perl:

3. Run the indexer to create full-text index from your data:
$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/indexer --all
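When indexing is triggered from a script (for example a scheduled job), the same command can be wrapped in a small helper. This is a sketch under assumptions: the paths are copied from the commands above, and the function names and the index name used in the note below are hypothetical.

```python
import subprocess

# Paths taken from the commands above; adjust them to the local installation.
INDEXER = "/usr/local/sphinx/bin/indexer"
CONFIG_DIR = "/usr/local/sphinx/etc"


def build_indexer_command(indexer=INDEXER, index_names=None):
    """Build the argv list: every index with --all, or only the named ones."""
    if index_names:
        return [indexer] + list(index_names)
    return [indexer, "--all"]


def run_indexer(index_names=None):
    """Run the indexer from the config directory and return its exit code."""
    command = build_indexer_command(index_names=index_names)
    completed = subprocess.run(command, cwd=CONFIG_DIR)
    return completed.returncode
```

Calling run_indexer() rebuilds every index, while run_indexer(["example_index"]) rebuilds only the named one; a non-zero return value indicates that the indexer reported a failure.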

4. Search
$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/search promedik

5. The search returns the IDs of the matching documents.

6. Display the search results using any language (PHP, Java, Python, Perl).
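
Because the search returns only document IDs, the display step has to look each ID up in the original data. Here is a minimal Python sketch of that lookup; DOCUMENTS_BY_ID is a hypothetical in-memory store, format_results is a hypothetical helper, and the matched IDs are hard-coded where a real program would take them from the search step.

```python
# Hypothetical in-memory store keyed by document id; in practice this would be
# the original XML files or whatever system produced them.
DOCUMENTS_BY_ID = {
    1234: {"subject": "first note", "content": "this is the main content"},
    1235: {"subject": "second note", "content": "more content"},
}


def format_results(matched_ids, documents_by_id):
    """Return one display line per matched id, keeping the order given by the search."""
    lines = []
    for rank, doc_id in enumerate(matched_ids, start=1):
        doc = documents_by_id.get(doc_id)
        if doc is None:
            # The index can be older than the store, so tolerate missing documents.
            lines.append("%d. [id %d] (document not found)" % (rank, doc_id))
        else:
            lines.append("%d. [id %d] %s" % (rank, doc_id, doc["subject"]))
    return lines


if __name__ == "__main__":
    # Ids hard-coded for illustration; they would come from the search step.
    for line in format_results([1235, 1234, 9999], DOCUMENTS_BY_ID):
        print(line)
```

Keeping the order of matched_ids matters because the search returns matches ranked by relevance.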


                                                                                       -PAVANKUMAR JOSHI
