Solr 5 Post Tool

Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 18:38

Solr includes a simple command-line tool for POSTing various types of content to a Solr server: bin/post. It is a Unix shell script; for Windows (non-Cygwin) usage, see the Windows support section below.

To run it, open a terminal window and enter:

bin/post -c gettingstarted example/films/films.json

This contacts the Solr server at localhost:8983. Specifying the collection/core name is mandatory. The -help (or simply -h) option prints usage information, e.g. bin/post -help.

Using the bin/post Tool

Specifying either the collection/core name or the full update url is mandatory when using bin/post.
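Because -url overrides the collection, host and port, a collection name plus the defaults listed in the usage output below should be interchangeable with a full update URL. The following is a minimal sketch of that mapping, assuming Solr's usual http://<host>:<port>/solr/<collection>/update layout; solr_update_url is a hypothetical helper, not part of Solr.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): print the update URL that
# bin/post is assumed to target, using the /solr/<collection>/update layout.
# Usage: solr_update_url [collection] [host] [port]
solr_update_url() (
  # Fall back to DEFAULT_SOLR_COLLECTION when no collection is given.
  collection=${1:-$DEFAULT_SOLR_COLLECTION}
  host=${2:-localhost}   # default host from the usage output
  port=${3:-8983}        # default port from the usage output
  if [ -z "$collection" ]; then
    echo "error: no collection given and DEFAULT_SOLR_COLLECTION is unset" >&2
    exit 1               # exits only the subshell, i.e. the function
  fi
  printf 'http://%s:%s/solr/%s/update\n' "$host" "$port" "$collection"
)

solr_update_url gettingstarted                  # http://localhost:8983/solr/gettingstarted/update
solr_update_url gettingstarted localhost 8984   # http://localhost:8984/solr/gettingstarted/update
```

Under that assumption, bin/post -url "$(solr_update_url gettingstarted localhost 8984)" *.xml would target the same endpoint as bin/post -c gettingstarted -port 8984 *.xml.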

The basic usage of bin/post is:

$ bin/post -help
 
Usage: post -c <collection> [OPTIONS] <files|directories|urls|-d ["...",...]>
    or post -help
   collection name defaults to DEFAULT_SOLR_COLLECTION if not specified
 
OPTIONS
=======
  Solr options:
    -url <base Solr update URL> (overrides collection, host, and port)
    -host <host> (default: localhost)
    -port <port> (default: 8983)
    -commit yes|no (default: yes)
  Web crawl options:
    -recursive <depth> (default: 1)
    -delay <seconds> (default: 10)
  Directory crawl options:
    -delay <seconds> (default: 0)
  stdin/args options:
    -type <content/type> (default: application/xml)
  Other options:
    -filetypes <type>[,<type>,...] (default: xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
    -params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
    -out yes|no (default: no; yes outputs Solr response to console)
...

 

Examples

There are several ways to use bin/post; this section presents examples of the most common ones.

Indexing XML

Add all documents with file extension .xml to the collection or core named gettingstarted.

bin/post -c gettingstarted *.xml

Add all documents with file extension .xml to the gettingstarted collection/core on Solr running on port 8984.

bin/post -c gettingstarted -port 8984 *.xml

Send an XML argument to delete the document with ID 42 from gettingstarted.

bin/post -c gettingstarted -d '<delete><id>42</id></delete>'
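Solr's XML update format allows several <id> elements inside one <delete>, so a batch of documents can be removed with a single -d argument. A minimal sketch for building such a command; delete_xml and xml_escape are hypothetical helpers, not part of Solr.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solr): build an XML delete command for
# one or more document IDs, suitable for passing to bin/post -d.

# Escape the characters that are special in XML text content.
# & must be replaced first so the other entities are not double-escaped.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Emit a single <delete> element containing one <id> per argument.
# The subshell body keeps the loop variable local to the function.
delete_xml() (
  printf '<delete>'
  for id in "$@"; do
    printf '<id>%s</id>' "$(xml_escape "$id")"
  done
  printf '</delete>\n'
)

delete_xml 42       # <delete><id>42</id></delete>
delete_xml 42 43    # <delete><id>42</id><id>43</id></delete>
```

For example, bin/post -c gettingstarted -d "$(delete_xml 42 43)" would send one request deleting both documents.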

Indexing CSV

Index all CSV files into gettingstarted:

bin/post -c gettingstarted *.csv

Index a tab-separated file into the signals collection:

bin/post -c signals -params "separator=%09" -type text/csv data.tsv

The content type (-type) parameter is required so that the file is treated as the proper type; otherwise the file is ignored and a WARNING is logged, because the tool does not know what type of content a .tsv file contains. The CSV handler supports the separator parameter, which is passed through using the -params setting.
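Values given to -params must be URL-encoded; %09 in the example is the percent-encoded tab character. A minimal sketch of deriving such values in the shell, assuming ASCII input; urlencode is a hypothetical helper, not part of Solr.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): percent-encode one value for use
# inside a -params string. Letters, digits and . _ ~ - pass through; every
# other character becomes %XX using its character code, which is only
# correct for ASCII input.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;  # "'c" yields the char code
    esac
  done
  printf '%s\n' "$out"
)

urlencode "$(printf '\t')"   # %09
urlencode 'a b&c'            # a%20b%26c
```

With it, bin/post -c signals -params "separator=$(urlencode "$(printf '\t')")" -type text/csv data.tsv expands to the same command as the example above.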

Indexing JSON

Index all JSON files into gettingstarted.

bin/post -c gettingstarted *.json

Indexing rich documents (PDF, Word, HTML, etc.)

Index a PDF file into gettingstarted.

bin/post -c gettingstarted a.pdf

Automatically detect content types in a folder and recursively scan it for documents to index into gettingstarted.

bin/post -c gettingstarted afolder/

Automatically detect content types in a folder, but index only PPT and HTML files into gettingstarted.

bin/post -c gettingstarted -filetypes ppt,html afolder/
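The default -filetypes list in the usage output consists of file extensions, which suggests the folder scan selects files by extension. A minimal sketch of such an extension check, under that assumption; has_filetype is a hypothetical helper, and its case-sensitive comparison may differ from what bin/post actually does.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): succeed when FILE's extension is
# one of the comma-separated types in LIST, in the spirit of -filetypes.
# The comparison is case-sensitive, which may differ from bin/post itself.
# Usage: has_filetype FILE LIST
has_filetype() (
  base=${1##*/}              # drop any directory part
  case $base in
    *.*) ext=${base##*.} ;;  # text after the last dot
    *) exit 1 ;;             # no extension
  esac
  # Wrap both sides in commas so "htm" does not match "html".
  case ",$2," in
    *",$ext,"*) exit 0 ;;
    *) exit 1 ;;
  esac
)

has_filetype afolder/slides.ppt ppt,html && echo "indexed"   # indexed
has_filetype afolder/notes.txt ppt,html || echo "skipped"    # skipped
```

Looping over the top-level entries of afolder/ and printing those for which has_filetype "$f" ppt,html succeeds gives a rough preview of the files the example above would consider.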

Windows support

bin/post currently exists only as a Unix shell script; however, it delegates its work to a cross-platform Java program. SimplePostTool can be run directly in supported environments, including Windows.

SimplePostTool

The bin/post script currently delegates to a standalone Java program called SimplePostTool. This tool, bundled into an executable JAR, can be run directly using java -jar example/exampledocs/post.jar. Its help output describes how to post files, recursively crawl a website or file-system folder, or send commands directly to a Solr server.

$ java -jar example/exampledocs/post.jar -h
SimplePostTool version 5.0.0
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]
...