<add>
    <doc>
        <field name="id">test4</field>
        <field name="title">testagain</field>
        <field name="url">http://www.163.com</field>
    </doc>
</add>

(2) Using java -jar post.jar
[root@jediael44 exampledocs]# java -Durl=http://ip:8080/solr/update -jar post.jar test.xml
SimplePostTool version 1.5
Posting files to base url http://ip:8080/solr/update using content-type application/xml..
POSTing file test.xml
1 files indexed.
COMMITting Solr index changes to http://localhost:8080/solr/update..
Time spent: 0:00:00.135

[root@jediael44 exampledocs]# java -jar post.jar --help
SimplePostTool version 1.5
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

Supported System Properties and their defaults:
  -Ddata=files|web|args|stdin (default=files)
  -Dtype=<content-type> (default=application/xml)
  -Durl=<solr-update-url> (default=http://localhost:8983/solr/update)
  -Dauto=yes|no (default=no)
  -Drecursive=yes|no|<depth> (default=0)
  -Ddelay=<seconds> (default=0 for files, 10 for web)
  -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
  -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
  -Dcommit=yes|no (default=yes)
  -Doptimize=yes|no (default=no)
  -Dout=yes|no (default=no)

This is a simple command line tool for POSTing raw data to a Solr port.
Data can be read from files specified as commandline args,
URLs specified as args, as raw commandline arg strings or via STDIN.
Examples:
  java -jar post.jar *.xml
  java -Ddata=args -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -jar post.jar < hd.xml
  java -Ddata=web -jar post.jar http://example.com/
  java -Dtype=text/csv -jar post.jar *.csv
  java -Dtype=application/json -jar post.jar *.json
  java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
  java -Dauto -jar post.jar *
  java -Dauto -Drecursive -jar post.jar afolder
  java -Dauto -Dfiletypes=ppt,html -jar post.jar afolder
The options controlled by System Properties include the Solr
URL to POST to, the Content-Type of the data, whether a commit
or optimize should be executed, and whether the response should
be written to STDOUT. If auto=yes the tool will try to set type
and url automatically from file name. When posting rich documents
the file name will be propagated as "resource.name" and also used
as "literal.id". You may override these or any other request parameter
through the -Dparams property. To do a commit only, use "-" as argument.
The web mode is a simple crawler following links within domain, default delay=10s.
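What post.jar does can be approximated with plain JDK code: POST the raw file to the update URL with a Content-Type header, then commit. The sketch below is a hypothetical `MiniPost` class, not part of Solr. It sends the commit as an XML `<commit/>` message, which may differ from how post.jar itself issues its commit, and `withParams` only mimics the URL-encoding that `-Dparams` expects. It assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.StringJoiner;

public class MiniPost {
    static final String XML = "application/xml; charset=UTF-8";

    // Append URL-encoded request parameters, similar in spirit to -Dparams.
    static String withParams(String baseUrl, Map<String, String> params) {
        if (params.isEmpty()) {
            return baseUrl;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + (baseUrl.contains("?") ? "&" : "?") + query;
    }

    // POST a raw body with the given Content-Type and return the HTTP status code.
    static int post(String url, byte[] body, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", contentType);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String updateUrl = args.length > 0 ? args[0] : "http://localhost:8983/solr/update";
        String file = args.length > 1 ? args[1] : "test.xml";
        // 1. Send the document file as XML (post.jar's default -Dtype=application/xml)
        int added = post(updateUrl, Files.readAllBytes(Paths.get(file)), XML);
        // 2. Commit so the documents become searchable (post.jar defaults to -Dcommit=yes)
        int committed = post(updateUrl, "<commit/>".getBytes(StandardCharsets.UTF_8), XML);
        System.out.println("add: HTTP " + added + ", commit: HTTP " + committed);
    }
}
```

Usage mirrors the first run above: `java MiniPost http://ip:8080/solr/update test.xml`.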

java -Dtype=application/json -jar post.jar *.json




package org.ljh.test.solr;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BasicSolrJIndexDemo {
    public static void main(String[] args) throws Exception {
        /*
         * Note: although the admin page is opened at http://ip:8080/solr/#/collection1,
         * documents must be submitted to http://ip:8080/solr/collection1 (without "#/").
         */
        String serverUrl = (args != null && args.length > 0) ? args[0]
                : "http://localhost:8080/solr/collection1";
        SolrServer solrServer = new HttpSolrServer(serverUrl);

        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.setField("id", "solrJTest3");
        doc1.setField("url", "http://www.163.com/");
        solrServer.add(doc1);

        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.setField("id", "solrJTest4");
        doc2.setField("url", "http://www.sina.com/");
        solrServer.add(doc2);

        // commit(waitFlush, waitSearcher): wait until the changes are flushed and searchable
        solrServer.commit(true, true);
    }
}


package org.ljh.test.solr;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class BasicSolrJSearchDemo {
    public static void main(String[] args) throws Exception {
        String serverUrl = (args != null && args.length > 0) ? args[0]
                : "http://localhost:8080/solr/collection1";
        SolrServer solrServer = new HttpSolrServer(serverUrl);

        // Use the second argument as the query string; if none is given,
        // fall back to "url:163" (pass "*:*" to match all documents).
        String queryString = (args != null && args.length > 1) ? args[1] : "url:163";
        SolrQuery solrQuery = new SolrQuery(queryString);
        solrQuery.setRows(5); // return at most 5 hits

        QueryResponse resp = solrServer.query(solrQuery);
        SolrDocumentList hits = resp.getResults();
        for (SolrDocument doc : hits) {
            System.out.println(doc.getFieldValue("id").toString() + " : " + doc.getFieldValue("url"));
        }
    }
}



(2) ExtractingRequestHandler, aka Solr Cell


II. schema.xml: defining the document format




<?xml version="1.0" encoding="UTF-8" ?>
<schema name="nutch" version="1.5">
    <types>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true"
            omitNorms="true"/>
        <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
            omitNorms="true" positionIncrementGap="0"/>
        <fieldType name="float" class="solr.TrieFloatField" precisionStep="0"
            omitNorms="true" positionIncrementGap="0"/>
        <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
            omitNorms="true" positionIncrementGap="0"/>
        <fieldType name="text" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.StopFilterFactory"
                    ignoreCase="true" words="stopwords.txt"/>
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0"
                    splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>
        <fieldType name="url" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1"/>
            </analyzer>
        </fieldType>
    </types>
    <fields>
        <field name="id" type="string" stored="true" indexed="true"/>

        <!-- core fields -->
        <field name="batchId" type="string" stored="true" indexed="false"/>
        <field name="digest" type="string" stored="true" indexed="false"/>
        <field name="boost" type="float" stored="true" indexed="false"/>

        <!-- fields for index-basic plugin -->
        <field name="host" type="url" stored="false" indexed="true"/>
        <field name="url" type="url" stored="true" indexed="true"
            required="true"/>
        <field name="content" type="text" stored="false" indexed="true"/>
        <field name="title" type="text" stored="true" indexed="true"/>
        <field name="cache" type="string" stored="true" indexed="false"/>
        <field name="tstamp" type="date" stored="true" indexed="false"/>
        <field name="_version_" type="long" indexed="true" stored="true"/>

        <!-- fields for index-anchor plugin -->
        <field name="anchor" type="string" stored="true" indexed="true"
            multiValued="true"/>

        <!-- fields for index-more plugin -->
        <field name="type" type="string" stored="true" indexed="true"
            multiValued="true"/>
        <field name="contentLength" type="long" stored="true"
            indexed="false"/>
        <field name="lastModified" type="date" stored="true"
            indexed="false"/>
        <field name="date" type="date" stored="true" indexed="true"/>

        <!-- fields for languageidentifier plugin -->
        <field name="lang" type="string" stored="true" indexed="true"/>

        <!-- fields for subcollection plugin -->
        <field name="subcollection" type="string" stored="true"
            indexed="true" multiValued="true"/>

        <!-- fields for feed plugin (tag is also used by microformats-reltag)-->
        <field name="author" type="string" stored="true" indexed="true"/>
        <field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
        <field name="feed" type="string" stored="true" indexed="true"/>
        <field name="publishedDate" type="date" stored="true"
            indexed="true"/>
        <field name="updatedDate" type="date" stored="true"
            indexed="true"/>

        <!-- fields for creativecommons plugin -->
        <field name="cc" type="string" stored="true" indexed="true"
            multiValued="true"/>

        <!-- fields for tld plugin -->
        <field name="tld" type="string" stored="false" indexed="false"/>
    </fields>
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>content</defaultSearchField>
    <solrQueryParser defaultOperator="OR"/>
</schema>





<field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>








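Whether to index this field: only an indexed field can be used for searching. Beyond that, even if you never search on a field, you still need to index it if you want to sort, group, facet, provide query suggestions, or run function queries on it.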


In addition to enabling searching, you will also need to mark your field as indexed if you need to sort, facet, group by, provide query suggestions for, or execute function queries on values within a field.
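To make the indexed/stored split concrete, here is a hypothetical helper (not a Solr API) that reads the two flags the way the text describes: `indexed` gates searching, sorting, faceting, grouping, suggestions and function queries, while `stored` only decides whether the original value can be returned in results. The field definitions are taken from the schema above. The sketch ignores docValues and does not check whether a field type is actually suitable for an operation (for example, sorting a tokenized text field).

```java
import java.util.EnumSet;
import java.util.Set;

public class FieldCapabilities {
    enum Use { SEARCH, SORT, FACET, GROUP, SUGGEST, FUNCTION_QUERY, RETURN_IN_RESULTS }

    // Minimal stand-in for a <field .../> definition from schema.xml.
    static final class FieldDef {
        final String name;
        final boolean indexed;
        final boolean stored;

        FieldDef(String name, boolean indexed, boolean stored) {
            this.name = name;
            this.indexed = indexed;
            this.stored = stored;
        }
    }

    // Every use except returning the original value depends on indexed="true".
    static Set<Use> allowedUses(FieldDef f) {
        Set<Use> uses = EnumSet.noneOf(Use.class);
        if (f.indexed) {
            uses.addAll(EnumSet.of(Use.SEARCH, Use.SORT, Use.FACET, Use.GROUP,
                    Use.SUGGEST, Use.FUNCTION_QUERY));
        }
        if (f.stored) {
            uses.add(Use.RETURN_IN_RESULTS);
        }
        return uses;
    }

    public static void main(String[] args) {
        FieldDef[] fields = {
            new FieldDef("tstamp", false, true), // stored="true" indexed="false"
            new FieldDef("date", true, true),    // stored="true" indexed="true"
            new FieldDef("host", true, false)    // stored="false" indexed="true"
        };
        for (FieldDef f : fields) {
            System.out.println(f.name + " -> " + allowedUses(f));
        }
    }
}
```

Both `tstamp` and `date` are dates, but only `date` could be used to sort results by time, because only it is indexed.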



 <field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>

<add>
  <doc>
    ...
    <field name="tag">lucene</field>
    <field name="tag">solr</field>
  </doc>
</add>
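For a multiValued field the XML update format simply repeats the `<field>` element once per value. A minimal sketch of generating such a message with the JDK only; the `AddXmlBuilder` class and the id `test5` are hypothetical, and only `&`, `<`, `>` and `"` are escaped, which is enough for element text and double-quoted attributes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AddXmlBuilder {
    // Escape characters that are special in XML text and double-quoted attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build an <add><doc>...</doc></add> message; each value of a field becomes its own <field>.
    static String toAddXml(Map<String, List<String>> doc) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            for (String value : e.getValue()) {
                sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
                  .append(escape(value)).append("</field>");
            }
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", List.of("test5"));
        doc.put("tag", List.of("lucene", "solr")); // multiValued: two <field name="tag"> elements
        System.out.println(toAddXml(doc));
    }
}
```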




 <field name="url" type="url" stored="true" indexed="true" required="true"/>


<dynamicField name="*_ti" type="tint" indexed="true" stored="true"/>

Dynamic fields help address common problems that occur when building search applications, including
■ Modeling documents with many fields
■ Supporting documents from diverse sources
■ Adding new document sources
For details, see section 5.3.3 of Solr in Action.
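Solr resolves an incoming field name against the dynamic field patterns, each of which has a single `*` at either the start or the end. Explicitly declared fields take precedence, and among dynamic patterns the longer ones are tried first (ties go to the one declared first). A small sketch of that lookup rule; the class is hypothetical, not Solr's implementation, and the patterns `*_s` and `attr_*` are made up to sit beside `*_ti` from the example above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class DynamicFieldMatcher {
    // A dynamic pattern is "*suffix" or "prefix*".
    static boolean matches(String pattern, String field) {
        if (pattern.startsWith("*")) {
            return field.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return field.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(field);
    }

    // Explicit fields win; otherwise the longest matching pattern wins.
    // The stream sort is stable, so equal-length patterns keep schema order.
    static Optional<String> resolve(Set<String> explicitFields, List<String> dynamicPatterns, String field) {
        if (explicitFields.contains(field)) {
            return Optional.of(field);
        }
        return dynamicPatterns.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .filter(p -> matches(p, field))
                .findFirst();
    }

    public static void main(String[] args) {
        Set<String> explicit = Set.of("id", "url");
        List<String> dynamic = List.of("*_ti", "*_s", "attr_*"); // *_s and attr_* are hypothetical
        for (String name : List.of("price_ti", "attr_s", "url", "unknown")) {
            System.out.println(name + " -> " + resolve(explicit, dynamic, name).orElse("(no match)"));
        }
    }
}
```

`attr_s` matches both `*_s` and `attr_*`; the longer pattern wins. A name that matches nothing, such as `unknown`, would be rejected by Solr as an undefined field.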



Copy fields support two use cases that are common in most search applications:
■ Populate a single catch-all field with the contents of multiple fields.
■ Apply different text analysis to the same field content to create a new searchable field.


<copyField source="title" dest="text"/>
<copyField source="author" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="keywords" dest="text"/>
<copyField source="content" dest="text"/>
<copyField source="content_type" dest="text"/>
<copyField source="resourcename" dest="text"/>
<copyField source="url" dest="text"/>
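copyField copies the raw incoming value of the source field into the destination before any analysis runs, which is what lets the destination apply its own analyzer. Because several sources feed `text` here, the destination generally has to be declared multiValued. A sketch of that bookkeeping over an in-memory document; the `CopyFieldDemo` class is hypothetical and only considers values present in the submitted document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CopyFieldDemo {
    // Each rule is {source, dest}, mirroring <copyField source="..." dest="..."/>.
    static Map<String, List<String>> applyCopyFields(Map<String, List<String>> doc, String[][] rules) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        doc.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
        for (String[] rule : rules) {
            List<String> sourceValues = doc.get(rule[0]); // raw submitted values, before analysis
            if (sourceValues != null) {
                // Several sources append to the same destination, hence multiValued
                result.computeIfAbsent(rule[1], k -> new ArrayList<>()).addAll(sourceValues);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", List.of("test4"));
        doc.put("title", List.of("testagain"));
        doc.put("url", List.of("http://www.163.com"));
        String[][] rules = { {"title", "text"}, {"author", "text"}, {"url", "text"} };
        // "author" is absent from this document, so it contributes nothing
        System.out.println(applyCopyFields(doc, rules).get("text"));
    }
}
```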



<field name="text" type="stemmed_text" indexed="true" stored="true"/>
<field name="auto_suggest" type="unstemmed_text" indexed="true" stored="false" multiValued="true"/>
...
<copyField source="text" dest="auto_suggest" />





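One category is unstructured data that is analyzed before being indexed, such as the body of an article (solr.TextField); the other is structured data that is indexed as-is without analysis, such as StrField and TrieLongField.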


(4) In the fieldType definitions of schema.xml, the solr.* prefix stands for org.apache.solr.schema.*, for example:

        <fieldType name="string" class="solr.StrField" sortMissingLast="true"  omitNorms="true"/> 






Default DateField format: In general, Solr expects your dates to be in the ISO-8601 Date/Time format (yyyy-MM-ddTHH:mm:ssZ); the date in our tweet (2012-05-22T09:30:22Z) breaks down to
yyyy = 2012
MM = 05
dd = 22
HH = 09 (24-hr clock)
mm = 30
ss = 22
Z = UTC Timezone (Z is for Zulu)
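In Java, `java.time` already reads and writes this format: `Instant.parse` accepts it and `DateTimeFormatter.ISO_INSTANT` produces it. The sketch below breaks the tweet's timestamp into the components listed above and converts a local time to the UTC `Z` form before indexing; the class name and the Asia/Shanghai time are made-up examples.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class SolrDateDemo {
    // Parse a Solr-style date (ISO-8601, UTC, trailing Z) into its UTC components.
    static ZonedDateTime parse(String solrDate) {
        return Instant.parse(solrDate).atZone(ZoneOffset.UTC);
    }

    // Format any zoned time as a Solr-style UTC date string.
    static String toSolr(ZonedDateTime time) {
        return DateTimeFormatter.ISO_INSTANT.format(time.toInstant());
    }

    public static void main(String[] args) {
        ZonedDateTime t = parse("2012-05-22T09:30:22Z");
        System.out.printf("yyyy=%d MM=%02d dd=%02d HH=%02d mm=%02d ss=%02d%n",
                t.getYear(), t.getMonthValue(), t.getDayOfMonth(),
                t.getHour(), t.getMinute(), t.getSecond());

        // 17:30:22 Beijing time (UTC+8) is the same instant as 09:30:22Z
        ZonedDateTime beijing = ZonedDateTime.of(2012, 5, 22, 17, 30, 22, 0, ZoneId.of("Asia/Shanghai"));
        System.out.println(toSolr(beijing));
    }
}
```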


<field name="timestamp">2012-05-22T09:30:00Z/HOUR</field>
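The `/HOUR` suffix is Solr date math: it rounds the timestamp down to the start of the hour, so the value above becomes 2012-05-22T09:00:00Z. A rough JDK equivalent for a few rounding units, written as a hypothetical helper: it handles only a single trailing `/UNIT`, rounds in UTC, and does not parse the other date-math forms such as `+1DAY`.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DateMathRounding {
    // Apply an optional trailing rounding unit such as "/HOUR" (in UTC) to an ISO-8601 instant.
    static Instant round(String expr) {
        int slash = expr.indexOf('/');
        if (slash < 0) {
            return Instant.parse(expr);
        }
        Instant time = Instant.parse(expr.substring(0, slash));
        String unit = expr.substring(slash + 1);
        switch (unit) {
            case "SECOND": return time.truncatedTo(ChronoUnit.SECONDS);
            case "MINUTE": return time.truncatedTo(ChronoUnit.MINUTES);
            case "HOUR":   return time.truncatedTo(ChronoUnit.HOURS);
            case "DAY":    return time.truncatedTo(ChronoUnit.DAYS);
            default: throw new IllegalArgumentException("unsupported rounding unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(round("2012-05-22T09:30:00Z/HOUR")); // 2012-05-22T09:00:00Z
        System.out.println(round("2012-05-22T09:30:00Z/DAY"));  // 2012-05-22T00:00:00Z
    }
}
```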








positionIncrementGap: see section 5.4.4 of Solr in Action.
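positionIncrementGap is the number of extra token positions left between consecutive values of a multiValued field, so that a phrase query cannot match across the boundary of two separate values. The sketch below is a simplified, hypothetical model (whitespace tokenization, not Lucene's exact position bookkeeping) showing why a gap such as the `positionIncrementGap="100"` of the `text` and `url` types above matters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionGapDemo {
    // Assign token positions to the values of a multiValued field.
    static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> pos = new HashMap<>();
        int next = 0;
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                next += gap; // simplified: skip `gap` positions before every value after the first
            }
            for (String tok : values.get(i).toLowerCase().split("\\s+")) {
                if (!tok.isEmpty()) {
                    pos.computeIfAbsent(tok, k -> new ArrayList<>()).add(next++);
                }
            }
        }
        return pos;
    }

    // A phrase matches if its tokens occur at consecutive positions.
    static boolean phraseMatches(Map<String, List<Integer>> pos, String phrase) {
        String[] toks = phrase.toLowerCase().split("\\s+");
        for (int start : pos.getOrDefault(toks[0], List.of())) {
            boolean ok = true;
            for (int j = 1; j < toks.length && ok; j++) {
                ok = pos.getOrDefault(toks[j], List.of()).contains(start + j);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = List.of("lucene search", "solr"); // two values of one field
        for (int gap : new int[] {0, 100}) {
            boolean hit = phraseMatches(positions(values, gap), "search solr");
            System.out.println("gap=" + gap + ": \"search solr\" matches? " + hit);
        }
    }
}
```

With a gap of 0 the last token of the first value and the first token of the second are adjacent, so the phrase "search solr" spuriously matches; with a gap of 100 it no longer does.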







(5) Use a primitive type for the uniqueKey rather than a complex type such as an analyzed text field. One thing to note is that it’s best to use a primitive field type, such as string or long, for the field you indicate as being the <uniqueKey/>, as that ensures Solr doesn’t make any changes to the value during indexing.



<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">

  <!-- Enables a transaction log, used for real-time get, durability, and
       solr cloud replica recovery.  The log can grow as big as
       uncommitted changes to the index, so use of a hard autoCommit
       is recommended (see below).
       "dir" - the target directory for transaction logs, defaults to the
               solr data directory.  -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- AutoCommit

       Perform a hard commit automatically under certain conditions.
       Instead of enabling autoCommit, consider using "commitWithin"
       when adding documents.

       http://wiki.apache.org/solr/UpdateXmlMessages

       maxDocs - Maximum number of documents to add since the last
                 commit before automatically triggering a new commit.

       maxTime - Maximum amount of time in ms that is allowed to pass
                 since a document was added before automatically
                 triggering a new commit.
       openSearcher - if false, the commit causes recent index changes
         to be flushed to stable storage, but does not cause a new
         searcher to be opened to make those changes visible.

       If the updateLog is enabled, then it's highly recommended to
       have some sort of hard autoCommit to limit the log size.
    -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- softAutoCommit is like autoCommit except it causes a
       'soft' commit which only ensures that changes are visible
       but does not ensure that data is synced to disk.  This is
       faster and more near-realtime friendly than a hard commit.
    -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>

  <!-- Update Related Event Listeners

       Various IndexWriter related events can trigger Listeners to
       take actions.

       postCommit - fired after every commit or optimize command
       postOptimize - fired after every optimize command
    -->
  <!-- The RunExecutableListener executes an external command from a
       hook such as postCommit or postOptimize.

       exe - the name of the executable to run
       dir - dir to use as the current working directory. (default=".")
       wait - the calling thread waits until the executable returns.
              (default="true")
       args - the arguments to pass to the program.  (default is none)
       env - environment variables to set.  (default is none)
    -->
  <!-- This example shows how RunExecutableListener could be used
       with the script based replication...
       http://wiki.apache.org/solr/CollectionDistribution
    -->
  <!--
     <listener event="postCommit" class="solr.RunExecutableListener">
       <str name="exe">solr/bin/snapshooter</str>
       <str name="dir">.</str>
       <bool name="wait">true</bool>
       <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
       <arr name="env"> <str>MYVAR=val1</str> </arr>
     </listener>
    -->
</updateHandler>
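The maxDocs/maxTime rules in the autoCommit comment boil down to: commit as soon as either enough documents are pending or the oldest pending document has waited long enough. A hypothetical model of that trigger logic, not Solr's actual implementation; it treats a non-positive limit, like the `-1` default of `solr.autoSoftCommit.maxTime`, as disabled, and starts the maxTime clock at the first uncommitted add.

```java
public class AutoCommitPolicy {
    private final int maxDocs;    // <= 0 disables the document-count trigger
    private final long maxTimeMs; // <= 0 disables the time trigger
    private int pendingDocs = 0;
    private long firstPendingAtMs = -1;

    AutoCommitPolicy(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    // Record one added document; returns true if a commit should happen now.
    boolean onAdd(long nowMs) {
        if (pendingDocs == 0) {
            firstPendingAtMs = nowMs; // the maxTime clock starts at the first uncommitted add
        }
        pendingDocs++;
        return isDue(nowMs);
    }

    // Checked on every add and, in a real system, periodically by a timer.
    boolean isDue(long nowMs) {
        if (pendingDocs == 0) {
            return false;
        }
        boolean byDocs = maxDocs > 0 && pendingDocs >= maxDocs;
        boolean byTime = maxTimeMs > 0 && nowMs - firstPendingAtMs >= maxTimeMs;
        return byDocs || byTime;
    }

    // Reset after a commit has been performed.
    void committed() {
        pendingDocs = 0;
        firstPendingAtMs = -1;
    }

    public static void main(String[] args) {
        // Mirrors <maxTime>${solr.autoCommit.maxTime:15000}</maxTime> with no maxDocs
        AutoCommitPolicy hard = new AutoCommitPolicy(-1, 15_000);
        System.out.println(hard.onAdd(0));      // false: nothing has waited 15 s yet
        System.out.println(hard.isDue(14_999)); // false
        System.out.println(hard.isDue(15_000)); // true
        hard.committed();
        System.out.println(hard.isDue(20_000)); // false: no pending documents
    }
}
```

With openSearcher=false, such a hard commit only makes changes durable; visibility comes from the soft commit, which the `-1` default above leaves disabled.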
