Configuration of Solr(5.2.0) 1


This is a very simple configuration to get Solr working with your own documents. Further customized settings can be added gradually. My documents are in JSON format and written in English.

1. Go to the Solr installation directory and start Solr with

bin/solr start

2. Create a new core for your documents with

bin/solr create -c yourcorename

3. Go to the conf directory of yourcorename, which should be located at server/solr/yourcorename/conf. To make Solr work as you expect, at least two files need to be edited: 'managed-schema' and 'solrconfig.xml'.

3.1 Rename the file 'managed-schema' to 'schema.xml', so that it has the '.xml' extension.

3.2 Find the line '<schema name="example-data-driven-schema" version="1.5">'. The name can be changed to something meaningful to you, such as '<schema name="yourcorename" version="1.5">'. This is optional.

3.3 Set up the schema for your documents. This step is important. My JSON documents have the form [{doc_id: xxx, title: xxx, date: xxx, abstract: xxx, content: xxx}, ...], and 'doc_id' is the unique key of each record.

Scroll down the default schema and you will see '<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />'. I changed 'id' to 'doc_id', as my documents have no 'id' field, and at the same time changed '<uniqueKey>id</uniqueKey>' to '<uniqueKey>doc_id</uniqueKey>'. If you do not change this, Solr generates an 'id' value for each record at indexing time, so indexing the same documents twice gives you duplicates.

Several attributes are specified on the field. 'type' can be 'string', 'int', 'long', 'text_en', 'text_general', etc. For the 'doc_id' field, I set 'type' to 'long'. 'indexed' is either 'true' or 'false'; set it to 'true' if you want the field to be searchable. 'stored' is either 'true' or 'false'; set it to 'true' if you want the field displayed in your search results. 'required' is either 'true' or 'false'; if it is 'true', indexing raises an error for any record that lacks this field. 'multiValued' is either 'true' or 'false'; for the unique key 'doc_id' it must be 'false'.

In the end, the 'doc_id' field looks like '<field name="doc_id" type="long" indexed="true" stored="true" required="true" multiValued="false"/>'. Keep '_version_', '_text_' and '<copyField source="*" dest="_text_"/>' at their defaults.
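Since 'doc_id' is required, single-valued and of type 'long', it can help to check the JSON records before indexing them. The following is only a minimal sketch under assumptions: the function name check_records is hypothetical, and the records are assumed to be already loaded as a list of Python dicts.

```python
from collections import Counter


def check_records(records, key="doc_id"):
    """Return a list of problems for a required, single-valued integer key.

    `check_records` is a hypothetical helper for illustration, not part of Solr.
    """
    problems = []
    seen = Counter()
    for pos, rec in enumerate(records):
        value = rec.get(key)
        if value is None:
            # required="true": a record without the key is rejected at index time
            problems.append(f"record {pos}: missing required field '{key}'")
        elif isinstance(value, bool) or not isinstance(value, int):
            # type="long": the value has to be an integer
            problems.append(f"record {pos}: '{key}' is not an integer: {value!r}")
        else:
            seen[value] += 1
    for value, count in seen.items():
        if count > 1:
            # Records sharing a unique key replace each other in the index
            problems.append(
                f"'{key}' value {value} appears {count} times "
                "(later records overwrite earlier ones)"
            )
    return problems
```

An empty list means every record has an integer 'doc_id' and no key is repeated.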

3.4 For my documents, I added the following field definitions to schema.xml:

    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="date" type="date" indexed="true" stored="true"/>
    <field name="abstract" type="text_general" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="true"/>
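To illustrate what a document file matching these fields might look like, here is a small sketch that writes records to a JSON file. The helper names (to_solr_date, make_record, write_documents) are made up for this example, and naive datetimes are assumed to already be in UTC. Solr date fields expect values such as 2015-06-01T12:30:00Z.

```python
import json
from datetime import datetime, timezone


def to_solr_date(dt):
    """Format a datetime as UTC ISO 8601 with a trailing 'Z'.

    Assumption: a naive datetime is treated as already being in UTC.
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def make_record(doc_id, title, date, abstract, content):
    """Build one record with the five fields used in this article."""
    return {
        "doc_id": doc_id,
        "title": title,
        "date": to_solr_date(date),
        "abstract": abstract,
        "content": content,
    }


def write_documents(records, path):
    """Write the records as a JSON array, the shape described in step 3.3."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
```

For example, write_documents([make_record(1, "A title", datetime(2015, 6, 1), "Short abstract", "Full text")], "docs.json") would produce a file with a single record; "docs.json" is just a placeholder name.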

3.5 We are done with schema.xml for now and can move on to solrconfig.xml (we will come back to schema.xml later). In solrconfig.xml, comment out or delete

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

and then add the line '<schemaFactory class="ClassicIndexSchemaFactory"/>' in its place. Last but not least, comment out or delete the following block, since this processor relies on the mutable managed schema that was just disabled:

<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.util.Date</str>
    <str name="fieldType">tdates</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Integer</str>
    <str name="fieldType">tlongs</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">tdoubles</str>
  </lst>
</processor>
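The two solrconfig.xml changes in this step can be sanity-checked with a short script. This is only a sketch: check_solrconfig is a hypothetical helper, and it relies on the fact that Python's ElementTree parser skips XML comments by default, so commented-out elements are treated as removed.

```python
import xml.etree.ElementTree as ET


def check_solrconfig(xml_text):
    """Return a list of leftovers from the managed-schema setup.

    `check_solrconfig` is a hypothetical helper, not part of Solr.
    """
    root = ET.fromstring(xml_text)
    problems = []

    # Exactly one schemaFactory should remain, and it should be the classic one
    factories = [el.get("class") for el in root.iter("schemaFactory")]
    if factories != ["ClassicIndexSchemaFactory"]:
        problems.append(
            f"expected a single ClassicIndexSchemaFactory, found {factories}"
        )

    # The schema-guessing processor should be commented out or deleted
    for proc in root.iter("processor"):
        if proc.get("class") == "solr.AddSchemaFieldsUpdateProcessorFactory":
            problems.append("AddSchemaFieldsUpdateProcessorFactory is still active")

    return problems
```

Reading the file with open("solrconfig.xml", encoding="utf-8").read() and passing the text to check_solrconfig should return an empty list once both edits are in place.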

3.6 Because I set the type of doc_id to long, elevate.xml needs to be edited. Comment out or delete these two lines, whose ids are not valid long values:

<doc id="MA147LL/A" />

<doc id="IW-02" exclude="true" />

Otherwise, the core yourcorename won't start.

3.7 Now you can index your documents into Solr by running

bin/post -c yourcorename documentspath

3.8 Open 'http://localhost:8983/solr' in a browser to reach the Solr Admin UI.

You just set up a simple search engine using Solr.
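Besides the web interface, the index can also be queried over HTTP through the core's select handler. The sketch below assumes Solr is running locally on the default port and that the core is named yourcorename; select_url and search are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core, query, rows=10, base="http://localhost:8983/solr"):
    """Build a URL for the core's /select handler, asking for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def search(core, query, rows=10):
    """Run the query against a running Solr instance and return the matching documents."""
    with urlopen(select_url(core, query, rows)) as resp:
        return json.load(resp)["response"]["docs"]
```

With Solr running, search("yourcorename", "title:solr") would return the stored fields of the matching records; the query string here is only an illustration.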
