solr5中配置dataimport-handler

来源:互联网 发布:数据复制 编辑:程序博客网 时间:2024/05/21 11:14


1.Configuring solrconfig.xml

(文件所在路径D:\solr5\solr-5.0.0\server\solr\xuq\conf\solrconfig.xml

Data Import Handler必须先在solrconfig.xml中进行注册。

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">DIHconfigfile.xml</str>
  </lst>
</requestHandler>

2.Configuring the DIH Configuration File(DIHconfigfile.xml

(文件所在路径D:\solr5\solr-5.0.0\server\solr\xuq\conf\DIHconfigfile.xml

the DIH Configuration File是基于目录下D:\solr5\solr-5.0.0\example\example-DIH\solr\db\conf中的db-data-config.xml文件的注释文件,它从以下定义的4张数据库表中抽取数据。

<dataConfig>
<!-- The first element is the dataSource, in this case an HSQLDB database.
     The path to the JDBC driver and the JDBC URL and login credentials are all specified here.
     Other permissible attributes include whether or not to autocommit to Solr,the batchsize
     used in the JDBC connection, a 'readOnly' flag -->
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:./example-DIH/hsqldb/ex" user="sa" />
 
<!-- a 'document' element follows, containing multiple 'entity' elements.
     Note that 'entity' elements can be nested, and this allows the entity
     relationships in the sample database to be mirrored here, so that we can
     generate a denormalized Solr record which may include multiple features
     for one item, for instance -->
  <document>
 
<!-- The possible attributes for the entity element are described below.
     Entity elements may contain one or more 'field' elements, which map
     the data source field names to Solr fields, and optionally specify
     per-field transformations -->
<!-- this entity is the 'root' entity. -->
    <entity name="item" query="select * from item"
            deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
      <field column="NAME" name="name" />
 
<!-- This entity is nested and reflects the one-to-many relationship between an item and its multiple features.
     Note the use of variables; ${item.ID} is the value of the column 'ID' for the current item
     ('item' referring to the entity name)  -->
      <entity name="feature" 
              query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'"
              deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
              parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}">
        <field name="features" column="DESCRIPTION" />
      </entity>
      <entity name="item_category"
              query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
              deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
              parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
        <entity name="category"
                query="select DESCRIPTION from category where ID = '${item_category.CATEGORY_ID}'"
                deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
                parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}">
          <field column="description" name="cat" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

数据源根据实际情况配置。将DIHconfigfile.xml文件放到solrconfig.xml同目录下,重启solr服务器即可。

资料参考:https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler


1 0
原创粉丝点击