Introduction to Solr


                 Introduction to Apache Solr (xiangjiang5011@163.com)

Official Apache Solr site: http://lucene.apache.org/solr/

What Is Solr?

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

 

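Solr has now reached version 3.5. However, probably because of Chinese tokenizer support (many Chinese tokenizers do not work with newer Solr releases), version 1.4.1 is still the one most widely used inside companies, and it works with almost every Chinese tokenizer. To cover as many aspects of Solr as possible, and because of my company's needs, these notes are all based on Solr 1.4.1.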

 

Installing a single Solr instance in Tomcat on Windows

Recommended reading: http://wiki.apache.org/solr/SolrTomcat

1. Preparing the directories

Create the directory D:/solrworkspace/

Download Solr 1.4.1 to D:/solrworkspace/apache-solr-1.4.1.zip

Download Tomcat 6.0 to D:/solrworkspace/apache-tomcat-6.0.30.zip

Extract D:/solrworkspace/apache-solr-1.4.1.zip in place, giving D:/solrworkspace/apache-solr-1.4.1 ($APACHE_SOLR_HOME)

Extract D:/solrworkspace/apache-tomcat-6.0.30.zip in place, giving D:/solrworkspace/apache-tomcat-6.0.30 ($TOMCAT_HOME)

Create the directory D:/solrworkspace/solr ($SOLR_HOME)

Create the directory $TOMCAT_HOME/conf/Catalina/localhost

2. Deploying the WAR and JAR files

Copy $APACHE_SOLR_HOME/dist/apache-solr-1.4.1.war to D:/solrworkspace/apache-solr-1.4.1.war and rename it to D:/solrworkspace/solr.war

Copy $APACHE_SOLR_HOME/dist/apache-solr-*.jar to $TOMCAT_HOME/lib

3. Editing the configuration files

Edit $TOMCAT_HOME/conf/tomcat-users.xml and add:

<role rolename="manager"/>
<role rolename="admin"/>
<user username="tomcat" password="tomcat" roles="manager,admin"/>
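
For orientation, here is a minimal sketch of how the whole file might look once the entries are in place. It assumes an otherwise empty tomcat-users.xml, as shipped with a fresh Tomcat 6 download. If the file already contains other entries, keep them.

```xml
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <!-- Roles and user from the step above. The tomcat/tomcat credentials
       are only suitable for a local test setup. -->
  <role rolename="manager"/>
  <role rolename="admin"/>
  <user username="tomcat" password="tomcat" roles="manager,admin"/>
</tomcat-users>
```
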

To support Chinese characters in request URLs, edit $TOMCAT_HOME/conf/server.xml and add URIEncoding="UTF-8" to the HTTP Connector:

<Server>
  <Service>
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" URIEncoding="UTF-8" />
    ...
  </Service>
</Server>

Create a file named solr.xml in $TOMCAT_HOME/conf/Catalina/localhost

Set the contents of solr.xml to:

 

<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="D:\solrworkspace\solr.war" debug="0" crossContext="true" >
    <Environment name="solr/home" type="java.lang.String" value="D:\solrworkspace\solr" override="true" />
</Context>

4. Deploying the Solr configuration files

Copy everything under $APACHE_SOLR_HOME/example/solr into $SOLR_HOME

5. Starting Tomcat

Run $TOMCAT_HOME/bin/startup.bat

6. Open the Solr admin interface at http://localhost:8080/solr

                 

Adding the mmseg4j Chinese tokenizer to Solr

mmseg4j 1.8.3 supports Solr 1.4.1. The latest release at the time of writing, mmseg4j 1.8.5, is too new and does not support Solr 1.4.1.

Create two folders, lib and dic, under $SOLR_HOME

Download mmseg4j-1.8.3.zip to D:/solrworkspace/mmseg4j-1.8.3.zip

Extract D:/solrworkspace/mmseg4j-1.8.3.zip to D:/solrworkspace/mmseg4j-1.8.3 ($MMSEG_HOME)

Copy the *.dic files from $MMSEG_HOME/data to $SOLR_HOME/dic

Copy $MMSEG_HOME/mmseg4j-all-1.8.3.jar to $SOLR_HOME/lib

Edit $SOLR_HOME/conf/schema.xml

Copy the following field type definition:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    ......
  </analyzer>
</fieldType>

so that there are three copies, one for each mmseg4j mode:

<fieldType name="text_mmseg_complex" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    ......
  </analyzer>
</fieldType>

<fieldType name="text_mmseg_max_word" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="dic"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    ......
  </analyzer>
</fieldType>

<fieldType name="text_mmseg_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="dic"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    ......
  </analyzer>
</fieldType>
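
The definitions above show only the index-time analyzer and elide the rest. As a minimal sketch of what a complete definition might look like, the field type below uses the mmseg4j tokenizer for both index-time and query-time analysis. The name text_mmseg_sketch and the filter chain are assumptions chosen for illustration, not the original configuration.

```xml
<!-- Hypothetical field type: the name and the choice of filters are
     illustrative assumptions, not taken from the steps above. -->
<fieldType name="text_mmseg_sketch" class="solr.TextField" positionIncrementGap="100">
  <!-- Applied to field values when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied to the query string at search time -->
  <analyzer type="query">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the tokenizer and mode identical in both analyzers means query terms are segmented the same way as the indexed text, which is usually what makes Chinese queries match.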

Then add these three fields inside the <fields> element:

<fields>
  <field name="textMmsegComplex" type="text_mmseg_complex" indexed="true" stored="false"/>
  <field name="textMmsegMaxWord" type="text_mmseg_max_word" indexed="true" stored="false"/>
  <field name="textMmsegSimple" type="text_mmseg_simple" indexed="true" stored="false"/>
</fields>
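
Declaring the fields does not put any text into them. One way to populate them, sketched here on the assumption that the schema is still the Solr example schema with its name field, is a set of copyField rules placed after the closing </fields> tag. The choice of name as the source field is only illustrative.

```xml
<!-- Assumption: "name" is a field from the example schema;
     substitute whichever source field holds your Chinese text. -->
<copyField source="name" dest="textMmsegComplex"/>
<copyField source="name" dest="textMmsegMaxWord"/>
<copyField source="name" dest="textMmsegSimple"/>
```

After restarting Tomcat and reindexing, each document's name value would be analyzed three times, once per segmentation mode, which makes it easier to compare the modes on real data.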

 

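Open http://localhost:8080/solr/admin/analysis.jsp

In the Field drop-down, select "name", then enter one of textMmsegComplex, textMmsegMaxWord, or textMmsegSimple in the input box next to it. These correspond to the three mmseg4j segmentation modes.

Field value (Index) is the text as it is tokenized at index time. Field value (Query) is the text as it is tokenized at query time.

In the input boxes next to them, enter the sentence or phrase you want tokenized.

Click Analyze to see the tokens produced for indexing and for querying.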

 

 

 

 

 

 

 

 
