Refactoring with Solr
Source: Internet  Editor: 程序博客网  Time: 2024/06/10 04:40
1. About Solr
Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites.
Solr Features: Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via JSON, XML, CSV or binary over HTTP. You query it via HTTP GET and receive JSON, XML, CSV or binary results.
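Querying over HTTP GET can be illustrated with a small sketch that only builds the request URL. The base URL, the /select handler path, and the class and method names are assumptions made for illustration (a multi-core installation would normally include the core name in the path); q, wt and rows are standard Solr request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ.
final class SolrSelectUrl {
    // Builds a GET URL for the select handler under baseUrl,
    // e.g. an assumed base of http://localhost:8983/solr.
    static String build(String baseUrl, String query, String format, int rows) {
        return baseUrl + "/select?q=" + encode(query)
                + "&wt=" + encode(format)   // response format: json, xml, csv, ...
                + "&rows=" + rows;          // maximum number of results
    }

    // URL-encodes one parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "name:hobbit", "json", 10));
    }
}
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the wt parameter selects which of the response formats listed above is returned.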
2. Solr Setup
Software Download
Java: You will need the Java Runtime Environment (JRE) version 1.7 or higher.
http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html
Tomcat: The servlet container used to deploy Solr (another server may also be used).
http://tomcat.apache.org/
Solr
http://lucene.apache.org/solr/
Setup Steps
Step 1
Extract Solr.zip
Step 2
Copy solr\server\webapps\solr.war to tomcat\webapps
Step 3
Run Tomcat's startup.bat (Tomcat will automatically unpack solr.war)
Step 4
Stop Tomcat, then delete tomcat\webapps\solr.war (otherwise Tomcat will redeploy Solr every time the server starts up)
Step 5
Edit tomcat\webapps\solr\WEB-INF\web.xml and add the following entry inside the <web-app> node:
<env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>${solrHome}</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
Step 6
Copy all files under solr\example\example-DIH\solr to a local path, referred to here as ${solrHome}
Step 7
Copy solr\dist\*.jar to tomcat\webapps\solr\WEB-INF\lib
Step 8
Start Tomcat and open http://localhost:8080/solr/
If the deployment succeeded, you will see the Solr admin page.
3. Schema.xml
Schema.xml is usually the first file you configure when setting up a new Solr installation.
The schema declares:
- What kinds of fields there are
- Which field should be used as the unique/primary key
- Which fields are required
- How to index and search each field
The XML consists of a number of parts.
Field Types
<types>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    ...
</types>
The example Solr schema.xml comes with a number of pre-defined field types, and they're quite well-documented. You can also use them as templates for creating new field types.
Fields
<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="textgen" indexed="true" stored="true"/>
    ...
</fields>
The documentation provides a list of valid attributes:
name: mandatory - the name for the field
type: mandatory - the name of a previously defined type from the <types> section
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
compressed: [false] if this field should be stored using gzip compression (this will only apply if the field type is compressible; among the standard field types, only TextField and StrField are)
multiValued: true if this field may contain multiple values per document
omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
termVectors: [false] set to true to store the term vector for a given field. When using MoreLikeThis, fields used for similarity should be stored for best performance.
termPositions: Store position information with the term vector. This will increase storage costs.
termOffsets: Store offset information with the term vector. This will increase storage costs.
default: a value that should be used if no value is specified when adding a document.
Misc
uniqueKey
<uniqueKey>id</uniqueKey>
Equivalent to the primary key of the document.
defaultSearchField
<defaultSearchField>aggregate_text</defaultSearchField>
The field that is searched when a query does not name a field explicitly.
solrQueryParser
<solrQueryParser defaultOperator="OR"/>
Used for determining if multiple terms are ANDed or ORed together by default.
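The effect of the default operator can be sketched by spelling out the operator that the parser would otherwise apply implicitly. This is a simplified illustration that assumes the input consists only of plain whitespace-separated terms (no phrases, field names or nested clauses); the class and method names are hypothetical and are not part of Solr.

```java
// Hypothetical illustration, not Solr's actual query parser.
final class DefaultOperator {
    // Rewrites "a b c" as "a OP b OP c", which is how a query without
    // explicit operators is interpreted under the given default operator.
    static String explicit(String userInput, String operator) {
        if (!operator.equals("AND") && !operator.equals("OR")) {
            throw new IllegalArgumentException("operator must be AND or OR");
        }
        String[] terms = userInput.trim().split("\\s+");
        return String.join(" " + operator + " ", terms);
    }

    public static void main(String[] args) {
        System.out.println(explicit("sony digital camera", "OR"));
        System.out.println(explicit("sony digital camera", "AND"));
    }
}
```

With defaultOperator="OR" a document matching any one of the terms is returned, while "AND" requires every term to match, so the setting trades recall against precision.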
4. Solrconfig.xml
Solrconfig.xml is usually the second file you configure when setting up a new Solr installation, after schema.xml.
The more commonly used elements in solrconfig.xml are:
- data directory location
<!--
Used to specify an alternate directory to hold all index data
other than the default ./data under the Solr home.
If replication is in use, this should match the replication configuration.
-->
<dataDir>${solr.data.dir:./solr/data}</dataDir>
- cache parameters
- request handlers
Request handlers are responsible for accepting HTTP requests, performing searches, then returning the results.
The default request handler that comes configured with the example webapp, also known as the standard request handler, looks like this:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
        <str name="echoParams">explicit</str>
        <!--
        <int name="rows">10</int>
        <str name="fl">*</str>
        <str name="version">2.1</str>
        -->
    </lst>
</requestHandler>
- search components
Search components extend the abstract class SearchComponent and are responsible for performing the actual searches.
5. SolrJ
Setting up the classpath
From /dist
apache-solr-solrj-*.jar
From /dist/solrj-lib
commons-codec-1.3.jar
commons-httpclient-3.1.jar
commons-io-1.4.jar
jcl-over-slf4j-1.5.5.jar
slf4j-api-1.5.5.jar
From /lib
slf4j-jdk14-1.5.5.jar
Add documents using SolrJ
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import java.io.IOException;

public class SolrjPopulator {
    public static void main(String[] args) throws IOException, SolrServerException {
        // 8983 is the port of Solr's bundled Jetty; use 8080 for the Tomcat setup described earlier
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        for (int i = 0; i < 1000; ++i) {
            // Build one document per iteration; "id" is the unique key
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("cat", "book");
            doc.addField("id", "book-" + i);
            doc.addField("name", "The Legend of the Hobbit part " + i);
            server.add(doc);
            if (i % 100 == 0) server.commit(); // periodically flush
        }
        server.commit(); // make the remaining documents searchable
    }
}
Search using SolrJ
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import java.net.MalformedURLException;

public class SolrJSearcher {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        // 8983 is the port of Solr's bundled Jetty; use 8080 for the Tomcat setup described earlier
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery();
        query.setQuery("sony digital camera");
        // Filter queries restrict the result set without affecting scores
        query.addFilterQuery("cat:electronics", "store:amazon.com");
        // Return only these fields, starting from the first hit
        query.setFields("id", "price", "merchant", "cat", "store");
        query.setStart(0);
        query.set("defType", "edismax"); // use the extended DisMax query parser
        QueryResponse response = solr.query(query);
        SolrDocumentList results = response.getResults();
        for (int i = 0; i < results.size(); ++i) {
            System.out.println(results.get(i));
        }
    }
}
6. Boosts
7. Next
8. Suggestions
9. References
Boosts
In addition to the standard relevance scoring factors, the primary method of modifying document scores is boosting.
There are two kinds of boosts: index-time and query-time boosts.
Index-time boosts are applied when adding documents, and apply to the entire document or to specific fields.
Query-time boosts are applied when constructing a search query, and apply to specific fields.
Query boosts are applied by appending the caret character ^ followed by a positive number to query clauses.
title:foo OR (title:foo AND title:bar)^2.0 OR title:"foo bar"^10
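The caret syntax can also be generated programmatically. The following is a minimal sketch using plain string handling; the class and method names are hypothetical and not part of SolrJ, and each boosted clause is always wrapped in parentheses so that the boost applies to the whole clause.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of Solr or SolrJ.
final class QueryBoost {
    // Wraps a clause in parentheses and appends ^factor.
    static String boost(String clause, double factor) {
        if (factor <= 0) {
            throw new IllegalArgumentException("boost must be a positive number");
        }
        // toPlainString avoids scientific notation for very large factors
        return "(" + clause + ")^" + BigDecimal.valueOf(factor).toPlainString();
    }

    public static void main(String[] args) {
        String query = String.join(" OR ",
                "title:foo",
                boost("title:foo AND title:bar", 2.0),
                boost("title:\"foo bar\"", 10));
        System.out.println(query);
    }
}
```

The string built in main is equivalent to the example query above and could be passed to SolrQuery.setQuery in the search example.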
Negative boosts
Whilst Lucene allows negative boosts, Solr does not.
The only way to meaningfully perform a negative boost is to apply a positive boost to a negative query. For example:
(*:* -title:foo)^2.0
This boosts all documents which don't have "foo" in the title by 2.0, thereby effectively applying a down boost to documents which do.
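The down-boost pattern can be wrapped in a small helper. This is a sketch under the assumption that the unwanted clause is a simple field:term expression; the class and method names are hypothetical and not part of SolrJ.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of Solr or SolrJ.
final class DownBoost {
    // Builds a clause that boosts every document NOT matching unwantedClause,
    // which lowers the relative score of the documents that do match it.
    static String boostAllExcept(String unwantedClause, double factor) {
        if (factor <= 0) {
            throw new IllegalArgumentException("boost must be a positive number");
        }
        // *:* matches all documents; the leading minus excludes the unwanted ones
        return "(*:* -" + unwantedClause + ")^" + BigDecimal.valueOf(factor).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(boostAllExcept("title:foo", 2.0));
    }
}
```

Such a clause is normally combined with the main query as an optional clause, so that it changes the ranking without excluding any document from the results.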
We mainly apply boosts at index time, when adding documents.
There are two ways to do this with SolrJ: we can add a boost to a field in a Solr document, or add a boost to the Solr document itself.
There are three people: wrwangwr@cn.ibm.com, panhm@cn.ibm.com and yanjuqi@cn.ibm.com. All of them have the title "I can play java". Now we add a boost of 1 to the title field of wrwangwr@cn.ibm.com, 2 to the title field of panhm@cn.ibm.com and 3 to the title field of yanjuqi@cn.ibm.com. After we search for the keyword "java", the data will be displayed as below.