Solr Documentation Study Notes -- Documents, Fields, and Schema Design

Source: Internet   Published: 电脑速录软件   Editor: 程序博客网   Time: 2024/05/16 09:07

Overview of Documents, Fields, and Schema Design

The fundamental premise of Solr is simple. You give it a lot of information, then later you can ask it questions and find the piece of information you want. The part where you feed in all the information is called indexing or updating. When you ask a question, it's called a query.

In short, Solr does two things: it indexes data and it queries data.
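To make the two halves concrete, here is a toy inverted index in plain Java. Indexing records, for each term, the ids of the documents that contain it, and querying looks those terms up. This only illustrates the idea under simplifying assumptions (whitespace tokenization, lowercasing, AND between query terms). Solr/Lucene's real index is far more elaborate, and ToyIndex is a made-up name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: illustrates the index/query idea only, not Solr's implementation.
class ToyIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // "Indexing": split the text into lowercase terms and record the document id under each term
    void index(String id, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // "Querying": ids of the documents that contain every query term (AND semantics)
    Set<String> query(String q) {
        Set<String> result = null;
        for (String term : q.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> ids = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.index("1", "Solr is a search server");
        idx.index("2", "Lucene is a search library");
        System.out.println(idx.query("search"));        // [1, 2]
        System.out.println(idx.query("search server")); // [1]
    }
}
```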

Solr’s Schema File

Solr stores details about the field types and fields it is expected to understand in a schema file. The name and location of this file may vary depending on how you initially configured Solr or if you modified it later.

  • managed-schema is the name for the schema file Solr uses by default to support making schema changes at runtime via the Schema API or the Schemaless Mode features. You may explicitly configure the managed schema features to use an alternative filename if you choose, but the contents of the file are still updated automatically by Solr.
  • schema.xml is the traditional name for a schema file which can be edited manually by users who use the ClassicIndexSchemaFactory.
  • If you are using SolrCloud you may not be able to find any file by these names on the local filesystem. You will only be able to see the schema through the Schema API (if enabled) or through the Solr Admin UI's Cloud Screens.
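Which of these files a core reads is decided by the schemaFactory element in solrconfig.xml. The sketch below parses a solrconfig.xml fragment with the JDK's DOM parser and reports the schema file name. ClassicIndexSchemaFactory, ManagedIndexSchemaFactory and its managedSchemaResourceName setting come from Solr's configuration. The fallback used when no schemaFactory element is present (managed schema, as in Solr 6.x) is my assumption, and the class name SchemaFileCheck is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: work out which schema file a core uses from the <schemaFactory> element of solrconfig.xml.
class SchemaFileCheck {

    static String schemaFileName(String solrconfigXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("solrconfig.xml is not well-formed", e);
        }
        NodeList factories = doc.getElementsByTagName("schemaFactory");
        if (factories.getLength() == 0) {
            // Assumption: without a <schemaFactory> element, Solr 6.x uses the managed schema
            return "managed-schema";
        }
        Element factory = (Element) factories.item(0);
        if (factory.getAttribute("class").endsWith("ClassicIndexSchemaFactory")) {
            return "schema.xml"; // classic, hand-edited schema
        }
        // ManagedIndexSchemaFactory: managedSchemaResourceName overrides the default file name
        NodeList strs = factory.getElementsByTagName("str");
        for (int i = 0; i < strs.getLength(); i++) {
            Element str = (Element) strs.item(i);
            if ("managedSchemaResourceName".equals(str.getAttribute("name"))) {
                return str.getTextContent().trim();
            }
        }
        return "managed-schema";
    }

    public static void main(String[] args) {
        System.out.println(schemaFileName(
                "<config><schemaFactory class=\"ClassicIndexSchemaFactory\"/></config>"));
        System.out.println(schemaFileName(
                "<config><schemaFactory class=\"ManagedIndexSchemaFactory\">"
                + "<bool name=\"mutable\">true</bool>"
                + "<str name=\"managedSchemaResourceName\">my-schema</str>"
                + "</schemaFactory></config>"));
    }
}
```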

In short, Solr keeps its field definitions in its schema file.

managed-schema

managed-schema is Solr's default schema file. It can be changed at runtime through the Schema API, or by the Schemaless Mode features.

Let's create a collection named test.

A file named managed-schema is generated for it automatically.

The file starts with this comment:

<!-- Solr managed schema - automatically generated - DO NOT EDIT -->

In other words, we should not define fields by editing managed-schema by hand.

Instead, we can change it through the Schema API.

First, look at the fields defined in the current schema.

Next, create a field called name through the Schema API.
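The screenshot for this step is lost. As far as I know, the Schema API call is a JSON POST of an add-field command to the collection's /schema endpoint. The sketch below only builds that request with the JDK 11+ java.net.http client and does not send it. The host, port 8983, the collection test and the field type string are assumptions, and AddFieldRequest is a hypothetical helper name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build (but do not send) a Schema API "add-field" request.
class AddFieldRequest {

    // JSON body of the "add-field" command; values are not escaped, so use plain identifiers only
    static String addFieldJson(String name, String type, boolean stored) {
        return "{\"add-field\":{\"name\":\"" + name + "\",\"type\":\"" + type
                + "\",\"stored\":" + stored + "}}";
    }

    // POST the command to <solrBase>/<collection>/schema
    static HttpRequest build(String solrBase, String collection, String json) {
        return HttpRequest.newBuilder(URI.create(solrBase + "/" + collection + "/schema"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // Assumed: local Solr on the default port, collection "test", field type "string"
        String json = addFieldJson("name", "string", true);
        HttpRequest req = build("http://localhost:8983/solr", "test", json);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(json);
        // Sending it would be: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```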


Then go back to the Admin UI and take a look.

Add a document.
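Adding a document can also go through the /update handler, which accepts a JSON array of documents. The commit=true parameter makes the document visible to searches. This is a build-only sketch under the same kind of assumptions (local Solr, collection test). The field values are placeholders, and AddDocRequest is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build (but do not send) a JSON update request that adds one document and commits.
class AddDocRequest {

    // The /update handler accepts a JSON array of documents; values are not escaped here
    static String docJson(String id, String name) {
        return "[{\"id\":\"" + id + "\",\"name\":\"" + name + "\"}]";
    }

    // commit=true makes the new document searchable once the request completes
    static HttpRequest build(String solrBase, String collection, String json) {
        return HttpRequest.newBuilder(URI.create(solrBase + "/" + collection + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String json = docJson("123456", "example"); // placeholder values
        HttpRequest req = build("http://localhost:8983/solr", "test", json);
        System.out.println(req.method() + " " + req.uri() + " " + json);
    }
}
```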


Run a query.
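A query is a GET against the /select handler, with the URL-encoded query string in q. The sketch builds such a request without sending it. It adds wt=json because Solr 6 returns XML by default. The host, the collection and the query name:example are illustrative assumptions, and SelectRequest is a made-up name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: build (but do not send) a GET request to the /select handler.
class SelectRequest {

    // q must be URL-encoded, e.g. ':' becomes %3A
    static URI selectUri(String solrBase, String collection, String q) {
        return URI.create(solrBase + "/" + collection + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8) + "&wt=json");
    }

    static HttpRequest build(URI uri) {
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        URI uri = selectUri("http://localhost:8983/solr", "test", "name:example");
        System.out.println(build(uri).method() + " " + uri);
    }
}
```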


Now do the same from a program.

Define a User class:

import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;

// Bean mapped to Solr documents: SolrJ's @Field binds each annotated field to the Solr field of the same name
public class User {

    @Id    // Spring Data identifier marker; plain SolrJ only needs @Field
    @Field
    private String id;

    @Field
    private String name;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "User [id=" + id + ", name=" + name + "]";
    }
}
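SolrJ fills User beans through the @Field annotations; its DocumentObjectBinder does this work behind getBeans. The toy below shows the idea using reflection only: it copies each document value into the bean field with the same name. SolrField and ToyBinder are hypothetical stand-ins so the example runs without SolrJ. The real binder also handles renamed fields, annotated setters and multi-valued fields.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical stand-in for SolrJ's @Field (NOT org.apache.solr.client.solrj.beans.Field)
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SolrField {
}

// Toy binder: copies values from a result document (a Map) into the bean's @SolrField fields
class ToyBinder {

    static <T> T bind(Map<String, ?> doc, Class<T> type) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getDeclaredFields()) {
                if (f.isAnnotationPresent(SolrField.class) && doc.containsKey(f.getName())) {
                    f.setAccessible(true); // the bean's fields are private, as in User
                    f.set(bean, doc.get(f.getName()));
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot bind " + type.getName(), e);
        }
    }

    // Cut-down version of the User bean, using the stand-in annotation
    static class User {
        @SolrField private String id;
        @SolrField private String name;

        @Override
        public String toString() {
            return "User [id=" + id + ", name=" + name + "]";
        }
    }

    public static void main(String[] args) {
        User u = bind(Map.of("id", "123456", "name", "example"), User.class);
        System.out.println(u); // User [id=123456, name=example]
    }
}
```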

The main program:

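// Assumes the "test" collection on a local Solr at the default port (SolrJ 6.x)
SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/test").build();

User user = new User();
user.setId("123456");
user.setName("程高伟");
// The original called saveSolrResource(user), a helper that is not shown; addBean + commit indexes the bean
client.addBean(user);
client.commit();

SolrQuery query = new SolrQuery();
query.setQuery("程高伟"); // no field prefix, so the request handler's default field is searched
QueryResponse rsp = client.query(query);
List<User> userList = rsp.getBeans(User.class);
System.out.println(userList);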

Result:

schema.xml

With ClassicIndexSchemaFactory, users edit schema.xml by hand.

Before configuring schema.xml, you need to understand how Solr defines fields.

SolrCloud is not covered here for now.

Field Type Definitions and Properties

A field type definition can include four types of information:

  • The name of the field type (mandatory)
  • An implementation class name (mandatory)
  • If the field type is TextField, a description of the field analysis for the field type
  • Field type properties - depending on the implementation class, some properties may be mandatory.

In other words, every field type needs a name and an implementing class. A TextField type also describes its field analysis (how the text is tokenized), and which other properties are required depends on the implementing class.

Field types are defined in schema.xml. Each field type is defined between fieldType elements, as in this example:

<fieldType name="ancestor_path" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
</fieldType>

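The first line gives the name, ancestor_path, and the implementing class, solr.TextField. Between the opening and closing tags are the analyzers: a KeywordTokenizerFactory at index time and a PathHierarchyTokenizerFactory splitting on "/" at query time. Analyzers will be covered in a later post.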
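Here is what this field type does in practice. At index time the keyword tokenizer keeps the whole path as a single token. At query time the path hierarchy tokenizer splits the query path into all of its prefixes at "/". As a result, a query for a path matches documents stored at that path or at one of its ancestors, which is where the name comes from. The plain-Java emulation below only sketches that behaviour and is not Lucene's tokenizer code. AncestorPathDemo is a made-up name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Emulates the two tokenizers of the ancestor_path field type (illustration only, not Lucene code).
class AncestorPathDemo {

    // Keyword tokenizer: the whole input becomes a single token
    static List<String> keywordTokens(String input) {
        return Collections.singletonList(input);
    }

    // Path hierarchy tokenizer: every prefix that ends right before a delimiter, plus the full path
    static List<String> pathHierarchyTokens(String path, char delimiter) {
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= path.length(); i++) {
            if (i == path.length() || path.charAt(i) == delimiter) {
                tokens.add(path.substring(0, i));
            }
        }
        return tokens;
    }

    // A document matches when its index-time token is among the query-time tokens,
    // so a query path finds documents stored at that path or at any ancestor path
    static boolean matches(String indexedPath, String queryPath) {
        return pathHierarchyTokens(queryPath, '/').containsAll(keywordTokens(indexedPath));
    }

    public static void main(String[] args) {
        System.out.println(pathHierarchyTokens("/a/b/c", '/')); // [/a, /a/b, /a/b/c]
        System.out.println(matches("/a/b", "/a/b/c"));          // true: /a/b is an ancestor
        System.out.println(matches("/a/b/c", "/a/b"));          // false
    }
}
```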

The implementing class is responsible for making sure the field is handled correctly. In the class names in schema.xml, the string solr is shorthand for org.apache.solr.analysis or org.apache.solr.schema. Therefore, solr.TextField is really org.apache.solr.schema.TextField.
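The shorthand can be sketched as a simple expansion into fully qualified names. This sketch follows only the two packages named above. Solr's resource loader actually searches a longer list of packages, so treat it as a simplification. SolrShorthand and com.example.MyFieldType are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of how the "solr." shorthand expands into fully qualified class names.
class SolrShorthand {

    // The two packages named in the reference guide (a simplification of Solr's real search list)
    private static final List<String> PACKAGES =
            Arrays.asList("org.apache.solr.analysis", "org.apache.solr.schema");

    static List<String> candidates(String className) {
        if (!className.startsWith("solr.")) {
            // e.g. a third-party class, which must be written with its full package name
            return Collections.singletonList(className);
        }
        String simpleName = className.substring("solr.".length());
        List<String> result = new ArrayList<>();
        for (String pkg : PACKAGES) {
            result.add(pkg + "." + simpleName);
        }
        return result;
    }

    public static void main(String[] args) {
        // Only org.apache.solr.schema.TextField exists, so that is what solr.TextField resolves to
        System.out.println(candidates("solr.TextField"));
        System.out.println(candidates("com.example.MyFieldType")); // hypothetical third-party class
    }
}
```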

Properties

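  • name: the name of the fieldType. It must follow the naming rules.
  • class: the class used to store and index the data for this type. "solr." is shorthand, so "solr.TextField" means "org.apache.solr.schema.TextField", as explained above. A third-party class must be given with its full package and class name.
  • indexed: if true, the field's value is indexed and can be used in queries to find documents. Values: true (default) or false.
  • stored: if true, the field's actual value can be retrieved and returned by queries. Values: true (default) or false.
  • required: the field must be present in every document. Values: true or false (default false).
  • For the other properties, see the official documentation.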
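Here is a minimal sketch of how the defaults in this table work out. It reads indexed, stored and required from one XML element and falls back to the listed defaults when an attribute is missing. This is a simplification: in Solr a field also inherits these properties from its field type, and I have not checked every implementation class's defaults. EffectiveProperties is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Sketch: effective indexed/stored/required values of one element, using the table's defaults.
class EffectiveProperties {

    static Map<String, Boolean> effective(String elementXml) {
        Element e;
        try {
            e = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(elementXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put("indexed", flag(e, "indexed", true));
        result.put("stored", flag(e, "stored", true));
        result.put("required", flag(e, "required", false));
        return result;
    }

    // Missing attribute -> default; otherwise parse "true"/"false"
    private static boolean flag(Element e, String attr, boolean defaultValue) {
        return e.hasAttribute(attr) ? Boolean.parseBoolean(e.getAttribute(attr)) : defaultValue;
    }

    public static void main(String[] args) {
        System.out.println(effective(
                "<fieldType name=\"string\" class=\"solr.StrField\" stored=\"false\"/>"));
        // {indexed=true, stored=false, required=false}
    }
}
```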

Field Types Included with Solr


The following table lists the field types that are available in Solr. The package org.apache.solr.schema includes all the classes listed in this table.


Field Properties by Use Case


[Images not preserved: table of field properties by use case, and the key to its superscript notes]

An overall look at the structure of schema.xml:

<schema>
    <types>...</types>
    <fields>...</fields>
    <uniqueKey>...</uniqueKey>
    <copyField .../>
</schema>
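To see this outline in a concrete file, the sketch below parses a small, made-up schema.xml with the JDK's DOM parser. It prints each section: the field types, the fields, the uniqueKey and the copyField rule. The schema content and the class SchemaOutline are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: read the main sections of a (hypothetical) minimal schema.xml.
class SchemaOutline {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("schema is not well-formed", e);
        }
    }

    // Values of one attribute on every element with the given tag, in document order
    static List<String> attributeValues(Document doc, String tag, String attr) {
        List<String> values = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(((Element) nodes.item(i)).getAttribute(attr));
        }
        return values;
    }

    // Text of <uniqueKey>, or null when the schema does not declare one
    static String uniqueKey(Document doc) {
        NodeList keys = doc.getElementsByTagName("uniqueKey");
        return keys.getLength() == 0 ? null : keys.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        Document doc = parse("<schema name=\"demo\">"
                + "<types><fieldType name=\"string\" class=\"solr.StrField\"/></types>"
                + "<fields>"
                + "<field name=\"id\" type=\"string\" required=\"true\"/>"
                + "<field name=\"name\" type=\"string\"/>"
                + "<field name=\"text\" type=\"string\" multiValued=\"true\"/>"
                + "</fields>"
                + "<uniqueKey>id</uniqueKey>"
                + "<copyField source=\"name\" dest=\"text\"/>"
                + "</schema>");
        System.out.println("field types: " + attributeValues(doc, "fieldType", "name"));
        System.out.println("fields:      " + attributeValues(doc, "field", "name"));
        System.out.println("uniqueKey:   " + uniqueKey(doc));
        System.out.println("copyField:   " + attributeValues(doc, "copyField", "source")
                + " -> " + attributeValues(doc, "copyField", "dest"));
    }
}
```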

Overall these notes are a bit scattered; treat them as a personal record.

References

http://101.110.118.72/archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-6.1.pdf
