MarkLogic Search Development - 2.2 Controlling the Search With the Options Node

Source: Internet. Editor: 程序博客网. Time: 2024/03/29 22:50

2.2   Controlling the Search With the Options Node

2.2.1        Checking an Options Node With search:check-options

search:check-options checks whether your search options contain errors. You can also turn the check on with <debug>true</debug> in the options, in which case the check results are included in every search response.
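As a minimal sketch of the check described above, the following passes an options node to search:check-options. The my-value constraint and its namespace are placeholders, not part of a real application. As far as I know, the call returns nothing when the options are valid and one or more search:report elements when they are not.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Placeholder options node; the constraint name and namespace are assumptions. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="my-value">
      <value>
        <element ns="my-namespace" name="my-localname"/>
      </value>
    </constraint>
  </options>
(: Validate the options node on its own, without running a search. :)
return search:check-options($options)
```

Calling the function directly is handy while developing; the <debug>true</debug> setting is the alternative when you want the report to travel with each search response.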

2.2.2        Constraint Options

                             

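Constraint options let your search take advantage of the ways the database already slices its content. In other words, the query runs against those slices.

The slices include range indexes, word lexicons, collection lexicons, fields, and document structure such as element values and attribute values, all of which help partition the document data. A constraint option searches the data slices produced in these ways.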

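Use cases:

1: Parsing constraint-specific grammar in the query text, for example a range constraint such as decade:1980s.

2: Defining the structure of the facet information in the search response.

3: Improving support for term completion (search:suggest).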

Constraint options support value, word, and facet settings. A value or word constraint means that MarkLogic uses the corresponding value or word query to carry out the search, for example cts:element-value-query, cts:element-attribute-value-query, cts:element-word-query, or cts:element-attribute-word-query.

A facet is different: once the search has run, the search response has to be processed further to produce the facet, so a facet constraint must apply to a value, element, or word that has a range index, because the facet needs information from the index.

When constraints have facets, they must generate fast and accurate counts and distinct values. Therefore the constraints that allow facets require a range index on the element or attribute on which they apply, or require a particular lexicon to exist in the database.

  <constraint name="Statutes">
    <range collation="http://marklogic.com/collation/codepoint" type="xs:string" facet="true">
      <element ns="" name="ACT"/>
      <facet-option>limit=50</facet-option>
      <facet-option>frequency-order</facet-option>
      <facet-option>descending</facet-option>
    </range>
  </constraint>

(You must configure a range index on the ACT element in your database.)

When the Search API parses a query string, it first looks in the query string for the joiner strings defined in the search options, and then converts the query string into a cts:query. If the text before a constraint joiner does not name a constraint you have defined, the Search API does not apply the joiner and treats the text as an ordinary term.

For example, suppose your joiner options define AND:

<joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>

When you query with a constraint that has not been defined:

search:parse('unrecognized-constraint:hello and go')

=>

<cts:word-query qtextref="cts:text"
    xmlns:cts="http://marklogic.com/cts">
  <cts:text>unrecognized-constraint:hello</cts:text>
</cts:word-query>

Put plainly, if the constraint is not defined, the cts:query you get back is not a <cts:element-range-query/> but a <cts:word-query/>.

Value Constraint Example

The following options node defines two value constraints: one for an element and one for an attribute.

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="my-value">
    <value>
      <element ns="my-namespace" name="my-localname"/>
    </value>
  </constraint>
  <constraint name="my-attribute-value">
    <value>
      <attribute ns="" name="my-attribute"/>
      <element ns="my-namespace" name="my-localname"/>
    </value>
  </constraint>
</options>

Using these constraints, you can issue query text such as the following (from search:search or search:parse, for example) to use these constraints:

my-value:"This is an element value."

my-attribute-value:123456

Both parts of the above query text would match the following document:

<my-document xmlns="my-namespace">
  <my-localname>This is an element value.</my-localname>
  <my-localname my-attribute="123456"/>
</my-document>

Word Constraint Examples

The following options node defines two word constraints: one for a cts:element-word-query and one for a cts:field-word-query:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="name">
    <word>
      <element ns="http://authors-r-us.com" name="name"/>
    </word>
  </constraint>
  <constraint name="description">
    <word>
      <field name="my-field"/>
    </word>
  </constraint>
</options>

MarkLogic Server Search API: Understanding and Using

MarkLogic 6—September, 2012 Search Developer’s Guide—Page 32

Using these constraints, you can issue query text such as the following (from search:search or search:parse, for example) to use these constraints:

name:raymond

description:author

The first query text above would match the following document (because a cts:word-query("raymond") would match):

<my-document xmlns="http://authors-r-us.com">
  <name>Raymond Carver</name>
</my-document>

The second query text above matches the above document if the name element was part of the field named my-field. For details on fields, see Fields Database Settings in the Administrator's Guide.

Collection Constraint Example

The following options node defines a collection constraint, which allows you to constrain your search to documents that are in a specified collection. To use this constraint, the collection lexicon must be enabled in the database, otherwise an exception is thrown. If prefix is an attribute of the collection element in the constraint, then the collection name is derived from the prefix concatenated with the constraint value.

One use for a collection constraint is to allow faceted navigation based on collections. For example, if you have collections based on subjects (for example, one called history, one called math, and so on), then you can use a collection constraint to narrow the search to one of the subjects.

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="subject">
    <collection prefix="/my-collections/"/>
  </constraint>
</options>

Assuming that all documents in your database have collection URIs that begin with the string /my-collections/ like the following:

/my-collections/math

/my-collections/economics

/my-collections/zoology

Then the following query text examples will match documents in the corresponding collections:

subject:math

subject:economics

subject:zoology

If the database contains no documents in the specified collection, then the search returns no matches. For information on collections, see “Collections” on page 236.

Bucketed Range Constraint Example

Range constraints operate on typed element or attribute values that have a corresponding range index in the database. Without the correct range index, range constraints will throw a runtime exception. Range constraint values can match on either all of the individual values for the element or attribute, or on specified buckets, which are named ranges of values. (Put plainly, a range constraint can drive searches over either elements or attributes.)

There are two types of buckets, specified with the bucket and computed-bucket elements in the range constraint specification. The bucket specification takes absolute ranges, and the computed-bucket specification takes ranges that are relative to a given time.

This is interesting, although it was not obvious to me at first: bucket works on absolute ranges, whereas computed-bucket appears to work on ranges relative to a point in time.

For more information about computed-bucket range constraints, see “Computed Buckets Example” on page 62.
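To make the relative-time idea concrete, here is a small sketch of a computed-bucket constraint. It assumes an xs:dateTime element range index on a hypothetical my-localname element in my-namespace; the constraint and bucket names are made up for illustration. Each bucket's ge and lt values are durations measured from the anchor, so "today" runs from the start of the current day to one day later.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical constraint: assumes an xs:dateTime range index exists
   on the my-localname element in my-namespace. :)
search:parse('date:today',
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="date">
    <range type="xs:dateTime">
      <element ns="my-namespace" name="my-localname"/>
      <computed-bucket name="today" ge="P0D" lt="P1D"
        anchor="start-of-day">Today</computed-bucket>
      <computed-bucket name="yesterday" ge="-P1D" lt="P0D"
        anchor="start-of-day">Yesterday</computed-bucket>
      <computed-bucket name="30-days" ge="-P30D" lt="P0D"
        anchor="start-of-day">Last 30 days</computed-bucket>
    </range>
  </constraint>
</options>)
```

Unlike a plain bucket, the bounds here are recomputed when the query runs, so the same options node keeps working as time passes.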

The following example uses search:parse with an options node that contains a bucket range constraint. The following example is generated from the Oscars sample application, built using Application Builder:

xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:parse('decade:1980s', (: the query text :)
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="decade">
    <range type="xs:gYear" facet="true">
      <bucket lt="1930" ge="1920" name="1920s">1920s</bucket>
      <bucket lt="1940" ge="1930" name="1930s">1930s</bucket>
      <bucket lt="1950" ge="1940" name="1940s">1940s</bucket>
      <bucket lt="1960" ge="1950" name="1950s">1950s</bucket>
      <bucket lt="1970" ge="1960" name="1960s">1960s</bucket>
      <bucket lt="1980" ge="1970" name="1970s">1970s</bucket>
      <bucket lt="1990" ge="1980" name="1980s">1980s</bucket>
      <bucket lt="2000" ge="1990" name="1990s">1990s</bucket>
      <bucket ge="2000" name="2000s">2000s</bucket>
      <facet-option>limit=10</facet-option>
      <attribute ns="" name="year"/>
      <element ns="http://marklogic.com/wikipedia" name="nominee"/>
    </range>
  </constraint>
</options>)

In this example the constraint is scoped to the year attribute of the nominee element, and the element-attribute ranges run from 1920 to 1930, 1930 to 1940, and so on.

This query returns the following cts:query:

<cts:and-query qtextconst="decade:1980s"
    xmlns:cts="http://marklogic.com/cts">
  <cts:element-attribute-range-query qtextconst="decade:1980s"
      operator="&gt;=">
    <cts:element xmlns:_1="http://marklogic.com/wikipedia">_1:nominee</cts:element>
    <cts:attribute>year</cts:attribute>
    <cts:value xsi:type="xs:gYear"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">1980</cts:value>
  </cts:element-attribute-range-query>
  <cts:element-attribute-range-query qtextconst="decade:1980s"
      operator="&lt;">
    <cts:element xmlns:_1="http://marklogic.com/wikipedia">_1:nominee</cts:element>
    <cts:attribute>year</cts:attribute>
    <cts:value xsi:type="xs:gYear"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">1990</cts:value>
  </cts:element-attribute-range-query>
</cts:and-query>

Interestingly, the query string only specifies 'decade:1980s', yet two cts:element-attribute-range-query queries are produced. Why? The 1980s bucket is defined by two bounds, ge="1980" and lt="1990", and each bound becomes its own range query inside the cts:and-query.

See the Oscars sample application that you generate from Application Builder for sample data against which you can run this query. For other range constraint examples, see “Buckets Example” on page 60 and “Computed Buckets Example” on page 62, and the following example.

Exact Match (Unbucketed) Range Constraint Example

The following example shows an exact match year range constraint against the Oscars sample application. It returns results that match the year 1964. To see the output, run this query against the Oscars database.

xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="year">
    <range type="xs:gYear" facet="true">
      <facet-option>limit=10</facet-option>
      <attribute ns="" name="year"/>
      <element ns="http://marklogic.com/wikipedia"
        name="nominee"/>
    </range>
  </constraint>
</options>
return
search:search("year:1964", $options)

Note that no bucket is defined here.

2.2.3        Operator Options

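You can use operator options to organize your search response data, for example by applying a sort order.

Constraint options govern how the query string is parsed and executed: once you define value, word, collection, or range constraints, the query string is parsed into the corresponding element-value, word, collection, or similar query, which narrows the search and improves performance. Operator options, by contrast, shape your search results. For example, you can define one that sorts the results by date.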

<options xmlns="http://marklogic.com/appservices/search">
  <search:operator name="sort">
    <search:state name="relevance">
      <search:sort-order>
        <search:score/>
      </search:sort-order>
    </search:state>
    <search:state name="date">
      <search:sort-order direction="descending" type="xs:dateTime">
        <search:element ns="my-ns" name="date"/>
      </search:sort-order>
      <search:sort-order>
        <search:score/>
      </search:sort-order>
    </search:state>
  </search:operator>
</options>

We can define our own operators to enrich the query functionality. The options above define an operator named sort, so we can add sort:relevance or sort:date to the query text to sort the result set.
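A short sketch of the sort operator in use follows. The term "hello" is an arbitrary placeholder, and the example assumes an xs:dateTime element range index on the date element in the hypothetical my-ns namespace, which the date state needs in order to sort.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Assumes an xs:dateTime range index on the date element in my-ns. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <operator name="sort">
      <state name="relevance">
        <sort-order><score/></sort-order>
      </state>
      <state name="date">
        <sort-order direction="descending" type="xs:dateTime">
          <element ns="my-ns" name="date"/>
        </sort-order>
        <sort-order><score/></sort-order>
      </state>
    </operator>
  </options>
(: "sort:date" selects the date state; "hello" is the actual search term. :)
return search:search("hello sort:date", $options)
```

Swapping sort:date for sort:relevance in the query text should switch back to score ordering without touching the options node.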

The structure of a search operator is as follows:

search:operator
    search:state
        search:sort-order
        search:additional-query
        search:debug
        search:forest
        search:page-length
        search:quality-weight
        search:transform-results

As you can see, these state settings act as a second pass over the whole search result: on the result set you can sort, apply an additional query, debug, restrict the source forests, set the page length, and transform the results (transform-results).

2.2.4        Return Options

If operator options post-process the Search API result set, return options restrict its output. They decide what gets returned.

<options xmlns="http://marklogic.com/appservices/search">
  <return-metrics>true</return-metrics>
  <return-facets>true</return-facets>
  <return-results>false</return-results>
</options>

Only the needed parts of the response are computed, so if you do not return results (as in the above example) or do not return something else, then the work needed to perform that part of the response is not done, and the search runs faster.

The point is to avoid sending unnecessary data back to the front end in scenarios that do not need it.
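As an illustration, a facets-only request might look like the sketch below. It reuses the Statutes constraint from earlier in this section, so it assumes the same string range index on the ACT element; the empty query text is just a way to facet over everything.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Assumes a string range index (codepoint collation) on the ACT element. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="Statutes">
      <range collation="http://marklogic.com/collation/codepoint"
             type="xs:string" facet="true">
        <element ns="" name="ACT"/>
      </range>
    </constraint>
    <return-results>false</return-results>
    <return-facets>true</return-facets>
    <return-metrics>true</return-metrics>
  </options>
(: Keep only the facet elements of the response. :)
return search:search("", $options)/search:facet
```

This pattern suits a sidebar or navigation panel that needs counts but not the result list itself.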

2.2.5        Searchable Expression Option

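By default, the scope of a Search API query is the whole database, that is, fn:collection(). So we clearly need an option that defines the search scope. The constraint options described earlier only refine the query within that scope, by value, word, range, and so on.

We therefore need a scope-level option that defines the search scope effectively. That is the searchable expression option.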

The following example shows a searchable expression that searches over both CITATION elements and html elements:

<searchable-expression xmlns:xh="http://www.w3.org/1999/xhtml">
  /(xh:html | CITATION)
</searchable-expression>

If an expression is not fully searchable, it will throw an XDMP-UNSEARCHABLE exception at runtime.
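For context, the searchable expression sits inside the options node like any other option. The following sketch passes it to search:search; the search term "hello" is a placeholder.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <searchable-expression xmlns:xh="http://www.w3.org/1999/xhtml">
      /(xh:html | CITATION)
    </searchable-expression>
  </options>
(: Only xh:html and CITATION root elements are searched. :)
return search:search("hello", $options)
```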

2.2.6        Fragment Scope Option

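Fragment scope is also used to define the search scope. It differs from a searchable expression in that a searchable expression appears to restrict the scope at the path-expression level across the documents in the whole database, whereas fragment scope offers two kinds of restriction. One is global: it applies to the search as a whole. The other is at the constraint level: it is defined under a constraint option node and restricts the fragments that the constraint searches.

Indirectly, this may suggest the following structure:

A document ---> split into a number of fragments for storage ---> range index, value, and word queries operate on fragments. (There must be a dependency between fragments and indexes; otherwise index-based fragment queries would not be possible.)

If the fragment-scope option is defined at the constraint level, the query is restricted directly to the corresponding set of fragments. In theory, then, each fragment must have a unique key held in MarkLogic's memory, which is what would make this efficient. Imagine a lookup list or key-value object in memory that records each fragment's key and address (perhaps even the fragment itself, cached with its lifetime managed by the five-minute rule).

cts:search first selects the scope according to the options: first the fragment scope, then the searchable expression to narrow the scope again, and then the constraint queries over the fragments found (a constraint query generally translates the query string into the corresponding cts:element-value, element-attribute, range, collection, or similar query). Some constraint queries go straight to the relevant index or lexicon. All of the above only searches indexes, lexicons, or fragments; to keep the results 100% accurate, the fragment content itself must also be matched.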

 

There are two types of fragment-scope options: a global fragment scope, which applies to both the search and any constraints in the search, and a local fragment scope, which applies to a given constraint. A global fragment-scope is specified as a child of <options>, and a local fragment scope is specified as a child of a constraint kind (for example, a child of <range>, <value>, or <word>). Any local fragment scope will override the global fragment scope.
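The two levels can be sketched as follows. The global scope is set to documents and one constraint overrides it with properties. The last-modified constraint name is made up, and actually running a search with it would require a matching xs:dateTime range index on the property element, so the sketch only validates the node with search:check-options.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Global scope: documents. Local scope on the range constraint: properties.
   The constraint name is hypothetical. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <fragment-scope>documents</fragment-scope>
    <constraint name="last-modified">
      <range type="xs:dateTime">
        <element ns="http://marklogic.com/xdmp/property"
                 name="last-modified"/>
        <fragment-scope>properties</fragment-scope>
      </range>
    </constraint>
  </options>
(: Validate the structure only; no search is executed here. :)
return search:check-options($options)
```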

 

2.2.7        Modifying Your Snippet Results

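In MarkLogic, the transform-results option is what processes the result set in the search response.

You can use the apply attribute to have your own custom function process the response results.

Specifying transform-results Options

By default, MarkLogic uses the apply="snippet" function to process the response results.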

<transform-results apply="snippet">
  <per-match-tokens>30</per-match-tokens>
  <max-matches>4</max-matches>
  <max-snippet-chars>200</max-snippet-chars>
  <preferred-elements/>
</transform-results>

The options above define the snippet structure: at most 4 matches are shown for each matching document, a snippet is at most 200 characters long, and each matching passage is kept to 30 tokens (words) or fewer.

<transform-results apply="raw" />

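This returns the original matching content in the snippet as the whole node, with no highlighting. You can then do your own processing, such as extracting the relevant text or limiting the returned length. (In practice, the default search:snippet function already provides these.)

<transform-results apply="empty-snippet" />

As the name suggests, the response results contain no snippet information.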

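<transform-results apply="metadata-snippet">
  <preferred-elements>
    <!-- Specify namespace and localname for elements that exist
         in properties documents -->
    <element ns="http://my.namespace" name="my-local-name"/>
  </preferred-elements>
</transform-results>

This one is very useful: each matching result returns the specified elements from its properties document. If you do not specify any elements, the last-modified information for the document is returned instead.

For example, take an XML document describing a product price. The price is of course the most sensitive piece of information, so we would often define the price element as an important node in the metadata. When we search for product information, that price element can be configured here. A search for "Samsung phone" then returns every Samsung phone model together with its price. Likewise, if no specific element is configured, we get the most recent update information for each matching Samsung phone, for example after a price, specification, or stock change. Very useful.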

Specifying Your Own Code in transform-results

Not much needs saying here: through the apply attribute (together with ns and at), you can have your own custom function process the search response.

The custom function must have the same signature as the default search:snippet function:

declare function search:snippet(
  $result as node(),
  $ctsquery as schema-element(cts:query),
  $options as element(search:transform-results)?
) as element(search:snippet)

 

<options xmlns="http://marklogic.com/appservices/search">
  <transform-results apply="my-snippet" ns="my-namespace"
      at="/my-snippet.xqy">
  </transform-results>
</options>
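For illustration, the library module referenced by at="/my-snippet.xqy" could look roughly like the sketch below. The body is an assumption made up for this example: it simply returns the first 100 characters of the matching node's text rather than highlighted matches. Only the signature comes from the text above.

```xquery
xquery version "1.0-ml";
module namespace my = "my-namespace";

declare namespace search = "http://marklogic.com/appservices/search";

(: Hypothetical snippet function. The signature mirrors search:snippet;
   the body is illustrative and ignores $ctsquery and $options. :)
declare function my:my-snippet(
  $result as node(),
  $ctsquery as schema-element(cts:query),
  $options as element(search:transform-results)?
) as element(search:snippet)
{
  <search:snippet>{
    (: Return the first 100 characters of the node's string value. :)
    fn:substring(fn:string($result), 1, 100)
  }</search:snippet>
};
```

The function's local name matches the apply attribute, its namespace matches ns, and the module path matches at, which is how the Search API would be expected to locate it.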

2.2.8        Other Search Options

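The discussion so far covers roughly the following areas:

Validating the search options:           check-options

Setting the overall search scope:        searchable expression; fragment scope

Building the query:                      constraint (value / word / collection, etc.)

Processing the results:                  snippet

Choosing what is returned:               return options

Of all these options, only one works on the query string itself: constraint, because it targets values, words, range indexes, or collections. It translates the query string into the corresponding cts:word-query, element-attribute-value-query, range index search, and so on.

That covers a lot, but the query string still needs further support: term completion (similar to type-ahead suggestions), special symbols (grammar such as AND, OR, -, and so on), case sensitivity for searches, and specific or custom characters (for example phone:xx).

All of these are bundled into the catch-all category of other search options.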
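As one small example from this group, case sensitivity can be controlled through a term option. The sketch below is a minimal illustration; the search term "Raymond" is a placeholder, and search:parse is used so you can inspect the generated query without running a search.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <!-- Applies to plain terms that are not bound to a constraint -->
      <term-option>case-sensitive</term-option>
    </term>
  </options>
(: Inspect the cts:query produced for a plain term. :)
return search:parse("Raymond", $options)
```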
