Solr document indexing best practices




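  • Solr document indexing best practices
    • (I) Direct commit
    • (II) AutoCommit
    • (III) commitWithin
    • (IV) Recommendations and conclusions
      • 1. Single-threaded case
      • 2. Multi-threaded case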

Once Solr documents have been generated, they must be submitted to the Solr cluster. There are three ways to do this:

(I) Direct commit

Commit to Solr as soon as each document is generated:

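    // SOLR_ZK is the ZooKeeper connection string of the Solr cluster.
    CloudSolrClient client = new CloudSolrClient(SOLR_ZK);
    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField("id", "ljhtest3");
    // map holds an atomic-update instruction such as {"set": values};
    // it is built the same way as in the batch example further down.
    doc2.addField("key_ss", map);
    client.add(doc2);
    client.commit();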

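That is, every add is followed by a commit. This approach is simple to implement but performs poorly.

* Note: it is not only commit() that is slow. Calling client.add() once per document is also very inefficient, so first add all documents to a collection and then call client.add(collection). *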

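    List<SolrInputDocument> docList = new LinkedList<SolrInputDocument>();
    for (int i = 0; i < DOC_NUM; i++) {
        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.addField("id", "way2" + i);
        // Values to store in the multi-valued field key_ss.
        Set<String> set = new HashSet<String>();
        for (String s : "abc,edf,kkk,lll".split(",")) {
            set.add(s);
        }
        // Atomic-update syntax: {"set": values} replaces the field's current values.
        Map<String, Object> map = new HashMap<String, Object>();
        map.put("set", set);
        doc2.addField("key_ss", map);
        docList.add(doc2);
    }
    // One add request for the whole batch, followed by a single commit.
    client.add(docList);
    client.commit();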

(II) AutoCommit

Automatic commits can be configured in the updateHandler section of solrconfig.xml:


    <!-- Enables a transaction log, used for real-time get, durability, and
         solr cloud replica recovery.  The log can grow as big as
         uncommitted changes to the index, so use of a hard autoCommit
         is recommended (see below).
         "dir" - the target directory for transaction logs, defaults to the
                solr data directory.
         "numVersionBuckets" - sets the number of buckets used to keep
                track of max version values when checking for re-ordered
                updates; increase this value to reduce the cost of
                synchronizing access to version buckets during high-volume
                indexing, this requires 8 bytes (long) * numVersionBuckets
                of heap space per Solr core.
    -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>

    <!-- AutoCommit

         Perform a hard commit automatically under certain conditions.
         Instead of enabling autoCommit, consider using "commitWithin"
         when adding documents.

         http://wiki.apache.org/solr/UpdateXmlMessages

         maxDocs - Maximum number of documents to add since the last
                   commit before automatically triggering a new commit.

         maxTime - Maximum amount of time in ms that is allowed to pass
                   since a document was added before automatically
                   triggering a new commit.

         openSearcher - if false, the commit causes recent index changes
           to be flushed to stable storage, but does not cause a new
           searcher to be opened to make those changes visible.

         If the updateLog is enabled, then it's highly recommended to
         have some sort of hard autoCommit to limit the log size.
    -->
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <!-- softAutoCommit is like autoCommit except it causes a
         'soft' commit which only ensures that changes are visible
         but does not ensure that data is synced to disk.  This is
         faster and more near-realtime friendly than a hard commit.
    -->
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>

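You can specify a time interval or a number of documents after which a commit is triggered automatically.
There are two kinds of commit, hard and soft. A soft commit makes changes visible to searches but does not sync the data to disk.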
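
To make the two triggers concrete, here is a minimal sketch of the decision rule that maxDocs and maxTime describe. It is an illustrative model only, not Solr's implementation: the class AutoCommitPolicy and its method names are hypothetical, and treating a value of zero or less as "disabled" is an assumption modeled on the -1 default shown in the configuration above.

```java
// Illustrative model only; this is NOT Solr's code.
// AutoCommitPolicy and shouldCommit are hypothetical names.
final class AutoCommitPolicy {
    private final int maxDocs;    // assumption: <= 0 means "no document limit"
    private final long maxTimeMs; // assumption: <= 0 means "no time limit"

    AutoCommitPolicy(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    // pendingDocs: documents added since the last commit.
    // msSinceOldestPendingDoc: time elapsed since the oldest uncommitted add.
    boolean shouldCommit(int pendingDocs, long msSinceOldestPendingDoc) {
        if (pendingDocs <= 0) {
            return false; // nothing to commit
        }
        boolean docLimitHit = maxDocs > 0 && pendingDocs >= maxDocs;
        boolean timeLimitHit = maxTimeMs > 0 && msSinceOldestPendingDoc >= maxTimeMs;
        return docLimitHit || timeLimitHit;
    }

    public static void main(String[] args) {
        // Mirrors the defaults in the configuration above:
        // hard commit after 15000 ms, soft commit disabled (-1).
        AutoCommitPolicy hard = new AutoCommitPolicy(-1, 15000);
        AutoCommitPolicy soft = new AutoCommitPolicy(-1, -1);
        System.out.println(hard.shouldCommit(10, 14999)); // false
        System.out.println(hard.shouldCommit(10, 15000)); // true
        System.out.println(soft.shouldCommit(10, 60000)); // false
    }
}
```

With the defaults shown in the configuration, only the hard commit fires, and because openSearcher is false it flushes changes to disk without making them searchable.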

(III) commitWithin

    client.add(DEFAULT_COLLECTION, doc, 1000);

A document passed to client.add() this way will be committed to Solr within 1000 ms.

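(IV) Recommendations and conclusions

1. Single-threaded case

1. Add the documents to be submitted to a collection.
2. Have the client add that collection rather than individual documents.
3. Specify the commitWithin parameter on the client call.
Reference code: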

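    List<SolrInputDocument> docList = new LinkedList<SolrInputDocument>();
    for (int i = 0; i < DOC_NUM; i++) {
        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.addField("id", "way2" + i);
        // Values to store in the multi-valued field key_ss.
        Set<String> set = new HashSet<String>();
        for (String s : "abc,edf,kkk,lll".split(",")) {
            set.add(s);
        }
        // Atomic-update syntax: {"set": values} replaces the field's current values.
        Map<String, Object> map = new HashMap<String, Object>();
        map.put("set", set);
        doc2.addField("key_ss", map);
        docList.add(doc2);
    }
    // Add the whole batch in one request and let Solr commit it within 1000 ms
    // (step 3 above), instead of calling client.commit() explicitly.
    client.add(DEFAULT_COLLECTION, docList, 1000);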
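
When DOC_NUM is large, a single list holding every document may be impractical. One possible refinement, sketched below under stated assumptions, is to send the documents in fixed-size chunks. The class BatchingIndexer, the batch size of 1000, and the sink callback are all hypothetical; the sink stands in for a call such as client.add(collection, batch, commitWithinMs), so the sketch runs without a Solr cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers documents and hands them to a sink in
// fixed-size batches. The sink is a stand-in for the real client call.
final class BatchingIndexer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer;

    BatchingIndexer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    // Buffer one document; send the batch once it is full.
    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered. Call once more after the last add().
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy, so the sink owns its list
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingIndexer<String> indexer =
                new BatchingIndexer<>(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            indexer.add("way2" + i);
        }
        indexer.flush();
        System.out.println(batchSizes); // [1000, 1000, 500]
    }
}
```

The final flush() matters: without it, the last partial batch would never be sent.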

2. Multi-threaded case

(1) Use multiple threads when writing to HBase.
(2) The coprocessor runs in parallel across multiple regions.
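
The idea of several writers working in parallel can be sketched with a plain thread pool. This is an illustration under assumptions, not the HBase coprocessor mechanism itself: the class ParallelIndexing, the notion of one task per partition, and the sink callback are hypothetical. The sink stands in for the batched add call and must be safe to invoke from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: one task per partition, each sending its own batches.
final class ParallelIndexing {

    // Returns the total number of documents handed to the sink.
    static int indexInParallel(int partitions, int docsPerPartition,
                               int batchSize, Consumer<List<String>> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int partition = p;
                results.add(pool.submit(() -> {
                    List<String> batch = new ArrayList<>();
                    int sent = 0;
                    for (int i = 0; i < docsPerPartition; i++) {
                        batch.add("p" + partition + "-doc" + i);
                        if (batch.size() == batchSize) {
                            sink.accept(new ArrayList<>(batch));
                            sent += batch.size();
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) { // send the last partial batch
                        sink.accept(new ArrayList<>(batch));
                        sent += batch.size();
                    }
                    return sent;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // wait for each task to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an indexing task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger received = new AtomicInteger(); // thread-safe counter
        int sent = indexInParallel(4, 2500, 1000,
                batch -> received.addAndGet(batch.size()));
        System.out.println(sent + " " + received.get()); // 10000 10000
    }
}
```

Each task keeps its own batch list, so the only shared state is the sink. That keeps the batching logic free of locks.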
