Solr Document Indexing Best Practices
Source: Internet · Editor: 程序博客网 · Time: 2024/05/29 16:27
@(OTHERS)[solr]
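- Solr Document Indexing Best Practices
- (I) Direct commit
- (II) AutoCommit
- (III) commitWithin
- (IV) Recommendations and conclusions
- 1. Single-threaded case
- 2. Multi-threaded case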
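Once Solr documents have been generated, they must be submitted to the Solr cluster. There are three ways to do this:
(I) Direct commit
Submit each document to Solr as soon as it is generated: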
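    // Connect through ZooKeeper; SOLR_ZK is the ZooKeeper host string.
    CloudSolrClient client = new CloudSolrClient(SOLR_ZK);
    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField("id", "ljhtest3");
    // map is an atomic-update map such as {"set": values}, built as in the batch example below.
    doc2.addField("key_ss", map);
    client.add(doc2);    // one request per document
    client.commit();     // one hard commit per document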
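That is, every add is followed by a commit. This approach is simple to implement, but its performance is poor.
* Note: it is not only commit() that is slow. Calling client.add() once per document is also very inefficient, because each call is a separate request. Collect all documents into a collection first, then call client.add(collection) once. *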
    List<SolrInputDocument> docList = new LinkedList<SolrInputDocument>();
    for (int i = 0; i < DOC_NUM; i++) {
        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.addField("id", "way2" + i);
        Set<String> set = new HashSet<String>();
        for (String s : "abc,edf,kkk,lll".split(",")) {
            set.add(s);
        }
        // {"set": values} is the atomic-update form that replaces the field value.
        Map<String, Object> map = new HashMap<String, Object>();
        map.put("set", set);
        doc2.addField("key_ss", map);
        docList.add(doc2);
    }
    client.add(docList);   // one request for the whole batch
    client.commit();       // one commit at the end
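When the number of documents is large, a single list holding everything may not be practical, so one common variation is to send fixed-size batches. The following is a minimal sketch of that idea using only the JDK. The class `BatchSketch`, the methods `partition` and `sendInBatches`, and the `sink` callback are hypothetical names introduced here for illustration; in real code the sink would presumably wrap a call such as `client.add(batch)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: groups documents into batches so that each batch
// costs one request instead of one request per document.
class BatchSketch {

    /** Splits docs into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(from + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(from, to)));
        }
        return batches;
    }

    /** Sends every batch through the sink and returns the number of requests made. */
    static <T> int sendInBatches(List<T> docs, int batchSize, Consumer<List<T>> sink) {
        int requests = 0;
        for (List<T> batch : partition(docs, batchSize)) {
            sink.accept(batch); // assumption: a real sink would call client.add(batch)
            requests++;
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("way20", "way21", "way22", "way23", "way24");
        int requests = sendInBatches(ids, 2, batch -> System.out.println("add " + batch));
        System.out.println("requests: " + requests);
    }
}
```

The batch size is a tuning knob: larger batches mean fewer requests, at the cost of more memory held per request.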
(II) AutoCommit
An automatic commit policy can be configured in the updateHandler section of solrconfig.xml:
    <!-- Enables a transaction log, used for real-time get, durability, and
         solr cloud replica recovery. The log can grow as big as
         uncommitted changes to the index, so use of a hard autoCommit
         is recommended (see below).
         "dir" - the target directory for transaction logs, defaults to the
                 solr data directory.
         "numVersionBuckets" - sets the number of buckets used to keep
                 track of max version values when checking for re-ordered
                 updates; increase this value to reduce the cost of
                 synchronizing access to version buckets during high-volume
                 indexing, this requires 8 bytes (long) * numVersionBuckets
                 of heap space per Solr core.
    -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>

    <!-- AutoCommit
         Perform a hard commit automatically under certain conditions.
         Instead of enabling autoCommit, consider using "commitWithin"
         when adding documents.
         http://wiki.apache.org/solr/UpdateXmlMessages
         maxDocs - Maximum number of documents to add since the last
                   commit before automatically triggering a new commit.
         maxTime - Maximum amount of time in ms that is allowed to pass
                   since a document was added before automatically
                   triggering a new commit.
         openSearcher - if false, the commit causes recent index changes
                   to be flushed to stable storage, but does not cause a new
                   searcher to be opened to make those changes visible.
         If the updateLog is enabled, then it's highly recommended to
         have some sort of hard autoCommit to limit the log size.
    -->
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <!-- softAutoCommit is like autoCommit except it causes a
         'soft' commit which only ensures that changes are visible
         but does not ensure that data is synced to disk. This is
         faster and more near-realtime friendly than a hard commit.
    -->
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>
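You can specify a time interval (maxTime) or a number of documents (maxDocs) after which a commit is triggered.
There are two kinds of commit, hard and soft. A soft commit makes changes visible but does not sync the data to disk.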
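To make the two triggers concrete, here is a toy model of the policy described in the configuration comments: commit when the number of uncommitted documents reaches maxDocs, or when maxTime milliseconds have passed since the oldest uncommitted document was added. This is only an illustration under those stated assumptions, not Solr's actual implementation; the class `AutoCommitModel` and all of its members are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```java
// Toy model of the autoCommit triggers; not Solr code.
class AutoCommitModel {
    private final int maxDocs;     // values <= 0 disable the document-count trigger
    private final long maxTimeMs;  // values <= 0 disable the time trigger
    private int pendingDocs = 0;
    private long oldestPendingMs = -1;
    private int commits = 0;

    AutoCommitModel(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    /** Records one added document at time nowMs and applies both triggers. */
    void add(long nowMs) {
        tick(nowMs); // apply any time-based commit that is already due
        if (pendingDocs == 0) {
            oldestPendingMs = nowMs;
        }
        pendingDocs++;
        if (maxDocs > 0 && pendingDocs >= maxDocs) {
            commit();
        }
    }

    /** Advances the clock and commits if the oldest pending document is too old. */
    void tick(long nowMs) {
        if (maxTimeMs > 0 && pendingDocs > 0 && nowMs - oldestPendingMs >= maxTimeMs) {
            commit();
        }
    }

    private void commit() {
        pendingDocs = 0;
        oldestPendingMs = -1;
        commits++;
    }

    int commits() { return commits; }

    int pendingDocs() { return pendingDocs; }

    public static void main(String[] args) {
        AutoCommitModel model = new AutoCommitModel(3, 15000L);
        model.add(0L);
        model.add(100L);
        model.add(200L);     // third document reaches maxDocs
        model.add(1000L);
        model.tick(16000L);  // 15000 ms after the pending document was added
        System.out.println("commits so far: " + model.commits());
    }
}
```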
(III) commitWithin
    client.add(DEFAULT_COLLECTION, doc, 1000);
Documents passed to client.add() this way are committed to Solr within 1000 ms.
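(IV) Recommendations and conclusions
1. Single-threaded case
1) Add the documents to be submitted to a collection.
2) Have the client add the collection, rather than adding documents one by one.
3) Pass the commitWithin parameter to the client.
Reference code: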
    List<SolrInputDocument> docList = new LinkedList<SolrInputDocument>();
    for (int i = 0; i < DOC_NUM; i++) {
        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.addField("id", "way2" + i);
        Set<String> set = new HashSet<String>();
        for (String s : "abc,edf,kkk,lll".split(",")) {
            set.add(s);
        }
        Map<String, Object> map = new HashMap<String, Object>();
        map.put("set", set);
        doc2.addField("key_ss", map);
        docList.add(doc2);
    }
    // Add the whole batch with commitWithin = 1000 ms instead of calling commit() explicitly.
    client.add(docList, 1000);
2. Multi-threaded case
(1) Use multiple threads when writing to HBase.
(2) The coprocessor runs in parallel across multiple regions.
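The general shape of parallel submission can be sketched with a plain thread pool: each worker sends its own batch, and the caller waits for all of them. This is a JDK-only illustration, not HBase or coprocessor code. The class `ParallelIndexSketch`, the method `indexInParallel`, and the `sink` callback are hypothetical; the sketch assumes the sink (for example a wrapper around a shared Solr client) is safe to call from several threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: several workers submit batches concurrently.
class ParallelIndexSketch {

    /** Sends each batch on a pool thread and returns the total number of documents sent. */
    static int indexInParallel(List<List<String>> batches, int threads,
                               Consumer<List<String>> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : batches) {
                results.add(pool.submit(() -> {
                    sink.accept(batch); // assumption: the sink is thread-safe
                    return batch.size();
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that batch has been sent
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("doc1", "doc2"),
                Arrays.asList("doc3"),
                Arrays.asList("doc4", "doc5", "doc6"));
        AtomicInteger sent = new AtomicInteger();
        int total = indexInParallel(batches, 2, batch -> sent.addAndGet(batch.size()));
        System.out.println("documents sent: " + total);
    }
}
```

Collecting the futures before calling get() lets all batches run concurrently; calling get() inside the submit loop would serialize them.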