Nutch 2.3.1: meta_keywords "longer than the max length 32766" when building a Solr 6 index
Source: Internet | Editor: 程序博客网 | Time: 2024/05/17 23:51
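There are three ways to fix this:
1. In managed-schema, set indexed="false" on meta_*.
2. In managed-schema, change the type of meta_* to any field type whose class is solr.TextField (for example text_general).
3. Modify the Nutch source file MetaTagsParser.java as follows, so that overlong values are skipped at parse time: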
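```java
// Requires: import java.nio.charset.StandardCharsets;
private void addIndexedMetatags(Map<CharSequence, ByteBuffer> metadata,
    String metatag, String value) {
  // Added: skip values whose UTF-8 encoding would exceed Lucene's 32766-byte term limit
  if (value.getBytes(StandardCharsets.UTF_8).length > 32765) {
    return;
  }
  String lcMetatag = metatag.toLowerCase(Locale.ROOT);
  if (metatagset.contains("*") || metatagset.contains(lcMetatag)) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Found meta tag: " + lcMetatag + "\t" + value);
    }
    // Encode explicitly as UTF-8 instead of the platform default charset
    metadata.put(new Utf8(PARSE_META_PREFIX + lcMetatag),
        ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
  }
}
```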
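The key detail in this patch is that the limit is measured in UTF-8 bytes, not characters. The self-contained sketch below shows how two values of the same character length can pass or fail depending on whether they are ASCII or CJK text. The names MetaLengthGuard, utf8Length, tooLong and repeat are hypothetical illustrations and are not part of Nutch.

```java
import java.nio.charset.StandardCharsets;

/**
 * Standalone sketch of the length guard added to addIndexedMetatags.
 * Lucene's limit applies to the UTF-8 encoding of a term, so the guard counts
 * bytes, not chars: a CJK character such as U+4E2D takes 3 bytes in UTF-8.
 * Hypothetical helper class, not part of Nutch.
 */
public class MetaLengthGuard {

    // Mirrors the 32765 threshold used in the MetaTagsParser patch.
    static final int MAX_META_BYTES = 32765;

    /** Number of bytes the value occupies when encoded as UTF-8. */
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the patched parser would skip this value. */
    static boolean tooLong(String value) {
        return utf8Length(value) > MAX_META_BYTES;
    }

    /** Java 8 compatible stand-in for String.repeat (added in Java 11). */
    static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder(s.length() * n);
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = repeat("a", 20000);    // 20000 chars, 1 byte each
        String cjk = repeat("\u4e2d", 20000); // 20000 chars, 3 bytes each
        for (String v : new String[] {ascii, cjk}) {
            System.out.println(v.length() + " chars, " + utf8Length(v)
                + " UTF-8 bytes, skipped=" + tooLong(v));
        }
    }
}
```

Both strings are 20000 characters long, but the ASCII one is 20000 bytes and is kept, while the CJK one is 60000 bytes and is skipped. A check on value.length() alone would therefore miss overlong CJK metadata.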
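If the database already holds overlong values that were parsed before this change, they also have to be filtered at index time. Modify SolrIndexWriter.java: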
```java
// Requires: import java.nio.charset.StandardCharsets;
@Override
public void write(NutchDocument doc) throws IOException {
  final SolrInputDocument inputDoc = new SolrInputDocument();
  for (final Entry<String, List<String>> e : doc) {
    for (final String val : e.getValue()) {
      Object val2 = val;
      if (e.getKey().equals("content") || e.getKey().equals("title")) {
        val2 = SolrUtils.stripNonCharCodepoints(val);
      }
      // Added: drop meta_* values whose UTF-8 encoding exceeds the term limit
      if (e.getKey().startsWith("meta_")
          && val.getBytes(StandardCharsets.UTF_8).length > 32765) {
        LOG.warn("Skipping too long value for key: " + e.getKey());
        continue;
      }
      inputDoc.addField(solrMapping.mapKey(e.getKey()), val2);
      String sCopy = solrMapping.mapCopyKey(e.getKey());
      // Compare by value rather than by reference
      if (!sCopy.equals(e.getKey())) {
        inputDoc.addField(sCopy, val2);
      }
    }
  }
  inputDoc.setDocumentBoost(doc.getScore());
  inputDocs.add(inputDoc);
  documentCount++;
  if (inputDocs.size() >= batchSize) {
    try {
      LOG.info("Adding " + inputDocs.size() + " documents");
      solr.add(inputDocs);
    } catch (final SolrServerException e) {
      throw new IOException(e);
    }
    inputDocs.clear();
  }
}
```
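Options 1 and 2 above work because Lucene enforces the 32766-byte limit on each indexed term, not on the field value as a whole. A field whose class is solr.StrField indexes the entire value as a single term. A solr.TextField field is first split into terms by its analyzer, and a field with indexed="false" produces no terms at all. The sketch below is a simplified illustration only. TermLimitDemo is a hypothetical class, and a plain whitespace split stands in for the real analyzer chain configured in managed-schema.

```java
import java.nio.charset.StandardCharsets;

/**
 * Simplified illustration: the 32766-byte limit applies per indexed term.
 * ASSUMPTION: whitespace splitting stands in for a real TextField analyzer.
 * Hypothetical class, not part of Solr or Nutch.
 */
public class TermLimitDemo {

    // Lucene's maximum UTF-8 length for a single indexed term.
    static final int MAX_TERM_BYTES = 32766;

    /** Largest UTF-8 byte length among the given terms. */
    static int longestTermBytes(String[] terms) {
        int max = 0;
        for (String t : terms) {
            max = Math.max(max, t.getBytes(StandardCharsets.UTF_8).length);
        }
        return max;
    }

    /** StrField-like: the whole value becomes one term. */
    static boolean failsAsSingleTerm(String value) {
        return longestTermBytes(new String[] {value}) > MAX_TERM_BYTES;
    }

    /** TextField-like (simplified): each whitespace-separated token is a term. */
    static boolean failsWhenTokenized(String value) {
        return longestTermBytes(value.trim().split("\\s+")) > MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        // A long meta keywords value: 5000 short keywords separated by spaces.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append("keyword").append(i).append(' ');
        }
        String metaKeywords = sb.toString();
        System.out.println("value bytes: "
            + metaKeywords.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("fails as single term: " + failsAsSingleTerm(metaKeywords));
        System.out.println("fails when tokenized: " + failsWhenTokenized(metaKeywords));
    }
}
```

For this value, the whole string is well over the limit. Its longest token, keyword4999, is only 11 bytes. With a real analyzer, the outcome depends on the tokenizer and filters configured for the field type. A tokenizer that keeps the whole value as one token would still trip the limit.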