Combining Solr and HBase for indexed search
SolrCloud cluster overview
The SolrCloud cluster is already installed.
Solr version: 5.5.0; ZooKeeper version: 3.4.6
Solr OS user/password: solr/solr123
ZooKeeper (used by Solr) install path: /opt/zookeeper-3.4.6
Solr install path: /opt/solr-5.5.0
Solr port: 8983
ZooKeeper port: 9983
There are 5 machines, each running both Solr and ZooKeeper.

Start ZooKeeper: /opt/zookeeper-3.4.6/bin/zkServer.sh start
Stop ZooKeeper: /opt/zookeeper-3.4.6/bin/zkServer.sh stop
ZooKeeper status: /opt/zookeeper-3.4.6/bin/zkServer.sh status
Start Solr: /opt/solr-5.5.0/bin/solr start
Stop Solr: /opt/solr-5.5.0/bin/solr stop
Solr status: /opt/solr-5.5.0/bin/solr status
Solr web UIs:
http://10.1.202.67:8983/solr/
http://10.1.202.68:8983/solr/
http://10.1.202.69:8983/solr/
http://10.1.202.70:8983/solr/
http://10.1.202.71:8983/solr/
Copying the IK analyzer into Solr
Note: do not use the old IKAnalyzer2012FF_u1.jar build of the IK analyzer; that version does not work with Solr 5.0 and above. Use IKAnalyzer2012FF_u2.jar instead, or download the source from GitHub and compile it yourself; the GitHub link is:
https://github.com/EugenePig/ik-analyzer-solr5/blob/master/README.md
I used a self-compiled IK analyzer jar, Ik-analyzer-solr5-5.x.jar, and copied it to every node:
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.67:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.68:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.69:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.70:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.71:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
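Rather than repeating the scp once per node, the copy can be scripted; a small sketch in the same spirit as the restart scripts further down:

for host in 10.1.202.67 10.1.202.68 10.1.202.69 10.1.202.70 10.1.202.71
do
  scp ./Ik-analyzer-solr5-5.x.jar solr@$host:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
done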
Modifying managed-schema
Solr 5.5.0 no longer ships a schema.xml; its replacement is managed-schema, located under solr-5.5.0/server/solr/configsets/. In that directory, make a copy of the sample_techproducts_configs folder:
cp -r sample_techproducts_configs poc_configs
Edit poc_configs/conf/managed-schema and add the IK field type plus the fields that use it:
vim managed-schema
<fields>
  <field name="title_ik" type="text_general" indexed="true" stored="true"/>
  <field name="content_ik" type="text_ik" indexed="true" stored="false"/>
  <field name="content_outline" type="text_general" indexed="false" stored="true"/>
</fields>
<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
Note: the content_ik field holds the full text, which is too long to be worth storing, so it is only indexed. Storing the original text would also inflate the data volume and could hurt search performance. Instead, the first 50 characters of the content are stored in content_outline so the gist of each document can still be previewed.
Restarting the Solr servers
./solr_stop_all.sh
#!/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:~/bin
slaves="dn-1 dn-2 dn-3 dn-4 dn-5"
cmd="/opt/solr-5.5.0/bin/solr stop"
for slave in $slaves
do
  echo $slave
  ssh $slave $cmd
done
./solr_start_all.sh
#!/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:~/bin
slaves="dn-1 dn-2 dn-3 dn-4 dn-5"
cmd="/opt/solr-5.5.0/bin/solr start"
for slave in $slaves
do
  echo $slave
  ssh $slave $cmd
done
Tokenization test
Analysis with the text_ik field type can be checked on the Analysis screen of the Solr admin UI, or via the request shown below.
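Because poc_configs was copied from sample_techproducts_configs, its solrconfig.xml already defines the /analysis/field request handler, so once the collection exists (created in the next step) the IK analyzer can also be exercised over HTTP. A sketch; the sample text is arbitrary:

curl 'http://10.1.202.67:8983/solr/collection1/analysis/field?analysis.fieldtype=text_ik&analysis.fieldvalue=中华人民共和国&wt=json'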
- Creating the collection
Once the edits are done, the collection can be created with the following command:
/opt/solr-5.5.0/bin/solr create -c collection1 -d /opt/solr-5.5.0/server/solr/configsets/poc_configs/conf -shards 5 -replicationFactor 2
After creation, the new collection and its shards appear in the Cloud view of the Solr admin UI.
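The collection can also be verified from the command line with the Collections API; any of the Solr nodes will answer:

curl 'http://10.1.202.67:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collection1&wt=json'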
Note that the configuration files are not written under the Solr install directory; they are uploaded to ZooKeeper.
Connecting to ZooKeeper
/opt/zookeeper-3.4.6/bin/zkCli.sh -server 10.1.202.67:9983
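Inside the zkCli shell you can confirm that the configset was uploaded; a quick sketch of the commands (the /configs/collection1 node matches the collection created above):

ls /configs
ls /configs/collection1
get /configs/collection1/managed-schema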
If creating the collection fails, the collection can be deleted with the following command:
/opt/solr-5.5.0/bin/solr delete -c collection1 -deleteConfig true
Note: -deleteConfig true also removes the configuration files from ZooKeeper; otherwise the next create may silently reuse the old config or fail outright.
If it still fails, the configuration in ZooKeeper has not been removed; log into ZooKeeper directly and delete the config node with rmr /configs/collection1.

Writing the indexer: a MapReduce job that reads HBase columns and builds the SolrCloud index
Main class:
import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SolrHBaseMoreIndexer {

    public static Logger logger = LoggerFactory.getLogger(SolrHBaseMoreIndexer.class);

    private static void hadoopRun(String[] args) {
        String tbName = ConfigProperties.getHBASE_TABLE_NAME();
        try {
            Job job = Job.getInstance(ConfigProperties.getConf(), "SolrHBaseMoreIndexer");
            job.setJarByClass(SolrHBaseMoreIndexer.class);
            Scan scan = new Scan();
            // The start and stop values are HBase rowkeys, not numeric IDs. Rowkeys are
            // ordered as strings, not as numbers, so "109" sorts after "1000":
            // 1 ... 1000 ... 109
            scan.setStartRow(Bytes.toBytes("1"));
            scan.setStopRow(Bytes.toBytes("109"));
            for (String tbFamily : ConfigProperties.getHBASE_TABLE_FAMILY().split(",")) {
                scan.addFamily(Bytes.toBytes(tbFamily));
                logger.info("tbName:" + tbName + ",tbFamily:" + tbFamily);
            }
            scan.setCaching(500);       // larger scanner caching for throughput
            scan.setCacheBlocks(false); // don't pollute the block cache from an MR scan
            // create the map-only task
            TableMapReduceUtil.initTableMapperJob(tbName, scan, SolrHBaseMoreIndexerMapper.class, null, null, job);
            // no job output is needed
            job.setOutputFormatClass(NullOutputFormat.class);
            // job.setNumReduceTasks(0);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            logger.error("hadoopRun failed", e);
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException, URISyntaxException {
        SolrHBaseMoreIndexer.hadoopRun(args);
    }
}
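To run the indexer, package the classes together with config.properties into a jar and submit it through the hadoop client. A sketch; the jar and package names are hypothetical:

# jar and package names are hypothetical; config.properties must be bundled on the classpath
hadoop jar solr-hbase-indexer.jar com.example.SolrHBaseMoreIndexer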
Mapper class:
import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SolrHBaseMoreIndexerMapper extends TableMapper<Text, Text> {

    public static Logger logger = LoggerFactory.getLogger(SolrHBaseMoreIndexerMapper.class);

    CloudSolrClient cloudSolrServer;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        cloudSolrServer = SolrServerFactory.getCloudSolrClient();
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        try {
            // flush anything still pending, then release the client
            cloudSolrServer.commit(true, true, true);
            cloudSolrServer.close();
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void map(ImmutableBytesWritable key, Result hbaseResult, Context context)
            throws InterruptedException, IOException {
        SolrInputDocument solrDoc = new SolrInputDocument();
        try {
            // the HBase rowkey doubles as the Solr document id
            solrDoc.addField("id", new String(hbaseResult.getRow()));
            logger.info("id:" + new String(hbaseResult.getRow()));
            // the Cell API replaces the old KeyValue-based Result.list()
            for (Cell cell : hbaseResult.listCells()) {
                String qualifier = new String(CellUtil.cloneQualifier(cell)); // qualifier name (not used below)
                String family = new String(CellUtil.cloneFamily(cell));
                String fieldValue = new String(CellUtil.cloneValue(cell));
                if (family.equals("content")) {
                    // store only the first 50 characters as a preview
                    solrDoc.addField("content_outline",
                            fieldValue.length() > 50 ? fieldValue.substring(0, 50) + "..." : fieldValue);
                }
                for (String tbFamily : ConfigProperties.getHBASE_TABLE_FAMILY().split(",")) {
                    if (family.equals(tbFamily)) solrDoc.addField(tbFamily + "_ik", fieldValue);
                }
            }
            // commitWithin of 60 seconds, so Solr batches commits instead of committing per document
            cloudSolrServer.add(null, solrDoc, 60000);
        } catch (SolrServerException e) {
            logger.error("Failed to update the Solr index for row: " + new String(hbaseResult.getRow()), e);
        }
    }
}
Configuration reader class:
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ConfigProperties {

    public static Logger logger = LoggerFactory.getLogger(ConfigProperties.class);

    private static Properties props;
    private static String HBASE_ZOOKEEPER_QUORUM;
    private static String HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT;
    private static String HBASE_MASTER;
    private static String HBASE_ROOTDIR;
    private static String DFS_NAME_DIR;
    private static String DFS_DATA_DIR;
    private static String FS_DEFAULT_NAME;
    private static String HBASE_TABLE_NAME;   // HBase table that gets a Solr index
    private static String HBASE_TABLE_FAMILY; // column families of that table
    private static String QUERY_FIELD;
    private static String SOLR_ZOOKEEPER;
    private static String SOLRCLOUD_SERVER1;
    private static String SOLRCLOUD_SERVER2;
    private static String SOLRCLOUD_SERVER3;
    private static String SOLRCLOUD_SERVER4;
    private static String SOLRCLOUD_SERVER5;
    private static String wordsFilePath;
    private static String querySeparator;
    private static String COLLECTION;
    private static boolean isQueryContent;
    private static Configuration conf;

    // Read the settings from config.properties and build the HBase Configuration.
    static {
        props = new Properties();
        try {
            InputStream in = ConfigProperties.class.getClassLoader().getResourceAsStream("config.properties");
            props.load(new InputStreamReader(in, "UTF-8"));
            HBASE_ZOOKEEPER_QUORUM = props.getProperty("HBASE_ZOOKEEPER_QUORUM");
            HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT = props.getProperty("HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT");
            HBASE_MASTER = props.getProperty("HBASE_MASTER");
            HBASE_ROOTDIR = props.getProperty("HBASE_ROOTDIR");
            DFS_NAME_DIR = props.getProperty("DFS_NAME_DIR");
            DFS_DATA_DIR = props.getProperty("DFS_DATA_DIR");
            FS_DEFAULT_NAME = props.getProperty("FS_DEFAULT_NAME");
            HBASE_TABLE_NAME = props.getProperty("HBASE_TABLE_NAME");
            HBASE_TABLE_FAMILY = props.getProperty("HBASE_TABLE_FAMILY");
            QUERY_FIELD = props.getProperty("QUERY_FIELD");
            SOLR_ZOOKEEPER = props.getProperty("SOLR_ZOOKEEPER");
            SOLRCLOUD_SERVER1 = props.getProperty("SOLRCLOUD_SERVER1");
            SOLRCLOUD_SERVER2 = props.getProperty("SOLRCLOUD_SERVER2");
            SOLRCLOUD_SERVER3 = props.getProperty("SOLRCLOUD_SERVER3");
            SOLRCLOUD_SERVER4 = props.getProperty("SOLRCLOUD_SERVER4");
            SOLRCLOUD_SERVER5 = props.getProperty("SOLRCLOUD_SERVER5");
            wordsFilePath = props.getProperty("wordsFilePath");
            querySeparator = props.getProperty("querySeparator");
            isQueryContent = Boolean.parseBoolean(props.getProperty("isQueryContent", "false"));
            COLLECTION = props.getProperty("COLLECTION");
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM);
            conf.set("hbase.zookeeper.property.clientPort", HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT);
            conf.set("hbase.master", HBASE_MASTER);
            conf.set("hbase.rootdir", HBASE_ROOTDIR);
            conf.set("mapreduce.job.user.classpath.first", "true");
            conf.set("mapreduce.task.classpath.user.precedence", "true");
        } catch (IOException e) {
            logger.error("Failed to load config.properties", e);
        } catch (NullPointerException e) {
            logger.error("config.properties not found on the classpath", e);
        } catch (Exception e) {
            logger.error("Unexpected error while loading config.properties", e);
        }
    }

    public static Logger getLogger() { return logger; }
    public static Properties getProps() { return props; }
    public static String getHBASE_ZOOKEEPER_QUORUM() { return HBASE_ZOOKEEPER_QUORUM; }
    public static String getHBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT() { return HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT; }
    public static String getHBASE_MASTER() { return HBASE_MASTER; }
    public static String getHBASE_ROOTDIR() { return HBASE_ROOTDIR; }
    public static String getDFS_NAME_DIR() { return DFS_NAME_DIR; }
    public static String getDFS_DATA_DIR() { return DFS_DATA_DIR; }
    public static String getFS_DEFAULT_NAME() { return FS_DEFAULT_NAME; }
    public static String getHBASE_TABLE_NAME() { return HBASE_TABLE_NAME; }
    public static String getHBASE_TABLE_FAMILY() { return HBASE_TABLE_FAMILY; }
    public static String getQUERY_FIELD() { return QUERY_FIELD; }
    public static String getSOLR_ZOOKEEPER() { return SOLR_ZOOKEEPER; }
    public static Configuration getConf() { return conf; }
    public static String getSOLRCLOUD_SERVER1() { return SOLRCLOUD_SERVER1; }
    public static void setSOLRCLOUD_SERVER1(String sOLRCLOUD_SERVER1) { SOLRCLOUD_SERVER1 = sOLRCLOUD_SERVER1; }
    public static String getSOLRCLOUD_SERVER2() { return SOLRCLOUD_SERVER2; }
    public static void setSOLRCLOUD_SERVER2(String sOLRCLOUD_SERVER2) { SOLRCLOUD_SERVER2 = sOLRCLOUD_SERVER2; }
    public static String getSOLRCLOUD_SERVER3() { return SOLRCLOUD_SERVER3; }
    public static void setSOLRCLOUD_SERVER3(String sOLRCLOUD_SERVER3) { SOLRCLOUD_SERVER3 = sOLRCLOUD_SERVER3; }
    public static String getSOLRCLOUD_SERVER4() { return SOLRCLOUD_SERVER4; }
    public static void setSOLRCLOUD_SERVER4(String sOLRCLOUD_SERVER4) { SOLRCLOUD_SERVER4 = sOLRCLOUD_SERVER4; }
    public static String getSOLRCLOUD_SERVER5() { return SOLRCLOUD_SERVER5; }
    public static void setSOLRCLOUD_SERVER5(String sOLRCLOUD_SERVER5) { SOLRCLOUD_SERVER5 = sOLRCLOUD_SERVER5; }
    public static String getCOLLECTION() { return COLLECTION; }
    public static void setCOLLECTION(String cOLLECTION) { COLLECTION = cOLLECTION; }
    public static String getWordsFilePath() { return wordsFilePath; }
    public static String getQuerySeparator() { return querySeparator; }
    public static void setQuerySeparator(String querySeparator) { ConfigProperties.querySeparator = querySeparator; }
    public static boolean getIsQueryContent() { return isQueryContent; }
}
The config.properties file:
HBASE_ZOOKEEPER_QUORUM=10.1.202.67,10.1.202.68,10.1.202.69
HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT=2181
HBASE_MASTER=10.1.202.67:16000,10.1.202.68:16000
HBASE_ROOTDIR=hdfs://ocdpCluster/apps/hbase/data
DFS_NAME_DIR=/hadoop/hdfs/namenode
DFS_DATA_DIR=/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,/data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,/data7/hadoop/hdfs/data
FS_DEFAULT_NAME=hdfs://ocdpCluster
HBASE_TABLE_NAME=td_poc_dynamic_info
HBASE_TABLE_FAMILY=title,content
QUERY_FIELD=content_ik:公司
SOLR_ZOOKEEPER=10.1.202.67:9983,10.1.202.68:9983,10.1.202.69:9983,10.1.202.70:9983,10.1.202.71:9983
SOLRCLOUD_SERVER1=http://10.1.202.67:8983/solr/
SOLRCLOUD_SERVER2=http://10.1.202.68:8983/solr/
SOLRCLOUD_SERVER3=http://10.1.202.69:8983/solr/
SOLRCLOUD_SERVER4=http://10.1.202.70:8983/solr/
SOLRCLOUD_SERVER5=http://10.1.202.71:8983/solr/
COLLECTION=collection1
wordsFilePath=/usr/local/pocProject/queryProject2/querywords.txt
Maven dependencies (pom.xml); note that solr-solrj is pinned at 5.1.0 here while the servers run 5.5.0, so aligning it to 5.5.0 may be safer:
<dependencies>
  <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.6.6</version>
  </dependency>
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>5.1.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.1.2</version>
  </dependency>
</dependencies>
Creating the SolrCloud connection:
import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SolrServerFactory {

    public static Logger logger = LoggerFactory.getLogger(SolrServerFactory.class);

    private static CloudSolrClient cloudSolrServer;

    public static synchronized CloudSolrClient getCloudSolrClient() {
        if (cloudSolrServer == null) {
            logger.info("cloudSolrServer not initialized yet, creating it");
            createCloudSolrClient();
        }
        return cloudSolrServer;
    }

    private static void createCloudSolrClient() {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 100);         // 10
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 20); // 5
        HttpClient httpClient = HttpClientUtil.createClient(params);
        LBHttpSolrClient lbHttpSolrClient = new LBHttpSolrClient(httpClient,
                ConfigProperties.getSOLRCLOUD_SERVER1(), ConfigProperties.getSOLRCLOUD_SERVER2(),
                ConfigProperties.getSOLRCLOUD_SERVER3(), ConfigProperties.getSOLRCLOUD_SERVER4(),
                ConfigProperties.getSOLRCLOUD_SERVER5());
        cloudSolrServer = new CloudSolrClient(ConfigProperties.getSOLR_ZOOKEEPER(), lbHttpSolrClient);
        cloudSolrServer.setDefaultCollection(ConfigProperties.getCOLLECTION());
        // cloudSolrServer.setZkClientTimeout(SearchConfig.getZookeeperClientTimeout());
        // cloudSolrServer.setZkConnectTimeout(SearchConfig.getZookeeperConnectTimeout());
    }
}
HBase connection
import java.io.IOException;

import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HbaseConnectionFactory {

    private static Connection connection = null;

    public static synchronized Connection getHTable() {
        if (connection == null) {
            try {
                connection = ConnectionFactory.createConnection(ConfigProperties.getConf());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return connection;
    }

    public static Connection getConnection() {
        return connection;
    }
}

The query code is as follows:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueryData {
public static Logger logger = LoggerFactory.getLogger(QueryData.class);
/**
* @param args
* @throws SolrServerException
* @throws IOException
*/
public static void main(String[] args) throws SolrServerException, IOException {
    CloudSolrClient cloudSolrServer = SolrServerFactory.getCloudSolrClient();
    SolrQuery query = new SolrQuery(ConfigProperties.getQUERY_FIELD());
    query.setStart(0); // first result to return, for paging
    query.setRows(10); // number of results to return, for paging
    QueryResponse response = cloudSolrServer.query(query);
    SolrDocumentList docs = response.getResults();
    System.out.println("Documents found: " + docs.getNumFound()); // total hit count is also available here
    System.out.println("Query time: " + response.getQTime());
    cloudSolrServer.close();
    // fetch the matching rows back from HBase by rowkey
    HTable table = new HTable(ConfigProperties.getConf(), ConfigProperties.getHBASE_TABLE_NAME());
    Get get = null;
    List<Get> list = new ArrayList<Get>();
    for (SolrDocument doc : docs) {
        logger.info("Matched id: " + (String) doc.getFieldValue("id"));
        get = new Get(Bytes.toBytes((String) doc.getFieldValue("id")));
        list.add(get);
    }
    Result[] res = table.get(list);
    logger.info("Rows fetched: " + res.length);
    byte[] titleBt = null;
    byte[] contentBt = null;
    String title = null;
    String content = null;
    for (Result rs : res) {
        if (rs.getRow() == null) {
            continue; // row not found in HBase; skip it
        }
        titleBt = rs.getValue("title".getBytes(), "".getBytes());
        contentBt = rs.getValue("create_date".getBytes(), "".getBytes());
        // calling new String on a null value would throw, so guard the empty values
        if (titleBt != null && titleBt.length > 0) { title = new String(titleBt); } else { title = "no data"; }
        if (contentBt != null && contentBt.length > 0) { content = new String(contentBt); } else { content = "no data"; }
        logger.info("id:" + new String(rs.getRow()));
        logger.info("title:" + title + "|");
        logger.info("content:" + content + "|");
    }
    table.close();
}
}
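The same search can also be reproduced over HTTP for a quick sanity check; a sketch using the QUERY_FIELD value from config.properties (non-ASCII query terms must be URL-encoded if the client does not send raw UTF-8):

curl 'http://10.1.202.67:8983/solr/collection1/select?q=content_ik:公司&start=0&rows=10&wt=json'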
It is best to put hdfs-site.xml and hbase-site.xml on the runtime classpath together with the other configuration files, so the HDFS and HBase clients pick up the real cluster settings.
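For example, on a typical HDP/Ambari-managed cluster, the client configs can be copied into the project resources before packaging; a sketch, with assumed source paths:

# source paths are an assumption; adjust to wherever your cluster keeps its client configs
cp /etc/hadoop/conf/hdfs-site.xml src/main/resources/
cp /etc/hbase/conf/hbase-site.xml src/main/resources/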