Combining Solr and HBase for Indexed Search

  1. SolrCloud cluster overview
    The SolrCloud cluster is already installed.
    Solr version: 5.5.0; ZooKeeper version: 3.4.6
    Solr OS user/password: solr/solr123
    ZooKeeper install path (used by Solr): /opt/zookeeper-3.4.6
    Solr install path: /opt/solr-5.5.0
    Solr port: 8983
    ZooKeeper port: 9983
    5 machines, each running both Solr and ZooKeeper

    Start ZooKeeper: /opt/zookeeper-3.4.6/bin/zkServer.sh start
    Stop ZooKeeper: /opt/zookeeper-3.4.6/bin/zkServer.sh stop
    ZooKeeper status: /opt/zookeeper-3.4.6/bin/zkServer.sh status
    Start Solr: /opt/solr-5.5.0/bin/solr start
    Stop Solr: /opt/solr-5.5.0/bin/solr stop
    Solr status: /opt/solr-5.5.0/bin/solr status
    Solr web UIs:
        http://10.1.202.67:8983/solr/
        http://10.1.202.68:8983/solr/
        http://10.1.202.69:8983/solr/
        http://10.1.202.70:8983/solr/
        http://10.1.202.71:8983/solr/
  2. Copy the IK analyzer into Solr
    Note: do not use the old IKAnalyzer2012FF_u1.jar; that version does not support Solr 5.0 and above. Use IKAnalyzer2012FF_u2.jar, or download the source from GitHub and build it yourself:
    https://github.com/EugenePig/ik-analyzer-solr5/blob/master/README.md
    I used my own build of the IK analyzer jar, Ik-analyzer-solr5-5.x.jar, and copied it to every node:
    scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.67:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
    scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.68:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
    scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.69:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
    scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.70:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
    scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.71:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib

  3. Modify managed-schema
    Solr 5.5.0 no longer ships a schema.xml file; it uses managed-schema instead, located under solr-5.5.0/server/solr/configsets/. In that directory, copy the sample_techproducts_configs folder:
    cp -r sample_techproducts_configs poc_configs
    Edit poc_configs/conf/managed-schema to add the IK field type and the IK-analyzed fields:
    vim managed-schema

<fields>
    <field name="title_ik" type="text_general" indexed="true" stored="true"/>
    <field name="content_ik" type="text_ik" indexed="true" stored="false"/>
    <field name="content_outline" type="text_general" indexed="false" stored="true"/>
</fields>
<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

Note: content_ik can be very long, so it is indexed but not stored. There is no need to keep the original text in Solr; storing it would inflate the index and could hurt search performance. Instead, the first 50 characters of the content are stored in content_outline so you can preview roughly what each document is about.

Restart the Solr servers:
./solr_stop_all.sh

#!/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:~/bin
slaves="dn-1 dn-2 dn-3 dn-4 dn-5"
cmd="/opt/solr-5.5.0/bin/solr stop"
for slave in $slaves
do
    echo $slave
    ssh $slave $cmd
done

./solr_start_all.sh

#!/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:~/bin
slaves="dn-1 dn-2 dn-3 dn-4 dn-5"
cmd="/opt/solr-5.5.0/bin/solr start"
for slave in $slaves
do
    echo $slave
    ssh $slave $cmd
done

Analysis test
Tokenizing sample text with the text_ik field type in the Solr admin Analysis page:
(screenshot: Analysis page output for text_ik)
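Tokenization can also be checked programmatically through Solr's analysis handler. Below is a rough sketch assuming SolrJ's FieldAnalysisRequest (which posts to /analysis/field); it reuses the SolrServerFactory helper defined later in this article, and the sample text is only an illustration:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.common.util.NamedList;

public class AnalyzeTest {
    public static void main(String[] args) throws Exception {
        SolrClient client = SolrServerFactory.getCloudSolrClient(); // helper defined below
        FieldAnalysisRequest req = new FieldAnalysisRequest(); // posts to /analysis/field
        req.addFieldType("text_ik");        // analyze with the IK field type
        req.setFieldValue("中华人民共和国"); // sample text; any Chinese sentence works
        NamedList<Object> resp = client.request(req); // raw token stream per analysis phase
        System.out.println(resp);
        client.close();
    }
}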

  4. Create a collection
    After editing the schema you can create the collection:
    /opt/solr-5.5.0/bin/solr create -c collection1 -d /opt/solr-5.5.0/server/solr/configsets/poc_configs/conf -shards 5 -replicationFactor 2
    Once created, the admin page shows the following (the same operation can also be done from SolrJ, as sketched below):
    (screenshot: Solr admin view of collection1)
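A minimal SolrJ sketch of the same create, assuming the config set has already been uploaded to ZooKeeper under the name poc_configs (SolrJ 5.x collections API):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateCollection {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("10.1.202.67:9983"); // any ZK node works
        CollectionAdminRequest.Create create = new CollectionAdminRequest.Create();
        create.setCollectionName("collection1");
        create.setConfigName("poc_configs"); // config set name in ZooKeeper
        create.setNumShards(5);
        create.setReplicationFactor(2);
        create.process(client); // synchronous; throws on failure
        client.close();
    }
}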

The configuration files are not written under the Solr directory; they are uploaded to ZooKeeper. To inspect them, connect to ZooKeeper:
/opt/zookeeper-3.4.6/bin/zkCli.sh -server 10.1.202.67:9983
(screenshot: zkCli session showing the uploaded config under /configs)

  5. Delete a failed collection
    If creating a collection fails, it can be removed from the command line:
    /opt/solr-5.5.0/bin/solr delete -c collection1 -deleteConfig true
    Note: -deleteConfig true also deletes the configuration from ZooKeeper; otherwise the next create may silently reuse the old config or fail.
    If that still does not work, the configuration is probably still in ZooKeeper: log in with zkCli and remove it with rmr /configs/collection1. A SolrJ variant of the delete is sketched below.
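A small sketch of the delete via SolrJ, again assuming a reachable ZooKeeper ensemble; note that, unlike the CLI with -deleteConfig true, this leaves the config set in ZooKeeper:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class DeleteCollection {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("10.1.202.67:9983");
        CollectionAdminRequest.Delete delete = new CollectionAdminRequest.Delete();
        delete.setCollectionName("collection1");
        delete.process(client); // removes the collection; /configs/collection1 stays
        client.close();
    }
}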

  6. Index HBase columns into SolrCloud with MapReduce
    Driver class:

public class SolrHBaseMoreIndexer {
    public static Logger logger = LoggerFactory.getLogger(SolrHBaseMoreIndexer.class);

    private static void hadoopRun(String[] args) {
        String tbName = ConfigProperties.getHBASE_TABLE_NAME();
        try {
            Job job = Job.getInstance(ConfigProperties.getConf(), "SolrHBaseMoreIndexer");
            job.setJarByClass(SolrHBaseMoreIndexer.class);
            Scan scan = new Scan();
            // Start/stop are HBase rowkeys, not numeric IDs. Rowkeys sort as byte
            // strings, not numbers, so "109" sorts after "1000": 1 ... 1000 ... 109.
            scan.setStartRow(Bytes.toBytes("1"));
            scan.setStopRow(Bytes.toBytes("109"));
            for (String tbFamily : ConfigProperties.getHBASE_TABLE_FAMILY().split(",")) {
                scan.addFamily(Bytes.toBytes(tbFamily));
                logger.info("tbName:" + tbName + ",tbFamily:" + tbFamily);
            }
            scan.setCaching(500);       // batch rows per RPC for scan efficiency
            scan.setCacheBlocks(false); // don't pollute the block cache during a full scan
            // Create the map-only job
            TableMapReduceUtil.initTableMapperJob(tbName, scan,
                    SolrHBaseMoreIndexerMapper.class, null, null, job);
            // No job output is needed; documents go straight to Solr
            job.setOutputFormatClass(NullOutputFormat.class);
            // job.setNumReduceTasks(0);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            logger.error("hadoopRun failed", e);
        }
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException, URISyntaxException {
        SolrHBaseMoreIndexer.hadoopRun(args);
    }
}
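Because rowkeys compare lexicographically, a common workaround (a sketch only, not part of the original job) is to zero-pad numeric IDs when writing rows, so byte order matches numeric order; the helper name and pad width here are hypothetical:

import org.apache.hadoop.hbase.util.Bytes;

public final class RowkeyUtil {
    // Left-pad a numeric ID to a fixed width so lexicographic rowkey order
    // equals numeric order: "0000000109" < "0000001000".
    public static byte[] numericRowkey(long id) {
        return Bytes.toBytes(String.format("%010d", id));
    }
}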

Mapper class:

public class SolrHBaseMoreIndexerMapper extends TableMapper<Text, Text> {
    public static Logger logger = LoggerFactory.getLogger(SolrHBaseMoreIndexerMapper.class);
    CloudSolrClient cloudSolrServer;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        cloudSolrServer = SolrServerFactory.getCloudSolrClient();
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        try {
            cloudSolrServer.commit(true, true, true); // waitFlush, waitSearcher, softCommit
            cloudSolrServer.close();
        } catch (SolrServerException e) {
            logger.error("commit on cleanup failed", e);
        }
    }

    public void map(ImmutableBytesWritable key, Result hbaseResult, Context context)
            throws InterruptedException, IOException {
        SolrInputDocument solrDoc = new SolrInputDocument();
        try {
            solrDoc.addField("id", new String(hbaseResult.getRow()));
            logger.info("id:" + new String(hbaseResult.getRow()));
            for (KeyValue rowQualifierAndValue : hbaseResult.list()) {
                String family = new String(rowQualifierAndValue.getFamily());
                String fieldValue = new String(rowQualifierAndValue.getValue());
                if (family.equals("content")) {
                    // Keep only the first 50 characters as a stored preview
                    solrDoc.addField("content_outline",
                            fieldValue.length() > 50 ? fieldValue.substring(0, 50) + "..." : fieldValue);
                }
                for (String tbFamily : ConfigProperties.getHBASE_TABLE_FAMILY().split(",")) {
                    if (family.equals(tbFamily)) solrDoc.addField(tbFamily + "_ik", fieldValue);
                }
            }
            // commitWithin of 60s, so we don't pay for a commit on every add
            cloudSolrServer.add(null, solrDoc, 60000);
        } catch (SolrServerException e) {
            logger.error("Failed to update Solr index for row: " + new String(hbaseResult.getRow()), e);
        }
    }
}
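One refinement worth considering (a sketch only, not in the original job): buffer documents and send them in batches, which cuts the number of round trips to Solr considerably. The class name and batch size below are assumptions to tune:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

// Sketch: buffer documents and flush in batches instead of one add() per row.
public class BatchingSolrWriter {
    private final SolrClient client;
    private final int batchSize; // hypothetical; tune for your cluster
    private final List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();

    public BatchingSolrWriter(SolrClient client, int batchSize) {
        this.client = client;
        this.batchSize = batchSize;
    }

    public void add(SolrInputDocument doc) throws IOException, SolrServerException {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush(); // one request per batch
    }

    public void flush() throws IOException, SolrServerException {
        if (!buffer.isEmpty()) {
            client.add(buffer, 60000); // commitWithin 60s, as in the mapper
            buffer.clear();
        }
    }
}

In the mapper, map() would call writer.add(solrDoc) and cleanup() would call writer.flush() before the final commit.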

Configuration reader class:

public class ConfigProperties {
    public static Logger logger = LoggerFactory.getLogger(ConfigProperties.class);
    private static Properties props;
    private static String HBASE_ZOOKEEPER_QUORUM;
    private static String HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT;
    private static String HBASE_MASTER;
    private static String HBASE_ROOTDIR;
    private static String DFS_NAME_DIR;
    private static String DFS_DATA_DIR;
    private static String FS_DEFAULT_NAME;
    private static String HBASE_TABLE_NAME;   // HBase table to index into Solr
    private static String HBASE_TABLE_FAMILY; // column families of the HBase table
    private static String QUERY_FIELD;
    private static String SOLR_ZOOKEEPER;
    private static String SOLRCLOUD_SERVER1;
    private static String SOLRCLOUD_SERVER2;
    private static String SOLRCLOUD_SERVER3;
    private static String SOLRCLOUD_SERVER4;
    private static String SOLRCLOUD_SERVER5;
    private static String wordsFilePath;
    private static String querySeparator;
    private static String COLLECTION;
    private static boolean isQueryContent;
    private static Configuration conf;

    /**
     * Read the properties file and build the HBase configuration.
     */
    static {
        props = new Properties();
        try {
            InputStream in = ConfigProperties.class.getClassLoader().getResourceAsStream("config.properties");
            props.load(new InputStreamReader(in, "UTF-8"));
            HBASE_ZOOKEEPER_QUORUM = props.getProperty("HBASE_ZOOKEEPER_QUORUM");
            HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT = props.getProperty("HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT");
            HBASE_MASTER = props.getProperty("HBASE_MASTER");
            HBASE_ROOTDIR = props.getProperty("HBASE_ROOTDIR");
            DFS_NAME_DIR = props.getProperty("DFS_NAME_DIR");
            DFS_DATA_DIR = props.getProperty("DFS_DATA_DIR");
            FS_DEFAULT_NAME = props.getProperty("FS_DEFAULT_NAME");
            HBASE_TABLE_NAME = props.getProperty("HBASE_TABLE_NAME");
            HBASE_TABLE_FAMILY = props.getProperty("HBASE_TABLE_FAMILY");
            QUERY_FIELD = props.getProperty("QUERY_FIELD");
            SOLR_ZOOKEEPER = props.getProperty("SOLR_ZOOKEEPER");
            SOLRCLOUD_SERVER1 = props.getProperty("SOLRCLOUD_SERVER1");
            SOLRCLOUD_SERVER2 = props.getProperty("SOLRCLOUD_SERVER2");
            SOLRCLOUD_SERVER3 = props.getProperty("SOLRCLOUD_SERVER3");
            SOLRCLOUD_SERVER4 = props.getProperty("SOLRCLOUD_SERVER4");
            SOLRCLOUD_SERVER5 = props.getProperty("SOLRCLOUD_SERVER5");
            wordsFilePath = props.getProperty("wordsFilePath");
            querySeparator = props.getProperty("querySeparator");
            isQueryContent = Boolean.parseBoolean(props.getProperty("isQueryContent", "false"));
            COLLECTION = props.getProperty("COLLECTION");
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM);
            conf.set("hbase.zookeeper.property.clientPort", HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT);
            conf.set("hbase.master", HBASE_MASTER);
            conf.set("hbase.rootdir", HBASE_ROOTDIR);
            conf.set("mapreduce.job.user.classpath.first", "true");
            conf.set("mapreduce.task.classpath.user.precedence", "true");
        } catch (IOException e) {
            logger.error("Failed to load config file", e);
        } catch (NullPointerException e) {
            logger.error("config.properties not found on classpath", e);
        } catch (Exception e) {
            logger.error("Unexpected error while loading config file", e);
        }
    }

    public static Logger getLogger() { return logger; }
    public static Properties getProps() { return props; }
    public static String getHBASE_ZOOKEEPER_QUORUM() { return HBASE_ZOOKEEPER_QUORUM; }
    public static String getHBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT() { return HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT; }
    public static String getHBASE_MASTER() { return HBASE_MASTER; }
    public static String getHBASE_ROOTDIR() { return HBASE_ROOTDIR; }
    public static String getDFS_NAME_DIR() { return DFS_NAME_DIR; }
    public static String getDFS_DATA_DIR() { return DFS_DATA_DIR; }
    public static String getFS_DEFAULT_NAME() { return FS_DEFAULT_NAME; }
    public static String getHBASE_TABLE_NAME() { return HBASE_TABLE_NAME; }
    public static String getHBASE_TABLE_FAMILY() { return HBASE_TABLE_FAMILY; }
    public static String getQUERY_FIELD() { return QUERY_FIELD; }
    public static String getSOLR_ZOOKEEPER() { return SOLR_ZOOKEEPER; }
    public static Configuration getConf() { return conf; }
    public static String getSOLRCLOUD_SERVER1() { return SOLRCLOUD_SERVER1; }
    public static void setSOLRCLOUD_SERVER1(String sOLRCLOUD_SERVER1) { SOLRCLOUD_SERVER1 = sOLRCLOUD_SERVER1; }
    public static String getSOLRCLOUD_SERVER2() { return SOLRCLOUD_SERVER2; }
    public static void setSOLRCLOUD_SERVER2(String sOLRCLOUD_SERVER2) { SOLRCLOUD_SERVER2 = sOLRCLOUD_SERVER2; }
    public static String getSOLRCLOUD_SERVER3() { return SOLRCLOUD_SERVER3; }
    public static void setSOLRCLOUD_SERVER3(String sOLRCLOUD_SERVER3) { SOLRCLOUD_SERVER3 = sOLRCLOUD_SERVER3; }
    public static String getSOLRCLOUD_SERVER4() { return SOLRCLOUD_SERVER4; }
    public static void setSOLRCLOUD_SERVER4(String sOLRCLOUD_SERVER4) { SOLRCLOUD_SERVER4 = sOLRCLOUD_SERVER4; }
    public static String getSOLRCLOUD_SERVER5() { return SOLRCLOUD_SERVER5; }
    public static void setSOLRCLOUD_SERVER5(String sOLRCLOUD_SERVER5) { SOLRCLOUD_SERVER5 = sOLRCLOUD_SERVER5; }
    public static String getCOLLECTION() { return COLLECTION; }
    public static void setCOLLECTION(String cOLLECTION) { COLLECTION = cOLLECTION; }
    public static String getWordsFilePath() { return wordsFilePath; }
    public static String getQuerySeparator() { return querySeparator; }
    public static void setQuerySeparator(String querySeparator) { ConfigProperties.querySeparator = querySeparator; }
    public static boolean getIsQueryContent() { return isQueryContent; }
}

The config.properties file:

HBASE_ZOOKEEPER_QUORUM=10.1.202.67,10.1.202.68,10.1.202.69
HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT=2181
HBASE_MASTER=10.1.202.67:16000,10.1.202.68:16000
HBASE_ROOTDIR=hdfs://ocdpCluster/apps/hbase/data
DFS_NAME_DIR=/hadoop/hdfs/namenode
DFS_DATA_DIR=/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,/data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,/data7/hadoop/hdfs/data
FS_DEFAULT_NAME=hdfs://ocdpCluster
HBASE_TABLE_NAME=td_poc_dynamic_info
HBASE_TABLE_FAMILY=title,content
QUERY_FIELD=content_ik:公司
SOLR_ZOOKEEPER=10.1.202.67:9983,10.1.202.68:9983,10.1.202.69:9983,10.1.202.70:9983,10.1.202.71:9983
SOLRCLOUD_SERVER1=http://10.1.202.67:8983/solr/
SOLRCLOUD_SERVER2=http://10.1.202.68:8983/solr/
SOLRCLOUD_SERVER3=http://10.1.202.69:8983/solr/
SOLRCLOUD_SERVER4=http://10.1.202.70:8983/solr/
SOLRCLOUD_SERVER5=http://10.1.202.71:8983/solr/
COLLECTION=collection1
wordsFilePath=/usr/local/pocProject/queryProject2/querywords.txt

Maven dependencies (pom.xml):

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.1.2</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.6.6</version>
    </dependency>
    <!-- note: the cluster runs Solr 5.5.0; matching solr-solrj at 5.5.0 would be safer -->
    <dependency>
        <groupId>org.apache.solr</groupId>
        <artifactId>solr-solrj</artifactId>
        <version>5.1.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>1.1.2</version>
    </dependency>
</dependencies>

Creating the SolrCloud connection:

public class SolrServerFactory {
    public static Logger logger = LoggerFactory.getLogger(SolrServerFactory.class);
    private static CloudSolrClient cloudSolrServer;

    public static synchronized CloudSolrClient getCloudSolrClient() {
        if (cloudSolrServer == null) {
            logger.info("cloudSolrServer is still null, creating it");
            createCloudSolrClient();
        }
        return cloudSolrServer;
    }

    private static void createCloudSolrClient() {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 100);         // was 10
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 20); // was 5
        HttpClient httpClient = HttpClientUtil.createClient(params);
        LBHttpSolrClient lbHttpSolrClient = new LBHttpSolrClient(httpClient,
                ConfigProperties.getSOLRCLOUD_SERVER1(),
                ConfigProperties.getSOLRCLOUD_SERVER2(), ConfigProperties.getSOLRCLOUD_SERVER3(),
                ConfigProperties.getSOLRCLOUD_SERVER4(), ConfigProperties.getSOLRCLOUD_SERVER5());
        cloudSolrServer = new CloudSolrClient(ConfigProperties.getSOLR_ZOOKEEPER(), lbHttpSolrClient);
        cloudSolrServer.setDefaultCollection(ConfigProperties.getCOLLECTION());
//      cloudSolrServer.setZkClientTimeout(SearchConfig.getZookeeperClientTimeout());
//      cloudSolrServer.setZkConnectTimeout(SearchConfig.getZookeeperConnectTimeout());
    }
}
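Listing all five HTTP endpoints is not strictly required: CloudSolrClient discovers live nodes from ZooKeeper by itself. A minimal sketch of the simpler construction (SolrJ 5.x API):

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class SimpleSolrClientFactory {
    public static CloudSolrClient create() {
        // The client reads cluster state from ZooKeeper and routes requests
        // to live nodes, so no explicit server list is needed.
        CloudSolrClient client = new CloudSolrClient(ConfigProperties.getSOLR_ZOOKEEPER());
        client.setDefaultCollection(ConfigProperties.getCOLLECTION());
        return client;
    }
}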

HBase connection:

public class HbaseConnectionFactory {
    private static Connection connection = null;

    public static synchronized Connection getHTable() {
        if (connection == null) {
            try {
                connection = ConnectionFactory.createConnection(ConfigProperties.getConf());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return connection;
    }

    public static Connection getConnection() {
        return connection;
    }
}

The query code is as follows:

public class QueryData {
    public static Logger logger = LoggerFactory.getLogger(QueryData.class);

    /**
     * @param args
     * @throws SolrServerException
     * @throws IOException
     */
    public static void main(String[] args) throws SolrServerException, IOException {
        CloudSolrClient cloudSolrServer = SolrServerFactory.getCloudSolrClient();
        SolrQuery query = new SolrQuery(ConfigProperties.getQUERY_FIELD());
        query.setStart(0); // first row to return, for paging
        query.setRows(10); // page size
        QueryResponse response = cloudSolrServer.query(query);
        SolrDocumentList docs = response.getResults();
        System.out.println("Documents found: " + docs.getNumFound()); // total hit count
        System.out.println("Query time (ms): " + response.getQTime());
        cloudSolrServer.close();

        // Fetch the matching rows back from HBase by rowkey (= Solr doc id)
        HTable table = new HTable(ConfigProperties.getConf(), ConfigProperties.getHBASE_TABLE_NAME());
        Get get = null;
        List<Get> list = new ArrayList<Get>();
        for (SolrDocument doc : docs) {
            logger.info("Matched id: " + (String) doc.getFieldValue("id"));
            get = new Get(Bytes.toBytes((String) doc.getFieldValue("id")));
            list.add(get);
        }
        Result[] res = table.get(list);
        logger.info("Rows fetched: " + res.length);
        byte[] titleBt = null;
        byte[] contentBt = null;
        String title = null;
        String content = null;
        for (Result rs : res) {
            if (rs.getRow() == null) {
                continue; // skip rows that were not found
            }
            titleBt = rs.getValue("title".getBytes(), "".getBytes());
            contentBt = rs.getValue("create_date".getBytes(), "".getBytes());
            // new String(null) would throw, so guard against empty cells
            if (titleBt != null && titleBt.length > 0) { title = new String(titleBt); } else { title = "no data"; }
            if (contentBt != null && contentBt.length > 0) { content = new String(contentBt); } else { content = "no data"; }
            logger.info("id:" + new String(rs.getRow()));
            logger.info("title:" + title + "|");
            logger.info("content:" + content + "|");
        }
        table.close();
    }
}
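Note that QueryData constructs a deprecated HTable directly rather than going through the HbaseConnectionFactory shown earlier. With the HBase 1.x API, the factory's connection would be used like this (a sketch; the class name and rowkey are illustrative):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FetchRowExample {
    public static void main(String[] args) throws IOException {
        // Table instances are lightweight; get one per use and close it.
        Table table = HbaseConnectionFactory.getHTable()
                .getTable(TableName.valueOf(ConfigProperties.getHBASE_TABLE_NAME()));
        try {
            Result result = table.get(new Get(Bytes.toBytes("1"))); // illustrative rowkey
            System.out.println("row exists: " + !result.isEmpty());
        } finally {
            table.close();
        }
    }
}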

At runtime it is best to put hdfs-site.xml and hbase-site.xml on the classpath alongside config.properties, so the real cluster settings are picked up:
(screenshot: project resources directory containing hdfs-site.xml and hbase-site.xml)
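If copying the site files onto the classpath is inconvenient, the same effect can be had in code; a sketch using Hadoop's Configuration.addResource (the paths are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ConfFromSiteFiles {
    public static Configuration load() {
        Configuration conf = HBaseConfiguration.create();
        // Illustrative paths; point these at the cluster's real site files.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
        return conf;
    }
}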
