solr-6.1.0 Source Code Analysis, Part 4

Source: Internet. Editor: 程序博客网. Time: 2024/06/08 17:45

solr-6.1.0 Source Code Analysis: Creating the Index

This chapter assumes that a Nutch client sends an HTTP request through the solrindex command, as follows:

bin/nutch solrindex http://127.0.0.1/solr/test crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments/ -filter -normalize

In the request URL, test is the name of the Solr core. The request parameters are:

wt=javabin&version=2

When the request reaches the server, the Tomcat framework invokes the doFilter function of SolrDispatchFilter, which in turn calls the call function of HttpSolrCall.

SolrDispatchFilter::doFilter->HttpSolrCall::call

  public Action call() throws IOException {
    ...
    init();
    HttpServletResponse resp = response;
    switch (action) {
      case ADMIN:
        ...
      case REMOTEQUERY:
        ...
      case PROCESS:
        final Method reqMethod = Method.getMethod(req.getMethod());
        SolrQueryResponse solrRsp = new SolrQueryResponse();
        execute(solrRsp);
        QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
        writeResponse(solrRsp, responseWriter, reqMethod);
        return RETURN;
    }
    ...
  }

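The init function prepares for request processing, and getMethod obtains the request method, such as GET or POST. call creates a SolrQueryResponse to hold the response data. execute invokes the matching handler and contains the main request-processing logic. getQueryResponseWriter returns a BinaryResponseWriter, selected by the javabin request parameter, and writeResponse then sends the result back to the client.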

SolrDispatchFilter::doFilter->HttpSolrCall::call->init

  private void init() throws Exception {
    corename = path.substring(1, idx);
    core = cores.getCore(corename);
    path = path.substring(idx);
    config = core.getSolrConfig();
    SolrRequestParsers parser = config.getRequestParsers();
    extractHandlerFromURLPath(parser);
    solrReq = parser.parse(core, path, req);
    action = PROCESS;
    return;
  }

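getCore returns the SolrCore that was created in the init function of SolrDispatchFilter. getSolrConfig returns the core's configuration, SolrConfig, and getRequestParsers then returns the request parser, SolrRequestParsers. extractHandlerFromURLPath looks up the handler class from the request path. The path here is /update, so the result is UpdateRequestHandler. Finally, the parse function of SolrRequestParsers creates a SolrQueryRequest that wraps the HttpServletRequest.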
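The path handling in init can be illustrated in isolation. The following is a minimal sketch, not Solr code: the class PathSplitSketch and its split method are hypothetical, and the sketch assumes the path has the form /<core>/<handler>, as in /test/update.

```java
// Hypothetical helper, not part of Solr: mirrors the substring logic shown in init().
final class PathSplitSketch {

    // Splits "/<core>/<handler>" into { core name, handler path }.
    static String[] split(String path) {
        int idx = path.indexOf('/', 1);   // position of the slash after the core name
        if (idx < 0) {
            // No handler part: the whole path names the core.
            return new String[] { path.substring(1), "" };
        }
        return new String[] { path.substring(1, idx), path.substring(idx) };
    }

    public static void main(String[] args) {
        String[] parts = split("/test/update");
        System.out.println("core=" + parts[0] + ", handler path=" + parts[1]);
    }
}
```

For /test/update the core name is test and the remaining path, /update, is what the handler lookup works with.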

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute

  protected void execute(SolrQueryResponse rsp) {
    solrReq.getContext().put("webapp", req.getContextPath());
    solrReq.getCore().execute(handler, solrReq, rsp);
  }

  public void execute(SolrRequestHandler handler, SolrQueryRequest req, SolrQueryResponse rsp) {
    ...
    handler.handleRequest(req,rsp);
    ...
  }

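execute goes on to call the execute function of SolrCore, whose main job is to call the handleRequest function of the SolrRequestHandler registered for the request path. This chapter assumes that handler is UpdateRequestHandler, so its handleRequest function is examined next.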

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest

  public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
    ...
    handleRequestBody( req, rsp );
    ...
  }

  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    SolrParams params = req.getParams();
    UpdateRequestProcessorChain processorChain =
        req.getCore().getUpdateProcessorChain(params);
    UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);
    ContentStreamLoader documentLoader = newLoader(req, processor);
    Iterable<ContentStream> streams = req.getContentStreams();
    for (ContentStream stream : streams) {
      documentLoader.load(req, rsp, stream, processor);
    }
  }

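handleRequest mainly delegates to handleRequestBody. getUpdateProcessorChain returns the UpdateRequestProcessorChain that was created in the SolrCore constructor, and createProcessor creates the processor that handles this request. newLoader returns a ContentStreamLoader, whose load function is then called for each content stream of the request.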

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->UpdateRequestProcessorChain::createProcessor

  public UpdateRequestProcessor createProcessor(SolrQueryRequest req,
                                                SolrQueryResponse rsp)
  {
    UpdateRequestProcessor processor = null;
    UpdateRequestProcessor last = null;
    for (int i = chain.size() - 1; i >= 0; i--) {
      UpdateRequestProcessorFactory factory = chain.get(i);
      processor = factory.getInstance(req, rsp, last);
      last = processor == null ? last : processor;
    }
    return last;
  }

The default chain contains RunUpdateProcessorFactory, DistributedUpdateProcessorFactory and LogUpdateProcessorFactory. Their getInstance functions return a RunUpdateProcessor, a DistributedUpdateProcessor and a LogUpdateProcessor respectively. Because the loop walks the chain from the last factory to the first and passes each new processor the previously created one as its successor, the processor returned at the end is the head of the chain.
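The back-to-front construction is easier to follow on a stripped-down model. The sketch below is hypothetical and does not use Solr classes: Processor stands in for UpdateRequestProcessor, plain names stand in for the factories, and it assumes the chain list holds the Log, Distributed and Run entries in that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for UpdateRequestProcessor and its factories; not Solr classes.
final class ChainSketch {

    // A processor records its own name, then hands the command to its successor.
    static class Processor {
        final String name;
        final Processor next;   // null for the last processor in the chain

        Processor(String name, Processor next) {
            this.name = name;
            this.next = next;
        }

        void processAdd(String cmd, List<String> trace) {
            trace.add(name);
            if (next != null) next.processAdd(cmd, trace);
        }
    }

    // Builds processors from the last entry to the first, so that each new
    // processor can be given the one created just before it as its successor.
    static Processor create(List<String> factoryNames) {
        Processor last = null;
        for (int i = factoryNames.size() - 1; i >= 0; i--) {
            last = new Processor(factoryNames.get(i), last);
        }
        return last;            // head of the chain
    }

    // Sends one command through the chain and returns the visiting order.
    static List<String> run(List<String> factoryNames) {
        List<String> trace = new ArrayList<>();
        Processor head = create(factoryNames);
        if (head != null) head.processAdd("add-doc", trace);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("Log", "Distributed", "Run")));
    }
}
```

Under these assumptions the command visits the processors in list order, even though they were constructed in reverse.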

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load

    public void load(SolrQueryRequest req, SolrQueryResponse rsp,
        ContentStream stream, UpdateRequestProcessor processor) throws Exception {
      String type = stream.getContentType();
      int idx = type.indexOf(';');
      if (idx > 0) {
        type = type.substring(0, idx);
      }
      ContentStreamLoader loader = loaders.get(type);
      loader.load(req, rsp, stream, processor);
    }

The loader is chosen by content type. The loaders map is initialized as follows:

  protected Map<String,ContentStreamLoader> createDefaultLoaders(NamedList args) {
    SolrParams p = null;
    if(args!=null) {
      p = SolrParams.toSolrParams(args);
    }
    Map<String,ContentStreamLoader> registry = new HashMap<>();
    registry.put("application/xml", new XMLLoader().init(p) );
    registry.put("application/json", new JsonLoader().init(p) );
    registry.put("application/csv", new CSVLoader().init(p) );
    registry.put("application/javabin", new JavabinLoader().init(p) );
    registry.put("text/csv", registry.get("application/csv") );
    registry.put("text/xml", registry.get("application/xml") );
    registry.put("text/json", registry.get("application/json"));
    pathVsLoaders.put(JSON_PATH,registry.get("application/json"));
    pathVsLoaders.put(DOC_PATH,registry.get("application/json"));
    pathVsLoaders.put(CSV_PATH,registry.get("application/csv"));
    return registry;
  }

Assuming the content type here is application/xml, the lookup returns the XMLLoader, whose load function then handles the request.
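The content-type lookup can be tried out with a small stand-alone sketch. It is an illustration only: LoaderLookupSketch is a hypothetical class, and the map stores loader names as strings instead of real ContentStreamLoader instances.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the loader registry; values are loader names, not Solr objects.
final class LoaderLookupSketch {

    static final Map<String, String> LOADERS = new HashMap<>();
    static {
        LOADERS.put("application/xml", "XMLLoader");
        LOADERS.put("application/json", "JsonLoader");
        LOADERS.put("application/csv", "CSVLoader");
        LOADERS.put("application/javabin", "JavabinLoader");
        // Aliases share the entry of the canonical type.
        LOADERS.put("text/csv", LOADERS.get("application/csv"));
        LOADERS.put("text/xml", LOADERS.get("application/xml"));
        LOADERS.put("text/json", LOADERS.get("application/json"));
    }

    // Drops any parameters after ';' (for example the charset) before the lookup.
    static String loaderFor(String contentType) {
        int idx = contentType.indexOf(';');
        String type = idx > 0 ? contentType.substring(0, idx) : contentType;
        return LOADERS.get(type);   // null when no loader is registered
    }

    public static void main(String[] args) {
        System.out.println(loaderFor("application/xml; charset=UTF-8"));
    }
}
```

A header such as application/xml; charset=UTF-8 therefore resolves to the same loader as plain application/xml.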

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load->XMLLoader::load

  public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
    final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
    InputStream is = stream.getStream();
    XMLStreamReader parser = inputFactory.createXMLStreamReader(is, charset);
    this.processUpdate(req, processor, parser);
  }

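load first obtains the character encoding and the input stream of the data, then creates an XMLStreamReader with createXMLStreamReader, and finally calls processUpdate to handle the request.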

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load->XMLLoader::load->processUpdate

  void processUpdate(SolrQueryRequest req, UpdateRequestProcessor processor, XMLStreamReader parser)
          throws XMLStreamException, IOException, FactoryConfigurationError {
    AddUpdateCommand addCmd = null;
    SolrParams params = req.getParams();
    while (true) {
      int event = parser.next();
      switch (event) {
        case XMLStreamConstants.END_DOCUMENT:
          parser.close();
          return;
        case XMLStreamConstants.START_ELEMENT:
          String currTag = parser.getLocalName();
          if (currTag.equals(UpdateRequestHandler.ADD)) {
            addCmd = new AddUpdateCommand(req);
            addCmd.commitWithin = params.getInt(UpdateParams.COMMIT_WITHIN, -1);
            addCmd.overwrite = params.getBool(UpdateParams.OVERWRITE, true);
            for (int i = 0; i < parser.getAttributeCount(); i++) {
              String attrName = parser.getAttributeLocalName(i);
              String attrVal = parser.getAttributeValue(i);
              if (UpdateRequestHandler.OVERWRITE.equals(attrName)) {
                addCmd.overwrite = StrUtils.parseBoolean(attrVal);
              } else if (UpdateRequestHandler.COMMIT_WITHIN.equals(attrName)) {
                addCmd.commitWithin = Integer.parseInt(attrVal);
              }
            }
          } else if ("doc".equals(currTag)) {
            if(addCmd != null) {
              addCmd.clear();
              addCmd.solrDoc = readDoc(parser);
              processor.processAdd(addCmd);
            }
          } else if (UpdateRequestHandler.COMMIT.equals(currTag) || UpdateRequestHandler.OPTIMIZE.equals(currTag)) {
            ...
          }
          else if (UpdateRequestHandler.ROLLBACK.equals(currTag)) {
            ...
          }
          else if (UpdateRequestHandler.DELETE.equals(currTag)) {
            ...
          }
          break;
      }
    }
  }

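processUpdate iterates over the tags of the request. Assuming the tag is add, it creates an AddUpdateCommand and fills in the relevant parameters from the request. It then keeps reading until it finds a doc tag, parses that document into a SolrInputDocument, stores it in the AddUpdateCommand, and calls the processor's processAdd function to add the document to the Lucene index. The sections below assume the processor chain is, in order, LogUpdateProcessor, DistributedUpdateProcessor and RunUpdateProcessor. Of these, the processAdd function of RunUpdateProcessor performs the actual addition to the Lucene index.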
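The same event-driven reading of an add message can be reproduced with the JDK's StAX API alone. The sketch below is a simplified illustration, not the XMLLoader implementation: AddXmlSketch is a hypothetical class, each document becomes a plain map, a repeated field name overwrites the earlier value, and the attributes of the add tag are ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reader for <add><doc><field name="...">...</field></doc></add>.
final class AddXmlSketch {

    // Returns one map of field name -> value for each <doc> inside an <add>.
    static List<Map<String, String>> parse(String xml) {
        List<Map<String, String>> docs = new ArrayList<>();
        try {
            XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            boolean inAdd = false;
            Map<String, String> current = null;
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = parser.getLocalName();
                    if ("add".equals(tag)) {
                        inAdd = true;
                    } else if ("doc".equals(tag) && inAdd) {
                        current = new LinkedHashMap<>();
                    } else if ("field".equals(tag) && current != null) {
                        String name = parser.getAttributeValue(null, "name");
                        // getElementText consumes the text up to the closing </field>.
                        current.put(name, parser.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "doc".equals(parser.getLocalName()) && current != null) {
                    docs.add(current);   // document complete, hand it over
                    current = null;
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed update XML", e);
        }
        return docs;
    }

    public static void main(String[] args) {
        String xml = "<add overwrite=\"true\"><doc>"
            + "<field name=\"id\">http://example.com/</field>"
            + "<field name=\"title\">Example</field>"
            + "</doc></add>";
        System.out.println(parse(xml));
    }
}
```

The point where the sketch appends a finished map to the list corresponds to the place where processUpdate passes the command to processor.processAdd.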

LogUpdateProcessor::processAdd

    public void processAdd(AddUpdateCommand cmd) throws IOException {
      ...
      if (next != null) next.processAdd(cmd);
      ...
    }

The processAdd function of LogUpdateProcessor does no significant work of its own. Its next is the DistributedUpdateProcessor, so it simply passes the command on to that processor's processAdd function.

DistributedUpdateProcessor::processAdd

  public void processAdd(AddUpdateCommand cmd) throws IOException {
    ...
    versionAdd(cmd);
    ...
  }

  protected boolean versionAdd(AddUpdateCommand cmd) throws IOException {
    ...
    doLocalAdd(cmd);
    ...
  }

The elided part of versionAdd assigns a version to the document. It then calls doLocalAdd, which gets the next processor, RunUpdateProcessor, and calls its processAdd function.

RunUpdateProcessor::processAdd

  public void processAdd(AddUpdateCommand cmd) throws IOException {
    updateHandler.addDoc(cmd);
    super.processAdd(cmd);
  }

updateHandler is a DirectUpdateHandler2, and its addDoc function adds the document to the Lucene index.
The parent class's processAdd then calls processAdd on the next processor, or simply returns if there is none.

RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc

  public int addDoc(AddUpdateCommand cmd) throws IOException {
    return addDoc0(cmd);
  }

  private int addDoc0(AddUpdateCommand cmd) throws IOException {
    ...
    doNormalUpdate(cmd);
    ...
  }

addDoc calls addDoc0, which in turn calls doNormalUpdate to add the document.

RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc->doNormalUpdate

  private void doNormalUpdate(AddUpdateCommand cmd) throws IOException {
    Term updateTerm;
    Term idTerm = new Term(idField.getName(), cmd.getIndexedId());
    boolean del = false;
    updateTerm = idTerm;
    RefCounted<IndexWriter> iw = solrCoreState.getIndexWriter(core);
    IndexWriter writer = iw.get();
    Document luceneDocument = cmd.getLuceneDocument();
    writer.updateDocument(updateTerm, luceneDocument);
  }

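idTerm is usually the document's URL, so a newer document with the same URL replaces the older one. getIndexWriter followed by get ultimately returns the SolrIndexWriter. getLuceneDocument copies the data of the SolrInputDocument into a Lucene Document. The updateDocument function of SolrIndexWriter then writes that document into the Lucene index. Internally it calls the updateDocument function of Lucene's DocumentsWriter.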
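The overwrite behaviour follows from the semantics of updating by a term: documents that match the term are deleted and the new document is added. The toy model below only illustrates that rule. UpdateByIdSketch is a hypothetical in-memory list, not Lucene's IndexWriter, and it assumes the id field is literally named id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory "index"; hypothetical, it only models delete-then-add by id.
final class UpdateByIdSketch {

    private final List<Map<String, String>> docs = new ArrayList<>();

    // Removes every document whose idField equals idValue, then appends the new one.
    void updateDocument(String idField, String idValue, Map<String, String> doc) {
        docs.removeIf(d -> idValue.equals(d.get(idField)));
        docs.add(doc);
    }

    int size() {
        return docs.size();
    }

    // Returns the stored document with the given id, or null if there is none.
    Map<String, String> findById(String idValue) {
        for (Map<String, String> d : docs) {
            if (idValue.equals(d.get("id"))) return d;
        }
        return null;
    }

    // Convenience builder for a two-field document.
    static Map<String, String> doc(String id, String title) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }

    public static void main(String[] args) {
        UpdateByIdSketch index = new UpdateByIdSketch();
        index.updateDocument("id", "http://example.com/", doc("http://example.com/", "first crawl"));
        index.updateDocument("id", "http://example.com/", doc("http://example.com/", "second crawl"));
        System.out.println(index.size() + " " + index.findById("http://example.com/").get("title"));
    }
}
```

Indexing the same URL twice leaves a single entry that holds the later content, which is the effect described for documents sharing an idTerm.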

RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc->doNormalUpdate->AddUpdateCommand::getLuceneDocument

  public Document getLuceneDocument() {
    return DocumentBuilder.toDocument(getSolrInputDocument(), req.getSchema());
  }

  public static Document toDocument( SolrInputDocument doc, IndexSchema schema ){
    Document out = new Document();
    for( SolrInputField field : doc ) {
      String name = field.getName();
      SchemaField sfield = schema.getFieldOrNull(name);
      for( Object v : field ) {
        addField(out, sfield, v, 1f);
        usedFields.add(sfield.getName());
      }
    }
    ...
    return out;
  }

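getFieldOrNull fetches the configuration of the field from the managed-schema file, and addField adds the field to the Lucene document. The loop walks over the fields of the SolrInputDocument and, following the managed-schema configuration, adds each of them to the Lucene Document.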
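The shape of this conversion loop can be shown without any Solr or Lucene types. The following sketch is hypothetical throughout: the schema is reduced to a set of declared field names, the output document is a list of name=value strings, and rejecting an undeclared field with an exception is an assumption of the sketch, not a statement about what the elided Solr code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the field-by-field conversion; no Solr or Lucene classes.
final class ToDocumentSketch {

    // input: field name -> values; schema: names of the declared fields.
    // Returns one "name=value" entry for every value of every field.
    static List<String> toDocument(Map<String, List<String>> input, Set<String> schema) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> field : input.entrySet()) {
            String name = field.getKey();
            if (!schema.contains(name)) {
                // Assumption of this sketch: undeclared fields are rejected.
                throw new IllegalArgumentException("unknown field '" + name + "'");
            }
            for (String v : field.getValue()) {
                out.add(name + "=" + v);   // one entry per value, like one addField call
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("id", Arrays.asList("http://example.com/"));
        input.put("tags", Arrays.asList("a", "b"));
        Set<String> schema = new HashSet<>(Arrays.asList("id", "tags"));
        System.out.println(toDocument(input, schema));
    }
}
```

A multi-valued input field yields one output entry per value, which matches the inner loop over the values of each SolrInputField.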
