solr-6.1.0 Source Code Analysis (4)
Source: Internet  Editor: 程序博客网  Time: 2024/06/08 17:45
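solr-6.1.0 Source Code Analysis: Creating the Index
This chapter assumes that a Nutch client sends an HTTP request through the solrindex command, invoked as follows: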
bin/nutch solrindex http://127.0.0.1/solr/test crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments/ -filter -normalize
In the request URL, test is the name of the Solr core. The request parameters are:
wt=javabin&version=2
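After the request reaches the server, it passes through the Tomcat framework, which calls SolrDispatchFilter's doFilter function, which in turn calls HttpSolrCall's call function.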
SolrDispatchFilter::doFilter->HttpSolrCall::call
public Action call() throws IOException {
  ...
  init();
  HttpServletResponse resp = response;
  switch (action) {
    case ADMIN:
      ...
    case REMOTEQUERY:
      ...
    case PROCESS:
      final Method reqMethod = Method.getMethod(req.getMethod());
      SolrQueryResponse solrRsp = new SolrQueryResponse();
      execute(solrRsp);
      QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
      writeResponse(solrRsp, responseWriter, reqMethod);
      return RETURN;
  }
}
The init function prepares for request handling, and getMethod obtains the request method, such as GET or POST. The call function creates a SolrQueryResponse to hold the response data. The execute function invokes the matching handler and contains the main request-processing logic. getQueryResponseWriter uses the request parameter wt=javabin to obtain a BinaryResponseWriter, and writeResponse then returns the result to the client.
SolrDispatchFilter::doFilter->HttpSolrCall::call->init
private void init() throws Exception {
  corename = path.substring(1, idx);
  core = cores.getCore(corename);
  path = path.substring(idx);
  config = core.getSolrConfig();
  SolrRequestParsers parser = config.getRequestParsers();
  extractHandlerFromURLPath(parser);
  solrReq = parser.parse(core, path, req);
  action = PROCESS;
  return;
}
getCore returns the SolrCore that was created in SolrDispatchFilter's init function. getSolrConfig returns the Solr configuration object, SolrConfig, and getRequestParsers then returns the request parser, SolrRequestParsers. extractHandlerFromURLPath finds the handler class for the request path. The path here is /update, so the result is UpdateRequestHandler. Finally, SolrRequestParsers' parse function creates a SolrQueryRequest that wraps the HttpServletRequest.
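The path handling in init can be illustrated with a minimal sketch. It assumes the servlet path has the form /core/handler (here /test/update) and that idx is the position of the second slash, which the excerpt above does not show. The class PathSplitSketch and its handler map are hypothetical stand-ins, not Solr code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the path handling in HttpSolrCall.init; not Solr code.
class PathSplitSketch {
    // Returns {coreName, handlerPath} for a servlet path such as "/test/update".
    static String[] split(String path) {
        int idx = path.indexOf('/', 1);          // assumed: the second slash ends the core name
        if (idx < 0) {
            return new String[] { path.substring(1), "" };
        }
        String coreName = path.substring(1, idx); // "test"
        String handlerPath = path.substring(idx); // "/update"
        return new String[] { coreName, handlerPath };
    }

    public static void main(String[] args) {
        // Illustrative registry mapping handler paths to handler names.
        Map<String, String> handlers = new HashMap<>();
        handlers.put("/update", "UpdateRequestHandler");

        String[] parts = split("/test/update");
        System.out.println(parts[0] + " -> " + handlers.get(parts[1]));
    }
}
```

The real method goes on to look up the core and its request parsers. The sketch only shows how one path yields both the core name and the handler path.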
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute
protected void execute(SolrQueryResponse rsp) {
  solrReq.getContext().put("webapp", req.getContextPath());
  solrReq.getCore().execute(handler, solrReq, rsp);
}

// SolrCore
public void execute(SolrRequestHandler handler, SolrQueryRequest req, SolrQueryResponse rsp) {
  ...
  handler.handleRequest(req,rsp);
  ...
}
HttpSolrCall's execute calls SolrCore's execute function, whose main job is to call handleRequest on the SolrRequestHandler mapped to the request path. This chapter assumes that handler is UpdateRequestHandler, so its handleRequest function is examined next.
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest
public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
  ...
  handleRequestBody( req, rsp );
  ...
}

public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
  SolrParams params = req.getParams();
  UpdateRequestProcessorChain processorChain = req.getCore().getUpdateProcessorChain(params);
  UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);
  ContentStreamLoader documentLoader = newLoader(req, processor);
  Iterable<ContentStream> streams = req.getContentStreams();
  for (ContentStream stream : streams) {
    documentLoader.load(req, rsp, stream, processor);
  }
}
handleRequest mainly delegates to handleRequestBody. getUpdateProcessorChain returns the UpdateRequestProcessorChain created in the SolrCore constructor. createProcessor creates the processor for this request. newLoader returns a ContentStreamLoader. Finally, the ContentStreamLoader's load function is called on each content stream to process the request.
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->UpdateRequestProcessorChain::createProcessor
public UpdateRequestProcessor createProcessor(SolrQueryRequest req, SolrQueryResponse rsp) {
  UpdateRequestProcessor processor = null;
  UpdateRequestProcessor last = null;
  // Walk the factory list from the tail so each processor receives its successor.
  for (int i = chain.size() - 1; i >= 0; i--) {
    UpdateRequestProcessorFactory factory = chain.get(i);
    processor = factory.getInstance(req, rsp, last);
    last = processor == null ? last : processor;
  }
  return last;
}
The default chain list contains LogUpdateProcessorFactory, DistributedUpdateProcessorFactory and RunUpdateProcessorFactory. Because the loop walks the list from the tail, the factories are invoked in the order Run, Distributed, Log, and each one receives the previously created processor as its next processor. RunUpdateProcessorFactory's getInstance returns a RunUpdateProcessor, DistributedUpdateProcessorFactory's getInstance returns a DistributedUpdateProcessor, and LogUpdateProcessorFactory's getInstance returns a LogUpdateProcessor.
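The reverse iteration is easier to follow with a small model. This is a minimal sketch, assuming simplified Processor and Factory types that are hypothetical and only mirror the shape of getInstance(req, rsp, next). It shows that building from the tail yields a head processor whose calls run in list order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of UpdateRequestProcessorChain.createProcessor; not Solr code.
class ChainSketch {
    // Minimal processor: records its name, then delegates to the next one.
    static class Processor {
        final String name;
        final Processor next;
        Processor(String name, Processor next) { this.name = name; this.next = next; }
        void processAdd(List<String> trace) {
            trace.add(name);
            if (next != null) next.processAdd(trace);
        }
    }

    // Simplified factory: the real getInstance also takes the request and response.
    interface Factory { Processor getInstance(Processor next); }

    static Processor createProcessor(List<Factory> chain) {
        Processor last = null;
        for (int i = chain.size() - 1; i >= 0; i--) {
            Processor p = chain.get(i).getInstance(last);
            last = (p == null) ? last : p;   // a factory returning null is skipped
        }
        return last;                          // head of the linked processors
    }

    // Builds the three-stage chain and returns the order in which stages run.
    static List<String> run() {
        List<Factory> chain = new ArrayList<>();
        chain.add(next -> new Processor("Log", next));
        chain.add(next -> new Processor("Distributed", next));
        chain.add(next -> new Processor("Run", next));
        List<String> trace = new ArrayList<>();
        createProcessor(chain).processAdd(trace);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Log, Distributed, Run]
    }
}
```

Building from the tail lets every processor be constructed with an already existing successor, so the next reference can be final and no second wiring pass is needed.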
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load
public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
  type = stream.getContentType();
  // Drop parameters such as "; charset=UTF-8" before the lookup.
  int idx = type.indexOf(';');
  if (idx > 0) {
    type = type.substring(0, idx);
  }
  loader = loaders.get(type);
  loader.load(req, rsp, stream, processor);
}
The loader is chosen by content type. The loaders map is initialized as follows:
protected Map<String,ContentStreamLoader> createDefaultLoaders(NamedList args) {
  SolrParams p = null;
  if(args!=null) {
    p = SolrParams.toSolrParams(args);
  }
  Map<String,ContentStreamLoader> registry = new HashMap<>();
  registry.put("application/xml", new XMLLoader().init(p) );
  registry.put("application/json", new JsonLoader().init(p) );
  registry.put("application/csv", new CSVLoader().init(p) );
  registry.put("application/javabin", new JavabinLoader().init(p) );
  // The text/* types share the loader instances registered above.
  registry.put("text/csv", registry.get("application/csv") );
  registry.put("text/xml", registry.get("application/xml") );
  registry.put("text/json", registry.get("application/json"));
  pathVsLoaders.put(JSON_PATH,registry.get("application/json"));
  pathVsLoaders.put(DOC_PATH,registry.get("application/json"));
  pathVsLoaders.put(CSV_PATH,registry.get("application/csv"));
  return registry;
}
Assuming the content type here is application/xml, the lookup yields an XMLLoader, whose load function is then called to process the request.
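The lookup can be reduced to a few lines. This is a minimal sketch, assuming loader names stand in for the loader objects. The class LoaderLookupSketch is hypothetical. Returning null for an unregistered type is a simplification of this sketch and says nothing about how the real handler reacts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the content-type lookup; loader names stand in for loader objects.
class LoaderLookupSketch {
    static final Map<String, String> LOADERS = new HashMap<>();
    static {
        LOADERS.put("application/xml", "XMLLoader");
        LOADERS.put("application/json", "JsonLoader");
        LOADERS.put("application/csv", "CSVLoader");
        LOADERS.put("application/javabin", "JavabinLoader");
        // Aliases reuse the entries registered above.
        LOADERS.put("text/csv", LOADERS.get("application/csv"));
        LOADERS.put("text/xml", LOADERS.get("application/xml"));
        LOADERS.put("text/json", LOADERS.get("application/json"));
    }

    // Strips parameters such as "; charset=UTF-8", then looks up the loader name.
    static String pick(String contentType) {
        int idx = contentType.indexOf(';');
        String type = (idx > 0) ? contentType.substring(0, idx) : contentType;
        return LOADERS.get(type); // null when no loader is registered (sketch behaviour)
    }

    public static void main(String[] args) {
        System.out.println(pick("application/xml; charset=UTF-8")); // prints XMLLoader
    }
}
```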
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load->XMLLoader::load
public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
  final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
  InputStream is = stream.getStream();
  XMLStreamReader parser = inputFactory.createXMLStreamReader(is, charset);
  this.processUpdate(req, processor, parser);
}
The function first obtains the charset and the input stream of the data, then creates an XMLStreamReader through createXMLStreamReader, and finally calls processUpdate to process the request.
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load->XMLLoader::load->processUpdate
void processUpdate(SolrQueryRequest req, UpdateRequestProcessor processor, XMLStreamReader parser)
    throws XMLStreamException, IOException, FactoryConfigurationError {
  AddUpdateCommand addCmd = null;
  SolrParams params = req.getParams();
  while (true) {
    int event = parser.next();
    switch (event) {
      case XMLStreamConstants.END_DOCUMENT:
        parser.close();
        return;
      case XMLStreamConstants.START_ELEMENT:
        String currTag = parser.getLocalName();
        if (currTag.equals(UpdateRequestHandler.ADD)) {
          addCmd = new AddUpdateCommand(req);
          // Defaults come from the request parameters; attributes on <add> override them.
          addCmd.commitWithin = params.getInt(UpdateParams.COMMIT_WITHIN, -1);
          addCmd.overwrite = params.getBool(UpdateParams.OVERWRITE, true);
          for (int i = 0; i < parser.getAttributeCount(); i++) {
            String attrName = parser.getAttributeLocalName(i);
            String attrVal = parser.getAttributeValue(i);
            if (UpdateRequestHandler.OVERWRITE.equals(attrName)) {
              addCmd.overwrite = StrUtils.parseBoolean(attrVal);
            } else if (UpdateRequestHandler.COMMIT_WITHIN.equals(attrName)) {
              addCmd.commitWithin = Integer.parseInt(attrVal);
            }
          }
        } else if ("doc".equals(currTag)) {
          if(addCmd != null) {
            addCmd.clear();
            addCmd.solrDoc = readDoc(parser);
            processor.processAdd(addCmd);
          }
        } else if (UpdateRequestHandler.COMMIT.equals(currTag) || UpdateRequestHandler.OPTIMIZE.equals(currTag)) {
          ...
        } else if (UpdateRequestHandler.ROLLBACK.equals(currTag)) {
          ...
        } else if (UpdateRequestHandler.DELETE.equals(currTag)) {
          ...
        }
        break;
    }
  }
}
processUpdate walks through the tags of the request. Assuming the tag is add, it creates an AddUpdateCommand and fills in the related parameters from the request. It then continues through the request until it finds a doc tag, parses the document into a SolrInputDocument, stores it in the AddUpdateCommand, and calls the processor's processAdd function to add the document to the Lucene index. The processor chain is assumed below to be LogUpdateProcessor, DistributedUpdateProcessor and RunUpdateProcessor, in that order. Of these, RunUpdateProcessor's processAdd function does the actual work of adding to the Lucene index.
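The event loop above can be tried out with the JDK's own StAX parser. This is a minimal sketch, assuming an update message of the form add/doc/field with a name attribute on each field. The class AddXmlSketch is hypothetical, and a plain map stands in for SolrInputDocument, so a repeated field name simply overwrites the earlier value here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical, simplified counterpart of XMLLoader.processUpdate; not Solr code.
class AddXmlSketch {
    // Parses <add><doc><field name="..">value</field></doc></add> into one map per doc.
    static List<Map<String, String>> parse(String xml) {
        List<Map<String, String>> docs = new ArrayList<>();
        try {
            XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            boolean inAdd = false;              // plays the role of addCmd != null
            Map<String, String> current = null; // the document being read
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = parser.getLocalName();
                    if ("add".equals(tag)) {
                        inAdd = true;
                    } else if ("doc".equals(tag) && inAdd) {
                        current = new LinkedHashMap<>();
                    } else if ("field".equals(tag) && current != null) {
                        String name = parser.getAttributeValue(null, "name");
                        current.put(name, parser.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "doc".equals(parser.getLocalName()) && current != null) {
                    docs.add(current);          // here the real loader calls processAdd
                    current = null;
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed update XML", e);
        }
        return docs;
    }

    public static void main(String[] args) {
        String xml = "<add><doc><field name=\"id\">http://example.com/</field>"
                   + "<field name=\"title\">Example</field></doc></add>";
        System.out.println(parse(xml));
    }
}
```

A pull parser suits this job because each document can be handed to the processor as soon as its closing tag is seen, without first loading the whole request into memory.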
LogUpdateProcessor::processAdd
public void processAdd(AddUpdateCommand cmd) throws IOException {
  ...
  if (next != null) next.processAdd(cmd);
  ...
}
LogUpdateProcessor's processAdd function does no significant work of its own. Its next is the DistributedUpdateProcessor, so it goes on to call DistributedUpdateProcessor's processAdd function.
DistributedUpdateProcessor::processAdd
public void processAdd(AddUpdateCommand cmd) throws IOException {
  ...
  versionAdd(cmd);
  ...
}

protected boolean versionAdd(AddUpdateCommand cmd) throws IOException {
  ...
  doLocalAdd(cmd);
  ...
}
The elided part of versionAdd assigns a version to the document. versionAdd then calls doLocalAdd, which passes the command on to the next processor, RunUpdateProcessor, by calling its processAdd function.
RunUpdateProcessor::processAdd
public void processAdd(AddUpdateCommand cmd) throws IOException {
  updateHandler.addDoc(cmd);
  super.processAdd(cmd);
}
updateHandler is a DirectUpdateHandler2. Its addDoc function adds the document to the Lucene index.
The superclass's processAdd then calls processAdd on the next processor, or simply returns if there is none.
RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc
public int addDoc(AddUpdateCommand cmd) throws IOException {
  return addDoc0(cmd);
}

private int addDoc0(AddUpdateCommand cmd) throws IOException {
  ...
  doNormalUpdate(cmd);
  ...
}
addDoc calls addDoc0, which in turn calls doNormalUpdate to add the document.
RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc->doNormalUpdate
private void doNormalUpdate(AddUpdateCommand cmd) throws IOException {
  Term updateTerm;
  Term idTerm = new Term(idField.getName(), cmd.getIndexedId());
  boolean del = false;
  updateTerm = idTerm;
  RefCounted<IndexWriter> iw = solrCoreState.getIndexWriter(core);
  IndexWriter writer = iw.get();
  Document luceneDocument = cmd.getLuceneDocument();
  writer.updateDocument(updateTerm, luceneDocument);
}
idTerm is usually set to the URL, so a later document with the same URL overwrites the earlier one. The getIndexWriter and get calls eventually yield a SolrIndexWriter. getLuceneDocument copies the data of the SolrInputDocument into a Lucene Document. SolrIndexWriter's updateDocument function then writes the document to the Lucene index. Internally it calls the updateDocument function of Lucene's DocumentsWriter.
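The overwrite behaviour follows from what updateDocument does: it first deletes the documents that contain the given term and then adds the new one. This is a minimal in-memory sketch of that semantics, assuming the id is the page URL as in the Nutch setup. The class UpdateByIdSketch and its methods are hypothetical and do not touch Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of "delete by id term, then add"; not Lucene code.
class UpdateByIdSketch {
    static class Doc {
        final String id;
        final String body;
        Doc(String id, String body) { this.id = id; this.body = body; }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Models writer.updateDocument(idTerm, doc): remove every match, then append.
    void updateDocument(String id, String body) {
        docs.removeIf(d -> d.id.equals(id));
        docs.add(new Doc(id, body));
    }

    int size() { return docs.size(); }

    // Returns the stored body for an id, or null when the id is absent.
    String bodyOf(String id) {
        for (Doc d : docs) {
            if (d.id.equals(id)) return d.body;
        }
        return null;
    }

    public static void main(String[] args) {
        UpdateByIdSketch index = new UpdateByIdSketch();
        index.updateDocument("http://example.com/", "first crawl");
        index.updateDocument("http://example.com/", "second crawl");
        System.out.println(index.size() + " " + index.bodyOf("http://example.com/"));
        // prints 1 second crawl
    }
}
```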
RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc->doNormalUpdate->AddUpdateCommand::getLuceneDocument
public Document getLuceneDocument() {
  return DocumentBuilder.toDocument(getSolrInputDocument(), req.getSchema());
}

// DocumentBuilder
public static Document toDocument( SolrInputDocument doc, IndexSchema schema ){
  Document out = new Document();
  for( SolrInputField field : doc ) {
    String name = field.getName();
    SchemaField sfield = schema.getFieldOrNull(name);
    for( Object v : field ) {
      addField(out, sfield, v, 1f);
      usedFields.add(sfield.getName());
    }
  }
  ...
  return out;
}
getFieldOrNull looks up the configuration of the field in the managed-schema file. addField adds the field to the Lucene document. The function iterates over the fields of the SolrInputDocument in order and, following the configuration in managed-schema, adds each of them to the Lucene Document.
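The conversion loop can be modelled without Solr or Lucene types. This is a minimal sketch, assuming a schema reduced to a map from field name to type name and an input document reduced to a map from field name to values. The class ToDocumentSketch is hypothetical, and rejecting unknown fields with an exception is a choice of this sketch, since the excerpt elides how a null schema field is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of DocumentBuilder.toDocument using plain collections.
class ToDocumentSketch {
    // Emits one "name:type=value" entry per input value, in input order.
    static List<String> toDocument(Map<String, List<Object>> input, Map<String, String> schema) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Object>> field : input.entrySet()) {
            String name = field.getKey();
            String type = schema.get(name);        // plays the role of getFieldOrNull
            if (type == null) {
                // Sketch-only policy for fields missing from the schema.
                throw new IllegalArgumentException("unknown field '" + name + "'");
            }
            for (Object v : field.getValue()) {    // multi-valued fields add several entries
                out.add(name + ":" + type + "=" + v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "string");
        schema.put("anchor", "text");

        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("id", Arrays.<Object>asList("http://example.com/"));
        doc.put("anchor", Arrays.<Object>asList("home", "start"));

        System.out.println(toDocument(doc, schema));
    }
}
```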