presto的QueryExecution的start方法
来源:互联网 发布:mac win系统截图快捷键 编辑:程序博客网 时间:2024/06/06 09:48
presto的QueryExecution的start方法
标签(空格分隔): presto
- presto的QueryExecution的start方法
- 在queryExecution中start方法代码如下
- analyzeQuery方法分析
- planDistributionplan方法分析
- Task创建和提交
1 在queryExecution中,start方法代码如下:
public void start() { try (SetThreadName ignored = new SetThreadName("Query-%s", stateMachine.getQueryId())) { try { // transition to planning if (!stateMachine.transitionToPlanning()) { // query already started or finished return; } // analyze query PlanRoot plan = analyzeQuery();//查询分析,随后详细分析 // plan distribution of query planDistribution(plan);//计划query的分发,随后详细分析 // transition to starting if (!stateMachine.transitionToStarting()) { // query already started or finished return; } // if query is not finished, start the scheduler, otherwise cancel it SqlQueryScheduler scheduler = queryScheduler.get(); if (!stateMachine.isDone()) { scheduler.start(); } } catch (Throwable e) { fail(e); Throwables.propagateIfInstanceOf(e, Error.class); } } }
analyzeQuery()方法分析
planDistribution(plan)方法分析
- 最重要的代码块,创建调度器,这其中会创建多个stage:
// build the stage execution objects (this doesn't schedule execution) SqlQueryScheduler scheduler = new SqlQueryScheduler( stateMachine, locationFactory, outputStageExecutionPlan, nodePartitioningManager, nodeScheduler, remoteTaskFactory, stateMachine.getSession(), plan.isSummarizeTaskInfos(), scheduleSplitBatchSize, queryExecutor, rootOutputBuffers, nodeTaskMap, executionPolicy);
创建stage:
List<SqlStageExecution> stages = createStages( Optional.empty(), new AtomicInteger(), locationFactory, plan.withBucketToPartition(Optional.of(new int[1])), nodeScheduler, remoteTaskFactory, session, splitBatchSize, partitioningHandle -> partitioningCache.computeIfAbsent(partitioningHandle, handle -> nodePartitioningManager.getNodePartitioningMap(session, handle)), executor, nodeTaskMap, stageSchedulers, stageLinkages);
2.Task创建和提交
1) TaskResource接收到http请求后,会调用createOrUpdateTask方法,调用栈如下:
代码如下:
@POST @Path("{taskId}") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response createOrUpdateTask(@PathParam("taskId") TaskId taskId, TaskUpdateRequest taskUpdateRequest, @Context UriInfo uriInfo) { requireNonNull(taskUpdateRequest, "taskUpdateRequest is null"); Session session = taskUpdateRequest.getSession().toSession(sessionPropertyManager); TaskInfo taskInfo = taskManager.updateTask(session, taskId, taskUpdateRequest.getFragment(),//原始sql解析后的子sql,由协调节点下发而来 taskUpdateRequest.getSources(),//task的数据源,是个List<TaskSource>,说明一个task对应了多个split要进行处理 taskUpdateRequest.getOutputIds()); if (shouldSummarize(uriInfo)) { taskInfo = taskInfo.summarize(); } return Response.ok().entity(taskInfo).build(); }
2) sqlTaskManager管理多个task,没一个task对应一个SqlTask对象和一个taskid,在构造SqlTaskManager时全部创建好。调用sqlTaskManager的updateTask的代码如下:
@Override public TaskInfo updateTask(Session session, TaskId taskId, Optional<PlanFragment> fragment, List<TaskSource> sources, OutputBuffers outputBuffers) { requireNonNull(session, "session is null"); requireNonNull(taskId, "taskId is null"); requireNonNull(fragment, "fragment is null"); requireNonNull(sources, "sources is null"); requireNonNull(outputBuffers, "outputBuffers is null"); if (resourceOvercommit(session)) { // TODO: This should have been done when the QueryContext was created. However, the session isn't available at that point. queryContexts.getUnchecked(taskId.getQueryId()).setResourceOvercommit(); } SqlTask sqlTask = tasks.getUnchecked(taskId); sqlTask.recordHeartbeat(); return sqlTask.updateTask(session, fragment, sources, outputBuffers); }
3) SqlTask中调用updateTask方法:
public TaskInfo updateTask(Session session, Optional<PlanFragment> fragment, List<TaskSource> sources, OutputBuffers outputBuffers) { try { // The LazyOutput buffer does not support write methods, so the actual // output buffer must be established before drivers are created (e.g. // a VALUES query). outputBuffer.setOutputBuffers(outputBuffers); // assure the task execution is only created once SqlTaskExecution taskExecution; synchronized (this) { // is task already complete? TaskHolder taskHolder = taskHolderReference.get(); if (taskHolder.isFinished()) { return taskHolder.getFinalTaskInfo(); } taskExecution = taskHolder.getTaskExecution(); if (taskExecution == null) { checkState(fragment.isPresent(), "fragment must be present"); taskExecution = sqlTaskExecutionFactory.create(session, queryContext, taskStateMachine, outputBuffer, fragment.get(), sources);//一个task对应多个split源,多个split源最终由TaskExecution对象完成执行 taskHolderReference.compareAndSet(taskHolder, new TaskHolder(taskExecution)); needsPlan.set(false); } } if (taskExecution != null) { taskExecution.addSources(sources); } }
4) 创建SqlTaskExecution,SqlTaskExecution的构造函数如下:
创建多个driverFactory,每个对应处理一个split
LocalExecutionPlan localExecutionPlan = planner.plan( taskContext.getSession(), fragment.getRoot(), fragment.getSymbols(), fragment.getPartitioningScheme(), outputBuffer); driverFactories = localExecutionPlan.getDriverFactories();
需要重点分析plan方法,该方法属于LocalExecutionPlanner,返回一个本地执行计划LocalExecutionPlan对象,方法源码如下:
public LocalExecutionPlan plan( Session session, PlanNode plan,//从协调节点发送过来的fragment中获取,这里是outputNode, Map<Symbol, Type> types,//fragment中的symbols,记录了相关的字段和字段类型 PartitioningScheme partitioningScheme, //来自fragment,具体什么作用还不清楚 OutputBuffer outputBuffer) //用于构造输出 { List<Symbol> outputLayout = partitioningScheme.getOutputLayout(); if (partitioningScheme.getPartitioning().getHandle().equals(FIXED_BROADCAST_DISTRIBUTION) || partitioningScheme.getPartitioning().getHandle().equals(SINGLE_DISTRIBUTION) || partitioningScheme.getPartitioning().getHandle().equals(COORDINATOR_DISTRIBUTION)) { //什么情况下走这个流程是不是很清楚, return plan(session, plan, outputLayout, types, new TaskOutputFactory(outputBuffer)); } }
如果不走上边的流程,最后调用私有的plan方法,代码如下:
return plan( session, plan, outputLayout, types, new PartitionedOutputFactory(partitionFunction, partitionChannels, partitionConstants, nullChannel, outputBuffer, maxPagePartitioningBufferSize));
LocalExecutionPlanner的plan方法代码如下:
public LocalExecutionPlan plan(Session session, PlanNode plan, List<Symbol> outputLayout, Map<Symbol, Type> types, OutputFactory outputOperatorFactory) { LocalExecutionPlanContext context = new LocalExecutionPlanContext(session, types); PhysicalOperation physicalOperation = plan.accept(new Visitor(session), context); Function<Page, Page> pagePreprocessor = enforceLayoutProcessor(outputLayout, physicalOperation.getLayout()); List<Type> outputTypes = outputLayout.stream() .map(types::get) .collect(toImmutableList()); //从输入的types参数中获取信息,转化成输出需要的类型 DriverFactory driverFactory = new DriverFactory( context.isInputDriver(), true, ImmutableList.<OperatorFactory>builder() .addAll(physicalOperation.getOperatorFactories()) .add(outputOperatorFactory.createOutputOperator(context.getNextOperatorId(), plan.getId(), outputTypes, pagePreprocessor)) .build(), context.getDriverInstanceCount()); context.addDriverFactory(driverFactory); addLookupOuterDrivers(context); // notify operator factories that planning has completed context.getDriverFactories().stream() .map(DriverFactory::getOperatorFactories) .flatMap(List::stream) .filter(LocalPlannerAware.class::isInstance) .map(LocalPlannerAware.class::cast) .forEach(LocalPlannerAware::localPlannerComplete); return new LocalExecutionPlan(context.getDriverFactories()); }
阅读全文
0 0
- presto的QueryExecution的start方法
- presto查询处理流程(queryexecution提交)
- presto查询处理流程(queryexecution提交)
- Presto学习-presto的安装
- 深入理解Presto(1) : Presto的架构
- 配置presto的过程
- Thread的start方法。
- Presto Benchmark Driver 的使用
- Activity的静态start方法
- Presto:Facebook的分布式SQL查询引擎
- Presto:Facebook的分布式SQL查询引擎
- Facebook Presto 0.100&0.118的安装
- Presto中Queue的使用总结
- Presto:Facebook的分布式SQL查询引擎
- presto源码分析(hive的分区处理)
- presto对orc文件的读取
- Superset 连接 Presto 的正确姿势
- Thread的run方法和start方法
- 欢迎使用CSDN-markdown编辑器
- Python学习(十九)——CSV文件读写
- PHP 自动载入,实例化对象时自动include类文件(spl_autoload_register)
- 类型转换
- Python的numpy中的 broadcasting(广播)机制
- presto的QueryExecution的start方法
- zabbix 如何在problem视图新增一列
- linux查看端口号以及关闭端口号
- 20171202练习题组
- C++/C++11中命名空间(namespace)的使用
- 7-50 猴子选大王(20 分)
- PHP约瑟夫环问题
- python基础1:认识python和基础知识
- python与cgi