Spring Batch Partitioning example
Photo Credit : Spring Source
In Spring Batch, “Partitioning” means using multiple threads, each processing a range of the data. For example, assume a table has 100 records, with primary ids assigned from 1 to 100, and you want to process all 100 of them.
Normally, a single thread processes the records from 1 to 100; suppose this takes 10 minutes to finish.
Single Thread - Process from 1 to 100
With “Partitioning”, we can start 10 threads, each processing 10 records (based on a range of ‘id’). The job may now finish in about 1 minute.
Thread 1 - Process from 1 to 10
Thread 2 - Process from 11 to 20
Thread 3 - Process from 21 to 30
......
Thread 9 - Process from 81 to 90
Thread 10 - Process from 91 to 100
To implement the “Partitioning” technique, you must understand the structure of the input data, so that you can plan the data ranges properly.
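The range planning described above can be sketched as a small standalone helper. This is a minimal sketch (the RangePlanner class and plan method are hypothetical names, not part of the tutorial's code), assuming ids are contiguous from 1 to recordCount; the last range absorbs any remainder when the record count is not evenly divisible by the number of threads:

```java
// Minimal sketch of range planning: split contiguous ids 1..recordCount
// into gridSize [fromId, toId] ranges, one per thread.
class RangePlanner {

    static int[][] plan(int recordCount, int gridSize) {
        int[][] ranges = new int[gridSize][2];
        int size = recordCount / gridSize; // base range size per thread
        int fromId = 1;
        for (int i = 0; i < gridSize; i++) {
            // last range runs to recordCount, absorbing any remainder
            int toId = (i == gridSize - 1) ? recordCount : fromId + size - 1;
            ranges[i][0] = fromId;
            ranges[i][1] = toId;
            fromId = toId + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : plan(100, 10)) {
            System.out.println(r[0] + " - " + r[1]); // 1 - 10, 11 - 20, ... 91 - 100
        }
    }
}
```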
1. Tutorial
In this tutorial, we will show you how to create a “Partitioner” job with 10 threads; each thread reads records from the database, based on the provided range of ‘id’.
Tools and libraries used
- Maven 3
- Eclipse 4.2
- JDK 1.6
- Spring Core 3.2.2.RELEASE
- Spring Batch 2.2.0.RELEASE
- MySQL Java Driver 5.1.25
P.S. Assume the “users” table has 100 records.
id, user_login, user_pass, age
1,user_1,pass_1,20
2,user_2,pass_2,40
3,user_3,pass_3,70
4,user_4,pass_4,5
5,user_5,pass_5,52
......
99,user_99,pass_99,89
100,user_100,pass_100,76
2. Project Directory Structure
Review the final project structure, a standard Maven project.
3. Partitioner
First, create a Partitioner implementation that puts the partitioning range (fromId and toId) into the ExecutionContext. Later, you will reference the same fromId and toId values in the batch-job XML file.
In this case, the partitioning ranges look like the following:
Thread 1 = 1 - 10
Thread 2 = 11 - 20
Thread 3 = 21 - 30
......
Thread 10 = 91 - 100
package com.mkyong.partition;

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class RangePartitioner implements Partitioner {

	@Override
	public Map<String, ExecutionContext> partition(int gridSize) {

		Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();

		int range = 10;
		int fromId = 1;
		int toId = range;

		for (int i = 1; i <= gridSize; i++) {
			ExecutionContext value = new ExecutionContext();

			System.out.println("\nStarting : Thread" + i);
			System.out.println("fromId : " + fromId);
			System.out.println("toId : " + toId);

			value.putInt("fromId", fromId);
			value.putInt("toId", toId);

			// give each thread a name, thread 1,2,3
			value.putString("name", "Thread" + i);

			result.put("partition" + i, value);

			fromId = toId + 1;
			toId += range;
		}

		return result;
	}
}
4. Batch Jobs
Review the batch job XML file; it should be self-explanatory. A few points to highlight:
- For the partitioner, grid-size = number of threads.
- For the pagingItemReader bean (a JDBC reader example), the #{stepExecutionContext[fromId]} and #{stepExecutionContext[toId]} values will be injected via the ExecutionContext prepared in rangePartitioner.
- For the itemProcessor bean, the #{stepExecutionContext[name]} value will be injected via the ExecutionContext prepared in rangePartitioner.
- For the writers, each thread will output its records to a different CSV file, with filename format users.processed[fromId]-[toId].csv.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:batch="http://www.springframework.org/schema/batch"
	xmlns:util="http://www.springframework.org/schema/util"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/batch
		http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
		http://www.springframework.org/schema/beans
		http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
		http://www.springframework.org/schema/util
		http://www.springframework.org/schema/util/spring-util-3.2.xsd">

	<!-- spring batch core settings -->
	<import resource="../config/context.xml" />

	<!-- database settings -->
	<import resource="../config/database.xml" />

	<!-- partitioner job -->
	<job id="partitionJob" xmlns="http://www.springframework.org/schema/batch">

		<!-- master step, 10 threads (grid-size) -->
		<step id="masterStep">
			<partition step="slave" partitioner="rangePartitioner">
				<handler grid-size="10" task-executor="taskExecutor" />
			</partition>
		</step>

	</job>

	<!-- each thread will run this step, with different stepExecutionContext values -->
	<step id="slave" xmlns="http://www.springframework.org/schema/batch">
		<tasklet>
			<chunk reader="pagingItemReader" writer="flatFileItemWriter"
				processor="itemProcessor" commit-interval="1" />
		</tasklet>
	</step>

	<bean id="rangePartitioner" class="com.mkyong.partition.RangePartitioner" />

	<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />

	<!-- inject stepExecutionContext -->
	<bean id="itemProcessor" class="com.mkyong.processor.UserProcessor" scope="step">
		<property name="threadName" value="#{stepExecutionContext[name]}" />
	</bean>

	<bean id="pagingItemReader"
		class="org.springframework.batch.item.database.JdbcPagingItemReader"
		scope="step">
		<property name="dataSource" ref="dataSource" />
		<property name="queryProvider">
			<bean
				class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
				<property name="dataSource" ref="dataSource" />
				<property name="selectClause" value="select id, user_login, user_pass, age" />
				<property name="fromClause" value="from users" />
				<property name="whereClause" value="where id &gt;= :fromId and id &lt;= :toId" />
				<property name="sortKey" value="id" />
			</bean>
		</property>
		<!-- Inject via the ExecutionContext in rangePartitioner -->
		<property name="parameterValues">
			<map>
				<entry key="fromId" value="#{stepExecutionContext[fromId]}" />
				<entry key="toId" value="#{stepExecutionContext[toId]}" />
			</map>
		</property>
		<property name="pageSize" value="10" />
		<property name="rowMapper">
			<bean class="com.mkyong.UserRowMapper" />
		</property>
	</bean>

	<!-- csv file writer -->
	<bean id="flatFileItemWriter"
		class="org.springframework.batch.item.file.FlatFileItemWriter"
		scope="step">
		<property name="resource"
			value="file:csv/outputs/users.processed#{stepExecutionContext[fromId]}-#{stepExecutionContext[toId]}.csv" />
		<property name="appendAllowed" value="false" />
		<property name="lineAggregator">
			<bean
				class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
				<property name="delimiter" value="," />
				<property name="fieldExtractor">
					<bean
						class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
						<property name="names" value="id, username, password, age" />
					</bean>
				</property>
			</bean>
		</property>
	</bean>

</beans>
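As a quick sanity check of the writer's resource pattern above, each partition's fromId/toId pair maps to a distinct output file name. This is a hypothetical helper (outputName is not part of the tutorial's code) that mirrors the string the SpEL expression produces, outside of Spring:

```java
// Hypothetical helper mirroring the FlatFileItemWriter resource pattern:
// file:csv/outputs/users.processed#{fromId}-#{toId}.csv (path prefix omitted)
class FileNameCheck {

    static String outputName(int fromId, int toId) {
        return "csv/outputs/users.processed" + fromId + "-" + toId + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(outputName(1, 10));   // csv/outputs/users.processed1-10.csv
        System.out.println(outputName(91, 100)); // csv/outputs/users.processed91-100.csv
    }
}
```

Because each thread writes to its own file, the threads never contend for the same output resource.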
The item processor class simply prints the item being processed and the current running “thread name”.
package com.mkyong.processor;

import org.springframework.batch.item.ItemProcessor;
import com.mkyong.User;

public class UserProcessor implements ItemProcessor<User, User> {

	private String threadName;

	@Override
	public User process(User item) throws Exception {

		System.out.println(threadName + " processing : "
			+ item.getId() + " : " + item.getUsername());

		return item;
	}

	public String getThreadName() {
		return threadName;
	}

	public void setThreadName(String threadName) {
		this.threadName = threadName;
	}
}
5. Run It
Load everything and run it... 10 threads will be started to process the provided ranges of data.
package com.mkyong;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class PartitionApp {

	public static void main(String[] args) {
		PartitionApp obj = new PartitionApp();
		obj.runTest();
	}

	private void runTest() {

		String[] springConfig = { "spring/batch/jobs/job-partitioner.xml" };

		ApplicationContext context = new ClassPathXmlApplicationContext(springConfig);

		JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
		Job job = (Job) context.getBean("partitionJob");

		try {
			JobExecution execution = jobLauncher.run(job, new JobParameters());
			System.out.println("Exit Status : " + execution.getStatus());
			System.out.println("Exit Status : " + execution.getAllFailureExceptions());
		} catch (Exception e) {
			e.printStackTrace();
		}

		System.out.println("Done");
	}
}
Console output
Starting : Thread1
fromId : 1
toId : 10

Starting : Thread2
fromId : 11
toId : 20

Starting : Thread3
fromId : 21
toId : 30

Starting : Thread4
fromId : 31
toId : 40

Starting : Thread5
fromId : 41
toId : 50

Starting : Thread6
fromId : 51
toId : 60

Starting : Thread7
fromId : 61
toId : 70

Starting : Thread8
fromId : 71
toId : 80

Starting : Thread9
fromId : 81
toId : 90

Starting : Thread10
fromId : 91
toId : 100

Thread8 processing : 71 : user_71
Thread2 processing : 11 : user_11
Thread3 processing : 21 : user_21
Thread10 processing : 91 : user_91
Thread4 processing : 31 : user_31
Thread6 processing : 51 : user_51
Thread5 processing : 41 : user_41
Thread1 processing : 1 : user_1
Thread9 processing : 81 : user_81
Thread7 processing : 61 : user_61
Thread2 processing : 12 : user_12
Thread7 processing : 62 : user_62
Thread6 processing : 52 : user_52
Thread1 processing : 2 : user_2
Thread9 processing : 82 : user_82
......
After the process is completed, 10 CSV files will be created.
1,user_1,pass_1,20
2,user_2,pass_2,40
3,user_3,pass_3,70
4,user_4,pass_4,5
5,user_5,pass_5,52
6,user_6,pass_6,69
7,user_7,pass_7,48
8,user_8,pass_8,34
9,user_9,pass_9,62
10,user_10,pass_10,21
6. Misc
6.1 Alternatively, you can inject #{stepExecutionContext[name]} via annotation.
package com.mkyong.processor;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;
import com.mkyong.User;

@Component("itemProcessor")
@Scope(value = "step")
public class UserProcessor implements ItemProcessor<User, User> {

	@Value("#{stepExecutionContext[name]}")
	private String threadName;

	@Override
	public User process(User item) throws Exception {

		System.out.println(threadName + " processing : "
			+ item.getId() + " : " + item.getUsername());

		return item;
	}
}
Remember to enable Spring component auto-scanning:
<context:component-scan base-package="com.mkyong" />
6.2 Database partitioner reader – MongoDB example.
<bean id="mongoItemReader"
	class="org.springframework.batch.item.data.MongoItemReader"
	scope="step">
	<property name="template" ref="mongoTemplate" />
	<property name="targetType" value="com.mkyong.User" />
	<property name="query"
		value="{ 'id' : { $gt : #{stepExecutionContext[fromId]}, $lte : #{stepExecutionContext[toId]} } }" />
	<property name="sort">
		<util:map id="sort">
			<entry key="id" value="" />
		</util:map>
	</property>
</bean>
Done.