Parallel Processing and Multi-Core Utilization with Java
Source: Internet. Editor: 程序博客网. Time: 2024/05/22 07:53
Original article: http://embarcaderos.net/2011/01/23/parallel-processing-and-multi-core-utilization-with-java/
To harvest the full power of a multi-core processor, a software application must be able to execute tasks in parallel, utilizing all available CPUs. Parallelizing a process consists of breaking up a large single process into multiple smaller tasks that can run in parallel and whose results, once finished, can be combined, obtaining an overall improvement in performance. The result is the execution of a single task or process by multiple processors or CPUs: “parallel processing”, not to be confused with concurrency.
In this article I look into the Java 6 and Java 7 concurrency libraries, including the Fork/Join support newly added in Java 7, to achieve asynchronous parallelization, improve performance and take advantage of a multi-core environment.
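As a small illustration of the split-and-combine idea, here is a minimal sketch that sums the integers from 1 to n by cutting the range into chunks, summing each chunk on its own thread, and adding up the partial sums. The class name ChunkedSum and the method parallelSum are hypothetical names chosen for this example; they are not part of the test classes used in the rest of the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not part of the original article.
class ChunkedSum {

    // Sums 1..n by splitting the range into chunks, summing each chunk
    // on a pool thread, then combining the partial sums.
    static long parallelSum(int n, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Long>> partials = new ArrayList<Future<Long>>();
            int chunkSize = (n + chunks - 1) / chunks; // round up
            for (int start = 1; start <= n; start += chunkSize) {
                final int from = start;
                final int to = Math.min(n, start + chunkSize - 1);
                partials.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long sum = 0;
                        for (int i = from; i <= to; i++) sum += i;
                        return sum;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) total += partial.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000000, 4)); // prints 500000500000
    }
}
```

For a workload this small the thread overhead probably outweighs any gain; the point is only the shape of the technique: split, run in parallel, combine.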
Java 6 Concurrency Library
So let’s first look at some of the options that the Java 6 concurrency library offers. Let’s start with a class that will be called from all test classes; this class performs a task to give the process some load. The class, called Task, implements Callable. The call method sleeps for 1 second and then performs some work: concatenating a character to a string a number of times. The idea is to simulate a task that first makes a call to a remote service and waits for a second for the call to return (this is the sleep) and then does some processing on the result, like formatting or decoration (this is the loop that concatenates a string).
import java.util.concurrent.Callable;

public class Task implements Callable<Object> {
    private int seq;

    public Task() {}

    public Task(int i) { seq = i; }

    public Object call() {
        String str = "";
        long begTest = new java.util.Date().getTime();
        System.out.println("start - Task " + seq);
        try {
            // sleep for 1 second to simulate a remote call,
            // just waiting for the call to return
            Thread.sleep(1000);
            // loop that just concatenates a string to simulate
            // work on the result from the remote call
            for (int i = 0; i < 20000; i++)
                str = str + 't';
        } catch (InterruptedException e) {}
        Double secs = new Double((new java.util.Date().getTime() - begTest) * 0.001);
        System.out.println("run time " + secs + " secs");
        return seq;
    }
}
The SerialTest class
Next I will introduce a class that executes a task a number of times in synchronous, serial mode and displays progress and execution time; this gives us a baseline execution time to compare with the other versions of the test that use the Java concurrent framework. We will start with 50 tasks to execute in every test.
public class SerialTest {
    private static int NUM_OF_TASKS = 50;

    public SerialTest() {}

    public void run() {
        long begTest = new java.util.Date().getTime();
        Object taskResult;
        for (int i = 0; i < NUM_OF_TASKS; i++) {
            Task task = new Task(i);
            taskResult = task.call();
            System.out.println("result " + taskResult);
        }
        Double secs = new Double((new java.util.Date().getTime() - begTest) * 0.001);
        System.out.println("run time " + secs + " secs");
    }

    public static void main(String[] args) {
        new SerialTest().run();
        System.exit(0);
    }
}
image 1. Running SerialTest class shows this result:
No surprise: it takes 65 seconds, since we executed 50 tasks serially and every task sleeps for 1 second plus the time it takes to perform 20K string concatenations.
image 2. The Task Manager performance view while the SerialTest is running:
I am running the test on an 8-CPU box. The performance view shows a CPU usage of only 5% of the total combined CPU power: 4 of the CPUs are idle and the other 4 have minimal activity, most likely with one running our test and the other 3 performing some OS tasks.
The ExecutorServiceTest class
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ExecutorServiceTest {
    private static int NUM_OF_TASKS = 50;

    public ExecutorServiceTest() {}

    public void run() {
        long begTest = new java.util.Date().getTime();
        List<Future> futuresList = new ArrayList<Future>();
        int nrOfProcessors = Runtime.getRuntime().availableProcessors();
        ExecutorService eservice = Executors.newFixedThreadPool(nrOfProcessors);
        for (int index = 0; index < NUM_OF_TASKS; index++)
            futuresList.add(eservice.submit(new Task(index)));
        Object taskResult;
        for (Future future : futuresList) {
            try {
                taskResult = future.get();
                System.out.println("result " + taskResult);
            } catch (InterruptedException e) {
            } catch (ExecutionException e) {}
        }
        Double secs = new Double((new java.util.Date().getTime() - begTest) * 0.001);
        System.out.println("run time " + secs + " secs");
    }

    public static void main(String[] args) {
        new ExecutorServiceTest().run();
        System.exit(0);
    }
}
16.73 seconds! Roughly a 75% improvement over the SerialTest class result.
The ExecutorServiceTest class utilizes the ExecutorService to achieve this huge improvement, so let's look closely at how this is done:
int nrOfProcessors = Runtime.getRuntime().availableProcessors();
ExecutorService eservice = Executors.newFixedThreadPool(nrOfProcessors);
The ExecutorService is initialized by calling the static method newFixedThreadPool of the Executors class; Executors is a factory class that creates ExecutorService instances with initialized thread pools. The test uses newFixedThreadPool, which creates a pool with a fixed number of threads that are reused. In this test the number of available processors is passed to the factory, so the ExecutorService is created with as many threads in the pool as there are available processors.
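The test classes in this article end with System.exit(0), which stops the pool's worker threads along with the JVM. In a longer-lived application the pool would normally be shut down explicitly instead. The following is a minimal sketch of that pattern, assuming a 10 second wait is acceptable; the class name PoolShutdownSketch and the trivial counting job are hypothetical and stand in for the Task class.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not part of the original article.
class PoolShutdownSketch {

    // Runs `tasks` trivial jobs on a fixed pool sized to the processor
    // count and returns how many completed before the pool terminated.
    static int runAndShutdown(int tasks) {
        int nrOfProcessors = Runtime.getRuntime().availableProcessors();
        ExecutorService eservice = Executors.newFixedThreadPool(nrOfProcessors);
        final AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            eservice.execute(new Runnable() {
                public void run() {
                    completed.incrementAndGet();
                }
            });
        }
        eservice.shutdown(); // accept no new tasks; queued tasks still run
        try {
            // The 10 second limit is an assumption made for this sketch.
            if (!eservice.awaitTermination(10, TimeUnit.SECONDS)) {
                eservice.shutdownNow(); // give up and interrupt the workers
            }
        } catch (InterruptedException e) {
            eservice.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed " + runAndShutdown(50));
    }
}
```

Without a shutdown call (or System.exit), the threads created by the default thread factory are non-daemon and keep the JVM alive after main returns.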
image 4. The Task Manager performance view while the ExecutorServiceTest is running:
Using the ExecutorService to submit the 50 tasks for execution with a pool of 8 threads uses, on average, 60+% of the total power available. All available processors are utilized, and enough processing power (near 40%) is still left for the OS and other processes.
Now let's go back to the code and continue analyzing the concurrent implementation. The next snippet of code shows how the tasks are submitted to the ExecutorService:
for (int index = 0; index < NUM_OF_TASKS; index++)
    futuresList.add(eservice.submit(new Task(index)));
The newly created instance of Task is passed to the ExecutorService 'submit' method, and that call returns a Future, an interface that represents the result of an asynchronous operation. The future provides methods to check whether the operation has completed and, once it has, gives us access to the result of the operation, in our test the result of the task. We add the returned futures to a list that we will use later to obtain the results.
Object taskResult;
for (Future future : futuresList) {
    try {
        taskResult = future.get();
        System.out.println("result " + taskResult);
    } catch (InterruptedException e) {
    } catch (ExecutionException e) {}
}
This last snippet loops over the list of futures to obtain the results and display them. Notice the use of taskResult, of type Object, to hold the value returned by the asynchronously processed task; the result can be cast to its actual type if needed. Also notice that future.get is called sequentially by looping over the list of futures; since we submitted the tasks in order, the results are received in the same order. This works fine in some cases, but when some tasks take longer to execute than others, the call to future.get blocks the main thread until the task represented by that future finishes, even if later tasks have already completed. One way to solve this problem is to use the future.isDone method and only call future.get when future.isDone returns true, otherwise moving on to test another future in the list.
With this implementation we improved our test quite a bit using the concurrent framework, but we are left with the potential problem of blocking while getting the results. Instead of using future.isDone we are going to use a CompletionService to solve this problem.
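The future.isDone approach mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name IsDonePollingSketch is hypothetical, the tasks sleep for shorter times the later they are submitted so that completion order differs from submission order, and the 10 ms pause between polling passes is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not part of the original article.
class IsDonePollingSketch {

    // Submits tasks whose sleep time shrinks with their index, then polls
    // the futures with isDone() and collects results as they become ready.
    static List<Integer> collectAsReady(final int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks));
        List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
        for (int i = 0; i < tasks; i++) {
            final int seq = i;
            pending.add(pool.submit(new Callable<Integer>() {
                public Integer call() throws InterruptedException {
                    Thread.sleep(100L * (tasks - seq)); // later tasks finish sooner
                    return seq;
                }
            }));
        }
        List<Integer> results = new ArrayList<Integer>();
        try {
            while (!pending.isEmpty()) {
                Iterator<Future<Integer>> it = pending.iterator();
                while (it.hasNext()) {
                    Future<Integer> future = it.next();
                    if (future.isDone()) {
                        results.add(future.get()); // returns at once: already done
                        it.remove();
                    }
                }
                Thread.sleep(10); // avoid spinning at full speed between passes
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(collectAsReady(4));
    }
}
```

The polling loop works, but it costs an extra loop, a sleep interval to tune, and some wasted wake-ups, which is part of why handing the bookkeeping to the library is attractive.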
The CompletionServiceTest class
import java.util.concurrent.*;

public class CompletionServiceTest {
    private static int NUM_OF_TASKS = 50;

    public CompletionServiceTest() {}

    public void run() {
        long begTest = new java.util.Date().getTime();
        int nrOfProcessors = Runtime.getRuntime().availableProcessors();
        ExecutorService eservice = Executors.newFixedThreadPool(nrOfProcessors);
        CompletionService<Object> cservice = new ExecutorCompletionService<Object>(eservice);
        for (int index = 0; index < NUM_OF_TASKS; index++)
            cservice.submit(new Task(index));
        Object taskResult;
        for (int index = 0; index < NUM_OF_TASKS; index++) {
            try {
                taskResult = cservice.take().get();
                System.out.println("result " + taskResult);
            } catch (InterruptedException e) {
            } catch (ExecutionException e) {}
        }
        Double secs = new Double((new java.util.Date().getTime() - begTest) * 0.001);
        System.out.println("run time " + secs + " secs");
    }

    public static void main(String[] args) {
        new CompletionServiceTest().run();
        System.exit(0);
    }
}
image 5. Running CompletionServiceTest class will show this result:
Notice that this time image 5 shows a larger console with more of the results. Looking closely at the details in the console, we can see that task 47 displayed its result between tasks 43 and 44, and that task 47 took 2.544 secs, less than the 2.667 secs that task 44 took to complete. So the use of CompletionService gave us the advantage of getting the results of the tasks that completed first, even if they were submitted later.
Let's take a look at the code specific to the CompletionService:
ExecutorService eservice = Executors.newFixedThreadPool(nrOfProcessors);
CompletionService<Object> cservice = new ExecutorCompletionService<Object>(eservice);
Object taskResult;
for (int index = 0; index < NUM_OF_TASKS; index++) {
    try {
        taskResult = cservice.take().get();
The CompletionService interface and the ExecutorCompletionService class provide us with a tool that we can combine with an ExecutorService to decouple the execution from the retrieval of results. We can use the CompletionService to get the next available future that finished processing, regardless of the order of submission to the ExecutorService. The ExecutorCompletionService constructor takes an Executor as parameter, in our case an ExecutorService instance, and what we get with the ExecutorCompletionService is basically a queue that returns the futures in the order in which they complete.
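To make the "completion order, not submission order" behavior easy to see without relying on timing, here is a small sketch in which the first submitted task is held back by a CountDownLatch until the second one has been taken from the CompletionService. The class name CompletionOrderSketch and the two string results are hypothetical; only the CompletionService and ExecutorCompletionService usage mirrors the test above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class, not part of the original article.
class CompletionOrderSketch {

    // Returns the results in the order the CompletionService hands them out.
    static List<String> completionOrder() {
        ExecutorService eservice = Executors.newFixedThreadPool(2);
        CompletionService<String> cservice =
                new ExecutorCompletionService<String>(eservice);
        final CountDownLatch release = new CountDownLatch(1);
        List<String> order = new ArrayList<String>();
        try {
            // Submitted first, but blocked until the latch is released.
            cservice.submit(new Callable<String>() {
                public String call() throws InterruptedException {
                    release.await();
                    return "slow";
                }
            });
            // Submitted second, finishes immediately.
            cservice.submit(new Callable<String>() {
                public String call() {
                    return "fast";
                }
            });
            order.add(cservice.take().get()); // only "fast" can be finished here
            release.countDown();              // now let the first task finish
            order.add(cservice.take().get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            eservice.shutdownNow();
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(completionOrder()); // prints [fast, slow]
    }
}
```

With a plain list of futures read in submission order, the main thread would have blocked on the "slow" future first, and in this sketch would never have released the latch.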
Implementing a call back to join the results: The CallBackTest class
import java.util.concurrent.Callable;

public class CallBackTask implements Callable {
    private CallBackTest callBackTest;
    private int seq;

    public CallBackTask() {}

    public CallBackTask(int i) { seq = i; }

    public Object call() {
        String str = "";
        long begTest = new java.util.Date().getTime();
        System.out.println("start - Task " + seq);
        try {
            // sleep for 1 second to simulate a remote call,
            // just waiting for the call to return
            Thread.sleep(1000);
            // loop that just concatenates a string to simulate
            // work on the result from the remote call
            for (int index = 0; index < 20000; index++)
                str = str + 't';
        } catch (InterruptedException e) {}
        callBackTest.callBack(seq);
        Double secs = new Double((new java.util.Date().getTime() - begTest) * 0.001);
        System.out.println("task -" + seq + " took " + secs + " secs");
        return null;
    }

    public void setCaller(CallBackTest callBackTest) {
        this.callBackTest = callBackTest;
    }

    public CallBackTest getCaller() {
        return callBackTest;
    }
}
The new task class performs the same work but adds the accessor methods 'setCaller' and 'getCaller'. The 'setCaller' method is invoked from the test class, which passes its own reference to allow the task to call back when it is done. Notice that the 'call' method in the CallBackTask class no longer returns a meaningful value (it returns null); instead it calls the method 'callBack' on the instance reference set from the test class (set by the 'setCaller' method).
import java.util.concurrent.*;

public class CallBackTest {
    private static int NUM_OF_TASKS = 50;
    Object result;
    int cnt = 0;
    long begTest, endTest;

    public CallBackTest() {
        begTest = new java.util.Date().getTime();
    }

    // synchronized because the pool threads call this concurrently
    // and both 'result' and 'cnt' are shared state
    public synchronized void callBack(Object result) {
        System.out.println("result " + result);
        this.result = result;
        if (++cnt == NUM_OF_TASKS) {
            Double secs = new Double((new java.util.Date().getTime() - begTest) * 0.001);
            System.out.println("run time " + secs + " secs");
            System.exit(0);
        }
    }

    public void run() {
        int nrOfProcessors = Runtime.getRuntime().availableProcessors();
        ExecutorService es = Executors.newFixedThreadPool(nrOfProcessors);
        for (int i = 0; i < NUM_OF_TASKS; i++) {
            CallBackTask task = new CallBackTask(i);
            task.setCaller(this);
            es.submit(task);
            // at this point after submitting the tasks the
            // main thread is free to perform other work.
        }
    }

    public static void main(String[] args) {
        new CallBackTest().run();
    }
}
A method 'callBack' is implemented in this new test class. It is called by the task instances to call back the caller (CallBackTest) and hand back the result; a simple check in this method detects when the last result has been received and terminates the process. The other important difference in the CallBackTest class is in the 'run' method: after the task object (CallBackTask) is instantiated, the method 'setCaller' is invoked with 'this', passing a reference to the test instance to the task to allow the call back.
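Because the call back runs on the pool threads rather than on the main thread, any state it touches is shared between threads. The following sketch shows one possible way to structure the same idea with a small callback interface, a thread-safe queue for the results and a CountDownLatch that lets the main thread wait for the last result instead of calling System.exit from inside the call back. The names CallbackLatchSketch, ResultListener and onResult are hypothetical and introduced only for this example.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class, not part of the original article.
class CallbackLatchSketch {

    // Hypothetical callback contract; the article passes the test class itself.
    interface ResultListener {
        void onResult(int seq);
    }

    // Submits `tasks` jobs that report back through the listener and
    // returns the number of results received once all have called back.
    static int runWithCallbacks(int tasks) {
        final Queue<Integer> results = new ConcurrentLinkedQueue<Integer>();
        final CountDownLatch done = new CountDownLatch(tasks);
        final ResultListener listener = new ResultListener() {
            public void onResult(int seq) {
                results.add(seq);  // safe to call from several threads
                done.countDown();  // one fewer result outstanding
            }
        };
        ExecutorService es = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < tasks; i++) {
            final int seq = i;
            es.execute(new Runnable() {
                public void run() {
                    listener.onResult(seq); // call back when the work is done
                }
            });
        }
        // The main thread is free to do other work here.
        try {
            done.await(); // wait until every task has called back
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            es.shutdown();
        }
        return results.size();
    }

    public static void main(String[] args) {
        System.out.println("results received: " + runWithCallbacks(50));
    }
}
```

Depending only on an interface also means the task no longer needs to know the concrete test class, which makes it easier to reuse.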
image 6. With all the changes set let's take a look at how this new implementation performs:
Multi-Core utilization: Increasing the number of threads
The next test of the CallBackTest class will set the fixed threads in the pool to 50.
ExecutorService es = Executors.newFixedThreadPool(50);
image 7. CallBackTest with 50 threads in pool result:
image 8. The CPU performance view:
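A rough way to reason about why the pool size matters for this workload: each task spends at least 1 second sleeping, and only as many tasks can sleep at once as there are threads in the pool. The sketch below computes that sleep-only lower bound for the configurations used in this article. It deliberately ignores the CPU time of the string concatenation, so real run times will be higher; the class and method names are hypothetical.

```java
// Hypothetical example class, not part of the original article.
class SleepBoundEstimate {

    // Lower bound, in seconds, on wall-clock time when every task sleeps
    // for sleepSecs and at most `threads` tasks can sleep at the same time.
    static int minSeconds(int tasks, int threads, int sleepSecs) {
        int rounds = (tasks + threads - 1) / threads; // ceil(tasks / threads)
        return rounds * sleepSecs;
    }

    public static void main(String[] args) {
        System.out.println(minSeconds(50, 1, 1));  // serial: prints 50
        System.out.println(minSeconds(50, 8, 1));  // 8 threads: prints 7
        System.out.println(minSeconds(50, 50, 1)); // 50 threads: prints 1
    }
}
```

Since a sleeping thread uses no CPU, adding threads beyond the processor count can still shorten the run for tasks that mostly wait, as long as the CPU-bound part of the work does not saturate the processors.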
Multi-Core utilization, thread pool size and load
I ran three tests with the CompletionServiceTest class, changing the number of tasks and the pool size, to show the state of the threads while executing.
image 9. Profiler view of the first test that submits 50 tasks and sets 8 threads in the pool.
image 10. Profiler view of the second test that submits 8 tasks with a pool thread size of 8.
image 11. Profiler view of the last test showing the last 8 threads of the submitted 50 tasks with 50 threads in the pool.
Java 7 Fork/Join framework
I executed all test classes with Java 7 (JDK 1.7) and all results obtained were consistently the same as or better than the results with Java 6 (JDK 1.6), pointing to a possible optimization of the concurrent package. An important addition to the concurrent package in Java 7 is Fork/Join. The last test uses the newly added classes just to show how they can be integrated with existing concurrent package classes. This is not a full implementation of the Fork/Join framework but just a test using the RecursiveTask and ForkJoinPool classes in conjunction with the Future class.
A task implementation for ForkJoin: The FJTask class:
import java.util.concurrent.RecursiveTask;

class FJTask extends RecursiveTask<Integer> {
    private int seq;

    public FJTask(int n) {
        this.seq = n;
    }

    public Integer compute() {
        String str = "";
        long begTest = new java.util.Date().getTime();
        System.out.println("start - Task " + seq);
        try {
            Thread.sleep(1000);
            for (int index = 0; index < 20000; index++)
                str = str + 't';
        } catch (InterruptedException e) {}
        Double secs = new Double((new java.util.Date().getTime() - begTest) * 0.001);
        System.out.println("run time " + secs + " secs");
        return seq;
    }
}
This is the implementation task to be used by the ForkJoinPoolTest; it extends RecursiveTask and implements the method 'compute'.
The ForkJoinPoolTest class:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ForkJoinPoolTest {
    public ForkJoinPoolTest() {}

    private static int numOfTasks = 50;

    public void run() {
        long begTest = new java.util.Date().getTime();
        // typed list so the futures can be iterated as Future below
        List<Future<?>> futuresList = new ArrayList<Future<?>>();
        ForkJoinPool fjPool = new ForkJoinPool(numOfTasks);
        for (int index = 0; index < numOfTasks; index++)
            futuresList.add(fjPool.submit(new FJTask(index)));
        Object taskResult;
        for (Future<?> future : futuresList) {
            try {
                taskResult = future.get();
                System.out.println("result ForkJoin " + taskResult);
            } catch (InterruptedException e) {
            } catch (ExecutionException e) {}
        }
        Double secs = new Double((new java.util.Date().getTime() - begTest) * 0.001);
        System.out.println("run time " + secs + " secs");
    }

    public static void main(String[] args) {
        new ForkJoinPoolTest().run();
        System.exit(0);
    }
}
image 12. ForkJoinPoolTest with 50 threads in pool result:
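The test above uses ForkJoinPool only as another executor; it does not split any work recursively. For contrast, here is a minimal sketch of the divide-and-conquer style that RecursiveTask is designed for: a task that sums an array by forking one half and computing the other until the pieces are small enough. The class name RangeSumTask and the threshold of 10,000 elements are assumptions made for this example and are not taken from the article's tests.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class, not part of the original article.
class RangeSumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10000; // assumed cutoff for splitting
    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    RangeSumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum directly.
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = from + (to - from) / 2;
        RangeSumTask left = new RangeSumTask(data, from, mid);
        RangeSumTask right = new RangeSumTask(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    // Convenience entry point: sums the whole array on a new pool.
    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new RangeSumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data)); // prints 500000500000
    }
}
```

The no-argument ForkJoinPool constructor sizes the pool to the number of available processors, which suits CPU-bound work like this better than the 50-thread pool used for the sleep-dominated test tasks.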
In my next post on this subject I will cover Java 7's Fork/Join framework in more detail.
Environment used:
All tests in this post were executed on Windows 7 Enterprise 64-bit.
Java 6 test:
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
Java 7 test:
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b125)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b06, mixed mode)