On the implementation of System.arraycopy()

Source: the Internet · Editor: 程序博客网 · Date: 2024/05/16 10:22

While rereading the String source code I ran into System.arraycopy(), and found that its Java source contains no implementation body (it is declared native). So, Google...

First question:

I was surprised to see in the Java source that System.arraycopy is a native method.

Of course the reason is because it's faster. But what native tricks is the code able to employ that make it faster?

Why not just loop over the original array and copy each pointer to the new array - surely this isn't that slow and cumbersome?

In native code, it can be done with a single memcpy / memmove, as opposed to n distinct copy operations. The difference in performance is substantial.
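The contrast the answer describes, n per-element copies versus one call that can delegate to memcpy/memmove, can be sketched from the Java side. This is an illustrative comparison, not JDK code; the method names copyByLoop and copyNative are made up for the example:

```java
import java.util.Arrays;

public class ArrayCopyDemo {
    // The "loop over the original array" approach from the question:
    // n distinct copy operations, one per element.
    static int[] copyByLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // The same copy via System.arraycopy, which at the native level
    // can be done as a single bulk memory move.
    static int[] copyNative(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5};
        System.out.println(Arrays.equals(copyByLoop(src), copyNative(src)));
    }
}
```

Both produce identical results; the difference is purely in how the copy is carried out underneath, which is what the rest of this post digs into.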

Original: http://stackoverflow.com/questions/2772152/why-is-system-arraycopy-native-in-java



Second question:

Following a question related to the way the JVM implements creation of Strings based on char[], I have mentioned that no iteration takes place when the char[] gets copied to the interior of the new string, since System.arraycopy gets called eventually, which copies the desired memory using a function such as memcpy at a native, implementation-dependent level (the original question).

I wanted to check that for myself, so I downloaded the OpenJDK 7 source code and started browsing it. I found the implementation of System.arraycopy in the OpenJDK C++ source code, in openjdk/hotspot/src/share/vm/oops/objArrayKlass.cpp:

if (stype == bound || Klass::cast(stype)->is_subtype_of(bound)) {
  // elements are guaranteed to be subtypes, so no check necessary
  bs->write_ref_array_pre(dst, length);
  Copy::conjoint_oops_atomic(src, dst, length);
} else {
  // slow case: need individual subtype checks

If the elements need no type checks (that's the case with, for instance, primitive data type arrays), Copy::conjoint_oops_atomic gets called.
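The "slow case" with individual subtype checks is observable from plain Java: when the destination's element type is narrower than the source's, each element must be checked at copy time, and a mismatch raises ArrayStoreException. A minimal demonstration (the class name is made up for the example):

```java
public class SubtypeCheckDemo {
    public static void main(String[] args) {
        Object[] src = { "a", Integer.valueOf(1) };
        String[] dst = new String[2];
        try {
            // Object[] -> String[]: every element needs a subtype check
            System.arraycopy(src, 0, dst, 0, 2);
        } catch (ArrayStoreException e) {
            // The Integer element fails the String check; per the
            // System.arraycopy contract, elements before the failing
            // one have already been copied, so dst[0] is "a".
            System.out.println("caught ArrayStoreException, dst[0] = " + dst[0]);
        }
    }
}
```

This is exactly why the fast Copy::conjoint_oops_atomic path is only taken when the element types guarantee that no such check can fail.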

The Copy::conjoint_oops_atomic function resides in 'copy.hpp':

// overloaded for UseCompressedOops
static void conjoint_oops_atomic(narrowOop* from, narrowOop* to, size_t count) {
  assert(sizeof(narrowOop) == sizeof(jint), "this cast is wrong");
  assert_params_ok(from, to, LogBytesPerInt);
  pd_conjoint_jints_atomic((jint*)from, (jint*)to, count);
}

Now we're platform-dependent, as the copy operation has a different implementation based on OS/architecture. I'll go with Windows as an example: openjdk\hotspot\src\os_cpu\windows_x86\vm\copy_windows_x86.inline.hpp:

static void pd_conjoint_oops_atomic(oop* from, oop* to, size_t count) {
  // Do better than this: inline memmove body  NEEDS CLEANUP
  if (from > to) {
    while (count-- > 0) {
      // Copy forwards
      *to++ = *from++;
    }
  } else {
    from += count - 1;
    to   += count - 1;
    while (count-- > 0) {
      // Copy backwards
      *to-- = *from--;
    }
  }
}

And... to my surprise, it iterates through the elements (the oop values), copying them one by one (seemingly). Can someone explain why the copy is done, even at the native level, by iterating through the elements in the array?

Because jint most closely maps to int, which in turn maps to the old hardware architecture's WORD, which is basically the same size as the width of the data bus.

The memory architectures and CPUs of today are designed to keep processing even in the event of a cache miss, and memory systems tend to prefetch blocks of adjacent locations. The code that you are looking at isn't quite as "bad" in performance as you might think. The hardware is smarter, and if you don't actually profile, your "smart" fetching routines might actually add nothing (or even slow down processing).

When you are introduced to hardware architectures, you must be introduced to simple ones. Modern ones do a lot more, so you can't assume that code that looks inefficient is actually inefficient. For example, when a memory lookup is done to evaluate the condition on an if statement, often both branches of the if statement are executed while the lookup is occurring, and the "false" branch of processing is discarded after the data becomes available to evaluate the condition. If you want to be efficient, you must profile and then act on the profiled data.

Look at the branch-on-JVM-opcode section (the interpreter's opcode dispatch). You'll see it is (or perhaps just was) an ifdef macro oddity supporting, at one time, three different ways of jumping to the code that handled each opcode. That was because the three different ways actually made a meaningful performance difference on the Windows, Linux, and Solaris platforms.

Perhaps they could have included MMX routines, but the fact that they didn't tells me that Sun didn't think it was enough of a performance gain on modern hardware to worry about.
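The forward/backward branching in pd_conjoint_oops_atomic above is the classic memmove pattern for overlapping ranges: copy in the direction that does not clobber source data before it is read. Its effect is observable from Java when copying within a single array (the class name here is made up for the example):

```java
import java.util.Arrays;

public class OverlapDemo {
    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        // Overlapping copy: shift elements 0..3 right by one position.
        // A naive forward copy would overwrite a[1] before reading it;
        // System.arraycopy behaves as if a temporary buffer were used.
        System.arraycopy(a, 0, a, 1, 4);
        System.out.println(Arrays.toString(a)); // prints [1, 1, 2, 3, 4]
    }
}
```

The native layer achieves this "as if via a temporary" semantics cheaply by choosing the copy direction, which is exactly what the two while loops in the quoted snippet do.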

Original: http://stackoverflow.com/questions/11210369/openjdk-implementation-of-system-arraycopy

