A look inside blocks
Today I have been taking a look at the internals of how blocks work from a compiler perspective. By blocks, I mean the closures that Apple added to the C language and that are now well and truly established as part of the language from a clang/LLVM perspective. I had been wondering just what a “block” was and how it magically seems to appear as an Objective-C object (you can copy, retain and release them, for instance). This blog post delves into blocks a little.
The basics
This is a block:
This creates a variable called block which has a simple block assigned to it. That’s easy. Done, right? No. I wanted to understand what exactly the compiler does with that bit of code.
Furthermore, you can pass parameters to a block:
Or even return values from them:
And being a closure, they wrap up the context they are in:
So just how does the compiler sort all of these bits out then? That is what I was interested in.
Diving into a simple example
My first idea was to look at how the compiler compiles a very simple block. Consider the following code:
The reason for the two functions is that I wanted to see both how a block is “called” and how a block is set up. If both of these were in one function then the optimiser might be too clever and we wouldn’t see anything interesting. I had to mark the runBlockA function noinline so that the optimiser didn’t just inline that function into doBlockA, reducing it to the same problem.
The relevant bits of that code compile down to this (armv7, O3):
This is the runBlockA function. So, that’s fairly simple then. Taking a look back up to the source for this, the function is just calling the block. In the ARM EABI, r0 (register 0) holds the first argument of the function. The first instruction therefore means that r1 is loaded from the value held at the address r0 + 12. Think of this as a dereference of a pointer, reading 12 bytes into it. Then we branch to that address. Notice that r1 is used, which means that r0 is still the block itself. So it’s likely that the function this is calling takes the block as its first parameter.
From this I can ascertain that the block is likely some sort of structure where the function the block should execute is stored 12 bytes into said structure. And when a block is passed around, a pointer to one of these structures is passed.
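That inference can be checked with a small model. This is a minimal sketch, assuming a 32-bit target where every field is 4 bytes wide. The struct name block_model32 and its field names are hypothetical, chosen for illustration, and uint32_t stands in for the 4-byte pointers so that the offsets come out the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what runBlockA appears to receive on armv7.
 * uint32_t stands in for every 4-byte field (pointers included), so the
 * offsets do not depend on the machine compiling this sketch. */
struct block_model32 {
    uint32_t word0;   /* offset 0: purpose not yet known at this point */
    uint32_t word1;   /* offset 4 */
    uint32_t word2;   /* offset 8 */
    uint32_t invoke;  /* offset 12: the address loaded into r1 and branched to */
};

/* Compile-time check that the function pointer sits 12 bytes in. */
static_assert(offsetof(struct block_model32, invoke) == 12,
              "invoke should sit 12 bytes into the structure");

/* Byte offset that the first instruction of runBlockA adds to r0. */
size_t invoke_offset(void) {
    return offsetof(struct block_model32, invoke);
}
```

Calling invoke_offset() gives the same 12 that appears in the load instruction.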
Now onto the doBlockA function:
Well, that’s pretty simple also. This is a program counter relative load. You can just think of this as loading the address of the variable called ___block_literal_global into r0. Then the runBlockA function is called. So given we know that the block object is being passed to runBlockA, this ___block_literal_global must be that block object.
Now we’re getting somewhere! But what exactly is ___block_literal_global? Well, looking through the assembly we find this:
Ah ha! That looks very much like a struct to me. There are 5 values in the struct, each of which is 4 bytes (a long). This must be the block object that runBlockA was acting upon. And look, 12 bytes into the struct is what looks suspiciously like a function pointer, as it’s called ___doBlockA_block_invoke_0. Remember that was what the runBlockA function was jumping to.
But what is __NSConcreteGlobalBlock? Well, we’ll come back to that. It’s ___doBlockA_block_invoke_0 and ___block_descriptor_tmp that are of interest since these also appear in the assembly:
That ___doBlockA_block_invoke_0 looks suspiciously like the actual block implementation itself, since the block we used was an empty block. This function just returns straight away, exactly how we’d expect an empty function to be compiled.
Then comes ___block_descriptor_tmp. This appears to be another struct, this time with 4 values in it. The second one is 20, which is how big the ___block_literal_global is. Maybe that is a size value then? There’s also a C-string called .str which has a value v4@?0. This looks like some form of encoding of a type. That might be an encoding of the block type (i.e. it returns void and takes no parameters). The other values I have no idea about.
But the source is out there, isn’t it?
Yes, the source is out there! It’s part of the compiler-rt project within LLVM. Trawling through the code I found the following definitions within Block_private.h:
Those look awfully familiar! The Block_layout struct is what our ___block_literal_global is and the Block_descriptor struct is what our ___block_descriptor_tmp is. And look, I was right about the size being the 2nd value of the descriptor. The bit that’s slightly strange is the 3rd and 4th values of the Block_descriptor. These look like they should be function pointers but in our compiled case they seemed to be 2 strings. I’ll ignore that little point for now.
The isa of Block_layout is interesting as that must be what _NSConcreteGlobalBlock is and also must be how a block can emulate being an Objective-C object. If _NSConcreteGlobalBlock is a Class then the Objective-C message dispatch system will happily treat a block object as a normal object. This is similar to how toll-free bridging works. For more information on that side of things, have a read of Mike Ash’s excellent blog post about it.
Having pieced all that together, the compiler looks like it’s treating the code as something like this:
That’s good to know. It makes a lot more sense now what’s going on under the hood of blocks.
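The pieces found so far can be put together as a plain C model. This is only a sketch under stated assumptions, not the compiler’s real output: every name ending in _model is hypothetical, the isa pointer is left as NULL because there is no Objective-C runtime here, the flags are left at 0 for simplicity, and invoke_count exists purely so that the call can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the descriptor: only the first two fields. */
struct block_descriptor_model {
    unsigned long reserved;
    unsigned long size;        /* size of the block literal in bytes */
};

/* Hypothetical stand-in for the five-field block structure. */
struct block_layout_model {
    void *isa;                 /* the real one points at _NSConcreteGlobalBlock */
    int flags;
    int reserved;
    void (*invoke)(void *);    /* implementation; receives the block itself */
    struct block_descriptor_model *descriptor;
};

int invoke_count = 0;          /* illustration only: counts invocations */

/* Plays the part of ___doBlockA_block_invoke_0 (the real one is empty). */
void doBlockA_block_invoke_model(void *block) {
    (void)block;
    invoke_count++;
}

/* Plays the part of ___block_descriptor_tmp. */
struct block_descriptor_model descriptor_model = {
    0, sizeof(struct block_layout_model)
};

/* Plays the part of ___block_literal_global: complete at compile time. */
struct block_layout_model block_literal_global_model = {
    NULL,                      /* isa: no runtime class in this sketch */
    0, 0,                      /* flags and reserved: simplified to 0 */
    doBlockA_block_invoke_model,
    &descriptor_model
};

/* Like runBlockA: load invoke from the block and call it with the block. */
void runBlockA_model(struct block_layout_model *block) {
    assert(block != NULL);
    block->invoke(block);
}

/* Like doBlockA: pass the address of the global literal. */
void doBlockA_model(void) {
    runBlockA_model(&block_literal_global_model);
}
```

Calling doBlockA_model() once leaves invoke_count at 1, which mirrors the path from doBlockA through runBlockA to the invoke function.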
What’s next?
Next up I will take a look at a block that takes a parameter and a block that captures variables from the enclosing scope. These will surely make things a bit different! So, watch this space for more.
A look inside blocks: Episode 2
This is a follow-on post to A look inside blocks: Episode 1, in which I looked into the innards of blocks and how the compiler sees them. In this article I take a look at blocks that are not constant and how they are formed on the stack.
Block types
In the first article we saw the block have a class of _NSConcreteGlobalBlock. The block structure and descriptor were both fully initialised at compile time since all variables were known. There are a few different types of block, each with its own associated class. However, for simplicity’s sake, we just need to consider 3 of them:
_NSConcreteGlobalBlock is a block defined globally where it is fully complete at compile time. These blocks are those that don’t capture any scope, such as an empty block.
_NSConcreteStackBlock is a block located on the stack. This is where all blocks start out before they are eventually copied onto the heap.
_NSConcreteMallocBlock is a block located on the heap. After copying a block, this is where they end up. Once here they are reference counted and freed when the reference count drops to zero.
A block that captures scope
This time we’re going to look at the following bit of code:
The function called foo is just there so that the block captures something, by having a function to call with a captured variable. Once again, we look at the armv7 assembly produced, relevant bits only:
First of all the runBlockA function is the same as before. It’s calling the invoke function of the block. Then onto doBlockA:
Well this is very different to before. Instead of seeing a block get loaded from a global symbol, it looks like a lot more work is being done. It might look daunting, but it’s pretty easy to see what’s going on. It’s probably best to consider the function rearranged, but believe me that this doesn’t alter anything functionally. The reason the compiler has emitted the instructions in the order it has is for optimisation to reduce pipeline bubbles, etc. So, rearranged the function looks like this:
This is what that is doing:
Function prologue. r7 is pushed onto the stack because it’s going to get overwritten and is a register which must be preserved across function calls. lr is the link register and contains the address of the next instruction to execute when this function returns. See the function epilogue for more on that. Also, the stack pointer is saved into r7.
Subtract 24 from the stack pointer. This makes room for 24 bytes of data to be stored in stack space.
This little block of code is doing a lookup of the L__NSConcreteStackBlock$non_lazy_ptr symbol, relative to the program counter, so that it works wherever the code may end up in the binary when finally linked. The value is then stored to the address held in the stack pointer.
The value 1073741824 is stored to the stack pointer + 4.
The value 0 is stored to the stack pointer + 8. By now it may be becoming clear what’s going on. A Block_layout structure is being created on the stack! Up until now there’s the isa pointer, the flags and the reserved values being set.
The address of ___doBlockA_block_invoke_0 is stored at the stack pointer + 12. This is the invoke member of the block structure.
The address of ___block_descriptor_tmp is stored at the stack pointer + 16. This is the descriptor member of the block structure.
The value 128 is stored at the stack pointer + 20. Ah. If you look back at the Block_layout struct you’ll see that there are only 5 values in it. So what is this being stored after the end of the struct then? Well, you’ll notice that the value is 128, which is the value of the variable captured in the block. So this must be where blocks store values that they use: after the end of the Block_layout struct.
The stack pointer, which now points to a fully initialised block structure, is put into r0 and runBlockA is called. (Remember that r0 contains the first argument to a function in the ARM EABI.)
Finally the stack pointer has 24 added back to it to balance out the subtraction at the start of the function. Then 2 values are popped off the stack into r7 and pc respectively. The r7 balances the push from the prologue and the pc will now get the value that was in lr when the function began. This effectively performs the return of the function as it sets the CPU to continue executing (the pc, program counter) from where the function was told to return to (lr, the link register).
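The steps above can be sketched in plain C. This is a rough model rather than what clang actually emits: all the _model names are hypothetical, isa is left as NULL where the real code stores the address of _NSConcreteStackBlock, and last_value_seen is an assumed stand-in for whatever foo does with the captured value.

```c
#include <assert.h>
#include <stddef.h>

struct descriptor_model {
    unsigned long reserved;
    unsigned long size;
};

/* The five header fields followed directly by the captured variable. */
struct capturing_block_model {
    void *isa;
    int flags;
    int reserved;
    void (*invoke)(void *);
    struct descriptor_model *descriptor;
    int captured;              /* stored straight after the header */
};

int last_value_seen = 0;       /* illustration only: stands in for foo() */

/* Plays the part of the invoke function: reads the captured value
 * out of the block it is handed. */
void invoke_model(void *block) {
    struct capturing_block_model *self = block;
    last_value_seen = self->captured;
}

struct descriptor_model capturing_descriptor_model = {
    0, sizeof(struct capturing_block_model)
};

/* Like runBlockA: call through invoke, passing the block itself. */
void run_block_model(void *block) {
    assert(block != NULL);
    ((struct capturing_block_model *)block)->invoke(block);
}

/* Like doBlockA: build the block in stack space, field by field. */
void do_block_model(void) {
    struct capturing_block_model literal;   /* room made on the stack */
    literal.isa = NULL;                     /* real code: _NSConcreteStackBlock */
    literal.flags = 1 << 30;                /* 1073741824, stored at sp + 4 */
    literal.reserved = 0;                   /* stored at sp + 8 */
    literal.invoke = invoke_model;          /* stored at sp + 12 */
    literal.descriptor = &capturing_descriptor_model;  /* sp + 16 */
    literal.captured = 128;                 /* stored at sp + 20 */
    run_block_model(&literal);
}
```

Note that 1073741824 is simply 1 << 30, which suggests a single flag bit is being set in the flags field.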
Wow! You still with me? Brilliant!
The final bit of this little section is to check what the invoke function and the descriptor look like. We would expect them to be not much different to the global block from episode 1. Here they are:
And yep, there’s not much difference really. The only difference is the size parameter of the block descriptor. It’s now 24 rather than 20. This is because there’s an integer value captured by the block and so the block structure is 24 bytes rather than the standard 20. We saw the extra bytes being added to the end of the structure when it was created.
Also in the actual block function, i.e. ___doBlockA_block_invoke_0, you can see the value being read out of the end of the block structure, i.e. from r0 + 20. This is the variable captured in the block.
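The 20 versus 24 arithmetic can be checked with a fixed-width model. This is a sketch assuming the 32-bit layout seen in the assembly. The struct names are hypothetical and uint32_t stands in for the 4-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of the five header fields. */
struct header_model32 {
    uint32_t isa, flags, reserved, invoke, descriptor;
};

/* The same header with one captured 4-byte integer appended. */
struct capturing_model32 {
    struct header_model32 header;
    int32_t captured;
};

static_assert(sizeof(struct header_model32) == 20,
              "five 4-byte fields give the standard 20 bytes");
static_assert(sizeof(struct capturing_model32) == 24,
              "one captured int grows the block to 24 bytes");

/* Offset the invoke function reads from: r0 + this many bytes. */
size_t captured_offset(void) {
    return offsetof(struct capturing_model32, captured);
}
```

The captured value lands at offset 20, matching the r0 + 20 read in the invoke function.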
What about capturing object types?
The next thing to consider is what happens if, instead of capturing an integer, the block captures an object type such as an NSString. To see what happens there, consider the following code:
I won’t go into the details of doBlockA because that doesn’t change much. What is interesting is the block descriptor structure that’s created:
Notice there are pointers to functions called ___copy_helper_block_ and ___destroy_helper_block_. Here are the definitions of those functions:
I assume these functions are what gets run when blocks are copied and destroyed. They must be retaining and releasing the object that was captured by the block. It looks like the copy function takes 2 parameters as both r0 and r1 are addressed as if they contain valid data. The destroy function looks like it just takes 1. All of the hard work looks like it’s done by _Block_object_assign and _Block_object_dispose. The code for that is within the block runtime code, part of the compiler-rt project within LLVM.
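To make the shape of those helpers concrete, here is a plain C model. It is an assumption-laden sketch: object_model is a hypothetical reference-counted stand-in for an NSString, and object_assign_model and object_dispose_model only imitate the retain and release behaviour guessed at above. They are not the real _Block_object_assign and _Block_object_dispose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object standing in for an NSString. */
struct object_model {
    int retain_count;
};

/* Block with its header fields omitted; only the captured object is kept. */
struct block_model {
    struct object_model *captured;
};

/* Imitates the assumed behaviour on copy: retain, then store the pointer. */
void object_assign_model(struct object_model **dest, struct object_model *object) {
    assert(dest != NULL);
    if (object != NULL) {
        object->retain_count++;
    }
    *dest = object;
}

/* Imitates the assumed behaviour on destroy: release the object. */
void object_dispose_model(struct object_model *object) {
    if (object != NULL) {
        object->retain_count--;
    }
}

/* Copy helper: two arguments, matching the use of both r0 and r1. */
void copy_helper_model(struct block_model *dst, struct block_model *src) {
    object_assign_model(&dst->captured, src->captured);
}

/* Destroy helper: a single argument, the block being torn down. */
void destroy_helper_model(struct block_model *block) {
    object_dispose_model(block->captured);
}
```

With an object whose retain_count starts at 1, copy_helper_model raises it to 2 and destroy_helper_model brings it back to 1.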
If you want to go away and have a read of the code for the blocks runtime then take a look at the source which can be downloaded from http://compiler-rt.llvm.org. In particular, runtime.c is the file to look at.
What next?
In the next episode I shall take a look into the blocks runtime by investigating the code for Block_copy and see just how that does its business. This will give an insight into the copy and destroy helper functions we’ve just seen get created for blocks that capture objects.
A look inside blocks: Episode 3
This post has been a long time coming. It’s been a draft for many months, but I’ve been busy writing my book and didn’t have time to finish it off. But now I’ve finished it and here it is!
Following on from episode 1 and episode 2 of my look inside blocks, this post takes a deeper look at what happens when a block is copied. You’ve likely heard the terminology that “blocks start off on the stack” and “you must copy them if you want to save them for later use”. But, why? And what actually happens during a copy? I’ve long wondered exactly what the mechanism is for copying a block. For example, what happens to the values captured by the block? In this post I take a look.
What we know so far
From episodes 1 and 2, we found out that the memory layout for a block is like this:
In episode 2 we found out that this struct is created on the stack when the block is initially referenced. Since it’s on the stack, the memory can be reused after the enclosing scope of the block ends. So what happens then if you want to use that block later on? Well, you have to copy it. This is done with a call to Block_copy(), or by sending the Objective-C message copy to it, since a block poses as an Objective-C object. That just calls Block_copy().
So what better than to take a look at what Block_copy() does.
Block_copy()
First of all, we need to look in Block.h. Here there are the following definitions:
So Block_copy() is purely a #define that casts the argument passed in to a const void * and passes it to _Block_copy(). There is also the prototype for _Block_copy(). The implementation is in runtime.c:
So that just calls _Block_copy_internal() passing the block itself and WANTS_ONE. To see what this means, we need to look at the implementation. This is also in runtime.c. Here is the function, with the irrelevant stuff removed (mostly garbage collection stuff):
And here is what that function does:
If the passed argument is NULL then just return NULL. This makes the function safe to call with a NULL block.
Cast the argument to a pointer to a struct Block_layout. You may remember what one of these is from episode 1. It’s the internal data structure that makes up a block, including a pointer to the implementation function of the block and various bits of metadata.
If the block’s flags include BLOCK_NEEDS_FREE then the block is a heap block (you’ll see why shortly). In this case, all that needs doing is to increment the reference count and return the same block.
If the block is a global block (recall these from episode 1) then nothing needs doing and the same block is returned. This is because global blocks are effectively singletons.
If we’ve gotten here, then the block must be a stack allocated block. In which case, the block needs to be copied to the heap. This is the fun part. In this first step, malloc() is used to create a portion of memory of the required size. If that fails, then NULL is returned, otherwise we carry on.
Here, memmove() is used to copy, bit-for-bit, the current stack allocated block to the portion of memory we just allocated for the heap allocated block. This just makes sure that all the metadata is copied over, such as the block descriptor.
Next, the flags of the block are updated. The first line ensures that the reference count is set to 0. The comment indicates that this is not needed, presumably because at this point the reference count should already be 0. I guess this line is left in just in case a bug ever exists where the reference count is not 0. The next line sets the BLOCK_NEEDS_FREE flag. This indicates that it’s a heap block and the memory backing it will, once the reference count drops to zero, require free-ing. The | 1 on this line sets the reference count of the block to 1.
Here the block’s isa pointer is set to be _NSConcreteMallocBlock, which means it’s a heap block.
Finally, if the block has a copy helper function then this is invoked. The compiler will generate the copy helper function if it’s required. It’s required for blocks that capture objects, for example. In such cases, the copy helper function will retain the captured objects.
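The steps above can be modelled in plain C. This is a simplified sketch, not the code from runtime.c: the MODEL_ flag values are illustrative placeholders, the isa field is a string label rather than a class pointer, the reference count is bumped with a plain increment where the real runtime has to be thread-safe, and every name ending in _model is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum {
    MODEL_REFCOUNT_MASK    = 0xffff,    /* low bits hold the reference count */
    MODEL_NEEDS_FREE       = 1 << 24,   /* set on heap blocks */
    MODEL_HAS_COPY_DISPOSE = 1 << 25,   /* helpers present in descriptor */
    MODEL_IS_GLOBAL        = 1 << 28
};

struct block_model;

struct descriptor_model {
    unsigned long reserved;
    unsigned long size;
    void (*copy)(struct block_model *dst, struct block_model *src);
    void (*dispose)(struct block_model *block);
};

struct block_model {
    const char *isa;                    /* a label instead of a Class */
    int flags;
    int reserved;
    void (*invoke)(void *);
    struct descriptor_model *descriptor;
};

void *block_copy_model(const void *arg) {
    if (arg == NULL) return NULL;                     /* safe for NULL */
    struct block_model *block = (struct block_model *)arg;

    if (block->flags & MODEL_NEEDS_FREE) {            /* already on the heap */
        block->flags++;                               /* bump reference count */
        return block;
    }
    if (block->flags & MODEL_IS_GLOBAL) return block; /* nothing to do */

    /* Must be a stack block: make a heap copy of the whole structure. */
    assert(block->descriptor != NULL);
    struct block_model *result = malloc(block->descriptor->size);
    if (result == NULL) return NULL;
    memmove(result, block, block->descriptor->size);

    result->flags &= ~MODEL_REFCOUNT_MASK;            /* count back to 0 */
    result->flags |= MODEL_NEEDS_FREE | 1;            /* heap block, count 1 */
    result->isa = "malloc";                           /* now a heap block */

    if (result->flags & MODEL_HAS_COPY_DISPOSE) {
        result->descriptor->copy(result, block);      /* retain captures */
    }
    return result;
}
```

Copying a stack block in this model yields a new heap block with a count of 1, copying that heap block again returns the same pointer with a count of 2, and copying a global block returns it unchanged.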
That’s pretty neat, eh! Now you know what happens when a block is copied! But that’s only half of the picture, right? What about when one is released?
Block_release()
The other half of the Block_copy() picture is Block_release(). Once again, this is actually a macro that looks like this:
Just like Block_copy(), Block_release() calls through to a function after casting the argument for us. This just helps out the developer, so that they don’t have to cast themselves.
Let’s take a look at _Block_release() (with slight rearrangement for clarity and garbage collection specific code removed):
And here’s what each bit does:
First the argument is cast to a pointer to a struct Block_layout, since that’s what it is. And if NULL is passed in, then we return early to make the function safe against passing in NULL.
Here the portion of the block flags that signifies the reference count (recall from Block_copy() the part where the flags were set to indicate a reference count of 1) is decremented.
If the new count is greater than 0, then there are still things holding a reference to the block and so the block does not need to be freed yet.
Otherwise, if the flags include BLOCK_NEEDS_FREE, then this is a heap allocated block, and the reference count is 0, so the block should be freed. First of all though, the dispose helper function of the block is invoked. This is the counterpart of the copy helper function. It performs the reverse, such as releasing any captured objects. Finally, the block is deallocated through use of _Block_deallocator. If you go hunting in runtime.c then you’ll see that this ends up being a function pointer to free, which just frees memory allocated with malloc.
If we made it here and the block is global, then do nothing.
If we made it all the way to here, then something strange has happened because an attempt has been made to release a stack block, so a log line is printed to warn the developer. In reality, you should never see this being hit.
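The release path can be modelled in the same spirit. Again this is a sketch under assumptions, not the runtime’s code: the MODEL_ flag values are illustrative placeholders, the decrement is a plain one guarded against underflow, free stands in directly for _Block_deallocator, and the int return value (1 when the block was freed) plus the dispose_calls counter exist only so that the behaviour can be observed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum {
    MODEL_REFCOUNT_MASK    = 0xffff,
    MODEL_NEEDS_FREE       = 1 << 24,
    MODEL_HAS_COPY_DISPOSE = 1 << 25,
    MODEL_IS_GLOBAL        = 1 << 28
};

struct block_model;

struct descriptor_model {
    unsigned long reserved;
    unsigned long size;
    void (*copy)(struct block_model *dst, struct block_model *src);
    void (*dispose)(struct block_model *block);
};

struct block_model {
    const char *isa;
    int flags;
    int reserved;
    void (*invoke)(void *);
    struct descriptor_model *descriptor;
};

int dispose_calls = 0;   /* illustration only: counts dispose helper calls */

/* Hypothetical dispose helper that just records that it ran. */
void counting_dispose_model(struct block_model *block) {
    (void)block;
    dispose_calls++;
}

/* Returns 1 if the block was freed, 0 otherwise (model-only return value). */
int block_release_model(const void *arg) {
    struct block_model *block = (struct block_model *)arg;
    if (block == NULL) return 0;                      /* safe for NULL */

    /* Decrement the count held in the low bits, never going below 0. */
    if ((block->flags & MODEL_REFCOUNT_MASK) != 0) block->flags--;
    if ((block->flags & MODEL_REFCOUNT_MASK) > 0) return 0;  /* still held */

    if (block->flags & MODEL_NEEDS_FREE) {            /* heap block, count 0 */
        if (block->flags & MODEL_HAS_COPY_DISPOSE) {
            assert(block->descriptor != NULL);
            block->descriptor->dispose(block);        /* release captures */
        }
        free(block);
        return 1;
    }
    if (block->flags & MODEL_IS_GLOBAL) return 0;     /* nothing to do */

    fprintf(stderr, "release called on a stack block: %p, ignored\n",
            (void *)block);
    return 0;
}
```

Releasing a heap block with a count of 2 leaves it alive with a count of 1, and a second release runs the dispose helper and frees it.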
And that is that! There’s not really much more to it!
What’s next?
That concludes my tour into blocks, for now. Some of this material is covered in my book. It’s more about how to use blocks effectively, but there’s still a good portion of deep-dive material that should be of interest if you enjoyed this.