Ruby: converting bytes to bits
Source: Internet. Editor: 程序博客网. Time: 2024/04/28 12:45
The C programming language allows developers to directly access the memory where variables are stored. Ruby does not allow that. Still, there are times when working in Ruby that you need to access the underlying bits and bytes. Ruby provides two methods, pack and unpack, for that.
Here is an example.
> 'A'.unpack('b*')
=> ["10000010"]
In the above case 'A' is a string, and I am using unpack to read its bits. The ASCII table says that the ASCII value of 'A' is 65, and the binary representation of 65 is 1000001, or 01000001 when padded to eight bits. The output 10000010 is those same eight bits listed in the opposite order.
Here is another example.
> 'A'.unpack('B*')
=> ["01000001"]
Notice the difference in the result from the first case. What's the difference between b* and B*? To understand the difference, let's first discuss MSB and LSB.
Most significant bit vs Least significant bit
Not all bits are created equal. C has an ASCII value of 67. The binary value of 67 is 1000011.
First let's discuss MSB (most significant bit) style. In MSB style, reading from left to right (and you always read from left to right), the most significant bit comes first. Because the most significant bit comes first, we can pad an additional zero on the left to bring the number of bits to eight without changing the value. After adding the zero, the binary value looks like 01000011.
If we want to convert this value to LSB (least significant bit) style, we need to store the least significant bit first, going from left to right. Given below is how the bits move when converting from MSB to LSB. Note that position 1 refers to the leftmost bit.
move value 1 from position 8 of MSB to position 1 of LSB
move value 1 from position 7 of MSB to position 2 of LSB
move value 0 from position 6 of MSB to position 3 of LSB
and so on and so forth
After the exercise is over the value will look like 11000010.
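The manual exercise above can be sketched in plain Ruby as a sanity check. The variable names are only illustrative, and the sketch assumes a single-byte ASCII character.

```ruby
# Rebuild both bit orders for 'C' by hand and compare them with unpack.
value = 'C'.ord                           # ASCII value of 'C'
msb_first = value.to_s(2).rjust(8, '0')   # pad on the left to eight bits
lsb_first = msb_first.reverse             # least significant bit first

puts msb_first == 'C'.unpack('B*').first
puts lsb_first == 'C'.unpack('b*').first
```

Reversing the padded string is enough here only because a single byte is involved.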
We did this exercise manually to understand the difference between most significant bit and least significant bit. However, the unpack method can give the result directly in both MSB and LSB style. The unpack method accepts both b* and B* as input. As per the Ruby documentation, here is the difference.
B | bit string (MSB first)
b | bit string (LSB first)
Now let's take a look at two examples.
> 'C'.unpack('b*')
=> ["11000010"]
> 'C'.unpack('B*')
=> ["01000011"]
Both b* and B* are looking at the same underlying data. It's just that they represent the data differently.
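One detail the single-character examples cannot show is what happens with more than one byte. The following sketch, using the made-up sample string "hi", suggests that b* reverses the bits inside each byte while leaving the order of the bytes alone.

```ruby
# For a multi-byte string, compare B* and b* byte by byte.
msb = "hi".unpack('B*').first
lsb = "hi".unpack('b*').first

# Reverse each 8-bit group on its own.
per_byte_reversed = msb.scan(/.{8}/).map(&:reverse).join

puts per_byte_reversed == lsb   # each byte reversed independently
puts msb.reverse == lsb         # reversing the whole string is a different thing
```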
Different ways of getting the same data
Let's say that I want the binary value of the string hello. Based on the discussion in the last section, that should be easy now.
> "hello".unpack('B*')
=> ["0110100001100101011011000110110001101111"]
The same information can also be derived byte by byte. Note that to_s 2 does not pad to eight bits, so the leading zero of each byte is missing.
> "hello".unpack('C*').map {|e| e.to_s 2}
=> ["1101000", "1100101", "1101100", "1101100", "1101111"]
Let's break down the previous statement into small steps.
> "hello".unpack('C*')
=> [104, 101, 108, 108, 111]
Directive C* gives the 8-bit unsigned integer value of each character. Note that the ASCII value of h is 104, the ASCII value of e is 101, and so on.
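To get from the per-byte output back to exactly what B* returns, each value has to be padded to eight bits. A minimal sketch, with illustrative variable names:

```ruby
bytes = "hello".unpack('C*')

# Integer#to_s(2) drops leading zeros, so pad each byte back to eight bits.
padded = bytes.map { |byte| byte.to_s(2).rjust(8, '0') }

puts padded.join == "hello".unpack('B*').first
```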
Using the technique discussed above I can find the hex value of the string.
> "hello".unpack('C*').map {|e| e.to_s 16}
=> ["68", "65", "6c", "6c", "6f"]
The hex value can also be obtained directly.
> "hello".unpack('H*')
=> ["68656c6c6f"]
High nibble first vs Low nibble first
Notice the difference in the two cases below.
> "hello".unpack('H*')
=> ["68656c6c6f"]
> "hello".unpack('h*')
=> ["8656c6c6f6"]
As per the Ruby documentation for unpack:
H | hex string (high nibble first)
h | hex string (low nibble first)
A byte consists of 8 bits. A nibble consists of 4 bits. So a byte has two nibbles. The ASCII value of 'h' is 104. The hex value of 104 is 68. This 68 is stored in two nibbles. The first (high) nibble, meaning 4 bits, contains the value 6 and the second (low) nibble contains the value 8. In general we deal with high nibble first, and going from left to right we pick the value 6 and then 8.
However, if you are dealing with low nibble first, then the low nibble value 8 takes the first slot and 6 comes after it. Hence the result in "low nibble first" mode is 86.
This pattern is repeated for each byte. Because of that, a hex value of 68 65 6c 6c 6f looks like 86 56 c6 c6 f6 in low nibble first format.
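The per-byte swap described above can be checked with a short sketch. The names high_first, low_first and swapped are only illustrative.

```ruby
high_first = "hello".unpack('H*').first
low_first  = "hello".unpack('h*').first

# Split into two-character groups (one byte each) and swap the nibbles in each.
swapped = high_first.scan(/../).map(&:reverse).join

puts swapped == low_first
```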
Mix and match directives
In all the previous examples I used *. A * means to keep going for as long as there is data left. Let's see a few examples.
A single C will get a single byte.
> "hello".unpack('C')
=> [104]
You can add more Cs if you like.
> "hello".unpack('CC')
=> [104, 101]
> "hello".unpack('CCC')
=> [104, 101, 108]
> "hello".unpack('CCCCC')
=> [104, 101, 108, 108, 111]
Rather than repeating all those directives, I can put a number after a directive to say how many times it should be repeated.
> "hello".unpack('C5')
=> [104, 101, 108, 108, 111]
I can use * to capture all the remaining bytes.
> "hello".unpack('C*')
=> [104, 101, 108, 108, 111]
Below is an example where MSB and LSB are being mixed.
> "aa".unpack('b8B8')
=> ["10000110", "01100001"]
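Two further points about counts and mixing may be worth a quick sketch. For bit directives the count is a number of bits rather than bytes, and directives of different kinds can be combined. The a directive (arbitrary binary string) used here is not covered in the text above, and the sample strings are arbitrary.

```ruby
# For bit directives the count is a number of bits, not bytes.
first_nibble = "a".unpack('B4')    # first four bits of 0110 0001

# One byte as an integer, then the rest of the string as raw bytes.
head, rest = "hello".unpack('Ca*')

p first_nibble
p head
p rest
```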
pack is the reverse of unpack
Method pack takes an array of values and writes them into a string of bytes, following the same directives. Let's discuss a few examples.
> [0b1000001].pack('C')
=> "A"
In the above case the binary value 1000001 (decimal 65) is written as an 8-bit unsigned integer and the result is 'A'. Note the 0b prefix: without it, 1000001 is a decimal literal. [1000001].pack('C') also happens to return "A", but only because C keeps the low eight bits of the number, and the low byte of decimal 1000001 (hex F4241) is 41.
> ['A'].pack('H')
=> "\xA0"
In the above case the input 'A' is not ASCII 'A' but hex 'A'. Why is it hex 'A'? Because the directive H tells pack to treat the input value as a hex string. Since H is high nibble first and the input has only one nibble, the second (low) nibble is zero. So the input changes from ['A'] to ['A0'].
Since hex value A0 is outside the ASCII range, the output is left as is and hence the result is \xA0. The leading \x indicates that the value is a hex escape.
Notice that in hex notation A is the same as a. So in the above example I can replace A with a and the result should not change. Let's try that.
> ['a'].pack('H')
=> "\xA0"
Let's discuss another example.
> ['a'].pack('h')
=> "\n"
In the above example notice the change. I changed the directive from H to h. Since h means low nibble first, the single nibble in the input is taken as the low nibble and the missing high nibble is filled with zero. Written in the usual high-nibble-first notation, the value changes from ['a'] to ['0a'], and the output is \x0A. If you look at the ASCII table, hex value 0A is ASCII value 10, which is NL (line feed, new line). Hence we see \n as the output, because it represents a new line.
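Since pack is the reverse of unpack, a round trip through the same directive should give back the original string. The sketch below checks that for three of the directives discussed so far, and prints the raw byte values of the two one-nibble examples. Variable names are illustrative.

```ruby
text = "hello"

# Each directive used for unpacking can pack its own output back.
from_ints = text.unpack('C*').pack('C*')
from_hex  = text.unpack('H*').pack('H*')
from_bits = text.unpack('b*').pack('b*')

puts [from_ints, from_hex, from_bits].all? { |s| s == text }

# With these directives pack returns a binary string; bytes shows the raw values.
p ['A'].pack('H').bytes
p ['a'].pack('h').bytes
```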
Usage of unpack in Rails source code
I did a quick grep in the Rails source code and found the following usages of unpack.
email_address_obfuscated.unpack('C*')
'mailto:'.unpack('C*')
email_address.unpack('C*')
char.unpack('H2')
column.class.string_to_binary(value).unpack("H*")
data.unpack("m")
s.unpack("U*")
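Two of the directives in that list, m and U, are not covered above. As a rough illustration of what they do (this is not the Rails code itself, and the sample strings are made up), m handles Base64 and U handles UTF-8 code points.

```ruby
# m encodes Base64 on pack and decodes it on unpack.
encoded = ["hello"].pack('m')
decoded = encoded.unpack('m').first

# U* returns the Unicode code point of each UTF-8 character.
codepoints = "h\u00e9".unpack('U*')

p encoded
p decoded
p codepoints
```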