hdu 1053 Entropy
Entropy
Time Limit: 2000/1000 ms (Java/Other)    Memory Limit: 65536/32768 K (Java/Other)
Total Submission(s): 3    Accepted Submission(s): 2
Problem Description
An entropy encoder is a data encoding method that achieves lossless data compression by encoding a message with “wasted” or “extra” information removed. In other words, entropy encoding removes information that was not necessary in the first place to accurately encode the message. A high degree of entropy implies a message with a great deal of wasted information; English text encoded in ASCII is an example of a message type that has very high entropy. Already compressed messages, such as JPEG graphics or ZIP archives, have very little entropy and do not benefit from further attempts at entropy encoding.
English text encoded in ASCII has a high degree of entropy because all characters are encoded using the same number of bits, eight. It is a known fact that the letters E, L, N, R, S and T occur at a considerably higher frequency than do most other letters in English text. If a way could be found to encode just these letters with four bits, then the new encoding would be smaller, would contain all the original information, and would have less entropy. ASCII uses a fixed number of bits for a reason, however: it’s easy, since one is always dealing with a fixed number of bits to represent each possible glyph or character. How would an encoding scheme that used four bits for the above letters be able to distinguish between the four-bit codes and eight-bit codes? This seemingly difficult problem is solved using what is known as a “prefix-free variable-length” encoding.
In such an encoding, any number of bits can be used to represent any glyph, and glyphs not present in the message are simply not encoded. However, in order to be able to recover the information, no bit pattern that encodes a glyph is allowed to be the prefix of any other encoding bit pattern. This allows the encoded bitstream to be read bit by bit, and whenever a set of bits is encountered that represents a glyph, that glyph can be decoded. If the prefix-free constraint was not enforced, then such a decoding would be impossible.
Consider the text “AAAAABCD”. Using ASCII, encoding this would require 64 bits. If, instead, we encode “A” with the bit pattern “00”, “B” with “01”, “C” with “10”, and “D” with “11” then we can encode this text in only 16 bits; the resulting bit pattern would be “0000000000011011”. This is still a fixed-length encoding, however; we’re using two bits per glyph instead of eight. Since the glyph “A” occurs with greater frequency, could we do better by encoding it with fewer bits? In fact we can, but in order to maintain a prefix-free encoding, some of the other bit patterns will become longer than two bits. An optimal encoding is to encode “A” with “0”, “B” with “10”, “C” with “110”, and “D” with “111”. (This is clearly not the only optimal encoding, as it is obvious that the encodings for B, C and D could be interchanged freely for any given encoding without increasing the size of the final encoded message.) Using this encoding, the message encodes in only 13 bits to “0000010110111”, a compression ratio of 4.9 to 1 (that is, each bit in the final encoded message represents as much information as did 4.9 bits in the original encoding). Read through this bit pattern from left to right and you’ll see that the prefix-free encoding makes it simple to decode this into the original text even though the codes have varying bit lengths.
As a second example, consider the text “THE CAT IN THE HAT”. In this text, the letter “T” and the space character both occur with the highest frequency, so they will clearly have the shortest encoding bit patterns in an optimal encoding. The letters “C”, “I” and “N” only occur once, however, so they will have the longest codes.
There are many possible sets of prefix-free variable-length bit patterns that would yield the optimal encoding, that is, that would allow the text to be encoded in the fewest number of bits. One such optimal encoding is to encode spaces with “00”, “A” with “100”, “C” with “1110”, “E” with “1111”, “H” with “110”, “I” with “1010”, “N” with “1011” and “T” with “01”. The optimal encoding therefore requires only 51 bits compared to the 144 that would be necessary to encode the message with 8-bit ASCII encoding, a compression ratio of 2.8 to 1.
Input
The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the word “END” as the text string. This line should not be processed.
Output
For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal point.
Sample Input
AAAAABCD
THE_CAT_IN_THE_HAT
END
Sample Output
64 13 4.9
144 51 2.8
Source
Greater New York 2000
==================================================================================================
Count the number of occurrences of each character, build a tree from those counts, then traverse the tree: the depth at which a leaf sits is the number of bits in that character's codeword.
Suppose we are given n nodes ki (i = 1..n) with weights wi (i = 1..n). The optimal binary tree with these n nodes as leaves is constructed as follows. Start with a forest F containing n trees, each a single node ki with weight wi. Then: (1) select from F the two trees whose roots have the smallest weights and merge them into a new binary tree whose root weight is the sum of the two; (2) delete those two trees from F and add the new tree to F.
Repeat (1) and (2) until F contains only one binary tree. That tree is the optimal binary tree.
#include <cstdio>
#include <cstring>
#include <queue>
#include <algorithm>
using namespace std;

struct tree {
    char ch;
    int count;
    int deep;
    tree *left, *right;
    tree() { left = right = NULL; deep = count = 0; ch = '?'; }
    // Reverse comparison so the priority_queue pops the smallest count first.
    friend bool operator<(tree a, tree b) { return a.count > b.count; }
};

struct kind {
    char ch;
    int count;
} letter[201];

int length;                       // number of distinct characters
int sum;                          // total bits of the optimal encoding
priority_queue<tree> PriorQueue;

void Huffman() {
    sum = 0;
    int i;
    tree *a, *b, node, *c, root;
    queue<tree> q;
    for (i = 0; i < length; i++) {
        node.count = letter[i].count;
        node.ch = letter[i].ch;
        PriorQueue.push(node);
    }
    // Repeatedly merge the two lightest trees until one tree remains.
    while (PriorQueue.size() != 1) {
        a = new tree; *a = PriorQueue.top(); PriorQueue.pop();
        b = new tree; *b = PriorQueue.top(); PriorQueue.pop();
        c = new tree;
        c->count = a->count + b->count;
        c->left = a; c->right = b;
        PriorQueue.push(*c);
    }
    root = PriorQueue.top(); PriorQueue.pop(); root.deep = 0;
    // BFS over the tree: a leaf's depth is the length of its codeword.
    q.push(root);
    while (!q.empty()) {
        node = q.front(); q.pop();
        if (node.left)  { node.left->deep  = node.deep + 1; q.push(*node.left);  }
        if (node.right) { node.right->deep = node.deep + 1; q.push(*node.right); }
        if (!node.left && !node.right) sum += node.deep * node.count;
    }
}

int main() {
    char str[1005];
    int i, len, count;
    while (scanf("%s", str) && strcmp(str, "END") != 0) {
        len = strlen(str);
        str[len] = '!';           // sentinel so the last run is flushed below
        sort(str, str + len);
        // Run-length scan of the sorted string to count each character.
        for (length = 0, count = 1, i = 1; i <= len; i++) {
            if (str[i] != str[i - 1]) {
                letter[length].ch = str[i - 1];
                letter[length++].count = count;
                count = 1;
            } else
                count++;
        }
        if (length == 1)          // single glyph: one bit per character
            printf("%d %d 8.0\n", 8 * len, len);
        else {
            Huffman();
            printf("%d %d %.1lf\n", len * 8, sum, len * 8 * 1.0 / sum);
        }
    }
    return 0;
}