Huffman Coding: ZOJ 2339 Hyperhuffman

Source: Internet | Editor: 程序博客网 | Date: 2024/05/18 01:13

Hyperhuffman

Time Limit: 5 Seconds Memory Limit: 32768 KB

You might have heard about Huffman encoding - that is the coding system that minimizes the expected length of the text if the codes for characters are required to consist of an integral number of bits.

Let us recall the code assignment process in Huffman encoding. First the Huffman tree is constructed. Let the alphabet consist of N characters, the i-th of which occurs P_i times in the input text. Initially all characters are considered to be active nodes of the future tree, the i-th being marked with P_i. On each step, take the two active nodes with the smallest marks, create a new node, mark it with the sum of their marks and make them the children of the new node. Then remove the two nodes that now have a parent from the set of active nodes and make the new node active. This process is repeated until only one active node exists; it is made the root of the tree.

Note that the characters of the alphabet are represented by the leaves of the tree. For each leaf node, the length of its code in the Huffman encoding is the length of the path from the root to the node. The code itself can be constructed the following way: for each internal node, consider the two edges from it to its children. Assign 0 to one of them and 1 to the other. The code of the character is then the sequence of 0s and 1s passed on the way from the root to the leaf node representing this character.
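To make the 0/1 labelling concrete, here is a minimal sketch that builds the tree exactly as described and reads each leaf's code off its root-to-leaf path. The function name `huffmanCodes` is hypothetical. Ties between equal marks are broken by node id, which is an assumption: other tie-breaks give different but equally valid codes with the same lengths.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds the Huffman tree, labels the edge to the first
// popped child '0' and the edge to the second popped child '1', then collects
// each leaf's path. Leaves are node ids 0..n-1; internal nodes get ids n, n+1, ...
std::vector<std::string> huffmanCodes(const std::vector<long long>& p) {
    const std::size_t n = p.size();
    if (n < 2) return std::vector<std::string>(n, "0");  // degenerate case, not in this problem
    std::vector<std::size_t> left(2 * n - 1), right(2 * n - 1);
    using Item = std::pair<long long, std::size_t>;  // (mark, node id)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> active;
    for (std::size_t i = 0; i < n; ++i) active.push({p[i], i});

    std::size_t next = n;  // id of the next internal node
    while (active.size() > 1) {
        Item a = active.top(); active.pop();  // smallest mark
        Item b = active.top(); active.pop();  // second smallest mark
        left[next] = a.second;
        right[next] = b.second;
        active.push({a.first + b.first, next});
        ++next;
    }

    // Iterative DFS from the root (the last node created), carrying the path so far.
    std::vector<std::string> code(n);
    std::vector<std::pair<std::size_t, std::string>> stack;
    stack.push_back({next - 1, ""});
    while (!stack.empty()) {
        std::size_t v = stack.back().first;
        std::string path = stack.back().second;
        stack.pop_back();
        if (v < n) {  // ids below n are leaves
            code[v] = path;
        } else {
            stack.push_back({left[v], path + '0'});
            stack.push_back({right[v], path + '1'});
        }
    }
    return code;
}
```

For the sample frequencies `{1, 1, 4}` this yields the codes `"00"`, `"01"` and `"1"`. The frequent character gets the 1-bit code.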

In this problem you are asked to determine the length of the text after it is encoded with the Huffman method. Since the code lengths depend only on the numbers of occurrences of the characters, the text itself is not given, only the number of occurrences of each character. Characters are given from the rarest to the most frequent.
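Read literally, the statement suggests this computation: build the tree, take each leaf's depth as its code length, and sum P_i times depth. This is the "path length times frequency" idea mentioned later in the post. Below is a minimal sketch of it, assuming a hypothetical function `huffmanLengthByDepth`. It is not the post's final code.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: builds the Huffman tree explicitly and returns
// sum(P_i * depth_i), i.e. the encoded length in bits.
// Leaves are nodes 0..n-1; each merge creates the next node id.
long long huffmanLengthByDepth(const std::vector<long long>& p) {
    const std::size_t n = p.size();
    if (n < 2) return 0;  // the problem guarantees n >= 2
    std::vector<std::size_t> parent(2 * n - 1, 0);
    using Item = std::pair<long long, std::size_t>;  // (mark, node id)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> active;
    for (std::size_t i = 0; i < n; ++i) active.push({p[i], i});

    std::size_t next = n;  // id of the next internal node
    while (active.size() > 1) {
        Item a = active.top(); active.pop();
        Item b = active.top(); active.pop();
        parent[a.second] = next;
        parent[b.second] = next;
        active.push({a.first + b.first, next});
        ++next;
    }

    // The root is node 2n-2 (depth 0). A parent always has a larger id than its
    // children, so walking ids downward visits each parent before its children.
    std::vector<long long> depth(2 * n - 1, 0);
    for (std::size_t v = 2 * n - 2; v-- > 0;) depth[v] = depth[parent[v]] + 1;

    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i] * depth[i];  // frequency * code length
    return total;
}
```

On the sample `{1, 1, 4}` this returns 8: the two 1s sit at depth 2 and the 4 at depth 1.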

Note that the alphabet used for the text is quite huge - it may contain up to 500 000 characters.

This problem contains multiple test cases!

The first line of a multiple input is an integer N, then a blank line followed by N input blocks. Each input block is in the format indicated in the problem description. There is a blank line between input blocks.

The output format consists of N output blocks. There is a blank line between output blocks.
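The multiple-input wrapper can be handled as sketched below. The helper names `readCases` and `writeAnswers` are hypothetical. The sketch relies on one standard fact: formatted extraction (`>>`, like `scanf`) skips all whitespace, blank lines included.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // for std::istringstream / std::ostringstream in the usage example
#include <vector>

// Reads "number of blocks, then for each block: N followed by N counts".
// Blank lines between blocks are skipped automatically by operator>>.
std::vector<std::vector<long long>> readCases(std::istream& in) {
    int blocks = 0;
    in >> blocks;
    std::vector<std::vector<long long>> cases;
    for (int b = 0; b < blocks; ++b) {
        int n = 0;
        in >> n;
        std::vector<long long> p(n);
        for (long long& x : p) in >> x;
        cases.push_back(p);
    }
    return cases;
}

// Output blocks are separated by one blank line: none before the first block
// and none after the last.
void writeAnswers(std::ostream& out, const std::vector<long long>& answers) {
    for (std::size_t i = 0; i < answers.size(); ++i) {
        if (i > 0) out << '\n';
        out << answers[i] << '\n';
    }
}
```

For example, reading `"1\n\n3\n1 1 4\n"` gives one case `{1, 1, 4}`. Writing the answers `{8, 19}` produces `"8\n\n19\n"`.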

Input

The first line of the input file contains N, the number of different characters used in the text (2 <= N <= 500 000). The second line contains N integers P_i, the number of occurrences of each character (1 <= P_i <= 10⁹, P_i <= P_{i+1} for all valid i).

Output

Output the length of the text after encoding it using Huffman method, in bits.

Sample Input

1

3
1 1 4

Sample Output

8

Author: Andrew Stankevich
Source: Andrew Stankevich's Contest #2
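A worked check of the sample shows where the 8 comes from. The two merges have marks 2 and 6. Equivalently, the two 1s get 2-bit codes and the 4 gets a 1-bit code:

```latex
\{1,1,4\} \xrightarrow{\;1+1=2\;} \{2,4\} \xrightarrow{\;2+4=6\;} \{6\},
\qquad 2 + 6 = 8 = 1\cdot 2 + 1\cdot 2 + 4\cdot 1
```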

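This problem was from the fifth team training contest held in preparation for the provincial contest. When we saw it, we did not think of the method from the Challenge book (《挑战程序设计竞赛》), which simply adds each merged weight as it goes. Instead, we computed each character's path length and multiplied it by its frequency. The Challenge book's method is clearly simpler. Our own code failed because of one detail: a variable that should have been declared long long was declared int.

Reflections:
1. Huffman coding: when studying priority queues in the Challenge book, I had already solved an essentially identical problem, POJ 3253. A senior teammate pointed this out to me after the contest; I had not made the connection at the time. During the contest I was still not actively working out a solution. In a contest you have to think proactively.
2. We got our first two problems accepted fairly early and may have become complacent, so we did not follow the scoreboard led by the third-year seniors. They kept attempting this problem and finally got it accepted, and only then did we come back to it. Meanwhile, we could not solve the other problems, even those with clear statements. By not following the scoreboard we wasted too much time on them. Had we followed it, I believe we could have solved this problem too.
3. I found the statement genuinely hard to understand. I read it twice without understanding it. Seeing that the third-year seniors were attempting it without getting it accepted, I assumed it was hard. Only when other teams solved it one after another did we panic. We then calmed down and studied the statement with my teammates, and it took a long time to settle what it was asking. Lesson: follow the scoreboard and read carefully. If a problem other teams are solving does not make sense on the first read, keep rereading it.
4. On reading statements: a problem in the previous contest also got away from us because we could not understand its statement. Someone else's blog post explains it in detail: link. The problem is not actually hard; the key is understanding the statement. In future contests I must settle down and read carefully.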
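The int-versus-long-long bug can be confirmed from the stated limits alone. The sketch below is a compile-time check; the constant names are ours, not from the problem or the original code. Two facts bound the answer: a fixed-length 19-bit code is a valid prefix code, and Huffman is optimal. So the answer never exceeds N · max P_i · 19.

```cpp
#include <climits>

// Limits copied from the problem statement (names are hypothetical).
constexpr long long kMaxN = 500000;      // 2 <= N <= 500 000
constexpr long long kMaxP = 1000000000;  // 1 <= P_i <= 10^9
// 2^18 = 262144 < 500000 <= 524288 = 2^19, so a fixed-length code needs 19 bits.
constexpr long long kFixedBits = 19;

// The root's mark is the sum of all P_i: up to 5 * 10^14, far beyond INT_MAX.
constexpr long long kMaxRootMark = kMaxN * kMaxP;
static_assert(kMaxRootMark > INT_MAX, "merged marks overflow a 32-bit int");

// Huffman never does worse than the fixed-length code, so the answer is at most
// 9.5 * 10^15, which fits comfortably in a 64-bit long long.
constexpr long long kAnswerBound = kMaxRootMark * kFixedBits;
static_assert(kAnswerBound < LLONG_MAX, "answer fits in long long");
```

Both the merged marks and the running answer therefore need `long long`. The same holds for the values read with `scanf`, which must use `%lld`.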

Problem summary: this is plain Huffman coding; output the total length of the encoded text in bits.
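Approach: page 155 of the "Little White Book" (小白书, 《算法竞赛入门经典》) has an encoding problem. Given the frequencies c_i of n characters, assign each character a 0/1 code string so that no character's code is a prefix of another character's code. The total encoded length (the sum over all characters of frequency times code length) must be as small as possible.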
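The two conditions in this formulation can be written down directly. The sketch below uses two hypothetical helpers. `encodedLength` computes the objective (frequency times code length, summed). `isPrefixFree` checks the prefix condition using a standard observation: after sorting, if one code is a prefix of another, it is also a prefix of its immediate successor.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Objective: sum over characters of frequency * code length.
long long encodedLength(const std::vector<long long>& freq,
                        const std::vector<std::string>& code) {
    long long total = 0;
    for (std::size_t i = 0; i < freq.size(); ++i)
        total += freq[i] * static_cast<long long>(code[i].size());
    return total;
}

// Prefix condition: sort the codes (taken by value), then compare each code
// with its lexicographic successor only.
bool isPrefixFree(std::vector<std::string> code) {
    std::sort(code.begin(), code.end());
    for (std::size_t i = 0; i + 1 < code.size(); ++i)
        // compare(0, len, s) == 0 means code[i + 1] starts with code[i]
        if (code[i + 1].compare(0, code[i].size(), code[i]) == 0) return false;
    return true;
}
```

For the sample, the codes `{"00", "01", "1"}` are prefix-free and give length 8. A set such as `{"0", "01", "1"}` is rejected because `"0"` is a prefix of `"01"`.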

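Huffman's algorithm: treat each character as a single-node subtree and put all of them into a set of trees (a forest). Each subtree's weight equals the frequency of its character. Repeatedly take the two trees with the smallest weights, merge them into a new tree, and put the new tree back into the set. The new tree's weight is the sum of the weights of its two subtrees.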
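This is a greedy algorithm: each time, merge the two items with the smallest frequencies and add the merged item back to the set. A priority queue implements this efficiently. There are O(N) operations costing O(log N) each, so the total time complexity is O(N log N).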
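The solution code below never computes depths. It adds the new node's mark l1+l2 at every merge. This works because each internal node's mark is the sum of the leaf frequencies beneath it. A leaf at depth d_i lies beneath exactly d_i internal nodes (its proper ancestors), so summing all internal marks counts each P_i exactly d_i times. Writing w(v) for the mark of node v:

```latex
\sum_{v\ \text{internal}} w(v)
  = \sum_{v\ \text{internal}} \; \sum_{i \in \mathrm{leaves}(v)} P_i
  = \sum_{i=1}^{N} P_i \cdot \#\{\text{internal ancestors of leaf } i\}
  = \sum_{i=1}^{N} P_i \, d_i
```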

Code:

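#include<cstdio>
#include<queue>
#include<cstring>
#include<algorithm>
using namespace std;
typedef long long ll;
const int maxn=500005;
int n;
ll L[maxn];
void solve()
{
    ll ans=0;   // must be long long: the answer can reach about 10^16
    priority_queue<ll,vector<ll>,greater<ll> >que;   // min-heap of active marks
    for(int i=1;i<=n;i++)
        que.push(L[i]);
    while(que.size()>1)   // loop until only one node (the root) remains
    {
        ll l1,l2;
        l1=que.top(); que.pop();   // smallest mark
        l2=que.top(); que.pop();   // second smallest mark
        ans+=l1+l2;                // each merge adds the new node's mark
        que.push(l1+l2);
    }
    printf("%lld\n",ans);
}
int main()
{
    int t;
    scanf("%d",&t);   // number of input blocks; scanf skips the blank lines
    while(t--)
    {
        scanf("%d",&n);
        for(int i=1;i<=n;i++)
        {
            scanf("%lld",&L[i]);
        }
        solve();
        if(t>0)
            printf("\n");   // blank line between output blocks, not after the last
    }
    return 0;
}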
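The input is guaranteed sorted (P_i <= P_{i+1}), so the heap can be replaced by the well-known two-queue variant of Huffman's algorithm. This is a swapped-in technique, not the method used in this post. Merged marks come out in nondecreasing order, so one FIFO holds the sorted leaves and a second holds the merged nodes. The smallest active mark is always at the front of one of them, which gives O(N) time. A minimal sketch, with the hypothetical name `huffmanLengthSorted`:

```cpp
#include <cstddef>
#include <vector>

// Two-queue Huffman for input sorted in nondecreasing order.
// Queue 1: the leaves p[head1..n). Queue 2: merged[head2..end), which is
// filled in nondecreasing order because each new merge is at least as large
// as the previous one.
long long huffmanLengthSorted(const std::vector<long long>& p) {
    const std::size_t n = p.size();
    std::vector<long long> merged;  // used as a FIFO via head2
    merged.reserve(n);
    std::size_t head1 = 0, head2 = 0;

    // Pop the smaller of the two queue fronts (leaf wins ties).
    auto takeMin = [&]() {
        if (head2 >= merged.size() || (head1 < n && p[head1] <= merged[head2]))
            return p[head1++];
        return merged[head2++];
    };

    long long total = 0;
    for (std::size_t step = 1; step < n; ++step) {  // n - 1 merges
        long long a = takeMin();
        long long b = takeMin();
        merged.push_back(a + b);
        total += a + b;  // each merge adds its mark, as in the heap version
    }
    return total;
}
```

The tie rule (leaf first) is one arbitrary choice. Any tie-break yields an optimal tree, so the total is the same. For example, `{1, 2, 3, 4}` gives 19 and three copies of 10^9 give 5 · 10^9, which already overflows a 32-bit int.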

0 0