Huffman Encoding and Decoding

Source: Internet  Published by: 毒理数据  Editor: 程序博客网  Time: 2024/05/21 22:58

I read this blog post, and this seems to be quite a decent interview question: Huffman encoding.

Huffman encoding is a classic way to encode characters. It builds an optimal binary tree from the character frequencies. High-frequency characters get short codes, while low-frequency characters can get longer ones.

Suppose we have the string "aaaaaabbbbccddd", with frequencies [a] = 6, [b] = 4, [c] = 2, [d] = 3. At each step, take the two nodes with the lowest frequencies, make them the children of a new node whose frequency is the sum of the two, and put the new node back into the pool.

Tree Construction:

1: The current nodes are 6, 4, 2, 3. The two smallest are 2 and 3, so the new root is 5.

2: The current nodes are 6, 5, 4. The two smallest are 4 and 5, so the new root is 9.

3: The current nodes are 6, 9. The two smallest are 6 and 9, so the new root is 15.

4: The current nodes are 15. There is only one node left, so the tree is complete.
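
The merge order above can be reproduced with a min-heap. The following is only a minimal sketch that works on bare weights rather than tree nodes; the helper name `mergeOrder` is made up for illustration and is not part of the original program.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the weight of every merged node,
// in the order in which the merges happen.
std::vector<int> mergeOrder(const std::vector<int>& freqs) {
  assert(!freqs.empty());
  // std::greater turns the priority queue into a min-heap,
  // so the smallest weight is always on top.
  std::priority_queue<int, std::vector<int>, std::greater<int>> pool;
  for (int f : freqs) pool.push(f);

  std::vector<int> merged;
  while (pool.size() > 1) {
    int a = pool.top(); pool.pop();  // smallest weight
    int b = pool.top(); pool.pop();  // second smallest weight
    merged.push_back(a + b);
    pool.push(a + b);                // the new node goes back into the pool
  }
  return merged;
}
```

Calling `mergeOrder({6, 4, 2, 3})` yields the weights 5, 9, 15, matching steps 1 to 3.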

        15
       /  \
    A:6    9
          / \
       B:4   5
            / \
         C:2   D:3

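To get the Huffman encoding, each character's code is the path from the root down to its leaf node, writing 0 for a left branch and 1 for a right branch. For the tree above: a = 0, b = 10, c = 110, d = 111.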
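
As a rough illustration of turning the tree into a reusable code table, here is a sketch that stores the codes in a map and encodes a whole string. The names `Node`, `collectCodes`, `buildCodes` and `encode`, the index-based tree, and the integer weights are all assumptions made for this example; they are not taken from the original program.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: children are indices into a vector, -1 means "none".
struct Node {
  char ch;
  int freq;
  int left;
  int right;
};

// Walks down from node i, appending '0' for a left step and '1' for a right step.
void collectCodes(const std::vector<Node>& nodes, int i, const std::string& prefix,
                  std::map<char, std::string>& codes) {
  if (nodes[i].left < 0 && nodes[i].right < 0) {
    codes[nodes[i].ch] = prefix;  // a single-symbol input gets the empty code
    return;
  }
  collectCodes(nodes, nodes[i].left, prefix + '0', codes);
  collectCodes(nodes, nodes[i].right, prefix + '1', codes);
}

std::map<char, std::string> buildCodes(const std::string& text) {
  assert(!text.empty());
  std::map<char, int> freq;
  for (char c : text) ++freq[c];

  std::vector<Node> nodes;
  using Entry = std::pair<int, int>;  // (weight, node index)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pool;
  for (const auto& kv : freq) {
    nodes.push_back({kv.first, kv.second, -1, -1});
    pool.push({kv.second, static_cast<int>(nodes.size()) - 1});
  }
  // Merge the two lightest nodes until a single root remains.
  while (pool.size() > 1) {
    Entry l = pool.top(); pool.pop();
    Entry r = pool.top(); pool.pop();
    nodes.push_back({0, l.first + r.first, l.second, r.second});
    pool.push({l.first + r.first, static_cast<int>(nodes.size()) - 1});
  }
  std::map<char, std::string> codes;
  collectCodes(nodes, pool.top().second, "", codes);
  return codes;
}

// Concatenates the code of every character in text.
std::string encode(const std::string& text, const std::map<char, std::string>& codes) {
  std::string bits;
  for (char c : text) bits += codes.at(c);
  return bits;
}
```

For "aaaaaabbbbccddd" the encoded string is 6·1 + 4·2 + 2·3 + 3·3 = 29 bits long, compared with 30 bits for a fixed 2-bit code over four symbols.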

To decode a Huffman bit sequence, start at the root: on a 0 go left, on a 1 go right. After reaching a leaf and decoding one character, go back to the root.
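
Because no code is a prefix of another, the same decoding can also be done without the tree, by growing a bit string until it matches a table entry. This is a different technique from the tree walk described above, shown only as a sketch; the function name `decodeWithTable` is hypothetical, and the table is assumed to be a = 0, b = 10, c = 110, d = 111 for the example string.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: decodes bits using a code -> character table.
// Relies on the table being prefix-free, which Huffman codes are.
std::string decodeWithTable(const std::string& bits,
                            const std::map<std::string, char>& table) {
  std::string out;
  std::string pending;  // bits read since the last decoded character
  for (char bit : bits) {
    pending += bit;
    auto it = table.find(pending);
    if (it != table.end()) {
      out += it->second;
      pending.clear();  // equivalent to jumping back to the root
    }
  }
  // Leftover bits mean the input was truncated or not produced by this table.
  assert(pending.empty());
  return out;
}
```

With that table, the bit string "010110111" decodes to "abcd".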

#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// Suppose we have the string "aaaaaabbbbccddd".
// The first step is to count character frequencies and build the Huffman tree.
struct TreeNode {
  char ch;
  double freq;
  TreeNode* lchild, *rchild;
  TreeNode(char c = 0, double f = 0, TreeNode* l = NULL, TreeNode* r = NULL)
      : ch(c), freq(f), lchild(l), rchild(r) {}
};

// Orders the priority queue as a min-heap on frequency.
struct cmp {
  bool operator()(const TreeNode* a, const TreeNode* b) const {
    return a->freq > b->freq;
  }
};

// Returns the root of the Huffman tree, or NULL for an empty string.
// Nodes are never freed; that is acceptable for a short demo only.
TreeNode* createTree(const string& str) {
  if (str.empty()) return NULL;
  unordered_map<char, int> charSet;
  for (size_t i = 0; i < str.size(); ++i) {
    charSet[str[i]]++;
  }
  priority_queue<TreeNode*, vector<TreeNode*>, cmp> que;
  for (unordered_map<char, int>::iterator p = charSet.begin(); p != charSet.end(); ++p) {
    que.push(new TreeNode(p->first, (double) p->second / str.size()));
  }
  // Repeatedly merge the two least frequent nodes until one root remains.
  while (que.size() > 1) {
    TreeNode* l = que.top(); que.pop();
    TreeNode* r = que.top(); que.pop();
    TreeNode* newNode = new TreeNode(0, l->freq + r->freq, l, r);
    que.push(newNode);
  }
  return que.top();
}

// Prints each character's code: '0' for a left edge, '1' for a right edge.
void encodeString(TreeNode* root, string code) {
  if (!root) return;
  if (!root->lchild && !root->rchild) {
    cout << root->ch << ": " << code << endl;
    return;
  }
  if (root->lchild) encodeString(root->lchild, code + '0');
  if (root->rchild) encodeString(root->rchild, code + '1');
}

// Walks the tree bit by bit and emits a character at every leaf.
string decodeString(TreeNode* root, const string& str) {
  string ret = "";
  // An empty tree or a lone leaf has no edges to follow.
  if (!root || (!root->lchild && !root->rchild)) return ret;
  TreeNode* p = root;
  for (size_t i = 0; i < str.size(); ++i) {
    p = ((str[i] == '0') ? p->lchild : p->rchild);
    if (p->lchild == NULL && p->rchild == NULL) {
      ret += p->ch;
      p = root;
    }
  }
  return ret;
}

int main(void) {
  TreeNode* root = createTree("aaaaaabbbbccddd");
  string code = "";
  encodeString(root, code);  // prints a: 0, b: 10, c: 110, d: 111 (one per line)
  string str = decodeString(root, "010110111");
  cout << str << endl;       // prints abcd
}
