Bitmaps: Lookup, Duplicate Detection, Sorting, and Data Storage

Source: Internet | Published by: 西交大网络圈存 | Editor: 程序博客网 | Date: 2024/06/01 09:30

See also Baidu Baike.

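1. Given 4 billion distinct, unsorted unsigned int values and then one more number, how can you quickly determine whether that number is among the 4 billion?

Allocate 512 MB of memory. Assuming a 32-bit unsigned int, there are 2^32 possible values, and 2^32 bits / 8 = 512 MB.

Let each bit represent one unsigned int value.

Read in the 4 billion numbers and set the corresponding bit for each.

Read in the number to look up and check its bit: 1 means the number is present, 0 means it is not.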
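The steps above can be sketched in Java roughly as follows. This is a minimal illustration, not a definitive implementation: the class name UIntBitmap and its methods are hypothetical, values are passed as long because Java has no unsigned 32-bit type, and the demo uses a small universe so it runs without a 512 MB heap.

```java
/** Hypothetical helper: one bit per value in the range [0, universeSize). */
final class UIntBitmap {
    private final long[] words;
    private final long universeSize;

    UIntBitmap(long universeSize) {
        if (universeSize <= 0 || universeSize > (1L << 32)) {
            throw new IllegalArgumentException("universeSize must be in 1..2^32");
        }
        this.universeSize = universeSize;
        // 64 bits per long; the full 2^32 universe needs 2^26 longs = 512 MB.
        this.words = new long[(int) ((universeSize + 63) >>> 6)];
    }

    /** Sets the bit for value (an unsigned 32-bit number held in a long). */
    void add(long value) {
        checkRange(value);
        words[(int) (value >>> 6)] |= 1L << (int) (value & 63);
    }

    /** Returns true if the bit for value is 1. */
    boolean contains(long value) {
        checkRange(value);
        return (words[(int) (value >>> 6)] & (1L << (int) (value & 63))) != 0;
    }

    private void checkRange(long value) {
        if (value < 0 || value >= universeSize) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
    }

    public static void main(String[] args) {
        // Small universe for the demo; pass 1L << 32 for the full problem.
        UIntBitmap bitmap = new UIntBitmap(1L << 20);
        long[] input = {5, 1_000_000, 42, 0};
        for (long v : input) {
            bitmap.add(v);
        }
        System.out.println(bitmap.contains(42)); // true
        System.out.println(bitmap.contains(43)); // false
    }
}
```

Building the bitmap takes one pass over the input; each lookup afterwards is a single array read and a mask.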

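2. Using the bitmap method to check an integer array for duplicates

Checking whether a collection contains duplicates is a common programming task. When the collection is large we want to scan it as few times as possible, which rules out the nested-loop approach.

The bitmap method suits this case. Find the largest element max, create a new array of length max+1, and then scan the original array again: for each value v, set element v of the new array to 1. For example, on seeing 5, set the sixth element (index 5) to 1. If a later 5 finds that element already set to 1, the current value must duplicate an earlier one. Because the new array starts out all zeros and entries are then flipped to 1, the approach resembles how a bitmap is handled, hence the name. In the worst case it takes about 2N operations: one pass to find the maximum and one pass to mark. If the maximum is known in advance, so the new array can be sized up front, the first pass disappears and the work is halved. The method as described assumes non-negative values, and the new array takes memory proportional to max.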

Sample code:

package com.sitinspring;

public class DuplicatedArrayTest {

    public static void main(String[] args) {
        int[][] arr = {
            { 1, 2, 3, 5, 3, 5, 56, 534, 3, 32 },
            { 1, 2, 3, 5 },
            { 1, 2, 3, 5, 3, 5 },
            { 0, 0, 1, 2, 3, 5, 56, 534, 78, 32 },
        };

        for (int i = 0; i < arr.length; i++) {
            System.out.print("Array: ");
            for (int temp : arr[i]) {
                System.out.print(temp + ", ");
            }
            System.out.print(hasDuplicatedItem(arr[i]) ? "contains" : "does not contain");
            System.out.print(" duplicate elements.\n");
        }
    }

    /**
     * Checks whether an int array contains duplicate values.
     * Runs in O(n + max) time and uses O(max) extra space.
     * Assumes every value is non-negative.
     *
     * @param arr the array to check
     * @return true if some value occurs more than once
     */
    public static boolean hasDuplicatedItem(int[] arr) {
        // Fewer than two elements cannot contain a duplicate
        if (arr.length < 2) {
            return false;
        }

        // Scan the array for the maximum value
        int max = arr[0];
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] > max) {
                max = arr[i];
            }
        }

        // Create a new array sized by the maximum value
        int[] bitArray = new int[max + 1];

        // Mark each value in the new array, e.g. value 3 sets bitArray[3] = 1
        for (int value : arr) {
            if (bitArray[value] != 0) {
                // Already marked, so this value was seen before: report a duplicate
                return true;
            } else {
                // Not marked yet: set it to 1 to record that this value has been seen
                bitArray[value] = 1;
            }
        }

        return false;
    }
}

Output:

Array: 1, 2, 3, 5, 3, 5, 56, 534, 3, 32, contains duplicate elements.

Array: 1, 2, 3, 5, does not contain duplicate elements.

Array: 1, 2, 3, 5, 3, 5, contains duplicate elements.

Array: 0, 0, 1, 2, 3, 5, 56, 534, 78, 32, contains duplicate elements.
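The sample above spends a whole int on each flag. As a hedged variation, the same check can use one bit per value with the standard java.util.BitSet, which grows on demand, so the separate pass to find the maximum is not needed. The class name BitSetDuplicateCheck and the method hasDuplicate are hypothetical, and the sketch still assumes non-negative values.

```java
import java.util.BitSet;

/** Hypothetical variant of the duplicate check that uses one bit per value. */
final class BitSetDuplicateCheck {

    /** Returns true if any value occurs more than once. Assumes values >= 0. */
    static boolean hasDuplicate(int[] arr) {
        BitSet seen = new BitSet(); // grows automatically as higher bits are set
        for (int value : arr) {
            if (value < 0) {
                throw new IllegalArgumentException("negative value: " + value);
            }
            if (seen.get(value)) {
                return true; // bit already set: value was seen earlier
            }
            seen.set(value);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(new int[] {1, 2, 3, 5}));       // false
        System.out.println(hasDuplicate(new int[] {1, 2, 3, 5, 3, 5})); // true
    }
}
```

Memory use is roughly max/8 bytes instead of 4*(max+1) bytes, at the cost of a little bit arithmetic inside BitSet.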

3. Using the bitmap method to sort an integer array

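package com.heyang;

public class BitmapSorter {

    public static void main(String[] args) {
        int[] arr = { 1, 7, -3, 0, 0, 6, 6, 9, -11 };
        bitmapSort(arr);
        for (int i : arr) {
            System.out.print(i + " , ");
        }
    }

    /**
     * Sorts the array in place using the bitmap method.
     * Each slot holds a count rather than a single bit, so duplicates are kept
     * (in effect a counting sort). Assumes max - min + 1 fits in an int.
     *
     * @param arr the array to sort
     */
    public static void bitmapSort(int[] arr) {
        // Nothing to sort
        if (arr.length == 0) {
            return;
        }

        // Find the minimum and maximum values in the array
        int max = arr[0];
        int min = max;
        for (int i : arr) {
            if (max < i) {
                max = i;
            }
            if (min > i) {
                min = i;
            }
        }

        // Build the bitmap array; index 0 corresponds to min
        int[] newArr = new int[max - min + 1];
        for (int i : arr) {
            int index = i - min;
            newArr[index]++;
        }

        // Write the values back into arr in ascending order
        int index = 0;
        for (int i = 0; i < newArr.length; i++) {
            while (newArr[i] > 0) {
                arr[index] = i + min;
                index++;
                newArr[i]--;
            }
        }
    }
}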
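When the values are known to be distinct, a single bit per value is enough, and the sort becomes a bitmap sort in the strict sense. The following is a minimal sketch under that assumption, using the standard java.util.BitSet; the class name DistinctBitmapSort is hypothetical, and the code assumes max - min is small enough to index a BitSet.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical one-bit-per-value sort for arrays of distinct ints. */
final class DistinctBitmapSort {

    /** Sorts arr in place. Assumes distinct values and a modest value range. */
    static void sort(int[] arr) {
        if (arr.length == 0) {
            return;
        }
        // Offset by the minimum so that negative values map to bit 0 and up
        int min = arr[0];
        for (int v : arr) {
            min = Math.min(min, v);
        }
        BitSet bits = new BitSet();
        for (int v : arr) {
            int offset = v - min;
            if (bits.get(offset)) {
                throw new IllegalArgumentException("duplicate value: " + v);
            }
            bits.set(offset);
        }
        // Walk the set bits in ascending order and write the values back
        int index = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            arr[index++] = i + min;
        }
    }

    public static void main(String[] args) {
        int[] arr = {7, -3, 0, 6, 9, -11, 1};
        sort(arr);
        System.out.println(Arrays.toString(arr)); // [-11, -3, 0, 1, 6, 7, 9]
    }
}
```

Rejecting duplicates with an exception is a design choice for this sketch; input with repeated values needs per-value counts, as in BitmapSorter above.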

4. Using the bitmap method to store data

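Given 8 KB of memory, how can unsigned short values be stored?

The usual approach:

Define an array: unsigned short arrNormal[4096];

This holds at most 4K unsigned short values.

With the bitmap method:

Define an array: unsigned char arrBit[8192];

This provides 8K*8 = 64K bits, one for each possible unsigned short value (assuming a 16-bit unsigned short), so it can record which of the 64K values are present. It stores a set: duplicates and insertion order are not kept.

Each value maps to a byte position and a bit position in arrBit (bytes 0~8191, bits 0~7).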

For example, to write 1234: the byte index is 1234/8 = 154 and the bit index is 1234 & 0b111 = 2, so 1234 lives in byte 154 of arrBit, and bit 2 (of bits 0~7) of that byte is set to 1.

int nBytePos = 1234 / 8;  // byte position: 154
int nBitPos = 1234 & 7;   // bit position: 2

// Set bit 2 of byte 154 to 1
unsigned char val = 1 << nBitPos;
arrBit[nBytePos] = arrBit[nBytePos] | val;  // after writing 1234, arrBit[154] = 0b00000100

Now write 1236 as well:

nBytePos = 1236 / 8;  // byte position: 154
nBitPos = 1236 & 7;   // bit position: 4

// Set bit 4 of byte 154 to 1
val = 1 << nBitPos;
arrBit[nBytePos] = arrBit[nBytePos] | val;  // after also writing 1236, arrBit[154] = 0b00010100

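Reading elements: scan arrBit bit by bit and take the byte position and bit position of every bit that is 1. The element value is 8*nBytePos + nBitPos.

for (int i = 0; i < 8192; i++)
{
    for (int j = 0; j < 8; j++)
    {
        if (arrBit[i] & (1 << j))
        {
            cout << "arrBit:" << i << " " << j << " " << 8 * i + j << endl;
        }
    }
}

This prints:

arrBit:154 2 1234

arrBit:154 4 1236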

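Deleting an element: compute the byte position and bit position of the element to delete, then clear that bit: arrBit[nBytePos] &= ~(1 << nBitPos);

For example, to delete 1234: arrBit[154] &= ~(1 << 2);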
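The write, read, and delete operations above can be collected into one small type. The following Java sketch is illustrative only: the class name UShortBitSet and its methods are hypothetical, and because Java has no unsigned short, values are passed as int and checked against the range 0~65535.

```java
/** Hypothetical 8 KB set of 16-bit unsigned values, one bit per value. */
final class UShortBitSet {
    private final byte[] arrBit = new byte[8192]; // 8192 bytes * 8 = 65536 bits

    /** Sets the bit for value: byte index value / 8, bit index value & 7. */
    void add(int value) {
        check(value);
        arrBit[value >>> 3] |= 1 << (value & 7);
    }

    /** Returns true if the bit for value is 1. */
    boolean contains(int value) {
        check(value);
        return (arrBit[value >>> 3] & (1 << (value & 7))) != 0;
    }

    /** Clears the bit for value. */
    void remove(int value) {
        check(value);
        arrBit[value >>> 3] &= ~(1 << (value & 7));
    }

    private static void check(int value) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
    }

    public static void main(String[] args) {
        UShortBitSet set = new UShortBitSet();
        set.add(1234);
        set.add(1236);
        System.out.println(set.contains(1234)); // true
        set.remove(1234);
        System.out.println(set.contains(1234)); // false
        System.out.println(set.contains(1236)); // true
    }
}
```

The shift value >>> 3 is the same as dividing by 8 for the non-negative values accepted here.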
