Intersection of Two Arrays II

Source: Internet  Editor: 程序博客网  Date: 2024/05/22 03:30

Given two arrays, write a function to compute their intersection.
Example:
Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2, 2].
Note:

Each element in the result should appear as many times as it shows in both arrays.
The result can be in any order.

Follow up:

What if the given array is already sorted? How would you optimize your algorithm?
What if nums1's size is small compared to nums2's size? Which algorithm is better?
What if elements of nums2 are stored on disk, and the memory is limited such that you cannot load all elements into the memory at once?

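1. My analysis
This is similar to the earlier problem of finding the intersection of two arrays, except that here the result must contain every repeated common element, so the deduplication step can be dropped.
Ideas: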

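Use two nested loops to find equal elements directly, and save each match to the result array.
Sort one of the arrays, nums2, then use binary search to identify the common elements and save them to the result array.
Build a hash table, map the elements of nums2 into it, and then look up the common elements in the hash table.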
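The first idea can be sketched roughly as follows. This is a minimal illustration rather than code from the original post: the function name intersectBruteForce and the used flags are assumptions added here so that one element of nums2 is never matched twice.

```cpp
#include <cstddef>
#include <vector>

// Brute-force sketch (hypothetical helper, not from the original post):
// for each element of nums1, look for an equal element of nums2 that has
// not been matched yet. Runs in O(n * m) time.
std::vector<int> intersectBruteForce(const std::vector<int>& nums1,
                                     const std::vector<int>& nums2) {
    std::vector<int> result;
    // used[j] is true once nums2[j] has been paired with an element of nums1.
    std::vector<bool> used(nums2.size(), false);
    for (int x : nums1) {
        for (std::size_t j = 0; j < nums2.size(); ++j) {
            if (!used[j] && nums2[j] == x) {
                used[j] = true;       // consume this copy in nums2
                result.push_back(x);
                break;                // match each element of nums1 at most once
            }
        }
    }
    return result;
}
```

Without the used flags, a value that occurs once in nums2 would be reported once for every copy in nums1.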

2. My solutions
(1)

vector<int> intersect(vector<int>& nums1, vector<int>& nums2)
{
    vector<int> ret;
    sort(nums2.begin(), nums2.end());
    for (int i = 0; i < nums1.size(); ++i)
    {
        if (binary_search(nums2.begin(), nums2.end(), nums1[i]))
            ret.push_back(nums1[i]);
    }
    return ret;
}

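The code above does not pass all test cases: when the two arrays are [1,2,2,1] and [2], it returns [2,2], while the expected result is [2]. binary_search only reports whether a value exists in nums2, not how many copies remain, so the algorithm has to be improved to avoid this.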
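One way the binary-search idea could be repaired, which is not from the original post, is to sort both arrays and compare occurrence counts: for each distinct value in nums1, find its run in nums2 with std::equal_range and emit the smaller of the two counts. The function name intersectBinarySearch is hypothetical, and the sketch takes its arguments by value so the caller's arrays are not reordered.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch under stated assumptions: a counting variant of the binary-search
// approach. Both inputs are copied and sorted locally.
std::vector<int> intersectBinarySearch(std::vector<int> nums1,
                                       std::vector<int> nums2) {
    std::sort(nums1.begin(), nums1.end());
    std::sort(nums2.begin(), nums2.end());
    std::vector<int> result;
    auto it = nums1.begin();
    while (it != nums1.end()) {
        const int value = *it;
        // [it, runEnd) is the run of copies of `value` in nums1.
        auto runEnd = std::upper_bound(it, nums1.end(), value);
        // Binary search for the run of copies of `value` in nums2.
        auto range = std::equal_range(nums2.begin(), nums2.end(), value);
        // The value belongs to the result min(count1, count2) times.
        auto copies = std::min(runEnd - it, range.second - range.first);
        result.insert(result.end(), static_cast<std::size_t>(copies), value);
        it = runEnd;  // jump to the next distinct value
    }
    return result;
}
```

For [1,2,2,1] and [2] this yields a single 2, because the count in nums2 caps the number of copies emitted.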

(2)

vector<int> intersect3(vector<int>& nums1, vector<int>& nums2)
{
    vector<int> ret;
    if (0 == nums1.size() || 0 == nums2.size())
        return ret;
    sort(nums1.begin(), nums1.end());
    sort(nums2.begin(), nums2.end());
    for (int i = 0, j = 0; i < nums1.size() && j < nums2.size(); )
    {
        if (nums1[i] > nums2[j])
            ++j;
        else if (nums1[i] < nums2[j])
            ++i;
        else
        {
            ret.push_back(nums1[i]);
            ++i;
            ++j;
        }
    }
    return ret;
}

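This method first sorts both arrays and then walks them with two pointers to find the common elements. When the two current elements differ, the pointer at the smaller one advances; when they are equal, the element is recorded and both pointers advance, so each copy is matched only once.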

3. Reference solutions
(1)

vector<int> intersect2(vector<int>& nums1, vector<int>& nums2)
{
    sort(nums1.begin(), nums1.end());
    sort(nums2.begin(), nums2.end());
    vector<int> result;
    set_intersection(nums1.begin(), nums1.end(), nums2.begin(), nums2.end(), back_inserter(result));
    return result;
}

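This solution uses set_intersection from the STL, which makes the code both efficient and very concise. On sorted ranges, if a value appears m times in the first range and n times in the second, set_intersection copies min(m, n) of them, which is exactly what this problem asks for.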

(2)

vector<int> intersect(vector<int>& nums1, vector<int>& nums2)
{
    unordered_map<int, int> dict;
    vector<int> res;
    for (int i = 0; i < nums1.size(); i++)
        ++dict[nums1[i]];
    for (int i = 0; i < nums2.size(); i++)
        if (dict.find(nums2[i]) != dict.end() && --dict[nums2[i]] >= 0)
            res.push_back(nums2[i]);
    return res;
}

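This method uses a hash table to identify the common elements. The key point is that each element value serves as the key, and dict acts as a counter for each value. Also note that unordered_map has only been available since the C++11 standard.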
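For the follow-up about nums1 being much smaller than nums2, a plausible variation of the hash-table approach is to always build the counter from the smaller array, so the table stays small while the larger array is only scanned once. The sketch below is an illustration under that assumption; the name intersectSmallFirst is hypothetical and does not come from the original post.

```cpp
#include <unordered_map>
#include <vector>

// Sketch (hypothetical helper): count the smaller array, scan the larger one.
// Extra memory is proportional to the number of distinct values in the
// smaller array.
std::vector<int> intersectSmallFirst(const std::vector<int>& nums1,
                                     const std::vector<int>& nums2) {
    // Swap roles so that the first argument is never the larger array.
    if (nums1.size() > nums2.size())
        return intersectSmallFirst(nums2, nums1);

    std::unordered_map<int, int> counts;
    for (int x : nums1)
        ++counts[x];

    std::vector<int> result;
    for (int x : nums2) {
        auto it = counts.find(x);
        // Only emit x while unmatched copies remain in the smaller array.
        if (it != counts.end() && it->second > 0) {
            --it->second;
            result.push_back(x);
        }
    }
    return result;
}
```

Checking the count before decrementing keeps the counters from going negative, which is a small stylistic difference from the reference solution rather than a change in the result.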

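4. Summary
Perhaps because I was fixated from the start on using binary search to fix the problem described earlier, I never came up with a binary-search-based solution. Only after reading other people's solutions did I realize that two pointers should traverse both arrays at the same time, with the precondition that both arrays are sorted. The second method uses a hash table. I had long been frustrated at not being able to apply a hash table to problems like this, and it finally clicked here: simply use the element value itself as the key; there is no need to compute a key yourself.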

Below are the reference approaches given on LeetCode for the follow-up questions in the problem:

If only nums2 cannot fit in memory, put all elements of nums1 into a HashMap, read chunks of array that fit into the memory, and record the intersections.
If both nums1 and nums2 are so huge that neither fits into the memory, sort them individually (external sort), then read 2 elements from each array at a time in memory, record intersections.
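The first suggestion can be sketched as below. This is only an illustration under stated assumptions: nums2 is assumed to be stored as whitespace-separated integers readable through a std::istream, and the names intersectStreamed and chunkSize are hypothetical. A real implementation would depend on the actual file format.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the sketch without a file
#include <unordered_map>
#include <vector>

// Sketch: nums1 fits in memory and is counted in a hash map; nums2 is read
// from `in` in chunks of at most `chunkSize` values (assumed text format).
std::vector<int> intersectStreamed(const std::vector<int>& nums1,
                                   std::istream& in,
                                   std::size_t chunkSize) {
    if (chunkSize == 0)
        chunkSize = 1;  // guard against a chunk that can never fill

    std::unordered_map<int, int> counts;
    for (int x : nums1)
        ++counts[x];

    std::vector<int> result;
    std::vector<int> chunk;
    chunk.reserve(chunkSize);
    for (;;) {
        // Load the next chunk; only this many values of nums2 are held at once.
        chunk.clear();
        int value;
        while (chunk.size() < chunkSize && (in >> value))
            chunk.push_back(value);
        if (chunk.empty())
            break;  // nothing left to read

        // Record intersections for this chunk, consuming counts from nums1.
        for (int x : chunk) {
            auto it = counts.find(x);
            if (it != counts.end() && it->second > 0) {
                --it->second;
                result.push_back(x);
            }
        }
    }
    return result;
}
```

For a quick check, a std::istringstream holding "2 2 3 1" can stand in for the file; with nums1 = [1, 2, 2, 1] and a chunk size of 2, the values 2, 2 and 1 are matched in stream order.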

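PS:

The Chinese translation of the problem statement in the original post is my own; corrections are welcome.
"My analysis" and "My solutions" record my initial thoughts and attempts. They are not necessarily correct and are kept only for comparison and as a record.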

0 0