今日头条 (Toutiao) 2017 Spring Recruitment R&D Written Test: Paragraph

Source: Internet | Published by: 中国微观数据库 | Editor: 程序博客网 | Time: 2024/05/13 14:24

(Note: the solution approaches come from the "今日头条校园" WeChat official account.)

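Problem: Given an English paragraph of n sentences and m queries, where each query is itself a sentence, find for each query the sentence in the paragraph that shares the most words with it. The sentences contain no punctuation, and matching is case-insensitive.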
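To make the matching criterion concrete, here is a minimal sketch of how the number of shared words between two sentences might be computed. It assumes that "shared words" means distinct words after lowercasing (which is how the solutions in this post treat it) and that words are separated by single spaces. The class and method names (`SharedWords`, `wordSet`, `countShared`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
class SharedWords {

    // Lowercases the sentence and splits it on single spaces;
    // repeated words collapse because the result is a set.
    static Set<String> wordSet(String sentence) {
        return new HashSet<>(Arrays.asList(sentence.toLowerCase().split(" ")));
    }

    // Number of distinct words that occur in both sentences (assumed metric).
    static int countShared(String a, String b) {
        Set<String> common = wordSet(a);
        common.retainAll(wordSet(b)); // keep only the words also present in b
        return common.size();
    }

    public static void main(String[] args) {
        // Shared words here are "bad", "ending" and "makes".
        System.out.println(countShared("A bad beginning makes a bad ending",
                                       "BAD ending makes sense")); // prints 3
    }
}
```

The answer to a query is then the paragraph sentence for which this count is largest.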

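Solution (method one): A simple approach is to precompute, for each sentence in the paragraph, the set of words it contains. For each query, look up every word of the query in each sentence's set, count the hits per sentence, and take the sentence with the maximum count.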

Java implementation:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class Solution {

    // Method one: keep one set of words for each sentence of the paragraph
    public List<Integer> getResult(String[] paragraph, String[] query) {
        // Put the words of each sentence into a HashSet, and collect the sets in a list
        List<HashSet<String>> paraSetList = new ArrayList<HashSet<String>>();
        for (int i = 0; i < paragraph.length; i++) {
            // Lowercase the paragraph as well, because matching is case-insensitive
            String[] wordSet = paragraph[i].toLowerCase().split(" ");
            HashSet<String> set = new HashSet<String>();
            for (int j = 0; j < wordSet.length; j++) {
                set.add(wordSet[j]);
            }
            paraSetList.add(set);
        }

        List<Integer> result = new ArrayList<Integer>();
        // For each query sentence
        for (int i = 0; i < query.length; i++) {
            String[] words = query[i].toLowerCase().split(" ");
            // The words of the query need de-duplicating too
            HashSet<String> setTemp = new HashSet<String>();
            for (int k = 0; k < words.length; k++) {
                setTemp.add(words[k]);
            }

            // Reset for every query, so a query with no match does not inherit the previous answer
            int maxIndex = 0;
            int maxCount = 0;
            // Scan the word set of every sentence and keep the index with the most matches
            for (int j = 0; j < paraSetList.size(); j++) {
                HashSet<String> set = paraSetList.get(j);
                int count = 0;
                for (String word : setTemp) {
                    if (set.contains(word)) {
                        count++;
                    }
                }
                if (count > maxCount) {
                    maxCount = count;
                    maxIndex = j;
                }
            }
            result.add(maxIndex);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] paragraph = {
            "A bad beginning makes a bad ending",
            "A fool may ask more questions in an hour than a wise man can answer in seven years",
            "A friend exaggerates a man virtue an enemy his crimes",
            "A good head and an industrious hand are worth gold in any land",
            "Always taking out of the meal and never putting in soon comes to the bottom"
        };
        String[] query = {
            "man of gold makes worth land seldom falls ending madness industrious",
            "An enemy idle youth exaggerates his friend a needy age",
            "bottom A poor man who taking a comes rich wife has never a ruler not a wife"
        };
        Solution solution = new Solution();
        List<Integer> result = solution.getResult(paragraph, query);
        for (int i = 0; i < result.size(); i++) {
            int index = result.get(i);
            System.out.println(paragraph[index]);
        }
    }
}

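Solution (method two): A faster approach is to maintain a hash map over all words that occur in the paragraph, recording for each word which sentences it appears in. For each query, iterate over its words, increment a counter for every paragraph sentence in which the word appears, and finally take the maximum over all counters.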

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class Solution {

    // Method two: for every word in the paragraph, a hash map records which sentences contain it
    public List<Integer> getResult(String[] paragraph, String[] query) {
        List<Integer> result = new ArrayList<Integer>();
        HashMap<String, HashSet<Integer>> map = new HashMap<String, HashSet<Integer>>();
        for (int i = 0; i < paragraph.length; i++) {
            String[] wordsList = paragraph[i].toLowerCase().split(" ");
            for (int j = 0; j < wordsList.length; j++) {
                if (map.containsKey(wordsList[j])) {
                    map.get(wordsList[j]).add(i);
                } else {
                    HashSet<Integer> set = new HashSet<Integer>();
                    set.add(i);
                    map.put(wordsList[j], set);
                }
            }
        }

        // For each query sentence
        for (int i = 0; i < query.length; i++) {
            String[] words = query[i].toLowerCase().split(" ");
            // Sentence index -> number of distinct query words found in that sentence
            HashMap<Integer, Integer> queryMap = new HashMap<Integer, Integer>();
            // De-duplicate the query words
            HashSet<String> querySet = new HashSet<String>();
            for (int j = 0; j < words.length; j++) {
                querySet.add(words[j]);
            }
            for (String word : querySet) {
                // Indices of the sentences that contain this word
                HashSet<Integer> paraSet = map.get(word);
                if (paraSet != null) {
                    for (Integer currentIndex : paraSet) {
                        if (queryMap.containsKey(currentIndex)) {
                            int count = queryMap.get(currentIndex);
                            queryMap.put(currentIndex, count + 1);
                        } else {
                            queryMap.put(currentIndex, 1);
                        }
                    }
                }
            }

            int max = 0;
            int index = 0;
            // Scan queryMap for the sentence index with the highest count
            for (Map.Entry<Integer, Integer> entry : queryMap.entrySet()) {
                if (entry.getValue() > max) {
                    max = entry.getValue();
                    index = entry.getKey();
                }
            }
            result.add(index);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] paragraph = {
            "A bad beginning makes a bad ending",
            "A fool may ask more questions in an hour than a wise man can answer in seven years",
            "A friend exaggerates a man virtue an enemy his crimes",
            "A good head and an industrious hand are worth gold in any land",
            "Always taking out of the meal and never putting in soon comes to the bottom"
        };
        String[] query = {
            "man of gold makes worth land seldom falls ending madness industrious",
            "An enemy idle youth exaggerates his friend a needy age",
            "bottom A poor man who taking a comes rich wife has never a ruler not a wife"
        };
        Solution solution = new Solution();
        List<Integer> result = solution.getResult(paragraph, query);
        for (int i = 0; i < result.size(); i++) {
            int index = result.get(i);
            System.out.println(paragraph[index]);
        }
    }
}

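(I'm not sure whether my method two code is right.... With my implementation, the time complexity of method two doesn't feel any lower than that of method one. Of course the code above can still be optimized: for example, when tallying which sentences each query word appears in, you could use an array whose index is the sentence number and whose value is the number of query words found in that sentence.)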
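The array-based optimization suggested in the note above could look roughly like the sketch below: it keeps the word-to-sentences hash map from method two, but tallies matches for each query in an `int[]` indexed by sentence number instead of a `HashMap<Integer, Integer>`. This is only one possible reading of the idea, not the official solution. The class name `ArrayCountSolution` is made up, and resolving ties (including a query with no match at all) to the lowest sentence index is an assumption, since the problem statement as quoted here does not say how ties should be handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class name; a sketch of the array-based counting idea.
class ArrayCountSolution {

    // For each query, returns the index of the paragraph sentence sharing the most distinct words.
    // Assumption: ties resolve to the lowest sentence index.
    static List<Integer> getResult(String[] paragraph, String[] query) {
        // Inverted index: lowercased word -> indices of the sentences containing it
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int i = 0; i < paragraph.length; i++) {
            for (String word : paragraph[i].toLowerCase().split(" ")) {
                index.computeIfAbsent(word, k -> new HashSet<>()).add(i);
            }
        }

        List<Integer> result = new ArrayList<>();
        for (String q : query) {
            // counts[s] = number of distinct query words that occur in sentence s
            int[] counts = new int[paragraph.length];
            Set<String> queryWords = new HashSet<>(Arrays.asList(q.toLowerCase().split(" ")));
            for (String word : queryWords) {
                Set<Integer> sentences = index.getOrDefault(word, Collections.emptySet());
                for (int s : sentences) {
                    counts[s]++;
                }
            }
            // Strict '>' keeps the lowest index when several sentences tie
            int best = 0;
            for (int s = 1; s < counts.length; s++) {
                if (counts[s] > counts[best]) {
                    best = s;
                }
            }
            result.add(best);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] paragraph = {
            "A bad beginning makes a bad ending",
            "A good head and an industrious hand are worth gold in any land"
        };
        String[] query = { "gold makes worth land", "BAD ending" };
        System.out.println(getResult(paragraph, query)); // prints [1, 0]
    }
}
```

With this layout each query costs time proportional to n (to allocate and scan the array) plus the total size of the sentence lists of its distinct words, whereas method one performs one set lookup per (query word, sentence) pair. Whether that is faster in practice depends on how many sentences share common words such as "a".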

0 0