SSD5 option 4

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 22:21


Spellchecking

Prerequisites, Goals, and Outcomes

Prerequisites: Students should have mastered the following prerequisite skills.

  • Hash Tables - Understanding of the concept of a recursive function

  • Strings - Basic string handling skills

  • Inheritance - Enhancing an existing data structure through specialization

    Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers.

    Outcomes: Students successfully completing this assignment would master the following outcomes.

  • Understand how to use hash tables, specifically hash sets

    Background

    Any word processing application will typically contain a spell check feature. Not only does this feature point out potentially misspelled words; it also suggests possible corrections.

    Description

    The program to be completed for this assessment is a spell checker. Below is a screen shot of the program in execution. The program begins by opening a word list text file, specified by a command line parameter. The program outputs an error message and terminates if it cannot open the specified word list text file. A sample word list text file (wordlist.txt) is given in the supplied wordlist.zip archive. After successfully opening the specified word list text file, the program then stores each word into a hash table.

    The spell check program in action

    The program then opens a file to spell check. The user specifies this file through the command line. After opening this file, the program then compares each word in the file against the words stored in the hash table. The program considers a word to be misspelled if the word does not exist in the hash table. When this occurs, the program displays the line number the word appeared in, the word, and a list of possible corrections.
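The checking pass described above can be sketched as follows. This is only an illustration under stated assumptions, not the handout code: std::unordered_set stands in for the assignment's HashSet, and the helper name find_misspelled is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: returns (line number, word) for every word of `text`
// that is not in `dict`. std::unordered_set stands in for the assignment's
// HashSet; words are compared exactly as they appear (no case folding).
std::vector<std::pair<int, std::string>> find_misspelled(
        std::istream& text, const std::unordered_set<std::string>& dict) {
    std::vector<std::pair<int, std::string>> result;
    std::string line;
    int line_number = 0;
    while (std::getline(text, line)) {      // one iteration per input line
        ++line_number;
        std::istringstream words(line);
        std::string word;
        while (words >> word) {             // whitespace-separated tokens
            if (dict.find(word) == dict.end()) {
                result.push_back(std::make_pair(line_number, word));
            }
        }
    }
    return result;
}
```

A real checker would also lowercase each word and strip trailing punctuation before the lookup, as the solution in main.cpp does.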

    The list of possible corrections for a misspelled word is generated using a simple algorithm. Any variation of a misspelled word that is itself a word (i.e. it is found in the word list file) is a possible correction. Your solution to this assessment should consider the following variations of a misspelled word.

    · Transposing of adjacent letters

    For the misspelled word "acr", transposing adjacent letters yields the possible corrections of "car" and "arc".

    · Removal of each letter

    For example, removing each letter from the misspelled word "boaot" yields only the possible correction of "boat". Removing letters other than the second "o" does not generate a correctly spelled word.

    · Replacement of each letter

    For each character in a misspelled word, the program should check if the replacement with any letter generates a correctly spelled word. For the misspelled word "acr", replacing the "c" with an "i" yields "air", replacing the "r" with an "e" yields "ace", and so on.

    · Inserting any letter at any position in a word

    The program should consider if inserting any letter at any position in a misspelled word generates a correctly spelled word. For the misspelled word "acr", inserting an "e" after the "r" yields "acre".
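The four variations can be sketched as one candidate generator. This is a minimal sketch, assuming lowercase a-z words only; the function name one_edit_variations is hypothetical, and std::set is used here simply to collapse duplicate candidates.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: every string one edit away from `word`, using the
// four variations described above. Lowercase a-z only (an assumption).
std::set<std::string> one_edit_variations(const std::string& word) {
    std::set<std::string> out;
    const std::string letters = "abcdefghijklmnopqrstuvwxyz";
    const std::size_t n = word.size();

    for (std::size_t i = 0; i + 1 < n; ++i) {       // transpose neighbours
        std::string v = word;
        std::swap(v[i], v[i + 1]);
        out.insert(v);
    }
    for (std::size_t i = 0; i < n; ++i) {           // remove one letter
        std::string v = word;
        v.erase(i, 1);
        out.insert(v);
    }
    for (std::size_t i = 0; i < n; ++i) {           // replace one letter
        for (char c : letters) {
            std::string v = word;
            v[i] = c;
            out.insert(v);
        }
    }
    for (std::size_t i = 0; i <= n; ++i) {          // insert, incl. at the end
        for (char c : letters) {
            std::string v = word;
            v.insert(i, 1, c);
            out.insert(v);
        }
    }
    return out;
}
```

Each candidate is then looked up in the word list, and only the ones found are reported as corrections. Note that the insertion loop runs up to and including the word's length; otherwise "acre" would never be produced from "acr". The result also contains the original word itself, since replacing a letter with the same letter is one of the replacements.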

    Files

    Following is a list of files needed to complete this assessment.

  • handout-files.zip contains all of the following necessary files:
    • main.cpp - This file contains the main routine.
    • hashset.h - This declares a hash set class.
    • hashset.cpp - This defines a hash set class.
    • dictionary.h - This file contains the partial definition of class Dictionary. Class Dictionary inherits from class HashSet.
    • wordlist.zip - This file is an archive that contains a word list text file.
    • test.txt - This is a sample text file that contains spelling errors.

Tasks

To complete this assessment, you need to complete the implementation of class Dictionary and complete the spell checking program contained in main.cpp.

To begin, verify the files needed for this assessment.

  1. Extract the archive to retrieve the files needed to complete this assessment.

Following is an ordered list of steps that serves as a guide to completing this assessment. Work and test incrementally. Save often.

  1. Begin by completing the definition of class Dictionary. Class Dictionary must provide a constructor that accepts a single string as a parameter. This parameter is the file name of the word list text file. This constructor must place all the words contained in the text file into the dictionary. Remember, class Dictionary is a type of HashSet, so use the inherited methods accordingly.
  2. Next, complete the hash function encapsulated in class hash_function in dictionary.h.
  3. Then, finish the implementation of function check_spelling. This function already contains code that reads a file line by line. It also extracts each word from a line using an instance of class stringstream. Your task is to check the spelling of each word. Use the inherited search function of class Dictionary to determine if a word exists in the dictionary. If the word exists in the dictionary, assume that it is spelled correctly. If it does not exist, assume it is misspelled. For each misspelled word, generate and display a list of possible corrections.
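For step 2, one common shape for a string hash is a multiply-and-add loop over the characters. The sketch below is an assumption for illustration, not the handout's required answer: the name PolyHash and the multiplier 31 are hypothetical choices, and any functor that takes a string and returns an unsigned value would fit the same slot.

```cpp
#include <string>

// Hypothetical functor with the call shape HashSet expects: it takes a
// string and returns an unsigned value, which the table then reduces
// modulo its current size.
struct PolyHash {
    unsigned int operator()(const std::string& s) const {
        unsigned int h = 0;
        for (std::string::size_type i = 0; i < s.size(); ++i) {
            // Multiply-and-add; unsigned overflow wraps, which is well defined.
            h = h * 31u + static_cast<unsigned char>(s[i]);
        }
        return h;
    }
};
```

Because each character is weighted by its position, anagrams such as "ab" and "ba" hash to different values, which a plain sum of character codes would not achieve.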

Submission

Submit only the following.

  1. dictionary.h - your completed class Dictionary definition
  2. dictionary.cpp - if created
  3. main.cpp - your completed spell checker program

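Problem summary:

You are given a word list to store. Then you are given a passage of text, and you must find the misspelled words in it and offer reasonable suggestions for each.

First, hashset.h: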

// template hash set class
#ifndef  _HASHSET_H_
#define  _HASHSET_H_

#include <iostream>
#include <vector>
#include <algorithm>
#include <stdexcept>

using namespace std;

// we do not compute prime numbers but use a table instead
static const int num_primes = 25;
static const unsigned long prime_list[] = {
    53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
    196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
    50331653, 100663319, 201326611, 402653189, 805306457
};

template <typename key_type, typename hash_func, typename key_equal>
class HashSet {
protected:
    // hashtable entries
    class Entry {
    public:
        key_type key;
        bool used;
        Entry() : used(false) {}
    };

    int entries;      // number of entries
    int prime;        // index to size table
    vector<Entry> *ht;
    hash_func hf;        // hash function on key_type
    key_equal eq;        // equality predicate on key_type

    int table_size() const { return prime_list[prime]; }
    float load_factor() const { return float(size()) / table_size(); }
    int resize();

public:
    HashSet()
        : entries(0), prime(0), ht(new vector<Entry>(prime_list[0])) {}
    virtual ~HashSet() {
        delete ht;
    }

    virtual int size() const { return entries; }
    virtual bool search(const key_type& k);
    virtual void insert(const key_type& k);
    virtual void remove(const key_type& k);
};

#endif

Next, hashset.cpp:

#include  "hashset.h"
#include  <cstdlib>   // exit

using namespace std;

template <typename key_type, typename hash_func, typename key_equal>
bool HashSet<key_type, hash_func, key_equal>::search(const key_type& k) {
    int p = hf(k) % table_size();
    while ((*ht)[p].used) {
        if (eq((*ht)[p].key, k)) {       // equality predicate for key_type
            return true;
        }
        p++;
        if (p == table_size()) {
            p = 0;  // wrap around to beginning
        }
    }
    return false;
}

template <typename key_type, typename hash_func, typename key_equal>
void HashSet<key_type, hash_func, key_equal>::remove(const key_type& k) {
    int p = hf(k) % table_size();
    while ((*ht)[p].used) {
        if (eq((*ht)[p].key, k)) {
            // Note: clearing the slot can cut a probe chain, so keys that
            // collided past this slot may no longer be found by search().
            (*ht)[p].used = false;
            entries--;
            break;
        }
        p++;
        if (p == table_size()) {
            p = 0;  // wrap around to beginning
        }
    }
}

template <typename key_type, typename hash_func, typename key_equal>
void HashSet<key_type, hash_func, key_equal>::insert(const key_type& k) {
    if (load_factor() > .7) {
        resize();
    }
    int pp = hf(k) % table_size();
    int p = pp;
    // Linear probing: scan forward for a free slot, wrapping around once.
    while (p < table_size() && (*ht)[p].used) {
        p++;
    }
    if (p == table_size()) {
        p = 0;
    }
    while ((*ht)[p].used) {
        p++;
    }
    (*ht)[p].key = k;
    (*ht)[p].used = true;
    entries++;
}

template <typename key_type, typename hash_func, typename key_equal>
int HashSet<key_type, hash_func, key_equal>::resize() {
    if (prime == num_primes - 1) {
        cerr << "maximal table size reached, aborting ... " << endl;
        exit(2);
    }
    int mm = prime_list[prime];
    prime++;
    int m = prime_list[prime];
    vector<Entry>* ptr = new vector<Entry>(m);
    // Rehash every used entry of the old table into the new, larger one.
    for (int i = 0; i < mm; ++i) {
        if ((*ht)[i].used) {
            key_type kk = (*ht)[i].key;
            int p = hf(kk) % m;
            while (p < m && (*ptr)[p].used) {
                p++;
            }
            if (p == m) {
                p = 0;
            }
            while ((*ptr)[p].used) {
                p++;
            }
            (*ptr)[p].key = kk;
            (*ptr)[p].used = true;
        }
    }
    delete ht;
    ht = ptr;
    return m;
}

dictionary.h

#ifndef  _DICTIONARY_H_
#define  _DICTIONARY_H_

#include  <iostream>
#include  <fstream>     // ifstream
#include  <cstdlib>     // exit, EXIT_FAILURE
#include  <vector>
#include  <list>
#include  <algorithm>
#include  <string>
#include  "hashset.h"
#include  "hashset.cpp"

using namespace std;

class hash_function
{
public:
    hash_function() {}

    // Multiply-and-add string hash with multiplier 131, masked to 20 bits.
    unsigned int operator()( const string& s )  const
    {
        unsigned int seed = 131;
        unsigned int hash = 0;
        for (string::size_type i = 0; i < s.size(); i++)
        {
            hash = hash * seed + s[i];
        }
        return (hash & 0xFFFFF);
    }
};

class equality
{
public:
    equality() {}
    bool  operator()( const string& A, const string& B )  const
    {
        return  (A == B);
    }
};

class Dictionary: public HashSet<string, hash_function, equality> {
public:
    // Loads every word of the word list file into the set.
    Dictionary(const string& file)
    {
        ifstream in(file.c_str());
        if ( !in )
        {
            // The assignment requires an error message and termination here.
            cerr << "Cannot open the word list file " << file << endl;
            exit(EXIT_FAILURE);
        }
        string str;
        // Testing the extraction itself stops cleanly at end of file;
        // a loop on !in.eof() can insert the last word twice.
        while ( in >> str )
        {
            insert(str);
        }
    }
};

#endif

main.cpp

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <cctype>
#include "dictionary.h"

using namespace std;

void lower ( string& s );
string strip_punct(const string& s);
void check_spelling(ifstream& in, Dictionary& dict);

int main(int argc, char* argv[]) {
    // Output usage message if improper command line args were given.
    if (argc != 3) {
        cerr << "Usage: " << argv[0] << " wordlist_filename input_file\n";
        return EXIT_FAILURE;
    }

    ifstream inf(argv[2]);
    if (! inf) {
        cerr << "Could not open " << argv[2] << "\n";
        return EXIT_FAILURE;
    }

    // Read dictionary, but let user know what we are working on.
    cout << "Loading dictionary, this may take awhile...\n";

    // Read the word list file
    Dictionary d(argv[1]);

    check_spelling(inf, d);
    inf.close();
    return EXIT_SUCCESS;
}

vector<string> suggestion(Dictionary &dic, const string& word)
{
    vector<string> suggestions;
    string::size_type i;
    char c;

    // Transpose each pair of adjacent letters and check whether the result
    // is a word. (i + 1 < length avoids unsigned wrap-around on an empty word.)
    for (i = 0; i + 1 < word.length(); i++) {
        string new_word(word);
        char temp = new_word[i];
        new_word[i] = new_word[i + 1];
        new_word[i + 1] = temp;
        if (dic.search(new_word)) {
            suggestions.push_back(new_word);
        }
    }

    // Remove each letter in turn and check whether the result is a word.
    for (i = 0; i < word.length(); i++) {
        string new_word(word);
        new_word.erase(i, 1);
        if (dic.search(new_word)) {
            suggestions.push_back(new_word);
        }
    }

    // Replace each letter with every other letter and check the result.
    for (i = 0; i < word.length(); i++) {
        for (c = 'a'; c <= 'z'; c++) {
            string new_word(word);
            new_word.replace(i, 1, 1, c);
            if (dic.search(new_word)) {
                suggestions.push_back(new_word);
            }
        }
    }

    // Insert any letter at any position, including after the last letter
    // (i == length), so that "acr" can yield "acre".
    for (i = 0; i <= word.length(); i++) {
        for (c = 'a'; c <= 'z'; c++) {
            string new_word(word);
            new_word.insert(i, 1, c);
            if (dic.search(new_word)) {
                suggestions.push_back(new_word);
            }
        }
    }

    // unique only drops adjacent duplicates, so sort first.
    sort(suggestions.begin(), suggestions.end());
    suggestions.erase(unique(suggestions.begin(), suggestions.end()),
                      suggestions.end());
    return suggestions;
}

void check_spelling(ifstream& in, Dictionary& dict) {
    int line_number = 0;
    string line;
    while (getline(in, line)) {
        line_number++;

        // String stream used to split the line into words
        stringstream ss (stringstream::in | stringstream::out);
        ss << line;

        string word;
        while (ss >> word)
        {
            word = strip_punct(word);
            if (word.empty()) {
                continue;   // the token was a lone punctuation mark
            }
            string tempword(word);
            lower(tempword);
            if ( !dict.search(tempword) )
            {
                cout << " The line " << line_number << " :  " << tempword << " ." << endl;
                cout << "\t suggestion " << endl;
                vector <string> suggest = suggestion(dict, tempword);
                for ( vector<string>::size_type i = 0; i < suggest.size(); i++)
                {
                    cout << "\t" << suggest[i] << endl;
                }
            }
        }
    }
}

void lower(string& s)
{
    // Ensures that a word is lowercase
    for (string::size_type i = 0; i < s.length(); i++) {
        s[i] = tolower(static_cast<unsigned char>(s[i]));
    }
}

string strip_punct(const string& s) {
    // Remove any single trailing punctuation character from a word.
    // ispunct returns nonzero for a printable character that is not a
    // space, a digit, or a letter.
    if (!s.empty() && ispunct(static_cast<unsigned char>(s[s.length() - 1])) ) {
        return s.substr (0, s.length() - 1);
    }
    else {
        return s;
    }
}
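One detail of the suggestion list is worth a short sketch. std::unique and std::unique_copy remove only adjacent duplicates, so a list built from several edit passes should be sorted before it is deduplicated. The helper name dedupe below is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the suggestions sorted, with duplicates removed.
std::vector<std::string> dedupe(std::vector<std::string> v) {
    std::sort(v.begin(), v.end());                       // group equal strings
    v.erase(std::unique(v.begin(), v.end()), v.end());   // drop adjacent repeats
    return v;
}
```

Without the sort, an input such as "car", "arc", "car" would keep both copies of "car", because they are not next to each other.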



