Majority Element


For a proof of the algorithm, see the paper "MJRTY - A Fast Majority Vote Algorithm". An excerpt from the paper follows.

5.2 The Algorithm

Imagine a convention center filled with delegates (i.e., voters), each carrying a placard proclaiming the name of his candidate. Suppose a floor fight ensues and delegates of different persuasions begin to knock one another down with their placards. Suppose that each delegate who knocks down a member of the opposition is simultaneously knocked down by his opponent. Clearly, should any candidate field more delegates than all the others combined, that candidate would win the floor fight and, when the chaos subsided, the only delegates left standing would be from the majority block. Should no candidate field a clear majority, the outcome is less clear; at the conclusion of the fight, delegates in favor of at most one candidate, say, the nominee, would remain standing, but the nominee might not represent a majority of all the delegates. Thus, in general, if someone remains standing at the end of such a fight, the convention chairman is obliged to count the nominee's placards (including those held by downed delegates) to determine whether a majority exists.

Thus our algorithm has two parts. The first part pairs off disagreeing delegates until all remaining delegates agree. We call this the "pairing" phase. Perhaps nonobviously, pairing can be done with n comparisons. If pairing leaves any delegates standing, then those delegates unanimously favor a single candidate, the nominee, who must be in the majority if a majority exists.

The second part of the algorithm, called the "counting" phase, determines whether the nominee received more than half the votes. The counting phase obviously requires at most n comparisons. The focus of this paper is on the pairing phase.

Here is a bloodless way the chairman can simulate the pairing phase. He visits each delegate in turn, keeping in mind a current candidate cand and a count k, which is initialized to 0. Upon visiting each delegate, the chairman first determines whether k is 0; if it is, the chairman selects the delegate's candidate as the new value of cand and sets k to 1. Otherwise, the chairman asks the delegate whether his candidate is cand. If so, then k is incremented by 1. If not, then k is decremented by 1. The chairman then proceeds to the next delegate. When all the delegates have been processed, cand is in the majority if a majority exists.
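
The chairman's walk can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function name `pairing_phase` and the choice of a list to hold the delegates' votes are assumptions made here.

```python
def pairing_phase(votes):
    """Simulate the chairman's walk and return the final (cand, k)."""
    cand = None  # current candidate; carries no meaning while k == 0
    k = 0        # the chairman's count
    for vote in votes:
        if k == 0:
            # No one is standing: adopt this delegate's candidate.
            cand, k = vote, 1
        elif vote == cand:
            k += 1
        else:
            k -= 1
    return cand, k


print(pairing_phase(["A", "B", "A", "C", "A"]))  # ('A', 1)
```

The returned cand is only a nominee: it is guaranteed to be the majority candidate when a majority exists, and is otherwise arbitrary.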

Proof: Suppose there are n delegates. After the chairman visits the ith delegate, 1 ≤ i ≤ n, the delegates he has processed can be divided into two groups: a group of k delegates in favor of cand, and a group of delegates that can be paired in such a way that paired delegates disagree. From this invariant we may conclude, after processing all of the delegates, that cand has a majority, if there is a majority. For suppose there exists an x different from cand with more than n/2 votes. Since the second group can be paired, x receives at most (n - k)/2 votes from that group. Thus, x must have received a vote from the first group, contradicting the fact that all votes in the first group are for cand.
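
To make the two groups in the proof concrete, the sketch below tracks them explicitly instead of keeping only a count. It is an illustration under assumptions, not the paper's code: the names `pair_off`, `standing`, and `pairs` are hypothetical, and `len(standing)` plays the role of k.

```python
def pair_off(votes):
    """Return (cand, standing, pairs) after processing all votes."""
    cand = None
    standing = []  # first group: k votes, all for cand
    pairs = []     # second group: pairs of votes that disagree
    for vote in votes:
        if not standing:
            # k == 0: the new vote becomes the candidate.
            cand = vote
            standing.append(vote)
        elif vote == cand:
            standing.append(vote)
        else:
            # A disagreeing vote knocks down one standing delegate.
            pairs.append((standing.pop(), vote))
    return cand, standing, pairs


cand, standing, pairs = pair_off([0, 1, 0, 0, 2, 0, 3])
print(cand, standing, pairs)  # 0 [0] [(0, 1), (0, 2), (0, 3)]
```

Each pair holds two different votes, so any single value can occupy at most one slot per pair. That is where the bound of (n - k)/2 votes from the second group comes from.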

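The key step of the proof: x receives at most (n - k)/2 votes from that group. The second group always consists of +1/-1 pairs, so its contributions sum to 0. If x received more than half of that group's votes, the +1 and -1 contributions of the second group could not sum to 0.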

Another proof:

* A one-pass, linear time algorithm that returns the majority element of a
* range of data.  The majority element is an element that appears strictly
* more than half the time.  For example, in the sequence
*
*                                  0 1 0 0 2 0 3
*
* The number 0 is a majority element, since it appears 4/7 times.  However,
* in the sequence
*
*                                 0 1 0 0 2 0 3 3
*
* There is no majority element, since even though 0 occurs 4/8 times, this
* isn't strictly greater than half the elements.
*
* The algorithm for finding the majority element is remarkably simple, but its
* correctness is not immediately obvious.  The algorithm works as follows.  At
* each step, we maintain our "guess" of what the majority element will be, and
* also a counter.  We then scan across the array.  At each point, if the new
* element matches our current guess we increment the counter, and otherwise
* we decrement it.  If the counter is ever zero, then on the next element we
* change the counter to 1 and pick the next element as our guess.  Finally, we
* output the guessed element.  For example, here is the algorithm running on
* the earlier input.  The topmost row shows the input, below that our guess,
* and below that the counter:
*
*  INPUT    0 1 0 0 2 0 3
*  GUESS   ? 0 ? 0 0 0 0 0
*  COUNTER 0 1 0 1 2 1 2 1
*
* Since our guess at the end is zero, we output zero.  This algorithm is
* due to Boyer and Moore and is described in their paper "MJRTY - A Fast
* Majority Vote Algorithm."
*
* There are many ways to think about why this algorithm works.  One good
* intuition is to think of the algorithm as breaking the input down into lots
* of stretches of consecutive copies of particular values.  Incrementing the
* counter then corresponds to marking that multiple copies of the same value
* were found, while decrementing it corresponds to some other sequences of
* values "canceling out" the accumulation of values of a particular type.
*
* A formal proof of correctness of this algorithm (based on the proof in
* Boyer and Moore's paper) relies on a key lemma.  In this section, we'll
* let C be the number that is currently a candidate for the majority element,
* K be its count after some number of steps, and N be the number of total
* elements.
*
* Lemma 1: For any i, 1 <= i <= N, after i steps of the algorithm, the
* elements in the range [1, i] can be rearranged into two groups A and B such
* that A is K copies of C, and B is a collection of elements with at most
* |B| / 2 copies of any one element.
*
* Let's hold off on the proof of this lemma for now, and show that if it
* holds and there is a majority element, the algorithm must be correct.  Using
* the above lemma, note that when the algorithm terminates, there must be some
* element C that was chosen with some count K.  Assume for the sake of
* contradiction that C is not the majority element; then there is some other
* element C' that must be the majority element.  Consequently, there are more
* than N / 2 elements of the range equal to C'.  Let's consider where they
* are.  By the above lemma, all the elements of the input can be broken up
* into groups A and B, where everything in group A has value C and at most
* |B| / 2 elements of B have value C'.  Since |A| = K and |A| + |B| = N,
* this means that there are at most (N - K) / 2 <= N / 2 copies of C',
* contradicting the fact that C' is the actual majority element.  We have
* reached a contradiction, and so C must be the majority element at the end
* of the algorithm's run.
*
* We can now prove the claim of the lemma by induction on i.  As a base case,
* if i = 1, then K = 1 and C is the first element of the range.  Then we can
* let A be the singleton element and B be the empty set, which trivially obeys
* the criteria of the lemma.  For the inductive step, assume that for some i
* the claim holds and consider the execution of the algorithm on step i + 1.
* Let A and B be the sets A and B from the ith step.  Then we consider three
* possible cases:
*
* 1. On entry to this step, K = 0.  Then after this step finishes, K = 1
*    and C is the newest element.  This means that on entry to this step,
*    A was the empty set and B was some set where no element appeared more
*    than i/2 times in B.  If we then let A' be the singleton set containing
*    the new element and B' = B, then these sets satisfy the requirements of
*    the lemma and the claim holds.
* 2. On entry to this step, K > 0 and the new element matches the current
*    majority element.  Then we can add this element to A to get a new set
*    A' meeting the lemma's requirements, so the claim holds.
* 3. On entry to this step, K > 0, and the new element does not match the
*    current majority element.  This means that the new K is the previous K
*    minus one, but the candidate majority element does not change.  If
*    we then move one element from A into B, then place the new element into
*    the set B, then the updated A and B will satisfy the lemma's claims.
*    This is tedious but simple to check, so I'll leave it as an exercise
*    to the reader. :-)
*
* In the case where there is no majority element, the element produced by the
* algorithm will be arbitrary.  We can then check whether we have the majority
* element by performing a linear scan over the input range and counting the
* frequency of the element.
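
The whole procedure described in the comment above, guess-and-counter scan followed by the verifying linear scan, can be sketched as follows. This is a minimal sketch rather than the original author's implementation: the function name `majority_element` is an assumption, as is the requirement that the input be a sequence that can be scanned twice.

```python
def majority_element(items):
    """Return the majority element of items, or None if there is none."""
    guess, counter = None, 0
    # First pass: maintain the guess and its counter.
    for item in items:
        if counter == 0:
            guess, counter = item, 1
        elif item == guess:
            counter += 1
        else:
            counter -= 1
    # Second pass: the guess is arbitrary when no majority exists,
    # so count how often it actually occurs.
    occurrences = sum(1 for item in items if item == guess)
    return guess if 2 * occurrences > len(items) else None


print(majority_element([0, 1, 0, 0, 2, 0, 3]))     # 0
print(majority_element([0, 1, 0, 0, 2, 0, 3, 3]))  # None
```

The two calls use the sequences from the comment: 0 occurs 4 times out of 7 in the first, and only 4 times out of 8 in the second, which is not strictly more than half.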

0 0