[Notes] sort

Source: Internet · Editor: 程序博客网 · Date: 2024/06/07 22:27

sort

the bubble sort

def bubble_sort(data):
    n = len(data)
    for idx1 in range(n - 1):
        for idx2 in range(idx1 + 1, n):
            if data[idx1] > data[idx2]:
                data[idx1], data[idx2] = data[idx2], data[idx1]
    return data

data = [2, 7, 1, 6, 6, 3, 0, 6, 9]
print(bubble_sort(data))
# [0, 1, 2, 3, 6, 6, 6, 7, 9]

A bubble sort can be modified to stop early if it finds that the list has become sorted. This means that for lists that require just a few passes, a bubble sort may have an advantage in that it will recognize the sorted list and stop.

def short_bubble(data):
    n = len(data)
    exchange = True
    compare_num = n - 1  # at most n-1 passes
    while compare_num > 0 and exchange:
        exchange = False
        for idx in range(compare_num):
            if data[idx] > data[idx + 1]:
                exchange = True
                data[idx], data[idx + 1] = data[idx + 1], data[idx]
        compare_num -= 1
    return data

data = [2, 7, 1, 6, 6, 3, 0, 6, 9]
print(short_bubble(data))
# [0, 1, 2, 3, 6, 6, 6, 7, 9]

the selection sort

def select_sort(data):
    n = len(data)
    for fill_idx in range(n - 1, 0, -1):
        # find the largest remaining item and move it into the fill slot
        max_idx = 0
        for idx in range(1, fill_idx + 1):
            if data[idx] > data[max_idx]:
                max_idx = idx
        data[fill_idx], data[max_idx] = data[max_idx], data[fill_idx]
    return data

data = [2, 7, 1, 6, 6, 3, 0, 6, 9, -1]
print(select_sort(data))
# [-1, 0, 1, 2, 3, 6, 6, 6, 7, 9]

the insertion sort

It always maintains a sorted sublist in the lower positions of the list. Each new item is then “inserted” back into the previous sublist such that the sorted sublist is one item larger.

def insert_sort(data):
    # swap-based variant: bubbles each new item into the sorted prefix
    n = len(data)
    for idx1 in range(1, n):
        for idx2 in range(idx1):
            if data[idx1] < data[idx2]:
                data[idx1], data[idx2] = data[idx2], data[idx1]
    return data

def insertionSort(alist):
    # classic version: shift larger items right, then drop the value in place
    for index in range(1, len(alist)):
        currentvalue = alist[index]
        position = index
        while position > 0 and alist[position - 1] > currentvalue:
            alist[position] = alist[position - 1]
            position = position - 1
        alist[position] = currentvalue

data = [2, 7, 1, 6, 6, 3, 0, 6, 9, -1]
alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(insert_sort(data))
# [-1, 0, 1, 2, 3, 6, 6, 6, 7, 9]
insertionSort(alist)
print(alist)
# [17, 20, 26, 31, 44, 54, 55, 77, 93]

the shell sort

The shell sort, sometimes called the “diminishing increment sort,” improves on the insertion sort by breaking the original list into a number of smaller sublists, each of which is sorted using an insertion sort. The unique way that these sublists are chosen is the key to the shell sort. Instead of breaking the list into sublists of contiguous items, the shell sort uses an increment i, sometimes called the gap, to create a sublist by choosing all items that are i items apart.

the complexity of the shell sort tends to fall somewhere between O(n) and O(n^2), depending on the increment (gap) sequence used.

def gap_shell_sort(data, start_pos, gap_len):
    # insertion-style sort of the sublist data[start_pos::gap_len]
    n = len(data)
    for idx1 in range(start_pos + gap_len, n, gap_len):
        # note: the inner loop must start at start_pos, not 0,
        # so that only items of this sublist are compared
        for idx2 in range(start_pos, idx1, gap_len):
            if data[idx2] > data[idx1]:
                data[idx2], data[idx1] = data[idx1], data[idx2]

def shell_sort(data):
    sublist_len = len(data) // 2
    while sublist_len > 0:
        for idx in range(sublist_len):
            gap_shell_sort(data, idx, sublist_len)
        sublist_len //= 2

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
shell_sort(alist)
print(alist)
# [17, 20, 26, 31, 44, 54, 55, 77, 93]

the merge sort

the merge sort achieves a better time complexity, O(n log n).
however, it requires O(n) auxiliary space for the merge step (and the slice-based implementation below allocates new sublists at every level of recursion).
the core of the merge sort algorithm is the divide-and-conquer strategy: split the list in half, sort each half recursively, then merge the two sorted halves.

def merge_sort(data):
    n = len(data)
    if n < 2:
        return data
    left = merge_sort(data[:n // 2])
    right = merge_sort(data[n // 2:])
    idx_left = idx_right = 0
    idx = 0
    # merge the two sorted halves back into data
    while idx_left < len(left) and idx_right < len(right):
        if left[idx_left] < right[idx_right]:
            data[idx] = left[idx_left]
            idx_left += 1
        else:
            data[idx] = right[idx_right]
            idx_right += 1
        idx += 1
    # copy over whatever remains of either half
    while idx_left < len(left):
        data[idx] = left[idx_left]
        idx_left += 1
        idx += 1
    while idx_right < len(right):
        data[idx] = right[idx_right]
        idx_right += 1
        idx += 1
    return data

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
merge_sort(alist)
print(alist)
# [17, 20, 26, 31, 44, 54, 55, 77, 93]

the quick sort

The quick sort uses divide and conquer to gain the same advantages as the merge sort, while not using additional storage. As a trade-off, however, it is possible that the list may not be divided in half. When this happens, we will see that performance is diminished.

A quick sort first selects a value, which is called the pivot value. Although there are many different ways to choose the pivot value, we will simply use the first item in the list. The role of the pivot value is to assist with splitting the list. The actual position where the pivot value belongs in the final sorted list, commonly called the split point, will be used to divide the list for subsequent calls to the quick sort.

To analyze the quickSort function, note that for a list of length n, if the partition always occurs in the middle of the list, there will again be log n divisions. To find the split point, each of the n items must be checked against the pivot value. The result is O(n log n). In addition, there is no need for the additional memory that the merge sort process requires.

Unfortunately, in the worst case, the split points may not be in the middle and can be very skewed to the left or the right, leaving a very uneven division. In this case, sorting a list of n items divides into sorting a list of 0 items and a list of n-1 items. Sorting a list of n-1 items then divides into a list of size 0 and a list of size n-2, and so on. The result is an O(n^2) sort with all of the overhead that recursion requires.

We mentioned earlier that there are different ways to choose the pivot value. In particular, we can attempt to alleviate some of the potential for an uneven division by using a technique called median of three. To choose the pivot value, we will consider the first, the middle, and the last element in the list. In our example, those are 54, 77, and 20. Now pick the median value, in our case 54, and use it for the pivot value (of course, that was the pivot value we used originally).

Earlier versions of Python used a quick sort to implement the sort() method of the list type. Since Python 2.3, a hybrid algorithm (Timsort) that combines the insertion and merge sort algorithms has been used instead.
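In practice, Python code should simply call the built-in list.sort() or sorted(), which run this hybrid algorithm; a quick illustration (not part of the original post):

```python
alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]

# sorted() returns a new sorted list, leaving alist untouched
print(sorted(alist))
# [17, 20, 26, 31, 44, 54, 55, 77, 93]

# list.sort() sorts in place; both accept reverse= and key= arguments
alist.sort(reverse=True)
print(alist)
# [93, 77, 55, 54, 44, 31, 26, 20, 17]
```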

def quick_sort(data):
    """Quick sort using data[0] as the pivot value."""
    def qsort(data, start, end):
        if end - start < 1:
            return
        pivot = data[start]
        left_idx = start
        right_idx = end
        # partition: fill the hole from alternating ends until the marks meet
        while left_idx < right_idx:
            while data[right_idx] >= pivot and left_idx < right_idx:
                right_idx -= 1
            data[left_idx] = data[right_idx]
            while data[left_idx] <= pivot and left_idx < right_idx:
                left_idx += 1
            data[right_idx] = data[left_idx]
        data[left_idx] = pivot
        # recurse on the current segment only (the original recursed from 0,
        # needlessly re-sorting the whole prefix)
        qsort(data, start, left_idx - 1)
        qsort(data, left_idx + 1, end)
    qsort(data, 0, len(data) - 1)

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quick_sort(alist)
print(alist)
# [17, 20, 26, 31, 44, 54, 55, 77, 93]

def quickSort(alist):
    quickSortHelper(alist, 0, len(alist) - 1)

def quickSortHelper(alist, first, last):
    if first < last:
        splitpoint = partition(alist, first, last)
        quickSortHelper(alist, first, splitpoint - 1)
        quickSortHelper(alist, splitpoint + 1, last)

def partition(alist, first, last):
    pivotvalue = alist[first]
    leftmark = first + 1
    rightmark = last
    done = False
    while not done:
        while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
            leftmark = leftmark + 1
        while alist[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark = rightmark - 1
        if rightmark < leftmark:
            done = True
        else:
            alist[leftmark], alist[rightmark] = alist[rightmark], alist[leftmark]
    alist[first], alist[rightmark] = alist[rightmark], alist[first]
    return rightmark

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quickSort(alist)
print(alist)
# [17, 20, 26, 31, 44, 54, 55, 77, 93]

def qsort(data):
    n = len(data)

    def mid_idx(data, start, end):
        # median-of-three: return the index holding the median of the
        # first, middle, and last elements
        mid = (start + end) // 2
        if min(data[start], data[end]) <= data[mid] <= max(data[start], data[end]):
            return mid
        if min(data[mid], data[end]) <= data[start] <= max(data[mid], data[end]):
            return start
        return end

    def qsort_helper(data, start, end):
        if end - start < 1:
            return
        # swap the median-of-three pivot to the front, then partition as before
        pivot_idx = mid_idx(data, start, end)
        data[start], data[pivot_idx] = data[pivot_idx], data[start]
        pivot = data[start]
        left = start
        right = end
        while left < right:
            while data[right] >= pivot and left < right:
                right -= 1
            data[left] = data[right]
            while data[left] <= pivot and left < right:
                left += 1
            data[right] = data[left]
        data[left] = pivot
        qsort_helper(data, start, left - 1)
        qsort_helper(data, left + 1, end)

    qsort_helper(data, 0, n - 1)

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
qsort(alist)
print(alist)
# [17, 20, 26, 31, 44, 54, 55, 77, 93]

the Radix sort

the comparison sort algorithms achieve their goal by comparing the individual sort keys to other keys in the list. We have reviewed six sorting algorithms above. The first three (bubble, selection, and insertion) have a worst-case time of O(n^2), the shell sort falls between O(n) and O(n^2), and the merge sort has a worst-case time of O(n log n). The quick sort, the algorithm more commonly used in language libraries, is O(n^2) in the worst case but has an expected or average time of O(n log n).

the natural question is: can we do better than O(n log n)? For a comparison sort, the answer is no. It can be shown, by building a decision tree and examining the permutations of all possible comparisons among the sort keys, that the worst-case time for a comparison sort can be no better than O(n log n).
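The decision-tree argument can be made concrete: any comparison sort must distinguish all n! possible orderings of n keys, so its decision tree has at least n! leaves, and its height, the worst-case number of comparisons, is at least log2(n!), which grows as Θ(n log n). A small sketch of that bound (the helper name is mine, not from the original post):

```python
import math

def comparison_lower_bound(n):
    # ceil(log2(n!)): fewest worst-case comparisons any comparison sort can make
    return math.ceil(math.log2(math.factorial(n)))

for n in (8, 64, 1024):
    print(n, comparison_lower_bound(n))
```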

this does not mean, however, that the sorting operation cannot be done faster than O(n log n). It simply means that we cannot achieve it with a comparison sort. In fact, there are distribution sort algorithms that work in linear time. Distribution sorts use techniques other than comparisons among the keys themselves to order a sequence of keys. While these algorithms are fast, they are not general-purpose sorting algorithms. In other words, they cannot be applied to just any sequence of keys. Typically, they are used when the keys have certain characteristics and for specific types of applications.
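One such distribution sort is counting sort, which runs in O(n + k) time when the keys are known to be integers in the range 0..k; this sketch is illustrative and not part of the original post:

```python
def counting_sort(data, max_key):
    """Counting sort: O(n + k) for n integer keys in the range 0..max_key."""
    counts = [0] * (max_key + 1)
    for key in data:
        counts[key] += 1              # tally each key
    result = []
    for key, count in enumerate(counts):
        result.extend([key] * count)  # emit each key as many times as it appeared
    return result

print(counting_sort([2, 7, 1, 6, 6, 3, 0, 6, 9], 9))
# [0, 1, 2, 3, 6, 6, 6, 7, 9]
```

Note that no key is ever compared to another key; the key's own value indexes directly into the counts table, which is why the comparison-sort lower bound does not apply.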

radix sort is a fast distribution sorting algorithm that orders keys by examining the individual components of the keys instead of comparing the keys themselves. For example, when sorting integer keys, the individual digits are compared from least significant to most significant. It is a special-purpose sorting algorithm, but it can be used to sort many types of keys, including positive integers, strings, and floating-point values.


from collections import deque

def radix_sort(data, digit_num):
    # one deque (bin) per decimal digit 0-9
    slots = [deque() for _ in range(10)]
    col_num = 1
    for _ in range(digit_num):
        # distribute by the current digit, least significant first
        for ele in data:
            key = (ele // col_num) % 10
            slots[key].append(ele)
        # collect back in slot order; deques preserve arrival order, so
        # the pass is stable and earlier digits stay correctly ordered
        idx = 0
        for que in slots:
            while que:
                data[idx] = que.popleft()
                idx += 1
        col_num *= 10
    return data

if __name__ == "__main__":
    data = [23, 4, 67, 1, 4, 9, 49, 21, 18]
    print(radix_sort(data, 2))
    # [1, 4, 4, 9, 18, 21, 23, 49, 67]