The difference between the fold function and the reduce function (not specific to Spark)

Source: Internet; Editor: 程序博客网; Time: 2024/05/24 03:23

In a fold over a collection, the accumulator type may differ from the element type of the collection, and a zero element (the initial accumulator value) is usually given. In a reduce, you don't give a zero element, and the accumulator type is the same as the element type. A reduce is a special case of a fold, but not vice versa. The type signature of a left fold, with a few examples, is as follows:

foldLeft :: (a -> b -> a) -> a -> [b] -> a
foldLeft (λx y. x + y) 0 [1, 2, 3] = 6
foldLeft (λx y. x * y) 1 [2, 3, 5] = 30
foldLeft (λx _. x + 1) 0 ["cat", "dog"] = 2
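
The examples above can be written as runnable Haskell. This is a minimal sketch: foldLeft is defined by hand here (for lists it behaves like the Prelude's foldl), and the names total, productOf, count and digits are hypothetical, introduced only for illustration. The last two definitions show the accumulator type differing from the element type.

```haskell
-- A left fold written out by hand; on lists it behaves like Prelude's foldl.
foldLeft :: (a -> b -> a) -> a -> [b] -> a
foldLeft _ z []     = z
foldLeft f z (x:xs) = foldLeft f (f z x) xs

-- Accumulator and elements share a type (Int).
total :: Int
total = foldLeft (\x y -> x + y) 0 [1, 2, 3]

productOf :: Int
productOf = foldLeft (\x y -> x * y) 1 [2, 3, 5]

-- Accumulator is an Int, elements are Strings: count the elements.
count :: Int
count = foldLeft (\x _ -> x + 1) 0 ["cat", "dog"]

-- Accumulator is a String, elements are Ints: concatenate their digits.
digits :: String
digits = foldLeft (\acc n -> acc ++ show n) "" ([1, 2, 3] :: [Int])
```

Neither count nor digits could be expressed as a reduce of type (a -> a -> a) -> [a] -> a, because the result type is not the element type.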

The folding function is usually not commutative (its two arguments need not even have the same type), and the order of application matters, so you have to distinguish left folds from right folds. The example above is a left fold, because:

-- (+) is shorthand for (λx y. x + y)
foldLeft (+) 0 [1, 2, 3] = ((0 + 1) + 2) + 3

By contrast, a right fold nests the other way:

foldRight (+) 0 [1, 2, 3] = 1 + (2 + (3 + 0))

For addition, the difference between left and right folds doesn't matter: addition is associative and 0 is its identity element, so you get the same answer either way. That doesn't hold in general.
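
A small sketch of where the two folds diverge, using the Prelude's foldl and foldr. Subtraction gives different answers depending on the nesting. The helper avg is a hypothetical example of an operation that is commutative but not associative, and the two folds disagree for it as well.

```haskell
-- Subtraction: the nesting changes the result.
leftDiff, rightDiff :: Int
leftDiff  = foldl (-) 0 [1, 2, 3]   -- ((0 - 1) - 2) - 3
rightDiff = foldr (-) 0 [1, 2, 3]   -- 1 - (2 - (3 - 0))

-- Hypothetical helper: commutative (avg x y == avg y x) but not associative.
avg :: Double -> Double -> Double
avg x y = (x + y) / 2

leftAvg, rightAvg :: Double
leftAvg  = foldl avg 0 [4, 8]   -- avg (avg 0 4) 8
rightAvg = foldr avg 0 [4, 8]   -- avg 4 (avg 8 0)
```

Here leftDiff is -6 and rightDiff is 2, while leftAvg is 5 and rightAvg is 4.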

With a reduce, the conceptual assumption is that the operation is strictly associative, and often commutative. This allows the reduce to be parallelized and even distributed (as in "map reduce"), while a fold (which makes no such assumptions) is intended to be serial. No zero element is given, and it's an error to reduce an empty collection. The type signature of reduce is:

reduce :: (a -> a -> a) -> [a] -> a

Assuming associativity of the operator, you can implement reduce in terms of foldLeft like so:

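-- Reducing an empty collection is an error: there is no zero element to return.
reduce f [] = error "reduce: empty collection"
reduce f (head:tail) = foldLeft f head tail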
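
To illustrate why associativity is what makes a reduce splittable, here is a hedged sketch that reduces a list in independent chunks and then reduces the partial results. Nothing runs in parallel here; it only shows that regrouping is safe for an associative operator and unsafe otherwise. The names chunksOf and chunkedReduce are hypothetical helpers, and the chunk size is assumed to be positive.

```haskell
-- reduce via a left fold; errors on an empty list, as described in the text.
reduce :: (a -> a -> a) -> [a] -> a
reduce _ []     = error "reduce: empty collection"
reduce f (x:xs) = foldl f x xs

-- Hypothetical helper: split a list into chunks of at most n elements.
-- Assumes n > 0; a non-positive n would not terminate.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Reduce each chunk on its own, then reduce the partial results.
-- Agrees with `reduce f xs` only when f is associative.
chunkedReduce :: Int -> (a -> a -> a) -> [a] -> a
chunkedReduce n f = reduce f . map (reduce f) . chunksOf n
```

With (+) or (++), chunkedReduce 2 gives the same result as reduce. With (-), which is not associative, reduce (-) [1, 2, 3, 4] is -8 while chunkedReduce 2 (-) [1, 2, 3, 4] is 0.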

It's much harder to parallelize a general fold as if it were a reduce. Mathematically speaking, you can do it if you have first-class functions, by treating the b's in the collection as a -> a (that is, transformations of the accumulator) through the injection g b = λa. f a b, where f is the folding function, and using function composition (which is associative, although not commutative) as your reducing function. Then you are building up a giant deferred computation of type a -> a that is finally applied to the given zero value of type a. Whether that will be efficient is an open question, but it is mathematically sound.

In code, it looks like this:

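-- Forward composition: apply f first, then g. The order matters here:
-- the transformation for the leftmost element must run first.
compose :: (a -> a) -> (a -> a) -> (a -> a)
compose f g = (λx. g (f x))
 
id :: (a -> a)
id x = x
 
foldLeft :: (a -> b -> a) -> a -> [b] -> a
foldLeft f z coll = 
  (g coll) z
  where g [] = id
        g _ = reduce compose (map (λb . (λa . f a b)) coll)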
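
The same construction as a runnable Haskell sketch. The names andThen and foldLeftViaReduce are hypothetical, chosen to avoid clashing with the Prelude. Composition here is forward (first function applied first), so the transformation built from the leftmost element is the first one applied to the zero value, which is what a left fold requires.

```haskell
-- reduce via a left fold; errors on an empty list.
reduce :: (a -> a -> a) -> [a] -> a
reduce _ []     = error "reduce: empty collection"
reduce f (x:xs) = foldl f x xs

-- Forward composition: run g first, then h. Associative, not commutative.
andThen :: (a -> a) -> (a -> a) -> (a -> a)
andThen g h = \x -> h (g x)

-- A left fold expressed as a reduce over accumulator transformations.
-- Each element b becomes the function \a -> f a b; the composed
-- function is applied to the zero value z at the very end.
foldLeftViaReduce :: (a -> b -> a) -> a -> [b] -> a
foldLeftViaReduce _ z []   = z
foldLeftViaReduce f z coll = reduce andThen (map (\b a -> f a b) coll) z
```

A non-commutative folding function makes the ordering visible: foldLeftViaReduce (flip (:)) [] [1, 2, 3] yields [3, 2, 1], the same as foldl (flip (:)) [] [1, 2, 3].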
