Monads in Scala Part Two: More Maybes

来源：互联网发布：为什么网络ip地址冲突编辑：程序博客网时间：2024/06/04 00:46

scabl: Monads in Scala Part Two: More Maybes

http://scabl.blogspot.com/2013/03/monads-in-scala-2.html

In the first installment of this series, we translated the examples from the first page of thechapter on monads in the Haskell wikibook from Haskell to Scala. In this blog post, we will continue on with the second page. This page continues with more Maybe examples, which will help us to generalize our monadic laws test for Maybes a bit. We'll also learn about kleisli composition, which composes functions suitable for passing to flatMap.

All the source code presented here is available here on GitHub. I'm compiling against Scala 2.10.1. Some of this code will not compile under Scala 2.9.x.

Safe Functions

The first section of the wiki page discusses safe functions - versions of mathematical functions that normally produce a Just wrapping the result, but that produce MaybeNotwhen the input is not within the range of that mathematical function. (As a reminder, I am using MaybeNot where the Haskell examples using Nothing, to avoid confusion with the Nothing found in the Scala library.) This will allow us to "floor out" when chaining mathematical operations - anything along the way that produces a MaybeNot will cause the MaybeNot to propagate right on down the chain.

I'm not sure what Haskell's numeric types look like, but we do already get this kind of behavior when using Java doubles that underlie Scala Doubles. For example,sqrt(-1.0d) will produce double value NaN, which stands for "not a number". If we then take the log of that result - log(NaN) - we will still get NaN. Basically any math operation where one of the operands is NaN will produce a NaN. So safe math functions may not be as useful in Java or Scala, but it's a good example, so let's roll with it.

Here are my safe versions of sqrt and log:

def safeSqrt(d: Double): Maybe[Double] =
  if (d >= 0) Just(scala.math.sqrt(d)) else MaybeNot
 
def safeLog(d: Double): Maybe[Double] =
  if (d > 0) Just(scala.math.log(d)) else MaybeNot
view rawsafeMath.scala hosted with ❤ by GitHub

Now when mathematicians say things like "log square root", they mean you should take the square root first, and then take the log of that. Some people, like myself, might find this confusing, so I wanted to point that out. The initial implementation of safeLogSqrtin the example simply checks the outcome of sqrt before passing it on to log, like so:

def safeLogSqrt0(d: Double): Maybe[Double] =
  safeSqrt(d) match {
    case Just(d) => safeLog(d)
    case MaybeNot => MaybeNot
  }
view rawsafeLogSqrt0.scala hosted with ❤ by GitHub

This is a mouthful, especially when the unsafe version is as simple as this:

def unsafeLogSqrt(d: Double): Double = log(sqrt(d))

Of course, we want to use flatMap (>>= in Haskell) instead of manually testing ifsafeSqrt produced a Just or a MaybeNot. The Haskell example presented here looks like this:

def safeLogSqrt1(d: Double): Maybe[Double] =
  Just(d) flatMap safeSqrt flatMap safeLog
view rawsafeLogSqrt1.scala hosted with ❤ by GitHub

Which, according to the "left unit" Monad law, can be simplified to this:

def safeLogSqrt2(d: Double): Maybe[Double] =
  safeSqrt(d) flatMap safeLog
view rawsafeLogSqrt2.scala hosted with ❤ by GitHub

I'm not sure why the Haskell example uses the longer form here. My guess is that it's because return x >>= safeSqrt >>= safeLog looks sexier than safeSqrt(x) >>= safeLog, but that's just my best guess.

Before moving on, here's a version of safeLogSqrt that uses a for loop:

def safeLogSqrt3(d: Double): Maybe[Double] =
  for (s <- safeSqrt(d); l <- safeLog(s)) yield l
view rawsafeLogSqrt3.scala hosted with ❤ by GitHub

Kleisli Composition

The Haskell wikibook points out that you can produce an unsafe logSqrt function by simply composing the unsafe functions log and sqrt. For function composition, mathematicians use a little dot like this:

\circ

. In Haskell, the composition is done like this:

unsafeLogSqrt = log . sqrt

The function on the right is the one that gets called first. In Scala, it would look something like this:

val unsafeLogSqrt: (Double) => Double =
  scala.math.log _ compose scala.math.sqrt _
view rawunsafeLogSqrt.scala hosted with ❤ by GitHub

Let's step back and examine what's going on here. In Scala, when we follow a function's name with an underscore, we are indicating that we are talking about the function itself, and not trying to apply that function. scala.math.log is a scala.Function1, and as such, has a compose method that takes a scala.Function1 as an argument. The result is a new function that behaves by calling sqrt first, and then calling log on the result of that.

At this point, the wikibook brings in the Kleisli composition operator, <=<, and shows how you can define a safeLogSqrt with similar brevity, like so:

safeLogSqrt = safeLog <=< safeSqrt

This is similar to function composition, but ordinary function composition will not work, because the output from safeSqrt is Maybe[Double], and the input to safeLog isDouble. So we really want to chain these together in a flatMap style, as in the body of a Scala for loop.

As a first attempt at a kleisli composition operator in Scala, let's write a method that takes in two functions as arguments, and produces the kleisli composition of those two functions:

def kleisliCompose[A, B, C](                                                                                                         
  f: (B) => Maybe[C],                                                                                                                
  g: (A) => Maybe[B]):
(A) => Maybe[C] = {
  a: A =>
  for (b <- g(a); c <- f(b)) yield c
}
view rawkleisliCompose.scala hosted with ❤ by GitHub

We are producing a function that takes an A as input and produces a Maybe[C]. We first apply g, which takes an A as input and produces a Maybe[B]. Then we flatMap theMaybe[B] with f, which takes a B as input and produces a Maybe[C]. Now we can define safeLogSqrt as follows:

def safeLogSqrt4(d: Double): Maybe[Double] =
  kleisliCompose(safeLog _, safeSqrt _)(d)
view rawsafeLogSqrt4.scala hosted with ❤ by GitHub

This is nice, but wouldn't it be better if we could achieve this using an infix notation, to match scala.math.log _ compose scala.math.sqrt _? We cannot simply add a kleisliCompose method to Function1. What we do in Scala >= 2.10 in these circumstances is build an implicit class that contains the desired new method. Let's take a look at our implicit class, and then break it down:

implicit class MaybeFunction[B, C](f: (B) => Maybe[C]) {
  def kleisliCompose[A](g: (A) => Maybe[B]): (A) => Maybe[C] = {
    a: A =>
    for (b <- g(a); c <- f(b)) yield c
  }
}
view rawMaybeFunction.scala hosted with ❤ by GitHub

The kleisliCompose method itself is the same, except that type parameters B and C, and parameter f, have been lifted out of the method and into the enclosing class. TheMaybeFunction constructor takes a single Function1 as argument - in particular, aFunction1 that returns a Maybe.

Let's take note that MaybeFunction is implicit, and that it has a methodkleisliCompose. Now, when the compiler encounters something like safeLog _ kleisliCompose safeSqrt _, and tries to resolve the method call, it does not immediately find a method kleisliCompose in Function1, which is the type ofsafeLog _. Before it gives up, it looks for any implicit conversions it could use to transform safeLog _ into that has a method named kleisliCompose with an applicable signature. Assuming our MaybeFunction class is in the right scope, it will implicitly construct a MaybeFunction from the Function1, and callkleisliCompose on that.

This Pimp my Library pattern is available in languages like Ruby and Groovy via meta-programming. In Scala, it's done at compile-time in a type-safe way. While implicit classes are newly available in Scala 2.10, the same effect has been achieved for years with implicit functions. The behavior of implicit classes is more controlled than that of implicit functions, as the target of the type conversion is constrained to be the new, implicit class.

So let's go ahead and use this pimped Function1 to define our safeLogSqrt function:

def safeLogSqrt5(d: Double): Maybe[Double] =
  (safeLog _ kleisliCompose safeSqrt _)(d)
view rawsafeLogSqrt5.scala hosted with ❤ by GitHub

Testing Safe Functions

What about testing all the safe functions we have defined above? First of all, we could write tests that assure that all of the versions of safeLogSqrt presented above produce the same results. To do this, we first need to define a set of test data to use. To be sure to test every outcome, I've included input values for which safeLogSqrt will return both aJust and a MaybeNot. I've also included a value (0) for which safeSqrt will return aJust, but safeLogSqrt will return a MaybeNot. So I'm covering the three major code paths: Double to Just[Double] to Just[Double]; Double to Just[Double] toMaybeNot; and Double to MaybeNot to MaybeNot.

/** A series of doubles useful for testing safe methods. */
val doubles = List[Double](
  -2, // log, sqrt produce NaN                                                                                                       
  -1, // log, sqrt produce NaN                                                                                                       
  0, // log produces -Infinity, sqrt produces 0                                                                                      
  0.5, // log produces negative                                                                                                      
  1, // log produces 0                                                                                                               
  2) // keeping things positive  
view rawsafe.doubles.scala hosted with ❤ by GitHub

Now the test. Just a word of note here, I am switching my tests from just using asserts to using ScalaTest's ShouldMatchers. I'm using "org.scalatest" %% "scalatest" % "2.0.M5b", but any recent version should work. I have 5 versions ofsafeLogSqrt, so testing four pairs will do:

package maybe
 
import org.scalatest.FlatSpec
import org.scalatest.matchers.ShouldMatchers
 
class SafeMathSpec extends FlatSpec with ShouldMatchers {
 
  import safe._
 
  behavior of "various implementations of safe.safeLogSqrt"
  they should "agree over a range of test input" in {
    shouldAgreeOverTestInputRange(safeLogSqrt1 _, safeLogSqrt2 _)
    shouldAgreeOverTestInputRange(safeLogSqrt1 _, safeLogSqrt3 _)
    shouldAgreeOverTestInputRange(safeLogSqrt1 _, safeLogSqrt4 _)
    shouldAgreeOverTestInputRange(safeLogSqrt1 _, safeLogSqrt5 _)
  }
  
  def shouldAgreeOverTestInputRange(                                                                                                   
    safeLogSqrt1: (Double) => Maybe[Double],                                                                                           
    safeLogSqrt2: (Double) => Maybe[Double]) {
    doubles foreach { d =>
      (safeLogSqrt1(d)) should equal (safeLogSqrt2(d))
    }
  }
} 
view rawSafeMathSpec.scala hosted with ❤ by GitHub

Testing Maybe Obeys Monad Laws

In the last blog post, we developed tests to assure that Maybe obeyed the monadic laws in regards to some sample Person data. We now have another data set that we can test the Maybe monad against. But the old version of the test is hard-coded to use Persondata. Let's generalize that into a function that takes the required test data as arguments. Then we can call this function for different sets of test data. We can easily extrapolate over the types involved. We also pass a String description to include in a FlatSpecbehavior clause. The function opens like this:

def maybeShouldObeyMonadLaws[A, B, C](                                                                                               
  testDataDescription: String,                                                                                                       
  testItems: List[A],                                                                                                                
  f: Function1[A, Maybe[B]],                                                                                                         
  g: Function1[B, Maybe[C]]) {
 
  behavior of "Maybe monad with respect to " + testDataDescription
view rawmaybeShouldObeyMonadLaws-1.scala hosted with ❤ by GitHub

As you will recall, the left unit Monad law states that m >>= return = m. Let's assert this over all of our test data:

  it should "obey left unit monadic law" in {
    testItems foreach { a =>
      { Just(a) flatMap f
      } should equal {
        f(a)
      }
    }
  }
view rawmaybeSOML-2.scala hosted with ❤ by GitHub

For right unit and associativity, we need to seed a list of all possible Maybes. We'll throwMaybeNot in with the rest of our test data wrapped in Justs:

  val maybes = MaybeNot +: (testItems map { Just(_) })
 
  it should "obey right unit monadic law" in {
    maybes foreach { m =>
      { m flatMap { Just(_) }
      } should equal {
        m
      }
    }
  }
 
  it should "obey associativity monadic law" in {
    maybes foreach { m =>
      { m flatMap f flatMap g
      } should equal {
        m flatMap { a => f(a) flatMap g }
      }
    }
  }
view rawmaybeSOML-3.scala hosted with ❤ by GitHub

We have a couple more Monad tests from last time. I won't go into the details, and you can check it out in the source code. I want to get down to business, which is calling this function over our two data sets:

maybeShouldObeyMonadLaws(
  "Person data",
  Person.persons,
  { p: Person => p.mother },
  { p: Person => p.father })
  
maybeShouldObeyMonadLaws(
  "safe math operations",
  safe.doubles,
  safe.safeSqrt _,
  safe.safeLog _)
view rawmaybeSOML-4.scala hosted with ❤ by GitHub

Lookup Tables

The next major section of the wikibook page is on lookup tables. A couple of things perplexed me here. First, they are using a list of pairs as a lookup table. I did check, and it seems Haskell has a perfectly serviceable Map class. Using a list of pairs instead of a library collection class would never occur to me. I'm probably spoiled by the quality of Scala's collection library. The second thing that seemed quite different to me was the fact that they are putting raw strings and numbers into the map. There are two maps in their example that are semantically different, but both have the same type signature: string to string. Haskell has such a rich type system, it seems like they could easily do better than this. They probably made these choices to keep the example simple, and to focus the discussion on the use of Monads. But it wouldn't be natural Scala without using case classes and maps. Typing the data is trivial in Scala:

case class Name(name: String)
case class Number(number: String)
case class Registration(registration: String)
case class TaxOwed(taxOwed: Double)
view rawlookup-1.scala hosted with ❤ by GitHub

Before writing any lookup methods, we need to seed the maps. First, we'll generate a hundred or so pieces of test data for each type:

private val seedTestData: List[Int] = (1 to 100).toList
private def intToName(i: Int): Name = Name(i.toString)
private def intToNumber(i: Int): Number = Number(i.toString)
private def intToRegistration(i: Int): Registration = Registration(i.toString)
private def intToTaxOwed(i: Int): TaxOwed = TaxOwed(i.toDouble)
 
val names: List[Name] = seedTestData map intToName
val numbers: List[Number] = seedTestData map intToNumber
val registrations: List[Registration] = seedTestData map intToRegistration
val taxesOwed: List[TaxOwed] = seedTestData map intToTaxOwed
view rawlookup-2.scala hosted with ❤ by GitHub

We're going to generate the maps in such a way that there is at least one test value for every permutation of Just and MaybeNot. First, we agree that in every key/value pair in a map, the key and the value relate back to the same Int. Then, we filter each map down to the pairs whose underlying Int is divisible by some small prime. Using 2, 3, and 5 over a hundred element test set will give all permutations. Here's how it looks:

private def seedMap[A, B](                                                                                                             
  intFilter: (Int) => Boolean,                                                                                                         
  intToA: (Int) => A,                                                                                                                  
  intToB: (Int) => B): Map[A, B] = {
  import language.postfixOps
  seedTestData filter intFilter map {
    i => intToA(i) -> intToB(i)
  } toMap
}
 
// only those divisible by 2 are in the number map                                                                                     
private val numberMap: Map[Name, Number] =
  seedMap(_ % 2 == 0, intToName, intToNumber)
 
// only those divisible by 3 are in the registration map                                                                               
private val registrationMap: Map[Number, Registration] =
  seedMap(_ % 3 == 0, intToNumber, intToRegistration)
 
// only those divisible by 5 are in the tax map                                                                                        
private val taxOwedMap: Map[Registration, TaxOwed] =
  seedMap(_ % 5 == 0, intToRegistration, intToTaxOwed)
view rawlookup-3.scala hosted with ❤ by GitHub

We need to add a conversion from Option to Maybe to translate the result of Map#get. A natural place for this is in the Maybe companion object:

object Maybe {
  def apply[A](option: Option[A]): Maybe[A] = option match {
    case Some(a) => Just(a)
    case None => MaybeNot
  }
}
view rawMaybe-companion.scala hosted with ❤ by GitHub

We'll define the single-level lookup methods in terms of a generic lookup function that also takes the Map as parameter:

private def lookup[A, B](map: Map[A, B])(a: A): Maybe[B] =
  Maybe(map.get(a))
 
def lookupNumber(name: Name): Maybe[Number] =
  lookup(numberMap)(name)
 
def lookupRegistration(number: Number): Maybe[Registration] =
  lookup(registrationMap)(number)
 
def lookupTaxOwed(registration: Registration): Maybe[TaxOwed] =
  lookup(taxOwedMap)(registration)
view rawlookup-4.scala hosted with ❤ by GitHub

Finally, we can implement the lookup methods that span multiple dictionaries with forloops or flatMaps:

def lookupRegistration1(name: Name): Maybe[Registration] =
  for (
    number <- lookupNumber(name);
    registration <- lookupRegistration(number))
  yield registration
 
def lookupRegistration2(name: Name): Maybe[Registration] =
  lookupNumber(name) flatMap lookupRegistration
 
def lookupTaxOwed1(name: Name): Maybe[TaxOwed] =
  for (
    number <- lookupNumber(name);
    registration <- lookupRegistration(number);
    taxOwed <- lookupTaxOwed(registration))
  yield taxOwed
 
def lookupTaxOwed2(name: Name): Maybe[TaxOwed] =
  lookupNumber(name) flatMap lookupRegistration flatMap lookupTaxOwed
view rawlookup-6.scala hosted with ❤ by GitHub

More Testing Maybe Obeys Monad Laws

We can easily add two more Monad laws tests, like so:

maybeShouldObeyMonadLaws(
  "looking up Numbers and Registrations by Name",
  lookup.names,
  lookup.lookupNumber _,
  lookup.lookupRegistration _)
 
maybeShouldObeyMonadLaws(
  "looking up Registrations and TaxesOwed by Number",
  lookup.numbers,
  lookup.lookupRegistration _,
  lookup.lookupTaxOwed _)
view rawmaybeSOML-5.scala hosted with ❤ by GitHub

Conclusion

These were fun examples that are common use-cases for a Maybe class. We learned about kleisli composition, and solidified our understanding of the Maybe Monad. As a Java refugee, it took me a while to get used to Scala's Option class as an alternative tonulls. But once you get the hang of it, it's really handy. Methods like map, flatMap,orElse, and getOrElse, are so much more elegant than the equivalent operations with Java nulls. All that being said, I am looking forward to a non-Maybe example.