Scalpel's Scraper

Posted on April 11, 2017

Intro

I’d like to start with a post on Scalpel.

At its heart, Scalpel is but a simple wrapper around TagSoup. TagSoup is more or less the Haskell equivalent of Python’s famous BeautifulSoup library. I won’t really comment on TagSoup itself here; what interests me is the “wrapping” part of Scalpel. This post is mostly about: “how does one use Haskell’s common abstractions to build a high-level wrapper?”

Let’s start with the GitHub repo. Before even reading the code, I like to consider the cabal file, if only to see if this is going to be costly to build. As you can see, the library’s dependency list is not too long - compared to, say, Lens or Yesod-Core.

You’ll notice that scalpel depends on scalpel-core. Actually, the Scalpel folder is mostly a few Curl utilities and a bunch of re-exports over this core. So, the real stuff happens in the scalpel-core folder, where you’ll find three directories. We’ll ignore benchmark and tests and focus on the src/Text/HTML/Scalpel folder where most of the code lives. You can ignore the Core.hs file; once again, it’s mostly re-exports.

Finally, following the breadcrumbs, we find the meaty part: the Scrape.hs file in the Internal folder. The whole post is going to be about some interesting stuff going on in this file. But first, in case you have never heard of Scalpel, let’s quickly see how you can use this library.

Scalpel 101

Scalpel is a high-level HTML / XML scraping library. In other words, it lets you get content out of an HTML or XML file, specifying what you’re looking for and where to get it. It is built around two main abstractions: Scrapers and Selectors.

Scrapers: what do we want, and from where shall we get it?

Scraper is a parametric datatype that takes a string-like type (str) and any type (a).

Running a Scraper over a str will get you a Maybe a. We’ll come back to this definition, but right now it gives us this neat way of stating the purpose of our functions, e.g.:

import qualified Data.Text as T

getIntOutOfThis :: Scraper String Int

getSomeTextOufOfThis :: Scraper T.Text T.Text

getALotOfIntOutOfThis :: Scraper String [Int]

You’ll notice a suspicious, apparent absence of input values. Please stop finding this suspicious and consider for a moment that this is awesome. I’ll show that it is actually awesome much later, but bear with me.

Selectors: fancy paths

You can think of selectors more or less as a small reimplementation of XPath - a tad less powerful, but it covers a lot of its use cases.

Selectors are simply ways to pick parts from an XML tree. They are defined in the Types.hs file of the Select folder, but I won’t cover their implementation here.

Selectors are written using special operators, mostly @:, which attaches a list of predicates to a tag name, e.g.:

linksWithClassImportant :: Selector
linksWithClassImportant = "a" @: [hasClass "important"]

Though you can also use catch-all selectors:

links :: Selector
links = "a"

You can create your own predicates using @=, an operator that combines an attribute name and its expected value (you’ll see an example of this later).

Finally, selectors can be combined using (//), which looks for children (at arbitrary depth), so “any <a> inside a <div>” can be written as:

anyLinkInAnyDiv = "div" // "a"

All this can be combined with neat primitive functions. So if you have an XML document like this one:

<languages>
  <language name="Haskell" hasType="yup"/>
  <language name="Javascript" hasType="nope"/>
</languages>

Getting the name of the first typed language could be expressed, using the attr primitive, as:

firstTypedLang :: Scraper String String
firstTypedLang = 
  attr "name" $ "languages" // "language" @: ["hasType" @= "yup"]

If you want to map over all XML nodes that match, you can use the pluralized primitive attrs (and your Scraper would then return a list, in our previous example, Scraper String [String]).

There are of course several other primitives that should be self-explanatory: attr, text, html, etc.
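For instance - a quick sketch, assuming the <languages> document above and that the scalpel package is available; allLangNames is a made-up name, and scrapeStringLike (covered near the end of this post) runs a Scraper over a raw string:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.HTML.Scalpel

-- Collect the "name" attribute of every <language> node, at any depth.
allLangNames :: Scraper String [String]
allLangNames = attrs "name" $ "languages" // "language"

main :: IO ()
main = print $ scrapeStringLike doc allLangNames
  where
    doc :: String
    doc = "<languages><language name=\"Haskell\" hasType=\"yup\"/><language name=\"Javascript\" hasType=\"nope\"/></languages>"
-- prints Just ["Haskell","Javascript"]
```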

Of course, if you want to work with Data.Text rather than String, you can simply change the signature to:

firstTypedLang'' :: Scraper T.Text T.Text

And you should be good to go. (Frankly, at this point, you should have a basic idea of what Data.Text is. If you have no idea what I’m talking about, once again: think String. Really, it’s not a difficult topic, just a fairly boring one; google it.)

How it works

How does Scalpel - like a lot of Haskell libraries - achieve this cool syntax? This is what we’ll study today.

Newtype for functions

Boring newtypes

You have probably already encountered newtype and, if you’re like me, had a hard time grasping what it’s used for. Maybe you found the word “isomorphic” somewhere, asked a mathematically-inclined friend to explain it to you, and got a headache.

The most classical use case, the one that’s so easy to understand that even I understand it, is that this:


type Flag = Bool

f :: Flag -> String
f True = "Yup"
f False = "Nope"

f True

… will compile, while the following won’t:


newtype Flag = MakeFlag Bool

f :: Flag -> String
f True = "Yup"
f False = "Nope"

f (MakeFlag True)

In other words, newtype gives us the same type, but wrapped inside a special constructor that lets the compiler help us differentiate values even though they are, really, the same underlying type. OK: it’s a box. So for instance, if we have a stupid function like:

isWordInParagraph :: String -> String -> Bool

… we have an idea of what this function is doing, but we’d like to know which String is the word and which String is the paragraph. We can make things clearer through aliases…

type Word = String
type Paragraph = String

isWordInParagraph :: Word -> Paragraph -> Bool

… but if we keep forgetting the order of our parameters (I know I will!) and want to be safer, we can use newtype to ask our compiler to help us be extra careful:

newtype Word = MkWord String
newtype Paragraph = MkParagraph String

isWordInParagraph :: Word -> Paragraph -> Bool

isWordInParagraph (MkWord "thee")
                  (MkParagraph "Shall I compare thee to a typed language")

(All this without any computing cost due to boxing and unboxing, because The Compiler Is Smart.)

Less boring newtypes

However, newtype is more powerful than that. You probably know it can take a type parameter:

newtype Stuff a = MkStuff a

So we get a MkStuff constructor for any type a. This property gets even more interesting with datatypes that take more than one type parameter, where a newtype lets you reorder them to write Functor, Applicative or Monad instances - but let’s not go there.

You can give your field a getter, so you can easily unbox the value without having to pattern match:

> newtype Stuff' a = MkStuff' { getMyStuff :: a }
> let s = MkStuff' 12
> getMyStuff s
12

More importantly, just as you can write a datatype that contains functions, you can write a newtype that contains one function. And since you have a type parameter, you can store a generic function.

> newtype Stuff'' a = MkStuff'' { getMyStuff :: a -> a }
> let f = MkStuff'' (+10)
> getMyStuff f 32
42
> let f2 = MkStuff'' (++ " world")
> getMyStuff f2 "hello "
"hello world"

So we can carry a partially applied function and store it in a newtype. OK. Let us keep this in mind, because we’re going to make heavy use of it.

The Scraper newtype

Time to - finally! - read Scalpel’s code. Let’s examine the Scraper newtype:

newtype Scraper str a = MkScraper {
        scrapeTagSpec :: TagSpec str -> Maybe a
    }

We won’t go into the details of TagSpec. It’s defined here and, for the sake of simplicity, we’ll just say it’s a “view over an XML tree”.

So a Scraper is a function from a TagSpec over str (which, as you probably guessed, is a string-like type) to Maybe anything. Note, though, that when using Scraper in your signatures, you abstract away this TagSpec and this Maybe. In other words: when you write your Scrapers, you don’t have to think about this stuff.

Functors for Scraper

So. Note that we lose all typeclass instances when we wrap stuff in a newtype, unless we use deriving - and the author of Scalpel didn’t. This means we get neither Functor, Applicative nor Monad (nor any other typeclass), though we could, since Scraper has the proper kind for them.
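As an aside - a toy sketch, with TagSpec simplified to a bare str and Scraper' a made-up name - GHC’s DeriveFunctor extension could produce at least the Functor instance automatically:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Toy stand-in for Scalpel's newtype: TagSpec str is simplified to just str.
newtype Scraper' str a = MkScraper' { runScraper' :: str -> Maybe a }
    deriving Functor

-- The derived fmap maps over the Maybe result:
demo :: Maybe Int
demo = runScraper' (fmap (* 2) (MkScraper' (Just . length))) "abc"
-- demo == Just 6
```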

So what if we WANT all this stuff? Fortunately, the author of Scalpel wrote the instances for us.

But what would the Functor for Scraper even mean? It would mean mapping over the result of the scraping without leaving the scraping context.

More specifically, it means specializing from:

fmap :: (a -> b) -> f a -> f b

to:

(a -> b) -> Scraper str a -> Scraper str b

So you can apply functions to “what is going to be scraped”, before even having to evaluate the whole scraping process. Remember our previous code:

firstTypedLang :: Scraper String String
firstTypedLang = 
  attr "name" $ "languages" // "language" @: ["hasType" @= "yup"]

Let’s say we have a neat “upperCaseAllTheChars” function lying around, with this amazingly complex code:

import Data.Char (toUpper)

upperCaseAllTheChars :: String -> String
upperCaseAllTheChars = map toUpper

We would LOVE to use it directly on the result of firstTypedLang, so that it returns the name of the first typed language in uppercase, because why not. Our types don’t really match, but we can compose thanks to the Functor instance: good old fmap will lift our upperCaseAllTheChars into the world of Scrapers. The result will of course still be a Scraper, since we’re not out of our scraping process yet.

upperCased :: Scraper String String
upperCased = fmap upperCaseAllTheChars firstTypedLang

How is this fmap for Scraper implemented?

instance Functor (Scraper str) where
    fmap f (MkScraper a) = MkScraper $ fmap (fmap f) a

Now, if you’re having - like I did - a hard time understanding how exactly this works, you need to think about types (pro-tip: any interesting stuff in Haskell requires you to think about types anyway).

f is the function we want to apply. And a is a function too, since it’s the content boxed inside MkScraper, and as we saw, MkScraper contains nothing but a function.

Let’s write a less concise version to understand what is happening. Let us specialize the fmap signature for what we want:

fmapScrap :: (a -> b) -> Scraper str a -> Scraper str b

And without the newtype, we would actually get:

fmapScrap :: (a -> b)
          -> (TagSpec str -> Maybe a)
          -> (TagSpec str -> Maybe b)

Let us look at our goal: returning a Scraper, so a function boxed in a newtype. Let us get cute:

instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper computeStuff
      where
        computeStuff = undefined

OK, that’s cheating because we used undefined, but it compiles. We know that computeStuff takes an input, the TagSpec str bit. So, we can at least write that parameter:

instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper computeStuff
      where
        computeStuff input = undefined

That way, we build our MkScraper, we feed it a function (without its input), and if we manage to write an implementation of this damned function, it will be called as soon as someone actually calls scrapeTagSpec, the unboxer of our newtype:

newtype Scraper str a = MkScraper {
        scrapeTagSpec :: TagSpec str -> Maybe a
    }

Let’s actually try to write computeStuff. We have our f, typed (a -> b); we have a (MkScraper f') whose f' is typed (TagSpec str -> Maybe a); and we have the TagSpec str - it’s the input parameter of our computeStuff. If we apply f' to input, we’ll get a Maybe a. If we pattern match over this Maybe a and get something, we can return a Maybe b!

instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper computeStuff
      where
        computeStuff input = case f' input of
           Just result -> Just (f result)
           Nothing     -> Nothing

It compiles (and it works as expected; it’s just really ugly). We can improve this by rewriting computeStuff using Maybe’s fmap:

computeStuff input = fmap f $ f' input

And even further by infixing fmap as <$>, to get this complete version:

instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper computeStuff
      where
        computeStuff input = f <$> f' input

Not as cool as the “real” implementation, but perhaps a bit easier to understand. Though of course, now, being proper Haskellers, we want to eta-reduce this: let’s remove this “input” parameter, it’s silly.

computeStuff = (fmap f) . f'

OK, but now, we don’t really need computeStuff, do we?

instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper $ (fmap f) . f'

This is reaaaaaally close to Scalpel’s implementation (and I would actually stop here, because frankly, it’s much easier to read).

Now, functions are Functors too (and Applicative, and Monad, but really, let’s not go there). The canonical form of the function type is ((->) r), and its fmap implementation is:

instance Functor ((->) r) where
    fmap = (.)

Which translates to “mapping a function over a function is composition”. So any (.) can be replaced by fmap, thus:

> let f = (+2)
> let g = (+3)
> fmap f g $ 3
8

This is how we get:

instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper $ fmap (fmap f) f'

… which is Scalpel’s implementation, though mine uses f' and not a, because I like to remember that the content of MkScraper is a function.
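Stripped of the newtype noise, you can check the two fmap layers on a bare function (g and h are made-up names for this sketch):

```haskell
-- g plays the role of the boxed function: an input to a Maybe result.
g :: String -> Maybe Int
g = Just . length

-- The outer fmap is the ((->) r) one, i.e. composition;
-- the inner one is Maybe's fmap.
h :: String -> Maybe Int
h = fmap (fmap (* 2)) g   -- same as fmap (* 2) . g

-- h "abc" == Just 6
```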

Applicative for Scraper

We have a Functor; can we get an Applicative? Yes, we can. But what would this Applicative mean?

Let’s start with the “required” functions of an Applicative. Any Applicative has to implement pure, which “puts stuff in a structure”, and (<*>), also called “tie fighter”, which lets you apply “a function in a structure” to “a value in a structure”. So where fmap for Functor does:

fmap :: (a -> b) -> f a -> f b

(<*>) does:

(<*>) :: Applicative f => f (a -> b) -> f a -> f b

I’m gonna be honest: I have a really hard time grasping Applicatives, and I can’t really “explain” them. So it’s pretty difficult for me to state the “use of the Scraper Applicative”. OK, here goes a simple way of thinking about Applicative: Applicative is like Functor over many arguments. So if I have a function func:

func :: a -> b -> c -> d -> e

And I have an f a, an f b, an f c and an f d, the Applicative lets me produce an f e.
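With f = Maybe, for instance, this “Functor over many arguments” intuition looks like:

```haskell
add3 :: Int -> Int -> Int -> Int
add3 a b c = a + b + c

-- Every argument is there: the computation goes through.
allThere :: Maybe Int
allThere = add3 <$> Just 1 <*> Just 2 <*> Just 3   -- Just 6

-- One argument is missing: the whole computation yields Nothing.
oneMissing :: Maybe Int
oneMissing = add3 <$> Just 1 <*> Nothing <*> Just 3   -- Nothing
```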

Notably, Applicatives are frequently used with data constructors. So, a simple example of Applicative for Scraper would be, if you have a simple datatype to store an HTML link and its destination:

type Url = String
type LinkName = String
data LinkAndInfo = LinkAndInfo Url LinkName

… to scrape some parts of the DOM tree to build your LinkAndInfo:

elemOne :: Scraper String Url
elemOne = attr "href" "a"

elemTwo :: Scraper String LinkName
elemTwo = text "a"

parseLink :: Scraper String LinkAndInfo
parseLink = LinkAndInfo <$> elemOne <*> elemTwo

-- Or, if you prefer the lifted version (liftA2 comes from Control.Applicative):
parseLink' :: Scraper String LinkAndInfo
parseLink' = liftA2 LinkAndInfo elemOne elemTwo

The same input - the TagSpec built from our String - is going to be used for two different extraction operations. Note that we could do the same thing with the Scraper Monad that we’ll see later; this is just a more idiomatic way of doing things.

We can now look at Scalpel’s implementation of Applicative for Scraper.

instance Applicative (Scraper str) where
    pure = MkScraper . const . Just
    (MkScraper f) <*> (MkScraper a) = MkScraper applied
        where applied tags | (Just aVal) <- a tags = ($ aVal) <$> f tags
                           | otherwise             = Nothing

Any Applicative instance should define pure, which stores a value in a default Applicative structure. When dealing with the Scraper newtype, it means we should build a function that, given a value, will return this value; typically, a const; however, we need to return Maybe a value, so this const needs to wrap a Just. It’s the same issue we faced with the Functor instance: we need to handle both the fact that we’re dealing with a function, and the fact that this function returns a Maybe. And really, it’s the composition of the pure implementations for these two instances:

instance Applicative ((->) a) where
    pure = const
instance Applicative Maybe where
    pure = Just
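Unboxed, and with pureScrape as a made-up name for this sketch, the composed pure reads:

```haskell
-- pure for ((->) r) is const; pure for Maybe is Just. Composed, they build
-- a "scraper" that ignores its input and always succeeds with the given value:
pureScrape :: a -> (input -> Maybe a)
pureScrape = const . Just

-- pureScrape 42 "whatever tags" == Just 42
```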

Now for the tie fighter (<*>) itself. Oh god, I hate Applicative implementations.

Keep in mind that what’s on the left of (<*>) is very different from what’s on the right.

I always forget that the left part contains a function, because of the way we write many consecutive (<*>) over values. Of course, here, it’s worse, because we’re already wrapping functions.

Remember the type of (<*>):

(<*>) :: Applicative f => f (a -> b) -> f a -> f b

And we know that a MkScraper really contains:

TagSpec str -> Maybe a

But we’re going to get confused by all these letters, so I’ll rename the type parameters, giving this:

TagSpec input -> Maybe output

So, if we “unbox” everything, our specialized (<*>) looks like:

(<*>) :: MkScraper (TagSpec input -> Maybe (outputA -> outputB))
      -> MkScraper (TagSpec input -> Maybe outputA)
      -> MkScraper (TagSpec input -> Maybe outputB)

So, on the left of (<*>), you have a newtype that contains a function that will output a function. Once again, keep this in mind and let’s look at the (<*>) definition again:

instance Applicative (Scraper str) where
    (MkScraper f) <*> (MkScraper a) = MkScraper applied
        where applied tags | (Just aVal) <- a tags = ($ aVal) <$> f tags
                           | otherwise             = Nothing

We want to return a function, so we return the helper applied, which takes a tags argument. When applied gets a tags, it shall be applied. Note that we hand it to MkScraper unapplied on the first line. This is exactly how we proceeded when we started building our own Functor instance for Scraper.

Look at the specialized signature again: on the right, we have a way to produce a Maybe outputA; on the left, a way to maybe produce a function that will transform an outputA into an outputB. So let us start with getting the outputA: that’s applying a (the boxed function on the right) to our input tags.

We know that if any element returns Nothing, we can stop our computation. That’s the easy otherwise case. The other case is a bit trickier to read. The unboxing was easy, but what on earth is this ($ aVal) <$> stuff?

Well. We know that ($) is simply function application, so that:

($) :: (a -> b) -> a -> b

The same way we can carry partially applied functions, e.g.:

(++ "world") :: [Char] -> [Char]

… we can carry “things to apply once we have a function”, e.g.:

($ "hello ") :: ([Char] -> b) -> b

The type of our example ($ "hello ") can be roughly translated as “give me a function that takes a [Char] as its input, and I shall give you the result of this function applied to your "hello "”. So, we can take any function that fits this description and get a result, e.g.:

($ "hello ") (++ "world")

This is rather ugly and silly, but it’ll return "hello world". So, let’s keep in mind that ($ anyValue) turns anyValue into a higher-order function, expecting a function that will use anyValue as its input.

Now, since there is a <$> right after it, this higher-order function is going to be lifted through fmap (<$> is the infix fmap). So we had a:

(a -> b) -> b

But it’s really a:

f (a -> b) -> f b

Consider the type of <*> and notice how close we are.

Now look at the right-hand side of that expression: we’re going to apply the f of (MkScraper f). Remember that this one has a tricky type:

MkScraper (TagSpec input -> Maybe (outputA -> outputB))

Since we apply it to tags, we get:

Maybe (a -> b)

So. On the left we have ($ aVal) <$>, which translates to f (a -> b) -> f b. And on the right we actually have an f (a -> b). Guess what! We can apply our left side to our right side and get an f b - actually a Maybe b - which is what we need to return, since any MkScraper must contain a function returning a Maybe value.

This can be rewritten in the following way - easier to read, though less concise:

instance Applicative (Stuff input) where
      pure = Stuff . const . pure
      (Stuff f) <*> (Stuff a) = Stuff applied
          where applied tags | (Just aVal) <- a tags = case (f tags) of
                                                          Just boxedF -> Just $ boxedF aVal
                                                          Nothing -> Nothing
                             | otherwise = Nothing
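To make this snippet compile on its own (Applicative requires a Functor instance), here is a self-contained version with a small usage check - Stuff and runStuff are toy names, not Scalpel’s:

```haskell
newtype Stuff input a = Stuff { runStuff :: input -> Maybe a }

instance Functor (Stuff input) where
    fmap f (Stuff g) = Stuff $ fmap (fmap f) g

instance Applicative (Stuff input) where
    pure = Stuff . const . Just
    (Stuff f) <*> (Stuff a) = Stuff applied
        where applied tags | (Just aVal) <- a tags = case f tags of
                                                         Just boxedF -> Just (boxedF aVal)
                                                         Nothing     -> Nothing
                           | otherwise             = Nothing

-- Pair the length of the input with the input itself:
demo :: Maybe (Int, String)
demo = runStuff ((,) <$> Stuff (Just . length) <*> Stuff Just) "abc"
-- demo == Just (3, "abc")
```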

And since we are done with the Applicative, it’s time to handle the Scary M Word.

The Scraper Monad

What would a Scraper Monad do, exactly? Well, as most monads do, it would let us compose scrapers pretty much any way we might want to. Picture the following dumb XML:

<features>
  <feature id="f1" name="Show cool stuff" dependOn="f2"/>
  <feature id="f2" name="Compute cool stuff"/>
  <feature id="f3" name="Make coffee"/>
</features>

You’ll notice a feature can depend on another feature, identified through an id attribute (yes, it’s a stupid way to store this kind of information, but let’s pretend it’s not).

Let us write a simple function that, given a feature id, gets the name of the feature it depends on. Monads are of great help here:

featureAndRequired :: String -> Scraper String String
featureAndRequired initialFeatureId = do
   feat <- attr "dependOn" $ "features" // "feature" @: ["id" @= initialFeatureId]
   attr "name" $ "features" // "feature" @: ["id" @= feat]

So: you scrape an id stored in dependOn; you unbox it as a simple and basic String, labelled feat, so that you can use it to define the next selector; finally, you return the result of the scraping through this new selector. If there is no feature matching one of these ids, or if your feature does not depend on another feature, we’ll get Nothing.

Of course, the do notation is nothing but sugar around a (>>=) version, so if we can implement (>>=), we’re good to go. My example is really contrived, but you get the idea.
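For the record, the do block desugars into an explicit (>>=) chain - a sketch, where featureAndRequired' is just a name for this variant, and the final scraper is returned directly:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.HTML.Scalpel

featureAndRequired' :: String -> Scraper String String
featureAndRequired' initialFeatureId =
    -- First scrape the id stored in "dependOn"...
    attr "dependOn" ("features" // "feature" @: ["id" @= initialFeatureId])
    -- ... then use it to build the next selector.
    >>= \feat ->
    attr "name" ("features" // "feature" @: ["id" @= feat])
```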

Good news: the Monad implementation is rather simple.

instance Monad (Scraper str) where
    fail = Fail.fail
    return = pure
    (MkScraper a) >>= f = MkScraper combined
        where combined tags | (Just aVal) <- a tags = let (MkScraper b) = f aVal
                                                      in  b tags
                            | otherwise             = Nothing

We’ll ignore fail (boring) and return (it’s pure) to focus on the bind. We see the same pattern we saw for the Applicative, and while creating our own version of Functor: first, we build a MkScraper with a not-yet-applied function, which we define in the where block.

Remember we are in a bind, so we need to return the specialized version of:

m a -> (a -> m b) -> m b

… namely:

MkScraper a -> (a -> MkScraper b) -> MkScraper b

… and since MkScraper contains a function, the REAL type is:

(TagSpec str -> Maybe a)
  -> (a -> (TagSpec str -> Maybe b))
  -> (TagSpec str -> Maybe b)

Since we return MkScraper combined, it means that combined must ultimately be a function that takes a TagSpec str (that shall be the tags parameter, as in the Applicative implementation) and returns a Maybe b. Once again, we “simply” need to implement this combined function.

We evaluate the scraper on the left, and we stop computing if it returned Nothing. This is, by the way, very close to the Maybe monad implementation:

instance  Monad Maybe  where
    (Just x) >>= k      = k x
    Nothing  >>= _      = Nothing

… but with an added twist, since we have an additional computation to handle.

So, we had an m a as an input for (>>=), and we unboxed it. Now, we have an a -> m b on the right side, namely the function we bind over: f.

Let’s focus on where the composition happens:

combined tags | (Just aVal) <- a tags = let (MkScraper b) = f aVal
                                        in  b tags

What is f? A function that takes an a and returns a Scraper boxing a function of type (TagSpec str -> Maybe b). Do we have an a? Yup, it’s aVal, since we computed and unboxed it. So if we apply f to aVal, we get such a Scraper, and we can pattern match it to extract the boxed function!

But remember, combined is the function that will ultimately be applied, so we don’t want to return a Scraper; we want to return what the function inside a Scraper returns, namely Maybe something.

We just need to apply our scraper - and we still have its input available in our scope: it’s tags. Once applied, it will return a Maybe b. Which was our goal all along.
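The same kind of toy stand-in (Scraper' and bindScraper are made-up names; no Scalpel involved) lets us check the behaviour, with an association list standing in for the XML document:

```haskell
newtype Scraper' str a = MkScraper' { runScraper' :: str -> Maybe a }

-- A hand-rolled bind, mirroring Scalpel's (>>=) implementation:
bindScraper :: Scraper' str a -> (a -> Scraper' str b) -> Scraper' str b
bindScraper (MkScraper' a) f = MkScraper' combined
  where combined tags | (Just aVal) <- a tags = let (MkScraper' b) = f aVal
                                                in  b tags
                      | otherwise             = Nothing

-- An association list plays the role of the XML document:
lookupKey :: String -> Scraper' [(String, String)] String
lookupKey k = MkScraper' (lookup k)

-- Follow the indirection: the value found under "dependOn" names the key to read next.
chain :: Scraper' [(String, String)] String
chain = lookupKey "dependOn" `bindScraper` lookupKey

-- runScraper' chain [("dependOn", "f2"), ("f2", "Compute cool stuff")]
--   == Just "Compute cool stuff"
```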

Yes, this is really Function and Maybe

You’ll notice that all the implementations of Functor, Applicative and Monad for Scraper are mostly combinations of their Maybe and ((->) r) equivalents.

It was particularly obvious for pure and for fmap.

And of course they are: a Scraper is nothing but a function that returns a Maybe (this could probably be implemented through some sort of transformer stack).
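A sketch of that stack, for the curious - with TagSpec simplified to a plain type parameter and ScraperT a made-up name: ReaderT over Maybe unwraps to exactly the same function type, and all three instances come for free from the transformer:

```haskell
import Control.Monad.Trans.Reader (ReaderT (..))

-- ReaderT spec Maybe a is a newtype around: spec -> Maybe a
type ScraperT spec a = ReaderT spec Maybe a

lengthScraper :: ScraperT String Int
lengthScraper = ReaderT (Just . length)

-- Functor (and Applicative, and Monad) come for free:
doubled :: ScraperT String Int
doubled = fmap (* 2) lengthScraper

-- runReaderT doubled "hello" == Just 10
```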

WHY ?! Why do we need all this ?

So. A good lib should be about delegating responsibilities. When you use a library, you mostly want “results”, ideally with as little boilerplate and complexity as possible. You want an obscure trigonometric computation done for you. You want a complex stack of side effects handled for you. Or, in the case of Scalpel, you mostly want “stuff” out of “some XML or HTML”.

In other words: you want to deal with high-level stuff. The first lesson we can learn from the way Scraper is implemented is that it’s easy to type. Of course, you could replace all instances of, say, Scraper String String by their real signature (TagSpec String -> Maybe String), but this would bloat your signatures. The newtype promotes concision and is ultimately easier to write.

You also want to be able to compose, in various ways, without caring about how the library gets things done. The Functor, Applicative and Monad implementations give you idiomatic, standardized Haskell ways of doing so. If you can reduce your program to a list of a -> b and b -> c and c -> d functions, and you have a Scraper whatever a, you CAN produce a Scraper whatever d without having to know everything about the library and memorize tons of functions. So it’s easier to use.

Finally, you want to be protected, as much as possible, from any implementation change, and mostly from API breaks. Notice that the underlying TagSpec type could change and be replaced by some other way of storing the tree, in the internals of Scalpel, without any need for you to change your Scraper functions.

We’ve seen how computations get handled, combined, composed. We didn’t take the time to look at how to start a computation, though. Most internal scraping is done using the scrape function implemented in the Scrape.hs module:

scrape :: (Ord str, TagSoup.StringLike str)
       => Scraper str a -> [TagSoup.Tag str] -> Maybe a
scrape s = scrapeTagSpec s . tagsToSpec . TagSoup.canonicalizeTags

This calls the newtype getter scrapeTagSpec (basically applying the TagSpec str -> Maybe a function) over a preprocessed TagSoup (this is where the wrapping over the TagSoup lib happens, by the way). But in your code, you won’t use this.

You’re more likely to use the scrapeURL entry point, which basically plugs a curl call on top of all this. So you only have to provide your Scraper and a URL.

Or, for parsing a local XML document rather than a distant page, you would use the scrapeStringLike function defined somewhere in core:

-- | The 'scrapeStringLike' function parses a 'StringLike' value into a list of
-- tags and executes a 'Scraper' on it.
scrapeStringLike :: (Ord str, TagSoup.StringLike str)
                 => str -> Scraper str a -> Maybe a
scrapeStringLike html scraper
    = scrape scraper (TagSoup.parseTagsOptions TagSoup.parseOptionsFast html)
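Putting it together - a sketch assuming the <languages> example from earlier and an available scalpel package:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.HTML.Scalpel

-- The scraper from the beginning of the post:
firstTypedLang :: Scraper String String
firstTypedLang =
    attr "name" $ "languages" // "language" @: ["hasType" @= "yup"]

main :: IO ()
main = print $ scrapeStringLike doc firstTypedLang
  where
    doc :: String
    doc = "<languages><language name=\"Haskell\" hasType=\"yup\"/><language name=\"Javascript\" hasType=\"nope\"/></languages>"
-- prints Just "Haskell"
```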

Notice the input types of these functions: they are all rather abstract and generic. It is unlikely that the StringLike constraint will be dropped one day: we’ll always parse XML or HTML out of things that look like strings.

The inner implementation could change drastically without you needing to change anything. Heck, it could use something other than TagSoup to model the tree. Basically, you have a powerful Façade pattern, but one where you keep a lot of power over composition. So: a neat encapsulation of the underlying implementation.

TL;DR of why this Functor / Applicative / Monadic “API” for a lib is good, in three points: the newtype keeps signatures short, so it’s easy to type; the standard typeclass instances give you idiomatic ways to compose, so it’s easy to use; and the internals stay hidden, so you’re protected from implementation changes.

A conclusion of some sort

In the end, for me, it’s all about this Scraper str a signature. I find it gives a great intuition of what abstraction is all about. Abstraction is not about building complex type hierarchies. It’s not even about producing handy typeclasses that you get confused about. It’s about “getting rid of noise”, or, if you prefer, “abstracting stuff away”. Sure, a Scraper str a is really a function, with some TagSpec and Maybe thrown around, but you don’t need to know that when you’re building a Scraper. Abstract this stuff away. Focus on the task at hand.

Just build your small functions knowing that at the end, you’ll get that juicy Maybe a you were promised by the signatures of the entry-level functions scrapeStringLike and scrapeURL.

In other words: you can ignore the implementation and concentrate on what you need to do. Which is what high-level programming is all about. Well, that, and using cool mathematical terms that I barely understand.

(Disclaimer: your mileage may vary)