# Scalpel's Scraper

## Intro

I’d like to start with a post on Scalpel, for two reasons:

- I’ve used this library for a personal project and I liked it.
- It’s a relatively readable example of some frequently used library idioms in Haskell.

At its heart, Scalpel is but a simple wrapper around TagSoup, which is more or less the Haskell counterpart of Python’s famous BeautifulSoup library. I won’t really comment on TagSoup itself here; what interests me is the “wrapping” part of Scalpel. This post is mostly about how one uses common Haskell abstractions to build a high-level wrapper.

Let’s start with the GitHub repo. Before even reading the code, I like to look at the cabal file, if only to see whether this is going to be costly to build. As you can see, the library’s dependency list is not too long compared to, say, Lens or Yesod-Core.

You’ll notice that scalpel depends on scalpel-core. Actually, the scalpel folder is mostly a few curl utilities and a bunch of re-exports of this core. So the real stuff happens in the scalpel-core folder, where you’ll find three directories. We’ll ignore the benchmarks and tests and focus on the src/Text/HTML/Scalpel folder, where most of the code lives. You can ignore the Core.hs file: once again, it’s mostly re-exports.

Finally, following the breadcrumbs, we find the meaty part: the Scrape.hs file in the Internal folder. The whole post is going to be about some interesting stuff going on in this file. But first, in case you have never heard about Scalpel, let’s quickly see how you can use this library.

## Scalpel 101

Scalpel is a high-level HTML / XML scraping library. In other words, it lets you get content out of an HTML or XML file by specifying what you’re looking for and where to get it. It is built around two main abstractions: `Scrapers` and `Selectors`.

### Scrapers: what do we want, from where shall we get it ?

`Scraper` is a parametric datatype that takes a string-like type (`str`) and any type (`a`).

Running a `Scraper` over a `str` will get you a `Maybe a`. We’ll come back to this definition, but right now it gives us this neat way of stating the purpose of our functions, e.g.:

```
import qualified Data.Text as T
getIntOutOfThis :: Scraper String Int
getSomeTextOutOfThis :: Scraper T.Text T.Text
getALotOfIntOutOfThis :: Scraper String [Int]
```

You’ll notice a suspicious, apparent absence of input values. Please stop finding this suspicious and try to think for a moment that this is awesome. I’ll show that it is actually awesome much later, but bear with me.

### Selectors : fancy paths

You can think about selectors more or less as a small reimplementation of XPath - a tad less powerful, but it covers a lot of its use cases.

Selectors are simply ways to pick parts from an XML tree. They are defined in the Types.hs file of the Select folder, but I won’t cover their implementation here.

Selectors are written using special operators, mostly `@:` for lists of predicates, e.g.:

```
linksWithClassImportant :: Selector
linksWithClassImportant = "a" @: [hasClass "important"]
```

Though you can also use catch all selectors:

```
links :: Selector
links = "a"
```

You can create your own predicates using `@=`, an operator that combines an attribute name and its expected value (you’ll see an example of this later).

Finally, selectors can be combined using `(//)`, which will look for descendants (at arbitrary depth), so “any `<a>` inside a `<div>`” can be written as:

`anyLinkInAnyDiv = "div" // "a"`

All this can then be combined with neat primitive functions. So if you have an XML document like this one:

```
<languages>
<language name="Haskell" hasType="yup"/>
<language name="Javascript" hasType="nope"/>
</languages>
```

Getting the name of the first typed language can be expressed using the `attr` primitive:

```
firstTypedLang :: Scraper String String
firstTypedLang =
    attr "name" $ "languages" // "language" @: ["hasType" @= "yup"]
```

If you want to map over *all* XML nodes that match, you can use the pluralized primitive `attrs` (and your Scraper would then return a list, in our previous example, `Scraper String [String]`).

There are of course several primitives that should be self-explanatory: `attr`, `text`, `html`, etc.

Of course, if you want to work with Data.Text rather than String, you can simply change the signature to:

`firstTypedLang'' :: Scraper T.Text T.Text`

And you should be good to go. (Frankly, at this point, you should have a basic idea of what `Data.Text` is. If you have no idea what I’m talking about, once again, think String, but really, it’s not a difficult topic, just a fairly boring one, google it).

## How it works

How does Scalpel - and a lot of haskell libraries - achieve this cool syntax ? This is what we’ll study today.

### Newtype for functions

#### Boring newtypes

You have probably already encountered `newtype`, and if you’re like me, had a hard time grasping what it’s used for. Maybe you found the word “isomorphic” somewhere, asked a mathematically-inclined friend to explain it to you and got a headache.

The most classical use case, the one that’s so easy to understand that even I understand it, is that this:

```
type Flag = Bool
f :: Flag -> String
f True = "Yup"
f False = "Nope"
-- f True evaluates to "Yup"
```

… will compile and the following won’t :

```
newtype Flag = MakeFlag Bool
f :: Flag -> String
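f True = "Yup"   -- type error: True is a Bool, not a Flag
f False = "Nope" -- same error here
-- The clauses would have to match on (MakeFlag True) and (MakeFlag False),
-- and the call would become: f (MakeFlag True)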
```

In other words, a newtype gives us the same type, but inside a special constructor that lets the compiler help us tell things apart even though they are, really, the same type. OK, it’s a box. So for instance, if we have a stupid function like:

`isWordInParagraph :: String -> String -> Bool`

… we have an idea of what this function is doing, but we’d like to know which String is the word and which String is the paragraph. We can make things clearer through aliases…

```
import Prelude hiding (Word) -- Prelude already exports a Word type
type Word = String
type Paragraph = String
isWordInParagraph :: Word -> Paragraph -> Bool
```

… but if we keep forgetting the order of our parameters (I know I will!) and we want to be safer, we can use `newtype` to ask our compiler to help us be extra careful:

```
import Prelude hiding (Word) -- Prelude already exports a Word type
newtype Word = MkWord String
newtype Paragraph = MkParagraph String
isWordInParagraph :: Word -> Paragraph -> Bool
-- Example call (swapping the two arguments is now a type error):
-- isWordInParagraph (MkWord "thee")
--                   (MkParagraph "Shall I compare thee to a typed language")
```

(All this without any computing costs due to boxing and unboxing because The Compiler Is Smart.)
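To make the newtype version concrete, here is a minimal runnable sketch. The body of `isWordInParagraph` and the `found` value are my own assumptions (the post only gives the signature): I simply check whether the word is one of the whitespace-separated tokens of the paragraph.

```haskell
import Prelude hiding (Word) -- Prelude already exports a Word type

newtype Word = MkWord String
newtype Paragraph = MkParagraph String

-- Hypothetical implementation: True when the word appears among the
-- whitespace-separated tokens of the paragraph.
isWordInParagraph :: Word -> Paragraph -> Bool
isWordInParagraph (MkWord w) (MkParagraph p) = w `elem` words p

found :: Bool
found = isWordInParagraph (MkWord "thee")
                          (MkParagraph "Shall I compare thee to a typed language")
```

Swapping the two arguments, or passing plain Strings, is rejected by the compiler, which is the whole point of the exercise.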

#### Less boring newtypes

However, `newtype` is more powerful than that. You probably know it can take type parameters:

`newtype Stuff a = MkStuff a`

So we get a `MkStuff` constructor for any type `a`. This property gets more interesting for some instances of Functor, Applicative and Monad that take more than one type parameter, where a newtype lets you reorder them, but let’s not go there.

You can give your wrapped value a record accessor (a getter), so you can easily unbox it without having to pattern match:

```
> newtype Stuff' a = MkStuff' { getMyStuff :: a }
> let s = MkStuff' 12
> getMyStuff s
12
```

More importantly, the same way you can write a datatype that contains functions, you can write a newtype that contains one function. And since you have genericity, you can store a generic function.

```
> newtype Stuff'' a = MkStuff'' { getMyStuff :: a -> a }
> let f = MkStuff'' (+10)
> getMyStuff f 32
42
> let f2 = MkStuff'' (++ " world")
> getMyStuff f2 "hello "
"hello world"
```

So we can carry a partially applied function and store it in a newtype. OK. Let us keep this in mind because we’re going to make heavy use of this.

### The Scraper newtype

Time to - finally! - read Scalpel code. Let’s examine the Scraper `newtype`:

```
newtype Scraper str a = MkScraper {
    scrapeTagSpec :: TagSpec str -> Maybe a
}
```

We won’t go into the details of `TagSpec`. It’s defined in the library’s source and, for the sake of simplicity, we’ll just say it’s a “view over an XML tree”.

So a Scraper is a function from a `TagSpec` for “str” (which, as you probably guessed, is a string-like type) to `Maybe` anything. Note, though, that when using `Scraper` to sign your function, you abstract away this TagSpec and this Maybe. In other words: when you write your Scrapers, you don’t have to think about this stuff.
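If you want to play with this shape without pulling in Scalpel, here is a toy model. Everything in it is an assumption made for illustration: `ToySpec` is a plain list of attribute/value pairs standing in for the real `TagSpec`, and `ToyScraper`, `nameAttr` and friends are hypothetical names, not part of the library.

```haskell
-- Toy stand-in for Scalpel's TagSpec: just a list of (attribute, value) pairs.
type ToySpec = [(String, String)]

newtype ToyScraper a = MkToyScraper
    { runToyScraper :: ToySpec -> Maybe a }

-- A scraper is defined without mentioning its input...
nameAttr :: ToyScraper String
nameAttr = MkToyScraper (lookup "name")

-- ...which only shows up when the scraper is finally run.
haskellName :: Maybe String
haskellName = runToyScraper nameAttr [("name", "Haskell"), ("hasType", "yup")]

missingName :: Maybe String
missingName = runToyScraper nameAttr []
```

The signature `ToyScraper String` hides both the input type and the `Maybe`, just like `Scraper str a` does.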

## Functors for `Scraper`

So. Note that we lose all typeclass instances when we use `newtype` on stuff, unless we use `deriving` - but the author of Scalpel didn’t. Which means that we don’t get `Functor`, `Applicative` or `Monad` (or any other typeclass) for free, though `Scraper str` has the proper kind for them.

So what if we WANT all this stuff? Fortunately, the author of Scalpel wrote the instances for us.

But what would the functor for Scraper even mean? It would actually mean mapping over the result of the scraping without leaving the scraping context.

More specifically, it means specializing from:

`fmap :: (a -> b) -> f a -> f b`

to:

`(a -> b) -> Scraper str a -> Scraper str b`

So that you can use functions over “what is going to be scraped”, before even having to evaluate the whole scraping process. Remember our previous code:

```
firstTypedLang :: Scraper String String
firstTypedLang =
    attr "name" $ "languages" // "language" @: ["hasType" @= "yup"]
```

Let’s say we have a neat “upperCaseAllTheChars” function lying around, with this amazingly complex code:

```
import Data.Char (toUpper)

upperCaseAllTheChars :: String -> String
upperCaseAllTheChars = map toUpper
```

We would LOVE to use it directly on the result of firstTypedLang, so that it returns the name of the first typed language found, in uppercase, because why not. Our types don’t really match. We can compose, though, thanks to the functor implementation: good old `fmap` will lift our `upperCaseAllTheChars` into the world of Scrapers. The result will of course still be a Scraper, since we’re not out of our scraping process yet.

```
upperCased :: Scraper String String
upperCased = fmap upperCaseAllTheChars firstTypedLang
```

How is this fmap for Scraper implemented ?

```
instance Functor (Scraper str) where
    fmap f (MkScraper a) = MkScraper $ fmap (fmap f) a
```

Now, if you’re having - like I did - a hard time understanding how it works exactly, you need to think about types (pro-tip: any interesting stuff in Haskell requires you to think about types anyway).

`f` is the function we want to apply. And `a` is a function too, since it’s the content boxed inside `MkScraper`, and as we saw, a `Scraper` is nothing but a boxed function.

Let’s write a less concise version to understand what is happening. Let us specialize the fmap signature for what we want:

`fmapScrap :: (a -> b) -> Scraper str a -> Scraper str b`

And without the `newtype` we would actually get:

```
fmapScrap :: (a -> b)
          -> (TagSpec str -> Maybe a)
          -> (TagSpec str -> Maybe b)
```

Let us look at our goal: returning a `Scraper`, so a function boxed in a `newtype`. Let us get cute:

```
instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper computeStuff
      where
        computeStuff = undefined
```

OK, that’s cheating because we used undefined, but it compiles. We know that computeStuff takes an input, the `TagSpec str` bit. So we could at least write that parameter:

```
instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper computeStuff
      where
        computeStuff input = undefined
```

That way, we build our MkScraper, we feed it a function (without its input), and if we manage to write an implementation of this damned function, this implementation will be called as soon as someone actually calls `scrapeTagSpec`, the unboxer of our `newtype`:

```
newtype Scraper str a = MkScraper {
    scrapeTagSpec :: TagSpec str -> Maybe a
}
```

Let’s actually try to write computeStuff. We have our `f`, typed `(a -> b)`, and we have a `(MkScraper f')` whose `f'` is typed `(TagSpec str -> Maybe a)`. And we have the `TagSpec str`: it’s the `input` parameter of my `computeStuff`. If we apply `f'` to `input`, we’ll get a `Maybe a`. If we pattern match over this `Maybe a` and we get something, we can return a `Maybe b`!

```
instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper computeStuff
      where
        computeStuff input = case f' input of
            Just result -> Just (f result)
            Nothing     -> Nothing
```

It compiles (and it works as expected, it’s just really ugly). We can improve this by rewriting `computeStuff` using `Maybe`’s fmap:

`computeStuff input = fmap f $ f' input`

And even more by infixing `fmap` using `<$>`, to get this complete version:

```
instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper computeStuff
      where
        computeStuff input = f <$> f' input
```

Not as cool as the “real” implementation but perhaps a bit easier to understand. Though of course, now, being proper haskellers, we want to eta reduce this, let’s remove this “input” parameter, it’s silly.

`computeStuff = (fmap f) . f'`

OK, but now, we don’t really need `computeStuff`, do we?

```
instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper $ (fmap f) . f'
```

This is reaaaaaally close to Scalpel’s implementation (and I would actually stop here because frankly, it’s much easier to read).

Now functions are Functors too (and Applicative; and Monad; but really, let’s not go there). The canonical form of the function type is `(->) r`. And its fmap implementation is:

```
instance Functor ((->) r) where
    fmap = (.)
```

Which translates to “mapping a function over a function is composition”. So any `(.)` can be replaced by `fmap`, thus:

```
> let f = (+2)
> let g = (+3)
> fmap f g $ 3
8
```

This is how we get :

```
instance Functor (Scraper str) where
    fmap f (MkScraper f') = MkScraper $ fmap (fmap f) f'
```

… which is Scalpel’s implementation, though mine uses `f'` and not `a` because I like to remember that the content of MkScraper *is a function*.
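To check that the `fmap (fmap f)` trick really behaves as described, here is the same instance on a toy model that needs nothing but base. `ToySpec`, `ToyScraper`, `nameAttr` and `nameLength` are hypothetical names of mine; the toy input is a list of attribute/value pairs, not Scalpel’s `TagSpec`.

```haskell
type ToySpec = [(String, String)]

newtype ToyScraper a = MkToyScraper
    { runToyScraper :: ToySpec -> Maybe a }

-- Same shape as the instance above: the outer fmap is function
-- composition, the inner fmap maps over Maybe.
instance Functor ToyScraper where
    fmap f (MkToyScraper f') = MkToyScraper $ fmap (fmap f) f'

nameAttr :: ToyScraper String
nameAttr = MkToyScraper (lookup "name")

-- Mapping happens "inside" the scraper: nothing has been run yet.
nameLength :: ToyScraper Int
nameLength = fmap length nameAttr

-- runToyScraper nameLength [("name", "Haskell")]  evaluates to Just 7
-- runToyScraper nameLength []                      evaluates to Nothing
```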

## Applicative for `Scraper`

We have a functor; can we get an applicative? Yes, we can. But what would this Applicative mean?

Let’s start with the “required” functions of an applicative. Any `Applicative` has to implement `pure`, “put stuff in a structure”, and `(<*>)`, also called “tie fighter”, which lets you apply “a function in a structure” to “a value in a structure”. So, where `fmap` for functors does:

`fmap :: (a -> b) -> f a -> f b`

`(<*>)` does:

`(<*>) :: Applicative f => f (a -> b) -> f a -> f b`

I’m gonna be honest: I have a really hard time grasping Applicatives, and I can’t really “explain” them. So it’s pretty difficult for me to tell what the `Applicative` instance of `Scraper` is “for”. OK, here goes a simple way of thinking about `Applicative`: it is like `Functor` over many arguments. So if I have a function `func`:

`func :: a -> b -> c -> d -> e`

and I have an `f a`, an `f b`, an `f c` and an `f d`, an applicative lets me produce an `f e`.
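Here is that intuition on the simplest structure I know, `Maybe`. The concrete `func` below is a made-up four-argument function standing in for the abstract one above.

```haskell
-- Hypothetical four-argument function, standing in for func above.
func :: Int -> Int -> Int -> Int -> Int
func a b c d = a + b * c - d

-- Every argument is available: the result is wrapped in Just.
allPresent :: Maybe Int
allPresent = func <$> Just 1 <*> Just 2 <*> Just 3 <*> Just 4

-- One missing argument makes the whole result Nothing.
oneMissing :: Maybe Int
oneMissing = func <$> Just 1 <*> Nothing <*> Just 3 <*> Just 4
```

`allPresent` is `Just 3` (1 + 2 * 3 - 4) and `oneMissing` is `Nothing`.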

Notably, `Applicative` is frequently used with data constructors. So a simple example of `Applicative` for `Scraper` would be, if you have a simple datatype to store an HTML link and its destination:

```
type Url = String
type LinkName = String
data LinkAndInfo = LinkAndInfo Url LinkName
```

… to scrape some part of the DOM tree to build your `LinkAndInfo`:

```
elemOne :: Scraper String Url
elemOne = attr "href" "a"
elemTwo :: Scraper String LinkName
elemTwo = text "a"
parseLink :: Scraper String LinkAndInfo
parseLink = LinkAndInfo <$> elemOne <*> elemTwo
-- Or, if you prefer the lifted version (needs: import Control.Applicative (liftA2)):
-- parseLink = liftA2 LinkAndInfo elemOne elemTwo
```

The same input document is going to be fed to two different extraction operations. Note that we can do the same thing with the Scraper Monad that we’ll see later; this is just a more idiomatic way of doing things.

We can now look at Scalpel’s implementation of `Applicative` for `Scraper`.

```
instance Applicative (Scraper str) where
    pure = MkScraper . const . Just
    (MkScraper f) <*> (MkScraper a) = MkScraper applied
        where applied tags | (Just aVal) <- a tags = ($ aVal) <$> f tags
                           | otherwise             = Nothing
```

Any `Applicative` instance should define `pure`, which stores a value in a default `Applicative` structure. When dealing with the Scraper `newtype`, it means we should build a function that ignores its input and returns this value; this would typically be a `const`. However, we need to return `Maybe` a value, so this `const` needs to be over a `Just`. It’s the same issue we faced with the functor instance: we need to handle both the fact that we’re dealing with a function, and the fact that this function returns `Maybe`. And really, it’s the composition of both `pure` implementations for these instances:

```
instance Applicative ((->) a) where
    pure = const
```

```
instance Applicative Maybe where
    pure = Just
```

Now for the tie fighter (`<*>`) itself. Oh god, I hate Applicative implementations.

*Keep in mind that what’s on the left of (<*>) is very different from what’s on the right*.

I *always* forget that the left part is a function, because of the way we chain many consecutive (`<*>`) over values. Of course, here, it’s worse, because we’re already wrapping functions.

Remember the type of `(<*>)`:

`(<*>) :: Applicative f => f (a -> b) -> f a -> f b`

And we know that the content of a `MkScraper` is really:

`TagSpec str -> Maybe a`

But we’re going to get confused over all these letters, so I’ll rename the type variables like this:

`TagSpec input -> Maybe output`

So, if we “unbox” everything, our specialized (`<*>`) looks like:

```
-- pseudo-signature: each argument is the function boxed inside a MkScraper
(<*>) :: (TagSpec input -> Maybe (outputA -> outputB))
      -> (TagSpec input -> Maybe outputA)
      -> (TagSpec input -> Maybe outputB)
```

So, on the left of `<*>`, you have a `newtype` that contains a function that will output a function. Once again, *keep this in mind* and let’s look at the `(<*>)` definition again:

```
instance Applicative (Scraper str) where
    (MkScraper f) <*> (MkScraper a) = MkScraper applied
        where applied tags | (Just aVal) <- a tags = ($ aVal) <$> f tags
                           | otherwise             = Nothing
```

We want to return a function, so we’ll return the helper `applied`, which takes a `tags` argument. When `applied` gets a `tags`, it shall be applied. Note that we give it *unapplied* to `MkScraper` in the first line. This is exactly how we proceeded when we started building our own `Functor` instance for `Scraper`.

Look at the specialized signature again: on the right, we have a way to produce a `Maybe outputA`; on the left, a way to maybe produce a function that will transform an outputA into an outputB. So let us start with getting the outputA: that’s applying `a` (the boxed function on the right) to our input `tags`.

We know that if any element returns Nothing, we can stop our computation. That’s the easy `otherwise` case. The other case is a bit trickier to read. The unboxing was easy, but what on earth is this `($ aVal) <$>` stuff?

Well. We know that `($)` is simply function application, so that:

`($) :: (a -> b) -> a -> b`

The same way we can carry partially applied functions, e.g.:

`(++ "world") :: [Char] -> [Char]`

… we can carry “things to apply once we have a function”, e.g.:

`($ "hello ") :: ([Char] -> b) -> b`

The type of our example `($ "hello ")` can be roughly translated as “give me a function that takes a `[Char]` as its input, and I shall give you the result of this function applied to your "hello "”. So we can take any function that fits this description and get its result, e.g.:

`($ "hello ") (++ "world")`

This is rather ugly and silly, but it’ll return "hello world". So let’s keep in mind that `($ anyvalue)` transforms `anyvalue` into a higher-order function, expecting a function that will use `anyvalue` as its input.

Now, since there is a `<$>` right after it, this higher-order function is going to be lifted through fmap (`<$>` is the infix `fmap`). So we had a:

`(a -> b) -> b`

But once lifted, it’s really a:

`f (a -> b) -> f b`

Compare this with the type of `<*>` and notice how close we are.
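If the `($ value)` section still feels slippery, this small base-only sketch shows it on its own and then lifted over `Maybe`. The names `greeting`, `lifted` and `nothingToApply` are mine, chosen for the example.

```haskell
-- ($ "hello ") waits for a function and feeds it "hello ".
greeting :: String
greeting = ($ "hello ") (++ "world")          -- "hello world"

-- Lifted with <$>, the same section reaches a function inside a Maybe.
lifted :: Maybe String
lifted = ($ "hello ") <$> Just (++ "world")   -- Just "hello world"

-- With no function to apply, the result stays Nothing.
nothingToApply :: Maybe String
nothingToApply = ($ "hello ") <$> (Nothing :: Maybe (String -> String))
```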

Now look at the right of the `<$>`, where we apply `f`, the function boxed in `MkScraper f`. Remember that this one has a tricky type:

`MkScraper (TagSpec input -> Maybe (outputA -> outputB))`

Since we applied it to `tags`, we get the result:

`Maybe (a -> b)`

So. On the left we have `($ aVal) <$>`, which translates to `f (a -> b) -> f b`. And on the right we actually have an `f (a -> b)`. Guess what! We can feed our right side to our left side and get an `f b`, actually a `Maybe b`, which is what we need to return since the function inside any `MkScraper` must return a Maybe value.

This can be rewritten in this way, easier to read, though less concise:

```
instance Applicative (Scraper str) where
    pure = MkScraper . const . pure
    (MkScraper f) <*> (MkScraper a) = MkScraper applied
        where applied tags | (Just aVal) <- a tags = case f tags of
                                   Just boxedF -> Just $ boxedF aVal
                                   Nothing     -> Nothing
                           | otherwise = Nothing
```
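For a version you can run without Scalpel, here is a toy model again. It is a sketch under my own assumptions: `ToySpec` is a list of attribute/value pairs, and `ToyScraper`, `attrToy`, `Link` and `link` are hypothetical names. I also let `Maybe`’s own `<*>` do the Nothing handling, which is a different spelling from Scalpel’s guards but gives the same results.

```haskell
type ToySpec = [(String, String)]

newtype ToyScraper a = MkToyScraper
    { runToyScraper :: ToySpec -> Maybe a }

instance Functor ToyScraper where
    fmap f (MkToyScraper g) = MkToyScraper (fmap f . g)

instance Applicative ToyScraper where
    pure = MkToyScraper . const . Just
    MkToyScraper f <*> MkToyScraper a = MkToyScraper applied
      where
        -- Both sides read the same input; Maybe's own <*> handles Nothing.
        applied tags = f tags <*> a tags

data Link = Link String String deriving (Eq, Show)

-- Hypothetical primitive: look an attribute up in the toy input.
attrToy :: String -> ToyScraper String
attrToy key = MkToyScraper (lookup key)

link :: ToyScraper Link
link = Link <$> attrToy "href" <*> attrToy "title"
```

Running `link` on `[("href", "/x"), ("title", "X")]` gives `Just (Link "/x" "X")`; dropping either attribute gives `Nothing`.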

And since we are done with the `Applicative`, it’s time to handle the Scary M Word.

## The `Scraper` Monad

What would a `Scraper` Monad *do* exactly? Well, as most monads, it would let us compose scrapers pretty much any way we might want to. Picture the following dumb XML:

```
<features>
<feature id="f1" name="Show cool stuff" dependOn="f2"/>
<feature id="f2" name="Compute cool stuff"/>
<feature id="f3" name="Make coffee"/>
</features>
```

You’ll notice a feature can depend on another feature, identified through an `id` attribute (yes, it’s a stupid way to store this kind of information, but let’s pretend it’s not).

Let us write a simple function that:

- takes an `id`;
- gets the feature that matches this `id`;
- returns the name of the feature it depends on, if there is one.

Monads would be of great help here:

```
featureAndRequired :: String -> Scraper String String
featureAndRequired initialFeatureId = do
    feat <- attr "dependOn" $ "features" // "feature" @: ["id" @= initialFeatureId]
    -- The last line is already a Scraper, so no `return` is needed here.
    attr "name" $ "features" // "feature" @: ["id" @= feat]
```

So, you scrape an id stored in dependOn; you unbox it as a simple and basic `String`, labelled `feat`, so that you can use it to define the next selector; finally, you return the result of the scraping through this new selector. If there is no feature matching any of these ids, or if your feature does not depend on another feature, we’ll get `Nothing`.

Of course, the `do` notation is nothing but sugar around a `(>>=)` version, so if we can implement `(>>=)`, we’re good to go. My example is really contrived, but you get the idea.

Good news : the monad implementation is rather simple.

```
instance Monad (Scraper str) where
    fail = Fail.fail
    return = pure
    (MkScraper a) >>= f = MkScraper combined
        where combined tags | (Just aVal) <- a tags = let (MkScraper b) = f aVal
                                                      in  b tags
                            | otherwise             = Nothing
```

We’ll ignore `fail` (boring) and `return` (it’s `pure`), to focus on the `bind`. We see the same pattern we saw for applicative and while trying to create our own version of `Functor`: first, we build a `MkScraper` with a not-yet-applied function, which we define in the `where` block.

Remember we are in a `bind`, so we need to implement the specialized version of:

`m a -> (a -> m b) -> m b`

… namely:

`Scraper str a -> (a -> Scraper str b) -> Scraper str b`

… and since a `Scraper` contains a function, the REAL type is:

```
(TagSpec str -> Maybe a)
-> (a -> (TagSpec str -> Maybe b))
-> (TagSpec str -> Maybe b)
```

Since we return `MkScraper combined`, it means that `combined` must ultimately be a function that takes a `TagSpec str` (that shall be the `tags` parameter, like in the `Applicative` implementation) and returns a `Maybe b`. Once again, we “simply” need to implement this `combined` function.

We evaluate the scraper on the left, and we stop computing if it returns Nothing. This is, by the way, very close to the Maybe monad implementation:

```
instance Monad Maybe where
    (Just x) >>= k = k x
    Nothing  >>= _ = Nothing
```

… but with an added twist, since we have an additional computation to handle.

So, we had an `m a` as an input for `(>>=)`, and we unboxed it. Now, we have an `a -> m b` on the right side, namely the function we bind over, `f`.

Let’s focus on where the composition happens:

```
combined tags | (Just aVal) <- a tags = let (MkScraper b) = f aVal
                                        in  b tags
```

What is `f`? A function that takes an `a`, and returns a `Scraper`, that is, a boxed function of type `(TagSpec str -> Maybe b)`. Do we have an `a`? Yup, it’s `aVal`, since we computed and unboxed it. So if we apply `f` to `aVal`, we get a `Scraper`, which we pattern match on to retrieve the function `b` of type `(TagSpec str -> Maybe b)` stored inside its `MkScraper`.

But remember, `combined` is the function that will ultimately be applied, so we don’t want to return a `Scraper`; we want to return what the function inside a `Scraper` returns, namely `Maybe something`.

We just need to apply this function, and we still have its input available in our scope: it’s `tags`. Once applied, it will return a `Maybe b`. Which was our goal all along.
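To see the bind at work end to end, here is a toy, base-only model of the feature example. Everything here is assumed for illustration: the “document” is a plain association list from feature id to name and optional dependency, and `ToyScraper`, `dependencyOf`, `nameOf` and `requiredFeatureName` are hypothetical names, not Scalpel functions.

```haskell
-- Toy document: feature id -> (name, optional dependency id).
type Features = [(String, (String, Maybe String))]

newtype ToyScraper a = MkToyScraper
    { runToyScraper :: Features -> Maybe a }

instance Functor ToyScraper where
    fmap f (MkToyScraper g) = MkToyScraper (fmap f . g)

instance Applicative ToyScraper where
    pure = MkToyScraper . const . Just
    MkToyScraper f <*> MkToyScraper a =
        MkToyScraper (\doc -> f doc <*> a doc)

instance Monad ToyScraper where
    MkToyScraper a >>= f = MkToyScraper combined
      where
        -- Run the left scraper, then run the scraper built from its result
        -- against the very same document.
        combined doc = case a doc of
            Just aVal -> runToyScraper (f aVal) doc
            Nothing   -> Nothing

dependencyOf :: String -> ToyScraper String
dependencyOf fid = MkToyScraper (\doc -> lookup fid doc >>= snd)

nameOf :: String -> ToyScraper String
nameOf fid = MkToyScraper (\doc -> fst <$> lookup fid doc)

requiredFeatureName :: String -> ToyScraper String
requiredFeatureName fid = do
    dep <- dependencyOf fid
    nameOf dep

features :: Features
features =
    [ ("f1", ("Show cool stuff", Just "f2"))
    , ("f2", ("Compute cool stuff", Nothing))
    , ("f3", ("Make coffee", Nothing))
    ]
```

`runToyScraper (requiredFeatureName "f1") features` gives `Just "Compute cool stuff"`, while asking for `"f2"` (no dependency) or an unknown id gives `Nothing`.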

## Yes, this is really Function and Maybe

You’ll notice that all the implementations of Functor, Applicative and Monad for `Scraper` are mostly combinations of their Maybe and `(->) r` equivalents.

It was particularly obvious for `pure` and for `fmap`.

And of course they are: a `Scraper` is nothing but a function that returns Maybe (this could probably be implemented through some sort of transformer stack).
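Short of a real transformer stack, one cheap way to convince yourself is `Data.Functor.Compose` from base, which composes two functors and hands you `Functor` and `Applicative` for free (but no `Monad`). This is only a sketch on a toy input type; `ToySpec`, `ComposedScraper`, `attrToy`, `runComposed` and `pair` are names I made up, and nothing here is how Scalpel is actually written.

```haskell
import Data.Functor.Compose (Compose (..))

type ToySpec = [(String, String)]

-- A function returning Maybe, expressed as a composition of two functors.
type ComposedScraper = Compose ((->) ToySpec) Maybe

attrToy :: String -> ComposedScraper String
attrToy key = Compose (lookup key)

runComposed :: ComposedScraper a -> ToySpec -> Maybe a
runComposed = getCompose

-- No instance was written by hand: fmap and <*> come from Compose.
pair :: ComposedScraper (String, Int)
pair = (,) <$> attrToy "name" <*> (length <$> attrToy "hasType")
```

`runComposed pair [("name", "Haskell"), ("hasType", "yup")]` gives `Just ("Haskell", 3)`, and a missing attribute gives `Nothing`, exactly like the hand-written instances.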

## WHY ?! Why do we need all this ?

So. A good lib should be about delegating responsibilities. When you use a library, you typically want mostly “results”, ideally with as little boilerplate and complexity as possible. You want an obscure trigonometric computation done for you. You want a complex stack of side effects done for you. Or, in the case of Scalpel, you mostly want “stuff” out of “some xml or html”.

In other words: you want to deal with high-level stuff. The first lesson we can learn from the way `Scraper` is implemented is that it’s easy to type. Of course, you could replace all instances of `Scraper String String`, for instance, by their real signature `(TagSpec String -> Maybe String)`, but this would bloat your signatures. The `newtype` promotes concision and is ultimately *easier to write*.

You also want to be able to compose, in various ways, without caring about how the library gets things done. The `Functor`, `Applicative` and `Monad` implementations give you idiomatic, standardized Haskell ways of doing so. If you can reduce your program to a chain of `a -> b`, `b -> c` and `c -> d` and you have a `Scraper whatever a`, you CAN produce a `Scraper whatever d` without having to know everything about the library and memorize tons of functions. So it’s *easier to use*.

Finally, you want to be protected, as much as possible, from implementation changes and the API breaks they cause. Notice that the underlying `TagSpec` type could change and be replaced by some other way of storing the tree, in the internals of Scalpel, without any need for you to change your `Scraper` functions.

We’ve seen how computations get handled, combined and composed. We didn’t take the time to look at how a computation gets started, though. Most *internal* scraping will be done using the `scrape` function implemented in the Scrape.hs module:

```
scrape :: (Ord str, TagSoup.StringLike str)
       => Scraper str a -> [TagSoup.Tag str] -> Maybe a
scrape s = scrapeTagSpec s . tagsToSpec . TagSoup.canonicalizeTags
```

Which will call the `newtype` getter `scrapeTagSpec` (basically applying the `TagSpec str -> Maybe a` function) over a preprocessed `TagSoup` tag list (this is where the wrapping over the `TagSoup` lib happens, by the way). But in your code, you won’t use this.

You’re more likely to use the `scrapeURL` entry point, which basically plugs a curl call on top of all this. So you only have to provide your `Scraper` and a URL.

Or, for parsing a local XML rather than a distant page, you would use the `scrapeStringLike` function defined somewhere in core:

```
-- | The 'scrapeStringLike' function parses a 'StringLike' value into a list of
-- tags and executes a 'Scraper' on it.
scrapeStringLike :: (Ord str, TagSoup.StringLike str)
                 => str -> Scraper str a -> Maybe a
scrapeStringLike html scraper
    = scrape scraper (TagSoup.parseTagsOptions TagSoup.parseOptionsFast html)
```

Notice the entry types for these functions: they are all rather abstract and generic. It is unlikely that the `StringLike` constraint will be dropped one day: we’ll always parse XML or HTML out of things that look like Strings.

The inner implementation could change drastically without you needing to change anything. Heck, it could use something other than TagSoup to model the tree. Basically, you have a powerful Façade pattern, but one where you keep a lot of power over composition. So a *neat encapsulation of the underlying implementation*.

TL;DR, in three points, why this `Functor` / `Applicative` / `Monad` “API” for a lib is good:

- things get easier to write;
- it lets you use stuff without caring about the implementation;
- it provides better encapsulation.

## A conclusion of some sort

In the end, for me, it’s all about this `Scraper a b` signature. I find it gives a great intuition of what abstraction is all about. Abstraction is not about building complex type hierarchies. It’s not even about producing handy typeclasses that you get confused about. It’s about “getting rid of noise”, or if you prefer “abstracting away stuff”. Sure, `Scraper a b` is really a function from `a` to `b`, with some TagSpec and Maybe thrown around, but you don’t need to know that when you’re building a `Scraper`. Abstract this stuff away. Focus on the task at hand.

Just build your small functions knowing that at the end, you’ll get that juicy `Maybe a` that you were promised by the signatures of the entry level functions `scrapeStringLike` and `scrapeURL`.

In other words: you can ignore implementation and concentrate on what *you* need to do. Which is what high-level programming is all about. Well, that, and using cool mathematical terms that I barely understand.

(Disclaimer : your mileage may vary)