Optparse-Applicative's parser structure

Posted on May 29, 2017

I confessed in my first post about Scalpel that I have a hard time getting a good intuition about Applicative. To improve my understanding of the concept, I decided to investigate a great library, optparse-applicative, which I will nickname “the OA” in the rest of this post, for lulz and keyboard-related laziness reasons.

The OA can be compared, for instance, to Python’s argparse or Rust’s clap: a command-line option parser. There are several libraries in Haskell to perform this common task, conveniently listed here.

As the name suggests, the OA uses, emphatically, Applicative; the README explains this choice: building your API around Applicative is a common practice when it comes to parsing, established by libs like parsec, attoparsec and aeson.

The same README gives this simple usage example. This will be my main reference in this post, and I’ll try to detail every brick these few lines are built upon.

import Options.Applicative
import Data.Semigroup ((<>))

data Sample = Sample
  { hello      :: String
  , quiet      :: Bool
  , enthusiasm :: Int }

sample :: Parser Sample
sample = Sample
      <$> strOption
          ( long "hello"
         <> metavar "TARGET"
         <> help "Target for the greeting" )
      <*> switch
          ( long "quiet"
         <> short 'q'
         <> help "Whether to be quiet" )
      <*> option auto
          ( long "enthusiasm"
         <> help "How enthusiastically to greet"
         <> showDefault
         <> value 1
         <> metavar "INT" )

Without even having to read any library code, if you know a bit of Haskell, you can already see that the OA provides a parametric Parser abstraction, shown in the signature of the sample method; it also comes with a set of utility methods - here, strOption, switch and option, that we can compose inside a bigger applicative builder. These methods make heavy use of the shorter version of mappend, (<>).

Having used various CLI-parsing libraries in several languages, I have to say this is the most intuitive lib I’ve found. For most simple use cases, you will :

Once again, we can see that sitting on top of Haskell’s common abstractions allows for powerful, simple and easy to use interfaces. With this fairly basic API, the OA will give us an automatic help method, commodities to build autocompletion, and ways to plug custom error messages.

How we will read the OA’s code

Unfortunately, the code required to produce such a clear and easy to read API is often not, itself, very clear and easy to read. It is fairly abstract, handles many potential cases, and sometimes relies on language extensions that beginners like me tend to avoid. And, most of all, and we will often see this when trying to read libraries, we encounter a common problem. I don’t know what’s the canonical word for this particular problem, so I’ll call it nested abstractions.

Taken individually, most abstractions are not scary. Monoids are types whose values can be combined to produce a new value of the same type, and that provide an identity value; that’s it. Functors are structures over a value that can be modified without modifying the structure itself. Even monads, to a point, are not that difficult to grasp conceptually. However, when we start mixing various abstractions, it gets tricky quickly.

This problem is not specific to Haskell, though. Every potent piece of software will make heavy use of the abstractions provided by the language, or of a set of custom abstractions, to the point where it gets harder to get a clear picture. And yet, I find Haskell helps here, precisely because most Haskell code will nest its abstractions. The same way that monad transformers will tend to be nested; you have “something over something over something”; admittedly, this is scary at first, but at the same time, it gives you an order in which to read. I would even go further and say that Haskell lets us identify easily the two entrypoints for abstraction stacks.

That said, I find there is no clear rule of thumb to decide what’s best between the “top-down” or “bottom-up” approach.

Sometimes, the definition for the broader abstraction is enough.

Sometimes, they are a bit too abstract and it’s better to start from the bottom.

It might be because I’m not yet used to the whole set of common abstractions that Haskell offers, though. In this particular instance, I’ll try to “dive” as deep in the implementation as I can, till I can gradually come back to the highest level.

A final note; as during our exploration of Scalpel, you typically tackle Haskell libraries through their types first, and how these are built and composed. Only then can you have a fine grasp of the functions that will “do stuff out of these types”. Today, we will only study OA’s types, not the parsing process itself.

So Many Options

A first look at the cabal file shows another great thing about the OA : it doesn’t need much. Look at the dependencies : transformers, a library to access the OS’ processes, a pretty printer, and that’s all. In the exposed modules, we notice in the listing an Options.Applicative.Types that sounds promising, and we immediately open a 390-line long file.

This file contains the definition for the Parser type; I’m sure an experienced haskeller can read it with ease, but we’re not there yet, and when we look at it, we immediately regret having opened this file, we feel our heart failing us, and we’d like to see simpler things and fewer language extensions, thank you very much. This is perfectly normal.

I won’t copy the definition of Parser right now, because it’s scary. The type contains several recursive constructors; there is a simple one, thankfully, called OptP, and this one mostly contains a type named Option. Since a lot of Parsers will actually be OptP, we will enter the rest of the types defined by the OA through this door.

The Option type

Option is also defined in Types.hs:

-- | A single option of a parser.
data Option a = Option
  { optMain :: OptReader a               -- ^ reader for this option
  , optProps :: OptProperties            -- ^ properties of this option
  }

An option is but a reader and some properties. Two other types, then ! OptReader and OptProperties. The second one is simpler, so let’s start here.

The OptProperties type

In the same module :

-- | Specification for an individual parser option.
data OptProperties = OptProperties
  { propVisibility :: OptVisibility       -- ^ whether this flag is shown is the brief description
  , propHelp :: Chunk Doc                 -- ^ help text for this option
  , propMetaVar :: String                 -- ^ metavariable for this option
  , propShowDefault :: Maybe String       -- ^ what to show in the help text as the default
  , propDescMod :: Maybe ( Doc -> Doc )   -- ^ a function to run over the brief description
  }

This is rather self-explanatory and boring. We’ll notice the odd Chunk part; Chunk is a “free monoid” defined in the Help directory, that handles stuff that happens when the user types “--help”. Actually, this whole type is mostly here to know how to display help messages. We won’t comment on it any further.

The OptReader type

Once again, in the Types.hs file:

-- | An 'OptReader' defines whether an option matches an command line argument.
data OptReader a
  = OptReader [OptName] (CReader a) (String -> ParseError)
  -- ^ option reader
  | FlagReader [OptName] !a
  -- ^ flag reader
  | ArgReader (CReader a)
  -- ^ argument reader
  | CmdReader (Maybe String) [String] (String -> Maybe (ParserInfo a))
  -- ^ command reader

The various constructors help us grasp the various kinds of options the OA will let you use.

Reading an option or a flag demands a list of [OptName], defined as:

data OptName = OptShort !Char
             | OptLong !String
  deriving (Eq, Ord, Show)

Once again, a classical pattern; you can type: ls --human-readable. Though I’m yet to see someone actually type that, people tend to use the shorter version: ls -h. Which is why we have a list of OptNames. For the ArgReader, this is not necessary; positional arguments are parsed in their order of appearance, so they don’t need names.

Finally, we note that an OptReader can read potentially any a; this is why we typically need a CReader that will, probably, be able to “translate” the string from the command line into a. This is the next type we need to read.

The CReader type

Again, more nesting ! A CReader (short for Command line Reader, I suppose) is:

data CReader a = CReader
  { crCompleter :: Completer
  , crReader :: ReadM a }

We will ignore Completer, used for autocompletion; and focus on the ReadM. As proper haskellers, we are of course excited by this final M, for we are attracted by the delicious smell of Monads. And this time, we are going to touch a type that stands at the core of the library.

The ReadM type

Types.hs contains the whole definition for the ReadM type:

-- | A newtype over 'ReaderT String Except', used by option readers.
newtype ReadM a = ReadM
  { unReadM :: ReaderT String (Except ParseError) a }

instance Functor ReadM where
  fmap f (ReadM r) = ReadM (fmap f r)

instance Applicative ReadM where
  pure = ReadM . pure
  ReadM x <*> ReadM y = ReadM $ x <*> y

instance Alternative ReadM where
  empty = mzero
  (<|>) = mplus

instance Monad ReadM where
  return = pure
  ReadM r >>= f = ReadM $ r >>= unReadM . f
  fail = readerError

instance MonadPlus ReadM where
  mzero = ReadM mzero
  mplus (ReadM x) (ReadM y) = ReadM $ mplus x y

-- | Return the value being read.
readerAsk :: ReadM String
readerAsk = ReadM ask

What we have here is a newtype that is nothing but a ReaderT over a String, stacked on top of the Except ParseError monad.

Though I suspect the Functor, Applicative and Monad instances could have been automatically derived using the GeneralizedNewtypeDeriving extension, the author wrote the implementation manually. If you are new to Haskell, Reader is a way to combine functions that should all have read-only access to a value - here, a simple String. ReaderT is a variation provided, in this case, by the transformers library (the one we spotted in the cabal file), that lets us use Reader plus another monad; in this instance, the Except monad. This monad lets us combine functions that could throw errors at us rather than providing a nice value. Parsing from the command line involves reading something that a user typed; users have been proven to be unreliable; it makes sense we want to be able to throw exceptions.

The function readerAsk is simply ask from ReaderT (“getting the value we have read-access over”) rewritten to stay in our newtype. Though we can’t really be sure of it yet, it’s pretty obvious that “the value we have read access over” will be “what the user typed”. Note that, thanks to the genericity of this a, though we will always get a String as our shared value, we can transform it to any other type. So we can theoretically build a ReadM Int, or a ReadM for any type we created ourselves, actually.
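To make this more concrete, here is a minimal sketch of the same idea using only base. It is not the OA's code: ToyReadM, toyAsk, toyError and toyInt are hypothetical names, and a plain function String -> Either String a stands in for ReaderT String (Except ParseError) a, with a String playing the part of ParseError.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for the OA's ReadM: a function from the raw
-- command-line string to either an error message or a value.
newtype ToyReadM a = ToyReadM { runToyReadM :: String -> Either String a }

instance Functor ToyReadM where
  fmap f (ToyReadM r) = ToyReadM (fmap f . r)

instance Applicative ToyReadM where
  pure x = ToyReadM (const (Right x))
  ToyReadM f <*> ToyReadM x = ToyReadM (\s -> f s <*> x s)

instance Monad ToyReadM where
  ToyReadM r >>= f = ToyReadM (\s -> r s >>= \a -> runToyReadM (f a) s)

-- Counterpart of readerAsk: return the string being read.
toyAsk :: ToyReadM String
toyAsk = ToyReadM Right

-- Abort the reader with an error message.
toyError :: String -> ToyReadM a
toyError msg = ToyReadM (const (Left msg))

-- A reader for Int: ask for the raw string, then try to convert it.
toyInt :: ToyReadM Int
toyInt = do
  s <- toyAsk
  case readMaybe s of
    Just n  -> pure n
    Nothing -> toyError ("not an integer: " ++ s)
```

With this sketch, runToyReadM toyInt "42" gives Right 42, while a non-numeric input gives a Left carrying the error message.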

Too. Many. Types.

We now have a clearer picture of what Options are. We could probably build some by hand, without using the nice interface that optparse-applicative provides. It would be rather verbose, though, since there are a lot of types involved.

Just to build a simple, dumb, ArgReader for a String, we would need to type something like:

ArgReader (CReader (mkCompleter (return . const [])) readerAsk)

And that’s not even a full Option, we would need to provide the OptProperties.

The issue with type safety is that it can become verbose rather quickly. Fortunately, Haskell provides a wealth of strategies and solutions to circumvent this verbosity while keeping our safe and sound type system. Many common abstractions are really syntactic sugar. Here, Monoid is going to help us tremendously, as we’re going to see.

Constructors “à la Mod”

Before going on, I’ll add a few words on a common pattern in Haskell source-code architecture and a few rules of thumb:

  1. The type model behind a library is typically exposed in a Types.hs module, as we’ve just seen.

  2. When these types get cumbersome to write directly, it is fairly common to see modules with names like Builder.hs; this will typically be shortcuts offered to the user.

  3. These API modules are often separated from the innards and utilities that come with them; a frequent idiom is to put these in a file in the same directory, named Internal.hs (sometimes, like in the OA, Internal.hs will be in a directory with the name of the main module, the main module being in the top-directory). You typically won’t really need to read the Internal.hs, unless you’re trying, like we do right here, to understand the exact implementation.

With that in mind, we want to see how the OA manages to get to its pleasant, Applicative plus Monoid (well, Semigroup really) notation rather than the direct use of constructors from its datatypes. Part of the answer will be in the shortcut functions defined in Builder.hs, but to understand these, we will need to read the Builder/Internal.hs file first, for there lives a powerful, hidden, type.

The Mod datatype

We need to investigate the Builder.Internal module, and go directly to the definitions given for a new datatype : Mod. Let’s get a quick overview:

data Mod f a = Mod (f a -> f a)
                   (DefaultProp a)
                   (OptProperties -> OptProperties)

optionMod :: (OptProperties -> OptProperties) -> Mod f a
optionMod = Mod id mempty

fieldMod :: (f a -> f a) -> Mod f a
fieldMod f = Mod f mempty id

instance Monoid (Mod f a) where
  mempty = Mod id mempty id
  mappend = (<>)

instance Semigroup (Mod f a) where
  Mod f1 d1 g1 <> Mod f2 d2 g2
    = Mod (f2 . f1) (d2 <> d1) (g2 . g1)

Mod is parametric over f and a. To build a Mod, we need a rather simple function (the (f a -> f a)), something called a DefaultProp and another simple function that takes an OptProperties and returns another OptProperties (we will do like the cool kids and call these “simple functions” endomorphisms).

The two functions optionMod and fieldMod are smart constructors that let us build a Mod by providing only one of the two functions: optionMod fills the first parameter with id, fieldMod fills the third one with id, and both use an empty DefaultProp.

Mod is a Monoid, and, as such, a Semigroup (a Monoid is a Semigroup with mempty, or, in more mathematical terms, an identity element; something that won’t change anything when mappended, like “adding zero” or “multiplying by 1”).

The mempty definition for Mod tells us a bit more about what DefaultProp are: they are monoids themselves, since we define them in terms of mempty when defining the identity element of Mod. And empty endomorphisms are simply… id (see by yourself the canonical Endo monoid definition).

It makes sense : an endomorphism transforms a value of type a into another value of type a; the identity element is the element that doesn’t operate any change; so an “empty endomorphism” simply returns its parameter without changing it.

The Semigroup instance for Mod tells us how we can combine two Mods. Once again, the definition is easy to read: combining endomorphisms will be, in this instance, function composition (so it’s actually the “monoid of endomorphisms under composition”). As for the DefaultProp, since we know they’re monoids themselves, we can use their own definition of mappend.

So, what we have here is a bit of a pumped up Endo. If you don’t know Endo, here’s a dumb example of how you could use it:

> import Data.Monoid
> let endos = Endo (+1) <> Endo (+2) <> Endo (+3)
> :t endos
endos :: Num a => Endo a
> endos `appEndo` 0
6

How Mod is used

Let us go back to the first example of the way to define a parser with the OA. At this point, we should have the intuition that this bit:

        ( long "hello"
          <> metavar "TARGET"
          <> help "Target for the greeting" )

heavily uses, under the hood, the Mod datatype - hence the (<>) between long, metavar and help. But since we want to be sure, the best thing now would be to check the definition for these functions, starting with long:

-- | Specify a long name for an option.
long :: HasName f => String -> Mod f a
long = fieldMod . name . OptLong

We’ve seen this fieldMod. It’s a smart constructor for Mod.

fieldMod :: (f a -> f a) -> Mod f a
fieldMod f = Mod f mempty id

And we know that OptLong is one of the constructors of OptName. HasName is a class defined in the internal of Builder. We know that we’re not always going to build stuff with names; positional arguments, for instance, don’t have names, whereas flags and optional arguments do. This typeclass, however, is only for options that have names.

class HasName f where
  name :: OptName -> f a -> f a

So ! long will build an OptLong from its eta-reduced String parameter; it will then give this as the first parameter of name, giving us a function of type (f a -> f a). Exactly what fieldMod requires. We get a Mod with our mysterious, partially-applied name function; an empty DefaultProp; and finally, the “id” function. This Mod will be mappended to the result of metavar.
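Here is a small sketch of what an instance of such a class could look like. The ToyOptionFields type is a hypothetical, stripped-down record of my own (it only keeps the names), and the instance body is my assumption of the obvious implementation: prepend the new name to the list.

```haskell
data OptName = OptShort Char | OptLong String
  deriving (Eq, Show)

-- Hypothetical, reduced version of a "fields" record: it only keeps names.
-- The type parameter is a phantom, as required by the shape of HasName.
newtype ToyOptionFields a = ToyOptionFields { toyOptNames :: [OptName] }
  deriving (Eq, Show)

class HasName f where
  name :: OptName -> f a -> f a

-- Assumed implementation: adding a name means consing it onto the list.
instance HasName ToyOptionFields where
  name n fields = fields { toyOptNames = n : toyOptNames fields }

-- The endomorphism that a long-style builder would hand over to fieldMod.
toyLongField :: HasName f => String -> f a -> f a
toyLongField = name . OptLong
```

A type modelling positional arguments would simply not get a HasName instance, so trying to give a name to an argument would be rejected at compile time.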

metavar :: HasMetavar f => String -> Mod f a
metavar var = optionMod $ \p -> p { propMetaVar = var }

We know that optionMod is the other smart constructor we’ve seen before. It uses id as its first parameter when constructing Mod; yet another mempty for the DefaultProp; and the (OptProperties -> OptProperties) function will be the one we have to provide. Here, it’s a basic lambda that updates the OptProperties record, setting propMetaVar. Note that, once again, we need to satisfy a typeclass (HasMetavar).

So, how will the result of long and the result of metavar combine ? It’s rather easy to understand : the f a -> f a from long will be composed with the id from metavar (which is akin to only applying f a -> f a). As for the DefaultProp, being empty in both cases, it will stay this way. And the OptProperties endomorphism will be the composition of id and the lambda from metavar; so, only this lambda.

After long and metavar, there will be a call to another function, help, that behaves like metavar, only it modifies another field of the OptProperties record. This one will get composed with the previous lambda.
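The combination described above can be replayed on a toy model. Everything in this sketch is hypothetical and simplified: ToyMod drops the DefaultProp, Fields and Props are tiny records of my own, and names are plain Strings. Only the composition order in the Semigroup instance is copied from the Mod definition quoted earlier.

```haskell
-- Hypothetical, reduced stand-ins for the "fields" and "properties" records.
newtype Fields = Fields { names :: [String] }
  deriving (Eq, Show)

data Props = Props { metaVar :: String, helpText :: String }
  deriving (Eq, Show)

-- A Mod without its DefaultProp: just the two endomorphisms.
data ToyMod = ToyMod (Fields -> Fields) (Props -> Props)

-- Same composition order as the OA's Mod: the right-hand side runs last.
instance Semigroup ToyMod where
  ToyMod f1 g1 <> ToyMod f2 g2 = ToyMod (f2 . f1) (g2 . g1)

instance Monoid ToyMod where
  mempty = ToyMod id id

-- Touches the fields only, like fieldMod.
toyLong :: String -> ToyMod
toyLong n = ToyMod (\fs -> fs { names = n : names fs }) id

-- Touch the properties only, like optionMod.
toyMetavar, toyHelp :: String -> ToyMod
toyMetavar v = ToyMod id (\p -> p { metaVar = v })
toyHelp h = ToyMod id (\p -> p { helpText = h })

-- Apply the accumulated endomorphisms to empty starting records.
runToyMod :: ToyMod -> (Fields, Props)
runToyMod (ToyMod f g) = (f (Fields []), g (Props "" ""))
```

Running runToyMod (toyLong "hello" <> toyMetavar "TARGET" <> toyHelp "Target for the greeting") gives back a Fields holding the single name and a Props with both fields set. Since the right-hand endomorphism runs last, when the same property is set twice the rightmost value wins.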

So ! Back to the small piece we were analyzing :

        ( long "hello"
          <> metavar "TARGET"
          <> help "Target for the greeting" )

We now know that the three last lines create a Mod. But what’s the deal with the strOption that takes this Mod as its parameter ?

A first look at the Parser

strOption is a shortcut and a specialization at the same time. Let’s see the signature and definition.

strOption :: IsString s => Mod OptionFields s -> Parser s
strOption = option str

It will take a Mod parametrized over OptionFields as its input. We haven’t had the pleasure to meet OptionFields yet, it lives inside the Builder.Internal module and it’s pretty simple:

data OptionFields a = OptionFields
  { optNames :: [OptName]
  , optCompleter :: Completer
  , optNoArgError :: String -> ParseError }

So OptionFields is parametric over a (an a that became an s in the signature for strOption), and it contains various data, first and foremost the names of the options. Allow me to quote, once again, the definition for Mod:

data Mod f a = Mod (f a -> f a)
                   (DefaultProp a)
                   (OptProperties -> OptProperties)

Mod is parametric over f and a; when using strOption, f is going to be this OptionFields. Actually, there are x-Fields datatypes defined for all of the OptReader types; flags, arguments and subcommands. We still don’t really know what a is, though. While we’re at it, we know that strOption will use a Mod OptionFields, but we’re yet to understand what it will do with it. It’s calling a function named option, with a basic ReadM over String aptly named str: it will return the string that the user typed in the command-line.

We might want to read option too:

option :: ReadM a -> Mod OptionFields a -> Parser a
option r m = mkParser d g rdr
  where
    Mod f d g = metavar "ARG" `mappend` m
    fields = f (OptionFields [] mempty ExpectsArgError)
    crdr = CReader (optCompleter fields) r
    rdr = OptReader (optNames fields) crdr (optNoArgError fields)

So… option is a utility method that takes a ReadM; remember, a ReadM is mostly a ReaderT able to throw exceptions. It also takes a Mod. And it will return this famous Parser abstraction that is more or less the end product of the OA.

Once again, this method is mostly delegating the handiwork to another one, mkParser. d and g, the first two parameters, will be the end result of the Mod monoid, as we can see from the first line of the where block. So d is a DefaultProp and g is an endomorphism for OptProperties.

This first line of the where block, by the way !, is a thing of beauty. We’re creating a dumb Mod to initialize the metavar property with a default “ARG” value, and we mappend this to our Mod to get the final result of accumulated values; we immediately deconstruct it using pattern matching, and these deconstructed elements are the ones we’re going to use for our call to mkParser. This Mod actually represents the result of all accumulated smaller, helping methods.

What about the three other lines of the where block ? Well, they mostly encapsulate precisely the boring bits we don’t want to have to write every time. Let us do a quick recap on the types here…

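A typical Parser for a contains:

  1. an Option a (through the OptP constructor), itself made of the two next items;
  2. an OptProperties, mostly used to display help messages;
  3. an OptReader a, which holds the option names, an error handler, and a CReader a wrapping a Completer and our ReadM a.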

fields, crdr and rdr are there to build the OptReader part of this hierarchy.

We know that the first type parameter for our Mod is OptionFields, and that f is going to be a function OptionFields -> OptionFields. We apply our f (which, at this point, might be a long series of functions composed by each mappend on Mod) to a default OptionFields (without name, with an empty Completer and a default error handler).

This intermediary type will be used to build the rest: our CReader uses the ReadM given as parameter for option, plus the Completer (if any) extracted from fields; and the rdr takes the crdr we’ve just built, plus the extracted option names and error handler from fields.

In other words : option is a smart constructor built atop the Mod monoid.
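The “default first, user modifiers second” trick can be isolated in a few lines. This is a hypothetical, stripped-down model of my own (PropMod only carries the properties endomorphism, and Props has a single field), not the OA's actual code; it only mirrors the shape of the where block of option.

```haskell
-- Hypothetical one-field properties record.
newtype Props = Props { metaVar :: String }
  deriving (Eq, Show)

-- A modifier reduced to its properties endomorphism.
newtype PropMod = PropMod (Props -> Props)

-- Same order as the OA's Mod: the right-hand modifier is applied last.
instance Semigroup PropMod where
  PropMod g1 <> PropMod g2 = PropMod (g2 . g1)

instance Monoid PropMod where
  mempty = PropMod id

toyMetavar :: String -> PropMod
toyMetavar v = PropMod (\p -> p { metaVar = v })

-- Mirrors the shape of option: prepend a default modifier, deconstruct the
-- combined result by pattern matching, then apply it to a base record.
toyOption :: PropMod -> Props
toyOption m = g (Props "")
  where
    PropMod g = toyMetavar "ARG" <> m
```

Because the user's modifiers sit on the right of the mappend, they run after the default: toyOption mempty keeps "ARG", whereas toyOption (toyMetavar "INT") ends up with "INT".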

On the topic of monoids for smart, incremental construction and pleasant API, there’s a good, short and clever post from OCharles. The OA is not that different from the solution suggested by OCharles, the implementation is just a bit harder to follow.

So, at the end of the day, we have a DefaultProp (d) (still mysterious to us since we didn’t study its definition), an endomorphism on OptProperties (g) and a fully fledged OptReader (rdr). Following the type signatures, this is everything that mkParser needs to make a… Parser. All this for any type a. Time to see how this mkParser operates.

The Parser Applicative

We’ve delved into the depths of the type stack, it’s time to go up to the Parser type itself. Since our entry point was one of its simplest constructors (OptP), we shall start with this one.

The Option constructor

In Builder/Internal.hs, we find the definition for mkParser:

mkParser :: DefaultProp a
         -> (OptProperties -> OptProperties)
         -> OptReader a
         -> Parser a
mkParser d@(DefaultProp def _) g rdr = liftOpt opt <|> maybe empty pure def
  where
    opt = mkOption d g rdr

This should be pretty easy to read if you’re familiar with Alternative and its wonderful (<|>) operator. Basically, this code could be translated as: “try to do the thing at the left of (<|>) and if it doesn’t work, well, I’ll try to output the default.” Without even knowing the way DefaultProp is implemented, we now have a solid idea of how it’s used.
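If the right-hand side of the (<|>) looks cryptic, here is the same expression replayed in the Maybe applicative, which I use as a stand-in for Parser. The function withDefault and its argument names are hypothetical; the first argument plays the part of “the option was parsed (or not)”.

```haskell
import Control.Applicative (empty, (<|>))

-- Maybe stands in for Parser here. The first argument models the result of
-- reading the option, the second one models the optional default value.
withDefault :: Maybe a -> Maybe a -> Maybe a
withDefault parsed def = parsed <|> maybe empty pure def
```

With no default, maybe empty pure def is empty, so the whole expression fails when the option is missing; with a default, it becomes pure of that default and the alternative always succeeds.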

The rest is mostly a cascade of smart constructors. mkParser uses mkOption:

mkOption :: DefaultProp a
         -> (OptProperties -> OptProperties)
         -> OptReader a
         -> Option a
mkOption d g rdr = Option rdr (mkProps d g)

Which uses mkProps:

mkProps :: DefaultProp a
        -> (OptProperties -> OptProperties)
        -> OptProperties
mkProps (DefaultProp def sdef) g = props
  where
    props = (g baseProps)
      { propShowDefault = sdef <*> def }

So. Remember this line from option and how it was used ?

fields = f (OptionFields [] mempty ExpectsArgError)

It’s the same trick at play here. We are using a OptProperties function that, per the magic of the Mod monoid, could be a composition of many functions, and we’ll apply it to the constant baseProps:

baseProps :: OptProperties
baseProps = OptProperties
  { propMetaVar = ""
  , propVisibility = Visible
  , propHelp = mempty
  , propShowDefault = Nothing
  , propDescMod = Nothing
  }

So if we had:

strOption (metavar "STR"
          <> help "A string")

… we would automatically compose these two lambdas:

\p -> p { propHelp = paragraph s }
\p -> p { propMetaVar = var }

However, the propShowDefault field (here to indicate if the --help message should display the default value or not) is not accessible this way, it’s stored in the DefaultProp (which handles everything default-related), hence the need for the DefaultProp parameter in mkProps, and the final update in the props value defined in the where block.

You might be surprised by the Applicative syntax sdef <*> def, particularly because we still don’t know how DefaultProp is implemented. If you’re interested, the next subsection is for you. If not, skip it.

A word on DefaultProp

A DefaultProp is parametric over a type a. It contains a Maybe a and a Maybe (a -> String). This should of course tip you off as to its Applicative nature if you remember the signatures associated with the typeclass. In other words, it’s “maybe a way to transform any a as a String” and “maybe an a”; a potential function and a potential value (in the deconstruction in mkProps, sdef is the function and def is the value).

We can apply this potential function to this potential value using Applicative. If the function or the value is Nothing, we’ll get Nothing. If we have both Just a function and Just a value, we’ll get Just a String.

> import Data.Maybe
> let def = Just 2
> let sdef = Just show
> sdef <*> def
Just "2"
> let def' = Nothing
> sdef <*> def'
Nothing

That’s an interesting “real world usage” for Applicatives. Most examples for Applicatives typically use constructors over applicative values, for instance:

> data Test = Test String String deriving (Show)
> Test <$> Just "hello" <*> Just "world"
Just (Test "hello" "world")
> Test <$> ["hello", "salutations"] <*> ["world", "universe"]
[ Test "hello" "world"
, Test "hello" "universe"
, Test "salutations" "world"
, Test "salutations" "universe"]

But you can use applicative in any situation where you have “maybe a function”. You could, for example, use a datatype modeling various potential transformation using Applicative.

Feature-wise, you probably understood by now that DefaultProp lets us define the default value of an argument, if any, and the way it should be displayed, should it be so.
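To tie this to the monoidal builders, here is a toy reconstruction. It is a sketch under my own assumptions, not the library's definition: ToyDefault and its helpers are hypothetical names, and I am assuming that combining two of them keeps the first available value and the first available display function.

```haskell
import Control.Applicative ((<|>))

-- Hypothetical model: maybe a default value, maybe a way to display it.
data ToyDefault a = ToyDefault (Maybe a) (Maybe (a -> String))

-- Assumption: on each side, the first Just wins.
instance Semigroup (ToyDefault a) where
  ToyDefault d1 s1 <> ToyDefault d2 s2 = ToyDefault (d1 <|> d2) (s1 <|> s2)

instance Monoid (ToyDefault a) where
  mempty = ToyDefault Nothing Nothing

-- Only sets the default value.
toyValue :: a -> ToyDefault a
toyValue x = ToyDefault (Just x) Nothing

-- Only sets the display function.
toyShowDefault :: Show a => ToyDefault a
toyShowDefault = ToyDefault Nothing (Just show)

-- The same expression as in mkProps: a potential function applied to a
-- potential value.
shownDefault :: ToyDefault a -> Maybe String
shownDefault (ToyDefault def sdef) = sdef <*> def
```

In this model, shownDefault (toyShowDefault <> toyValue 1) is Just "1", while providing only one of the two pieces gives Nothing: the help text has nothing to display.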

Back to mkParser

Now that we know we’re going to get an “Option” and we globally know how it’s built, time to go back to mkParser.

mkParser :: DefaultProp a
         -> (OptProperties -> OptProperties)
         -> OptReader a
         -> Parser a
mkParser d@(DefaultProp def _) g rdr = liftOpt opt <|> maybe empty pure def
  where
    opt = mkOption d g rdr

liftOpt is nothing but a synonym for the OptP constructor of Parser (it’s defined in the Common.hs module). I think we should now be ready to face the actual implementation of the Parser type (in the Types.hs module), the very one I didn’t want to start with:

data Parser a
  = NilP (Maybe a)
  | OptP (Option a)
  | forall x . MultP (Parser (x -> a)) (Parser x)
  | AltP (Parser a) (Parser a)
  | forall x . BindP (Parser x) (x -> Parser a)

If you’re like me and not quite familiar yet with every language extension, you might be surprised by the mix of rather classic constructors and forall quantifiers. This is allowed by the ExistentialQuantification extension. This extension lets a constructor introduce its own type variables, ones that do not appear among the parameters of the datatype, which vanilla Haskell won’t allow. Picture the MultP constructor without the extension:

| MultP (Parser (x -> a)) (Parser x)

The parameters for MultP do look a bit like the parameters for (<*>), don’t they ? Replace the f in f (a -> b) and f a by Parser.

However, we’re not in an instance block, we’re defining the constructor of a datatype. The way it’s written in this definition, we don’t know what x will be; we only know that x can be something other than our a. This would not compile, ghc would complain that x is not in scope. So we need the forall quantifier to get the desired flexibility; if we don’t want to make Parser parametric over a AND x we need ExistentialQuantification.

Granted, this datatype is rather abstract and we’re having a hard time getting a good intuition for everything it can contain. Though, without any context, we can already see that there are more or less two types (in the colloquial sense of the word !) of constructors for Parser: we have NilP and OptP, “simple constructors” on the one hand, and MultP, AltP and BindP, three recursive constructors, on the other.

Though we are not going to study in this post (it’s already fairly long !) how parsers are run, we can peek inside a utility function from Common.hs to get a better idea of what the constructors represent:

evalParser :: Parser a -> Maybe a
evalParser (NilP r) = r
evalParser (OptP _) = Nothing
evalParser (MultP p1 p2) = evalParser p1 <*> evalParser p2
evalParser (AltP p1 p2) = evalParser p1 <|> evalParser p2
evalParser (BindP p k) = evalParser p >>= evalParser . k

Let’s ignore the first two values. As our intuition suggested, MultP models an Applicative where we apply the first parser (well, actually, the result of the evaluation of the first parser) to the second one (once again, to the result of the evaluation of the second parser). AltP does the same for Alternative, and BindP for Monad. So we could say that the Parser type is inhabited by “simple parsing” and “parsing combination patterns”.
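To play with these constructors without pulling the whole library, here is a reduced model that compiles on its own. Toy, Nil, Mult, Alt and evalToy are hypothetical names; I dropped the OptP and BindP cases, and the Functor instance is my own (Applicative needs one), not a copy of the OA's.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Control.Applicative (Alternative (..))

-- A reduced Parser: a constant, an application node, an alternative node.
data Toy a
  = Nil (Maybe a)
  | forall x. Mult (Toy (x -> a)) (Toy x)
  | Alt (Toy a) (Toy a)

instance Functor Toy where
  fmap f (Nil m) = Nil (fmap f m)
  fmap f (Mult pf px) = Mult (fmap (f .) pf) px
  fmap f (Alt a b) = Alt (fmap f a) (fmap f b)

-- (<*>) does no work at all: it only records the application in the tree.
instance Applicative Toy where
  pure = Nil . Just
  (<*>) = Mult

-- Same shape as evalParser: interpret each node in the Maybe applicative.
evalToy :: Toy a -> Maybe a
evalToy (Nil r) = r
evalToy (Mult pf px) = evalToy pf <*> evalToy px
evalToy (Alt a b) = evalToy a <|> evalToy b
```

Evaluating (+) <$> pure 1 <*> Alt (Nil Nothing) (pure 2) walks the tree and yields Just 3; replace the Alt by a bare Nil Nothing and the whole evaluation collapses to Nothing. The x hidden inside Mult never shows up in the type of the result, which is exactly what the forall buys us.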

Building parsers

Let’s go back to the initial example !

sample :: Parser Sample
sample = Sample
      <$> strOption
          ( long "hello"
         <> metavar "TARGET"
         <> help "Target for the greeting" )
      <*> switch
          ( long "quiet"
         <> short 'q'
         <> help "Whether to be quiet" )
      <*> option auto
          ( long "enthusiasm"
         <> help "How enthusiastically to greet"
         <> showDefault
         <> value 1
         <> metavar "INT" )

If we follow the definitions for strOption, switch and option, they will all get us an OptP, sometimes combined through (<|>) with a fallback value, as we saw in mkParser. Actually, almost every primitive from Builder.hs is built around a single OptP.

But Parser is, itself an applicative !

instance Applicative Parser where
  pure = NilP . Just
  (<*>) = MultP

So all our OptP are going to be combined through <*> to become MultP. In other words, a desugared version of the basic example could be:

ps = NilP (Just Sample)
p1 = strOption (long "hello" <> metavar "TARGET" <> help "Target for the greeting")
p2 = switch (long "quiet" <> short 'q' <> help "Whether to be quiet")
p3 = option auto (long "enthusiasm" <> help "How enthusiastically to greet" <> showDefault <> value 1 <> metavar "INT")

sample = MultP (MultP (MultP ps p1) p2) p3

Which is admittedly not as pleasant to read or maintain.

This is only the structure

It is important to point out that, right now, the only thing we have is a structure. Parsers contain information (names, expected type, potential combinations…) and they even encapsulate a “real parsing function” (the ReadM), but right now, they’re not doing much. To actually parse stuff, we will need to use the execParser primitive defined in the Extra.hs module. But that’s for another post.

But we’ve seen a lot of things already: common library organization patterns, smart constructors using monoids, and a pretty extended example of how datatypes can be combined to form a beautiful interface. Note how extensible this design is; we didn’t study in depth the way it can be leveraged to provide the autocompletion feature or the wealth of configuration for help or error message generation, but all the conditions to implement these are there. After all, beautiful, clear, safe structures are the foundations of any good piece of software.