Superpower: The parser combinator library [Part 2] 💻🤓

Superpower

SUPERPOWER

The parser combinator library – Part 2

Believe me: you’ll want to learn about this

 Parsing is really a superpower

You never thought you would want to learn about parsing stuff, but that’s because you either haven’t heard about this topic or you’re scared of it. If the former is true, it’s time to learn about this interesting subject. If the latter applies… fear not! Parsers are like Math. They are as cool as difficult! But simpler things are usually easy to do.

Let’s get into it!

Meet Superpower

Superpower is the parser combinator library that will make you wish you had known it before.

You can define your parser with plain C#. Even better: You can define your parsers using LINQ! Look at this:

var assignmentParser = from identifer in Identifier
                       from eq in Token.EqualTo(SimpleToken.Equal)
                       from expr in Expression
                       select new Assignment(identifer, expr);

This is a parser combinator. It’s called “combinator” because it allows you to define a parser by combing existing parsers.

In the code above we are creating a parser for assignment expressions that will parse expressions like a = 1+b where:

If you look at the LINQ syntax, you will notice that the definition is nicely readable and almost matches the definition of a prototypical assignment expression. For instance, in BNF we could define a assignment like this:

<assignment> ::= <identifier> “=” <expression>

That reads like “an assignment is an identifier followed by the equal sign followed by an expression”.

Notice how this definition resembles the syntax of the LINQ example above… I think this way to define a parser is great!

But, before creating a parser, we need to deal with tokens.

Tokens

Parsers don’t consume text directly. Doing that would greatly increase complexity a lot. To keep things easy and we invented a preliminary stage in which the stream of characters is turned into what we call “tokens”.

What’s a token?

Tokens are simpler fragments of text that have a sense for our language. A piece with the smallest meaning for us. What does it mean? Let’s illustrate it with an example:

Imagine that your language represents math expressions like 2*(3+4).

We have numbers and symbols, and after a bit of analysis, we decide that we need the following tokens:

So, our original expression “2+(3*4)” turned into a sequence of tokens would be:

Number Asterisk OpenParenthesis Number Plus Number CloseParenthesis

To make it clearer, here is the previous sequence with the values annotated:

Number(“2”) Asterisk(“*”) OpenParenthesis(“(“) Number(“3”) Plus(“+”) Number(“4”) CloseParenthesis(“)”)

 

What have we done?

In a few works, we assigned a meaning to the input characters and categorized them to make them richer: the tokens. Characters of the string “123” are no longer a just bunch of characters, but a meaningful pack: they conform a token of type “Number” and its value is “123”. In this way we can handle the input string easily in the actual parsing stage.

Tokenizers

But how do we turn an input string into a sequence of tokens? Good question. We have what Superpower calls “tokenizers”. They are artifacts that do exactly that: They turn the input text into tokens. Technically, “tokenizer” is another name for “lexer”. Lexers do what in the jargon is call the lexical analysis of the input string to tokens in a stage that is usually called “tokenization”, hence the name.

Parsing stages

To make everything clearer, I’ll show you the different stages that parsing involves.

Order Artifact Consumes Produces
1 Lexer (Tokenizer) Characters Tokens
2 Parser Tokens AST (Abstract Syntax Tree)

There are only 2, but if we were parsing a complex language, we will have much more than only 2. For instance, Stage 3 would be the semantic analysis. But I won’t cover it in this post, because it’s more related to compilers. If you want to know more, you can take a look the State Machine Compiler project I created some time ago.

How do we build a tokenizer?

Superpower has a very simple builder that will make it rather easy to create your own tokenizer. Let’s create a tokenizer for the syntax we considered above:

var ourTokenizer = new TokenizerBuilder<MyToken>()
    .Match(Numerics.Integer, MyToken.Number)
    .Match(Span.EqualTo('+'), MyToken.Plus)
    .Match(Span.EqualTo('*'), MyToken.Asterisk)
    .Match(Span.EqualTo('('), MyToken.OpenParenthesis)
    .Match(Span.EqualTo(')'), MyToken.CloseParenthesis)
    .Build();

As simple as that.

Let’s see the complete tokenizer definition and the output

void Main()
{
    var ourTokenizer = new TokenizerBuilder<MyToken>()
        .Match(Numerics.Integer, MyToken.Number)
        .Match(Span.EqualTo('+'), MyToken.Plus)
        .Match(Span.EqualTo('*'), MyToken.Asterisk)
        .Match(Span.EqualTo('('), MyToken.OpenParenthesis)
        .Match(Span.EqualTo(')'), MyToken.CloseParenthesis)
        .Build();

        var tokens = ourTokenizer.Tokenize("2*(3*4)");

        tokens.Select(x => new { Kind = x.Kind, Text = x.Span.ToStringValue() }).Dump();
}

public enum MyToken
{
       Number,
       Plus,
       Asterisk,
       OpenParenthesis,
       CloseParenthesis,
}

To see it better, I’ve captured a snapshot of LINQPad (a very useful tool) with the code and the formatted output of the tokenization

As you can see, the table in the bottom part of the snapshot, the tokenizer outputs the very same sequence of tokens we had before 😊

Parsing tokens

Once we have the tokenizer that emits a sequence of tokens, we’re ready to parse them.

This stage is called the syntactic analysis (the parsing itself) in which we will construct a tree (the AST). Why a tree? Because it’s a structure that can grow indefinitely to hold all the meaning of the constructs of our language.

Building a parser is what we did in the first part of this post. Do you remember? Let’s refresh our mind:

var assignmentParser = from identifer in Identifier
                       from eq in Token.EqualTo(SimpleToken.Equal)
                       from expr in Expression
                       select new Assignment(identifer, expr);

As I told you before, a parser in Superpower is created by composing one or more parsers.

In the case above, the Assignment Parser is made of

But how is the Identifier parser defined?

Identifier = Token.EqualTo(SimpleToken.Identifier).Select(x => x.ToStringValue());

Easy, uh? It’s a parser that will match the Identifier value of the SimpleToken enum and will return the text of the token (whatever text it holds).

And what about the parser called “Expression”?

Expression = CallExpression.Try().Or(NumberParameter).Or(TextParameter).Or(IdentifierParameter);

What does it mean? That an expression is either CallExpression, a NumberParameter, a TextParameter or an IdentifierParameter. The Try() operator is needed because if the parser consumes a token and it’s not a match, the parsing pipeline should go back and “retry” with the rest of the parsers.

I hope you get the idea.

Operators

There are a lot of operators that you can use to create your parsers. This is a list of the most common ones:

from i in Identifier
from n in Number.OptionalOrDefault()
select new { Id = i, Nomber = n };

If there’s no number after the identifier, n will be null. This operator is useful to create parsers with optional parts

(a, b, c, d)

You can define it with the following:

Example:

Parse.Chain(Operator, Operand, (operation, left, right) => new BinaryOperation(operation, left, right))

Well, this is just the start and you have plenty of tips to start creating your own parsers and tokenizers.

If you want to see one simple parser in action, I’ve created one that is part of a super-simple scripting engine called SimpleScript (https://github.com/SuperJMN/SimpleScript/tree/master/SimpleScript)

Concretely, you can take a look to the tokenizer and the parser definitions.

If you want to know more, don’t forget to ask me via Twitter via @SuperJMN to get in touch.

Learn more

Also, if you want to read more about Superpower, don’t hesitate to read the introductory article by its author, Nicholas Blumhardt. And don’t forget to check out its GitHub repo.

I hope you have learnt something about parsers with me. If you haven’t, don’t worry. Get the samples, clone the repos and get a taste. It can be hard at the beginning, but as soon as you get some working, you’ll love Superpower as much as I do.

Of course, you can use the comments section to give us your impressions and to get in touch, too.

And thanks for reading!

Written by: Jose Manuel Nieto, part of Idiwork’s team.

Step by step

Idiwork has arrived and we invite you to join us!
Raspberry Pi4 – All of a sudden!
Experiment #101 How to set up an IoT device in Azure IoT Hub
Start your own video game with Unity3D and CreatorKits
Experiment #101 How To Create a Blockchain Workbench App
XR World: Minecraft Earth
Experiment #101 Architectural Diagram
Cyberpunk 2077 and The Future
Experiment #101 How to create an Azure Function App to record telemetry readings
An Introduction to neural networks
Microsoft Mixer, streaming your life !
Introduction to Azure Notebooks applying Cognitive Services with Jupyter
AR [T] Walk
Return of Age of Empires
Idiwork and Uno Platform partnership!!
The differences between Augemented Reality, Mixed Reality and Virtual Reality !
Experiment #102 How to Create an Azure Notebooks Project and Deploy a Summarization Service
Uno Platform Conference
YouTube Video: Creator Kits, learn how to create a RPG with Unity3D !! 🤖
Experiment #102 How to construct and train a Deep Neural Network using Keras and deploy the model as an Azure Web Service 🧠☁️
Uno Platform Conference Recap 😎
How to create a Uno Platform project in Visual Studio 💻🤓
What is a Neural Network? by Alberto Pinedo 🧠💻
Experiment #102 How to Deploy and Integrate Azure Cognitive Services: Computer Vision and Text Analytics 💻📑
Next stop: Madrid Games Week 👉🤖
Our Madrid Games Week experience ! 😎🤖
Experiment #102 How to use Microsoft Flow to send an email when an event occurs 📧📌
How to create your own controllers for Mixed Reality ToolKit 🎮🤓
Experiment #102 Architectural Diagram 📍
The magic of Hytale ⚔️
Start programming your own app in Uno Platform 📲👨‍💻
Avalonia, a big candidate to create cross-platform apps with XAML 😎📲
The value of Blockchain for business by Pablo Junco ⛓️🏢
Living in the night: Bloodlines 2 🧛🎮
Uno Platform Workshop Recap 💻📌
EasyRPC. Be proud of your APIs (First Part) 🤓🚀
Paralives, reimagine your virtual life 👾👩‍💻
Experiment #103 How to Modify the Project to Work with Face Cognitive Service and Servo Motor 👩‍💻📙
Experiment #103 How to Set Up the IoT Device Hardware: Peripherals and Electronics 🔈💡
We are going to be in the GDG DevFest in Málaga! 😎👾
EasyRPC. Be proud of your APIs (Second Part) 🤓🚀
Experiment #103 How to Build a 3D Printed Box to Pack and Run the IoT Project 🚀
MediEvil, remember the fear💀
Never forget the Fallen Order 🏹🎮
Experiment #103 Architectural Diagram 📍
Grace: The perfect DI IoC container [part 1] 💻🤓
Death Stranding: a story of death and connection ☠🏹
Experiment #204 How to assembly the 3D printed parts and servo motors of the robotic arm 🦾🤖
TemTem: a dream come true 🐹💥
Grace: The perfect DI IoC container [part 2] 💻🤓
Exploring the Outer Worlds 🎇⚔
Our review from Hololens 2 👓👷
Learn to code on your iPhone for FREE 📲🚀
New life, new horizons in Animal Crossing 🎮🐱
What’s WinUI? 💻🚀
Half Life Alyx ☠🧟‍♂
How to deploy a Censorship Resistant Website for FREE 🔓🌎
Superpower: The parser combinator library [Part 1] 💻🤓
UnoPlatform and WinUI, what to expect? 🚀📲
Happiness begins in Stardew Valley 🍎🐓
Science fiction in Assassin’s Creed 👽
The Cyber Attack Lifecycle 🕵‍♀💻
Learn more about UnoPlatform! 😜📢
Create cybernetically enhanced web apps with SvelteJS 💻👇
Six Fun Drag & Drop 🧩 Programming Languages To Learn How To Code! 💻
Riot and Hytale unite! 👾😉
A new home called EVE Online 👽👇
Rejoice with UnoPlatform! May 2020 📲👇
UnoPlatform arrives to macOS 💻👈
Your new empire in Civilization VI 🎮🏹
Play Station 5 is near!
Build and defend your city, this is Manor Lords
Customization, landscape generation and more in Hytale🧝🏻‍♀️🧝🏼
Welcome, MAUI! 💣📲
Zombies will be zombies (TLOU2) 👽🧟‍♂
Rejoice with UnoPlatform! August 2020 📲👇
Final Fantasy Crystal Chronicles is back🧝🏻‍♀️⚔️
Bolt is now free in Unity3D 🎮
Feel the Flutter! 🌐
Microsoft buys ZeniMax Media (including Bethesda!)
Intel and Microsoft team up to empower AI on Edge
Linux and WSL2 - Part 1 Linux and WSL2 – Part 1: How do you run Windows applications on Linux? Or vice versa.
Guide how to Run Windows applications on Linux Linux and WSL2 – Part 2: How do you run Windows applications on Linux? Or vice versa.
Domain-driven design Domain-Driven Design: the elephant in the room
azure object anchors Azure Object Anchors: the third tool
Experiment #205 Step by step 1 Experiment #205 Applied Artificial Intelligence, the real one 🤖📹
AI Assembling the system Experiment #205 Applied Artificial Intelligence – Assembling the system ⚙️🦾
Redit Conquer all the APIs Refit – Conquer all the APIs
Experiment #205 Applied AI: the Information analysis Experiment #205 Applied Artificial Intelligence – Analysis of the information
Brand Presence step by step 1 Blog Experiment #206 Brand Presence
Experiment #206 Brand Presence - The analysis Experiment #206 Brand Presence – The analysis

Stay up to date!



1 comment

  • Steve Wash says:

    Very helpful, but the Chain example really goes over my head. Could you break that down a bit? The descriptive example you gave was 1 + 3 * 4. The sample code doesn’t seem to be doing that, or I’m just not understanding.

Leave a comment