In the Beginning:
In 1991, Tim Berners-Lee (pictured above), a computer scientist working at CERN, developed the first version of his Hypertext Markup Language, or HTML. This fledgling version sported only 18 tags, whereas today this cornerstone language boasts over 120! This is great for HTML, but somewhere around 87 or so, things start to get a little hard to keep up with for a good many of us. In addition, HTML's 3.0 version tended to slow browsers.
There's got to be a better way!
A statue of Aaron Swartz in his natural habitat
In 2004, John Gruber and Aaron Swartz, motivated by the dream of a markup language meant to be "as easy-to-read and easy-to-write as is feasible", created Markdown.
In many ways, it seems they succeeded at their goal. Their plain-text markup has been widely embraced over the years. How-To Geek, in 2022, wrote of the language:
In short, Markdown makes it easier to format text for web pages because its tags are simpler than HTML, and they convert to HTML automatically. This means you don't have to know HTML to write something for a web page because Markdown translates your tags into HTML for you.
Since its inception, Markdown has been adopted and utilized by well-known sites like Slack, Reddit, GitHub, Stack Overflow, Medium and, of course, Dev.to. Let's take a look at why.
Comparisons:
Here are a few examples of Markdown and HTML producing the same output:
Headings and paragraphs
- HTML
<h2>This is a heading tag 2</h2>
<p>A paragraph is essentially a welcome space to add text to your page.</p>
- Markdown
## This is a heading tag 2
A paragraph is essentially a welcome space to add text to your page.
These will both look like this on the web page:
This is a heading tag 2
A paragraph is essentially a welcome space to add text to your page.
Emphasis
- HTML
<p>this is a simple statement.
<em>This one is italicized.</em>
<strong>And this statement is bold</strong>
</p>
- Markdown
this is a simple statement.
*This one is italicized.*
**And this statement is bold**
Will render like so:
this is a simple statement.
This one is italicized.
And this statement is bold
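To make the translation a little less magical, here is a minimal sketch of how the asterisk patterns could be swapped for tags. It assumes the common convention that single asterisks become `<em>` and double asterisks become `<strong>`. The function name `render_emphasis` is made up for illustration, and real parsers handle many more edge cases (nesting, escaped asterisks, underscores) than these two regular expressions do.

```python
import re


def render_emphasis(text: str) -> str:
    """Convert **bold** and *italic* spans to <strong> and <em> tags."""
    # Handle the double-asterisk form first so it is not mistaken
    # for two single-asterisk spans.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    # Whatever single-asterisk pairs remain are treated as italics.
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


print(render_emphasis("this is a simple statement."))
print(render_emphasis("*This one is italicized.*"))
print(render_emphasis("**And this statement is bold**"))
```

The order of the two substitutions matters: running the single-asterisk rule first would chew up the bold markers before the bold rule ever saw them.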
Lists
- HTML
<ul>
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
<ol>
<li>one</li>
<li>two</li>
<li>three</li>
</ol>
- Markdown
* one
- two
+ three
// any of these symbols signals an unordered list (many parsers start a new list whenever the symbol changes)
1. one
1. two
1. three
These will render like so:
- one
- two
- three
1. one
2. two
3. three
Markdown cuts the tags and makes for a more plain text approach, so it's easy to see why text-heavy sites would opt to include it in their code. You may be wondering, if it's so simple, why don't we all just use Markdown and get rid of HTML altogether? Well, that's the funny thing:
Markdown IS HTML!
Parsers
Parsing, also called syntax analysis or syntactic analysis, is the process of analyzing a string of symbols, whether in natural language, computer languages, or data structures, according to the rules of a formal grammar. A parser takes input data and builds a data structure. In this case, our parser will take in the plain-text Markdown as data and build HTML. That's the simple version. The more fun version involves three parts, if you count the tree. Just stay with me, we'll get there soon enough.
Lexer:
The lexer is the first stage of parsing, or the first parsing, depending on how you think of it. A lexer will take in the input data, in this case Markdown plain text, and search for specific patterns. The lexer is set specifically to find just the patterns needed (hash symbols, asterisks, spaces...). So the lexer will search through the document and make tokens based on the exact patterns it's calibrated to. I keep saying exact because of things like this. See, we're in Markdown right now, and I've already shown how to make a numbered list, so let's use that as an example of patterning. "Exact" is why this happens:
1. No problems here for a numbered list, but that space after the number is extremely important for pattern recognition. Without it, 1.this is just another number 1, and the lexer doesn't set this bit as a part of the list at all.
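Here is a rough sketch of what such a lexer could look like, so you can watch that missing space change the outcome. This is not how any particular Markdown library does it: the `Token` type, the token names, and the three patterns are all assumptions made up for this example, and it works line by line only.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str       # "heading", "ordered_item", "unordered_item", or "text"
    value: str      # the line's content with the marker stripped
    level: int = 0  # heading depth; 0 for everything else


# Every pattern demands at least one space after the marker.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
ORDERED = re.compile(r"^\d+\.\s+(.*)$")
UNORDERED = re.compile(r"^[*+-]\s+(.*)$")


def lex(source: str) -> list[Token]:
    """Turn Markdown text into a flat list of tokens, one per non-blank line."""
    tokens = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no token in this sketch
        if m := HEADING.match(line):
            tokens.append(Token("heading", m.group(2), len(m.group(1))))
        elif m := ORDERED.match(line):
            tokens.append(Token("ordered_item", m.group(1)))
        elif m := UNORDERED.match(line):
            tokens.append(Token("unordered_item", m.group(1)))
        else:
            tokens.append(Token("text", line))  # no pattern matched
    return tokens


for token in lex("## Title\n1. first\n1.second\n- third"):
    print(token)
```

The line `1. first` comes out as an `ordered_item` token, while `1.second` falls through every pattern and ends up as plain `text`, which is the same thing that happened in the list above.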
So, the lexer takes in the information, converts it into a series of tokens, then brings all of the tokens to the parser proper. Then it's the parser that interprets the necessary grammar to turn that starting information into...
An Abstract Syntax Tree
An Abstract Syntax Tree, or AST, is a data structure used in computer science to represent the structure of a program or code snippet. It represents the text structure in an abstract manner, so it doesn't quite represent every exact detail, but just the structural ones. Once the AST is generated by the parser during the source code translation and compiling process, it is then sent to the interpreter for further processing, such as contextual analysis, optimization, and code generation.
Then the interpreter (Markdown libraries usually call this stage the renderer) walks the tree and, in this case, returns the HTML. Wild ride to shed a few tags, right?
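The last two stages can be sketched in a few lines as well. The example below assumes the tokens arrive as simple `(kind, value)` pairs, groups them into a small tree, then walks that tree to produce HTML. The `Node` class, the token names, and the choice to render every heading as `<h2>` are simplifications invented for this illustration, not the design of any real parser.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    kind: str       # "document", "h2", "p", "ul", or "li"
    text: str = ""  # leaf content; empty for container nodes
    children: list["Node"] = field(default_factory=list)


def parse(tokens: list[tuple[str, str]]) -> Node:
    """Group a flat token stream into a tree; neighbouring list items share one ul."""
    root = Node("document")
    current_list = None
    for kind, value in tokens:
        if kind == "unordered_item":
            if current_list is None:
                current_list = Node("ul")
                root.children.append(current_list)
            current_list.children.append(Node("li", value))
        else:
            current_list = None  # any other token closes the open list
            # Simplification: every heading becomes an h2 in this sketch.
            tag = "h2" if kind == "heading" else "p"
            root.children.append(Node(tag, value))
    return root


def render(node: Node) -> str:
    """Walk the tree depth-first and emit HTML, escaping the text content."""
    inner = escape(node.text) + "".join(render(child) for child in node.children)
    if node.kind == "document":
        return inner  # the root is only a container, so it gets no tag
    return f"<{node.kind}>{inner}</{node.kind}>"


tokens = [
    ("heading", "This is a heading tag 2"),
    ("text", "A paragraph."),
    ("unordered_item", "one"),
    ("unordered_item", "two"),
]
print(render(parse(tokens)))
```

Notice that the `<ul>` wrapper never appears in the token stream. It only exists in the tree, which is the structural information the parser adds on top of what the lexer found.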
Downloadable parsers
If you'd like to see this structure working at its best before your eyes, there are plenty of Markdown parsers all over the digital world for you to choose from.
Here are just a few:
- Marked: One of the most popular for JavaScript, also available as a CLI tool.
- markdown-it: written in JavaScript and compatible with Node.js. "fast, easy to extend, and safe by default"
- MDX and react-markdown: MDX allows the use of JSX inside Markdown, and react-markdown is a React component that renders the conversion for you
Conclusion
Today, we've gone on a long and interesting (I hope) journey through information interpretation, only to end up right back where we started. Right here in Dev.to, right here in Markdown. But I'm hoping that you, like our input, have come back changed. Happy coding, and thanks for giving this article a parsing glance.