After understanding multiple dispatch and learning about roles, we want to put everything together in a single grammar that parses something with a certain heft to it, like a paragraph. Let's try with this:
grammar paragraph {
    token TOP { <chunk>[ (\s+) <chunk>]* }
    regex like-a-word { «\H+» }
    regex span { <like-a-word>[(\s+) <like-a-word>]* }
    proto regex quoted {*}
    regex quoted:sym<em> { ('*') ~ '*' <span> }
    regex quoted:sym<alsoem> { ('~') ~ '~' <span> }
    regex quoted:sym<code> { ('`') ~ '`' <span> }
    regex quoted:sym<strong> { ('**') ~ '**' <span> }
    regex quoted:sym<strike> { ('~~') ~ '~~' <span> }
    regex link { '[' ~ ']' <span> '(' ~ ')' (\H+) }
    regex chunk { <quoted> | <span> | <link> }
}
Nothing is really new if you have been following this series. We are putting everything together in a single grammar which, if you follow it from TOP on down, makes a paragraph out of chunks, each of which can be either a link or the quoted/unquoted sequence of words we have already seen before. The link rule, which matches the usual Markdown link construct, uses the pairing operator ~ twice. In each case ~ marks the place of what goes inside the delimiters: a span between the square brackets, and a group of characters that are not horizontal whitespace, \H+, between the parentheses. That last group is also captured with parentheses, so that we can use it later on.
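To see the pairing operator and the captures in isolation, here is a minimal sketch outside of any grammar. The pattern is a simplified stand-in, not the link rule above: it assumes the link text contains no closing bracket and the URL no closing parenthesis, and it uses negated character classes instead of span and \H+.

```raku
# Simplified, hypothetical link pattern (not the grammar's own rule):
# '[' ~ ']' X  reads as "X between [ and ]", same for the parentheses.
my $m = "[links](https://to.nowhere)"
    ~~ / '[' ~ ']' ( <-[ \] ]>+ ) '(' ~ ')' ( <-[ \) ]>+ ) /;

say ~$m[0];   # links
say ~$m[1];   # https://to.nowhere
```

Each pair of capturing parentheses becomes a positional capture, which is how the URL can be retrieved after the match.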
Let's test that
Grammars are classes; they are code, and code has to be tested: if it's not, it's broken. So far we have used a few say here and there, the old-fashioned way of debugging, but we really have to check that whatever we want to happen actually happens. Let's test using the originally named Test module in Perl 6, which we include in our program with use.
use Test;

# A short paragraph: one plain span and one quoted chunk
my $simple-thing = paragraph.parse("Simple **thing**");
isa-ok( $simple-thing, "Match", "Is a Match");
can-ok( $simple-thing<chunk>, "list", "Can do lists");
is( $simple-thing<chunk>.elems, 2, "Two chunks");

# A longer paragraph that mixes spans, quoted text and a link
my $not-so-simple-paragraph = paragraph.parse("This is *a simple* _paragraph_ with ~~struck~~ words and [links](https://to.nowhere)");
is( $not-so-simple-paragraph<chunk>.elems, 6, "6 chunks");
like( ~$not-so-simple-paragraph<chunk>[0], /This/, "Chunking OK");
is( $not-so-simple-paragraph<chunk>[5]<link><span>, "links", "Links");

# A sentence that ends with a symbol, the period
my $period = "This ends with a period.";
like( ~$period, /\./, "Symbols are good");
done-testing();
We use several testing functions here, after assigning the result of parsing to a variable. isa-ok checks for class or type: is what is returned a Match? This is what grammars return when parsing succeeds, as we saw before. If it does not succeed, the grammar returns nothing (Nil), and once assigned to the variable this nothingness has the type Any.
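A quick way to see both outcomes is to print the type names. This is a minimal sketch with a hypothetical one-rule grammar, OnlyWords, which is not part of the paragraph grammar and only accepts words separated by whitespace.

```raku
# Hypothetical grammar, for illustration only
grammar OnlyWords {
    token TOP { \w+ [ \s+ \w+ ]* }
}

my $ok     = OnlyWords.parse("two words");   # parse succeeds
my $failed = OnlyWords.parse("???");         # parse fails and returns Nil

say $ok.^name;       # Match
say $failed.^name;   # Any
```

The second variable holds Any because assigning Nil to an untyped variable resets it to its default value.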
That is not enough. A paragraph is a list of chunks, so does $simple-thing<chunk> have that capability? can-ok checks for that. This simple thing should have only two chunks, a span and a quoted. is checks that there are effectively two, and only two, chunks there.
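The same checks can be tried on something smaller. The sketch below assumes a hypothetical grammar, Words, in which the word subrule is matched several times, so its captures end up in a list in the same way chunk does.

```raku
use Test;

# Hypothetical mini-grammar: <word> can match several times,
# so $m<word> holds a list of matches.
grammar Words {
    token TOP  { <word> [ \s+ <word> ]* }
    token word { \w+ }
}

my $m = Words.parse("three little words");
can-ok( $m<word>, "list", "Captures can do lists" );
is( $m<word>.elems, 3, "Three words" );
is( ~$m<word>[1], "little", "Second word" );
done-testing();
```

Counting elements and then looking at one of them by index is the same pattern the tests above apply to chunks.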
If things get more complicated, we will have to test in a different way. Is the text chunked correctly? Is the period . actually included in the chunk? We stringify the result with the prefix ~, which has a different role here than the pairing operator it is inside a regex:
like( ~$period, /\./, "Symbols are good");
With it we check that the word-like extracting part actually does take into account not-really-word things like periods.
Done testing
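We finish with done-testing(), which signals that all tests have run; it is needed because we did not declare a plan with the number of tests up front.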
We should actually put this into a different file. Testing is a serious thing, and it follows some conventions, like always using the same directory: t in the case of Perl and Perl 6.
Is it complete? It pretty much is. Basically all we wanted in there is tested. To be sure, we should add coverage tests, but as I write this, there does not seem to be such a thing for Perl 6 grammars. We will have to wait until next Christmas, which is when everything happens in Perl 6.