DEV Community

Matthijs Groen for Kabisa Software Artisans

Posted on • Originally published at kabisa.nl on

Learning functional programming #5: Parsing arrays and numbers

This post is part 5 of the series "Learning Functional Programming using JavaScript". When the next part will be published, a link to part 6 will be added here in the top.

Earlier parts: part 1, part 2, part 3, part 4.

This is a continuation of the Functional Programming challenge I gave myself:
Write a JSON parser in JavaScript, using
the parser combinator principles from the blogpost of Scott Wlaschin.

At this point, we are able to parse null, true, false, strings and
numbers. In this post we will complete the initial implementation by adding
support for arrays and objects.

In the previous post we rewrote the .reduce to use our own. To promote this
behavior, I wanted to get rid of all calls to 'objects'. So I added a new rule
to the ESLint config in my package.json:

{
  "eslintConfig": {
    "rules": {
      "no-restricted-syntax": [
        "error",
        {
          "selector": "CallExpression[callee.type=MemberExpression][callee.property.name!='log']",
          "message": "Not allowed to call object members, except console.log()"
        }
      ]
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

In production code, I would use system functions rather than rolling my own, but here we are just doing it for science! learning functional programming concepts. I will reflect on things like this in a later post.

While this test for console.log is not fool proof, it will start warning about
.map, .join and .concat. So lets start rewriting them:

// before
const toList = mapResult((result) => [result[0]].concat(result[1]));
// after
const toList = mapResult((result) => [...[result[0]], ...result[1]]);
Enter fullscreen mode Exit fullscreen mode

Be sure to wrap result[0] into an array. In our parser results, this toList
is to flatten a list that has the form of: [a, [b, [c, d]]. So each time it is
a list being appended to a single value.

// before
const toString = mapResult((result) => result.join(""));
// after
const reduceWithStart = (reducer) => (start) => (tail) =>
  doReduce(reducer)(start)(tail);
const join = (a) => (b) => String(a) + String(b);
const toString = mapResult(reduceWithStart(join)(""));
Enter fullscreen mode Exit fullscreen mode

I created a reduceWithStart that is a variant of the already made reduce (see
previous post), but this time you can provide a start value for the result. This
is so that empty lists are converted to empty strings.

Considering we extracted 2 helper functions, the code of the toString is
actually a bit shorter!

// before
const stringParser = (string) =>
  addLabel(string)(chain([...string].map((char) => characterParser(char))));
// after
const map = (transform) => ([head, ...tail]) =>
  head ? [transform(head), ...map(transform)(tail)] : [];

const stringParser = (string) =>
  addLabel(string)(chain(map(characterParser)([...string])));
Enter fullscreen mode Exit fullscreen mode

Here I made a map function, just like we created the reduce before (see
previous post). There are other functions using .map that I needed to convert:

// before
const escapedCharParser = choice(
  specialCharacters.map(([match, result]) => parseStringResult(match)(result))
);

// after
const escapedCharParser = choice(
  map(([match, result]) => parseStringResult(match)(result))(specialCharacters)
);
Enter fullscreen mode Exit fullscreen mode

Apart from the small change in order (transform before the list) the setup is
basically the same.

// before
const anyOf = (string) => choice([...string].map(characterParser)([...string]));

// after
const anyOf = (string) => choice(map(characterParser)([...string]));
Enter fullscreen mode Exit fullscreen mode

So that was not that hard, and now we are able to enforce another rule!

Now, onto parsing arrays!

Parsing an JSON array

As Scott Wlashin explains in his excellent series,
there is a problem with parsing an array as can be seen in the EBNF notation:

value = object | array | number | string | "true" | "false" | "null";
array = "[", [ value, { ",", value } ], "]";
Enter fullscreen mode Exit fullscreen mode

A value could be an array, and an array consists of value! there is a
circular reference here. How can you write a parser that needs to parse itself
as well?

Scott Wlashin fixes this by using a
forward declaration. A
forward declaration is a placeholder, that will be replaced later on.

I tried to follow this pattern in my implementation as well:

const forwardReference = (impl = () => [FAILED, "Unfulfilled"]) => [
  (stream) => impl(stream),
  (update) => (impl = update),
];

const [valueParser, updateValueParserRef] = forwardReference();
console.log(valueParser("hello")); // [ Symbol(Failed), 'Unfulfilled' ]
updateValueParserRef(characterParser("h"));
console.log(valueParser("hello")); // [ 'h', [ 'e', 'l', 'l', 'o' ] ]
Enter fullscreen mode Exit fullscreen mode

The linter didn't complain, my implementation worked, so I was happy! Time to
start parsing that array! (No worries, this will be done properly later πŸ˜‰)

These are some examples we should be able to parse:

[null, 1, 3  , [ true ] ]
[ "string [ , ", false]
Enter fullscreen mode Exit fullscreen mode

As you can see from these examples, in and around the brackets [] and the
seperator , we can have all kinds of whitespace that does not matter. So we
need some way to ignore those:

const whitespace = " \n\t";
const whitespaceParser = many(anyOf(whitespace));

const ignoreTrailingSpaces = (parser) => andThenLeft(parser)(whitespaceParser);

const arrayStart = ignoreTrailingSpaces(characterParser("["));
const arrayEnd = ignoreTrailingSpaces(characterParser("]"));
const arraySep = ignoreTrailingSpaces(characterParser(","));

console.log(arrayStart("[    null, 1 ]")); // [ '[', [ 'n', 'u', ...
Enter fullscreen mode Exit fullscreen mode

Nice! The spaces are ignored and the next character is waiting to be parsed. Now
we can define our array value parser that makes use of the forward reference:

const arrayValue = ignoreTrailingSpaces(valueParser);
Enter fullscreen mode Exit fullscreen mode

Now we need some parser that can parse a list, divided by seperators. By
combining parsers, we are looking for something like this:

const arrayValues = sepBy(arraySep)(arrayValue);
Enter fullscreen mode Exit fullscreen mode

That ideally would ignore the seperators in the result, and make a flat list of
the result of the values.

Let's build one:

const sepBy = (sepParser) => (parser) => andThen(parser)(sepParser);

const dummyArrayValueParser = sepBy(arraySep)(nullParser);
console.log(dummyArrayValueParser("null,    null,   null"));
// [ [ null, ',' ],
//   [ 'n', 'u', 'l', 'l', ',', ' ', ' ', ' ', 'n', 'u', 'l', 'l' ] ]
Enter fullscreen mode Exit fullscreen mode

After a seperator, we always need to find another element:

const sepBy = (sepParser) => (parser) =>
  andThen(parser)(andThenRight(sepParser)(parser));

const dummyArrayValueParser = sepBy(arraySep)(nullParser);
console.log(dummyArrayValueParser("null,    null,   null"));
// [ [ null, null ], [ ',', ' ', ' ', ' ', 'n', 'u', 'l', 'l' ] ]
Enter fullscreen mode Exit fullscreen mode

But those separators and values can be repeated N times:

const sepBy = (sepParser) => (parser) =>
  andThen(parser)(many(andThenRight(sepParser)(parser)));

const dummyArrayValueParser = sepBy(arraySep)(nullParser);
console.log(dummyArrayValueParser("null,    null,   null"));
// [ [ null, [ null, null ] ], [] ]
Enter fullscreen mode Exit fullscreen mode

Yes! now to flatten that list, by using toList:

const sepBy = (sepParser) => (parser) =>
  toList(andThen(parser)(many(andThenRight(sepParser)(parser))));
Enter fullscreen mode Exit fullscreen mode

But what if the array would be empty?

const dummyArrayValueParser = sepBy(arraySep)(nullParser);
console.log(dummyArrayValueParser(""));
// [ Symbol(Failed), "Error parsing 'null':", 'Unexpected EOF' ]

console.log(dummyArrayValueParser("null"));
// [ [ null ], [] ]
Enter fullscreen mode Exit fullscreen mode

So our implementation would require at least one value. So we rename the
function to sepBy1 and create a new one that would accept 'no values'

const sepBy1 = (sepParser) => (parser) =>
  toList(andThen(parser)(many(andThenRight(sepParser)(parser))));

const sepBy = (sepParser) => (parser) =>
  orElse(sepBy1(sepParser)(parser))(resultParser([]));
Enter fullscreen mode Exit fullscreen mode

So now our array parser is complete!

const arrayValues = sepBy(arraySep)(arrayValue);
const arrayParser = addLabel("array")(
  between(arrayStart)(arrayEnd)(arrayValues)
);
Enter fullscreen mode Exit fullscreen mode

We can test it out by fulfilling our forward reference by calling updateValueParserRef:

updateValueParserRef(
  choice([
    nullParser,
    boolParser,
    numberParser,
    quotedStringParser,
    arrayParser,
  ])
);

// parsing our examples:
console.log(valueParser("[null, 1, 3  , [ true ] ]"));
// [ [ null, 1, 3, [ true ] ], [] ]

console.log(valueParser('[ "string [ , ", false]'));
// [ [ 'string [ , ', false ], [] ]
Enter fullscreen mode Exit fullscreen mode

We can now already parse a decent subset of JSON. The only thing missing right now, is
objects.

Parsing JSON objects

The EBNF definition for a JSON object is as follows:

object = "{", [ string, ":", value, { ",", string, ":", value } ], "}";
Enter fullscreen mode Exit fullscreen mode

So, "Between" brackets, are key-value pairs "seperated" by a comma. The
key-value pair is a String, ":", value. I think we already have all ingredients
for this one! Maybe we need to add some code to produce the end result, but
parsing wise we should already be close!

const objectStart = ignoreTrailingSpaces(characterParser("{"));
const objectEnd = ignoreTrailingSpaces(characterParser("}"));
const objectPairSep = ignoreTrailingSpaces(characterParser(","));
const objectKeyValSep = ignoreTrailingSpaces(characterParser(":"));
const objectKey = ignoreTrailingSpaces(quotedStringParser);
const objectValue = ignoreTrailingSpaces(valueParser);
const objectKeyValue = andThen(andThenLeft(objectKey)(objectKeyValSep))(
  objectValue
);

const objectParser = between(objectStart)(objectEnd)(
  sepBy(objectPairSep)(objectKeyValue)
);

console.log(objectParser('{ "hello": "world", "fp": true }'));
// [ [ [ 'hello', 'world' ], [ 'fp', true ] ], [] ]
Enter fullscreen mode Exit fullscreen mode

The default result is already a list of key-value pairs. Now to put them into an
object:

const toObject = doReduce((result) => ([key, value]) => ({
  ...result,
  [key]: value,
}))({});

const objectParser = mapResult(toObject)(
  between(objectStart)(objectEnd)(sepBy(objectPairSep)(objectKeyValue))
);

console.log(objectParser('{ "hello": "world", "fp": true }'));
// [ { hello: 'world', fp: true }, [] ]
Enter fullscreen mode Exit fullscreen mode

Now to update our value choice, so that objects are included:

updateValueParserRef(
  choice([
    nullParser,
    boolParser,
    numberParser,
    quotedStringParser,
    arrayParser,
    objectParser, // new!
  ])
);

const jsonParser = (stream) =>
  onSuccess(
    andThenRight(whitespaceParser)(valueParser)(stream)
  )(([result, remaining]) =>
    remaining.length > 0
      ? [FAILED, "Unexpected characters:", remaining]
      : result
  );

// data is our challenge data, see post 1
console.log(jsonParser(data)); // Works!
Enter fullscreen mode Exit fullscreen mode

I added a jsonParser to check after parsing if there are no remaining
characters left. If there are, return an error. If not, return the parsed
result.

Now I only need to build our get function to get value from our data, to
complete the challenge, and we need to check if we can tick a box for all the
rules we applied.

Getting data out of our JSON structure

In the definition of the challenge, there were 3 calls that we should be able to
do:

{
  "goals": [
    "get(data)(\"using.disallowed.0\") should result in \"No dependencies\"",
    "get(data)(\"points.full-json\") should result in 1000",
    "get(data)(\"jsonTypes.2\") should result in false"
  ]
}
Enter fullscreen mode Exit fullscreen mode

The thing is, the argument from the get is also a string that needs to be
parsed! (separted by . dots).

And we made just the tools to do that...

const pathElementParser = toString(some(satisfy((c) => c !== ".")));
const split = sepBy(characterParser("."))(pathElementParser);

const get = (data) => (path) =>
  doReduce((result) => (item) => result && result[item])(data)(split(path)[0]);

const jsonGet = get(parsedData);

console.log(jsonGet("using.disallowed.0"));
console.log(jsonGet("points.full-json"));
console.log(jsonGet("jsonTypes.2"));
Enter fullscreen mode Exit fullscreen mode

Challenge completed!

But there were some things still bothering me. I did an assignment in the
forward reference, and that felt like cheating. But first lets see how we score
on the restrictions we applied, by looking at the defined ESLint rules of our
package.json...

{
  "eslintConfig": {
    "globals": {
      "Symbol": "readonly",
      "Array": "readonly",
      "String": "readonly",
      "Number": "readonly",
      "console": "readonly"
    },
    "rules": {
      "no-console": "off",
      "no-use-before-define": "error",
      "no-eval": "error",
      "no-implied-eval": "error",
      "no-restricted-globals": ["error", "JSON"],
      "max-statements": ["error", 1, { "ignoreTopLevelFunctions": false }],
      "complexity": ["error", { "max": 3 }],
      "arrow-body-style": ["error", "as-needed"],
      "no-restricted-syntax": [
        "error",
        {
          "selector": "FunctionExpression",
          "message": "Please use Lambda notation () =>"
        },
        {
          "selector": "IfStatement",
          "message": "Try using ternary operator: true ? 1 : 2"
        },
        {
          "selector": "VariableDeclaration[kind=let],VariableDeclaration[kind=var]",
          "message": "Only use constants"
        },
        {
          "selector": "ArrowFunctionExpression[params.length > 1]",
          "message": "Only 1 argument allowed. Please use currying"
        },
        {
          "selector": "CallExpression[callee.type=MemberExpression][callee.property.name!='log']",
          "message": "Not allowed to call object members, except console.log()"
        }
      ]
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

I was checking for usage of const, but I never assigned rules that assignments
where forbidden. I was also missing my "No self referencing recursion" I wanted
to try (see first post).

Time to extend the linting rules some more, but we'll leave that for a next post.

Conclusion

The code so far:

  • A full JSON parser (only skipped hex chars in strings)
  • A function to grab values from the JSON structure
  • Check the full code here

What I learned so far:

  • Still the code is more the tools to parse than a specific implementation. I like this approach, it makes most of the code re-usable. The 'get' function is an example of reuse.
  • I followed the original blog of parser combinators quite literally. It helped a great deal in the implementation, but the forward reference felt weird. I had find a way to do it without an assignment in there.

What did I like:

  • I finished the parser! Everything worked. I never though I would be able to implement it. So far I spent 2 evenings on it.
  • It felt nice to break loose from the functions already provided in the language, and do an own implementation for it. Its not something you would do in production code, but for a challenge like this, it is cool to also do the most basics of tool building.

What did I not like:

  • The forward reference. I have to find a better way to do this. (Right now it is cheating by reassigning the incoming function argument).

Next time:

  • Removing the self referencing recursion
  • Fixing the forward reference

Top comments (0)