DEV Community

Felipe Gasper
Felipe Gasper

Posted on • Edited on

SQLite, Perl, and a Boolean

I’ve written several articles now about the trials and tribulations of character encoding in Perl. Having gained the knowledge I have, I’ve also been finding bugs in libraries we use at $work and sending patches to their maintainers.

The latest one is DBD::SQLite, CPAN’s self-contained SQLite binding. It’s a great library that I’ve used for years, but I recently noted two problems in it:

1) In its default configuration it used the SvPV macro to translate Perl strings to C strings, which is bad for reasons I detailed in “Perl’s SvPV Menace”.

2) In its (non-default) “unicode” configuration it used a “naïve” method of UTF-8 decoding that neglects validation. This mechanism can corrupt Perl’s internals by making it mistake invalid UTF-8 sequences for valid ones.

Neither of these is trivial to fix: applications may depend on the SvPV problem—what one coworker of mine calls a “load-bearing bug” 😀—while adding UTF-8 validation entails a performance hit.

In reality, DBD::SQLite needed at least 4 modes of translating between Perl and C strings:

1) The current (“load-bearing-buggy”) default.
2) Same as #1, but use SvPVbyte to avoid the SvPV bug.
3) Current “naïve unicode” behaviour.
4) A “non-naïve unicode” mode that validates incoming UTF-8.

(I eventually made two variants of this last one: one that just warns on invalid data, and the other that throws an exception.)

There was another problem, though: DBD::SQLite’s interface for controlling this was a boolean. That meant only two modes were even possible!

This exemplifies a principle a mentor of mine taught me years back: avoid boolean parameters. They restrict your ability to add additional configurations.

(And for pity’s sake, abhor unnamed booleans in particular! What does the 0 in open_file($path, 0) mean??)

To fix this my pull request had to deprecate the existing sqlite_unicode parameter. It’s an unfortunate step that’ll produce new warnings in existing applications, but the “omelet” here justifies the “broken egg”.

Top comments (6)

Collapse
 
moose profile image
moose

I have recently just really gotten into the use and development of regular expressions. I don't get to ask Perl guys questions much because you're the second I've met. Any cool tips or tricks with the regex?

Collapse
 
thibaultduponchelle profile image
Tib

2 recents and very nice advanced posts about one liners and regex:
perl.com/article/perl-one-liners-p...
perl.com/article/perl-one-liners-p...

Collapse
 
moose profile image
moose

Holy crap I didn't know how wide used Perl is. You can use it in embedded systems. Looks super cool, I got the itch to incorporate it into my project as the bastion server for administration stuff. I always thought it was some high level language just used on big servers. Way cool. Thanks for sharing

Thread Thread
 
fgasper profile image
Felipe Gasper

Welcome to Perl!

Stack Overflow, Reddit, and the like are great resources for learning the language. Lots of resources are fairly dated, but on the other hand the language’s prioritization of backward compatibility means that most older tutorials still apply well to newer versions of the language.

I’ve been looking at how to make the Unicode support smoother; feel free to hit me up if you run into stuff that’s especially confusing in that area.

Thread Thread
 
moose profile image
moose

I will take you up on that. Learning more about perl over the past two days has been pretty cool.m Thanks for help!!

Collapse
 
choroba profile image
E. Choroba

We used something like this to test Unicode and databases. Pg, SQLite and MariaDB didn't have problems.