In the first part we've seen the basic concepts of QR codes. Now let's actively do something to generate one. Let's suppose we want to create a QR code for the string https://www.qrcode.com/
(the official QR code site).
Encoding mode
First of all, we need to find the correct encoding mode. Each mode has a corresponding value, according to the following table:
Encoding Mode | Value bits |
---|---|
Numeric | 0001 (1) |
Alphanumeric | 0010 (2) |
Byte | 0100 (4) |
Kanji | 1000 (8) |
ECI | 0111 (7) |
Determining the optimal encoding mode is easily done by just checking what characters are included in the string. The only issue is detecting Kanji characters correctly. I'm no expert in Kanji by any means, so I'll just rely on the new ES2018 Unicode support of RegExp
in JavaScript:
const KANJI_RE = /^[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]*$/u;
I don't actually know if it's a perfect fit for Kanji mode, so if anyone knows just hit the comments! (I'll come up with a better solution later in the series, probably.)
In the end, we'll have something like this:
const NUMERIC_RE = /^\d*$/;
const ALPHANUMERIC_RE = /^[\dA-Z $%*+\-./:]*$/;
const LATIN1_RE = /^[\x00-\xff]*$/;
const KANJI_RE = /^[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]*$/u;
function getEncodingMode(string) {
if (NUMERIC_RE.test(string)) {
return 0b0001;
}
if (ALPHANUMERIC_RE.test(string)) {
return 0b0010;
}
if (LATIN1_RE.test(string)) {
return 0b0100;
}
if (KANJI_RE.test(string)) {
return 0b1000;
}
return 0b0111;
}
In the end, we have getEncodingMode('https://www.qrcode.com/') === 4
.
Version
Let's aim for the smallest version possible: since it's 23 characters long, we can check in various tables around (here, for example) that we're going to need at least a version 2 code. Also, since we're there, we can get the highest possibile correction level - medium, in our case.
Also this other table tells us that a version 2 can contain 28 data codewords for medium correction: those 2 spare codewords are going to be used for data information.
If we wanted a higher error correction level, we should have chosen a larger version.
Data bits
The first 4 bits of our data sequence are 0100, our encoding mode.
Then, we'll tell how long our string is going to be. We need a table again for that, as the amount of bits reserved for this value is variable:
Encoding Mode | Version 1-9 | Version 10-26 | Version 27-40 |
---|---|---|---|
Numeric | 10 | 12 | 14 |
Alphanumeric | 9 | 11 | 13 |
Byte | 8 | 16 | 16 |
Kanji | 8 | 10 | 12 |
Turning this into a handy function:
const LENGTH_BITS = [
[10, 12, 14],
[9, 11, 13],
[8, 16, 16],
[8, 10, 12]
];
function getLengthBits(mode, version) {
// ECI mode folds into byte mode
// Basically it's `Math.floor(Math.log2(mode))` but much faster
// See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/clz32
const modeIndex = 31 - Math.clz32(mode);
const bitsIndex = version > 26 ? 2 : version > 9 ? 1 : 0;
return LENGTH_BITS[modeIndex][bitsIndex];
}
Since getLengthBits(4, 2) === 8
we'll need 8 bits, and 23 (the length of our string) in binary is 10111, our first bits are:
01000001 0111....
Next, the real data. All we have to do is getting the code of the characters of the string in ISO-8859-1:
h t t p s : / / w w w . q r c o d e . c o m /
104 116 116 112 115 58 47 47 119 119 119 46 113 114 99 111 100 101 46 99 111 109 47
Now convert everything to binary and concatenate it to the previous sequence:
01000001 01110110 10000111 01000111 01000111
00000111 00110011 10100010 11110010 11110111
01110111 01110111 01110010 11100111 00010111
00100110 00110110 11110110 01000110 01010010
11100110 00110110 11110110 11010010 1111....
Now we have to put a termination block, which is exactly 4 zeroes, so the last codewords will be 11110000
. We still have 3 of the availabe 28 codewords to be filled.
Remaining space
We have filled all the 8 bits of the last codeword, otherwise we'd have to fill the remaining bits with zeroes (it's always like that in byte mode).
With the remaining codewords we can do two things:
- instead of the termination 4-bit block, we can put another encoding mode block (maybe a different one) and start another sequence - but with just 3 codewords we can't do much;
- fill the remaining codewords with the sequences 11101100 00010001 (which translates into 236 and 17 in decimal) until the limit is reached.
Why 236 and 17? I have no idea, but my guess is that they (Denso Wave?) made a lot of tries and verified that those two are the sequence that produces the most easily recognizable codes.
In the end we have:
65 118 135 71 71 7 51 162 242 247 119 119 114 231 23 38 54 246 70 82 230 54 246 210 240 236 17 236
Or, in binary:
01000001 01110110 10000111 01000111 01000111
00000111 00110011 10100010 11110010 11110111
01110111 01110111 01110010 11100111 00010111
00100110 00110110 11110110 01000110 01010010
11100110 00110110 11110110 11010010 11110000
11101100 00010001 11101100
Translating into code
Our function getByteData
will need three things:
- the content to be sequenced into codewords, of course;
- how many bits are needed to state the length of the content: as we've seen, this depends on the encoding mode (byte, in this case) and the version (for our case it's 8);
- the amount of codewords to be filled: this again depends on the QR code version and the error correction level (for our case it's 28).
For QR codes from version 10 and up we'll need 16 bits to express the length of our content, so the actual data will begin on the third codeword.
function getByteData(content, lengthBits, dataCodewords) {
const data = new Uint8Array(dataCodewords);
const rightShift = (4 + lengthBits) & 7;
const leftShift = 8 - rightShift;
const andMask = (1 << rightShift) - 1;
const dataIndexStart = lengthBits > 12 ? 2 : 1;
data[0] = 64 /* byte mode */ + (content.length >> (lengthBits - 4));
if (lengthBits > 12) {
data[1] = (content.length >> rightShift) & 255;
}
data[dataIndexStart] = (content.length & andMask) << leftShift;
for (let index = 0; index < content.length; index++) {
const byte = content.charCodeAt(index);
data[index + dataIndexStart] |= byte >> rightShift;
data[index + dataIndexStart + 1] = (byte & andMask) << leftShift;
}
const remaining = dataCodewords - content.length - dataIndexStart - 1;
for (let index = 0; index < remaining; index++) {
const byte = index & 1 ? 17 : 236;
data[index + content.length + 2] = byte;
}
return data;
}
At the end of this first step, we should have this result:
getByteData('https://www.qrcode.com/', 8, 28)
// Uint8Array(28)Ā [65, 166, 135, 71, 71, 7, 51, 162, 242, 247, 119, 119, 114, 231, 23, 38, 54, 246, 70, 82, 230, 54, 246, 210, 240, 236, 17, 236]
Remember: the above function will work for byte mode only!
Other encoding modes
We won't go into details for now, but for numeric mode we have to split the number in groups of 3 digits and encode each group with 10 bits (210 = 1024, so the space wasted is minimal).
Alphanumeric mode contains 45 symbols instead, so the string must be splitted into groups of 2 characters. Each symbol has a value (first the digits, then the uppercase Latin letters, then the space and the symbols $, %, *, +, \, -, ., /, :), so every pair of characters can be translated into a number ranging from 0 to 2024 (= 452 - 1). Therefore, we need 11 bits for every two alphanumeric characters (211 = 2048).
For Kanji mode... oh dear. First, we'd have to get the Shift JIS code of the pictogram, and the best way to do it is using a library like iconv-lite or, if you want to to it on your own, use its symbol table. Also, not every symbol can be used, but only those in the ranges 0x8140 to 0x9FFC and 0xE040 to 0xEBBF. In the end, a Kanji character will take 13 bits.
Next steps
Keep in touch, because it's all been quite easy so far. Now we'll have to deal with error data correction (EDC), and Math will be rolling!
Top comments (8)
That's horrible, and has a code smell of "I can write cute code" while making it clear that the author likes writing unmaintainable code.
I beg to differ.
I agree that's not pretty anche could be refactored in a slightly more readable way, but you're missing the big picture. The whole algorithm contains some quite advanced mathematics, and for the sole task to dive into this a double ternary - that I've already explained before - is basically the least of your problems. Speaks for itself the fact that in almost 3 years you're the only one that complained about this.
On the other hand, I must underline how your way of giving feedback is, without a doubt, horrible.
I was wondering a bit about the result from the getByteData function. The "hand code" section result differs slightly from the final output. It seems to be either the length bits or the first character that differs, IIUC.
"hand code": 65, 118, 135, 71, 71 ...
getByteData: 65, 166, 135, 71, 71 ...
Maybe I missed some difference. I get 65, 118 ... so I hope that's correct :)
Nice article and thanks for sharing with the community
Excited to read more of your writings
type error in the last code block
Uint8Array(26) -> Uint8Array(28)
Good catch, thanks š
Hi AutJ, welcome to the DEV Community! Iām a moderator here, and Iām glad to have you on board. You are joining a wonderful community and to ensure that DEV is a respectful and inclusive space for developers to learn, share, and connect.
Iām sure you will do a great job to contribute positively to this community. If you ever need any assistance or advice, please feel free to reach out to me or any other moderators. We are always here to help. š
firstfinger.in/signup-for-devmaste...
i didn't get the encoding. :(
h = 104 why make 65 ? why we don't cenvert [104 116 116 112 115 58 47 47 119 119 119 46 113 114 99 111 100 101
46 99 111 109 47] this array into binary..? :/