# Coverage

This file is a collection of tests designed to exercise code in MD4C which may
otherwise be hard to hit. Its purpose is to improve our test coverage.


## `md_is_unicode_whitespace__()`

Unicode whitespace (here U+2000) forms a word boundary, so the following cannot be
resolved as an emphasis span because there is no closer mark.

```````````````````````````````` example
*foo *bar
.

*foo *bar

````````````````````````````````


## `md_is_unicode_punct__()`

Ditto for Unicode punctuation (here U+00A1).

```````````````````````````````` example
*foo¡*bar
.

*foo¡*bar

````````````````````````````````


## `md_get_unicode_fold_info()`

```````````````````````````````` example
[Příliš žluťoučký kůň úpěl ďábelské ódy.]

[PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY.]: /url
.

Příliš žluťoučký kůň úpěl ďábelské ódy.

````````````````````````````````


## `md_decode_utf8__()` and `md_decode_utf8_before__()`

### Alphanumerical Character (i.e. not whitespace, not punctuation)

Characters which are neither whitespace nor punctuation prevent an adjacent `_`
from being recognized as an emphasis mark, because the `_` is then seen as an
in-word character:

Example of 1-byte UTF-8 sequence (U+0058):

```````````````````````````````` example
X__foo__X
.

X__foo__X

````````````````````````````````

Example of 2-byte UTF-8 sequence (U+0158):

```````````````````````````````` example
Ř__foo__Ř
.

Ř__foo__Ř

````````````````````````````````

Example of 3-byte UTF-8 sequence (U+0BA3):

```````````````````````````````` example
ண__foo__ண
.

ண__foo__ண

````````````````````````````````

Example of 4-byte UTF-8 sequence (U+13142):

```````````````````````````````` example
𓅂__foo__𓅂
.

𓅂__foo__𓅂

````````````````````````````````


### Whitespace character

Whitespace on the other hand should not suppress `_`:

Example of 1-byte UTF-8 sequence (U+0009):

```````````````````````````````` example
x→__foo__→
.

x foo

````````````````````````````````

(The initial `x` is there to prevent an indented code block.)

Example of 2-byte UTF-8 sequence (U+00A0):

```````````````````````````````` example
 __foo__ 
.

foo

````````````````````````````````

Example of 3-byte UTF-8 sequence (U+2000):

```````````````````````````````` example
 __foo__ 
.

 foo 

````````````````````````````````

(AFAIK, there is no 4-byte UTF-8 whitespace.)


### Punctuation character

Punctuation also should not suppress `_`:

Example of 1-byte UTF-8 sequence (U+002E):

```````````````````````````````` example
.__foo__.
.

.foo.

````````````````````````````````

Example of 2-byte UTF-8 sequence (U+00B7):

```````````````````````````````` example
·__foo__·
.

·foo·

````````````````````````````````

Example of 3-byte UTF-8 sequence (U+0C84):

```````````````````````````````` example
಄__foo__಄
.

foo

````````````````````````````````

Example of 4-byte UTF-8 sequence (U+1039F):

```````````````````````````````` example
𐎟__foo__𐎟
.

𐎟foo𐎟

````````````````````````````````


## `md_is_link_destination_A()`

```````````````````````````````` example
[link]()
.

link

````````````````````````````````


## `md_link_label_eq()`

```````````````````````````````` example
[foo bar]

[foo bar]: /url
.

foo bar

````````````````````````````````


## `md_is_inline_link_spec()`

```````````````````````````````` example
> [link](/url 'foo
> bar')
.

link

````````````````````````````````


## `md_build_ref_def_hashtable()`

All link labels in the following example have the same FNV1a hash (after
normalization of the label, which means after converting it to a vector of Unicode
codepoints and lowercase folding).

So the example triggers quite complex code paths which are not otherwise easily
tested.

```````````````````````````````` example
[foo]: /foo
[qnptgbh]: /qnptgbh
[abgbrwcv]: /abgbrwcv
[abgbrwcv]: /abgbrwcv2
[abgbrwcv]: /abgbrwcv3
[abgbrwcv]: /abgbrwcv4
[alqadfgn]: /alqadfgn

[foo]
[qnptgbh]
[abgbrwcv]
[alqadfgn]
[axgydtdu]
.

foo qnptgbh abgbrwcv alqadfgn [axgydtdu]

````````````````````````````````

For the sake of completeness, the following C program was used to find the hash
collisions by brute force:

~~~
#include <stdio.h>
#include <string.h>


/* The hash value we are looking for collisions with. */
static unsigned etalon;


#define MD_FNV1A_BASE       2166136261
#define MD_FNV1A_PRIME      16777619

static inline unsigned
fnv1a(unsigned base, const void* data, size_t n)
{
    const unsigned char* buf = (const unsigned char*) data;
    unsigned hash = base;
    size_t i;

    for(i = 0; i < n; i++) {
        hash ^= buf[i];
        hash *= MD_FNV1A_PRIME;
    }

    return hash;
}


/* Hash the string the way a normalized label is hashed: every character is
 * widened to a whole `unsigned` codepoint before it is fed to FNV1a. */
static unsigned
unicode_hash(const char* data, size_t n)
{
    unsigned value;
    unsigned hash = MD_FNV1A_BASE;
    size_t i;

    for(i = 0; i < n; i++) {
        value = data[i];
        hash = fnv1a(hash, &value, sizeof(unsigned));
    }

    return hash;
}


/* Try every lowercase ASCII string of the given length and report those
 * whose hash equals the etalon. */
static void
recurse(char* buffer, size_t off, size_t len)
{
    int ch;

    if(off < len - 1) {
        for(ch = 'a'; ch <= 'z'; ch++) {
            buffer[off] = ch;
            recurse(buffer, off+1, len);
        }
    } else {
        for(ch = 'a'; ch <= 'z'; ch++) {
            buffer[off] = ch;
            if(unicode_hash(buffer, len) == etalon) {
                printf("Dup: %.*s\n", (int)len, buffer);
            }
        }
    }
}

int
main(int argc, char** argv)
{
    char buffer[32];
    size_t len;

    /* Without an argument, search for collisions with "foo". */
    if(argc < 2)
        etalon = unicode_hash("foo", 3);
    else
        etalon = unicode_hash(argv[1], strlen(argv[1]));

    for(len = 1; len <= sizeof(buffer); len++)
        recurse(buffer, 0, len);

    return 0;
}
~~~


## Flag `MD_FLAG_COLLAPSEWHITESPACE`

```````````````````````````````` example
foo bar → baz
.

foo bar baz

.
--fcollapse-whitespace
````````````````````````````````
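
As a rough illustration of what the example above expects from the flag, the
sketch below reduces every run of spaces, tabs and line breaks to a single
space. It is not how MD4C implements the flag: `collapse_whitespace()` is a
hypothetical helper written only for this note, and its treatment of leading
and trailing whitespace is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper, not part of MD4C. Copies `in` to `out`, replacing each
 * run of spaces, tabs, '\n' and '\r' with a single ' '. The caller must make
 * sure `out` has room for strlen(in) + 1 bytes. Returns the result length. */
static size_t
collapse_whitespace(const char* in, char* out)
{
    size_t n = 0;
    int in_run = 0;     /* Non-zero while we are inside a whitespace run. */

    for(; *in != '\0'; in++) {
        if(*in == ' ' || *in == '\t' || *in == '\n' || *in == '\r') {
            /* Emit one space for the first character of the run only. */
            if(!in_run)
                out[n++] = ' ';
            in_run = 1;
        } else {
            out[n++] = *in;
            in_run = 0;
        }
    }

    out[n] = '\0';
    return n;
}
```

Called on an input with several spaces and a tab between the words, the helper
yields the words separated by single spaces, which matches the shape of the
expected output in the example.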