# Coverage

This file is a collection of tests designed to exercise code in MD4C which may otherwise be hard to hit. Its purpose is to improve our test coverage.

## `md_is_unicode_whitespace__()`

Unicode whitespace (here U+2000) forms a word boundary, so the asterisks below cannot be resolved as an emphasis span: there is no closer mark.

```````````````````````````````` example
*foo *bar
.
*foo *bar
````````````````````````````````

## `md_is_unicode_punct__()`

Ditto for Unicode punctuation (here U+00A1).

```````````````````````````````` example
*foo¡*bar
.
*foo¡*bar
````````````````````````````````

## `md_get_unicode_fold_info()`

```````````````````````````````` example
[Příliš žluťoučký kůň úpěl ďábelské ódy.]

[PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY.]: /url
.
Příliš žluťoučký kůň úpěl ďábelské ódy.
````````````````````````````````

## `md_decode_utf8__()` and `md_decode_utf8_before__()`

### Alphanumerical Character (i.e. not whitespace, not punctuation)

Non-whitespace and non-punctuation characters below keep `_` from being recognized as an emphasis mark, because `_` should be seen as an in-word character:

Example of 1-byte UTF-8 sequence (U+0058):

```````````````````````````````` example
X__foo__X
.
X__foo__X
````````````````````````````````

Example of 2-byte UTF-8 sequence (U+0158):

```````````````````````````````` example
Ř__foo__Ř
.
Ř__foo__Ř
````````````````````````````````

Example of 3-byte UTF-8 sequence (U+0BA3):

```````````````````````````````` example
ண__foo__ண
.
ண__foo__ண
````````````````````````````````

Example of 4-byte UTF-8 sequence (U+13142):

```````````````````````````````` example
𓅂__foo__𓅂
.
𓅂__foo__𓅂
````````````````````````````````

### Whitespace character

Whitespace, on the other hand, should not suppress `_`:

Example of 1-byte UTF-8 sequence (U+0009):

```````````````````````````````` example
x→__foo__→
.
x foo
````````````````````````````````

(The initial `x` is there to suppress an indented code block.)

Example of 2-byte UTF-8 sequence (U+00A0):

```````````````````````````````` example
 __foo__ 
.
foo
````````````````````````````````

Example of 3-byte UTF-8 sequence (U+2000):

```````````````````````````````` example
 __foo__ 
.
foo
````````````````````````````````

(AFAIK, there is no 4-byte UTF-8 whitespace.)

### Punctuation character

Punctuation also should not suppress `_`:

Example of 1-byte UTF-8 sequence (U+002E):

```````````````````````````````` example
.__foo__.
.
.foo.
````````````````````````````````

Example of 2-byte UTF-8 sequence (U+00B7):

```````````````````````````````` example
·__foo__·
.
·foo·
````````````````````````````````

Example of 3-byte UTF-8 sequence (U+0C84):

```````````````````````````````` example
಄__foo__಄
.
಄foo಄
````````````````````````````````

Example of 4-byte UTF-8 sequence (U+1039F):

```````````````````````````````` example
𐎟__foo__𐎟
.
𐎟foo𐎟
````````````````````````````````

## `md_is_link_destination_A()`

```````````````````````````````` example
[link]()
.
````````````````````````````````

## `md_link_label_eq()`

```````````````````````````````` example
[foo bar]

[foo bar]: /url
.
````````````````````````````````

## `md_is_inline_link_spec()`

```````````````````````````````` example
> [link](/url 'foo
> bar')
.
````````````````````````````````

## `md_build_ref_def_hashtable()`

All link labels in the following example have the same FNV1a hash (after normalization of the label, which means after converting to a vector of Unicode codepoints and lowercase folding). The example therefore triggers quite complex code paths which are not otherwise easily tested.

```````````````````````````````` example
[foo]: /foo
[qnptgbh]: /qnptgbh
[abgbrwcv]: /abgbrwcv
[abgbrwcv]: /abgbrwcv2
[abgbrwcv]: /abgbrwcv3
[abgbrwcv]: /abgbrwcv4
[alqadfgn]: /alqadfgn

[foo]
[qnptgbh]
[abgbrwcv]
[alqadfgn]
[axgydtdu]
.
foo qnptgbh abgbrwcv alqadfgn [axgydtdu]
````````````````````````````````

For the sake of completeness, the following C program was used to find the hash collisions by brute force:

~~~
#include
~~~

```````````````````````````````` example
foo bar baz
.
--fcollapse-whitespace
````````````````````````````````
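
As a rough illustration of the brute-force idea mentioned in the `md_build_ref_def_hashtable()` section, the following C sketch searches lowercase ASCII strings for an FNV-1a collision. It is not the program that produced the labels above. It hashes raw bytes with the standard 32-bit FNV-1a parameters, whereas MD4C hashes the normalized label, so the values it computes need not match MD4C's. The names `fnv1a32()` and `find_collision()` are hypothetical, and the `mask` parameter is an addition of this sketch that restricts the comparison to selected bits so that a search can finish quickly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain 32-bit FNV-1a over raw bytes (standard offset basis and prime).
 * Assumption: this is NOT MD4C's exact label hashing, which works on the
 * normalized (case-folded) codepoints of the label. */
static uint32_t fnv1a32(const void *data, size_t n)
{
    const unsigned char *p = (const unsigned char *)data;
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;                 /* FNV prime; wraps modulo 2^32 */
    }
    return h;
}

/* Hypothetical helper: enumerate every string of length `len` over 'a'..'z'
 * (assumes an ASCII-compatible character set) and stop at the first one whose
 * hash agrees with `target` on the bits set in `mask`.
 * Returns 1 and leaves the NUL-terminated match in `out`, or 0 if none. */
static int find_collision(uint32_t target, uint32_t mask,
                          size_t len, char *out, size_t out_size)
{
    assert(len > 0 && len < out_size);
    memset(out, 'a', len);
    out[len] = '\0';

    for (;;) {
        if ((fnv1a32(out, len) & mask) == (target & mask))
            return 1;

        /* Advance like an odometer, rightmost letter fastest. */
        size_t i = len;
        while (i > 0 && out[i - 1] == 'z') {
            out[i - 1] = 'a';
            i--;
        }
        if (i == 0)
            return 0;                   /* wrapped around: all candidates tried */
        out[i - 1]++;
    }
}
```

Calling `find_collision(fnv1a32("foo", 3), 0xFFFFFFFFu, 7, buf, sizeof buf)` would look for a full 32-bit collision among 7-letter candidates. That is roughly eight billion strings, so expect a long run; a narrower `mask` is useful for quick experiments.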