pub fn decode_utf8_chars(input: &mut &str) -> ModalResult<String>
Decodes UTF-8 characters from a string using MTREE-specific escape sequences.
MTREE combines several escape encodings:
- The VIS_CSTYLE encoding of strsvis(3), which encodes a specific set of characters. Of these, only the following control characters are allowed in filenames:
  - \s Space
  - \t Tab
  - \r Carriage Return
  - \n Line Feed
- # is encoded as \# to distinguish it from the start of a comment.
- For all other chars, octal triplets in the style of \360\237\214\240 are used, one triplet per UTF-8 byte. See unicode_char for more info.
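To make the escape rules above concrete, here is a minimal std-only sketch of the decoding. It is not the crate's implementation, which is built on winnow. The function name `decode_sketch`, the `String` error type, and the choice to reject any unlisted escape are assumptions made for illustration.

```rust
/// Hypothetical helper, not part of the crate: decodes the escapes listed above.
/// Errors are plain strings here; the real function returns a winnow `ModalResult`.
fn decode_sketch(input: &str) -> Result<String, String> {
    let bytes = input.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            // Unescaped bytes are copied through unchanged.
            out.push(bytes[i]);
            i += 1;
            continue;
        }
        // Inspect the byte that follows the backslash.
        match bytes.get(i + 1).copied() {
            Some(b's') => { out.push(b' '); i += 2; }
            Some(b't') => { out.push(b'\t'); i += 2; }
            Some(b'r') => { out.push(b'\r'); i += 2; }
            Some(b'n') => { out.push(b'\n'); i += 2; }
            Some(b'#') => { out.push(b'#'); i += 2; }
            Some(b'0'..=b'3') => {
                // Octal triplet: exactly three octal digits encode one byte.
                // A first digit of 0..=3 keeps the value within 0..=255.
                let digits = bytes
                    .get(i + 1..i + 4)
                    .ok_or_else(|| format!("truncated octal escape at byte {i}"))?;
                if !digits.iter().all(|d| (b'0'..=b'7').contains(d)) {
                    return Err(format!("invalid octal escape at byte {i}"));
                }
                let value = digits
                    .iter()
                    .fold(0u32, |acc, d| acc * 8 + u32::from(d - b'0'));
                out.push(value as u8);
                i += 4;
            }
            // Assumption: anything else, including a trailing backslash, is an error.
            _ => return Err(format!("invalid escape sequence at byte {i}")),
        }
    }
    // The collected bytes must form valid UTF-8 once all escapes are resolved.
    String::from_utf8(out).map_err(|e| format!("malformed UTF-8 after unescaping: {e}"))
}
```

With this sketch, `decode_sketch(r"\360\237\214\240")` yields the single character U+1F320, while a lone `\360` is rejected because one lead byte on its own is not valid UTF-8.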
Solution
To decode this pattern effectively, we use winnow instead of a handwritten parser, mainly for its convenient backtracking and for useful error messages when we encounter invalid escape sequences or malformed escaped UTF-8.