Case Conversion: Mapping and Folding

Dyalog v18.0 introduced ⎕C, which converts the case of characters in an array by mapping to lower case, mapping to upper case, or folding. This superseded an earlier experimental I-beam (819⌶) that could map to lower or upper case but could not fold. That I-beam is now deprecated and will be removed in Dyalog v20.0.

There’s often confusion about the difference between mapping and folding. Mapping is used when you want the characters in an array to be in a particular case, whereas folding is used when you want to eliminate case distinctions so that you can perform case-insensitive comparisons. The confusion arises because case-insensitive comparisons usually give the right answer when done by mapping to lower case rather than folding. Indeed, on first inspection, mapping to lower case and folding appear to be the same thing.

The difference is best illustrated through some examples. Used monadically, the I-beam maps to lower case and ⎕C folds:

      819⌶'Hello'
hello
      ⎕C'Hello'
hello

A case-insensitive match function that uses 819⌶ to map both arguments to lower case before checking if they match might look like this:

      cc←≡⍥(819⌶)

In many cases it will appear to work without issues:

      'hello' cc 'HELLO'
1
      'hello' cc 'GOODBYE'
0

However, it doesn’t always work. Greek, for example, has two different lower-case sigma characters (σ and ς) but only one upper-case sigma (Σ). ίσως and ΊΣΩΣ are case-insensitively equal, but the function reports that they differ:

      'ίσως' cc 'ΊΣΩΣ'
0

This is because when these two arrays are mapped to lower case they become ίσως and ίσωσ respectively, which do not match. If we use folding instead:

      cc←≡⍥⎕C

the comparisons work as expected:

      'hello' cc 'HELLO'
1
      'hello' cc 'GOODBYE'
0
      'ίσως' cc 'ΊΣΩΣ'
1

This works because folding converts both Greek words to ίσωσ. Every sigma character, including the lower-case final form ς, has been changed to σ, so the two arrays now match.
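
The intermediate results can be inspected directly. The following is a small sketch that assumes the dyadic form of ⎕C, in which a left argument of ¯1 maps to lower case (monadic ⎕C folds, as above):

      ¯1 ⎕C 'ΊΣΩΣ'
ίσωσ
      ¯1 ⎕C 'ίσως'
ίσως
      ⎕C 'ΊΣΩΣ'
ίσωσ
      ⎕C 'ίσως'
ίσωσ

The two mapped results differ in their final character, whereas the two folded results are identical.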

The main use of case conversion is to perform caseless comparisons, so you might wonder why mapping to upper and lower case is supported at all. There are still occasions where you might need that – most notably, when formatting text for display.
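
For display purposes, a particular case can be requested explicitly. A minimal sketch, assuming the dyadic form of ⎕C in which a left argument of 1 maps to upper case and ¯1 maps to lower case:

      1 ⎕C 'Hello'
HELLO
      ¯1 ⎕C 'Hello'
hello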

If you are still using 819⌶ to perform caseless comparisons, you should change to ⎕C to get correct behaviour. Remember, too, that 819⌶ will not be supported beyond Dyalog v19.0.
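
Code that uses 819⌶ for mapping rather than comparison can be migrated as well. As a rough guide, assuming that a left argument of 0 or 1 to 819⌶ selects lower or upper case respectively, and that ¯1 or 1 does the same for ⎕C, the following checks should hold on a version that still supports both (v18.0 to v19.0):

      (819⌶'Hello') ≡ ¯1 ⎕C 'Hello'
1
      (0(819⌶)'Hello') ≡ ¯1 ⎕C 'Hello'
1
      (1(819⌶)'Hello') ≡ 1 ⎕C 'Hello'
1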

Want to learn more? Adám Brudzewsky explains mapping and folding in this webinar, beginning at 00:06:33.
