Dyadic Grade

/

/

⎕io=0 is assumed throughout. The essay talks only about but the same ideas apply to .

Background

has the distinction of being the first (in 1980) APL primitive function defined on major cells: the result orders items of a vector, rows of a matrix, planes of a 3-d array, etc. In the ordering major cells are compared in ravelled order, with leading items being more significant than trailing (lexicographic ordering). Moreover, in dyadic grade ⍺⍋⍵, specifies “alphabets” to be used in comparing the items of character array .
Dyadic grade has always been an APL primitive which is hard for me to understand, in that way kind of like dyadic transpose ☺. I sat down to really understand it, starting from the simplest cases to the general case. The following is a record of my explorations.

Vector Left Argument

   gv← {⍋⍺⍳⍵}
   a0← 'abcdefghij'
   x0← 'chthonic'
   a0 gv x0
0 7 1 3 6 2 4 5
   a0 ⍋ x0
0 7 1 3 6 2 4 5
   x0 ⌷⍨ ⊂ a0 gv x0
cchhiton

That is, grade the indices of in . If an item of is not in then its index is ≢⍺.

Higher-Rank Left Argument with Unique Items

The coordinates of A[i;j;k;…] or A[⊂i,j,k,…] is the vector i,j,k,…. The phrase ⍳⍴A produces the array of coordinates. For example, if is the (2 26)-matrix of the upper and lower case English letters,

   ABCDEFGHIJKLMNOPQRSTUVWXYZ
   abcdefghijklmnopqrstuvwxyz

the corresponding coordinates are

   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬   ┬────┬────┐
   │0 0│0 1│0 2│0 3│0 4│0 5│0 6│0 7│0 8│   │0 24│0 25│
   ├───┼───┼───┼───┼───┼───┼───┼───┼───┼ … ├────┼────┤
   │1 0│1 1│1 2│1 3│1 4│1 5│1 6│1 7│1 8│   │1 24│1 25│
   └───┴───┴───┴───┴───┴───┴───┴───┴───┴   ┴────┴────┘

If the items of are unique,

   gu← {⍋ 0 2 1 ⍉ (⊂(,⍺)⍳⍪⍵) ⌷ ⌽ (⍴⍺) ⍪⍨ ⍉(⍴⍺)⊤⍳×/⍴⍺}

That is, ⍺⍋⍵ obtains as the grade of the reversed coordinates of in . (If an item does not occur in , its coordinates are ⍴⍺.) The implements that in , the first axis is least significant and the last axis is most significant. For the (2 26)-matrix above, case (the first axis) is less significant than A-Z and a-z (the last axis).

   ⊢ a1←' ',⎕av[(⎕av⍳'Aa')∘.+⍳26]
 ABCDEFGHIJKLMNOPQRSTUVWXYZ
 abcdefghijklmnopqrstuvwxyz
   ⊢ x1←↑' '(≠⊆⊢)' Jay roger Roger adam Adam jay'
Jay
roger
Roger
adam
Adam
jay
   a1 gu x1
4 3 0 5 2 1
   a1 ⍋ x1
4 3 0 5 2 1
   x1 ⌷⍨ ⊂ a1 gu x1
Adam
adam
Jay
jay
Roger
roger

Higher-Rank Left Arguments

Suppose does have duplicates? For purposes of , the coordinates of an item c are

   ⌊⌿(c=,⍺)⌿↑,⍳⍴⍺

That is, the minimum of coordinates of all items equal to c. Note that the expression also works if c is a unique item. Therefore, for a general , with or without duplicates, ⍺⍋⍵ obtains as

   gr← {⍋ 0 2 1 ⍉ (⊂(∪,⍺)⍳⍪⍵) ⌷ ⌽ (⍴⍺) ⍪⍨ (,⍺) {⌊⌿⍵}⌸ ⍉(⍴⍺)⊤⍳×/⍴⍺}

The “minimum of coordinates” computation is exploited to effect equal coodinates for disparate characters. For example, an ordering where upper and lower case are significant but diacritical marks are not, can be implemented as follows:

   A    ⍝ A has a leading blank column
 AÀÁÂÃÄÅBCÇDEÈÉÊËFGHIÌÍÎÏJKLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZ
 aàáâãäåbcçdeèéêëfghiìíîïjklmnñoòóôõöøpqrstuùúûüvwxyýz
 À       Ç  È       Ì        Ñ Ò                   Ý
 Á       ç  É       Í        ñ Ó                   ý
 Â          Ê       Î          Ô
 Ã          Ë       Ï          Ö
 Ä          è       ì          Õ
 Å          é       í          Ø
 à          ê       î          ò
 á          ë       ï          ó
 â                             ô
 ã                             õ
 ä                             ö
 å                             ø
   ⍴A
14 54
   ('È'=,A)⌿↑,⍳⍴A                ('è'=,A)⌿↑,⍳⍴A
0 13                          1 13
2 12                          6 12
   ⌊⌿('È'=,A)⌿↑,⍳⍴A              ⌊⌿('è'=,A)⌿↑,⍳⍴A
0 12                          1 12
   ('E'=,A)⌿↑,⍳⍴A                ('e'=,A)⌿↑,⍳⍴A
0 12                          1 12
   ⌊⌿('E'=,A)⌿↑,⍳⍴A              ⌊⌿('e'=,A)⌿↑,⍳⍴A
0 12                          1 12

'È' occurs twice in A with coordinates 0 13 and 2 12. The coordinates assigned to 'È' are the minimum of these, 0 12. In contrast, 'E' occurs once and its coordinates are 0 12, the same as those for 'È'. Therefore, 'E' and 'È' are considered equal for purposes of dyadic grade. Similarly, 'e' and 'è' have coordinates 1 12 and are considered equal by , but they follow 'E' and 'È' (because their coordinates are 0 12).
For example:

   ⊢ x← ↑' '(≠⊆⊢)' roger adàm Röger rÖger Adåm JÃY JAY JÃY adåm adàm'
roger
adàm
Röger
rÖger
Adåm
JÃY
JAY
JÃY
adåm
adàm
   A gr x
4 1 8 9 5 6 7 2 3 0
   A ⍋ x
4 1 8 9 5 6 7 2 3 0
   x ⌷⍨⊂ A gr x
Adåm
adàm
adåm
adàm
JÃY
JAY
JÃY
Röger
rÖger
roger

Lest you think that dyadic grade in its full generality suffices to implement any ordering: in “telephone book” ordering, “1600 Pennsylvania Avenue” and “Unter den Linden 23” are ordered as if 1600 were spelled out as “Sixteen Hundred” and 23 as “Dreiundzwanzig”. A program to do that ought to be très amusant.

Code Archeology

The above code are improved versions of what appeared in Peter Wooster, Extended Upgrade and Downgrade, SHARP APL Technical Notes 35, I.P. Sharp Associates, 1980-09-15. It is interesting to study the code from the two eras. (The code from 1980 is lightly edited for executability and clarity.)
2018

gv← {⍋⍺⍳⍵}
gu← {⍋ 0 2 1 ⍉ (⊂(,⍺)⍳⍪⍵) ⌷ ⌽ (⍴⍺) ⍪⍨ ⍉(⍴⍺)⊤⍳×/⍴⍺}
gr← {⍋ 0 2 1 ⍉ (⊂(∪,⍺)⍳⍪⍵) ⌷ ⌽ (⍴⍺) ⍪⍨ (,⍺) {⌊⌿⍵}⌸ ⍉(⍴⍺)⊤⍳×/⍴⍺}

1980

eu← {d⊤⍳×/d←⍴⍵}
er← {¯1+÷(÷1+d⊤⍳×/d←⍴⍵)⌈.×a∘.=a←,⍵}
fv← {⍋⍺⍳⍵}
fu← {⍋(⍒0 1,1↓0×⍳⍴⍴⍵)⍉(⊖(eu ⍺),⍴⍺)[;(,⍺)⍳⍵]}
fr← {⍋(⍒0 1,1↓0×⍳⍴⍴⍵)⍉(⊖(er ⍺),⍴⍺)[;(,⍺)⍳⍵]}
gv, fv vector left argument
gu, fu higher-ranked left argument with unique items
gr, fr higher-ranked left argument

In the sequence gv gu gr, a function is more general than the preceding one and subsumes it. Likewise fv fu fr.
Comparison of the code illustrates advances in APL between 1980 and 2018:

  • {⌊⌿⍵}⌸ minimum of major cells corresponding to identical keys
  • ∪      unique items
  • ⍪⍵     ravel major cells
  • ⍺⍪⍵    catenate on first axis
  • ⍨      commute operator
  • dfns

Alternatives

If a left argument is large and complicated and is used repeatedly, it may be worthwhile for the APL interpreter to perform precomputations on it. Thus:

   U← ∪,A
   C← ⌽ (⍴A) ⍪⍨ (,A) {⌊⌿⍵}⌸ ⍉(⍴A)⊤⍳×/⍴A
   ⍴U        ⍴C
107       108 2
   ⍪U        C
           0  0
A          1  0
À          1  0
Á          1  0
          1  0
à         1  0
Ä          1  0
Å          1  0
B          8  0
C          9  0
Ç          9  0
…           …
x         50  1
y         51  1
ý         51  1
z         53  1
          14 54
   gp← (U C)∘{U C←⍺ ⋄ ⍋0 2 1⍉C[U⍳⍪⍵;]}
   gp x
4 1 8 9 5 6 7 2 3 0
   A ⍋ x
4 1 8 9 5 6 7 2 3 0

It makes sense that the number of columns in the coordinate matrix C is equal to the rank of the alphabet array A: The rank is the number of dimensions, a-z, upper/lower case, color, etc.; each row of C is a vector of the numeric value for each dimension.
With 20/20 hindsight, the above code can be seen as an argument against defining dyadic grade to do ordering with specified alphabets. After all,

   ⍺⍋⍵  ←→  ⍋0 2 1⍉C[U⍳⍪⍵;]

and specifying U and C directly makes the computation easier to understand, easier to use, and as it happens is faster than the primitive in the current implementation. The inverse calculation, from U C to the alphabet array A, is an amusing bit of code left as an exercise for the reader☺.
One can further argue that the current definition of dyadic grade precludes an alternative attractive but incompatible definition:

   ⍺⍋⍵  ←→  ⍺⌷⍨⊂⍋⍵

That is, order by the grade of , whence ⍋⍨ sorts. In Dyalog APL version 17.0, monadic grade is extended to work with a TAO (total array ordering). With a TAO and this alternative definition, ⍋⍨ sorts any array.
The present exposition exposes a difficulty with extending the current dyadic grade to work with TAO: It is axiomatic that monadic grade compares cells itemwise, stopping at the first pair of unequal items. Dyadic grade does not do that in general. For example, with an upper and lower case alphabet, you don’t stop comparing 'Rogerz' and 'rogers' on encountering 'R' and 'r'.

Leave a Reply

Your email address will not be published. Required fields are marked *