Solving the 2014 APL Problem Solving Competition – How Tweet It Is

/

/

This post is the continuation of the series where we examine some of the problems selected for the 2014 APL Problem Solving Competition.
The problems presented in Phase 1 of the competition were selected because they could be solved succinctly, generally in a single line of APL code. This makes them well suited for experimentation on TryAPL.org.

Problem 2 of Phase 1, entitled “How tweet it is” reads

“Twitter messages have a 140 character limit – what if the limit was even shorter? One way to shorten the message yet retain most readability is to remove interior vowels from its words. Write a dfn which takes a character vector and removes the interior vowels from each word.”

Test cases:
      {your_solution} 'if you can read this, it worked!'
if yu cn rd ths, it wrkd!
      {your_solution} 'APL is REALLY cool'
APL is RLLY cl
      {your_solution} '' ⍝ an empty vector argument should return an empty vector
      {your_solution} 'a' ⍝ your solution should work with a single character message
a

We’ll examine a couple of approaches to this problem – one that’s more “traditional APL” code, and another that makes use of a really helpful Dyalog feature.
This problem could be restated as “find and remove the vowels that aren’t at the beginning or end of a word”. To start with, we need to determine where the words are and where the vowels are. A word is a contiguous set of letters; multiple words are separated by spaces or punctuation. For simplicity’s sake, we’ll ignore contractions and possessives.

The “Traditional APL” Approach

This approach employs a technique that is not commonly found outside of APL and its brethren – using a Boolean vector to determine which elements to remove or keep. First, let’s find where all the vowels are:

      string←'If you can read this, it worked!'
      vowels←{⍵∊'aeiouAEIOU'}
      vowels string
1 0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0

To help illustrate what’s happening, I’ll write a little operator called “show” to compactly display the string, the Boolean vector, and the elements that would be selected by applying the Boolean to the string.

      show←{⍵⍪⍵{↑(1 0⍕ ⍵)(⍵\⍵/⍺)}⍺⍺ ⍵}
      vowels show string
If you can read this, it worked!
10001100100011000010001000100100
I   ou  a   ea    i   i   o  e

Next we want to remove vowels that aren’t at either end of a word. First, find where the words are by finding where the letters are.  There are several ways to do this; the most obvious may be to use a character vector constant:

      letters←{⍵∊'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}

Long character constants seem a bit awkward to me.  So, another technique uses the Unicode Conversion system function to return the 26 characters starting at the code points for each of ‘a’ and ‘A’:

      letters←{⍵∊⎕UCS (⍳26)∘.+¯1+⎕UCS'aA'}

Yet another way might be to use the code point values directly and do numerical operations:

      letters←{{((⍵≥65)∧⍵≤90)∨(⍵≥97)∧⍵≤122}⎕UCS ⍵}

Which technique you choose is largely a matter of taste and style. All three return the same result and have comparable performance. My personal preference is the second one – it has fewer characters for me to mistype 🙂

      letters show string
If you can read this, it worked!
11011101110111101111001101111110
If you can read this  it worked

So now let’s mark the interior letters of the words. This employs a technique known as shift and compare that I learned in the early 1980s when I was privileged to work with Bob Smith. Among Bob’s many contributions to the APL world was a book on Boolean Functions and Techniques. To mark the interior letters, we’ll do both a right and left shift:

      interior←{⍵∧(¯1↓0,⍵)∧1↓⍵,0}
      {interior letters ⍵} show string
If you can read this, it worked!
00001000100011000110000000111100
    o   a   ea   hi       orke

The last step is to find interior vowels and negate:

      {(vowels ⍵)∧interior letters ⍵} show string
If you can read this, it worked!
00001000100011000010000000100100
    o   a   ea    i       o  e
      {(vowels ⍵)⍲interior letters ⍵} show string
If you can read this, it worked!
11110111011100111101111111011011
If y u c n r  d th s, it w rk d!

Putting it all together…

      tweet←{⍵/⍨~(⍵∊'aeiouAEIOU')∧{(1↓⍵,0)∧¯1↓0,⍵}⍵∊⎕UCS(⍳26)∘.+¯1+⎕UCS'aA'}
      tweet string
If yu cn rd ths, it wrkd!

The Other Technique – Using Regular Expressions

In version 13.0, Dyalog introduced the system functions ⎕S and ⎕R as interfaces to the PCRE (Perl Compatible Regular Expression) library. Like APL, regular expressions may seem a bit alien at first, but in the years since their incorporation into Dyalog, I’ve grown to appreciate their power and flexibility – they can frequently accomplish complex string manipulations more succinctly than their APL equivalents thus furthering Dyalog’s power as a tool of thought, notation and execution.

      tweet←{('\B[AEIOU]\B' ⎕R '' ⍠ 1) ⍵}
      tweet string
If yu cn rd ths, it wrkd!

The expression above replaces any vowel (⍠ 1means case-insensitive) that is not at the beginning or end of a word with the empty vector, effectively removing the interior vowels. A blog post is not enough space to give an adequate overview of regular expressions. But I hope the expression above piques your interest and encourages you to experiment with ⎕S and ⎕R on TryAPL.org or with a Dyalog system of your own.

Leave a Reply

Your email address will not be published. Required fields are marked *