RogerBW's Blog: The Weekly Challenge 190: Decoding Capital

I’ve been doing the Weekly Challenges. The latest involved checking capitalisation and decoding an ambiguous string. (Note that this is open until 13 November 2022.)

Task 1: Capital Detection

You are given a string with alphabetic characters only: A..Z and a..z.

Write a script to find out if the usage of Capital is appropriate if it satisfies at least one of the following rules:
1) Only first letter is capital and all others are small.
2) Every letter is small.
3) Every letter is capital.

Although in Perl I reach for them anyway, I think this is very clearly a job for regular expressions. Each formulation can easily be coded into a regexp:

[A-Z][a-z]+
[a-z]+
[A-Z]+

then it's just a matter of anchoring them (i.e. the whole input string must match the pattern) and testing.

sub capitaldetection($s) {
    if ($s =~ /^([A-Z]+|[a-z]+|[A-Z][a-z]+)$/) {
        return 1;
    } else {
        return 0;
    }
}

Lua patterns are regexp-like but don't have alternation, so there I check each option separately. And PostScript of course doesn't have a regexp library at all, so… I start with a crude upper-case tester (I don't care about characters which aren't in A-Z or a-z):

/uppercase {
    (_) 0 get lt {
        true
    } {
        false
    } ifelse
} bind def

and then looking at the first two characters of the string tells me what cases the rest may be.

/capitaldetection {
    4 dict begin
    /s exch def
    /first s 0 get uppercase def
    /second s 1 get uppercase def
    first not second and {
        false
    } {
        true
        2 1 s length 1 sub {
            s exch get uppercase /this exch def
            first second and this not and { % ABc
                pop false exit
            } if
            second not this and { % AbC or abC
                pop false exit
            } if
        } for
    } ifelse
    end
} bind def

Task 2: Decoded List

You are given an encoded string consisting of a sequence of numeric characters: 0..9, $s.

Write a script to find all the valid different decodings in sorted order.

Encoding is simply done by mapping A,B,C,D,... to 1,2,3,4,... etc.

So for example "11" can be (1, 1) = "AA", or (11) = "K".

I used my standard breadth-first search pattern, stuffing valid match patterns into hash keys to avoid possible duplication (not sure this can actually happen but it was an excuse for making arrays into hash keys), and then decoded the results. Most of the complexity is in the string-slicing (and in Rust I ended up using a Vec<char> for simplicity).

Python:

from collections import deque

def decodedlist(s):
  stack = deque()

A stack entry consists of a list of strings: the last entry in the list is the remaining un-parsed part.

  stack.append([s])
  out = set()
  while True:
    ent = stack.popleft()
    tail = ent.pop()

If there's no more unparsed content, we have a full parse; stick it on the output list.

    if len(tail) == 0:
      out.add(tuple(ent))
    else:

If we get here there's at least one un-parsed character. "0" isn't a valid character index, but any other number is.

      if tail[0] != "0":
        q = ent.copy()
        q.append(tail[0])
        q.append(tail[1:])
        stack.append(q)

If there are two or more un-parsed characters, there may be a decoding: "26" is valid, but "27" is not. I could do this with regexps but instead I decode it and take it as a number to check for validity.

      if len(tail) >= 2:
        v = int(tail[0:2])
        if v >= 1 and v <= 26:
          q = ent.copy()
          q.append(tail[0:2])
          q.append(tail[2:])
          stack.append(q)
    if len(stack) == 0:
      break

Now we build the output list: alphazero is the character offset (which in ASCII will be 64, but this code can in theory work in non-ASCII environments as long as the alphabet is sequential).

  k = []
  alphazero = ord("A") - 1

Decode each character, join them into a string, and put the result in the list to be returned.

  for x in out:
    k.append("".join(chr(int(cs)+alphazero) for cs in x))
  k.sort()
  return k

Full code on github.

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.