RogerBW's Blog: The Weekly Challenge 365: Alphabet Digit Counter Token

I’ve been doing the Weekly Challenges. The latest involved string mangling and regular expressions. (Note that this ends today.)

Task 1: Alphabet Index Digit Sum

You are given a string $str consisting of lowercase English letters, and an integer $k.

Write a script to convert a lowercase string into numbers using alphabet positions (a=1 — z=26), concatenate them to form an integer, then compute the sum of its digits repeatedly $k times, returning the final value.

Getting character codes is one of those things that varies hugely across languages. JavaScript:

function alphabetindexdigitsum(a, k) {

Start my working string.

    let st = "";

Look at each character in the input string.

    for (let c of a.split("")) {

Calculate its alphabetic code, and append the ASCII representation of the base-10 representation of that code to the working string.

        st += (c.charCodeAt(0) - 96);
    }

Convert that to an integer. (Not strictly necessary here I think, since floppy types will probably treat it as a string anyway, but I solve these first in Rust, and its type enforcement has been so good for spotting the kind of trivial error I used to make a lot in Perl that I tend to do it explicitly elsewhere too.)

    let v = Number(st);

Run through a number of cycles.

    for (let _dummy = 0; _dummy < k; _dummy++) {

Of course I could convert the number back to a string, split it into digit characters and add them together. But I like to avoid type conversions where that's possible, so I do it mathematically instead. (This would conveniently also work for base 2, base 327, or any other base.)

        let j = 0;
        while (v > 0) {
            j += v % 10;
            v = Math.floor(v / 10);
        }
        v = j;
    }

Return the final result.

    return v;
}
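
One caveat with the numeric approach: a JavaScript Number holds integers exactly only up to Number.MAX_SAFE_INTEGER (2^53 - 1, about 9 × 10^15), so a long enough input string may lose low-order digits before the first digit sum. What follows is a minimal sketch of a BigInt variant, not the code from the repository. The names digitSum and alphabetIndexDigitSumBig and the base parameter are hypothetical, introduced here for illustration; the helper also shows the "any other base" point by taking the base as an argument.

```javascript
// Sketch only: hypothetical helper names, not part of the original solution.

// Sum the digits of a non-negative BigInt in the given base (default 10).
function digitSum(v, base = 10n) {
    let total = 0n;
    while (v > 0n) {
        total += v % base;
        v /= base; // BigInt division truncates, so no Math.floor is needed
    }
    return total;
}

// Same algorithm as above, but the working value is a BigInt throughout.
// Assumes the input contains only lowercase a-z.
function alphabetIndexDigitSumBig(a, k) {
    let st = "";
    for (const c of a) {
        st += c.charCodeAt(0) - 96; // string + number concatenates
    }
    let v = BigInt(st); // BigInt("") is 0n, so an empty input gives 0n
    for (let i = 0; i < k; i++) {
        v = digitSum(v);
    }
    return v;
}

console.log(alphabetIndexDigitSumBig("abc", 1));        // 6n  ("123" -> 1+2+3)
console.log(alphabetIndexDigitSumBig("zzzzzzzzzz", 1)); // 80n (ten copies of "26")
console.log(digitSum(255n, 2n));                        // 8n  (255 is 11111111 in base 2)
```

The result comes back as a BigInt; wrapping it in Number() is safe after at least one round of summing, since a digit sum is far smaller than the value it came from.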

Task 2: Valid Token Counter

You are given a sentence.

Write a script to split the given sentence into space-separated tokens and count how many are valid words. A token is valid if it contains no digits, has at most one hyphen surrounded by lowercase letters, and at most one punctuation mark (!, ., ,) appearing only at the end.

Since this is essentially a ladder of regular expressions connected by simple logic, it looks very much the same in every language, so I didn't bother for most of them. (And Raku's weird divergent syntax for its "regular expressions" just irks me.)

Perl:

sub validtokencounter($a) {

Initialise the counter for the final result.

  my $count = 0;

Look at each word-token.

  foreach my $k (split ' ', $a) {

Check that it contains no digits.

    if ($k =~ /[0-9]/) {
      next;
    }

Check that it doesn't have multiple dashes.

    if ($k =~ /-.*-/) {
      next;
    }

Check that, if it does have a dash, that dash is surrounded by letters.

    if ($k =~ /-/ &&
        $k !~ /[a-z]-[a-z]/) {
      next;
    }

Check that there is no punctuation mark followed by another character. (This combines "at most one punctuation mark" and "appearing only at the end".)

    if ($k =~ /[.,!]./) {
      next;
    }

We've passed all the tests, so increment the counter.

    $count += 1;
  }
  $count;
}
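
To illustrate how little the ladder changes between languages, here is a rough JavaScript port of the Perl above. It is a sketch rather than the code in the repository: the name validTokenCounter and the test sentences are made up for this example, and match(/\S+/g) stands in for Perl's split ' ' (both break on runs of whitespace and ignore leading blanks).

```javascript
// Sketch only: a hypothetical JavaScript port of the Perl regex ladder.
function validTokenCounter(sentence) {
    let count = 0;
    // match() returns null when there are no tokens, hence the fallback.
    for (const token of sentence.match(/\S+/g) ?? []) {
        // No digits allowed.
        if (/[0-9]/.test(token)) continue;
        // No more than one dash.
        if (/-.*-/.test(token)) continue;
        // A dash, if present, must sit between lowercase letters.
        if (/-/.test(token) && !/[a-z]-[a-z]/.test(token)) continue;
        // No punctuation mark followed by another character.
        if (/[.,!]./.test(token)) continue;
        count += 1;
    }
    return count;
}

console.log(validTokenCounter("hello-world is 2nd best!"));   // 3 ("2nd" fails)
console.log(validTokenCounter("a-b-c -x y- wow!! end."));     // 1 (only "end." passes)
```

The regular expressions are character-for-character the same as the Perl ones; only the surrounding loop and the negated match (!~ versus !…test) differ.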

Full code on codeberg.
