RogerBW's Blog

Perl Weekly Challenge 90: Ethiopian DNA 10 December 2020

I’ve been doing the Perl Weekly Challenges. The latest involved base sequences and a binary multiplication system. (Note that this is open until 13 December 2020.)

TASK #1 › DNA Sequence

DNA is a long, chainlike molecule which has two strands twisted into a double helix. The two strands are made up of simpler molecules called nucleotides. Each nucleotide is composed of one of the four nitrogen-containing nucleobases cytosine (C), guanine (G), adenine (A) and thymine (T).

You are given DNA sequence, GTAAACCCCTTTTCATTTAGACAGATCGACTCCTTATCCATTCTCAGAGATGTGTTGCTGGTCGCCG.

Write a script to print nucleiobase count in the given DNA sequence. Also print the complementary sequence where Thymine (T) on one strand is always facing an adenine (A) and vice versa; guanine (G) is always facing a cytosine (C) and vice versa.

This is a bit odd. Usually we get at least a couple of test cases, rather than just the single query. And why return the base count when this is just the length of the string? Hey ho. Let's just do this the really easy way, with regexp substitution.

sub gs {
  my $bs=shift;
  (my $cs=$bs) =~ s/G/X/g;
  $cs =~ s/C/G/g;
  $cs =~ s/X/C/g;
  $cs =~ s/A/X/g;
  $cs =~ s/T/A/g;
  $cs =~ s/X/T/g;
  return [length($bs),$cs];
}

Similarly in Raku, only with slighly different syntax:

  $cs ~~ s:g/C/G/;

For the other languages I used hashes/dicts and maps. Python:

def gs(bs):
    l={'A': 'T','T': 'A','C': 'G','G': 'C'}
    os=''.join(l[i] for i in bs)
    return [len(os),os]

Ruby:

def gs(bs)
  l={'A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C'}
  os=bs.chars.map{|i| l[i]}.join('')
  return [os.length(),os]
end

Rust (which does have maps, but I have to do so much variable furkling that they'd just reduce readability):

fn gs(bs: String) -> (i32, String) {
    let mut l: HashMap<char, char>=HashMap::new();
    l.insert('C','G');

(etc.)

    let mut os=String::new();
    for i in bs.chars() {
        os.push(*l.get(&i).unwrap());
    }
    return (os.len() as i32,os);
}

TASK #2 › Ethiopian Multiplication

You are given two positive numbers $A and $B.

Write a script to demonstrate Ethiopian Multiplication using the given numbers.

The supplied link makes it clear that this is the "Russian Peasant" variant; halve A, double B, if A on that row is odd then add B to the total.. So what I'll produce will look like, for 13×238:

   13   238 ->   238
    6   476
    3   952 ->   952
    1  1904 ->  1904
               -----
                3094

I'm arbitrarily restricting numbers to 5 digits, and using the same algorithm for each language: generate each line, push it into a list, then print out the whole list at the end. Nothing terribly clever here. Here's the Raku version:

sub em($aa,$bb) {
  my ($a,$b)=($aa,$bb);
  my $s=0;
  my @demo;
  while ($a > 0) {
    my $line=sprintf('%5d %5d',$a,$b);
    if ($a +& 1 == 1) {
      $s+=$b;
      $line ~= sprintf(' -> %5d',$b);
    }
    $a +>= 1;
    $b +<= 1;
    push @demo,$line;
  }
  push @demo,'               -----';
  push @demo,sprintf('               %5d',$s);
  for @demo {
    say $_;
  }
  return $s;
}

Python and Rust use their own distinct formatting systems rather than standard sprintf. I hope it makes them feel terribly special.

Full code on github.


  1. Posted by Peter at 11:35am on 10 December 2020

    "TASK #1 › DNA Sequence" is much easier to do using e.g. tr[TAGC][ATCG] rather than your search and replace gymnastics. The hash/dict wheeze can be done in Perl too — I've needed it at times — using the /e modifier such as in something like my %lut=qw{ T A A T G C C G }; s/(.)/$lut{$1}/ge. I would expect the tr form to be orders of magnitude faster for this specific problem.

    A lot of these so-called challenges do seem to boil down to "have you read perlfunc recently?" rather than anything particularly interesting or insightful.

    In Rust, I'd use a simple pattern match to transform the characters (match c { 'T' => 'A', 'A' => 'T', etc }. LLVM is fairly likely to recognise it as a substitution of one byte value with another and vectorise it into magic SSE instructions. Doing it using a HashMap/BTreeMap will completely foil it and the performance will be much worse; possibly even more so than Perl tr.

  2. Posted by RogerBW at 11:58am on 10 December 2020

    Indeed, I came late to tr and it's not really part of my lexicon even now.

    Now that I've coaxed Rust into doing some of the things I want it to do, I should probably try to tackle the book again and make sense of why the syntax works the way it does.

  3. Posted by Peter at 09:50am on 17 December 2020

    tr is important to know about for the things it is suited for, i.e. mainly programming problems checking to see if you're aware of this particular shibboleth. It can be quite handy for a quick skim of a string to see if it contains a particular character because it returns the count of substitutions, although s///g also returns the match count. Actual character substitutions seem relatively rare in real-world code. I think I mainly only do it in horrendous shell pipelines when dealing with broken commands which want or don't want NUL-terminated lines, and a quick tr(1) shows them who's the boss.

    The only time I've needed to do a one-for-one character substitution in sensible code in the last year was when hacking on the internals of a web framework to do the equivalent of tr[ ][+] and vice-versa in Rust. This is actually surprisingly annoying to do in-place on an &mut str without allocating a new String and copying into it because of UTF-8. (The web framework was using &mut [u8] because it couldn't guarantee the input was valid UTF-8 so neatly side-stepped that problem.) I have a draft blog post about that whcich prompted by one of your earlier Perl Weekly Challenges which I really ought to publish.

    If you have any difficult Rust questions, you have my email address...

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.

Search
Archive
Tags 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s 2300ad 3d printing action advent of code aeronautics aikakirja anecdote animation anime army astronomy audio audio tech base commerce battletech bayern beer boardgaming book of the week bookmonth chain of command children chris chronicle church of no redeeming virtues cold war comedy computing contemporary cornish smuggler cosmic encounter coup covid-19 crime crystal cthulhu eternal cycling dead of winter doctor who documentary drama driving drone ecchi economics en garde espionage essen 2015 essen 2016 essen 2017 essen 2018 essen 2019 essen 2022 essen 2023 essen 2024 existential risk falklands war fandom fanfic fantasy feminism film firefly first world war flash point flight simulation food garmin drive gazebo genesys geocaching geodata gin gkp gurps gurps 101 gus harpoon historical history horror hugo 2014 hugo 2015 hugo 2016 hugo 2017 hugo 2018 hugo 2019 hugo 2020 hugo 2021 hugo 2022 hugo 2023 hugo 2024 hugo-nebula reread in brief avoid instrumented life javascript julian simpson julie enfield kickstarter kotlin learn to play leaving earth linux liquor lovecraftiana lua mecha men with beards mpd museum music mystery naval noir non-fiction one for the brow opera parody paul temple perl perl weekly challenge photography podcast politics postscript powers prediction privacy project woolsack pyracantha python quantum rail raku ranting raspberry pi reading reading boardgames social real life restaurant reviews romance rpg a day rpgs ruby rust scala science fiction scythe second world war security shipwreck simutrans smartphone south atlantic war squaddies stationery steampunk stuarts suburbia superheroes suspense television the resistance the weekly challenge thirsty meeples thriller tin soldier torg toys trailers travel type 26 type 31 type 45 vietnam war war wargaming weather wives and sweethearts writing about writing x-wing young adult
Special All book reviews, All film reviews
Produced by aikakirja v0.1