RogerBW's Blog

The Weekly Challenge 165: Scaling the Fits 19 May 2022

I’ve been doing the Weekly Challenges. The latest involved generating SVG and fitting lines to points. (Note that this is open until 22 May 2022.)

Task 1: Scalable Vector Graphics (SVG)

Your task is to accept a series of points and lines in the following format, one per line, in arbitrary order:

For anything more than these small examples I'd have used a dedicated XML generator (I've used Perl's SVG library before but found it too restrictive, while XML::LibXML handles the fiddly stuff while still letting me tweak details, and doesn't simply force me to learn a different version of the same syntax). But since I don't have to worry about escaping characters in this case, and since I'll have to write the raw code anyway for languages that don't have an XML library…

I've borrowed the styles from the task 2 example.

Raku:

my @points;
my @lines;

my @x;
my @y;

Read the format, split on commas; we should have either two or four numbers. Put all the X and Y coordinates into lists of their own, which I'll use for bounds calculation, and the lines and points into lists of lines and points.

for lines() {
    .chomp;
    my @f=split(",",$_,:skip-empty);
    for 0..@f.list -> $i {
        if ($i % 2 == 0) {
            push @x,@f[$i];
        } else {
            push @y,@f[$i];
        }
    }
    if (@f.elems == 4) {
        push @lines,@f;
    } elsif (@f.elems == 2) {
        push @points,@f;
    } else {
        die "Invalid input $_\n";
    }
}

Bounds calculation. Minimum and maximum X and Y values.

my $mnx=min(@x);
my $mxx=max(@x);

my $mny=min(@y);
my $mxy=max(@y);

My actual bounds go outside these by 10%.

my @lo=(
  $mnx-($mxx-$mnx)/10,
  $mny-($mxy-$mny)/10,
    );
my @hi=(
  $mxx+($mxx-$mnx)/10,
  $mxy+($mxy-$mny)/10,
    );

Which gives me overall width and height.

my $w=(@hi[0]-@lo[0]);
my $h=(@hi[1]-@lo[1]);

At this point I can emit the preamble, which includes width and height, and I'm setting up a viewbox as well. (So the "page" will show everything I'm plotting.)

say '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>';
say '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">';
say "<svg width=\"$w\" height=\"$h\" viewBox=\"@lo[0] @lo[1] $w $h\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:svg=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">";

Plot the lines.

if (@lines) {
  say "  <g id=\"lines\" stroke=\"#369\" stroke-width=\"4\">";
  for @lines -> @p {
    say "    <line x1=\"@p[0]\" y1=\"@p[1]\" x2=\"@p[2]\" y2=\"@p[3]\" />";
  }
  say "  </g>";
}

Plot the points (because a line can obscure a point more easily than a point can obscure a line).

if (@points) {
  say "  <g fill=\"#f73\" id=\"points\">";
  for @points -> @p {
    say "    <circle cx=\"@p[0]\" cy=\"@p[1]\" r=\"3\" />";
  }
  say "  </g>";
}

say "</svg>";

Task 2: Line of Best Fit

When you have a scatter plot of points, a line of best fit is the line that best describes the relationship between the points, and is very useful in statistics. Otherwise known as linear regression, here is an example of what such a line might look like:

The method most often used is known as the least squares method, as it is straightforward and efficient, but you may use any method that generates the correct result.

Calculate the line of best fit for the following 48 points:

I produce this on standard output, which can then be fed to the standard input of task 1.

Given this explicit invitation to try non-standard line fits, I decided to implement the Theil-Sen Estimator: the slope of the best-fit line is the median of the slopes of each pair of points, while the offset is the median of the deviations from that line.

I used the naïve method of calculating the median; Hoare gives us the "quickselect" algorithm which works by partitioning the set, but that seemed like too much work to implement.

Lua:

function median(s0)
   local s = s0
   table.sort(s)
   return s[math.floor(#s/2)+1]
end

I'll want to store all the points.

local points = {}

Read the lines, break them down, and get the numbers. (This varies a lot between languages; this input is in a nasty format, particularly considering the multiple spaces.)

while true do
   local line = io.read()
   if line == nil then break end
   local f = {}
   for x,y in string.gmatch(line,'(%d+),(%d+)') do
      print(string.format('%d,%d',x,y))
      table.insert(points,{x,y})
   end
end

Calculate the slope between each pair of points (that don't have identical X-values).

local slope = {}
for i = 1,#points-1 do
   for j = i+1,#points do
      if points[i][1] ~= points[j][1] then
         table.insert(slope,
                      (points[j][2]-points[i][2])/(points[j][1]-points[i][1])
         )
      end
   end
end

local m = median(slope)

Calculate offsets to get the Y-intercept. (Also populate x that I'll use later.)

local offset = {}
local x = {}
for i,p in pairs(points) do
   table.insert(offset,p[2] - m * p[1])
   table.insert(x,p[1]+0)
end
local c = median(offset)

m and c are the actual useful values, but we want to plot a graph with them. For minimum and maximum X values, calculate Y values, and output the set as line coordinates for the plotter.

local l = {}

table.sort(x)

for i,xb in pairs({x[1],x[#x]}) do
   table.insert(l,xb)
   table.insert(l,xb * m + c)
end

print(string.format('%f,%f,%f,%f',table.unpack(l)))

Full code on github.

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.

Search
Archive
Tags 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s 2300ad 3d printing action advent of code aeronautics aikakirja anecdote animation anime army astronomy audio audio tech base commerce battletech bayern beer boardgaming book of the week bookmonth chain of command children chris chronicle church of no redeeming virtues cold war comedy computing contemporary cornish smuggler cosmic encounter coup covid-19 crime crystal cthulhu eternal cycling dead of winter doctor who documentary drama driving drone ecchi economics en garde espionage essen 2015 essen 2016 essen 2017 essen 2018 essen 2019 essen 2022 essen 2023 essen 2024 existential risk falklands war fandom fanfic fantasy feminism film firefly first world war flash point flight simulation food garmin drive gazebo genesys geocaching geodata gin gkp gurps gurps 101 gus harpoon historical history horror hugo 2014 hugo 2015 hugo 2016 hugo 2017 hugo 2018 hugo 2019 hugo 2020 hugo 2021 hugo 2022 hugo 2023 hugo 2024 hugo-nebula reread in brief avoid instrumented life javascript julian simpson julie enfield kickstarter kotlin learn to play leaving earth linux liquor lovecraftiana lua mecha men with beards mpd museum music mystery naval noir non-fiction one for the brow opera parody paul temple perl perl weekly challenge photography podcast politics postscript powers prediction privacy project woolsack pyracantha python quantum rail raku ranting raspberry pi reading reading boardgames social real life restaurant reviews romance rpg a day rpgs ruby rust scala science fiction scythe second world war security shipwreck simutrans smartphone south atlantic war squaddies stationery steampunk stuarts suburbia superheroes suspense television the resistance the weekly challenge thirsty meeples thriller tin soldier torg toys trailers travel type 26 type 31 type 45 vietnam war war wargaming weather wives and sweethearts writing about writing x-wing young adult
Special All book reviews, All film reviews
Produced by aikakirja v0.1