# RogerBW's Blog

The Weekly Challenge 165: Scaling the Fits 19 May 2022

I’ve been doing the Weekly Challenges. The latest involved generating SVG and fitting lines to points. (Note that this is open until 22 May 2022.)

Task 1: Scalable Vector Graphics (SVG)

Your task is to accept a series of points and lines in the following format, one per line, in arbitrary order:

For anything more than these small examples I'd have used a dedicated XML generator (I've used Perl's `SVG` library before but found it too restrictive, while `XML::LibXML` handles the fiddly stuff while still letting me tweak details, and doesn't simply force me to learn a different version of the same syntax). But since I don't have to worry about escaping characters in this case, and since I'll have to write the raw code anyway for languages that don't have an XML library…

I've borrowed the styles from the task 2 example.

Raku:

``````my @points;
my @lines;

my @x;
my @y;
``````

Read the format, split on commas; we should have either two or four numbers. Put all the X and Y coordinates into lists of their own, which I'll use for bounds calculation, and the lines and points into lists of lines and points.

``````for lines() {
.chomp;
my @f=split(",",\$_,:skip-empty);
for 0..@f.list -> \$i {
if (\$i % 2 == 0) {
push @x,@f[\$i];
} else {
push @y,@f[\$i];
}
}
if (@f.elems == 4) {
push @lines,@f;
} elsif (@f.elems == 2) {
push @points,@f;
} else {
die "Invalid input \$_\n";
}
}
``````

Bounds calculation. Minimum and maximum X and Y values.

``````my \$mnx=min(@x);
my \$mxx=max(@x);

my \$mny=min(@y);
my \$mxy=max(@y);
``````

My actual bounds go outside these by 10%.

``````my @lo=(
\$mnx-(\$mxx-\$mnx)/10,
\$mny-(\$mxy-\$mny)/10,
);
my @hi=(
\$mxx+(\$mxx-\$mnx)/10,
\$mxy+(\$mxy-\$mny)/10,
);
``````

Which gives me overall width and height.

``````my \$w=(@hi-@lo);
my \$h=(@hi-@lo);
``````

At this point I can emit the preamble, which includes width and height, and I'm setting up a viewbox as well. (So the "page" will show everything I'm plotting.)

``````say '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>';
say '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">';
``````

Plot the lines.

``````if (@lines) {
say "  <g id=\"lines\" stroke=\"#369\" stroke-width=\"4\">";
for @lines -> @p {
say "    <line x1=\"@p\" y1=\"@p\" x2=\"@p\" y2=\"@p\" />";
}
say "  </g>";
}
``````

Plot the points (because a line can obscure a point more easily than a point can obscure a line).

``````if (@points) {
say "  <g fill=\"#f73\" id=\"points\">";
for @points -> @p {
say "    <circle cx=\"@p\" cy=\"@p\" r=\"3\" />";
}
say "  </g>";
}

say "</svg>";
``````

Task 2: Line of Best Fit

When you have a scatter plot of points, a line of best fit is the line that best describes the relationship between the points, and is very useful in statistics. Otherwise known as linear regression, here is an example of what such a line might look like:

The method most often used is known as the least squares method, as it is straightforward and efficient, but you may use any method that generates the correct result.

Calculate the line of best fit for the following 48 points:

I produce this on standard output, which can then be fed to the standard input of task 1.

Given this explicit invitation to try non-standard line fits, I decided to implement the Theil-Sen Estimator: the slope of the best-fit line is the median of the slopes of each pair of points, while the offset is the median of the deviations from that line.

I used the naïve method of calculating the median; Hoare gives us the "quickselect" algorithm which works by partitioning the set, but that seemed like too much work to implement.

Lua:

``````function median(s0)
local s = s0
table.sort(s)
return s[math.floor(#s/2)+1]
end
``````

I'll want to store all the points.

``````local points = {}
``````

Read the lines, break them down, and get the numbers. (This varies a lot between languages; this input is in a nasty format, particularly considering the multiple spaces.)

``````while true do
if line == nil then break end
local f = {}
for x,y in string.gmatch(line,'(%d+),(%d+)') do
print(string.format('%d,%d',x,y))
table.insert(points,{x,y})
end
end
``````

Calculate the slope between each pair of points (that don't have identical X-values).

``````local slope = {}
for i = 1,#points-1 do
for j = i+1,#points do
if points[i] ~= points[j] then
table.insert(slope,
(points[j]-points[i])/(points[j]-points[i])
)
end
end
end

local m = median(slope)
``````

Calculate offsets to get the Y-intercept. (Also populate `x` that I'll use later.)

``````local offset = {}
local x = {}
for i,p in pairs(points) do
table.insert(offset,p - m * p)
table.insert(x,p+0)
end
local c = median(offset)
``````

`m` and `c` are the actual useful values, but we want to plot a graph with them. For minimum and maximum X values, calculate Y values, and output the set as line coordinates for the plotter.

``````local l = {}

table.sort(x)

for i,xb in pairs({x,x[#x]}) do
table.insert(l,xb)
table.insert(l,xb * m + c)
end

print(string.format('%f,%f,%f,%f',table.unpack(l)))
``````

Full code on github.