I've been doing the
Perl Weekly Challenges. This one
dealt with e, and URLs again.
The first challenge was to calculate e. There's a well-known
expansion, so I used it.
use Math::BigFloat try => 'GMP';
my $a=0;
my $b=Math::BigFloat->new(1);
my $e=Math::BigFloat->new(1);
while (1) {
$a++;
$b/=$a;
$e+=$b;
print "$e\n";
}
Perl6 has arbitrary precision built in, with a ratio type to retain
precision when the numerator and denominator become too different:
my $a=0;
my $b=FatRat.new(1,1);
my $e=Rat.new(1);
while (1) {
$a++;
$b/=$a;
$e+=$b;
print "$e\n";
}
I note that it runs distinctly more slowly than the Perl5 version, not
surprising since it's not using the ferociously-optimised GMP library.
The other challenge was to apply URI canonicalisation, which meant
revisiting challenge 17 (and slightly improving the parsing code
there). I didn't have time to Perl-6-ify this one. I still haven't
found a formal definition of what makes a valid URI and what doesn't;
hey ho.
my $u=urlparse($url);
I wasn't going to write my own un-escaper, so:
use URI::Escape;
foreach my $mode (keys %{$u}) {
if (exists $u->{$mode}) {
$u->{$mode}=uri_unescape($u->{$mode});
}
}
Make scheme and host lower case.
foreach my $mode (qw(scheme host)) {
if (exists $u->{$mode}) {
$u->{$mode}=lc($u->{$mode});
}
}
Remove default ports (I'm sure this isn't the full set, but the first
two cover the vast majority of ports in URIs anyway, and don't ask me
how ftp on non-standard ports works because I don't know and don't
care).
if (exists $u->{port} && exists $u->{scheme}) {
if (my $dp={http => 80,
https => 443,
ftp => 21,
smtp => 25,
telnet => 22,
ldap => 389,
ldaps => 686,
}->{$u->{scheme}}) {
if ($dp==$u->{port}) {
delete $u->{port};
}
}
}
print urlassemble($u),"\n";
The parser has had some minor upgrades.
sub urlparse {
my ($url)=@_;
my %match;
if ($url =~ m!//!) {
$url =~ m!^(?<scheme>.*?)://(?:(?:(?<userinfo>.*)@)?(?<host>[-._a-z0-9]+)(?::(?<port>[0-9]+))?)?(?<pqf>.*)!i;
map {$match{$_}=$+{$_}} keys %+;
} else { # if no userinfo-host-port component, split on the last colon
$url =~ m!^(?<scheme>.*):(?<pqf>[^:]*)!;
map {$match{$_}=$+{$_}} keys %+;
}
$match{pqf} =~ m!(?<path>[^?#]*)(?:\?(?<query>[^#]*))?(?:\#(?<fragment>.*))?$!;
map {$match{$_}=$+{$_}} keys %+;
delete $match{pqf};
return \%match;
}
And the assembler simply puts the relevant punctuation back in.
sub urlassemble {
my $u=shift;
my $out=$u->{scheme}.':';
if (exists $u->{host}) {
$out.='//';
if (exists $u->{userinfo}) {
$out.=$u->{userinfo}.'@';
}
$out.=$u->{host};
if (exists $u->{port}) {
$out.=':'.$u->{port};
}
}
$out.=$u->{path};
if (exists $u->{query}) {
$out.='?'.$u->{query};
}
if (exists $u->{fragment}) {
$out.='#'.$u->{fragment};
}
return $out;
}
Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.