I’ve been doing the Perl Weekly
Challenges (I missed 31 because I was getting ready for Essen, and didn’t have time to do this one in
Perl6). This week’s was about counting entities and generating ASCII
bar charts.
Create a script that either reads standard input or one or more
files specified on the command-line. Count the number of times [each
item occurs] and then print a summary, sorted by the count of each
entry.
For extra credit, add a -csv option to your script, which would generate:
Those of us who speak Unix recognise this as the extremely useful
formulation |sort|uniq -c|sort -nr, which I use often enough that I
can type it as though it were a long and familiar word. (Sort the
lines, count how often each one occurs, sort that list numerically in
descending order.)
But in Perl the most obvious approach is to build a hash keyed on the
lines, so we do:
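use strict;
use warnings;
use Getopt::Std;
use Text::CSV_XS;

# -c selects CSV output
my %o;
getopts('c', \%o);

# Count each distinct line; <> reads the files named on the
# command line, or standard input if there are none.
my %s;
while (<>) {
    chomp;
    $s{$_}++;
}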
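An aside on option parsing: the challenge spells the option -csv, but getopts('c') reads -csv as the bundle -c -s -v, so it sets $o{c} and then warns about the unknown s and v. If the literal spelling matters, here is a minimal sketch using the core Getopt::Long module instead (the $want_csv name is hypothetical). It accepts both -csv and --csv, because Getopt::Long allows a single dash on long options unless bundling is configured.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical flag variable: set to 1 when -csv or --csv is given.
my $want_csv = 0;
GetOptions('csv' => \$want_csv)
    or die "could not parse options\n";

# GetOptions removes the options it recognised from @ARGV, so any
# file names left there are what a following while (<>) loop reads.
```

The counting loop and output code would stay as they are, testing $want_csv where they previously tested $o{c}.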
Then sort the keys by their counts, descending (breaking ties alphabetically), print them, and all is done.
# binary => 1 allows non-ASCII and other binary characters in fields
my $csv = Text::CSV_XS->new({ binary => 1 });
foreach my $k (sort { $s{$b} <=> $s{$a} || $a cmp $b } keys %s) {
    if ($o{c}) {
        $csv->say(\*STDOUT, [$k, $s{$k}]);
    } else {
        print "$k $s{$k}\n";
    }
}
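The sort block chains two comparisons with ||. Because <=> returns 0 when two counts are equal, cmp only decides between entries that are tied on count. Tied entries therefore come out in alphabetical order, rather than in whatever order the hash happens to produce. Here is a minimal sketch that pulls the counting and sorting into a hypothetical sub, count_entries, so the ordering can be checked against a fixed list:

```perl
use strict;
use warnings;

# Hypothetical helper: count each distinct item in @_ and return
# [item, count] pairs, highest count first, ties broken alphabetically.
sub count_entries {
    my %s;
    $s{$_}++ for @_;
    return map { [$_, $s{$_}] }
        sort { $s{$b} <=> $s{$a} || $a cmp $b } keys %s;
}

# prints "apple 3", "cherry 2", "banana 1", one per line
for my $pair (count_entries(qw(apple banana apple cherry cherry apple))) {
    print "$pair->[0] $pair->[1]\n";
}
```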
The use of Text::CSV_XS is possibly a heavier-weight approach than
this problem really requires, but I’ve been bitten by the vagaries of
CSV “standard” formatting before. If someone had asked me to do this
for a real problem, I’d use the module so that when their specific
requirements for CSV files turned out to be subtly different from what
I’d produced, I could just tweak the module parameters rather than
re-invent things from scratch.
(It's entirely standard until you need to include a comma within a
data field. Or a quotation mark of some sort. Or a non-ASCII
character. Or transfer files between Unix and the outside world. Or…)
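To give a flavour of what the module takes care of, here is a minimal sketch of the common RFC 4180 quoting rule, written as a hypothetical csv_field helper. A field is wrapped in double quotes if it contains a comma, a double quote or a line break, and any embedded double quote is doubled. The sketch deliberately ignores encodings, line endings and every other dialect question, which is exactly where Text::CSV_XS earns its keep.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a single field following RFC 4180.
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[",\r\n]/;   # nothing special: leave as-is
    (my $quoted = $field) =~ s/"/""/g;           # double any embedded quotes
    return qq{"$quoted"};
}

# Hypothetical helper: quote each field as needed and join them into one record.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

print csv_line('apple', 3);          # apple,3
print csv_line('Smith, John', 1);    # "Smith, John",1
print csv_line('say "hi"', 2);       # "say ""hi""",2
```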
Write a function that takes a hashref where the keys are labels and
the values are integer or floating point values. Generate a bar
graph of the data and display it to stdout.
If you fancy then please try this as well: (a) the function could
let you specify whether the chart should be ordered by (1) the
labels, or (2) the values.
I know that NeilB, who contributed these questions, maintains a
module to produce tabular
output…
Terminal width is always a slightly fiddly thing, so I allow the
caller to specify it (falling back to $COLUMNS, then to 80); then I scale the largest bar to the full width
of the terminal (minus the width of the longest label, and the
decoration), and the others grow or shrink accordingly. Yes, there’s a
bug here if the allowed width is too narrow for the labels and
decoration; and this function doesn’t allow for negative values
either. The third parameter should be non-zero if you want ordering by
labels.
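As a worked example of that scaling, take a terminal 30 columns wide and the counts apple 3, cherry 2, banana 1. The longest label is 6 characters and the " | " decoration takes 3, which leaves 30 - 6 - 3 = 21 columns for the bars. The scale is therefore 21 / 3 = 7, and the bars are 21, 14 and 7 characters long. The following sketch reproduces that arithmetic in a hypothetical bar_lengths helper, which mirrors the calculation in the function below but returns the lengths instead of printing them:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: how many '#' characters each label's bar gets,
# using the same arithmetic as generate_bar_graph.
sub bar_lengths {
    my ($data, $width) = @_;
    my $label_width = max(map { length($_) } keys %$data);
    my $bar_width   = $width - $label_width - 3;      # 3 = the ' | ' decoration
    my $scale       = $bar_width / max(values %$data);
    # int() truncates, as the 'x' repetition operator does with a fractional count
    my %len = map { ($_ => int($scale * $data->{$_})) } keys %$data;
    return \%len;
}

my $lengths = bar_lengths({ apple => 3, cherry => 2, banana => 1 }, 30);
printf "%-6s %2d\n", $_, $lengths->{$_} for sort keys %$lengths;
```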
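use List::Util qw(max);

# Print a horizontal bar chart of %$data to STDOUT.
#   $data          - hashref of label => value (values must be non-negative)
#   $width         - total line width; defaults to $COLUMNS, then 80
#   $labelordering - true to sort by label, false to sort by value
sub generate_bar_graph {
    my $data = shift;
    my $width = shift || $ENV{COLUMNS} || 80;
    my $labelordering = shift || 0;   # '||', not 'or': 'or' binds looser than '='
    my @k = keys %{$data};
    if ($labelordering) {
        @k = sort @k;
    } else {
        # largest value first; ties broken by label so output is repeatable
        @k = sort { $data->{$b} <=> $data->{$a} || $a cmp $b } @k;
    }
    my $kl = max(map { length($_) } @k);       # longest label
    my $bw = $width - $kl - 3;                 # room left after label and ' | '
    my $scale = $bw / max(values %{$data});    # largest value fills $bw columns
    my $format = '%-' . $kl . 's | %-' . $bw . "s\n";
    foreach my $k (@k) {
        printf($format, $k, '#' x ($scale * $data->{$k}));
    }
}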