Dustbin Day, iCalendar, and PhantomJS
18 April 2023

I wanted to get dustbin collection days into the house calendar server. Shouldn't be too hard, right?

It's not quite as simple as "recyclables week A, main rubbish week B", because collections get deferred for bank holidays (especially around Christmas), and sometimes collections get cancelled completely (as last summer, when the council refused to pay extra money to the contracting company, I mean "had a labour shortage").

The local council provides this information in various ways. It used to put a card through the door a couple of times a year, and sometimes it still does; a PDF version of that is made available, but generally the new one isn't released (electronically or physically) until after the old one has expired. And of course that requires me to type in all the exceptions by hand, and doesn't get updated for emergencies.

But help is at hand! They have a web page on which you can specify your address, and get back the next collection for each sort of rubbish. Not much in the way of advance notice, but they do actually keep it up to date for extra bank holidays and such like. So I can just scrape that and parse the page, right? Right?


If you are me, you already know your house's UPRN (Unique Property Reference Number), which of course is what they (quite reasonably) use as an input to the lookup. But you can't just submit that. Or even type in an address. Or even bookmark the results page. No, you have to go in through their postcode lookup. Which needs JavaScript, so that's rather beyond what poor old WWW::Mechanize can manage. (Somewhere behind all this there's a straightforward API call, but I wasn't able to get it to respond to my prodding any more simply than by going through the pages; the necessary parameters are put together by the JavaScript, and even replaying a request captured in the browser didn't work reliably.)

This calls, in fact, for a headless browser. Selenium is the canonical answer to this problem, but that needs a great big Java daemon – and Java in general doesn't have the best of security reputations, nor what one might call a small footprint. So instead I ended up using PhantomJS – officially a dead project, but it still works, it's in Debian/stable, and it's much more lightweight.

This is basically a central lump of code with tentacles. To the user it presents itself as a JavaScript interpreter; to the web it runs a WebKit browser. One directs it with JavaScript, which I've been learning since last year, and one can also mark code to be run inside the context of the loaded page (via page.evaluate()).

So the procedure ends up being:

  • load the first page
  • enter my postcode
  • click on the lookup
  • wait
  • check the dropdown for my address
  • select it, and trigger a "change" event on the dropdown
  • wait
  • submit the form
  • wait
  • get back the results page, and parse it for the dates
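
As a rough illustration of the "do something, wait, do the next thing" shape of that list (this is not the actual script, and the helper name runSteps and its options are invented for the example), one way to arrange it is a small step runner that polls a readiness check before each action. It is written in plain ES5 so that it should run under PhantomJS as well as anything newer:

```javascript
// Hypothetical helper, not the author's code: run steps in order, polling
// each step's optional `ready` check before running its `action`.
function runSteps(steps, options) {
  options = options || {};
  // The scheduler is injectable so the logic can be exercised without timers.
  var schedule = options.schedule || function (fn, ms) { setTimeout(fn, ms); };
  var interval = options.interval || 250; // ms between polls
  var maxPolls = options.maxPolls || 40;  // give up after about 10 s by default
  var done = options.done || function () {};
  var index = 0;
  var polls = 0;

  function tick() {
    if (index >= steps.length) {
      done(null); // every step has run
      return;
    }
    var step = steps[index];
    if (!step.ready || step.ready()) {
      step.action();
      index += 1;
      polls = 0;
      schedule(tick, 0);
    } else if (polls >= maxPolls) {
      done(new Error('gave up waiting before step: ' + step.name));
    } else {
      polls += 1;
      schedule(tick, interval);
    }
  }

  tick();
}
```

Under PhantomJS each `action` and `ready` would most likely wrap a `page.evaluate()` call: for instance a `ready` that reports whether the address dropdown has been populated yet, followed by an `action` that selects the right option and fires the "change" event. The selectors involved depend entirely on the council's page, so none are shown here.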

In-browser JavaScript has useful methods like document.getElementsByTagName() so I do the final HTML parsing there, and dump JSON onto stdout for a calmanager plugin to pick up and update my iCalendar server. (That does things like lumping multiple collections together into a single calendar entry, and making the actual diary event go off on the previous evening to remind me to put the bins out on the night before what might be an early morning pickup.)
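
The lumping-together and evening-before logic can be sketched roughly as follows. This is an illustration rather than the calmanager plugin itself: the input shape (a `type` plus an ISO `date` per collection), the function name toReminders, the "Bins out" wording and the 19:00 default are all assumptions made for the example.

```javascript
// Hypothetical sketch: merge collections that share a date into one entry,
// and place the reminder on the evening before the collection.
function toReminders(collections, reminderHour) {
  var hour = reminderHour === undefined ? 19 : reminderHour;
  var byDate = {};

  // Group the collection types by their date string.
  collections.forEach(function (c) {
    (byDate[c.date] = byDate[c.date] || []).push(c.type);
  });

  return Object.keys(byDate).sort().map(function (date) {
    var parts = date.split('-').map(Number);
    // "Day minus one" rolls back correctly across month and year boundaries.
    // The arithmetic is done in UTC only to avoid daylight-saving surprises;
    // the result is meant to be read as a floating local time.
    var evening = new Date(Date.UTC(parts[0], parts[1] - 1, parts[2] - 1, hour));
    return {
      collectionDate: date,
      summary: 'Bins out: ' + byDate[date].sort().join(', '),
      remindAt: evening.toISOString().slice(0, 16)
    };
  });
}
```

Fed the JSON from the scraper, something like this would yield one entry per collection day, each ready to be turned into a calendar event by whatever talks to the iCalendar server.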

I'm not planning to make this code public, but if you have a use for it, let me know.

I wonder how much the council paid for this overcomplicated setup?

