RogerBW's Blog

Finding Tesco 18 April 2018

Well, my wife wanted a sandwich while we were on the road. And unlike me she is particular about what she likes to eat, so she wanted to find branches of Tesco. This is a job for: COMPUTERS!

Now, of course, with a modern navi one can tell it to "find Tesco" and get a list of branches sorted by distance. But I wanted something a bit more subtle than that, a list of branches close to motorway junctions; I'd rather go thirty miles down the motorway where I'm going anyway, and then one mile on local roads to a shop, than ten miles on local roads from where I am now.

So this requires two sets of data: all branches of Tesco, and all motorway junctions. And it requires data that are freely available, because I'm not going to pay anything for this hobby project.

For some reason, British supermarkets are reluctant to publish a list of shops and locations. Aldi used to a few years ago, in a variety of formats for different GPS receivers, but they've removed the download now. The supermarkets obviously have the information, because they all have a "store finder" on their web sites which searches based on distance from a particular point, but they refuse to let anyone else use the data.

Fortunately there's another source of this information, the Food Standards Agency, data available under Open Government Licence. This is the food hygiene rating information for every business that sells food… in theory. Not all local authorities participate, and the data are clearly entered by hand as the fields are not used consistently. But you can generally get an identifiable "Tesco" out of the name field, and mostly a latitude and longitude. So that's the first half of the problem solved, with a half-gig download. There's more information here too, which I may do something with one day.

<EstablishmentDetail>
  <FHRSID>564486</FHRSID>
  <LocalAuthorityBusinessID>13/00441/MIXED</LocalAuthorityBusinessID>
  <BusinessName>Tesco</BusinessName>
  <BusinessType>Retailers - supermarkets/hypermarkets</BusinessType>
  <BusinessTypeID>7840</BusinessTypeID>
  <AddressLine1>19 Denmark Street</AddressLine1>
  <AddressLine2>Eden</AddressLine2>
  <AddressLine3>High Wycombe</AddressLine3>
  <AddressLine4>Buckinghamshire</AddressLine4>
  <PostCode>HP11 2DB</PostCode>
  <RatingValue>5</RatingValue>
  <RatingKey>fhrs_5_en-GB</RatingKey>
  <RatingDate>2017-02-16</RatingDate>
  <LocalAuthorityCode>025</LocalAuthorityCode>
  <LocalAuthorityName>Wycombe</LocalAuthorityName>
  <LocalAuthorityWebSite>http://www.wycombe.gov.uk</LocalAuthorityWebSite>
  <LocalAuthorityEmailAddress>Food.Ratings@wycombe.gov.uk</LocalAuthorityEmailAddress>
  <Scores>
    <Hygiene>0</Hygiene>
    <Structural>5</Structural>
    <ConfidenceInManagement>0</ConfidenceInManagement>
  </Scores>
  <SchemeType>FHRS</SchemeType>
  <NewRatingPending>False</NewRatingPending>
  <Geocode>
    <Longitude>-0.75622500000000</Longitude>
    <Latitude>51.63081500000000</Latitude>
  </Geocode>
</EstablishmentDetail>

The other half of the problem is motorway junctions, and that gets more fiddly. There's a page for each motorway on Wikipedia, but scraping out the coordinates from the manually-edited tables would be hard work. The best bet seems to be OpenStreetMap, but the data are split across multiple places. There are highway=motorway_junction nodes, in a format that's pretty much self-documenting:

<node id="212531" lat="51.6024835" lon="-0.6885638" version="11" \
 timestamp="2013-10-18T13:16:23Z" changeset="18419340" uid="24119" user="Mauls">
  <tag k="ref" v="3"/>
  <tag k="name" v="Knaves Beech Interchange"/>
  <tag k="exit_to" v="High Wycombe (E)  A40"/>
  <tag k="highway" v="motorway_junction"/>
</node>

but that's missing a key thing: which motorway it's on. ("ref" is the junction number, though sometimes there's an initial "J" and sometimes there isn't.) What I end up doing is to look at the way structures tagged as highway=motorway (each way representing a road segment and containing pointers to a series of nodes) to see which ones contain that junction node, and pull out their "ref" fields (the road name). That specific node 212531 appears in two _way_s, of which this is one (the connoisseur will readily spot that this is the carriageway leading away from the junction; the other way is the one that leads to it). In this case both the _way_s are called "M40", so this is an easy one.

<way id="147738900" version="7" timestamp="2017-05-17T13:13:46Z"\
 changeset="48762781" uid="9065" user="brianboru">
  <nd ref="212531"/>
  <nd ref="1609319453"/>
  <nd ref="1609319447"/>
  <nd ref="1609319603"/>
  <nd ref="1609319645"/>
  <nd ref="1609319643"/>
  <tag k="lit" v="yes"/>
  <tag k="ref" v="M40"/>
  <tag k="lanes" v="3"/>
  <tag k="oneway" v="yes"/>
  <tag k="source" v="survey"/>
  <tag k="bicycle" v="no"/>
  <tag k="highway" v="motorway"/>
  <tag k="old_ref" v="A40(M)"/>
  <tag k="maxspeed" v="70 mph"/>
  <tag k="operator" v="Highways England"/>
  <tag k="project_name" v="High Wycombe Bypass"/>
  <tag k="carriageway_ref" v="A"/>
  <tag k="source:maxspeed" v="survey"/>
  <tag k="highways_england:area" v="DBFO30"/>
</way>

Other data aren't as convenient. The interchange between M42 and M40 is junction 3A of the M42, but there's no way I can see to pin that down from the data; lacking clear information, I have to represent this one as "M40/M42 J3A".

And of course junctions may be duplicated. There's another M40 junction 3, the eastbound carriageway. Should one duplicate the entry? Average the coordinates? That's what I'm doing now, but motorway names are ambiguous; there's an M1 in Northern Ireland, and I don't want to average M1 J8 (Hemel Hempstead) with M1 J8 (Dublin and the South)! The easiest solution to this is to trim out Northern Ireland from the data and then average within each junction ID.

There are even separate entries for M40 J1A and M25 J16, even though they're the same junction. Oh dear. I'll shut my eyes and pretend I didn't see that, because fixing it requires detailed knowledge of the road system, an arbitrary decision that junctions closer than a certain distance apart should be lumped together, or more cleverness than I feel like deploying right now.

Anyway, command-line tools will allow one to cut down a full OpenStreetMap dump (66GB compressed) to just the UK (in an intermediate binary format, 1.8GB uncompressed). This took roughly eleven hours.

bzcat planet.osm.bz2 | osmconvert --out-o5m -B=greatbritain.bpl - >gb.o5m

Side note: I wrote my own crude boundary polygon for Great Britain since extracting the high-resolution one from the OSM data seemed too much like hard work, and using a lat/long rectangle includes some of Northern Ireland.

greatbritain
1
    -7.0578307250976557    49.844452298526114
    -5.0583190063476557    53.74164947881156
    -3.6520690063476557    53.9231890869385
    -3.9926451782226557    54.60015730382969
    -5.2560729125976557    54.568324126095987
    -6.2173766235351557    55.416002034911862
    -7.1622008422851557    55.877857131066307
    -8.1289977172851557    56.803363579857283
    -7.7554625610351557    58.473094360179466
    -4.1299742797851557    58.975006706247314
    -1.3614195922851557    61.124211199993404
    -0.13095084228515574   60.804277937012856
    -3.1851500610351557    58.068656768869701
    -1.6086119750976557    57.841319481594326
    -1.3888854125976557    57.512360074410402
    -0.83407584228515574   54.796968330785219
    0.20962532958984426    54.219727026703261
    1.8575745483398443     52.928199365400012
    2.0827942749023443     52.662478985972136
    1.8850403686523443     51.963989220286713
    1.2643128295898443     51.576476578419985
    1.8520813842773443     51.199401968747665
    1.0445862670898443     50.71150872070961
    -5.7312316040039057    49.816105201818296
END

All right, I probably didn't need to include the Shetlands.

The next step is to filter out the relevant information, and bring it back into standard OSM XML form.

osmfilter gb.o5m --keep=highway=motorway >motorway.osm

This takes less than a minute, and produces a manageably small file of only a few megabytes.

Then it's time for Perl and LibXML. This is a multi-step process (and the code is ugly, but I plan to release it once I've cleaned it up a bit):

  1. Find all highway=motorway ways; find the motorway names for them, and take the node list for each. Build a hash mapping node IDs to motorway names. (70,635 entries in the data set I used.)

  2. Look at all highway=motorway_junction nodes. Use the hash to find which motorway(s) they're on, and construct a user-friendly name (e.g. "M40 J3"). For each node, store this with a set of lat/long coordinates, averaging coordinate sets that have the same name attached. (706 distinct junctions. Should be fewer.)

  3. Look at the FSA data, pulling out supermarkets that match "Tesco" in the name and have lat/long data; for some it's simply missing. (2,249 of these. Officially Tesco has 3,493 branches in the UK. But for comparison, the number of shops tagged in OpenStreetMap as shop=supermarket and having either operator=Tesco or a name matching "Tesco" is 1,778. At some point I may try combining the two data sets.)

  4. For each supermarket, check each junction (yes, this is an O(N²) approach, sorry about that). If lat and long are within half a degree, work out the actual distance from junction to supermarket using Geo::Ellipsoid. (The "within half a degree" calculation is faster by a factor of 30 or so than working out the range precisely, and will cover 30-40 miles at these latitudes, which is plenty.)

  5. If I haven't already associated a supermarket with this junction, or the distance is shorter than the one that's already there, associate this one, storing lat/long, address and distance.

  6. Produce the final list of junctions and associated supermarkets, in a format suitable for loading into the navi. This includes the straight-line distance from junction to shop.

The whole process takes about thirty seconds, and spits out records like:

-0.692733,51.604876,"M40 J3 (0.2) Tesco","Tesco Stores, London Road, Loudwater, Buckinghamshire, HP10 9RT"

And to make this easier to examine, I dumped the data as a Viking file too; it's a handy general-purpose geodata visualisation tool. (Triangles are the calculated junction positions; circles are the shops.)

And other data fall out readily: the M25 junction farthest from a Tesco is junction 3, at 3.6 miles.

Of all the motorway junctions, the farthest from a Tesco is M6 junction 40, 17.3 miles from the Market Place branch in Ambleside.

There are three Tescos which are the nearest shop to seven junctions (though some of those are spurious extra junction records): in Bonnybridge, Eccles and Leeds.

All but eight of the 706 notional junctions are within ten miles of a Tesco.

There is however no consideration of accessibility: a Tesco that's near a junction between two motorways, with no way to get out of the motorway network and to the shop, will still be listed.

Here's that M25/M40 junction, where the calculations have not only produced two separate junctions as starting points, but they're far enough apart that they link to different Tescos. And you can't get to either of them, because you can't leave the motorway network at that junction.

One possible solution to this would be to winnow candidate Tescos for each junction down to a few (perhaps using that lat/long filter), then use a fast routeing engine such as OSRM to calculate actual travel times to each. Maybe for the mark 2.

Obviously one could generalise this: find the nearest supermarket of any sort to each junction. (If I'm driving on my own, I'll buy a sandwich anywhere – though small independent shops tend to have fairly nasty ones – but I'd rather get off the motorway than pay the 50% markup at a service area) But really that should be the nearest of each brand: I don't want not to be told about the Tesco 1.5 miles away from the junction when there's an Asda 1.2 miles away. (And my wife still gets the Tesco-only version of the file to load onto her navi.)

I don't suppose anyone else will actually want the data, but give me a shout if you do.

See also:
Finding Supermarket Fuel


  1. Posted by Dave D at 11:18am on 18 April 2018

    I was thinking that this could also be used to look for cheap supermarket petrol near motorways. Obviously your food standards data doesn't include whether the store has petrol.

    One other flaw that I didn't see mentioned in the details is that while the Tesco in Loudwater is virtually on the M40, it's a limited-access junction, so you can't simply pull off, buy your sandwich, and trivially resume your journey.

    Can you examine all the ways into a junction, and filter out junctions that have only motorways (to filter out the junctions that don't allow you to leave the network) ?

  2. Posted by Michael Cule at 11:54am on 18 April 2018

    Whereof one cannot speak, thereof one must be silent.

  3. Posted by Chris Bell at 12:30pm on 18 April 2018

    I feel that I ought to explain that Tesco are the only shop which reliably sells at least some of its sarnies unfuckedaboutwith: no needless bits of cucumber polluting them, a decided lack of unwanted mayonnaise or pickles, that sort of thing. They even have some with white bread, which of course the more upmarket places don't because, you know, Health. I don't want to bother to pretend that eating sandwiches is going to be particularly healthy, and I don't like hard bits in my bread, so Tesco is the place I prefer to go to.

    Just in case anyone thought I had an idealogical objection to any of the others....

  4. Posted by RogerBW at 12:32pm on 18 April 2018

    Dave,

    ① the answer to this would probably be to dump the FSA data and just osmfilter the OpenStreetMap corpus for amenity=fuel nodes, then check brand tags to find acceptable ones.

    ② one might be able to parse something about limited-access junctions by deconstructing every single way that partakes in a given junction, but this would be hard work. (Perhaps if there's only one in and one out…)

    ③ if I pull all highway= ways out of the corpus, rather than just highway=motorway ones, then I should be able to do that.

  5. Posted by Michael Cule at 01:23pm on 18 April 2018

    I'm with Chris on the vileness of cucumber though I'm OK with reasonable amounts of mayo and pickles in the right places. (Not together obviously.)

    It is peculiar that it's so hard to find a simple ham sandwich in the UK. Everybody had got to be clever in their filled-bread-products. I don't know why.

    All this can be solved if you're in a position to go to somewhere that will make sarnies to a specification but one doesn't always have time.

  6. Posted by RogerBW at 01:26pm on 18 April 2018

    I assume that it's because, if you don't load it down with additional flavours, people might be able to taste the quality of your ham.

    (I like mayonnaise and pickled gherkins on a ham sandwich - in fact I have one in front of me as I type - but I don't expect anyone else to agree.)

  7. Posted by Chris Suslowicz at 07:23pm on 18 April 2018

    Why not? I like that combination too, and will cheerfully add sliced tomato and red onion to the mix as well.

    I don't make sandwiches often enough though. Maybe I should change that and save money as well as getting something that I actually like?

  8. Posted by RogerBW at 08:51am on 19 April 2018

    Having experimented, I find that idea 3, "filter for motorway junctions that link to non-motorways", won't work. There are two nodes labelled as "Handy Cross Junction" (J4 of the M40); each of them is the point of departure of the off-slip from the main carriageway, and therefore joins two highway=motorway ways to a highway=motorway_link way (which, after several more segments, ends up on the Handy Cross roundabout). But the road that joins eastbound M40 to clockwise M25, the most north-westerly part of that junction complex, is also a highway=motorway_link. One could say "from each motorway_junction, trace each way sequence until you get to either another motorway_junction node or a non-motorway way", but this would be computationally expensive and slow.

    This is particularly true since the highway=motorway list is small enough to fit into memory on the machine I'm using, whereas the highway= list with all roads on it definitely isn't, and therefore requires a different approach to parsing.

    I've looked at the Wikipedia motorway junction lists, but they don't consistently have coordinates attached.

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.

Search
Archive
Tags 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s 2300ad 3d printing action advent of code aeronautics aikakirja anecdote animation anime army astronomy audio audio tech base commerce battletech bayern beer boardgaming book of the week bookmonth chain of command children chris chronicle church of no redeeming virtues cold war comedy computing contemporary cornish smuggler cosmic encounter coup covid-19 crime crystal cthulhu eternal cycling dead of winter doctor who documentary drama driving drone ecchi economics en garde espionage essen 2015 essen 2016 essen 2017 essen 2018 essen 2019 essen 2022 essen 2023 essen 2024 existential risk falklands war fandom fanfic fantasy feminism film firefly first world war flash point flight simulation food garmin drive gazebo genesys geocaching geodata gin gkp gurps gurps 101 gus harpoon historical history horror hugo 2014 hugo 2015 hugo 2016 hugo 2017 hugo 2018 hugo 2019 hugo 2020 hugo 2021 hugo 2022 hugo 2023 hugo 2024 hugo-nebula reread in brief avoid instrumented life javascript julian simpson julie enfield kickstarter kotlin learn to play leaving earth linux liquor lovecraftiana lua mecha men with beards mpd museum music mystery naval noir non-fiction one for the brow opera parody paul temple perl perl weekly challenge photography podcast politics postscript powers prediction privacy project woolsack pyracantha python quantum rail raku ranting raspberry pi reading reading boardgames social real life restaurant reviews romance rpg a day rpgs ruby rust scala science fiction scythe second world war security shipwreck simutrans smartphone south atlantic war squaddies stationery steampunk stuarts suburbia superheroes suspense television the resistance the weekly challenge thirsty meeples thriller tin soldier torg toys trailers travel type 26 type 31 type 45 vietnam war war wargaming weather wives and sweethearts writing about writing x-wing young adult
Special All book reviews, All film reviews
Produced by aikakirja v0.1