Well, my wife wanted a sandwich while we were on the road. And unlike
me she is particular about what she likes to eat, so she wanted to
find branches of Tesco. This is a job for: COMPUTERS!
Now, of course, with a modern navi one can tell it to "find
Tesco" and get a list of branches sorted by distance. But I wanted
something a bit more subtle than that, a list of branches close to
motorway junctions; I'd rather go thirty miles down the motorway
where I'm going anyway, and then one mile on local roads to a shop,
than ten miles on local roads from where I am now.
So this requires two sets of data: all branches of Tesco, and all
motorway junctions. And it requires data that are freely available,
because I'm not going to pay anything for this hobby project.
For some reason, British supermarkets are reluctant to publish a list
of shops and locations. Aldi used to a few years ago, in a variety of
formats for different GPS receivers, but they've removed the download
now. The supermarkets obviously have the information, because they all
have a "store finder" on their web sites which searches based on
distance from a particular point, but they refuse to let anyone else
use the data.
Fortunately there's another source of this information, the
Food Standards Agency,
data available under Open Government Licence. This is the food hygiene
rating information for every business that sells food… in theory. Not
all local authorities participate, and the data are clearly entered by
hand as the fields are not used consistently. But you can generally
get an identifiable "Tesco" out of the name field, and mostly a
latitude and longitude. So that's the first half of the problem
solved, with a half-gig download. There's more information here too,
which I may do something with one day.
<EstablishmentDetail>
<FHRSID>564486</FHRSID>
<LocalAuthorityBusinessID>13/00441/MIXED</LocalAuthorityBusinessID>
<BusinessName>Tesco</BusinessName>
<BusinessType>Retailers - supermarkets/hypermarkets</BusinessType>
<BusinessTypeID>7840</BusinessTypeID>
<AddressLine1>19 Denmark Street</AddressLine1>
<AddressLine2>Eden</AddressLine2>
<AddressLine3>High Wycombe</AddressLine3>
<AddressLine4>Buckinghamshire</AddressLine4>
<PostCode>HP11 2DB</PostCode>
<RatingValue>5</RatingValue>
<RatingKey>fhrs_5_en-GB</RatingKey>
<RatingDate>2017-02-16</RatingDate>
<LocalAuthorityCode>025</LocalAuthorityCode>
<LocalAuthorityName>Wycombe</LocalAuthorityName>
<LocalAuthorityWebSite>http://www.wycombe.gov.uk</LocalAuthorityWebSite>
<LocalAuthorityEmailAddress>Food.Ratings@wycombe.gov.uk</LocalAuthorityEmailAddress>
<Scores>
<Hygiene>0</Hygiene>
<Structural>5</Structural>
<ConfidenceInManagement>0</ConfidenceInManagement>
</Scores>
<SchemeType>FHRS</SchemeType>
<NewRatingPending>False</NewRatingPending>
<Geocode>
<Longitude>-0.75622500000000</Longitude>
<Latitude>51.63081500000000</Latitude>
</Geocode>
</EstablishmentDetail>
The other half of the problem is motorway junctions, and that gets
more fiddly. There's a page for each motorway on Wikipedia, but
scraping out the coordinates from the manually-edited tables would be
hard work. The best bet seems to be OpenStreetMap, but the data are
split across multiple places. There are highway=motorway_junction
nodes, in a format that's pretty much self-documenting:
<node id="212531" lat="51.6024835" lon="-0.6885638" version="11" \
timestamp="2013-10-18T13:16:23Z" changeset="18419340" uid="24119" user="Mauls">
<tag k="ref" v="3"/>
<tag k="name" v="Knaves Beech Interchange"/>
<tag k="exit_to" v="High Wycombe (E) A40"/>
<tag k="highway" v="motorway_junction"/>
</node>
but that's missing a key thing: which motorway it's on. ("ref" is
the junction number, though sometimes there's an initial "J" and
sometimes there isn't.) What I end up doing is to look at the way
structures tagged as highway=motorway (each way representing a road
segment and containing pointers to a series of nodes) to see which
ones contain that junction node, and pull out their "ref" fields
(the road name). That specific node 212531 appears in two _way_s, of
which this is one (the connoisseur will readily spot that this is the
carriageway leading away from the junction; the other way is the one
that leads to it). In this case both the _way_s are called "M40", so
this is an easy one.
<way id="147738900" version="7" timestamp="2017-05-17T13:13:46Z"\
changeset="48762781" uid="9065" user="brianboru">
<nd ref="212531"/>
<nd ref="1609319453"/>
<nd ref="1609319447"/>
<nd ref="1609319603"/>
<nd ref="1609319645"/>
<nd ref="1609319643"/>
<tag k="lit" v="yes"/>
<tag k="ref" v="M40"/>
<tag k="lanes" v="3"/>
<tag k="oneway" v="yes"/>
<tag k="source" v="survey"/>
<tag k="bicycle" v="no"/>
<tag k="highway" v="motorway"/>
<tag k="old_ref" v="A40(M)"/>
<tag k="maxspeed" v="70 mph"/>
<tag k="operator" v="Highways England"/>
<tag k="project_name" v="High Wycombe Bypass"/>
<tag k="carriageway_ref" v="A"/>
<tag k="source:maxspeed" v="survey"/>
<tag k="highways_england:area" v="DBFO30"/>
</way>
Other data aren't as convenient. The interchange between M42 and M40
is junction 3A of the M42, but there's no way I can see to pin that
down from the data; lacking clear information, I have to represent
this one as "M40/M42 J3A".
And of course junctions may be duplicated. There's another M40
junction 3, the eastbound carriageway. Should one duplicate the entry?
Average the coordinates? That's what I'm doing now, but motorway names
are ambiguous; there's an M1 in Northern Ireland, and I don't want to
average M1 J8 (Hemel Hempstead) with M1 J8 (Dublin and the South)! The
easiest solution to this is to trim out Northern Ireland from the data
and then average within each junction ID.
There are even separate entries for M40 J1A and M25 J16, even though
they're the same junction. Oh dear. I'll shut my eyes and pretend I
didn't see that, because fixing it requires detailed knowledge of the
road system, an arbitrary decision that junctions closer than a
certain distance apart should be lumped together, or more cleverness
than I feel like deploying right now.
Anyway, command-line tools will allow one to cut down a full
OpenStreetMap dump (66GB compressed) to just the UK (in an
intermediate binary format, 1.8GB uncompressed). This took roughly
eleven hours.
bzcat planet.osm.bz2 | osmconvert --out-o5m -B=greatbritain.bpl - >gb.o5m
Side note: I wrote my own crude boundary polygon for Great Britain
since extracting the high-resolution one from the OSM data seemed too
much like hard work, and using a lat/long rectangle includes some of
Northern Ireland.
greatbritain
1
-7.0578307250976557 49.844452298526114
-5.0583190063476557 53.74164947881156
-3.6520690063476557 53.9231890869385
-3.9926451782226557 54.60015730382969
-5.2560729125976557 54.568324126095987
-6.2173766235351557 55.416002034911862
-7.1622008422851557 55.877857131066307
-8.1289977172851557 56.803363579857283
-7.7554625610351557 58.473094360179466
-4.1299742797851557 58.975006706247314
-1.3614195922851557 61.124211199993404
-0.13095084228515574 60.804277937012856
-3.1851500610351557 58.068656768869701
-1.6086119750976557 57.841319481594326
-1.3888854125976557 57.512360074410402
-0.83407584228515574 54.796968330785219
0.20962532958984426 54.219727026703261
1.8575745483398443 52.928199365400012
2.0827942749023443 52.662478985972136
1.8850403686523443 51.963989220286713
1.2643128295898443 51.576476578419985
1.8520813842773443 51.199401968747665
1.0445862670898443 50.71150872070961
-5.7312316040039057 49.816105201818296
END
All right, I probably didn't need to include the Shetlands.
The next step is to filter out the relevant information, and bring it
back into standard OSM XML form.
osmfilter gb.o5m --keep=highway=motorway >motorway.osm
This takes less than a minute, and produces a manageably small file of
only a few megabytes.
Then it's time for Perl and LibXML. This is a multi-step process (and
the code is ugly, but I plan to release it once I've cleaned it up a
bit):
-
Find all highway=motorway ways; find the motorway names for them,
and take the node list for each. Build a hash mapping node IDs to
motorway names. (70,635 entries in the data set I used.)
-
Look at all highway=motorway_junction nodes. Use the hash to find
which motorway(s) they're on, and construct a user-friendly name
(e.g. "M40 J3"). For each node, store this with a set of lat/long
coordinates, averaging coordinate sets that have the same name
attached. (706 distinct junctions. Should be fewer.)
-
Look at the FSA data, pulling out supermarkets that match "Tesco"
in the name and have lat/long data; for some it's simply missing.
(2,249 of these. Officially Tesco has 3,493 branches in the UK. But
for comparison, the number of shops tagged in OpenStreetMap as
shop=supermarket and having either operator=Tesco or a name
matching "Tesco" is 1,778. At some point I may try combining the
two data sets.)
-
For each supermarket, check each junction (yes, this is an O(N²)
approach, sorry about that). If lat and long are within half a
degree, work out the actual distance from junction to supermarket
using Geo::Ellipsoid. (The "within half a degree" calculation is
faster by a factor of 30 or so than working out the range
precisely, and will cover 30-40 miles at these latitudes, which is
plenty.)
-
If I haven't already associated a supermarket with this junction,
or the distance is shorter than the one that's already there,
associate this one, storing lat/long, address and distance.
-
Produce the final list of junctions and associated supermarkets, in
a format suitable for loading into the navi. This includes the
straight-line distance from junction to shop.
The whole process takes about thirty seconds, and spits out records
like:
-0.692733,51.604876,"M40 J3 (0.2) Tesco","Tesco Stores, London Road, Loudwater, Buckinghamshire, HP10 9RT"
And to make this easier to examine, I dumped the data as a
Viking file too; it's a handy
general-purpose geodata visualisation tool. (Triangles are the
calculated junction positions; circles are the shops.)
And other data fall out readily: the M25 junction farthest from a
Tesco is junction 3, at 3.6 miles.
Of all the motorway junctions, the farthest from a Tesco is M6
junction 40, 17.3 miles from the Market Place branch in Ambleside.
There are three Tescos which are the nearest shop to seven junctions
(though some of those are spurious extra junction records): in
Bonnybridge, Eccles and Leeds.
All but eight of the 706 notional junctions are within ten miles of a
Tesco.
There is however no consideration of accessibility: a Tesco that's
near a junction between two motorways, with no way to get out of the
motorway network and to the shop, will still be listed.
Here's that M25/M40 junction, where the calculations have not only
produced two separate junctions as starting points, but they're far
enough apart that they link to different Tescos. And you can't get to
either of them, because you can't leave the motorway network at that
junction.
One possible solution to this would be to winnow candidate Tescos for
each junction down to a few (perhaps using that lat/long filter), then
use a fast routeing engine such as OSRM to calculate actual travel
times to each. Maybe for the mark 2.
Obviously one could generalise this: find the nearest supermarket of
any sort to each junction. (If I'm driving on my own, I'll buy a
sandwich anywhere – though small independent shops tend to have fairly
nasty ones – but I'd rather get off the motorway than pay the 50%
markup at a service area) But really that should be the nearest of
each brand: I don't want not to be told about the Tesco 1.5 miles
away from the junction when there's an Asda 1.2 miles away. (And my
wife still gets the Tesco-only version of the file to load onto her
navi.)
I don't suppose anyone else will actually want the data, but give me a
shout if you do.
Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.