Fast nearest-location finder for SQL (MySQL, PostgreSQL, SQL Server)

I’ve spent enough time goofing around with location-finder software that it’s worth writing up how to do it.   Of course, finding distances on the surface of the earth means using Great Circle distances, worked out with the Haversine formula, also called the Spherical Cosine Law formula. The problem is this:

Given a table of locations with latitudes and longitudes, which of those locations are nearest to a given location?

Tables of locations

Where can I get a table of locations with latitudes and longitudes, you ask? Do an internet search for “free zipcode download” or “free postcode download.” Then get that loaded into a MySQL table. There are free downloads for plenty of different types of geographic data with latitude and longitude location attached.

Here’s a table of 5-digit zip codes for the United States. US Zip Code Data for MySQL. You should be able to load this into your MySQL server. If you’re going to go into production, please don’t rely on this data.

Here is the US Zip Code Data for SQL Server, if you happen to need that.

The logic in this article works for MySQL, MariaDB, PostgreSQL, and Microsoft SQL Server. Oracle works a little differently; here’s an article showing how to do this in Oracle.

Please be careful when using ZIP codes or postcodes as a way to determine locations.  ZIP and postcodes were designed for the single purpose of helping optimize postal delivery. Their value for other purposes is limited, and may give you wrong results. For example, here’s an article by a geographer about the water crisis in Flint, Michigan USA. For a long time it looked like kids in Flint weren’t getting lead poisoning because researchers only looked at their home ZIP codes to figure out where they lived. But they were getting lead poisoning. Don’t make the same mistake the MIchigan state government made.

Boring but necessary geography

Latitudes and longitudes are expressed in degrees. Latitude describes how far north or south of the equator a point is located.  Points along the equator have latitudes of zero. The north pole has a positive (north) latitude of 90, and the south pole a negative (south) latitude of -90. Accordingly, northern-hemisphere locations have positive latitude, and southern-hemisphere locations have negative latitude.

Longitude describes how far east a point is, from the prime meridian: an arbitrary line on the earth surface running from pole to pole. The Empire State Building in New York City, USA, has a negative (west) longitude, specifically -73.9857. The Taj Mahal in Agra, India, has a positive (east) longitude, specifically 78.0422. The Greenwich Observatory near London, England, has, by definition, a longitude of zero.

Latitudes therefore are values in the range [-90, 90]. Longitudes are values in the range (-180, 180].  These values are sometimes expressed in degrees, minutes, and seconds, rather than degrees and decimals.  If you’re intending to do calculations, convert the minutes and seconds to decimals first.

In the Napoleonic era, the meter was first defined so there were ten million of them in the distance from the equator to one of the poles. So, the original number of meters in a degree of latitude was 10,000,000 / 90 or 111.111 km. However the earth bulges a little, so 111.045 km per degree is considered a better approximation.

For the purpose of the kind of calculation we’re doing here, we’ll assume the earth is a sphere.  It isn’t really; it bulges a bit at the equator, but for a location-finder problem the spherical assumption is good enough.

This formula (111.045 km per degree) works fine when you’re moving north or south: if you are changing your latitude but not your longitude. It also works if you’re moving east or west — changing your longitude — along the equator.  But north or south of the equator, the lines of longitude get closer together, so if you move a degree to the east or west, you move less than 111.045 km.  The distance you actually move when you go one degree east or west is actually this number of km.

111.045 * cos(latitude)

Those of us in some former British colonies use miles. A nautical mile is defined as a minute (1/60th of a degree) of latitude. So there are 69 statute miles per degree or 60 nautical miles per degree.  If you are dealing with such applications as GPS control of ploughing with teams of oxen, you may find it helpful to know that there are 552 furlongs per degree.

Some USA-centric applications mess up the longitudes, representing them as positive rather than negative for western-hemisphere locations. If you’re debugging something be on the lookout for this.

The Great Circle Distance Formula

What’s the distance along the surface of the (spherical) earth between two arbitrary points, in degrees, given by their latitude and longitude in degrees?  That’s determined by the Spherical Cosine Law, or the Haversine Formula, which is this in MySQL syntax.

DEGREES(ACOS(LEAST(1.0,COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
             COS(RADIANS(long1) - RADIANS(long2)) +
             SIN(RADIANS(lat1)) * SIN(RADIANS(lat2)))))

It’s the distance along the surface of the spherical earth. It works equally well when the locations in question are your apartment and your local supermarket, or when they are the airports in Sydney, Australia and Reykjavik, Iceland. Notice that this result is in degrees.  That means we have to multiply it by  111.045, our value for km per degree, if we want the distance in km.

Notice that MS SQL Server requires a float or double parameter to RADIANS.  RADIANS(30) returns an incorrect value, but RADIANS(30.0) works correctly. In general, MS SQL Server doesn’t coerce integer values to float or double reliably, so be careful not to try to use an integer where you should use a float. Also, please keep in mind that US zip codes, even though they look like numbers, are character strings. Where I live we have zip codes like ‘01950’. This is *not* the same thing as 1950.

Querying the Nearest Locations

Cool. So to find the nearest points in a database to a given point, we can write a query like this.  Let’s use the point with latitude 42.81 and longitude -70.81. This MySQL query finds the fifteen nearest points to the given point in order of distance.

SELECT zip, primary_city, latitude, longitude,
      111.045* DEGREES(ACOS(LEAST(1.0, COS(RADIANS(latpoint))
                 * COS(RADIANS(latitude))
                 * COS(RADIANS(longpoint) - RADIANS(longitude))
                 + SIN(RADIANS(latpoint))
                 * SIN(RADIANS(latitude))))) AS distance_in_km
 FROM zip
 JOIN (
     SELECT  42.81  AS latpoint,  -70.81 AS longpoint
   ) AS p ON 1=1
 ORDER BY distance_in_km
 LIMIT 15

Notice the use of the join to put latpoint and longpoint into the query. It’s convenient to write the query that way because latpoint and longpoint are referred to multiple times in the formula. (MySQL doesn’t need the “ON 1=1” but PostgreSQL does.)

(In SQL Server, use “SELECT TOP(15) zip …” in place of “LIMIT 15.”)

Great. We’re done, right? Not so fast! This query is accurate,  but it is very slow.

Optimization

The query is slow because It must compute the haversine formula for every possible pair of points. So it makes your MySQL server do a lot of math, and it forces it to scan through your whole location table.  How can we optimize this?  It would be nice if we could use indexes on the latitude and longitude columns in the table. To do this, let’s introduce a constraint. Let’s say we only care about points in the zip code table that are within 50 km of the (latpoint, longpoint). Let’s figure out how to use an index to eliminate points that are further away.

Remember, from our background information earlier in this article, that a degree of latitude is 111.045 km.  So, if we have an index on our latitude column, we can use a SQL clause like this to eliminate the points that are too far north or too far south to possibly be within 50 km.

latitude BETWEEN latpoint - (50.0 / 111.045)
             AND latpoint + (50.0 / 111.045)

This WHERE clause lets MySQL use an index to omit lots of latitude points before computing the haversine distance formula. It allows MySQL to perform a range scan on the latitude index.

Finally, we can use a similar but more complex SQL clause to eliminate points that are too far east or west.  This clause is more complex because degrees of longitude are smaller distances the further away from the equator we move. This is the formula.

longitude BETWEEN longpoint - (50.0 / (111.045 * COS(RADIANS(latpoint))))
              AND longpoint + (50.0 / (111.045 * COS(RADIANS(latpoint))))

So, putting it all together, this query finds the neareast 15 points that are within a  bounding box of 50km of the (latpoint,longpoint).

This query, even though it’s a bit complicated, takes advantage of your latitude and longitude indexes and works efficiently.

Notice as part of the overall query that we JOIN this little subquery.

        SELECT  42.81  AS latpoint,  -70.81 AS longpoint,
                50.0 AS radius,      111.045 AS distance_unit

The purpose of this is to make it easier for application software to provide the parameters needed for the query. latpoint and longpoint are the specific location for which you need nearby places. radius specifies how far away the search should go. Finally distance_unit should be 111.045 if you want to give your distances in kilometers and 69.0 if you want them in statute miles.

Limiting diagonal distance

But, this bounding-box query has the potential of returning some points that lie more than 50km diagonally from your (latpoint, longpoint): it’s only checking a bounding rectangle, not the diagonal distance. Let’s enhance the query to eliminate points more than 50km as the crow flies.

Using Miles Instead of Km

Finally, many people need to use miles instead of km for their distances.  This is straightforward.  Just change the value of the distance_unit to 69.0.

That’s the query for a typical store-finder or location-finder application based on latitude and longitude.  You should be able to adapt it to your use without too much trouble.

Adapting this query to other location-table definitions

This query is, of course, written to run with a particular table definition for zip, a US zip code table.  That zip  table has columns named zip, primary_city, latitude, and longitude, among others. Notice that the table is referred to by FROM zip AS z in the query. That makes it have the alias z.

Your location table very likely has different columns. It should be straightforward to rewrite this query to use yours.  Look for the columns referred to as z.something in the query, and replace those with the column names from your table. So, for example, if your table is called SHOP and has shopnameshoplat, and shoplong columns, you’ll put z.shopname in place of z.primary_city and so forth. You’ll refer to your table by mentioning FROM SHOP as z in the query.

  84 comments for “Fast nearest-location finder for SQL (MySQL, PostgreSQL, SQL Server)

                                                                Leave a Reply

                                                                Your email address will not be published. Required fields are marked *