It’s possible to use the geospatial extension in MySQL for an efficient location finder. For this to be worth the trouble, the following conditions must hold.
- You must use a MyISAM table for your geospatial data, or use version 5.7.5 or later of MySQL.
- A NOT NULL qualification on your geometry column is required
- A spatial index is needed:
ALTER TABLE flags ADD SPATIAL INDEX (coordinates)
- Use of the GeomFromText and MBRContains / MBRWithin functions in your SELECT statement.
For example, a geospatial table might have the following definition.
CREATE TABLE geozip (
zip VARCHAR(5) CHARACTER SET utf8 NOT NULL,
geo GEOMETRY NOT NULL,
PRIMARY KEY zip (zip),
SPATIAL KEY geo (geo) )
ENGINE=MYISAM
Once you have created your geozip table, you can populate it with this kind of query.
INSERT INTO geozip
SELECT DISTINCT
zip,
GEOMFROMTEXT( CONCAT('POINT(', latitude, ' ', longitude, ')') ) AS geo
FROM zip
Looking up rows in this table might be done by this query.
SELECT zip, X(geo), Y(geo), distance
FROM (
SELECT zip, geo,r,
units * DEGREES( ACOS( COS(RADIANS(latpoint)) *
COS(RADIANS(X(geo))) *
COS(RADIANS(longpoint) - RADIANS(Y(geo))) +
SIN(RADIANS(latpoint)) *
SIN(RADIANS(X(geo))))) AS distance
FROM geozip
JOIN (
SELECT 42.81 AS latpoint, -70.81 AS longpoint,
10.0 AS r, 69.0 AS units
) AS p ON (1=1)
WHERE MbrContains(GeomFromText(
CONCAT('LINESTRING(', latpoint-(r/units),' ',
longpoint-(r /(units* COS(RADIANS(latpoint)))),
',',
latpoint+(r/units) ,' ',
longpoint+(r /(units * COS(RADIANS(latpoint)))), ')')),
geo)
) AS d
WHERE distance <= r
ORDER BY distance
The secret is using the MbrContains(box,geo) function (“Minimum bounding rectangle Contains”) for your WHERE search. That function can exploit your spatial index. The first parameter, the box parameter, specifies a bounding box, and the second specifies a point. When you use a constant value for the first parameter, the function uses the index.
In this example we specify the first parameter with this sort of expression:
GeomFromText('LINESTRING(LatSW LonSW, LatNE LonNE)')
The LINESTRING is a diagonal line across the bounding box from its southwest to northeast corner. That LINESTRING’s value is generated based on latpoint/longpoint by the long CONCAT() expression in the query. For the example lat/long value of 42.81/-70.81, the LINESTRING looks like this:
GeomFromText('LINESTRING(42.6650 -71.0076,42.9549 -70.6124)')
Finally, if you are dealing with locations that are close together, you may find that computing your distances
using the Vincenty formula is more accurate. You would do that with this query.
SELECT zip, X(geo), Y(geo), distance FROM (
SELECT zip, geo,r,
units * DEGREES(
ATAN2(
SQRT(
POW(COS(RADIANS(latpoint))*SIN(RADIANS(longpoint-Y(geo))),2) +
POW(COS(RADIANS(X(geo)))*SIN(RADIANS(latpoint)) -
(SIN(RADIANS(X(geo)))*COS(RADIANS(latpoint)) *
COS(RADIANS(longpoint-Y(geo)))) ,2)),
SIN(RADIANS(X(geo)))*SIN(RADIANS(latpoint)) +
COS(RADIANS(X(geo)))*
COS(RADIANS(latpoint))*
COS(RADIANS(longpoint-Y(geo)
)
)) AS distance
FROM geozip
JOIN (
SELECT 42.81 AS latpoint, -70.81 AS longpoint,
50.0 AS r, 69.0 AS units
) AS p ON (1=1)
WHERE MBRCONTAINS(GEOMFROMTEXT(
CONCAT('LINESTRING(',
latpoint-(r/units),' ',
longpoint-(r /(units* COS(RADIANS(latpoint)))),
',',
latpoint+(r/units) ,' ',
longpoint+(r /(units * COS(RADIANS(latpoint)))),
')')), geo)
) AS d
WHERE distance <= r
ORDER BY distance
Percona’s Alex Rubin has prepared some helpful details on the geo extension.
Thanks so much Ollie. Do you know the performance difference between this vs using floats for lat/long (MySql)? I’m comparing your other example here: http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/
This kind of geospatial index will be fast compared to indexed lat and long columns if
1. the number of points to search is larger (50K or more, roughly) AND
2. the geospatial search is by far the most selective of the searches you are doing.
That is
WHERE MBRCONTAINS(…)
is fast, and
WHERE point_Type = 7 AND MBRCONTAINS(…)
not so much. That’s because a geo index can’t be included in a compound index with other columns.
Great post! Thanks!
I assume we are using miles here. If we wanted Kms we would change the “69.0 AS units” to “111.045 AS units” – is this correct?
Exactly so. 60 nautical miles per degree, 69 statute miles per degree, 111045 meters (111.045 km) per degree, 5,520 chains per degree.
Hi Ollie, thanks for these posts! if we changed r = 2 and set units to 11045.0 would this calculate a 2 meter bounding box? I am looking at using this technique to cache reverse geocoded addresses and so the range needs to be smaller.
Use 111045 to specify your units in meters.
Be careful: two meters is likely to be too small a radius to cope with the imprecision in commercial GPS data.