Fast nearest-location finder for Oracle

Elsewhere I have written an article on fast location finding for the MySQL and PostgreSQL databases. This article shows an adaptation of that work for the Oracle database.

If you need data to work with, here’s a US Zip Code table in SQL suitable for an Oracle database. Please don’t use this data in production.

The Great Circle Distance Formula

What’s the distance along the surface of the (spherical) earth between two arbitrary points, in degrees, given by their latitude and longitude in degrees?  That’s determined by the Spherical Cosine Law, or the Haversine Formula, which is this in Oracle syntax.

57.2957795 * (ACOS(MIN(1.0,COS(0.0174532925*(lat1)) * COS(0.0174532925*(lat2)) *
                           COS(0.0174532925*(long1-long2)) +
                           SIN(0.0174532925*(lat1)) * SIN(0.0174532925*(lat2)))))

This formula is all about angles. The constant “57.2957795” converts radians to degrees, and the inverse constant “0.0174532925” converts degrees to radians.

It’s the distance along the surface of the spherical earth. It works equally well when the locations in question are your apartment and your local supermarket, or when they are the airports in Sydney, Australia and Reykjavik, Iceland. Notice that this result is in degrees.  That means we have to multiply it by  111.045, our value for km per degree, if we want the distance in km.

Querying the Nearest Locations

Cool. So to find the nearest points in a database to a given point, we can write a query like this.  Let’s use the point with latitude 42.81 and longitude -70.81. This Oracle query finds the fifteen nearest points to the given point in order of distance.

SELECT * FROM(
 SELECT z.zip, z.primary_city, z.state,
        z.latitude, z.longitude,
        p.radius,
        p.distance_unit
           * rad2deg * (ACOS(MIN(1.0, COS(deg2rad * (p.latpoint))
                                    * COS(deg2rad * (z.latitude))
                                    * COS(deg2rad * (p.longpoint - z.longitude))
                                    + SIN(deg2rad * (p.latpoint))
                                    * SIN(deg2rad * (z.latitude)))) AS distance
  FROM zip z
  JOIN (
        SELECT  42.81  AS latpoint,    -70.81 AS longpoint,
                50.0 AS radius,        111.045 AS distance_unit,
		57.2957795 AS rad2deg,
                0.0174532925 AS deg2rad
	      FROM  DUAL
    ) p ON 1=1
  ORDER BY distance
)
WHERE ROWNUM <= 15

Notice the use of the join to put latpoint and longpoint into the query. It’s convenient to write the query that way because latpoint and longpoint are referred to multiple times in the formula. We also include the degrees-to-radians and radians-to-degrees constants in the joined table. Notice also the use of “ROWNUM < = 15” to limit the number of rows in the resultset.

Great. We’re done, right? Not so fast! This query is accurate,  but it is very slow.

Optimization

The query is slow because It must compute the haversine formula for every possible pair of points. So it makes your Oracle server do a lot of math, and it forces it to scan through your whole location table.  How can we optimize this?  It would be nice if we could use indexes on the latitude and longitude columns in the table. To do this, let’s introduce a constraint. Let’s say we only care about points in the zip code table that are within 50 km of the (latpoint, longpoint). Let’s figure out how to use an index to eliminate points that are further away.

Remember, from our background information in the other article, that a degree of latitude is 111.045 km.  So, if we have an index on our latitude column, we can use a SQL clause like this to eliminate the points that are too far north or too far south to possibly be within 50 km.

latitude BETWEEN latpoint - (50.0 / 111.045)
             AND latpoint + (50.0 / 111.045)

This WHERE clause lets Oracle use an index to omit lots of latitude points before computing the haversine distance formula. It allows Oracle to perform a range scan on the latitude index.

Finally, we can use a similar but more complex SQL clause to eliminate points that are too far east or west.  This clause is more complex because degrees of longitude are smaller distances the further away from the equator we move. This is the formula.

longitude BETWEEN longpoint - (50.0 / (111.045 * COS(deg2rad*(latpoint))))
              AND longpoint + (50.0 / (111.045 * COS(deg2rad*(latpoint))))

So, putting it all together, this query finds the neareast 15 points that are within 50km of the (latpoint,longpoint).

SELECT * FROM (
SELECT zip, primary_city, state,
       latitude, longitude, distance
  FROM (
 SELECT z.zip, z.primary_city, z.state,
        z.latitude, z.longitude,
        p.radius,
        p.distance_unit
            * rad2deg * (ACOS(MIN(1.0,COS(deg2rad * (p.latpoint))
                                    * COS(deg2rad * (z.latitude))
                                    * COS(deg2rad * (p.longpoint - z.longitude))
                                    + SIN(deg2rad * (p.latpoint))
                                    * SIN(deg2rad * (z.latitude))))) AS distance
  FROM zip z
  JOIN (
        SELECT  42.81  AS latpoint,    -70.81 AS longpoint,
                50.0 AS radius,        111.045 AS distance_unit,
                57.2957795 AS rad2deg,
                0.0174532925 AS deg2rad
	      FROM  DUAL
    ) p ON 1=1
  WHERE z.latitude
     BETWEEN p.latpoint  - (p.radius / p.distance_unit)
         AND p.latpoint  + (p.radius / p.distance_unit)
    AND z.longitude
     BETWEEN p.longpoint - (p.radius / (p.distance_unit * COS(deg2rad * (p.latpoint))))
         AND p.longpoint + (p.radius / (p.distance_unit * COS(deg2rad * (p.latpoint))))
 )
 WHERE distance <= radius
 ORDER BY distance
)
WHERE ROWNUM <= 20

This query, even though it’s a bit complicated, takes advantage of your latitude and longitude indexes and works efficiently.

Notice as part of the overall query that we JOIN this little subquery.

        SELECT  42.81  AS latpoint,    -70.81 AS longpoint,
                50.0 AS radius,        111.045 AS distance_unit,
                57.2957795 AS rad2deg,
                0.0174532925 AS deg2rad
	      FROM  DUAL

The purpose of this is to make it easier for application software to provide the parameters needed for the query. latpoint and longpoint are the specific location for which you need nearby places. radius specifies how far away the search should go. We include the radians-degrees conversion constants. Finally distance_unit should be 111.045 if you want to give your distances in kilometers and 69.0 if you want them in statute miles.

Using Miles Instead of Km

Finally, many people need to use miles instead of km for their distances.  This is straightforward.  Just change the value of the “distance_unit” to 69.0.

That’s the query for a typical store-finder or location-finder application based on latitude and longitude.  You should be able to adapt it to your use without too much trouble.

Adapting this query to other location-table definitions

This query is, of course, written to run with a particular table definition for zip, a US zip code table.  That zip  table has columns named zip, primary_city, latitude, and longitude, among others. Notice that the table is referred to by FROM zip z in the query. That makes it have the alias z.

Your location table very likely has different columns. It should be straightforward to rewrite this query to use yours.  Look for the columns referred to as z.something in the query, and replace those with the column names from your table. So, for example, if your table is called SHOP and has shopnameshoplat, and shoplong columns, you’ll put z.shopname in place of z.primary_city and so forth. You’ll refer to your table by mentioning FROM SHOP z in the query.

Using a Stored Procedure

You can use a PL/SQL Stored Procedure to compute the great circle distance if you prefer. Here is a stored procedure for that purpose.

create or replace FUNCTION haversine  (
  lat1 IN REAL, lon1 IN REAL, 
  lat2 IN REAL, lon2 IN REAL)
RETURN REAL
DETERMINISTIC
IS

  DEGREES REAL := 57.2957795;
  RADIANS REAL := 0.0174532925;

BEGIN
  return DEGREES * (ACOS(MIN(1.0,
              COS(RADIANS *(lat1)) *
              COS(RADIANS *(lat2)) *
              COS(RADIANS *(lon2 - lon1)) +
              SIN(RADIANS * (lat1)) * SIN(RADIANS * (lat2))
            )));
END;

When you use that stored procedure your query is a little easier to read. Your queries will also run quite a bit faster because Oracle compiles its procedures.

SELECT * FROM (
SELECT zip, primary_city, state,
       latitude, longitude, distance
  FROM (
 SELECT z.zip, z.primary_city, z.state,
        z.latitude, z.longitude,
        p.radius,
        p.distance_unit
           * haversine(latitude,   longitude,
                       latpoint,   longpoint) AS distance
  FROM zip z
  JOIN (
        SELECT  42.81  AS latpoint,    -70.81 AS longpoint,
                50.0 AS radius,        111.045 AS distance_unit,
                0.0174532925 AS deg2rad
	      FROM  DUAL
    ) p ON 1=1
  WHERE z.latitude
     BETWEEN p.latpoint  - (p.radius / p.distance_unit)
         AND p.latpoint  + (p.radius / p.distance_unit)
    AND z.longitude
     BETWEEN p.longpoint - (p.radius / (p.distance_unit * COS(deg2rad * (p.latpoint))))
         AND p.longpoint + (p.radius / (p.distance_unit * COS(deg2rad * (p.latpoint))))
 )
 WHERE distance <= radius
 ORDER BY distance
)
WHERE ROWNUM <= 20

But, if you choose to use a stored procedure, you might consider using the Vincenty formula rather than the spherical cosine law formula. Numerically speaking, it’s a little more stable when presented with very short distances between points.

14 thoughts on “Fast nearest-location finder for Oracle”

  1. Sorry for repeating the question..
    I have the table, the data contains the regions, longitude and latitude, I want to appear at the point in the region. All projects are in the table, the distance is 2 kilometers.

    Can I contact you at the e-mail?

    Reply
  2. Hello, can you help me?
    I have a table of studies, It has longitude and latitude coordinates. I want at the point of the event to show the diameter of the ring distance 2000 meters

    Thank you

    Reply
  3. Hello, can you help?
    I have a table of studies,It has longitude and latitude coordinates . I want at the point of the event to show the diameter of the ring distance 2000 meters

    Thank you. 😊

    Reply
    • Do you mean you want to draw a circle of diameter 2km? Where will you draw that circle? 1km in degrees of latitude is (1.0 / 111.045). In degrees of longitude it depends on the latitude (lines of longitude are further apart in km near the equator) it is (1.0 / (111.045 * COS(0.0174532925 * latitude)) So draw an ellipse with its major axis running north-south.

      Reply
  4. Hi Ollie,

    We are using following query in production to find nearest buildings, and it takes 30 secs. Is there a way to tune this query ?

    WITH WUILDING AS
    (SELECT GEO_LATITUDE,GEO_LONGITUDE,6387.7 AS RADIUS, 57.29577951 AS DEGTORAD FROM BUILDING_ADS
    WHERE BUILDING_NK = :p)
    SELECT VP FROM (
    SELECT /*+ PARALLEL 8*/ BA.BUILDING_NK AS BUILDING_ID, MV.VP AS VP,
    (NVL(WO.RADIUS,0) * ACOS((SIN(NVL(WO.GEO_LATITUDE,0) / WO.DEGTORAD) * SIN(NVL(BA.GEO_LATITUDE,0) / WO.DEGTORAD)) +
    (COS(NVL(WO.GEO_LATITUDE,0) / WO.DEGTORAD) * COS(NVL(BA.GEO_LATITUDE,0) / WO.DEGTORAD) *COS(NVL(BA.GEO_LONGITUDE,0) …

    Reply
  5. Computing longitude BETWEEN longpoint – (50.0 / (111.045 * COS(deg2rad*(latpoint))))
    AND longpoint + (50.0 / (111.045 * COS(deg2rad*(latpoint))))

    This Seems like a slow operation to me. If you just ignore all this as simply use 111.045 as the upper bound regardless of latitude you will still have a proper subset (although a little more area would of course be included). Given that stores tend to be near people, it should be safe to assume that 99% of the world’s population and stores are between latitudes 60N and 60S, and cosine 60 is 0.5, so you are only doubling the potential areas included at the extreme north/south region for 99% of your possible data points.

    Calculating distance by the Haversine formula is still quite expensive — for short distances (50 km or less) this flat earth formula:

    flat_earth_distance(lat1, lng1, lat2, lng2) — lat/long are in degrees
    deglen := 111.045 — km
    x := lat1 – lat2 — don’t care if pos. or neg. since we square it
    y := (lng1 – lng2)*cos(lat1) — cosine probably has to convert to radians first
    return deglen*sqrt(x*x + y*y)
    — or return distance squared and avoid the square root calculation

    Results in an distance error of no more than 0.4% even when the the latitude is extreme as 65 deg. You could even use a case statement to use Haversine when latitude is extreme.

    If someone is looking for a store with 50 km and you just happen to miss by a full 0.4% you will at worst case report that a store 50.2 km is in range. Seems like a good compromise in most real-world use cases.

    However, I would not bother with the faster flat earth distance formula if the number of returned points is relatively large, unless you have tested and determined that the database is exceptionally slow when using the Haversine formula.

    http://jonisalonen.com/2014/computing-distance-between-coordinates-can-be-simple-and-fast/ has a nice article on using the flat earth approximation.

    Reply
    • With respect, you’re pitching a false economy. This would be a concern if we were still in the era of software floating point computation. But, these days, the time consumed by a computation is several orders of magnitude faster than the other times consumed. Here’s the analogy.

      • Christopher Columbus writes a fifty-word letter :: do the computation.
      • Christopher Columbus walks to the dock and hands it to a ship captain :: retrieve the data from the MySQL table, with an SSD.
      • The ship captain sets sail, sails across the ocean, and delivers the letter to Queen Isabella :: send the data over the network to the database client.

      Who cares how long it takes to write the letter? The trick is to ship an optimal amount of cargo across the ocean.

      Reply
  6. Awesome! Thanks for sharing your coded procedure! I wasn’t even sure Oracle had all the trig functions built in to do this and you coded the entire procedure.

    Reply
    • Well, Oracle lacks the RADIANS() and DEGREES() functions that are exposed by standard IEEE floating point units. But that’s not a problem, because multiplications can do the trick.

      Reply
    • If you have a table of zipcodes containing long/lat columns, you can, trivially, look up the long / lat for a zipcode in the table. Google’s benefit is their enormous collection of data behind their apis. If you do this in your own DBMS you must provide the data.

      Reply
  7. I wanted to thank you. Your post helped me a lot with my project. I was trying to find a way to let my users to find something in their circle, and here you helped me.

    Reply

Leave a Comment