Zipcode distance matching with SQL Server
I recently needed to add a nearest postcode/zip lookup using SQL Server based on latitude and longitude. This finally gave me a chance to play around with the Geography datatype in SQL Server 2008.
There are various places to source data from. This usually depends how much information you get, but 10 minutes of googling found me 60000 city/state/zip/lat/long records. For reference, my table looks like this:
CREATE TABLE [dbo].[ZipCodes]( [Zip] [nvarchar](20) NOT NULL, [City] [nvarchar](100) NOT NULL, [State] [nvarchar](100) NOT NULL, [Latitude] [FLOAT] NOT NULL, [Longitude] [FLOAT] NOT NULL, [Timezone] [INT] NOT NULL, [IsDST] [bit] NOT NULL, CONSTRAINT [PK_ZIpCodes] PRIMARY KEY CLUSTERED ( [Zip] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
There are also indexes on city, state, latitude, longitude and latitude/longitude. I’m not an expert with SQL so the last one is just for good measure. This lets me query on any column with half decent results.
Once you’ve got from the CSV or SQL script you found/purchased in to your database you need to add a Geography column.
ALTER TABLE ZipCodes ADD [GeoLocation] GEOGRAPHY GO
Then you can build fill the column using the data you have for latitude and longitude:
UPDATE ZipCodes SET [GeoLocation] = geography::STPointFromText('POINT(' + CAST([Longitude] AS VARCHAR(20)) + ' ' + CAST([Latitude] AS VARCHAR(20)) + ')', 4326)
Now you have to decide whether to build the query using inline SQL in your code or create a stored procedure for it. My instinct says stored procedure but right now inline SQL is fine.
DECLARE @myposition geography; SELECT @myposition = GeoLocation FROM ZipCodes WHERE Zip LIKE '10005' ;--OR City LIKE 'New York' OR State LIKE 'NY'; SELECT z.*,z.GeoLocation.STDistance(@myposition) AS Distance FROM ZipCodes z WHERE z.GeoLocation.STDistance(@myposition) <= 1000 ORDER BY Distance ASC
In this case, the query is built in .NET with certain parts left out depending what the user searched for, but that’s irrelevant. The last part is commented out because you want to get the most precise match, so if you have a zip code, use it. If not you can use the City and State columns.
Zip City State Latitude Longitude Timezone IsDST GeoLocation Distance 10005 New York NY 40.706019 -74.00858 -5 1 0xE6100000010CEE7C3F355E5A4440B8AF03E78C8052C0 0 10270 New York NY 40.706925 -74.008154 -5 1 0xE6100000010C6DC5FEB27B5A4440F1F44A59868052C0 105.505094488081 10045 New York NY 40.70859 -74.008687 -5 1 0xE6100000010C09F9A067B35A44406ADE718A8E8052C0 288.849104985383 10271 New York NY 40.708669 -74.01043 -5 1 0xE6100000010C6D567DAEB65A444038F8C264AA8052C0 336.208294678286 10041 New York NY 40.703042 -74.01009 -5 1 0xE6100000010C448B6CE7FB594440226C787AA58052C0 356.447222439371 10006 New York NY 40.707904 -74.01342 -5 1 0xE6100000010C516B9A779C5A44400D71AC8BDB8052C0 457.225354635998 10038 New York NY 40.709677 -74.00365 -5 1 0xE6100000010C50FC1873D75A444000917EFB3A8052C0 589.364962050259 10048 New York NY 40.712086 -74.01227 -5 1 0xE6100000010CA5BDC117265B4440696FF085C98052C0 746.071595261018 10250 New York NY 40.709387 -74.016529 -5 1 0xE6100000010C26E4839ECD5A4440931804560E8152C0 766.967118714565 10279 New York NY 40.713062 -74.00857 -5 1 0xE6100000010C88635DDC465B4440B8AF03E78C8052C0 788.442916610519 10280 New York NY 40.707467 -74.0178 -5 1 0xE6100000010CC3F5285C8F5A44409A779CA2238152C0 795.10810871016 10286 New York NY 40.714231 -74.011895 -5 1 0xE6100000010CCF66D5E76A5B4440A2B437F8C28052C0 952.338009037327 10007 New York NY 40.714754 -74.00721 -5 1 0xE6100000010C2497FF907E5B4440FF21FDF6758052C0 984.36004707576
These all seem roughly around the same place and are ordered by distance, but what do I know, I’ve never been to New York!
I’ve found it quicker to use a temporary table when doing the actual SELECT, ie:
DECLARE @miles [FLOAT] = 1; DECLARE @myposition geography; SELECT @myposition = GeoLocation FROM ZipCodes WHERE Zip LIKE '11204'; --WHERE City LIKE 'Zionville';-- OR State LIKE 'NY'; SELECT z.*,z.GeoLocation.STDistance(@myposition) AS Distance INTO #HASHTABLE FROM ZipCodes z WHERE z.GeoLocation.STDistance(@myposition) <= (@miles*1609.344) ORDER BY Distance ASC SELECT z.[Latitude], z.[Longitude], u.*, z.GeoLocation.STDistance(@myposition) AS Distance FROM Users u INNER JOIN ZipCodes z ON u.PostCode = z.Zip WHERE u.PostCode IN (SELECT Zip FROM #hashtable) OR u.City IN (SELECT City FROM #hashtable) DROP TABLE #Hashtable


Comments
Leave a Comment