Nov
6
2011

Zipcode distance matching with SQL Server



Comments available as RSS 2.0

I recently needed to add a nearest postcode/zip lookup using SQL Server based on latitude and longitude. This finally gave me a chance to play around with the Geography datatype in SQL Server 2008.

There are various places to source data from. This usually depends how much information you get, but 10 minutes of googling found me 60000 city/state/zip/lat/long records. For reference, my table looks like this:

CREATE TABLE [dbo].[ZipCodes](
	[Zip] [nvarchar](20) NOT NULL,
	[City] [nvarchar](100) NOT NULL,
	[State] [nvarchar](100) NOT NULL,
	[Latitude] [FLOAT] NOT NULL,
	[Longitude] [FLOAT] NOT NULL,
	[Timezone] [INT] NOT NULL,
	[IsDST] [bit] NOT NULL,
 CONSTRAINT [PK_ZIpCodes] PRIMARY KEY CLUSTERED 
(
	[Zip] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]


There are also indexes on city, state, latitude, longitude and latitude/longitude. I’m not an expert with SQL so the last one is just for good measure. This lets me query on any column with half decent results.

Once you’ve got from the CSV or SQL script you found/purchased in to your database you need to add a Geography column.

ALTER TABLE ZipCodes
ADD [GeoLocation] GEOGRAPHY
GO

Then you can build fill the column using the data you have for latitude and longitude:

UPDATE ZipCodes
SET [GeoLocation] = geography::STPointFromText('POINT(' + CAST([Longitude] AS VARCHAR(20)) + ' ' + CAST([Latitude] AS VARCHAR(20)) + ')', 4326)

Now you have to decide whether to build the query using inline SQL in your code or create a stored procedure for it. My instinct says stored procedure but right now inline SQL is fine.

DECLARE @myposition geography;
SELECT @myposition = GeoLocation FROM ZipCodes
WHERE Zip LIKE '10005' ;--OR City LIKE 'New York' OR State LIKE 'NY';  
 
SELECT z.*,z.GeoLocation.STDistance(@myposition) AS Distance FROM ZipCodes z
WHERE z.GeoLocation.STDistance(@myposition) <= 1000
ORDER BY Distance ASC

In this case, the query is built in .NET with certain parts left out depending what the user searched for, but that’s irrelevant. The last part is commented out because you want to get the most precise match, so if you have a zip code, use it. If not you can use the City and State columns.

Zip	City	State	Latitude	Longitude	Timezone	IsDST	GeoLocation					Distance
10005	New York	NY	40.706019	-74.00858	-5	1	0xE6100000010CEE7C3F355E5A4440B8AF03E78C8052C0	0
10270	New York	NY	40.706925	-74.008154	-5	1	0xE6100000010C6DC5FEB27B5A4440F1F44A59868052C0	105.505094488081
10045	New York	NY	40.70859	-74.008687	-5	1	0xE6100000010C09F9A067B35A44406ADE718A8E8052C0	288.849104985383
10271	New York	NY	40.708669	-74.01043	-5	1	0xE6100000010C6D567DAEB65A444038F8C264AA8052C0	336.208294678286
10041	New York	NY	40.703042	-74.01009	-5	1	0xE6100000010C448B6CE7FB594440226C787AA58052C0	356.447222439371
10006	New York	NY	40.707904	-74.01342	-5	1	0xE6100000010C516B9A779C5A44400D71AC8BDB8052C0	457.225354635998
10038	New York	NY	40.709677	-74.00365	-5	1	0xE6100000010C50FC1873D75A444000917EFB3A8052C0	589.364962050259
10048	New York	NY	40.712086	-74.01227	-5	1	0xE6100000010CA5BDC117265B4440696FF085C98052C0	746.071595261018
10250	New York	NY	40.709387	-74.016529	-5	1	0xE6100000010C26E4839ECD5A4440931804560E8152C0	766.967118714565
10279	New York	NY	40.713062	-74.00857	-5	1	0xE6100000010C88635DDC465B4440B8AF03E78C8052C0	788.442916610519
10280	New York	NY	40.707467	-74.0178	-5	1	0xE6100000010CC3F5285C8F5A44409A779CA2238152C0	795.10810871016
10286	New York	NY	40.714231	-74.011895	-5	1	0xE6100000010CCF66D5E76A5B4440A2B437F8C28052C0	952.338009037327
10007	New York	NY	40.714754	-74.00721	-5	1	0xE6100000010C2497FF907E5B4440FF21FDF6758052C0	984.36004707576

These all seem roughly around the same place and are ordered by distance, but what do I know, I’ve never been to New York!

I’ve found it quicker to use a temporary table when doing the actual SELECT, ie:

DECLARE @miles [FLOAT] = 1;
DECLARE @myposition geography;
SELECT @myposition = GeoLocation FROM ZipCodes
WHERE Zip LIKE '11204';
--WHERE City LIKE 'Zionville';-- OR State LIKE 'NY';
 
SELECT z.*,z.GeoLocation.STDistance(@myposition) AS Distance INTO #HASHTABLE FROM ZipCodes z
WHERE z.GeoLocation.STDistance(@myposition) <= (@miles*1609.344)
ORDER BY Distance ASC
 
SELECT z.[Latitude], z.[Longitude], u.*, z.GeoLocation.STDistance(@myposition) AS Distance
FROM Users u INNER JOIN ZipCodes z ON u.PostCode = z.Zip
WHERE u.PostCode IN (SELECT Zip FROM #hashtable) OR u.City IN (SELECT City FROM #hashtable) 
DROP TABLE #Hashtable

Comments

Leave a Comment

Login using OpenID or enter your details below to leave a comment.

OpenID
Anonymous


Comment