Publicly Available Datasets

Sometimes, learning to use a data system like MySQL means you need to get your hands on some real, publicly available data.

Here are some sources.

ProPublica, the investigative news powerhouse, has a Data Store. Its material is mostly related to health care. Not all of their datasets are free, but some are.

If you’re interested in historical meteorological observations (weather data), the Iowa State University Agronomy Department has assembled a fantastic collection of data from airports and other observation stations worldwide.

There are plenty of sources for US zipcodes (postcodes to the rest of you out there on the globe). I have prepared one on this web site by combining 2010 US census data with a freely available zip table. It is here: US Zip Code Data. Unzip it and load it into your SQL database.
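If you'd like to script the "unzip and load" step, here is a minimal sketch in Python. It assumes the archive contains at least one CSV file with a header row, and it uses SQLite (from Python's standard library) as a stand-in for MySQL. The function name `load_zipped_csv` and the table name are hypothetical, not part of the downloadable data.

```python
import csv
import io
import sqlite3
import zipfile


def quote_ident(name):
    # Double-quote an SQL identifier so header names with spaces or odd characters still work.
    return '"' + name.replace('"', '""') + '"'


def load_zipped_csv(zip_file, conn, table):
    """Load the first .csv member of a zip archive into a new SQLite table.

    Assumes (hypothetically) that the CSV has a header row. Every column is
    stored as TEXT, which keeps leading zeros in codes such as 01001.
    Returns the number of rows in the new table.
    """
    with zipfile.ZipFile(zip_file) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not names:
            raise ValueError("no .csv file found in the archive")
        with archive.open(names[0]) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
            header = next(reader)
            columns = ", ".join(f"{quote_ident(h)} TEXT" for h in header)
            conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
            placeholders = ", ".join("?" for _ in header)
            # Stream the remaining CSV rows straight into the table.
            conn.executemany(
                f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", reader
            )
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {quote_ident(table)}").fetchone()[0]
```

Usage might look like `load_zipped_csv("zipcodes.zip", sqlite3.connect("zips.db"), "zipcode")`, where both file names are placeholders. If you are loading into MySQL itself, its `LOAD DATA INFILE` statement does the same job once the file is unzipped.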

Here’s one. Here’s another, free for personal or educational use. It’s not hard to find similar tables on the intertubes for postcodes in various nations around the world. (Note: strictly speaking, postcodes and zipcodes don’t define geographical areas. They’re designed to ease the task of sorting postal mail. Still, most people think of them as coded place names, and they work fairly well for that.)

The US Census Bureau offers lots of downloadable data about places and populations, for example here. They also offer tables of the most popular surnames and given names from the 1990 census, here.

The US Government offers a collection of open data here. The US Bureau of Transportation Statistics publishes lots of information here.

MaxMind offers a collection of data for mapping Internet Protocol addresses to geolocations here.
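Tables like this typically map blocks of addresses (written as CIDR networks) to locations, so a lookup means finding the block that contains a given address. Here is a minimal Python sketch of that idea, assuming IPv4 only and non-overlapping blocks. The sample rows use documentation-only address ranges with made-up labels; they are not MaxMind data, and MaxMind's actual file layouts should be checked against their own documentation.

```python
import bisect
import ipaddress

# Hypothetical sample rows: (CIDR network, location label).
SAMPLE_BLOCKS = [
    ("192.0.2.0/24", "Example City A"),
    ("198.51.100.0/25", "Example City B"),
    ("203.0.113.0/24", "Example City C"),
]


def build_index(blocks):
    """Convert (cidr, label) pairs into integer ranges sorted by start address.

    Assumes IPv4 blocks that do not overlap.
    """
    rows = []
    for cidr, label in blocks:
        net = ipaddress.IPv4Network(cidr)
        rows.append((int(net.network_address), int(net.broadcast_address), label))
    rows.sort()
    starts = [r[0] for r in rows]
    return starts, rows


def lookup(index, address):
    """Return the label of the block containing address, or None if no block matches."""
    starts, rows = index
    ip = int(ipaddress.IPv4Address(address))
    # Binary search for the last block whose first address is <= ip.
    i = bisect.bisect_right(starts, ip) - 1
    if i >= 0 and ip <= rows[i][1]:
        return rows[i][2]
    return None
```

For example, `lookup(build_index(SAMPLE_BLOCKS), "198.51.100.7")` returns `"Example City B"`. The same idea carries over to SQL: store each block's first and last address as integers (MySQL's `INET_ATON` function converts dotted-quad IPv4 text to an integer) and query with `BETWEEN`.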

OpenRefine, which began life at Google as Google Refine, is a fantastic open-source tool for scrubbing and analyzing messy data. Unlike many Google tools, it runs on your own computer and doesn’t require you to upload your dataset to a Google-owned server.
