Publicly Available Datasets

Learning to use data systems like MySQL sometimes means getting your hands on publicly available datasets. Here are some sources. Pro Publica, the investigative news powerhouse, has a Data Store. It’s mostly health-care-related material. Not all their datasets are free, but some are. If you’re interested in …

The Vincenty great-circle distance formula

The Vincenty formula is a more numerically stable version of the spherical cosine law formula (commonly and wrongly known as the Haversine formula) for computing great-circle distances. Numerical stability matters most when the distance between two points is small. In that case the cosine is very close to 1, so the …
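As a rough sketch of the idea, here is the spherical special case of Vincenty’s formula in Python. Python, the function name, and the 6371 km mean Earth radius are assumptions of this example, not taken from the article. Instead of taking acos() of a value near 1, it takes atan2() of two separately computed terms, which keeps precision when the points are close together.

```python
import math

# Assumed mean Earth radius in kilometres; the article may use a different value.
EARTH_RADIUS_KM = 6371.0

def vincenty_distance_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance via the spherical special case of Vincenty's formula.

    Inputs are decimal degrees; the result is in the same units as `radius`.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)

    # y is proportional to the sine of the central angle, x to its cosine.
    y = math.hypot(
        math.cos(phi2) * math.sin(dlam),
        math.cos(phi1) * math.sin(phi2)
        - math.sin(phi1) * math.cos(phi2) * math.cos(dlam),
    )
    x = (math.sin(phi1) * math.sin(phi2)
         + math.cos(phi1) * math.cos(phi2) * math.cos(dlam))

    # atan2 stays well conditioned for both tiny and near-antipodal separations,
    # unlike acos(x) when x is close to 1 or -1.
    return radius * math.atan2(y, x)
```

Calling `vincenty_distance_km(0, 0, 0, 1)` returns the length of one degree of arc on the assumed sphere, about 111.2 km.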

SQL Reporting by time intervals

A version of this article specific to the Oracle DBMS is here. It’s often helpful to use SQL to group information by periods of time. Sales data is a typical case: we might have a table of individual sales transactions like this one. Sales: sales_id int, sales_time datetime, net decimal(7,2), tax …
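A minimal, runnable sketch of grouping by day, assuming Python’s built-in sqlite3 module as a stand-in for MySQL and using only the sales_id, sales_time and net columns listed above. SQLite’s date() truncates a timestamp to its calendar day much as MySQL’s DATE() does. The function name daily_totals and the sample rows are hypothetical.

```python
import sqlite3

def daily_totals(rows):
    """Group sales rows by calendar day; return (day, count, total_net) tuples.

    `rows` is an iterable of (sales_id, sales_time, net) tuples, with
    sales_time as an ISO-8601 'YYYY-MM-DD HH:MM:SS' string.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (sales_id INTEGER PRIMARY KEY, sales_time TEXT, net REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # date() truncates each timestamp to its day; MySQL's DATE() plays the same role.
        query = """
            SELECT date(sales_time) AS sales_day,
                   COUNT(*)         AS transactions,
                   SUM(net)         AS total_net
              FROM sales
             GROUP BY date(sales_time)
             ORDER BY sales_day
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

To group by month instead, the grouping expression could become strftime('%Y-%m', sales_time) in SQLite or DATE_FORMAT(sales_time, '%Y-%m') in MySQL.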

Stored function for haversine distance computation

In another article I described the process of using MySQL to compute great-circle distances between various points on the earth when their latitudes and longitudes are known. Doing this requires the formula commonly called the haversine formula. It’s actually the spherical cosine law formula, and is shown here. There’s a more numerically stable formula: …
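For reference, here is the spherical cosine law as a Python function: a sketch of what such a stored function computes, not the author’s MySQL code. The 6371 km radius and the name cosine_law_distance_km are assumptions. In MySQL the same expression can be built from the built-in RADIANS(), SIN(), COS() and ACOS() functions. The clamp before acos() guards against rounding pushing the cosine just outside [-1, 1].

```python
import math

# Assumed mean Earth radius; 6371 km is a common choice, but the article's value may differ.
EARTH_RADIUS_KM = 6371.0

def cosine_law_distance_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance by the spherical law of cosines (degrees in, km out)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    cos_sigma = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlam))
    # Rounding can push cos_sigma slightly above 1 for nearly identical points,
    # which would make acos() raise ValueError; clamp it into [-1, 1] first.
    cos_sigma = max(-1.0, min(1.0, cos_sigma))
    return radius * math.acos(cos_sigma)
```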

Mean Absolute Deviation

Nassim Taleb wrote a provocative article on Edge.Org calling for the Mean Absolute Deviation to replace the more popular standard deviation as a measure of the variability of a collection of observations. I find his reasoning persuasive, especially his claim that the standard deviation is widely misapplied and misunderstood. MySQL (like many …
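A short Python sketch (the language and the function name are this example’s assumptions) of the mean absolute deviation, shown next to the population standard deviation from the standard library’s statistics module. Because the standard deviation squares each deviation before averaging, large deviations weigh more heavily in it than in the mean absolute deviation.

```python
import statistics

def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the arithmetic mean."""
    data = list(values)
    if not data:
        raise ValueError("mean_absolute_deviation requires at least one value")
    mean = statistics.fmean(data)
    return sum(abs(x - mean) for x in data) / len(data)
```

For the data set 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the mean absolute deviation is 1.5, and statistics.pstdev() gives a population standard deviation of 2.0.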

What’s a date?

What is a date? This seems like a silly question. Indeed, if you are an independent local business person, it is a silly question. A date is, for example, the seventh of September, 2011 (“2011-09-07”). It describes a 24-hour period that starts at midnight and ends just before the next midnight. If you only care …
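The “midnight to just before midnight” description maps naturally onto a half-open interval: the day includes its starting midnight and excludes the next one. This Python sketch (the function names are hypothetical) treats times as naive local clock readings with no time zone, which is the simple case described above.

```python
from datetime import date, datetime, time, timedelta

def day_bounds(d):
    """Return the half-open interval [start, end) covering calendar date `d`.

    `end` is the following midnight, so a timestamp t falls on the day when
    start <= t < end; no "last instant" such as 23:59:59 is needed.
    """
    start = datetime.combine(d, time.min)
    end = start + timedelta(days=1)
    return start, end

def falls_on(t, d):
    """True if naive datetime `t` lies within calendar date `d`."""
    start, end = day_bounds(d)
    return start <= t < end
```

In SQL the same idea becomes a range test of the form t >= '2011-09-07' AND t < '2011-09-08'.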

Processing dates and times in SQL

I’ve spent too many working days figuring out how to handle dates and times in relational database management systems. Judging by the questions on stackoverflow.com, a lot of people have the same kinds of questions and confusions I have. I’m a lazy programmer. I’d much rather do this stuff right than …