Tag: statistics

Coursera Data Science Specialization: A Student’s Review

Coursera is a pure-play online education provider distributing classes in a wide variety of subjects, from The Music of the Beatles to Analyzing Global Trends for Business and Society. Many courses are offered in the native languages of those who developed them, such as Peking University’s Methodologies in Social Research, while others have been translated for more widespread use – see Yale’s Financial Markets instructed by Prof. Robert Shiller (I’m a fan).

Part of the Massive Open Online Course (or “MOOC”) site’s push is linking courses developed by accredited higher-learning institutions into specializations, series of classes designed to develop skills in a particular field. There are seven specializations as of this writing, and the one I dove into is called Data Science.

Made up of nine segments created by Johns Hopkins University’s Bloomberg School of Public Health – taught by Professor Brian Caffo and Assistant Professors Roger Peng and Jeff Leek – it’s a half-year commitment for Energizer bunnies with a math/programming bent, and probably twelve to eighteen months if right this moment you’re distracted by your Instagram feed. Ok, maybe twenty-four to thirty-six.

Yours truly took the fast track, doubling and tripling up on classes at the outset, leaving the purportedly hard stuff for the windup to winter solstice. What follows is a summary of each class, including a comparison to what was “sold”, tips for getting the most from them (i.e., scoring well and then some), and the supplementary materials I discovered that turned out to be worthwhile. It’s the truth within from a dedicated student’s point of view, and a long road. So feel free to skip to the conclusions; just don’t make fun of my grades.


DataCamp is roast tenderloin for the brain

Unless you’ve been living under a rock, you’ve probably heard the term big data. Yes, there are a lot of bits and bytes out there, created not only by the sheer prevalence of tweets about microwaveable mac n’ cheese (and cats … lots of cats), but also via trivial technological advances in the areas of computer security, genome sequencing, and even what-bar-you’re-drinking-at geo-location. With all these haystacks came the desire for finding needles; as a result, number crunching has undeniably experienced a renaissance.

I became curious about what it all meant. But seeing as I was a “B” student (on a good day) when it came to sample sizes and p-values – preferring debits, credits and present value calculations to wondering why anyone would make decisions based on a measurement that sounded like a hot drink (squared) – the whiskered was summarily sent to its demise.

Last summer I signed on for a nine-course regimen in this data science, offered by Johns Hopkins University in conjunction with Coursera, fully expecting it to be a pile of mumbo jumbo I could whiz through before year’s end. A comprehensive review of those courses is planned for publication here in early 2015, and I can state with 95% confidence that I’ll meet the hard deadline. However, I lied; I was really a “C” student in regression and the like (when I even went to class), so the series subject matter was, in several cases both proverbially and actually, more than I bargained for.

To weed through the mess, truly comprehend the fundamentals, and find some nuggets of practicality within required an investment significantly in excess of my original, cocksure estimate. In addition to acquiring several texts on statistical inference, modeling and prediction, I also found some excellent online resources to assist in the cause. One of those was DataCamp, an R-programming-oriented teaching tool which turned out to be a savior during the work. Real learn-as-you-do material.


So without further ado – that is, barring failure to recall that less than one standard deviation from the mean does not a null hypothesis rejection make (which I might) – I give the site two big thumbs up.

MG signing off (because he still can’t simulate a random normal distribution with R, but he fakes it like a champ)

More Whisky Geekery

From Luba Gloukhov of Revolution Analytics

The first time I had an Islay single malt, my mind was blown. In my first foray into the world of whiskies, I took the plunge into the smokiest, peatiest beast of them all: Laphroaig. That same night, dreams of owning a smoker were replaced by the desire to roam the landscape of smoky single malts.

As an Islay fan, I wanted to investigate whether distilleries within a given region do in fact share taste characteristics. For this, I used a dataset profiling 86 distilleries based on 12 flavor categories.

It gets all down and dirty with the number crunching after that, but in the grand scheme of using R there’s always a cool plot of data to be had …

Clusters of single malt taste


Peruse the whole thing if you like Scotch. And, if you are partial to statistics, regardless of the data set, and don’t have the budget for SPSS, might I suggest following this tutorial, which will get R, RStudio, and related toolsets ready for work in a jiffy.

And for the manufacturer’s suggested retail price of $0. Now if you could only get a bottle for that.

MG signing off (to sip some Scotch and crunch some numbers, just not necessarily in that order)

World Wide Whiskey

India consumes about half the world’s whiskey, but the country also has a massive population. They stick to the homegrown, much as the US does …


Plenty of fun facts on whiskey to be found here, much of it in nifty graphical presentations. Comprehension is subject to the visual acuity of the viewer, so don’t get started drinking until afterwards.

MG signing off (because whiskey and Red Bull is sacrilege, but bottle service is still a hoot)

New Home Sales DID NOT Rise

Just because the Commerce Department said it doesn’t make it true.

PS: I love the way they screw with statistics. Who in their right mind could possibly believe any analysis with a ±10% margin of error?!
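
For the number crunchers, here is a minimal Python sketch of the margin-of-error gripe. The figures are hypothetical and are not taken from the Commerce Department's release, and the function names are made up for illustration. All it does is check whether a reported change, taken together with its margin of error, still leaves room for no change at all.

```python
def change_interval(estimate_pct, margin_pct):
    """Return the (low, high) range implied by an estimate and its margin of error."""
    return (estimate_pct - margin_pct, estimate_pct + margin_pct)


def is_distinguishable_from_zero(estimate_pct, margin_pct):
    """True only if the whole range sits on one side of zero."""
    low, high = change_interval(estimate_pct, margin_pct)
    return low > 0 or high < 0


# Hypothetical numbers for illustration only: a reported 3% rise
# with a margin of error of plus or minus 10 percentage points.
reported_rise = 3.0
margin = 10.0

low, high = change_interval(reported_rise, margin)
print(f"Plausible range: {low:+.1f}% to {high:+.1f}%")
print("Distinguishable from zero:", is_distinguishable_from_zero(reported_rise, margin))
```

With made-up figures like these the range straddles zero, so a "rise" and a "fall" are both consistent with the same data.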

Email use statistics, made simple

CNET had this piece on a recent email survey out of the Bay Area: E-mail and its discontents.

Unfortunately, they miss on a point or two, so Spamroll will help them out.