Unless you’ve been living under a rock, you’ve probably heard the term big data. Yes, there are a lot of bits and bytes out there, created not only by the sheer prevalence of tweets about microwaveable mac n’ cheese (and cats … lots of cats), but also via nontrivial technological advances in areas like computer security, genome sequencing, and even geolocation that knows which bar you’re drinking at. With all these haystacks came the desire to find needles; as a result, number crunching has undeniably experienced a renaissance.
I became curious about what it all meant. But seeing as I was a “B” student (on a good day) when it came to sample sizes and p-values – preferring debits, credits, and present-value calculations to wondering why anyone would make decisions based on a measurement that sounded like a hot drink (squared) – that curiosity was summarily sent to its demise.
Last summer I signed on for a nine-course regimen in this data science business, offered by Johns Hopkins University in conjunction with Coursera, fully expecting it to be a pile of mumbo jumbo I could whiz through before year’s end. A comprehensive review of those courses is planned for publication here in early 2015, and I can state with 95% confidence that I’ll meet the hard deadline. However, I lied: I was really a “C” student in regression and the like (when I even went to class), so the series subject matter was, in several cases, both proverbially and actually more than I bargained for.
Weeding through the mess, truly comprehending the fundamentals, and finding some nuggets of practicality within required an investment significantly in excess of my original, cocksure estimate. In addition to acquiring several texts on statistical inference, modeling, and prediction, I also found some excellent online resources to assist in the cause. One of those was DataCamp, an R-programming-oriented teaching tool that turned out to be a savior during the work. Real learn-as-you-do material.
So without further ado – that is, barring failure to recall that less than one standard deviation from the mean does not a null hypothesis rejection make (which I might) – I give the site two big thumbs up.
MG signing off (because he still can’t simulate a random normal distribution with R, but he fakes it like a champ)
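For the record, simulating draws from a normal distribution really is a one-liner in R – `rnorm(100)` – and the one-standard-deviation rule joked about above is easy to check empirically. Here's a minimal sketch of that check in Python with NumPy (used here only so the example is self-contained; the seed and sample size are arbitrary choices, not anything from the courses):

```python
import numpy as np

# Simulate 100 draws from a standard normal distribution.
# The R equivalent is simply: rnorm(100)
rng = np.random.default_rng(seed=42)
draws = rng.normal(loc=0.0, scale=1.0, size=100)

# For a standard normal, roughly 68% of draws should land
# within one standard deviation of the mean -- being inside
# that band is the opposite of surprising, which is why it
# doesn't get you a null hypothesis rejection.
within_one_sd = np.mean(np.abs(draws) < 1.0)
print(round(float(within_one_sd), 2))  # should hover around 0.68
```

With only 100 draws the fraction bounces around a bit; crank the sample size up and it settles toward the textbook 68%.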