
Michael Gracie

Traveling Sidekick

Yours truly likes traveling light. There’s method to the madness too: during college I dropped Thermodynamics 101 because I knew I couldn’t pass it, but in order to gain some credits that same semester I picked up Tax Accounting for Surreptitious Offshore Trusts Based in Tropical Locales. I wound up graduating with a degree in Aerospace Engineering … passing the CPA exam in 15 minutes; followed that up with a masters in Bimini Ring Game Double Hauling While Hung Over, and the rest is history. I long to reconfigure baggage because I’m not certified to launch rockets.

Further, sometimes you gotta play it safe. Even if you build systems with brick shit-house parts, a client might claim something isn’t working because they live in Colorado hence they are stoned to bejesus … they are part of an email circulation list that includes Windows/Outlook users infected with nasty malware … the tuck pointing didn’t dry in time.

Enter stage left, the Acer Chromebook c670 …


I’ve intrepidly sought out a reason to charge down the Boot-Ubuntu-Linux-Via-USB path, but have been unsuccessful. The screen can’t touch a Retina display and there is no way to test your latest Python-based machine learning algorithm with it, but you can’t stuff three grand worth of Macbook Pro down the back of your pants while the gate agent for an overbooked international flight is meticulously counting your “personal items” either.

In other words … the little puppy, which will set the average US citizen back a whopping $200, is one superior, high-value, remote work tool.

MG signing off (because the Chromebook is stark white while my attitude is decidedly dark – ultra-sharp contrast is the result)

UPDATE: And prices are dropping.

Learn Python, Faster

The premise for picking up the language is a melting pot of tangential reasoning, including having a shiny new Macbook Pro to fiddle with (yes, old reliable is gone), and that Coursera was conveniently offering a beginner’s class in the language. We can feasibly toss in that R has physical memory limitations when it comes to dataset size – not that yours truly will ever hit that ceiling, but this IS the excuses part of the story after all – and that there are plenty of data analysis tools available for Python, without such constraints. Finally, the fish barely bite in the winter (plus I need new waders), and every time I think about hitting a bucket of balls it starts snowing. I reiterate – plenty of excuses.

Inside of two weeks I was learning a bit of Python, but seeing as daylight saving time is still a dream it was easy to get ahead of the pre-defined pace of Coursera’s Programming for Everyone. So I started reading the related textbook Python for Informatics (PDF download) and doing the exercises within, figuring I could just fill in the blanks as the assignments came due. High quality, engaging material.

Passed chapter ten, where the related course actually ends, but when I attempted to use Python to model all protein-coded genes in the human body against cancer incidents … replicate a historical trust transactions analysis I’d previously done in R, failing at the twelfth … sixth … second if/elif block, I realized I still knew jack squat about Python. Almost dumb.

Long story [cut] short, Codecademy wound up the perfect solution for getting the skills tuned towards par while impatience loomed mightily nearby. Knocked off their entire Python segment in just a few evenings and found it well worth the time. If a blitzkrieg pace can be considered time.

(more…)

Coursera Data Science Specialization: A Student’s Review

Coursera is a pure play online education provider distributing classes in a wide variety of subjects, from The Music of the Beatles to Analyzing Global Trends for Business and Society. Many courses are offered in the native languages of those who developed them, such as Peking University’s Methodologies in Social Research, while others have been translated for more widespread use – see Yale’s Financial Markets instructed by Prof. Robert Shiller (I’m a fan).

Part of the Massive Open Online Course (or “MOOC”) site’s push is linking courses developed by accredited higher-learning institutions into specializations, series of classes designed to develop skills in a particular field. There are seven specializations as of this writing, and the one I dove into is called Data Science.

Made up of nine segments created by Johns Hopkins University’s Bloomberg School of Public Health – taught by Professor Brian Caffo and Assistant Professors Roger Peng and Jeff Leek – it’s a half-year commitment for Energizer bunnies with a math/programming bent, and probably twelve to eighteen months if right this moment you’re distracted by your Instagram feed. Ok, maybe twenty-four to thirty-six.

Yours truly took the fast track, doubling and tripling up on classes at the outset, leaving the purportedly hard stuff for the windup to winter solstice. What follows is a summary of each class, including a comparison to what was “sold”, tips for getting the most from them (i.e. scoring well and then some), and supplementary materials I discovered that turned out worthwhile. It’s the truth within, from a dedicated student’s point of view, and a long road. So feel free to skip to the conclusions; just don’t make fun of my grades.

(more…)

Get another wax job

If you already concocted a lifetime supply of fly floatant and still have a truckload of paraffin wax left over, here’s another use for it: fire starters.

Pack cut-up egg cartons (the paper kind, NOT the styrofoam ones) with dryer lint, then submerge them in wax melted via the double-boiler method until the bubbles quit bubblin’. Set to dry – they’ll be hard as rocks when they are – and then chop to suit depending on the prevalence of pyromania on your mind.


Burn forever, kinda like tax legislation. Credit goes to Nate O’ Taylor, a.k.a. Doctor MacGyver, for the instructions.

MG signing off (to sit by a warm fire because someone stole my Gator Snuggie)

DataCamp is roast tenderloin for the brain

Unless you’ve been living under a rock, you’ve probably heard the term big data. Yes, there’s a lot of bits and bytes out there, created not only by the sheer prevalence of tweets about microwaveable mac n’ cheese (and cats … lots of cats), but also via trivial … technological advances in the areas of computer security, genome sequencing, and even what bar you’re drinking at … geo-location. With all these haystacks came the desire for finding needles; as a result number crunching has undeniably experienced a renaissance.

I became curious about what it all meant. But seeing as I was a “B” student (on a good day) when it came to sample sizes and p-values – preferring debits, credits and present value calculations to wondering why anyone would make decisions based on a measurement that sounded like a hot drink (squared) – the whiskered was summarily sent to its demise.

Last summer I signed on for a nine-course regimen in this data science, offered by Johns Hopkins University in conjunction with Coursera, fully expecting it to be a pile of mumbo jumbo I could whiz through before year’s end. A comprehensive review of those courses is planned for publication here in early 2015, and I can state with 95% confidence that I’ll meet the hard deadline. However, I lied; I was really a “C” student in regression and the like (when I even went to class), so the series subject matter was, in several cases both proverbially and actually, more than I bargained for.

To weed through the mess, truly comprehend the fundamentals, and find some nuggets of practicality within required an investment significantly in excess of my original, cocksure estimate. In addition to the acquisition of several texts in statistical inference, modeling and prediction, I also found some excellent online resources to assist in the cause. One of those was DataCamp, an R-programming oriented teaching tool which turned out to be a savior during the work. Real learn-as-you-do material.


So without further ado – that is, barring failure to recall that less than one standard deviation from the mean does not a null hypothesis rejection make (which I might) – I give the site two big thumbs up.
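For the record, the standard deviation quip holds up to arithmetic. Here is a rough Python sketch, assuming a test statistic that follows a standard normal distribution and the customary 0.05 cutoff (both are assumptions for illustration, as is the helper name two_sided_p_value):

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a z-score, assuming a standard normal statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


# Compare a few z-scores against an assumed 0.05 significance cutoff.
for z in (0.8, 1.0, 1.96):
    p = two_sided_p_value(z)
    verdict = "reject" if p < 0.05 else "fail to reject"
    print(f"z = {z:.2f}  p = {p:.3f}  {verdict} the null")
```

At exactly one standard deviation the p-value comes out near 0.32, nowhere close to the cutoff, so anything under one standard deviation leaves the null hypothesis standing.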

MG signing off (because he still can’t simulate a random normal distribution with R, but he fakes it like a champ)

Owned

The truck is highly dependable. That fly rod casts smooth as butter. Those boots can be worn all day. That 1911 is not off-the-shelf. Those wool sweaters are warm. All Craftsman hand tools.

Charles Hugh-Smith poses the question:

Being freed from being owned is a form of liberation with many manifestations.

The frenzied acquisition of more stuff is supposed to be an unalloyed good: good for “growth,” good for the consumer who presumably benefits from more stuff and good for governments collecting taxes on the purchase of all the stuff.

But the frenzy to acquire more stuff raises a question: do we own our stuff, or does our stuff own us?

MG signing off (to look for a shop that sells time)

The surefire cure for clicking hard drives

When the hard drive clicking began, I started looking at replacement machines. But after the upgrades Old Reliable has endured, I figured why send it to the glue factory now?

Four and a half hours to clone, with the fabulous Carbon Copy Cloner, and four and a half minutes to install.

solid state drive

MG signing off (because the clicking sound is gone, along with most all other laptop noise too)

Plugging mcrypt into PHP, on Mac OS X Yosemite 10.10

Mavericks rode in on Mountain Lions, but the park ranger at Yosemite wouldn’t let them in. “We need mcrypt,” they said, “and we don’t want to recompile PHP!” The ranger handed them a map, and it led them here.

The following instructions are for those who a) are developing on OS X Yosemite 10.10.X, b) need the capabilities provided by mcrypt during their PHP development (like using phpMyAdmin or Magento eCommerce), and c) do not want to recompile PHP from scratch or run MAMP. Mcrypt will get loaded dynamically with PHP by following these instructions.

Before you begin, grab the following bits …

1) libmcrypt-2.5.8, which you can find here; download libmcrypt (not mcrypt!), by clicking the link labeled “libmcrypt-2.5.8.tar.gz” on the SourceForge page;

2) PHP 5.5.14 source code, which is available here; NOTE – a later OS X update may bring a newer PHP along with it. The author was running OS X 10.10, and PHP 5.5.14 is what’s included with that particular OS version; if necessary use phpinfo() to check your version of PHP and then download the PHP source for that version;

3) Xcode 6.1, which you can get from the App Store. You will also need the Command Line Tools (OS X 10.10) for Xcode, which you get by selecting “Xcode/Open Developer Tool/More Developer Tools..” from the Xcode menu, then logging into your Apple Developer account (yes, you need one of those too);

and

4) Homebrew (http://mxcl.github.com/homebrew/) which can be installed by typing ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" at the command line.
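Item 2 hinges on knowing which PHP version the machine is actually running. A minimal Python sketch of that check follows, assuming a php binary is on the PATH. It reads the command-line output of php -v and php -m rather than calling phpinfo(), and the helper names (parse_php_version, installed_php_version, mcrypt_loaded) are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_php_version(banner):
    """Pull the x.y.z version number out of `php -v` output, or return None."""
    match = re.search(r"PHP (\d+\.\d+\.\d+)", banner)
    return match.group(1) if match else None


def installed_php_version():
    """Return the local PHP version, or None if no php binary is on the PATH."""
    if shutil.which("php") is None:
        return None
    result = subprocess.run(["php", "-v"], capture_output=True, text=True)
    return parse_php_version(result.stdout)


def mcrypt_loaded():
    """True if `php -m` lists mcrypt, False if it does not, None without php."""
    if shutil.which("php") is None:
        return None
    result = subprocess.run(["php", "-m"], capture_output=True, text=True)
    modules = {line.strip().lower() for line in result.stdout.splitlines()}
    return "mcrypt" in modules


if __name__ == "__main__":
    print("PHP version:", installed_php_version())
    print("mcrypt loaded:", mcrypt_loaded())
```

Run it once before starting to confirm which PHP source to download, and again after the install; the second line should read True once mcrypt is loading.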

Now for the down and dirty…

(more…)

Reproducibility is Rigor, not Popular

rig·or
noun
the quality of being extremely thorough, exhaustive, or accurate.
“his analysis is lacking in rigor”

As Professor Jeff Leek of Johns Hopkins University points out:

Both the scientific community and the popular press are freaking out about reproducibility right now. I think they have good reason to, because even the US Congress is now investigating the transparency of science. It has been driven by the very public reproducibility disasters in genomics and economics.

What is reproducibility? In the simplest sense, it is the inherent “ability” of any information created via data analysis to stand up to scrutiny. Not just perusal, cursory acknowledgment, wink and nod, but a detailed re-analysis, preferably in precisely the same manner that the original information was derived.

(more…)

Fin-angler

Seeking assistance with data.table functions via the R help files, and instead [somehow] being reminded that brown trouts will soon be hitting the dance floor …

[screenshot: reminder]

First read as fin angler. Then the double-take.

MG signing off (knowing the trouts will be fast; but probably none too friendly … like R)