Tag: Python

Bring Blogger images into WordPress, the hard way

You migrate from Blogger to self-hosted WordPress. Your posts move over just fine, but for some reason (or another) your images forget their bus pass. Those stupid cat, hastily-prepared food, and trying-to-make-people-think-you-are-wealthy-instead-of-deep-in-debt vacation photos still show on the new site because they are properly referenced in the posts, but they actually remain on Google’s servers. You (or your client) don’t like that.

Meanwhile, the two plugins you found to solve this problem, Archive Remote Images and Cache Images, haven’t been updated in years. You take your chances anyway because you are lazy (if it is a personal site), or consistently over-promise and under-deliver (due to the impossibility of getting real work done at coffee shops). Either way, you must now hope you made a full site and database backup beforehand. If you did, your solution is now staring you in the face.

The script I concocted (shown after the jump) will get you a folder full of those images – with clean and pretty naming conventions – that you can upload to your wp-content directory, along with a SQL script to update links in your WordPress posts. Said dirty hack is written in Python – debugged using version 3.5.2 Anaconda custom (x86_64) on macOS 10.12.3 to be precise – and does rely on some SQL prep work. If you do not know Python, SQL and how to navigate directories while a terminal prompt blinks back, you have two choices: Google it (after determining what the definition of “it” is), or inquire about retaining me to do your work for you.

I’ll make the decision whether to continue easy too; if you cannot execute the following block of code sans assistance you are officially deemed “without paddle” …

SELECT * FROM `wp_posts` WHERE `post_content` LIKE "%blogspot%"
INTO OUTFILE '/home/dump/blogspotposts.csv';

That look easy? Then proceed.

First, decide whether to run it on your desktop (for future upload) or directly on the server. Next, create a directory called /bspics underneath wherever the script is located. Lastly, make sure the directory the code is in is writable by all.
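If you would rather let Python handle that directory chore, here is a minimal sketch. It is not part of processblogspotimagelinks.py; the function name prepare_image_dir is made up for illustration, and it only checks that the current user can write to the folder rather than opening it up to everyone.

```python
import os
from pathlib import Path


def prepare_image_dir(script_dir, name="bspics"):
    """Create <script_dir>/<name> if it is missing and confirm we can write to it.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    target = Path(script_dir) / name
    # exist_ok=True makes this safe to run more than once.
    target.mkdir(parents=True, exist_ok=True)
    if not os.access(target, os.W_OK):
        raise PermissionError(f"{target} is not writable by the current user")
    return target


# Typical call from inside a script, using the script's own location:
# image_dir = prepare_image_dir(Path(__file__).resolve().parent)
```

Failing early on a read-only folder beats finding out halfway through a long download run.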

The code can be found here -> processblogspotimagelinks.py

Once you have changed the obvious stuff to suit your need, run it. Your /bspics directory will fill up with those images I promised – you can then place that entire directory underneath /wp-content – and you’ll also have a file called bsreplacescript.sql which you will run against your WordPress database to update image links in the associated posts.
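For the curious, the general idea can be sketched in a few lines. To be clear, this is not the linked script, just an illustration under assumptions: the regular expression, the function names (find_image_urls, clean_filename, build_update) and the /wp-content/bspics target path are all mine, and the downloading step is left out entirely.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Assumption: images live on some blogspot.com subdomain and end in a
# common image extension. Real data may need a looser pattern.
IMG_URL_RE = re.compile(
    r"https?://[\w.-]*blogspot\.com/[^\s\"'<>]+?\.(?:jpe?g|png|gif)",
    re.IGNORECASE,
)


def find_image_urls(post_content):
    """Return unique blogspot image URLs in order of first appearance."""
    seen = []
    for url in IMG_URL_RE.findall(post_content):
        if url not in seen:
            seen.append(url)
    return seen


def clean_filename(url):
    """Lowercase the file name and turn runs of odd characters into hyphens."""
    name = unquote(PurePosixPath(urlsplit(url).path).name).lower()
    stem, dot, ext = name.rpartition(".")
    stem = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    return f"{stem}{dot}{ext}"


def sql_escape(value):
    """Escape backslashes and single quotes for a MySQL string literal."""
    return value.replace("\\", "\\\\").replace("'", "''")


def build_update(url, new_base="/wp-content/bspics"):
    """Build one UPDATE statement that swaps the old URL for the local path."""
    new_url = f"{new_base}/{clean_filename(url)}"
    return (
        "UPDATE `wp_posts` SET `post_content` = "
        f"REPLACE(`post_content`, '{sql_escape(url)}', '{sql_escape(new_url)}');"
    )
```

Feed each post's content to find_image_urls, write one build_update line per URL into a .sql file, and you have the rough shape of a replace script. Note that this sketch does nothing about two different URLs that clean down to the same file name (a full-size image and its thumbnail, say), so a real run would need to handle those collisions.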

Important [final] note: the coding was an iterative process, and some data analysis was done between steps in order to account for string possibilities encountered, generating clean file names, etc. It could be refactored, but wasn’t because 1) the end result works as intended and 2) removing those iterations would handicap attempts to modify it for a different data set.

MG signing off (to solve some not-so-commonplace problems)

Learn Python, Faster

The premise for picking up the language is a melting pot of tangential reasoning, including having a shiny new Macbook Pro to fiddle with (yes, old reliable is gone), and that Coursera was conveniently offering a beginner’s class in the language. We can feasibly toss in that R has physical memory limitations when it comes to dataset size – not that yours truly will ever hit that ceiling, but this IS the excuses part of the story after all – and that there are plenty of data analysis tools available for Python, without such constraints. Finally, the fish barely bite in the winter (plus I need new waders), and every time I think about hitting a bucket of balls it starts snowing. I reiterate – plenty of excuses.

Inside of two weeks I was learning a bit of Python, but seeing as daylight saving time is still a dream it was easy to get ahead of the pre-defined pace of Coursera’s Programming for Everyone. So I started reading the related textbook Python for Informatics (PDF download) and doing the exercises within, figuring I could just fill in the blanks as the assignments came due. High quality, engaging material.

Passed chapter ten, where the related course actually ends, but when I attempted to use Python to replicate a historical trust transactions analysis I’d previously done in R, failing at the second if/elif block, I realized I still knew jack squat about Python. Almost dumb.

Long story [cut] short, Codecademy wound up the perfect solution for getting the skills tuned towards par while impatience loomed mightily nearby. Knocked off their entire Python segment in just a few evenings and found it well worth the time. If a blitzkrieg pace can be considered time.


Even Redmond Is Liking Open Source…Finally!

I’ve knocked Microsoft more than a few times, but unlike others, it isn’t out of some kind of “hatred.” It’s more being dumbfounded that they haven’t seen the writing on the wall – open yourselves up and the money will still come. Slow to move, and now struggling to find themselves once again.

On that note, it is with great pleasure that I found Microsoft may finally be getting a clue. Their development tools have always been cake to use, and adding a UNIX-based open source scripting language to the mix ain’t going to hurt them.