Menu

Michael Gracie

Bring Blogger images into WordPress, the hard way

You migrate from Blogger to self-hosted WordPress. Your posts move over just fine, but for some reason (or another) your images forget their bus pass. Those pornographic stupid cat, hastily-prepared food, and trying-to-make-people-think-you-are-wealthy-instead-of-deep-in-debt vacation photos still show on the new site as they are properly referenced in the posts, but they actually remain on Google’s servers. You (or your client) don’t like that.

Meanwhile, the two plugins you found to solve this problem, Archive Remote Images and Cache Images, haven’t been updated in years. You take your chances anyway because you are lazy (if it is a personal site), or consistently over-promise and under-deliver (due to the impossibility of getting real work done at coffee shops). Either way, you must now hope you made a full site and database backup beforehand. If you did, you’re solution is now staring you in the face.

The script I concocted (shown after the jump) will get you a folder full of those images – with clean and pretty naming conventions – that you can upload to your wp-content directory, along with a SQL script to update links in your WordPress posts. Said programmatic wizardry dirty hack is written in Python – debugged using version 3.5.2 Anaconda custom (x86_64) on macOS 10.12.3 to be precise – and does rely on some SQL prep work. If you do not know Python, SQL and how to navigate directories while a terminal prompt blinks back, you have two choices: Google it (after determining what the definition of “it” is), or inquire about retaining me to do your work for you.

I’ll make the decision whether to continue easy too; if you cannot execute the following block of code sans assistance you are officially deemed “without paddle” …

SELECT * FROM `wp_posts` WHERE `post_content` LIKE "%blogspot%"
INTO OUTFILE '/home/dump/blogspotposts.csv'
FIELDS TERMINATED BY '|'

That look easy? Then proceed.

First, decide whether to run on your desktop (for future upload) or directly on server. Next, create a directory underneath where the script is located called /bspics. Lastly, make sure the directory the code is in is writable by all.

The code can be found here -> processblogspotimagelinks.py

Once you have changed the obvious stuff to suit your need, run it. Your /bspics directory will fill up with those images I promised – you can then place that entire directory underneath /wp-content – and you’ll also have a file called bsreplacescript.sql which you will run against your WordPress database to update image links in the associated posts.

Important [final] note: the coding was an iterative process, and some data analysis was done between steps in order to account for string possibilities encountered, generating clean file names, etc. It could be refactored, but wasn’t because 1) the end result works as intended and 2) removing those iterations would handicap attempts to modify it for a different data set.

MG signing off (to solve some not-so-commonplace problems)