Tag: SQL

Bring Blogger images into WordPress, the hard way

You migrate from Blogger to self-hosted WordPress. Your posts move over just fine, but for some reason (or another) your images forget their bus pass. Those stupid cat, hastily-prepared food, and trying-to-make-people-think-you-are-wealthy-instead-of-deep-in-debt vacation photos still show on the new site because they are properly referenced in the posts, but they actually remain on Google’s servers. You (or your client) don’t like that.

Meanwhile, the two plugins you found to solve this problem, Archive Remote Images and Cache Images, haven’t been updated in years. You take your chances anyway because you are lazy (if it is a personal site), or consistently over-promise and under-deliver (due to the impossibility of getting real work done at coffee shops). Either way, you must now hope you made a full site and database backup beforehand. If you did, your solution is now staring you in the face.

The script I concocted (shown after the jump) will get you a folder full of those images – with clean and pretty naming conventions – that you can upload to your wp-content directory, along with a SQL script to update links in your WordPress posts. Said programmatic wizardry (read: dirty hack) is written in Python – debugged using version 3.5.2 Anaconda custom (x86_64) on macOS 10.12.3 to be precise – and does rely on some SQL prep work. If you do not know Python, SQL, and how to navigate directories while a terminal prompt blinks back, you have two choices: Google it (after determining what the definition of “it” is), or inquire about retaining me to do your work for you.

I’ll make the decision about whether to continue easy, too: if you cannot execute the following block of code sans assistance, you are officially deemed “without paddle” …

SELECT * FROM `wp_posts` WHERE `post_content` LIKE '%blogspot%'
INTO OUTFILE '/home/dump/blogspotposts.csv';

That look easy? Then proceed.

First, decide whether to run the script on your desktop (for future upload) or directly on the server. Next, create a directory called /bspics underneath the one where the script is located. Lastly, make sure the directory the code is in is writable by all.

The code can be found here -> processblogspotimagelinks.py

Once you have changed the obvious stuff to suit your needs, run it. Your /bspics directory will fill up with those images I promised – you can then place that entire directory underneath /wp-content – and you’ll also have a file called bsreplacescript.sql, which you will run against your WordPress database to update image links in the associated posts.

Important [final] note: the coding was an iterative process, and some data analysis was done between steps in order to account for string possibilities encountered, generating clean file names, etc. It could be refactored, but wasn’t because 1) the end result works as intended and 2) removing those iterations would handicap attempts to modify it for a different data set.
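For anyone who wants a feel for the general approach before opening the script, here is a minimal sketch of the idea. It is not the code in processblogspotimagelinks.py. It assumes image links look like http://1.bp.blogspot.com/.../name.jpg, and every function name, the regular expression, and the /wp-content/bspics/ target path are my own illustrative choices.

```python
import os
import re
import urllib.request
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Assumption: Blogger images are served from *.blogspot.com and end in a
# common image extension. Adjust after inspecting your own data.
BLOGSPOT_IMG = re.compile(
    r'https?://[\w.-]*blogspot\.com/[^\s"\'<>)]+\.(?:jpe?g|png|gif)',
    re.IGNORECASE,
)


def clean_name(url):
    """Derive a lowercase, hyphenated file name from an image URL."""
    name = unquote(PurePosixPath(urlparse(url).path).name).lower()
    stem, _, ext = name.rpartition(".")
    stem = re.sub(r"[^a-z0-9]+", "-", stem).strip("-") or "image"
    return f"{stem}.{ext}"


def build_replacements(post_content, new_base="/wp-content/bspics/"):
    """Map each distinct Blogger image URL in a post to a local URL."""
    mapping = {}
    for url in BLOGSPOT_IMG.findall(post_content):
        mapping.setdefault(url, new_base + clean_name(url))
    return mapping


def sql_for(mapping, table="wp_posts"):
    """Build one UPDATE statement per URL, doubling any single quotes."""
    statements = []
    for old, new in mapping.items():
        old_q, new_q = old.replace("'", "''"), new.replace("'", "''")
        statements.append(
            f"UPDATE `{table}` SET `post_content` = "
            f"REPLACE(`post_content`, '{old_q}', '{new_q}');"
        )
    return statements


def download(mapping, dest_dir="bspics"):
    """Fetch every image into dest_dir (needs network access)."""
    os.makedirs(dest_dir, exist_ok=True)
    for old, new in mapping.items():
        target = os.path.join(dest_dir, os.path.basename(new))
        urllib.request.urlretrieve(old, target)
```

Feed build_replacements the post_content of each exported row, pass the result to download, and write the output of sql_for to a .sql file. Note that this sketch does nothing about two different URLs that clean up to the same file name (a thumbnail and its full-size original, say), which is exactly the sort of thing you only find by analyzing your own data between steps.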

MG signing off (to solve some not-so-commonplace problems)

Your site seems safe, but your database isn’t

Your website may be safe from all the cross-site and cross-frame scripting attacks that the kiddies throw at you, but that doesn’t mean you are in the clear. Real hackers (meaning those over the age of 16, with degrees in computer science and an empty fridge that needs money to fill) are trying to get at your database, and with rising frequency.

Now you can wait for your database provider to issue a seemingly never-ending series of patches (and get the same warm, fuzzy feeling as being a PC user), pull out the dusty manuals (dusty since your app has been doing all the queries for you and you forgot what an SQL command was) and fix it yourself, or shift data collection to another provider (which may be less costly at the margin line than spending the time on the first two options).

Your call.

Follow up on the Bit Torrent thing

So I am mulling this real-time streaming data concept when I come across this article from Forbes: Data On The Fly. It talks about Michael Stonebraker of Ingres and Postgres fame, who started a new company called StreamBase. This outfit has produced a derivative of SQL they call StreamSQL, and the claim is that it can process data as it comes in the pipe, before it is written to disk. Great for big hedge funds and market makers.

Seems like this StreamSQL could prove a pretty interesting tool for a peer-to-peer database network. Each node in such a network would likely have limited storage capacity, so being able to work out what data you really need to hold on to before you write it would fit that scenario well.