Moving Stuff Finished :-)

Yesterday evening I followed the excellent instructions from the WordPress Codex on Migrating from Movable Type to WordPress. This proved to be extremely easy with all the posts, categories and comments coming through perfectly.

I had some files (photos, ppt etc) manually uploaded, so logged in to move those over. As well as migrating I moved the blog from /blog/ up to the root of the site. That all went swimmingly and I picked a nice shiny new theme called demar.

The only thing left was to sort out redirects for all the old links – so I don’t lose all the link love I’ve managed to build up over the years.

After reading through the various options I had a few issues. I’d tinkered a bit with the MT URLs in the past and had a lot of legacy stuff hanging around. I decided I’d just do it with Apache’s .htaccess. Not a choice for everyone, but my regex skills aren’t too shabby, so I figured I’d start there.

The URLs fell into a few different patterns:

/blog/atom.xml, index.xml, etc – the feeds. For now these can all redirect to /feed/ so we start with

RewriteEngine On
RewriteRule ^blog/atom\.xml$ /feed/ [L,R=301]
RewriteRule ^blog/index\.rdf$ /feed/ [L,R=301]
RewriteRule ^blog/index\.xml$ /feed/ [L,R=301]

With the feeds dealt with, I moved on to the root of the blog, adding

RewriteRule ^blog/$ / [L,R=301]
RewriteRule ^blog$ / [L,R=301]

and then onto the archives, where things start to get trickier. First we have categories, which take the form /blog/archives/cat_somewhere_I_put_stuff.html. WordPress uses a different pattern by default – /category/somewhere-i-put-stuff. Not too hard: first we pull out the words, then glue them back together again.

RewriteRule ^blog/archives/cat_([^_]*)_([^_]*)_([^_]*)_(.*)\.html$ /category/$1-$2-$3-$4 [L,R=301]
RewriteRule ^blog/archives/cat_([^_]*)_([^_]*)_(.*)\.html$ /category/$1-$2-$3 [L,R=301]
RewriteRule ^blog/archives/cat_([^_]*)_(.*)\.html$ /category/$1-$2 [L,R=301]
RewriteRule ^blog/archives/cat_(.*)\.html$ /category/$1 [L,R=301]

These regexes match category names that are four, three, two and one words long respectively. If you have categories with more words, you’ll need to add longer versions of these, ordering them longest first.
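
All four rules do the same job: they swap underscores for hyphens. Purely as an illustration (this is Python, not something Apache runs, and the name category_redirect is made up for this sketch), the same mapping for any number of words looks roughly like this:

```python
import re

# Matches the old Movable Type category archive path, e.g.
# blog/archives/cat_somewhere_I_put_stuff.html
CATEGORY_RE = re.compile(r"^blog/archives/cat_(.*)\.html$")


def category_redirect(path):
    """Return the new /category/... path, or None for non-category paths.

    Hypothetical helper: it mirrors what the RewriteRules do (underscores
    become hyphens, letter case is left alone) for any number of words.
    """
    match = CATEGORY_RE.match(path)
    if match is None:
        return None
    return "/category/" + match.group(1).replace("_", "-")


if __name__ == "__main__":
    print(category_redirect("blog/archives/cat_somewhere_I_put_stuff.html"))
```

It is handy for checking a list of old category URLs before adding more rules to .htaccess.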

Next the monthly archives, in MT /blog/archives/2004_04.html and in WP /2004/04/

RewriteRule ^blog/archives/([0-9]{4})_([0-9]{2})\.html$ /$1/$2/ [L,R=301]

easy enough.

Then we have the individual posts. These fall into two groups, name-based files and numbered files. The name-based files are all /blog/archives/categoryname/postname.html. I had thought these were going to be a pig, but I discovered completely by accident that if you simply use the postname part of that then WordPress figures out which post you meant and redirects to its nice new WP URL. Sweet.

RewriteRule ^blog/archives/[^/]*/(.*)\.html$ /$1 [L,R=301]

The exception to this turns out to be posts that have a hyphen in the name. MT strips the hyphen, leaving WP with a name that doesn’t match. I put a rule in specifically for the one post I have that was affected; it has to sit above the general rule so that it matches first:

RewriteRule ^blog/archives/personal/beijing_sightse\.html$ /2008/05/04/beijing-sight-seeing/ [L,R=301]
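
Rule order matters here, because mod_rewrite works through the rules top to bottom and a redirect flagged with L stops at the first match. A rough way to sanity-check the order before deploying is to replay a few old URLs through the same patterns. The sketch below is Python rather than Apache config, the helper name first_match is invented for illustration, and it only covers three of the rules:

```python
import re

# (pattern, replacement) pairs in the order they would appear in .htaccess.
# The one-off rule for the hyphenated post sits above the general post rule.
RULES = [
    (r"^blog/archives/([0-9]{4})_([0-9]{2})\.html$", r"/\1/\2/"),
    (r"^blog/archives/personal/beijing_sightse\.html$",
     "/2008/05/04/beijing-sight-seeing/"),
    (r"^blog/archives/[^/]*/(.*)\.html$", r"/\1"),
]


def first_match(path):
    """Return the redirect target of the first rule that matches, else None."""
    for pattern, replacement in RULES:
        match = re.match(pattern, path)
        if match:
            # expand() fills \1, \2 ... from the captured groups.
            return match.expand(replacement)
    return None


if __name__ == "__main__":
    for old in ("blog/archives/personal/beijing_sightse.html",
                "blog/archives/semantic-web/vocabs.html",
                "blog/archives/2004_04.html"):
        print(old, "->", first_match(old))
```

Swapping the last two entries of RULES sends the Beijing post to /beijing_sightse instead, which is exactly the mismatch the special rule is there to avoid.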

Which just leaves the numbered posts: /blog/archives/000217.html. The numbered entries proved to be tricky. While you can just append the number to the site root, like /1234, and WordPress will find a post for you, the posts weren’t matching up. As many of these had been indexed by Google and linked by others I wanted to hook them up to the right posts.

Fortunately I had Movable Type still rendering my site as static HTML files, so with a quick bit of bash magic I pulled out the numbered posts and made rules to map them to the Movable Type post name based permalinks (which we already did rewrite rules for above):

find . -type f \
| grep "^\./[0-9]*\.html$" \
| xargs grep permalink \
| awk '{print $1 " " $17}' \
| sed -e 's%^\./%RewriteRule ^blog/archives/%' \
-e 's%\.html:%.html$%' \
-e 's%href="http://www.dynamicorange.com%%' \
-e 's%">Permalink</a><br%%' \
> rules.txt

Each line of rules.txt ends up looking like this

RewriteRule ^blog/archives/000285.html$ /blog/archives/semantic-web/vocabs.html [L,R=301]

which results in a second redirect to just /vocabs and then a third as WP works out where to take you finally. Not great to be bouncing around so much, but much better than losing the link.
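
For anyone less keen on awk and sed, the same idea can be sketched in Python. This is not what I ran: the function name rule_for is made up, and the permalink markup it looks for is an assumption inferred from the sed expressions above.

```python
import re

# Assumed markup, inferred from the sed expressions:
#   <a href="http://www.dynamicorange.com/blog/archives/...html">Permalink</a>
PERMALINK_RE = re.compile(
    r'href="http://www\.dynamicorange\.com([^"]+)">Permalink</a>'
)


def rule_for(filename, html):
    """Build one RewriteRule line for a numbered archive file.

    filename is the bare file name (e.g. "000285.html") and html is the
    file's contents. Returns None when no permalink anchor is found.
    """
    match = PERMALINK_RE.search(html)
    if match is None:
        return None
    # re.escape turns the dot into \. so it only matches a literal dot.
    pattern = "^blog/archives/" + re.escape(filename) + "$"
    return "RewriteRule {} {} [L,R=301]".format(pattern, match.group(1))


if __name__ == "__main__":
    sample = ('<a href="http://www.dynamicorange.com'
              '/blog/archives/semantic-web/vocabs.html">Permalink</a><br />')
    print(rule_for("000285.html", sample))
```

Looping that over the numbered files and writing the non-empty results to rules.txt gives the same kind of list, with the dots in the file names escaped.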

Good luck if you decide to make the same move.

Flickr Bashing

I spent some time a little while ago writing a bash interface to the Flickr API, using curl to handle the HTTP interactions.

I was doing it because I have a backlog of around 11,000 photos that need sorting and uploading. To get them backed up I decided to simply upload them all marked as private and tagged as ‘toProcess’ so that I could then start to go through them online.

I’d written most of the raw bash scripts to do the job, but hadn’t put enough error handling in, or done enough testing, to trust them with my live Flickr account.

That was back at Christmas, so coming back to it fresh a thought occurred to me. At work we’ve been working with PHP a lot, and PHP works just dandy as a command line scripting language as well as a web language. Not only that, but PHP also has a great Flickr library that’s already tested and in production use – phpFlickr by Dan Coulter.

So, with all my photos already organised in folders…

#!/usr/bin/php
<?php
// Make phpFlickr loadable from this script's directory or its phpFlickr-2.2.0 subfolder.
$path = ini_get('include_path');
$myPath = dirname(__FILE__);
ini_set('include_path', $path . PATH_SEPARATOR . $myPath
    . PATH_SEPARATOR . $myPath . DIRECTORY_SEPARATOR . 'phpFlickr-2.2.0');
require_once 'phpFlickr.php';

$f = new phpFlickr('your api key here', 'your secret here');
$f->setToken('your token here');
$f->auth('write');

$searchPath = $myPath . DIRECTORY_SEPARATOR . 'later';
$uploadDir = opendir($searchPath);

// Upload everything as private.
$is_public = 0;
$is_friend = 0;
$is_family = 0;

while (false !== ($listing = readdir($uploadDir)))
{
    $isHidden = preg_match('/^\./', $listing) > 0;
    if (is_dir($searchPath . DIRECTORY_SEPARATOR . $listing) && !$isHidden)
    {
        echo $listing . "\n";
        // Folders whose names start with four digits get that year as a tag.
        $year = preg_match('/^[0-9]{4}/', $listing) ? substr($listing, 0, 4) : null;
        $setName = $listing;
        $setDir = opendir($searchPath . DIRECTORY_SEPARATOR . $listing);
        $tags = "$year \"$setName\" toprocess";
        while (false !== ($setListing = readdir($setDir)))
        {
            $isJpeg = preg_match('/\.jpg$/i', $setListing) > 0;
            if ($isJpeg)
            {
                $photoFilename = $searchPath . DIRECTORY_SEPARATOR . $listing . DIRECTORY_SEPARATOR . $setListing;
                $response = $f->async_upload($photoFilename, $setListing, '', $tags, $is_public, $is_friend, $is_family);
                echo $response . "\t" . $photoFilename . "\n";
                // Mark the file as handled so a re-run skips it.
                rename($photoFilename, $photoFilename . '.done');
                //sleep(1);
                // Stop between photos if anything has been typed on stdin.
                stream_set_blocking(STDIN, false);
                $stdin = fread(STDIN, 1);
                if ($stdin != '')
                {
                    echo "Exiting from stdin keypress\n";
                    exit(0);
                }
            }
        }
        closedir($setDir);
    }
}
closedir($uploadDir);
?>

This script wanders through the folders uploading any jpegs it finds, tagging them with the name of the folder they came from as well as a year taken from the first four digits of the folder name. All so simple thanks to Dan Coulter’s work.
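
Before pointing anything at a live account, a dry run that only lists what would be uploaded is a cheap safety net. The sketch below is a stand-alone Python illustration of the same folder walk, not part of the PHP script: plan_uploads is a made-up name, it never talks to Flickr, and unlike the PHP it trims the leading space from the tags when a folder has no year.

```python
import os
import re


def plan_uploads(search_path):
    """List (photo_path, title, tags) for every JPEG one level below search_path.

    Mirrors the walk in the PHP script without contacting Flickr.
    """
    plan = []
    for folder in sorted(os.listdir(search_path)):
        folder_path = os.path.join(search_path, folder)
        # Skip hidden entries and anything that is not a folder.
        if folder.startswith(".") or not os.path.isdir(folder_path):
            continue
        # The year tag is the first four characters, only if they are digits.
        year = folder[:4] if re.match(r"[0-9]{4}", folder) else ""
        tags = '{} "{}" toprocess'.format(year, folder).strip()
        for name in sorted(os.listdir(folder_path)):
            # Files already renamed to .jpg.done no longer match.
            if name.lower().endswith(".jpg"):
                plan.append((os.path.join(folder_path, name), name, tags))
    return plan


if __name__ == "__main__":
    for path, title, tags in plan_uploads("later"):
        print(path, title, tags, sep="\t")
```

Run from the script’s directory, it prints one line per photo with the title and tags the upload would use.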

vocabs

Nadeem and I have been working on several ontologies (for RDF) over the past few months and are intending to publish all of them.

The first to get published is an initial cut of AIISO (pronounced ey-s-oh), the Academic Institution Internal Structure Ontology.

We put this together really quickly to cover an internal need to document the departmental, school and faculty structures of higher education institutions. As of writing we know of two issues with it…

We named two of the predicates knowledgeGrouping and organisationalUnit after the things they link to. An OrganisationalUnit is any kind of department, faculty etc. and a KnowledgeGrouping is a collection of knowledge that gets taught, things like modules and courses. The problem with the organisationalUnit and knowledgeGrouping properties is twofold: firstly they don’t actually describe the relationship, and secondly they don’t give any indication of direction. So, if we say:

<http://broadminster.org> <aiiso:organisationalUnit> <http://broadminster.org/faculty-of-science>

it’s clear to us as people that the faculty of science is a part of Broadminster, but it’s not obvious from the ontology that’s what’s going on. We plan to change that to either use our own partOf property or possibly re-use dcterms:isPartOf within aiiso.

The other thing we failed to do was to make the appropriate links with FOAF. AIISO’s OrganisationalUnit is a specialisation of foaf:Group, so that needs to go into the ontology.

Something else that’s come up in conversation, and that we’re fairly sure we’ve got right (though others disagree), is that the descriptions are somewhat self-referential. A faculty, for example, is described as follows:

A Faculty is an OrganisationalUnit that represents a group of people recognised by an OrganisationalUnit as forming a cohesive group referred to by the organisation as a faculty.

This defines the semantics of being a faculty as nothing more than ‘because that’s what you say you are’. This caused some debate internally about whether or not the semantics of being a faculty were consistent across institutions – does the term faculty have the same meaning at MIT as it does at Harvard or Virginia?

Perhaps faculty is reasonably consistent, but college and school certainly vary substantially and are re-used in areas outside of higher education. My secondary school is certainly not the same kind of thing as the Winchester School of Art, a part of the University of Southampton.

I think the way to solve this is for AIISO to become clearer about its scope: its intention is to describe higher education institutions and not other parts of the academic sector. That leaves others free to define ontologies for kindergarten, high school and the rest.

Photo: MIT’s Stata Center night shot by paul+photos=moody

Decidedly obvious

Just finished reading Designing the Obvious by Robert Hoekman Jr. It turns out to be quite a simple book, an easy read, bringing together Hoekman’s own ideas and thoughts with plenty of anecdotal lessons.

Hoekman references all the usual usability heavyweights (Nielsen, Cooper, Krug and so on), sometimes agreeing and sometimes offering alternative views. Personas? Well, maybe, says Hoekman, going on to explain that great software comes not from a deep understanding of the users, but from a deep understanding of the activities.

Under what Hoekman calls Interface Surgery he covers the redesign of a form to make it obvious, introducing well-thought-out approaches to form validation and form layout. The things he suggests are, in line with the book’s title, obvious – but given that every form you ever use on the web gets this stuff wrong, it can hardly be considered common sense.

Usefully I stumbled across bits and pieces that are of immediate use. We’re designing some stuff at work right now that requires interaction with a tree structure. We’ve been struggling a bit with this, with me advocating a fairly standard tree control yet all of us feeling that it doesn’t quite work.

Hoekman provides a great explanation of why tree controls don’t work. He rightly points out that while they are present in Windows Explorer they are not the default, and as most users don’t change the defaults most users will not have come across the tree view regularly. He also decomposes the interactions used in a typical tree view and shows how complex they are. In short, don’t use tree views – ever.

Hoekman doesn’t leave us high and dry with navigation of these structures though. He describes the Columns view provided by OS X’s Finder and contrasts the interactions of that view with those of the tree view. I’m sold on his explanations and will be looking to try that with the application we’re working on.

Overall, good book, easy read, nicely packed with useful ideas.