Continuing the theme of the last two posts, the old Posterous blog content
is now available as JSON inside CouchDB. I’m now going to combine a few
pieces that are unique to CouchDB to build up the components that will
become blog fodder for Octopress.
Octopress, which is based on Jekyll, uses a mixture of YAML and
markdown for pages and posts. We’ll use a [show] function, which passes
a JSON document to a JavaScript transformation function, to build this up.
First, the Posterous format includes a whole lot of stuff we won’t need.
For Octopress we want title, display_date, tags, and body_full only:
Excerpt of typical post as a CouchDB JSON doc

    {
      "_id": "21298063",
      "_rev": "1-8ba11a44954e3171de4f4fa9d68c3210",
      "is_owned_by_current_user": true,
      "slug": "setting-up-a-shared-photo-library-in-picasa3",
      "tags": [],
      "title": "setting up a shared photo library in Picasa3 on MacOS",
      "display_date": "2009/01/09 13:49:53 -0800",
      "body_full": "setting up a shared photo library for several..."
    }

Assuming Octopress can parse the date format, this will be easy. Let’s
map these to title, date, categories, and content to
build our YAML like this:
Typical Octopress YAML header

    ---
    layout: post
    title: "setting up a shared photo library in Picasa3 on MacOS"
    date: 2009/01/09 13:49:53 -0800
    comments: true
    categories: []
    ---
    ... body_full goes here ...

The show function is pretty straightforward:
CouchDB show to transform Posterous JSON into YAML
First I check that all the entities we require are present. This avoids
generating an expensive exception in the JavaScript engine if later
on I try to access data that isn’t actually present.
Then I return an object comprising the body content and the headers.
The body is built up from JSON properties of the supplied doc object.
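The listing for this first version isn't reproduced here, so the following is only a minimal sketch of what the description above implies, not the original code. The name `showYaml` is hypothetical (CouchDB itself expects an anonymous function expression), and I'm assuming this first pass emits only the YAML header, with the date passed through untrimmed.

```javascript
// Sketch of a first-pass show function. The name showYaml is hypothetical;
// in a design document this would be an anonymous function(doc, req) {...}.
function showYaml(doc, req) {
  // Check that every property we need is present before touching it.
  if (doc.slug && doc.title && doc.display_date && doc.tags && doc.body_full) {
    return {
      // The body is assembled from properties of the supplied doc.
      body: '---\nlayout: post\n' +
        'title: "' + doc.title + '"\n' +
        'date: ' + doc.display_date + '\n' +
        'comments: true\n' +
        'categories: ' + JSON.stringify(doc.tags) + '\n---\n',
      headers: { 'Content-Type': 'application/text' }
    };
  }
  // Falls through and returns undefined when a property is missing.
}

var samplePost = {
  slug: 'setting-up-a-shared-photo-library-in-picasa3',
  title: 'setting up a shared photo library in Picasa3 on MacOS',
  display_date: '2009/01/09 13:49:53 -0800',
  tags: [],
  body_full: 'setting up a shared photo library for several...'
};
console.log(showYaml(samplePost, {}).body);
```

Using `JSON.stringify` on the tags is my choice here so that an empty list renders as `[]`, as in the header shown earlier.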
That seems like a good start, so wrap that up into a design document,
drop it into your CouchDB and test it out:
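As a rough illustration of the wrapping step: a design document is just a JSON document whose `_id` starts with `_design/` and whose `shows` property maps a name to the function source as a string. The helper `buildDesignDoc` and the show name `yaml` are hypothetical names of mine; the `posts` design document name matches the URLs used later in this post.

```javascript
// Build a design document from a map of show functions.
// buildDesignDoc and the show name "yaml" are hypothetical, for illustration.
function buildDesignDoc(name, shows) {
  var designDoc = { _id: '_design/' + name, shows: {} };
  Object.keys(shows).forEach(function (key) {
    // CouchDB stores each show function as its source text.
    designDoc.shows[key] = shows[key].toString();
  });
  return designDoc;
}

// A stand-in show function; substitute the real one here.
var yamlShow = function (doc, req) {
  return { body: '---\nlayout: post\n---\n' };
};

var designDoc = buildDesignDoc('posts', { yaml: yamlShow });
console.log(JSON.stringify(designDoc, null, 2));
```

The resulting JSON would then be PUT to `/posts/_design/posts`, after which the show is reachable under `/posts/_design/posts/_show/yaml/<docid>`.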
Notice how we needed to query using Content-Type: application/text?
Try that same link in your browser. You’re prompted for a download
that refers to the _id stored in CouchDB.
It would be nicer to get that with the correct markdown filename
already. Let’s use doc.slug for the name, prefixed with
the date of the original post. Octopress expects a yyyy-mm-dd format
so I’ve sprinkled liberally with regex pixie dust.
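To see the pixie dust in isolation, here's a small sketch that turns a Posterous `display_date` and `slug` into the filename. The helper name `postFilename` is hypothetical; the two regex replacements are the same ones used in the show function further down.

```javascript
// Derive an Octopress-style filename from a Posterous doc.
// postFilename is a hypothetical helper name, for illustration only.
function postFilename(doc) {
  var postDate = doc.display_date
    .replace(/\//g, '-')              // 2009/01/09 ... -> 2009-01-09 ...
    .replace(/^([-0-9]+).+/, '$1');   // keep only the leading yyyy-mm-dd
  return postDate + '-' + doc.slug + '.md';
}

var exampleDoc = {
  slug: 'setting-up-a-shared-photo-library-in-picasa3',
  display_date: '2009/01/09 13:49:53 -0800'
};
console.log(postFilename(exampleDoc));
// prints 2009-01-09-setting-up-a-shared-photo-library-in-picasa3.md
```

The second replacement works because `[-0-9]+` stops at the first space, so everything after the date is swallowed by `.+` and discarded.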
Finally, an additional HTTP header
Content-Disposition: attachment; filename=<file.ext> is required
to provide the proposed name via our show function.
Now’s a good time to append the actual blog post content too.
CouchDB show to transform Posterous JSON to Octopress

    function(doc, req) {
      if (doc.slug && doc.title && doc.display_date && doc.tags && doc.body_full) {
        // Replace / with - and trim display_date to yyyy-mm-dd- only
        // to match the octopress expected post format.
        // This will be passed as an HTTP header and will be used by
        // browsers or wget as the proposed filename.
        var post_date = doc.display_date.replace(/\//g, '-').replace(/^([-0-9]+).+/, "$1");
        var post_name = 'attachment; filename=' + post_date + '-' + doc.slug + '.md';
        return {
          body: '---\nlayout: post\n' +
            'title: "' + doc.title + '"\n' +
            'date: ' + post_date + '\n' +
            'comments: true\n' +
            'categories: ' + doc.tags + '\n---\n' +
            doc.body_full + '\n',
          headers: {
            'Content-Type': 'application/text',
            'Content-Disposition': post_name
          }
        };
      }
    }

I’ve put this into a separate show function, and this is what comes back:

    $ curl --silent --header "Content-Type: application/text" \
        http://localhost:5984/posts/_design/posts/_show/octo/31797293
    * About to connect() to localhost port 5984 (#0)
    * Trying 127.0.0.1... connected
    * Connected to localhost (127.0.0.1) port 5984 (#0)
    > GET /posts/_design/posts/_show/octo/31797293 HTTP/1.1
    > User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 \
        OpenSSL/0.9.8r zlib/1.2.5
    > Host: localhost:5984
    > Accept: */*
    > Content-Type: application/text
    >
    < HTTP/1.1 200 OK
    < Vary: Accept
    < Server: CouchDB/1.1.1 (Erlang OTP/R14B04)
    < Etag: "6PML44SHRNE54M212K0O6BLXZ"
    < Date: Thu, 22 Dec 2011 13:02:36 GMT
    < Content-Type: application/text
    < Content-Length: 1668
    < Content-Disposition: attachment; filename=2010-10-28-ubuntu-saves-the-day.md
    <
    {[data not shown]
    * Connection #0 to host localhost left intact
    * Closing connection #0
    ---
    layout: post
    title: "ubuntu saves the day"
    date: 2010-10-28
    comments: true
    categories:
    ---
    My work laptop had a BSOD today, which looks like it was caused by bit rot ...

    $ wget http://localhost:5984/posts/_design/posts/_show/octo/31797293 \
        --content-disposition
    Resolving localhost... 127.0.0.1, ::1, fe80::1
    Connecting to localhost|127.0.0.1|:5984... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1668 (1.6K) [application/text]
    100%[======>] 1,668 --.-K/s in 0s
    2011-12-22 14:04:46 (79.5 MB/s) - `2010-10-28-ubuntu-saves-the-day.md' saved [1668/1668]

Now we can transform arbitrary Posterous blog entries via CouchDB
into Markdown format. Next time, I’ll use CouchDB to pull all the
data out in one swoop.
Previously I used the Posterous API to retrieve all my blog posts.
In this post, I’m going to show how easy it is to use CouchDB’s
_bulk_docs API to get lots of data in via JSON import. Later on,
I’ll transform it using CouchDB’s show, view and list functions.
CouchDB’s bulk loading API requires JSON documents to be
embedded in an array called docs within a parent JSON object
that contains optional parameters to indicate to CouchDB how to
handle the upload. Note that the _id value must be a string.
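Here's a minimal sketch of shaping posts into that structure. The helper name `toBulkDocs` is hypothetical, and I'm assuming each Posterous post carries a numeric `id` field that we want to reuse as the CouchDB `_id`; adjust to whatever your export actually contains.

```javascript
// Wrap an array of posts in the { docs: [...] } envelope that
// _bulk_docs expects. toBulkDocs is a hypothetical helper, and the
// numeric "id" field on each post is an assumption about the export.
function toBulkDocs(posts) {
  return {
    docs: posts.map(function (post) {
      var doc = {};
      Object.keys(post).forEach(function (key) {
        doc[key] = post[key];
      });
      // CouchDB requires _id to be a string, so convert explicitly.
      doc._id = String(post.id);
      return doc;
    })
  };
}

var payload = toBulkDocs([
  { id: 21298063, title: 'setting up a shared photo library in Picasa3 on MacOS' },
  { id: 31797293, title: 'ubuntu saves the day' }
]);
console.log(JSON.stringify(payload, null, 2));
```

The resulting JSON is what gets POSTed to `/posts/_bulk_docs` with a `Content-Type: application/json` header.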
Posterous has a really neat feature in its API docs: you can use the API directly
from the web page. Unfortunately, I only need it to migrate off to Octopress.
Log into Posterous and then open the API page. Use the first entry to
obtain your API token. Put this, your login and password below and you should
be able to obtain a list of your Posterous Sites, or Spaces as they’re now
called.
My work laptop had a BSOD today, which looks like it was caused by bit rot on the root partition. While everything’s backed up onto S3, restores from NZ of 20GiB of data take a while, so I was kinda hoping to recover smoothly without getting the IT guys to visit, who probably will just rebuild it. I’m pretty sure that’s fair payment for getting chocolate inside my laptop’s fan.
It was a good opportunity to give ubuntu maverick a spinup before it goes on the big iMac at home as dual-boot. The install is slowly tweaked each time, and it’s really clean. I am pretty sure my mum could do this without any help now, and the fresh look is nice - it’s truly a class act OS now. One workaround was needed to resolve what is probably a stuck trackpad on the loan laptop http://xpapad.wordpress.com/2009/09/09/dealing-with-mouse-and-touchpad-freeze… with a ‘rmmod psmouse’ and then it was all go. Everything works which really is an impressive step forward for Canonical, with strong OEM relations clearly now paying off. Hats off guys! Anyway long story short, nautilus and brasero to the rescue, and I now have a bunch of md5 checksummed DVDs stashed before the hired goons come tomorrow to blow it away. I love the ntfs integration in linux, and the new maverick Ubuntu gets thumbs up all round - especially as it’s now got CouchDB 1.0.1 included - yay!