Continuing the theme of the last two posts: the old Posterous blog content
is now available as JSON inside CouchDB. Next, I’m going to combine a few
pieces that are unique to CouchDB to build up the components that will
become blog fodder for Octopress.
Octopress, which is based on Jekyll, uses a mixture of YAML and
markdown for pages and posts. We’ll use a show function, which passes
a JSON document to a JavaScript transformation function, to build this up.
First, the Posterous format includes a whole lot of stuff we won’t need.
For Octopress we want title, display_date, tags, and body_full only:
Excerpt of typical post as a CouchDB JSON doc
{
  "_id": "21298063",
  "_rev": "1-8ba11a44954e3171de4f4fa9d68c3210",
  "is_owned_by_current_user": true,
  "slug": "setting-up-a-shared-photo-library-in-picasa3",
  "tags": [],
  "title": "setting up a shared photo library in Picasa3 on MacOS",
  "display_date": "2009/01/09 13:49:53 -0800",
  "body_full": "setting up a shared photo library for several..."
}
Assuming Octopress can parse the date format, this will be easy. Let’s
map these to title, date, categories, and content to
build our YAML like this:
Typical Octopress YAML header
---
layout: post
title: "setting up a shared photo library in Picasa3 on MacOS"
date: 2009/01/09 13:49:53 -0800
comments: true
categories: []
---
... body_full goes here ...
The show function is pretty straightforward:
CouchDB show to transform Posterous JSON into YAML
First I check that all the entities we require are present. This avoids
generating an expensive exception in the JavaScript engine if, later
on, I try to access data that isn’t actually present.
Then I return an object comprising the body content and the headers.
The body is built up from JSON properties of the supplied doc object.
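A minimal sketch of a show function along these lines follows. It is a reconstruction from the description, not necessarily the exact function used here. The variable name `yamlShow` is hypothetical (CouchDB stores only the anonymous function), and serialising the tags with `JSON.stringify` is an assumption made so that an empty tag list renders as `[]`, as in the target header. At this stage it emits only the YAML header.

```javascript
// Hypothetical name; in a design document only the anonymous
// function (doc, req) { ... } part would be stored.
var yamlShow = function (doc, req) {
  // Guard: only proceed when every property we read is present.
  // (An empty tags array is still truthy in JavaScript.)
  if (doc && doc.title && doc.display_date && doc.tags) {
    return {
      // Body: the YAML header, built from properties of the doc.
      body: '---\nlayout: post\n' +
        'title: "' + doc.title + '"\n' +
        'date: ' + doc.display_date + '\n' +
        'comments: true\n' +
        // Assumption: JSON.stringify renders [] for an empty tag list.
        'categories: ' + JSON.stringify(doc.tags) + '\n---\n',
      headers: { 'Content-Type': 'application/text' }
    };
  }
  // Falling through returns undefined when a property is missing.
};

// Try it locally with the fields from the sample document.
var result = yamlShow({
  title: 'setting up a shared photo library in Picasa3 on MacOS',
  display_date: '2009/01/09 13:49:53 -0800',
  tags: []
}, {});
console.log(result.body);
```

Running it under a plain JavaScript interpreter like this is a cheap way to check the output before uploading anything to CouchDB.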
That seems like a good start, so wrap that up into a design document,
drop it into your CouchDB and test it out:
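For readers who have not built a design document before, here is a rough sketch of the wrapping step. A design document is ordinary JSON in which each show function is stored as a source string under `shows`. The `_design/posts` id matches the URLs used in this post; the show name `yaml` and the tiny stand-in function are hypothetical.

```javascript
// Stand-in show function, kept tiny for illustration (hypothetical).
var yamlShowFn = function (doc, req) {
  return {
    body: 'title: "' + doc.title + '"\n',
    headers: { 'Content-Type': 'application/text' }
  };
};

// Functions live in the design document as source strings.
var designDoc = {
  _id: '_design/posts',
  shows: {
    yaml: yamlShowFn.toString() // "yaml" is an assumed show name
  }
};

// This JSON text is what would be uploaded to the database.
var payload = JSON.stringify(designDoc, null, 2);
console.log(payload);
```

The resulting JSON can then be sent with an HTTP PUT to the design document's URL, which for the database used here would be http://localhost:5984/posts/_design/posts. After that the function should be reachable under `_show/yaml/<docid>`.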
Notice how we needed to query using Content-Type: application/text?
Try that same link in your browser. You’re prompted for a download
that refers to the _id stored in CouchDB.
It would be nicer if the download already carried the correct Markdown
filename. Let’s use doc.slug for the name, prefixed with
the date of the original post. Octopress expects a yyyy-mm-dd format,
so I’ve sprinkled the date liberally with regex pixie dust.
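To make the regex step concrete, here is a small sketch of the two replacements applied to the display_date of the sample document. The helper names `toPostDate` and `toPostFilename` are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: "2009/01/09 13:49:53 -0800" -> "2009-01-09"
function toPostDate(displayDate) {
  return displayDate
    .replace(/\//g, '-')            // swap every slash for a dash
    .replace(/^([-0-9]+).+/, '$1'); // keep only the leading digits and dashes
}

// Hypothetical helper: date prefix + slug + Markdown extension.
function toPostFilename(doc) {
  return toPostDate(doc.display_date) + '-' + doc.slug + '.md';
}

var sample = {
  slug: 'setting-up-a-shared-photo-library-in-picasa3',
  display_date: '2009/01/09 13:49:53 -0800'
};

console.log(toPostDate(sample.display_date)); // 2009-01-09
console.log(toPostFilename(sample));
// 2009-01-09-setting-up-a-shared-photo-library-in-picasa3.md
```

The second regex works because the first space in the string ends the run of digits and dashes, so everything from the time onwards is discarded.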
Finally, an additional HTTP header
Content-Disposition: attachment; filename=<file.ext> is required
to provide the proposed name via our show function.
Now’s a good time to append the actual blog post content too.
CouchDB show to transform Posterous JSON to Octopress
function(doc, req) {
  if (doc.slug && doc.title && doc.display_date && doc.tags && doc.body_full) {
    // Replace / with - and trim display_date to yyyy-mm-dd- only
    // to match the octopress expected post format.
    // This will be passed as an HTTP header and will be used by
    // browsers or wget as the proposed filename.
    var post_date = doc.display_date.replace(/\//g, '-').replace(/^([-0-9]+).+/, "$1");
    var post_name = 'attachment; filename=' + post_date + '-' + doc.slug + '.md';
    return {
      body: '---\nlayout: post\n' +
        'title: "' + doc.title + '"\n' +
        'date: ' + post_date + '\n' +
        'comments: true\n' +
        'categories: ' + doc.tags + '\n---\n' +
        doc.body_full + '\n',
      headers: {
        'Content-Type': 'application/text',
        'Content-Disposition': post_name
      }
    }
  }
}
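One detail in this function is worth a quick look: concatenating doc.tags onto a string uses the array's default string conversion, so an empty tag list produces an empty `categories:` value and several tags are joined with bare commas. The sketch below compares that with `JSON.stringify`, which is offered here only as a possible alternative; the helper names and tag values are hypothetical.

```javascript
// Hypothetical helpers comparing two ways to emit the categories line.
function categoriesByConcat(tags) {
  return 'categories: ' + tags;                 // uses Array.prototype.toString()
}
function categoriesByJson(tags) {
  return 'categories: ' + JSON.stringify(tags); // bracketed, quoted list
}

var tags = ['couchdb', 'octopress']; // made-up tag values

console.log(categoriesByConcat([]));   // "categories: " (nothing after the colon)
console.log(categoriesByConcat(tags)); // categories: couchdb,octopress
console.log(categoriesByJson([]));     // categories: []
console.log(categoriesByJson(tags));   // categories: ["couchdb","octopress"]

// The same trick escapes double quotes inside a title.
console.log('title: ' + JSON.stringify('a "quoted" title'));
// title: "a \"quoted\" title"
```

YAML's bracketed lists and double-quoted strings closely follow JSON syntax, so the JSON form should be readable by the YAML parser behind Octopress, though that is worth confirming against a real post.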
I’ve put this into a separate show function, and this is what comes back:
$ curl --silent --header "Content-Type: application/text" \
    http://localhost:5984/posts/_design/posts/_show/octo/31797293
* About to connect() to localhost port 5984 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 5984 (#0)
> GET /posts/_design/posts/_show/octo/31797293 HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 \
    OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:5984
> Accept: */*
> Content-Type: application/text
>
< HTTP/1.1 200 OK
< Vary: Accept
< Server: CouchDB/1.1.1 (Erlang OTP/R14B04)
< Etag: "6PML44SHRNE54M212K0O6BLXZ"
< Date: Thu, 22 Dec 2011 13:02:36 GMT
< Content-Type: application/text
< Content-Length: 1668
< Content-Disposition: attachment; filename=2010-10-28-ubuntu-saves-the-day.md
<
{[data not shown]
* Connection #0 to host localhost left intact
* Closing connection #0
---
layout: post
title: "ubuntu saves the day"
date: 2010-10-28
comments: true
categories:
---
My work laptop had a BSOD today, which looks like it was caused by bit rot ...
$ wget http://localhost:5984/posts/_design/posts/_show/octo/31797293 \
    --content-disposition
Resolving localhost... 127.0.0.1, ::1, fe80::1
Connecting to localhost|127.0.0.1|:5984... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1668 (1.6K) [application/text]
100%[======>] 1,668 --.-K/s in 0s
2011-12-22 14:04:46 (79.5 MB/s) - `2010-10-28-ubuntu-saves-the-day.md' saved [1668/1668]
Now we can transform arbitrary Posterous blog entries via CouchDB
into Markdown format. Next time, I’ll use CouchDB to pull all the
data out in one swoop.