I spent some time this weekend hacking together a Python tool that takes an archive from ljdump and massages it into something the WordPress LiveJournal importer can digest.
The WP importer seems to expect XML in the format used by ljArchive. ljdump, on the other hand, stores each entry in a separate file, and the comments for each entry in yet another file. The WP importer is a web form that accepts a single XML file per submission, so doing this once for each entry would be excruciating. There are also some differences between the XML structures used by ljArchive and ljdump, so some restructuring was necessary.
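To make the merging step concrete, here is a minimal sketch of the idea, not the actual code from my script. It assumes each ljdump entry is an XML document whose child elements can be copied as-is, that each comments file has a root element containing `<comment>` children, and that the importer wants a single `<livejournal>` root wrapping `<entry>` elements. Those element names and the `merge_entries` helper are assumptions for illustration, so check them against your own dump before relying on them.

```python
import xml.etree.ElementTree as ET


def merge_entries(pairs):
    """Combine per-entry XML documents into one document.

    `pairs` is an iterable of (entry_xml, comments_xml) strings, where
    comments_xml may be None when an entry has no comments file.
    The "livejournal", "entry" and "comment" names are assumptions.
    """
    root = ET.Element("livejournal")
    for entry_xml, comments_xml in pairs:
        source = ET.fromstring(entry_xml)
        entry = ET.SubElement(root, "entry")
        # Copy every field of the original entry under the new <entry>.
        for child in list(source):
            entry.append(child)
        # Attach the top-level comments, if a comments document exists.
        if comments_xml:
            for comment in ET.fromstring(comments_xml).findall("comment"):
                entry.append(comment)
    return ET.tostring(root, encoding="unicode")
```

Working on strings rather than file paths keeps the sketch independent of how ljdump names its files; reading the files and pairing each entry with its comments would be a thin wrapper around this.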
My script, ‘convertdump.py’, is still a little rough around the edges, but it seems to work (with some caveats listed below). I’m going to hold off on a ‘less geeky’ release until I polish it up a bit, but for those of you who want to grab it now and start testing or playing with it, you can clone my repository here:
A few things to be aware of:
- Two arguments are required on the command line:
  - the username of the archive to process
  - the maximum number of entries to put in each resulting XML file. WP tends to time out on large files, so people who can’t adjust their PHP timeouts and file upload size parameters can create several smaller archives instead of one large one
- My script currently mishandles the security information for posts, so it will make public any private or friends-only entries. I also suspect that even if I were processing this information correctly, the WP importer would ignore it and still import those entries as public. I’m going to look further into this, but my short-term plan is to add a command-line argument that omits protected/private entries from the archive file.
- WP doesn’t handle any of the special LJ tags (such as <lj user=…>), so I think I’ll modify the script to convert some of these special tags into normal hrefs. I probably won’t try to do anything special with lj-cut, however, because that could turn into a can of worms. :)
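Two of the items above are easy to sketch: splitting the entries into smaller batches so each output file stays under the entry limit, and turning `<lj user=…>` tags into ordinary links. This is a rough illustration rather than what the script does today. The helper names, the exact attribute syntax the pattern accepts, and the profile URL format are all assumptions.

```python
import re

# Matches <lj user="name">, <lj user='name'>, <lj user=name> and the
# self-closing variants. The accepted syntax is an assumption.
LJ_USER_TAG = re.compile(
    r'''<lj\s+user\s*=\s*["']?([A-Za-z0-9_-]+)["']?\s*/?>''',
    re.IGNORECASE,
)


def split_into_batches(entries, limit):
    """Split a list of entries into lists of at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [entries[i:i + limit] for i in range(0, len(entries), limit)]


def convert_lj_user_tags(html):
    """Replace each <lj user=...> tag with a plain link to that journal."""
    def to_link(match):
        name = match.group(1)
        # Assumed URL layout for a user's journal.
        url = "https://www.livejournal.com/users/{0}/".format(name)
        return '<a href="{0}">{1}</a>'.format(url, name)

    return LJ_USER_TAG.sub(to_link, html)
```

Each batch would then be written to its own XML file, which is what lets the importer get through a large journal without hitting the PHP timeout or upload size limits.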
Anyway, if this is useful to you, and you aren’t afraid of hacked-together, barely tested software, give it a swing. I’m happy to hear about bugs and feature requests, or to accept patches. :)
A few other things to know:
- I’m a git newbie, so I very well may have messed things up
- I’m (mostly) a Python newbie, so this might not be the most elegant/neat/efficient Python code ever produced
- I’m not planning on abandoning LJ any time soon; I’m just preparing a set of contingencies "just in case". My plan is to be ready to replace sean-graham.com with little effort and incident in the case of LJ’s early demise.