Friday, June 29, 2007

Hackweek: The devil is in the details (or ODF generation woes)

Yesterday, I was polluting planets with my story of converting images embedded in WordPerfect files. There were still some missing links, so it is nice to give you another screenshot. Christian Lippka, a supreme intelligence of OpenOffice.org Draw and Impress fame, helped me examine the OpenDocument Drawing objects that I was creating. The problem was that they were not scaling to the frame they were supposed to sit in. And we found the problem. (OK, he found it!) The solution is to include this XML snippet in the drawing document:

<office:settings>
  <config:config-item-set config:name="ooo:view-settings">
    <config:config-item config:name="VisibleAreaTop" config:type="int">0</config:config-item>
    <config:config-item config:name="VisibleAreaLeft" config:type="int">0</config:config-item>
    <config:config-item config:name="VisibleAreaWidth" config:type="int">7941</config:config-item>
    <config:config-item config:name="VisibleAreaHeight" config:type="int">19124</config:config-item>
  </config:config-item-set>
</office:settings>

The two magic numbers are the width and height of the image in hundredths of a millimeter. BTW, I am interested to see what will happen when KWord also starts to use this code for its WordPerfect importer. But here I will leave Ariya the pleasure of handling possible implementation differences.
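For illustration, here is a minimal Python sketch of how such a settings block could be generated from a frame size. The function names are hypothetical and are not part of libwpd or writerperfect, and starting from a size in inches is an assumption; only the element names and the unit (hundredths of a millimeter) come from the snippet above.

```python
MM_PER_INCH = 25.4


def inches_to_hundredths_mm(inches: float) -> int:
    """Convert a length in inches to hundredths of a millimeter."""
    return round(inches * MM_PER_INCH * 100)


def view_settings(width: int, height: int) -> str:
    """Build the office:settings block for a visible area of the given size.

    width and height are expected in hundredths of a millimeter.
    """
    items = [
        ("VisibleAreaTop", 0),
        ("VisibleAreaLeft", 0),
        ("VisibleAreaWidth", width),
        ("VisibleAreaHeight", height),
    ]
    lines = [
        "<office:settings>",
        '  <config:config-item-set config:name="ooo:view-settings">',
    ]
    for name, value in items:
        lines.append(
            f'    <config:config-item config:name="{name}" '
            f'config:type="int">{value}</config:config-item>'
        )
    lines.append("  </config:config-item-set>")
    lines.append("</office:settings>")
    return "\n".join(lines)


if __name__ == "__main__":
    # The same numbers as in the snippet above.
    print(view_settings(7941, 19124))
```

Calling `view_settings(7941, 19124)` reproduces the four config items shown above; the conversion helper is only needed if the frame size is known in inches.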

And so, thanks to the light of Christian, this is what the document looks like now. OK, I have to confess to a little cheating. For the time being, the dimensions of the frame are not read from the file by a parser; I was too lazy to start parsing this information, so they are hardcoded. Parsing them should not be conceptually difficult, just boring to death.

I would also like to mention another person who gave me a useful tool that I have been using this week. Although I am far from being an XSL(T) fan, I found Svante Schubert's transforms extremely useful: they make it possible to load and export files in the OpenDocument flat XML format from OpenOffice.org. It is a lot easier to make little experiments with a document this way, without having to run zip and unzip a zillion times.

So, what remains to be done? Naturally, to write the parser of the box information for the different WP file formats inside libwpd. I find this a really boring and thankless task. So, if you want to be my personal hero, send me a patch.

For those WordPerfect users who have a load of documents with images: the documentation says that one "Graphics Filename" prefix packet can point to several "Graphics Cached File Data" packets that contain the graphic information. However, the documentation does not say how the data is split in this case. Is it split into several full-blown WPG streams, or is it one WPG stream with chunks stored in different prefix packets? I was unable to create a document with several pointers in the "Graphics Filename" prefix packet, so if you have some, I am happy to receive them. Or even better, send me a patch for handling them in libwpd.

Thursday, June 28, 2007

Hackweek @ Novell

It was said, and it was repeated and repeated: this week is a hackfest week at Novell. We are hacking (programming, for those who do not like the word) on different projects and ideas that are close to our hearts. Naturally, TrainedMonkey(tm) could not remain behind. And guess what projects he chose to hack on... Naturally, libwpd, libwpg and their OpenOffice.org-related filters. It is really nice to be the upstream maintainer of several things: when given a week of free hacking, one is unlikely to lack ideas about what to do with the time.

The import of embedded pictures is one of the most overdue features in our WordPerfect(tm) converter. So, I jumped on the task when this opportunity came. Fortunately, not everything had to be coded from scratch. Ariya, as part of a Google Summer of Code project, gave a decisive push to libwpg, and he has kept working on improving it. We have come to the point where standalone WordPerfect Graphics files can be nicely converted to SVG or ODG, including embedded bitmaps. So, what steps had to be taken?

First of all, it was necessary to port libwpd/writerperfect to generate OpenDocument instead of the legacy OpenOffice.org 1.0 file format. This was needed because of the nice way one can incorporate binary objects in OpenDocument represented as flat XML. This was accomplished on Monday, and the conversion of all documents from the libwpd regression suite produces a nice ODF stream that validates against the OpenDocument 1.1 strict schema.
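To illustrate why flat XML is convenient here: in OpenDocument, binary content can be carried inline as base64 text inside an `<office:binary-data>` element, for example as a child of `<draw:image>`. The following Python sketch shows the idea only; the helper name is hypothetical and this is not the writerperfect code.

```python
import base64


def inline_image_element(payload: bytes) -> str:
    """Wrap raw bytes as a draw:image with base64-encoded inline data.

    Hypothetical helper for illustration; namespace declarations are
    assumed to be present on an enclosing element.
    """
    encoded = base64.b64encode(payload).decode("ascii")
    return (
        "<draw:image>"
        f"<office:binary-data>{encoded}</office:binary-data>"
        "</draw:image>"
    )


if __name__ == "__main__":
    print(inline_image_element(b"WPG"))
```

Because the payload is plain text inside the XML, the whole document stays a single stream that can be inspected in a text editor, with no package to unzip.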

The next step was to actually parse the embedded image data in WordPerfect documents. A nice discovery is that in old WP5.x documents, the images are stored without the WPG header. So, one had to hack into libwpd a way to force parsing of the data even when it is not recognized. This was done on Tuesday.
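The "force parsing" idea can be sketched roughly as follows in Python. This is not the libwpd or libwpg API: the function names and the `force` flag are hypothetical, and the check assumes that a stream with a header starts with the four bytes 0xFF 'W' 'P' 'C', which is an assumption of this sketch.

```python
# Assumed magic bytes at the start of a stream that carries a header.
WPC_MAGIC = b"\xffWPC"


def has_header(data: bytes) -> bool:
    """Return True if the stream starts with the assumed magic bytes."""
    return data[:4] == WPC_MAGIC


def should_parse(data: bytes, force: bool = False) -> bool:
    """Decide whether to hand the stream to the graphics parser.

    With force=True the stream is parsed even without a recognized
    header, which is what headerless embedded images would need.
    """
    return force or has_header(data)


if __name__ == "__main__":
    headerless = b"\x0f\x06\x01\x00"  # arbitrary bytes, no header
    print(should_parse(headerless))              # False
    print(should_parse(headerless, force=True))  # True
```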

In the evening of the same day, I started to hack on passing the data that libwpd gives us to libwpg, processing its output and incorporating the images into the OpenDocument text stream. This work continued on Wednesday and finally resulted in seeing an image. The scaling and anchoring were not good, but at least one could see something other than lines of cheesy C++. But it was really hard to make the generated documents (although valid according to the ODF schema) load nicely and display the pictures well. I poked some people whose brains should be a bit better than the brain of TrainedMonkey(tm), but did not manage to get any more intelligent. And since a soup is never eaten as hot as it is cooked, I left it there for the night.

The night brought some rest for the mind, and I started my day with a nice chat with the guy who is the expert on pictures embedded in OpenOffice.org Writer documents. I came to the conclusion that the best approach would be to include those images directly as a <draw:object> inside a <draw:frame>. The advantage is that this keeps the images as editable pictures, the same way Corel customers can edit them inside the WordPerfect Office(tm) applications. This meant throwing away a big chunk of Wednesday's code in favour of a more elegant solution. For those familiar with the libwp* world, the OdgExporter class was copied to writerperfect too :-) (It is now part of Novell's WPG import filter for OpenOffice.org, of the wpg2odg tool, of the perfectspot graphics viewer, and now of writerperfect.) There is room for improvement, since it should not be very difficult to join wpg2odg and writerperfect into one source package.

I spent some time trying to debug my code, since the images were not showing. Then, by chance, I discovered that the devil was in getting the office:mimetype attribute of the embedded object right :-(. So, now one can see the images inside a frame. The scaling is still not correct, and I am not sure at this moment whether I will be able to scale the image as a whole or will have to scale every single shape on import, but the result is something that is editable. And that was the goal of the rewrite.
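As a rough picture of the structure described above, the following Python sketch emits a <draw:frame> holding a <draw:object> whose embedded document carries an office:mimetype attribute. The helper name, the frame attributes and the placeholder body are hypothetical, and the mimetype used is the standard ODF one for drawing documents, assumed here to be the relevant value. Namespace declarations are omitted, so this is a fragment and not a complete document.

```python
# Standard ODF mimetype for drawing documents (assumed to be the one needed).
ODG_MIMETYPE = "application/vnd.oasis.opendocument.graphics"


def embedded_drawing_frame(width_in: float, height_in: float, body_xml: str) -> str:
    """Return a draw:frame fragment embedding a flat-XML drawing.

    body_xml is the already generated content of the drawing and is
    inserted verbatim inside office:document.
    """
    return "\n".join([
        f'<draw:frame svg:width="{width_in:.4f}in" svg:height="{height_in:.4f}in">',
        "  <draw:object>",
        f'    <office:document office:mimetype="{ODG_MIMETYPE}">',
        f"      {body_xml}",
        "    </office:document>",
        "  </draw:object>",
        "</draw:frame>",
    ])


if __name__ == "__main__":
    print(embedded_drawing_frame(3.0, 2.5, "<office:body/>"))
```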

Tomorrow, I will try again to code the parser of the box information, so that we extract the box anchoring, size and position from the file. This promises to be quite boring stuff, but it is a necessary missing link.

BTW, libwpd and libwpg will not refuse any fine hacker who would like to help enhance our conversion feature set while still keeping our stability and document import rate.