Convert .html, .doc, .rtf etc. to .sxw and .pdf through OpenOffice from the command line

I have got a task to convert files from MS Word, RTF and HTML formats to SXW (OpenOffice 1.0 Text Document) and PDF formats from the command line.

I found a solution on the forum http://www.oooforum.org/, and it was nontrivial to me. You need to write a StarBasic module and run OpenOffice in invisible mode from a DOS batch file or a BASH script:

"soffice" -invisible "macro:///MyLibrary.MyModule.MyFunction(/fullpath/filename.ext)"
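
The text in parentheses is handed to the Basic function as its parameter. Here is a minimal shell sketch of how such a macro URL can be assembled. The helper name macro_url is hypothetical (it is not part of OpenOffice), and the command is only printed, so the sketch runs without OpenOffice installed:

```shell
#!/bin/bash
# macro_url is a hypothetical helper, not part of OpenOffice.
# It builds a URL of the form macro:///Library.Module.Function(argument)
macro_url() {
  local library="$1" module="$2" func="$3" path="$4"
  printf 'macro:///%s.%s.%s(%s)\n' "$library" "$module" "$func" "$path"
}

# Print the command instead of running it.
echo soffice -invisible "$(macro_url MyLibrary MyModule MyFunction /fullpath/filename.ext)"
```

When you run the real command, keep the whole URL inside double quotes, as in the line above, so that the shell does not interpret the parentheses.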

If you need a different type of conversion, you can change my functions or write your own with a different filter. In the code you have to use the internal name of the chosen filter, taken from the list of filter names (its URL is given in the header comment of the module below). Of course, the filter must be available for import or for export, depending on whether you are going to read or to save. If you do not specify a filter, OpenOffice reads or saves in the default format.

Steps:

1. Execute OpenOffice

$ soffice

2. Create a library.

Under OpenOffice 1.1.x, in the main menu choose Tools | Macros | Macro...
  Then press the button  "Organizer...".
  Choose the tab  "Libraries".
  Click on the library name.
  And press the button "New..."

Under OpenOffice 1.9.x, in the main menu choose Tools | Macros | Organize Macros | OpenOffice.org...
  Then press the button  "Organizer...".
  Choose the tab  "Libraries".
  Press the button "New..." 

Name the library. In my case it is "LevShuvalov".

3. Create a module in the library you have just created

You are still in the "Macro Organizer" pop-up window.
Choose the tab "Modules".
Press the button "New Module..." for OpenOffice 1.1.x and "New..." for OpenOffice 1.9.x.

Name the module. In my case it is "Convert". Then press the button "Edit".

4. Download a recent version of the module, or copy-paste the code below; in that case be careful with line feeds.

REM  *****  BASIC  *****
'
' This module is for command-line conversion of HTML, DOC, RTF etc. to SXW and PDF through OpenOffice.
'
' Code is based on answers, examples and functions by Danny Brewer (God bless him) on:
' http://www.oooforum.org/forum/viewtopic.phtml?t=3772&start=0&postdays=0&postorder=asc&highlight=object+variable+set+storetourl
' list of filters internal names from:
' http://framework.openoffice.org/files/documents/25/897/filter_description.html

'****************************************************************************
' Copyright (c) 2003 Danny Brewer, 2005 Lev Shuvalov
'
' Anyone may run this code.
' If you wish to modify or distribute this code, then
' you are granted a license to do so under the terms
' of the Gnu Lesser General Public License.
' See:  http://www.gnu.org/licenses/lgpl.html
'****************************************************************************
'
' 2005-05-25 tested OpenOffice 1.1.4 and 1.9.104
' 2005-05-25 replace Left(cFile, Len(cFile) - 4) + ".ext" to   cutExtension( cFile ) + ".ext"
' 2005-05-25 add function cutExtension. Filename could be without extension,
'                     also extension could be longer or shorter than 3 characters
' 2005-05-24 tested on OpenOffice 1.1.1 and 1.9.95
' 2005-05-24 initial

Sub HTMLtoSXW( cFile )
  If Not FileExists ( cFile ) Then exit Sub
  oDoc = StarDesktop.loadComponentFromURL( ConvertToURL( cFile ), "_blank", 0,_
   Array( MakePropertyValue( "Hidden", true), MakePropertyValue( "FilterName", "HTML (StarWriter)")))
  ' Save the document using no filter.  Thus saves a native Writer document.
  oDoc.storeToURL( ConvertToURL( cutExtension( cFile ) + ".sxw" ) , Array() )
  oDoc.close( True )
End Sub

' needed for OpenOffice 1.1.1; under 1.9.x it is possible to use AsPDF
Sub HTMLtoPDF( cFile )
  If Not FileExists ( cFile ) Then exit Sub
  oDoc = StarDesktop.loadComponentFromURL( ConvertToURL( cFile ), "_blank", 0,_
   Array( MakePropertyValue( "Hidden", true), MakePropertyValue( "FilterName", "HTML (StarWriter)")))
  oDoc.storeToURL( ConvertToURL( cutExtension( cFile ) + ".pdf" ),_
   Array(MakePropertyValue( "FilterName", "writer_pdf_Export" )) )
  oDoc.close( True )
End Sub

Sub AsSXW( cFile )
  If Not FileExists ( cFile ) Then exit Sub
  oDoc = StarDesktop.loadComponentFromURL( ConvertToURL( cFile ) , "_blank", 0,_
   Array( MakePropertyValue( "Hidden", true) ) )
  ' Save the document using no filter.  Thus saves a native Writer document.
  oDoc.storeToURL( ConvertToURL(  cutExtension( cFile ) + ".sxw" ), Array() )
  oDoc.close( True )
End Sub

Sub AsPDF(cFile)
  If Not FileExists ( cFile ) Then exit Sub

  Dim strFilterSubName as String
  oDoc = StarDesktop.loadComponentFromURL( ConvertToURL( cFile ), "_blank", 0,_
   array(MakePropertyValue("Hidden", true)) )
  If not IsNull(oDoc) Then
    strFilterSubName = ""
    ' select appropriate filter
    If oDoc.SupportsService("com.sun.star.presentation.PresentationDocument") Then
      strFilterSubName = "impress_pdf_Export"
    ElseIf oDoc.SupportsService("com.sun.star.sheet.SpreadsheetDocument") Then
      strFilterSubName = "calc_pdf_Export"
    ElseIf oDoc.SupportsService("com.sun.star.text.WebDocument") Then
      strFilterSubName = "writer_web_pdf_Export"
    ElseIf oDoc.SupportsService("com.sun.star.text.GlobalDocument") Then
      strFilterSubName = "writer_globaldocument_pdf_Export"
    ElseIf oDoc.SupportsService("com.sun.star.text.TextDocument") Then
      strFilterSubName = "writer_pdf_Export"
    ElseIf oDoc.SupportsService("com.sun.star.drawing.DrawingDocument") Then
      strFilterSubName = "draw_pdf_Export"
    ElseIf oDoc.SupportsService("com.sun.star.formula.FormulaProperties") Then
      strFilterSubName = "math_pdf_Export"
    ElseIf oDoc.SupportsService("com.sun.star.chart.ChartDocument") Then
      strFilterSubName = "chart_pdf_Export"
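    EndIf

    If Len(strFilterSubName) > 0 Then
      oDoc.storeToURL( ConvertToURL( cutExtension( cFile ) + ".pdf" ),_
       Array(MakePropertyValue( "FilterName", strFilterSubName )) )
    EndIf
    ' close only a document that was actually loaded;
    ' if loading failed, oDoc is Null and oDoc.close would raise an error
    oDoc.close(True)
  EndIf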
End Sub

Function MakePropertyValue( Optional cName As String, Optional uValue )_
 As com.sun.star.beans.PropertyValue
  Dim oPropertyValue As New com.sun.star.beans.PropertyValue
  If Not IsMissing( cName ) Then
    oPropertyValue.Name = cName
  EndIf
  If Not IsMissing( uValue ) Then
    oPropertyValue.Value = uValue
  EndIf
  MakePropertyValue() = oPropertyValue
End Function

' cut the last piece of the filename, starting at the last "."
Function cutExtension ( cFile )
  splitFilename = Split ( cFile, "." )
  uboundArray = UBound( splitFilename )
  If uboundArray > 0 Then '-------- filename has a part with dot
    ' assemble filename back without last part
    For index=0 to uboundArray - 1
      saveFilename = saveFilename + splitFilename(index) + "."
    Next index
    ' cut last "."
    cutExtension = Left( saveFilename, Len( saveFilename ) - 1 )
  Else '--------------------------- filename has not a part with dot
    cutExtension = cFile
  Endif
End Function

5. Save and compile the module.

6. Download a recent version of the DOS batch file and/or the BASH script, or copy-paste the text below:

@echo off
rem
rem  Run command line conversion HTML, DOC, RTF etc to SXW and PDF through OpenOffice.
rem
rem  2005-05-26  revision

set OfficeDir=C:\Program Files\OpenOffice.org 1.9.104\program
rem set OfficeDir=C:\Program Files\OpenOffice.org1.1.1\program

if "%1"=="" (
  echo:
  echo  Cannot find what file[s] need to convert
  echo:
  echo  usage:   OOo_convert.bat full_path_to_exact_file_or_wildcard
  echo  example:
  echo           OOo_convert.bat C:\tmp\convert\test.doc
  echo           OOo_convert.bat C:\tmp\convert\*.htm
  echo:
  exit
)

echo Converting...
  for %%c in ("%1") DO (
    echo %%c

    "%OfficeDir%\soffice" -invisible "macro:///LevShuvalov.Convert.HTMLtoSXW(%%c)"
    rem "%OfficeDir%\soffice" -invisible "macro:///LevShuvalov.Convert.HTMLtoPDF(%%c)"
    rem "%OfficeDir%\soffice" -invisible "macro:///LevShuvalov.Convert.AsSXW(%%c)"
    rem "%OfficeDir%\soffice" -invisible "macro:///LevShuvalov.Convert.AsPDF(%%c)"
  )
echo Done.
exit

or

#!/bin/bash
#
#  Run command line conversion HTML, DOC, RTF etc to SXW and PDF through OpenOffice.
#
#  2005-05-27  revision

OfficeDir="/home/users/lev/oo14"

if [ "$#" -lt "1" ]; then
  echo
  echo  "Cannot find what file(s) need to convert"
  echo
  echo  "usage:   OOo_convert.sh full_path_to_exact_file_or_wildcard"
  echo  "example:"
  echo  "         OOo_convert.sh /tmp/convert/test.doc"
  echo  "         OOo_convert.sh /tmp/convert/*htm"
  echo
  exit 1
fi

echo Converting...
  for file in "$@"
  do
    echo "$file"

    "$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.HTMLtoSXW($file)"
    #"$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.HTMLtoPDF($file)"
    #"$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.AsSXW($file)"
    #"$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.AsPDF($file)"
  done
echo Done.
exit 0

7. Under Linux, make OOo_convert.sh executable.

$ chmod 750 OOo_convert.sh

8. Set the variable OfficeDir and choose the conversion function; then you can run the script. Do not forget to give a full path to the temporary directory / folder for conversion or to the exact file.

> OOo_convert.bat C:\tmp\convert\test.doc

or

$ OOo_convert.sh /tmp/convert/test.doc
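
The Basic functions write the result next to the source file, with the last extension replaced (cutExtension( cFile ) + ".sxw" or ".pdf"). A small sketch for checking that the output really appeared might look like the one below. The names expected_output and check_converted are hypothetical helpers, not part of the scripts above:

```shell
#!/bin/bash
# expected_output (hypothetical helper) mirrors cutExtension( cFile ) + ".ext":
# everything after the last "." is dropped and the new extension is appended.
# A name without any "." just gets the new extension added.
expected_output() {
  local file="$1" ext="$2"
  printf '%s.%s\n' "${file%.*}" "$ext"
}

# check_converted (hypothetical helper) reports whether the converted file exists.
check_converted() {
  local out
  out="$(expected_output "$1" "$2")"
  if [ -f "$out" ]; then
    echo "OK: $out"
  else
    echo "missing: $out" >&2
    return 1
  fi
}

# Example use after a conversion run:
#   check_converted /tmp/convert/test.doc sxw
```

Like cutExtension itself, this looks for the last "." in the whole path, so a file without an extension that sits in a directory with a dot in its name would be cut at the directory name.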

If for some reason you remove the line of code If Not FileExists ( cFile ) Then exit Sub and then give an incorrect file name for conversion, or give a relative path to the temporary directory or to the file, you will get an error message: “Object variable not set” under OpenOffice 1.1.x and “URL seems to be unsupported one” under OpenOffice 1.9.x.

If the chosen filter is incorrect for the format of the file, you will also get “URL seems to be unsupported one”.

Many thanks to Danny Brewer, who actually wrote these functions; I just collected them. :)
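
Because a relative path is one of the causes of these errors, the BASH script could turn every argument into a full path before it builds the macro URL. This is only a sketch under that assumption; to_full_path is a hypothetical helper name, and it requires that the directory of the file exists:

```shell
#!/bin/bash
# to_full_path (hypothetical helper): print the absolute path of a file
# given as a relative or absolute path. Fails if the directory does not exist.
to_full_path() {
  local file="$1" dir
  # cd runs inside a subshell, so the caller's working directory is unchanged
  dir="$(cd "$(dirname "$file")" >/dev/null 2>&1 && pwd)" || return 1
  if [ "$dir" = "/" ]; then
    dir=""   # avoid a double slash for files in the root directory
  fi
  printf '%s/%s\n' "$dir" "$(basename "$file")"
}

# Possible use inside the loop of OOo_convert.sh:
#   full="$(to_full_path "$file")" || continue
#   "$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.AsPDF($full)"
```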
