convert .html, .doc, .rtf etc to .sxw and .pdf through OpenOffice in command line
I was given the task of converting files from MS Word, RTF and HTML formats to SXW (OpenOffice 1.0 Text Document) and PDF from the command line.
I found a solution on the forum http://www.oooforum.org/, but it was not trivial for me: you need to write a StarBasic module and run OpenOffice in invisible mode from a DOS batch file or a Bash script, along these lines:
"soffice" -invisible "macro:///MyLibrary.MyModule.MyFunction(/fullpath/filename.ext)"
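The string after -invisible has three parts: the macro:/// prefix, the dotted path Library.Module.Sub, and the file path in parentheses, which becomes the Sub's single argument. A minimal Bash sketch of assembling it; macro_url is a hypothetical helper name, and LevShuvalov.Convert.AsPDF assumes the library and module created in the steps below:

```shell
#!/bin/bash
# Hypothetical helper: assemble the argument that follows -invisible.
# $1 = Library.Module.Sub, $2 = full path of the file to convert.
macro_url() {
    printf 'macro:///%s(%s)\n' "$1" "$2"
}

# Prints: macro:///LevShuvalov.Convert.AsPDF(/tmp/convert/test.doc)
macro_url LevShuvalov.Convert.AsPDF /tmp/convert/test.doc

# A real call passes the result as ONE quoted argument, e.g.:
# soffice -invisible "$(macro_url LevShuvalov.Convert.AsPDF /tmp/convert/test.doc)"
```

Quoting matters: unquoted, Bash would treat the parentheses as syntax, and a space in the path would split the argument in two.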
If you need a different type of conversion, you can change my functions or write your own with a different filter. In the code you must use the internal name of the chosen filter, taken from the list of filter internal names referenced in the header of the module below. Of course, the filter must be available for Import or Export, depending on whether you are going to read or save the document. If you do not specify a filter, OpenOffice detects the format when reading and saves in its native format.
Steps:
1. Execute OpenOffice
$ soffice
2. Create a library.
Under OpenOffice 1.1.x: in the main menu choose Tools | Macros | Macro..., then press the button "Organizer...". Choose the tab "Libraries", click on the library name and press the button "New...".
Under OpenOffice 1.9.x: in the main menu choose Tools | Macros | Organize Macros | OpenOffice.org..., then press the button "Organizer...". Choose the tab "Libraries" and press the button "New...".
Name the library. In my case it is "LevShuvalov".
3. Create a module in the library you just created.
You are still in the "Macro Organizer" pop-up window. Choose the tab "Modules" and press the button "New Module..." (OpenOffice 1.1.x) or "New..." (OpenOffice 1.9.x). Name the module; in my case it is "Convert". Then press the button "Edit".
4. Download a recent version of the module, or copy and paste the code below. If you copy and paste, be careful with line feeds: a line ending in _ continues on the next line.
REM ***** BASIC *****
'
' This module is for command line conversion of HTML, DOC, RTF etc. to SXW and PDF through OpenOffice.
'
' Code is based on answers, examples and functions by Danny Brewer (God bless him) on:
' http://www.oooforum.org/forum/viewtopic.phtml?t=3772&start=0&postdays=0&postorder=asc&highlight=object+variable+set+storetourl
' list of filters internal names from:
' http://framework.openoffice.org/files/documents/25/897/filter_description.html
'****************************************************************************
' Copyright (c) 2003 Danny Brewer, 2005 Lev Shuvalov
'
' Anyone may run this code.
' If you wish to modify or distribute this code, then
' you are granted a license to do so under the terms
' of the Gnu Lesser General Public License.
' See: http://www.gnu.org/licenses/lgpl.html
'****************************************************************************
'
' 2005-05-25 tested OpenOffice 1.1.4 and 1.9.104
' 2005-05-25 replace Left(cFile, Len(cFile) - 4) + ".ext" to cutExtension( cFile ) + ".ext"
' 2005-05-25 add function cutExtension. Filename could be without extension,
' also extension could be longer or shorter than 3 characters
' 2005-05-24 tested on OpenOffice 1.1.1 and 1.9.95
' 2005-05-24 initial
Sub HTMLtoSXW( cFile )
If Not FileExists ( cFile ) Then exit Sub
oDoc = StarDesktop.loadComponentFromURL( ConvertToURL( cFile ), "_blank", 0,_
Array( MakePropertyValue( "Hidden", true), MakePropertyValue( "FilterName", "HTML (StarWriter)")))
' Save the document using no filter. Thus saves a native Writer document.
oDoc.storeToURL( ConvertToURL( cutExtension( cFile ) + ".sxw" ) , Array() )
oDoc.close( True )
End Sub
' Needed for OpenOffice 1.1.1; under 1.9.x AsPDF can be used instead
Sub HTMLtoPDF( cFile )
If Not FileExists ( cFile ) Then exit Sub
oDoc = StarDesktop.loadComponentFromURL( ConvertToURL( cFile ), "_blank", 0,_
Array( MakePropertyValue( "Hidden", true), MakePropertyValue( "FilterName", "HTML (StarWriter)")))
oDoc.storeToURL( ConvertToURL( cutExtension( cFile ) + ".pdf" ),_
Array(MakePropertyValue( "FilterName", "writer_pdf_Export" )) )
oDoc.close( True )
End Sub
Sub AsSXW( cFile )
If Not FileExists ( cFile ) Then exit Sub
oDoc = StarDesktop.loadComponentFromURL( ConvertToURL( cFile ) , "_blank", 0,_
Array( MakePropertyValue( "Hidden", true) ) )
' Save the document using no filter. Thus saves a native Writer document.
oDoc.storeToURL( ConvertToURL( cutExtension( cFile ) + ".sxw" ), Array() )
oDoc.close( True )
End Sub
Sub AsPDF(cFile)
If Not FileExists ( cFile ) Then exit Sub
Dim strFilterSubName as String
oDoc = StarDesktop.loadComponentFromURL( ConvertToURL( cFile ), "_blank", 0,_
array(MakePropertyValue("Hidden", true)) )
If not IsNull(oDoc) Then
strFilterSubName = ""
' select appropriate filter
If oDoc.SupportsService("com.sun.star.presentation.PresentationDocument") Then
strFilterSubName = "impress_pdf_Export"
ElseIf oDoc.SupportsService("com.sun.star.sheet.SpreadsheetDocument") Then
strFilterSubName = "calc_pdf_Export"
ElseIf oDoc.SupportsService("com.sun.star.text.WebDocument") Then
strFilterSubName = "writer_web_pdf_Export"
ElseIf oDoc.SupportsService("com.sun.star.text.GlobalDocument") Then
strFilterSubName = "writer_globaldocument_pdf_Export"
ElseIf oDoc.SupportsService("com.sun.star.text.TextDocument") Then
strFilterSubName = "writer_pdf_Export"
ElseIf oDoc.SupportsService("com.sun.star.drawing.DrawingDocument") Then
strFilterSubName = "draw_pdf_Export"
ElseIf oDoc.SupportsService("com.sun.star.formula.FormulaProperties") Then
strFilterSubName = "math_pdf_Export"
ElseIf oDoc.SupportsService("com.sun.star.chart.ChartDocument") Then
strFilterSubName = "chart_pdf_Export"
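Else
' unknown document type: no PDF filter, nothing is saved
EndIf
If Len(strFilterSubName) > 0 Then
oDoc.storeToURL( ConvertToURL( cutExtension( cFile ) + ".pdf" ),_
Array(MakePropertyValue( "FilterName", strFilterSubName )) )
EndIf
' close only a document that was actually loaded
oDoc.close(True)
EndIf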
End Sub
Function MakePropertyValue( Optional cName As String, Optional uValue )_
As com.sun.star.beans.PropertyValue
Dim oPropertyValue As New com.sun.star.beans.PropertyValue
If Not IsMissing( cName ) Then
oPropertyValue.Name = cName
EndIf
If Not IsMissing( uValue ) Then
oPropertyValue.Value = uValue
EndIf
MakePropertyValue() = oPropertyValue
End Function
' Cut the last piece of the file name that begins with ".".
' Only the part after the last "/" or "\" is searched, so a dot in a
' directory name (e.g. /home/lev.s/report) is not taken for an extension.
Function cutExtension ( cFile )
Dim i As Long
Dim nDot As Long
nDot = 0
For i = Len( cFile ) To 1 Step -1
If Mid( cFile, i, 1 ) = "/" Or Mid( cFile, i, 1 ) = "\" Then Exit For
If Mid( cFile, i, 1 ) = "." Then
nDot = i
Exit For
EndIf
Next i
If nDot > 0 Then '------------------- file name has a part with a dot
cutExtension = Left( cFile, nDot - 1 )
Else '------------------------------ file name has no part with a dot
cutExtension = cFile
EndIf
End Function
5. Save and compile the module.
6. Download a recent version of the DOS batch file and/or the Bash script, or copy and paste the text below:
@echo off
rem
rem Command line conversion of HTML, DOC, RTF etc. to SXW and PDF through OpenOffice.
rem
rem 2005-05-26 revision
set OfficeDir=C:\Program Files\OpenOffice.org 1.9.104\program\
rem set OfficeDir=C:\Program Files\OpenOffice.org1.1.1\program\
if "%1"=="" (
echo:
echo No file[s] to convert were specified
echo:
echo usage: OOo_convert.bat full_path_to_exact_file_or_wildcard
echo example:
echo OOo_convert.bat C:\tmp\convert\test.doc
echo OOo_convert.bat C:\tmp\convert\*.htm
echo:
exit
)
echo Converting...
for %%c in ("%1") DO (
echo %%c
"%OfficeDir%soffice" -invisible "macro:///LevShuvalov.Convert.HTMLtoSXW(%%c)"
rem "%OfficeDir%soffice" -invisible "macro:///LevShuvalov.Convert.HTMLtoPDF(%%c)"
rem "%OfficeDir%soffice" -invisible "macro:///LevShuvalov.Convert.AsSXW(%%c)"
rem "%OfficeDir%soffice" -invisible "macro:///LevShuvalov.Convert.AsPDF(%%c)"
)
echo Done.
exit
or
#!/bin/bash
#
# Command line conversion of HTML, DOC, RTF etc. to SXW and PDF through OpenOffice.
#
# 2005-05-27 revision
OfficeDir="/home/users/lev/oo14"
if [ "$#" -lt "1" ]; then
echo
echo "No file(s) to convert were specified"
echo
echo "usage: OOo_convert.sh full_path_to_exact_file_or_wildcard"
echo "example:"
echo " OOo_convert.sh /tmp/convert/test.doc"
echo " OOo_convert.sh /tmp/convert/*htm"
echo
exit 1
fi
echo Converting...
for file in "$@"
do
echo "$file"
"$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.HTMLtoSXW($file)"
#"$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.HTMLtoPDF($file)"
#"$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.AsSXW($file)"
#"$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.AsPDF($file)"
done
echo Done.
exit 0
7. On Linux, make OOo_convert.sh executable:
$ chmod 750 OOo_convert.sh
8. Set the variable OfficeDir, choose the conversion function (uncomment the line you need and comment out the others), and run the script. Be sure to give a full path to the conversion folder or to the exact file.
>OOo_convert.bat C:\tmp\convert\test.doc
or
$ ./OOo_convert.sh /tmp/convert/test.doc
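Step 8 asks you to pick the conversion function by editing the script, and the result lands next to the source file under the name cutExtension produces. A minimal Bash sketch of both decisions, which could replace the hard-coded macro call in OOo_convert.sh. The helper names pick_macro and output_file are hypothetical, and the rule "HTMLto... for .htm/.html, As... otherwise" is my assumption, based on the module comment that HTMLtoPDF is needed on OpenOffice 1.1.1:

```shell
#!/bin/bash
# Hypothetical helpers for OOo_convert.sh (not part of the original script).
# Assumes the library/module names LevShuvalov.Convert from steps 2 and 3.

# Print the Sub to call for file $1 and target format $2 (sxw or pdf).
# Assumption: HTML input uses HTMLtoSXW/HTMLtoPDF, everything else AsSXW/AsPDF.
pick_macro() {
    local file="$1" target="$2" prefix="As"
    case "$file" in
        *.[Hh][Tt][Mm]|*.[Hh][Tt][Mm][Ll]) prefix="HTMLto" ;;
    esac
    case "$target" in
        sxw) echo "LevShuvalov.Convert.${prefix}SXW" ;;
        pdf) echo "LevShuvalov.Convert.${prefix}PDF" ;;
        *)   echo "unsupported target: $target" >&2; return 1 ;;
    esac
}

# Predict where the macro writes its result: drop the last ".suffix"
# of the file name (dots in directory names are ignored) and add $2.
output_file() {
    local path="$1" ext="$2" dir="" base="$1"
    if [[ "$path" == */* ]]; then
        dir="${path%/*}/"
        base="${path##*/}"
    fi
    if [[ "$base" == *.* ]]; then
        base="${base%.*}"
    fi
    printf '%s%s.%s\n' "$dir" "$base" "$ext"
}

# Example: show the planned call for two files without running soffice.
# Prints:
#   LevShuvalov.Convert.AsPDF -> /tmp/convert/test.pdf
#   LevShuvalov.Convert.HTMLtoPDF -> /tmp/convert/index.pdf
for f in /tmp/convert/test.doc /tmp/convert/index.HTML; do
    echo "$(pick_macro "$f" pdf) -> $(output_file "$f" pdf)"
done
```

Inside the script's loop, the call could then read "$OfficeDir/soffice" -invisible "macro:///$(pick_macro "$file" pdf)($file)". This sketch handles only "/" separators, so it fits the Bash script, not the batch file.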
If for some reason you remove the line If Not FileExists ( cFile ) Then exit Sub and then give an incorrect file name, or if you give a relative path to the conversion folder or to the exact file, you will get the error message “Object variable not set” under OpenOffice 1.1.x and “URL seems to be unsupported one” under OpenOffice 1.9.x.
If the chosen filter does not match the format of the current file, you will also get “URL seems to be unsupported one”.
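Because both a relative path and a missing file lead to these errors, the Bash script could check and normalize each argument before building the macro URL. A minimal sketch, assuming only cd, pwd, dirname and basename (no realpath, which older systems may lack); abs_path and the choice of AsPDF are illustrative, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print the absolute
# path of an existing file, or report it and return 1 if it is missing.
abs_path() {
    local file="$1" dir
    if [ ! -f "$file" ]; then
        echo "not found: $file" >&2
        return 1
    fi
    # cd runs in a subshell, so the caller's working directory is unchanged
    dir=$(cd "$(dirname "$file")" && pwd) || return 1
    # ${dir%/} avoids a double slash for files directly under /
    printf '%s/%s\n' "${dir%/}" "$(basename "$file")"
}

# Drop-in loop for OOo_convert.sh: skip missing files, pass full paths.
for file in "$@"; do
    full=$(abs_path "$file") || continue
    echo "$full"
    "$OfficeDir/soffice" -invisible "macro:///LevShuvalov.Convert.AsPDF($full)"
done
```

Skipping a missing file here mirrors the If Not FileExists guard in the macros, but it reports the problem in the terminal instead of exiting silently.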
Many thanks to Danny Brewer, who actually wrote these functions; I just collected them.