pdf2odt 0.7.0
pip install pdf2odt
Released: Mar 24, 2021
Change files and directories permissions and owner recursively from current directory
Metadata
License: GNU General Public License v3 (GPLv3) (GPL-3)
Tags: change, permissions, ownner, files, directories
Classifiers
- Development Status
- 4 — Beta
- Intended Audience
- System Administrators
- License
- OSI Approved :: GNU General Public License v3 (GPLv3)
- Programming Language
- Python :: 3
- Topic
- System :: Systems Administration
Project description
What is pdf2odt
It’s a script that converts a PDF to a LibreOffice Writer document. Each PDF page is converted to an image. The conversion uses pdftoppm from poppler.
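As a rough illustration of the page-to-image step, here is a minimal Python sketch that calls pdftoppm. It is not pdf2odt's own code: the function names, the `page` prefix and the 300 dpi default are assumptions made for the example. Only the `-png` and `-r` options of pdftoppm are used.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf, out_prefix, dpi=300):
    """Build a pdftoppm call that renders every page of `pdf` to PNG.

    pdftoppm writes one file per page, named <out_prefix>-<page>.png.
    The 300 dpi default is an assumption, not pdf2odt's actual setting.
    """
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(out_prefix)]


def render_pages(pdf, out_dir, dpi=300):
    """Run pdftoppm (must be on PATH) and return the generated page images."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_command(pdf, out_dir / "page", dpi), check=True)
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    return sorted(out_dir.glob("page-*.png"))
```

Calling `render_pages("doc.pdf", "pages")` would leave one PNG per page in the `pages` directory, ready to be placed in a document.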
Installation and use in Linux
You must have poppler installed so that the pdftoppm command is available. You can install it with your distribution’s package manager.
pip install pdf2odt
Once installed, you can use it by typing:
pdf2odt --pdf doc.pdf doc.odt
If you want OCR, install the tesseract application and then run:
pdf2odt --pdf doc.pdf --tesseract doc.odt
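To give an idea of what the OCR step involves, the sketch below builds a plain tesseract call for one page image and parses the language list that `tesseract --list-langs` prints. How pdf2odt itself invokes tesseract is not shown in this description, so the function names and the `eng` default are assumptions.

```python
import subprocess


def ocr_command(image, out_base, lang="eng"):
    """Build a tesseract call that writes recognised text to <out_base>.txt."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def parse_languages(listing):
    """Parse the text printed by `tesseract --list-langs`.

    The first line is a header; each following non-empty line is a language code.
    """
    lines = [line.strip() for line in listing.splitlines()]
    return [line for line in lines[1:] if line]


def installed_languages():
    """Ask the local tesseract which languages it has (needs tesseract on PATH)."""
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    # Older tesseract releases print the list on stderr, so read both streams.
    return parse_languages(result.stdout or result.stderr)
```

Checking `lang in installed_languages()` before running OCR avoids a failed run when the requested language data is not installed.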
Installation and use in Windows
You need Python installed; it works with the latest version. Don’t forget to add the Python executables to PATH by ticking that option during installation.
pip install pdf2odt
Next, download poppler for Windows from https://blog.alivate.com.au/poppler-windows/. Uncompress the downloaded file and add its installation directory to the Windows PATH environment variable. This guide explains how: https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/
Now you can use it by typing in a Windows shell:
pdf2odt --pdf doc.pdf doc.odt
If you want OCR, download tesseract for Windows from https://github.com/UB-Mannheim/tesseract/wiki and add its installation directory to the Windows PATH environment variable too.
pdf2odt --pdf doc.pdf --tesseract doc.odt
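A quick way to confirm that the PATH changes took effect is to ask Python where it finds the external programs. This is a small sketch using the standard library only; the helper name `missing_tools` is made up for the example.

```python
import shutil


def missing_tools(tools=("pdftoppm", "tesseract")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


for name in missing_tools():
    print(f"{name} was not found on PATH; check the installation directory")
```

If nothing is printed, both programs are reachable from the shell that launched Python. Open a new shell after editing PATH, because running shells keep the old value.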
Dependencies
https://www.python.org/, as the main programming language.
https://poppler.freedesktop.org/, to convert pdf to images using pdftoppm.
Changelog
Fixed bug with tesseract parameter position. Thanks @maxlem-neuralium
Temporary files are now generated with the tempfile module.
Tesseract language is now shown in output
Now pdf2odt validates PDF document
Now pdf2odt detects if tesseract language selected is supported.
Added OCR support with tesseract
Now uses process concurrency and shows a progress bar
Fixed problem with paths containing white space on Windows.
Improved metadata information.
Now works on Windows with a poppler for Windows installation
gutschke/pdf2odt
README.md
This script can convert PDF, PNG and JPG files to either ODT or ODS Open Document files.
In order to perform the conversion, all input files are first converted to image files and then included as background images for the ODT and ODS Open Document files. The Open Document files can then be edited by writing on top of the image(s).
ODT files have one page style per included background image. When editing the file, simply write text on top of the background image. But be careful not to insert any page breaks without first adjusting page styles for the page break.
ODS files have one sheet per background image. Each sheet is set up to have lots of tiny cells. In order to edit the document, you should select rectangular regions of these small cells and then merge them into bigger cells. Once all editable cells have been set up, the remainder of the tiny cells should be protected. Afterwards, the document behaves very much like a "normal" spreadsheet.
On the command line, specify any number of input files followed by the output file. The output file format is determined by the file name extension of the output file, or alternatively by the name of the script or by an optional command line argument.
All output files are generated in the system’s default paper size unless overridden on the command line.
For more details on how to use the script, run without any arguments and it’ll print usage information.
Dependencies: awk, bash, coreutils, dc, file, ghostscript, imagemagick, sed, zip
How to convert pdf file to an odt file?
I want to convert a .pdf file to an .odt file so that I can further convert it to a .doc file. Is there any software or script that can do this? I have tried copying the content of the .pdf file and pasting it into LibreOffice Writer, but the formatting isn’t preserved.
The document is confidential so I’d prefer not to use any on-line service for the conversion.
Any help is highly appreciated.
5 Answers
You could take a look at PDF Utilities (poppler-utils via Synaptic or apt-get) which includes pdftotext:
Poppler is a PDF rendering library based on Xpdf PDF viewer.
This package contains command line utilities (based on Poppler) for getting information of PDF documents, convert them to other formats, or manipulate them:
* pdfdetach — lists or extracts embedded files (attachments)
* pdffonts — font analyzer
* pdfimages — image extractor
* pdfinfo — document information
* pdfseparate — page extraction tool
* pdftocairo — PDF to PNG/JPEG/PDF/PS/EPS/SVG converter using Cairo
* pdftohtml — PDF to HTML converter
* pdftoppm — PDF to PPM/PNG/JPEG image converter
* pdftops — PDF to PostScript (PS) converter
* pdftotext — text extraction
* pdfunite — document merging tool
Of course, success will depend on how the pdf file was generated. If you get what you want as a text file, you could then save that as an .odt file.
Edit: I forgot to provide the source for the quote. It’s from the description tab in Synaptic for PDF Utilities (based on Poppler).
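The pdftotext route from this answer can be scripted. The sketch below is only an illustration: the function names are invented, and `-layout` (which asks pdftotext to keep the physical layout as far as it can) is one optional choice, not something the answer prescribes.

```python
import subprocess


def pdftotext_command(pdf, txt, keep_layout=True):
    """Build a pdftotext call; -layout tries to keep the original physical layout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")
    return cmd + [str(pdf), str(txt)]


def extract_text(pdf, txt):
    """Run pdftotext (from poppler-utils) and write the text to `txt`."""
    subprocess.run(pdftotext_command(pdf, txt), check=True)
```

The resulting text file can then be opened in LibreOffice Writer and saved as .odt, as the answer suggests.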
Thread: How to convert PDF to RTF/DOC/ODT format
Is there a program available for linux that can convert PDF to RTF/DOC/ODT format?
Re: How to convert PDF to RTF/DOC/ODT format
Since PDF is a final presentation format, the conversion to PDF is generally one-way (i.e. lossy).
You can, if you like, simply open the PDF in Evince, highlight all the text, and then copy and paste it into OpenOffice writer. Edit it, and save in whatever format you like.
One neat little feature in Evince is that OCR is built-in, so you can even copy and paste text from PDFs that consist of images.
Re: How to convert PDF to RTF/DOC/ODT format
Is there any tool/script available that can automate the process and also convert the images? When I google pdf 2 rtf/doc I find a lot of tools, but they are for Windows.
I looked at xpdf, which is good at converting the text, but the pdf2images [pdfimages] option does not work.
Re: How to convert PDF to RTF/DOC/ODT format
The PDFs that I am trying to convert are also available in PostScript. Is there a way I can convert PostScript files into RTF/DOC/ODT with the images intact?
Re: How to convert PDF to RTF/DOC/ODT format
Gimp, Abiword and Scribus can import PDF and save in a different format.
Re: How to convert PDF to RTF/DOC/ODT format
If you want to convert a batch of such files, I can’t think of anything but the ps2ascii command (from the ghostscript package), or pstotext. You will lose formatting though.
What exactly is your purpose?
Re: How to convert PDF to RTF/DOC/ODT format
I am trying to convert a few pdf files which also include images [images are important in this case].
I do not care whether they are formatted or not.
Re: How to convert PDF to RTF/DOC/ODT format
Gimp — couldn’t convert images
Scribus — couldn’t open the pdf file
Re: How to convert PDF to RTF/DOC/ODT format
A bit cagey about your purposes? Never mind; perhaps it’s personal. Don’t blame me if the suggestions aren’t helpful, though.
Both postscript and PDF? Sounds almost like LaTeX output.
Since these files are intended to be final documents, you’re going to have to expect to do some work if you want to convert them back into something similar to the working documents they were created from.
Let’s say you have a file called Report.pdf. Do this:
That will open two versions of the document. Pick the one that you think is best. Use OpenOffice Writer’s Insert menu to put the images in. Tidy up the document. Save as .odt.
Note that the ps* commands work on both actual PostScript and PDF. The pdfimages command is available from either the poppler-utils or xpdf-utils package.
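The commands from this post did not survive in this copy of the thread, so what follows is only a guess at the general idea: run the two text extractors mentioned above (ps2ascii and pstotext) and pull the images out with pdfimages. The function names and output file names are assumptions.

```python
import subprocess
from pathlib import Path


def extraction_commands(document):
    """Return the text- and image-extraction commands for one document.

    ps2ascii and pstotext print text on stdout; pdfimages writes files
    named <root>-NNN.<ext>, using the document name without its suffix as root.
    """
    doc = Path(document)
    return {
        "ps2ascii": ["ps2ascii", str(doc)],
        "pstotext": ["pstotext", str(doc)],
        "pdfimages": ["pdfimages", str(doc), str(doc.with_suffix(""))],
    }


def extract(document):
    """Run the commands and save each text version as <stem>-<tool>.txt."""
    doc = Path(document)
    for tool, cmd in extraction_commands(doc).items():
        if tool == "pdfimages":
            subprocess.run(cmd, check=True)
        else:
            out = doc.with_name(f"{doc.stem}-{tool}.txt")
            with out.open("w") as handle:
                subprocess.run(cmd, stdout=handle, check=True)
```

After `extract("Report.pdf")` you would have two text versions to compare, plus the extracted images to insert by hand.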
I’m quite surprised that GIMP didn’t work because of images. GIMP is an image-editing program! It works great at importing PDFs; the only problem is that it rasterises everything.
How can I convert an ODT file to a PDF?
Does anyone know how to convert an ODT file (LibreOffice) to PDF ?
7 Answers
You can also use the command-line of libreoffice for your purpose. That gives you the advantage of batch conversion. But single files are also possible. This example converts all ODT files in the current directory to PDF:
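The original command line is missing from this copy. As a hedged sketch, the batch conversion can be expressed with LibreOffice's `--headless` and `--convert-to` options; the function name is made up, and the binary may be called `soffice` rather than `libreoffice` on your system.

```python
from pathlib import Path


def batch_convert_command(directory=".", target="pdf"):
    """Build one headless LibreOffice call for every .odt file in `directory`."""
    files = sorted(str(path) for path in Path(directory).glob("*.odt"))
    return ["libreoffice", "--headless", "--convert-to", target, *files]


# To actually run it (LibreOffice must be installed):
#   import subprocess
#   subprocess.run(batch_convert_command(), check=True)
```

Without an `--outdir` option the converted files are written to the current working directory.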
Get more information on command-line options with:
Just open the document with LibreOffice and choose Export as PDF.
For a command line solution there is unoconv that converts files from the command line:
Note: unoconv depends on Libre Office.
Here are a few more details about the "non-GUI" method.
You can use this method not only to convert ODT files to PDF. It will also work for MS Word DOCX files (it will work as well as LibreOffice is able to handle the particular ODT), and, in general, all file types which LibreOffice can open.
I do not think that there is a binary named libreoffice as one of the other answers suggested. However, there is soffice(.bin) — the binary that can be used to start LibreOffice from the command line. It is usually located in /usr/lib/libreoffice/program/ ; and very often, a symlink /usr/bin/soffice points to that location.
Then, in most cases the parameters --headless --convert-to pdf are not sufficient. It needs to be:
Be sure to follow exactly this capitalization!
Next, the command will not work if there is already a LibreOffice GUI instance up and running on your system. It is caused by bug #37531, known since 2011. Add this additional parameter to your command:
This will create a new, separate environment which can be used by a second, headless LO instance without interfering with a possibly running first GUI LO instance started by the same user.
Also, make sure that the --outdir /pdf you specify does exist, and that you have write permission to it. Or, rather use a different output dir. Even if it is just for a first testing and debugging round:
This works for me on Mac OS X Yosemite 10.10.5 with LibreOffice v5.1.2.2 (using my specific path for the binary soffice, which will be different on Ubuntu anyway). It also works on Debian Jessie 8.0 (using path /usr/lib/libreoffice/program/soffice). Sorry, cannot test it on Ubuntu right now.
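The command lines from the steps above were lost in this copy. The following sketch puts the described pieces together under stated assumptions: `pdf:writer_pdf_Export` as the explicit Writer PDF export filter, `-env:UserInstallation` pointing at a separate profile, and an explicit `--outdir`. The profile URI and the function name are illustrative, not taken from the answer.

```python
import subprocess
from pathlib import Path


def headless_convert_command(source, outdir, binary="soffice",
                             profile_uri="file:///tmp/LibreOffice_Conversion"):
    """Build a headless conversion call that uses its own user profile.

    A separate profile lets the call run while a LibreOffice GUI is open.
    `profile_uri` is a hypothetical location; any writable path works.
    """
    return [
        binary,
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--convert-to", "pdf:writer_pdf_Export",
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, outdir, **options):
    """Create the output directory first, then run the conversion."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(headless_convert_command(source, outdir, **options), check=True)
```

Pass `binary="/usr/lib/libreoffice/program/soffice"` (or your own path) if soffice is not on PATH.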
If all this doesn’t work, when you try to process DOCX:
It may be a problem with the specific DOCX file you try the command with. So create a very simple DOCX document of your own first. Use LibreOffice itself for this. Write "Hello World!" on an otherwise empty page. Save it as DOCX.
Try again. Does it work with the simple DOCX?
If it still doesn’t work, create the same simple document again, but save it as ODT this time.
Run the command again, but make sure to reference the ODT this time.
Last: Use the full path to soffice, to soffice.bin and to libreoffice, and run each with the -h parameter:
- Do you get an output here?
- For which one of the three binaries/symlinks?
- Record the outputs.
- Tell us your outputs.
Compare them to the command line you used:
- Are there any changes in parameter names, capitalizations, number of dashes used, etc.
For comparison, my own (Mac OS X) output is here:
Add one more argument to your command line to enforce the application of an input filter when soffice opens your DOCX file:
Nautilus Script
This script utilizes libreoffice to convert files compatible with LibreOffice to PDF.
For installation instructions see here: How can I install a Nautilus script?
I’m adding a new answer, because in recent times a series of new conversion paths were opened by Pandoc gaining the capability to read ODT files.
When Pandoc reads in a file format, it converts it into an internal format, "native" (which is a form of JSON).
From its native form, it can then export the document into a whole range of other formats. Not only PDF, but also DocBook, HTML, EPUB, DOCX, ASCIIdoc, DokuWiki, MediaWiki and what-not.
Since here the wanted output format is PDF, we have another choice of different paths, provided by what Pandoc is calling a pdf-engine. Here is the list of currently available PDF engines (valid for Pandoc v2.7.2 and later — previous versions may support only a smaller list):
pdflatex: This requires LaTeX to be installed in addition to Pandoc.
xelatex: This requires XeLaTeX to be installed in addition to Pandoc (also available as an additional package to general TeX distributions).
context: This requires ConTeXt to be installed in addition to Pandoc (ConTeXt is available as an additional package to most general TeX distributions).
lualatex: This requires LuaTeX to be installed in addition to Pandoc (also available as an additional package to general TeX distributions).
pdfroff: This requires GNU Roff to be installed in addition to Pandoc.
wkhtmltopdf: This requires wkhtmltopdf to be installed in addition to Pandoc.
prince: This requires PrinceXML to be installed in addition to Pandoc.
weasyprint: This requires weasyprint to be installed in addition to Pandoc.
There are some more and newer PDF engines now integrated into Pandoc, which I have not yet used myself and which I currently cannot describe in more detail: tectonic and latexmk.
WARNING: Do not expect that the appearance of your original document will be identical in all the PDF outputs to the print preview or PDF export of the ODT! Pandoc, when converting does not preserve layouts, it preserves the contents and the structure of documents: paragraphs remain paragraphs, emphasized words remain emphasized, headings remain headings, etc. But the overall look can change considerably.
Example commands
pdflatex:
XeLaTeX:
LuaLaTeX:
ConTeXt:
GNU troff:
wkhtmltopdf:
PrinceXML:
weasyprint:
Above commands are the most basic for the conversion. Depending on the PDF engine you pick, there may be many other options possible to control the appearance of the output PDF file. For example, the following additional parameters may be added to all those paths routing through LaTeX:
which will use a custom page size (a bit larger than DIN A4) with margins of 2cm on the top edge and 1.12cm at the other three edges.
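The example commands themselves are missing from this copy, so here is a hedged reconstruction of their general shape rather than the answer's exact text. It relies on Pandoc's `--pdf-engine` option and on the `geometry` template variable, which only affects LaTeX-based engines. The function name and the geometry values passed in are illustrative.

```python
# Engines that go through Pandoc's LaTeX template and honour `geometry`.
LATEX_ENGINES = {"pdflatex", "xelatex", "lualatex", "tectonic", "latexmk"}


def pandoc_pdf_command(source, target, engine="xelatex", geometry=()):
    """Build a pandoc call that converts `source` to PDF with the given engine.

    Each entry in `geometry` (e.g. "top=2cm") becomes a `-V geometry:...`
    variable, which is passed to the LaTeX geometry package.
    """
    if geometry and engine not in LATEX_ENGINES:
        raise ValueError("geometry options only apply to LaTeX-based engines")
    cmd = ["pandoc", str(source), f"--pdf-engine={engine}", "-o", str(target)]
    for option in geometry:
        cmd += ["-V", f"geometry:{option}"]
    return cmd
```

For example, `pandoc_pdf_command("in.odt", "out.pdf", "pdflatex", ["top=2cm", "left=1.12cm"])` yields a command with two `-V geometry:` variables; switching engines only changes the `--pdf-engine` value.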
Note: I decided to delete my answer from this question and to post a modified version of it here when I realised that unoconv doesn’t deal with psw files at all well, and doesn’t convert them successfully to other formats. There may also be problems with docx and xlsx formats.
However, LibreOffice fully supports many file types; full documentation is available at the official site, which details the valid input and output formats.
You could use the command-line libreoffice convert utility or unoconv, which is available in the repositories. I find unoconv to be very useful, and it is probably what you want. Even though Takkat has briefly mentioned unoconv , I thought it would be useful to give some more details and a batch conversion one-liner.
Using the terminal you could cd to the directory containing your files and then batch convert all of them by running a one-liner like this:
(This one-liner is a modification of my translate script featured in this answer.)
If you later want to use any other file formats, just substitute the odt and pdf for any other supported input and output formats. You can find the supported formats for a file type by entering unoconv -f odt --show. To convert a single file use, for example, unoconv -f pdf myfile.odt.
Further information on and options for the program can be found by entering in terminal man unoconv or by going to the Ubuntu manpages online.
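The batch one-liner did not survive in this copy. A rough Python equivalent, built only on the `unoconv -f <format> <file>` form quoted above, might look like this; the function name and defaults are assumptions.

```python
from pathlib import Path


def unoconv_commands(directory=".", source_ext="odt", target_format="pdf"):
    """Return one `unoconv -f <format> <file>` call per matching file."""
    files = sorted(Path(directory).glob(f"*.{source_ext}"))
    return [["unoconv", "-f", target_format, str(path)] for path in files]


# To run the conversions (unoconv and LibreOffice must be installed):
#   import subprocess
#   for cmd in unoconv_commands():
#       subprocess.run(cmd, check=True)
```

Changing `source_ext` and `target_format` covers the other format pairs mentioned above.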