PDF4Cat.Converter class

Class bases

class PDF4Cat.converter.Converter(*args, **kwargs)[source]

Bases: Img2Pdf, Pdf2Img, OCR, any_doc_convert, soffice_convert

Parent class of the PDF4Cat.converter submodule; it inherits the conversion methods of every base class listed above
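Because Converter lists several bases, an instance exposes the methods of all of them. The sketch below illustrates that pattern with hypothetical stub classes (`Base`, `Img2PdfStub`, `Pdf2ImgStub`, `ConverterStub`); the method bodies are placeholders and none of this is PDF4Cat code.

```python
# Hypothetical stand-ins: the stubs only mirror the class layout described
# above; their method bodies are placeholders, not PDF4Cat code.
class Base:
    def __init__(self, doc_file=None):
        self.doc_file = doc_file


class Img2PdfStub(Base):
    def img2pdf(self):
        return f"img2pdf({self.doc_file})"


class Pdf2ImgStub(Base):
    def pdf2imgs_zip(self):
        return f"pdf2imgs_zip({self.doc_file})"


class ConverterStub(Img2PdfStub, Pdf2ImgStub):
    """Inherits every method of its bases."""


conv = ConverterStub("scan.png")
print(conv.img2pdf())
print(conv.pdf2imgs_zip())
# The method resolution order follows the order of the listed bases.
mro_names = [cls.__name__ for cls in ConverterStub.__mro__]
print(mro_names)
```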

class PDF4Cat.converter.any_doc_convert(*args, **kwargs)[source]

Bases: PDF4Cat

Subclass of PDF4Cat parent class

Parameters

doc_file (None, optional) – Document file (for multiple operations, use ‘input_doc_list’)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for encrypt/decrypt)
progress_callback (None, optional) – Progress callback

Raises

TypeError – If both doc_file and input_doc_list are given (only one may be used)
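The mutually exclusive inputs described above can be sketched as follows. `DocInput` is a hypothetical class written for illustration; the real constructor's internals and error message are not shown in this reference and may differ.

```python
class DocInput:
    """Hypothetical sketch of the 'doc_file or input_doc_list, not both' rule."""

    def __init__(self, doc_file=None, input_doc_list=None):
        # Reject the ambiguous case where both inputs are supplied.
        if doc_file is not None and input_doc_list:
            raise TypeError("Use either doc_file or input_doc_list, not both")
        # Normalise to a single list so later code has one code path.
        if doc_file is not None:
            self.docs = [doc_file]
        else:
            self.docs = list(input_doc_list or [])


single = DocInput(doc_file="a.pdf")
many = DocInput(input_doc_list=["a.pdf", "b.pdf"])

try:
    DocInput(doc_file="a.pdf", input_doc_list=["b.pdf"])
except TypeError as exc:
    error = str(exc)
    print("rejected:", error)
```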

convert2pdf(output_pdf, use_soffice=False)[source]

Any document to pdf (using PyMuPDF or LibreOffice)

Parameters

output_pdf (None, optional) – Output pdf file
use_soffice (bool, optional) – Use the LibreOffice converter

docx2html(output_doc, style_map=None)[source]

Docx to html (using PyMuPDF)

Parameters: output_doc (None, optional) – Output html file

docx2pdf(output_pdf)[source]

docx to pdf (using PyMuPDF [docx=>html=>pdf])

Parameters: output_pdf (None, optional) – Output pdf file

gen_images4conv(pdf) → bytes[source]

Generator that yields BytesIO objects

Parameters: pdf (None, optional) – pdf object (PDF4Cat.open)
Yields: bytes – BytesIO

pdf2docx(output_docx)[source]

Pdf to docx (using PyMuPDF)

Parameters: output_docx (None, optional) – Output docx file

pdf2pptx(output_pptx, A4=True)[source]

Pdf to pptx (using PyMuPDF)

Parameters

output_pptx (None, optional) – Output pptx file
A4 (bool, optional) – Use Inches for A4 page

class PDF4Cat.converter.Img2Pdf(*args, **kwargs)[source]

Bases: PDF4Cat

Subclass of PDF4Cat parent class

Parameters

doc_file (None, optional) – Document file (for multiple operations, use ‘input_doc_list’)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for encrypt/decrypt)
progress_callback (None, optional) – Progress callback

Raises

TypeError – If both doc_file and input_doc_list are given (only one may be used)

gen_imagesi2p(fimages: str = '{name}_{num}.pdf', start_from: int = 0) → tuple[source]

Generator that yields a filename together with a BytesIO object

Parameters

fimages (str, optional) – Filename format template
start_from (int, optional) – Number to start enumerating from

Yields

tuple – filename, BytesIO
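A minimal sketch of how a template such as '{name}_{num}.pdf' and start_from could combine into the yielded (filename, BytesIO) pairs. It assumes the template is expanded with str.format and that {name} is the input file's stem; both are assumptions, and `gen_named_buffers` is a hypothetical helper, not part of PDF4Cat. The payload is placeholder bytes rather than a real pdf.

```python
import io
import os


def gen_named_buffers(paths, fimages="{name}_{num}.pdf", start_from=0):
    """Yield (filename, BytesIO) pairs, numbering files from start_from."""
    for num, path in enumerate(paths, start=start_from):
        # Assumption: {name} is the input filename without its extension.
        name = os.path.splitext(os.path.basename(path))[0]
        filename = fimages.format(name=name, num=num)
        # Placeholder payload; a real converter would put pdf bytes here.
        payload = io.BytesIO(b"placeholder for " + filename.encode())
        yield filename, payload


names = [fn for fn, _ in gen_named_buffers(["in/cat.png", "in/dog.jpg"], start_from=1)]
print(names)
```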

img2pdf(output_pdf=None) → None[source]

Image to pdf

Parameters: output_pdf (None, optional) – Output pdf file

imgs2pdf(output_pdf=None) → None[source]

Multiple images to pdf

Parameters: output_pdf (None, optional) – Output pdf file

imgs2pdfs_zip(out_zip_file: str, fimages: str = '{name}_{num}.pdf', start_from: int = 0) → None[source]

Convert multiple images to multiple pdfs and compress them into a zip (using the gen_imagesi2p generator)

Parameters

out_zip_file (str) – Output zip file
fimages (str, optional) – Filename format template
start_from (int, optional) – Number to start enumerating from
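Writing generator output into an archive can be sketched with the standard zipfile module. `write_zip` is a hypothetical helper that consumes (filename, BytesIO) pairs of the kind described above; how PDF4Cat builds its archive internally is not documented here.

```python
import io
import zipfile


def write_zip(out_zip, named_buffers):
    """Store each (filename, BytesIO) pair as one member of a zip archive."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for filename, buffer in named_buffers:
            zf.writestr(filename, buffer.getvalue())


# Placeholder pairs standing in for converted documents.
pairs = [("cat_0.pdf", io.BytesIO(b"first")), ("dog_1.pdf", io.BytesIO(b"second"))]

archive = io.BytesIO()  # a path such as "out.zip" works the same way
write_zip(archive, pairs)

with zipfile.ZipFile(archive) as zf:
    listed = zf.namelist()
    first = zf.read("cat_0.pdf")
print(listed)
```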

class PDF4Cat.converter.Pdf2Img(*args, **kwargs)[source]

Bases: PDF4Cat

Subclass of PDF4Cat parent class

Parameters

doc_file (None, optional) – Document file (for multiple operations, use ‘input_doc_list’)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for encrypt/decrypt)
progress_callback (None, optional) – Progress callback

Raises

TypeError – If both doc_file and input_doc_list are given (only one may be used)

gen_imagesp2i(pages: list = [], fimages: str = '{name}_{num}.png', start_from: int = 0, zoom: float = 1.5) → tuple[source]

Generator that yields a filename together with a BytesIO object

Parameters

pages (list, optional) – List of pages to select, e.g. [1, 3, 5, 15]
fimages (str, optional) – Filename format template for the images
start_from (int, optional) – Number to start enumerating from
zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)

Yields

tuple – filename, BytesIO
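To give a feel for the zoom parameter: in PyMuPDF, a Matrix(zoom, zoom) scales both axes of the rendered page, and an unscaled render corresponds to 72 dpi. Assuming zoom is passed through in that way (the parameter description hints at it, but this reference does not confirm it), the arithmetic is roughly as below. `zoomed_size` and `effective_dpi` are hypothetical helpers; real pixmap dimensions are integers, so treat the sizes as approximate.

```python
POINTS_PER_INCH = 72  # PDF page units per inch


def zoomed_size(width_pt, height_pt, zoom=1.5):
    """Approximate rendered size in pixels for a page measured in points."""
    return width_pt * zoom, height_pt * zoom


def effective_dpi(zoom=1.5):
    """Resolution that a uniform zoom factor corresponds to."""
    return POINTS_PER_INCH * zoom


size = zoomed_size(595, 842)  # an A4 page is about 595 x 842 points
dpi = effective_dpi()
print(size, dpi)
```

With the default zoom of 1.5, an A4 page comes out at roughly 892 x 1263 pixels, which corresponds to 108 dpi.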

pdf2imgs_zip(out_zip_file: str, pages: list = [], fimages: str = '{name}_{num}.png', start_from: int = 0, zoom: float = 1.5) → None[source]

Convert multiple pdfs to multiple images and compress them into a zip (using the gen_imagesp2i generator)

Parameters

out_zip_file (str) – Output zip file
pages (list, optional) – List of pages to select, e.g. [1, 3, 5, 15]
fimages (str, optional) – Filename format template for the images
start_from (int, optional) – Number to start enumerating from
zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)

class PDF4Cat.converter.OCR(*args, **kwargs)[source]

Bases: PDF4Cat

Subclass of PDF4Cat parent class

Parameters

doc_file (None, optional) – Document file (for multiple operations, use ‘input_doc_list’)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for encrypt/decrypt)
progress_callback (None, optional) – Progress callback

Raises

TypeError – If both doc_file and input_doc_list are given (only one may be used)

gen_pdfImagesOCR(pages: list = [], language: str = 'eng', zoom: float = 1.5) → tuple[source]

Generator that yields BytesIO objects

Parameters

pages (list, optional) – List of pages to select, e.g. [1, 3, 5, 15]
language (str, optional) – OCR language (see fitz.pdfocr_tobytes)
zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)

Yields

tuple – BytesIO

pdfocr(language: str = 'eng', output_pdf=None, pages: list = [], start_from: int = 0, zoom: float = 1.5) → None[source]

OCR pdf to file

Parameters

language (str, optional) – OCR language (see fitz.pdfocr_tobytes)
output_pdf (None, optional) – Output pdf file
pages (list, optional) – List of pages to select, e.g. [1, 3, 5, 15]
start_from (int, optional) – Number to start enumerating from
zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)

class PDF4Cat.converter.soffice_convert(*args, **kwargs)[source]

Bases: PDF4Cat

Subclass of PDF4Cat parent class

Parameters

doc_file (None, optional) – Document file (for multiple operations, use ‘input_doc_list’)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for encrypt/decrypt)
progress_callback (None, optional) – Progress callback

Raises

TypeError – If both doc_file and input_doc_list are given (only one may be used)

soffice_convert2pdf(output_pdf: str)[source]

LibreOffice converter wrapper for converting a document to pdf

Parameters: output_pdf (str) – Output pdf file
Raises: NotImplementedError – If LibreOffice does not support this conversion

soffice_convert2pdf_a(a: int, output_pdf: str)[source]

LibreOffice converter wrapper for converting a document to pdf/a

Parameters

a (int) – PDF type (0 or 1) [0 - pdf 1.4; 1 - pdf/a]
output_pdf (str) – Output pdf file

Raises

NotImplementedError – If LibreOffice does not support this conversion

soffice_convert_to(doc_type: str, output_doc: str)[source]

LibreOffice converter wrapper for converting a document to any format supported by soffice

Parameters

doc_type (str) – Target document type to convert to
output_doc (str) – Output document file
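For orientation, LibreOffice's command-line converter is normally driven with `soffice --headless --convert-to <type> --outdir <dir> <input>`, and it names the result after the input file. The sketch below only builds that command and the expected output path; it does not run soffice. `build_soffice_command` and `expected_output_path` are hypothetical helpers, and whether PDF4Cat invokes soffice in exactly this way is an assumption.

```python
import os


def build_soffice_command(input_doc, doc_type, out_dir, binary="soffice"):
    """Return the argument list for a headless LibreOffice conversion."""
    return [binary, "--headless", "--convert-to", doc_type, "--outdir", out_dir, input_doc]


def expected_output_path(input_doc, doc_type, out_dir):
    """soffice writes <out_dir>/<input stem>.<doc_type> for a plain type name."""
    stem = os.path.splitext(os.path.basename(input_doc))[0]
    return os.path.join(out_dir, f"{stem}.{doc_type}")


cmd = build_soffice_command("report.docx", "pdf", "out")
target = expected_output_path("report.docx", "pdf", "out")
print(" ".join(cmd))
print(target)
# To actually convert, the list could be passed to subprocess.run(cmd, check=True)
# on a machine where LibreOffice is installed.
```

Since soffice chooses the output name itself, a wrapper that accepts an explicit output_doc would presumably have to rename or move the produced file afterwards.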