PDF4Cat.Converter class

Class bases

class PDF4Cat.converter.Converter(*args, **kwargs)

Bases: Img2Pdf, Pdf2Img, OCR, any_doc_convert, soffice_convert

Parent class of the PDF4Cat.converter submodule

class PDF4Cat.converter.any_doc_convert(*args, **kwargs)

Bases: PDF4Cat

Subclass of the PDF4Cat parent class

Parameters
  • doc_file (None, optional) – Document file (to process several documents, use input_doc_list instead)

  • input_doc_list (list, optional) – List of input docs

  • passwd (str, optional) – Document password (for encryption/decryption)

  • progress_callback (None, optional) – Progress callback

Raises

TypeError – If doc_file and input_doc_list are both given (only one may be used)
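
The rule above (one of doc_file or input_doc_list, never both) can be illustrated with a small stand-alone sketch. The helper name resolve_inputs is hypothetical and is not part of PDF4Cat; it only mirrors the documented behaviour and is not the library's actual implementation.

```python
from typing import List, Optional


def resolve_inputs(doc_file: Optional[str] = None,
                   input_doc_list: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: return the list of documents to process."""
    if doc_file is not None and input_doc_list:
        # Mirrors the documented rule: only one of the two may be given.
        raise TypeError("use either doc_file or input_doc_list, not both")
    if doc_file is not None:
        return [doc_file]
    # Copy the list so the caller's object is never mutated.
    return list(input_doc_list or [])
```

Normalising both forms to a list lets the rest of the code iterate over inputs without caring which argument was used.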

convert2pdf(output_pdf, use_soffice=False)

Any document to pdf (using PyMuPDF or LibreOffice)

Parameters
  • output_pdf (None, optional) – Output pdf file

  • use_soffice (bool, optional) – Use the LibreOffice converter

docx2html(output_doc, style_map=None)

docx to html (using PyMuPDF)

Parameters

output_doc (None, optional) – Output html file

docx2pdf(output_pdf)

docx to pdf (using PyMuPDF [docx=>html=>pdf])

Parameters

output_pdf (None, optional) – Output pdf file

gen_images4conv(pdf) → bytes

Generator that yields BytesIO objects

Parameters

pdf (None, optional) – pdf object (PDF4Cat.open)

Yields

bytes – BytesIO

pdf2docx(output_docx)

Pdf to docx (using PyMuPDF)

Parameters

output_docx (None, optional) – Output docx file

pdf2pptx(output_pptx, A4=True)

Pdf to pptx (using PyMuPDF)

Parameters
  • output_pptx (None, optional) – Output pptx file

  • A4 (bool, optional) – Use A4 page dimensions (in inches)
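
For orientation, the A4 dimensions expressed in inches can be derived from the standard 210 mm x 297 mm page size. This is a minimal sketch; the function name a4_inches is hypothetical, and how pdf2pptx applies these values internally is an assumption not confirmed by this page.

```python
MM_PER_INCH = 25.4


def a4_inches():
    """Return the A4 page size (width, height) in inches."""
    # A4 is defined as 210 mm x 297 mm.
    return 210 / MM_PER_INCH, 297 / MM_PER_INCH
```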

class PDF4Cat.converter.Img2Pdf(*args, **kwargs)

Bases: PDF4Cat

Subclass of the PDF4Cat parent class

Parameters
  • doc_file (None, optional) – Document file (to process several documents, use input_doc_list instead)

  • input_doc_list (list, optional) – List of input docs

  • passwd (str, optional) – Document password (for encryption/decryption)

  • progress_callback (None, optional) – Progress callback

Raises

TypeError – If doc_file and input_doc_list are both given (only one may be used)

gen_imagesi2p(fimages: str = '{name}_{num}.pdf', start_from: int = 0) → tuple

Generator that yields a filename together with a BytesIO object

Parameters
  • fimages (str, optional) – Format string for the generated filenames

  • start_from (int, optional) – Number to start enumerating from

Yields

tuple – filename, BytesIO
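
The interplay of fimages and start_from can be sketched with plain string formatting. The helper numbered_names is hypothetical, and the meaning given to the placeholders ({name} as the input file's base name, {num} as a running counter) is an assumption based on the default value shown in the signature.

```python
import os


def numbered_names(paths, fimages="{name}_{num}.pdf", start_from=0):
    """Hypothetical sketch: yield one output filename per input path."""
    for num, path in enumerate(paths, start=start_from):
        # Assumption: {name} is the input file name without its extension.
        name = os.path.splitext(os.path.basename(path))[0]
        yield fimages.format(name=name, num=num)
```

With start_from=1, the inputs scan/a.png and b.jpg would map to a_1.pdf and b_2.pdf under these assumptions.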

img2pdf(output_pdf=None) → None

Image to pdf

Parameters

output_pdf (None, optional) – Output pdf file

imgs2pdf(output_pdf=None) → None

Multiple images to pdf

Parameters

output_pdf (None, optional) – Output pdf file

imgs2pdfs_zip(out_zip_file: str, fimages: str = '{name}_{num}.pdf', start_from: int = 0) → None

Convert multiple images to multiple pdfs and compress them into a zip (using the gen_imagesi2p generator)

Parameters
  • out_zip_file (str) – Output zip file

  • fimages (str, optional) – Format string for the generated filenames

  • start_from (int, optional) – Number to start enumerating from
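
The pattern described here, consuming a generator of (filename, BytesIO) tuples and writing each entry into a zip archive, can be sketched with the standard library alone. Both write_zip and fake_pages are hypothetical stand-ins; this is not PDF4Cat's actual implementation.

```python
import io
import zipfile


def write_zip(out_zip_file, items):
    """Write (filename, BytesIO) pairs into a compressed zip archive."""
    with zipfile.ZipFile(out_zip_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for filename, buf in items:
            # getvalue() returns the whole buffer regardless of its position.
            zf.writestr(filename, buf.getvalue())


def fake_pages():
    # Stand-in for a generator such as gen_imagesi2p: (filename, BytesIO).
    yield "a_0.pdf", io.BytesIO(b"%PDF-1.4 first")
    yield "a_1.pdf", io.BytesIO(b"%PDF-1.4 second")
```

Because the entries are produced lazily, only one converted document needs to be held in memory at a time while the archive is written.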

class PDF4Cat.converter.Pdf2Img(*args, **kwargs)

Bases: PDF4Cat

Subclass of the PDF4Cat parent class

Parameters
  • doc_file (None, optional) – Document file (to process several documents, use input_doc_list instead)

  • input_doc_list (list, optional) – List of input docs

  • passwd (str, optional) – Document password (for encryption/decryption)

  • progress_callback (None, optional) – Progress callback

Raises

TypeError – If doc_file and input_doc_list are both given (only one may be used)

gen_imagesp2i(pages: list = [], fimages: str = '{name}_{num}.png', start_from: int = 0, zoom: float = 1.5) → tuple

Generator that yields a filename together with a BytesIO object

Parameters
  • pages (list, optional) – List of pages to select, e.g. [1, 3, 5, 15]

  • fimages (str, optional) – Format string for the generated filenames

  • start_from (int, optional) – Number to start enumerating from

  • zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)

Yields

tuple – filename, BytesIO
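
To get a feel for the zoom parameter: PDF page sizes are measured in points (72 per inch), so a uniform zoom factor scales both the pixel dimensions and the effective resolution. The sketch below assumes the zoom is applied equally on both axes, as in fitz.Matrix(zoom, zoom); the helper rendered_size is hypothetical and the real renderer may round differently.

```python
def rendered_size(width_pt, height_pt, zoom=1.5):
    """Return approximate (width_px, height_px, dpi) for a zoomed page."""
    dpi = 72 * zoom  # PDF user space is 72 points per inch
    return round(width_pt * zoom), round(height_pt * zoom), dpi
```

For a US Letter page (612 x 792 points) the default zoom of 1.5 gives roughly 918 x 1188 pixels at 108 dpi.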

pdf2imgs_zip(out_zip_file: str, pages: list = [], fimages: str = '{name}_{num}.png', start_from: int = 0, zoom: float = 1.5) → None

Convert multiple pdfs to multiple images and compress them into a zip (using the gen_imagesp2i generator)

Parameters
  • out_zip_file (str) – Output zip file

  • pages (list, optional) – List of pages to select, e.g. [1, 3, 5, 15]

  • fimages (str, optional) – Format string for the generated filenames

  • start_from (int, optional) – Number to start enumerating from

  • zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)

class PDF4Cat.converter.OCR(*args, **kwargs)

Bases: PDF4Cat

Subclass of the PDF4Cat parent class

Parameters
  • doc_file (None, optional) – Document file (to process several documents, use input_doc_list instead)

  • input_doc_list (list, optional) – List of input docs

  • passwd (str, optional) – Document password (for encryption/decryption)

  • progress_callback (None, optional) – Progress callback

Raises

TypeError – If doc_file and input_doc_list are both given (only one may be used)

gen_pdfImagesOCR(pages: list = [], language: str = 'eng', zoom: float = 1.5) → tuple

Generator that yields BytesIO objects

Parameters
  • pages (list, optional) – List of pages to select, e.g. [1, 3, 5, 15]

  • language (str, optional) – OCR language (see fitz.pdfocr_tobytes)

  • zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)

Yields

tuple – BytesIO

pdfocr(language: str = 'eng', output_pdf=None, pages: list = [], start_from: int = 0, zoom: float = 1.5) → None

OCR a pdf and write the result to a file

Parameters
  • language (str, optional) – OCR language (see fitz.pdfocr_tobytes)

  • output_pdf (None, optional) – Output pdf file

  • pages (list, optional) – List of pages to select, e.g. [1, 3, 5, 15]

  • start_from (int, optional) – Number to start enumerating from

  • zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)

class PDF4Cat.converter.soffice_convert(*args, **kwargs)

Bases: PDF4Cat

Subclass of the PDF4Cat parent class

Parameters
  • doc_file (None, optional) – Document file (to process several documents, use input_doc_list instead)

  • input_doc_list (list, optional) – List of input docs

  • passwd (str, optional) – Document password (for encryption/decryption)

  • progress_callback (None, optional) – Progress callback

Raises

TypeError – If doc_file and input_doc_list are both given (only one may be used)

soffice_convert2pdf(output_pdf: str)

LibreOffice converter wrapper that converts a document to pdf

Parameters

output_pdf (str) – Output pdf file

Raises

NotImplementedError – If LibreOffice does not support this conversion

soffice_convert2pdf_a(a: int, output_pdf: str)

LibreOffice converter wrapper that converts a document to pdf/a

Parameters
  • a (int) – PDF/A type (0 or 1): 0 for pdf 1.4, 1 for pdf/a

  • output_pdf (str) – Output pdf file

Raises

NotImplementedError – If LibreOffice does not support this conversion

soffice_convert_to(doc_type: str, output_doc: str)

LibreOffice converter wrapper that converts a document to any format supported by soffice

Parameters
  • doc_type (str) – Output document type to convert to

  • output_doc (str) – Output document file
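
As background, a headless LibreOffice conversion is typically driven from the command line with the --headless, --convert-to and --outdir options. The sketch below only builds such an argument list and does not run it. The helper soffice_command is hypothetical, and whether PDF4Cat invokes soffice in exactly this way is an assumption.

```python
import os


def soffice_command(input_doc, doc_type, output_doc):
    """Build (but do not run) a headless soffice conversion command."""
    # soffice takes an output directory, not an output file name; the
    # converted file is named after the input with the new extension.
    outdir = os.path.dirname(os.path.abspath(output_doc))
    return ["soffice", "--headless", "--convert-to", doc_type,
            "--outdir", outdir, input_doc]
```

The resulting list can be passed to subprocess.run; if a specific output name is needed, the produced file would have to be renamed afterwards.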