PDF4Cat.Converter class
Class bases
- class PDF4Cat.converter.Converter(*args, **kwargs)[source]
Bases:
Img2Pdf, Pdf2Img, OCR, any_doc_convert, soffice_convert
Parent class of PDF4Cat.converter submodule
- class PDF4Cat.converter.any_doc_convert(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple files, use input_doc_list instead)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If both doc_file and input_doc_list are given (only one may be used)
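The rule under Raises can be pictured with a small stand-alone check. This is only a sketch of the documented behavior, not PDF4Cat's own code: `resolve_inputs` is a hypothetical helper name, and the exact condition the library tests is an assumption.

```python
from typing import List, Optional


def resolve_inputs(doc_file: Optional[str] = None,
                   input_doc_list: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: return the list of documents to process.

    Mirrors the documented rule that doc_file and input_doc_list are
    mutually exclusive. It is not taken from PDF4Cat itself.
    """
    if doc_file is not None and input_doc_list:
        # Both sources were supplied, which the docs describe as a TypeError.
        raise TypeError("use either doc_file or input_doc_list, not both")
    if doc_file is not None:
        return [doc_file]
    return list(input_doc_list or [])
```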
- convert2pdf(output_pdf, use_soffice=False)[source]
Any document to pdf (using PyMuPDF or LibreOffice)
- Parameters
output_pdf (None, optional) – Output pdf file
use_soffice (bool, optional) – Use the LibreOffice converter
- docx2html(output_doc, style_map=None)[source]
docx to html (using PyMuPDF)
- Parameters
output_doc (None, optional) – Output html file
- docx2pdf(output_pdf)[source]
docx to pdf (using PyMuPDF [docx=>html=>pdf])
- Parameters
output_pdf (None, optional) – Output pdf file
- gen_images4conv(pdf) bytes [source]
Generator that yields BytesIO objects
- Parameters
pdf (None, optional) – pdf object (PDF4Cat.open)
- Yields
bytes – BytesIO
- class PDF4Cat.converter.Img2Pdf(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple files, use input_doc_list instead)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If both doc_file and input_doc_list are given (only one may be used)
- gen_imagesi2p(fimages: str = '{name}_{num}.pdf', start_from: int = 0) tuple [source]
Generator that yields a filename together with a BytesIO object
- Parameters
fimages (str, optional) – Format image filenames
start_from (int, optional) – Enumerate from n
- Yields
tuple – filename, BytesIO
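The interaction of fimages and start_from can be sketched with plain string formatting. The sketch assumes the template is filled with `str.format`, that `{name}` is the input file's stem, and that `{num}` counts upward from start_from. None of this is confirmed by the reference above, and `format_names` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterator, List


def format_names(paths: List[str],
                 fimages: str = "{name}_{num}.pdf",
                 start_from: int = 0) -> Iterator[str]:
    """Hypothetical helper: yield one output filename per input path."""
    for num, path in enumerate(paths, start=start_from):
        # Assumption: {name} is the file name without directory or extension.
        yield fimages.format(name=Path(path).stem, num=num)


names = list(format_names(["scan/cat.png", "dog.jpg"], start_from=1))
print(names)
```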
- img2pdf(output_pdf=None) None [source]
Image to pdf
- Parameters
output_pdf (None, optional) – Output pdf file
- imgs2pdf(output_pdf=None) None [source]
Multiple images to pdf
- Parameters
output_pdf (None, optional) – Output pdf file
- imgs2pdfs_zip(out_zip_file: str, fimages: str = '{name}_{num}.pdf', start_from: int = 0) None [source]
Multiple images to multiple pdfs and compress to zip (using gen_imagesi2p generator)
- Parameters
out_zip_file (str) – Output zip file
fimages (str, optional) – Format image filenames
start_from (int, optional) – Enumerate from n
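How a generator of (filename, BytesIO) tuples might end up in a zip archive can be shown with the standard library alone. This is a sketch of the general pattern and not the library's implementation. `write_zip` is a hypothetical name, and the choice of DEFLATE compression is an assumption.

```python
import io
import zipfile
from typing import Iterable, Tuple


def write_zip(items: Iterable[Tuple[str, io.BytesIO]], out_zip_file) -> None:
    """Hypothetical helper: store each (filename, BytesIO) pair in a zip.

    out_zip_file may be a path or a writable binary file object.
    """
    with zipfile.ZipFile(out_zip_file, "w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        for filename, buffer in items:
            # getvalue() returns the whole buffer regardless of its position.
            zf.writestr(filename, buffer.getvalue())
```

Because the items are consumed one at a time, only a single converted document needs to be held in memory alongside the archive being written.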
- class PDF4Cat.converter.Pdf2Img(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple files, use input_doc_list instead)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If both doc_file and input_doc_list are given (only one may be used)
- gen_imagesp2i(pages: list = [], fimages: str = '{name}_{num}.png', start_from: int = 0, zoom: float = 1.5) tuple [source]
Generator that yields a filename together with a BytesIO object
- Parameters
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
fimages (str, optional) – Format image filenames
start_from (int, optional) – Enumerate from n
zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)
- Yields
tuple – filename, BytesIO
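The effect of zoom on image size can be estimated with simple arithmetic. The sketch assumes the value is applied as a uniform scale on both axes, which the pointer to fitz.Matrix suggests but does not confirm. PDF page sizes are measured in points (1/72 inch), so a zoom of 1.0 corresponds to 72 dpi. `rendered_size` is a hypothetical helper, and real renderers may round differently by a pixel.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def rendered_size(width_pt: float, height_pt: float,
                  zoom: float = 1.5) -> Tuple[int, int, float]:
    """Hypothetical helper: approximate (width_px, height_px, dpi)."""
    return (round(width_pt * zoom),
            round(height_pt * zoom),
            POINTS_PER_INCH * zoom)


# US Letter page (612 x 792 points) at the default zoom of 1.5
print(rendered_size(612, 792))
```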
- pdf2imgs_zip(out_zip_file: str, pages: list = [], fimages: str = '{name}_{num}.png', start_from: int = 0, zoom: float = 1.5) None [source]
Multiple pdfs to multiple images and compress to zip (using gen_imagesp2i generator)
- Parameters
out_zip_file (str) – Output zip file
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
fimages (str, optional) – Format image filenames
start_from (int, optional) – Enumerate from n
zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)
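A call to pdf2imgs_zip could look roughly like the wrapper below. It uses only the constructor and method parameters listed in this reference. The converter class is passed in as an argument so the sketch does not depend on an installed package. `export_pages_as_zip` is a hypothetical name, and whether page numbers are 0-based or 1-based is not stated here, so the sketch does not assume either.

```python
from typing import List, Optional


def export_pages_as_zip(converter_cls, pdf_path: str, out_zip_file: str,
                        pages: Optional[List[int]] = None,
                        zoom: float = 1.5) -> str:
    """Hypothetical wrapper around the documented pdf2imgs_zip signature."""
    # doc_file is used for a single document, as documented under Parameters.
    conv = converter_cls(doc_file=pdf_path)
    conv.pdf2imgs_zip(out_zip_file,
                      pages=pages or [],          # documented default is []
                      fimages="{name}_{num}.png",
                      start_from=0,
                      zoom=zoom)
    return out_zip_file
```

With the library installed, `converter_cls` would presumably be `Pdf2Img` or `Converter` imported from `PDF4Cat.converter`. That import path is inferred from the class names shown in this reference and has not been checked against a specific release.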
- class PDF4Cat.converter.OCR(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple files, use input_doc_list instead)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If both doc_file and input_doc_list are given (only one may be used)
- gen_pdfImagesOCR(pages: list = [], language: str = 'eng', zoom: float = 1.5) tuple [source]
Generator that yields BytesIO objects
- Parameters
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
language (str, optional) – OCR language (see fitz.pdfocr_tobytes)
zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)
- Yields
tuple – BytesIO
- pdfocr(language: str = 'eng', output_pdf=None, pages: list = [], start_from: int = 0, zoom: float = 1.5) None [source]
OCR pdf to file
- Parameters
language (str, optional) – OCR language (see fitz.pdfocr_tobytes)
output_pdf (None, optional) – Output pdf file
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
start_from (int, optional) – Enumerate from n
zoom (float, optional) – Image zoom factor (see the fitz.Matrix docs)
- class PDF4Cat.converter.soffice_convert(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple files, use input_doc_list instead)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If both doc_file and input_doc_list are given (only one may be used)
- soffice_convert2pdf(output_pdf: str)[source]
LibreOffice converter wrapper for converting a document to pdf
- Parameters
output_pdf (str) – Output pdf file
- Raises
NotImplementedError – If LibreOffice does not support this conversion
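For orientation, a headless LibreOffice conversion is normally started from the command line. The sketch below only builds such a command and checks whether the executable is on the PATH; it does not show how soffice_convert2pdf works internally. Both helper names are hypothetical, and the executable name `soffice` is an assumption that varies by platform.

```python
import shutil
from pathlib import Path
from typing import List


def build_soffice_command(input_doc: str, output_pdf: str,
                          soffice: str = "soffice") -> List[str]:
    """Hypothetical helper: argv for a headless conversion to pdf."""
    out_dir = Path(output_pdf).parent
    # LibreOffice names the result after the input file and writes it to
    # --outdir, so a caller may still need to rename it to output_pdf.
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), input_doc]


def soffice_available(soffice: str = "soffice") -> bool:
    """Return True if the LibreOffice executable can be found on PATH."""
    return shutil.which(soffice) is not None


print(build_soffice_command("report.docx", "out/report.pdf"))
```

The resulting list can be passed to `subprocess.run` once `soffice_available()` returns True.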