Welcome to PDF4Cat’s documentation!
PDF4Cat module
PDF4Cat.Converter class
Class bases
- class PDF4Cat.converter.Converter(*args, **kwargs)[source]
Bases:
Img2Pdf, Pdf2Img, OCR, any_doc_convert, soffice_convert
Parent class of PDF4Cat.converter submodule
- class PDF4Cat.converter.any_doc_convert(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
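The rule that doc_file and input_doc_list cannot be combined can be illustrated with a small stand-alone check. This is only a sketch of the documented behaviour: check_inputs is a hypothetical helper, not a PDF4Cat function, and the normalisation to a list is an assumption made for the example.

```python
def check_inputs(doc_file=None, input_doc_list=None):
    """Hypothetical helper: reject calls that pass both input arguments."""
    if doc_file is not None and input_doc_list is not None:
        # Mirrors the documented TypeError: only one of the two may be given.
        raise TypeError("Use either doc_file or input_doc_list, not both")
    # Normalise to a list so later code only has to handle one shape
    # (an assumption for this sketch, not documented library behaviour).
    if doc_file is not None:
        return [doc_file]
    return list(input_doc_list or [])
```

Passing a single file or a list works; passing both raises TypeError before any document is opened.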
- convert2pdf(output_pdf, use_soffice=False)[source]
Any document to pdf (using PyMuPDF or LibreOffice)
- Parameters
output_pdf (None, optional) – Output pdf file
use_soffice (bool, optional) – Use Libre Office converter
- docx2html(output_doc, style_map=None)[source]
docx to html (using PyMuPDF)
- Parameters
output_doc (None, optional) – Output html file
- docx2pdf(output_pdf)[source]
docx to pdf (using PyMuPDF [docx=>html=>pdf])
- Parameters
output_pdf (None, optional) – Output pdf file
- gen_images4conv(pdf) bytes [source]
Generator that yields BytesIO objects
- Parameters
pdf (None, optional) – pdf object (PDF4Cat.open)
- Yields
bytes – BytesIO
- class PDF4Cat.converter.Img2Pdf(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
- gen_imagesi2p(fimages: str = '{name}_{num}.pdf', start_from: int = 0) tuple [source]
Generator that yields a filename together with a BytesIO object
- Parameters
fimages (str, optional) – Format image filenames
start_from (int, optional) – Enumerate from n
- Yields
tuple – filename, BytesIO
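The fimages template and start_from counter can be pictured with plain string formatting. The sketch below assumes that {name} is the input file name without its extension and that {num} is a running counter beginning at start_from; format_names is a hypothetical helper and the library may derive these values differently.

```python
from pathlib import Path


def format_names(paths, template="{name}_{num}.pdf", start_from=0):
    """Yield one output filename per input path (illustrative only)."""
    for num, path in enumerate(paths, start=start_from):
        # Assumption: {name} is the stem of the input file, {num} the counter.
        yield template.format(name=Path(path).stem, num=num)
```

For example, list(format_names(["scan.png", "photo.jpg"], start_from=1)) gives one name per input, numbered from 1.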
- img2pdf(output_pdf=None) None [source]
Image to pdf
- Parameters
output_pdf (None, optional) – Output pdf file
- imgs2pdf(output_pdf=None) None [source]
Multiple images to pdf
- Parameters
output_pdf (None, optional) – Output pdf file
- imgs2pdfs_zip(out_zip_file: str, fimages: str = '{name}_{num}.pdf', start_from: int = 0) None [source]
Multiple images to multiple pdfs and compress to zip (using gen_imagesi2p generator)
- Parameters
out_zip_file (str) – Output zip file
fimages (str, optional) – Format image filenames
start_from (int, optional) – Enumerate from n
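The "generator into zip" pattern described here can be sketched with the standard library alone. fake_generator stands in for gen_imagesi2p and yields placeholder bytes rather than real PDFs; write_zip is a hypothetical name, and the actual PDF4Cat implementation may differ.

```python
import io
import zipfile


def fake_generator():
    """Stand-in for gen_imagesi2p: yields (filename, BytesIO) pairs."""
    for num in range(3):
        yield f"page_{num}.pdf", io.BytesIO(b"placeholder " + str(num).encode())


def write_zip(pairs, out_zip_file):
    """Write every (filename, BytesIO) pair into one compressed archive."""
    with zipfile.ZipFile(out_zip_file, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for filename, buffer in pairs:
            # getvalue() returns the whole buffer regardless of its position.
            archive.writestr(filename, buffer.getvalue())
```

Because the pairs are consumed one at a time, only a single converted document has to be held in memory while the archive is written. out_zip_file may be a path or a file-like object.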
- class PDF4Cat.converter.Pdf2Img(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
- gen_imagesp2i(pages: list = [], fimages: str = '{name}_{num}.png', start_from: int = 0, zoom: float = 1.5) tuple [source]
Generator that yields a filename together with a BytesIO object
- Parameters
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
fimages (str, optional) – Format image filenames
start_from (int, optional) – Enumerate from n
zoom (float, optional) – Image zoom factor (see fitz.Matrix docs)
- Yields
tuple – filename, BytesIO
- pdf2imgs_zip(out_zip_file: str, pages: list = [], fimages: str = '{name}_{num}.png', start_from: int = 0, zoom: float = 1.5) None [source]
Multiple pdfs to multiple images and compress to zip (using gen_imagesp2i generator)
- Parameters
out_zip_file (str) – Output zip file
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
fimages (str, optional) – Format image filenames
start_from (int, optional) – Enumerate from n
zoom (float, optional) – Image zoom factor (see fitz.Matrix docs)
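To get a feel for the zoom parameter: PDF pages are measured in points (72 per inch), and a scaling matrix multiplies both dimensions. The sketch below assumes the value is applied as a uniform scale in both directions, as with fitz.Matrix(zoom, zoom); zoomed_size is a hypothetical helper and the exact pixel rounding is left to PyMuPDF.

```python
def zoomed_size(width_pt, height_pt, zoom=1.5):
    """Return (pixel_width, pixel_height, dpi) for a page scaled by zoom."""
    base_dpi = 72  # PDF user space uses 72 points per inch
    # Approximate pixel size; the renderer may round slightly differently.
    return round(width_pt * zoom), round(height_pt * zoom), base_dpi * zoom
```

With the default zoom of 1.5, a US Letter page (612 x 792 points) renders at roughly 918 x 1188 pixels, which corresponds to 108 dpi.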
- class PDF4Cat.converter.OCR(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
- gen_pdfImagesOCR(pages: list = [], language: str = 'eng', zoom: float = 1.5) tuple [source]
Generator that yields BytesIO objects
- Parameters
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
language (str, optional) – OCR language (see fitz.pdfocr_tobytes)
zoom (float, optional) – Image zoom factor (see fitz.Matrix docs)
- Yields
tuple – BytesIO
- pdfocr(language: str = 'eng', output_pdf=None, pages: list = [], start_from: int = 0, zoom: float = 1.5) None [source]
OCR pdf to file
- Parameters
language (str, optional) – OCR language (see fitz.pdfocr_tobytes)
output_pdf (None, optional) – Output pdf file
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
start_from (int, optional) – Enumerate from n
zoom (float, optional) – Image zoom factor (see fitz.Matrix docs)
- class PDF4Cat.converter.soffice_convert(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
- soffice_convert2pdf(output_pdf: str)[source]
LibreOffice converter wrapper for converting a document to pdf
- Parameters
output_pdf (str) – Output pdf file
- Raises
NotImplementedError – If LibreOffice does not support this conversion
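For orientation, a headless LibreOffice conversion is usually driven through the soffice command line. The sketch below shows one way such a wrapper could look; build_soffice_command and convert_with_soffice are hypothetical names, the timeout value is arbitrary, and nothing here is taken from PDF4Cat's own code.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(input_doc, out_dir, binary="soffice"):
    """Return the argument list for a headless document-to-pdf conversion."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(input_doc)]


def convert_with_soffice(input_doc, out_dir, timeout=120):
    """Run LibreOffice and return the expected output path (illustrative)."""
    binary = shutil.which("soffice")
    if binary is None:
        raise FileNotFoundError("LibreOffice (soffice) was not found on PATH")
    subprocess.run(build_soffice_command(input_doc, out_dir, binary),
                   check=True, timeout=timeout)
    # LibreOffice names the result after the input file, with a .pdf suffix.
    return Path(out_dir) / (Path(input_doc).stem + ".pdf")
```

Keeping the command construction separate from the call makes it possible to inspect or log the exact arguments without LibreOffice being installed.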
PDF4Cat.doc class
Class bases
- class PDF4Cat.doc.Doc(*args, **kwargs)[source]
Bases:
Merger, Splitter, Crypter, Effects, PdfOptimizer
Parent class of PDF4Cat.doc submodule
- class PDF4Cat.doc.Merger(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
- class PDF4Cat.doc.Splitter(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
- gen_split(from_pdf=None, pages: list = [], fpages: str = '{name}_{num}.pdf', start_from: int = 0) tuple [source]
Generator that yields a filename together with a BytesIO object
- Parameters
from_pdf (None, optional) – pdf document name (default use main doc from class param)
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
fpages (str, optional) – Format pdf filenames
start_from (int, optional) – Enumerate from n
- Yields
tuple – filename, BytesIO
- split_pages2zip(out_zip_file: str, pages: list = [], fpages: str = '{name}_{num}.pdf', start_from: int = 0) None [source]
Split pages to different pdfs and compress to zip
- Parameters
out_zip_file (str) – Output zip file
pages (list, optional) – List of pages to select like [1, 3, 5, 15]
fpages (str, optional) – Format pdf filenames
start_from (int, optional) – Enumerate from n
- class PDF4Cat.doc.Crypter(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
- crypt_to(user_passwd: str = None, owner_passwd: str = None, perm: dict = None, crypt_type: int = 5, output_pdf: str = None) None [source]
Encrypt pdf and save to file (don’t forget to give the password in the class parameter)
- Parameters
user_passwd (str, optional) – Pdf user password
owner_passwd (str, optional) – Pdf owner password
perm (dict, optional) – Permissions; see the perm example below
crypt_type (int, optional) – Crypt type, default AES256 (PDF4Cat.PDF_ENCRYPT_AES_256)
output_pdf (None, optional) – Output pdf file
- Raises
TypeError – “Missing user and owner password!”
- perm = int(PDF4Cat.PDF_PERM_ACCESSIBILITY | PDF4Cat.PDF_PERM_PRINT | PDF4Cat.PDF_PERM_COPY | PDF4Cat.PDF_PERM_ANNOTATE)
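Permission values of this kind are bit masks: each flag occupies one bit and flags are combined with bitwise OR. The stand-alone sketch below uses an IntFlag whose numeric values are assumed to mirror the PDF_PERM_* constants in PyMuPDF; Perm and allows are hypothetical names, so check the values against the library before relying on them.

```python
from enum import IntFlag


class Perm(IntFlag):
    """Permission bits (values assumed to match PyMuPDF's PDF_PERM_* constants)."""
    PRINT = 4
    MODIFY = 8
    COPY = 16
    ANNOTATE = 32
    ACCESSIBILITY = 512


# Combine the flags that should be granted, then convert to a plain int.
perm = int(Perm.ACCESSIBILITY | Perm.PRINT | Perm.COPY | Perm.ANNOTATE)


def allows(perm_value, flag):
    """Return True if the given flag bit is set in perm_value."""
    return bool(perm_value & flag)
```

Here allows(perm, Perm.COPY) is true while allows(perm, Perm.MODIFY) is false, because MODIFY was not part of the OR expression.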
- class PDF4Cat.doc.Effects(*args, **kwargs)[source]
Bases:
Rotate
Parent class of PDF4Cat.Doc submodule
- class PDF4Cat.doc.PdfOptimizer(*args, **kwargs)[source]
Bases:
PDF4Cat
Subclass of PDF4Cat parent class
- Parameters
doc_file (None, optional) – Document file (for multiple operations, use input_doc_list)
input_doc_list (list, optional) – List of input docs
passwd (str, optional) – Document password (for crypt/decrypt)
progress_callback (None, optional) – Progress callback like:
- Raises
TypeError – If you use doc_file with input_doc_list (you can use only one)
PDF4Cat.helpers module
- PDF4Cat.helpers.run_in_subprocess(func)[source]
A decorator that adds a kwarg to a function so that it can be run in a subprocess. This can be useful when you have a function that may segfault. Apply it as @PDF4Cat.run_in_subprocess. Added kwargs: run_in_subprocess=True, subprocess_timeout. It is already used in the PDF4Cat.Converter and PDF4Cat.Doc functions.
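How such a decorator might work can be sketched with the standard multiprocessing module. This is not PDF4Cat's implementation: run_in_subprocess_sketch and _child are hypothetical names, the "fork" start method restricts the sketch to POSIX systems, and return values must be picklable to travel back through the pipe.

```python
import functools
import multiprocessing


def _child(func, args, kwargs, send_end):
    """Run func in the child process and send back (ok, payload)."""
    try:
        send_end.send((True, func(*args, **kwargs)))
    except Exception as exc:  # report ordinary errors to the parent
        send_end.send((False, repr(exc)))
    finally:
        send_end.close()


def run_in_subprocess_sketch(func):
    """Hypothetical decorator adding run_in_subprocess / subprocess_timeout kwargs."""

    @functools.wraps(func)
    def wrapper(*args, run_in_subprocess=False, subprocess_timeout=None, **kwargs):
        if not run_in_subprocess:
            return func(*args, **kwargs)  # plain in-process call

        ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
        recv_end, send_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=_child, args=(func, args, kwargs, send_end))
        proc.start()
        send_end.close()  # the parent keeps only the reading end
        try:
            if not recv_end.poll(subprocess_timeout):
                proc.terminate()
                raise TimeoutError(f"{func.__name__} exceeded {subprocess_timeout}s")
            try:
                ok, payload = recv_end.recv()
            except EOFError:
                # The child exited (for example after a segfault) without a result.
                raise RuntimeError(f"{func.__name__} crashed in the subprocess") from None
        finally:
            proc.join()
            recv_end.close()
        if not ok:
            raise RuntimeError(f"{func.__name__} failed: {payload}")
        return payload

    return wrapper


@run_in_subprocess_sketch
def square(x):
    return x * x
```

Calling square(4) runs in the current process, while square(4, run_in_subprocess=True, subprocess_timeout=30) runs in a child. A crash in the child then surfaces as a RuntimeError in the parent instead of taking the whole interpreter down.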