Complex HTML to PDF

The current HTML-to-PDF system uses Apache PDFBox, which struggles with complex content such as inline images and complex CSS. One alternative for converting complex HTML to PDF in Java is Playwright, Microsoft's browser-automation library, which loads the HTML into a headless Chromium browser and prints the page to PDF. I've now confirmed that it works, and in my opinion it is much easier to use than PDFBox.

Ideally this would be added as a new engine option for HTML-to-PDF conversion. Until that is possible, it can be done programmatically as follows:

  1. Add the dependency to build.gradle, using the most recent version available on the Playwright website:
```groovy
implementation 'com.microsoft.playwright:playwright:1.49.0'
```
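  2. Create a Spring component, HtmlToPdf:
```java
package com.company.test;

import com.microsoft.playwright.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class HtmlToPdf {

    private static final Logger log = LoggerFactory.getLogger(HtmlToPdf.class);

    /**
     * Renders an HTML string to PDF bytes using headless Chromium.
     * Returns null if rendering fails, so callers should check the result.
     * The method is static, so it can also be called without injecting the bean.
     */
    public static byte[] htmlToPdf(String html) {
        // Starts the Playwright driver; on first use it may download browser binaries.
        try (Playwright playwright = Playwright.create()) {
            BrowserType chromium = playwright.chromium();
            // A headless Chromium instance is launched per call and closed by try-with-resources.
            try (Browser browser = chromium.launch();
                 Page page = browser.newPage()) {
                page.setContent(html);  // load the HTML, including inline images
                return page.pdf();      // print to PDF (PDF generation requires Chromium)
            }
        } catch (Exception e) {
            log.error("Error generating PDF", e);
            return null;
        }
    }
}
```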
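  3. Call the method with the HTML as a String. (In this example the HTML is stored in the database as a byte[], so it is decoded first.)
```java
// requires: import java.nio.charset.StandardCharsets;
String html = new String(testDc.getItem().getReferralFormHtml(), StandardCharsets.UTF_8);
byte[] pdfDoc = HtmlToPdf.htmlToPdf(html); // null if rendering failed
```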
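By default `page.pdf()` uses Letter paper and does not print CSS background colours or background images, which can make a complex layout look wrong. The sketch below is a possible variant of the `HtmlToPdf` method that passes `Page.PdfOptions` to control this. `HtmlToPdfOptions` and `render` are hypothetical names, and the A4 format and 1 cm margins are arbitrary choices for illustration.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;
import com.microsoft.playwright.options.Margin;

// Hypothetical variant of HtmlToPdf.htmlToPdf with explicit print options.
public class HtmlToPdfOptions {

    public static byte[] render(String html) {
        try (Playwright playwright = Playwright.create();
             Browser browser = playwright.chromium().launch();
             Page page = browser.newPage()) {
            page.setContent(html);
            return page.pdf(new Page.PdfOptions()
                    .setFormat("A4")           // Playwright's default is "Letter"
                    .setPrintBackground(true)  // default false: CSS backgrounds are dropped
                    .setMargin(new Margin()
                            .setTop("1cm").setBottom("1cm")
                            .setLeft("1cm").setRight("1cm")));
        }
        // PlaywrightException (unchecked) propagates to the caller instead of returning null.
    }
}
```

If the HTML was designed for screen rather than print, calling `page.emulateMedia(new Page.EmulateMediaOptions().setMedia(Media.SCREEN))` before `page.pdf(...)` makes Chromium apply screen CSS instead of print CSS (`Media` is `com.microsoft.playwright.options.Media`).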
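Because `page.setContent()` is called on a fresh page whose URL is `about:blank`, `<img>` tags that use relative paths will likely not resolve, whereas images embedded as base64 data URIs (the inline images mentioned above) are self-contained. Here is a minimal JDK-only sketch of a helper for embedding image bytes this way. `DataUris.toDataUri` is a hypothetical name, and the caller is assumed to know the image's MIME type.

```java
import java.util.Base64;

// Hypothetical helper for embedding binary images directly in HTML.
public final class DataUris {

    private DataUris() {}

    // Builds a data URI such as "data:image/png;base64,iVBOR..." that can be used
    // directly as an <img src="..."> value, so the browser needs no external fetch.
    public static String toDataUri(byte[] bytes, String mimeType) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }
}
```

For example, `"<img src=\"" + DataUris.toDataUri(logoBytes, "image/png") + "\">"`, where `logoBytes` is a hypothetical byte[] holding a PNG, produces a tag that renders without any file or network access.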
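The `HtmlToPdf` component starts a new Playwright driver and Chromium process on every call, which is comparatively slow. If PDFs are generated often, one option is a bean that launches the browser once and reuses it. Playwright for Java is not thread-safe, so this sketch serializes calls with `synchronized`. `SharedBrowserHtmlToPdf` is a hypothetical name, and the sketch assumes Spring's `DisposableBean` hook to close the browser on shutdown.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.stereotype.Component;

// Hypothetical bean that keeps one headless Chromium alive for the application's lifetime.
@Component
public class SharedBrowserHtmlToPdf implements DisposableBean {

    // Created once, when Spring instantiates the bean (fields initialize in this order).
    private final Playwright playwright = Playwright.create();
    private final Browser browser = playwright.chromium().launch();

    // synchronized: only one thread may call Playwright objects at a time.
    public synchronized byte[] htmlToPdf(String html) {
        // browser.newPage() opens the page in a new browser context; closing the page
        // also closes that context, so no cookies or state leak between documents.
        try (Page page = browser.newPage()) {
            page.setContent(html);
            return page.pdf();
        }
    }

    // Called by Spring when the application context shuts down.
    @Override
    public synchronized void destroy() {
        browser.close();
        playwright.close();
    }
}
```

This trades a little memory (one idle Chromium process) for much lower per-call latency. Callers inject the bean instead of using a static method.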

Thank you,
Oran