Introducing Browsershot v3: the best way to convert html to PDFs and images

Original – Oct 23rd 2017 by Freek Van der Herten – 5 minute read

Using wkhtmltopdf and wkhtmltoimage tends to be the popular option for converting html to a pdf or an image. Unfortunately, those tools ship with an outdated browser engine, so you can't use any newer CSS syntax. A while ago Google added a headless mode to Chrome. They've also released a JavaScript library called Puppeteer that gives you programmatic, fine-grained control over Chrome.

Wouldn't it be great if we could just use Chrome and Puppeteer to convert html to PDFs and images? Browsershot is a package that does exactly that. In this post I'd like to introduce v3 of Browsershot, which was recently released.

Basic usage

To use the package, Puppeteer must be installed on your system. Luckily this is an easy process; the readme of Browsershot lists the steps to install it.

Here are a few examples of what the package can do:

use Spatie\Browsershot\Browsershot;

// an image will be saved
Browsershot::url('https://example.com')->save($pathToImage);

If the path passed to the save method has a pdf extension, a pdf will be saved instead.

// a pdf will be saved
Browsershot::url('https://example.com')->save('example.pdf');
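The extension check is simple enough to sketch. Below is a minimal Node version that looks only at the target path. The `actionForPath` helper is hypothetical (Browsershot makes this decision in PHP), and its return values are named after the Puppeteer page methods `pdf` and `screenshot` that would ultimately be called.

```javascript
const path = require('path');

// Hypothetical helper: decide which Puppeteer page method to use based on
// the extension of the target path. Browsershot itself decides this in PHP.
function actionForPath(targetPath) {
  const extension = path.extname(targetPath).toLowerCase();
  return extension === '.pdf' ? 'pdf' : 'screenshot';
}

console.log(actionForPath('example.pdf')); // pdf
console.log(actionForPath('example.png')); // screenshot
```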

You can also use arbitrary html as input; simply replace the url method with html:

Browsershot::html('<h1>Hello world!!</h1>')->save('example.pdf');
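Chrome still needs something to navigate to when you pass raw html. The createPdfCommand method shown in the "Behind the scenes" section handles that with a createTemporaryHtmlFile call. Here's a rough Node sketch of the idea, assuming the html is written to a fresh file in the system temp directory and turned into a `file://` URL. The `writeTemporaryHtmlFile` name and the file layout are hypothetical, and cleaning up the file afterwards is left out.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: write an html string to a new temporary directory and
// return a file:// URL that a headless browser could navigate to.
function writeTemporaryHtmlFile(html) {
  const directory = fs.mkdtempSync(path.join(os.tmpdir(), 'browsershot-'));
  const filePath = path.join(directory, 'index.html');
  fs.writeFileSync(filePath, html, 'utf8');
  return pathToFileURL(filePath).href;
}

const url = writeTemporaryHtmlFile('<h1>Hello world!!</h1>');
console.log(url); // a file:// URL inside the system temp directory
```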

Browsershot can also get the html of a page's body after its JavaScript has been executed:

Browsershot::url('https://example.com')->bodyHtml(); // returns the html of the body

Advanced usage

The examples above could all be achieved with v2 of Browsershot, which leveraged headless Chrome without Puppeteer. I'm very grateful for this awesome PR by itsgoingd, which added Puppeteer integration and formed the basis for v3 of Browsershot. Using Puppeteer opens up a lot of new possibilities.

You can now take a screenshot of the full length of the page.

Browsershot::url('https://example.com')
    ->fullPage()
    ->save($pathToImage);
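In Puppeteer terms, a full-page capture comes down to the `fullPage` option of `page.screenshot()`. Here's a minimal sketch of building that options object; the `screenshotOptions` helper is hypothetical and not part of Browsershot or Puppeteer.

```javascript
// Hypothetical helper: build the options object for Puppeteer's
// page.screenshot(). `fullPage: true` asks for the whole scrollable page
// instead of just the visible viewport.
function screenshotOptions(targetPath, { fullPage = false } = {}) {
  const options = { path: targetPath };
  if (fullPage) {
    options.fullPage = true;
  }
  return options;
}

console.log(screenshotOptions('page.png', { fullPage: true }));
// { path: 'page.png', fullPage: true }
```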

PDFs can now use landscape orientation.

Browsershot::html($someHtml)
   ->landscape()
   ->save('example.pdf');

You can specify the paper width and height in millimeters.

Browsershot::html($someHtml)
   ->paperSize($width, $height)
   ->save('example.pdf');

Margins can be set as well.

Browsershot::html($someHtml)
   ->margins($top, $right, $bottom, $left)
   ->save('example.pdf');
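These three settings correspond to options of Puppeteer's `page.pdf()`: `landscape`, `width`/`height` and `margin`, where sizes can be given as strings with units such as `mm`. The sketch below shows one plausible mapping from the millimeter values above to such an options object. The `pdfOptions` helper and its argument shape are hypothetical, not Browsershot's actual code.

```javascript
// Hypothetical helper: map landscape, paper size and margins (in mm) onto
// the options object accepted by Puppeteer's page.pdf().
function pdfOptions(targetPath, { landscape = false, paperSize = null, margins = null } = {}) {
  const options = { path: targetPath };
  if (landscape) {
    options.landscape = true;
  }
  if (paperSize) {
    const [width, height] = paperSize;
    options.width = `${width}mm`;
    options.height = `${height}mm`;
  }
  if (margins) {
    // same order as Browsershot's margins($top, $right, $bottom, $left)
    const [top, right, bottom, left] = margins;
    options.margin = { top: `${top}mm`, right: `${right}mm`, bottom: `${bottom}mm`, left: `${left}mm` };
  }
  return options;
}

const options = pdfOptions('example.pdf', {
  landscape: true,
  paperSize: [210, 297],
  margins: [10, 15, 10, 15],
});
console.log(options.width, options.margin.left); // 210mm 15mm
```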

Behind the scenes

Let's take a look at how a JavaScript library like Puppeteer can be called from within a PHP package.

When save is called, the package builds up an array with all of the options that have been set.

public function createPdfCommand($targetPath): array
{
    $url = $this->html ? $this->createTemporaryHtmlFile() : $this->url;

    $command = $this->createCommand($url, 'pdf', ['path' => $targetPath]);

    if ($this->showBrowserHeaderAndFooter) {
        $command['options']['displayHeaderFooter'] = true;
    }

    if ($this->showBackground) {
        $command['options']['printBackground'] = true;
    }
    // ... further options omitted in this blog post

    return $command;
}
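To make the shape of that array a bit more tangible, here's a JavaScript mirror of what ends up being serialised. It assumes the command holds the url, the action and the Puppeteer options; the `{ url, action, options }` layout and the `buildCommand` function are illustrative guesses rather than Browsershot's exact structure.

```javascript
// Illustrative JavaScript stand-in for the PHP command array; the
// { url, action, options } layout is an assumption, not Browsershot's code.
function buildCommand(url, action, options = {}) {
  return { url, action, options: { ...options } };
}

const command = buildCommand('https://example.com', 'pdf', { path: 'example.pdf' });
command.options.displayHeaderFooter = true; // what showBrowserHeaderAndFooter adds
command.options.printBackground = true;     // what showBackground adds

// This JSON string is what travels to the Node process as a single argument.
const json = JSON.stringify(command);
console.log(json);
```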

The next interesting thing happens in the callBrowser function. Here the array we've built up (passed to the function as $command) is converted to JSON. We then start a process in which Node executes the bin/browser.js file that ships with the package, passing it that JSON as an argument. PATH and NODE_PATH are set first, so that node and the globally installed Puppeteer can be found.

protected function callBrowser(array $command)
{
   $setIncludePathCommand = "PATH={$this->includePath}";

   $setNodePathCommand = "NODE_PATH=`{$this->nodeBinary} {$this->npmBinary} root -g`";

   $binPath = __DIR__.'/../bin/browser.js';

   $fullCommand =
       $setIncludePathCommand.' '
       .$setNodePathCommand.' '
       .$this->nodeBinary.' '
       .escapeshellarg($binPath).' '
       .escapeshellarg(json_encode($command));

   $process = (new Process($fullCommand))->setTimeout($this->timeout);

   $process->run();

   if (! $process->isSuccessful()) {
       throw new ProcessFailedException($process);
   }

   return $process->getOutput();
}
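The same hand-off can be sketched in Node. This deliberately uses a different technique from the PHP above: instead of concatenating a shell command and escaping it with escapeshellarg, the JSON is passed as one element of an argument array, so no shell is involved and nothing needs escaping. The inline `childScript` is a hypothetical stand-in for bin/browser.js. Note that with `node -e` the first extra argument is `process.argv[1]`, whereas a script file such as browser.js reads it from `process.argv[2]`.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical stand-in for bin/browser.js. With `node -e`, the first extra
// argument is process.argv[1] (a script file would read process.argv[2]).
const childScript = `
  const request = JSON.parse(process.argv[1]);
  console.log(request.action + ' -> ' + request.options.path);
`;

// Run the child with an argument array: no shell is involved, so the JSON
// (quotes, spaces and all) arrives intact without any escaping.
function callChild(command, timeoutMs = 60000) {
  const result = spawnSync(process.execPath, ['-e', childScript, JSON.stringify(command)], {
    encoding: 'utf8',
    timeout: timeoutMs,
  });
  if (result.error) {
    throw result.error; // e.g. the process could not be started or timed out
  }
  if (result.status !== 0) {
    throw new Error(`Child process failed: ${result.stderr}`);
  }
  return result.stdout;
}

const output = callChild({ url: 'https://example.com', action: 'pdf', options: { path: "it's here.pdf" } });
console.log(output.trim()); // pdf -> it's here.pdf
```

Like the PHP version, a failing child surfaces as an exception on the calling side, carrying whatever the child wrote to stderr.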

Let's take a look at that browser.js file. I've added some inline comments to make things clearer.

const puppeteer = require('puppeteer');

// here we convert the JSON with all the conversion options to a JavaScript object
const request = JSON.parse(process.argv[2]);

const callChrome = async () => {
    let browser;
    let page;

    try {

        // let's launch headless chrome
        browser = await puppeteer.launch();

        // here we create a new page
        page = await browser.newPage();

        // build up requestOptions, omitted in this blog post
        // ...

        // and here we set the url of that page and pass all the requested options
        await page.goto(request.url, requestOptions);

        await browser.close();
    } catch (exception) {
        // write the error to stderr so the PHP side can report it
        console.error(exception);

        if (browser) {
            await browser.close();
        }

        process.exit(1);
    }
};

// do the magic!
callChrome();
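The code that actually produces the file is omitted above. As a hedged sketch of one way such a script could do it, the function below dispatches to the Puppeteer page method named in the request, assuming the request carries an `action` (`pdf` or `screenshot`) and an `options` object. `performAction` and the whitelist are hypothetical; taking the page as a parameter lets you try it with a stand-in object instead of a real browser.

```javascript
// Hypothetical whitelist: only these Puppeteer page methods may be requested.
const ALLOWED_ACTIONS = new Set(['pdf', 'screenshot']);

// Call page.pdf(options) or page.screenshot(options), depending on the request.
async function performAction(page, request) {
  if (!ALLOWED_ACTIONS.has(request.action)) {
    throw new Error(`Unsupported action: ${request.action}`);
  }
  return page[request.action](request.options);
}

// A stand-in page object, so the sketch runs without launching Chrome.
const fakePage = {
  pdf: async (options) => `pdf saved to ${options.path}`,
  screenshot: async (options) => `screenshot saved to ${options.path}`,
};

performAction(fakePage, { action: 'pdf', options: { path: 'example.pdf' } })
  .then((message) => console.log(message)); // pdf saved to example.pdf
```

In a real script you would pass the Puppeteer `page` right after `page.goto()` resolves. The whitelist keeps a malformed request from calling arbitrary page methods.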

In closing

Personally, I probably won't use wkhtmltopdf or wkhtmltoimage again in the foreseeable future. If you need to convert html to a pdf or an image, be sure to take Browsershot for a spin. In this post we've only touched on some of what the package can do; there are many more options. Head over to the readme of Browsershot on GitHub to learn more.

I'm pretty sure that headless Chrome, Puppeteer and Browsershot will gain more nice features in the future.

If you need to create PDFs for a printing office that must adhere to a specific PDF or color standard, you're probably better off with a tool that gives you more fine-grained control, like LaTeX or DocRaptor.

If you like Browsershot, take a look at the PHP and Laravel packages our team has previously released.
