How to generate PDFs using Puppeteer and Headless Chrome

Goal

To use Puppeteer and headless Chrome to create an ExpressJS application that generates PDFs of web sites on Platform.sh.

Assumptions

You will need:

Problems

Using headless Chrome to generate PDFs of a website requires properly connecting the chrome-headless service container to the Node library Puppeteer by passing its credentials using the Node.js Config Reader libary.

Steps

The project will ultimately have the following structure:

.
├── .platform
│   ├── routes.yaml
│   └── services.yaml
├── .platform.app.yaml
├── index.js
├── package.json
├── package-lock.json
└── pdfs.js

1. Initialize the project

Create an empty project on Platform.sh using the CLI.

$ platform create

Create a new project directory for the application on your local machine called pdfs and cd into it. Initialize the directory as a Git repository and set its remote to the newly created Platform.sh project using the outputted project ID.

$ git init
$ platform project:set-remote <project ID>

2. Create the Platform.sh configuration files

  1. .platform/services.yaml

    Define the chrome-headless container using the supported version outlined in the Headless Chrome documentation.

    headless:
        type: chrome-headless:73
    
  2. .platform.app.yaml

    Configure the application nodejs:

    name: nodejs
    
    type: nodejs:10
    
    relationships:
      headless: "headless:http"
    
    crons:
      cleanup:
        spec: '*/30 * * * *'
        cmd: rm pdfs/*
    
    web:
      commands:
        start: "nodejs index.js"
    
    mounts:
      "/pdfs": "shared:files/pdfs"
    
    disk: 512
    

    The configuration uses nodejs 10, since it is required to use the Config Reader library with Puppeteer. It defines the mount pdfs that will act as a writable directory to save the PDFs the application generates.

    In order to prevent pdfs/ from filling up as people use it, a cron job is also defined that removes its contents every 30 minutes.

  3. .platform/routes.yaml

    Lastly, set up a basic routes configuration file, using the name of the application nodejs

    "https://{default}/":
        id: main
        type: upstream
        upstream: "nodejs:http"
    
    "https://www.{default}/":
        type: redirect
        to: "https://{default}/"
    

3. Write the pdfs.js file

Create a file in the project directory called pdfs.js with the following contents:

const puppeteer = require('puppeteer');
const platformsh = require('platformsh-config');

var exports = module.exports = {};

// Create an async function
exports.makePDF = async function (url, pdfID) {

    try {
        // Connect to chrome-headless using pre-formatted puppeteer credentials
        let config = platformsh.config();
        const formattedURL = config.formattedCredentials("headless", "puppeteer");
        const browser = await puppeteer.connect({browserURL: formattedURL});

        // Open a new page to the given url and create the PDF
        const page = await browser.newPage();
        await page.goto(url, {waitUntil: 'networkidle2'});
        await page.pdf({
            path: `pdfs/${pdfID}.pdf`,
            printBackground: true
        });
        await browser.close();

        return browser

    } catch (e) {

        return Promise.reject(e);

    }

};

It defines an async function called makePDF as a module export. The Node.js Config Reader retrieves the library’s formatted credentials for Puppeteer to create the formattedURL string.

path defines the saved location of the PDF, while printBackground allows background images on the page to be included in the generated PDF. Additional parameters for page.pdf() can be found in the Puppeteer documentation.

4. Define index.js

Create the file index.js that defines the ExpressJS application app:

const fs = require('fs');
const uuidv4 = require('uuid/v4')
const platformsh = require('platformsh-config');
const express = require('express');

// Require pdf file and its function
var pdfs = require("./pdfs.js");

// Build the application
var app = express();

// Define the index route
app.get('/', (req, res) => {
  res.writeHead(200, {"Content-Type": "text/html"});
  res.write(`<html>
<head>
    <title>Headless Chrome on Platform.sh</title>
</head>
<body>

<h1>Headless Chrome on Platform.sh</h1>

<h2>Generate a PDF of a page</h2>

Click submit to generate a PDF of the <a href="https://platform.sh/">Platform.sh website</a>, or paste in another URL.

</br></br>

<form method="get" action="/result">
    <input type="text" name="pdfURL" value="https://platform.sh/">
    <input type="submit">
</form>

`);
    res.end(`</body></html>`);
})

// Define PDF result route
app.get('/result', async function(req, res){
  // Create a randomly generated ID number for the current PDF
  var pdfID = uuidv4();
  // Generate the PDF
  await pdfs.makePDF(req.query['pdfURL'], pdfID)
  // Define and download the file
  const file = `pdfs/${pdfID}.pdf`;
  res.download(file);
});

// Create config object to get Platform.sh PORT credentials
let config = platformsh.config();

// Start the server.
app.listen(config.port, function() {
    console.log(`Listening on port ${config.port}`)
});

In addition to the home route, index.js defines a /results path that calls makePDF() and passes a randomly generated ID that will become part of the name for the generated PDF file.

5. Define the application’s dependencies

Include the application’s dependencies in package.json:

{
  "name": "chrome_headless",
  "version": "1.0.0",
  "description": "A simple example for taking screenshots with Puppeteer and headless Chrome on Platform.sh",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Chad Carlson",
  "license": "MIT",
  "dependencies": {
    "platformsh-config": "^2.0.0",
    "puppeteer": "^1.14.0",
    "express": "^4.16.4",
    "uuid": "^3.3.2"
  }
}

Then create the package-lock.json file by running

$ npm install

6. Push to Platform.sh

Commit the changes and push master to Platform.sh

$ git add .
$ git commit -m "Create PDF generator application."
$ git push platform master

7. Verify

Use the command platform url when the build process has completed to visit the site. Click submit to generate a PDF of the Platform.sh website, or copy in another url to test the application.

Conclusion

Using ExpressJS, Puppeteer, and Platform.sh headless Chrome as a service, a simple application can be made that generates PDFs of an inputted url.