Verifying PDF file data in Playwright

2 min read · Nov 9, 2024

This post shows how to validate the content of a PDF file in a Playwright test using a Node.js library.

Scenario

Consider a use case where you need to automate an action that clicks a download button. This action downloads a PDF file, which you then need to verify based on the following criteria:

  1. The entire text content, or a specific substring, of the PDF file.
  2. The number of pages in the file.
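
The two criteria can be written down as a small check that is independent of how the PDF gets parsed. This is only a sketch: `checkPdf` is a hypothetical helper name, and it assumes the parsed result is an object with a `text` string and a `numpages` count (the field names used in the solution below).

```javascript
// Hypothetical helper (not part of Playwright or pdf-parse).
// Returns a list of failure messages; an empty list means both criteria hold.
function checkPdf(parsed, { expectedText, expectedPages }) {
  const failures = [];
  // Criterion 1: the text contains the expected substring.
  if (expectedText !== undefined && !parsed.text.includes(expectedText)) {
    failures.push(`text does not contain: ${JSON.stringify(expectedText)}`);
  }
  // Criterion 2: the page count matches.
  if (expectedPages !== undefined && parsed.numpages !== expectedPages) {
    failures.push(`expected ${expectedPages} page(s), got ${parsed.numpages}`);
  }
  return failures;
}
```

Returning messages instead of throwing keeps the helper usable outside a test runner; inside a Playwright test, the result could be asserted with `expect(failures).toEqual([])`.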

Solution

We can do this with a Node.js library called pdf-parse. Let’s see how to implement it with Playwright in JavaScript.

import fs from "fs";
import pdfjs from "pdf-parse";
import { test, expect } from "@playwright/test";

test("pdf verification", async ({ page }) => {
  const filePath = "data/downloads/invoice.pdf";

  // Start waiting for the download before clicking.
  const downloadPromise = page.waitForEvent("download");
  await page.getByTestId("download-button").click();
  const download = await downloadPromise;
  // Save the downloaded file to a known path.
  await download.saveAs(filePath);

  // Read the saved PDF into a buffer and parse it.
  const dataBuffer = fs.readFileSync(filePath);
  const data = await pdfjs(dataBuffer);

  console.log(data.text); // PDF text
  console.log(data.info); // PDF info
  console.log(data.metadata); // PDF metadata
  console.log(data.numpages); // number of pages

  // Verify a specific substring of the text and the page count.
  expect(data.text).toContain(
    "Test Business\n123 Somewhere St\nMelbourne, VIC 3000"
  );
  expect(data.numpages).toEqual(1);
});

This is a snapshot of the PDF file content:

Sample PDF File Content
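
Text extracted from a PDF does not always have the spacing and line breaks you see on the page, so an exact multi-line match can be brittle. One possible workaround is to collapse whitespace before comparing. This is a sketch only: `normalizeWhitespace` is a hypothetical helper, and the `extracted` string is made-up sample input, not real output from the library.

```javascript
// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// newlines) into a single space and trim both ends.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Made-up example of extracted text with irregular spacing.
const extracted = "\n\nTest Business \n123 Somewhere St\nMelbourne,  VIC 3000\n";
const expected = "Test Business 123 Somewhere St Melbourne, VIC 3000";

console.log(normalizeWhitespace(extracted).includes(expected)); // true
```

The trade-off is that a normalized comparison no longer checks line breaks, so it fits best when only the wording matters.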

Code breakdown

  1. Create a download promise with waitForEvent('download') before clicking the download button, so the event is not missed.
  2. Await the download promise once the click has triggered the download, then call saveAs() to save the file to the desired path.
  3. Use Node's fs module to read the saved PDF file into a buffer.
  4. Pass the buffer to pdf-parse (imported here as pdfjs) to get properties of the PDF such as its text content, info, metadata, and page count.
  5. Assert the extracted properties against the expected values using Playwright’s expect.
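
If you would rather not write the file to disk, steps 2 and 3 could be replaced by reading the download as a stream. The sketch below assumes Playwright's `download.createReadStream()`, which resolves to a Node.js readable stream; `streamToBuffer` is a hypothetical helper name, and this variant is an alternative to the approach in this post, not part of it.

```javascript
// Hypothetical helper: collect any async-iterable stream of chunks
// (such as a Node.js Readable) into a single Buffer.
async function streamToBuffer(readable) {
  const chunks = [];
  for await (const chunk of readable) {
    // Chunks may arrive as Buffers or strings; store them all as Buffers.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Possible use inside the test (shown as comments, not run here):
// const stream = await download.createReadStream();
// const dataBuffer = await streamToBuffer(stream);
// const data = await pdfjs(dataBuffer);
```

Saving the file with saveAs() is still handy when you want to keep the PDF as a test artifact for debugging.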

For more information about pdf-parse, check out its documentation.

Shiv Jirwankar
SDET

Software Development Engineer in Test | An ambivert, optimistic, and karma believer | https://www.linkedin.com/in/shiv-jirwankar-45246577