Verifying PDF file data in Playwright
This post covers how to validate the content of a downloaded PDF file in a Playwright test using the Node.js library pdf-parse.
Scenario
Consider a use case where you need to automate clicking a download button. The click downloads a PDF file, which you then need to verify against the following criteria:
- The entire text content, or a specific substring, of the PDF file.
- The number of pages in the file.
Solution
Both checks can be done with a library called ‘pdf-parse’. Here is an implementation using Playwright with JavaScript.
import fs from "fs";
import pdfjs from "pdf-parse";
import { test, expect } from "playwright/test";

test("pdf verification", async ({ page }) => {
  const filePath = "data/downloads/invoice.pdf";

  // Start waiting for the download before clicking.
  const downloadPromise = page.waitForEvent("download");
  await page.getByTestId("download-button").click();
  const download = await downloadPromise;

  // Save the downloaded file to a known path, then read it into a buffer.
  await download.saveAs(filePath);
  const dataBuffer = fs.readFileSync(filePath);

  // Parse the PDF buffer.
  const data = await pdfjs(dataBuffer);

  // PDF text
  console.log(data.text);
  // PDF info
  console.log(data.info);
  // PDF metadata
  console.log(data.metadata);
  // Number of pages
  console.log(data.numpages);

  // The template literal is left unindented so that its line breaks
  // match the extracted text exactly.
  expect(data.text).toContain(`Test Business
123 Somewhere St
Melbourne, VIC 3000`);
  expect(data.numpages).toEqual(1);
});
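The multi-line toContain assertion above depends on the extracted text having exactly the same line breaks and spacing as the expected string, which may vary with the PDF's layout. One way to make the check less brittle is to collapse whitespace on both sides before comparing. The sketch below assumes that approach; normalizeWhitespace and containsNormalized are hypothetical helper names, not part of pdf-parse or Playwright, and sampleText is a stand-in for data.text.

```javascript
// Hypothetical helper (not part of pdf-parse or Playwright): collapse every
// run of whitespace, including line breaks, into a single space.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Returns true when `expected` occurs in `actual` after whitespace
// has been normalized in both strings.
function containsNormalized(actual, expected) {
  return normalizeWhitespace(actual).includes(normalizeWhitespace(expected));
}

// Stand-in for data.text; in a real test this value comes from pdf-parse.
const sampleText = "Test Business\n123 Somewhere St\nMelbourne,  VIC 3000\n";

console.log(
  containsNormalized(sampleText, "Test Business 123 Somewhere St Melbourne, VIC 3000")
); // true
```

Inside the test, this could be used as expect(containsNormalized(data.text, "Test Business 123 Somewhere St Melbourne, VIC 3000")).toBe(true). The trade-off is that the assertion no longer verifies where the line breaks fall.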
This is a snapshot of the PDF file content:
Code breakdown
- Create a download promise with waitForEvent('download') before clicking the download button, so that the download event is not missed.
- Await the download promise once the click has triggered the download, then use saveAs() to save the downloaded file to the desired path.
- Use the fs module to read the saved PDF file into a buffer.
- Pass the buffer to pdfjs (the function imported from pdf-parse) to get the properties of the PDF file, such as text content, info, metadata, and page count.
- Once the required PDF properties are obtained, assert them against the expected values using Playwright’s expect.
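When several properties need checking, the final assertion step can be gathered into one helper that reports every mismatch at once. This is a minimal sketch under stated assumptions: findPdfMismatches and the shape of its expected argument are hypothetical, and sampleData is a stand-in object with the same text and numpages fields that the test above reads from the parsed result.

```javascript
// Hypothetical helper: compare a parsed-PDF result ({ text, numpages })
// against expectations and return a list of human-readable mismatches.
function findPdfMismatches(data, expected) {
  const mismatches = [];

  // Check the page count only when an expected value was provided.
  if (expected.numpages !== undefined && data.numpages !== expected.numpages) {
    mismatches.push(`expected ${expected.numpages} page(s), got ${data.numpages}`);
  }

  // Check that every expected substring appears in the extracted text.
  for (const snippet of expected.substrings ?? []) {
    if (!data.text.includes(snippet)) {
      mismatches.push(`missing text: "${snippet}"`);
    }
  }

  return mismatches;
}

// Stand-in for the parsed result; in a real test this comes from pdf-parse.
const sampleData = {
  text: "Test Business\n123 Somewhere St\nMelbourne, VIC 3000",
  numpages: 1,
};

console.log(
  findPdfMismatches(sampleData, {
    numpages: 1,
    substrings: ["Test Business", "Melbourne, VIC 3000"],
  })
); // []
```

In a Playwright test, the result could be asserted with expect(findPdfMismatches(data, { numpages: 1, substrings: ["Test Business"] })).toEqual([]), so that a failure lists everything that did not match.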
For more information about pdf-parse, check out its documentation.
Shiv Jirwankar
SDET