Removing password protection from PDF files

Important note: this post won't help you if you have a PDF file and don't know the password. This is for removing passwords on PDFs that you have legal access to, but don't want to be password-protected any more.

A while ago, one of my employers started emailing payslips in PDF format. Now, I know there are many issues around accessibility with PDFs, but it works for me – I get a digital version of a document that looks exactly as the printed one would have. Except that someone decided email (even to a company-secured account) was not secure enough, and they password-protected the files. In theory, this stops another employee from opening my payslip. In practice, the password was a known piece of personally identifiable information (PII).

Anyway, I wanted to keep a copy of the files on my own file storage. I can do this because, technically, they are not company data and they are (or at least should be) private to me. Indeed, the company in question has since moved to a system that emails a link to a personal email account, inviting the employee to download their payslip from a portal.

I didn't want the copies of the payslips that I held to be password-protected. That meant I needed to remove those passwords.

QPDF

QPDF is a computer program, and associated library, for structural, content-preserving transformations on PDF files. It’s not for creating, viewing or converting PDF files.

One of the things it can do is remove the password protection on a file. Remember, this is a file that I have legal access to, so removing the password protection is not a crime. I'm not hacking the file – in fact I need to know the password in order to remove it.

QPDF can do much more than remove passwords (for example I think I could use it to create new versions of a PDF file with just a subset of the pages), but this was what I needed to do.

A little side-note

This was the second time I performed this exercise. I first did it a few years ago, but only on the payslips I’d received up until that date. Later ones were still password-protected. I didn’t document my method the first time around though… so I had to work it all out again. This time I decided to write it down…

A little PowerShell Script

It looks like the first time I ran this, I downloaded a Windows executable version of QPDF and either wrote, or more likely found, a PowerShell script to adapt. The script is called payslips.ps1 and looks like this:

$children = Get-ChildItem # Save files in a variable. Piping the rest of the script from Get-ChildItem in a single line was a bad idea
$children | ForEach-Object {
    Write-Debug "Working on $($_.Name)" # Hidden unless $DebugPreference is changed from its default
    $fileName = [System.IO.Path]::GetFileNameWithoutExtension($_.Name) # Strip the extension; we will append "_tmp"
    $ext = [System.IO.Path]::GetExtension($_.Name)
    $tempFile = $fileName + "_tmp" + $ext # Append "_tmp", keeping the original extension
    Move-Item -LiteralPath $_.Name -Destination $tempFile # Move the file to a temporary location
    ..\qpdf.exe --password=AB123456C --decrypt $tempFile $_.Name # Use qpdf to decrypt it, save in original location
    #Remove-Item -LiteralPath $tempFile # Remove temporary file (uncomment to tidy up)
}

AB123456C should be replaced with the actual password. Actually, it shouldn't, because including credentials in code is sloppy security practice. There are better ways to pass the password, but I'm just converting 50 files as a one-off exercise, not building a repeatable business process. If you go on to use this in a business environment, please don't do it this way!
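As a rough sketch of one of those better ways (an assumption on my part, not what I actually ran): read the password from an environment variable and fall back to a prompt that doesn't echo what you type. The function name Get-PdfPassword and the variable name PAYSLIP_PDF_PASSWORD are hypothetical. qpdf still receives the password on its command line, so this keeps it out of the script file rather than hiding it from the machine's process list.

```powershell
# Hypothetical helper: fetch the PDF password without hard-coding it in the script.
# PAYSLIP_PDF_PASSWORD is an illustrative variable name; use whatever suits you.
function Get-PdfPassword {
    [CmdletBinding()]
    param(
        [string] $VariableName = 'PAYSLIP_PDF_PASSWORD'
    )

    # Prefer an environment variable set in the current session...
    $value = [Environment]::GetEnvironmentVariable($VariableName)
    if ([string]::IsNullOrEmpty($value)) {
        # ...otherwise prompt, masking the input.
        $secure = Read-Host -Prompt 'PDF password' -AsSecureString
        # qpdf needs plain text, so convert the SecureString back.
        $value = [System.Net.NetworkCredential]::new('', $secure).Password
    }
    return $value
}
```

You would call it once before the loop, for example `$password = Get-PdfPassword`, and then pass `"--password=$password"` to qpdf.exe in place of the hard-coded value.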

Release notes

The script makes a temporary copy of each file, suffixed with _tmp but preserving the file extension.

If you run the script against the current folder, it will run against all files, not just PDFs. That means it will rename itself and all the QPDF files with _tmp. This will cause it to fail.

It looks like, when I ran this a few years ago, I used a files.txt file to control this behaviour. files.txt was just a list of filenames and is easily generated with the following Command Prompt (cmd.exe) command, which won't work as-is in PowerShell:

dir /b /a-d > files.txt

But, this time, I couldn’t see how to provide that as a parameter to QPDF, so I had to:

  1. Place all the files to be converted in a subfolder of the folder containing QPDF and my PowerShell script.
  2. Edit the payslips.ps1 script to refer to ..\qpdf.exe (i.e. qpdf.exe in the folder above the current one).
  3. Change directory into the subfolder.
  4. Run payslips.ps1 from the subfolder – i.e.:
..\payslips.ps1

This means it will only run against the files in the subfolder, and not against QPDF, the script, or anything else.
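An alternative that avoids the subfolder shuffle is to have Get-ChildItem return only PDFs. The sketch below is a hypothetical Remove-PdfPassword function, not the script I ran. It also swaps the temporary copy for qpdf's --replace-input switch (the same switch used in the FOR loop further down this page), so it assumes a qpdf version that supports it.

```powershell
# Hypothetical helper: decrypt every PDF in one folder, leaving other files alone.
# Assumes a qpdf build that supports --replace-input.
function Remove-PdfPassword {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Path,      # folder holding the PDFs
        [Parameter(Mandatory)] [string] $Password,  # the PDF password
        [string] $Qpdf = 'qpdf'                     # e.g. '.\qpdf.exe' if qpdf isn't on PATH
    )

    # Collect the list first rather than piping straight from Get-ChildItem
    # while files are being rewritten. -Filter *.pdf skips the script and
    # QPDF's own files; -File skips subfolders.
    $pdfs = @(Get-ChildItem -Path $Path -Filter *.pdf -File)

    foreach ($pdf in $pdfs) {
        Write-Verbose "Working on $($pdf.Name)"
        # --replace-input overwrites the input file, so no temporary copy is needed.
        & $Qpdf "--password=$Password" --decrypt $pdf.FullName --replace-input
        if ($LASTEXITCODE) {
            Write-Warning "qpdf exited with code $LASTEXITCODE for $($pdf.Name)"
        }
        $pdf.Name  # report each file that was processed
    }
}
```

For example, from the folder holding qpdf.exe: `Remove-PdfPassword -Path .\payslips -Password (Read-Host -Prompt 'PDF password') -Qpdf .\qpdf.exe -Verbose`, where `.\payslips` stands in for wherever your PDFs are.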

It doesn't remove the temporary files: the Remove-Item line at the end of the script is commented out. That didn't matter to me, as by then it had already created what I needed.


Bulk removing passwords from PDF documents

This content is 4 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

My payslip and related documents are sent to me in PDF format. To provide some rudimentary protection from interception, they are password protected, though the password is easily obtained by anyone who knows what the system is.

Because these are important documents, I store a copy in my personal filing system, but I don't want to have to enter the password each time I open a file. I know I can open each file individually and then resave without a password (Preview on the Mac should do this) but I wanted a way to do it in bulk, for tens of files, without access to Adobe Acrobat Pro.

Twitter came to my aid with various suggestions including Automator on the Mac. In the end, the approach I used employed an open source tool called QPDF, recommended to me by Scott Cross (@ScottCross79). Scott also signposted a Stack Overflow post with a PowerShell script to run against a set of files, but it didn't work (leading to a rant about how Stack Overflow's arcane rules and culture prevented me from making a single-character edit) and turned out to be over-engineered. It did get me thinking though…

Those of us old enough to remember writing MS-DOS batch files will probably remember the good old FOR loop and its %-prefixed loop variables. Using one, I got this:

FOR %G IN (*.pdf) DO qpdf --decrypt --password=mypassword "%G" --replace-input

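Obviously, replace mypassword with something more appropriate. The --replace-input switch avoids the need to specify output filenames, and the FOR command simply cycles through every PDF in the current folder, running qpdf on each one to remove the encryption. That one-liner is for typing at the Command Prompt; inside a batch file, the loop variable needs doubled percent signs (%%G).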
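To check the result afterwards, qpdf's --is-encrypted option can be run against each file. The sketch below is written in PowerShell rather than as a batch file, and it assumes the exit codes as I understand qpdf's documentation (0 if the file is encrypted, 2 if it isn't), so check the manual for your version. Get-EncryptedPdf is a hypothetical name.

```powershell
# Hypothetical helper: list PDFs in a folder that qpdf still reports as encrypted.
# Assumes qpdf's --is-encrypted exit codes: 0 = encrypted, 2 = not encrypted.
function Get-EncryptedPdf {
    [CmdletBinding()]
    param(
        [string] $Path = '.',    # folder to check
        [string] $Qpdf = 'qpdf'  # e.g. '.\qpdf.exe' if qpdf isn't on PATH
    )

    foreach ($pdf in @(Get-ChildItem -Path $Path -Filter *.pdf -File)) {
        # --is-encrypted prints nothing; the answer is in the exit code.
        & $Qpdf --is-encrypted $pdf.FullName
        if ($LASTEXITCODE -eq 0) {
            $pdf.Name  # still encrypted
        }
    }
}
```

Run it in the folder you have just processed; if it returns nothing, none of the PDFs there are still encrypted.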