Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CVE-2024-43785: gitoxide-core does not neutralize special characters for terminals #1534

Open
EliahKagan opened this issue Aug 22, 2024 · 15 comments
Labels
acknowledged an issue is accepted as shortcoming to be fixed help wanted Extra attention is needed

Comments

@EliahKagan
Copy link
Member

EliahKagan commented Aug 22, 2024

Current behavior 😯

This issue is for tracking the public vulnerability CVE-2024-43785 (GHSA-88g2-r9rw-g55h, RUSTSEC-2024-0364).

The gix and ein commands write pathnames and other metadata literally to terminals, even if they contain characters terminals treat specially, including ANSI escape sequences. This sometimes allows an untrusted repository to misrepresent its contents and to alter or concoct error messages.

Further details, including detailed instructions to reproduce the main effect described, are in the advisory, available in:

As noted, this vulnerability is low-risk. It was decided, following a coordinated disclosure, that informing users would be beneficial, even before a patch is available. Contributions are welcome.

Expected behavior 🤔

General expectations

When sufficiently capable of misleading the user or (though less of a threat) interfering with the operation of the terminal, special characters should be escaped in the output of gix and ein commands, except when raw output has been requested or is otherwise expected.

In addition, for paths, when displaying them in human-readable (as opposed to JSON) text in a terminal, there is little to no disadvantage to quoting them using an unambiguous scheme. I think this should happen in the situations where git always quotes paths, i.e., for paths that git quotes even if core.quotePath has been set to false.

This does not necessarily mean they need to be quoted in the same way that git quotes them, nor that core.quotePath itself needs to be implemented.

Ideas for quoting

I am hoping it may be feasible to use the type system not just to implement specific display behavior, but to distinguish between text that may need escaping and text that is known not to need it, so that there would be fewer places, including in code that may be added in the future, where forgetting to calling a sanitization function or construct an safe-displaying object, or where using the wrong format specifier, would result in unintentionally outputting text that may contain terminal escape sequences. I don’t know if that’s feasible or if it’s a good idea.

Text from repositories is often &BStr, including when it is a path, as is the case for example in gixoxide_core::repository::tree::format_entry():

https://github.com/Byron/gitoxide/blob/25a3f1b0b07c01dd44df254f46caa6f78a4d3014/gitoxide-core/src/repository/tree.rs#L181-L202

That is not the only place where paths sometimes need to be quoted.

Although the obvious way to express that something is a path is to represent it as a Path or PathBuf, I don't think that is the best approach here. If I understand correctly, at least on Windows there’s no safe guaranteed-to-succeed conversion from &[u8] or &BStr to &OsStr or Path, nor a safe guaranteed-to-succeed way to construct an OsString or PathBuf from arbitrary bytes. Furthermore, if we had a Path, it would still need custom printing to neutralize escape sequences.

One approach may be to do this, though this Rust code may be better understood as pseudocode in that a faster approach with fewer allocations should probably be preferred:

let plain = format!("{filename}");
let quoted = format!("{filename:?}");
if format!("\"{plain}\"") == quoted { plain } else { quoted }

That is, if quoting would keep it the same except the double quote marks around it, then show it verbatim not bothering with those quotation marks, and otherwise show it with the debug quoting provided by the BStr implementation.

This quoting is sufficient to neutralize terminal escape sequences because it turns the escape character, most commonly represented as \e, \033, or \x1B, into this literal sequence (which I think is at least as good as those representations):

\u{1b}

It seems to me that this may be beneficial even outside of the issue of escape characters. For example, if a path stored in a Git repository has pieces that aren’t valid UTF-8, then it would probably be better to show the escape sequences for those bytes as BStr has Debug do, rather than to show the Unicode substitution character as BStr has Display do.

This is a lossless encoding. It is reasonably easy to parse, in case anyone wants to do that, because which of the two forms was used for the output is discernible by whether it has a leading " character. In particular, even if the original had a " character, then the debug representation escapes it while also adding more quotes, and is therefore never equal to the original with quotes added, and thus would always be used rather than the original with quotes added.

Git behavior

Paths

git always quotes escape characters in paths when not told to do otherwise such as by -z, and will perform additional quoting if core.quotePath is true, which it is by default.

The following rehashes a fragment of the advisory to compare the behavior of git and gix, but it is not a substitute for the advisory.

I created a file whose name I specified in bash using the $' ' notation (a more portable and fully described approach is presented in the advisory) as:

$'\033]0;Boo!\007\033[2K\r\033[91mError: Repository is corrupted. Run \033[96mEVIL_COMMAND\033[91m to attempt recovery.\033[0m'

Running git ls-tree HEAD shows this, which is identical to what it really outputs:

100644 blob 4b94b710bc5f8670d781e8a8db1a8db9d73f036f    "\033]0;Boo!\a\033[2K\r\033[91mError: Repository is corrupted. Run \033[96mEVIL_COMMAND\033[91m to attempt recovery.\033[0m"

In contrast, running gix tree entries changes the terminal title (until it is rewritten, which some shells prompts may do), and it shows this, in bright red and bright cyan as described above, such that it wrongly appears to be the entire output of the command:

Error: Repository is corrupted. Run EVIL_COMMAND to attempt recovery.

Here's a screenshot showing the appearance of both git and gix commands:

Screenshot of a terminal showing how git presents the unusual characters symbolically while gix lets them through and allows them to misleadingly rewrite and color the output

Non-path data

git actually sometimes allows escape sequences from a repository through, such as in author and committer information shown in the output of git log, as well as in changed blob contents shown in the output of git diff. However, it seems to avoid it in situations where it could be seriously misleading, or where it would interfere with the operation of the terminal.

Characters that could be especially misleading are represented symbolically, such as the backspace and carriage return characters. Escape sequences that could be especially misleading are simply not let through, such as attempts to reposition the cursor in the terminal, except that those are let through when the output device is not a terminal. Colorization, when allowed to change, seems always to be restored, though I have not exhaustively verified that. Attempts to conceal or mimic leading +, -, or space characters in diffs do not seem to succeed.

I believe the Git behavior is not vulnerable, even outside of the treatment of paths.

Steps to reproduce 🕹

See the "PoC" (proof of concept) section in CVE-2024-43785 (GHSA-88g2-r9rw-g55h, RUSTSEC-2024-0364).

See also the "Git behavior" section above, which includes both a brief description of reproducing this, a screenshot that is not currently in the advisory, as well as showing that Git is not affected.

@Byron Byron added help wanted Extra attention is needed acknowledged an issue is accepted as shortcoming to be fixed labels Aug 22, 2024
@Byron
Copy link
Member

Byron commented Aug 22, 2024

Thanks a lot for setting up this issue!

@Byron You have suggested some good ideas, which I have not covered here, because I don't know if you prefer to add them. You can let me know if you want me to expand this with information about what you've said about that, which you can then edit if you like. Or you can add them either by editing this or commenting.

Initially I thought that one would want to prevent any terminal escape sequences in gitoxide-core output, but then I realized that it's exactly this that one might want in gix status one day. Hence, one probably won't get around escaping all paths when outputting them.

A utility to perform the 'Debug-print if needed' operation efficiently, i.e. without allocation by reusing a buffer, is probably the way to go. I'd keep a single buffer, write the Debug version into it (it implements Write after all), and then see if it now differs from the input BStr. If so, print that instead.

@Skgland
Copy link

Skgland commented Aug 22, 2024

A utility to perform the 'Debug-print if needed' operation efficiently, i.e. without allocation by reusing a buffer, is probably the way to go. I'd keep a single buffer, write the Debug version into it (it implements Write after all), and then see if it now differs from the input BStr. If so, print that instead.

I might be misunderstanding something, but why would the debug version be needed to be written to a (reused) buffer instead of just printing it directly?

  • If the debug version doesn't differ from the original it doesn't matter whether we print it or the original, so as we are spending to time anyway to fill the buffer we can just print it directly instead.
  • If it differs its printed either way.

@Byron
Copy link
Member

Byron commented Aug 22, 2024

The debug version always prints with surrounding "", and they are used to indicate that the debug version is actually needed to be 'display-safe'.

@Skgland
Copy link

Skgland commented Aug 22, 2024

Couldn't one still eliminate the need for the buffer by instead checking if the string is display-safe as is and based on that either normal or display-safe printing it?

Either way one would need to iterate over the string twice.
Once to check/display-safe-print and once to actually print the original/display-safe buffer/display-safe-print directly.

I would expect to debug-print to a buffer and then printing that buffer to be slower than checking whether the string is display-safe and then either normal or display-safe printing directly.

@EliahKagan EliahKagan changed the title gitoxide-core does not neutralize special characters for terminals CVE-2024-43785: gitoxide-core does not neutralize special characters for terminals Aug 22, 2024
@Byron
Copy link
Member

Byron commented Aug 22, 2024

I would expect to debug-print to a buffer and then printing that buffer to be slower than checking whether the string is display-safe and then either normal or display-safe printing directly.

I'd expect that too, but didn't think the complexity of attempting to predict Debug to be worth it. Git might be faster here if it scans first, and with git ls-entries it's already hard to keep up, if I remember correctly, but it's also not the most important thing to beat Git here yet.
Once it is, it's good to know there are low-hanging fruit once that function lights up in profile runs.

@EliahKagan

This comment was marked as resolved.

@Skgland
Copy link

Skgland commented Aug 22, 2024

I'd expect that too, but didn't think the complexity of attempting to predict Debug to be worth it.

I would expected display-save printing would need to be separate from debug printing anyways especially on windows as I would expect \ to be common in windows paths and be considered display-safe there, but would expect debug to escape it to \\ as such not being equal to the original and resulting in debug output even though it should be display-safe.

@EliahKagan
Copy link
Member Author

I think it depends where the paths are coming from. For the most part I think the paths are from repository metadata and shown with / separators.

But I'm not sure how they are shown in error messages related to the inability to check them out (which paths with most control characters, including any ANSI escape sequences, would produce when attempting to create them on Windows, at least under ordinary conditions, since these characters are generally prohibited in paths on Windows).

@EliahKagan

This comment was marked as resolved.

@EliahKagan

This comment was marked as resolved.

@rrrodzilla
Copy link

Would it make sense to enforce safe terminal display at the type system level by replacing direct BStr terminal output with something like:

pub struct SafeTermPath<'a>(&'a BStr);

impl<'a> std::fmt::Display for SafeTermPath<'a> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // Central safe display logic here
        // Escape/quote as needed based on content
    }
}

The proposed approach would enable us to

  • prevent direct BStr terminal display through the type system,
  • centralize all escaping logic in one implementation,
  • maintain zero-cost abstractions via static dispatch.

For performance-critical paths like ls-entries, we could

  • avoid allocations through buffer reuse,
  • evaluate escape needs lazily,
  • cache results for repeated access.

I'm new to the codebase, so please let me know if I'm missing any important considerations or if this approach aligns with the project's goals.

@Byron
Copy link
Member

Byron commented Nov 17, 2024

Thanks for chiming in. I also think that would be fastest, but would also require the parts that are 'vulnerable' to be retrofitted. There probably isn't too many of these, so that should be fine.

I also thought about having an std::io::Write wrapper that would take out terminal escape codes in everything that's written. That way it could never be forgotten, but also would be slowest and prevent any terminal escape codes ever to get through unless there was a way to switch it off depending on what's written.

So overall, I think the proposed method of using a way of writing to std::io::Write without terminal escape codes specifically would do the trick. I don't think that std::fmt::Display is the way to do it though as it requires UTF8, which paths are not.

@rrrodzilla
Copy link

Thanks for the feedback. The performance concerns with the Write wrapper got me thinking about a different approach. Psuedocode follows:

pub trait TerminalSafe: AsRef<[u8]> {
    fn needs_escaping(&self) -> bool;
}

pub struct TerminalBytes<B: AsRef<[u8]>> {
    inner: B,
    needs_escaping: bool,
}

impl<B: AsRef<[u8]>> TerminalBytes<B> {
    pub fn new(bytes: B) -> Self {
        Self {
            needs_escaping: contains_terminal_escapes(bytes.as_ref()),
            inner: bytes,
        }
    }
}

Instead of checking every write operation (which would be slowest), this caches the safety check at creation time. When writing:

fn print_status(writer: &mut impl Write, item: impl AsRef<[u8]> + TerminalSafe) -> io::Result<()> {
    if item.needs_escaping() {
        write_escaped(writer, item.as_ref())
    } else {
        // Fast path for known-safe content
        writer.write_all(item.as_ref())
    }
}

This means

  • safe content takes the fast path with no escaping overhead,
  • we only pay for the safety check once per item rather than on every write,
  • no UTF-8 requirements since we work directly with bytes.

What are your thoughts on an approach like this instead of wrapping every Write operation?

@Byron
Copy link
Member

Byron commented Nov 18, 2024

Thanks for sharing!

In practice, each item, like a path in the index, is only written once, while it also won't have anything to escape. So I'd probably dumb in down and just write it escaped each time to avoid any extra complexity.

Only gix index entries has ever been optimized for write performance, and that would probably be a good command to try the escaping out on. Most of the content will have to go through the escaping routine, will that slow anything down significantly? If so, does Git anything smart there? My assumption is that as long as the escaped writing doesn't allocate, it won't be noticeable while writing, IO is the slow part. And even if the escaping ends up slowing things down a bit, I'd also be inclined to say that gix (CLI) doesn't have to compete with git in that regard, gix as a development tool, and when I optimized that I just wanted to have some fun.

@EliahKagan

This comment was marked as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants