
Do something if the index might give away the game #5

Open
mlissner opened this issue Dec 17, 2020 · 3 comments


@mlissner
Member

Slate has a great article about how they used a document's index to figure out a bunch of redactions. Basically, a word would be redacted in some places but not others, so you could look up the word in the index and figure it out:

https://slate.com/news-and-politics/2020/10/ghislaine-maxwell-deposition-redactions-epstein-how-to-crack.html

I suppose actually cracking the redactions is beyond what computers can do, BUT it'd be nice if we could at least highlight when a document has an index that could be used for this purpose.

@ZavierHenry

This one is a bit tricky, but at the very least something like this should be possible:

if index exists:
    for each redacted word w in index:
        x = last alphabetical unredacted word before w
        y = first alphabetical unredacted word after w
        for each location (page, line) of w:
            if there are no redactions in line:
                flag every word in line between x and y for possible improper redaction
            else:
                ws = words in line between x and y
                for each candidate c in ws:
                    flag c if location of w has redactions and c is not in line

Let's use finding "Clinton" in the Maxwell deposition as an example of how this would work. Going through the redacted entries in the index, we would check the one between "clients" and "clock". Then, going through the locations of that redacted word, we would see that page 135, line 7 has no redactions, so we flag every word on that line that falls alphabetically between "clients" and "clock", which in this case is only "Clinton".
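A minimal Python sketch of the first branch of the pseudocode (lines with no redactions), run on a toy version of the Clinton example. It assumes the index and transcript have already been extracted into hypothetical `IndexEntry` and `TranscriptLine` records, that index entries are in alphabetical order, and that words are lowercased. The sample line text is made up for illustration and is not the actual deposition text; the `else` branch is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical representation: a location is (page, line).
Location = tuple[int, int]


@dataclass
class IndexEntry:
    word: Optional[str]        # None when the index term itself is blacked out
    locations: list[Location]  # every (page, line) the index lists for this term


@dataclass
class TranscriptLine:
    words: list[str]           # visible (unredacted) words, lowercased
    has_redaction: bool        # True if any part of the line is blacked out


def neighbors(index: list[IndexEntry], i: int) -> tuple[Optional[str], Optional[str]]:
    """Return the nearest unredacted index words before and after entry i (x and y)."""
    before = next((e.word for e in reversed(index[:i]) if e.word is not None), None)
    after = next((e.word for e in index[i + 1:] if e.word is not None), None)
    return before, after


def flag_candidates(
    index: list[IndexEntry], transcript: dict[Location, TranscriptLine]
) -> list[tuple[Location, str]]:
    """Flag visible words that may be an improperly redacted index term.

    If a line listed for a redacted index term has no redactions, the term
    must be printed in the clear on that line, alphabetically between x and y.
    """
    flagged = []
    for i, entry in enumerate(index):
        if entry.word is not None:
            continue  # only redacted index entries are interesting
        x, y = neighbors(index, i)
        for loc in entry.locations:
            line = transcript.get(loc)
            if line is None or line.has_redaction:
                continue  # unknown line, or the else branch (not sketched here)
            for w in line.words:
                if (x is None or x < w) and (y is None or w < y):
                    flagged.append((loc, w))
    return flagged


# Toy data shaped like the Clinton example (line text is invented).
index = [
    IndexEntry("clients", [(10, 2)]),
    IndexEntry(None, [(135, 7)]),
    IndexEntry("clock", [(40, 5)]),
]
transcript = {
    (135, 7): TranscriptLine(["i", "met", "clinton", "once"], has_redaction=False),
}
print(flag_candidates(index, transcript))  # [((135, 7), 'clinton')]
```

Plain string comparison stands in for "alphabetical" here; a real index may need case folding or punctuation stripping to match its own sort order.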

While this won't catch every case where an index can be used to figure out redactions, it should still be helpful.

@ZavierHenry

While thinking about my previous comment, I realized it could be simpler to just start with a sanity check: verify that every location listed in the index for a redacted word has at least one redaction.
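The sanity check can be sketched in a few lines of Python, assuming a hypothetical `IndexEntry` record (with `word=None` for a blacked-out term) and a set of (page, line) locations already known to contain at least one redaction. Any location it returns is a place where the redacted term is probably printed in the clear.

```python
from dataclasses import dataclass
from typing import Optional

Location = tuple[int, int]  # hypothetical (page, line) pair


@dataclass
class IndexEntry:
    word: Optional[str]        # None when the index term itself is blacked out
    locations: list[Location]


def unredacted_locations(
    index: list[IndexEntry], redacted_lines: set[Location]
) -> list[Location]:
    """Return locations of redacted index terms whose line has no redaction."""
    return [
        loc
        for entry in index
        if entry.word is None
        for loc in entry.locations
        if loc not in redacted_lines
    ]


# Toy data: the redacted term is listed on two lines, only one of which is redacted.
index = [
    IndexEntry("clients", [(10, 2)]),
    IndexEntry(None, [(135, 7), (200, 3)]),
]
redacted = {(200, 3)}
print(unredacted_locations(index, redacted))  # [(135, 7)]
```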

@mlissner
Member Author

Pretty neat. There is a lot of hand-waving over the hard parts (like figuring out which line is line 7, say), but the algorithm sure seems about right.

2 participants