Add Reference Citation Extractor #191

Merged · 16 commits · Jan 24, 2025

Conversation

flooie (Contributor) commented Jan 10, 2025:

Add ReferenceCitation to find citations like Foo at 123. This requires a full citation to be present earlier in the text, something like Foo v. Bar, 1 U.S. 1.

Also fixes the extraction of defendant/plaintiff names when parallel citations exist.
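A minimal usage sketch, assuming the new ReferenceCitation model is exported from eyecite.models once this PR lands (the example text and printed output are illustrative, not taken from the PR):

```python
from eyecite import get_citations
from eyecite.models import ReferenceCitation  # model added in this PR

text = (
    "In Foo v. Bar, 1 U.S. 1 (1800), the court held that ... "
    "As Foo at 3 makes clear, the rule controls here."
)

# get_citations should now also return ReferenceCitation objects for
# "Foo at 123"-style references that follow a full citation.
citations = get_citations(text)
for cite in citations:
    if isinstance(cite, ReferenceCitation):
        print(cite.matched_text())  # e.g. "Foo at 3"
```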

quevon24 (Member) left a comment:

Everything looks good and is very well structured. I like that you added several test cases, not just the ones that should work.

I only found one typo in a comment and have some suggestions for docstrings; these are very small details. It can be merged without any problems.

Review threads (resolved): tests/test_AnnotateTest.py, eyecite/helpers.py, eyecite/resolve.py
flooie requested a review from mlissner January 14, 2025 19:52
flooie assigned mlissner and unassigned quevon24 Jan 14, 2025
flooie (Contributor, Author) commented Jan 14, 2025:

I don't think our testing files are as large as we state in the packaging. I downloaded the ten percent sample and ran it locally; it appears to contain only 7,600 rows of opinions, a far cry from the "ten percent" moniker given the 10 million opinion objects in the database. On the flip side, that extrapolates to roughly 126,000 reference citations that could be added to the citation database.

Also, the auto-generated markdown here appears to reverse the gains and losses columns. I'm not sure why; locally it did not do that and created the markdown correctly, identifying the gains as gains. Above, it shows these classified as losses, but you can see from the output that this isn't the case. I'll add some notes to the Eyecite report issue to clarify this.

On a final note, the Eyecite report did catch a regex bug that was causing a number of essentially empty citations to be found. I fixed the bug and added several additional tests to ensure this is properly handled moving forward.

mlissner (Member) commented:
Nice to see the eyecite report finding bugs; weird that it's backwards, but I guess it must have always been that way.

I don't know why the 10 percent file is the wrong size, but I probably made it using a random-sample method that doesn't guarantee a particular count (and probably I made an error setting the percentage?). It seems to work OK, though.

> 7600 rows of opinions [...] that extrapolates to 126,000 reference citations

That comes out to 126,000 ÷ 7,600 ≈ 16.6 additional citations per case. Neat.

flooie (Contributor, Author) commented Jan 15, 2025:

> that comes out to 126,000 ÷ 7,600 ≈ 16.6 additional citations per case. Neat.

I think our wires are crossed here. This found 91 reference citations in the 7,600-opinion sample file (excluding references to cases, which I suspect are much more common).

So unless my math is wrong:

(10,549,603 opinions / 7,600) × 91 ≈ 126,317 reference citations
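A quick sanity check of that extrapolation, using only the numbers quoted in this thread:

```python
# Sanity check of the extrapolation above; all numbers come from the comments.
opinions_in_db = 10_549_603   # opinion objects in the database
sample_rows = 7_600           # rows in the "ten percent" sample file
refs_in_sample = 91           # reference citations found in the sample

estimate = opinions_in_db / sample_rows * refs_in_sample
print(int(estimate))  # -> 126317, matching the ~126,317 figure above
```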

mlissner (Member) left a comment:

Man, I don't know this code all that well anymore, but I think this looks pretty good. I guess one thing that'd give me more confidence would be more tests. Would it be possible to add a few more, including ones where the current code isn't good enough (like, perhaps, cases where it can't find the plaintiff, or other known failure modes)?

I can't quite suss them out myself, but I think it'd be helpful to have them written down, even if they're known to fail.

Review threads (resolved): eyecite/models.py, eyecite/helpers.py
@@ -307,6 +307,27 @@ def disambiguate_reporters(
]


def filter_citations(citations: List[CitationBase]) -> List[CitationBase]:
mlissner (Member) commented on the diff:

Do we have a test case for this, so I can see what it's supposed to do?

flooie (Contributor, Author) replied:

I added a test in the find tests that shows how it is used. Essentially it's meant to be a backstop against older or oddly named reporters, now and in the future.

For example, Miles is a reporter from way back. It envisions a scenario like:

Miles v. Smith, 1 U.S. 1 - .... 101 Miles 100 (1850), .... in 101 Miles at 105

In this scenario we have a full cite, a second full cite, and a short cite, but the final one could also be a reference cite. The function filters out the reference citation.

Also, since reference citations are found after each full case citation is found, they are found out of sequential order. This function also sorts our newly filtered list by span.
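A rough sketch of that filtering-and-sorting behavior for readers of this thread. This is not the actual eyecite implementation; the tiny Citation class, the is_reference flag, and the overlap rule are all illustrative assumptions:

```python
from typing import List, Tuple


class Citation:
    """Illustrative stand-in for eyecite's citation objects."""

    def __init__(self, start: int, end: int, is_reference: bool = False):
        self._span = (start, end)
        self.is_reference = is_reference

    def span(self) -> Tuple[int, int]:
        return self._span


def filter_citations(citations: List[Citation]) -> List[Citation]:
    """Drop reference cites whose span overlaps another citation
    (e.g. "101 Miles at 105" already matched as a short cite),
    then restore document order."""
    kept = []
    for cite in citations:
        overlaps = any(
            other is not cite
            and cite.span()[0] < other.span()[1]
            and other.span()[0] < cite.span()[1]
            for other in citations
        )
        if cite.is_reference and overlaps:
            continue
        kept.append(cite)
    # Reference cites are found after each full cite, so they arrive
    # out of order; sorting by span start restores document order.
    return sorted(kept, key=lambda c: c.span())
```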

mlissner (Member) replied:

OK. If that's important, let's explain that in the docstring, because it's pretty hard to understand what's going on here otherwise (at least for me).

flooie (Contributor, Author) replied:

Sounds good.

Commits pushed:

- Limit the names that can be used to better-formatted plaintiff/defendants
- Add tests to show filtering/ordering reference citations, and refactor add defendant for an edge case where it could be only whitespace
- Typos etc.
flooie (Contributor, Author) commented Jan 22, 2025:

I ran this latest batch with the 1 percent file on my machine and it added 1,188 new correct reference citations.

This extrapolates to 118,800 new reference citations in the dataset, under strict standards.

mlissner (Member) replied:

> 118,800 new reference citations

That's surprisingly few, no? I'd expect at least one or two per case, and about 10× more than that across the full data set. Are we missing citations we should be grabbing?

flooie (Contributor, Author) commented Jan 22, 2025:

No, I don't think so. Remember, these reference citations all require the format [NAME] at [PAGE].

So I think it's just not a format that's used as often as you would expect. You're right, though, that when we add any reference (like "in Roe") we are going to have many more. That should be done in a separate PR.
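For illustration only, a toy regex for that [NAME] at [PAGE] shape. The real extractor anchors on party names pulled from full citations found earlier in the text, not a generic pattern like this:

```python
import re

# Toy pattern for the "[NAME] at [PAGE]" shape described above. Purely
# illustrative: eyecite builds its pattern from case names it has already
# seen in full citations, so it won't match arbitrary capitalized words.
reference_like = re.compile(r"\b(?P<name>[A-Z][A-Za-z]+) at (?P<page>\d+)\b")

text = "In Foo v. Bar, 1 U.S. 1 (1800), ... the court reaffirmed Foo at 3."
for m in reference_like.finditer(text):
    print(m.group("name"), m.group("page"))  # prints: Foo 3
```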

The Eyecite Report 👁️

Gains and Losses

There were 0 gains and 13 losses.

| id | Gain | Loss |
| --- | --- | --- |
| 2060699 | | Beckler at 775 |
| 2060699 | | Frohlich at 301 |
| 2829730 | | Layne at 405 |
| 2414924 | | Brzonkala at 37 |
| 2414924 | | Brzonkala at 834 |
| 2414924 | | Robinson at 1211 |
| 2414924 | | Robinson at 1210 |
| 2414924 | | Brzonkala at 874 |
| 2414924 | | Brzonkala at 887 |
| 2414924 | | Brzonkala at 3 |
| 2414924 | | Boerne at 2170 |
| 1433305 | | Gullings at 244 |
| 2267203 | | Fisher at 1347 |

Time Chart

(chart image)

Generated Files

Branch 1 Output
Branch 2 Output
Full Output CSV

flooie (Contributor, Author) commented Jan 24, 2025:

@mlissner any chance this is ready? I'd like to get this merged before I finish the more advanced complex citation parsing.

mlissner merged commit d09473c into main Jan 24, 2025
13 checks passed
mlissner deleted the fix-eyecite-defendants branch January 24, 2025 15:41
mlissner (Member) commented:
Sorry, I didn't realize it was waiting on me. Merged, thank you!

flooie (Contributor, Author) commented Jan 24, 2025:

@mlissner thank you
