feat(find_reference_citations_from_markup) #203

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

grossir commented on Feb 5, 2025

Solves #198
Also solves the problems exposed by the first iteration of this PR, described in #209

  • Implements a function that finds name-only ReferenceCitations by taking advantage of the i/em style tags in HTML sources
  • The new function is triggered by passing an extra argument to the main function, find.get_citations
  • Refactors ReferenceCitation.is_valid_name into utils.is_valid_name
  • Adds regexes.PRE_FULL_CITATION_REGEX to handle full case citations with a single-name antecedent, with or without a pincite
  • Adds tests for the new function, checking both that it works standalone and that it does not collide with other citation types
  • Fixes a bug in match_on_tokens where MAX_MATCH_CHARS was used incorrectly
  • Updates tests that were invalidated because text previously identified as a Reference was actually part of the FullCaseCitation
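A rough sketch of the markup-based idea: scan the emphasized (`<i>`/`<em>`) spans of the HTML and keep those whose text matches a short name taken from a full case citation already found in the same document. This is a hypothetical illustration, not the code in this branch: `find_name_only_references` and the simplified `is_valid_name` are stand-ins, and the regex assumes emphasis tags are never nested.

```python
import re

# Assumption: emphasis tags are not nested, so a non-greedy regex is enough.
EMPHASIS_RE = re.compile(r"<(i|em)\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)


def is_valid_name(name: str) -> bool:
    """Hypothetical stand-in for utils.is_valid_name.

    Accepts capitalized words of 3+ letters; hyphens and apostrophes are allowed.
    """
    letters = name.replace("-", "").replace("'", "")
    return len(letters) > 2 and name[:1].isupper() and letters.isalpha()


def find_name_only_references(
    markup: str, case_names: set[str]
) -> list[tuple[str, int, int]]:
    """Return (name, start, end) for emphasized text matching a known case name.

    `case_names` is assumed to hold short names (e.g. a party name) taken from
    full case citations already extracted from the same document. Offsets
    point at the text inside the tag, in `markup` coordinates.
    """
    hits = []
    for match in EMPHASIS_RE.finditer(markup):
        text = match.group(2)
        if text in case_names and is_valid_name(text):
            hits.append((text, match.start(2), match.end(2)))
    return hits


html = "<p>See <i>Roe v. Wade</i>, 410 U.S. 113 (1973). Later, <em>Roe</em> held</p>"
print(find_name_only_references(html, {"Roe", "Wade"}))
```

Relying on the source's own emphasis markup keeps false positives low: a bare capitalized word in plain text is ambiguous, but an italicized party name that matches an earlier full citation is a strong signal.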

grossir force-pushed the 198-find-name-only-reference-citations branch from 81b4fa2 to 2e6ae84 on February 5, 2025, 23:36
grossir requested a review from flooie on February 5, 2025, 23:40
grossir force-pushed the 198-find-name-only-reference-citations branch from 2e6ae84 to 24d6166 on February 5, 2025, 23:44
grossir force-pushed the 198-find-name-only-reference-citations branch from 24d6166 to d8eb198 on February 5, 2025, 23:47
grossir added a commit to freelawproject/courtlistener that referenced this pull request Feb 6, 2025
…w uses find_reference_citations_from_markup

Adds logic to use freelawproject/eyecite#203
flooie assigned grossir and unassigned flooie on Feb 7, 2025
flooie and others added 8 commits February 7, 2025 10:50
apply refactor from code review #206
…ent pincites

This will help disambiguate adjacent ReferenceCitations

- add `helpers.add_pre_citation`
- add the regex needed
- add test cases to test_FindTest where this is used
- resolved a bug in match_on_tokens where MAX_MATCH_CHARS was used incorrectly
- updated tests that were invalidated because text identified as a Reference was actually part of the FullCaseCitation
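The last bullet above can be illustrated with a small filter: a name that sits immediately before a full case citation, separated only by commas or whitespace, is treated as that citation's antecedent rather than as a separate ReferenceCitation. This is a hypothetical sketch under that assumption; `Span` and `drop_antecedent_names` are illustrative names, and the actual `helpers.add_pre_citation` and regex in this PR may work differently.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int


# Assumption: only commas and whitespace separate an antecedent name
# from the full citation it belongs to.
GAP_RE = re.compile(r"[\s,]*")


def drop_antecedent_names(
    text: str, name_spans: List[Span], full_cite_spans: List[Span]
) -> List[Span]:
    """Keep only name spans that are not the antecedent of a full citation."""
    kept = []
    for name in name_spans:
        is_antecedent = any(
            name.end <= cite.start
            and GAP_RE.fullmatch(text[name.end : cite.start]) is not None
            for cite in full_cite_spans
        )
        if not is_antecedent:
            kept.append(name)
    return kept


text = "Roe, 410 U.S. 113 (1973). Later, Roe held that"
cite = Span(text.index("410"), text.index(" (1973)"))
second = text.index("Roe", 1)
names = [Span(0, 3), Span(second, second + 3)]
print(drop_antecedent_names(text, names, [cite]))  # [Span(start=33, end=36)]
```

Only the second "Roe" survives; the first one is folded into the full citation it introduces, which is the kind of case the updated tests cover.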
This is passed to `extract_reference_citations`, which allows us to call `find_reference_citations_from_markup` inside that function, simplifying the calls
Solves #209

- add test cases for full case citation with antecedent and no pincite
- fix span calculation on add_pre_citation
grossir force-pushed the 198-find-name-only-reference-citations branch 2 times, most recently from 8db0f0a to 71c42c6 on February 13, 2025, 21:46
Bill noticed during testing that HTML extraction on real data was slow, because we were building a SpanUpdater for each full citation; the code is now refactored to create the SpanUpdaters once per Opinion
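The fix amounts to building the expensive offset-mapping structure once per opinion and reusing it for every citation. A minimal sketch of that pattern, assuming the cost is dominated by diffing the plain text against the markup: the hypothetical `OffsetMapper` below is not eyecite's `SpanUpdater`; it runs `difflib.SequenceMatcher` once in its constructor, and each later lookup is only a binary search over the matching blocks.

```python
import bisect
from difflib import SequenceMatcher
from typing import Tuple


class OffsetMapper:
    """Hypothetical mapper from `source` offsets to `target` offsets.

    The diff, the expensive part, is computed once in __init__.
    """

    def __init__(self, source: str, target: str) -> None:
        matcher = SequenceMatcher(None, source, target, autojunk=False)
        # Drop the zero-size sentinel block that get_matching_blocks() appends.
        self.blocks = [b for b in matcher.get_matching_blocks() if b.size]
        self.starts = [b.a for b in self.blocks]

    def map_offset(self, offset: int) -> int:
        """Map one source offset; offsets in changed regions snap to the
        end of the preceding matching block."""
        i = bisect.bisect_right(self.starts, offset) - 1
        if i < 0:
            return 0
        block = self.blocks[i]
        return block.b + min(offset - block.a, block.size)

    def map_span(self, start: int, end: int) -> Tuple[int, int]:
        mapped_start = self.map_offset(start)
        if end <= start:
            return mapped_start, mapped_start
        # Map the last character rather than `end`, so the span does not
        # swallow markup (e.g. a closing tag) that follows it.
        return mapped_start, self.map_offset(end - 1) + 1


plain = "See Roe, 410 U.S. 113."
html = "<p>See <i>Roe</i>, 410 U.S. 113.</p>"
mapper = OffsetMapper(plain, html)  # built once per opinion
for start, end in [(4, 7), (9, 21)]:  # reused for every citation span
    s, e = mapper.map_span(start, end)
    print(plain[start:end], "->", html[s:e])
```

With one mapper per opinion, the diff cost is paid once instead of once per full citation, while each individual span lookup stays logarithmic in the number of matching blocks.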
grossir force-pushed the 198-find-name-only-reference-citations branch from 71c42c6 to 509c12a on February 13, 2025, 21:48
grossir force-pushed the 198-find-name-only-reference-citations branch from 38fde4f to 61ddc22 on February 13, 2025, 22:39
grossir mentioned this pull request on Feb 14, 2025
grossir force-pushed the 198-find-name-only-reference-citations branch from 45981a7 to 8f72df6 on February 14, 2025, 22:22

The Eyecite Report 👁️

Gains and Losses

There were 0 gains and 236 losses.

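| id | Gain | Loss |
|---|---|---|
| 4746031 | | Llamas-Villa |
| 4746031 | | Pena |
| 4746031 | | Buie |
| 4799679 | | Graham |
| 4799679 | | Akins |
| 5066102 | | Fredericks |
| 5071459 | | Burks |
| 5071459 | | Greene |
| 5112424 | | Jones |
| 5112424 | | Overmyer |
| 5123092 | | Atwood |
| 5160500 | | Sibley |
| 5165179 | | Rodriguez |
| 5165179 | | McKinstrey |
| 5167616 | | Toney |
| 5618955 | | Widincamp |
| 5656104 | | Malouf |
| 5750897 | | Preston |
| 1996784 | | Caplin |
| 2014564 | | Hanreddy |
| 2060699 | | Frohlich |
| 1917661 | | Doucet |
| 3419420 | | Best |
| 3419420 | | Martin |
| 3419420 | | Cunningham |
| 2303811 | | Campos |
| 2303811 | | Lovett |
| 2303811 | | Butzberger |
| 2303811 | | Miller |
| 2303811 | | Campos |
| 2303811 | | Vanderweele |
| 2387663 | | Murray |
| 1662392 | | Jergnigan |
| 1744543 | | Solem |
| 1744543 | | Faretta |
| 1804094 | | Mercer |
| 1783747 | | Kaperonis |
| 1783747 | | Vallon |
| 2168388 | | Tomasek |
| 1853016 | | Tyler |
| 1137818 | | Cherney |
| 1137818 | | Beekner |
| 1137818 | | Payne |
| 1341018 | | Looney |
| 1537257 | | Greger |
| 1537257 | | Pope |
| 1546016 | | Pettit |
| 1546016 | | Vincenzi |
| 1546016 | | Davis |
| 1929026 | | Walker |
| 1940979 | | Wallace |
| 1941966 | | LeBrane |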

Time Chart: [image]

Generated Files

Branch 1 Output
Branch 2 Output
Full Output CSV

Labels: none yet
Projects: Status: PRs to Review
2 participants