Parse attachments from docket when available #718

ttys0dev · 2023-09-06T04:37:06Z

Needs some cleanup/testing but seems to work.

mlissner

Nice, thanks. A couple little comments. Two things I'm thinking about though:

Do we need to upgrade the make_doc1_url functions? If not, I'd say let's not bother.
I think it'd be better to get the court ID numbers from the prior item rather than from a lookup table. It feels tidier that way, to rely on the internal data rather than the external. Could you tweak it to do that and then remove the two big look ups? Sorry I crashed last night before realizing what you were up to.

Thanks again for this. Nice to have scraper updates.

juriscraper/pacer/docket_report.py

ttys0dev · 2023-09-07T20:34:07Z

> Do we need to upgrade the make_doc1_url functions? If not, I'd say let's not bother.

I did this mostly as a safeguard: it ensures the mapping tables are accurate and adds sanity checks for any code using these functions (for example, the test code with bad or mismatched court_ids). Making court_id optional may also allow for simplifying some logic elsewhere in the codebase.
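To illustrate the idea of an optional court_id backed by a lookup table, here is a minimal sketch. It is not juriscraper's actual code: the function name `make_doc1_url_sketch`, the table names, and the prefix values are all hypothetical placeholders, and it assumes that the first three digits of a PACER doc ID identify the court.

```python
# Hypothetical sketch, NOT juriscraper's implementation.
# The prefixes and court codes below are invented placeholders.
COURT_TO_PREFIX = {"aaa": "001", "bbb": "002"}
PREFIX_TO_COURT = {prefix: court for court, prefix in COURT_TO_PREFIX.items()}


def make_doc1_url_sketch(pacer_doc_id, court_id=None):
    """Build a doc1 URL, inferring or validating the court from the doc ID.

    Assumption: the first three digits of pacer_doc_id identify the court.
    """
    prefix = pacer_doc_id[:3]
    inferred = PREFIX_TO_COURT.get(prefix)
    if inferred is None:
        raise ValueError(f"Unknown court prefix: {prefix}")
    if court_id is None:
        # No court given: rely on the lookup table alone.
        court_id = inferred
    elif court_id != inferred:
        # Court given: use the table as a sanity check.
        raise ValueError(
            f"court_id {court_id!r} does not match doc ID prefix {prefix!r}"
        )
    return f"https://ecf.{court_id}.uscourts.gov/doc1/{pacer_doc_id}"
```

With this shape, callers that already know the court get a mismatch check for free, and callers that do not can omit the argument.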

> I think it'd be better to get the court ID numbers from the prior item rather than from a lookup table. It feels tidier that way, to rely on the internal data rather than the external. Could you tweak it to do that and then remove the two big look ups? Sorry I crashed last night before realizing what you were up to.

My previous approach of extracting the number from a previous item made the code a good bit more complex, and I suspect a bit slower as well. Note that the merge function will error out if there's a court ID number mismatch, so we're still effectively validating the court ID numbers against previous entries. Being able to compute the full doc ID entirely independently of the docket entry doc ID also makes the merge more robust, I think, because we can validate that the two match.
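The validation described above could look roughly like the following sketch. It is an illustration only, not the merge function from this PR: the name `merge_attachments_sketch`, the dict layout, and the `pacer_doc_id` and `attachments` keys are assumptions, as is the premise that the first three digits of a doc ID carry the court number.

```python
# Hypothetical sketch, NOT the merge function from this PR.
def merge_attachments_sketch(entry, attachments):
    """Attach independently computed attachment rows to a docket entry.

    Raises ValueError if any attachment's court prefix (assumed to be the
    first three digits of its doc ID) differs from the entry's prefix.
    """
    entry_prefix = entry["pacer_doc_id"][:3]
    for attachment in attachments:
        attachment_prefix = attachment["pacer_doc_id"][:3]
        if attachment_prefix != entry_prefix:
            raise ValueError(
                f"Court prefix mismatch: entry {entry_prefix!r}, "
                f"attachment {attachment_prefix!r}"
            )
    # Copy so the caller's entry dict is left untouched.
    merged = dict(entry)
    merged["attachments"] = list(attachments)
    return merged
```

Because the attachment IDs are computed without reference to the entry's own ID, a mismatch here signals a real inconsistency rather than one value simply echoing the other.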

mlissner · 2023-09-07T21:27:34Z

OK, let's clean up the other comments and we can land this. Please request my review when you think it's ready again.

mlissner · 2023-09-08T05:52:40Z

I saw the review request, but there are still a few outstanding tweaks. Could you do those, please, and do another request?

ttys0dev · 2023-09-08T06:07:47Z

> there are still a few outstanding tweaks

I think I missed an example, but I did add docstrings with explanations. The docstrings may not have been obvious given how GitHub was showing the diffs in the review comment threads.

mlissner · 2023-09-08T06:14:09Z

Great. Merging. Juriscraper doesn't get auto-released, so to get this live we need to cut a new version and then update it in CourtListener. If you're thinking of doing other parsing work (which we desperately need), I'd suggest landing those PRs first, then doing a release.

ttys0dev · 2023-09-08T06:35:27Z

I don't have any immediate plans for more parsing work here. I'm planning to look at integrating this into CourtListener next (after cleaning up some of the docket insertion code), but it would be good to have a release to make it easier to test the integration.

mlissner · 2023-09-08T06:37:41Z

Could you try to use a git install for a bit so we can stay focused on Elastic Search?

ttys0dev · 2023-09-08T06:47:24Z

> Could you try to use a git install for a bit so we can stay focused on Elastic Search?

Sure, I'm probably going to wait until some preliminary refactoring like #3120 is merged first anyway.

ttys0dev force-pushed the docket-attachments branch from 8764cfb to f5b9e0b Compare September 6, 2023 04:40

This was referenced Sep 7, 2023

Parse docket report properly when "View multiple documents" option is enabled on PACER #687

Closed

687 Fix docket report parsing on view multiple documents layout #688

Merged

Add cand docket with attachments #719

Merged

ttys0dev force-pushed the docket-attachments branch 5 times, most recently from 2a24450 to 94e2cd0 Compare September 7, 2023 08:56

mlissner requested changes Sep 7, 2023

ttys0dev force-pushed the docket-attachments branch from 94e2cd0 to 715a5f6 Compare September 7, 2023 22:08

ttys0dev requested a review from mlissner September 7, 2023 22:19

Parse attachments from docket when available

0ebd7e1

ttys0dev force-pushed the docket-attachments branch from 715a5f6 to 0ebd7e1 Compare September 8, 2023 06:03

mlissner approved these changes Sep 8, 2023

mlissner merged commit 8f45e61 into freelawproject:main Sep 8, 2023

ttys0dev deleted the docket-attachments branch September 8, 2023 06:32