assemble pages and then look for claims #10

gvelez17 · 2025-04-21T03:50:24Z

No description provided.

zeyadhessuin · 2025-04-26T03:47:59Z

pdf_parser/src/claim_viz.py

@@ -21,14 +22,30 @@ def process_and_visualize_claims(docmgr, output_file: str = "claims_analysis.htm
        print(f"First metadata sample: {results['metadatas'][0]}")
    else:
        print("No documents found in collection! Exiting")
-        exit
+        return


If we pass a new PDF, `main()` already processes it via doc_manager.process_pdf(args.pdf), so how could this else branch be reached unless the PDF contains no text?

Testing the code on a new PDF that has not been processed before doesn't work: the function returns here instead of processing the PDF and running to completion.

zeyadhessuin · 2025-04-26T03:51:05Z

pdf_parser/src/claim_viz.py

+    def sort_key(x):
+        metadata = x[1]
+        if 'bbox' in metadata:
+            return (metadata['page'], metadata['bbox'][1])  # Sort by page, then y-coord


It takes the page number without checking the PDF source, so it gets all the text from page n regardless of which PDF that text belongs to.
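One possible way to address this is to put the source PDF in front of the page number in the sort key. The sketch below is only an illustration: the `source` metadata key is an assumption (the diff does not show what field, if any, identifies the PDF), and so is the fallback of 0 for chunks without a `bbox`.

```python
def sort_key(item):
    """Order (text, metadata) pairs by source PDF, then page, then y-coordinate."""
    _, metadata = item
    # 'source' is an assumed metadata key naming the originating PDF.
    source = metadata.get("source", "")
    page = metadata.get("page", 0)
    # Chunks without a bbox sort to the top of their page (assumed fallback).
    y = metadata["bbox"][1] if "bbox" in metadata else 0
    return (source, page, y)


# Hypothetical sample chunks from two PDFs that both have a page 1.
chunks = [
    ("b-top", {"source": "b.pdf", "page": 1, "bbox": [0, 10, 5, 20]}),
    ("a-low", {"source": "a.pdf", "page": 1, "bbox": [0, 50, 5, 60]}),
    ("a-top", {"source": "a.pdf", "page": 1, "bbox": [0, 10, 5, 20]}),
]
ordered = [text for text, _ in sorted(chunks, key=sort_key)]
```

With this key, chunks from a.pdf page 1 are no longer interleaved with chunks from b.pdf page 1.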

zeyadhessuin · 2025-04-26T03:56:57Z

pdf_parser/src/claim_viz.py

+        page_text = ""
+        for text, metadata in pages[page_num]:
+            if metadata.get('type') == 'text':  # Skip images
+                page_text += text + "\n\n"


Text extracted from images can be appended to the end of page_text, after all the direct text has been added.
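A rough sketch of that suggestion, collecting direct text first and image text afterwards. The helper name `build_page_text` and the `'image'` type value are assumptions for illustration; the diff only shows that chunks with `type == 'text'` are kept.

```python
def build_page_text(page_items):
    """Join a page's chunks: direct text first, image-derived text last."""
    text_parts = []
    image_parts = []
    for text, metadata in page_items:
        if metadata.get("type") == "text":
            text_parts.append(text)
        elif metadata.get("type") == "image" and text:
            # 'image' is an assumed type value; empty image text is skipped.
            image_parts.append(text)
    # Same separator as the loop in the diff: two newlines after each chunk.
    return "".join(part + "\n\n" for part in text_parts + image_parts)


# Hypothetical page where the image chunk happens to come first.
sample_page = [
    ("Figure caption from OCR", {"type": "image"}),
    ("Body paragraph", {"type": "text"}),
]
page_text = build_page_text(sample_page)
```

Keeping image text at the end means claim extraction still sees the running prose uninterrupted, while nothing recovered from images is dropped.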

zeyadhessuin

Grouping the chunks doesn't work well when multiple processed PDFs are stored in ChromaDB. The text is grouped by page number only, so page n of every stored PDF (pdf_1, pdf_2, ...) ends up in the same group.
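A minimal sketch of grouping by PDF and page together, so that page n of one document stays separate from page n of another. It assumes the results arrive as parallel `documents` and `metadatas` lists and that each metadata dict has a `source` key; the function name and the `source` key are hypothetical.

```python
from collections import defaultdict


def group_by_pdf_and_page(documents, metadatas):
    """Group (text, metadata) pairs under a (source, page) key."""
    pages = defaultdict(list)
    for text, metadata in zip(documents, metadatas):
        # 'source' is an assumed key; chunks without it share one bucket.
        key = (metadata.get("source", "unknown"), metadata.get("page", 0))
        pages[key].append((text, metadata))
    return dict(pages)


# Hypothetical stored chunks: two PDFs, each with a page 1.
documents = ["alpha", "beta", "gamma"]
metadatas = [
    {"source": "pdf_1", "page": 1},
    {"source": "pdf_2", "page": 1},
    {"source": "pdf_1", "page": 1},
]
grouped = group_by_pdf_and_page(documents, metadatas)
```

An alternative would be to filter the query to the PDF being analyzed before grouping, which avoids loading chunks from unrelated documents at all.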

assemble pages and then look for claims

6eb0f09

gvelez17 requested a review from ZiadHamdyy April 21, 2025 03:51

TutTrue requested a review from zeyadhessuin April 21, 2025 11:58

zeyadhessuin reviewed Apr 26, 2025

zeyadhessuin requested changes Apr 26, 2025
