Lacking capability for in-memory processing. #66

Open
maxupp opened this issue Nov 24, 2023 · 1 comment
maxupp commented Nov 24, 2023

The fact that output can only be written to files and not kept in memory for further processing is a major drawback.
I suggest returning a dictionary with all the TEI objects.

kermitt2 (Owner) commented

Hi @maxupp !

If you process just one file, client.process_pdf() returns the response in memory, and you can parse it with a Python XML parser.
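As a rough illustration of parsing such a response in memory, here is a minimal sketch using only the standard library. It assumes the response body is a TEI XML string in the usual TEI namespace; `SAMPLE_TEI` and `extract_title` are hypothetical names made up for this example, and the sample document is a heavily trimmed stand-in, not real server output.

```python
import xml.etree.ElementTree as ET

# TEI documents use this XML namespace for their elements.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed TEI document standing in for a server response.
SAMPLE_TEI = """<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">A Sample Title</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_title(tei_xml):
    """Return the first title found in the TEI header, or None if absent."""
    # Encode to bytes so the XML declaration's encoding is handled by the parser.
    root = ET.fromstring(tei_xml.encode("utf-8"))
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    if title is None or not title.text:
        return None
    return title.text.strip()


print(extract_title(SAMPLE_TEI))
```

In practice the string passed to `extract_title` would be whatever text the client hands back for a single PDF; the exact return shape should be checked against the client source.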

If you process files in batch, you can change the behavior here so that the server responses are kept instead of being written to files on disk:
https://github.com/kermitt2/grobid_client_python/blob/master/grobid_client/grobid_client.py#L228
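One way such a change could look is sketched below: the per-file results are collected into a dictionary rather than written out. This is not the client's actual code. `process_batch_in_memory` and `fake_process_one` are hypothetical names, and the `(path, status, text)` return shape of the per-file callable is an assumption to verify against the client source.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch_in_memory(pdf_paths, process_one, max_workers=4):
    """Run process_one on each path concurrently and keep the results in memory.

    process_one is assumed (hypothetically) to return a tuple of
    (path, HTTP status code, response text).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as pdf_paths.
        for path, status, text in executor.map(process_one, pdf_paths):
            if status == 200:
                results[path] = text  # keep the TEI string instead of writing a file
    return results


def fake_process_one(path):
    """Stand-in for a real per-file call, so the sketch runs without a server."""
    return path, 200, "<TEI>" + path + "</TEI>"


results = process_batch_in_memory(["a.pdf", "b.pdf"], fake_process_one)
print(sorted(results))
```

With a real client, `process_one` could be a small wrapper that calls the single-file method with fixed arguments. Failed requests are silently skipped here; a real version would probably want to record them.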

Or do I misunderstand the issue?

The idea of this client is to provide a simple basis (depending only on standard Python libraries) that can be extended as needed.
