In the ocr_interface.py file, it would be nice if the code handled errors from the importlib.import_module(module_name) call in the get_instance(...) function:
    @staticmethod
    @functools.lru_cache(maxsize=None)
    def get_instance(ocr_agent_module: str) -> "OCRAgent":
        module_name, class_name = ocr_agent_module.rsplit(".", 1)
        if module_name in OCR_AGENT_MODULES_WHITELIST:
            module = importlib.import_module(module_name)
            loaded_class = getattr(module, class_name)
            return loaded_class()
        else:
            raise ValueError(
                f"Environment variable OCR_AGENT module name {module_name}, must be set to a"
                f" whitelisted module part of {OCR_AGENT_MODULES_WHITELIST}.",
            )
I was so confused when I kept getting this error from the get_agent(...) function:
ValueError: Environment variable OCR_AGENT must be set to an existing OCR agent module, not unstructured.partition.utils.ocr_models.tesseract_ocr.OCRAgentTesseract.
After hours of digging, it turned out I just hadn't installed pandas lol 🗿
    class OCRAgent(ABC):
        """Defines the interface for an Optical Character Recognition (OCR) service."""

        @classmethod
        def get_agent(cls) -> OCRAgent:
            """Get the configured OCRAgent instance.

            The OCR package used by the agent is determined by the `OCR_AGENT` environment variable.
            """
            ocr_agent_cls_qname = cls._get_ocr_agent_cls_qname()
            try:
                return cls.get_instance(ocr_agent_cls_qname)
            except (ImportError, AttributeError):
                raise ValueError(
                    f"Environment variable OCR_AGENT must be set to an existing OCR agent module,"
                    f" not {ocr_agent_cls_qname}."
                )
The wording starts out very similarly :)
I'm thinking the right solution for this case is to print the original error messages as well as the "Environment variable ..." message.
I'm guessing the triggering error in your case was something like ImportError: cannot import package 'pandas' or whatever, and something like that might have been just the pointer you would need in that situation :)
What do you think about that idea?
Implementation would be something like:
    try:
        return cls.get_instance(ocr_agent_cls_qname)
    except (ImportError, AttributeError) as e:
        raise ValueError(
            f"Environment variable OCR_AGENT must be set to an existing OCR agent module,"
            f" not {ocr_agent_cls_qname}: {str(e)}"
        )
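A possible variation on the snippet above, sketched as a standalone example rather than as the project's actual code: adding `from e` to the `raise` chains the exceptions, so the traceback would show the original `ImportError` (for example a missing `pandas`) alongside the "Environment variable ..." message. The names `load_class` and `get_agent_class` are hypothetical stand-ins for `get_instance` and `get_agent`, and the whitelist check is left out to keep it short.

```python
import importlib


def load_class(qualified_name: str):
    """Hypothetical stand-in for get_instance(): import a module, fetch a class."""
    module_name, class_name = qualified_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def get_agent_class(qualified_name: str):
    """Hypothetical stand-in for get_agent(): wrap loader errors, keep the cause."""
    try:
        return load_class(qualified_name)
    except (ImportError, AttributeError) as e:
        # `from e` stores the original exception on __cause__, so the traceback
        # prints the triggering ImportError/AttributeError before the ValueError.
        raise ValueError(
            f"Environment variable OCR_AGENT must be set to an existing OCR agent module,"
            f" not {qualified_name}: {e}"
        ) from e


if __name__ == "__main__":
    try:
        # Made-up module name, used only to trigger an ImportError.
        get_agent_class("no_such_package_xyz_123.SomeAgent")
    except ValueError as err:
        print(err)
        print(repr(err.__cause__))
```

With chaining in place, the message still names the `OCR_AGENT` setting, and the root cause stays reachable through `__cause__` for anyone debugging.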