sqlalchemy: ObjectDeletedError #213

Open
lukehsiao opened this issue Feb 6, 2019 · 5 comments
Labels
bug Something isn't working help wanted Extra attention is required

Comments

@lukehsiao
Contributor

lukehsiao commented Feb 6, 2019

Describe the bug
During some stages of the pipeline (e.g. candidate extraction or featurization), some of the UDFs crash with a sqlalchemy.orm.exc.ObjectDeletedError.

To Reproduce
Unfortunately, this requires a large dataset to reproduce. I will update this with a minimal example if I can find one. Otherwise, I'm currently seeing it on the full transistor dataset when extracting candidates as follows:

from fonduer.candidates import CandidateExtractor

candidate_extractor = CandidateExtractor(
    session,
    [PartStgTempMin, PartStgTempMax, PartPolarity, PartCeVMax],
    throttlers=[temp_throttler, temp_throttler, polarity_throttler, ce_v_max_throttler],
)

for i, docs in enumerate([train_docs, dev_docs, test_docs]):
    candidate_extractor.apply(docs, split=i, parallelism=PARALLEL)
    logger.info(
        f"PartStgTempMin in split={i}: "
        f"{session.query(PartStgTempMin).filter(PartStgTempMin.split == i).count()}"
    )
    logger.info(
        f"PartStgTempMax in split={i}: "
        f"{session.query(PartStgTempMax).filter(PartStgTempMax.split == i).count()}"
    )
    logger.info(
        f"PartPolarity in split={i}: "
        f"{session.query(PartPolarity).filter(PartPolarity.split == i).count()}"
    )
    logger.info(
        f"PartCeVMax in split={i}: "
        f"{session.query(PartCeVMax).filter(PartCeVMax.split == i).count()}"
    )


train_cands = candidate_extractor.get_candidates(split=0)
dev_cands = candidate_extractor.get_candidates(split=1)
test_cands = candidate_extractor.get_candidates(split=2)

logger.info(f"Total train candidate: {len(train_cands[0])}")
logger.info(f"Total dev candidate: {len(dev_cands[0])}")
logger.info(f"Total test candidate: {len(test_cands[0])}")

Expected behavior
No errors. No UDFs crashing.

Error Logs/Screenshots

Process CandidateExtractorUDF-46:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/utils/udf.py", line 194, in run
    self.session.add_all(y for y in self.apply(doc, **self.apply_kwargs))
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 1940, in add_all
    for instance in instances:
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/utils/udf.py", line 194, in <genexpr>
    self.session.add_all(y for y in self.apply(doc, **self.apply_kwargs))
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/candidates/candidates.py", line 262, in apply
    tuple(cand[j][1] for j in range(self.arities[i]))
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/transistors/transistor_throttlers.py", line 30, in stg_temp_filter
    if same_table((part, attr)):
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/utils/data_model_utils/tabular.py", line 47, in same_table
    for i in range(len(c))
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/utils/data_model_utils/tabular.py", line 47, in <genexpr>
    for i in range(len(c))
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 275, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 674, in get
    value = self.callable_(state, passive)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 678, in _load_for_state
    session, state, passive
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 723, in _get_ident_for_use_get
    for pk in self.mapper.primary_key
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 723, in <listcomp>
    for pk in self.mapper.primary_key
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/mapper.py", line 2753, in _get_state_attr_by_column
    return state.manager[prop.key].impl.get(state, dict_, passive=passive)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 669, in get
    value = state._load_expired(state, passive)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/state.py", line 632, in _load_expired
    self.manager.deferred_scalar_loader(self, toload)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/loading.py", line 985, in load_scalar_attributes
    raise orm_exc.ObjectDeletedError(state)
sqlalchemy.orm.exc.ObjectDeletedError: Instance '<SpanMention at 0x7f9118d76630>' has been deleted, or its row is otherwise not present.

This also happens for ImplicitSpanMentions:

Process CandidateExtractorUDF-52:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/utils/udf.py", line 194, in run
    self.session.add_all(y for y in self.apply(doc, **self.apply_kwargs))
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 1940, in add_all
    for instance in instances:
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/utils/udf.py", line 194, in <genexpr>
    self.session.add_all(y for y in self.apply(doc, **self.apply_kwargs))
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/candidates/candidates.py", line 262, in apply
    tuple(cand[j][1] for j in range(self.arities[i]))
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/transistors/transistor_throttlers.py", line 30, in stg_temp_filter
    if same_table((part, attr)):
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/utils/data_model_utils/tabular.py", line 44, in same_table
    for i in range(len(c))
  File "/lfs/raiders10/hdd/lwhsiao/repos/fonduer/src/fonduer/utils/data_model_utils/tabular.py", line 44, in <genexpr>
    for i in range(len(c))
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 275, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 674, in get
    value = self.callable_(state, passive)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 678, in _load_for_state
    session, state, passive
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 723, in _get_ident_for_use_get
    for pk in self.mapper.primary_key
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 723, in <listcomp>
    for pk in self.mapper.primary_key
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/mapper.py", line 2753, in _get_state_attr_by_column
    return state.manager[prop.key].impl.get(state, dict_, passive=passive)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 669, in get
    value = state._load_expired(state, passive)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/state.py", line 632, in _load_expired
    self.manager.deferred_scalar_loader(self, toload)
  File "/lfs/raiders10/hdd/lwhsiao/repos/hack/.venv/lib/python3.6/site-packages/sqlalchemy/orm/loading.py", line 985, in load_scalar_attributes
    raise orm_exc.ObjectDeletedError(state)
sqlalchemy.orm.exc.ObjectDeletedError: Instance '<ImplicitSpanMention at 0x7f5d6bc300b8>' has been deleted, or its row is otherwise not present

Environment (please complete the following information):

  • OS: [e.g. Ubuntu 18.04]
  • PostgreSQL Version: [e.g. 9.6]
  • Poppler Utils Version: [e.g. 0.62.0]
  • Fonduer Version: [e.g. 0.5.0]
@lukehsiao lukehsiao added bug Something isn't working help wanted Extra attention is required labels Feb 6, 2019
@lukehsiao lukehsiao self-assigned this Feb 6, 2019
@lukehsiao lukehsiao removed the help wanted Extra attention is required label Feb 6, 2019
@lukehsiao lukehsiao added this to the v0.5.1 milestone Feb 6, 2019
lukehsiao added a commit that referenced this issue Feb 7, 2019
This adds session synchronization to all delete queries in fonduer. Note
that this does have performance implications. Specifically:

> The 'fetch' strategy results in an additional SELECT statement emitted
> and will significantly reduce performance.

from [1].

[1]: https://docs.sqlalchemy.org/en/latest/orm/query.html

Closes #213.
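
For readers unfamiliar with what "session synchronization" changes here, the following is a minimal sketch, not Fonduer's code. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database, and the `Mention` model and `bulk_delete` helper are hypothetical names made up for illustration. It shows how a bulk delete with `synchronize_session=False` can leave an expired object in the session's identity map, so that the next attribute access raises `ObjectDeletedError`, while `synchronize_session="fetch"` removes the object from the session.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import ObjectDeletedError

Base = declarative_base()


class Mention(Base):
    """Hypothetical stand-in for a mention/candidate table."""

    __tablename__ = "mention"
    id = Column(Integer, primary_key=True)
    text = Column(String)


def bulk_delete(strategy):
    """Bulk-delete one row and report what the session still knows.

    Returns (still_tracked, raised): whether the Python object is still in
    the session after the delete, and whether reading one of its expired
    attributes raised ObjectDeletedError.
    """
    engine = create_engine("sqlite://")  # fresh in-memory database
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    mention = Mention(id=1, text="BC547")
    session.add(mention)
    session.commit()  # commit expires the loaded attributes of `mention`

    # Bulk delete: the row is removed in the database. Whether the session
    # is told about it depends on the synchronization strategy.
    session.query(Mention).filter(Mention.id == 1).delete(
        synchronize_session=strategy
    )

    still_tracked = mention in session
    raised = False
    if still_tracked:
        try:
            mention.text  # triggers a reload of the expired attribute
        except ObjectDeletedError:
            raised = True

    session.close()
    return still_tracked, raised


unsynchronized = bulk_delete(False)
synchronized = bulk_delete("fetch")
```

Under these assumptions, the unsynchronized delete should leave the stale object in the session and fail on attribute access, while the `"fetch"` variant should drop the object from the session. This only covers deletes issued through the same session; it does not model rows removed by another process, which may be what the parallel UDFs run into.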
@lukehsiao
Contributor Author

Reopening. This is still an issue.

@lukehsiao lukehsiao reopened this Feb 28, 2019
@lukehsiao lukehsiao removed their assignment Feb 28, 2019
@lukehsiao lukehsiao added the help wanted Extra attention is required label Feb 28, 2019
@lukehsiao lukehsiao removed this from the v0.6.0 milestone Feb 28, 2019
@senwu
Collaborator

senwu commented Apr 4, 2019

Do we still observe this issue?

@lukehsiao
Contributor Author

Yes.

@HiromuHota
Contributor

I've just observed the same error message (not sure if it is exactly the same issue) during candidate_extractor.apply.
I'm using Fonduer version 0.4.0.

[2019-04-04 22:42:15,822][INFO] fonduer.candidates.candidates - Clearing table cand_bank_branch (split 0)
---------------------------------------------------------------------------
ObjectDeletedError                        Traceback (most recent call last)
<ipython-input-147-00dd92b8a338> in <module>
      1 for i, docs in enumerate([train_docs, dev_docs, test_docs]):
----> 2     candidate_extractor.apply(docs, split=i, parallelism=PARALLEL)
      3     for candidate_class in candidate_classes:
      4         print("Number of Candidates for {} in split={}: {}".format(candidate_class.__name__, i, session.query(candidate_class).filter(candidate_class.split == i).count()))
      5 

/opt/conda/lib/python3.6/site-packages/fonduer/candidates/candidates.py in apply(self, docs, split, clear, parallelism, progress_bar)
    111             clear=clear,
    112             parallelism=parallelism,
--> 113             progress_bar=progress_bar,
    114         )
    115 

/opt/conda/lib/python3.6/site-packages/fonduer/utils/udf.py in apply(self, doc_loader, clear, parallelism, progress_bar, **kwargs)
     49         # Clear everything downstream of this UDF if requested
     50         if clear:
---> 51             self.clear(**kwargs)
     52 
     53         # Clear the last operated documents

/opt/conda/lib/python3.6/site-packages/fonduer/candidates/candidates.py in clear(self, split)
    129             self.session.query(Candidate).filter(
    130                 Candidate.type == candidate_class.__tablename__
--> 131             ).filter(Candidate.split == split).delete()
    132 
    133     def clear_all(self, split):

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/query.py in delete(self, synchronize_session)
   3351         delete_op = persistence.BulkDelete.factory(
   3352             self, synchronize_session)
-> 3353         delete_op.exec_()
   3354         return delete_op.rowcount
   3355 

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py in exec_(self)
   1326     def exec_(self):
   1327         self._do_pre()
-> 1328         self._do_pre_synchronize()
   1329         self._do_exec()
   1330         self._do_post_synchronize()

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py in _do_pre_synchronize(self)
   1408         self.matched_objects = [
   1409             obj for (cls, pk, identity_token), obj in
-> 1410             query.session.identity_map.items()
   1411             if issubclass(cls, target_cls) and
   1412             eval_condition(obj)]

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py in <listcomp>(.0)
   1410             query.session.identity_map.items()
   1411             if issubclass(cls, target_cls) and
-> 1412             eval_condition(obj)]
   1413 
   1414 

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/evaluator.py in evaluate(obj)
     94             def evaluate(obj):
     95                 for sub_evaluate in evaluators:
---> 96                     value = sub_evaluate(obj)
     97                     if not value:
     98                         if value is None:

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/evaluator.py in evaluate(obj)
    119         elif operator in _straight_ops:
    120             def evaluate(obj):
--> 121                 left_val = eval_left(obj)
    122                 right_val = eval_right(obj)
    123                 if left_val is None or right_val is None:

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/evaluator.py in <lambda>(obj)
     76 
     77         get_corresponding_attr = operator.attrgetter(key)
---> 78         return lambda obj: get_corresponding_attr(obj)
     79 
     80     def visit_clauselist(self, clause):

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py in __get__(self, instance, owner)
    240             return dict_[self.key]
    241         else:
--> 242             return self.impl.get(instance_state(instance), dict_)
    243 
    244 

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py in get(self, state, dict_, passive)
    594 
    595                 if key in state.expired_attributes:
--> 596                     value = state._load_expired(state, passive)
    597                 elif key in state.callables:
    598                     callable_ = state.callables[key]

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/state.py in _load_expired(self, state, passive)
    613             intersection(self.unmodified)
    614 
--> 615         self.manager.deferred_scalar_loader(self, toload)
    616 
    617         # if the loader failed, or this

/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/loading.py in load_scalar_attributes(mapper, state, attribute_names)
    879     # may not complete (even if PK attributes are assigned)
    880     if has_key and result is None:
--> 881         raise orm_exc.ObjectDeletedError(state)

ObjectDeletedError: Instance '<CandBankBranch at 0x7f73c7210780>' has been deleted, or its row is otherwise not present.

@HiromuHota
Contributor

In my case,

for i, docs in enumerate([train_docs, dev_docs, test_docs]):
    session.query(Candidate).filter(Candidate.split == i).delete(synchronize_session="fetch")

works just fine. Maybe filter(Candidate.type == candidate_class.__tablename__) is doing something wrong.

self.session.query(Candidate).filter(
Candidate.type == candidate_class.__tablename__
).filter(Candidate.split == split).delete(synchronize_session="fetch")
