Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Troubleshooting] Dataset status didn't update as expected #613

Open
yuanzhou opened this issue Feb 6, 2024 · 10 comments
Open

[Troubleshooting] Dataset status didn't update as expected #613

yuanzhou opened this issue Feb 6, 2024 · 10 comments
Assignees
Labels
P Pitt dev team

Comments

@yuanzhou
Copy link
Member

yuanzhou commented Feb 6, 2024

Reported by Sunset on 2/5/2024 on PROD:

we had some issues with the multiAssay upload split step, for some reason, we still have the bug about statuses not being updated on the database (but entity-api does sends a successful response), here is a small snip of the log where you can find that we send the status update to Submitted on a dataset and a couple of minutes after that we query the dataset and the status was set back to New (not actually updated as New rather that the Submitted update never happened as shown in the status_history field):

[2024-02-02T17:14:32.273-0500] {logging_mixin.py:137} INFO - Setting status of dataset 9e84f7c2f4c473a7337db6cb9c6f9372 to Submitted
[2024-02-02T17:14:32.273-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:14:35.438-0500] {status_manager.py:195} INFO - COMPILED DATA: {'status': <Statuses.DATASET_SUBMITTED: 'submitted'>}
[2024-02-02T17:14:35.439-0500] {status_manager.py:209} INFO -
data:
{'status': <Statuses.DATASET_SUBMITTED: 'submitted'>}
[2024-02-02T17:14:35.439-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:14:36.424-0500] {logging_mixin.py:137} INFO - Reorganized new: 9e84f7c2f4c473a7337db6cb9c6f9372 from Upload: f080d45619badd505a45b990ea1f9a92 status is Submitted
[2024-02-02T17:15:03.867-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:15:03.931-0500] {logging_mixin.py:137} INFO - ds rslt:
[2024-02-02T17:15:03.932-0500] {logging_mixin.py:137} INFO - {'contains_human_genetic_sequences': True,
'created_by_user_displayname': 'HuBMAP Process',
....
'hubmap_id': 'HBM875.SGLH.395',
'last_modified_timestamp': 1706911891972,
'last_modified_user_displayname': 'HuBMAP Process',
'last_modified_user_email': '[[email protected]](mailto:[email protected])',
'last_modified_user_sub': '3e7bce63-129d-33d0-8f6c-834b34cd382e',
'local_directory_rel_path': 'protected/TMC - University of '
'Pennsylvania/9e84f7c2f4c473a7337db6cb9c6f9372/',
'provider_info': 'Penn-Visium : ./Visium_13_75_S2_b',
'status': 'New',
'status_history': [{'change_timestamp': 1706911891972,
'changed_by_email': '[[email protected]](mailto:[email protected])',
'status': 'New'}],
[2024-02-02T17:14:07.985-0500] {logging_mixin.py:137} INFO - Setting status of dataset fa7fbcd8ae9219225f5df25e8c5e994e to Submitted
[2024-02-02T17:14:07.985-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:14:08.053-0500] {status_manager.py:195} INFO - COMPILED DATA: {'status': <Statuses.DATASET_SUBMITTED: 'submitted'>}
[2024-02-02T17:14:08.054-0500] {status_manager.py:209} INFO -
data:
{'status': <Statuses.DATASET_SUBMITTED: 'submitted'>}
[2024-02-02T17:14:08.054-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:14:09.634-0500] {logging_mixin.py:137} INFO - Reorganized new: fa7fbcd8ae9219225f5df25e8c5e994e from Upload: f080d45619badd505a45b990ea1f9a92 status is Submitted
[2024-02-02T17:15:01.465-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:15:01.517-0500] {logging_mixin.py:137} INFO - ds rslt:
[2024-02-02T17:15:01.518-0500] {logging_mixin.py:137} INFO - {'contains_human_genetic_sequences': True,
'created_by_user_displayname': 'HuBMAP Process',
....
'hubmap_id': 'HBM859.FRLZ.427',
'last_modified_timestamp': 1706911874747,
'last_modified_user_displayname': 'HuBMAP Process',
'last_modified_user_email': '[[email protected]](mailto:[email protected])',
'last_modified_user_sub': '3e7bce63-129d-33d0-8f6c-834b34cd382e',
'local_directory_rel_path': 'protected/TMC - University of '
'Pennsylvania/fa7fbcd8ae9219225f5df25e8c5e994e/',
'provider_info': 'Penn-Visium : ./Visium_13_44_S1_b',
'status': 'New',
'status_history': [{'change_timestamp': 1706911874747,
'changed_by_email': '[[email protected]](mailto:[email protected])',
'status': 'New'}],
@yuanzhou yuanzhou self-assigned this Feb 6, 2024
@yuanzhou yuanzhou added the P Pitt dev team label Feb 6, 2024
@yuanzhou
Copy link
Member Author

yuanzhou commented Feb 6, 2024

@sunset666 can you list the API calls made against entity-api and ingest-api in sequence?

@yuanzhou
Copy link
Member Author

From Sunset:

First one is a simple PUT call to entity API, endpoint /entities/<uuid> json: {'status': 'submitted'}
There is a for cycle through all the uuids of an upload (for these particular runs one with 83 and another one with 102)
After that is complete, on another step (not immediately after). It is a GET call to the same endpoint, for the same list of uuids

@yuanzhou
Copy link
Member Author

yuanzhou commented Feb 16, 2024

2/16/2024, tested on PROD using an existing IEC Testing dataset d0ebafcbe06c9fa2cb739c4a860e79bd with "New" status, changed to "Submitted" and back to "New". Can NOT reproduce, and each status update deleted the old cache.

[2024-02-16 05:07:04] INFO in schema_manager: Deleted cache by key: hm_entity_prod__neo4j_d0ebafcbe06c9fa2cb739c4a860e79bd, hm_entity_prod__complete_d0ebafcbe06c9fa2cb739c4a860e79bd

Made a GET call right after this PUT call, showed updated status "Submitted"

[2024-02-16 05:07:34] INFO in schema_manager: Deleted cache by key: hm_entity_prod__neo4j_d0ebafcbe06c9fa2cb739c4a860e79bd, hm_entity_prod__complete_d0ebafcbe06c9fa2cb739c4a860e79bd

Made another GET call right after this PUT call, also showed updated status "New"
"status": "New",
"status_history": [
{
    "change_timestamp": 1708060024775,
    "changed_by_email": "[email protected]",
    "status": "Submitted"
},
{
    "change_timestamp": 1708060054425,
    "changed_by_email": "[email protected]",
    "status": "New"
}
]

@yuanzhou
Copy link
Member Author

@sunset666 the response of the PUT call looks like below:

{
    "message": "Dataset of d0ebafcbe06c9fa2cb739c4a860e79bd has been updated"
}

Can you add this response to the pipeline logging? And also log when the next GET call is made just in case the pipeline doesn't actually make a new GET call after the PUT call or the ds rslt output logging is from a previous result prior to the PUT call. This would help us further debug and find the root case.

@sunset666
Copy link

@yuanzhou, We are adding "Cache-Control: no-cache" header to the request, just to test if that is something that will fix this issue.

@yuanzhou
Copy link
Member Author

Cache-Control: no-cache" header to the request

Not sure if that'll make a difference, definitely worth to add more loggings. The entity-api has caching mechanism internally but it removes the old entity cache after each PUT update.

@yuanzhou
Copy link
Member Author

yuanzhou commented Mar 4, 2024

3/4/2024 from Sunset:

Still no data to test those updates.

@shirey
Copy link
Member

shirey commented Mar 5, 2024

Waiting for a way to replicate from @sunset666

@shirey shirey added this to Pitt HIVE Jun 7, 2024
@shirey shirey moved this to Backlog in Pitt HIVE Jun 7, 2024
@yuanzhou
Copy link
Member Author

@sunset666 are you still noticing this issue? Otherwise we'll close it.

@yuanzhou
Copy link
Member Author

@sunset666 should we close this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P Pitt dev team
Projects
Status: Backlog
Development

No branches or pull requests

3 participants