Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created Sitemap database and support functions #180

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mugdhapolimera
Copy link
Contributor

No description provided.

@mugdhapolimera mugdhapolimera self-assigned this Jul 18, 2024
@mugdhapolimera
Copy link
Contributor Author

@tjacovich
Copy link
Contributor

@mugdhapolimera Will take a look. Also, we need to update the github actions file in .github/workflows/python_actions.yaml. celery has some incompatibilities with newer versions of pip so we want to update this line with pip==24.0

@tjacovich
Copy link
Contributor

We need to make a small update to the github action

python -m pip install --upgrade wheel setuptools pip
This line becomes

python -m pip install --upgrade wheel setuptools pip==24.0 

This is because newer versions of pip have made certain import styles invalid and celery4 uses these.

Copy link
Contributor

@tjacovich tjacovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but you're definitely moving in the right direction!

@@ -19,7 +19,10 @@
from copy import deepcopy
import sys
from sqlalchemy.dialects.postgresql import insert

import os
import pdb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just flagging this to remove for later.

# target_metadata = mymodel.Base.metadata
target_metadata = None
from adsmp import models
target_metadata = models.Base.metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. When we go to deploy we need to remember to set the PYTHONPATH env variable in the container or else this may throw an error.


:param sitemap_url: string, url of the sitemap
"""
with open(sitemap_dir+'/robots.txt', 'w') as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. Wondering if we can condense this into a single call with an import from a file or something as opposed to having this hard-coded into app.

@@ -2,7 +2,7 @@
# serves as a running log of claims and storage of author-related
# information). It is not consumed by others (ie. we 'push' results)
# SQLALCHEMY_URL = 'postgres://docker:docker@localhost:6432/docker'
SQLALCHEMY_URL = "sqlite:///"
SQLALCHEMY_URL = "postgresql://master_pipeline:master_pipeline@ingest_pipeline_test_db:5432/master_pipeline"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to move this to a local_config.py because IIRC, the unittests on github use sqlite and rely on this being set as SQLALCHEMY_URL = "sqlite:///"

@@ -304,6 +308,82 @@ def task_delete_documents(bibcode):
else:
logger.debug('Failed to deleted metrics record: %s', bibcode)

@app.task(queue='populate-sitemap-table')
def task_populate_sitemap_table(bibcodes, action = 'add', sitemap_dir='/app/logs/sitemap/'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be safer to force the user to explicitly define action instead of assuming they want to add by default.

app.delete_contents(SitemapInfo)

# move all sitemap files to a backup directory
app.backup_sitemap_files(sitemap_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a use-case for this to be configurable?

Populate the sitemap table for the given bibcodes
"""

if action not in ['add', 'remove', 'force-update', 'delete-table']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the remove key do?

#unique sitemap filenames
sitemap_filenames = list(map(lambda item: item[0], sitemapinfo))

for file in sitemap_filenames:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth converting this into a task so the files can be generated in parallel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants