-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created Sitemap database and support functions #180
base: master
Are you sure you want to change the base?
Conversation
@tjacovich there are still some pending tasks that I'm working on. See JIRA issue: https://adsabs.atlassian.net/jira/software/c/projects/BKO/boards/16?assignee=712020%3Affa7dce0-66c4-49ac-beed-3b744e70e1ff&selectedIssue=BKO-183 |
@mugdhapolimera Will take a look. Also, we need to update the github actions file in |
We need to make a small update to the github action
python -m pip install --upgrade wheel setuptools pip==24.0 This is because newer versions of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but you're definitely moving in the right direction!
@@ -19,7 +19,10 @@ | |||
from copy import deepcopy | |||
import sys | |||
from sqlalchemy.dialects.postgresql import insert | |||
|
|||
import os | |||
import pdb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just flagging this to remove for later.
# target_metadata = mymodel.Base.metadata | ||
target_metadata = None | ||
from adsmp import models | ||
target_metadata = models.Base.metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good. When we go to deploy we need to remember to set the PYTHONPATH
env variable in the container or else this may throw an error.
|
||
:param sitemap_url: string, url of the sitemap | ||
""" | ||
with open(sitemap_dir+'/robots.txt', 'w') as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. Wondering if we can condense this into a single call with an import from a file or something as opposed to having this hard-coded into app
.
@@ -2,7 +2,7 @@ | |||
# serves as a running log of claims and storage of author-related | |||
# information). It is not consumed by others (ie. we 'push' results) | |||
# SQLALCHEMY_URL = 'postgres://docker:docker@localhost:6432/docker' | |||
SQLALCHEMY_URL = "sqlite:///" | |||
SQLALCHEMY_URL = "postgresql://master_pipeline:master_pipeline@ingest_pipeline_test_db:5432/master_pipeline" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to move this to a local_config.py
because IIRC, the unittests on github use sqlite and rely on this being set as SQLALCHEMY_URL = "sqlite:///"
@@ -304,6 +308,82 @@ def task_delete_documents(bibcode): | |||
else: | |||
logger.debug('Failed to deleted metrics record: %s', bibcode) | |||
|
|||
@app.task(queue='populate-sitemap-table') | |||
def task_populate_sitemap_table(bibcodes, action = 'add', sitemap_dir='/app/logs/sitemap/'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be safer to force the user to explicitly define action
instead of assuming they want to add
by default.
app.delete_contents(SitemapInfo) | ||
|
||
# move all sitemap files to a backup directory | ||
app.backup_sitemap_files(sitemap_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a use-case for this to be configurable?
Populate the sitemap table for the given bibcodes | ||
""" | ||
|
||
if action not in ['add', 'remove', 'force-update', 'delete-table']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the remove
key do?
#unique sitemap filenames | ||
sitemap_filenames = list(map(lambda item: item[0], sitemapinfo)) | ||
|
||
for file in sitemap_filenames: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth converting this into a task so the files can be generated in parallel.
No description provided.