This repository has been archived by the owner on Dec 4, 2023. It is now read-only.

Remove "files" from default skips? #27

Open
rsignell-usgs opened this issue Aug 21, 2020 · 2 comments

Comments

@rsignell-usgs
Member

rsignell-usgs commented Aug 21, 2020

@kwilcox We just spent an hour trying to figure out why some datasets from our catalog were not getting picked up by the crawler. Eventually we found the problem: the path looked like /models/model_a/run27/output_files/catalog.ncml, and it was being rejected by the default skips because it contains "files".

The default skips are:

[
  '.*files.*',
  '.*Individual Files.*',
  '.*File_Access.*',
  '.*Forecast Model Run.*',
  '.*Constant Forecast Offset.*',
  '.*Constant Forecast Date.*'
]

Could we remove the .*files.* line, or if we need it for some common use case, make it more specific, like .*files$?

https://github.com/ioos/thredds_crawler/blob/master/thredds_crawler/crawl.py#L55
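
To illustrate the difference between the two patterns, here is a minimal sketch using only Python's `re` module. It assumes the skip patterns are applied with `re.match` against a path-like string, which may not be exactly what the crawler does internally; the second test path is hypothetical.

```python
import re

# Path from the report above.
path = "/models/model_a/run27/output_files/catalog.ncml"

# Current default skip versus the proposed, more specific one.
broad = re.compile(r".*files.*")
anchored = re.compile(r".*files$")

# Assumption: patterns are tested with re.match, as done here.
broad_hit = bool(broad.match(path))        # "files" appears anywhere in the string
anchored_hit = bool(anchored.match(path))  # string must end in "files"

# Hypothetical path that really does end in "files".
anchored_on_dir = bool(anchored.match("/models/model_a/run27/files"))

print(broad_hit, anchored_hit, anchored_on_dir)  # True False True
```

Note that `.*files$` would still reject any name that ends in "files", such as a dataset named `output_files`, so whether it is specific enough depends on what the pattern is matched against.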

@kwilcox
Member

kwilcox commented Aug 21, 2020

You can supply a custom list of skips; see https://github.com/ioos/thredds_crawler#skip

@rsignell-usgs
Member Author

Yes, that's how we solved it. I'm just wondering if maybe it would be nicer to make that a more specific skip if we can.
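
For anyone hitting the same problem, the workaround can be sketched roughly as follows. The default list is copied from the top of this issue; the `is_skipped` helper is hypothetical and only mimics skip matching with `re.match`, which is an assumption about how the crawler applies the patterns.

```python
import re

# Default skips as quoted at the top of this issue.
DEFAULT_SKIPS = [
    ".*files.*",
    ".*Individual Files.*",
    ".*File_Access.*",
    ".*Forecast Model Run.*",
    ".*Constant Forecast Offset.*",
    ".*Constant Forecast Date.*",
]

# Custom list: keep every default except the broad "files" pattern.
custom_skips = [p for p in DEFAULT_SKIPS if p != ".*files.*"]


def is_skipped(name, skips):
    """Hypothetical helper: True if any skip pattern matches the name."""
    return any(re.match(pattern, name) for pattern in skips)


path = "/models/model_a/run27/output_files/catalog.ncml"
print(is_skipped(path, DEFAULT_SKIPS))  # True
print(is_skipped(path, custom_skips))   # False

# With thredds_crawler installed, the README's skip section suggests the
# list is passed to the crawler along the lines of:
#   Crawl(catalog_url, skip=custom_skips)
```

The other defaults still apply, so entries such as "Forecast Model Run" collections continue to be skipped.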
