Pull standard names into Python #144

Closed
lhmarsden opened this issue Dec 12, 2022 · 10 comments

@lhmarsden

lhmarsden commented Dec 12, 2022

Hi,

I would like to pull the latest version of the standard names - including descriptions, units and the grouping - into Python for a template generator I am building. Is there a way to do this?

Of course I could pull them from here: http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html and configure something manually.

But ideally I would like to build something more future-proof, accounting for any additions to the standard names list and also any possible reformatting of the above page. I assume this page pulls data from somewhere. Is that 'somewhere' publicly accessible? Perhaps an API?

Can you please help me with this?

Thanks!

Luke

@lhmarsden lhmarsden added the question Further information is requested label Dec 12, 2022
@MathewBiddle

MathewBiddle commented Dec 12, 2022

You could use pandas to read the xml endpoint?

https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml

For example,

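import pandas as pd

# Each <entry> element in the table becomes one DataFrame row
# (pandas' default XML parser requires lxml to be installed).
url = 'https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml'
df = pd.read_xml(url, xpath="entry")

df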
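
If pandas is not available, the same information can probably be pulled out with the standard library alone. The following is only a minimal sketch: it assumes each `<entry>` element carries the standard name in its `id` attribute and has `<canonical_units>` and `<description>` children. The function name `parse_standard_names` and the `SAMPLE` text are made up for illustration and are not real table content.

```python
import xml.etree.ElementTree as ET


def parse_standard_names(xml_text):
    """Return {standard_name: {"units": ..., "description": ...}}.

    Assumes the layout described above; this is an illustration,
    not an official reader for the table.
    """
    root = ET.fromstring(xml_text)
    names = {}
    for entry in root.iter("entry"):
        names[entry.get("id")] = {
            # findtext falls back to the default if the child is missing.
            "units": entry.findtext("canonical_units", default="").strip(),
            "description": entry.findtext("description", default="").strip(),
        }
    return names


# Tiny hand-written sample in the assumed layout (not real table content).
SAMPLE = """<standard_name_table>
  <version_number>79</version_number>
  <entry id="sea_ice_thickness">
    <canonical_units>m</canonical_units>
    <description>Example description.</description>
  </entry>
  <entry id="air_temperature">
    <canonical_units>K</canonical_units>
    <description></description>
  </entry>
</standard_name_table>"""

names = parse_standard_names(SAMPLE)
print(names["sea_ice_thickness"]["units"])
```

For the real table, the text or bytes downloaded from the XML URL above would be passed in place of `SAMPLE`.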

@lhmarsden
Author

Thanks! Just what I need.

Do you know if the information to group them is somewhere too? I couldn't see this in the XML. For example, if I select 'Sea Ice' on this page: http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html

It would be super if I could group the standard names in the same way you do - accounting also for any new terms or changes in grouping.

@zklaus

zklaus commented Dec 12, 2022

If you take a look at the JavaScript on that web page, you will see that this grouping is really just a text search with carefully selected keywords: it hides the HTML table rows that do not match.
After you click on a group category, you can see the exact filter that was applied in the search options of the form.
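
A keyword-based grouping like the one described above could be reproduced in Python roughly as follows. This is a sketch under assumptions: the group labels and regular expressions in `GROUP_PATTERNS` are hypothetical placeholders, and the actual filters would have to be copied from the search form on the web page.

```python
import re

# Hypothetical group labels and keyword patterns, for illustration only.
# The real filters are the ones shown in the web page's search form.
GROUP_PATTERNS = {
    "Sea Ice": r"sea_ice",
    "Atmospheric Chemistry": r"aerosol|ozone",
}


def group_names(standard_names, patterns=GROUP_PATTERNS):
    """Assign each name to every group whose pattern matches it."""
    compiled = {group: re.compile(pattern) for group, pattern in patterns.items()}
    groups = {group: [] for group in patterns}
    for name in standard_names:
        for group, regex in compiled.items():
            if regex.search(name):
                groups[group].append(name)
    return groups


example = [
    "sea_ice_thickness",
    "air_temperature",
    "mass_concentration_of_ozone_in_air",
]
groups = group_names(example)
print(groups["Sea Ice"])
```

Because the grouping is a plain text search, a name can land in several groups or in none, and newly added standard names are picked up automatically as long as they contain one of the keywords.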

@sadielbartholomew
Member

sadielbartholomew commented Dec 12, 2022

Hi @lhmarsden, if you are willing to 'pull' them from the canonical source for the tables (i.e. the directories organised by table number under https://github.com/cf-convention/cf-convention.github.io/tree/main/Data/cf-standard-names) rather than from the rendered site, which would be more robust and in line with your desire here:

ideally I would like to build something more future-proof, accounting for any additions to the standard names list and also any possible reformatting of the above page.

then I already have some Python code that gets all of the names from any (or all) version(s) of the table and outputs them as a dictionary (it uses regular expressions to parse the XML which might not be the simplest way but it works a charm and is quick and robust, so good enough). I wrote those functions to allow me to use the outputs to create the plots describing totals and nature of the standard name sets as described in the issue here: cf-convention/cf-convention.github.io#110, but I realise there could be wider use for the code.

I had that code on a personal git branch but have since moved it to tidy it up, so the current working code is not available for me to share yet, but I can put it up somewhere shortly if this is the kind of thing you are looking for?

I should add, my code presently doesn't pull in the further information such as:

including descriptions, units and the grouping

but it can be trivially adapted to include this information too. If you would like, and can give me a few days to find the time, I can make those adaptations so that the code I share includes them.
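
For readers curious what a regular-expression approach like the one mentioned above might look like, here is a rough sketch. It is not @sadielbartholomew's code, which was not shared in this thread. It assumes entries are written as `<entry id="...">...</entry>` with a `<canonical_units>` child, and the names `names_with_units`, `names_by_version` and `TABLES` are invented for this example.

```python
import re

# Assumed layout: <entry id="NAME"> ... </entry> with a <canonical_units> child.
ENTRY_RE = re.compile(r'<entry id="(?P<name>[^"]+)">(?P<body>.*?)</entry>', re.DOTALL)
UNITS_RE = re.compile(r"<canonical_units>(.*?)</canonical_units>", re.DOTALL)


def names_with_units(xml_text):
    """Map each standard name found in one table to its canonical units."""
    result = {}
    for match in ENTRY_RE.finditer(xml_text):
        units = UNITS_RE.search(match.group("body"))
        result[match.group("name")] = units.group(1).strip() if units else ""
    return result


def names_by_version(tables):
    """tables maps a version number to that version's XML text."""
    return {version: names_with_units(text) for version, text in tables.items()}


# Hand-written stand-ins for two table versions (not real table content).
TABLES = {
    78: '<standard_name_table><entry id="air_temperature">'
        "<canonical_units>K</canonical_units></entry></standard_name_table>",
    79: '<standard_name_table><entry id="air_temperature">'
        "<canonical_units>K</canonical_units></entry>"
        '<entry id="sea_ice_thickness">'
        "<canonical_units>m</canonical_units></entry></standard_name_table>",
}

by_version = names_by_version(TABLES)
new_in_79 = sorted(set(by_version[79]) - set(by_version[78]))
print(new_in_79)
```

A proper XML parser is usually the safer choice, but for a regular, machine-generated file a regex pass like this can be fast and makes it easy to compare versions.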

@lhmarsden
Author

Thanks all for your interesting and helpful replies!

I think I will go with @MathewBiddle's suggestion and pull the data from the XML, and then group the terms using a text search. I hope this will be suitable over short to medium time scales, and the approach is very simple, so it should be easy to adapt the code as necessary in the future.

Thanks, @sadielbartholomew. I see that your solution is indeed more future-proof, but I will stick with the simpler approach in this case. And thanks for your generous offer of help.

@DocOtak
Member

DocOtak commented Dec 13, 2022

@lhmarsden In the hope that it might be useful, here is some code I use to load the XML table into an SQLite database. The link points to just the XML-reading part: https://github.com/cchdo/params/blob/ce69f81afdc92e2128494198539362549d4f2880/cchdo/params/__main__.py#L26-L60

It does have a check to make sure I'm loading the standard name table version it is expecting; that check could be removed.
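
The general idea of loading the table into SQLite with a version check can be sketched as below. This is not the linked cchdo code; it is a simplified illustration that assumes a `<version_number>` element under the root and `<entry>` elements with `<canonical_units>` and `<description>` children. The table name `standard_names`, the function `load_into_sqlite` and the `SAMPLE` text are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

EXPECTED_VERSION = 79  # assumption: the version this script was written against


def load_into_sqlite(xml_text, conn, expected_version=EXPECTED_VERSION):
    """Insert every <entry> into a standard_names table; return the row count.

    Pass expected_version=None to skip the version check.
    """
    root = ET.fromstring(xml_text)
    version = int(root.findtext("version_number"))
    if expected_version is not None and version != expected_version:
        raise ValueError(f"expected table version {expected_version}, got {version}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS standard_names "
        "(name TEXT PRIMARY KEY, units TEXT, description TEXT)"
    )
    rows = [
        (e.get("id"), e.findtext("canonical_units", ""), e.findtext("description", ""))
        for e in root.iter("entry")
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO standard_names VALUES (?, ?, ?)", rows)
    return len(rows)


# Hand-written sample in the assumed layout (not real table content).
SAMPLE = """<standard_name_table>
  <version_number>79</version_number>
  <entry id="sea_ice_thickness">
    <canonical_units>m</canonical_units>
    <description>Example description.</description>
  </entry>
</standard_name_table>"""

conn = sqlite3.connect(":memory:")
count = load_into_sqlite(SAMPLE, conn)
units = conn.execute(
    "SELECT units FROM standard_names WHERE name = ?", ("sea_ice_thickness",)
).fetchone()[0]
print(count, units)
```

When always pulling the latest table, the version check would presumably be disabled by passing `expected_version=None`.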

@MathewBiddle

@lhmarsden you can adjust the url in that code to point to the current xml document hosted on GitHub (which I just learned about through this conversation, so thank you for presenting this opportunity to learn something new):

https://github.com/cf-convention/cf-convention.github.io/raw/main/Data/cf-standard-names/current/src/cf-standard-name-table.xml

That way you pull over the most recent table every time you run the code.

@lhmarsden
Author

Very useful, thanks all

@lhmarsden
Author

> @lhmarsden you can adjust the url in that code to point to the current xml document hosted on GitHub (which I just learned about through this conversation, so thank you for presenting this opportunity to learn something new):
>
> https://github.com/cf-convention/cf-convention.github.io/raw/main/Data/cf-standard-names/current/src/cf-standard-name-table.xml
>
> That way you pull over the most recent table every time you run the code.

I think you can also use this, which is a slightly neater URL:

https://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml
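
Since the `current` URL is not tied to a table number, it can be handy to record which version was actually downloaded. The sketch below uses only the standard library and assumes the table's root element has a `<version_number>` child. The helper names `fetch_table` and `table_version` are invented here, and the network call is left commented out so the snippet runs offline.

```python
import urllib.request
import xml.etree.ElementTree as ET

CURRENT_URL = (
    "https://cfconventions.org/Data/cf-standard-names/current/src/"
    "cf-standard-name-table.xml"
)


def fetch_table(url=CURRENT_URL, timeout=30):
    """Download the table and return the raw XML bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def table_version(xml_bytes):
    """Read the version number, assuming a <version_number> child of the root."""
    root = ET.fromstring(xml_bytes)
    return int(root.findtext("version_number"))


# Typical use (requires network access):
# xml_bytes = fetch_table()
# print("Downloaded table version:", table_version(xml_bytes))
```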

@JonathanGregory
Contributor

This question has been answered, so I'm closing this issue. I have opened website issue 408 to propose that we provide a link to https://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml, the URL suggested by Luke @lhmarsden. Thanks, Luke.

@efisher008 efisher008 transferred this issue from cf-convention/discuss Jul 29, 2024