PF2 JSON data project H1 problem #12

Open
lillo84 opened this issue Mar 27, 2020 · 6 comments

Comments

@lillo84
Collaborator

lillo84 commented Mar 27, 2020

If you are referring to the h1 inside the element with id ctl00_MainContent_DetailedOutput, I also found the error: it comes from incorrectly closed HTML on the page.

So I changed the code from this:
    main = soup.find("span", {'id': 'ctl00_MainContent_DetailedOutput'})
    #print (main)
    pfsLegal = main.find("img", {'title': 'PFS Legal'})
    if(pfsLegal):
        pfsLegal = True
    else:
        pfsLegal = False
    name = main.find("h1", {'class': 'title'}).text

to this:
    main = soup.find("div", {'id': 'main'})
    #print (main)
    pfsLegal = main.find("img", {'title': 'PFS Standard'})
    if(pfsLegal):
        pfsLegal = True
    else:
        pfsLegal = False
    # temporary for error into html source
    for finder in main.find_all("a", {'href': eachEntry}):
        name = finder.text
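The two lookups above could also be combined so the script tries the original selector first and only falls back to the link text when the h1 is missing. This is only a rough sketch: `find_entry_name`, `entry_href` (standing in for `eachEntry`), and the sample markup are made up for illustration and are not taken from the project or from the real page source.

```python
from bs4 import BeautifulSoup


def find_entry_name(soup, entry_href):
    """Return the entry name, or None if neither lookup matches."""
    # First choice: the original container and its <h1 class="title">.
    main = soup.find("span", {"id": "ctl00_MainContent_DetailedOutput"})
    if main is not None:
        title = main.find("h1", {"class": "title"})
        if title is not None:
            return title.get_text(strip=True)

    # Fallback: the broader <div id="main"> and the link to the entry.
    main = soup.find("div", {"id": "main"})
    if main is not None:
        link = main.find("a", {"href": entry_href})
        if link is not None:
            return link.get_text(strip=True)

    return None


# Hypothetical page fragment, not real site markup.
sample = '<div id="main"><a href="Spells.aspx?ID=1">Fireball</a></div>'
soup = BeautifulSoup(sample, "html.parser")
name = find_entry_name(soup, "Spells.aspx?ID=1")
```

Returning None instead of raising lets the caller decide whether to skip the entry or log it for a manual check.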

@jimbarnesrtp
Owner

Thanks, I saw it this week and put a note in the README about it. I asked AoN to fix their HTML, as it breaks many things.

@lillo84
Collaborator Author

lillo84 commented Mar 27, 2020

Nice one, sir. I hope everything goes well.

@jimbarnesrtp
Owner

I am struggling with this one. Philosophically, I was aiming for event-driven parsing, similar to the Java StAX XML parsing pattern, instead of hard-coding everything I was looking for. But the broken HTML breaks most of that, and instead of being event-driven I would have to rewrite the code to look for specific things. Thoughts? I understand this way is of course more fragile; it is just frustrating that bad HTML breaks it, when the HTML should be right.
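For anyone following along, here is roughly what the event-driven idea looks like with Python's standard-library `html.parser`. It is a sketch, not the project's actual code: the `TitleEvents` class and both sample fragments are invented for illustration. It also shows the failure described above, because when the h1 is never closed, the end event never fires and no title is collected.

```python
from html.parser import HTMLParser


class TitleEvents(HTMLParser):
    """Collect the text of each <h1 class="title"> as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "title" in classes:
            self._in_title = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_title:
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False


# Well-formed (hypothetical) fragment: the title event completes.
good = TitleEvents()
good.feed('<span><h1 class="title">Fireball <img title="PFS Standard"></h1><p>Text</p></span>')
good.close()

# Same fragment with the </h1> missing: no end event, so nothing is collected.
bad = TitleEvents()
bad.feed('<span><h1 class="title">Fireball<p>Text</p></span>')
bad.close()
```

A parser that repairs the tree before the lookup runs avoids the second case, at the cost of no longer being purely event-driven.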

@lillo84
Collaborator Author

lillo84 commented Apr 2, 2020

I know, human error is frustrating, but the basic idea is right.
In a few years, when there is a lot more content, it will take less time to modify the script than to copy everything manually.
I would like to understand whether there is instead a way to get the data directly from the d20pfsrd database.

Please also read the other post I made on the differences between Nethys and d20pfsrd, because I believe licensed content is more of a problem on Nethys than on d20pfsrd.

@jimbarnesrtp
Owner

Sorry, it has been busy. Yes, the d20 content uses the more open version of the names for many things, so that is something to consider.

@jimbarnesrtp
Owner

I am working on rewrites for some of these things. Switching to html5lib as the parser is also making it handle the parsing better. I am adding tests to the code as well, and pulling from the datasets coming from Archives of Nethys.
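A small sketch of what the parser switch might look like with BeautifulSoup, assuming html5lib is an optional install. The `make_soup` helper and the broken sample are hypothetical. `FeatureNotFound` is the exception bs4 raises when the requested parser is not available, so the built-in parser is used as a fallback here.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with html5lib when installed, otherwise the built-in parser."""
    try:
        return BeautifulSoup(html, "html5lib")
    except FeatureNotFound:
        return BeautifulSoup(html, "html.parser")


# Hypothetical fragment with unclosed <p> tags, not real site markup.
broken = '<div id="main"><h1 class="title">Fireball</h1><p>Unclosed<p>Next</div>'
soup = make_soup(broken)
title = soup.find("h1", {"class": "title"}).get_text(strip=True)
```

html5lib repairs malformed markup the way a browser would, which is slower but tends to give a more predictable tree on pages like these.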
