Need help with whitespace, indent, new lines in XML/Oxygen #53

Open
davidamichelson opened this issue Dec 17, 2019 · 5 comments

@davidamichelson

davidamichelson commented Dec 17, 2019

@wsalesky I think the default settings in Oxygen are doing something odd to the spacing/indentation in our files.

See for example: https://github.com/srophe/bethqatraye-data/blob/master/data/places/tei/143.xml#L138-L139

Why is this desc text node splitting into a new line like that?

Or why is the closing /desc on a new line here: https://github.com/srophe/bethqatraye-data/blob/master/data/places/tei/143.xml#L142-L143

Anyway, we would like to write some find-and-replace scripts using regex to clean this data up, but we are having trouble because of the spacing. Any ideas on what is going on?

@wsalesky
Collaborator

@davidamichelson @wlpotter is this a result of oXygen or a result of the eXist export? It looks to me like the file was 'pretty-printed' in oXygen at some point; this shouldn't change meaningful whitespace.

@davidamichelson
Author

@wsalesky thanks. We would like to use regex to find and correct all cases where the final "." is missing at the end of either //desc or //desc/quote, but we can't seem to get around the new lines in Oxygen. Any ideas?
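
One way to keep a regex from tripping over the line breaks is to let `\s*` absorb any whitespace, newlines included, between the last character of the text and the closing tag. The Python sketch below is only an illustration under assumptions: the function name `add_missing_periods` is made up here, the pattern treats any final character other than "." as a missing period (so text ending in "?" or a quotation mark would also get one), and the replacement drops the whitespace before the closing tag. It deliberately skips cases where the closing tag follows another closing tag such as `</ref>`, because a plain regex cannot tell where the text inside that element ended.

```python
import re

# A character that is not whitespace, "." or ">", then optional whitespace
# (including newlines), then a closing </desc> or </quote> tag.
MISSING_PERIOD = re.compile(r"([^\s.>])\s*(</(?:desc|quote)>)")


def add_missing_periods(xml_text: str) -> str:
    """Insert "." before </desc> or </quote> when the text lacks one.

    Hypothetical helper for illustration; it works on raw text and does
    not parse the XML.
    """
    return MISSING_PERIOD.sub(r"\1.\2", xml_text)


if __name__ == "__main__":
    broken = "<desc>A region\n               </desc>"
    print(add_missing_periods(broken))  # <desc>A region.</desc>
```

Text that already ends in a period, and a `</desc>` that directly follows `</quote>` or `</ref>`, are left unchanged by this pattern.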

@davidamichelson
Author

We think the problem was caused by Oxygen.

@wsalesky
Collaborator

What is the regex you are trying?

@davidamichelson
Author

@wsalesky Let's save this for later.

When we do look at it, record 2994 is a good example:

               <desc type="abstract" xml:lang="en" xml:id="abstract2994-1">A region between <ref target="https://bqgazetteer.bethmardutho.org/place/2970"><placeName ref="http://syriaca.org/place/2970">al-Ray</placeName></ref> and <ref target="https://bqgazetteer.bethmardutho.org/place/2997"><placeName ref="http://syriaca.org/place/2997">Naysābūr</placeName></ref>, around <ref target="https://bqgazetteer.bethmardutho.org/place/2995"><placeName ref="http://syriaca.org/place/2995">al-Dāmaghān</placeName></ref>
               </desc>
               <desc xml:lang="en">
                        <quote source="#bib2994-4">[A] small province of
                  mediaeval Islamic Persia, lying to the south of the Alburz chain <choice>
                                <corr>watershed</corr>
                                <sic>watershd</sic>
                            </choice> and
                  extending into the northern fringes of the Das̲h̲t-i Kavīr.</quote>
                    </desc>
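
In a record like this one, the abstract ends with `</ref>` followed by a line break, so a text-level regex has little to hold on to. A possible alternative, sketched here in Python with the standard-library `xml.etree.ElementTree`, is to parse the file and test the whitespace-normalized text of each `desc` and `desc/quote`. Everything named below is an assumption for illustration: the `<place>` wrapper, the shortened `SAMPLE` (attributes dropped), and the helpers `local_name`, `normalized_text`, and `missing_final_period` do not come from the project. Tags are matched by local name so the sketch should work whether or not the TEI namespace is declared.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the record above, wrapped in a root element.
SAMPLE = """<place>
  <desc type="abstract">A region between <ref><placeName>al-Ray</placeName></ref> and <ref><placeName>Naysābūr</placeName></ref>, around <ref><placeName>al-Dāmaghān</placeName></ref>
  </desc>
  <desc>
    <quote>[A] small province of
      mediaeval Islamic Persia <choice>
        <corr>watershed</corr>
        <sic>watershd</sic>
      </choice> and
      extending into the northern fringes of the Das̲h̲t-i Kavīr.</quote>
  </desc>
</place>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def normalized_text(el: ET.Element) -> str:
    """Join all descendant text and collapse whitespace runs to one space."""
    return " ".join("".join(el.itertext()).split())


def missing_final_period(xml_text: str) -> list[tuple[str, str]]:
    """Return (tag, text) for each desc or desc/quote not ending in '.'."""
    root = ET.fromstring(xml_text)
    hits = []
    for desc in root.iter():
        if local_name(desc.tag) != "desc":
            continue
        quotes = [child for child in desc if local_name(child.tag) == "quote"]
        for el in [desc, *quotes]:
            text = normalized_text(el)
            if text and not text.endswith("."):
                hits.append((local_name(el.tag), text))
    return hits


for tag, text in missing_final_period(SAMPLE):
    print(tag, "->", text)
```

On this sample only the abstract is reported, because the quote (and therefore its parent `desc`) already ends in a period once the indentation is normalized away. The check only reads the file; writing the corrections back with ElementTree would also change serialization details, so that step is left out here.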

davidamichelson modified the milestones: Dec 18, Later (Dec 18, 2019)