Frequently asked questions
Q: How do I get the last node? //foo//bar returns all bars, but I only want the last one, and //foo//bar[last()] did not work.
<div>
<foo>
<bar>First </bar>
<bar>Second </bar>
</foo>
<foo>
<bar>Third </bar>
<bar>Fourth </bar>
</foo>
</div>
A: //foo//bar[last()] returns the last bar of each parent, in the example Second and Fourth.
You need (//foo//bar)[last()] to get the last of all bars.
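To try both expressions on the sample document from the question, here is a small sketch. The file name sample.xml is only an assumption, and the trailing spaces inside the bar elements are dropped to keep the output tidy.

```shell
# Save the sample document; the file name sample.xml is an assumption for this sketch.
cat > sample.xml <<'EOF'
<div>
  <foo>
    <bar>First</bar>
    <bar>Second</bar>
  </foo>
  <foo>
    <bar>Third</bar>
    <bar>Fourth</bar>
  </foo>
</div>
EOF

# last() is evaluated per parent, so this selects one bar for each foo.
xidel -s sample.xml -e '//foo//bar[last()]'

# The parentheses collect all bars first, then last() picks a single node.
xidel -s sample.xml -e '(//foo//bar)[last()]'
```

The first command should select two nodes and the second only one.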
Q: I want to extract the title attribute from links whose href contains the string "contentFile.aspx".
This command returns the href, but I do not know how to get the title contents instead.
xidel 'http://www.coorong.sa.gov.au/page.aspx?u=1813' --xquery '//a/@href[contains(., "contentFile.aspx")]'
A: You can go back from the @href to the corresponding a:
xidel 'http://www.coorong.sa.gov.au/page.aspx?u=1813' --xquery '//a/@href[contains(., "contentFile.aspx")]/../@title'
Or you can put the condition on the a:
xidel 'http://www.coorong.sa.gov.au/page.aspx?u=1813' --xquery '//a[@href[contains(., "contentFile.aspx")]]/@title'
or
xidel 'http://www.coorong.sa.gov.au/page.aspx?u=1813' --xquery '//a[contains(@href, "contentFile.aspx")]/@title'
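If the page is not reachable, the three variants can be compared on a local file. The following is a sketch only: links.html and its two links are made up for illustration and are not taken from the real page.

```shell
# Hypothetical stand-in for the page; file name and contents are assumptions.
cat > links.html <<'EOF'
<html><body>
<a href="contentFile.aspx?id=1" title="Annual report">Report</a>
<a href="page.aspx?u=2" title="Contact">Contact</a>
</body></html>
EOF

# Step back from the matching @href to its a, then take @title.
xidel -s links.html --xquery '//a/@href[contains(., "contentFile.aspx")]/../@title'

# Put the condition on the a itself (two equivalent spellings).
xidel -s links.html --xquery '//a[@href[contains(., "contentFile.aspx")]]/@title'
xidel -s links.html --xquery '//a[contains(@href, "contentFile.aspx")]/@title'
```

All three commands select the title attribute of the first link only.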
Q: How do you find tags which include a certain text?
A: You can use contains or matches on these nodes. E.g.
xidel input.html -e '//*[contains(., "searched text")]'
This finds all nodes containing the text as well as their ancestors, because a node that contains a node containing the text contains the text, too.
To find the nodes without ancestors, you can check only the direct text of the nodes:
xidel input.html -e '//*[text()[contains(., "searched text")]]'
This is also much faster. However, text that spans multiple nodes is not found, e.g. in <span>foo<b>bar</b></span> either foo or bar can be found with text(), but not foobar.
When "searched text" is a regular expression, you can use matches in place of contains.
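The difference between the variants can be checked on the span example. This is a minimal sketch: the file name span.xml, the wrapping div, and the regular expression "^fo" are assumptions chosen for illustration. name() is appended so that the element names are printed instead of their text.

```shell
# The span example wrapped in a div; the file name span.xml is an assumption.
cat > span.xml <<'EOF'
<div><span>foo<b>bar</b></span></div>
EOF

# Whole string value of each node: matches span and its ancestor div.
xidel -s span.xml -e '//*[contains(., "foobar")]/name()'

# Direct text nodes only: "foobar" spans two nodes, so nothing is selected.
xidel -s span.xml -e '//*[text()[contains(., "foobar")]]/name()'

# matches takes a regular expression; here a direct text node starting with "fo".
xidel -s span.xml -e '//*[text()[matches(., "^fo")]]/name()'
```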
Q: How do I return a default value if the input is empty?
A: For inputs that have at most one value, use:
(input, "default value")[1]
[1] returns the first value of a sequence, so it returns input if input exists. If input is empty, the sequence becomes ("default value")[1], so it returns "default value".
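As a small illustration of this pattern, the sketch below queries one element that exists and one that does not. The file doc.xml and the element names title and subtitle are made up for the example.

```shell
# doc.xml is a made-up sample with a title element but no subtitle element.
cat > doc.xml <<'EOF'
<doc><title>Hello</title></doc>
EOF

# The element exists, so its value comes first in the sequence.
xidel -s doc.xml -e '(//title, "default value")[1]'

# The element is missing, so the sequence only holds the default.
xidel -s doc.xml -e '(//subtitle, "default value")[1]'
```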
Q: How do I delete the div from
<div>
<span>I want to keep this</span>
<div class="I_want_to_delete_this">
<span>blah blah</span>
</div>
<span>I want to keep this too</span>
</div>
to get something like
<div>
<span>I want to keep this</span>
<span>I want to keep this too</span>
</div>
?
A: All data is immutable, so you cannot delete something from a document, but you can create a new document without these nodes.
For example, using the x:replace-nodes function:
xidel --xml -e 'x:replace-nodes(//div[@class="I_want_to_delete_this"],())' xx.xml
Or the x:transform-nodes function:
xidel -s input.xml -e '
  x:transform-nodes(
    /,
    function($x){
      if (name($x)="div" and $x[@class="I_want_to_delete_this"])
      then ()
      else $x
    }
  )
' --output-node-format=xml --output-node-indent
or
xidel -s input.xml -e '
  let $delete:=//div[@class="I_want_to_delete_this"] return
  x:transform-nodes(
    /,
    function($x){if ($delete[$x is .]) then () else $x}
  )
' --output-node-format=xml --output-node-indent
or
xidel -s input.xml -e '
  let $delete:=//div[@class="I_want_to_delete_this"] return
  x:transform-nodes(
    /,
    function($x){$x[not($delete[$x is .])]}
  )
' --output-node-format=xml --output-node-indent
Q: Is there any way of processing output from another script in xidel, i.e. is there any option to tell xidel to grab the content like this: grep foobar test.html | xidel ...
A: If you give it a dash (-) as the file name, it reads the piped input from standard input.
grep foobar test.html | xidel - ...
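A self-contained sketch of the same idea, where printf stands in for any command that writes HTML to standard output. The two links are made up for the example.

```shell
# printf stands in for any command that produces HTML; the links are hypothetical.
printf '<html><body><a href="first.html">one</a><a href="second.html">two</a></body></html>' |
  xidel -s - -e '//a/@href'
```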
Also look here for things to avoid: https://github.com/benibela/xidel/wiki/Caveats