Get the article URL when extracting data #16

Open
brandonStell opened this issue Apr 30, 2019 · 5 comments
Comments

@brandonStell

We could do it based on DOIs using something like this:

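<?php
// Resolve a DOI to the target of the first redirect sent by the DOI resolver.
$doi = $argv[1];
$url = 'http://dx.doi.org/' . $doi;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);           // include response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);  // stop at the first redirect
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the response as a string
$a = curl_exec($ch);
// Header names are case-insensitive, so also accept "location:".
$l = null;
if ($a !== false && preg_match('#^Location:[ \t]*(.*)$#mi', $a, $r)) {
    $l = trim($r[1]);
}
// Print the result (empty if no redirect was found); a top-level return prints nothing.
echo $l, PHP_EOL;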

For things that don't have a DOI (only arXiv?) we can make the URL from the ID:

$arxivID = $argv[1];
$url = 'https://arxiv.org/abs/' . $arxivID;
echo $url, PHP_EOL;
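
To illustrate the two cases above in one place, here is a minimal Python sketch that builds a candidate URL from either a DOI or an arXiv ID. The function name `candidate_url` and both regular expressions are assumptions made for this example: the DOI pattern only checks for the usual `10.<registrant>/<suffix>` shape, and the arXiv pattern only covers new-style IDs such as `1904.12345`. For a DOI, the result is the resolver link, not the publisher's landing page.

```python
import re

# Hypothetical patterns, good enough for a sketch but not a full validation.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def candidate_url(identifier):
    """Return a resolver URL for a DOI or an abstract URL for an arXiv ID."""
    identifier = identifier.strip()
    if DOI_RE.match(identifier):
        return "https://doi.org/" + identifier
    if ARXIV_RE.match(identifier):
        return "https://arxiv.org/abs/" + identifier
    raise ValueError("not recognised as a DOI or arXiv ID: " + identifier)
```

Old-style arXiv IDs (for example `hep-th/9901001`) would need an extra pattern.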
@brandonStell
Author

By the way, the code above was taken from here:
http://zzz.rezo.net/HowTo-Expand-Short-URLs.html

@XavRsl
Collaborator

XavRsl commented May 1, 2019 via email

@brandonStell
Author

That would be great. However, I think the problem is more complicated than I originally thought.
For example this DOI: 10.1016/j.cell.2019.02.019
Should resolve to this URL: https://www.cell.com/cell/fulltext/S0092-8674(19)30168-0
Like it does here: https://doi.org/10.1016/j.cell.2019.02.019

cURL in my script above does not return the correct link...

I can get the correct link only when I use the Selenium package in Python (presumably because it emulates a real browser).
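
One possible explanation, which is an assumption and not something confirmed in this thread, is that the script stops at the first redirect while a browser keeps following every hop. The sketch below records each HTTP redirect hop for a URL. The names `fetch_once` and `redirect_chain` are made up for this example, and the `fetch` parameter exists only so the hop-following logic can be tried without network access. It cannot follow JavaScript or meta-refresh redirects, so a browser-based tool such as Selenium may still be needed for some publishers.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url):
    """Send one GET without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": "Mozilla/5.0"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_chain(url, fetch=fetch_once, max_hops=10):
    """Return the list of URLs visited, starting with the given one."""
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain
```

Calling `redirect_chain("https://doi.org/" + doi)` would give every HTTP hop, with the last entry being where plain HTTP redirection ends.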

@brandonStell
Author

(also note that the URL returned by the crossref API is not correct)

@brandonStell
Author

I guess we'll probably need an array of links for each DOI, since there seem to be several.
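
As a rough idea of what "an array of links for each DOI" could look like, here is a small Python sketch. The helper name `record_links` and the plain dict-of-lists layout are assumptions for illustration, not an existing part of this project. DOIs are lower-cased before use as keys because DOI names are case-insensitive.

```python
def record_links(store, doi, urls):
    """Add URLs to the list kept for a DOI, skipping ones already recorded."""
    known = store.setdefault(doi.lower(), [])
    for url in urls:
        if url not in known:
            known.append(url)
    return known


links_by_doi = {}
record_links(links_by_doi, "10.1016/j.cell.2019.02.019", [
    "https://doi.org/10.1016/j.cell.2019.02.019",
    "https://www.cell.com/cell/fulltext/S0092-8674(19)30168-0",
])
```

Links from different sources (the DOI resolver, an API, a browser session) could all be passed through the same helper, keeping first-seen order.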


2 participants