Feature request: Indicate papers that cite a Dandiset in the DLP #1897

Open
bendichter opened this issue Mar 20, 2024 · 18 comments
Labels
enhancement New feature or request UX Affects usability of the system

bendichter commented Mar 20, 2024

I found a paper (https://doi.org/10.1016/j.neuron.2023.08.005) that cites Dandiset 000458 (https://doi.org/10.48324/dandi.000458/0.230317.0039). When I went to the Dandiset landing page, I found that some papers are associated with this Dandiset, but not the paper I had found. This is because the paper is a secondary use of the Dandiset and did not exist when the Dandiset was published.

I think we are missing a huge opportunity here. If we want to encourage scientists to reuse data, one of the best ways to do that is to show them that others are already doing it. In doing so, we will establish that this is a high-quality dataset worth analyzing, demonstrate that you can achieve publications through reuse of data, and advance social norms around using data. All the better if the publications are in high-impact journals like Neuron. Therefore, I think indicating in some way the papers that use and cite a Dandiset should be a high priority. While GitHub-like stars, page views, and download stats are all very important, IMO this metric is even more important than all of those.

I think this should really go on the DLP (Dandiset landing page), and should not be under the control of the Dandiset owner. Ideally, this would reflect UX patterns that the user is already familiar with. For example, every scientist is familiar with the Google Scholar "Cited By [x]" link:

image

I think the most straightforward UX solution would be to add a button here:

image

that says "Cited by [#]". Then that button would lead to a modal window that contains a list of papers that cite this Dandiset, formatted similarly to how this is done in Google Scholar:

image

This may not be ideal because it does not make the citation metrics as prominent as I would like, but it would be a massive improvement over not having this metric on the DLP at all.
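A minimal sketch of what the button label and one entry in the modal list could look like, assuming each citing paper is available as a dict with Crossref-style keys (`title`, `author`, `container-title`, `issued`). The function names and the exact layout are hypothetical, not an existing DANDI API, and the real DLP would render this in the frontend rather than in Python:

```python
def cited_by_label(count):
    """Text for the proposed button, e.g. 'Cited by 3'."""
    return f"Cited by {count}"


def format_entry(work):
    """Render one citing paper as a title line plus an 'authors - journal, year' line.

    `work` is assumed to be a dict shaped like a Crossref work record;
    every key is optional so incomplete records still render.
    """
    title = (work.get("title") or ["(untitled)"])[0]
    # First initial plus family name, similar to the Google Scholar byline.
    authors = ", ".join(
        f"{a.get('given', '')[:1]} {a.get('family', '')}".strip()
        for a in work.get("author", [])
    )
    journal = (work.get("container-title") or [""])[0]
    date_parts = work.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0]
    venue = ", ".join(str(part) for part in (journal, year) if part)
    byline = " - ".join(part for part in (authors, venue) if part)
    return f"{title}\n{byline}".rstrip()


if __name__ == "__main__":
    # Placeholder record for illustration only, not real citation data.
    example = {
        "title": ["Example title"],
        "author": [{"given": "Ada", "family": "Lovelace"}],
        "container-title": ["Example Journal"],
        "issued": {"date-parts": [[2024, 1, 1]]},
    }
    print(cited_by_label(1))
    print(format_entry(example))
```

Keeping every key optional means a paper with incomplete metadata would still appear in the list instead of being dropped.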

Then the question is: how do we gather this information? It looks like this can be done with Crossref (https://www.crossref.org/documentation/cited-by/retrieve-citations/), which would require credentials, and I don't know whether Crossref even tracks use of DANDI DOIs.

OpenCitations provides a service for this that works on Science papers, e.g.
http://opencitations.net/index/coci/api/v1/citations/10.1126/science.abf4588, but not on Dandisets:
http://opencitations.net/index/coci/api/v1/citations/10.48324/dandi.000458/0.230317.0039 returns an empty list. It is possible that the citations have just not been indexed yet. This is hard to test because a lot of publications, like https://www.nature.com/articles/s41586-023-06031-6, do not properly cite the Dandiset DOI. This is another issue: we might want to be able to manually add citation information for cases like this, where high-profile papers use Dandisets but do not cite them in a way that our system can detect.
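A rough sketch of how the lookup and the manual override could fit together, using the OpenCitations URL pattern shown above. It assumes each record in the JSON response carries the citing paper's DOI under a `citing` key; the function names and the `manual` parameter are hypothetical, and the merge step is kept separate from the network call so it can be checked offline:

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the example links above.
COCI_CITATIONS = "http://opencitations.net/index/coci/api/v1/citations/"


def citations_url(doi):
    """Build the citations endpoint URL for a DOI."""
    return COCI_CITATIONS + urllib.parse.quote(doi, safe="/")


def citing_dois(records, manual=()):
    """Merge indexed citations with manually curated DOIs, dropping duplicates.

    `records` is the decoded JSON list from the endpoint (assumed to hold a
    'citing' key per record); `manual` holds DOIs added by hand for papers
    that use a Dandiset without citing its DOI.
    """
    seen = set()
    merged = []
    for doi in [r.get("citing", "") for r in records] + list(manual):
        key = doi.strip().lower()  # DOIs are case-insensitive
        if key and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


def fetch_citing_dois(doi, manual=(), timeout=30):
    """Query the endpoint and return the merged list of citing DOIs."""
    with urllib.request.urlopen(citations_url(doi), timeout=timeout) as resp:
        records = json.load(resp)
    return citing_dois(records, manual)
```

With an empty response, as currently returned for the Dandiset DOI, `citing_dois([], manual=[...])` would still return the hand-curated entries, so the count on the DLP would not depend entirely on the index.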

Once we have the DOIs of the citing papers, Crossref is a great tool for gathering information about a specific publication. I can confirm that https://api.crossref.org/works/{doi} returns all the information we would need, e.g.

https://api.crossref.org/works/10.1126/science.abf4588
{'DOI': '10.1126/science.abf4588',
 'ISSN': ['0036-8075', '1095-9203'],
 'URL': 'http://dx.doi.org/10.1126/science.abf4588',
 'abstract': '<jats:title>Recording many neurons for a long time</jats:title>\n'
             '          <jats:p>\n'
             '            The ultimate aim of chronic recordings is to sample '
             'from the same neuron over days and weeks. However, this goal has '
             'been difficult to achieve for large populations of neurons. '
             'Steinmetz\n'
             '            <jats:italic>et al.</jats:italic>\n'
             '            describe the development and testing of Neuropixels '
             '2.0. This new electrophysiological recording tool is a '
             'miniaturized, high-density probe for both acute and long-term '
             'experiments combined with sophisticated software algorithms for '
             'fully automatic post hoc computational stabilization. The '
             'technique also provides a strategy for extending the number of '
             'recorded sites beyond the number of available recording '
             'channels. In freely moving animals, extremely large numbers of '
             'individual neurons could thus be followed and tracked with the '
             'same probe for weeks and occasionally months.\n'
             '          </jats:p>\n'
             '          <jats:p>\n'
             '            <jats:italic>Science</jats:italic>\n'
             '            , this issue p.\n'
             '            <jats:related-article '
             'xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="doi" '
             'related-article-type="in-this-issue" '
             'xlink:href="10.1126/science.abf4588">eabf4588</jats:related-article>\n'
             '          </jats:p>',
 'alternative-id': ['10.1126/science.abf4588'],
 'author': [{'ORCID': 'http://orcid.org/0000-0001-7029-2908',
             'affiliation': [{'name': 'UCL Institute of Ophthalmology, '
                                      'University College London, London, UK.'},
                             {'name': 'Department of Biological Structure, '
                                      'University of Washington, Seattle, WA, '
                                      'USA.'}],
             'authenticated-orcid': True,
             'family': 'Steinmetz',
             'given': 'Nicholas A.',
             'sequence': 'first'},
            {'ORCID': 'http://orcid.org/0000-0002-7216-1079',
             'affiliation': [{'name': 'Neuroelectronics Research Flanders, '
                                      'Leuven, Belgium.'}],
             'authenticated-orcid': True,
             'family': 'Aydin',
             'given': 'Cagatay',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-2454-411X',
             'affiliation': [{'name': 'Sainsbury Wellcome Centre, University '
                                      'College London, London, UK.'}],
             'authenticated-orcid': True,
             'family': 'Lebedeva',
             'given': 'Anna',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-7795-5513',
             'affiliation': [{'name': 'Centre for Systems Neuroscience and '
                                      'Department of Neuroscience, Psychology '
                                      'and Behaviour, University of Leicester, '
                                      'Leicester, UK.'},
                             {'name': 'UCL Queen Square Institute of '
                                      'Neurology, University College London, '
                                      'London, UK.'}],
             'authenticated-orcid': True,
             'family': 'Okun',
             'given': 'Michael',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0001-7106-814X',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Pachitariu',
             'given': 'Marius',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-3514-8382',
             'affiliation': [{'name': 'Sainsbury Wellcome Centre, University '
                                      'College London, London, UK.'}],
             'authenticated-orcid': True,
             'family': 'Bauza',
             'given': 'Marius',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-8907-6612',
             'affiliation': [{'name': 'Wolfson Institute for Biomedical '
                                      'Research, University College London, '
                                      'London, UK.'}],
             'authenticated-orcid': True,
             'family': 'Beau',
             'given': 'Maxime',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-2571-3712',
             'affiliation': [{'name': 'UCL Queen Square Institute of '
                                      'Neurology, University College London, '
                                      'London, UK.'}],
             'authenticated-orcid': True,
             'family': 'Bhagat',
             'given': 'Jai',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-9802-1162',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Böhm',
             'given': 'Claudia',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0001-6543-3403',
             'affiliation': [{'name': 'Neuroelectronics Research Flanders, '
                                      'Leuven, Belgium.'}],
             'authenticated-orcid': True,
             'family': 'Broux',
             'given': 'Martijn',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-5065-1157',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Chen',
             'given': 'Susu',
             'sequence': 'additional'},
            {'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'family': 'Colonell',
             'given': 'Jennifer',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-3242-8840',
             'affiliation': [{'name': 'Kavli Institute for Systems '
                                      'Neuroscience, Norwegian University of '
                                      'Science and Technology, Trondheim, '
                                      'Norway.'}],
             'authenticated-orcid': True,
             'family': 'Gardner',
             'given': 'Richard J.',
             'sequence': 'additional'},
            {'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'family': 'Karsh',
             'given': 'Bill',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0001-6680-9660',
             'affiliation': [{'name': 'Neuroelectronics Research Flanders, '
                                      'Leuven, Belgium.'},
                             {'name': 'IMEC, Leuven, Belgium.'},
                             {'name': 'Vlaams Instituut voor Biotechnologie '
                                      '(VIB), Leuven, Belgium.'},
                             {'name': 'Brain and Cognition, KU Leuven, Leuven, '
                                      'Belgium.'}],
             'authenticated-orcid': True,
             'family': 'Kloosterman',
             'given': 'Fabian',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0001-7548-2209',
             'affiliation': [{'name': 'Wolfson Institute for Biomedical '
                                      'Research, University College London, '
                                      'London, UK.'}],
             'authenticated-orcid': True,
             'family': 'Kostadinov',
             'given': 'Dimitar',
             'sequence': 'additional'},
            {'affiliation': [{'name': 'IMEC, Leuven, Belgium.'}],
             'family': 'Mora-Lopez',
             'given': 'Carolina',
             'sequence': 'additional'},
            {'affiliation': [{'name': 'IMEC, Leuven, Belgium.'}],
             'family': 'O’Callaghan',
             'given': 'John',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-4739-0793',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Park',
             'given': 'Junchol',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0001-8834-5852',
             'affiliation': [{'name': 'IMEC, Leuven, Belgium.'}],
             'authenticated-orcid': True,
             'family': 'Putzeys',
             'given': 'Jan',
             'sequence': 'additional'},
            {'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'family': 'Sauerbrei',
             'given': 'Britton',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0001-5230-6165',
             'affiliation': [{'name': 'Neuroelectronics Research Flanders, '
                                      'Leuven, Belgium.'},
                             {'name': 'ATLAS Neuroengineering, Leuven, '
                                      'Belgium.'},
                             {'name': 'Micro- and Nanosystems, KU Leuven, '
                                      'Leuven, Belgium.'}],
             'authenticated-orcid': True,
             'family': 'van Daal',
             'given': 'Rik J. J.',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-3376-6689',
             'affiliation': [{'name': 'Kavli Institute for Systems '
                                      'Neuroscience, Norwegian University of '
                                      'Science and Technology, Trondheim, '
                                      'Norway.'}],
             'authenticated-orcid': True,
             'family': 'Vollan',
             'given': 'Abraham Z.',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-5450-2108',
             'affiliation': [{'name': 'IMEC, Leuven, Belgium.'}],
             'authenticated-orcid': True,
             'family': 'Wang',
             'given': 'Shiwei',
             'sequence': 'additional'},
            {'affiliation': [{'name': 'IMEC, Leuven, Belgium.'}],
             'family': 'Welkenhuysen',
             'given': 'Marleen',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-4311-1037',
             'affiliation': [{'name': 'Department of Biological Structure, '
                                      'University of Washington, Seattle, WA, '
                                      'USA.'}],
             'authenticated-orcid': True,
             'family': 'Ye',
             'given': 'Zhiwen',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-4436-1057',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Dudman',
             'given': 'Joshua T.',
             'sequence': 'additional'},
            {'affiliation': [{'name': 'IMEC, Leuven, Belgium.'}],
             'family': 'Dutta',
             'given': 'Barundeb',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-6563-1423',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Hantman',
             'given': 'Adam W.',
             'sequence': 'additional'},
            {'affiliation': [{'name': 'UCL Queen Square Institute of '
                                      'Neurology, University College London, '
                                      'London, UK.'}],
             'family': 'Harris',
             'given': 'Kenneth D.',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-4332-8332',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Lee',
             'given': 'Albert K.',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-0226-5566',
             'affiliation': [{'name': 'Kavli Institute for Systems '
                                      'Neuroscience, Norwegian University of '
                                      'Science and Technology, Trondheim, '
                                      'Norway.'}],
             'authenticated-orcid': True,
             'family': 'Moser',
             'given': 'Edvard I.',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0001-5697-4881',
             'affiliation': [{'name': 'Sainsbury Wellcome Centre, University '
                                      'College London, London, UK.'}],
             'authenticated-orcid': True,
             'family': 'O’Keefe',
             'given': 'John',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0001-7916-9930',
             'affiliation': [{'name': 'Champalimaud Centre for the Unknown, '
                                      'Lisbon, Portugal.'}],
             'authenticated-orcid': True,
             'family': 'Renart',
             'given': 'Alfonso',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-6670-7362',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Svoboda',
             'given': 'Karel',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-2673-8957',
             'affiliation': [{'name': 'Wolfson Institute for Biomedical '
                                      'Research, University College London, '
                                      'London, UK.'}],
             'authenticated-orcid': True,
             'family': 'Häusser',
             'given': 'Michael',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-4924-7381',
             'affiliation': [{'name': 'Neuroelectronics Research Flanders, '
                                      'Leuven, Belgium.'},
                             {'name': 'Vlaams Instituut voor Biotechnologie '
                                      '(VIB), Leuven, Belgium.'}],
             'authenticated-orcid': True,
             'family': 'Haesler',
             'given': 'Sebastian',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0003-4880-7682',
             'affiliation': [{'name': 'UCL Institute of Ophthalmology, '
                                      'University College London, London, '
                                      'UK.'}],
             'authenticated-orcid': True,
             'family': 'Carandini',
             'given': 'Matteo',
             'sequence': 'additional'},
            {'ORCID': 'http://orcid.org/0000-0002-6289-4439',
             'affiliation': [{'name': 'Janelia Research Campus, Howard Hughes '
                                      'Medical Institute, Ashburn, VA, USA.'}],
             'authenticated-orcid': True,
             'family': 'Harris',
             'given': 'Timothy D.',
             'sequence': 'additional'}],
 'container-title': ['Science'],
 'content-domain': {'crossmark-restriction': False, 'domain': []},
 'created': {'date-parts': [[2021, 4, 15]],
             'date-time': '2021-04-15T19:51:33Z',
             'timestamp': 1618516293000},
 'deposited': {'date-parts': [[2024, 1, 15]],
               'date-time': '2024-01-15T22:52:17Z',
               'timestamp': 1705359137000},
 'funder': [{'DOI': '10.13039/100000002',
             'award': ['1U01NS113252-01'],
             'doi-asserted-by': 'publisher',
             'name': 'National Institutes of Health'},
            {'DOI': '10.13039/100000875',
             'doi-asserted-by': 'publisher',
             'name': 'Pew Charitable Trusts'},
            {'DOI': '10.13039/100001201',
             'doi-asserted-by': 'publisher',
             'name': 'Kavli Foundation'},
            {'DOI': '10.13039/100001207',
             'doi-asserted-by': 'publisher',
             'name': 'Esther A. and Joseph Klingenstein Fund'},
            {'DOI': '10.13039/100005930',
             'doi-asserted-by': 'publisher',
             'name': 'ASCRS Research Foundation'},
            {'DOI': '10.13039/100010269',
             'award': ['204915/Z/16/Z'],
             'doi-asserted-by': 'publisher',
             'name': 'Wellcome'},
            {'DOI': '10.13039/100010269',
             'award': ['201225/Z/16/Z'],
             'doi-asserted-by': 'publisher',
             'name': 'Wellcome'},
            {'DOI': '10.13039/100010269',
             'doi-asserted-by': 'publisher',
             'name': 'Wellcome'},
            {'DOI': '10.13039/100012331',
             'award': ['1U01NS113252-01'],
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/100012331',
             'award': ['G096219N'],
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/100012331',
             'award': ['C14/17/109'],
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/100012331',
             'award': ['G0D7516N'],
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/100012331',
             'award': ['C14/17/042'],
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/100012331',
             'award': ['BB/P020607/1'],
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/100012331',
             'award': ['SBF002'],
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/100012331',
             'award': ['HBC.2018.2114'],
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/100012331',
             'doi-asserted-by': 'publisher',
             'name': 'Agentschap Innoveren en Ondernemen'},
            {'DOI': '10.13039/501100000268',
             'doi-asserted-by': 'publisher',
             'name': 'Biotechnology and Biological Sciences Research Council'},
            {'DOI': '10.13039/501100003130',
             'doi-asserted-by': 'publisher',
             'name': 'Fonds Wetenschappelijk Onderzoek'},
            {'DOI': '10.13039/100010269',
             'award': ['204717/Z/16/Z'],
             'doi-asserted-by': 'publisher',
             'name': 'Wellcome'},
            {'DOI': '10.13039/501100004040',
             'award': ['C14/17/109'],
             'doi-asserted-by': 'publisher',
             'name': 'KU Leuven'},
            {'DOI': '10.13039/501100011878',
             'doi-asserted-by': 'publisher',
             'name': 'Vlaamse regering'},
            {'DOI': '10.13039/501100014069',
             'award': ['204915/Z/16/Z'],
             'doi-asserted-by': 'publisher',
             'name': 'Fundação Champalimaud'},
            {'DOI': '10.13039/501100014069',
             'award': ['951319'],
             'doi-asserted-by': 'publisher',
             'name': 'Fundação Champalimaud'},
            {'DOI': '10.13039/501100014069',
             'award': ['295721'],
             'doi-asserted-by': 'publisher',
             'name': 'Fundação Champalimaud'},
            {'DOI': '10.13039/501100014069',
             'award': ['286225'],
             'doi-asserted-by': 'publisher',
             'name': 'Fundação Champalimaud'},
            {'DOI': '10.13039/501100014069',
             'award': ['223262'],
             'doi-asserted-by': 'publisher',
             'name': 'Fundação Champalimaud'},
            {'DOI': '10.13039/501100018719',
             'doi-asserted-by': 'publisher',
             'name': 'Koç Üniversitesi'},
            {'DOI': '10.13039/501100005416',
             'award': ['National Infrastructure Scheme, NORBRAIN, grant number '
                       '295721'],
             'doi-asserted-by': 'crossref',
             'name': 'Research Council of Norway'},
            {'DOI': '10.13039/501100005416',
             'award': ['FRIPRO grant number 286225'],
             'doi-asserted-by': 'crossref',
             'name': 'Research Council of Norway'},
            {'DOI': '10.13039/501100005416',
             'award': ['Centre of Excellence grant number 223262'],
             'doi-asserted-by': 'crossref',
             'name': 'Research Council of Norway'},
            {'name': 'HHMI Janelia Research Campus'},
            {'award': ['HBC.2018.2114'],
             'name': 'Hermesfonds with a VLAIO Baekeland mandate'},
            {'name': 'HHMI Janelia Research Campus'},
            {'name': 'Sainsbury Wellcome Centre for Neural Circuits and '
                     'Behaviour'},
            {'DOI': '10.13039/501100003130',
             'award': ['G096219N'],
             'doi-asserted-by': 'crossref',
             'name': 'Research Foundation Flanders'},
            {'award': ['Springboard SBF002\\1045'],
             'name': 'Academy of Medical Sciences and Wellcome Trust'},
            {'DOI': '10.13039/501100000268',
             'award': ['BB/P020607/1'],
             'doi-asserted-by': 'crossref',
             'name': 'BBSRC'},
            {'name': 'HHMI Janelia Research Campus'}],
 'indexed': {'date-parts': [[2024, 3, 20]],
             'date-time': '2024-03-20T04:21:37Z',
             'timestamp': 1710908497756},
 'is-referenced-by-count': 446,
 'issn-type': [{'type': 'print', 'value': '0036-8075'},
               {'type': 'electronic', 'value': '1095-9203'}],
 'issue': '6539',
 'issued': {'date-parts': [[2021, 4, 16]]},
 'journal-issue': {'issue': '6539',
                   'published-print': {'date-parts': [[2021, 4, 16]]}},
 'language': 'en',
 'link': [{'URL': 'https://syndication.highwire.org/content/doi/10.1126/science.abf4588',
           'content-type': 'unspecified',
           'content-version': 'vor',
           'intended-application': 'syndication'},
          {'URL': 'https://www.science.org/doi/pdf/10.1126/science.abf4588',
           'content-type': 'unspecified',
           'content-version': 'vor',
           'intended-application': 'similarity-checking'}],
 'member': '221',
 'original-title': [],
 'prefix': '10.1126',
 'published': {'date-parts': [[2021, 4, 16]]},
 'published-print': {'date-parts': [[2021, 4, 16]]},
 'publisher': 'American Association for the Advancement of Science (AAAS)',
 'reference': [{'DOI': '10.1038/natrevmats.2016.93',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_2_2'},
               {'DOI': '10.1038/micronano.2016.66',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_3_2'},
               {'DOI': '10.1016/j.conb.2018.01.009',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_4_2'},
               {'DOI': '10.1038/s41583-019-0140-6',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_5_2'},
               {'DOI': '10.1016/j.neuron.2019.08.011',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_6_2'},
               {'DOI': '10.1038/nature24636',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_7_2'},
               {'DOI': '10.3390/s17102388',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_8_2'},
               {'DOI': '10.1126/science.aav3932',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_9_2'},
               {'DOI': '10.1038/s41586-020-03171-x',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_10_2'},
               {'DOI': '10.1038/s41586-019-1787-x',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_11_2'},
               {'DOI': '10.1126/science.aav7893',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_12_2'},
               {'DOI': '10.1038/s41586-018-0244-6',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_13_2'},
               {'DOI': '10.1016/j.neuron.2018.02.023',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_14_2'},
               {'DOI': '10.1016/j.neuron.2019.02.010',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_15_2'},
               {'DOI': '10.1038/s41593-019-0381-8',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_16_2'},
               {'DOI': '10.1038/s41593-019-0502-4',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_17_2'},
               {'DOI': '10.1101/772517',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_18_2',
                'unstructured': 'J. Park J. W. Phillips K. A. Martin A. W. '
                                'Hantman J. T. Dudman Flexible routing of '
                                'motor control signals through neocortical '
                                'projection neuron classes. bioRxiv 772517 '
                                '[Preprint]. 18 September 2019. '
                                'https://doi.org/10.1101/772517.'},
               {'DOI': '10.1016/j.neuron.2020.04.026',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_19_2'},
               {'DOI': '10.1038/s41586-019-1346-5',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_20_2'},
               {'DOI': '10.1101/2020.02.25.965210',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_21_2',
                'unstructured': 'L. D. Liu S. Chen M. N. Economo N. Li K. '
                                'Svoboda Accurate localization of linear probe '
                                'electrodes across multiple brains. bioRxiv '
                                '2020.02.25.965210 [Preprint]. 26 February '
                                '2020. '
                                'https://doi.org/10.1101/2020.02.25.965210.'},
               {'DOI': '10.1038/s41586-019-1869-9',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_22_2'},
               {'DOI': '10.1126/science.aao4960',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_23_2'},
               {'DOI': '10.1038/s41593-019-0360-0',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_24_2'},
               {'DOI': '10.7554/eLife.63035',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_25_2'},
               {'DOI': '10.7554/eLife.59716',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_26_2'},
               {'DOI': '10.7554/eLife.53462',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_27_2'},
               {'DOI': '10.1016/j.neuron.2019.05.003',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_28_2'},
               {'DOI': '10.1073/pnas.1717695114',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_29_2'},
               {'DOI': '10.1016/j.neuron.2018.11.002',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_30_2'},
               {'DOI': '10.2196/16194',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_31_2'},
               {'DOI': '10.1088/1741-2560/10/4/046016',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_32_2'},
               {'DOI': '10.1126/sciadv.1601966',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_33_2'},
               {'DOI': '10.1152/jn.00352.2020',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_34_2'},
               {'DOI': '10.1088/1741-2552/ab8343',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_35_2'},
               {'article-title': 'The tetrode: A new technique for multi-unit '
                                 'extracellular recording',
                'author': 'Recce M.',
                'first-page': '1250',
                'journal-title': 'Soc. Neurosci. Abstr.',
                'key': 'e_1_3_2_36_2',
                'unstructured': 'M. Recce, J. O’Keefe, The tetrode: A new '
                                'technique for multi-unit extracellular '
                                'recording. Soc. Neurosci. Abstr. 15, 1250 '
                                '(1989).',
                'volume': '15',
                'year': '1989'},
               {'DOI': '10.7554/eLife.27702',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_37_2'},
               {'DOI': '10.1016/S0013-4694(96)95176-0',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_38_2'},
               {'DOI': '10.1088/1741-2560/8/4/045005',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_39_2'},
               {'DOI': '10.1371/journal.pone.0151180',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_40_2'},
               {'DOI': '10.1101/2020.08.09.243279',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_41_2',
                'unstructured': 'J.-O. Muthmann A. J. Levi H. C. Carney A. C. '
                                'Huk “Supersessioning”: A hardware/software '
                                'system for electrophysiology spanning '
                                'multiple sessions in marmosets. bioRxiv '
                                '2020.08.09.243279 [Preprint]. 10 August 2020. '
                                'https://doi.org/10.1101/2020.08.09.243279.'},
               {'DOI': '10.1101/2020.09.24.312132',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_42_2',
                'unstructured': 'C. E. Schoonover S. N. Ohashi R. Axel A. J. '
                                'P. Fink Representational drift in primary '
                                'olfactory cortex. bioRxiv 2020.09.24.312132 '
                                '[Preprint]. 25 September 2020. '
                                'https://doi.org/10.1101/2020.09.24.312132.'},
               {'DOI': '10.7554/eLife.47188',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_43_2'},
               {'DOI': '10.1016/j.jneumeth.2003.12.022',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_44_2'},
               {'DOI': '10.1152/jn.00569.2007',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_45_2'},
               {'DOI': '10.1152/jn.00260.2007',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_46_2'},
               {'DOI': '10.1152/jn.90920.2008',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_47_2'},
               {'DOI': '10.1152/jn.01012.2010',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_48_2'},
               {'DOI': '10.1152/jn.00052.2014',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_49_2'},
               {'DOI': '10.1152/jn.00464.2015',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_50_2'},
               {'DOI': '10.1021/acs.nanolett.6b02673',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_51_2'},
               {'DOI': '10.1109/TBME.2015.2406113',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_52_2'},
               {'DOI': '10.1101/742346',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_53_2',
                'unstructured': 'M. S. Saleh S. M. Ritchie M. A. Nicholas R. '
                                'Bezbaruah J. W. Reddy M. Chamanzar E. A. '
                                'Yttri R. P. Panat CMU Array: A 3D '
                                'nano-printed customizable ultra-high-density '
                                'microelectrode array platform. bioRxiv 742346 '
                                '[Preprint]. 23 August 2019. '
                                'https://doi.org/10.1101/742346.'},
               {'DOI': '10.1126/sciadv.aay2789',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_54_2'},
               {'DOI': '10.1088/1741-2552/abd0ce',
                'article-title': 'The Argo: A high channel count recording '
                                 'system for neural recording in vivo',
                'author': 'Sahasrabuddhe K.',
                'doi-asserted-by': 'crossref',
                'journal-title': 'J. Neural Eng.',
                'key': 'e_1_3_2_55_2',
                'unstructured': 'K. Sahasrabuddhe, A. A. Khan, A. P. Singh, T. '
                                'M. Stern, Y. Ng, A. Tadić, P. Orel, C. '
                                'LaReau, D. Pouzzner, K. Nishimura, K. M. '
                                'Boergens, S. Shivakumar, M. S. Hopper, B. '
                                'Kerr, M. S. Hanna, R. J. Edgington, I. '
                                'McNamara, D. Fell, P. Gao, A. Babaie-Fishani, '
                                'S. Veijalainen, A. V. Klekachev, A. M. '
                                'Stuckey, B. Luyssaert, T. D. Y. Kozai, C. '
                                'Xie, V. Gilja, B. Dierickx, Y. Kong, M. '
                                'Straka, H. S. Sohal, M. R. Angle, The Argo: A '
                                'high channel count recording system for '
                                'neural recording in vivo. J. Neural Eng. 18, '
                                '015002 (2021). 33624614',
                'volume': '18',
                'year': '2021'},
               {'DOI': '10.1038/nature03274',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_56_2'},
               {'DOI': '10.1152/jn.00747.2006',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_57_2'},
               {'DOI': '10.1523/JNEUROSCI.2974-11.2011',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_58_2'},
               {'DOI': '10.3389/fncir.2011.00018',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_59_2'},
               {'DOI': '10.1523/JNEUROSCI.4071-12.2013',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_60_2'},
               {'DOI': '10.1371/journal.pone.0008222',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_61_2'},
               {'DOI': '10.1101/2020.10.05.327049',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_62_2',
                'unstructured': 'D. Deitch A. Rubin Y. Ziv Representational '
                                'drift in the mouse visual cortex. bioRxiv '
                                '2020.10.05.327049 [Preprint]. 5 October 2020. '
                                'https://doi.org/10.1101/2020.10.05.327049.'},
               {'DOI': '10.1101/2020.12.10.420620',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_63_2',
                'unstructured': 'T. D. Marks M. J. Goard Stimulus-dependent '
                                'representational drift in primary visual '
                                'cortex. bioRxiv 2020.12.10.420620 [Preprint]. '
                                '11 December 2020. '
                                'https://doi.org/10.1101/2020.12.10.420620.'},
               {'DOI': '10.1101/851691',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_64_2',
                'unstructured': 'K. H. Lee Y.-L. Ni M. Meister Electrode '
                                'pooling: How to boost the yield of switchable '
                                'silicon probes for neuronal recordings. '
                                'bioRxiv 851691 [Preprint]. 26 November 2019. '
                                'https://doi.org/10.1101/851691.'},
               {'DOI': '10.1152/jn.00979.2005',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_65_2'},
               {'DOI': '10.1016/j.jneumeth.2018.08.020',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_66_2'},
               {'DOI': '10.1109/TBCAS.2019.2942450',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_67_2'},
               {'DOI': '10.1109/TBCAS.2019.2943077',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_68_2'},
               {'key': 'e_1_3_2_69_2',
                'unstructured': 'N. Steinmetz M. Okun Ç. Aydın Code and '
                                'summary data for Steinmetz et al. '
                                '“Neuropixels 2.0: A miniaturized high-density '
                                'probe for stable long-term brain recordings ” '
                                'Version 1 Zenodo (2021); '
                                'https://doi.org/10.5281/zenodo.4558642.10.5281/zenodo.4558642'},
               {'key': 'e_1_3_2_70_2',
                'unstructured': 'N. Steinmetz Raw data for Steinmetz et al. '
                                '“Neuropixels 2.0: A miniaturized high-density '
                                'probe for stable long-term brain recordings ” '
                                'Figshare (2021); '
                                'https://doi.org/10.6084/m9.figshare.14024495.10.6084/m9.figshare.14024495'},
               {'key': 'e_1_3_2_71_2',
                'unstructured': 'M. Pachitariu C. Rossant N. Steinmetz J. '
                                'Colonell A. G. Bondy O. Winter K. Banga J. '
                                'Bhagat M. Sosa D. O’Shea J. Guzman K. C. '
                                'Nakamura Geffen Lab P. Botros R. Saxena A. '
                                'Liddell J. Pellman M. Spacek D. Bryzgalov C. '
                                'Stringer D. Denman D. Karamanlis M. Beau '
                                'Kilosort 2.5 Software package for Steinmetz '
                                'et al. “Neuropixels 2.0: A miniaturized '
                                'high-density probe for stable long-term brain '
                                'recordings.” Version 2.5 Zenodo (2021); '
                                'https://doi.org/10.5281/zenodo.4482749.'},
               {'key': 'e_1_3_2_72_2',
                'unstructured': 'Ç. Aydin R. van Daal CAD files for Steinmetz '
                                'et al. “Neuropixels 2.0: A miniaturized '
                                'high-density probe for stable long-term brain '
                                'recordings ” Version 1 Zenodo (2021); '
                                'https://doi.org/10.5281/zenodo.4564136.10.5281/zenodo.4564136'},
               {'DOI': '10.1371/journal.pone.0089007',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_73_2'},
               {'DOI': '10.1016/j.cell.2015.08.014',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_74_2'},
               {'DOI': '10.1038/nn.3078',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_75_2'},
               {'article-title': 'A statistical approach to some basic mine '
                                 'valuation problems on the Witwatersrand',
                'author': 'Krige D. G.',
                'first-page': '119',
                'journal-title': 'J. South. Afr. Inst. Min. Metall.',
                'key': 'e_1_3_2_76_2',
                'unstructured': 'D. G. Krige, A statistical approach to some '
                                'basic mine valuation problems on the '
                                'Witwatersrand. J. South. Afr. Inst. Min. '
                                'Metall. 52, 119–139 (1951).',
                'volume': '52',
                'year': '1951'},
               {'DOI': '10.1371/journal.pone.0062123',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_77_2'},
               {'DOI': '10.1101/061481',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_78_2',
                'unstructured': 'M. Pachitariu N. Steinmetz S. Kadir M. '
                                'Carandini K. D. Harris Kilosort: Realtime '
                                'spike-sorting for extracellular '
                                'electrophysiology with hundreds of channels. '
                                'bioRxiv 061481 [Preprint]. 30 June 2016. '
                                'https://doi.org/10.1101/061481.10.1101/061481'},
               {'DOI': '10.1109/83.988953',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_79_2'},
               {'DOI': '10.1101/061507',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_80_2',
                'unstructured': 'M. Pachitariu C. Stringer M. Dipoppa S. '
                                'Schröder L. F. Rossi H. Dalgleish M. '
                                'Carandini K. D. Harris Suite2p: Beyond 10 000 '
                                'neurons with standard two-photon microscopy. '
                                'bioRxiv 061507 [Preprint]. 20 July 2017. '
                                'https://doi.org/10.1101/061507.10.1101/061507'},
               {'DOI': '10.1038/nn.4268',
                'doi-asserted-by': 'publisher',
                'key': 'e_1_3_2_81_2'},
               {'DOI': '10.1038/s41596-021-00539-9',
                'doi-asserted-by': 'crossref',
                'key': 'e_1_3_2_82_2',
                'unstructured': 'R. J. van Daal Ç. Aydin F. Michon A. A. Aarts '
                                'M. Kraft F. Kloosterman S. Haesler '
                                'Implantation of Neuropixels probes for '
                                'chronic recording of neuronal activity in '
                                'freely behaving mice and rats. Nat. Protoc. '
                                '10.1038/s41596-021-00539-9 (2021).'}],
 'reference-count': 81,
 'references-count': 81,
 'relation': {'has-preprint': [{'asserted-by': 'object',
                                'id': '10.1101/2020.10.27.358291',
                                'id-type': 'doi'}],
              'has-review': [{'asserted-by': 'object',
                              'id': '10.3410/f.739941959.793589669',
                              'id-type': 'doi'}]},
 'resource': {'primary': {'URL': 'https://www.science.org/doi/10.1126/science.abf4588'}},
 'score': 1,
 'short-container-title': ['Science'],
 'short-title': [],
 'source': 'Crossref',
 'subject': ['Multidisciplinary'],
 'subtitle': [],
 'title': ['Neuropixels 2.0: A miniaturized high-density probe for stable, '
           'long-term brain recordings'],
 'type': 'journal-article',
 'volume': '372'}
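
For illustration, here is a minimal sketch of pulling DOIs out of reference entries like the ones in the record above. Note that the data and software entries (the Zenodo and Figshare ones) carry no structured 'DOI' field, only an 'unstructured' string, so the sketch falls back to a regular expression there. The function name and the DOI pattern are assumptions made for this example, not part of any Crossref tooling, and doubled strings such as the "zenodo.4558642.10.5281/zenodo.4558642" form seen above would still need extra cleanup.

```python
import re

# Rough DOI pattern; an assumption that is good enough for a sketch.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def reference_dois(references):
    """Return lowercased DOIs from Crossref-style reference entries.

    Uses the structured 'DOI' field when present and otherwise falls back
    to the first DOI-looking string in the 'unstructured' text.
    """
    found = []
    for ref in references:
        doi = ref.get("DOI")
        if doi is None:
            match = DOI_PATTERN.search(ref.get("unstructured", ""))
            if match is None:
                continue  # no DOI anywhere in this entry
            doi = match.group(0).rstrip(".")  # drop sentence-ending period
        found.append(doi.lower())
    return found
```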

Beyond putting this on the DLP, this is a very important metric for us to track. Looking at publications over the last year or so, I am seeing examples of high-profile papers that use Dandisets that we don't even know about, and this is quickly getting to the point where we need automated tools to track it.

  1. What is the best way to automatically track this information?
  2. How does the team feel about displaying this information on the DLP?
@bendichter bendichter changed the title Request: Indicate papers that cite a Dandiset in the DLP Feature request: Indicate papers that cite a Dandiset in the DLP Mar 20, 2024
@bendichter bendichter added the enhancement New feature or request label Mar 20, 2024
@yarikoptic yarikoptic added the UX Affects usability of the system label Mar 20, 2024
@yarikoptic
Member

Quick one: I would feel great if such information (a "Cited By ??" badge leading to a listing) was displayed on the DLP.
How we make it happen is indeed worth thinking through, and would depend on which service could provide us the "discovery". OpenNeuro has a similar problem and pretty much did "manual labor" to figure out such citations and their reason (use vs. just referencing for some other reason). I wonder if there is a way to make Google Scholar "index" datasets? I was hoping that maybe Google's dataset search could collect that info, but looking at a sample OpenNeuro dataset I see no citations.
But we can't easily (ab)use the "Cited by" banner of some other service, since a single dandiset can have multiple DOIs for different versions and something needs to aggregate their citations.
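
To make the aggregation point concrete, here is a rough sketch that folds per-version results into one list per dandiset. It assumes version DOIs follow the 10.48324/dandi.<id>/<version> form used in this thread. The function name and the input shape (a mapping from version DOI to citing DOIs) are made up for the example.

```python
import re
from collections import defaultdict

# Version DOI form as seen in this thread, e.g. 10.48324/dandi.000458/0.230317.0039
VERSION_DOI = re.compile(r"10\.48324/dandi\.(\d{6})/([\d.]+)$")


def citations_per_dandiset(citing_by_version_doi):
    """Merge citing DOIs reported for each version DOI into one set per dandiset."""
    merged = defaultdict(set)
    for version_doi, citing_dois in citing_by_version_doi.items():
        match = VERSION_DOI.search(version_doi)
        if match is None:
            continue  # not a DANDI version DOI
        dandiset_id = match.group(1)
        # Lowercase so the same paper is not counted twice under different casing
        merged[dandiset_id].update(doi.lower() for doi in citing_dois)
    return {dandiset_id: sorted(dois) for dandiset_id, dois in merged.items()}
```

A paper that cites two versions of the same dandiset is then counted once for that dandiset.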

@CodyCBakerPhD

This would be great to include on the DLP as yet another way of demonstrating DANDI usage (in addition to the work in progress access stats)

Also great for reporting purposes I'd imagine

@yarikoptic
Member

Could someone check whether https://support.datacite.org/docs/consuming-citations-and-references is the one to go after? All our DOIs are minted by DataCite (through the Dartmouth library subscription). I could not resist, so here is a crude script where I used their REST API on the most recent published version of each of our dandisets (so not all versions per dandiset -- to be tuned!)

crude bash script
#!/bin/bash

cd /tmp || exit 1
# List all published, non-empty, non-embargoed dandisets
curl -X 'GET' \
  'https://api.dandiarchive.org/api/dandisets/?page_size=1000&draft=false&empty=false&embargoed=false' \
  -H 'accept: application/json' | jq . > published_dandisets.json

mkdir -p citations

# Query DataCite events for the most recent published version of each dandiset
jq -r '.results[] | "\(.identifier) \(.most_recent_published_version.version)"' < /tmp/published_dandisets.json \
| while read -r id version; do
    curl --silent "https://api.datacite.org/events?doi=10.48324/dandi.$id/$version" > "citations/$id-$version.json"
done

and looking at results which are not empty

❯ /usr/bin/find /tmp/citations -size '+1b' -ls
   176439      0 drwx------   2 yoh      yoh          2640 Mar 20 13:38 /tmp/citations
   176469      4 -rw-------   1 yoh      yoh          1575 Mar 20 13:37 /tmp/citations/000055-0.220127.0436.json
   176501      4 -rw-------   1 yoh      yoh          1575 Mar 20 13:37 /tmp/citations/000231-0.220904.1554.json
   176503      4 -rw-------   1 yoh      yoh          1346 Mar 20 13:37 /tmp/citations/000235-0.230316.1600.json
   176504      4 -rw-------   1 yoh      yoh          1346 Mar 20 13:37 /tmp/citations/000236-0.230316.2031.json
   176505      4 -rw-------   1 yoh      yoh          1587 Mar 20 13:37 /tmp/citations/000237-0.230316.1655.json
   176506      4 -rw-------   1 yoh      yoh          1346 Mar 20 13:37 /tmp/citations/000238-0.230316.1519.json
   176509      4 -rw-------   1 yoh      yoh          4005 Mar 20 13:37 /tmp/citations/000252-0.230408.2207.json
   176513      4 -rw-------   1 yoh      yoh          1336 Mar 20 13:37 /tmp/citations/000301-0.230806.0034.json
   176525      8 -rw-------   1 yoh      yoh          5199 Mar 20 13:37 /tmp/citations/000469-0.240123.1806.json
   176549      4 -rw-------   1 yoh      yoh          1354 Mar 20 13:38 /tmp/citations/000623-0.240227.2023.json
   176552      4 -rw-------   1 yoh      yoh          1330 Mar 20 13:38 /tmp/citations/000630-0.230915.2257.json
   176559     12 -rw-------   1 yoh      yoh          8375 Mar 20 13:38 /tmp/citations/000673-0.240118.2135.json
   176560      4 -rw-------   1 yoh      yoh          1614 Mar 20 13:38 /tmp/citations/000678-0.231004.2146.json
   176569      4 -rw-------   1 yoh      yoh          1455 Mar 20 13:38 /tmp/citations/000934-0.240315.1754.json

We get some! 000458 is not in the list :-/ But looking inside at the different relation types, the interesting ones seem to be

❯ for f in *json; do jq . $f | grep -E 'relation-type-id.*(references|is-supplement)' && echo $f; done
        "relation-type-id": "references",
000055-0.220127.0436.json
        "relation-type-id": "references",
000231-0.220904.1554.json
        "relation-type-id": "is-supplemented-by",
000235-0.230316.1600.json
        "relation-type-id": "is-supplemented-by",
000236-0.230316.2031.json
        "relation-type-id": "is-supplemented-by",
000237-0.230316.1655.json
        "relation-type-id": "is-supplemented-by",
000238-0.230316.1519.json
        "relation-type-id": "references",
000301-0.230806.0034.json
        "relation-type-id": "is-supplement-to",
        "relation-type-id": "is-supplemented-by",
000469-0.240123.1806.json
        "relation-type-id": "is-supplemented-by",
000623-0.240227.2023.json
        "relation-type-id": "references",
000630-0.230915.2257.json
        "relation-type-id": "references",
000678-0.231004.2146.json

e.g.

    {
      "id": "bbb655d0-5d76-481e-b6f1-b2cb2b457380",
      "type": "events",
      "attributes": {
        "subj-id": "https://doi.org/10.1038/s41597-022-01280-y",
        "obj-id": "https://doi.org/10.48324/dandi.000055/0.220127.0436",
        "source-id": "crossref",
        "relation-type-id": "references",
        "total": 1,
        "message-action": "add",
        "source-token": "36c35e23-8757-4a9d-aacf-345e9b7eb50d",
        "license": "https://creativecommons.org/publicdomain/zero/1.0/",
        "occurred-at": "2022-04-21T10:45:13.000Z",
        "timestamp": "2022-04-23T03:38:18.173Z"
      },

So it points to https://www.nature.com/articles/s41597-022-01280-y, which is the paper reporting that the data was shared on DANDI.
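
As a sketch of how such events could be read programmatically: the function below takes a list of event objects shaped like the one above and returns, for a given dandiset DOI URL, the work on the other side of each link together with the relation type. Only the field names shown in the JSON excerpt are used. The function name is hypothetical, and checking both "subj-id" and "obj-id" is a cautious assumption, since I am not certain which side the dandiset is on for every relation type.

```python
def linked_works(events, dandiset_doi_url):
    """Return (other_doi_url, relation_type) pairs for events touching a DOI.

    Each event links a subject to an object; whichever side is not the
    dandiset is reported as the linked work.
    """
    target = dandiset_doi_url.lower()
    links = []
    for event in events:
        attrs = event["attributes"]
        subj, obj = attrs["subj-id"], attrs["obj-id"]
        if subj.lower() == target:
            other = obj
        elif obj.lower() == target:
            other = subj
        else:
            continue  # event does not involve this dandiset version
        links.append((other, attrs["relation-type-id"]))
    return links
```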

So I think for now we could easily provide a basic "citations gatherer" service to run on cron, e.g. weekly, and produce badges for each dandiset. The only question would be how to integrate it with the archive -- I do not think it should modify the metadata record, since that one could later be changed by the author(s).
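
One possible shape for the badge step, purely as a sketch: write one small JSON file per dandiset that a badge renderer could pick up. The fields follow the Shields.io "endpoint" badge schema (schemaVersion, label, message, color). Using that schema, the "cited by" label, and the one-file-per-dandiset layout are all assumptions for illustration, not an agreed design.

```python
import json
from pathlib import Path


def citation_badge(count):
    """Build a badge description for a citation count (Shields.io endpoint schema)."""
    return {
        "schemaVersion": 1,
        "label": "cited by",
        "message": str(count),
        "color": "blue" if count else "lightgrey",
    }


def write_badges(citing_dois_by_dandiset, out_dir):
    """Write <out_dir>/<dandiset_id>.json for every dandiset in the mapping."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for dandiset_id, dois in citing_dois_by_dandiset.items():
        badge = citation_badge(len(set(dois)))  # count each citing DOI once
        (out / f"{dandiset_id}.json").write_text(json.dumps(badge))
```

Keeping the output in separate files leaves the dandiset metadata record untouched, which matches the concern above.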

@yarikoptic
Member

Note that this also loosely relates to @magland's annotations -- we might want to post a banner pointing to the list of annotations for NWB files in the dandiset. It also relates to notebooks etc., i.e. how should we build services that provide extra linkages which we do not want to become part of metadata records?

@bendichter
Member Author

This is great, @yarikoptic! It looks like this could work well for automatically gathering citation information.

@rly

rly commented Mar 20, 2024

I very much support this idea. This feature would allow us to notify dataset owners when their data is reused, create a data reuse score for researchers like an h-index that can be used in performance evaluations / career advancement, show funders that standards and archives can generate new science and methods, and generally foster a culture of data sharing and reuse.

We might want to be able to manually add citation information for examples like this, where high-profile papers use Dandisets but do not cite them in a way that our system will be able to detect.

I found many examples of such when searching for data reuse examples of dandisets (ad hoc listing here). Data are often not cited in the References section but in the Data Availability section, and I think DataCite / CrossRef does not pick those up. (editors need to do better at addressing this!) I also found DataCite to be more effective at finding examples than CrossRef.

Some general heuristics that I used were to search for "dandi", "nwb", "dandiarchive.org", "neurophysiology data available" and related terms on Google Scholar.

I think LLMs are well-suited to help solve this problem, assuming papers can be scraped from pubmed/biorxiv/elsewhere (maybe using NeuroQuery?). The LLM could 1) detect that a DANDI dataset has been used and 2) distinguish between primary use, secondary use, and just referencing (maybe it could give a general score that a human can go in and review afterward).
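
Before any LLM gets involved, step 1 (detecting that a paper mentions a dandiset at all) can be approximated with plain pattern matching on the paper text. The sketch below looks for the DANDI DOI prefix used in this thread and for landing-page URLs. The URL pattern and the function name are my assumptions, and this says nothing about step 2 (primary use vs. reuse vs. passing mention).

```python
import re

# Patterns that capture the six-digit dandiset ID.
DANDI_PATTERNS = [
    re.compile(r"10\.48324/dandi\.(\d{6})", re.IGNORECASE),  # DOI form
    re.compile(r"dandiarchive\.org/dandiset/(\d{6})", re.IGNORECASE),  # assumed URL form
]


def mentioned_dandisets(text):
    """Return the sorted dandiset IDs mentioned by DOI or URL in the text."""
    ids = set()
    for pattern in DANDI_PATTERNS:
        ids.update(pattern.findall(text))
    return sorted(ids)
```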

Some related efforts:

@yarikoptic
Member

Great info @rly -- thanks!!

  • we should avoid duplication of effort!
  • joined (subscribed to mailing list) makedatacount
  • datacite is part of that effort
  • we might want to join effort with openneuro (attn @effigies @poldrack @)
  • I feel like we need
    • a tool which given a list of lists of dois (dois per dataset) gathers "official" references (from datacite)
    • supplements with possible "auto discoveries" from some mining
    • possibly interface our data access logs (e.g. our current effort to summarize per dataset from S3 logs) to datacite's usage gathering
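
A rough sketch of the first two bullets of that tool, assuming both sources have already been reduced to a mapping from dataset ID to citing DOIs. The function name and the "datacite" / "mined" labels are placeholders. The one design choice shown is that an "official" DataCite hit wins over a mined one for the same paper.

```python
def merge_citation_sources(official, mined):
    """Combine official and mined citations, keeping track of provenance.

    Both arguments map a dataset ID to an iterable of citing DOIs.
    Returns {dataset_id: {citing_doi: source_label}}.
    """
    merged = {}
    # Mined results go in first so that the official pass can override them.
    for label, source in (("mined", mined), ("datacite", official)):
        for dataset_id, dois in source.items():
            bucket = merged.setdefault(dataset_id, {})
            for doi in dois:
                bucket[doi.lower()] = label
    return merged
```

Keeping the source label makes it possible to display only official citations while still queueing mined candidates for human review.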

@bendichter
Member Author

Joining efforts on this sounds great! @effigies and @poldrack, are you aware of any work along these lines?

@effigies

This is on our roadmap, but I do not believe we have started on this. I briefly looked into the DataCite API, but I didn't get as far as Yarik did. @nellh or @rwblair may have, so pinging them.

We have previously tasked @jbwexler with finding reuses and citations. I believe this was mostly scraping search engine results, but he might have thoughts here.

@jbwexler

jbwexler commented Mar 21, 2024

Agreed this would be a great feature to add for both Dandi and ON. I unfortunately don't have too much to add. My approach was basically a semi-automated version of:

  1. Search google scholar for 'OpenNeuro'
  2. Within the text of the results, find any word matching 'ds' followed by 6 numbers
  3. Read a few sentences before and after each match to see if the word is actually referring to an ON dataset and whether it was actually re-used or if it was just mentioned for some other reason. Occasionally it was necessary to skim the paper as a whole to get the context.

The first two steps could of course be easily automated. If we skip the third to avoid the labor cost, that would leave us with a list of "papers that might mention this dataset". That seems potentially useful but probably too much room for error for something akin to an h-index.
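
Step 2, plus the "few sentences before and after" part of step 3, might look roughly like this once the full text of a search result is in hand. The pattern is the 'ds' plus six digits rule described above. The function name and the character-based context window (rather than real sentence splitting) are simplifications I am assuming here.

```python
import re

# 'ds' followed by exactly six digits, as a standalone word
ACCESSION = re.compile(r"\bds\d{6}\b")


def accession_contexts(text, window=200):
    """Return (accession, surrounding_text) for each match in the text.

    `window` is the number of characters kept on each side of the match,
    so a reviewer (human or LLM) can judge how the dataset was used.
    """
    hits = []
    for match in ACCESSION.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        hits.append((match.group(0), text[start:end]))
    return hits
```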

I like the LLM idea to do step 3. That would be fun to try to get that working.

@zchandler

+1 for working with DataCite. There is emerging work happening related to the Global Data Citation Corpus that could be helpful here, and your use case for Dandiset might be an interesting test of what they have already. Because only citations that happen in the References section of articles are counted in the Crossref/DataCite shared EventsDB, they are working with CZI on applying AI (named entity recognition) to the scholarly record (PubMed?) to pull out unstructured data into something that makes sense.
There is also separate work happening within the RRID ecosystem that you might want to consider. Following citation to its logical conclusion, it's only valuable if we can find out later who cited what, etc., so there are two knowledge graph projects to check for fit: the DataCite PIDgraph and OpenAlex.
So 2 things:

  1. if you want to cite data in your publications correctly please put that in the References section of your paper :)
  2. because very few people actually do that, the Global Data Citation Corpus (DataCite) is probably our best shot right now. (contact: Iratxe Puebla)

@bendichter
Member Author

I suppose I can take up the baton here. In Python, for mere mortals:

import pandas as pd
import requests
from dateutil import parser
from tqdm import tqdm


# Get all published dandiset IDs
dandi_api_url = "https://api.dandiarchive.org/api/dandisets/"
params = {
    "page_size": 1000,
    "empty": "false",
    "draft": "false",
    "embargoed": "false",
}
headers = {"accept": "application/json"}

# Fetch the list of published dandisets
response = requests.get(dandi_api_url, headers=headers, params=params)
response.raise_for_status()  # Check for HTTP request errors
published_dandisets = response.json()

published_dandisets_ids = [x["identifier"] for x in published_dandisets["results"]]


# Get all published (non-draft) versions of each dandiset
all_versions = {}

for id_ in tqdm(published_dandisets_ids, desc="get dandiset versions"):
    versions_url = f"https://api.dandiarchive.org/api/dandisets/{id_}/versions"

    response = requests.get(versions_url, headers=headers, params={"page_size": 1000})
    response.raise_for_status()
    versions = response.json()
    all_versions[id_] = [x["version"] for x in versions["results"] if x["version"] != "draft"]

# Iterate over each version of each dandiset and fetch citation events from DataCite
results = []

for identifier, versions in tqdm(all_versions.items(), desc="get citations"):
    for version in versions:
        datacite_url = f"https://api.datacite.org/events?doi=10.48324/dandi.{identifier}/{version}"
        citation_response = requests.get(datacite_url)
        citation_response.raise_for_status()
        citation_data = citation_response.json()

        for x in citation_data["data"]:
            if "dandi" in x["attributes"]["subj-id"]:
                continue  # exclude citations from other dandisets
            results.append(
                dict(
                    dandiset_id=identifier,
                    doi=x["attributes"]["subj-id"],
                    timestamp=parser.parse(x["attributes"]["timestamp"]),
                )
            )

df = pd.DataFrame(results)
df
   dandiset_id  doi                                         timestamp
0  000055       https://doi.org/10.1038/s41597-022-01280-y  2022-04-23 03:38:18.173000+00:00
1  000207       https://doi.org/10.7554/elife.85786.3       2023-10-27 08:55:39.876000+00:00
2  000231       https://doi.org/10.1038/s41597-022-01728-1  2022-10-14 08:55:30.912000+00:00
3  000235       https://doi.org/10.7554/elife.83289         2023-10-26 08:55:31.168000+00:00
4  000236       https://doi.org/10.7554/elife.83289         2023-10-26 08:55:31.206000+00:00
5  000237       https://doi.org/10.7554/elife.83289         2023-10-27 08:55:07.770000+00:00
6  000238       https://doi.org/10.7554/elife.83289         2023-10-26 08:55:31.253000+00:00
7  000301       https://doi.org/10.1038/s41467-023-41755-z  2023-10-09 08:55:20.707000+00:00
8  000630       https://doi.org/10.1126/science.adf0805     2023-10-27 08:55:18.773000+00:00
9  000678       https://doi.org/10.5281/zenodo.8408660      2023-12-08 22:01:36.955000+00:00
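
Since one paper can show up against several dandisets (the same eLife DOI appears for 000235 through 000238 above), it may help to also view the result grouped by paper. A small sketch, assuming rows shaped like the `results` list built above. The function name is made up.

```python
from collections import defaultdict


def dandisets_by_paper(rows):
    """Group result rows by citing DOI.

    Each row is a dict with at least 'dandiset_id' and 'doi' keys.
    Returns {doi: sorted list of dandiset IDs it is linked to}.
    """
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["doi"]].add(row["dandiset_id"])
    return {doi: sorted(ids) for doi, ids in grouped.items()}
```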

@poldrack

This is great! Though many people don't actually cite the DOI, which is why @jbwexler had to resort to the manual approach.

@bendichter
Member Author

Yes, I also see a lot of references to unpublished dandisets that don't have DOIs so they don't show up here. Still, it's nice to get what we can from the fully automated approach. This might work better for ON.

@poldrack

definitely!

@yarikoptic
Member

Yes, I also see a lot of references to unpublished dandisets that don't have DOIs so they don't show up here. Still, it's nice to get what we can from the fully automated approach. This might work better for ON.

A systematic (looking "forward") solution IMHO would be to provide DOIs for draft dandisets too. Related:

@yarikoptic
Member

FWIW, in a chat with ChatGPT about a different but very similar need, it pointed me to https://opencitations.net/ and their API.

Here is the script it gave me, which takes a DOI and lists the works that cite it:
#!/usr/bin/env python3

import requests
import os
import sys
import json
from platformdirs import user_cache_dir
import bibtexparser

CACHE_DIR = user_cache_dir("citing_works")
os.makedirs(CACHE_DIR, exist_ok=True)

def fetch_and_cache_doi_info(doi):
    """
    Fetches the citation record for a DOI and caches the result.
    """
    cache_file = os.path.join(CACHE_DIR, f"{doi.replace('/', '_')}.json")
    if os.path.exists(cache_file):
        with open(cache_file, "r") as f:
            return json.load(f)

    bibtex_url = f"https://doi.org/{doi}"
    headers = {"Accept": "application/x-bibtex"}
    try:
        response = requests.get(bibtex_url, headers=headers)
        response.raise_for_status()
        bibtex_entry = response.text
        # Parse BibTeX
        bib_database = bibtexparser.loads(bibtex_entry)
        if bib_database.entries:
            entry = bib_database.entries[0]
            result = {
                "title": entry.get("title", "No Title"),
                "year": entry.get("year", "No Year"),
                "doi": doi,
            }
            with open(cache_file, "w") as f:
                json.dump(result, f)
            return result
        else:
            print(f"No valid BibTeX found for DOI: {doi}")
            return {"title": "No Title", "year": "No Year", "doi": doi}
    except requests.exceptions.RequestException as e:
        print(f"Error fetching citation data for DOI {doi}: {e}")
        return {"title": "No Title", "year": "No Year", "doi": doi}

def get_citing_works(doi):
    """
    Fetches works citing the given DOI using the OpenCitations API.
    """
    base_url = "https://opencitations.net/index/coci/api/v1/citations/"
    headers = {"Accept": "application/json"}
    
    try:
        response = requests.get(f"{base_url}{doi}", headers=headers)
        response.raise_for_status()
        data = response.json()
        if data:
            return data
        else:
            print("No citing works found for this DOI.")
            return []
    except requests.exceptions.RequestException as e:
        print(f"Error fetching citing works: {e}")
        return []

def main():
    """
    Main entry point for the script.
    """
    if len(sys.argv) != 2:
        print(f"Usage: python {sys.argv[0]} <DOI>")
        sys.exit(1)
    
    doi = sys.argv[1].strip()
    if doi.startswith("https://doi.org/"):
        doi = doi.replace("https://doi.org/", "")
    
    print(f"Fetching works citing DOI: {doi}")
    citing_works = get_citing_works(doi)
    
    if citing_works:
        print(f"\nFound {len(citing_works)} citing works:\n")
        for i, work in enumerate(citing_works, start=1):
            citing_doi = work.get("citing", "No DOI")
            if citing_doi != "No DOI":
                record = fetch_and_cache_doi_info(citing_doi)
                title = record.get("title", "No Title")
                year = record.get("year", "No Year")
                print(f"{i}. Title: {title}, Year: {year}, DOI: {citing_doi}")
    else:
        print("No citing works found.")

if __name__ == "__main__":
    main()
Paired with this ugly bash script:
#!/bin/bash

# cd /tmp
curl -X 'GET' \
  'https://api.dandiarchive.org/api/dandisets/?page_size=1000&draft=false&empty=false&embargoed=false' \
  -H 'accept: application/json' | jq . > published_dandisets.json

mkdir -p citations

# Read the listing from the same file that curl wrote above.
jq -r '.results[] | "\(.identifier) \(.most_recent_published_version.version)"' < published_dandisets.json \
| while read -r id version; do
    works=$(python citeref_publications2.py "10.48324/dandi.$id/$version")
    if ! echo "$works" | grep -q "No citing"; then
        echo "$works"
    fi
done
it gives us this listing:
Fetching works citing DOI: 10.48324/dandi.000055/0.220127.0436

Found 1 citing works:

1. Title: AJILE12: Long-term naturalistic human intracranial neural recordings and pose, Year: 2022, DOI: 10.1038/s41597-022-01280-y
Fetching works citing DOI: 10.48324/dandi.000140/0.220113.0408

Found 1 citing works:

1. Title: A spiking neural network with continuous local learning for robust online brain machine interface, Year: 2023, DOI: 10.1088/1741-2552/ad1787
Fetching works citing DOI: 10.48324/dandi.000301/0.230806.0034

Found 1 citing works:

1. Title: Neural mechanisms for the localization of unexpected external motion, Year: 2023, DOI: 10.1038/s41467-023-41755-z
Fetching works citing DOI: 10.48324/dandi.000488/0.230602.2022

Found 1 citing works:

1. Title: Differential encoding of temporal context and expectation under representational drift across hierarchically connected areas, Year: 2023, DOI: 10.1101/2023.06.02.543483

So this seems to be significantly fewer than what we get from DataCite.
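
To see how much the two sources actually overlap, one could compare the citing DOIs from the DataCite table earlier in this thread with the OpenCitations listing above. A rough sketch, with the DOIs copied by hand from those two outputs; `normalize_doi` is a hypothetical helper. Note that the two runs are not strictly comparable, because the DataCite loop queried every version while the bash script only queried the most recent published version:

```python
def normalize_doi(doi):
    """Lowercase a DOI and drop a leading https://doi.org/ resolver prefix."""
    doi = doi.strip().lower()
    prefix = "https://doi.org/"
    return doi[len(prefix):] if doi.startswith(prefix) else doi


# Citing DOIs copied by hand from the DataCite table earlier in the thread.
datacite = {
    normalize_doi(d)
    for d in [
        "https://doi.org/10.1038/s41597-022-01280-y",
        "https://doi.org/10.7554/elife.85786.3",
        "https://doi.org/10.1038/s41597-022-01728-1",
        "https://doi.org/10.7554/elife.83289",
        "https://doi.org/10.1038/s41467-023-41755-z",
        "https://doi.org/10.1126/science.adf0805",
        "https://doi.org/10.5281/zenodo.8408660",
    ]
}

# Citing DOIs copied by hand from the OpenCitations listing above.
opencitations = {
    normalize_doi(d)
    for d in [
        "10.1038/s41597-022-01280-y",
        "10.1088/1741-2552/ad1787",
        "10.1038/s41467-023-41755-z",
        "10.1101/2023.06.02.543483",
    ]
}

both = datacite & opencitations
only_datacite = datacite - opencitations
only_opencitations = opencitations - datacite

print(f"in both: {sorted(both)}")
print(f"only DataCite: {sorted(only_datacite)}")
print(f"only OpenCitations: {sorted(only_opencitations)}")
```

On these inputs two citing DOIs appear in both sources, five only in DataCite, and two only in OpenCitations, so taking the union of both sources would give the fullest list.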

@yarikoptic
Copy link
Member

@jgrethe shared a pointer to their scripts for SPARC, https://github.com/SciCrunch/SPARC-Citations, which were used to produce the quite comprehensive https://github.com/SciCrunch/SPARC-Citations/blob/main/dataset_data_citations.tsv.
