diff --git a/DESCRIPTION b/DESCRIPTION index 64ed91b8..cf5960a2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -27,7 +27,10 @@ Suggests: testthat, httr, magrittr, readr, - xml2 + xml2, + dplyr, + purrr, + printr VignetteBuilder: knitr keywords: metadata, codemeta, ropensci, citation, credit affiliation: https://ropensci.org diff --git a/README.md b/README.md index 18545b8e..8a93c95f 100644 --- a/README.md +++ b/README.md @@ -185,6 +185,39 @@ write_codemeta(".") "name": "Central R Archive Network (CRAN)", "url": "https://cran.r-project.org" } + }, + { + "@type": "SoftwareApplication", + "identifier": "dplyr", + "name": "dplyr", + "provider": { + "@id": "https://cran.r-project.org", + "@type": "Organization", + "name": "Central R Archive Network (CRAN)", + "url": "https://cran.r-project.org" + } + }, + { + "@type": "SoftwareApplication", + "identifier": "purrr", + "name": "purrr", + "provider": { + "@id": "https://cran.r-project.org", + "@type": "Organization", + "name": "Central R Archive Network (CRAN)", + "url": "https://cran.r-project.org" + } + }, + { + "@type": "SoftwareApplication", + "identifier": "printr", + "name": "printr", + "provider": { + "@id": "https://cran.r-project.org", + "@type": "Organization", + "name": "Central R Archive Network (CRAN)", + "url": "https://cran.r-project.org" + } } ], "softwareRequirements": [ @@ -261,10 +294,10 @@ write_codemeta(".") "keywords": ["metadata", "codemeta", "ropensci", "citation", "credit"], "relatedLink": "https://codemeta.github.io/codemetar", "contIntegration": "https://travis-ci.org/codemeta/codemetar", - "developmentStatus": "wip", + "developmentStatus": "active", "releaseNotes": "https://github.com/codemeta/codemetar/blob/master/NEWS.md", "readme": "https://github.com/codemeta/codemetar/blob/master/README.md", - "fileSize": "366.241KB" + "fileSize": "397.768KB" } Modifying or enriching CodeMeta metadata diff --git a/codemeta.json b/codemeta.json index 501aaa4d..ed63174f 100644 --- a/codemeta.json +++ 
b/codemeta.json @@ -147,6 +147,39 @@ "name": "Central R Archive Network (CRAN)", "url": "https://cran.r-project.org" } + }, + { + "@type": "SoftwareApplication", + "identifier": "dplyr", + "name": "dplyr", + "provider": { + "@id": "https://cran.r-project.org", + "@type": "Organization", + "name": "Central R Archive Network (CRAN)", + "url": "https://cran.r-project.org" + } + }, + { + "@type": "SoftwareApplication", + "identifier": "purrr", + "name": "purrr", + "provider": { + "@id": "https://cran.r-project.org", + "@type": "Organization", + "name": "Central R Archive Network (CRAN)", + "url": "https://cran.r-project.org" + } + }, + { + "@type": "SoftwareApplication", + "identifier": "printr", + "name": "printr", + "provider": { + "@id": "https://cran.r-project.org", + "@type": "Organization", + "name": "Central R Archive Network (CRAN)", + "url": "https://cran.r-project.org" + } } ], "softwareRequirements": [ @@ -223,8 +256,8 @@ "keywords": ["metadata", "codemeta", "ropensci", "citation", "credit"], "relatedLink": "https://codemeta.github.io/codemetar", "contIntegration": "https://travis-ci.org/codemeta/codemetar", - "developmentStatus": "wip", + "developmentStatus": "active", "releaseNotes": "https://github.com/codemeta/codemetar/blob/master/NEWS.md", "readme": "https://github.com/codemeta/codemetar/blob/master/README.md", - "fileSize": "366.241KB" + "fileSize": "397.768KB" } diff --git a/docs/LICENSE.html b/docs/LICENSE.html index fb08ed29..4772dc6b 100644 --- a/docs/LICENSE.html +++ b/docs/LICENSE.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/articles/codemeta-intro.html b/docs/articles/codemeta-intro.html index 2c04e4e2..ef4d0b19 100644 --- a/docs/articles/codemeta-intro.html +++ b/docs/articles/codemeta-intro.html @@ -47,6 +47,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/articles/codemeta-parsing.html b/docs/articles/codemeta-parsing.html new file mode 100644 index 00000000..d73014d2 --- /dev/null +++ b/docs/articles/codemeta-parsing.html @@ -0,0 +1,2716 @@ + + + + + + + +Parsing CodeMeta Data • codemetar + + + + + + +

    Here we illustrate some example use cases that involve parsing codemeta data.

    +
    library(jsonld)
    +library(jsonlite)
    +library(magrittr)
    +library(codemetar)
    +library(purrr)
    +library(dplyr)
    +library(printr)
    +

    We start with a simple example from the codemeta.json file of codemetar itself. First, we’ll just generate a copy of the codemeta record for the package:

    +
    write_codemeta("codemetar", "codemeta.json")
    +

We then digest this input using a JSON-LD “frame.” While not strictly necessary, this helps ensure the data matches the format we expect, even if the original file had errors or missing data (see the vignette “Validating in JSON-LD” in this package and the official JSON-LD docs for details). The codemetar package includes a reasonably explicit frame to get us started:

    +
    frame <- system.file("schema/frame_schema.json", package="codemetar")
    +
    +meta <- 
    +  jsonld_frame("codemeta.json", frame) %>%
    +  fromJSON(FALSE) %>% getElement("@graph") %>% getElement(1)
    +
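As a quick sanity check (a sketch; which fields are present depends on the input file), the framed record is now an ordinary R list and can be inspected directly:

```r
# A few top-level fields from the framed record
meta$name
meta$version
names(meta)
```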

    Construct a citation

    +
    authors <- 
    +lapply(meta$author, 
    +       function(author) 
    +         person(given = author$given, 
    +                family = author$family, 
    +                email = author$email,
    +                role = "aut"))
    +year <- meta$datePublished
    +if(is.null(year)) 
    +  year <- format(Sys.Date(), "%Y")
    +bibitem <- 
    + bibentry(
    +     bibtype = "Manual",
    +     title = meta$name,
    +     author = authors,
    +     year = year,
    +     note = paste0("R package version ", meta$version),
    +     url = meta$URL,
    +     key = meta$identifier
    +   )
    +
    Warning in bibentry(bibtype = "Manual", title = meta$name, author =
    +authors, : Not all arguments are of the same length, the following need to
    +be recycled: author
    +
    cat(format(bibitem, "bibtex"))
    +
    @Manual{codemetar,
    +  title = {codemetar: Generate CodeMeta Metadata for R Packages},
    +  year = {2017},
    +  note = {R package version 0.1.0},
    +}
    +
    bibitem
    +
    (2017). _codemetar: Generate CodeMeta Metadata for R Packages_. R
    +package version 0.1.0.
    +
    +
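Since `bibentry` objects format cleanly as BibTeX, one possible follow-up (a sketch; the output filename is purely illustrative) is to save the generated entry for reuse in a reference manager:

```r
# Write the generated BibTeX entry to a file (filename is illustrative)
writeLines(format(bibitem, "bibtex"), "codemetar-citation.bib")
```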

    +Parsing the ropensci corpus

    +

The ropensci corpus consists of a list of codemeta files for all packages provided by the rOpenSci project, <ropensci.org>. This provides a good test case for how a large collection of codemeta files can be manipulated to help us get a better picture of the corpus.

    +
    download.file("https://github.com/codemeta/codemetar/raw/master/inst/notebook/ropensci.json",
    +              "ropensci.json")
    +

    As before, it is helpful, though not essential, to start off by framing the input data.

    +
    frame <- system.file("schema/frame_schema.json", package="codemetar")
    +
    +corpus <- 
    +    jsonld_frame("ropensci.json", frame) %>%
    +    fromJSON(simplifyVector = FALSE) %>%
    +    getElement("@graph") 
    +

    We’re now ready to start exploring. As usual, functions from purrr prove very useful for iterating through large JSON files. First, we look at some basic summary data:

    +
    ## deal with nulls explicitly by starting with map
    +pkgs <- map(corpus, "name") %>% compact() %>% as.character()
    +
    +# keep only those with package identifiers (names)
    +keep <- map_lgl(corpus, ~ length(.x$identifier) > 0)
    +corpus <- corpus[keep]
    +
    +## now we can just do
    +all_pkgs <- map_chr(corpus, "name")
    +head(all_pkgs)
    +
    [1] "AntWeb: programmatic interface to the AntWeb"                                
    +[2] "aRxiv: Interface to the arXiv API"                                           
    +[3] "chromer: Interface to Chromosome Counts Database API"                        
    +[4] "ckanr: Client for the Comprehensive Knowledge Archive Network ('CKAN') 'API'"
    +[5] "dashboard: A package status dashboard"                                       
    +[6] "ggit: Git Graphics"                                                          
    +
## 61 unique maintainers
    +map_chr(corpus, c("maintainer", "familyName")) %>% unique() %>% length()
    +
    [1] 61
    +
    ## Mostly Scott
    +map_chr(corpus, c("maintainer", "familyName")) %>% 
    +  as_tibble() %>%
    +  group_by(value) %>%
    +  tally(sort=TRUE)
value | n
Chamberlain | 105
Ooms | 12
Mullen | 8
Ram | 8
Boettiger | 6
Salmon | 5
FitzJohn | 4
Hart | 2
Leeper | 2
Marwick | 2
Müller | 2
Padgham | 2
South | 2
Varela | 2
Vitolo | 2
Arnold | 1
Attali | 1
Banbury | 1
Becker | 1
Bengtsson | 1
Braginsky | 1
Broman | 1
Bryan | 1
Dallas | 1
de Queiroz | 1
Drost | 1
Fischetti | 1
Ghahraman | 1
Goring | 1
hackathoners | 1
Harrison | 1
Hughes | 1
Jahn | 1
Jones | 1
Keyes | 1
Krah | 1
Lehtomaki | 1
Lovelace | 1
Lundstrom | 1
McGlinn | 1
McVey | 1
Meissner | 1
Michonneau | 1
Moroz | 1
Otegui | 1
Pardo | 1
Pennell | 1
Poelen | 1
Robinson | 1
Ross | 1
Rowlingson | 1
Scott | 1
Seers | 1
Shotwell | 1
Sievert | 1
Sparks | 1
Stachelek | 1
Szöcs | 1
Widgren | 1
Wiggin | 1
Winter | 1
    +
    ## number of co-authors ... 
    +map_int(corpus, function(r) length(r$author)) %>% 
    +  as_tibble() %>%
    +  group_by(value) %>%
    +  tally(sort=TRUE)
value | n
1 | 146
2 | 30
3 | 17
4 | 8
5 | 5
7 | 3
13 | 1
    +
## Contributors aren't used as much...
    +map_int(corpus, function(r) length(r$contributor)) %>% 
    +  as_tibble() %>%
    +  group_by(value) %>%
    +  tally(sort=TRUE)
value | n
0 | 178
2 | 13
4 | 9
3 | 7
5 | 1
6 | 1
8 | 1
    +

    Numbers (n) of packages with a total of (value) dependencies:

    +
    map_int(corpus, function(r) length(r$softwareRequirements))  %>% 
    +  as_tibble() %>%
    +  group_by(value) %>%
    +  tally(sort=TRUE)
value | n
4 | 39
5 | 35
2 | 25
3 | 25
7 | 19
6 | 16
8 | 13
9 | 8
12 | 7
10 | 6
11 | 6
13 | 3
0 | 2
14 | 1
17 | 1
18 | 1
21 | 1
22 | 1
23 | 1
    +

Which dependencies are used most frequently?

    +
    corpus %>%
    +map_df(function(x){
    +  ## single, unboxed dep
    +  if("name" %in% names(x$softwareRequirements))
    +    dep <- x$name
    +  else if("name" %in% names(x$softwareRequirements[[1]]))
    +    dep <- map_chr(x$softwareRequirements, "name")
+  else { ## No requirements
    +    dep <- NA
    +  }
    +  
    +  tibble(identifier = x$identifier, dep = dep)
    +}) -> dep_df
    +
    +
    +dep_df %>%
    +group_by(dep) %>% 
    +  tally(sort = TRUE)
dep | n
jsonlite | 99
httr | 92
R | 66
tibble | 46
dplyr | 43
methods | 37
xml2 | 37
data.table | 35
utils | 35
crul | 31
plyr | 29
XML | 25
magrittr | 24
sp | 22
stringr | 21
curl | 18
ggplot2 | 18
lazyeval | 17
stats | 17
lubridate | 14
R6 | 14
rappdirs | 13
assertthat | 12
digest | 12
RCurl | 12
readr | 11
rgdal | 10
whisker | 10
scales | 9
ape | 8
raster | 8
tidyr | 8
Rcpp | 7
reshape2 | 7
rvest | 7
rgeos | 6
V8 | 6
hoardr | 5
rjson | 5
taxize | 5
tools | 5
git2r | 4
maps | 4
oai | 4
openssl | 4
R (>= 3.2.1) | 4
solrium | 4
urltools | 4
foreach | 3
knitr | 3
leaflet | 3
maptools | 3
memoise | 3
mime | 3
pdftools | 3
purrr | 3
RColorBrewer | 3
rgbif | 3
rmarkdown | 3
shiny | 3
spocc | 3
stringi | 3
uuid | 3
wicket | 3
yaml | 3
base64enc | 2
bibtex | 2
Biostrings | 2
crayon | 2
devtools | 2
downloader | 2
fauxpas | 2
gdata | 2
gistr | 2
graphics | 2
grid | 2
htmltools | 2
htmlwidgets | 2
httpcode | 2
igraph | 2
jqr | 2
MASS | 2
miniUI | 2
ncdf4 | 2
png | 2
R.cache | 2
R.utils | 2
rcrossref | 2
rentrez | 2
reshape | 2
rmapshaper | 2
rplos | 2
rvertnet | 2
shinyjs | 2
storr | 2
tm | 2
NA | 2
analogue | 1
antiword: Extract Text from Microsoft Word Documents | 1
apipkgen: Package Generator for HTTP API Wrapper Packages | 1
appl: Approximate POMDP Planning Software | 1
aRxiv | 1
binman | 1
Biobase | 1
BiocGenerics | 1
biomaRt | 1
bold | 1
caTools | 1
ckanr | 1
cld2: Google’s Compact Language Detector 2 | 1
countrycode | 1
cranlogs | 1
crminer | 1
crosstalk | 1
DBI | 1
dirdf: Extracts Metadata from Directory and File Names | 1
doParallel | 1
DT (>= 0.1) | 1
elastic | 1
EML | 1
fastmatch | 1
foreign | 1
functionMap | 1
genderdata: Historical Datasets for Predicting Gender from Names | 1
GenomeInfoDb | 1
GenomicFeatures | 1
GenomicRanges (>= 1.23.24) | 1
geoaxe | 1
geojson | 1
geojsonrewind: Fix ‘GeoJSON’ Winding Direction | 1
geonames | 1
geoops: ‘GeoJSON’ Manipulation Operations | 1
geosphere | 1
getPass | 1
ggm | 1
ggmap | 1
ggthemes | 1
graphql | 1
grDevices | 1
gridExtra | 1
gtools | 1
hash | 1
hexbin | 1
historydata: Data Sets for Historians | 1
Hmisc | 1
httpuv | 1
IRanges | 1
isdparser | 1
jsonvalidate | 1
jsonvalidate: Validate ‘JSON’ | 1
leafletR | 1
loggr | 1
mapproj | 1
markdown | 1
Matrix | 1
memisc | 1
miniUI (>= 0.1.1) | 1
nabor | 1
natserv | 1
openxlsx | 1
osmar | 1
outliers | 1
pdftools: Text Extraction and Rendering of PDF Documents | 1
phytools | 1
plotly | 1
plumber | 1
progress | 1
protolite | 1
qlcMatrix | 1
RApiSerialize | 1
rapport | 1
rbhl | 1
rbison | 1
rebird | 1
redland | 1
redux | 1
remotes | 1
ridigbio | 1
ritis | 1
rJava | 1
RJSONIO | 1
rlist | 1
Rmpfr | 1
RMySQL | 1
rncl | 1
rnoaa | 1
rnrfa | 1
rotl | 1
rowr | 1
RPostgreSQL | 1
rredis | 1
rredlist | 1
RSQLite | 1
rstudioapi (>= 0.5) | 1
rtracklayer | 1
rworldmap | 1
rzmq: R Bindings for ZeroMQ | 1
S4Vectors | 1
scrapeR | 1
selectr | 1
sf | 1
shiny (>= 0.13.2) | 1
snow | 1
SnowballC | 1
spatstat | 1
SSOAP | 1
stringdist | 1
sys | 1
tabulizerjars | 1
testthat | 1
tif: Text Interchange Format | 1
USAboundariesData: Datasets for the ‘USAboundaries’ package | 1
VariantAnnotation | 1
viridisLite | 1
wdman (>= 0.2.2) | 1
wellknown | 1
wicket: Utilities to Handle WKT Spatial Data | 1
WikidataR | 1
wikitaxa | 1
withr | 1
worrms | 1
xslt: XSLT 1.0 Transformations | 1
zoo | 1
    +
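With `dep_df` in hand, the same tidy tools can answer the inverse question, e.g. which corpus packages declare a given dependency (the package name here is just an illustration):

```r
# Packages in the corpus that declare httr as a requirement
dep_df %>%
  filter(dep == "httr") %>%
  pull(identifier)
```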

An alternative approach uses a frame instead of purrr functions to subset the data. Note that this gets all Depends and Suggests (really, all SoftwareApplication types mentioned).

    +
    dep_frame <- '{
    +  "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
    +  "@explicit": "true",
    +  "name": {}
    +}'
    +jsonld_frame("ropensci.json", dep_frame) %>% 
    +  fromJSON() %>% 
    +  getElement("@graph") %>%
    +  filter(type == "SoftwareApplication") %>%
    +  group_by(name) %>% 
    +  tally(sort = TRUE)
name | n
testthat | 168
knitr | 122
jsonlite | 105
httr | 96
roxygen2 | 92
R | 72
rmarkdown | 68
covr | 52
dplyr | 49
tibble | 48
xml2 | 41
methods | 38
utils | 37
data.table | 36
ggplot2 | 36
crul | 33
plyr | 32
magrittr | 28
sp | 26
XML | 25
curl | 21
stringr | 21
lazyeval | 18
stats | 18
lubridate | 16
R6 | 14
readr | 14
rgdal | 14
rappdirs | 13
assertthat | 12
devtools | 12
digest | 12
raster | 12
RCurl | 12
scales | 12
Rcpp | 11
whisker | 11
leaflet | 10
rgeos | 10
taxize | 10
tidyr | 10
reshape2 | 9
ape | 8
maps | 8
V8 | 8
maptools | 7
purrr | 7
rvest | 7
pdftools | 6
rgbif | 6
shiny | 6
ggmap | 5
git2r | 5
hoardr | 5
ncdf4 | 5
png | 5
rjson | 5
tools | 5
oai | 4
openssl | 4
R (>= 3.2.1) | 4
rcrossref | 4
RSQLite | 4
sf | 4
solrium | 4
urltools | 4
uuid | 4
yaml | 4
DBI | 3
fauxpas | 3
foreach | 3
gdata | 3
gistr | 3
graphics | 3
lintr | 3
MASS | 3
memoise | 3
mime | 3
miniUI | 3
R.utils | 3
RColorBrewer | 3
rentrez | 3
rmapshaper | 3
rvertnet | 3
rworldmap | 3
spocc | 3
stringi | 3
wicket | 3
base64enc | 2
bibtex | 2
Biostrings | 2
broom | 2
crayon | 2
downloader | 2
elastic | 2
geiger | 2
getPass | 2
GGally | 2
ggthemes | 2
grDevices | 2
grid | 2
gridExtra | 2
htmltools | 2
htmlwidgets | 2
httpcode | 2
igraph | 2
jqr | 2
jsonvalidate | 2
listviewer | 2
mapproj | 2
Matrix | 2
phylobase | 2
phytools | 2
R.cache | 2
RcppRedis | 2
readxl | 2
remotes | 2
reshape | 2
rplos | 2
shinyjs | 2
storr | 2
sys | 2
tm | 2
viridis | 2
webp | 2
zoo | 2
akima | 1
analogue | 1
aRxiv | 1
binman | 1
Biobase | 1
BiocGenerics | 1
biomaRt | 1
bold | 1
Cairo | 1
caTools | 1
ckanr | 1
corrplot | 1
countrycode | 1
cranlogs | 1
crminer | 1
crosstalk | 1
dendextend | 1
doParallel | 1
dplyr (>= 0.3.0.2) | 1
DT (>= 0.1) | 1
EML | 1
etseed | 1
fastmatch | 1
fields | 1
forecast | 1
foreign | 1
fulltext | 1
functionMap | 1
genderdata | 1
GenomeInfoDb | 1
GenomicFeatures | 1
GenomicRanges (>= 1.23.24) | 1
geoaxe | 1
geojson | 1
geojsonio | 1
geojsonlint | 1
geonames | 1
geosphere | 1
ggalt | 1
ggm | 1
graphql | 1
GSODR | 1
gtools | 1
hash | 1
hexbin | 1
historydata | 1
Hmisc | 1
httpuv | 1
IRanges | 1
IRdisplay | 1
isdparser | 1
janeaustenr | 1
jpeg | 1
knitcitations | 1
leafletR | 1
loggr | 1
magick | 1
mapdata | 1
markdown | 1
MCMCglmm | 1
memisc | 1
miniUI (>= 0.1.1) | 1
mongolite | 1
nabor | 1
natserv | 1
openair | 1
openxlsx | 1
osmar | 1
outliers | 1
pander | 1
parallel | 1
plot3D | 1
plotKML | 1
plotly | 1
plumber | 1
progress | 1
protolite | 1
purrrlyr | 1
qlcMatrix | 1
RApiSerialize | 1
rapport | 1
rbhl | 1
rbison | 1
rcdk | 1
Rcompression | 1
readtext | 1
rebird | 1
RedisAPI | 1
redland | 1
redux | 1
reeack | 1
rfigshare | 1
ridigbio | 1
rinat | 1
ritis | 1
rJava | 1
RJSONIO | 1
rlist | 1
Rmpfr | 1
RMySQL | 1
rnaturalearthdata | 1
rnaturalearthhires | 1
rncl | 1
RNeXML | 1
rnoaa | 1
rnrfa | 1
ropenaq | 1
rotl | 1
rowr | 1
RPostgreSQL | 1
rrdf | 1
rredis | 1
rredlist | 1
rrlite | 1
RSclient | 1
RSelenium | 1
Rserve | 1
rstudioapi (>= 0.5) | 1
rsvg | 1
rtracklayer | 1
RUnit | 1
S4Vectors | 1
sangerseqR | 1
scrapeR | 1
selectr | 1
seqinr | 1
shiny (>= 0.13.2) | 1
snow | 1
SnowballC | 1
sofa | 1
spacetime | 1
spatstat | 1
SSOAP | 1
stringdist | 1
Suggests:testthat | 1
Sxslt | 1
tabulizerjars | 1
testthat (>= 0.7) | 1
tidytext | 1
tidyverse | 1
tiff | 1
tmap | 1
USAboundaries | 1
USAboundariesData | 1
VariantAnnotation | 1
vegan | 1
viridisLite | 1
wdman (>= 0.2.2) | 1
weathermetrics | 1
webmockr | 1
webshot | 1
wellknown | 1
WikidataR | 1
wikitaxa | 1
withr | 1
wordcloud2 | 1
worrms | 1
XMLSchema | 1
xtable | 1
xts | 1
    +
    #  summarise(count(name))
    +
    +
    +
    + + + diff --git a/docs/articles/codemeta.json b/docs/articles/codemeta.json deleted file mode 100644 index cae3e1f8..00000000 --- a/docs/articles/codemeta.json +++ /dev/null @@ -1,150 +0,0 @@ -{ - "@context": [ - "https://doi.org/doi:10.5063/schema/codemeta-2.0", - "http://schema.org" - ], - "@type": "SoftwareSourceCode", - "identifier": "testthat", - "description": "A unit testing system designed to be fun, flexible and easy to\n set up.", - "name": "testthat: Unit Testing for R", - "issueTracker": "https://github.com/hadley/testthat/issues", - "datePublished": "2016-04-23 08:37:40", - "license": "https://spdx.org/licenses/MIT", - "version": "1.0.2", - "programmingLanguage": { - "@type": "ComputerLanguage", - "name": "R", - "version": "3.4.0", - "url": "https://r-project.org" - }, - "runtimePlatform": "R version 3.4.0 (2017-04-21)", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - }, - "author": [ - { - "@type": "Person", - "givenName": "Hadley", - "familyName": "Wickham", - "email": "hadley@rstudio.com" - } - ], - "copyrightHolder": [ - { - "@type": "Organization", - "name": "RStudio" - } - ], - "maintainer": { - "@type": "Person", - "givenName": "Hadley", - "familyName": "Wickham", - "email": "hadley@rstudio.com" - }, - "softwareSuggestions": [ - { - "@type": "SoftwareApplication", - "identifier": "devtools", - "name": "devtools", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - } - }, - { - "@type": "SoftwareApplication", - "identifier": "withr", - "name": "withr", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - } - }, - { - "@type": "SoftwareApplication", - "identifier": "covr", - "name": "covr", - 
"provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - } - } - ], - "softwareRequirements": [ - { - "@type": "SoftwareApplication", - "identifier": "digest", - "name": "digest", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - } - }, - { - "@type": "SoftwareApplication", - "identifier": "crayon", - "name": "crayon", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - } - }, - { - "@type": "SoftwareApplication", - "identifier": "praise", - "name": "praise", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - } - }, - { - "@type": "SoftwareApplication", - "identifier": "magrittr", - "name": "magrittr", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - } - }, - { - "@type": "SoftwareApplication", - "identifier": "R6", - "name": "R6", - "provider": { - "@id": "https://cran.r-project.org", - "@type": "Organization", - "name": "Central R Archive Network (CRAN)", - "url": "https://cran.r-project.org" - } - }, - { - "@type": "SoftwareApplication", - "identifier": "methods", - "name": "methods" - }, - { - "@type": "SoftwareApplication", - "identifier": "R", - "name": "R", - "version": "3.1.0" - } - ] -} diff --git a/docs/articles/index.html b/docs/articles/index.html index c8d3ec13..118dd064 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • @@ -110,6 +113,7 @@

    All vignettes

    diff --git a/docs/articles/translating.html b/docs/articles/translating.html index 60054269..9862d251 100644 --- a/docs/articles/translating.html +++ b/docs/articles/translating.html @@ -47,6 +47,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/articles/validation-in-json-ld.html b/docs/articles/validation-in-json-ld.html index fa5bfdd5..98ab93e9 100644 --- a/docs/articles/validation-in-json-ld.html +++ b/docs/articles/validation-in-json-ld.html @@ -47,6 +47,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • @@ -90,9 +93,9 @@

    2017-07-05

    Introduction

    -

Schema validation is a useful and important concept in the distribution of metadata in formats such as XML and JSON, in which the standard-provider creates a schema (specified in an XML schema, XSD, for XML documents, or json-schema for JSON documents). Schemas allow us to go beyond the basic notion of making sure a file is simply valid XML or valid JSON, a requirement just to be read in by any parser. By detailing how the metadata must be structured, what elements must, can, and may not be included, and what data types may be used for those elements, schemas help developers consuming the data to anticipate these details and thus build applications which know how to process them. For the data creator, validation is a convenient way to catch data input errors and ensure a consistent data structure.

    +

Schema validation is a useful and important concept in the distribution of metadata in formats such as XML and JSON, in which the standard-provider creates a schema (specified in an XML schema, XSD, for XML documents, or json-schema for JSON documents). Schemas allow us to go beyond the basic notion of making sure a file is simply valid XML or valid JSON, a requirement just to be read in by any parser. By detailing how the metadata must be structured, what elements must, can, and may not be included, and what data types may be used for those elements, schemas help developers consuming the data to anticipate these details and thus build applications which know how to process them. For the data creator, validation is a convenient way to catch data input errors and ensure a consistent data structure.

    Because schema validation must ensure predictable behavior without knowledge of what any specific application is going to do with the data, it tends to be very strict. A simple application may not care if certain fields are missing or if integers are mistaken for characters, while to another application these differences could lead it to throw fatal errors.

    -

The approach of JSON-LD is less prescriptive. JSON-LD uses the notion of “framing” to let each application specify how it expects its data to be structured. JSON frames allow each developer consuming the data to handle many of the same issues that schema validation has previously assured.

    +

The approach of JSON-LD is less prescriptive. JSON-LD uses the notion of “framing” to let each application specify how it expects its data to be structured. JSON frames allow each developer consuming the data to handle many of the same issues that schema validation has previously assured. Readers should consult the official json-ld framing documentation for details on this approach.

    library(jsonld)
     library(jsonlite)
     library(magrittr)
    @@ -129,11 +132,11 @@ 

    family = author$family, email = author$email, role = "aut"))

    -
    ## [[1]]
    -## [1] "Carl Boettiger <cboettig@gmail.com> [aut]"
    +
    [[1]]
    +[1] "Carl Boettiger <cboettig@gmail.com> [aut]"

Yay, that works as expected, since our metadata had all the fields we needed. However, there’s other data missing in our example that could potentially cause problems for our application. For instance, our first author lists no affiliation, so the following code simply returns NULL:

    meta$author[[1]]$affiliation
    -
    ## NULL
    +
    NULL

If we’re processing a lot of codemeta.json files and only one input file is missing the affiliation, it could disrupt our whole process. If codemeta.json were prescribed by a JSON schema, we could insist in the schema that affiliation could not be missing. But that feels a bit heavy-handed – many use cases may have no need for affiliation. (Of course, we could just leave this problem for each developer to address explicitly with their own error handling logic, but no developer would like that.)

    @@ -155,9 +158,9 @@

getElement("@graph") %>% getElement(1)  ## a piped version of [["@graph"]][[1]]

meta$author[[1]]$familyName

-## [1] "Boettiger"
+[1] "Boettiger"
meta$author[[1]]$affiliation
-## NULL
+NULL
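Assembled end to end, the framing step used above looks roughly like this (a sketch; the `frame_schema.json` file ships with the codemetar package):

``` r
library(jsonld)
library(jsonlite)
library(magrittr)
library(codemetar)

frame <- system.file("schema/frame_schema.json", package = "codemetar")

meta <- jsonld_frame("codemeta.json", frame) %>%
  fromJSON(simplifyVector = FALSE) %>%
  getElement("@graph") %>%
  getElement(1)   ## a piped version of [["@graph"]][[1]]

# Properties absent from the input now come back as NULL
# rather than disrupting downstream processing
meta$author[[1]]$affiliation
```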

    @@ -177,17 +180,17 @@

getElement("@graph")

meta[[1]]

-## $id
-## [1] "http://orcid.org/0000-0002-1642-628X"
-## 
-## $type
-## [1] "Person"
-## 
-## $familyName
-## [1] "Boettiger"
-## 
-## $givenName
-## [1] "Carl"
+$id
+[1] "http://orcid.org/0000-0002-1642-628X"
+
+$type
+[1] "Person"
+
+$familyName
+[1] "Boettiger"
+
+$givenName
+[1] "Carl"

Note that this has only returned the requested fields in the graph (along with the @id and @type, which are always included if provided, since they may be required to interpret the data properly). This frame extracts the givenName and familyName of any Person node it finds, regardless of where it occurs, while omitting the rest of the data. Note that since the frame requests these elements at the top level, they are returned as such, with each match a separate entry in the @graph. Our example has only one person, returned in meta[[1]]; had we more matches, they would appear in meta[[2]], and so on. Note that these returns are unordered.
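A frame along these lines would produce output of this shape (a sketch, not necessarily the exact frame used above — the "@explicit" directive asks the framing algorithm to return only the properties listed in the frame):

``` r
library(jsonld)
library(jsonlite)

person_frame <- '{
  "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
  "@explicit": "true",
  "@type": "Person",
  "givenName": {},
  "familyName": {}
}'

framed <- jsonld_frame("codemeta.json", person_frame)
meta <- fromJSON(framed, simplifyVector = FALSE)[["@graph"]]

# One entry per matched Person node, in no guaranteed order
meta[[1]]$familyName
```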

    @@ -196,11 +199,11 @@

    The same underlying data can often be expressed in different ways, particularly when dealing with nested data. Framing can be of great help here to reshape the data into the structure required by the application. For instance, it would be natural to access the email of the maintainer in the same manner we did the author, but this fails for our example as maintainer is defined only by reference to an ID:

    meta <- fromJSON(codemeta, simplifyVector = FALSE) 
     paste("For complaints, email", meta$maintainer$email)
-## [1] "For complaints, email "
+[1] "For complaints, email "

    We can confirm that maintainer is just an ID:

    meta$maintainer
-## $`@id`
-## [1] "http://orcid.org/0000-0002-1642-628X"
+$`@id`
+[1] "http://orcid.org/0000-0002-1642-628X"

We can use a frame with the special directive "@embed": "@always" to say that we want the full maintainer information embedded, and not just referred to by id alone. Then we can subset maintainer just like we do author.

    frame <- '{
       "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
    @@ -213,7 +216,7 @@ 

    getElement("@graph") %>% getElement(1)

    Now we can do

    paste("For complaints, email", meta$maintainer$email)
-## [1] "For complaints, email cboettig@gmail.com"
+[1] "For complaints, email cboettig@gmail.com"

    and see that email has been successfully returned from the matching ID under author data.

    @@ -243,18 +246,18 @@

# fromJSON(codemeta, simplifyVector = FALSE)

meta$buildInstructions

-## NULL
+NULL

We just get NULL, rather than some unexpected type of object (e.g., a string that is not a URL). Note that the data is not lost, but simply not dereferenced:

    names(meta)
-## [1] "id"                         "type"                      
-## [3] "name"                       "codemeta:buildInstructions"
+[1] "id"                         "type"                      
+[3] "name"                       "codemeta:buildInstructions"
meta["codemeta:buildInstructions"]
-## $`codemeta:buildInstructions`
-## $`codemeta:buildInstructions`$type
-## [1] "Text"
-## 
-## $`codemeta:buildInstructions`$`@value`
-## [1] "Just install this package using devtools::install_github"
+$`codemeta:buildInstructions`
+$`codemeta:buildInstructions`$type
+[1] "Text"
+
+$`codemeta:buildInstructions`$`@value`
+[1] "Just install this package using devtools::install_github"

Note that this behavior only happens because the data declared "@type": "Text" explicitly. JSON-LD algorithms only believe what they are told about type, and only check for consistency in declared types. If you give text but declare it as "@type": "URL", or don’t declare the type at all, JSON-LD algorithms won’t know anything is amiss and the property will be compacted as usual.
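A sketch of the behavior just described: compacting a document whose buildInstructions carries an explicit "@type": "Text" leaves the value under the prefixed codemeta:buildInstructions term rather than coercing it (the input document here is constructed to match the example above):

``` r
library(jsonld)
library(jsonlite)

context <- "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld"

codemeta <- '{
  "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
  "@type": "SoftwareSourceCode",
  "name": "codemetar",
  "buildInstructions": {
    "@type": "Text",
    "@value": "Just install this package using devtools::install_github"
  }
}'

meta <- fromJSON(jsonld_compact(codemeta, context), simplifyVector = FALSE)

meta$buildInstructions                # NULL: declared type does not match the context
meta[["codemeta:buildInstructions"]]  # the data survives, un-dereferenced
```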

    diff --git a/docs/authors.html b/docs/authors.html index 550b5fb1..072e71c5 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/index.html b/docs/index.html index cf34b5c2..28c2e2ce 100644 --- a/docs/index.html +++ b/docs/index.html @@ -47,6 +47,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/news/index.html b/docs/news/index.html index fc8a2203..59547554 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/reference/codemeta_validate.html b/docs/reference/codemeta_validate.html index f5fec520..0682831f 100644 --- a/docs/reference/codemeta_validate.html +++ b/docs/reference/codemeta_validate.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/reference/create_codemeta.html b/docs/reference/create_codemeta.html index e2c4ec26..0a0afb08 100644 --- a/docs/reference/create_codemeta.html +++ b/docs/reference/create_codemeta.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/reference/crosswalk.html b/docs/reference/crosswalk.html index 75c26e4c..01f13c85 100644 --- a/docs/reference/crosswalk.html +++ b/docs/reference/crosswalk.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/reference/index.html b/docs/reference/index.html index 2947fad9..012c1a0d 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
  • diff --git a/docs/reference/write_codemeta.html b/docs/reference/write_codemeta.html index 5588ef58..94939f41 100644 --- a/docs/reference/write_codemeta.html +++ b/docs/reference/write_codemeta.html @@ -70,6 +70,9 @@
  • Codemeta intro
  • +
  • + Parsing CodeMeta Data +
  • Translating between schema using JSON-LD
diff --git a/inst/notebook/codemeta-parsing.Rmd b/vignettes/codemeta-parsing.Rmd
similarity index 60%
rename from inst/notebook/codemeta-parsing.Rmd
rename to vignettes/codemeta-parsing.Rmd
index 1d9baf66..6b6fc98e 100644
--- a/inst/notebook/codemeta-parsing.Rmd
+++ b/vignettes/codemeta-parsing.Rmd
@@ -1,25 +1,44 @@
 ---
-title: "Parsing codemeta data"
+title: "Parsing CodeMeta Data"
 output: github_document
 ---
+---
+title: "Parsing CodeMeta Data"
+author: "Carl Boettiger"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Parsing CodeMeta Data}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+
+```{r include=FALSE}
+knitr::opts_chunk$set(comment="")
+```
+
+
+Here we illustrate some example use cases that involve parsing codemeta data.

 ```{r message=FALSE}
 library(jsonld)
 library(jsonlite)
 library(magrittr)
 library(codemetar)
-library(tidyverse)
+library(purrr)
+library(dplyr)
 library(printr)
 ```

+We start with a simple example from the `codemeta.json` file of `codemetar` itself. First, we'll just generate a copy of the codemeta record for the package:
+
 ```{r}
 write_codemeta("codemetar", "codemeta.json")
 ```

-
-
-Digest input with a frame:
+We then digest this input using a JSON-LD "frame." While not strictly necessary, this helps ensure the data matches the format we expect, even if the original file had errors or missing data. (See the vignette "Validating in JSON-LD" in this package and the official [JSON-LD docs](https://json-ld.org/spec/latest/json-ld-framing/) for details.) The `codemetar` package includes a reasonably explicit frame to get us started:

 ```{r}
 frame <- system.file("schema/frame_schema.json", package="codemetar")
@@ -61,23 +80,28 @@ bibitem

 ## Parsing the ropensci corpus

-Frame, expanding any referenced nodes
+The ropensci corpus consists of a list of codemeta files for all packages provided by the rOpenSci project, .
+This provides a good test-case for how a large collection of codemeta files can be manipulated to help us get a better picture of the corpus.
+
+```{r}
+download.file("https://github.com/codemeta/codemetar/raw/master/inst/notebook/ropensci.json",
+              "ropensci.json")
+```
+
+
+As before, it is helpful, though not essential, to start off by framing the input data.

 ```{r}
+frame <- system.file("schema/frame_schema.json", package="codemetar")
+
 corpus <-
   jsonld_frame("ropensci.json", frame) %>%
   fromJSON(simplifyVector = FALSE) %>%
   getElement("@graph")
-
-
-
 ```

-
-Some basics:
+We're now ready to start exploring. As usual, functions from `purrr` prove very useful for iterating through large JSON files. First, we look at some basic summary data:

 ```{r}
-
 ## deal with nulls explicitly by starting with map
 pkgs <- map(corpus, "name") %>% compact() %>% as.character()
@@ -152,7 +176,7 @@ group_by(dep) %>%
 ```

-Alternate approach using a frame, gets all Depends and suggests (really all `SoftwareApplication` types mentioned)
+Alternate approach using a frame instead of `purrr` functions for subsetting the data. Note that this gets all Depends and suggests (really all `SoftwareApplication` types mentioned)

 ```{r}
 dep_frame <- '{
@@ -170,3 +194,7 @@ jsonld_frame("ropensci.json", dep_frame) %>%
 #   summarise(count(name))
 ```
+
+```{r include = FALSE}
+unlink("ropensci.json")
+unlink("codemeta.json")
+```
diff --git a/vignettes/validation-in-json-ld.Rmd b/vignettes/validation-in-json-ld.Rmd
index 1e7b11cb..aca08922 100644
--- a/vignettes/validation-in-json-ld.Rmd
+++ b/vignettes/validation-in-json-ld.Rmd
@@ -11,11 +11,11 @@ vignette: >

 ## Introduction

-Schema validation is a useful and important concept to the distribution of metadata in formats such as XML and JSON, in which the standard-provider creates a schema (specified in an XML-schema, XSD, for XML documents, or [json-schema]() for JSON documents). Schemas allow us to go beyond the basic notation of making sure a file is simply valid XML or valid JSON, a requriement just to be read in by any parser. By detailing how the metadata must be structured, what elements must, can, and may not be included, and what data types may be used for those elements, schema help developers consuming the data to anticipate these details and thus build applications which know how to process them. For the data creator, validation is a convenient way to catch data input errors and ensure a consistent data structure.
+Schema validation is a useful and important concept in the distribution of metadata in formats such as XML and JSON, in which the standard-provider creates a schema (specified in an XML-schema, XSD, for XML documents, or [json-schema](http://json-schema.org/) for JSON documents). Schemas allow us to go beyond the basic notion of making sure a file is simply valid XML or valid JSON, a requirement just to be read in by any parser. By detailing how the metadata must be structured, what elements must, can, and may not be included, and what data types may be used for those elements, schemas help developers consuming the data to anticipate these details and thus build applications which know how to process them. For the data creator, validation is a convenient way to catch data input errors and ensure a consistent data structure.

 Because schema validation must ensure predictable behavior without knowledge of what any specific application is going to do with the data, it tends to be very strict. A simple application may not care if certain fields are missing or if integers are mistaken for characters, while to another application these differences could lead it to throw fatal errors.

-The approach of JSON-LD is less perscriptive. JSON-LD uses the notion of "framing" to let each application specify how it expects it data to be structured.JSON frames allow each developer consuming the data to handle many of the same issues that schema validation have previously assured.
+The approach of JSON-LD is less prescriptive. JSON-LD uses the notion of "framing" to let each application specify how it expects its data to be structured. JSON frames allow each developer consuming the data to handle many of the same issues that schema validation has previously assured. Readers should consult the [official json-ld framing](https://json-ld.org/spec/latest/json-ld-framing/) documentation for details on this approach.
@@ -28,6 +28,11 @@ library(codemetar)
 ```

+```{r include=FALSE}
+knitr::opts_chunk$set(comment="")
+```
+
+
 ## A motivating example: