docs #188

Draft
wants to merge 2 commits into
base: main
46 changes: 3 additions & 43 deletions README.Rmd
@@ -24,47 +24,10 @@ knitr::opts_chunk$set(
[![Codecov test coverage](https://codecov.io/gh/lvaudor/glitter/branch/master/graph/badge.svg)](https://app.codecov.io/gh/lvaudor/glitter?branch=master)
<!-- badges: end -->

This package aims at writing and sending SPARQL queries without advanced knowledge of the SPARQL language syntax.
It makes the exploration and use of Linked Open Data (Wikidata in particular) easier for those who do not know SPARQL well.

With glitter, compared to writing SPARQL queries by hand, your code should be easier to write, and easier to read by your peers who do not know SPARQL.
The glitter package supports a "domain-specific language" (DSL) with function names (and syntax) closer to the tidyverse and base R than to SPARQL.

For instance, to find a corpus of 5 articles with a title in English and "wikidata" in that title, instead of writing SPARQL by hand you can run:

```{r}
library("glitter")
query <- spq_init() %>%
  spq_add("?item wdt:P31 wd:Q13442814") %>%
  spq_label(item) %>%
  spq_filter(str_detect(str_to_lower(item_label), 'wikidata')) %>%
  spq_head(n = 5)

query
```

Note how we were able to use `str_detect()` and `str_to_lower()` (as in the stringr package) instead of SPARQL's functions `REGEX` and `LCASE`.

To perform the query,

```{r}
spq_perform(query)
```{r child="man/rmd-fragments/intro.Rmd"}
```

To get a random subset of movies with the date they were released, you could use

```{r}
spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424") %>%
  spq_label(film) %>%
  spq_add("?film wdt:P577 ?date") %>%
  spq_mutate(date = year(date)) %>%
  spq_head(10) %>%
  spq_perform()
```

Note that we were able to "overwrite" the date variable, which is straightforward in dplyr, but not so much in SPARQL.

## Installation

Install this package through R-universe:
@@ -76,13 +39,10 @@ install.packages("glitter", repos = "https://lvaudor.r-universe.dev")
Or through GitHub:

```r
install.packages("remotes") #if remotes is not already installed
remotes::install_github("lvaudor/glitter")
install.packages("pak") #if pak is not already installed
pak::pak("lvaudor/glitter")
```

## Documentation

The documentation for `glitter` is available [on its pkgdown website](http://perso.ens-lyon.fr/lise.vaudor/Rpackages/glitter/).



46 changes: 25 additions & 21 deletions README.md
@@ -14,10 +14,10 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
coverage](https://codecov.io/gh/lvaudor/glitter/branch/master/graph/badge.svg)](https://app.codecov.io/gh/lvaudor/glitter?branch=master)
<!-- badges: end -->

This package aims at writing and sending SPARQL queries without advanced
knowledge of the SPARQL language syntax. It makes the exploration and
use of Linked Open Data (Wikidata in particular) easier for those who do
not know SPARQL well.
The glitter package helps you write and send SPARQL queries without
advanced knowledge of SPARQL syntax. It makes the
exploration and use of Linked Open Data (Wikidata in particular) easier
for those who do not know SPARQL well.

With glitter, compared to writing SPARQL queries by hand, your code
should be easier to write, and easier to read by your peers who do not
@@ -38,7 +38,7 @@ query <- spq_init() %>%

query
#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
#> SELECT ?item ?item_label
#> SELECT ?item (COALESCE(?item_labell,'') AS ?item_label)
#> WHERE {
#>
#> ?item wdt:P31 wd:Q13442814.
@@ -47,8 +47,8 @@ query
#> FILTER(lang(?item_labell) IN ('en'))
#> }
#>
#> BIND(COALESCE(?item_labell,'') AS
#> ?item_label)FILTER(REGEX(LCASE(?item_label),"wikidata"))
#> BIND(COALESCE(?item_labell,'') AS ?item_label)
#> FILTER(REGEX(LCASE(?item_label),"wikidata"))
#> }
#>
#> LIMIT 5
@@ -83,23 +83,27 @@ spq_init() %>%
spq_head(10) %>%
spq_perform()
#> # A tibble: 10 × 3
#> film date film_label
#> <chr> <dbl> <chr>
#> 1 http://www.wikidata.org/entity/Q372 2009 We Live in Public
#> 2 http://www.wikidata.org/entity/Q595 2011 The Intouchables
#> 3 http://www.wikidata.org/entity/Q595 2011 The Intouchables
#> 4 http://www.wikidata.org/entity/Q595 2012 The Intouchables
#> 5 http://www.wikidata.org/entity/Q595 2012 The Intouchables
#> 6 http://www.wikidata.org/entity/Q593 2011 A Gang Story
#> 7 http://www.wikidata.org/entity/Q1365 1974 Swept Away
#> 8 http://www.wikidata.org/entity/Q1365 1974 Swept Away
#> 9 http://www.wikidata.org/entity/Q1365 1975 Swept Away
#> 10 http://www.wikidata.org/entity/Q1365 1975 Swept Away
#> film film_label date
#> <chr> <chr> <dbl>
#> 1 http://www.wikidata.org/entity/Q372 We Live in Public 2009
#> 2 http://www.wikidata.org/entity/Q595 The Intouchables 2011
#> 3 http://www.wikidata.org/entity/Q595 The Intouchables 2011
#> 4 http://www.wikidata.org/entity/Q595 The Intouchables 2012
#> 5 http://www.wikidata.org/entity/Q595 The Intouchables 2012
#> 6 http://www.wikidata.org/entity/Q593 A Gang Story 2011
#> 7 http://www.wikidata.org/entity/Q1365 Swept Away 1974
#> 8 http://www.wikidata.org/entity/Q2201 Kick-Ass 2010
#> 9 http://www.wikidata.org/entity/Q2201 Kick-Ass 2010
#> 10 http://www.wikidata.org/entity/Q2201 Kick-Ass 2010
```

Note that we were able to “overwrite” the date variable, which is
straightforward in dplyr, but not so much in SPARQL.

If you want to learn more about SPARQL, you could read the [Learning
SPARQL book by Bob
DuCharme](https://www.oreilly.com/library/view/learning-sparql-2nd/9781449371449/).

## Installation

Install this package through R-universe:
@@ -111,8 +115,8 @@ install.packages("glitter", repos = "https://lvaudor.r-universe.dev")
Or through GitHub:

``` r
install.packages("remotes") #if remotes is not already installed
remotes::install_github("lvaudor/glitter")
install.packages("pak") #if pak is not already installed
pak::pak("lvaudor/glitter")
```

## Documentation
134 changes: 134 additions & 0 deletions man/rmd-fragments/equivalents.Rmd
@@ -0,0 +1,134 @@
In glitter functions such as `spq_mutate()` and `spq_filter()`
you can use functions that look like R functions, for instance
`str_detect()` below:

```{r}
# Lexemes in English that match an expression
# here starting with "pota"
query <- spq_init() |>
  spq_prefix(prefixes = c(dct = "http://purl.org/dc/terms/")) |>
  spq_add(spq('?lexemeId dct:language wd:Q1860')) |>
  spq_mutate(lemma = wikibase::lemma(lexemeId)) |>
  spq_filter(str_detect(lemma, '^pota.*')) |>
  spq_select(lexemeId, lemma)
```

The resulting SPARQL query is shown below; note that `str_detect()` has been translated to `REGEX`.

```{r}
query
```

What functions are available?

### Functions operating on sets

```{r}
set_functions
```

Note the case of `str_c()`, whose argument `sep` will be translated to the SPARQL argument `SEPARATOR`.

```{r}
spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424") %>%
  spq_add("?film wdt:P921 ?subject") %>%
  spq_label(subject) %>%
  spq_group_by(film) %>%
  spq_summarise(subject_label_concat = str_c(subject_label, sep = "; ")) %>%
  spq_head(10)
```

### Functions operating on terms

```{r}
term_functions
```

Example with `lang()`:

```{r}
spq_init() %>%
  spq_mutate(statement = wdt::P1843(wd::Q331676)) %>%
  spq_mutate(lang = lang(statement))
```

### Miscellaneous functions

```{r}
misc_functions
```

Example with `desc()`:

```{r}
spq_init() %>%
  spq_add("?item wdt:P31/wdt:P279* wd:Q4022") %>%
  spq_label(item) %>%
  spq_add("?item wdt:P2043 ?length") %>%
  spq_add("?item wdt:P625 ?location") %>%
  spq_arrange(desc(length), item_label) %>%
  spq_head(50)
```

### Functions operating on strings

```{r}
string_functions
```

Example with `str_detect()`:

```{r}
# Lexemes in English that match an expression
# here starting with "pota"
spq_init() |>
  spq_prefix(prefixes = c(dct = "http://purl.org/dc/terms/")) |>
  spq_add(spq('?lexemeId dct:language wd:Q1860')) |>
  spq_mutate(lemma = wikibase::lemma(lexemeId)) |>
  spq_filter(str_detect(lemma, '^pota.*')) |>
  spq_select(lexemeId, lemma)
```

### Functions operating on numbers

```{r}
numeric_functions
```

Example with `round()` (chemical elements):

```{r}
spq_init() %>%
  spq_add("?element wdt:P31 wd:Q11344.") %>%
  spq_mutate(density = wdt::P2054(element)) %>%
  spq_label(element) %>%
  spq_mutate(round_density = round(density))
```

### Functions operating on date-time objects

```{r}
datetime_functions
```

Example with `year()`:

```{r}
spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424") %>%
  spq_label(film) %>%
  spq_add("?film wdt:P577 ?date") %>%
  spq_mutate(date = year(date)) %>%
  spq_head(10)
```

### All correspondences

```{r}
all_correspondences
```

### More correspondences?

Please open an [issue](https://github.com/lvaudor/glitter/issues) if you think we should amend or add a function.
42 changes: 42 additions & 0 deletions man/rmd-fragments/intro.Rmd
@@ -0,0 +1,42 @@
The glitter package helps you write and send SPARQL queries without advanced knowledge of SPARQL syntax.
It makes the exploration and use of Linked Open Data (Wikidata in particular) easier for those who do not know SPARQL well.

With glitter, compared to writing SPARQL queries by hand, your code should be easier to write, and easier to read by your peers who do not know SPARQL.
The glitter package supports a "domain-specific language" (DSL) with function names (and syntax) closer to the tidyverse and base R than to SPARQL.

For instance, to find a corpus of 5 articles with a title in English and "wikidata" in that title, instead of writing SPARQL by hand you can run:

```{r}
library("glitter")
query <- spq_init() %>%
  spq_add("?item wdt:P31 wd:Q13442814") %>%
  spq_label(item) %>%
  spq_filter(str_detect(str_to_lower(item_label), 'wikidata')) %>%
  spq_head(n = 5)

query
```

Note how we were able to use `str_detect()` and `str_to_lower()` (as in the stringr package) instead of SPARQL's functions `REGEX` and `LCASE`.
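
For readers less familiar with these helpers, the following minimal sketch shows the same lower-case-then-match logic run locally in base R. It is only an illustration: the `titles` vector is made up, and `tolower()` and `grepl()` are base R stand-ins for `str_to_lower()` and `str_detect()`, not part of glitter.

```r
# Hypothetical article titles, used only for illustration.
titles <- c(
  "Wikidata as a knowledge graph",
  "A study of rivers",
  "Using WIKIDATA in research"
)

# tolower() plays the role of str_to_lower() (LCASE in SPARQL).
lowered <- tolower(titles)

# grepl() plays the role of str_detect() (REGEX in SPARQL).
has_wikidata <- grepl("wikidata", lowered)

# Keep only the titles that mention "wikidata", whatever the case.
titles[has_wikidata]
```

In the glitter pipeline the same two steps are written once inside `spq_filter()` and are evaluated by the SPARQL endpoint rather than in your R session.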

To perform the query,

```{r}
spq_perform(query)
```

To get a random subset of movies with the date they were released, you could use

```{r}
spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424") %>%
  spq_label(film) %>%
  spq_add("?film wdt:P577 ?date") %>%
  spq_mutate(date = year(date)) %>%
  spq_head(10) %>%
  spq_perform()
```

Note that we were able to "overwrite" the date variable, which is straightforward in dplyr, but not so much in SPARQL.
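
As a rough local analogue, assuming a small made-up data frame rather than real Wikidata results, overwriting a column in R is a single assignment. The `films` data frame and its values are hypothetical, and base R is used here instead of dplyr to keep the sketch dependency-free.

```r
# Hypothetical data standing in for query results.
films <- data.frame(
  film = c("film_a", "film_b"),
  date = as.Date(c("2009-01-19", "2011-11-02")),
  stringsAsFactors = FALSE
)

# Overwrite `date` with the release year, reusing the same column name.
films$date <- as.numeric(format(films$date, "%Y"))

films
```

In SPARQL, by contrast, a variable that is already in use in a graph pattern cannot be reassigned with `BIND`, so a translated query presumably has to introduce a differently named variable behind the scenes.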

If you want to learn more about SPARQL, you could read the [Learning SPARQL book by Bob DuCharme](https://www.oreilly.com/library/view/learning-sparql-2nd/9781449371449/).
2 changes: 1 addition & 1 deletion vignettes/articles/explore.Rmd
@@ -26,7 +26,7 @@ When in doubt, add a `spq_head()` in your query pipeline, to ask less at a time,
## Asking for a subset of all triples

In the code below we'll ask for 10 triples.
Note that we use the `endpoint` argument of `spq_perform()` to indicate where to send the query, as well as the `request_type` argument.
Note that we use the `endpoint` argument of `spq_init()` to indicate where to send the query, as well as the `request_type` argument.

How can one know whether a service needs `request_type = "body-form"`?
