Enable downloading Czech model (#3)
* Enable downloading Czech model from UFAL
skvrnami authored Oct 18, 2023
1 parent d9a78c6 commit a1c7541
Showing 5 changed files with 40 additions and 18 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: nametagger
Type: Package
Title: Named Entity Recognition in Texts using 'NameTag'
-Version: 0.1.3
+Version: 0.1.4
Authors@R: c(
person('Jan', 'Wijffels', role = c('aut', 'cre', 'cph'), email = '[email protected]'),
person('BNOSAC', role = 'cph'),
@@ -15,7 +15,7 @@ URL: https://github.com/bnosac/nametagger
License: MPL-2.0
Encoding: UTF-8
LazyData: true
-RoxygenNote: 7.1.2
+RoxygenNote: 7.2.3
Depends: R (>= 2.10)
Imports: Rcpp (>= 0.11.5), utils
Suggests: udpipe (>= 0.2)
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
+## CHANGES IN nametagger VERSION 0.1.4
+
+- nametagger_download_model now allows to download a model for Czech: czech-cnec-140304
+
## CHANGES IN nametagger VERSION 0.1.3

- Add explicit initialization to silence false positive valgrind report in compressor_save.cpp
38 changes: 27 additions & 11 deletions R/nametagger.R
@@ -496,27 +496,43 @@ print.nametagger_options <- function(x, ...){
}



#' @title Download a Nametag model
#' @description Download a Nametag model. Note that models have licence CC-BY-SA-NC.
#' More details at \url{https://ufal.mff.cuni.cz/nametag/1}.
-#' @param language 'english-conll-140408'
+#' @param language Language model to download, 'english-conll-140408' (default) or 'czech-cnec-140304'
#' @param model_dir a path where the model will be downloaded to.
#' @return an object of class nametagger
-#' @references \url{https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3118}
+#' @references
+#' \url{http://ufal.mff.cuni.cz/nametag/users-manual}
+#' \url{https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3118}
+#' \url{https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-7D42-8}
#' @export
#' @examples
#' \donttest{
#' model <- nametagger_download_model("english-conll-140408", model_dir = tempdir())
+#' model <- nametagger_download_model("czech-cnec-140304", model_dir = tempdir())
#' }
-nametagger_download_model <- function(language = c("english-conll-140408"), model_dir = tempdir()){
+nametagger_download_model <- function(language = c("english-conll-140408", "czech-cnec-140304"), model_dir = tempdir()){

   language <- match.arg(language)
-  f <- file.path(tempdir(), "english-conll-140408.zip")
-  download.file(url = "https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-3118/english-conll-140408.zip?sequence=1&isAllowed=y",
-                destfile = f, mode = "wb")
-  f <- utils::unzip(f, exdir = tempdir(), files = "english-conll-140408/english-conll-140408.ner")
-  from <- file.path(tempdir(), "english-conll-140408/english-conll-140408.ner")
-  to <- file.path(model_dir, "english-conll-140408.ner")

+  f <- file.path(tempdir(), paste(language, ".zip", sep = ""))
+  switch (language,
+    "english-conll-140408" = {
+      url <- "https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-3118/english-conll-140408.zip?sequence=1&isAllowed=y"
+      download.file(url = url, destfile = f, mode = "wb")
+      ner_file_path <- "english-conll-140408/english-conll-140408.ner"
+    },
+    "czech-cnec-140304" = {
+      url <- "https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11858/00-097C-0000-0023-7D42-8/czech-cnec-140304.zip?sequence=1&isAllowed=y"
+      download.file(url = url, destfile = f, mode = "wb")
+      ner_file_path <- "czech-cnec-140304/czech-cnec2.0-140304.ner"
+    }
+  )
+
+  f <- utils::unzip(f, exdir = tempdir(), files = ner_file_path)
+  from <- file.path(tempdir(), ner_file_path)
+  to <- file.path(model_dir, paste(language, ".ner", sep = ""))
   file.copy(from, to = to, overwrite = TRUE)
   nametagger_load_model(to)
}
}
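
Reviewer note: the per-language branching in the R/nametagger.R change above can be checked without touching the network. The sketch below is only an illustration, not code from this commit. `model_source` is a hypothetical helper name, and splitting the URL into a shared base plus a repository handle is an assumption made for readability. The URLs and archive paths themselves are copied from the diff.

```r
# Hypothetical helper (not part of nametagger): returns the download URL,
# the path of the .ner file inside the zip archive, and the target file name
# for each model supported by nametagger_download_model() after this commit.
model_source <- function(language = c("english-conll-140408", "czech-cnec-140304")) {
  language <- match.arg(language)
  base <- "https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle"
  # Repository handle per model, taken from the URLs in the diff
  handle <- switch(language,
    "english-conll-140408" = "11234/1-3118",
    "czech-cnec-140304"    = "11858/00-097C-0000-0023-7D42-8")
  # The Czech archive stores the model under a file name that differs from
  # the archive name, so the inner path cannot be derived from `language`
  ner_file_path <- switch(language,
    "english-conll-140408" = "english-conll-140408/english-conll-140408.ner",
    "czech-cnec-140304"    = "czech-cnec-140304/czech-cnec2.0-140304.ner")
  list(
    url           = paste0(base, "/", handle, "/", language, ".zip?sequence=1&isAllowed=y"),
    ner_file_path = ner_file_path,
    target        = paste0(language, ".ner")
  )
}

src <- model_source("czech-cnec-140304")
src$url
src$ner_file_path
src$target
```

Note that the copied model is always saved as `<language>.ner` in `model_dir`, so the Czech model ends up as `czech-cnec-140304.ner` even though the file inside the archive is named `czech-cnec2.0-140304.ner`.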
3 changes: 1 addition & 2 deletions man/nametagger.Rd

9 changes: 6 additions & 3 deletions man/nametagger_download_model.Rd
