korpRequest for corpus_config
arildm committed Dec 13, 2024
1 parent f63f041 commit ce741d8
Showing 7 changed files with 36 additions and 21 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@

 ### Changed

+- The `corpus_config_url` setting is replaced by `get_corpus_ids`, see [doc/frontend_devel.md](./doc/frontend_devel.md)
 - The `httpConfAddMethod*` util functions were refactored:
   - The `$.ajax` case of `httpConfAddMethod` was extracted into `ajaxConfAddMethod`
   - `httpConfAddMethodFetch` was renamed to `fetchConfAddMethod`
2 changes: 2 additions & 0 deletions app/scripts/backend/common.ts
@@ -3,11 +3,13 @@ import { fetchConfAddMethod } from "@/util"
 import { getAuthorizationHeader } from "@/components/auth/auth"
 import settings from "@/settings"
 import { API, Response } from "./types"
+import { omitBy } from "lodash"

 export async function korpRequest<K extends keyof API>(
     endpoint: K,
     params: API[K]["params"]
 ): Promise<Response<API[K]["response"]>> {
+    params = omitBy(params, (value) => value == null) as API[K]["params"]
     const { url, request } = fetchConfAddMethod(settings.korp_backend_url + "/" + endpoint, params)
     request.headers = { ...request.headers, ...getAuthorizationHeader() }
     const response = await fetch(url, request)
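The added `omitBy` line drops `null` and `undefined` params before the request is built, so a caller can pass an optional value such as `corpus: undefined` and rely on it being left out. A minimal sketch of that filtering, using a hand-written stand-in for lodash's `omitBy` so the block has no dependencies; `omitNullish` is a hypothetical name, not part of this commit:

```typescript
/** Hypothetical stand-in for `omitBy(params, (value) => value == null)`. */
function omitNullish(params: Record<string, unknown>): Record<string, unknown> {
    const result: Record<string, unknown> = {}
    for (const key of Object.keys(params)) {
        const value = params[key]
        // `!= null` rejects both null and undefined, but keeps "" and 0
        if (value != null) result[key] = value
    }
    return result
}

// Only `mode` survives; the two nullish params are dropped.
const cleaned = omitNullish({ mode: "default", corpus: undefined, include_lab: null })
console.log(cleaned)
```

Note that the `== null` predicate keeps falsy but meaningful values such as `0` and `""`.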
11 changes: 11 additions & 0 deletions app/scripts/backend/types/corpus-config.ts
@@ -0,0 +1,11 @@
+/** @format */
+/** @see https://ws.spraakbanken.gu.se/docs/korp#tag/Information/paths/~1corpus_config/get */
+import { Config } from "@/settings/config.types"
+
+export type CorpusConfigParams = {
+    mode: string
+    corpus?: string
+    include_lab?: string
+}
+
+export type CorpusConfigResponse = Config
5 changes: 5 additions & 0 deletions app/scripts/backend/types/index.ts
@@ -1,5 +1,6 @@
 /** @format */

+import { CorpusConfigParams, CorpusConfigResponse } from "./corpus-config"
 import { CorpusInfoParams, CorpusInfoResponse } from "./corpus-info"
 import { CountParams, CountResponse } from "./count"
 import { LoglikeParams, LoglikeResponse } from "./loglike"
@@ -9,6 +10,10 @@ export * from "./common"

 /** Maps a Korp backend endpoint name to the expected parameters and response */
 export type API = {
+    corpus_config: {
+        params: CorpusConfigParams
+        response: CorpusConfigResponse
+    }
     corpus_info: {
         params: CorpusInfoParams
         response: CorpusInfoResponse
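Registering `corpus_config` in the `API` map is what lets a call like `korpRequest("corpus_config", params)` be type-checked against the right params. A rough sketch of the pattern, with simplified placeholder types: `PlaceholderConfig`, `describeRequest` and the response shapes are assumptions for illustration, not the project's real definitions.

```typescript
type CorpusConfigParams = { mode: string; corpus?: string; include_lab?: string }
type CorpusInfoParams = { corpus: string } // simplified for this sketch
type PlaceholderConfig = Record<string, unknown> // stands in for the real Config type

/** Maps an endpoint name to its params and response, as in the diff above */
type API = {
    corpus_config: { params: CorpusConfigParams; response: PlaceholderConfig }
    corpus_info: { params: CorpusInfoParams; response: Record<string, unknown> }
}

/** Hypothetical helper: the endpoint name selects which params type is required. */
function describeRequest<K extends keyof API>(endpoint: K, params: API[K]["params"]): string {
    const entries = params as Record<string, string | undefined>
    const query = Object.keys(entries)
        .filter((key) => entries[key] !== undefined)
        .map((key) => `${key}=${encodeURIComponent(entries[key] as string)}`)
        .join("&")
    return `/${endpoint}?${query}`
}

console.log(describeRequest("corpus_config", { mode: "default", corpus: "a,b" }))
// describeRequest("corpus_config", { corpus: "a,b" }) would not compile: `mode` is required
```

Because `K` is inferred from the first argument, adding an entry to the map is enough to make a new endpoint available with checked parameters.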
30 changes: 12 additions & 18 deletions app/scripts/data_init.ts
@@ -46,6 +46,8 @@ type InfoData = Record<string, Pick<CorpusTransformed, "info" | "private_struct_
  * Fetch CWB corpus info (Size, Updated etc).
  */
 async function getInfoData(corpusIds: string[]): Promise<InfoData> {
+    if (!corpusIds.length) return {}
+
     const params = { corpus: corpusIds.map((id) => id.toUpperCase()).join(",") }
     const data = await korpRequest("corpus_info", params)
     if ("ERROR" in data) {
@@ -98,27 +100,19 @@ async function getConfig(): Promise<Config> {
         return corpusConfig
     } catch {}

-    let configUrl: string
-    // The corpora to include can be defined elsewhere can in a mode
-    if (settings.corpus_config_url) {
-        configUrl = await settings.corpus_config_url()
-    } else {
-        const labParam = process.env.ENVIRONMENT == "staging" ? "&include_lab" : ""
-        configUrl = `${settings.korp_backend_url}/corpus_config?mode=${currentMode}${labParam}`
-    }
-    let response: Response
-    try {
-        response = await fetch(configUrl)
-    } catch (error) {
-        throw Error("Config request failed")
-    }
+    // The corpora to include are normally given by the mode config, but allow defining it elsewhere (used by Mink)
+    const corpusIds = settings.get_corpus_ids ? await settings.get_corpus_ids() : undefined
+
+    const config = await korpRequest("corpus_config", {
+        mode: currentMode,
+        corpus: corpusIds?.join(",") || undefined,
+    })

-    if (!response.ok) {
-        console.error("Something wrong with corpus config", response.statusText)
-        throw Error("Something wrong with corpus config")
+    if ("ERROR" in config) {
+        throw Error(config.ERROR.value)
     }

-    return (await response.json()) as Config
+    return config
 }

 /**
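With this change, an error reported by the backend surfaces with the backend's own message. The `"ERROR" in config` check relies on TypeScript narrowing a union by property presence. A small sketch, assuming an error shape inferred from `config.ERROR.value` in `getConfig`; `KorpError`, `PlaceholderConfig` and `unwrapConfig` are hypothetical names, and the real `Response` type may carry more fields:

```typescript
// Assumed error shape, inferred from `config.ERROR.value`
type KorpError = { ERROR: { value: string } }
// Placeholder, not the real Config type
type PlaceholderConfig = { corpora: string[] }

function unwrapConfig(response: PlaceholderConfig | KorpError): PlaceholderConfig {
    if ("ERROR" in response) {
        // Narrowed to KorpError in this branch
        throw new Error(response.ERROR.value)
    }
    // Narrowed to PlaceholderConfig here, so no cast is needed
    return response
}

console.log(unwrapConfig({ corpora: ["corpus-a"] }).corpora)
```

The narrowing is why the new `getConfig` can end with a plain `return config` where the old code needed `as Config`.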
2 changes: 1 addition & 1 deletion app/scripts/settings/app-settings.types.ts
@@ -15,7 +15,7 @@ export type AppSettings = {
     backendURLMaxLength: number
     common_struct_types?: Record<string, Attribute>
     config_dependent_on_authentication?: boolean
-    corpus_config_url?: () => Promise<string>
+    get_corpus_ids?: () => Promise<string[]>
     corpus_info_link?: {
         url_template: string
         label: LangString
6 changes: 4 additions & 2 deletions doc/frontend_devel.md
@@ -109,7 +109,6 @@ settings that affect the frontend.
     that may be added automatically to a corpus. See [backend documentation](https://github.com/spraakbanken/korp-backend)
     for more information about how to define attributes.
 - __config_dependent_on_authentication__ - Boolean. If true, backend config will not be fetched until login check has finished.
-- __corpus_config_url__ - Async function returning a url string. Configuration for the selected mode is fetched from here at app initialization. If not given, the default is `<korp_backend_url>/corpus_config?mode=<mode>`, see the [`corpus_config`](https://ws.spraakbanken.gu.se/docs/korp#tag/Information/paths/~1corpus_config/get) API.
 - __corpus_info_link__ - Object. Use this to render a link for each corpus in the corpus chooser.
     - __url_template__ - String or translation object. A URL containing a token "%s", which will be replaced with the corpus id.
     - __label__ - String or translation object. The label is the the same for all corpora.
@@ -134,6 +133,7 @@ settings that affect the frontend.
     - __label__: String or translation object.
     - __params__: Object. This is translated to URL search params when the link is clicked.
     - __hint__: String or translation object. Can contain HTML.
+- __get_corpus_ids__ - Async function returning a list of strings. The corpus ids are passed as the `corpus=` param to the `<korp_backend_url>/corpus_config?mode=<mode>` call, see the [`corpus_config`](https://ws.spraakbanken.gu.se/docs/korp#tag/Information/paths/~1corpus_config/get) API.
 - __group_statistics__ - List of attribute names. Attributes that either have a rank or a numbering used for multi-word units. For example, removing `:2` from `ta_bort..vbm.1:2`, to get the lemgram of this word: `ta_bort..vbm.1`.
 - __has_timespan__ - Boolean. If the backend supports the `timespan` call, used in corpus chooser for example. Default: `true`
 - __hits_per_page_values__ - Array of integer. The available page sizes. Default: `[25, 50, 75, 100]`
@@ -228,12 +228,14 @@ If no mode is given, mode is `default`.

 It then looks for mode-specific code in `<configDir>/modes/<mode>_mode.js`. Mode code may overwrite values from `config.yml` by altering the `settings` object imported from `@/settings`.

-It then looks for settings for this specific mode, the **corpus config**. If it exists at `<configDir>/modes/<mode>_corpus_config.json`, it will be loaded from there. Otherwise, it retrieves it from the url given by the `corpus_config_url` option, which defaults to:
+It then looks for settings for this specific mode, the **corpus config**. If it exists at `<configDir>/modes/<mode>_corpus_config.json`, it will be loaded from there. Otherwise, it retrieves it from the backend:

 ```
 https://<korp_backend_url>/corpus_config?mode=<mode>
 ```

+Normally, the mode param is enough for the backend to know what corpora to include. Alternatively, it is possible to specify corpus ids in the `corpus=` param, by assigning a function to the `get_corpus_ids` setting (in `<mode>_mode.js`). The function can be async and should return a list of corpus ids.
+
 See the [`corpus_config`](https://ws.spraakbanken.gu.se/docs/korp#tag/Information/paths/~1corpus_config/get) API for more information.

 ## Parallel corpora
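As a rough illustration of the `get_corpus_ids` setting documented in this file, a mode file could assign a function like the one below. The block declares a local `settings` stand-in so it is self-contained (a real `<mode>_mode.js` would import the object from `@/settings`). The corpus ids and the `toCorpusParam` helper are hypothetical; the helper mirrors the `corpusIds?.join(",") || undefined` expression in `getConfig`.

```typescript
// Local stand-in for the object exported by "@/settings" (assumption for this sketch)
type ModeSettings = { get_corpus_ids?: () => Promise<string[]> }
const settings: ModeSettings = {}

// Hypothetical ids; a real implementation might fetch them from another service
settings.get_corpus_ids = async () => ["corpus-a", "corpus-b"]

/** Builds the value of the `corpus=` param from the ids, if any. */
function toCorpusParam(corpusIds?: string[]): string | undefined {
    // An empty list joins to "", which `||` turns into undefined
    return corpusIds?.join(",") || undefined
}

async function demo(): Promise<void> {
    const ids = settings.get_corpus_ids ? await settings.get_corpus_ids() : undefined
    console.log(toCorpusParam(ids))
}
void demo()
```

When the setting is absent or returns an empty list, the helper yields `undefined`, so the request carries only `mode`.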
