Release 0.7.0 (#303)
* WIP layout updates

* recent changes

* copy

* small doc updates

* copy change

* Release 0.7.0

* Fix encoding to utf8 in label config and completions. (#300)

* Pre release 0.7.0 fixes (#302)

* fix template, add exception treatment

* update converter

* fix error handling on export

Co-authored-by: nik <[email protected]>

* Feature/cloudstorage (#299)

* fix context

* fix multi session context

* source/target storages, s3 support

* fix predictions

* uri resolver & simple background jobs

* fix None prefix

* set daemon thread

* add storages forms

* add gcs support

* register gcs storage

* Tasks pagination (#298)

* Task pagination.

* Some.

* Fixes.

* Fix.

* Fixes.

* update requirements

* fix base/s3 storages

* fix s3

* fix FormMeta

* signed urls for gs

* hide manage buttons on GCS

* Fixes.

* Storage settings endpoint.

* api storage settings update

* More.

* Fixes

* fix typo

* get available storage, move api settings

* fix can manage completions

* Some.

* Some.

* Fix.

* Fix.

* filesystem bucket, fix sync

* fix exception'

* Some.

* Errors in UI.

* changing the URL

* get/post api for forms

* Fixes.

* handle target storage

* form field autobound

* Some.

* Some.

* build form with request.json

* fix redundant imports

* Some.

* Some.

* Fixes.

* fix blobs, purify code

* prepend storage class name

* comment error handling on get

* Some.

* Some.

* Before cache.

* Stable.

* load next task

* fix update after regex changed

* fix keys

* tasks json & completions storages

* separate source/target storage names

* make dict names

* validate cloud storage connections

* Fixes.

* extend filters

* validate connection on sync

* fix target filters

* Common error print.

* fix path for BaseForm

* validate storage before create

* Some.

* Project dict to tasks template.

* Some.

* Some.

* dont remove unexisted completion

* fix can manage tasks permissions

* Some.

* can delete tasks flag

* draft doc

* doc params fix

* add error handler for get_value within thread

* Some.

* add logging, can delete tasks, don't return dict on error

* Some.

* deepcopy tasks on saving completion

* Some.

* Some.

* Some.

* Some.

* Fix.

* Timer to 5 sec.

* Fix.

* includes fix.

* polyfill.

* polyfill.js

* ie

* small changes to the layout and copy

* fix invalid completions, reduce forms, add blob url as parameter

* add local copy option target storage,fix completion list

* Some.

* no local copy for source storage

* Some.

* some updates for words

* before remove select in storages.

* fix init old projects

* fix old config load on start

* Some.

* dont output creation times when created_at is absent

* Some.

* Some.

* Some.

* toast in errors.

* fix docs commands

* Fixes.

* fix perms

* Pretty.

* Version check.

Co-authored-by: nik <[email protected]>
Co-authored-by: Max <[email protected]>
Co-authored-by: Mikhail Maluyk <[email protected]>

* Encoding fixes.

* New LS build.

* Fixes with welcome page.

* 0.7.0rc1

* images, blog, copy

* doc updates

* Update README.md

* fix create local copy,enhance cmd docstrings,skip syncing on empty regex

* v0.7.0rc2

* some UI & doc fixes

* add storage.is_syncing flag

* Fixes.

* Error treatment on all pages.

* Paths in storages.

* copy_local checkbox hide.

* supported formats fix

* Storage sync in progress.

* Default data_key in BaseStorage.

* is_syncing for BaseStorage.

* use_blob_urls=True as default.

* completed_at to undefined if not defined.

* fix force option while running docker

* v0.7.0

Co-authored-by: Mikhail Maluyk <[email protected]>
Co-authored-by: niklub <[email protected]>
Co-authored-by: nik <[email protected]>
4 people authored May 29, 2020
1 parent 64b6cc8 commit aa417df
Showing 63 changed files with 3,683 additions and 841 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,5 +1,6 @@
include label_studio/examples/*/*.xml
recursive-include label_studio/static *
include label_studio/templates/includes/*.html
include label_studio/templates/*.html
include label_studio/utils/schema/*.json
include label_studio/logger.json
5 changes: 3 additions & 2 deletions README.md
@@ -4,7 +4,7 @@

<br/>

> **NEW** Release 0.6.0: Nested & Per-Region Labeling, Filtering & Cueing the active labels, read the [release notes](https://labelstud.io/blog/release-060-nested-data-labeling.html).
> **NEW** Release 0.7.0 - Cloud Storage Enablement: read the [release notes](https://labelstud.io/blog/release-070-cloud-storage-enablement.html).
**Label Studio is a swiss army knife of data labeling and annotation tools :v:**

@@ -203,13 +203,14 @@ Label Studio for Teams is our enterprise edition (cloud & on-prem), that include

```tex
@misc{Label Studio,
title={{Label Studio}: A Swiss Army Knife of Data Labeling and Annotation Tools},
title={{Label Studio}: Data labeling software},
url={https://github.com/heartexlabs/label-studio},
note={Open source software available from https://github.com/heartexlabs/label-studio},
author={
Maxim Tkachenko and
Mikhail Malyuk and
Nikita Shevchenko and
Andrey Holmanyuk
Nikolai Liubimov},
year={2020},
}
24 changes: 19 additions & 5 deletions docs/source/blog/index.html
@@ -131,16 +131,16 @@
<!-- </a> -->
<!-- </div> -->

<!-- Release 0.6.0 -->
<!-- Release 0.7.0 -->
<div class="column">
<a href="/blog/release-060-nested-data-labeling.html">
<a href="/blog/release-070-cloud-storage-enablement.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-060/nested_labeling.gif); background-size:cover" class="image"></div>
<div style="background-image: url(/images/release-070/s3-mascot-04.png); background-size:cover" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">8 May 2020, 7 min read</div>
<div class="title">Label Studio 0.6.0 Release - Nested Data Labeling</div>
<div class="desc">29 May 2020, 5 min read</div>
<div class="title">Label Studio 0.7.0 Release - Cloud Storage Enablement</div>
</div>
</a>
</div>
@@ -157,6 +157,20 @@
</div>
</div>

</div>

<!-- Release 0.6.0 -->
<div class="column">
<a href="/blog/release-060-nested-data-labeling.html">
<div class="card">
<div class="image-wrap">
<div style="background-image: url(/images/release-060/nested_labeling.gif); background-size:cover" class="image"></div>
</div>
<div class="category">release notes</div>
<div class="desc">8 May 2020, 7 min read</div>
<div class="title">Label Studio 0.6.0 Release - Nested Data Labeling</div>
</div>
</a>
</div>

<!-- Release 0.5.0 -->
56 changes: 56 additions & 0 deletions docs/source/blog/release-070-cloud-storage-enablement.md
@@ -0,0 +1,56 @@
---
title: Label Studio Release Notes 0.7.0 - Cloud Storage Enablement
type: blog
order: 100
---

Just a couple of weeks after our 0.6.0 release, we’re happy to announce another big release. We started discussing cloud integration months ago, and as the first step in simplifying it, we’re introducing cloud storage connectors, such as AWS S3.

We’re also very interested in learning more about your ML pipelines. If you’d like to have a conversation, please ping us on [Slack](https://join.slack.com/t/label-studio/shared_invite/zt-cr8b7ygm-6L45z7biEBw4HXa5A2b5pw).

<br/>
<img src="/images/release-070/s3-mascot-04.png" />

## Connecting cloud storage

You can configure Label Studio to synchronize labeling tasks with your S3 or GCS bucket, optionally filtering by a specific prefix or file extension. Label Studio takes that list and generates pre-signed URLs each time a task is shown to the annotator.

<br/>
<img src="/images/release-070/configure-s3.gif" class="gif-border" />

Label Studio can load a file in several ways, either as a URL or as a blob, so you can store either the list of tasks or the assets themselves in the bucket and load them from there.

<br/>
<img src="/images/release-070/s3-config.png" class="gif-border" />

You can also configure it to store the results back to S3 or GCS, making Label Studio a part of your data processing pipeline. Read more about the configuration in the docs [here](/guide/storage.html).

## Frontend package updates

Finally, thanks to a lot of [work](https://github.com/heartexlabs/label-studio-frontend/pull/75) from [Andrew](https://github.com/hlomzik), the frontend now has automated tests. This helps make sure that we don’t break things when we introduce new features. Another important part is the improved build and publishing process with configured CI: the npm frontend package is now published along with the pip package.

## Labeling Paragraphs and Dialogues

Introducing a new object tag called “Paragraphs”. A paragraph is a piece of text with optional additional metadata, such as the author and the timestamp. With this tag we’re also experimenting with the idea of providing predefined layouts. For example, to label a dialogue you can use the following config: `<Paragraphs name="conversation" value="$conv" layout="dialogue" />`

<br/>
<img src="/images/release-070/dialogues.png" class="gif-border" />

This feature is available in the [enterprise version](https://heartex.ai/) only.

## Different shapes on the same image

One limitation Label Studio had was that you could use only one shape type on the same image: for example, either bounding boxes or polygons, but not both. This limitation is now lifted, and you can define different label groups and connect them to the same image.

<br/>
<img src="/images/release-070/multiple-tools.gif" class="gif-border" />

## maxUsages

There are a couple of ways to make sure that an annotation is performed in full. One is the `required` flag, and we’ve added a new one called `maxUsages`. For some datasets you know how many objects of a particular type there are, so you can limit how many times a specific label is used.
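
To illustrate the idea behind `maxUsages`, here is a minimal Python sketch of the check it implies. This is not Label Studio's implementation: the function name `over_limit_labels` and the shape of its inputs (a flat list of label names and a dict of per-label limits) are assumptions made only for this example.

```python
from collections import Counter


def over_limit_labels(region_labels, max_usages):
    """Return {label: count} for labels used more often than their limit.

    region_labels: label names, one per annotated region (hypothetical input shape).
    max_usages:    label name -> maximum allowed number of usages.
    """
    counts = Counter(region_labels)
    return {
        label: count
        for label, count in counts.items()
        # Labels without a configured limit are never reported.
        if label in max_usages and count > max_usages[label]
    }


labels_used = ["Car", "Car", "Plate", "Car", "Plate"]
limits = {"Plate": 1}  # at most one "Plate" region per image
print(over_limit_labels(labels_used, limits))  # {'Plate': 2}
```

Labels without a limit, like `Car` here, can be used any number of times.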

## Bugfixes and Enhancements
- Allow different types of shapes to be used in the same image. For example you can label the same image using both rectangles and ellipses.
- Fixing double text deserialization https://github.com/heartexlabs/label-studio-frontend/pull/85
- Fix bug with groups of required choices https://github.com/heartexlabs/label-studio-frontend/pull/74
- Several fixes for NER labeling — empty captured text, double clicks, labels appearance
125 changes: 125 additions & 0 deletions docs/source/guide/storage.md
@@ -0,0 +1,125 @@
---
title: Cloud storages
type: guide
order: 101
---

You can integrate popular cloud storage services with Label Studio to collect new tasks uploaded to your buckets and sync annotation results back for use in your machine learning pipelines.

The cloud storage type and bucket need to be configured when the server starts, and can be further configured at runtime via the UI.

You can configure one or both:

- _source storage_ (where tasks are stored)
- _target storage_ (where completions are stored)

The connection to both storages is synced, so you can see new tasks after uploading them to the bucket without restarting Label Studio.

Parameters such as the prefix or the filename-matching regex can be changed at any time from the web app interface.

## Amazon S3

To connect your [S3](https://aws.amazon.com/s3) bucket with Label Studio, be sure you have programmatic access enabled. [Check this link](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration) to learn more about how to set up access to your S3 bucket.

### Create connection on startup

The following commands launch Label Studio, configure the connection to your S3 bucket, scan for existing tasks, and load them into the labeling app.

#### Read bucket with JSON-formatted tasks

```bash
label-studio start --init --source s3 --source-path my-s3-bucket
```


#### Write completions to bucket

```bash
label-studio start --init --target s3-completions --target-path my-s3-bucket
```

### Working with Binary Large OBjects (BLOBs)

When you store BLOBs (such as images or audio files) in your S3 bucket, you might want to use them as is by generating URLs that point to those objects (e.g. `s3://my-s3-bucket/image.jpg`).
Label Studio can generate input tasks with the corresponding URLs automatically, on the fly. You can do this by specifying `--source-params` when launching the app:

```bash
label-studio start --init --source s3 --source-path my-s3-bucket --source-params "{\"data_key\": \"my-object-tag-$value\", \"use_blob_urls\": true}"
```

You can leave `"data_key"` empty (or omit it entirely); LS then generates it automatically from the first task key in the label config (this is useful when you have only one object tag exposed).
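
Hand-escaping the JSON passed to `--source-params` is error-prone: inside double quotes a POSIX shell treats `$value` as a variable and expands it. As a minimal sketch, assuming the flags shown in the command above, the command line can instead be assembled in Python with `json.dumps` and `shlex.quote`. The helper name `build_start_command` is hypothetical and not part of Label Studio.

```python
import json
import shlex


def build_start_command(source, bucket, params):
    """Return a shell-safe `label-studio start` command line as one string.

    Hypothetical helper for illustration; it only formats the flags shown above.
    """
    args = [
        "label-studio", "start", "--init",
        "--source", source,
        "--source-path", bucket,
        # json.dumps produces the JSON payload for --source-params.
        "--source-params", json.dumps(params),
    ]
    # shlex.quote wraps unsafe arguments in single quotes, so the shell
    # passes `$value` and the inner double quotes through unchanged.
    return " ".join(shlex.quote(arg) for arg in args)


command = build_start_command(
    "s3",
    "my-s3-bucket",
    {"data_key": "my-object-tag-$value", "use_blob_urls": True},
)
print(command)
```

Because the JSON ends up single-quoted, no backslash escaping is needed, and the same helper works for `gcs` by changing the first two arguments.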


### Optional parameters

You can specify additional parameters as an escaped JSON string on the command line via `--source-params` / `--target-params`, or from the UI.

#### prefix

Bucket prefix (typically used to specify internal folder/container)

#### regex

A regular expression for filtering bucket objects

#### create_local_copy

If set to true, a local copy of the remote storage is created.

#### use_blob_urls

Generate task data with URLs pointing to your bucket objects (for resources like jpg, mp3, etc.). If not selected, bucket objects are interpreted as tasks in Label Studio JSON format, one object per task.
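
To show how `prefix` and `regex` narrow down the objects in a bucket, here is a minimal sketch of the filtering over a plain list of object keys. The function name `select_objects` is hypothetical, and matching the regex from the start of the full key is an assumption; Label Studio's own matching rules may differ.

```python
import re


def select_objects(keys, prefix=None, regex=None):
    """Return the keys that start with `prefix` and match `regex`.

    Illustrative only: the regex is applied to the full object key,
    anchored at its start (re.match semantics).
    """
    pattern = re.compile(regex) if regex else None
    selected = []
    for key in keys:
        if prefix and not key.startswith(prefix):
            continue  # outside the requested "folder"
        if pattern and not pattern.match(key):
            continue  # filtered out by the regular expression
        selected.append(key)
    return selected


keys = ["images/cat.jpg", "images/dog.png", "images/notes.txt", "audio/clip.mp3"]
print(select_objects(keys, prefix="images/", regex=r".*\.(jpg|png)$"))
# ['images/cat.jpg', 'images/dog.png']
```

With both parameters left empty, every key is selected, which corresponds to syncing the whole bucket.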


## Google Cloud Storage

To connect your [GCS](https://cloud.google.com/storage) bucket with Label Studio, be sure you have enabled programmatic access. [Check this link](https://cloud.google.com/storage/docs/reference/libraries) to learn more about how to set up access to your GCS bucket.


### Create connection on startup

The following commands launch Label Studio, configure the connection to your GCS bucket, scan for existing tasks, and load them into the labeling app.

#### Read bucket with JSON-formatted tasks

```bash
label-studio start --init --source gcs --source-path my-gcs-bucket
```

#### Write completions to bucket

```bash
label-studio start --init --target gcs-completions --target-path my-gcs-bucket
```

### Working with Binary Large OBjects (BLOBs)

When you store BLOBs (such as images or audio files) in your GCS bucket, you might want to use them as is by generating URLs that point to those objects (e.g. `gs://my-gcs-bucket/image.jpg`).
Label Studio can generate input tasks with the corresponding URLs automatically, on the fly. You can do this by specifying `--source-params` when launching the app:

```bash
label-studio start --init --source gcs --source-path my-gcs-bucket --source-params "{\"data_key\": \"my-object-tag-$value\", \"use_blob_urls\": true}"
```

You can leave `"data_key"` empty (or omit it entirely); LS then generates it automatically from the first task key in the label config (this is useful when you have only one object tag exposed).


### Optional parameters

You can specify additional parameters as an escaped JSON string on the command line via `--source-params` / `--target-params`, or from the UI.

#### prefix

Bucket prefix (typically used to specify internal folder/container)

#### regex

A regular expression for filtering bucket objects

#### create_local_copy

If set to true, a local copy of the remote storage is created.

#### use_blob_urls

Generate task data with URLs pointing to your bucket objects (for resources like jpg, mp3, etc.). If not selected, bucket objects are interpreted as tasks in Label Studio JSON format, one object per task.
23 changes: 22 additions & 1 deletion docs/themes/htx/layout/layout.ejs
@@ -45,6 +45,9 @@
<%- css(isIndex ? 'css/index' : 'css/page') %>
<%- css('css/search') %>

<style>
</style>

<% if (page.type === "playground") { %>
<%- css('css/codemirror') %>
<script src="<%- url_for("/js/jquery.min.js") %>"></script>
@@ -56,7 +59,7 @@
<!-- this needs to be loaded before guide's inline scripts -->

</head>
<body class="<%- isIndex ? '' : 'docs' -%>">
<body >
<div id="mobile-bar" <%- isIndex || isBlog ? 'style="display: none"' : '' %>>
<a class="menu-button"></a>
</div>
@@ -73,5 +76,23 @@
<script src="<%- url_for("/js/css.escape.js") %>"></script>
<script src="<%- url_for("/js/common.js") %>"></script>
<%- partial('partials/google_analytics') %>
<script>
window.onscroll = function() {myFunction()};
// Get the header
var header = document.getElementById("header");
// Get the offset position of the navbar
var sticky = header.offsetTop;
// Add the sticky class to the header when you reach its scroll position. Remove "sticky" when you leave the scroll position
function myFunction() {
if (window.pageYOffset > sticky) {
header.classList.add("sticky");
} else {
header.classList.remove("sticky");
}
}
</script>
</body>
</html>
5 changes: 5 additions & 0 deletions docs/themes/htx/layout/partials/header.ejs
@@ -1,11 +1,16 @@
<div id="header">
<div class="header">
<a id="logo" href="<%- url_for("/") %>">
<!-- <img src="<%- url_for("/images/opossum/heartex_icon_opossum_green.svg") %>" alt="label studio logo" -->
<!-- height="180"/> -->

<img src="<%- url_for("/images/ls_logo.png") %>" alt="label studio logo" />
<span style="font-size: 1.2em;">Label Studio</span>
</a>
<ul id="nav" style="display: flex; align-items: center">
<%- partial('partials/main_menu', { context: 'nav' }) %>
</ul>
</div>
</div>

