D3.1.4 Build aggregation tools #32

tms-epcc · 2023-03-28T16:11:29Z

(D3.1.4, May 2024) Build appropriate aggregation tools, providing consensus from classifications provided by multiple volunteers

tms-epcc · 2023-08-03T16:55:42Z

03/AUG/23

Chris reported that the necessary aggregation tools have been created.
The next step is to produce an accompanying report.
This report is expected in early 2024.

tms-epcc · 2023-08-30T10:16:27Z

25/AUG/23
@chrislintott reported:

Discussions are ongoing with the EPO team regarding the extent to which this deliverable may already have been done.

tms-epcc · 2024-01-19T17:31:08Z

19/JAN/24
@chrislintott reported in https://docs.google.com/document/d/13mgVp2T9EWWeuTVkvrf_WEW-xG721zaSsmiCrAdWHwo/edit
that the FY24 plan includes:

Batch aggregation (D3.1.4) - provide API endpoints and tools for more sophisticated handling of aggregation, so that it can run on particular subjects or subject sets, and on a programmatic or scheduled basis. Aggregation could also be triggered via the Python client or a notebook so data can be returned to RSP. Development underway: target date end Feb.

tms-epcc · 2024-01-26T10:25:37Z

26/JAN/24

@chrislintott explained that the need for these batch aggregation tools arose out of the Lasair trial project (see "Endpoint for batching of uploading", #2).

tms-epcc · 2024-03-27T09:47:02Z

27/MAR/24
@chrislintott reported that good progress is being made on this and currently sees no issues with the 31/MAY/24 due date.

tms-epcc · 2024-04-24T14:42:20Z

24/APR/24
@chrislintott reported:

Some components have already been implemented and are being reviewed.
Delivery is still expected to meet the due date as planned.

tms-epcc · 2024-05-29T08:25:31Z

29/MAY/24

as reported in FY24 Q2 QU

The existing aggregation code (zooniverse/aggregation-for-caesar repo) offers an offline / local solution for processing classifications and producing subject-specific summary results, greatly simplifying the post-processing required by Zooniverse project teams, but it can only be run once data has been downloaded. Instead, we want the code to run in response to a button push or API call, over a batch of recent classifications, facilitating the transfer of aggregated data back to the science platform. This required a new Zooniverse-hosted application endpoint to accept and process requests for aggregation of classification batches, which executes data ingest, extraction, reduction, and output data bundle creation. For now, the example implementation will be for binary workflows, but this could quickly be expanded. Progress has been good, with job management implemented (zooniverse/aggregation-for-caesar#783) and the individual pieces of functionality ready to add to Panoptes, the Zooniverse backend (zooniverse/panoptes#4303). Integrating these components will be the first task next quarter. We also completed testing of the project copier functionality, including small bug fixes (e.g. zooniverse/panoptes#4270).
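For readers unfamiliar with the extraction and reduction steps, the following is a minimal sketch of what aggregating a binary (yes/no) workflow might look like. It is not the aggregation-for-caesar implementation: the record layout, the field names `subject_id` and `answer`, and the functions `extract` and `reduce_binary` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def extract(classification):
    """Extraction step: pull the subject ID and the yes/no answer
    out of one classification record (hypothetical record layout)."""
    return classification["subject_id"], classification["answer"]


def reduce_binary(classifications):
    """Reduction step: combine all votes for each subject into a summary
    holding the vote count and the fraction of 'yes' answers."""
    votes = defaultdict(Counter)
    for classification in classifications:
        subject_id, answer = extract(classification)
        votes[subject_id][answer] += 1

    results = {}
    for subject_id, counts in votes.items():
        total = sum(counts.values())
        results[subject_id] = {
            "n_votes": total,
            "yes_fraction": counts["yes"] / total,
        }
    return results


# A small made-up batch: four volunteers saw subject 101, two saw subject 102.
batch = [
    {"subject_id": 101, "answer": "yes"},
    {"subject_id": 101, "answer": "yes"},
    {"subject_id": 101, "answer": "no"},
    {"subject_id": 101, "answer": "yes"},
    {"subject_id": 102, "answer": "no"},
    {"subject_id": 102, "answer": "no"},
]

summary = reduce_binary(batch)
for subject_id, result in sorted(summary.items()):
    print(subject_id, result)
```

The per-subject `yes_fraction` here stands in for the kind of subject-specific summary result described above; a real reducer would likely also handle weighting, retired subjects, and malformed annotations.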

tms-epcc · 2024-11-08T09:31:45Z

From draft FY24 AE

The major piece of technical work during the year has been the delivery of more advanced handling of data produced by Zooniverse citizen science projects, with the goal of facilitating the return of data to the Rubin Science Platform. We anticipate wanting to provide users both with raw data, consisting of individual classifications (‘User X saw subject Y in task Z and provided the following annotations…’), and aggregated data (‘Subject p has score X’).

The existing Zooniverse backend assumed that requests for data required everything from a project. For long-lived projects, this can produce very large files which are hard to handle; for example, Planet Hunters:TESS currently has about 50 million classifications in its database. As a result, requests to the API for data often failed silently because of the size of the file, and tasks such as updating aggregation were slow because they had to be run from scratch rather than including only newer classifications.

The batch aggregation project involved updates to Zooniverse’s Panoptes backend to enable requests and processing to run only on some subset of a project’s classifications, identified either by subject set (e.g. a batch of images or light curves which were uploaded together) or by date. The result can be used both via the API, which will handle requests for data from the RSP, and by internal aggregation tools which update scores for use in task allocation or machine learning.
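As an illustration of the subset selection described above, here is a minimal sketch of picking a batch of classifications by subject set or by date before aggregating. It does not reflect the actual Panoptes API or schema: the function `select_batch` and the fields `subject_set_id` and `created_at` are assumptions made for the example.

```python
from datetime import datetime, timezone


def select_batch(classifications, subject_set_id=None, since=None):
    """Return only the classifications that match the requested subject set
    and/or were created at or after the `since` timestamp.
    A filter left as None is not applied."""
    selected = []
    for classification in classifications:
        if subject_set_id is not None and classification["subject_set_id"] != subject_set_id:
            continue
        if since is not None and classification["created_at"] < since:
            continue
        selected.append(classification)
    return selected


# Made-up records; IDs and timestamps are for illustration only.
classifications = [
    {"id": 1, "subject_set_id": 10, "created_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    {"id": 2, "subject_set_id": 10, "created_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"id": 3, "subject_set_id": 20, "created_at": datetime(2024, 6, 2, tzinfo=timezone.utc)},
]

# Only classifications made since 1 May 2024, across all subject sets.
recent = select_batch(classifications, since=datetime(2024, 5, 1, tzinfo=timezone.utc))

# Only classifications belonging to subject set 10, regardless of date.
set_10 = select_batch(classifications, subject_set_id=10)

print([c["id"] for c in recent])
print([c["id"] for c in set_10])
```

Selecting by date in this way is what would let an aggregation update process only newer classifications instead of rerunning over the whole project.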

tms-epcc · 2024-12-02T16:13:43Z

02/DEC/24

deliverable submitted for review

tms-epcc added this to the Completion of all Phase C deliverables milestone Mar 28, 2023

tms-epcc added the Epic label Mar 28, 2023

tms-epcc added the deliverable label Mar 12, 2024
