Overview

Information Management Code Registry Hackathon

Synopsis:

The Environmental Data Initiative (EDI) is hosting a hackathon from June 12 to 14, 2018, at the University of New Mexico in Albuquerque, NM. The event brings together a diverse group of people to collaboratively design and work on code that supports the activities of information managers and scientists working with environmental data. The event is designed as a hands-on, participant-driven workshop. Participants will break into subgroups to tackle self-selected work targets.

Motivation and Goals:

A code registry for environmental scientists who have information management needs will be a valuable community resource. Substantial code has been written for cleaning, manipulating, formatting, documenting, and archiving environmental data sets, but no mechanism exists for code producers to share their code in an easily discoverable and reusable way. The ability to discover well-documented and re-usable code through the Code Registry will make information processing more efficient for many environmental scientists.

Goals for the hackathon are to:

Produce code solutions to participant-supplied data management use cases
Populate the Information Management Code Registry portal with metadata about the tools developed
Establish a community to support the Information Management Code Registry mission

Call for Code Project Use Cases:

We generally expect work targets to be code that scientists and information managers can re-use to clean, format, transform, document, and archive environmental data sets. Work targets that align well with participants’ own professional interests are generally more desirable than those that do not, because they typically get more traction and are more likely to be continued in some way after the event.

Hackathon participants should come prepared to discuss at least one use case for information management-related code. Subgroups will form around work targets selected from needs that participants have in common. In advance of the meeting, please add each of your use cases as an issue at https://github.com/IMCR-Hackathon/HackathonCentral/issues. As you consider use cases, think about code you have created in the past that might have applications for other users. Also consider the types of code you really wish you had had when faced with a data management challenge. The only constraint on use cases is that they involve tabular data.

Here is an example use case: Co-organizer Kristin often receives Excel files that contain data collected over many years, managed originally in separate Excel files, and then concatenated into a multi-year dataset for archiving. Data management was done by different people each year, so the multi-year file contains issues such as: 1) dates in several formats in the same column, 2) years of data where ‘NS’ or ‘ns’ was entered for missing data (in an otherwise numeric column) while in other years missing data is coded as -9999 or the cell is left empty, and 3) coded variables where more than one code is used for the same thing: e.g., ‘GumboLimbo’ is the same place as ‘GL’. A program that reports the issues needing correction in Excel files before archiving would be very useful. Code that could search and replace values in particular variables to correct some of these issues would also be welcome, so that the script could be re-used year after year.
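
As a rough illustration of what such a report might look like, here is a minimal Python sketch. It is not a proposed solution, only one way to frame the three issues in the use case. It assumes the Excel sheet has already been read into a list of rows, each a dict of column name to cell text. The function names, the list of date formats, and the set of missing-value codes are all hypothetical choices made for this example.

```python
from collections import Counter
from datetime import datetime

# Assumptions for illustration only: which date formats to recognize and
# which cell values count as missing-value codes (compared in lower case).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")
MISSING_CODES = {"", "ns", "-9999"}


def date_format_of(value):
    """Return the first format in DATE_FORMATS that parses value, else None."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def is_number(value):
    """Return True if the cell text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def report_issues(rows, date_col, numeric_col, code_col):
    """Count date formats, missing-value codes, and code spellings in a table.

    rows is a list of dicts mapping column name to cell text.
    """
    date_formats = Counter()
    missing_codes = Counter()
    non_numeric = Counter()
    codes = Counter()
    for row in rows:
        # Issue 1: which date formats appear in the date column?
        date_text = row[date_col].strip()
        date_formats[date_format_of(date_text) or "unrecognized"] += 1

        # Issue 2: which missing-value codes appear in the numeric column?
        cell = row[numeric_col].strip()
        if cell.lower() in MISSING_CODES:
            missing_codes[cell] += 1
        elif not is_number(cell):
            non_numeric[cell] += 1

        # Issue 3: list every code in use so duplicates can be spotted by eye.
        codes[row[code_col].strip()] += 1
    return {
        "date_formats": dict(date_formats),
        "missing_codes": dict(missing_codes),
        "non_numeric": dict(non_numeric),
        "codes": dict(codes),
    }


def recode(rows, column, mapping):
    """Return copies of rows with values in column replaced via mapping."""
    return [
        {**row, column: mapping.get(row[column].strip(), row[column])}
        for row in rows
    ]
```

The report only counts what it finds; deciding that ‘GL’ and ‘GumboLimbo’ are the same place is left to the person reading it. That decision can then be applied with a call such as recode(rows, "site", {"GL": "GumboLimbo"}), where "site" is a made-up column name, and the same mapping can be kept and re-used each year.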

An example dataset related to your use case may be useful for describing the code needed. Please plan to bring an example dataset with you to work on at the hackathon.

Implementation:

The Information Management Code Registry will be implemented in OntoSoft. OntoSoft is developing a system for software stewardship that offers assistance with metadata capture, open source publication, and dissemination of code through a “software commons”. We will register the code products from this hackathon in this OntoSoft portal: http://edi.ontosoft.org/. The code itself will be hosted on GitHub or another code hosting platform.
