Datasette API #2

Open
wants to merge 22 commits into
base: local_datasette_testing

@jeremy-wayland jeremy-wayland commented Aug 30, 2024

Overview

This update enhances the codebase to directly interact with the online version of our dataset, enabling dynamic data retrieval and analysis. Key features include:
  • Data Access: Use SQL queries to extract specific HSAs from the Datasette instance and load them locally.
  • Network Generation: Create network structures from raw interaction data, facilitating advanced analysis.
  • Feature Computation: Compute curvature-filtrations and standard networkx features on the fly.
  • Visualization: Explore networks visually, highlighting their structural features.
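As a rough illustration of the Data Access item, the sketch below builds a request URL for Datasette's JSON API, which accepts a `sql` query-string parameter, named SQL parameters as extra fields, and `_shape=array` for a plain list of rows. The host, database name (`apparent`), table (`interactions`), and column (`hsa`) are assumptions made up for this example, not names taken from this PR.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, database, sql, params=None):
    """Build a Datasette JSON API URL for an arbitrary SQL query."""
    query = {"sql": sql, "_shape": "array"}
    # Named SQL parameters (e.g. ":hsa") travel as extra query-string fields.
    query.update(params or {})
    return f"{base_url.rstrip('/')}/{database}.json?{urlencode(query)}"


def fetch_rows(url):
    """Download and decode the JSON rows returned by Datasette."""
    with urlopen(url) as response:
        return json.load(response)


# Hypothetical instance, database, table, and column names.
url = build_query_url(
    "https://example.org",
    "apparent",
    "select * from interactions where hsa = :hsa",
    {"hsa": "12345"},
)
print(url)
```

Calling `fetch_rows(url)` against a real instance would return a list of dictionaries, one per row, which can then be loaded locally.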

The Apparent class will serve as the primary interface for users, orchestrating data retrieval, feature computation, analysis, and visualization. Specialized functionality related to network operations will be implemented through dedicated subclasses within the networks submodule.
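One way to picture this split of responsibilities is a thin facade that holds the fetched rows and delegates to network helpers. This is only a sketch under assumptions: the method names (`build`, `describe`), the row keys (`hsa`, `source`, `target`), and the two helper functions are hypothetical and are not the API introduced by this PR. It uses plain dictionaries instead of networkx to stay self-contained.

```python
from collections import defaultdict


def build_network(rows):
    """Turn edge rows into an undirected adjacency mapping."""
    adjacency = defaultdict(set)
    for row in rows:
        adjacency[row["source"]].add(row["target"])
        adjacency[row["target"]].add(row["source"])
    return dict(adjacency)


def describe_network(adjacency):
    """Compute node and edge counts (assumes no self-loops)."""
    n_nodes = len(adjacency)
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return {"nodes": n_nodes, "edges": n_edges}


class Apparent:
    """Hypothetical facade: holds fetched rows and delegates network work."""

    def __init__(self, rows):
        self.rows = rows
        self.networks = {}

    def build(self, key="hsa"):
        # Group rows by HSA, then build one network per group.
        grouped = defaultdict(list)
        for row in self.rows:
            grouped[row[key]].append(row)
        self.networks = {k: build_network(v) for k, v in grouped.items()}
        return self

    def describe(self):
        return {k: describe_network(g) for k, g in self.networks.items()}


rows = [
    {"hsa": "A", "source": 1, "target": 2},
    {"hsa": "A", "source": 2, "target": 3},
    {"hsa": "B", "source": 7, "target": 8},
]
features = Apparent(rows).build().describe()
print(features)
```

Keeping the helpers as free functions mirrors the proposed layout, where `build.py` and `describe.py` live in the networks submodule and the main object only orchestrates them.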

Structure

├── README.md
├── apparent
│   ├── __init__.py
│   ├── apparent.py # Main object that queries Datasette and drives downstream computations.
│   └── networks
│       ├── __init__.py
│       ├── build.py # Methods for building networks. Supports defaults and allows customisation.
│       ├── cluster.py # Methods for clustering networks.
│       ├── compare.py # Methods for comparing networks.
│       ├── describe.py # Methods for describing networks and computing network features.
│       └── embed.py # Methods for embedding networks.
├── notebooks
│   └── tutorial.ipynb # Tutorial notebook on how to use `Apparent`. An advanced network analysis could be added as well.
├── poetry.lock
├── pyproject.toml
└── queries
    └── sample.sql # Holds sample queries that will also be available in the Datasette instance.

Tasks

  • Restructure functionality into usable objects
  • Introduce an Apparent object that will handle data fetching and drive communication with the networks functionality.

Once Apparent grabs a subset of HSAs, we want to support the following functionality:

  • Compute new features of these networks with describe.py (new curvatures, node/edge features, etc.)
  • Support embedding methods for networks in embed.py (the inspiration is embedding pairwise curvature-filtration distances)
  • Visualize the "space" of networks (a 2D representation of the original embedding)
  • Run clustering algorithms on networks to subset further based on structure, and visualize the results :)
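The embed-then-cluster idea in the list above can be sketched with a pairwise distance table followed by a simple grouping step. Two substitutions are made here and should be read as assumptions: an L1 distance between degree sequences stands in for the curvature-filtration distance, and a threshold-based single-linkage grouping stands in for whatever algorithm cluster.py ends up using. All function names are hypothetical.

```python
from itertools import combinations


def degree_profile(adjacency):
    """Sorted degree sequence, used here as a stand-in network descriptor."""
    return sorted((len(nbrs) for nbrs in adjacency.values()), reverse=True)


def profile_distance(p, q):
    """L1 distance between two degree sequences, zero-padded to equal length."""
    size = max(len(p), len(q))
    p = p + [0] * (size - len(p))
    q = q + [0] * (size - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))


def cluster_by_threshold(names, distance, threshold):
    """Single-linkage style grouping: merge networks within the threshold."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if distance[a, b] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Toy networks: two 3-node paths and one complete graph on 5 nodes.
networks = {
    "A": {1: {2}, 2: {1, 3}, 3: {2}},
    "B": {7: {8}, 8: {7, 9}, 9: {8}},
    "C": {i: {j for j in range(5) if j != i} for i in range(5)},
}
names = sorted(networks)
profiles = {n: degree_profile(networks[n]) for n in names}
distance = {
    (a, b): profile_distance(profiles[a], profiles[b])
    for a, b in combinations(names, 2)
}
clusters = cluster_by_threshold(names, distance, threshold=1)
print(clusters)
```

The same pairwise table could feed a 2D embedding for the visualization step; that part is left out here because it would need a numerical library.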

@jeremy-wayland jeremy-wayland self-assigned this Nov 18, 2024

@emsimons emsimons left a comment

Sounds really great!

File restructuring & naming intuitively makes sense - I think it will be immediately apparent to the user where to find what they are looking for. :)

@jeremy-wayland jeremy-wayland marked this pull request as ready for review December 12, 2024 14:53
@jeremy-wayland jeremy-wayland changed the base branch from main to local_datasette_testing December 12, 2024 14:57