Update dj_querying_data notebook
lochhh committed Feb 25, 2025
1 parent 2133d43 commit 2db32ab
Showing 1 changed file with 60 additions and 104 deletions.
164 changes: 60 additions & 104 deletions src/user/how_to/dj_querying_data.ipynb
@@ -8,31 +8,22 @@
"# DataJoint pipeline: Querying data\n",
"\n",
":::{important}\n",
"Before getting started, ensure that you have a [DataJoint pipeline deployed](target-dj-pipeline-deployment) and that [data has been ingested into the pipeline](target-dj-data-ingestion-processing).\n",
"This guide assumes you have a [DataJoint pipeline deployed](target-dj-pipeline-deployment) with [data already ingested](target-dj-data-ingestion-processing).\n",
":::\n",
"\n",
"This notebook provides examples of how to query data from the [Aeon DataJoint pipeline](target-aeon-dj-pipeline). \n",
"This guide provides examples of how to query data from the [Aeon DataJoint pipeline](target-aeon-dj-pipeline) using DataJoint's various [query operators](datajoint:docs/core/datajoint-python/0.14/query/operators/), such as restriction (`&`, `-`), projection (`proj`), joining (`*`), and aggregation (`aggr`). These together enable powerful data manipulations and flexible data analysis workflows.\n",
"\n",
":::{note}\n",
"The examples in this notebook use the [Single mouse in a foraging assay](sample-data-single-mouse-foraging:) dataset, specifically the experiment named `social0.2-aeon3`. \n",
"The examples here use the [Single mouse in a foraging assay](sample-data-single-mouse-foraging:) dataset for the experiment named `social0.2-aeon3`. \n",
"If you are using a different dataset, be sure to replace the experiment name and parameters in the code below accordingly.\n",
":::"
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[2025-02-06 17:08:32,281][INFO]: Connecting [email protected]:3306\n",
"[2025-02-06 17:08:32,423][INFO]: Connected [email protected]:3306\n"
]
}
],
"outputs": [],
"source": [
"from aeon.dj_pipeline import acquisition, tracking\n",
"from aeon.dj_pipeline.analysis import block_analysis"
@@ -42,14 +33,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"DataJoint offers various [query operators](datajoint:docs/core/datajoint-python/0.14/query/operators/), that allow for powerful data manipulations. These include restriction (&), projection, joining (*), and aggregation, enabling flexible data analysis workflows."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Querying Acquisition Data"
"## Acquisition data"
]
},
{
@@ -60,7 +44,8 @@
}
},
"source": [
"The acquisition module manages raw data collected during experiments. We'll start by exploring the `acquisition.Chunk` table, which stores metadata about discrete time chunks and the associated raw data files for each experiment."
"The [DataJoint acquisition module](target-aeon-dj-pipeline-acquisition-tables) manages raw data collected during experiments. \n",
"We will start by exploring the `acquisition.Chunk` table, which stores metadata about discrete time {term}`chunks <Acquisition Chunk>` and the associated raw data files for each experiment."
]
},
{
@@ -177,26 +162,29 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This command retrieves all records from the `Chunk` table. While comprehensive, it's often more efficient to filter the data using specific criteria."
"The command above retrieves all records for all experiments from the `Chunk` table.\n",
"Since we only ingested the sample data containing a 2-hour snippet of experiment `social0.2-aeon3`, the query returns the two records corresponding to each hour of the experiment.\n",
"In most cases, you would have multiple experiments in the database and the query would return more records.\n",
"Thus, it is often more efficient to apply specific criteria to filter the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Restricting Data by Experiment"
"### Restricting data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an example, we can use the [restriction operator `&`](datajoint:docs/core/datajoint-python/0.14/query/operators/#restriction) to focus on data from a specific experiment. For that, we define a restriction key. "
"For example, we can use the [restriction operator `&`](datajoint:docs/core/datajoint-python/0.14/query/operators/#restriction) with a restriction key to specify the `experiment_name` (e.g. `social0.2-aeon3`) for which we want to retrieve the {term}`chunks <Acquisition Chunk>`. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -302,32 +290,24 @@
],
"source": [
"experiment_key = {\"experiment_name\": \"social0.2-aeon3\"}\n",
"acquisition.Chunk & experiment_key\n"
"acquisition.Chunk & experiment_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All {term}`chunks <Acquisition Chunk>` associated with the specific experiment `social0.2-aeon3` have been retrieved using the restriction operator `&`. \n"
"## Tracking data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Querying Tracking Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tracking data provides detailed information on the movement and positioning of subjects. We'll apply the same experiment filter to explore different tracking-related tables.\n",
"\n",
"Let's apply the same restriction by experiment to query position tracking data with [SLEAP](sleap:). The `SLEAPTracking` table contains the position tracking of object(s) from a particular `VideoSource` per chunk.\n",
"The [DataJoint tracking module](target-aeon-dj-pipeline-tracking-tables) manages the position tracking data produced by different tracking software.\n",
"Here we continue using the same restriction key to explore tracking data associated with the experiment `social0.2-aeon3`. \n",
"\n",
"The same restriction key for the experiment `social0.2-aeon3` is applied:"
"The `SLEAPTracking` table stores the [SLEAP](sleap:) tracking data of the subject(s) in the experiment for each chunk of video recorded from a particular [camera device](target-module-camera)."
]
},
{
@@ -444,7 +424,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The `PoseIdentity` table identifies each subject and records the body part used as an \"anchor\" for tracking purposes:\n"
"The `PoseIdentity` table contains information that identifies each subject and records the body part used as the anchor point in the [SLEAP top-down-id-model network](sleap:develop/api/sleap.nn.config.model.html#sleap.nn.config.model.MultiClassTopDownConfig)."
]
},
{
@@ -599,7 +579,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Query the `Part` table to obtain x, y coordinates for all tracked body parts over time:"
"The `Part` table contains the x, y coordinates of tracked body parts for each video frame.\n",
"In this dataset, we only tracked each subject's `centroid`."
]
},
{
@@ -775,21 +756,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Querying Block-Level Data"
"## Block-level data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`BlockAnalysis` contains block-level data that aggregates experimental events into defined time blocks. This allows for higher-level analyses, such as behavioral trends over extended periods.\n",
"The [DataJoint block analysis module](target-aeon-dj-pipeline-analysis-tables) contains tables that aggregate experimental events into defined time {term}`blocks <Block>`.\n",
"This allows for higher-level analyses, such as behavioural trends over extended periods.\n",
"\n",
"Let's filter blocks using the same restriction key for experiment `social0.2-aeon3` and for blocks longer than 2 hours:"
"### Restricting data\n",
"Using another restriction operator `&`, we can further filter the `Block` and `BlockAnalysis` tables to only include blocks longer than 1 hour for the experiment `social0.2-aeon3`."
]
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -885,12 +868,12 @@
}
],
"source": [
"block_analysis.Block & experiment_key & 'block_duration_hr > 1'\n"
"block_analysis.Block & experiment_key & \"block_duration_hr > 1\""
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -990,16 +973,15 @@
}
],
"source": [
"block_analysis.BlockAnalysis & experiment_key & 'block_duration > 1'"
"block_analysis.BlockAnalysis & experiment_key & \"block_duration > 1\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `Block` refers to a specific period of time, typically lasting around 3 hours, during which the reward rate for each patch is predefined to facilitate certain animal behaviors. \n",
"\n",
"Let's choose a specific block for in-depth analysis:"
"We can also use the SQL `LIKE` operator to filter records based on a pattern match. \n",
"Here, we filter the `Block` table by `block_start` to only include blocks that started on `2024-03-02`."
]
},
{
@@ -1107,12 +1089,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To [`fetch()`](datajoint:docs/core/datajoint-python/0.14/query/fetch/#fetch) the [primary keys](datajoint:docs/core/datajoint-python/0.14/design/tables/primary/) of this specific _block_ in experiment `social0.2-aeon3`:"
"### Using primary keys\n",
"\n",
"We can [`fetch()`](datajoint:docs/core/datajoint-python/0.14/query/fetch/#fetch) the [primary keys](datajoint:docs/core/datajoint-python/0.14/design/tables/primary/) of the blocks matching the above query expression as a list of dictionaries."
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1128,22 +1112,19 @@
}
],
"source": [
"block_key = (block_analysis.Block & experiment_key & 'block_start LIKE \"%2024-03-02%\"').fetch(\"KEY\")\n",
"block_key"
"block_key = (\n",
" block_analysis.Block & experiment_key & 'block_start LIKE \"%2024-03-02%\"'\n",
").fetch(\"KEY\")\n",
"block_key\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Joining Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `BlockAnalysis` table aggregates data from multiple subjects and patches within a _block_: "
"The primary keys can then be used to retrieve specific block-level records in other [analysis tables that reference the `Block` table](target-aeon-dj-pipeline-analysis-fig).\n",
"\n",
"For instance, we can use the primary keys to retrieve the `BlockAnalysis` records associated with the blocks that started on `2024-03-02`."
]
},
{
@@ -1255,7 +1236,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"`BlockSubjectAnalysis` offers a detailed analysis focused on individual subjects, examining their interactions within a specific _block_:"
"We can also retrieve subject-specific analyses (`BlockSubjectAnalysis`), such as subjects' food patch preference (`BlockSubjectAnalysis.Preference`) and their interaction with the food patches(`BlockSubjectAnalysis.Patch`) for the same blocks."
]
},
{
@@ -1351,13 +1332,6 @@
"block_analysis.BlockSubjectAnalysis & block_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Understanding subjects' patch `Preferences` based on time spent or distance traveled for this _block_:"
]
},
{
"cell_type": "code",
"execution_count": 20,
@@ -1533,16 +1507,9 @@
"block_analysis.BlockSubjectAnalysis.Preference & block_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tracking the `Patch` interactions of each subject with different patches (areas of interests):\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -1712,15 +1679,17 @@
}
],
"source": [
"# subject position\n",
"block_analysis.BlockSubjectAnalysis.Patch & block_key\n"
"# based on subject position\n",
"block_analysis.BlockSubjectAnalysis.Patch & block_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By [joining (*)](datajoint:docs/core/datajoint-python/0.14/query/operators/#join-compatibility) the `Patch` and `Preference` part tables, data from both tables is consolidated, presenting a comprehensive view of patch-preference interactions."
"### Joining data\n",
"\n",
"To obtain a comprehensive view of the patch-preference interactions, we can join the `Patch` and `Preference` part tables using the [join operator `*`](datajoint:docs/core/datajoint-python/0.14/query/operators/#join-compatibility)."
]
},
{
@@ -1956,28 +1925,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fetching and Inspecting Patch Data"
"### Fetching and inspecting data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can fetch detailed patch data for the selected _block_ to analyze it further, potentially using pandas for advanced data manipulations:"
"Finally, data can be fetched as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) using [fetch()](datajoint:docs/core/datajoint-python/0.14/query/fetch/#usage-with-pandas) with the `format=\"frame\"` argument."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"block_patch_data = (block_analysis.BlockAnalysis.Patch & block_key).fetch(format=\"frame\").reset_index()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -2099,20 +2059,16 @@
}
],
"source": [
"block_patch_data = (\n",
" (block_analysis.BlockAnalysis.Patch & block_key).fetch(format=\"frame\").reset_index()\n",
")\n",
"block_patch_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This DataFrame structure makes it easy to perform additional analyses, visualizations, or statistical tests."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "aeon",
"language": "python",
"name": "python3"
},
@@ -2126,7 +2082,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.11.11"
}
},
"nbformat": 4,
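
For readers without access to a deployed pipeline, the semantics of the operators exercised in the updated notebook (restriction with `&`, fetching primary keys with `fetch("KEY")`, and joining with `*`) can be approximated in plain Python. This is only a simplified sketch, not DataJoint's implementation or API: the helpers `restrict`, `primary_keys`, and `natural_join`, the toy rows, and attribute names such as `subject_name`, `in_patch_time`, and `final_preference` are hypothetical and chosen for illustration.

```python
# Illustrative only: plain-Python stand-ins for DataJoint query operators.
# All helper names, rows, and attribute names here are hypothetical.


def restrict(rows, key):
    """Keep rows matching every item in `key` (roughly `table & key`)."""
    return [r for r in rows if all(r.get(k) == v for k, v in key.items())]


def primary_keys(rows, pk):
    """Keep only primary-key attributes per row (roughly `.fetch("KEY")`)."""
    return [{k: r[k] for k in pk} for r in rows]


def natural_join(left, right):
    """Merge rows that agree on all shared attribute names (roughly `a * b`)."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


blocks = [
    {"experiment_name": "social0.2-aeon3",
     "block_start": "2024-03-02 10:00:00", "block_duration_hr": 2.0},
    {"experiment_name": "other-experiment",
     "block_start": "2024-03-05 09:00:00", "block_duration_hr": 3.0},
]
patch_rows = [
    {"experiment_name": "social0.2-aeon3", "block_start": "2024-03-02 10:00:00",
     "subject_name": "subject_a", "patch_name": "Patch1", "in_patch_time": 12.5},
    {"experiment_name": "other-experiment", "block_start": "2024-03-05 09:00:00",
     "subject_name": "subject_a", "patch_name": "Patch1", "in_patch_time": 4.0},
]
preference_rows = [
    {"experiment_name": "social0.2-aeon3", "block_start": "2024-03-02 10:00:00",
     "subject_name": "subject_a", "patch_name": "Patch1", "final_preference": 0.7},
    {"experiment_name": "social0.2-aeon3", "block_start": "2024-03-02 10:00:00",
     "subject_name": "subject_a", "patch_name": "Patch2", "final_preference": 0.3},
]

# Restrict by experiment, then keep only the primary-key attributes.
experiment_key = {"experiment_name": "social0.2-aeon3"}
block_key = primary_keys(
    restrict(blocks, experiment_key), ["experiment_name", "block_start"]
)

# Reuse the first key to restrict another table, then join on shared attributes.
joined = natural_join(restrict(patch_rows, block_key[0]), preference_rows)
```

The sketch mirrors the notebook's flow: a restriction key narrows `blocks` to one experiment, the resulting primary keys restrict a downstream table, and the join keeps only row pairs that agree on every shared attribute.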