Update tutorial markdown

datajoint · Oct 25, 2023 · 4190925
1 parent 69cef22
commit 4190925
Showing 1 changed file with 48 additions and 162 deletions.
diff --git a/notebooks/tutorial.ipynb b/notebooks/tutorial.ipynb
@@ -1125,50 +1125,6 @@
     "multielectrode probe. "
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'# Represent a physical probe with unique identification\\nprobe                : varchar(32)                  # unique identifier for this model of probe (e.g. serial number)\\n---\\n-> probe.ProbeType\\nprobe_comment=\"\"     : varchar(1000)                \\n'"
-      ]
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "print(probe.Probe.describe())"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "# Represent a physical probe with unique identification\n",
-       "probe                : varchar(32)                  # unique identifier for this model of probe (e.g. serial number)\n",
-       "---\n",
-       "probe_type           : varchar(32)                  # e.g. neuropixels_1.0\n",
-       "probe_comment=\"\"     : varchar(1000)                # "
-      ]
-     },
-     "execution_count": 14,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "probe.Probe.heading"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": 15,
@@ -1293,7 +1249,7 @@
     }
    ],
    "source": [
-    "ephys.ProbeInsertion.describe()"
+    "print(ephys.ProbeInsertion.describe())"
    ]
   },
   {
@@ -1428,77 +1384,24 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Populate\n",
     "\n",
     "### Automatically populate tables\n",
     "\n",
-    "`ephys.EphysRecording` is the first table in the pipeline that can be populated automatically.\n",
-    "If a table contains a part table, this part table is also populated during the\n",
-    "`populate()` call. `populate()` takes several arguments including the a session\n",
-    "key. This key restricts `populate()` to performing the operation on the session\n",
-    "of interest rather than all possible sessions which could be a time-intensive\n",
-    "process for databases with lots of entries.\n",
+    "In DataJoint, the `populate()` method fills a table by running the logic defined in the table's `make` method for each entry that has not yet been computed. Here's a breakdown of its functionality:\n",
     "\n",
-    "Let's view the `ephys.EphysRecording` and its part table\n",
-    "`ephys.EphysRecording.EphysFile` and populate both through a single `populate()`\n",
-    "call."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "# Ephys recording from a probe insertion for a given session.\n",
-       "subject              : varchar(8)                   # \n",
-       "session_datetime     : datetime                     # \n",
-       "insertion_number     : tinyint unsigned             # \n",
-       "---\n",
-       "electrode_config_hash : uuid                         # \n",
-       "acq_software         : varchar(24)                  # \n",
-       "sampling_rate        : float                        # (Hz)\n",
-       "recording_datetime   : datetime                     # datetime of the recording from this probe\n",
-       "recording_duration   : float                        # (seconds) duration of the recording from this probe"
-      ]
-     },
-     "execution_count": 19,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "ephys.EphysRecording.heading"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "# Paths of files of a given EphysRecording round.\n",
-       "subject              : varchar(8)                   # \n",
-       "session_datetime     : datetime                     # \n",
-       "insertion_number     : tinyint unsigned             # \n",
-       "file_path            : varchar(255)                 # filepath relative to root data directory"
-      ]
-     },
-     "execution_count": 20,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "ephys.EphysRecording.EphysFile.heading"
+    "- **Automation**: Instead of manually inserting data into each table, which can be error-prone and time-consuming, `populate()` automates the insertion based on the dependencies and relationships already established in the schema.\n",
+    "\n",
+    "- **Dependency Resolution**: `populate()` only processes entries whose upstream dependencies already exist. It does not populate upstream tables itself, so those must be filled first. This maintains the integrity and consistency of the data.\n",
+    "\n",
+    "- **Part Tables**: If a table has part tables associated with it, calling `populate()` on the main table will also populate its part tables. This is especially useful in cases like `ephys.EphysRecording` and its part table `ephys.EphysRecording.EphysFile`, as they are closely linked in terms of data lineage.\n",
+    "\n",
+    "- **Restriction**: The `populate()` method can be restricted to specific entries. For instance, by providing a `session_key`, we're ensuring the method only operates on the data relevant to that particular session. This is both efficient and avoids unnecessary operations on unrelated data.\n",
+    "\n",
+    "In the upcoming cells, we'll make use of the `populate()` method to fill the `ephys.EphysRecording` table and its part table. Remember, while this operation is automated, it's essential to understand the underlying logic to ensure accurate and consistent data entry.\n"
    ]
   },
   {
@@ -2131,26 +2034,6 @@
     "downstream processing. Let's view the attributes to get a better understanding. "
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 28,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'# Manual table for defining a clustering task ready to be run\\n-> ephys.EphysRecording\\n-> ephys.ClusteringParamSet\\n---\\nclustering_output_dir=\"\" : varchar(255)                 # clustering output directory relative to the clustering root data directory\\ntask_mode=\"load\"     : enum(\\'load\\',\\'trigger\\')       # \\'load\\': load computed analysis results, \\'trigger\\': trigger computation\\n'"
-      ]
-     },
-     "execution_count": 28,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "ephys.ClusteringTask.describe()"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": 29,
@@ -2187,7 +2070,7 @@
     "+ `paramset_idx` \n",
     "+ `task_mode` \n",
     "\n",
-    "The `paramset_idx` attribute is tracks\n",
+    "The `paramset_idx` attribute tracks\n",
     "your kilosort parameter sets. You can choose the parameter set using which \n",
     "you want spike sort ephys data. For example, `paramset_idx=0` may contain\n",
     "default parameters for kilosort processing whereas `paramset_idx=1` contains your custom parameters for sorting. This\n",
@@ -2215,15 +2098,6 @@
     ")"
    ]
   },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Notice we set the `task_mode` to `load`. Let's call populate on the `Clustering`\n",
-    "table in the pipeline."
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": 31,
@@ -2335,34 +2209,46 @@
     "\n",
     "In this tutorial, we will do some exploratory analysis by fetching the data from the database and creating a few plots.\n",
     "\n",
-    "## Query\n",
+    "## Querying Data\n",
+    "\n",
+    "DataJoint provides a powerful querying system, allowing you to retrieve and work with data stored in your database seamlessly. In this section, we'll explore the fundamental querying concepts.\n",
+    "\n",
+    "#### What is a Query?\n",
+    "\n",
+    "- A query is essentially a request for data. With DataJoint, you can craft specific queries to fetch data that meets your criteria from the database.\n",
+    "\n",
+    "#### The `fetch()` Method\n",
+    "\n",
+    "- The primary method for retrieving data from a DataJoint table is `fetch()`.\n",
+    "- **Default Behavior**: Without any arguments, `fetch()` returns a NumPy structured array with one record per entry in the table. Pass `as_dict=True` to get a list of dictionaries instead.\n",
+    "  \n",
+    "#### The `fetch1()` Method\n",
+    "\n",
+    "- When a query matches exactly one entry, use `fetch1()`. It raises an error if the query matches no entries or more than one.\n",
+    "- **Default Behavior**: It returns a dictionary of attributes for that one entry.\n",
+    "\n",
+    "#### Specific Attributes\n",
+    "\n",
+    "- Both `fetch()` and `fetch1()` can be made more specific by providing attributes.\n",
+    "- Example: `fetch1('fps')` will retrieve only the `fps` attribute from that single entry.\n",
+    "\n",
+    "#### Restricting Queries\n",
+    "\n",
+    "- Often, you don't want to fetch everything. Instead, you might want data related to a specific subject or session.\n",
+    "- DataJoint uses the `&` operator to restrict queries.\n",
+    "- Example: To get all session times for `subject1`, you might use:\n",
+    "  ```python\n",
+    "  subject1_times = (session.Session & \"subject = 'subject1'\").fetch(\"session_datetime\")\n",
+    "  ```\n",
     "\n",
-    "This section focuses on working with data that is already in the\n",
-    "database. \n",
+    "#### Fetching Primary Keys\n",
     "\n",
-    "DataJoint queries allow you to view and import data from the database into a python\n",
-    "variable using the `fetch()` method. \n",
+    "- Sometimes, you just need the primary keys of entries.\n",
+    "- Use the `fetch(\"KEY\")` syntax for this. For instance, `(session.Session).fetch(\"KEY\")`.\n",
     "\n",
-    "There are several important features supported by `fetch()`:\n",
-    "- By default, an empty `fetch()` imports a list of dictionaries containing all\n",
-    "  attributes of all entries in the table that is queried.\n",
-    "- **`fetch1()`**, on the other hand, imports a dictionary containing all attributes of\n",
-    "  one of the entries in the table. By default, if a table has multiple entries,\n",
-    "  `fetch1()` imports the first entry in the table.\n",
-    "- Both `fetch()` and `fetch1()` accept table attributes as an argument to query\n",
-    "  that particular attribute. For example `fetch1('fps')` will fetch the first\n",
-    "  value of the `fps` attribute if it exists in the table.\n",
-    "- Recommended best practice is to **restrict** queries by primary key attributes of the\n",
-    "  table to ensure the accuracy of imported data.\n",
-    "    - The most common restriction for entries in DataJoint tables is performed\n",
-    "      using the `&` operator. For example to fetch all session start times belonging to\n",
-    "      `subject1`, a possible query could be `subject1_sessions =\n",
-    "      (session.Session & \"subject = 'subject1'\").fetch(\"session_datetime\")`.  \n",
-    "- `fetch()` can also be used to obtain the primary keys of a table. To fetch the primary\n",
-    "  keys of a table use `<table_name>.fetch(\"KEY\")` syntax.\n",
+    "#### Let's Dive In!\n",
     "\n",
-    "Let's walk through these concepts of querying by moving from simple to more\n",
-    "complex queries."
+    "Now that we've established the basics, let's delve deeper into querying with some practical examples."
    ]
   },
   {