From 1d2ab3c238ef67de96591f42c0cde46b8eb663a4 Mon Sep 17 00:00:00 2001 From: Henry Schreiner Date: Mon, 10 Jan 2022 12:19:50 -0500 Subject: [PATCH] fix: some touchups before class --- notebooks/0 Intro.ipynb | 10 +- notebooks/1.1 Memory Model.ipynb | 110 +++++++++++-- notebooks/1.1b Intro to Classes.ipynb | 44 +++++ notebooks/1.2 Classes.ipynb | 207 +++++++++++++++++++----- notebooks/1.3 Logging.ipynb | 2 +- notebooks/2.2 Generators.ipynb | 15 +- notebooks/2.4 Context Managers.ipynb | 27 +++- notebooks/2.5 Static Typing.ipynb | 17 +- notebooks/2.6 Using Packages.ipynb | 114 +++++++++---- notebooks/2.7 Creating Packages.ipynb | 71 ++++++-- notebooks/3.6 Code Quality and CI.ipynb | 43 ++++- 11 files changed, 532 insertions(+), 128 deletions(-) diff --git a/notebooks/0 Intro.ipynb b/notebooks/0 Intro.ipynb index 9a98f33..0bd509c 100644 --- a/notebooks/0 Intro.ipynb +++ b/notebooks/0 Intro.ipynb @@ -38,9 +38,9 @@ "\n", "[![Henryiii's github stats](https://github-readme-stats.vercel.app/api?username=henryiii)](https://github.com/anuraghazra/github-readme-stats)\n", "\n", - "Most important link: \n", + "Most important link: \n", "\n", - "[Scikit-HEP](https://scikit-hep.org) admin, [scikit-build](https://github.com/scikit-build) admin, member of [IRIS-HEP](https://iris-hep.org). [PyPA](https://github.com/pypa) member.\n", + "[PyPA](https://github.com/pypa) member. [Scikit-HEP](https://scikit-hep.org) admin, [scikit-build](https://github.com/scikit-build) admin, member of [IRIS-HEP](https://iris-hep.org).\n", "\n", "### Projects\n", "\n", @@ -163,9 +163,7 @@ "source": [ "## Notebooks\n", "\n", - "We will be using notebooks today. Notebooks are fantastic for teaching, quick experimentation, for developing, or for driving a final analysis product. They are not for serious programming - that happens in `.py` files. Once you write something and get it working, move it to a `.py` file and add a test. 
Then import it into your notebook!\n", - "\n", - "We'll be in JupyterLab 3; it's fairly new and there have been a few bugs related to code completion. Hopefully everything will be smoothed out soon. (IPython and the Jedi library seem to be at odds)." + "We will be using notebooks today. Notebooks are fantastic for teaching, quick experimentation, for developing, or for driving a final analysis product. They are not for serious programming - that happens in `.py` files. Once you write something and get it working, move it to a `.py` file and add a test. Then import it into your notebook!" ] }, { @@ -174,7 +172,7 @@ "source": [ "## Python version\n", "\n", - "Also, everything will be in Python 3.9; I'll try to point out when something is newer than 3.7, but we'll be running in 3.9. [NEP 29](https://numpy.org/neps/nep-0029-deprecation_policy.html) mandates that data science libraries currently support 3.8+ (support dropped 42 months after release), while [general Python EOL is 3.7+](https://endoflife.date/python) (5 year support window).\n", + "Everything (minus pattern matching) will be in Python 3.9; I'll try to point out when something is newer than 3.7, but we'll be running in 3.9. 
[NEP 29](https://numpy.org/neps/nep-0029-deprecation_policy.html) mandates that data science libraries currently support 3.8+ (support dropped 42 months after release), while [general Python EOL is 3.7+](https://endoflife.date/python) (5 year support window).\n", "\n", "Key upcoming dates:\n", "\n", diff --git a/notebooks/1.1 Memory Model.ipynb b/notebooks/1.1 Memory Model.ipynb index 1704437..71ecebf 100644 --- a/notebooks/1.1 Memory Model.ipynb +++ b/notebooks/1.1 Memory Model.ipynb @@ -128,13 +128,33 @@ "outputs": [], "source": [ "def f(x: float) -> float:\n", - " \"I am a square!\"\n", - " return x ** 2\n", - "\n", - "\n", + " \"\"\"I am a square!\"\"\"\n", + " return x ** 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The help of an object includes its signature and its docstring:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ "help(f)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can see a list of methods (or use `` in iPython or the Python REPL, but underscored methods often require you start by typing an underscore first):" + ] + }, { "cell_type": "code", "execution_count": null, @@ -144,6 +164,13 @@ "dir(f)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The inspect module is a built-in module that can provide a lot of other information:" + ] + }, { "cell_type": "code", "execution_count": null, @@ -190,12 +217,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "> Try adding different keyword arguments to `rich.inspect`." + "> Try adding different keyword arguments to `rich.inspect`. Shift-tab in IPython to see options. `methods=True` on the int, for example." 
] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ "## Mutability\n", "\n", @@ -222,7 +251,8 @@ "source": [ "> Add an `f` before the string to see the answer (remove the \"=\" inside the brackets for Python <3.8).\n", "\n", - "Now, let's try a mutable object. Lists, sets, and dicts are the most notable mutable objects in `builtins`." + "`bool`, `int`, `float`, `str`, `bytes`, `tuple`, and `frozenset` are immutable built-ins in Python. Singletons (like `None`, `Ellipsis`, `True`, and `False`) are immutable, too.\n", + "Now, let's try a mutable object. `list`, `set`, `dict`, and generic classes/objects are mutable." ] }, { @@ -243,14 +273,42 @@ "source": [ "Why?\n", "\n", - "The problem was that when the object was immutable, the inplace operator `+=` actually behaved like `x = x + 1`, which is a new object. When it was a mutable object (a list), then it was able to change it in-place." + "The problem was that when the object was immutable, it does not define in-place operations. Inplace operators like `+=` actually fall back to out-of-place operations and assignment, like `x = x + 1`; they create new objects. When it was a mutable object (a list), that does have in-place operations, so it was able to change it in-place." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Advanced aside: Why did this work?\n", + "Here's a quick example, showing the fall-back behavior of inplace operations if `__iadd__` is missing:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class Addable:\n", + " def __init__(self, value: int) -> None:\n", + " self.value = value\n", + "\n", + " # Leaving off the return type to avoid discussing it here\n", + " def __add__(self, number: int):\n", + " return Addable(self.value + number)\n", + "\n", + "\n", + "x = Addable(3)\n", + "y = x\n", + "x += 4\n", + "x is y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Advanced aside: Why did list inplace addition work?\n", "\n", "Quick aside for advanced Pythonistas, this is tricky. `x[0]` returns an int. So why is this any different than before? Let's explore, using mock:" ] @@ -273,7 +331,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can see that this has special support for this syntax, it pulls out the item inside, and sets it, then stores it. There are other special syntax treatments in Python as well, all designed to make the language more friendly and powerful:\n", + "You can see that Python has special support for this syntax, it pulls out the item inside, and sets it, then stores it. There are other special syntax treatments in Python as well, all designed to make the language more friendly and powerful:\n", "\n" ] }, @@ -309,7 +367,7 @@ "x.y = 3 # __setattr__\n", "```\n", "\n", - "That's it. These are _not_ valid assignments:\n", + "That's it. These are _not_ valid assignments in Python (they are valid in the C family, for example):\n", "\n", "```python\n", "x(y) = 1 # There is no assignment for __call__\n", @@ -329,7 +387,7 @@ "source": [ "## Scope\n", "\n", - "Python has scope, but not in very many places. Functions and class definitions create scope, and modules have scope. 
(Generator expressions now have scope too). That's about it. So you can write this:" + "Python has the concept of scope, but not in very many places. Functions and class definitions create scope, and modules have scope. (Generator expressions have scope too in Python 3, though they didn't before). That's about it. So you can write this:" ] }, { @@ -449,7 +507,33 @@ "\n", "**Always pass variables out explicitly, be cautious with using anything not clearly global in a function.**\n", "\n", - "This means you should never see `global`, as it's only needed for setting variables. Global read-only variables (the only safe kind) are sometimes ALL_CAPS. (Hint: for typed code, you can add `Final`)." + "This means you should never see `global`, as it's only needed for setting variables. Global read-only variables (the only safe kind) are sometimes ALL_CAPS. (Hint: for typed code, you can add `Final`).\n", + "\n", + "For example, a better way to write the above function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = 1\n", + "\n", + "\n", + "def f(x: int) -> int:\n", + " return 2\n", + "\n", + "\n", + "x = f(x)\n", + "print(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now it's completely explicit, _at the call site_, that it's going to modify `x`. And, as a bonus, you could even use it on other variables now, not just `x`!" ] }, { diff --git a/notebooks/1.1b Intro to Classes.ipynb b/notebooks/1.1b Intro to Classes.ipynb index dcf8e94..6e122fe 100644 --- a/notebooks/1.1b Intro to Classes.ipynb +++ b/notebooks/1.1b Intro to Classes.ipynb @@ -238,6 +238,50 @@ "source": [ "> Aside: This is not at all how you'd normally call this, but this is useful later when reasoning about how other features work, and for a sneaky trick: _any object_ can be passed in, not just an instance of the class (was not true in Python 2). 
In this case, the object simply needs to have a `.msg` attribute to be used in `Simple.function`. If you are tempted to write a library that provides both free functions and methods to do the same thing, this means they can be one and the same, avoiding duplication for you and learning two APIs for your users." ] + }, + { + "cell_type": "markdown", + "id": "1fc445e7-90c5-4df7-aa05-8e4ff3cde464", + "metadata": {}, + "source": [ + "## Dataclasses\n", + "\n", + "If you don't have much familiarity with classes, you should really consider learning dataclasses pretty early on. This is a simpler, less-boilerplate way to define classes that aligns better with common usage. We will discuss dataclasses in the next section, and we will cover the special syntax bits used later on, but here's an example of the above class as a dataclass:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0595d8ed-4737-419e-95b3-a61432b673c3", + "metadata": {}, + "outputs": [], + "source": [ + "from dataclasses import dataclass\n", + "\n", + "\n", + "@dataclass\n", + "class SimpleDC:\n", + " msg: str\n", + "\n", + " def function(self) -> None:\n", + " print(self.msg)" + ] + }, + { + "cell_type": "markdown", + "id": "81728dc9-a11a-406f-bfe4-27eb6e5bba97", + "metadata": {}, + "source": [ + "The dataclass decorator `@dataclass` transforms the class-level attribute `msg: str` into an `__init__` method and an instance attribute. And it also adds several things we did not add, like a nice `__repr__`, Python 3.10 pattern matching support, support for conversion to other types (like a dict), comparison support (`__eq__`), and more!" + ] + }, + { + "cell_type": "markdown", + "id": "51f6828c-68a9-41de-9ff9-8f5c15421e60", + "metadata": {}, + "source": [ + "Dataclasses were added in 3.7 but backported to 3.6 as a library. You can also use the library that dataclasses were designed from, which is still much more powerful: attrs." 
+ ] } ], "metadata": { diff --git a/notebooks/1.2 Classes.ipynb b/notebooks/1.2 Classes.ipynb index 64bfb92..ad52bcd 100644 --- a/notebooks/1.2 Classes.ipynb +++ b/notebooks/1.2 Classes.ipynb @@ -53,18 +53,32 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, + "source": [ + "## How classes work (Advanced, for self reading)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, "source": [ - "## How classes work (Advanced, for self reading)\n", - "\n", "Classes are basically bags of objects with a few special features. So what separates them from a dict?" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, "source": [ - "### The Type" + "### The type" ] }, { @@ -120,10 +134,21 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, + "source": [ + "### The attribute names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, "source": [ - "### The attribute names\n", - "\n", "There are several different ways to name attributes of your class or instance:\n", "\n", "* `.some_item`: This is a publicly accessible attribute, users are expected to see it / use it. It could be a method, instance member, or a property - properties can be settable or read-only.\n", @@ -133,12 +158,20 @@ "* Extra: `._some_item_`: If you need a \"protocol\", that is, a method that users should implement but should not directly call, then single underscores on both sides is an occasional convention that is not discouraged by Python authors, unlike adding new dunder methods. `_repr_html_` in IPython is an example of this convention in use." 
] }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, + "source": [ + "### MRO (Method Resolution Order)" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### MRO (Method Resolution Order)\n", - "\n", "So, if you have an object `obj`, and you call `obj.something`, where does `something` come from, and is it a function or a variable? Hopefully, you know the answer to that last part of the question: there is no difference between a function and a variable; it's an object, and objects might be callable. The lookup order is called the MRO, or Method Resolution Order. It is:\n", "\n", "1. **The object itself.** If the object has a `\"something\"` in `__slots__` or `__dict__`, then that's what you get.\n", @@ -158,12 +191,20 @@ "F.__mro__" ] }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, + "source": [ + "### Descriptors" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Descriptors\n", - "\n", "When you find a object, what do you actually return? Python has an descriptor system that is called when you access a class member. If the object has a `__get__` method (or `__set__` if you are setting something), then that is called with the instance as the argument and the result is returned. This is how methods work - (all) functions have a `__get__`, so when they are inside a class, they get called with `self` first; they return a \"bound\" method, one that will always include the instance in the call. This is how properties and staticmethod/classmethod work too, they customize `__get__`. There is also a `__delete__`. While it's technically part of class creation, `__set_name__` is also very useful for descriptors. 
" ] }, @@ -221,30 +262,57 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, + "source": [ + "### Special methods are accessed only on the class" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, "source": [ - "### Special methods are accessed only on the class\n", - "\n", "Special methods are slightly special. If you perform an action that calls a special method, _it explicitly calls the class_, not the instance. So for example, `f(3)` calls `F.__call__(f, 3)`, _not_ `f.__call__(3)`. This was mostly an optimization, but it means you can't provide special behavior for an instance of a class that is not shared by the rest of the class. Which is probably good." ] }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, + "source": [ + "### Accessing something dynamic" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Accessing something dynamic\n", - "\n", "Python lets you control this lookup with two special methods. One is `__getattr__`, which gets called if no attribute is found using the normal lookup method described above. You could use this to make a class that has dynamic methods beyond what it normally has. The second method is `__getattribute__`, which is called before anything else is looked up - you can literally do anything here, but at the cost that you have to then implement or revert to normal attribute lookup yourself to make the class usable, and simply getting at things like the data members in the class is a pain, since it will call itself. This should only be used in emergencies!\n", "\n", "For setting, you don't need this split, so there's just `__setattr__`." ] }, + { + "cell_type": "markdown", + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, + "source": [ + "### `__dict__` vs. 
`__slots__`" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### `__dict__` vs. `__slots__`\n", - "\n", "Most user classes are `__dict__` classes, which store a dict on every instance. This is partially because it's what you get by default, and partially because it fits with the philosophy of Python." ] }, @@ -346,8 +414,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Data + functions\n", - "\n", + "### Data + functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "They allow you to bundle data with the functions that run on them. In a language like Python without (much) type overloading, this is important for design (and good for tab completion)." ] }, @@ -363,7 +436,10 @@ " self.y = y\n", "\n", " def mag(self) -> float:\n", - " return (self.x ** 2 + self.y ** 2) ** 0.5" + " return (self.x ** 2 + self.y ** 2) ** 0.5\n", + "\n", + "\n", + "Vector(3, 4)" ] }, { @@ -389,7 +465,10 @@ " y: float\n", "\n", " def mag(self) -> float:\n", - " return (self.x ** 2 + self.y ** 2) ** 0.5" + " return (self.x ** 2 + self.y ** 2) ** 0.5\n", + "\n", + "\n", + "Vector(3, 4)" ] }, { @@ -413,8 +492,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Functors\n", - "\n", + "### Functors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "I just told you it was a bad idea to set something outside your local scope, and often not even a good idea to just view something outside the local scope. So how do you write something that has scope? Use a class as a functor! Compare this:" ] }, @@ -579,8 +663,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### DSL (Domain Specific Language)\n", - "\n", + "### eDSL (embedded Domain Specific Language)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "You can customize almost every behavior of a class to make them very natural for whatever you are doing." 
] }, @@ -591,7 +680,7 @@ "outputs": [], "source": [ "class Path(str):\n", - " def __truediv__(self, other):\n", + " def __truediv__(self, other: str):\n", " return self.__class__(f\"{self}/{other}\")" ] }, @@ -610,7 +699,7 @@ "source": [ "> Just in case you want to make a Path class like the one above - don't, use pathlib instead. We could have written `self.__class__` as `Path`, but then this would not subclass correctly and besides, using the class name inside the class is ugly and makes it harder to rename. If you return a normal string, then you can't keep applying `/`.\n", ">\n", - "> Also, I left off type annotations for this example, as to do them properly I need to use a TypeVar." + "> Also, I left off the return type annotation for this example, as to do them properly I need to use a TypeVar." ] }, { @@ -629,7 +718,7 @@ "outputs": [], "source": [ "class PathMixin:\n", - " def __truediv__(self, other):\n", + " def __truediv__(self, other: str):\n", " return self.__class__(f\"{self}/{other}\")\n", "\n", "\n", @@ -640,6 +729,13 @@ "Path(\".\") / \"myfile\" / \"program.py\"" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Statically typing mixins requires a Protocol, so I've left it off of this example too." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -653,8 +749,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Design considerations\n", - "\n", + "## Design considerations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "Object oriented programming has been known to make it easy to create spaghetti messes of code. The following tips will help you not fall into the trap and end up with poorly designed code. \"Make it a class\", by itself, will not magically make your code better." 
] }, @@ -662,17 +763,27 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Modular design\n", - "\n", - "You should break your code into _concepts_, and classes should help map those concepts to the computer. Different components of a detector might be classes, with an instance for each component. A vector, a URL, a remote data source, etc. You might have a class representing a unit of an analysis, and use either inheritance (okay) or a protocol (better, reduces coupling) to have real data processing vs. simulation generation. Etc." + "### Modular design" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should break your code into _concepts_, and classes should help map those concepts to the computer. Different components of a detector might be classes, with an instance for each component. A vector, a URL, a remote data source, etc. You might have a class representing a unit of an analysis, and use either inheritance (okay) or a Protocol (better, reduces coupling) to have real data processing vs. simulation generation. Etc." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Unit test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Unit test\n", - "\n", "We will mention testing later, and there is another course on it, but I'm focusing on the word _unit_. You need to be able to run your classes standalone, in unit tests, and not only in place. This keeps the design modular - you will resist the desire to make a class that needs a class to make another class inside a class that only works with the file that sits on your work laptop, etc. And you'll be free to redesign parts without having to worry about everything breaking down.\n", "\n", "Always use pytest for unit testing." @@ -682,9 +793,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Inheritance\n", - "\n", - "You should be very cautious with Inheritance. 
It's a very powerful tool and should not be scary, but it's _so_ easy to misuse. These are the problems:\n", + "### Inheritance" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should be very cautious with inheritance. It's a very powerful tool and should not be scary, but it's _so_ easy to misuse. These are the problems:\n", "\n", "* It is often in direct contrast with modular design (you are linking things via inheritance) and hard to truly unit test (one class might not be testable without the other).\n", "* It makes it harder to reason about where a method comes from when reading and debugging. Is it this class? A parent?\n", @@ -696,8 +812,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Functional design tips\n", - "\n", + "### Functional design tips" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "**Classes with multiple states are bad**: If you have classes that init, then call some function like `.initialize()` or `.read_data()`, this is bad design. It's better to do the read in the `__init__` or as a `classmethod`. If you have two distinct states, then you have to design every method to be aware of both possible states! It's much better to either do the init at the beginning, or have two classes, one for each state.\n", "\n", "**Never assign new members outside of __init__**: This is similar to the above point; if you assign a new member after init, then you now have two states; a \"without\" and a \"with\", so now you have to check `hasattr` in every other method, since you can't be sure what order the methods will be called in. Better to at least assign None in `__init__`. Or use `dataclasses`/`attrs`.\n", @@ -718,7 +839,7 @@ "\n", "The only reason the first example could be better is due to memory usage/copying, which usually isn't a problem in Python. 
If it's an error to call `cleanup()` before `init()`, you can enforce that statically via classes in the second example, while the first example will just crash at runtime.\n", "\n", - "**Support static design wherever possible**: We will cover this later, but if you add things that are not statically sound, you probably should redesign. Simply supporting static typing tends to push you toward better design, and away from problems like those listed above.\n" + "**Support static design wherever possible**: We will cover this later, but if you add things that are not statically sound, you probably should redesign. Simply supporting static typing tends to push you toward better design, and away from problems like those listed above." ] } ], diff --git a/notebooks/1.3 Logging.ipynb b/notebooks/1.3 Logging.ipynb index 5bb003e..154088e 100644 --- a/notebooks/1.3 Logging.ipynb +++ b/notebooks/1.3 Logging.ipynb @@ -92,7 +92,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If you need more from your logging, check out [structlog](https://www.structlog.org)!" + "If you need more from your logging, check out [structlog](https://www.structlog.org)! Also [rich](https://rich.readthedocs.io) can print beautiful logs. (And yes, you can combine structlog and rich!)" ] } ], diff --git a/notebooks/2.2 Generators.ipynb b/notebooks/2.2 Generators.ipynb index 26572c8..3561b48 100644 --- a/notebooks/2.2 Generators.ipynb +++ b/notebooks/2.2 Generators.ipynb @@ -82,12 +82,21 @@ " yield 3" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A function that has at least one `yield` in it creates a factory function that returns a generator (iterator)." 
+ ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "range4()" + ] }, { "cell_type": "code", @@ -172,7 +181,7 @@ "outputs": [], "source": [ "r = range(4)\n", - "list(r), list(r)" + "print(list(r), list(r))" ] }, { @@ -230,7 +239,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You might be tempted to place a loop inside the generator with a yield (`for item in middle_two(): yield item`), but `yield from` is simpler and also works correctly with generators." + "You might be tempted to place a loop inside the generator with a yield (`for item in middle_two(): yield item`), but `yield from` is simpler and also works correctly with generators (next section)." ] }, { diff --git a/notebooks/2.4 Context Managers.ipynb b/notebooks/2.4 Context Managers.ipynb index ab5c208..4729ae0 100644 --- a/notebooks/2.4 Context Managers.ipynb +++ b/notebooks/2.4 Context Managers.ipynb @@ -8,7 +8,7 @@ "\n", "Yes! Our journey is complete, we are where I wanted to be. Context managers are one of my favorites, and a little underused, especially in user code, when they are really easy to both write and use (while decorators, for comparison, are really easy to use but a bit tricky to write). A context manager has a specific purpose.\n", "\n", - "The first is what I call \"action at a distance\". It lets you schedule an action for later that is sure to always happen (unless you get a segfault, or exit a really nasty way). This is likely the most famous context manager:\n", + "A context manager is used for what I call \"action at a distance\". It lets you schedule an action for later that is sure to always happen (unless you get a segfault, or exit a really nasty way). This is likely the most famous context manager:\n", "\n", "```python\n", "with open(...) 
as f:\n", @@ -24,17 +24,25 @@ "metadata": {}, "outputs": [], "source": [ - "import contextlib\n", - "\n", + "import contextlib" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ "with contextlib.suppress(ZeroDivisionError):\n", - " 1 / 0" + " 1 / 0\n", + " print(\"This is never reached\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "But the real star of contextlib is `contextmanager`, which is a decorator that makes writing context managers really easy. Let's try one of my favorites, a timer context manager:" + "But the real star of contextlib is `contextmanager`, which is a decorator that makes writing context managers really easy. You use \"yield\" to break the before and after code. Let's try one of my favorites, a timer context manager:" ] }, { @@ -72,7 +80,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As an extra bonus, `contextmanager` uses `ContextDecorator`, so the objects it makes can also be used as Decorators! [Pretty much everything](https://docs.python.org/3/library/contextlib.html) in the `contextlib` module that does not have the word `async` in it is worth learning. `contextlib.closing` turns an object with a `.close()` into a context manager, and `contextlib.ExitStack` lets you nest context managers without eating up massive amounts of whitespace." + "As an extra bonus, `contextmanager` uses `ContextDecorator`, so the objects it makes can also be used as Decorators!" ] }, { @@ -98,6 +106,13 @@ "**Just a quick word on this**: if you are coming from a language like JavaScript or Ruby, you might be thinking these look like blocks/Procs/lambdas. They are not; they are unscoped, and you cannot access the code inside the with block from the context manager (unlike a decorator, too). So you cannot create a \"run this twice\" context manager, for example. They are only for action-at-a-distance." 
] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Pretty much everything](https://docs.python.org/3/library/contextlib.html) in the `contextlib` module that does not have the word `async` in it is worth learning. `contextlib.closing` turns an object with a `.close()` into a context manager, and `contextlib.ExitStack` lets you nest context managers without eating up massive amounts of whitespace." + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/notebooks/2.5 Static Typing.ipynb b/notebooks/2.5 Static Typing.ipynb index e6ee86d..f62f651 100644 --- a/notebooks/2.5 Static Typing.ipynb +++ b/notebooks/2.5 Static Typing.ipynb @@ -56,7 +56,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "No. It does *nothing* at runtime, except store the object. And in the upcoming Python 3.11 (or 3.7+ with `from __future__ import annotations`), it doesn't even store the actual object, just the string you type here, so then anything that can pass the Python parser is allowed here.\n", + "No. It does *nothing* at runtime, except store the object. And in the upcoming Python 3.11 or 3.12 (or 3.7+ with `from __future__ import annotations`), it doesn't even store the actual object, just the string you type here, so then anything that can pass the Python parser is allowed here.\n", "\n", "It is not useless though! For one, it helps the reader. 
Knowing the types expected really gives you a much better idea of what is going on and what you can do and can't do.\n", "\n", @@ -104,9 +104,10 @@ "outputs": [], "source": [ "%%save_and_run mypy\n", - "from typing import Optional\n", + "from __future__ import annotations # Python 3.7+\n", "\n", - "def f(x: Optional[int]) -> Optional[int]:\n", + "\n", + "def f(x: int | None) -> int | None:\n", " return x * 5\n", "\n", "f(4)" @@ -195,6 +196,7 @@ "%%save_and_run mypy --strict\n", "from typing import Union, List\n", "\n", + "\n", "# Generic types take bracket arguments\n", "def f(x: int) -> List[int]:\n", " return list(range(x))\n", @@ -224,9 +226,9 @@ "outputs": [], "source": [ "%%save_and_run mypy --strict\n", - "\n", "from __future__ import annotations\n", "\n", + "\n", "def f(x: int) -> list[int]:\n", " return list(range(x))\n", "\n", @@ -278,6 +280,13 @@ " ..." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Yes, the `...` is actually part of the code here; it's conventional to use it instead of `pass` for typing." + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/notebooks/2.6 Using Packages.ipynb b/notebooks/2.6 Using Packages.ipynb index 481b746..ead1375 100644 --- a/notebooks/2.6 Using Packages.ipynb +++ b/notebooks/2.6 Using Packages.ipynb @@ -52,8 +52,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Safe libraries\n", - "\n", + "### Safe libraries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "There are likely a _few_ libraries (possibly one) that you just have to install globally. Go ahead, but be careful (and always use your system package manager instead if you can, like [`brew` on macOS](https://brew.sh) or the Windows ones - Linux package managers tend to be too old to use for Python libraries).\n", "\n", "Ideas for safe libraries: the other libraries you see listed in this lesson! It's likely better than bootstrapping them. 
In fact, you can get away with just one:" @@ -63,8 +68,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### pipx: pip for executables!\n", - "\n", + "### pipx: pip for executables!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "If you are installing an \"application\", that is, it has a script end-point and you don't expect to import it, *do not use pip*; use [pipx](https://pypa.github.io/pipx/). It will isolate it in a virtual environment, but hide all that for you, and then you'll just have an application you can use with no global/user side effects!\n", "\n", "```bash\n", @@ -81,8 +91,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Directly running applications\n", - "\n", + "#### Directly running applications" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "Pipx also has a very powerful feature: you can install and run an application in a temporary environment!\n", "\n", "For example, this works just as well as the second two lines above:\n", @@ -99,12 +114,12 @@ "pipx run build\n", "```\n", "\n", - "> This is great for CI! Pipx is installed by default in GitHub Actions; you do not need `actions/setup-python` to run it.\n", + "> This is great for CI! Pipx is installed by default in GitHub Actions (GHA); you do not need `actions/setup-python` to run it.\n", "\n", - "If the command and the package have different names, then you have to write this with a `--spec`. Currently, you also have to use spec to pin a version.\n", + "If the command and the package have different names, then you have to write this with a `--spec`. 
~~Currently, you also have to use spec to pin a version.~~ (Fixed in pipx 0.17+, not yet out on GHA though)\n", "\n", "```bash\n", - "pipx run --spec cibuildwheel==1.12.0 cibuildwheel --platform linux\n", + "pipx run --spec cibuildwheel==2.3.1 cibuildwheel --platform linux\n", "```" ] }, @@ -112,8 +127,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Environment tools\n", - "\n", + "### Environment tools" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "There are other tools we are about to talk about, like `virtualenv`, `poetry`, `pipenv`, `nox`, `tox`, etc. that you could also install with `pip` (or better yet, with `pipx`), and are _not too_ likely to interfere or break down if you use `pip`. But keep it to a minimum or use `pipx`." ] }, @@ -121,12 +141,27 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Nox and Tox\n", - "\n", - "You can also use a task runner tool like `nox` or `tox`. These create and manage virtual environment for each task (called sessions in `nox`). This is a very simple way to avoid making and entering an environment, and is great for less common tasks, like scripts and docs.\n", - "\n", - "### Python launcher\n", - "\n", + "### Nox and Tox" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also use a task runner tool like `nox` or `tox`. These create and manage virtual environment for each task (called sessions in `nox`). This is a very simple way to avoid making and entering an environment, and is great for less common tasks, like scripts and docs." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Python launcher" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "The Python launcher for Unix (a Rust port of the one bundled with Python on Windows by a Python core developer) supports virtual environments in a `.venv` folder. 
So if you make a virtual environment with `python -m venv .venv` or `virtualenv .venv`, then you can just run `py ` instead of `python ` and it uses the virtual environment for you. This feature has not been back-ported to the Windows version yet." ] }, @@ -134,8 +169,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Environments\n", - "\n", + "## Environments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "There are several environment systems available for Python, and they generally come in two categories. The Python Packaging Authority supports PyPI (Python Package Index), and all the systems except one build on this (usually by pip somewhere). The lone exception is Conda, which has a completely separate set of packages (often but not always with matching names)." ] }, @@ -143,8 +183,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Environment specification\n", - "\n", + "### Environment specification" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "All systems have an environment specification, something like this:\n", "\n", "```\n", @@ -169,8 +214,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Locking an environment\n", - "\n", + "### Locking an environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "But now you want to share your environment with someone else. But let's say `rich` updated and now something doesn't work. You have a working environment (until you update), but your friend does not, theirs installed broken (this just happened to me with `IPython` and `jedi`, by the way). How do you recover a working version without going back to your computer? With a lock file! 
This would look something like this:\n", "\n", "```\n", @@ -187,8 +237,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Dev environments or Extras\n", - "\n", + "### Dev environments or Extras" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "Some environment tools have the idea of a \"dev\" environment, or optional components to the environment that you can ask for. Look for them wherever fine environments are made.\n", "\n", "When you install a package via pip or any of the (non-locked) methods, you can also ask for \"extras\", though you have to know about them beforehand. For example, `pip install rich[jupyter]` will add some extra requirements for interacting with notebooks. *These add requirements only*, you can't change the package with an extra." @@ -198,8 +253,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Conda environments\n", - "\n", + "### Conda environments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "If you use Conda, the environment file is called `environment.yaml`. The one we are using can be seen here:" ] }, diff --git a/notebooks/2.7 Creating Packages.ipynb b/notebooks/2.7 Creating Packages.ipynb index 0e0d4b9..8cc7de7 100644 --- a/notebooks/2.7 Creating Packages.ipynb +++ b/notebooks/2.7 Creating Packages.ipynb @@ -54,13 +54,25 @@ }, { "cell_type": "markdown", - "id": "457cb2bc-7eea-4cb2-bebf-ab68ee30a8e5", + "id": "82a23486-3fa4-4a9e-ae6f-b0254b67e551", + "metadata": {}, + "source": [ + "## Distributions" + ] + }, + { + "cell_type": "markdown", + "id": "d0fe488c-496e-4ad9-8957-15fc7ccc7138", + "metadata": {}, + "source": [ + "### Wheel: fast and simple" + ] + }, + { + "cell_type": "markdown", + "id": "0369ba7a-19aa-449c-9d1c-6112016bc400", "metadata": {}, "source": [ - "## Distributions\n", - "\n", - "### Wheel: fast and simple\n", - "\n", "A wheel is just a normal zipped file with the extension `.whl`. 
It contains folders that get copied to specific locations, and a metadata folder.\n", "\n", "It _does not_ contain `setup.py`/`setup.cfg`/`pyproject.toml`.\n", @@ -79,21 +91,34 @@ }, { "cell_type": "markdown", - "id": "dfc5e3cf-1083-439f-8f32-74465b49745f", + "id": "1c476e05-fb9c-491d-8c1b-8f0439087c50", + "metadata": {}, + "source": [ + "### SDist: Source distribution" + ] + }, + { + "cell_type": "markdown", + "id": "df3a7ace-7fd4-4da6-9b2a-10e9be02671b", "metadata": {}, "source": [ - "### SDist: Source distribution\n", - "\n", "This is a `.tar.gz` file holding the files needed to make a wheel. It is often a subset of the files in the GitHub repo, though sometimes it contains generated files, like `version.py` or maybe Cython/SWIG generated source files. If there is no matching wheel (only for projects with binary components, in general), then pip gets the SDist and builds/installs manually." ] }, { "cell_type": "markdown", - "id": "5db28d41-3187-47b8-8970-874ff8b95934", + "id": "735fd27b-1e24-4937-82c4-27de73870663", "metadata": {}, "source": [ - "## PDM/Flit/Poetry: A breath of fresh air\n", - "\n", + "## PDM/Flit/Poetry: A breath of fresh air" + ] + }, + { + "cell_type": "markdown", + "id": "7abb1008-2ec5-4631-aef0-455e3d700cc5", + "metadata": {}, + "source": [ + "See for a complete setup!\n", "\n", "Let's look at an all-in-one solution: PDM. It is a bit younger than Poetry, the current leader of all-in-one solutions, but it follows standards much better. There are some caveats:\n", "\n", @@ -195,11 +220,17 @@ }, { "cell_type": "markdown", - "id": "92b88e4b-3fab-4b4d-9177-c494f3a19d49", + "id": "a0f2ff90-48f9-400e-b04e-0a0a9c478153", + "metadata": {}, + "source": [ + "## Setuptools: Classic, powerful, verbose" + ] + }, + { + "cell_type": "markdown", + "id": "197d8a4c-0e61-42b1-bbaa-ce4c2107dd74", "metadata": {}, "source": [ - "## Setuptools: Classic, powerful, verbose\n", - "\n", "The most powerful (and originally, forced by pip) tool is setuptools. 
This is a collection of hacks built on top of distutils, which is a collection of hacks to build packages (which was the standard library tool that is now deprecated and may be removed in Python 3.12). There are some awful examples around on using it, so look at for a proper example.\n", "\n", "The short version:\n", @@ -217,11 +248,17 @@ }, { "cell_type": "markdown", - "id": "02e8423a-e33d-4323-8eda-72aa09efa6f8", + "id": "4fcb46d9-c26c-4492-b2d2-a0315a375c0f", + "metadata": {}, + "source": [ + "## Flit: Lightweight, simple" + ] + }, + { + "cell_type": "markdown", + "id": "bb0f745b-e8f0-497c-b57f-7d0aaf758014", "metadata": {}, "source": [ - "## Flit: Lightweight, simple\n", - "\n", "Flit is great for simple projects that don't need all the bells and whistles. Ironically, it's currently more stable than setuptools is or will be till Python 3.12, since setuptools is fighting through the distutils deprecation process. The PyPA is likely to start moving some core packages to using Flit. Short guide for Flit:\n", "\n", "* Consider using the flit command line tool for a streamlined experience (though you don't need to, and I don't)\n", diff --git a/notebooks/3.6 Code Quality and CI.ipynb b/notebooks/3.6 Code Quality and CI.ipynb index 6e1c018..a113491 100644 --- a/notebooks/3.6 Code Quality and CI.ipynb +++ b/notebooks/3.6 Code Quality and CI.ipynb @@ -11,8 +11,20 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Pre-commit\n", - "\n", + "For more information, please see [Scikit-HEP/developer](https://scikit-hep.org/developer), which covers this in much more detail!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pre-commit" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "One of my favorite tools is [pre-commit](https://pre-commit.com). It allows you to drive almost any \"fixer\" or \"linter\" available, all from one place. 
It handles environments and caching and even updates for you.\n", "\n", "To configure, add a `.pre-commit-config.yaml` file like this:\n", @@ -41,8 +53,13 @@ " - id: black\n", "```\n", "\n", - "The file has a list of repos (local checks can be written too). Each repo contains pre-commit hooks that you can run and configure. You should put modifying \"fixer\" checks before the \"linter\" checks, just in case they fix something that then gets linted.\n", - "\n", + "The file has a list of repos (local checks can be written too). Each repo contains pre-commit hooks that you can run and configure. You should put modifying \"fixer\" checks before the \"linter\" checks, just in case they fix something that then gets linted." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "You can install pre-commit from `brew` (macOS), or via `pipx`/`pip` for anything with Python.\n", "\n", "You can then run it like this:\n", @@ -76,8 +93,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## CI: GitHub Actions\n", - "\n", + "## CI: GitHub Actions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "One of the most important aspects of good code is Continuous Integration (CI); every change should be tested and ideally not allowed to be merged unless it passes. If you support multiple versions of Python or OSs, you should test on each of them. CI lets you do this, and other contributors get the benefit too; if you have good tests you can feel comfortable about making and accepting changes.\n", "\n", "There are many services, but the most popular and possibly one of the best designed ones is GitHub Actions. It is really easy to set up, doesn't require extra permissions or accounts, runs 10(!) parallel jobs, supports all three OSs, often with the same code, and is highly modular. 
This is what a simple job would look like:\n", @@ -111,8 +133,13 @@ " run: pytest\n", "```\n", "\n", - "That's it, 6 jobs run and test your code!\n", - "\n", + "That's it, 6 jobs run and test your code!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "* [Official docs](https://docs.github.com/en/actions/guides/building-and-testing-python) are good\n", "* [Scikit-HEP/developer](https://scikit-hep.org/developer/gha_basic) has some good help, too!\n", "\n",