mlabs-haskell/styleguide

Use it!

In your flake, add

{
  inputs.styleguide.url = "github:mlabs-haskell/style-guide";
  inputs.flake-utils.url = "github:numtide/flake-utils";

  outputs = inputs @ { self, ... }: inputs.flake-utils.lib.eachDefaultSystem (system: {
    # ... or your preferred way to handle ${system}
    checks.format = inputs.styleguide.lib.${system}.mkCheck self;
    formatter = inputs.styleguide.lib.${system}.mkFormatter self;
  });
}

Run nix fmt to format your code. Build checks.${system}.format in CI to check formatting.

Introduction

This document describes a set of standards for code. It also explains our reasoning for these choices, and acts as a living document of our practices for current and future contributors to the project. We intend for this document to evolve as our needs change, as well as act as a single point of truth for standards.

Motivation

The desired outcomes from the prescriptions in this document are as follows.

Increase consistency

Inconsistency is worse than any standard, as it requires us to track a large amount of case-specific information. Software development is already a difficult task due to the inherent complexities of the problems we seek to solve, as well as the inherent complexities foisted upon us by decades of bad historical choices we have no control over. For newcomers to a project and old hands alike, increased inconsistency translates to developmental friction, resulting in wasted time, frustration and ultimately, worse outcomes for the code in question.

To avoid ending up in this situation, both now and in the future, we must strive to be automatically consistent. Similar things should look similar; different things should look different; as much as possible, we must pick some rules and stick to them; and this has to be clear, explicit and well-motivated. This will ultimately benefit us, in both the short and the long term. The standards described here, as well as this document itself, are written with this foremost in mind.

Limit non-local information

There is a limited amount of space in a developer's skull; we all have bad days, and we forget things or make decisions that, in hindsight, were not ideal. Therefore, limiting cognitive load is good for us, as it reduces the amount of trouble we can inflict due to said skull limitations. One of the worst contributors to cognitive load (after inconsistency) is non-local information: the requirement to have some understanding beyond the scope of the current unit of work. That unit of work can be a data type, a module, or even a whole project; in all cases, the more non-local information we require ourselves to hold in our minds, the less space that leaves for actually doing the task at hand, and the more errors we will introduce as a consequence.

Thus, we must limit the need for non-local information at all possible levels. 'Magic' of any sort must be avoided; as much locality as possible must be present everywhere; needless duplication of effort or result must be avoided. To this end, our work must be broken down into discrete, minimal, logical units, which can be analyzed, worked on, reviewed and tested in as much isolation as possible. This also applies to our external dependencies.

Many of the decisions described here are therefore oriented around limiting the amount of non-local knowledge required at all levels of the codebase. Additionally, we aim to avoid doing things 'just because we can' in a way that would be difficult for other Haskellers to follow, regardless of skill level.

Minimize impact of legacy

Haskell is a language that is older than some of the people currently writing it, and its ecosystem has aged along with it. With age comes legacy, and much of it is based on historical decisions which we now know to be problematic or wrong. We can't avoid our history, but we can minimize its impact on our current work.

Thus, we aim to codify good practices in this document as we understand them today. We also try to avoid obvious 'sharp edges' by proscribing them away in a principled, consistent and justifiable manner.

Automate away drudgery

As developers, we should use our tools to make ourselves as productive as possible. There is no reason for us to do a task if a machine could do it for us, especially when this task is something boring or repetitive. We love Haskell as a language not least for its capability to abstract, to describe, and to make fun what other languages make dull or impossible; likewise, our work must do the same.

Many of the tool-related proscriptions and requirements in this document are driven by a desire to remove boring, repetitive tasks that don't need a human to perform. By removing the need for us to think about such things, we can focus on those things which do need a human; thus, we get more done, quicker.

Conventions

The words MUST, SHOULD, MUST NOT, SHOULD NOT and MAY are defined as per RFC 2119.

Tools

Compiler warning settings

The following warnings MUST be enabled for all builds of any project, or any project component, in the ghc-options of the Cabal file:

  • -Wall
  • -Wcompat
  • -Wincomplete-uni-patterns
  • -Wincomplete-record-updates
  • -Wmissing-export-lists
  • -Wmissing-deriving-strategies
  • -Werror

Additionally, -Wredundant-constraints SHOULD be enabled for all builds of any project, in the ghc-options of the Cabal file. Exceptions are allowed when the additional constraints are designed to ensure safety, rather than to make a type class method available. If this warning is to be disabled, it MUST be disabled in the narrowest possible scope; ideally, this SHOULD be a single module.

Justification

Most of these options are suggested by Alexis King; the justifications for them can be found at the link. These fit well with our motivations, and thus should be used everywhere. -Werror ensures that warnings cannot be ignored, which means that problems get fixed sooner. We also add -Wmissing-export-lists and -Wmissing-deriving-strategies: the first ensures that we clearly indicate what is, and isn't, part of a module's public API, and the second ensures that we have clarity about how everything is derived. As we mandate both export lists and deriving strategies in this document, these warnings check compliance automatically.

The permissible exception stems from how redundant constraints are detected by GHC; basically, unless a type class method from a constraint is used within the body of a definition, that constraint is deemed redundant. This is mostly accurate, but some type-level safety constraints can be deemed redundant as a result of this approach. In this case, a limited disabling (per module, ideally) of -Wredundant-constraints is acceptable, as it represents a workaround to a technical problem, not an issue with the warning itself.
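A minimal sketch of the exception, assuming a hypothetical method-less marker class (Validated) and a hypothetical Amount type: the constraint on submit exists only to restrict which types may be passed in, so GHC's usage-based check may report it as redundant. The warning is switched off for this single module with an OPTIONS_GHC pragma rather than project-wide.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# OPTIONS_GHC -Wno-redundant-constraints #-}

-- Hypothetical marker class: it has no methods and only records which types
-- have been checked as safe to submit.
class Validated a

newtype Amount = Amount Integer
  deriving stock (Show, Eq)

instance Validated Amount

-- The body never uses a Validated method, so GHC would flag the constraint as
-- redundant, even though it is what stops unvalidated types being submitted.
submit :: Validated a => a -> a
submit = id
```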

Linting

Every source file MUST be free of warnings as produced by HLint, using the settings described in .hlint.yaml. A copy of such a file is provided in this repository.

Justification

HLint automates away the detection of many common sources of boilerplate and inefficiency. It also describes many useful refactors, which in many cases make the code easier to read and understand. As this is fully automatic, it saves effort on our part, and ensures consistency across the codebase without us having to think about it.

Code formatting

Every source file MUST be formatted according to Fourmolu, with the following settings (as per its settings file):

  • indentation: 2
  • comma-style: leading
  • record-brace-space: true
  • indent-wheres: true
  • diff-friendly-import-export: true
  • respectful: true
  • haddock-style: multi-line
  • newlines-between-decls: 1

A copy of a configuration file with these settings is provided in this repository.

Each source code line MUST be at most 80 characters wide.

Justification

Consistency is the most important goal of readable codebases. Having a single standard, automatically enforced, means that we can be sure that everything will look similar, and not have to spend time or mind-space ensuring that our code complies. It also helps with git diffs, as it 'spreads around' the differences less.

Lines wider than 80 characters become difficult to read, especially when viewed on a split screen. It is also a long-standing convention, not just in Haskell. Lastly, very long lines tend to indicate that we need better naming or refactoring.

Code practices

Naming

camelCase MUST be used for all non-type, non-data-constructor names; otherwise, TitleCase MUST be used. Acronyms used as part of an identifier (such as 'JSON', 'API', etc.) SHOULD be downcased; thus repairJson and fromHttpService are correct. Exceptions are allowed for names defined by external libraries (for example, Aeson's parseJSON).
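A small sketch of these rules. The names repairJson and fromHttpService come from the text above; their bodies, and the HttpService record, are hypothetical and only there so the example compiles.

```haskell
{-# LANGUAGE DerivingStrategies #-}

-- Type and data constructor names use TitleCase; the acronym is written
-- "Http", not "HTTP".
data HttpService = HttpService
  { serviceHost :: String
  , servicePort :: Int
  }
  deriving stock (Show, Eq)

-- Function and field names use camelCase, again with downcased acronyms.
-- Hypothetical body: renders a service as "host:port".
fromHttpService :: HttpService -> String
fromHttpService service =
  serviceHost service ++ ":" ++ show (servicePort service)

-- Hypothetical body: strips newlines from a JSON string.
repairJson :: String -> String
repairJson = filter (/= '\n')
```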

Justification

camelCase for non-type, non-data-constructor names is a long-standing convention in Haskell (in fact, HLint checks for it); TitleCase for type names or data constructors is mandatory. Obeying such conventions reduces cognitive load, as it is common practice among the entire Haskell ecosystem. There is no particular standard regarding acronym casing: examples of always upcasing exist (Aeson) as well as examples of downcasing (http-api-data). However, for consistency (as far as is possible), one choice should be made.

Modules

Imports

All modules MUST use the following conventions for imports:

  • import Foo (Baz (Quux, quux), Bar, frob)
  • import qualified Bar.Foo as Foo

If ImportQualifiedPost is enabled, the following form MAY also be used:

  • import Bar.Foo qualified as Foo

Some specific example cases follow. Type class methods SHOULD be imported alongside their class:

import Control.Applicative (Alternative ((<|>)))

An exception is given when only the method is required:

import Control.Applicative (empty)

Record fields MUST be imported alongside their record:

import Data.Monoid (Endo (appEndo))

Data types from modules imported qualified SHOULD be imported unqualified by themselves:

import Data.Vector (Vector)
import qualified Data.Vector as Vector
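A runnable sketch of the same pattern, using Data.Map.Strict from containers (which ships with GHC) in place of vector so it needs no extra packages; wordCounts is a hypothetical helper. The type name appears unqualified in the signature, while the functions operating on it carry the Map prefix.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Counts occurrences of each word; duplicate keys are combined with (+).
wordCounts :: [String] -> Map String Int
wordCounts = Map.fromListWith (+) . map (\word -> (word, 1))
```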

An exception is given if such an import would cause a name clash:

-- no way to import both of these without clashing on the Vector type name
import qualified Data.Vector as Basic
import qualified Data.Vector.Storable as Storable

-- We now use Basic.Vector to refer to the Vector in Data.Vector, and
-- Storable.Vector otherwise.
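The same situation arises with the strict and lazy Text modules from the text package (also shipped with GHC), which both export a type named Text. A minimal sketch, where shout is a hypothetical function:

```haskell
import qualified Data.Text as Strict
import qualified Data.Text.Lazy as Lazy

-- Both modules export a type called Text, so each is referred to through
-- its qualifier: Strict.Text and Lazy.Text.
shout :: Lazy.Text -> Strict.Text
shout = Strict.toUpper . Lazy.toStrict
```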

We also permit an exception to use a 'hiding import' to replace part of the
Prelude:

```haskell
-- replace the String-based readFile with a Text-based one
import Prelude hiding (readFile)
import Data.Text.IO (readFile)
```

Data constructors MUST be imported individually. For example, given the following data type declaration:

module Quux (Foo (Bar, Baz)) where

data Foo = Bar Int | Baz

Its corresponding import should be:

import Quux (Foo (Bar, Baz))

Qualified imports SHOULD use the last component of their hierarchical module name as the prefix. For example:

import qualified Data.Vector as Vector

Exceptions are granted when:

  • The import would cause a name clash anyway (such as different vector modules); or
  • We have to import a data type qualified as well.

Qualified imports of multiple modules MUST NOT be imported under the same name. Thus, the following is wrong:

-- Do not do this!
import qualified Foo.Bar as Baz
import qualified Foo.Quux as Baz

Justification

One of the biggest challenges for modules which depend on other modules (especially ones that come from the project, rather than an external library) is knowing where a given identifier's definition can be found. Having explicit imports of the form described helps make this search as straightforward as possible. This also limits cognitive load when examining the sources (if we don't import something, we don't need to care about it in general). Lastly, being explicit avoids stealing too many useful names.

In general, type names occur far more often in code than function calls: we have to use a type name every time we write a type signature, but it's unlikely we use only one function that operates on said type. Thus, we want to reduce the amount of extra noise needed to write a type name if possible. Additionally, name clashes from function names are far more likely than name clashes from type names: consider the number of types on which a size function makes sense. Thus, importing type names unqualified, even if the rest of the module is qualified, is good practice, and saves on a lot of prefixing.

Exports

All modules MUST have explicit export lists; that is, every module must state what exactly it exports. Export lists SHOULD be separated using Haddock headings:

module Foo.Bar (
  -- * Types
  Baz,
  Quux (Quux),
  -- * Construction
  mkBaz,
  quuxFromBaz,
  -- etc
  ) where

An exception is granted when the module provides few exported identifiers, or if the module doesn't have a large variety of functionality. In the specific case of modules that exist only to provide instances (for compatibility, for example), the export list MUST be empty.

Exports of data constructors or fields SHOULD be explicit:

-- This is ideal
module Foo.Bar (
  Baz(Baz, quux, frob)
  ) where

An exception is granted if the number of fields or constructors is large; then, wildcard exports MAY be used:

-- This is fine if Baz has a lot of constructors or fields
module Foo.Bar (
  Baz(..)
  ) where

Justification

Explicit export lists are an immediate, clear and obvious indication of what publicly visible interface a module provides. They give us stability guarantees (namely, we know we can change things that aren't exported and not break downstream code at compile time), and tell us where to go looking first when inspecting or learning the module. Additionally, they mean there is less chance that implementation details 'leak' out of the module due to errors on the part of developers, especially new developers.

Allowing wildcard exports, while disallowing wildcard imports, is justified on the grounds of information locality. Seeing a wildcard import of all of a type's data constructors or fields doesn't necessarily indicate the usages of said data constructors or fields without looking up the module from where they're exported; having this import be explicit reduces how much searching we have to do. However, if we are reading an export list, we have the type definition in the same file we're already looking at, making it fairly easy to check.

Plutus module import naming conventions

In addition to the general module import rules, we follow some conventions on how we import the Plutus API modules, allowing for some flexibility depending on the needs of a particular module.

Modules under the names Plutus, Ledger and Plutus.V1.Ledger SHOULD be imported qualified with their module name, as per the general module standards. An exception to this is Plutus.V1.Ledger.Api, where the Ledger name is preferred.

Some other exceptions to this are allowed where it may be more convenient to avoid longer qualified names.

For example:

import Plutus.V1.Ledger.Slot qualified as Slot
import Plutus.V1.Ledger.Tx qualified as Tx
import Plutus.V1.Ledger.Api qualified as Ledger
import Ledger.Oracle qualified as Oracle
import Plutus.Contract qualified as Contract

In some cases it may be justified to use a shortened module name:

import Plutus.V1.Ledger.AddressMap qualified as AddrMap

Modules under PlutusTx that are extensions to PlutusTx.Prelude MAY be imported unqualified when it is reasonable to do so.

The Plutus.V1.Ledger.Api module SHOULD be avoided in favour of more specific modules where possible. For example, we should avoid:

import Plutus.V1.Ledger.Api qualified as Ledger

In favour of:

import Plutus.V1.Ledger.Scripts qualified as Scripts

Justification

The Plutus API modules can be confusing, with numerous modules involved, many exporting the same items. Consistent qualified names help ease this problem, and decrease ambiguity about where imported items come from.

LANGUAGE pragmata

The following pragmata MUST be enabled at project level (that is, in the Cabal file):

  • BangPatterns
  • BinaryLiterals
  • ConstraintKinds
  • DataKinds
  • DeriveFunctor
  • DeriveGeneric
  • DeriveTraversable
  • DerivingStrategies
  • DerivingVia
  • DuplicateRecordFields
  • EmptyCase
  • FlexibleContexts
  • FlexibleInstances
  • GADTs
  • GeneralizedNewtypeDeriving
  • HexFloatLiterals
  • InstanceSigs
  • ImportQualifiedPost
  • KindSignatures
  • LambdaCase
  • MultiParamTypeClasses
  • NoImplicitPrelude
  • NumericUnderscores
  • OverloadedStrings
  • ScopedTypeVariables
  • StandaloneDeriving
  • TupleSections
  • TypeApplications
  • TypeOperators
  • TypeSynonymInstances
  • UndecidableInstances

Any other LANGUAGE pragmata MUST be enabled per-file. All language pragmata MUST be at the top of the source file, written as {-# LANGUAGE PragmaName #-}.

Furthermore, the following pragmata MUST NOT be used, or enabled, anywhere:

  • DeriveDataTypeable
  • DeriveFoldable
  • PartialTypeSignatures
  • PostfixOperators

Justification

DataKinds, DuplicateRecordFields, GADTs, TypeApplications, TypeSynonymInstances and UndecidableInstances are needed globally to use the GHC plugin from record-dot-preprocessor. While some of these extensions are undesirable to use globally, we end up needing them anyway, so we can't really avoid this.

BangPatterns are a much more convenient way to force evaluation than repeatedly using seq. Furthermore, they're not confusing, and are considered ubiquitous enough to be included in GHC2021. Having them on by default simplifies a lot of performance tuning work, and they don't really need signposting.
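A minimal sketch of the difference, with hypothetical function names: both loops keep a strict running total, but the bang-pattern version states the intent in the pattern itself instead of threading seq through the body. The pragma is enabled per file here only to keep the example self-contained.

```haskell
{-# LANGUAGE BangPatterns #-}

-- The bang forces the accumulator on every step, so no chain of
-- unevaluated (+) thunks builds up.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc [] = acc
    go !acc (x : xs) = go (acc + x) xs

-- The same loop written with seq, for comparison.
sumSeq :: [Int] -> Int
sumSeq = go 0
  where
    go acc [] = acc
    go acc (x : xs) = let acc' = acc + x in acc' `seq` go acc' xs
```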

BinaryLiterals, HexFloatLiterals and NumericUnderscores all simulate features that are found in many other programming languages, and that are extremely convenient in a range of settings, from dealing with large numbers to bit-twiddling. If anything, it is more surprising and annoying when these aren't enabled, and they should really be part of Haskell syntax anyway. Enabling them project-wide actually encourages better practice and readability.
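A short sketch of the three literal extensions together; the constant names are illustrative only (1,000,000 lovelace to the ada is the standard Cardano denomination).

```haskell
{-# LANGUAGE BinaryLiterals #-}
{-# LANGUAGE HexFloatLiterals #-}
{-# LANGUAGE NumericUnderscores #-}

-- NumericUnderscores: digit separators make large constants readable.
lovelacePerAda :: Integer
lovelacePerAda = 1_000_000

-- BinaryLiterals (with NumericUnderscores): a bit mask written as bits.
lowNibbleMask :: Int
lowNibbleMask = 0b0000_1111

-- HexFloatLiterals: 0x1.8p1 is 1.5 * 2^1, which is exactly 3.0.
three :: Double
three = 0x1.8p1
```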

The kind Constraint is not in Haskell2010, and thus, isn't recognized by default. While working with constraints as first-class objects isn't needed often, this extension effectively exists because Haskell2010 lacks exotic kinds altogether. Since we require explicit kind signatures (and foralls) for all type variables, this needs to be enabled as well. There is no harm in enabling this globally, as other rich kinds (such as Symbol or Nat) don't require an extension for their use, and this doesn't change any behaviour (Constraint exists whether you enable this extension or not, as do 'exotic kinds' in general).
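A minimal sketch of what ConstraintKinds permits: a type synonym for a tuple of constraints, here with an explicit kind signature on its parameter. ShowOrd and describeMax are hypothetical names.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

-- A constraint synonym: ShowOrd a stands for (Show a, Ord a).
type ShowOrd (a :: Type) = (Show a, Ord a)

describeMax :: ShowOrd a => [a] -> String
describeMax [] = "empty"
describeMax xs = "max is " ++ show (maximum xs)
```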

DerivingStrategies is good practice (and in fact, is mandated by this document); it avoids ambiguities between GeneralizedNewtypeDeriving and DeriveAnyClass, allows considerable boilerplate savings through use of DerivingVia, and makes the intention of the derivation clear on immediate reading, reducing the amount of non-local information about derivation priorities that we have to retain. DeriveFunctor and GeneralizedNewtypeDeriving are both obvious and useful extensions to the auto-derivation systems available in GHC. Both of these have only one correct derivation (the former given by parametricity guarantees, the latter by the fact that a newtype only wraps a single value). As there is no chance of unexpected behaviour from these, no possible behaviour variation, and they're key to supporting both the stock and newtype deriving strategies, having these on by default removes considerable tedium and line noise from our code. A good example is a newtype wrapper around a monad transformer stack:

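import Control.Monad.IO.Class (MonadIO)
import Control.Monad.Reader (MonadReader, ReaderT)
import Control.Monad.State (MonadState, StateT)
import Data.Text (Text)

newtype FooM a = FooM (ReaderT Int (StateT Text IO) a)
  deriving newtype
    ( Functor
    , Applicative
    , Monad
    , MonadReader Int
    , MonadState Text
    , MonadIO
    )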
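A self-contained sketch of all three strategies side by side, with hypothetical type names and the extensions enabled per file for the sake of the example. The Sum constructor is imported because GHC's coerce-based via deriving has to see through the Sum newtype.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Monoid (Sum (Sum))

-- stock: GHC's built-in deriving algorithm.
data Colour = Red | Green
  deriving stock (Show, Eq)

-- newtype: reuse the instances of the wrapped Int.
newtype Count = Count Int
  deriving stock (Show)
  deriving newtype (Eq, Ord, Num)

-- via: borrow the additive Semigroup and Monoid instances of Sum Int.
newtype Total = Total Int
  deriving stock (Show, Eq)
  deriving (Semigroup, Monoid) via (Sum Int)
```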

Deriving Traversable is a little tricky. While Traversable is lawful (though not to the degree Functor is, permitting multiple implementations in many cases), deriving it is complicated by issues of role assignment for higher-kinded type variables and the fact that you can't coerce through a Functor. These are arguably implementation issues, but repairing this situation requires fundamental changes to Functor, which are unlikely to ever happen. Even newtype or via derivations of Traversable are mostly impossible; thus, we must have special support from GHC, which DeriveTraversable enables. This is a very historically-motivated inconsistency, and should really not exist at all. While this only papers over the problem (as even with this extension on, only stock derivations become possible), it at least means that it can be done at all. Having it enabled globally makes this inconsistency slightly less visible, and is completely safe.
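A minimal sketch of a stock Traversable derivation for a hypothetical Pair type. Traversable requires a Foldable instance; here it is written by hand, in line with this document's stance on deriving Foldable.

```haskell
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE DerivingStrategies #-}

data Pair a = Pair a a
  deriving stock (Show, Eq, Functor, Traversable)

-- Hand-written Foldable: visits the first component, then the second.
instance Foldable Pair where
  foldr f z (Pair x y) = f x (f y z)
```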

While GHC Generics are far from problem-free, many parts of the Haskell ecosystem require Generic, either outright (e.g. beam-core) or for convenience (e.g. aeson, hashable). Additionally, several core parts of Plutus (including ToSchema) are driven by Generic. The derivation is trivial in most cases, and having to enable an extension for it is quite annoying. Since no direct harm is done by doing this, and use of Generic is already signposted clearly (and is mostly invisible), having this on globally poses no problems.

EmptyCase not being on by default is an inconsistency of Haskell 2010, as the report allows us to define an empty data type, but without this extension, we cannot exhaustively pattern match on it. This should be the default behaviour for reasons of symmetry.
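A minimal sketch with a hypothetical empty type: without EmptyCase there is no way to write the exhaustive match below.

```haskell
{-# LANGUAGE EmptyCase #-}

-- An empty data type: no constructors, so no (non-bottom) values.
data Never

-- The empty case is exhaustive, because there are no constructors to cover.
absurdNever :: Never -> a
absurdNever never = case never of {}
```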

FlexibleContexts and FlexibleInstances paper over a major deficiency of Haskell2010, which in general isn't well-motivated. There is no real reason to restrict type arguments to variables in either type class instances or type signatures: the reasons for this choice in Haskell2010 are entirely for the convenience of the implementation. It produces no ambiguities, and in many ways, the fact this isn't the default is more surprising than anything. Additionally, many core libraries rely on one, or both, of these extensions being enabled (mtl is the most obvious example, but there are many others). Thus, even for popularity and compatibility reasons, these should be on by default.
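A small sketch of what each extension unlocks, assuming mtl (which ships with GHC) for MonadState; Describe and tick are hypothetical names.

```haskell
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}

import Control.Monad.State (MonadState, evalState, get, modify)

class Describe a where
  describe :: a -> String

-- FlexibleInstances: the instance head [Char] has a concrete type argument
-- rather than a type variable.
instance Describe [Char] where
  describe s = "text: " ++ s

-- FlexibleContexts: the constraint mentions the concrete type Int.
tick :: MonadState Int m => m Int
tick = modify (+ 1) >> get
```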

InstanceSigs are harmless by default, and introduce no complications. Their not being default is strange. ImportQualifiedPost is already a convention of several MLabs projects, and helps with formatting of imports.

KindSignatures become extremely useful in any setting where 'exotic kinds' (meaning, anything which isn't Type or Type -> Type or similar) are commonplace; much like type signatures clarify expectations and serve as active documentation (even where GHC can infer them), explicit kind signatures serve the same purpose 'one level up'. When combined with the requirement to provide explicit foralls for type variables defined in this document, they simplify the usage of 'exotic kinds' and provide additional help from both the type checker and the code. Since this project is Plutus-based, we use 'exotic kinds' extensively, especially in row-polymorphic records; thus, in our case, this is especially important. This also serves as justification for ScopedTypeVariables, as well as ironing out a weird behaviour where code such as

foo :: a -> b
foo = bar . baz
  where
    bar :: String -> b
    bar = ...
    baz :: a -> String
    baz = ...

causes GHC to produce fresh type variables in each where-bind. This is confusing and makes little sense: if the user wanted a fresh variable, they would give it a different name. What's worse, the type checker emits an error that makes little sense (except to those who have learned to look for it), creating even more confusion, especially in cases where the type variable is constrained:

foo :: (Monoid m) => m -> String
foo = bar . baz
  where
    baz :: m -> Int
    baz = ... -- this has no idea that m is a Monoid, since m is fresh!
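A minimal sketch of the fix, with a hypothetical countEmpty function: the explicit forall, together with ScopedTypeVariables, makes the m in the where-bound signature refer to the outer m, constraints included. Without it, isEmpty's m would be fresh and GHC would reject x == mempty for lack of Eq and Monoid instances.

```haskell
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Kind (Type)

-- The forall brings m into scope over the where clause.
countEmpty :: forall (m :: Type). (Eq m, Monoid m) => [m] -> Int
countEmpty = length . filter isEmpty
  where
    -- This m is the same m as above, so Eq m and Monoid m are available.
    isEmpty :: m -> Bool
    isEmpty x = x == mempty
```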

LambdaCase reduces a lot of code in the common case of analysis of sum types. Without it, we are forced to either write a dummy case argument:

foo s = case s of
-- rest of code here

Or alternatively, we need multiple heads:

foo Bar = -- rest of code
foo (Baz x y) = -- rest of code
-- etc

LambdaCase is shorter than both of these, and avoids us having to bind variables, only to pattern match them away immediately. It is convenient, clear from context, and really should be part of the language to begin with.
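A minimal sketch with a hypothetical Shape type: the function is defined point-free with \case, so no argument is bound only to be matched away.

```haskell
{-# LANGUAGE LambdaCase #-}

data Shape = Circle Double | Rect Double Double

area :: Shape -> Double
area = \case
  Circle r -> pi * r * r
  Rect w h -> w * h
```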

MultiParamTypeClasses are required by a large number of standard Haskell libraries, including mtl and vector. Almost any project of non-trivial size must have this extension enabled somewhere, and if the code makes significant use of mtl-style monad transformers or defines anything non-trivial for vector, it must use it. Additionally, it arguably lifts a purely implementation-driven restriction of the Haskell 2010 language, much like FlexibleContexts and FlexibleInstances. Lastly, although it can introduce ambiguity into type checking, this only applies when we define our own multi-parameter type classes, which is rarely necessary. Enabling it globally is thus safe and convenient.

Based on the recommendations of this document (driven by our close dependence on Plutus), NoImplicitPrelude is required to allow us to default to the Plutus prelude instead of the one from base.

OverloadedStrings deals with the problem that String is a suboptimal choice of string representation for basically any problem, with the general recommendation being to use Text instead. It is not, however, without its problems:

  • ByteString's IsString instance treats literals as 8-bit character strings, silently truncating every character to its low byte;
  • The semantics of Plutus' BuiltinByteString vary considerably by use site, with little indication;
  • Overly polymorphic behaviour of many functions (especially in the presence of type classes) forces extra type signatures.

These are usually caused not by the extension itself, but by other libraries and their implementations of either IsString or overly polymorphic use of type classes without appropriate laws (Aeson's KeyValue is a particularly egregious offender here). The convenience of this extension in the presence of literals, and the fact that for BuiltinByteString there is no other way to construct literals, makes it worth using by default.
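A short sketch of both sides, using the text and bytestring packages that ship with GHC (the binding names are hypothetical): a literal becomes a Text without any pack call, while the same mechanism silently mangles non-Latin-1 characters in a ByteString literal.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as ByteString
import Data.Text (Text)
import qualified Data.Text as Text

-- The literal is a Text directly; no pack is needed.
greeting :: Text
greeting = "héllo"

-- The pitfall: ByteString's IsString instance keeps only the low 8 bits of
-- each Char, so U+03BB ('λ') silently becomes the single byte 0xBB.
lambdaBytes :: ByteString
lambdaBytes = "λ"
```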

StandaloneDeriving is mostly needed for GADTs, or situations where complex type-level computations drive type class instances, requiring users to specify constraints manually. This can pose some difficulties syntactically (such as with deriving strategies), but isn't a problem in and of itself, as it doesn't really change how the language works. Having this enabled globally is not problematic.
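A minimal sketch with a hypothetical expression GADT. Because each constructor specialises the result type, an ordinary deriving clause is rejected, so Show is derived standalone; note the placement of the stock strategy in that syntax.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE StandaloneDeriving #-}

-- Each constructor fixes the type index of Expr.
data Expr a where
  IntE :: Int -> Expr Int
  BoolE :: Bool -> Expr Bool

-- A deriving clause is not accepted for this GADT, so the instance is
-- derived standalone, with an explicit strategy.
deriving stock instance Show (Expr a)
```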

TupleSections smooths out an oddity in the syntax of Haskell 2010 regarding partial application of tuple constructors. Given a function like foo :: Int -> String -> Bar, we accept it as natural that we can write foo 10 to get a function of type String -> Bar. However, by default, this logic doesn't apply to tuple constructors. As special cases are annoying to keep track of, and in this case, serve no purpose, as well as being clear from their consistent use, this should also be enabled by default; it's not clear why it isn't already.
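A minimal sketch, with a hypothetical tagAll function: (n,) is the pair constructor partially applied to its first component, just as foo 10 partially applies an ordinary function.

```haskell
{-# LANGUAGE TupleSections #-}

-- Pairs every string with the same tag.
tagAll :: Int -> [String] -> [(Int, String)]
tagAll n = map (n,)
```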

TypeOperators is practically a necessity when dealing with type-level programming seriously. Much as infix data constructors are extremely useful (and sometimes clearer than their prefix forms), infix type constructors serve a similar function. Additionally, Plutus relies on operators at the type level significantly; for example, it's not really possible to define a row-polymorphic record or variant without them. Having to enable this almost everywhere is a needless chore, and having type constructors behave differently to data constructors here is a needless source of inconsistency.
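A minimal sketch of an infix type constructor, declared the same way an infix data constructor would be; the :+: sum type and describeSum are hypothetical.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE TypeOperators #-}

-- An infix type constructor for a simple binary sum.
data a :+: b = InL a | InR b
  deriving stock (Show, Eq)

describeSum :: (Int :+: Bool) -> String
describeSum (InL n) = "int " ++ show n
describeSum (InR b) = "bool " ++ show b
```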

We exclude DeriveDataTypeable, as Data is a strictly-worse legacy version of Generic, and Typeable no longer needs deriving for anything anyway. The only reason to derive either of these is for compatibility with legacy libraries, which we don't have any of, and the number of which shrinks every year. If we're using this extension at all, it's probably a mistake.

Foldable is possibly the most widely-used lawless type class. Unlike something like Functor, Foldable has no laws specifying its behaviour beyond self-consistency laws (such as agreement between foldMap and foldr) and 'it compiles'. As a result, even if we accept its usefulness (a debatable position in itself), there are large numbers of possible implementations that could be deemed 'valid'. The approach taken by DeriveFoldable is one such approach, but this requires knowing its derivation algorithm, and may well not be the implementation you need. Unlike a Functor derivation (whose meaning is obvious), a Foldable one is anything but, and requires referencing a lot of non-local information to determine how it will behave (especially for the 'richer' Foldable, with many additional methods). If you need a Foldable instance, you will either newtype or via-derive it (which doesn't need this extension anyway), or you'll write your own (which also doesn't need this extension). Enabling this encourages bad practices, is confusing, and ultimately doesn't really benefit anything.
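A minimal sketch of the newtype route, using a hypothetical Stack wrapper: the instance is exactly the list's, obtained through GeneralizedNewtypeDeriving, so its behaviour is known without consulting any derivation algorithm.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Foldable is reused from the wrapped list, not generated by DeriveFoldable.
newtype Stack a = Stack [a]
  deriving stock (Show)
  deriving newtype (Foldable)
```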

PartialTypeSignatures is a misfeature. Allowing type holes to remain in signatures (to be filled in by GHC's inference algorithm) is an anti-pattern for the same reason that not providing top-level signatures is: while it's (mostly) possible for GHC to infer signatures, we lose considerable clarity and active documentation by doing so, in return for (quite minor) convenience. While the use of typed holes during development is good practice, they should not remain in final code. Given that Plutus projects require us to do some fairly advanced type-level programming (where inference often fails), this extension can produce outright incorrect results due to GHC's 'best-effort' attempts at type checking. There is no reason to leave behind typed holes instead of filling them in, and we shouldn't encourage this.

PostfixOperators are arguably a misfeature. Infix operators already require a range of special cases to support properly (what symbols create an infix operator, importing them at the value and type level, etc), which postfix operators make worse. Furthermore, they are seldom, if ever, used, and typically aren't worth the trouble. Haskell is not Forth, none of our dependencies rely on postfix operators, and defining our own creates more problems than it solves.

record-dot-preprocessor

The GHC plugin from record-dot-preprocessor SHOULD be enabled globally.

Justification

Haskell records are notoriously, and justifiably, considered subpar: the original issue for the record dot preprocessor extension provides a good summary of the reasons. While a range of extensions (including DuplicateRecordFields, DisambiguateRecordFields, NamedFieldPuns, and many others) have been proposed, and accepted, to mitigate the situation, the reality is that, even with them in place, use of records in Haskell is considerably more difficult, and less flexible, than in any other language in widespread use today. The proposal described in the previous link provides a solution which is familiar to users of most other languages, and addresses the fundamental issue that makes Haskell records so awkward.

While the proposal for the record dot syntax that this preprocessor enables is coming, it's not available in the current version of Haskell used by Plutus (and thus, transitively, by us). Additionally, the earliest this will be available is GHC 9.2, and given that our dependencies must support this version too, it'll be considerable time before we can get its benefits. The preprocessor gives us these benefits immediately, at some dependency cost. While it's not a perfect process, as it involves enabling several questionable extensions, and can require disabling an important warning, it significantly reduces issues with record use, making it worthwhile. Additionally, when GHC 9.2 becomes usable, we can upgrade to it seamlessly.

Prelude

The PlutusTx.Prelude MUST be used. A 'hiding import' to remove functionality we want to replace SHOULD be used when necessary. If functionality from the Prelude in base is needed, it SHOULD be imported qualified. Other preludes MUST NOT be used.

Justification

For Plutus, we are in some ways limited by what Plutus requires (and provides). Especially for on-chain code, the Plutus prelude is the one we need to use, and therefore its use should be as friction-free as possible. As many modules may contain a mix of off-chain and on-chain code, we also want to keep impedance mismatches to a minimum.

We can assume familiarity (or at least the goal of it) with Plutus and its conventions. Additionally, every Haskell developer is familiar with the Prelude from base. Thus, any replacement of Plutus prelude functionality with its counterpart from the base prelude should be clearly indicated locally.

Haskell is a 30-year-old language, and the Prelude is one of its biggest sources of legacy. A lot of its defaults are questionable at best, and often need replacing. As a consequence of this, a range of 'better Preludes' have been written, with a range of opinions: while there is a common core, a large number of decisions are opinionated in ways more appropriate to the authors of said alternatives and their needs than those of other users of said alternatives. This means that, when a non-base Prelude is in scope, it often requires familiarity with its specific decisions, in addition to whatever cognitive load the current module and its other imports impose. Given that we already use an alternative prelude (in tandem with the one from base), additional alternatives present an unnecessary cognitive load. Lastly, the dependency footprint of many alternative Preludes is highly non-trivial; it isn't clear if we need all of this in our dependency tree.

For all of the above reasons, the best choice is 'default to Plutus, with local replacements from base'.

Versioning

A project MUST use the PVP. Two, and only two, version numbers MUST be used: a major version and a minor version.

Justification

The Package Versioning Policy is the conventional Haskell versioning scheme, adopted by most packages on Hackage. It is clearly described, and even automatically verifiable by use of tools like policeman. Thus, adopting it is both in line with community standards (making it easier to remember), and simplifies cases such as Hackage publication or open-sourcing in general.

Two version numbers (major and minor) are the minimum allowed by the PVP, indicating compilation-breaking and compilation-non-breaking changes respectively. As parsimony is best, and more granularity than this isn't generally necessary, adopting this model is the right decision.

Documentation

Every publicly exported definition MUST have a Haddock comment detailing its purpose. If a definition is a function, it SHOULD also have usage examples written using Bird tracks. The Haddock for a publicly exported definition SHOULD also explain any caveats, complexities of its use, or common issues a user is likely to encounter.

If the code project is a library, these Haddock comments SHOULD carry an @since annotation, stating what version of the library they were introduced in, or the last version where their functionality or type signature changed.

For type classes, their laws MUST be documented using a Haddock comment.

Each repository must also have a README explaining how to build the application and/or library. If the repository contains one or more executables, the README should also explain how to run each of them, including their command-line arguments and options.

Justification

Code reading is a difficult task, especially when the 'why' rather than the 'how' of the code needs to be deduced. A good solution to this is documentation, especially when this documentation specifies common issues, provides examples of use, and generally states the rationale behind the definition.

For libraries, it is often important to inform users what changed in a given version, especially where 'major bumps' are concerned. While this would ideally be addressed with accurate changelogging, it can be difficult to give proper context. @since annotations provide a granular means to indicate the last time a definition changed considerably, allowing someone to quickly determine whether a version change affects something they are concerned with.

As stated elsewhere in the document, type classes having laws is critical to our ability to use equational reasoning, as well as a clear indication of what instances are and aren't permissible. These laws need to be clearly stated, as this assists both those seeking to understand the purpose of the type class, and also the expected behaviour of its instances.
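As a sketch of what this can look like, here is a fully documented version of a hypothetical `chunksOf` function; the `@since 1.2` version number is made up for illustration. It uses Haddock's `>>>` example syntax, which tools such as doctest can check; plain `>` Bird-track blocks also render as code but are not executed by doctest.

```haskell
{-# LANGUAGE KindSignatures, ScopedTypeVariables #-}

import Data.Kind (Type)

-- | Split a list into consecutive chunks of the given size. The last chunk
-- may be shorter than the requested size.
--
-- >>> chunksOf 2 [1, 2, 3, 4, 5]
-- [[1,2],[3,4],[5]]
--
-- A size of zero or less is treated as one, so the function stays total:
--
-- >>> chunksOf 0 "ab"
-- ["a","b"]
--
-- @since 1.2
chunksOf :: forall (a :: Type) . Int -> [a] -> [[a]]
chunksOf n xs = go xs
  where
    -- Clamp the size so that 'go' always makes progress.
    size :: Int
    size = max 1 n

    go :: [a] -> [[a]]
    go [] = []
    go ys = let (chunk, rest) = splitAt size ys in chunk : go rest
```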

Type and kind signatures

All module-level definitions, as well as where-binds, MUST have explicit type signatures. Type variables MUST have an explicit forall scoping them, and all type variables MUST have explicit kind signatures. Thus, the following is wrong:

data Foo a = Bar | Baz [a]

quux :: (Monoid m) => [m] -> m -> m

Instead, write it like this:

data Foo (a :: Type) = Bar | Baz [a]

quux :: forall (m :: Type) . (Monoid m) => [m] -> m -> m

Each explicit type signature MUST correspond to one definition only. Thus, the following is wrong:

bar :: Int
baz :: Int
(bar, baz) = someOtherFunction someOtherValue

Instead, write it like this:

bar :: Int
bar = fst . someOtherFunction $ someOtherValue

baz :: Int
baz = snd . someOtherFunction $ someOtherValue

Justification

Explicit type signatures for module-level definitions are a good practice in Haskell for several reasons: they aid type-driven development by providing better compiler feedback, act as a form of 'active documentation' describing what we expect a function to do (and not do), and help us plan and formulate our thoughts while we implement. While GHC can, in theory, infer type signatures, not having them significantly impedes readability, and can easily go wrong in the presence of more advanced type-level features (or even rank-2 polymorphism, which is ubiquitous in the form of the ST monad at least); there is no reason not to have them.

Type-level programming is mandated in many places by Plutus (including, but not limited to, row-polymorphic records and variants from Data.Row). This often requires use of TypeApplications, which essentially makes not only the type variables, but their order, part of the API of any definition that uses them. While there is an algorithm determining this precisely, something that is harmless at the value level (such as re-ordering constraints) could potentially serve as an API break. Additionally, this algorithm is a huge source of non-local information, and in the presence of a large number of type variables, of different kinds, can easily become confusing. Having explicit foralls quantifying all type variables makes it clear what the order for these type variables is for TypeApplications, and also allows us to choose it optimally for our API, rather than relying on what the algorithm would produce. This is significantly more convenient, and means less non-local information and confusion.
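A small sketch of why this matters: in the hypothetical `convert` below, the explicit `forall` puts the result type first, so a caller can write `convert @Double x` and let the argument type be inferred. Without the `forall`, GHC would order the variables by first occurrence (here `a`, then `b`, because `Show a` comes first), so merely swapping the two constraints would change which type `@Double` refers to.

```haskell
{-# LANGUAGE KindSignatures, ScopedTypeVariables, TypeApplications #-}

import Data.Kind (Type)

-- The forall fixes the TypeApplications order: 'b' (the result) first,
-- then 'a' (the argument). Reordering the constraints cannot change this.
convert :: forall (b :: Type) (a :: Type) . (Show a, Read b) => a -> b
convert = read . show
```

For example, `convert @Double (3 :: Int)` evaluates to `3.0`.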

Additionally, type-level programming requires significant use of 'exotic kinds', which in our case include Constraint -> Type and Row Type, to name but a few. While GHC can (mostly) infer kind signatures, much the same way as we explicitly annotate type signatures as a form of active documentation (and to assist the type checker when using type holes), explicitly annotating kind signatures allows us to be clear to users where exotic kinds are expected, as well as ensuring that we don't make any errors ourselves. This, together with explicit foralls, essentially brings the same practices to the kind level as the Haskell community already considers to be good at the type level.
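As an illustration of an exotic kind, the hypothetical `Some` type below is parameterised by a class rather than a type. Its kind annotation, `Type -> Constraint`, tells the reader up front that `Some Show` is valid while something like `Some Int` is not, without having to work that out from the constructor.

```haskell
{-# LANGUAGE ConstraintKinds, GADTs, KindSignatures, ScopedTypeVariables #-}

import Data.Kind (Constraint, Type)

-- A value of some hidden type that is known to have an instance of 'c'.
-- The kind signature documents that 'c' is a class, not an ordinary type.
data Some (c :: Type -> Constraint) where
  Some :: forall (c :: Type -> Constraint) (a :: Type) . c a => a -> Some c

-- Pattern matching recovers the 'Show' instance packed into the constructor.
showSome :: Some Show -> String
showSome (Some x) = show x
```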

where-bindings are quite common in idiomatic Haskell, and quite often contain non-trivial logic. They are also a common refactoring tool, and a common tool for 'hole-driven development', where you create a hole to be filled with a where-bound definition. Even in these cases, having an explicit signature on where-bindings helps: during development, you can use typed holes inside the where-binding with useful information (absent a signature, you'll get nothing), and it makes the code much easier to understand, especially if the where-binding is complex. It's also advantageous when 'promoting' where-bindings to full top-level definitions, as the signature is already there. Since we need to do considerable type-level programming as part of Plutus, this becomes even more important: GHC's type inference can often fail on where-bindings in those cases, producing a very strange error message that would need a signature to resolve anyway. By making this practice proactive, we decrease confusion and increase readability. While in theory this standard should extend to let-bindings as well, these are much rarer, and can be given signatures with :: if needed, since ScopedTypeVariables is on by default for us.
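A minimal sketch of a where-binding signature that refers to the enclosing type variable; the `padTo` function is hypothetical. Because `padding` uses `filler`, its type really is the outer `a`, and the signature `padding :: [a]` only type checks because the explicit `forall` together with ScopedTypeVariables brings that `a` into scope.

```haskell
{-# LANGUAGE KindSignatures, ScopedTypeVariables #-}

import Data.Kind (Type)

-- | Pad a list on the right with a filler element up to the given length.
padTo :: forall (a :: Type) . Int -> a -> [a] -> [a]
padTo n filler xs = xs <> padding
  where
    -- 'a' here is the same 'a' as in the outer signature.
    padding :: [a]
    padding = replicate (n - length xs) filler
```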

While it is possible to provide definitions for multiple signatures at once at the module level, it's almost never a good idea to do so. Even in fairly straightforward cases (like the provided example), it can be confusing, and in cases where the 'definition disassembly' is more complex (or involves other language features, such as named field puns or wildcards) it is definitely confusing. Furthermore, it's almost never warranted; it can be more concise, but at the cost of clarity, which is never a viable tradeoff long-term. Lastly, documentation and refactoring of such multi-definitions is more difficult as a result. Keeping strictly to a 'one signature, one definition' structure aids readability and maintainability, and is almost never particularly verbose anyway.

Other

Lists SHOULD NOT be field values of types; this extends to Strings. Instead, Vectors (or, in the case of Strings, Texts) SHOULD be used, unless a more appropriate structure exists. On-chain code, due to a lack of alternatives, is one place where lists can be used as field values of types.

Partial functions MUST NOT be defined. Partial functions SHOULD NOT be used except to ensure that another function is total (and the type system cannot be used to prove it).

Derivations MUST use an explicit strategy. Thus, the following is wrong:

newtype Foo = Foo (Bar Int)
    deriving (Eq, Show, Generic, FromJSON, ToJSON)

Instead, write it like this:

newtype Foo = Foo (Bar Int)
    deriving stock (Generic)
    deriving newtype (Eq, Show)
    deriving anyclass (FromJSON, ToJSON)

Deriving via SHOULD be preferred to newtype derivation, especially where the underlying type representation could change significantly.

type SHOULD NOT be used. The only acceptable case is abbreviation of large type-level computations. In particular, type MUST NOT be used to create an abstraction boundary.

Sum types containing record fields MUST NOT be defined. Thus, the following is not allowed:

data Foo = Bar | Baz { quux :: Int, frob :: (Int, Int) }

Justification

Haskell lists are a prime example of the legacy of the language: they (in the form of singly linked lists) have played an important role in the development of functional programming (and for some 'functional' languages, continue to do so). However, from the perspective of data structures, they are suboptimal except for extremely specific use cases. In almost any situation involving data (rather than control flow), an alternative, better structure exists. Although it is both acceptable and efficient to use lists within functions (due to GHC's extensive fusion optimizations), as field values they are a poor choice from an efficiency perspective, both in theory and in practice. For almost all cases where you would want a list field value, a Vector field value is more appropriate, and in almost all others, some other structure (such as a Map) is even better. We make a named exception for on-chain code, as no alternatives presently exist.
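A sketch of the 'some other structure' case, using a hypothetical `Inventory` type. To keep the example dependent only on libraries that ship with GHC, it uses `Text` (from text) in place of `String` and a `Map` (from containers) in place of an association list; where a field really is a sequence, the rule above calls for a `Vector` from the vector package instead.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)

-- Instead of:
--   data Inventory = Inventory { owner :: String, items :: [(String, Int)] }
data Inventory = Inventory
  { owner :: Text -- Text rather than String (a list of Char)
  , items :: Map Text Int -- keyed lookups instead of scanning a list
  }

-- | How many of the named item are held, or zero if there are none.
quantityOf :: Text -> Inventory -> Int
quantityOf item = Map.findWithDefault 0 item . items
```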

Partial functions are runtime bombs waiting to explode. The number of times the 'impossible' happened, especially in production code, is significant in our experience, and most partiality is easily solvable. Allowing the compiler to support our efforts, rather than being blind to them, will help us write clearer, more robust, and more informative code. Partiality is also an example of legacy, and it is legacy of considerable weight. Sometimes, we do need an 'escape hatch' due to the impossibility of explaining what we want to the compiler; this should be the exception, not the rule.
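As a sketch of how partiality is usually removed, here are two total replacements for `head` from base, which throws on an empty list. The names `headMay` and `firstOf` are illustrative, not taken from a particular library: one puts the failure case in the result type, the other rules it out in the argument type.

```haskell
{-# LANGUAGE KindSignatures, ScopedTypeVariables #-}

import Data.Kind (Type)
import Data.List.NonEmpty (NonEmpty ((:|)))

-- Make the failure case visible in the result type.
headMay :: forall (a :: Type) . [a] -> Maybe a
headMay [] = Nothing
headMay (x : _) = Just x

-- Rule the failure case out in the argument type: a NonEmpty
-- list always has a first element, so no runtime check is needed.
firstOf :: forall (a :: Type) . NonEmpty a -> a
firstOf (x :| _) = x
```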

Derivations are one of the most useful features of GHC, and extend the capabilities of Haskell 2010 considerably. However, with great power comes great ambiguity, especially when GeneralizedNewtypeDeriving is in use. While GHC does make an unambiguous choice when no strategy is given, the rules it follows are hard to remember. This is especially dire when GeneralizedNewtypeDeriving combines with DeriveAnyClass on a newtype. Explicit strategies give more precise control over this, and document the resulting behaviour locally. This reduces the number of things we need to remember, and allows more precise control when we need it. Lastly, in combination with DerivingVia, considerable boilerplate can be saved; in this case, explicit strategies are mandatory.

Even with an explicit strategy, newtype deriving can occasionally cause unexpected problems, which is why deriving via is preferred: if we use a newtype derivation and then change the underlying type, we get no warning. Since this can drastically change the behaviour of some type class instances, it is better to have the compiler check our consistency.
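A minimal sketch using a hypothetical `Score` type. The `via` clause states which representation the instances are borrowed from, so if `Score` were later changed to wrap, say, an `Integer`, the clause would stop type checking (an `Integer` is not coercible to `Sum Int`) rather than silently picking up whatever instances the new underlying type has.

```haskell
{-# LANGUAGE DerivingVia #-}

import Data.Monoid (Sum (..))

-- Scores combine by addition, borrowed explicitly from 'Sum Int'.
newtype Score = Score Int
  deriving stock (Eq, Show)
  deriving (Semigroup, Monoid) via (Sum Int)
```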

type is generally a terrible idea in Haskell. You don't create an abstraction boundary with it (any operations on the 'underlying type' still work over it), and compiler output becomes very inconsistent (sometimes showing the type definition, sometimes the underlying type). If your goal is to create an abstraction boundary with its own operations, newtype is both cost-free and clearer; if that is not your goal, just use the type you'd otherwise rename, since it's equivalent semantically. The only reasonable use of type is to hide complex type-level computations, which would otherwise be too long. Even this is somewhat questionable, but the questionability comes from the type-level computation being hidden, not type as such.
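The difference can be sketched with two hypothetical types. `Age` is only another name for `Int`, so every `Int` operation and every `Int` value is accepted where an `Age` is expected; `UserId` is a distinct type at no runtime cost, and only the operations we choose to define are available on it.

```haskell
{-# LANGUAGE DerivingStrategies #-}

-- A synonym: 'Age' and 'Int' are interchangeable, so no boundary exists.
type Age = Int

isAdult :: Age -> Bool
isAdult age = age >= 18

-- A newtype: a genuinely separate type that the compiler keeps apart.
newtype UserId = UserId Int
  deriving stock (Eq, Show)

describeUser :: UserId -> String
describeUser (UserId n) = "user #" <> show n

-- isAdult (length "abc")  -- compiles: any Int counts as an 'Age'
-- describeUser 7          -- rejected: 7 is not a 'UserId'
```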

The combination of record syntax and sum types, while allowed by the language, causes considerable issues. One of the biggest problems with this combination is that it sneaks in partiality 'via the back door': a field selector applied to a constructor that lacks that field fails at runtime. At the same time, it also produces confusing warnings with -Wno-incomplete-record-updates and record-dot-preprocessor. While arguably convenient in some cases, this ultimately creates more problems than it solves.
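One way to restructure the disallowed `Foo` example is sketched below: the record fields move into their own product type (the name `BazFields` is hypothetical), so there is no selector that can be applied to `Bar`. In the original definition, `quux Bar` compiles but throws at runtime; here the equivalent access has to go through a pattern match.

```haskell
{-# LANGUAGE DerivingStrategies #-}

-- The record lives in a plain product type, where every field always exists.
data BazFields = BazFields {quux :: Int, frob :: (Int, Int)}
  deriving stock (Eq, Show)

-- The sum type carries the record instead of declaring fields itself.
data Foo = Bar | Baz BazFields
  deriving stock (Eq, Show)

-- Accessing 'quux' now forces us to handle the 'Bar' case explicitly.
quuxOf :: Foo -> Maybe Int
quuxOf Bar = Nothing
quuxOf (Baz fields) = Just (quux fields)
```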

Design practices

Parse, don't validate

Boolean blindness SHOULD NOT be used in the design of any function or API. Returning more meaningful data SHOULD be the preferred choice. The general principle of 'parse, don't validate' SHOULD guide design and implementation.

Justification

The description of boolean blindness gives specific reasons why it is a poor design choice; additionally, it runs counter to the principle of 'parse, don't validate'. While sometimes unavoidable, in many cases it's possible to give back a more meaningful response than 'yes' or 'no', and we should endeavour to do this. Designs that avoid boolean blindness are more flexible, less bug-prone, and allow the type checker to assist us when writing. This, in turn, reduces cognitive load, improves our ability to refactor, and means fewer bugs from things the compiler could have checked if a function wasn't boolean-blind.
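A small sketch of the contrast, using a hypothetical port-number check. `isValidPort` only answers yes or no, leaving the caller with an unchecked `Int`; `parsePort` either produces a `Port` or says why it could not. If the module does not export the `Port` constructor, `parsePort` becomes the only way to obtain one, so downstream code can require a `Port` instead of re-validating.

```haskell
{-# LANGUAGE DerivingStrategies #-}

-- Boolean-blind: the result says nothing about what was checked.
isValidPort :: Int -> Bool
isValidPort n = n >= 1 && n <= 65535

-- Evidence of a successful check, carried in the type.
newtype Port = Port Int
  deriving stock (Eq, Show)

-- A meaningful failure instead of a bare 'False'.
data PortError = OutOfRange Int
  deriving stock (Eq, Show)

-- Parse, don't validate: return either the refined value or the reason.
parsePort :: Int -> Either PortError Port
parsePort n
  | isValidPort n = Right (Port n)
  | otherwise = Left (OutOfRange n)
```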

No multi-parameter type-classes without functional dependencies

Any multi-parameter type class MUST have a functional dependency restricting its relation to many-to-one at most. In cases of true many-to-many relationships, type classes MUST NOT be used as a solution to the problem.

Justification

Multi-parameter type classes allow us to express more complex relationships among types; single-parameter type classes effectively permit us to 'subset' Hask only. However, multi-parameter type classes make type inference extremely flaky, as the global coherence condition can often lead to the compiler being unable to determine what instance is sought even if all the type parameters are concrete, due to anyone being able to add a new instance at any time. This is largely caused by multi-parameter type classes defaulting to effectively representing arbitrary many-to-many relations.

When we do not have arbitrary many-to-many relations, multi-parameter type classes are useful and convenient. We can indicate this using functional dependencies, which inform the type checker that our relationship is not arbitrarily many-to-many, but rather many-to-one or even one-to-one. This is a standard practice in many libraries (mtl being the most ubiquitous example), and allows us the benefits of multi-parameter type classes without making type checking confusing and difficult.

In general, many-to-many relationships pose difficult design choices, for which type classes are not the correct solution. If a functional dependency cannot be provided for a type class, it suggests that the current design relies inherently on a many-to-many relation, and should be either rethought to eliminate it, or be dealt with using a more appropriate means.
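A sketch with a hypothetical `Collection` class: the dependency `c -> e` says each container type has exactly one element type, a many-to-one relation. That is what lets GHC infer the element type of `cinsert 1 [2, 3 :: Int]` as `Int`. Without the dependency, that call would be rejected as ambiguous, since nothing would stop someone from also adding, for example, an instance `Collection [Int] Bool`.

```haskell
{-# LANGUAGE FlexibleInstances, FunctionalDependencies, KindSignatures #-}

import Data.Kind (Type)

-- | Containers 'c' whose element type 'e' is fully determined by 'c'.
class Collection (c :: Type) (e :: Type) | c -> e where
  cinsert :: e -> c -> c
  ctoList :: c -> [e]

-- Lists hold exactly their own element type.
instance Collection [a] a where
  cinsert = (:)
  ctoList = id
```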

Type classes must have laws

Any type class not imported from an external dependency MUST have laws. These laws MUST be documented in a Haddock comment on the type class definition, and all instances MUST follow these laws.

Justification

Type classes are a powerful feature of Haskell, but can also be its most confusing. As they allow arbitrary ad-hoc polymorphism, and are globally visible, it is important that we limit the confusion this can produce. Additionally, type classes without laws inhibit equational reasoning, which is one of Haskell's biggest strengths, especially in the presence of what amounts to arbitrary ad-hoc polymorphism.

Additionally, type classes with laws allow the construction of provably correct abstractions above them. This is also a common feature in Haskell, ranging from profunctor optics to folds. If we define our own type classes, we want to be able to abstract above them with total certainty of correctness. Lawless type classes make this difficult to do: compare the number of abstractions built on Functor or Traversable as opposed to Foldable.

Thus, type classes having laws provides both ease of understanding and additional flexibility.
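A sketch of a documented lawful class, using a hypothetical `Mergeable` class: the laws are stated in the Haddock comment on the class, and a small helper checks them exhaustively for a finite sample of values. The helper is only an illustration; property-testing libraries are the more usual way to check laws.

```haskell
{-# LANGUAGE KindSignatures, ScopedTypeVariables #-}

import Data.Kind (Type)

-- | Values that can be merged in any order and any grouping, where merging
-- a value with itself has no effect.
--
-- Instances must satisfy the following laws:
--
-- * Associativity: @merge x (merge y z) == merge (merge x y) z@
-- * Commutativity: @merge x y == merge y x@
-- * Idempotence: @merge x x == x@
class Mergeable (a :: Type) where
  merge :: a -> a -> a

-- | Merging booleans is logical or, which satisfies all three laws.
instance Mergeable Bool where
  merge = (||)

-- | Checks every law for every combination of the given sample values.
lawsHold :: forall (a :: Type) . (Eq a, Mergeable a) => [a] -> Bool
lawsHold xs =
  and [merge x (merge y z) == merge (merge x y) z | x <- xs, y <- xs, z <- xs]
    && and [merge x y == merge y x | x <- xs, y <- xs]
    && and [merge x x == x | x <- xs]
```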

Libraries and frameworks

Use Type.Reflection instead of Data.Typeable

Data.Typeable from base SHOULD NOT be used; the only exception is for interfacing with legacy libraries. Whenever its capabilities are required, Type.Reflection SHOULD be used.

Justification

Data.Typeable was the first attempt to bring runtime type information to GHC; this mechanism is necessary, as GHC normally performs type erasure. The original design of Data.Typeable.Typeable required the construction of a TypeRep, which could be user-specified. This led to issues of correctness, as user-specified TypeReps could easily not follow the conventions that GHC expected, and also coherency, as there's no guarantee that for any given type, its TypeRep would be unique. This was later subsumed into the DeriveDataTypeable extension, which made it impossible to define Typeable instances except through the mechanisms provided by GHC.

Additionally, as Data.Typeable predated TypeApplications, its API requires a value of a specific type to direct which TypeRep to provide. This suffers from similar problems as Foreign.Storable.sizeOf, as frequently, there is no suitable value to provide. This forced developers to write code like

typeOf (undefined :: a)

This looks strange, and isn't the approach taken by modern APIs. Lastly, Typeable instances had to be derived for any type that wanted to use these mechanisms, which forced developers to 'pay' for these instances, whether they wanted to or not.

Type.Reflection has been the go-to API for these purposes since GHC 8.2. It improves the situation with Data.Typeable by replacing the old mechanism with a compiler-generated singleton. Furthermore, deriving Typeable is now unnecessary, much in the same way as deriving Coercible is not necessary: GHC handles all of this. Additionally, the API is now based on TypeApplications, which allows us to write

typeRep @a

The system is also entirely pay-as-you-go: instead of the responsibility being placed on the data types (thus requiring you to pay the cost of the instances whether you needed them or not), the responsibility is now on the functions that consume them. If you specify a (Typeable a) => constraint, this informs GHC that the singleton for TypeRep a is needed in this function, but not anywhere else.

Since Type.Reflection can do everything Data.Typeable can, has a more modern API, and also lower cost, there is no reason to use Data.Typeable anymore except for legacy compatibility reasons.
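A brief sketch of the Type.Reflection API in use; the helper names `typeName` and `castTo` are hypothetical. `typeName` needs no proxy value, only a type application, and `castTo` compares two type representations with `eqTypeRep`, where a match on `HRefl` convinces GHC that the two types are equal.

```haskell
{-# LANGUAGE AllowAmbiguousTypes, GADTs, KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables, TypeApplications, TypeOperators #-}

import Data.Kind (Type)
import Type.Reflection (Typeable, eqTypeRep, typeRep, (:~~:) (HRefl))

-- Name a type without needing a value of it (no 'undefined' required).
typeName :: forall (a :: Type) . Typeable a => String
typeName = show (typeRep @a)

-- A safe cast: succeeds only when the runtime type representations match.
castTo :: forall (b :: Type) (a :: Type) . (Typeable a, Typeable b) => a -> Maybe b
castTo x = case eqTypeRep (typeRep @a) (typeRep @b) of
  Just HRefl -> Just x -- here GHC knows that 'a' and 'b' are the same type
  Nothing -> Nothing
```

For example, `castTo @Bool (5 :: Int)` is `Nothing`, while `castTo @Int (5 :: Int)` is `Just 5`.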
