Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What happens with empty or whitespace only lines? #23

Open
jridgewell opened this issue Sep 29, 2020 · 5 comments
Open

What happens with empty or whitespace only lines? #23

jridgewell opened this issue Sep 29, 2020 · 5 comments
Labels
question Further information is requested

Comments

@jridgewell
Copy link
Member

jridgewell commented Sep 29, 2020

When determining the common indentation of all lines, I can think of 3 possible behaviors:

  1. Empty lines (eg, the nothingness between \n\n newlines) are ignored
    • Whitespace only lines would affect the common indentation
  2. Whitespace-only (eg, \n \n) lines are ignored
    • This would include empty lines
    • Whitespace only lines could be emptied, or preserved exactly, or we could remove some amount of indentation
  3. All lines are considered
    • Empty lines would cause 0 indentation removal
    • Whitespace lines would affect the common indentation

My preference is to ignore only empty lines, it makes the rest of the algorithm easier to reason about. If a lines contains any chars, then it is used when determining the common indentation. The whitespace-only option has a weird failure case where the whitespace line contains a different indentation than the non-whitespace lines. And considering all lines makes it impossible to break content into sections (eg, markdown).

@jridgewell jridgewell added the question Further information is requested label Sep 29, 2020
@bakkot
Copy link
Contributor

bakkot commented Oct 2, 2021

Looking at other languages:

  • In Python's textwrap.dedent, whitespace-only lines are not considered when determining common prefixes, and whitespace-only lines are emptied (but not removed).
  • In Java 13's """ strings, whitespace-only lines are not considered when determining common prefixes, and whitespace-only lines are emptied (but not removed). Also, trailing whitespace is stripped from each line.
  • In Kotlin's trimIndent, whitespace-only lines are not considered when determining common prefixes, and whitespace-only lines are emptied (but not removed). Also, a leading and a trailing newline are stripped (if either is present).
  • In Ruby's <<~ heredocs, whitespace-only lines are not considered when determining common prefixes, and whitespace-only lines are trimmed up to the common prefix.

So, there's uniform agreement that whitespace-only lines are not considered when determining common prefixes (i.e. option 2). I think we should do that.

And of the languages I checked, 3/4 empty out whitespace-only lines entirely. My preference would be to go with the majority.

@jridgewell
Copy link
Member Author

Thanks for the research! If we clear whitespace-only lines, Option 2 seems acceptable.

The mentioned failure case only happens if we don't clear the lines, and they have a different common prefix whitespace, what happens? I find that to be a terrible design, but it's side-stepped entirely be clearing the whole line.

@jridgewell
Copy link
Member Author

Kotlin treats spaces and tabs interchangeably, which I really dislike:

val withoutIndent =
    """
          	 ABC
            123
            456
        """.trimIndent()
println(withoutIndent) // ABC\n123\n456

And they also do not entirely remove whitespace-only lines. They simply remove up to X number of leading whitespace chars, with X being the least leading whitespace of every line that contains non-whitespace.

I think Kotlin has a terrible behavior.


Ruby treats a tab as 8 spaces, and removes the least number of spaces from every line. I also hate Ruby's behavior.


Java's haven't been implemented yet, so I can't test exactly what's happening. They explicitly mention ignoring and removing all whitespace from whitespace-only lines. They don't specifically mention spaces or tabs, but do say:

Compute the common white space prefix of the set of determining lines, by counting the number of leading white space characters on each line and taking the minimum count.


Python is the most sensible of all here. They remove all whitespace on whitespace-only lines, and treat spaces and tabs as separate.

@bakkot
Copy link
Contributor

bakkot commented Jan 31, 2022

And they also do not entirely remove whitespace-only lines. They simply remove up to X number of leading whitespace chars, with X being the least leading whitespace of every line that contains non-whitespace.

Huh, don't know how I missed that, sorry.


Java's haven't been implemented yet, so I can't test exactly what's happening.

Java's """ strings were in preview starting in version 13 and released as a full feature in version 15. Any reasonably recent version should support them.

Anyway, from testing locally, it looks like Java treats spaces and tabs as interchangeable, both being a single character. And they do indeed empty out whitespace-only lines.

Here I've translated space characters to . and tab characters to >------- so it's readable. (:set list | set listchars=tab:>-,space:. in vim is helpful for this sort of thing.)

package.com.example;

class.Main.{
..public.static.void.main(String[].args).{
....var.string.=."""
..A
...........

>-------.B
>------->-------C
...D
.....""";
....for.(var.line.:.string.split("\n")).{
......System.out.println("'".+.line.+."'");
....}
..}
}

produces

'A'
''
''
'B'
'C'
' D'

@Conaclos
Copy link

Conaclos commented Aug 4, 2023

I think about a fourth: all trailing whitespaces are removed.

For instance, the following template (. denotes a trailing whitespace):

dedent`
function f(p) {
  if (p) {
    ${1}...
...
  }
}
`

produces the following string:

function f(p) {
  if (p) {
    1

  }
}

Otherwise, I found the first behavior (ignoring empty newlines and taking whitesapce-only lines into account) more consistent than the second behavior (ignoring both).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants