Thus far in the book, we’ve discussed entirely new language syntax, such as property value shorthands, arrow functions, destructuring, or generators; and entirely new built-ins, such as WeakMap
, Proxy
, or Symbol
. This chapter, on the other hand, is mostly devoted to existing built-ins that were improved in ES6. These improvements consist mostly of new instance methods, properties, and utility methods.
ES6 introduces numeric literal representations for binary and octal numbers.
Before ES6, your best bet when it comes to binary representation of integers was to just pass them to parseInt
with a radix of 2
.
parseInt('101', 2)
// <- 5
You can now use the new 0b
prefix to represent binary integer literals. You could also use the 0B
prefix, with a capital B
. The two notations are equivalent.
console.log(0b000) // <- 0
console.log(0b001) // <- 1
console.log(0b010) // <- 2
console.log(0b011) // <- 3
console.log(0b100) // <- 4
console.log(0b101) // <- 5
console.log(0b110) // <- 6
console.log(0b111) // <- 7
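Binary literals tend to be most readable when each bit carries a meaning of its own. The following is a minimal sketch of that idea, using hypothetical READ, WRITE, and EXECUTE flags and a hasFlag helper that are made up for illustration.

```javascript
// Hypothetical permission flags, one bit each, written as binary literals
const READ = 0b001
const WRITE = 0b010
const EXECUTE = 0b100

// Combine flags with a bitwise OR, resulting in 0b011
const permissions = READ | WRITE

// Illustrative helper: a flag is set when all of its bits are present
function hasFlag(flags, flag) {
  return (flags & flag) === flag
}

console.log(hasFlag(permissions, WRITE)) // <- true
console.log(hasFlag(permissions, EXECUTE)) // <- false
console.log(permissions.toString(2)) // <- '11'
```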
In ES3, parseInt
interpreted strings of digits starting with a 0
as an octal value. That meant things got weird quickly when you forgot to specify a radix of 10
. As a result, specifying the radix of 10
became a best practice, so that user input like 012
wouldn’t unexpectedly be parsed as the integer 10
.
console.log(parseInt('01'))
// <- 1
console.log(parseInt('012'))
// <- 10
console.log(parseInt('012', 10))
// <- 12
When ES5 came around, the default radix in parseInt
changed, from 8
to 10
. It was still recommended you specified a radix
for backwards compatibility purposes. If you wanted to parse strings as octal values, you could explicitly pass in a radix of 8
as the second argument.
console.log(parseInt('100', 8))
// <- 64
You can now use the 0o
prefix for octal literals, which are new in ES6. You could also use 0O
, which is equivalent. Having a 0
followed by an uppercase O
may be hard to distinguish in some typefaces, which is why it is suggested that you stick with the lowercase 0o
notation.
console.log(0o001) // <- 1
console.log(0o010) // <- 8
console.log(0o100) // <- 64
You might be used to hexadecimal literals present in other languages, commonly prefixed with 0x
. Those were already part of the JavaScript language long before ES6. The prefix for literal hexadecimal notation is either 0x
, or 0X
, as shown in the following code snippet.
console.log(0x0ff) // <- 255
console.log(0xf00) // <- 3840
Besides these minor syntax changes where octal and binary literals were introduced, a few methods were added to Number
in ES6. The first four Number
methods that we’ll be discussing — Number.isNaN
, Number.isFinite
, Number.parseInt
, and Number.parseFloat
— already existed as functions in the global namespace. In addition, the methods in Number
are slightly different in that Number.isNaN and Number.isFinite don’t coerce non-numeric values into numbers before producing a result.
The Number.isNaN method is almost identical to the global isNaN
method. Number.isNaN
returns whether the provided value
is NaN
, whereas isNaN
returns whether value
is not a number. These two questions have slightly different answers.
The next snippet quickly shows that, when passed to Number.isNaN
, anything that’s not NaN
will return false
, while NaN
will produce true
. Note how in the last case we’re already passing NaN
to Number.isNaN
, as that’s the result of dividing two strings.
Number.isNaN(123)
// <- false, integers are not NaN
Number.isNaN(Infinity)
// <- false, Infinity is not NaN
Number.isNaN('a hundred')
// <- false, 'a hundred' is not NaN
Number.isNaN(NaN)
// <- true, NaN is NaN
Number.isNaN('a hundred' / 'two')
// <- true, 'a hundred' / 'two' is NaN, NaN is NaN
The isNaN
method, in contrast, casts non-numeric values passed to it before evaluating them against NaN
. This results in significantly different return values. In the following example, each alternative produces different results because isNaN
, unlike Number.isNaN
, casts the value
passed to it through Number
first.
isNaN('a hundred')
// <- true, because Number('a hundred') is NaN
isNaN(new Date())
// <- false, because Number(new Date()) uses Date#valueOf, which returns a unix timestamp
Number.isNaN
is more precise than its global counterpart, because it doesn’t involve casting. There are still a few reasons why Number.isNaN
can be a source of confusion.
First off, isNaN
casts input through Number(value)
before comparison, while Number.isNaN
doesn’t. Neither Number.isNaN
nor isNaN
answers the "is this not a number?" question; instead, they tell us whether value
— or Number(value)
— is NaN
.
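Since Number.isNaN involves no casting, a ponyfill for it could be sketched by relying on the fact that NaN is the only JavaScript value that isn’t strictly equal to itself. The numberIsNaN name is our own, chosen for illustration.

```javascript
function numberIsNaN(value) {
  // NaN is the only value for which a strict self-comparison fails
  return value !== value
}

numberIsNaN(NaN)
// <- true
numberIsNaN('a hundred')
// <- false, strings are never NaN
numberIsNaN('a hundred' / 'two')
// <- true, the division already produced NaN
```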
In most cases, what you actually want to know is whether a value identifies as a number — typeof NaN === 'number'
— and is a number. The isNumber
function in the following code snippet does just that. Note that it’d work with both isNaN
and Number.isNaN
due to type checking. Everything that reports a typeof
value of 'number'
is a number, except for NaN
, so we filter those out as false positive results.
function isNumber(value) {
return typeof value === 'number' && !Number.isNaN(value)
}
You can use that method to figure out whether a value
is a number or not. In the next snippet there are a few examples of how isNumber
works.
isNumber(1)
// <- true
isNumber(Infinity)
// <- true
isNumber(NaN)
// <- false
isNumber('two')
// <- false
isNumber(new Date())
// <- false
There is a function, that was already in the language, that somewhat resembles our custom isNumber
function: isFinite
.
The rarely-promoted isFinite
method has been available since ES3. It returns a boolean value indicating whether the provided value
matches none of: Infinity
, -Infinity
, and NaN
.
The isFinite
method coerces values through Number(value)
, while Number.isFinite
doesn’t. This means that values that can be coerced into non-NaN
numbers will be considered finite numbers by isFinite
— even though they aren’t explicit numbers.
Here are a few examples using the global isFinite
function.
isFinite(NaN)
// <- false
isFinite(Infinity)
// <- false
isFinite(-Infinity)
// <- false
isFinite(null)
// <- true, because Number(null) is 0
isFinite(-13)
// <- true, because Number(-13) is -13
isFinite('10')
// <- true, because Number('10') is 10
Using Number.isFinite
is a safer bet, as it doesn’t incur unexpected casting. You could always use Number.isFinite(Number(value))
if you did want the value
to be cast into its numeric representation. Separating the two aspects, casting versus computing, results in more explicit code.
Here are a few examples using the Number.isFinite
method.
Number.isFinite(NaN)
// <- false
Number.isFinite(Infinity)
// <- false
Number.isFinite(-Infinity)
// <- false
Number.isFinite(null)
// <- false, because null is not a number
Number.isFinite(-13)
// <- true
Number.isFinite('10')
// <- false, because '10' is not a number
Creating a ponyfill[1] for Number.isFinite
would involve returning false
for non-numeric values, effectively turning off the type-casting feature, and then calling isFinite
on the input value.
function numberIsFinite(value) {
return typeof value === 'number' && isFinite(value)
}
The Number.parseInt
method works the same as parseInt
. It is, in fact, the same.
console.log(Number.parseInt === parseInt)
// <- true
The parseInt
function has support for hexadecimal literal notation in strings. Specifying the radix
is not even necessary: based on the 0x
prefix, parseInt
infers that the number must be base 16.
parseInt('0xf00')
// <- 3840
parseInt('0xf00', 16)
// <- 3840
If you provided another radix
, parseInt
would bail after the first non-digit character.
parseInt('0xf00', 10)
// <- 0
parseInt('5xf00', 10)
// <- 5, illustrating there's no special treatment here
While parseInt
accepts input in hexadecimal literal notation strings, its interface hasn’t changed in ES6. Therefore, binary and octal literal notation strings won’t be interpreted as such. This introduces a new inconsistency in ES6, where parseInt
understands 0x
, but not 0b
nor 0o
.
parseInt('0b011')
// <- 0
parseInt('0b011', 2)
// <- 0
parseInt('0o100')
// <- 0
parseInt('0o100', 8)
// <- 0
It’s up to you to drop the prefix before parseInt
, if you wanted to use parseInt
to read these literals. You’ll also need to specify the corresponding radix
of 2 for binary numbers or 8 for octals.
parseInt('0b011'.slice(2), 2)
// <- 3
parseInt('0o110'.slice(2), 8)
// <- 72
In contrast, the Number
function is perfectly able to cast these strings into the correct numbers.
Number('0b011')
// <- 3
Number('0o110')
// <- 72
Like parseInt
, parseFloat
was added to Number
without any modifications whatsoever.
console.log(Number.parseFloat === parseFloat)
// <- true
Luckily, parseFloat
didn’t have any special behavior with regard to hexadecimal literal strings, meaning that Number.parseFloat
is unlikely to introduce any confusion.
The parseFloat
function was added to Number
for completeness. In future versions of the language, there will be less global namespace pollution. When a function serves a specific purpose, it’ll be added to the relevant built-in, rather than as a global.
Number.isInteger is a new method in ES6, and it wasn’t previously available as a global function. The isInteger
method returns true
if the provided value
is a finite number that doesn’t have a decimal part.
console.log(Number.isInteger(Infinity)) // <- false
console.log(Number.isInteger(-Infinity)) // <- false
console.log(Number.isInteger(NaN)) // <- false
console.log(Number.isInteger(null)) // <- false
console.log(Number.isInteger(0)) // <- true
console.log(Number.isInteger(-10)) // <- true
console.log(Number.isInteger(10.3)) // <- false
You might want to consider the following code snippet as a ponyfill for Number.isInteger
. The modulus operator returns the remainder of dividing the same operands. If we divide by one, we’re effectively getting the decimal part. If that’s 0
, then it means the number is an integer.
function numberIsInteger(value) {
return Number.isFinite(value) && value % 1 === 0
}
Next up we’ll dive into floating point arithmetic, which is well-documented as having interesting corner cases.
The EPSILON
property is a new constant value being added to the Number
built-in. The following snippet shows its value.
Number.EPSILON
// <- 2.220446049250313e-16
Number.EPSILON.toFixed(20)
// <- '0.00000000000000022204'
Let’s take a look at the canonical example of floating point arithmetic.
0.1 + 0.2
// <- 0.30000000000000004
0.1 + 0.2 === 0.3
// <- false
What’s the margin of error in this operation? Let’s move the operands around and find out.
0.1 + 0.2 - 0.3
// <- 5.551115123125783e-17
5.551115123125783e-17.toFixed(20)
// <- '0.00000000000000005551'
We could use Number.EPSILON
to figure out whether the difference is small enough to be negligible. Number.EPSILON
denotes a safe margin of error for floating point arithmetic rounding operations.
5.551115123125783e-17 < Number.EPSILON
// <- true
The following piece of code can be used to figure out whether the result of a floating point operation is within the expected margin of error. We use Math.abs
, because that way the order of left
and right
won’t matter. In other words, withinMarginOfError(left, right)
will produce the same result as withinMarginOfError(right, left)
.
function withinMarginOfError(left, right) {
return Math.abs(left - right) < Number.EPSILON
}
The next snippet shows withinMarginOfError
in action.
withinMarginOfError(0.1 + 0.2, 0.3)
// <- true
withinMarginOfError(0.2 + 0.2, 0.3)
// <- false
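Keep in mind that Number.EPSILON is the difference between 1 and the next representable number, so a fixed margin like the one above is best suited to values around that magnitude. As a hedged variation, the following sketch scales the margin by the size of the operands. The withinRelativeMargin name and the choice of scaling factor are assumptions made for illustration, not something prescribed by the language.

```javascript
function withinRelativeMargin(left, right) {
  // Never scale below 1, so small values behave like the fixed-margin check
  const scale = Math.max(1, Math.abs(left), Math.abs(right))
  return Math.abs(left - right) <= Number.EPSILON * scale
}

withinRelativeMargin(0.1 + 0.2, 0.3)
// <- true
withinRelativeMargin(0.2 + 0.2, 0.3)
// <- false
```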
Using floating point representation, not every integer can be represented precisely.
Number.MAX_SAFE_INTEGER is the largest integer that can be safely and precisely represented in JavaScript, or in any language that represents integers using floating point as specified by the IEEE-754 standard[2], for that matter. The next bit of code shows exactly how large Number.MAX_SAFE_INTEGER
is.
Number.MAX_SAFE_INTEGER === Math.pow(2, 53) - 1
// <- true
Number.MAX_SAFE_INTEGER === 9007199254740991
// <- true
As you might expect, there’s also the opposite constant: the minimum. It’s the negative value of Number.MAX_SAFE_INTEGER
.
Number.MIN_SAFE_INTEGER === -Number.MAX_SAFE_INTEGER
// <- true
Number.MIN_SAFE_INTEGER === -9007199254740991
// <- true
Floating point arithmetic becomes unreliable beyond the [MIN_SAFE_INTEGER, MAX_SAFE_INTEGER]
range. The 1 === 2
statement evaluates to false
, because these are different values. If we add Number.MAX_SAFE_INTEGER
to each operand, however, it’d seem 1 === 2
is indeed true.
1 === 2
// <- false
Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2
// <- true
Number.MIN_SAFE_INTEGER - 1 === Number.MIN_SAFE_INTEGER - 2
// <- true
When it comes to checking whether an integer is safe, a Number.isSafeInteger
function has been added to the language.
The Number.isSafeInteger method returns true
for any integer in the [MIN_SAFE_INTEGER, MAX_SAFE_INTEGER]
range. Like with other Number
methods introduced in ES6, there’s no type coercion involved. The input must be numeric, an integer, and within the aforementioned bounds in order for the method to return true
. The next snippet shows a comprehensive set of inputs and outputs.
Number.isSafeInteger('one') // <- false
Number.isSafeInteger('0') // <- false
Number.isSafeInteger(null) // <- false
Number.isSafeInteger(NaN) // <- false
Number.isSafeInteger(Infinity) // <- false
Number.isSafeInteger(-Infinity) // <- false
Number.isSafeInteger(Number.MIN_SAFE_INTEGER - 1) // <- false
Number.isSafeInteger(Number.MIN_SAFE_INTEGER) // <- true
Number.isSafeInteger(1) // <- true
Number.isSafeInteger(1.2) // <- false
Number.isSafeInteger(Number.MAX_SAFE_INTEGER) // <- true
Number.isSafeInteger(Number.MAX_SAFE_INTEGER + 1) // <- false
When we want to verify if the result of an operation is within bounds, we must verify not only the result but also both operands[3]. One — or both — of the operands may be out of bounds, while the result is within bounds but incorrect. Similarly, the result may be out of bounds even if both operands are within bounds. Checking all of left
, right
, and the result of left op right
is, thus, necessary to verify that we can indeed trust the result.
In the following example both operands are within bounds, but the result is incorrect.
Number.isSafeInteger(9007199254740000)
// <- true
Number.isSafeInteger(993)
// <- true
Number.isSafeInteger(9007199254740000 + 993)
// <- false
9007199254740000 + 993
// <- 9007199254740992, should be 9007199254740993
Certain operations, such as the one in the following code snippet, may return correct results even when the result is out of bounds. The fact that correct results can’t be guaranteed, however, means that these operations can’t be trusted.
9007199254740000 + 994
// <- 9007199254740994
In the next example, one of the operands is out of bounds, and thus we can’t trust the result to be accurate.
Number.isSafeInteger(9007199254740993)
// <- false
Number.isSafeInteger(990)
// <- true
Number.isSafeInteger(9007199254740993 + 990)
// <- false
9007199254740993 + 990
// <- 9007199254741982, should be 9007199254741983
A subtraction in our last example would produce a result that is within bounds, but that result would also be inaccurate.
Number.isSafeInteger(9007199254740993)
// <- false
Number.isSafeInteger(990)
// <- true
Number.isSafeInteger(9007199254740993 - 990)
// <- true
9007199254740993 - 990
// <- 9007199254740002, should be 9007199254740003
If both operands are out of bounds, the output could end up in the safe space, even though the result is incorrect.
Number.isSafeInteger(9007199254740995)
// <- false
Number.isSafeInteger(9007199254740993)
// <- false
Number.isSafeInteger(9007199254740995 - 9007199254740993)
// <- true
9007199254740995 - 9007199254740993
// <- 4, should be 2
We can conclude that the only safe way to assert whether an operation produces correct output is with a utility function such as the one below. If we can’t ascertain that the operation and both operands are within bounds, then the result may be inaccurate, and that’s a problem. It’s best to throw
in those situations and have a way to error-correct, but that’s specific to your programs. The important part is to actually catch these kinds of bugs, which are difficult to deal with.
function safeOp(result, ...operands) {
const values = [result, ...operands]
if (!values.every(Number.isSafeInteger)) {
throw new RangeError('Operation cannot be trusted!')
}
return result
}
You could use safeOp
to ensure all operands, as well as the result,
are safely within bounds.
safeOp(9007199254740000 + 993, 9007199254740000, 993)
// <- RangeError: Operation cannot be trusted!
safeOp(9007199254740993 + 990, 9007199254740993, 990)
// <- RangeError: Operation cannot be trusted!
safeOp(9007199254740993 - 990, 9007199254740993, 990)
// <- RangeError: Operation cannot be trusted!
safeOp(9007199254740993 - 9007199254740995, 9007199254740993, 9007199254740995)
// <- RangeError: Operation cannot be trusted!
safeOp(1 + 2, 1, 2)
// <- 3
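Passing the operands twice gets repetitive. One possible convenience, sketched below, is to wrap each operation in a small helper. The safeAdd function is hypothetical, and safeOp is repeated here only so that the snippet runs on its own.

```javascript
function safeOp(result, ...operands) {
  const values = [result, ...operands]
  if (!values.every(Number.isSafeInteger)) {
    throw new RangeError('Operation cannot be trusted!')
  }
  return result
}

// Hypothetical helper: performs the addition and the bounds check in one call
function safeAdd(left, right) {
  return safeOp(left + right, left, right)
}

safeAdd(1, 2)
// <- 3

try {
  safeAdd(9007199254740000, 993)
} catch (err) {
  console.log(err instanceof RangeError)
  // <- true
}
```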
That’s all there is when it comes to Number
, but we’re not done with arithmetic-related improvements quite yet. Let’s turn our attention to the Math
built-in.
ES6 introduces heaps of new static methods to the Math
built-in. Some of them were specifically engineered towards making it easier to compile C into JavaScript, and you’ll seldom need them for day-to-day JavaScript application development. Others are complements to the existing rounding, exponentiation, and trigonometry API surface.
Let’s get right to it.
Many languages have a mathematical sign
method that returns a vector (-1
, 0
, or 1
) representation for the sign of the provided input. JavaScript’s Math.sign
method does exactly that. However, the JavaScript flavor of this method has two more possible return values: -0
, and NaN
. Check out the examples in the following code snippet.
Math.sign(1) // <- 1
Math.sign(0) // <- 0
Math.sign(-0) // <- -0
Math.sign(-30) // <- -1
Math.sign(NaN) // <- NaN
Math.sign('one') // <- NaN, because Number('one') is NaN
Math.sign('0') // <- 0, because Number('0') is 0
Math.sign('7') // <- 1, because Number('7') is 7
Note how Math.sign
casts its input into numeric values? While methods introduced to the Number
built-in don’t cast their input via Number(value)
, most of the methods added to Math
share this trait, as we shall see.
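To make the casting step and the extra return values explicit, a ponyfill for Math.sign could be sketched roughly as follows. The mathSign name is ours, and this is an approximation of the behavior shown above rather than the specification’s algorithm.

```javascript
function mathSign(value) {
  const number = Number(value) // the same coercion Math.sign applies
  if (number === 0 || Number.isNaN(number)) {
    return number // hands back 0, -0, or NaN untouched
  }
  return number > 0 ? 1 : -1
}

mathSign(-30) // <- -1
mathSign(-0) // <- -0
mathSign('one') // <- NaN
mathSign('7') // <- 1
```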
We already had Math.floor
and Math.ceil
in JavaScript, with which we can round a number down or up, respectively. Now we also have Math.trunc
as an alternative, which discards the decimal part without any rounding. Here, too, the input is coerced into a numeric value through Number(value)
.
Math.trunc(12.34567) // <- 12
Math.trunc(-13.58) // <- -13
Math.trunc(-0.1234) // <- -0
Math.trunc(NaN) // <- NaN
Math.trunc('one') // <- NaN, because Number('one') is NaN
Math.trunc('123.456') // <- 123, because Number('123.456') is 123.456
Creating a simple ponyfill for Math.trunc
would involve checking whether the value is greater than zero and applying one of Math.floor
or Math.ceil
, as shown in the following code snippet.
function mathTrunc(value) {
return value > 0 ? Math.floor(value) : Math.ceil(value)
}
The Math.cbrt
method is short for "cube root", similarly to how Math.sqrt
is short for "square root". The following snippet has a few usage examples.
Math.cbrt(-1) // <- -1
Math.cbrt(3) // <- 1.4422495703074083
Math.cbrt(8) // <- 2
Math.cbrt(27) // <- 3
Note that this method also coerces non-numerical values into numbers.
Math.cbrt('8') // <- 2, because Number('8') is 8
Math.cbrt('one') // <- NaN, because Number('one') is NaN
Let’s move on.
The Math.expm1 operation is the result of computing e
to the value
minus 1
. In JavaScript, the e
constant is defined as Math.E
. The function in the following snippet is a rough equivalent of Math.expm1
.
function expm1(value) {
return Math.pow(Math.E, value) - 1
}
The e^value operation can be expressed as Math.exp(value)
as well.
function expm1(value) {
return Math.exp(value) - 1
}
Note that Math.expm1
has higher precision than merely doing Math.exp(value) - 1
, and should be the preferred alternative.
expm1(1e-20)
// <- 0
Math.expm1(1e-20)
// <- 1e-20
expm1(1e-10)
// <- 1.000000082740371e-10
Math.expm1(1e-10)
// <- 1.00000000005e-10
The inverse function of Math.expm1
is Math.log1p
.
Math.log1p computes the natural logarithm of value
plus 1
, that is ln(value + 1), and it is the inverse function of Math.expm1
. The base e
logarithm of a number can be expressed as Math.log
in JavaScript.
function log1p(value) {
return Math.log(value + 1)
}
Just like with Math.expm1
, Math.log1p
method is more precise than executing the Math.log(value + 1)
operation by hand.
log1p(1.00000000005e-10)
// <- 1.000000082690371e-10
Math.log1p(1.00000000005e-10)
// <- 1e-10, exactly the inverse of Math.expm1(1e-10)
Math.log10 returns the base ten logarithm of a number, log₁₀(value).
Math.log10(1000)
// <- 3
You could ponyfill Math.log10
using the Math.LN10
constant.
function mathLog10(value) {
return Math.log(value) / Math.LN10
}
And then there’s Math.log2
.
Math.log2 returns the base two logarithm of a number, log₂(value).
Math.log2(1024)
// <- 10
You could ponyfill Math.log2
using the Math.LN2
constant.
function mathLog2(value) {
return Math.log(value) / Math.LN2
}
Note that the ponyfill version won’t be as precise as Math.log2
, as demonstrated in the following example.
Math.log2(1 << 29) // native implementation
// <- 29
mathLog2(1 << 29) // ponyfill implementation
// <- 29.000000000000004
The <<
operator performs a "bitwise left shift"[4]. In this operation, the bits on the binary representation of the left hand side number are shifted as many places to the left as indicated in the right hand side of the operation. The following couple of examples show how shifting works, using the binary literal notation introduced in Binary and Octal Literals.
0b00000001 // 1
0b00000001 << 2 // shift bits two places to the left
0b00000100 // 4
0b00001101 // 13
0b00001101 << 4 // shift bits four places to the left
0b11010000 // 208
The Math object is getting hyperbolic trigonometric functions in ES6.
- Math.sinh(value) returns the hyperbolic sine of value
- Math.cosh(value) returns the hyperbolic cosine of value
- Math.tanh(value) returns the hyperbolic tangent of value
- Math.asinh(value) returns the hyperbolic arc-sine of value
- Math.acosh(value) returns the hyperbolic arc-cosine of value
- Math.atanh(value) returns the hyperbolic arc-tangent of value
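The first three functions in the list can be expressed in terms of Math.exp, using their textbook definitions. The following is a rough sketch for intuition only: the sinh, cosh, and tanh names are ours, and the native methods should be preferred, as this naive version loses precision near zero and breaks down for large inputs.

```javascript
function sinh(value) {
  return (Math.exp(value) - Math.exp(-value)) / 2
}

function cosh(value) {
  return (Math.exp(value) + Math.exp(-value)) / 2
}

function tanh(value) {
  return sinh(value) / cosh(value)
}

sinh(0) // <- 0
cosh(0) // <- 1
tanh(0) // <- 0
```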
Using Math.hypot
returns the square root of the sum of the squares of every provided argument.
Math.hypot(1, 2, 3)
// <- 3.741657386773941, the square root of (1*1 + 2*2 + 3*3)
We could ponyfill Math.hypot
by performing these operations manually. We can use Math.sqrt
to compute the square root and Array#reduce
, combined with the spread operator, to sum the squares.[5]
function mathHypot(...values) {
const accumulateSquares = (total, value) => total + value * value
const squares = values.reduce(accumulateSquares, 0)
return Math.sqrt(squares)
}
Our handmade function is, surprisingly, more precise than the native one for this particular use case. In the next code sample, we see the hand-rolled hypot
function offers precision with one more decimal place.
Math.hypot(1, 2, 3) // native implementation
// <- 3.741657386773941
mathHypot(1, 2, 3) // ponyfill implementation
// <- 3.7416573867739413
At the beginning of Math, we talked about how some of the new Math
methods are specifically engineered towards making it easier to compile C into JavaScript. Those are the last three methods we’ll cover, and they help us deal with 32-bit numbers.
The name for this method is an acronym for "count leading zero bits in 32-bit binary representations of a number". Keeping in mind that the <<
operator performs a "bitwise left shift"[4], let’s take a look at the next code snippet describing sample input and output for Math.clz32
.
Math.clz32(0) // <- 32
Math.clz32(1) // <- 31
Math.clz32(1 << 1) // <- 30
Math.clz32(1 << 2) // <- 29
Math.clz32(1 << 29) // <- 2
Math.clz32(1 << 31) // <- 0
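One way to build intuition for these results is to count the digits in the 32-bit binary representation by hand. The clz32 function below is an illustrative sketch, not how engines implement the method. It relies on the >>> operator to convert the input into an unsigned 32-bit integer.

```javascript
function clz32(value) {
  const unsigned = value >>> 0 // convert to an unsigned 32-bit integer
  if (unsigned === 0) {
    return 32 // every bit is a zero
  }
  // toString(2) omits leading zeros, so the missing digits are the leading zeros
  return 32 - unsigned.toString(2).length
}

clz32(0) // <- 32
clz32(1) // <- 31
clz32(1 << 29) // <- 2
clz32(1 << 31) // <- 0
```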
You may recall template literals from [template_literals], and how those can be used to mix strings and variables, or any valid JavaScript expression, to produce string output.
function greet(name) {
return `Hello, ${ name }!`
}
greet('Gandalf')
// <- 'Hello, Gandalf!'
Besides the template literal syntax, strings got a number of new methods in ES6. These can be categorized as string manipulation methods and Unicode related methods. Let’s start with the former.
Prior to ES6, whenever we wanted to check if a string begins with a certain other string, we’d use the String#indexOf
method, as shown in the following code snippet. A result of 0
means that the string starts with the provided value.
'hello gary'.indexOf('gary')
// <- 6
'hello gary'.indexOf('hello')
// <- 0
'hello gary'.indexOf('stephan')
// <- -1
If you wanted to check if a string started with another one, then, you’d compare them with String#indexOf
and check whether the lookup value is found at the beginning of the string: the 0
index.
'hello gary'.indexOf('gary') === 0
// <- false
'hello gary'.indexOf('hello') === 0
// <- true
'hello gary'.indexOf('stephan') === 0
// <- false
You can now use the String#startsWith
method instead, avoiding the unnecessary complexity of checking whether an index matches 0
.
'hello gary'.startsWith('gary')
// <- false
'hello gary'.startsWith('hello')
// <- true
'hello gary'.startsWith('stephan')
// <- false
In order to figure out whether a string contains a value starting at a specific index, using String#indexOf
, we would have to grab a slice of that string first.
'hello gary'.slice(6).indexOf('gary') === 0
// <- true
We can’t simply check whether the index is 6
, because this would give you false negatives when the queried value is found before reaching that index of 6
. The following example shows how, even when the query 'ell'
string is indeed at index 6
, merely comparing the String#indexOf
result with 6
is insufficient to attain a correct result.
'hello ell'.indexOf('ell') === 6
// <- false, because the result was 1
We could use the startIndex
parameter for indexOf
to get around this problem without relying on String#slice
. Note that we’re still comparing against 6
in this case, because the string wasn’t sliced up in a setup operation.
'hello ell'.indexOf('ell', 6) === 6
// <- true
Instead of keeping all of these string searching implementation details in your head and writing code that’s most concerned with how to search, as opposed to what is being searched, we could use String#startsWith
passing in the optional startIndex
parameter as well.
'hello ell'.startsWith('ell', 6)
// <- true
The String#endsWith method mirrors String#startsWith
in the same way that String#lastIndexOf
mirrors String#indexOf
. It tells us whether a string ends with another string.
'hello gary'.endsWith('gary')
// <- true
'hello gary'.endsWith('hello')
// <- false
As the opposite of String#startsWith
, there’s a position index that indicates where the lookup should end, instead of where it should start. It defaults to the length of the string.
'hello gary'.endsWith('gary', 10)
// <- true
'hello gary'.endsWith('gary', 9)
// <- false, it ends with 'gar' in this case
'hello gary'.endsWith('hell', 4)
// <- true
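To make the role of the position index concrete, here’s a sketch of how the same check could be written by hand with String#slice. The endsWith function and its parameter names are ours, and the sketch only covers string queries and numeric positions.

```javascript
function endsWith(text, query, end = text.length) {
  // Keep the end position within the bounds of the string
  const limit = Math.min(Math.max(end, 0), text.length)
  const start = limit - query.length
  return start >= 0 && text.slice(start, limit) === query
}

endsWith('hello gary', 'gary')
// <- true
endsWith('hello gary', 'gary', 9)
// <- false, it ends with 'gar' in this case
endsWith('hello gary', 'hell', 4)
// <- true
```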
String#includes
is one last method that can simplify a specific use case for String#indexOf
.
You can use String#includes
to figure out whether a string contains another one, as shown in the following piece of code.
'hello gary'.includes('hell')
// <- true
'hello gary'.includes('ga')
// <- true
'hello gary'.includes('rye')
// <- false
This is equivalent to the ES5 use case of String#indexOf
where we’d test the result against -1
, checking to see whether the search string was anywhere to be found, as demonstrated in the next code snippet.
'hello gary'.indexOf('ga') !== -1
// <- true
'hello gary'.indexOf('rye') !== -1
// <- false
You can also provide String#includes
with a start index where searching should begin.
'hello gary'.includes('ga', 4)
// <- true
'hello gary'.includes('ga', 7)
// <- false
Let’s move onto something that’s not just a String#indexOf
alternative.
The handy String#repeat method allows you to repeat a string count
times.
'ha'.repeat(1)
// <- 'ha'
'ha'.repeat(2)
// <- 'haha'
'ha'.repeat(5)
// <- 'hahahahaha'
'ha'.repeat(0)
// <- ''
The provided count
should be a non-negative finite number.
'ha'.repeat(Infinity)
// <- RangeError
'ha'.repeat(-1)
// <- RangeError
Decimal values are floored, discarding the fractional part.
'ha'.repeat(3.9)
// <- 'hahaha', count was floored to 3
Using NaN
is interpreted as a count
of 0
.
'ha'.repeat(NaN)
// <- ''
Non-numeric values are coerced into numbers.
'ha'.repeat('ha')
// <- '', because Number('ha') is NaN
'ha'.repeat('3')
// <- 'hahaha', because Number('3') is 3
Values in the (-1, 0)
range are rounded to -0
because count
is passed through ToInteger
, as documented by the specification[6]. That step in the specification dictates that count
be cast with a formula like the one in the next code snippet.
function ToInteger(number) {
return Math.floor(Math.abs(number)) * Math.sign(number)
}
The ToInteger
function translates any values in the (-1, 0)
range into -0
. As a result, when passed to String#repeat
, numbers in the (-1, 0)
range will be treated as zero, while numbers in the (-Infinity, -1]
range will result in an exception, as we learned earlier.
'na'.repeat(-0.1)
// <- '', because count was rounded to -0
'na'.repeat(-0.9)
// <- '', because count was rounded to -0
'na'.repeat(-0.9999)
// <- '', because count was rounded to -0
'na'.repeat(-1)
// <- Uncaught RangeError: Invalid count value
An example use case for String#repeat
may be the typical padding function. The indent
function in the next code snippet takes a multiline string and indents every line with as many spaces
as desired, using a default of two spaces.
function indent(text, spaces = 2) {
return text
.split('\n')
.map(line => ' '.repeat(spaces) + line)
.join('\n')
}
indent(`a
b
c`, 2)
// <- '  a\n  b\n  c'
At the time of this writing, there are two new string padding methods slated for publication in ES2017: String#padStart
and String#padEnd
. Using these methods, we wouldn’t have to implement something like indent
in the previous code snippet. When performing string manipulation, we often want to pad a string so that it’s formatted consistently with a style we have in mind. This can be useful when formatting numbers, currency, HTML, and in a variety of other cases usually involving monospaced text.
Using padStart
, we will specify the desired length for the target string and the padding string, which defaults to a single space character. If the original string is at least as long as the specified length, padStart
will result in a null operation, returning the original string unchanged.
In the following example, the desired length of a properly padded string is 5, and the original string already has a length of at least 5, so it’s returned unchanged.
'01.23'.padStart(5)
// <- '01.23'
In the next example, the original string has a length of 4, thus padStart
adds a single space at the beginning of the string, bringing the length to the desired value of 5.
'1.23'.padStart(5)
// <- ' 1.23'
The next example is just like the previous one, except it uses '0'
for padding instead of the default ' '
value.
'1.23'.padStart(5, '0')
// <- '01.23'
Note that padStart
will keep padding the string until the maximum length is reached.
'1.23'.padStart(7, '0')
// <- '0001.23'
However, if the padding string is too long, it may be truncated. The provided length is the maximum length of the padded string, except in the case where the original string is already larger than that.
'1.23'.padStart(7, 'abcdef')
// <- 'abc1.23'
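The repeat-then-truncate behavior can be sketched by hand using String#repeat and String#slice. The padStart function below is an illustrative approximation with names of our own choosing. It assumes that text and padString are strings and that targetLength is a number.

```javascript
function padStart(text, targetLength, padString = ' ') {
  const missing = targetLength - text.length
  if (missing <= 0 || padString === '') {
    return text // nothing to pad, or nothing to pad with
  }
  // Repeat the padding enough times to cover the gap, then cut off the excess
  const repeats = Math.ceil(missing / padString.length)
  return padString.repeat(repeats).slice(0, missing) + text
}

padStart('01.23', 5)
// <- '01.23'
padStart('1.23', 7, '0')
// <- '0001.23'
padStart('1.23', 7, 'abcdef')
// <- 'abc1.23'
```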
The padEnd
method has a similar API, but it adds the padding at the end of the original string, instead of at the beginning. The following snippet illustrates the difference.
'01.23'.padEnd(5) // <- '01.23'
'1.23'.padEnd(5) // <- '1.23 '
'1.23'.padEnd(5, '0') // <- '1.230'
'1.23'.padEnd(7, '0') // <- '1.23000'
'1.23'.padEnd(7, 'abcdef') // <- '1.23abc'
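To see both methods working together, here is a minimal sketch that lines up labels and amounts for monospaced output. The formatRows helper and its sample data are made up for illustration; they aren't part of any proposal.

```javascript
// Hypothetical helper: turns [label, amount] rows into aligned monospaced lines.
function formatRows(rows) {
  // widest label, so that every label can be padded to the same length
  const labelWidth = Math.max(...rows.map(([label]) => label.length))
  const amounts = rows.map(([, amount]) => amount.toFixed(2))
  const amountWidth = Math.max(...amounts.map(text => text.length))
  return rows.map(([label], i) =>
    // labels are padded at the end, amounts at the start
    `${ label.padEnd(labelWidth, '.') } ${ amounts[i].padStart(amountWidth) }`
  )
}

const lines = formatRows([['Coffee', 3.5], ['Sandwich', 12], ['Tip', 0.75]])
console.log(lines.join('\n'))
// <- 'Coffee..  3.50'
// <- 'Sandwich 12.00'
// <- 'Tip.....  0.75'
```

Padding labels at the end and numbers at the start is what keeps the decimal points in a single column.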
At the time of this writing, there’s a proposal for string trimming in stage 2, containing the String#trimStart
and String#trimEnd
methods. Using trimStart
removes any whitespace from the beginning of a string, while using trimEnd
removes any whitespace from the end of a string.
' this should be left-aligned '.trimStart()
// <- 'this should be left-aligned '
' this should be right-aligned '.trimEnd()
// <- ' this should be right-aligned'
Let’s switch protocols and learn about Unicode.
JavaScript strings are represented using UTF-16 code units[7]. Each code unit can be used to represent a code point in the [U+0000, U+FFFF] range, also known as the BMP, short for Basic Multilingual Plane. You can represent individual code points in the BMP using the '\u3456' syntax. You could also represent code units in the [U+0000, U+00FF] range using the \x00..\xff notation. For instance, '\xbb' represents '»', the U+00BB code point, as you can also verify by doing String.fromCharCode(0xbb).
For code points beyond U+FFFF, you’d represent them as a surrogate pair. That is to say, two contiguous code units. For instance, the horse emoji '🐎' code point is represented with the '\ud83d\udc0e' contiguous code units. In ES6 notation you can also represent code points using the '\u{1f40e}' notation (that example is also the horse emoji).
Note that the internal representation hasn’t changed, so there are still two code units behind that single code point. In fact, '\u{1f40e}'.length evaluates to 2, one for each code unit.
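If you're curious about where those two code units come from, the following sketch applies the standard UTF-16 arithmetic to a code point beyond the BMP. The toSurrogatePair name is just something we came up with for this example; the language does this work for you.

```javascript
// Hypothetical helper: computes the UTF-16 surrogate pair for a code point above U+FFFF.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000 // 20 bits are left after subtracting
  const high = 0xd800 + (offset >> 10) // top 10 bits go into the high surrogate
  const low = 0xdc00 + (offset & 0x3ff) // bottom 10 bits go into the low surrogate
  return [high, low]
}

const [high, low] = toSurrogatePair(0x1f40e)
console.log(high.toString(16), low.toString(16))
// <- 'd83d' 'dc0e'
console.log(String.fromCharCode(high, low) === '\u{1f40e}')
// <- true
```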
The '\ud83d\udc0e\ud83d\udc71\u2764'
string, found in the next code snippet, evaluates to a few emoji.
'\ud83d\udc0e\ud83d\udc71\u2764'
// <- '🐎👱❤'
While that string consists of 5 code units, we know that the length should really be 3, as there are only 3 emoji.
'\ud83d\udc0e\ud83d\udc71\u2764'.length
// <- 5
'🐎👱❤'.length
// <- 5, still
Counting code points before ES6 was tricky, as the language didn’t make an effort to help in the Unicode department. Take for instance Object.keys, as seen in the following code snippet. It returns five keys for our 3-emoji string, because those 3 code points use 5 code units in total.
Object.keys('🐎👱❤')
// <- ['0', '1', '2', '3', '4']
If we now consider a for
loop, we can observe more clearly how this is a problem. In the following example, we wanted to extract each individual emoji from the text
string, but we got each code unit instead of the code points they form.
const text = '🐎👱❤'
for (let i = 0; i < text.length; i++) {
  console.log(text[i])
  // <- '\ud83d'
  // <- '\udc0e'
  // <- '\ud83d'
  // <- '\udc71'
  // <- '\u2764'
}
Luckily for us, in ES6 strings adhere to the iterable protocol. We can use the string iterator to go over code points, even when those code points are made of surrogate pairs.
Given the problems with looping by code units, the iterables produced by the string iterator yield code points instead.
for (const codePoint of '🐎👱❤') {
  console.log(codePoint)
  // <- '🐎'
  // <- '👱'
  // <- '❤'
}
Measuring the length of a string in terms of code points, as we saw earlier, is impossible with String#length
, because it counts code units instead. We can, however, use an iterator to split the string into its code points, like we did in the for..of
example.
We could use the spread operator, which relies on the iterator protocol, to split a string into an array made up of its constituent code points and then pull that array’s length, getting the correct code point count, as seen next.
[...'🐎👱❤'].length
// <- 3
Keep in mind that splitting strings into code points isn’t enough if you want to be 100% precise about string length. Take for instance the combining overline Unicode code point, represented with \u0305. On its own, this code point is just an overline, as shown below.
'\u0305'
// <- ' ̅'
When preceded by another code point, however, the two are combined into a single glyph.
function overlined(text) {
  return `${ text }\u0305`
}
overlined('o')
// <- 'o̅'
'hello world'.split('').map(overlined).join('')
// <- 'h̅e̅l̅l̅o̅ ̅w̅o̅r̅l̅d̅'
Attempts to naively figure out the actual length by counting code points prove insufficient, just like when using String#length to count code points, as shown next.
'o̅'.length
// <- 2
[...'o̅'].length
// <- 2, should be 1
[...'h̅e̅l̅l̅o̅ ̅w̅o̅r̅l̅d̅'].length
// <- 22, should be 11
[...'h̅e̅l̅l̅o̅ world'].length
// <- 16, should be 11
As Unicode expert Mathias Bynens points out, splitting by code points isn’t enough. Unlike surrogate pairs like the emojis we’ve used in our earlier examples, other grapheme clusters aren’t taken into account by the string iterator[8]. In those cases we’re out of luck, and have to fall back to regular expressions or utility libraries to correctly calculate string length.
Multiple code points that combine into a single visual glyph are getting more common.[9] There is a new proposal in the works (currently in stage 2) that may settle the matter of iterating over grapheme clusters once and for all. It introduces an Intl.Segmenter
built-in, which can be used to split a string into an iterable sequence.
To use the Segmenter
API, we start by creating an instance of Intl.Segmenter
specifying a locale and the granularity level we want: per grapheme, word, sentence or line. The segmenter instance can be used to produce an iterator for any given string, splitting it by the specified granularity
. Note that the segmenting algorithm may vary depending on the locale, which is why it is a part of the API.
The following example defines a getGraphemes
function which produces an array of grapheme clusters for any given locale and piece of text.
function getGraphemes(locale, text) {
const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' })
const sequence = segmenter.segment(text)
const graphemes = [...sequence].map(item => item.segment)
return graphemes
}
getGraphemes('es', 'Esto está bien bueno!')
Using the segmenter proposal, we wouldn’t have any trouble splitting strings containing emoji or combining code points into grapheme clusters. You can learn more about the Segmenter proposal at: https://mjavascript.com/out/segmenter.
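As a rough sketch of how that could look, the following snippet counts grapheme clusters instead of code units or code points. It assumes a runtime that already ships Intl.Segmenter, and the countGraphemes helper is a name of our own choosing.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
// Assumes Intl.Segmenter is available in the runtime.
function countGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' })
  return [...segmenter.segment(text)].length
}

const overlinedLetter = 'o\u0305' // an 'o' followed by a combining overline
console.log(overlinedLetter.length)
// <- 2
console.log([...overlinedLetter].length)
// <- 2
console.log(countGraphemes(overlinedLetter))
// <- 1
```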
Let’s look at more Unicode-related methods introduced in ES6.
We can use String#codePointAt to get the numeric representation of a code point at a given position in a string. Note that the expected start position is indexed by code unit, not by code point. In the example below we print the code points for each of the three emoji in our demo '🐎👱❤' string.
const text = '\ud83d\udc0e\ud83d\udc71\u2764'
text.codePointAt(0)
// <- 0x1f40e
text.codePointAt(2)
// <- 0x1f471
text.codePointAt(4)
// <- 0x2764
Identifying the indices that need to be provided to String#codePointAt
may prove cumbersome, which is why you should instead loop through a string iterator that can identify them on your behalf. You can then call .codePointAt(0)
for each code point in the sequence, and 0
will always be the correct start index.
const text = '\ud83d\udc0e\ud83d\udc71\u2764'
for (const codePoint of text) {
console.log(codePoint.codePointAt(0))
// <- 0x1f40e
// <- 0x1f471
// <- 0x2764
}
We could also reduce our example to a single line of code by using a combination of the spread operator and Array#map
.
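const text = '\ud83d\udc0e\ud83d\udc71\u2764'
// the leading semicolon keeps the array literal from being parsed as a property access on the previous line
;[...text].map(cp => cp.codePointAt(0))
// <- [0x1f40e, 0x1f471, 0x2764]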
You can take the base-16 representation of those code points and use them to create a string with the new Unicode code point escape syntax of \u{codePoint}. This syntax allows you to represent Unicode code points that are beyond the BMP, that is, code points outside the [U+0000, U+FFFF] range that the \u1234 syntax is limited to.
Let’s start by updating our example to print the hexadecimal version of our code points.
const text = '\ud83d\udc0e\ud83d\udc71\u2764'
;[...text].map(cp => cp.codePointAt(0).toString(16))
// <- ['1f40e', '1f471', '2764']
We could wrap those base-16 values in '\u{codePoint}' and voilà: you’d get the emoji values once again.
'\u{1f40e}'
// <- '🐎'
'\u{1f471}'
// <- '👱'
'\u{2764}'
// <- '❤'
The String.fromCodePoint method takes in a number and returns the corresponding code point as a string. Note how I can use the 0x prefix with the terse base-16 code points we got from String#codePointAt moments ago.
String.fromCodePoint(0x1f40e)
// <- '🐎'
String.fromCodePoint(0x1f471)
// <- '👱'
String.fromCodePoint(0x2764)
// <- '❤'
You can just as well use plain base-10 literals and achieve the same results.
String.fromCodePoint(128014)
// <- '🐎'
String.fromCodePoint(128113)
// <- '👱'
String.fromCodePoint(10084)
// <- '❤'
You can pass in as many code points as you’d like to String.fromCodePoint.
String.fromCodePoint(0x1f40e, 0x1f471, 0x2764)
// <- '🐎👱❤'
As an exercise in futility, we could map a string to the numeric representation of its code points, and back to the code points themselves.
const text = '\ud83d\udc0e\ud83d\udc71\u2764'
;[...text]
  .map(cp => cp.codePointAt(0))
  .map(cp => String.fromCodePoint(cp))
  .join('')
// <- '🐎👱❤'
Reversing a string has potential to cause issues as well.
Consider the following piece of code.
const text = '\ud83d\udc0e\ud83d\udc71\u2764'
text.split('').map(cp => cp.codePointAt(0))
// <- [55357, 56334, 55357, 56433, 10084]
text.split('').reverse().map(cp => cp.codePointAt(0))
// <- [10084, 56433, 55357, 56334, 55357]
The problem is that we’re reversing individual code units, while we’d have to reverse code points for a correct solution. If, instead, we were to use the spread operator to split the string by its code points, and then reversed that, the code points would be preserved and the string would be properly reversed.
const text = '\ud83d\udc0e\ud83d\udc71\u2764'
;[...text].reverse().join('')
// <- '❤👱🐎'
This way we avoid breaking up code points. Once again, keep in mind that this won’t work for all grapheme clusters[8].
[...'hello\u0305'].reverse().join('')
// <- ` ̅olleh`
The last Unicode-related method we’ll be addressing is .normalize
.
There are different ways of representing strings that look identical to humans even though their code points differ. Consider the following example, where two seemingly identical strings aren’t deemed equal by any JavaScript runtime.[8]
'mañana' === 'mañana'
// <- false
What’s going on here? We have an ñ in the version on the left, while the version on the right has an n followed by a combining tilde character (U+0303). The two are visually identical, but if we take a look at the code points, we’ll notice they’re different.
[...'mañana'].map(cp => cp.codePointAt(0).toString(16))
// <- ['6d', '61', 'f1', '61', '6e', '61']
[...'mañana'].map(cp => cp.codePointAt(0).toString(16))
// <- ['6d', '61', '6e', '303', '61', '6e', '61']
Just like with the 'hello̅'
examples, the second string has a length of 7
, even though visually it is also 6
glyphs long.
[...'mañana'].length
// <- 6
[...'mañana'].length
// <- 7
If we normalize the second version, using String#normalize
, we’ll get back the same code points we had in the first version.
const normalized = 'mañana'.normalize()
;[...normalized].map(cp => cp.codePointAt(0).toString(16))
// <- ['6d', '61', 'f1', '61', '6e', '61']
normalized.length
// <- 6
Note that we should use String#normalize
on both strings when comparing them if we want to test for equality.
function compare(left, right) {
return left.normalize() === right.normalize()
}
const normal = 'mañana'
const irregular = 'mañana'
normal === irregular
// <- false
compare(normal, irregular)
// <- true
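When called without arguments, String#normalize uses the NFC form, which composes code points where possible. The next sketch spells out both variants of our example using escapes, so that there's no doubt about which code points are involved, and it also goes in the opposite direction using the NFD form. The variable names are ours.

```javascript
const composed = 'ma\u00f1ana' // ñ as a single code point
const decomposed = 'man\u0303ana' // n followed by a combining tilde

console.log(composed === decomposed)
// <- false
console.log(decomposed.normalize('NFC') === composed)
// <- true
console.log(composed.normalize('NFD') === decomposed)
// <- true
console.log(composed.normalize('NFD').length)
// <- 7
```

Which form you pick matters less than applying the same form to both sides of the comparison.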
In this section we’ll take a look at regular expressions in and after ES6. There are a couple of regular expression flags that were introduced in ES6: the /y or sticky flag, and the /u or Unicode flag. Then we’ll discuss five proposals which are making their way through the ECMAScript specification development process at TC39.
The sticky matching y
flag introduced in ES6 is similar to the global g
flag. Like global regular expressions, sticky ones are typically used to match several times until the input string is exhausted. Sticky regular expressions move lastIndex
to the position after the last match, just like global regular expressions. The only difference is that a sticky regular expression must start matching where the previous match left off, unlike global regular expressions that move onto the rest of the input string when the regular expression goes unmatched at any given position.
The following example illustrates the difference between the two. Given an input string like 'haha haha haha'
and the /ha/
regular expression, the global flag will match every occurrence of 'ha'
, while the sticky flag will only match the first two, since the third occurrence doesn’t match starting at index 4
, but rather at index 5
.
function matcher(regex, input) {
return () => {
const match = regex.exec(input)
const lastIndex = regex.lastIndex
return { lastIndex, match }
}
}
const input = 'haha haha haha'
const nextGlobal = matcher(/ha/g, input)
console.log(nextGlobal()) // <- { lastIndex: 2, match: ['ha'] }
console.log(nextGlobal()) // <- { lastIndex: 4, match: ['ha'] }
console.log(nextGlobal()) // <- { lastIndex: 7, match: ['ha'] }
const nextSticky = matcher(/ha/y, input)
console.log(nextSticky()) // <- { lastIndex: 2, match: ['ha'] }
console.log(nextSticky()) // <- { lastIndex: 4, match: ['ha'] }
console.log(nextSticky()) // <- { lastIndex: 0, match: null }
We can verify that the sticky matcher would work if we forcefully moved lastIndex
with the next piece of code.
const rsticky = /ha/y
const nextSticky = matcher(rsticky, input)
console.log(nextSticky()) // <- { lastIndex: 2, match: ['ha'] }
console.log(nextSticky()) // <- { lastIndex: 4, match: ['ha'] }
rsticky.lastIndex = 5
console.log(nextSticky()) // <- { lastIndex: 7, match: ['ha'] }
Sticky matching was added to JavaScript as a way of improving the performance of lexical analyzers in compilers, which heavily rely on regular expressions.
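To make the lexer use case a bit more concrete, here's a minimal sketch of a tokenizer for arithmetic expressions. The token types and the tokenize function are invented for this example. At every position we try each sticky regular expression in turn, which guarantees that a token only matches exactly where the previous one ended.

```javascript
// Hypothetical token definitions for a tiny arithmetic lexer.
const tokenTypes = [
  ['number', /\d+/y],
  ['operator', /[+\-*\/]/y],
  ['space', /\s+/y]
]

function tokenize(input) {
  const tokens = []
  let position = 0
  while (position < input.length) {
    let matched = false
    for (const [type, regex] of tokenTypes) {
      regex.lastIndex = position // sticky matching starts exactly here
      const match = regex.exec(input)
      if (match !== null) {
        if (type !== 'space') {
          tokens.push({ type, value: match[0] })
        }
        position = regex.lastIndex
        matched = true
        break
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at index ${ position }`)
    }
  }
  return tokens
}

console.log(tokenize('12 + 34*5').map(token => token.value))
// <- ['12', '+', '34', '*', '5']
```

With the g flag instead, a regular expression would happily skip over unknown characters to find the next match, and we'd have to compare match.index against position ourselves.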
ES6 also introduced a u flag. The u stands for Unicode, but this flag can also be thought of as enabling a stricter version of regular expressions.
Without the u
flag, the following snippet has a regular expression containing an 'a'
character literal that was unnecessarily escaped.
/\a/.test('ab')
// <- true
Using an escape sequence for an unreserved character such as a
in a regular expression with the u
flag results in an error, as shown in the following bit of code.
/\a/u.test('ab')
// <- SyntaxError: Invalid regular expression: /\a/: Invalid escape
The following example attempts to embed the horse emoji in a regular expression by way of the \u{1f40e}
notation which ES6 introduced for strings like '\u{1f40e}'
, but the regular expression fails to match against the horse emoji. Without the u
flag, the \u{…}
pattern is interpreted as having an unnecessarily escaped u
character followed by the rest of the sequence.
/\u{1f40e}/.test('🐎')
// <- false
/\u{1f40e}/.test('u{1f40e}')
// <- true
The u
flag introduces support for Unicode code point escapes, like the \u{1f40e}
horse emoji, within regular expressions.
/\u{1f40e}/u.test('🐎')
// <- true
Without the u
flag, the .
pattern matches any BMP symbol except for line terminators. The following example tests U+1D11E MUSICAL SYMBOL G CLEF
, an astral symbol that doesn’t match the dot pattern in plain regular expressions.
const rdot = /^.$/
rdot.test('a') // <- true
rdot.test('\n') // <- false
rdot.test('\u{1d11e}') // <- false
When using the u
flag, Unicode symbols that aren’t on the BMP are matched as well. The next snippet shows how the astral symbol matches when the flag is set.
const rdot = /^.$/u
rdot.test('a') // <- true
rdot.test('\n') // <- false
rdot.test('\u{1d11e}') // <- true
When the u flag is set, similar Unicode awareness improvements can be found in quantifiers and character classes, both of which treat each Unicode code point as a single symbol, instead of matching on the first code unit only. Case-insensitive matching with the i flag performs Unicode case folding when the u flag is set as well, which is used to normalize code points in both the input string and the regular expression.[10]
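The following sketch puts those claims to the test using the horse emoji and the Kelvin sign, U+212A, which case-folds to a lowercase 'k'. Take it as an illustration with examples we picked ourselves, rather than an exhaustive list of differences.

```javascript
const horseEmoji = '\u{1f40e}'

// quantifiers count code units without the u flag, and code points with it
console.log(/^.{1}$/.test(horseEmoji))
// <- false
console.log(/^.{1}$/u.test(horseEmoji))
// <- true

// character classes follow the same rule
console.log(/^[^x]$/.test(horseEmoji))
// <- false
console.log(/^[^x]$/u.test(horseEmoji))
// <- true

// Unicode case folding is only applied when both the i and u flags are set
console.log(/\u212a/i.test('k'))
// <- false
console.log(/\u212a/iu.test('k'))
// <- true
```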
Up until now, JavaScript regular expressions could group matches in numbered capturing groups and non-capturing groups. In the next snippet we’re using a couple of groups to extract a key and value from an input string containing a key value pair delimited by '='
.
function parseKeyValuePair(input) {
const rattribute = /([a-z]+)=([a-z]+)/
const [, key, value] = rattribute.exec(input)
return { key, value }
}
parseKeyValuePair('strong=true')
// <- { key: 'strong', value: 'true' }
There are also non-capturing groups, which are discarded and not present in the final result, but are still useful for matching. The following example supports input with key value pairs delimited by ' is ' in addition to '='.
function parseKeyValuePair(input) {
const rattribute = /([a-z]+)(?:=|\sis\s)([a-z]+)/
const [, key, value] = rattribute.exec(input)
return { key, value }
}
parseKeyValuePair('strong is true') // <- { key: 'strong', value: 'true' }
parseKeyValuePair('flexible=too') // <- { key: 'flexible', value: 'too' }
While array destructuring in the previous example hid our code’s reliance on magic array indices, the fact remains that matches are placed in an ordered array regardless. The named capture groups proposal[11] (in stage 3 at the time of this writing) adds syntax like (?<groupName>)
to Unicode-aware regular expressions, where we can name capturing groups which are then returned in a groups
property of the returned match object. The groups
property can then be destructured from the resulting object when calling RegExp#exec
or String#match
.
function parseKeyValuePair(input) {
const rattribute = /(?<key>[a-z]+)(?:=|\sis\s)(?<value>[a-z]+)/u
const { groups } = rattribute.exec(input)
return groups
}
parseKeyValuePair('strong=true') // <- { key: 'strong', value: 'true' }
parseKeyValuePair('flexible=too') // <- { key: 'flexible', value: 'too' }
JavaScript regular expressions support backreferences, where captured groups can be reused to look for duplicates. The following snippet uses a backreference for the first capturing group to identify cases where a username is the same as a password in a piece of 'user:password'
input.
function hasSameUserAndPassword(input) {
const rduplicate = /([^:]+):\1/
return rduplicate.exec(input) !== null
}
hasSameUserAndPassword('root:root') // <- true
hasSameUserAndPassword('root:pF6GGlyPhoy1!9i') // <- false
The named capture groups proposal adds support for named backreferences, which refer back to named groups.
function hasSameUserAndPassword(input) {
const rduplicate = /(?<user>[^:]+):\k<user>/u
return rduplicate.exec(input) !== null
}
hasSameUserAndPassword('root:root') // <- true
hasSameUserAndPassword('root:pF6GGlyPhoy1!9i') // <- false
The \k<groupName>
reference can be used in tandem with numbered references, but the latter are better avoided when already using named references.
Lastly, named groups can be referenced from the replacement passed to String#replace
. In the next code snippet we use String#replace
and named groups to change an American date string to use Hungarian formatting.
function americanDateToHungarianFormat(input) {
const ramerican = /(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})/u
const hungarian = input.replace(ramerican, '$<year>-$<month>-$<day>')
return hungarian
}
americanDateToHungarianFormat('06/09/1988')
// <- '1988-06-09'
If the second argument to String#replace is a function, then the named groups can be accessed via a new parameter called groups that is at the end of the parameter list, after the offset of the match and the input string. The signature for that function now is (match, …captures, offset, input, groups). In the following example, note how we’re using a template literal that’s similar to the replacement string found in the last example. Because replacement strings follow a $<groupName> syntax as opposed to a `${ groupName }` syntax, we can reference named groups in replacement strings written as template literals without having to escape them.
function americanDateToHungarianFormat(input) {
  const ramerican = /(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})/u
  const hungarian = input.replace(ramerican, (match, capture1, capture2, capture3, offset, source, groups) => {
    const { month, day, year } = groups
    return `${ year }-${ month }-${ day }`
  })
  return hungarian
}
americanDateToHungarianFormat('06/09/1988') // <- '1988-06-09'
The proposed Unicode property escapes[12] (currently in stage 3) are a new kind of escape sequence that’s available in regular expressions marked with the u flag. This proposal adds an escape in the form of \p{LoneUnicodePropertyNameOrValue} for binary Unicode properties and \p{UnicodePropertyName=UnicodePropertyValue} for non-binary Unicode properties. In addition, \P is the negated version of a \p escape sequence.
The Unicode standard defines properties for every symbol. Armed with these properties, one may make advanced queries about Unicode characters. For example, symbols in the Greek alphabet have a Script property set to Greek. We could use the new escapes to match any Greek Unicode symbol.
function isGreekSymbol(input) {
const rgreek = /^\p{Script=Greek}$/u
return rgreek.test(input)
}
isGreekSymbol('π')
// <- true
Or, using \P, we could match non-Greek Unicode symbols.
function isNonGreekSymbol(input) {
const rgreek = /^\P{Script=Greek}$/u
return rgreek.test(input)
}
isNonGreekSymbol('π')
// <- false
When we need to match every Unicode decimal number symbol, and not just [0-9]
like \d
does, we could use \p{Decimal_Number}
as shown next.
function isDecimalNumber(input) {
const rdigits = /^\p{Decimal_Number}+$/u
return rdigits.test(input)
}
isDecimalNumber('𝟏𝟐𝟑𝟜𝟝𝟞𝟩𝟪𝟫𝟬𝟭𝟮𝟯𝟺𝟻𝟼')
// <- true
For an exhaustive overview of supported Unicode properties and values visit: https://mjavascript.com/out/unicode-property-list.
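Property escapes also come in handy for everyday text processing. As a small sketch, the next snippet extracts words from a phrase with accented letters, comparing \w against the Letter general category. The phrase is written with escapes so that we know for certain which code points it contains.

```javascript
const phrase = 'na\u00efve caf\u00e9' // 'naïve café'

// \w only knows about [A-Za-z0-9_], so accented letters break words apart
console.log(phrase.match(/\w+/g))
// <- ['na', 've', 'caf']

// \p{Letter} matches letters in any script
console.log(phrase.match(/\p{Letter}+/gu))
// <- ['naïve', 'café']
```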
JavaScript has had positive lookahead assertions for a long time. That feature allows us to match an expression but only if it’s followed by another expression. These assertions are expressed as (?=…)
. Regardless of whether a lookahead assertion matches, the results of that match are discarded and no characters of the input string are consumed.
The following example uses a positive lookahead to test whether an input string has a sequence of letters followed by .js
, in which case it returns the filename without the .js
part.
function getJavaScriptFilename(input) {
const rfile = /^(?<filename>[a-z]+)(?=\.js)\.[a-z]+$/u
const match = rfile.exec(input)
if (match === null) {
return null
}
return match.groups.filename
}
getJavaScriptFilename('index.js') // <- 'index'
getJavaScriptFilename('index.php') // <- null
There are also negative lookahead assertions, which are expressed as (?!…)
as opposed to (?=…)
for positive lookaheads. In this case, the assertion succeeds only if the lookahead expression isn’t matched. The next bit of code uses a negative lookahead and we can observe how the results are flipped: now any expression other than '.js'
results in a passed assertion.
function getNonJavaScriptFilename(input) {
const rfile = /^(?<filename>[a-z]+)(?!\.js)\.[a-z]+$/u
const match = rfile.exec(input)
if (match === null) {
return null
}
return match.groups.filename
}
getNonJavaScriptFilename('index.js') // <- null
getNonJavaScriptFilename('index.php') // <- 'index'
The proposal for lookbehind[13] (stage 3) introduces positive and negative lookbehind assertions, denoted with (?<=…) and (?<!…) respectively. These assertions can be used to ensure a pattern we want to match is or isn’t preceded by another given pattern. The following snippet uses a positive lookbehind to match the digits in dollar amounts, but not for amounts in euros.
function getDollarAmount(input) {
// no ^ anchor here: at the start of the input there's nothing for the lookbehind to find
const rdollars = /(?<=\$)(?<amount>\d+(?:\.\d+)?)$/u
const match = rdollars.exec(input)
if (match === null) {
return null
}
return match.groups.amount
}
getDollarAmount('$12.34') // <- '12.34'
getDollarAmount('€12.34') // <- null
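On the other hand, a negative lookbehind could be used to match numbers that aren’t preceded by a dollar sign. Note that the lookbehind in the following snippet also rules out digits and dots, so that a match can’t begin partway through a dollar amount.
function getNonDollarAmount(input) {
const rnumbers = /(?<![$\d.])(?<amount>\d+(?:\.\d+)?)$/u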
const match = rnumbers.exec(input)
if (match === null) {
return null
}
return match.groups.amount
}
getNonDollarAmount('$12.34') // <- null
getNonDollarAmount('€12.34') // <- '12.34'
When using the .
pattern, we typically expect to match every single character. In JavaScript, however, a .
expression doesn’t match astral characters (which can be fixed by adding the u
flag) nor line terminators.
const rcharacter = /^.$/
rcharacter.test('a') // <- true
rcharacter.test('\t') // <- true
rcharacter.test('\n') // <- false
This sometimes drives developers to write other kinds of expressions to synthesize a pattern that matches any character. The expression in the next bit of code matches any character that’s either a whitespace character or a non-whitespace character, delivering the behavior we’d expect from the .
pattern matcher.
const rcharacter = /^[\s\S]$/
rcharacter.test('a') // <- true
rcharacter.test('\t') // <- true
rcharacter.test('\n') // <- true
The dotAll
proposal[14] (stage 3) adds an s
flag which changes the behavior of .
in JavaScript regular expressions to match any single character.
const rcharacter = /^.$/s
rcharacter.test('a') // <- true
rcharacter.test('\t') // <- true
rcharacter.test('\n') // <- true
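Here's a slightly more realistic sketch, where we test for an HTML comment that spans more than one line. The markup is made up for the example. Note how the s flag can be combined with u when we also want . to match astral symbols.

```javascript
const markup = '<!-- first line\nsecond line -->'

console.log(/<!--.*-->/.test(markup))
// <- false, because . stops at the line terminator
console.log(/<!--.*-->/s.test(markup))
// <- true
console.log(/<!--.*-->/s.dotAll)
// <- true
console.log(/^.$/su.test('\u{1d11e}'))
// <- true
```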
Often, when we have a regular expression with a global or sticky flag, we want to iterate over the set of captured groups for each match. Currently, it can be a bit of a hassle to produce the list of matches: we need to collect the captured groups using String#match
or RegExp#exec
in a loop, until the regular expression doesn’t match the input starting at the lastIndex
position property. In the following piece of code, the parseAttributes
generator function does just that for a given regular expression.
function* parseAttributes(input) {
const rattributes = /(\w+)="([^"]+)"\s/ig
while (true) {
const match = rattributes.exec(input)
if (match === null) {
break
}
const [ , key, value] = match
yield [key, value]
}
}
const html = '<input type="email" placeholder="[email protected]" />'
console.log(...parseAttributes(html))
// <- ['type', 'email'] ['placeholder', '[email protected]']
One problem with this approach is that it’s tailor-made for our regular expression and its capturing groups. We could fix that issue by creating a matchAll
generator which is only concerned about looping over matches and collecting sets of captured groups, as shown in the following snippet.
function* matchAll(regex, input) {
while (true) {
const match = regex.exec(input)
if (match === null) {
break
}
const [ , ...captures] = match
yield captures
}
}
function* parseAttributes(input) {
const rattributes = /(\w+)="([^"]+)"\s/ig
yield* matchAll(rattributes, input)
}
const html = '<input type="email" placeholder="[email protected]" />'
console.log(...parseAttributes(html))
// <- ['type', 'email'] ['placeholder', '[email protected]']
A bigger source of confusion is that rattributes
mutates its lastIndex
property on each call to RegExp#exec
, which is how it can track the position after the last match. When there are no matches left, lastIndex
is reset back to 0
. A problem arises when we don’t iterate over all possible matches for a piece of input in one go — which would reset lastIndex
to 0
— and then we use the regular expression on a second piece of input, obtaining unexpected results.
While it looks like our matchAll implementation wouldn’t fall victim to this given it loops over all matches, it’d be possible to iterate over the generator by hand, meaning that we’d run into trouble if we reused the same regular expression, as shown in the next bit of code. Note how the second matcher should report ['type', 'text'] but instead starts at an index much further ahead than 0, even misreporting the 'placeholder' key as 'laceholder'.
const rattributes = /(\w+)="([^"]+)"\s/ig
const email = '<input type="email" placeholder="[email protected]" />'
const emailMatcher = matchAll(rattributes, email)
const address = '<input type="text" placeholder="Enter your business address" />'
const addressMatcher = matchAll(rattributes, address)
console.log(emailMatcher.next().value)
// <- ['type', 'email']
console.log(addressMatcher.next().value)
// <- ['laceholder', 'Enter your business address']
One solution would be to change matchAll
so that lastIndex
is always 0
when we yield back to the consumer code, while keeping track of lastIndex
internally so that we can pick up where we left off in each step of the sequence.
The following piece of code shows that indeed, that’d fix the problems we’re observing. Reusable global regular expressions are often avoided for this very reason: so that we don’t have to worry about resetting lastIndex
after every use.
function* matchAll(regex, input) {
let lastIndex = 0
while (true) {
regex.lastIndex = lastIndex
const match = regex.exec(input)
if (match === null) {
break
}
lastIndex = regex.lastIndex
regex.lastIndex = 0
const [ , ...captures] = match
yield captures
}
}
const rattributes = /(\w+)="([^"]+)"\s/ig
const email = '<input type="email" placeholder="[email protected]" />'
const emailMatcher = matchAll(rattributes, email)
const address = '<input type="text" placeholder="Enter your business address" />'
const addressMatcher = matchAll(rattributes, address)
console.log(emailMatcher.next().value)
// <- ['type', 'email']
console.log(addressMatcher.next().value)
// <- ['type', 'text']
console.log(emailMatcher.next().value)
// <- ['placeholder', '[email protected]']
console.log(addressMatcher.next().value)
// <- ['placeholder', 'Enter your business address']
The String#matchAll proposal[15] (in stage 1 at the time of this writing) introduces a new method for the string prototype which would behave in a similar fashion as our matchAll implementation, except the returned iterable is a sequence of entire match objects as opposed to just the captures in the example above. This means we could access named captures through match.groups for each match in the sequence.
const rattributes = /(?<key>\w+)="(?<value>[^"]+)"\s/igu
const email = '<input type="email" placeholder="[email protected]" />'
for (const { groups: { key, value } } of email.matchAll(rattributes)) {
console.log(`${ key }: ${ value }`)
}
// <- type: email
// <- placeholder: [email protected]
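In runtimes where String#matchAll has since shipped, the method works on a copy of the regular expression, so the lastIndex troubles we ran into earlier go away. The following sketch assumes such a runtime, and that the regular expression carries the g flag, which the shipped version of the method requires. The query string and variable names are made up.

```javascript
const rpair = /(?<key>\w+)=(?<value>\w+)/g
const query = 'a=1&b=2'

const pairs = [...query.matchAll(rpair)].map(({ groups }) => [groups.key, groups.value])
console.log(pairs)
// <- [['a', '1'], ['b', '2']]

// the original regular expression is left untouched
console.log(rpair.lastIndex)
// <- 0
```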
Over the years, libraries like Underscore and Lodash spoke loudly of missing features when it came to arrays. As a result, ES5 brought in heaps of functional methods to arrays: Array#filter
, Array#map
, Array#reduce
, Array#reduceRight
, Array#forEach
, Array#some
, and Array#every
.
ES6 brings a few more methods that will help manipulate, fill, and filter arrays.
Before ES6, JavaScript developers often needed to cast arguments
to a function into an array.
function cast() {
return Array.prototype.slice.call(arguments)
}
cast('a', 'b')
// <- ['a', 'b']
We’ve already explored more terse ways of doing this in [ecmascript6-essentials], when we first learned about rest and spread. You could, for instance, use the spread operator. As you no doubt remember, the spread operator leverages the iterator protocol to produce a sequence of values in arbitrary objects. The downside is that the objects we want to cast with spread must adhere to the iterator protocol by having implemented Symbol.iterator. Luckily for us, arguments does implement the iterator protocol in ES6.
function cast() {
return [...arguments]
}
cast('a', 'b')
// <- ['a', 'b']
Using the function rest parameter would be better for this particular case as it wouldn’t involve the arguments
object, nor any added logic in the function body.
function cast(...params) {
return params
}
cast('a', 'b')
// <- ['a', 'b']
You may also want to cast NodeList
DOM element collections, like those returned from document.querySelectorAll
, through the spread operator. This can be helpful when we need access to native array methods like Array#map
or Array#filter
. This is possible because the DOM standard upgraded NodeList
to an iterable, after ES6 defined the iterator protocol.
[...document.querySelectorAll('div')]
// <- [<div>, <div>, <div>, …]
What happens when we try to cast a jQuery collection through the spread operator? If you’re on a modern version of jQuery that implements the iterator protocol, spreading a jQuery object will work; otherwise, you may get an exception.
[...$('div')]
// <- [<div>, <div>, <div>, …]
The new Array.from
method is a bit different. It doesn’t rely only on the iterator protocol to figure out how to pull values from an object. It also supports array-likes out of the box, unlike the spread operator. The following code snippet will work with any version of jQuery.
Array.from($('div'))
// <- [<div>, <div>, <div>, …]
The one thing you cannot do with either Array.from or the spread operator is pick a start index. Suppose you wanted to pull every <div> after the first one. With Array#slice, you could do the following.
[].slice.call(document.querySelectorAll('div'), 1)
Of course, there’s nothing stopping you from using Array#slice
after casting. This is a bit easier to read than the previous example, as it keeps the slice call closer to the index at which we want to slice the array.
Array.from(document.querySelectorAll('div')).slice(1)
Array.from takes three arguments, although only the input is required. To wit:
- input: the array-like or iterable object you want to cast
- map: a mapping function that’s executed on every item of input
- context: the this binding to use when calling map
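As a minimal sketch of how the three arguments fit together, the following snippet passes a regular function as map and a plain object as context. The scale object and its factor property are made up for illustration. Keep in mind that context only has an effect when map is a regular function, because arrow functions ignore the this binding they’re called with.

```javascript
// Hypothetical context object, exposed as `this` inside the mapping function
const scale = { factor: 10 }

// An array-like input: indexed keys plus a length property
const input = { 0: 1, 1: 2, 2: 3, length: 3 }

const scaled = Array.from(input, function (value, i) {
  // `this` is the third argument passed to Array.from
  return value * this.factor + i
}, scale)

console.log(scaled)
// <- [10, 21, 32]
```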
With Array.from
you cannot slice, but you can dice. The map
function will efficiently map the values into something else as they’re being added to the array that results from calling Array.from
.
function typesOf() {
return Array.from(arguments, value => typeof value)
}
typesOf(null, [], NaN)
// <- ['object', 'object', 'number']
Do note that, for the specific case of dealing with arguments
, you could also combine rest parameters and Array#map
. In this case in particular, we may be better off just doing something like the snippet of code found next. It’s not as verbose as the previous example. Like with the Array#slice
example we saw earlier, the mapping is more explicit in this case.
function typesOf(...all) {
return all.map(value => typeof value)
}
typesOf(null, [], NaN)
// <- ['object', 'object', 'number']
When dealing with array-like objects, it makes sense to use Array.from
if they don’t implement Symbol.iterator
.
const apple = {
type: 'fruit',
name: 'Apple',
amount: 3
}
const onion = {
type: 'vegetable',
name: 'Onion',
amount: 1
}
const groceries = {
0: apple,
1: onion,
length: 2
}
Array.from(groceries)
// <- [apple, onion]
Array.from(groceries, grocery => grocery.type)
// <- ['fruit', 'vegetable']
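To see why Array.from is the better fit here, consider this short sketch, which tries both approaches on an array-like object that lacks Symbol.iterator. The arrayLike and spreadError names are only for illustration.

```javascript
// An array-like object: indexed keys and a length, but no Symbol.iterator
const arrayLike = { 0: 'a', 1: 'b', length: 2 }

let spreadError = null
try {
  const copy = [...arrayLike]
  console.log(copy)
} catch (error) {
  // Spread relies on the iterator protocol, so we end up here
  spreadError = error
}

console.log(spreadError instanceof TypeError)
// <- true
console.log(Array.from(arrayLike))
// <- ['a', 'b']
```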
The Array.of
method is exactly like the cast
function we played around with earlier. Next is a code snippet that shows how Array.of
might be ponyfilled.
function arrayOf(...items) {
return items
}
The Array
constructor has two overloads: …items
, where you provide the items for the new array; and length
, where you provide its numeric length. You can think about Array.of
as a flavor of new Array
that doesn’t support a length
overload. In the following code snippet, you’ll find some of the unexpected ways in which new Array
behaves thanks to its single-argument length
overloaded constructor. If you’re confused about the undefined x ${ count }
notation in the browser console, that’s indicating there are array holes in those positions. An array with holes is also known as a sparse array.
new Array() // <- []
new Array(undefined) // <- [undefined]
new Array(1) // <- [undefined x 1]
new Array(3) // <- [undefined x 3]
new Array('3') // <- ['3']
new Array(1, 2) // <- [1, 2]
new Array(-1, -2) // <- [-1, -2]
new Array(-1) // <- RangeError: Invalid array length
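If the difference between an array hole and an undefined value seems academic, the following sketch may help. It compares an array created through the length overload with an array literal that holds actual undefined values, using the in operator and Array#forEach. The variable names are just for illustration.

```javascript
const sparse = new Array(3)
const dense = [undefined, undefined, undefined]

console.log(sparse.length, dense.length)
// <- 3 3

// Holes are missing properties, while undefined values are present
console.log(0 in sparse)
// <- false
console.log(0 in dense)
// <- true

// Array#forEach skips holes but visits undefined values
let sparseVisits = 0
let denseVisits = 0
sparse.forEach(() => sparseVisits++)
dense.forEach(() => denseVisits++)
console.log(sparseVisits, denseVisits)
// <- 0 3
```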
In contrast, Array.of
has more consistent behavior because it doesn’t have the special length
case. This makes it a more desirable way of consistently creating new arrays programmatically.
console.log(Array.of()) // <- []
console.log(Array.of(undefined)) // <- [undefined]
console.log(Array.of(1)) // <- [1]
console.log(Array.of(3)) // <- [3]
console.log(Array.of('3')) // <- ['3']
console.log(Array.of(1, 2)) // <- [1, 2]
console.log(Array.of(-1, -2)) // <- [-1, -2]
console.log(Array.of(-1)) // <- [-1]
Let’s start with the signature of Array#copyWithin
.
Array.prototype.copyWithin(target, start = 0, end = this.length)
The Array#copyWithin
method copies a sequence of array elements within an array instance to the "paste position" starting at target
. The elements to be copied are taken from the [start, end)
range. The Array#copyWithin
method returns the array instance itself.
Let’s lead with a simple example. Consider the items
array in the following code snippet.
const items = [1, 2, 3, , , , , , , , ]
// <- [1, 2, 3, undefined x 7]
The function call shown below takes the items array and determines that it’ll start "pasting" items at index 6 (zero-based). It further determines that the items to be copied will be taken starting at index 1 and up to, but not including, index 3.
const items = [1, 2, 3, , , , , , , , ]
items.copyWithin(6, 1, 3)
// <- [1, 2, 3, undefined × 3, 2, 3, undefined × 2]
Reasoning about Array#copyWithin
is hard. Let’s break it down.
If we consider that the items to be copied were taken from the [start, end)
range, then we could express that using an Array#slice
call. These are the items that were pasted at the target
position. We can use .slice
to grab the copy.
const items = [1, 2, 3, , , , , , , , ]
const copy = items.slice(1, 3)
// <- [2, 3]
We could also consider the pasting part of the operation as an advanced usage of Array#splice
. The next code snippet does just that, passing the paste position to splice, telling it to remove as many items as we want to copy, and inserting the pasted items. Note that we’re using the spread operator so that elements are inserted individually, and not as an array, through .splice
.
const items = [1, 2, 3, , , , , , , , ]
const copy = items.slice(1, 3)
// <- [2, 3]
items.splice(6, 3 - 1, ...copy)
console.log(items)
// <- [1, 2, 3, undefined × 3, 2, 3, undefined × 2]
Now that we better understand the internals of Array#copyWithin
, we can generalize the example in order to implement the custom copyWithin
function shown in the following code snippet.
function copyWithin(items, target, start = 0, end = items.length) {
const copy = items.slice(start, end)
const removed = end - start
items.splice(target, removed, ...copy)
return items
}
The example we’ve been trying so far would work just as well with our custom copyWithin
function.
copyWithin([1, 2, 3, , , , , , , , ], 6, 1, 3)
// <- [1, 2, 3, undefined × 3, 2, 3, undefined × 2]
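One caveat is worth spelling out. The splice-based function matches the native method in this example, but it can drift apart when the copied range would run past the end of the array: Array#copyWithin never changes the length of the array, whereas splice will grow it. The following sketch trims the copy to the room that’s left after target. It assumes non-negative indices and no holes in the copied range, and the copyWithinClamped name is made up for this example.

```javascript
// Hypothetical variant of the custom copyWithin function.
// Assumes non-negative indices; negative indices are not handled.
function copyWithinClamped(items, target, start = 0, end = items.length) {
  // Never paste past the end of the array
  const room = Math.max(items.length - target, 0)
  const copy = items.slice(start, end).slice(0, room)
  items.splice(target, copy.length, ...copy)
  return items
}

const clamped = copyWithinClamped([1, 2, 3, 4, 5], 3, 0)
const native = [1, 2, 3, 4, 5].copyWithin(3, 0)

console.log(clamped)
// <- [1, 2, 3, 1, 2]
console.log(native)
// <- [1, 2, 3, 1, 2]
```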
The Array#fill method is a convenient utility that replaces all items in an array with the provided value. Note that sparse arrays will be filled in their entirety, while existing items will be replaced by the fill value.
['a', 'b', 'c'].fill('x') // <- ['x', 'x', 'x']
new Array(3).fill('x') // <- ['x', 'x', 'x']
You could also specify the starting index and end index. In this case, as shown next, only the items from the starting index up to, but not including, the end index would be filled.
['a', 'b', 'c', , ,].fill('x', 2)
// <- ['a', 'b', 'x', 'x', 'x']
new Array(5).fill('x', 0, 1)
// <- ['x', undefined x 4]
The provided value
can be anything, and is not just limited to primitive values.
new Array(3).fill({})
// <- [{}, {}, {}]
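Keep in mind that when the value is an object, every position ends up holding a reference to that same object. The following sketch shows the effect, and one possible way around it using Array.from with a mapping function. The touched property is only there for illustration.

```javascript
const shared = new Array(3).fill({})
shared[0].touched = true

// Every slot points at the same object
console.log(shared[1].touched)
// <- true
console.log(shared[0] === shared[2])
// <- true

// Array.from calls the mapping function once per position,
// so each position gets its own object
const distinct = Array.from({ length: 3 }, () => ({}))
distinct[0].touched = true
console.log(distinct[1].touched)
// <- undefined
```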
You can’t fill arrays using a mapping method that takes an index parameter or anything like that. When you pass a function, it’s stored as the fill value in every position, and never called.
const map = i => i * 2
new Array(3).fill(map)
// <- [map, map, map]
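If you do need each position computed from its index, two approaches that should work are sketched next: Array.from with an array-like that only has a length, or filling first so that Array#map has items to visit. The double function is just a stand-in for whatever mapping you have in mind.

```javascript
const double = i => i * 2

// The array-like { length: 3 } has no items, so value is always undefined
const doubled = Array.from({ length: 3 }, (value, i) => double(i))
console.log(doubled)
// <- [0, 2, 4]

// Array#map skips holes, which is why we fill the array first
const alsoDoubled = new Array(3).fill(0).map((value, i) => double(i))
console.log(alsoDoubled)
// <- [0, 2, 4]
```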
The Array#find
method runs a callback
for each item
in an array until the first one that returns a truthy value
, and then returns that item
. The method follows the signature of (callback(item, i, array), context)
that’s also present in Array#map
, Array#filter
, and others. You can think of Array#find
as a version of Array#some
that returns the matching element instead of just true
.
['a', 'b', 'c', 'd', 'e'].find(item => item === 'c')
// <- 'c'
['a', 'b', 'c', 'd', 'e'].find((item, i) => i === 0)
// <- 'a'
['a', 'b', 'c', 'd', 'e'].find(item => item === 'z')
// <- undefined
There’s an Array#findIndex method as well, and it leverages the same signature. Instead of returning a boolean value, or the element itself, Array#findIndex returns the index of the matching element, or -1 if no matches occur. Here are a few examples.
['a', 'b', 'c', 'd', 'e'].findIndex(item => item === 'c')
// <- 2
['a', 'b', 'c', 'd', 'e'].findIndex((item, i) => i === 0)
// <- 0
['a', 'b', 'c', 'd', 'e'].findIndex(item => item === 'z')
// <- -1
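To compare the three methods side by side, here’s a small sketch that runs the same callback through Array#some, Array#find, and Array#findIndex. The users array and the isAdmin callback are made up for this example. The counter at the end illustrates that the callback stops being called once a match is found.

```javascript
// Hypothetical data, for illustration only
const users = [
  { name: 'ana', admin: false },
  { name: 'bo', admin: true },
  { name: 'cy', admin: true }
]
const isAdmin = user => user.admin

console.log(users.some(isAdmin))
// <- true
console.log(users.find(isAdmin).name)
// <- 'bo'
console.log(users.findIndex(isAdmin))
// <- 1

// The callback runs for 'ana' and 'bo', but never for 'cy'
let calls = 0
users.find(user => {
  calls++
  return user.admin
})
console.log(calls)
// <- 2
```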
The Array#keys method returns an iterator that yields a sequence holding the keys for the array. Since the returned value is an iterator, you can iterate over it with for..of
, the spread operator, or by manually calling .next()
.
['a', 'b', 'c', 'd'].keys()
// <- ArrayIterator {}
Here’s an example using for..of
.
for (const key of ['a', 'b', 'c', 'd'].keys()) {
console.log(key)
// <- 0
// <- 1
// <- 2
// <- 3
}
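For completeness, here’s a sketch of what pulling keys by hand through .next() might look like. Each call returns an object with value and done properties, and once the sequence is exhausted there’s nothing left to spread.

```javascript
const iterator = ['a', 'b'].keys()

const first = iterator.next()
const second = iterator.next()
const third = iterator.next()

console.log(first)
// <- { value: 0, done: false }
console.log(second)
// <- { value: 1, done: false }
console.log(third)
// <- { value: undefined, done: true }

// The iterator is exhausted, so spreading it produces an empty array
const leftovers = [...iterator]
console.log(leftovers)
// <- []
```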
Unlike Object.keys
, and most methods that iterate over arrays, this sequence doesn’t ignore array holes.
Object.keys(new Array(4))
// <- []
[...new Array(4).keys()]
// <- [0, 1, 2, 3]
Now onto values.
The Array#values method works the same way as Array#keys, but the returned iterator is a sequence of values instead of keys. In practice, you’ll want to iterate over the array itself most of the time, but getting an iterator can come in handy sometimes.
['a', 'b', 'c', 'd'].values()
// <- ArrayIterator {}
You can use for..of, the spread operator, or any other construct that consumes iterables to pull values out of the sequence. The example below uses the spread operator on an array’s .values() to create a copy of that array.
[...['a', 'b', 'c', 'd'].values()]
// <- ['a', 'b', 'c', 'd']
Note that omitting the .values()
method call would still produce a copy of the array: the sequence is iterated and spread over a new array.
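The following sketch spreads the array directly, without calling .values(). The result holds the same items but is a different array instance. The variable names are only for illustration.

```javascript
const original = ['a', 'b', 'c', 'd']
const copy = [...original]

console.log(copy)
// <- ['a', 'b', 'c', 'd']
console.log(copy === original)
// <- false
```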
Similar to both preceding methods, except Array#entries
returns an iterator with a sequence of key-value pairs.
['a', 'b', 'c', 'd'].entries()
// <- ArrayIterator {}
Each item in the sequence is a two-item array with the key and the value for an item in the array.
[...['a', 'b', 'c', 'd'].entries()]
// <- [[0, 'a'], [1, 'b'], [2, 'c'], [3, 'd']]
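Since each entry is a key-value pair, Array#entries tends to combine nicely with for..of and destructuring when you need both the index and the item. Here’s a minimal sketch; the letters and lines names are made up for the example.

```javascript
const letters = ['a', 'b', 'c']
const lines = []

// Destructure each [key, value] pair as it comes out of the iterator
for (const [index, letter] of letters.entries()) {
  lines.push(`${ index }: ${ letter }`)
}

console.log(lines)
// <- ['0: a', '1: b', '2: c']
```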
Great, one last method left!
The method found at an array’s Symbol.iterator property is exactly the same as the Array#values method.
const list = ['a', 'b', 'c', 'd']
list[Symbol.iterator] === list.values
// <- true
[...list[Symbol.iterator]()]
// <- ['a', 'b', 'c', 'd']
The example below combines a spread operator, an array, and Symbol.iterator
to iterate over its values. Can you follow the code?
[...['a', 'b', 'c', 'd'][Symbol.iterator]()]
// <- ['a', 'b', 'c', 'd']
Let’s break it down. First, there’s the array.
['a', 'b', 'c', 'd']
// <- ['a', 'b', 'c', 'd']
Then we get an iterator.
['a', 'b', 'c', 'd'][Symbol.iterator]()
// <- ArrayIterator {}
Last, we spread the iterator over a new array, creating a copy.
[...['a', 'b', 'c', 'd'][Symbol.iterator]()]
// <- ['a', 'b', 'c', 'd']
Array methods: the "Fun with Native Arrays" article, https://mjavascript.com/out/native-arrays.
String#repeat: ECMAScript 6 Specification, section 21.1.3.13, https://mjavascript.com/out/array-repeat.
The u flag in regular expressions: "Unicode-aware regular expressions in ECMAScript 6", by Mathias Bynens, https://mjavascript.com/out/regexp-unicode.
String#matchAll: proposal document, https://mjavascript.com/out/string-matchall.