Add support for SPL (Splunk query language) #1970

romain-durban · 2023-06-13T17:31:14Z

Hello, here is a proposal to add support for SPL (Splunk query language).

SPL is technically fileless: queries are not stored in dedicated files with a specific extension. I have nevertheless assumed that users could store a query in a FILENAME.spl file, since that is the first extension that comes to mind. Strictly speaking, Splunk stores its knowledge objects in INI configuration files, parts of which can be SPL queries. Still, syntax highlighting for SPL can be very helpful when sharing pieces of queries in threads, such as on GitHub or GitLab.

It is a stretch to consider SPL a real language: it is extremely permissive and can be ambiguous. Fortunately, we do not need to validate the syntax here, only to highlight notable elements.

I'll now explain why the lexer is built this way. In theory, SPL's grammar is very basic; in practice, achieving a useful kind of syntax highlighting requires a lot of compensation.

Basic syntax and implicit Search command

SPL is basically a succession of commands. A command starts with a pipe, followed by its name and then arguments/input data:

| commandA arg1=true fieldA, fieldB
| commandB
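To make this structure concrete, here is a naive Ruby sketch (not code from the lexer) that splits a query into command names and argument strings. It assumes that a pipe never appears inside a quoted string or a subquery, which real SPL does not guarantee; split_commands is a hypothetical helper.

```ruby
# Naive sketch: split an SPL query into [command_name, argument_string] pairs.
# Assumption: '|' never appears inside quoted strings or [subqueries].
def split_commands(query)
  query.split('|').map(&:strip).reject(&:empty?).map do |segment|
    name, rest = segment.split(/\s+/, 2)
    [name, rest.to_s] # commands without arguments get an empty string
  end
end

p split_commands("| commandA arg1=true fieldA, fieldB\n| commandB")
# => [["commandA", "arg1=true fieldA, fieldB"], ["commandB", ""]]
```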

There are several types of commands, but the most important one for us here is "Generating commands": commands that need no data as input but produce data as output. Each query should start with a generating command.
However, for ease of use, as with a search engine, Splunk considers by default that the "search" command is implicitly used if the query does not start with an explicit command call such as | commandName. That's the first exception that will mess with us: the lexer needs to be able to handle this implicit default state.

This:

index=_internal sourcetype=splunkd

is the exact same thing as:

| search index=_internal sourcetype=splunkd
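A lexer cannot rewrite its input, but it can start in the state it would be in after an explicit | search. The hypothetical normalize helper below only makes that equivalence visible; it assumes that any query starting with a pipe already names its own (generating) command.

```ruby
# Hypothetical helper: make SPL's implicit leading "search" explicit.
# Assumption: a query starting with '|' already names its first command.
def normalize(query)
  q = query.strip
  q.start_with?('|') ? q : "| search #{q}"
end

puts normalize('index=_internal sourcetype=splunkd')
# prints: | search index=_internal sourcetype=splunkd
puts normalize('| makeresults count=3')
# prints: | makeresults count=3
```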

Subqueries

Splunk allows subqueries inside queries; they are executed beforehand. There is no actual limit to how deeply they can be nested, and they can occur nearly anywhere in the query; at runtime, their output is technically more SPL. They are enclosed in square brackets. A subquery follows almost the same rules as a query, except that the first pipe (to start a command) is optional. That's another exception to handle.
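Because subqueries can nest arbitrarily, tracking bracket depth is the minimum needed to tell a top-level pipe from one that belongs to a subquery. The hypothetical top_level_segments helper below does only that: it assumes balanced brackets that never appear inside quoted strings, and it does not handle the optional first pipe inside a subquery.

```ruby
# Sketch: split an SPL query on top-level pipes only, keeping pipes that
# belong to [subqueries] inside their segment.
# Assumption: brackets are balanced and never appear inside quoted strings.
def top_level_segments(query)
  segments = [+'']
  depth = 0
  query.each_char do |ch|
    case ch
    when '[' then depth += 1
    when ']' then depth -= 1
    end
    if ch == '|' && depth.zero?
      segments << +'' # a top-level pipe starts a new command
    else
      segments.last << ch
    end
  end
  segments.map(&:strip).reject(&:empty?)
end

query = 'index=web [search index=auth | fields user] | stats count BY user'
p top_level_segments(query)
# => ["index=web [search index=auth | fields user]", "stats count BY user"]
```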

End of query

There is technically no way to indicate that we have reached the end of a query apart from EOF.
Consequently, it is unfortunately not really possible to put several queries in the same file/block and expect all of them to always be highlighted correctly. This is due to the implicit "search" command at the beginning of a query: if one query ends and the next one starts in that implicit state, we cannot know where the previous query ends and where the new one starts.
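A quick illustration of that ambiguity: once two queries that both rely on the implicit search are placed in one block, nothing in the text marks the boundary. This sketch assumes, for simplicity, that a newline acts as plain whitespace between search terms.

```ruby
# Two independent queries, each relying on the implicit "search" command.
queries = ['index=web status=500', 'index=auth action=failure']
block = queries.join("\n")

# No pipe separates them, so the block reads as a single search with
# four field=value terms; the original boundary is lost.
terms = block.split(/\s+/)
p terms
# => ["index=web", "status=500", "index=auth", "action=failure"]
p block.include?('|')
# => false
```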

Arguments position and nature

Command arguments can be positioned anywhere after the command name. Some arguments are parameters that the command expects, and each command has a different set of known parameters. They take the form argName=value. Almost anything else will be treated as input for the command. Note that inputs can also have the shape field=value, as in a search command when providing data filters. This is why we have a large dictionary listing the expected arguments for each command, so that we only highlight what is valid in the context of the current command.
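The dictionary idea can be sketched as a hash from command name to known parameter names. The KNOWN_ARGS excerpt and the classify helper below are illustrative only; the lexer's real table is much larger and is not reproduced here.

```ruby
# Illustrative excerpt of a command => known-parameters table.
# Commands absent from the table (e.g. search) have no known parameters here.
KNOWN_ARGS = {
  'stats' => %w[partitions allnum delim],
  'dedup' => %w[keepevents keepempty consecutive],
}.freeze

# :argument for a known parameter of `command`, :input for anything else,
# including field=value filters such as index=_internal.
def classify(command, token)
  key, sep, _value = token.partition('=')
  return :input if sep.empty?
  KNOWN_ARGS.fetch(command, []).include?(key) ? :argument : :input
end

p classify('dedup', 'keepempty=true')    # => :argument
p classify('search', 'index=_internal')  # => :input
p classify('dedup', 'host')              # => :input
```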

Operators

Apart from the usual arithmetic operators, which do not really matter here, Splunk commands use a wide variety of operators. There are the usual boolean operators for defining conditions (AND, OR, NOT, etc.). But there are also operators like "BY" or "GROUPBY" for aggregations, "AS" for aliasing/renaming, and operators that define a specific section of the command, as in an SQL query, such as "WHERE", "FROM", etc. We want to highlight them, but only when they are in the appropriate location. This is why we have a large dictionary defining the possible advanced operators.
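In the same spirit, operators can be checked against a per-command table, with the boolean operators kept context-free. The CONTEXT_OPERATORS entries are an illustrative, incomplete excerpt, and matching context operators case-insensitively while matching boolean operators case-sensitively is an assumption of this sketch.

```ruby
# Boolean operators are valid in any condition (matched case-sensitively here).
BOOLEAN_OPERATORS = %w[AND OR NOT].freeze

# Illustrative excerpt: advanced operators accepted by each command.
CONTEXT_OPERATORS = {
  'stats'  => %w[BY AS],
  'rename' => %w[AS],
  'tstats' => %w[FROM WHERE BY GROUPBY],
}.freeze

def operator?(command, word)
  return true if BOOLEAN_OPERATORS.include?(word)
  CONTEXT_OPERATORS.fetch(command, []).include?(word.upcase)
end

p operator?('stats', 'by')    # => true
p operator?('rename', 'BY')   # => false
p operator?('search', 'NOT')  # => true
```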

Functions

Some commands allow the use of various functions, such as aggregation functions, for more advanced features. There are several types of functions; in this lexer we have considered the following ones:

Eval functions: functions to evaluate data/fields; they basically transform data
Aggregation functions: functions used in aggregating commands, like in SQL, such as count, avg, etc.
Convert functions: specifically used for data type conversions
Filter functions: specifically used for data filtering

Some commands support functions, but only of specific types, and we want to highlight only valid calls. This is why we have several data structures listing the functions of each type and the commands supporting them. Splunk also allows aggregating commands to use eval functions inside aggregation functions. That's another exception to handle.
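The function rules, including the eval-inside-aggregation exception (as in stats count(eval(status=500))), could be expressed as follows. FUNCTIONS and SUPPORTED_BY are short illustrative excerpts, filter functions are left out, and the inside_aggregation flag is a hypothetical stand-in for whatever state tracking the real lexer uses.

```ruby
# Illustrative excerpts, not the lexer's full tables.
FUNCTIONS = {
  eval:        %w[if case round lower len],
  aggregation: %w[count dc avg values],
  convert:     %w[ctime dur2sec num],
}.freeze

SUPPORTED_BY = {
  eval:        %w[eval where],
  aggregation: %w[stats chart timechart],
  convert:     %w[convert],
}.freeze

# True when `function` may be highlighted inside `command`.
# inside_aggregation: true when the call sits inside an aggregation
# function's argument list, e.g. the lower() in count(eval(lower(x)="a")).
def valid_call?(command, function, inside_aggregation: false)
  # A name may belong to several types, so check each of them.
  types = FUNCTIONS.select { |_type, names| names.include?(function) }.keys
  types.any? do |type|
    SUPPORTED_BY.fetch(type).include?(command) ||
      (type == :eval && inside_aggregation &&
        SUPPORTED_BY[:aggregation].include?(command))
  end
end

p valid_call?('stats', 'dc')                              # => true
p valid_call?('eval', 'dc')                               # => false
p valid_call?('stats', 'round')                           # => false
p valid_call?('stats', 'round', inside_aggregation: true) # => true
```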

There are some other specificities and tricks, but I'll spare you the details; they are less important.

Example of highlighted SPL

Official syntax highlighting

Splunk provides syntax highlighting in its interface, but it is surprisingly light. I have pushed it a little further here, with some personal choices based on past experience. It was also often inspired by the syntax highlighting proposed here: https://github.com/ChrisYounger/highlighter

Deprecated commands

As in Chris Younger's implementation, deprecated SPL commands are not highlighted, even though they can be found in the official documentation (with a warning). This is meant to alert the user that a command they are using should not be used.
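A minimal sketch of that choice: if deprecated commands are simply absent from the set of known commands, they fall through to plain text. The KNOWN_COMMANDS excerpt and the :name_builtin/:text symbols are placeholders, not the lexer's actual table or token types, and case-insensitive matching is an assumption.

```ruby
require 'set'

# Illustrative excerpt of supported commands. Deprecated commands are
# deliberately left out, so they are never highlighted as commands.
KNOWN_COMMANDS = Set.new(%w[search stats eval where rename dedup tstats]).freeze

# Assumption: command names are matched case-insensitively.
def command_token(name)
  KNOWN_COMMANDS.include?(name.downcase) ? :name_builtin : :text
end

p command_token('stats')    # => :name_builtin
p command_token('notacmd')  # => :text
```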

romain-durban · 2023-06-14T08:10:44Z

Hello @tancnle,

Sorry, I had not noticed that the lexer was emitting one Error token, because the offending character was a "\n".
I have fixed it (one of the regexes was missing the multiline option).
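For reference, in Ruby regexes the /m flag is what lets . match a newline (the equivalent of /s in PCRE), so a comment pattern that spans lines fails to match without it. A minimal illustration, assuming SPL's triple-backtick comment syntax; these patterns are not the lexer's actual rules.

```ruby
# Assumption: SPL inline comments are delimited by triple backticks.
comment = "```first line\nsecond line```"

without_m = /```.*?```/   # '.' stops at "\n"
with_m    = /```.*?```/m  # '.' also matches "\n"

p comment[without_m]  # => nil
p comment[with_m]     # => "```first line\nsecond line```"
```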

romain-durban added 6 commits June 3, 2023 16:28

First working version

8c7311c

Pushing this first working version, further tests will be done later

Replaces tabs by spaces

c1a454e

Getting rid of tabs

Simplified states

e5bdd4c

Removed the :search_command state to reduce redundancy and playing instead with the states stack

Bugfixes and improvements

da1c45d

Fixed some issues, now covering some commands which were missing and now better highlighting some special operators, args and functions

Minor syntax error in example

9e2312c

Fixed a minor syntax error in the example, for good measure

Small regex fix

8975e4c

One of the regex for multiline comments did not have the multiline option and was raising an error on \n

Added missing EOF newline

ac0e7fb

Added missing EOF newline to comply to the linelint rule