Maximiliano edited this page Jun 9, 2021 · 2 revisions

The lint command

The lint command has two functionalities:

  1. detection, which identifies bad coding practices in one or more Stata do-files;
  2. correction, which corrects bad coding practices in a Stata do-file.

Detection

The typical usage of the detection feature is to point to a do-file that you would like to check for bad practices. This feature helps you understand what needs to be modified in your do-file in terms of style and checks.

The syntax is:

lint "test/bad.do" 

which gives you the following output:

Summary (number of lines where bad practices are detected) =======================

[Style]
Hard tabs instead of soft tabs (whitespaces) used: Yes
Abstract index used in for-loop: 3
Not proper indentation in for-loop for if-else statement: 7
Not proper indentation in newline: 1
Missing whitespaces around math symbols: 0
Incomplete conditions: 6
Not explicit if statement: 0
Delimit used: 1
cd used: 0
Lines too long: 5
Brackets not used for global macro: 0

[Check]
Missing values properly treated?: 7
Backslash used in file path?: 0
Bang (!) used instead of tilde (~) for negation?: 6

The output is divided into two parts:

  1. Style:
    • By style, we refer to lines of code that contain bad coding practices, such as bad spacing, but do not influence the results created by your code.
  2. Checks:
    • By checks, we refer to lines of code that contain bad coding practices that could potentially influence the results or the replicability of your code, such as ignoring missing values.

If you want to see the lines where those bad coding practices appear, add the option verbose (for example, lint "test/bad.do", verbose), which prints:

Style =====================
(line 14) style: Use 4 white spaces instead of tabs. (This may apply to other lines as well.)
(line 15) style: Avoid to use "delimit". For line breaks, use "///" instead.
(line 17) style: This line is too long (82 characters). Use "///" for line breaks so that one line has at most 80 characters.
(line 25) style: After declaring for loop statement or if-else statement, add indentation (4 whitespaces).
...
...
Check =====================
(line 25) check: Are you taking missing values into account properly? (Remember that "a != 0" includes cases where a is missing.)
(line 25) style: Are you using tilde (~) for negation? If so, for negation, use bang (!) instead of tilde (~).
...
...
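The missing-value check shown above follows from how Stata orders numeric values: a numeric missing value is treated as larger than any number, so a condition such as a != 0 is true when a is missing. A minimal sketch with made-up data illustrates the difference; the variable names a, flag_naive, and flag_safe are hypothetical and not part of the linter.

```stata
* Illustrative data: the third observation of a is missing
clear
input a
0
5
.
end

* Naive condition: also true for the observation where a is missing
generate flag_naive = (a != 0)

* Explicit handling: missing values of a are excluded from the condition
generate flag_safe = (a != 0) & !missing(a)

list
```

For the missing observation, flag_naive is 1 while flag_safe is 0, which is the kind of silent difference the check is meant to bring to your attention.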

Examples

Let's first define globals that point to the folder used for running these examples:

*** Set the global to folder where test files are stored
    global project      "D:/Documents/RA Jobs/DIME/analytics/linter/stata-linter"
    global test_dir     "${project}/test"

This repository contains one example do-file, bad.do, located in the /test folder.

For one do-file, we can run the following lines of code that will detect bad practices.

// Detect --------------------------------------------------------------------
lint "${test_dir}/bad.do"
lint "${test_dir}/bad.do", verbose  
lint "${test_dir}/bad.do", verbose nosummary
lint "${test_dir}/bad.do", nosummary

You can export the results to an Excel file:

// Lint with results in excel file
lint "${test_dir}/bad.do", nosummary          ///
  excel("${test_dir}/detect_lint.xlsx")      

You can also detect bad practices in all the do-files that are in a directory:

// Lint a folder
lint "${test_dir}"

And you can also export the results of that folder:

// Lint a folder and create an excel file
lint "${test_dir}",                           ///
  excel("${test_dir}/detect_output_all.xlsx")

Correction

The correction feature corrects bad coding practices in a Stata do-file. The typical usage of this feature is to point to the do-file that requires correction and create a new do-file with the corrections. By default, Stata will ask you which kinds of changes you would like to make.

Disclaimer: This command is not guaranteed to correct code without changing results. After using it, we strongly recommend checking that the results of the do-file have not changed.

The command requires you to point to the do-file that needs correction, just as we did for the detection feature (i.e., lint "input_file"), and to point to the new do-file that will contain the corrections with using. Therefore, the syntax is:

lint "input_file" using "output_file" 

We recommend that the output file name be different from the input file name, so that the original do-file is kept as a backup.

Examples

The basic usage would be:

  lint "${test_dir}/bad.do"                     ///
    using "${test_dir}/bad_corrected.do"       

In your Stata window, you should see the detection output followed by this message:

------------------------------------------------------------
Correcting do-file
------------------------------------------------------------
 
Created PATH/bad_corrected.do.

And that's it. You have created your new do-file with the corrections.
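Because the correction is not guaranteed to leave results unchanged, it can help to compare the output of the original and the corrected do-files. The sketch below is one possible approach, not part of the lint command: it assumes that each do-file leaves its final dataset in memory, with the same observations in the same order, and it uses Stata's built-in cf command to compare them. The temporary file name before is hypothetical.

```stata
* Assumption: both do-files end with their final dataset in memory
tempfile before

* Run the original do-file and store the resulting dataset
do "${test_dir}/bad.do"
save `before'

* Run the corrected do-file and compare every variable with the stored data
do "${test_dir}/bad_corrected.do"
cf _all using `before', verbose
```

If cf reports mismatches, review the corrected do-file before relying on it. If your do-files export tables or estimates rather than a dataset, you would need to compare those outputs instead.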

If you don't want Stata to ask you about every single one of the changes, you can use the option automatic as follows:

lint "${test_dir}/bad.do"                     ///
  using "${test_dir}/bad_corrected.do",       ///
  nosummary                                   ///
  replace automatic

You can also combine the two features and export the detection results to an Excel file at the same time:

// detecting + correcting + excel file results
lint "${test_dir}/bad.do"                     ///
  using "${test_dir}/bad_corrected.do",       ///
  excel("${test_dir}/detect_lint.xlsx")       ///                               
  replace                                     ///
  automatic   

Options

The lint command has the following options, which can be used when detecting and/or correcting a do-file:

  • verbose: shows all the lines where bad practices appear.
  • nosummary: suppresses the summary of bad practices.
  • indent(): specifies the number of whitespaces used for indentation (default: 4).
  • nocheck: removes the check suggestions and shows only style problems.
  • linemax(): specifies the maximum number of characters in a line (default: 80).
  • tab_space(): specifies the number of whitespaces used instead of hard tabs (default: 4).
  • excel(): exports detection results to an Excel file.
  • automatic: corrects all bad coding practices without asking whether each one should be corrected (use only with the correction feature).
  • replace: replaces the existing output file (use only with the correction feature).
  • inprep: allows the output file name to be the same as the name of the input file (use only with the correction feature).
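As an illustration of how these options might be combined, the sketch below uses only option names from the list above. The values 100 and 2 are arbitrary examples, and it is an assumption that the options compose in the way shown.

```stata
* Detection only: list offending lines, skip the check suggestions,
* and allow lines of up to 100 characters (illustrative value)
lint "${test_dir}/bad.do", verbose nocheck linemax(100)

* Detection for a project that indents with 2 spaces (illustrative value)
lint "${test_dir}/bad.do", verbose indent(2) tab_space(2)

* Correction with the same indentation settings, without prompts,
* overwriting any existing output file
lint "${test_dir}/bad.do"                     ///
  using "${test_dir}/bad_corrected.do",       ///
  indent(2) tab_space(2)                      ///
  automatic replace
```

Setting indent() and tab_space() to the same value keeps the style rules consistent with each other.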