Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add java rule EC24 : Optimize Database SQL Queries (Clause LIMIT) #20

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

dedece35
Copy link
Member

@dedece35 dedece35 commented Mar 13, 2024

implement issue green-code-initiative/creedengo-rules-specifications#239
old PR : green-code-initiative/creedengo-rules-specifications#221 (duplicated because of stale state and no come back from original developer, thus I copied and corrected the code with review feddbacks)

@dedece35 dedece35 self-assigned this Mar 13, 2024
@dedece35 dedece35 changed the title Add java rule EC80 : Optimize Database SQL Queries (Clause LIMIT) Add java rule EC24 : Optimize Database SQL Queries (Clause LIMIT) Mar 17, 2024
@dedece35 dedece35 requested a review from utarwyn March 17, 2024 15:39
pom.xml Outdated
@@ -67,7 +67,7 @@
<google.re2j>1.7</google.re2j>

<!-- temporary version waiting for real automatic release in ecocode repository -->
<ecocode-rules-specifications.version>1.5.0</ecocode-rules-specifications.version>
<ecocode-rules-specifications.version>1.5.1-SNAPSHOT</ecocode-rules-specifications.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can wait for the next version of the artifact instead of using a snapshot here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waiting for approvement of this implementation before releasing ecocode-rules-specifications

dummyCall("SELECT user FROM myTable LIMIT 50"); // Compliant
}

@Query("select t from Todo t where t.status != 'COMPLETED'") // Noncompliant {{Optimize Database SQL Queries (Clause LIMIT)}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the JavaScript plugin, I added the WHERE keyword in the list of limiting keywords so as not to be too strict either. We could consider that this query already performs a stricter selection than all the data in the table. I don't know what your opinion is on this? maybe it lacks measure too :p

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utarwyn update done

@Djoums
Copy link

Djoums commented Mar 19, 2024

This rule is going to be very low quality, I don't think it should be merged at all.
Just writing Console.Log("select * from table") will trigger a warning, but writing ExecuteCommand("select *" + " from table") won't

Copy link

This PR has been automatically marked as stale because it has no activity for 30 days.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

@github-actions github-actions bot added the stale label Apr 19, 2024
@dedece35
Copy link
Member Author

This rule is going to be very low quality, I don't think it should be merged at all. Just writing Console.Log("select * from table") will trigger a warning, but writing ExecuteCommand("select *" + " from table") won't

HI @Djoums, I'm ok with you with this rule hasn't an optimal quality, but ... to be homogenous with JS language that already has implemented this rule (check "where" and "limit" clauses), I think we can integrate the implementation in Java language and maybe in python, and php language.
For me, the main argument for this actual implementation is that this is a first level check on simple queries.
If a developer wants to avoid this rule, he will always find a way for it.
Maybe could we improve the rule with your new complex use cases : splitting query into several strings.

What do you think ?

@dedece35 dedece35 added 🗃️ rule rule improvment or rule development or bug 🔥 in progress 🔥 and removed stale labels Apr 23, 2024
@Djoums
Copy link

Djoums commented Apr 24, 2024

Hi @dedece35 , it's a product choice really : is a low quality rule better than nothing ?

IMO not in this case, because it is so limited that I don't see a use case for it, on top of being very prone to false positives/true negatives.

This rule may be helpful when developers build quick models, because they tend to quickly hard code sql queries to save time. But that's precisely when they don't need rules like this enforced. When it comes to actual production code that requires quality (which is what we're targeting really) :

  • String interpolations/concatenations/etc cannot be handled by an analyzer (it would have to actually run the code to resolve them)
  • The analyzer cannot know if a query is intended to be ran (see the Console.Log example)
  • Many (if not most) sql queries are built programmatically, because their content depends on runtime variables. The analyzer cannot do anything in this case, see previous remark about having to actually run the code. That's why this rule should be enforced at a lower level imo, in the data access layer for ex.
  • ORMs are everywhere these days, for exemple Entity Framework or Dapper for .Net. The analyzer cannot help here, unless we provide a specific implementation for each one.

@dedece35
Copy link
Member Author

dedece35 commented Apr 26, 2024

Hi @Djoums and @utarwyn (@jhertout),
I agree that is a low quality rule with actual state. The only really strong use case is the usage of "@query" with all the query inside (because we can't give a variable in that string query). Maybe, can we only deal with this use case ?

If we aren't all ok with this rule integration, if we are all ok with NO integration, next we flag this rule in our rule array to study more the implementation. But, the consequence are (I have no problem with it) :

  • deletion of the rule implementation in JS plugin
  • deletion of test java files in ecocode-java-test-project repository
  • deletion of rule specification in ecocode repository
  • update RULES.md and ecocode-rules-specification with deletions

Do we deal with only one use case ? or do we delete this rule for study it later ?

@utarwyn, @jhertout, @jycr, @glalloue ?

@Djoums
Copy link

Djoums commented Apr 26, 2024

I think that any rule dealing with sql query content shouldn't be handled by a static analyzer (Sonar, Roslyn or any other), this is also the case for the "select *" rule. I'm fine with going for the simplest case, but it's probably not going to be very helpful, especially for production level code.

@utarwyn
Copy link
Member

utarwyn commented Apr 30, 2024

Hello! I agree that the implementation of this rule is very basic currently. However I find you a little bit though with what has been done. I think this rule is important and it would be a shame not to include it in our plugins. We should have in mind that we will not be able to cover 100% of use cases, but it will always be better than 0%. As long as we don’t get false positives.

that any rule dealing with sql query content shouldn't be handled by a static analyzer

I don't understand the idea behind this: how can we deal with SQL queries if this not through a static analyzer? Do not handle them at all? I think it’s possible to do basic static analysis like we have done here.

The analyzer cannot know if a query is intended to be ran

I think it's possible to check if a variable or string is passed in a @Transactional or a JDBC operation. This is already how Intellij IDEA works to apply syntax highlighting to SQL queries. We could imagine this same feature for other languages. Note that this would complicate the implementation.

Many (if not most) sql queries are built programmatically

I don't understand? You can perform prepared statements with variables but this rule reads the selection part of the query. So it shouldn’t change anything.

ORMs are everywhere these days

I agree with you that nowadays we often do not use basic SQL queries but an ORM instead. So it could be an improvement to support some famous ORM tools in our implementations. I already have some ideas to do it in JavaScript.

I leave you the final choice, but I think we should keep this rule and continue to iterate on it to improve our implementations.

@Djoums
Copy link

Djoums commented May 1, 2024

All I'm saying is that it will be closer to 5% rather than 100%. And given that it will be very simplistic, that it has the potential for many false positives/true negatives, and that the poor quality will possibly hurt our plugin image, I don't think it's worth it. But I'm not trying to block it, just giving my opinion.

I don't understand the idea behind this: how can we deal with SQL queries if this not through a static analyzer? Do not handle them at all? I think it’s possible to do basic static analysis like we have done here.

Static analyzers can analyze static queries, by definition, but many are dynamic (constructed in several steps, with conditionals, function calls, etc). If what the rule does is just scan all the code base for strings that contain "select *", that is really poor imo and prone to errors.

And for "LIMIT X" this is even worse, how do you identify strings that should/shouldn't have it appended ? If I write :

string myStr = "SELECT myField FROM myTable"; // Are you going to warn here ?
myStr += " LIMIT 30";

The only way I see to properly analyze any query is at runtime, in the DAL or the DB engine, because they are fully formed at that point.

Copy link

github-actions bot commented Jun 1, 2024

This PR has been automatically marked as stale because it has no activity for 30 days.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

@github-actions github-actions bot added the stale label Jun 1, 2024
@dedece35 dedece35 removed the stale label Jul 26, 2024
@dedece35
Copy link
Member Author

@Djoums @utarwyn
Regarding all of your arguments, I'm agree with both of you.
Thus, to keep a good quality for our rule, I propose following solution for implementation :

  • if query is only in a full string (like used in "@query", or directly called in execution method) : we can make the actual analyse
  • if query is affected to a variable, the analyse can only be done on one case :
    • if the variable is only affected once, and not returned by the current method, the analyse can be made (because variable is local in the method and not updated)
    • else NO analyse can me done because, the variable is (or can be) completed (with "+=" in the same method or later if the variable is returned by the current method)

are you ok ?
I think this kind of implementation guarantees a minimum of quality.
I begin this modification.

@dedece35 dedece35 mentioned this pull request Jul 26, 2024
@dedece35
Copy link
Member Author

My First test file :

interface LimitDbQueryResultInterface {

    @Query("select t from Todo t") // Noncompliant {{Try and limit the number of data returned for a single query (by using the LIMIT keyword for example)}}
    List<Todo> getTodo();

    @Query("select t " + "from Todo t") // Noncompliant {{Try and limit the number of data returned for a single query (by using the LIMIT keyword for example)}}
    List<Todo> getTodo2();

    @Query("select t from Todo t where t.status != 'COMPLETED'") // Compliant
    List<Todo> getTodoNotCompleted();

    @Query("select t from Todo t where t.status != 'COMPLETED' LIMIT 25") // Compliant
    List<Todo> getTodoNotCompletedLimit25();

}

@dedece35
Copy link
Member Author

my second test file


class LimitDbQueryResult {

    public void callLiteralSQLString() {
        dummyCall("SELECT user FROM myTable"); // Noncompliant {{Try and limit the number of data returned for a single query (by using the LIMIT keyword for example)}}
        dummyCall("SELECT user " + " FROM myTable"); // Noncompliant {{Try and limit the number of data returned for a single query (by using the LIMIT keyword for example)}}
        dummyCall("SELECT user FROM myTable LIMIT 50"); // Compliant
        dummyCall("SELECT user FROM myTable WHERE user.name = 'titi'"); // Compliant
    }

    private void buildAndCallQuery() {
        String sql1 = "SELECT user FROM myTable"; // Noncompliant {{Try and limit the number of data returned for a single query (by using the LIMIT keyword for example)}}
        String sql2 = "SELECT user FROM myTable LIMIT 50"; // Compliant
        String sql3 = "SELECT user FROM myTable WHERE user.name = 'titi'"; // Compliant
        String sql4 = "SELECT user"; // Noncompliant {{Try and limit the number of data returned for a single query (by using the LIMIT keyword for example)}}
        sql4 += "FROM myTable"; // Noncompliant {{Try and limit the number of data returned for a single query (by using the LIMIT keyword for example)}}

        dummyCall(sql1);
        dummyCall(sql2);
        dummyCall(sql3);
        dummyCall(sql4);
    }

    private void callQueryWithReturn() {
        String sql1 = "SELECT user FROM myTable"; // Compliant (more exactly, not controlled and not issued because string returned)
        dumyCall("");
        return sql1;
    }

    private void callQueryWithReturn() {
        String sql1 = "SELECT user FROM myTable"; // Compliant (more exactly, not controlled and not issued because string returned)
        return sql1;
    }

    private void callQueryWithReturn2() {
        return "SELECT user FROM myTable"; // Compliant (more exactly, not controlled and not issued because string returned)
    }

    private void callQueryWithReturn3() {
        String sql2 = "SELECT user"; // Compliant (more exactly, not controlled and not issued because string returned)
        sql2 += "FROM myTable"; // Compliant (more exactly, not controlled and not issued because string returned)
        return sql2;
    }

    private void callQueryWithInput(String sqlInit) {
        String sql1 = "SELECT user FROM myTable"; // Compliant (more exactly, not controlled and not issued because input string used)
        sql1 += sqlInit;
        dummyCall(sql1);
    }

    private void dummyCall(String request) {
    }
}

Copy link

@Djoums
Copy link

Djoums commented Jul 29, 2024

private void buildAndCallQuery()
{
    String sql4 = "SELECT user"; // Noncompliant - even though you don't know what comes next ?
    sql4 += " FROM myTable"; // Noncompliant - how do you know this is part of a SQL query in the first place ?
    sql4 += " LIMIT 10"; // This makes the query compliant, but you're still going to produce warnings on the 2 previous lines ?
    dummyCall(sql4);
}

This is a modified version of a part of your test code to explain why I still have a big issue here.

Producing warnings is not a trivial matter, there are people who take them very seriously (and even activate the option to consider warnings as errors), we can't just throw semi-random stuff in there 😐

@dedece35
Copy link
Member Author

private void buildAndCallQuery()
{
    String sql4 = "SELECT user"; // Noncompliant - even though you don't know what comes next ?
    sql4 += " FROM myTable"; // Noncompliant - how do you know this is part of a SQL query in the first place ?
    sql4 += " LIMIT 10"; // This makes the query compliant, but you're still going to produce warnings on the 2 previous lines ?
    dummyCall(sql4);
}

This is a modified version of a part of your test code to explain why I still have a big issue here.

Producing warnings is not a trivial matter, there are people who take them very seriously (and even activate the option to consider warnings as errors), we can't just throw semi-random stuff in there 😐

Hi @Djoums,
no, for me, only raise issues in all sentences where "sql4" is used only when the last analysis is done and there is a non compliance.

@Djoums
Copy link

Djoums commented Aug 15, 2024

Hi @Djoums, no, for me, only raise issues in all sentences where "sql4" is used only when the last analysis is done and there is a non compliance.

That needs advanced analyzis then, reconstructing the full string inside the analyzer. Also dummyCall is not guaranteed to actually run the sql query but it's probably ok in this case. Still not a fan though 😐

Copy link

This PR has been automatically marked as stale because it has no activity for 30 days.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

@github-actions github-actions bot added the stale label Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🗃️ rule rule improvment or rule development or bug 🔥 in progress 🔥 stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants