Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,9 @@ node_modules/
.claude-session-id

squawk/
eugene/

# Auto generated treesitter files
crates/pgt_treesitter_grammar/src/grammar.json
crates/pgt_treesitter_grammar/src/node-types.json
crates/pgt_treesitter_grammar/src/parser.c
crates/pgt_treesitter_grammar/src/parser.c
378 changes: 378 additions & 0 deletions agentic/port_eugene_rules.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,378 @@
# Porting Rules from Eugene to postgres-language-server

This document tracks the progress of porting lint rules from [Eugene](https://github.com/kaaveland/eugene) to the postgres-language-server analyzer.

## Overview

Eugene is a PostgreSQL migration linter that detects dangerous operations. We are porting its rules to the postgres-language-server to provide similar safety checks within the language server environment.

**Eugene source location**: `eugene/eugene/src/lints/rules.rs`
**Hint metadata location**: `eugene/eugene/src/hint_data.rs`
**Example SQL files**: `eugene/eugene/examples/*/`

## Step-by-Step Porting Process

### 1. Understand the Rule

1. **Read Eugene's implementation** in `eugene/eugene/src/lints/rules.rs`
- Find the rule function (e.g., `added_serial_column`)
- Understand the AST patterns it matches
- Note any special logic (e.g., checking previous statements)

2. **Read the hint metadata** in `eugene/eugene/src/hint_data.rs`
- Get the ID (e.g., "E11")
- Get name, condition, effect, and workaround text
- This provides documentation content

3. **Review example SQL** in `eugene/eugene/examples/<ID>/`
- `bad.sql` - Invalid cases that should trigger the rule
- `good.sql` - Valid cases that should NOT trigger

### 2. Create the Rule

```bash
# Create rule with appropriate severity (error/warn)
just new-lintrule safety <ruleName> <severity>

# Example:
just new-lintrule safety addSerialColumn error
```

This generates:
- `crates/pgt_analyser/src/lint/safety/<rule_name>.rs`
- `crates/pgt_analyser/tests/specs/safety/<ruleName>/basic.sql`
- Updates to configuration files
- Diagnostic category registration

### 3. Implement the Rule

**File**: `crates/pgt_analyser/src/lint/safety/<rule_name>.rs`

#### Key Components:

```rust
use pgt_analyse::{context::RuleContext, declare_lint_rule, Rule, RuleDiagnostic, RuleSource};
use pgt_console::markup;
use pgt_diagnostics::Severity;

declare_lint_rule! {
/// Brief one-line description (shown in lists).
///
/// Detailed explanation of what the rule detects and why it's problematic.
/// Explain the PostgreSQL behavior and performance/safety implications.
///
/// ## Examples
///
/// ### Invalid
///
/// ```sql,expect_diagnostic
/// -- SQL that should trigger the rule
/// ALTER TABLE users ADD COLUMN id serial;
/// ```
///
/// ### Valid
///
/// ```sql
/// -- SQL that should NOT trigger
/// CREATE TABLE users (id serial PRIMARY KEY);
/// ```
///
pub RuleName {
version: "next",
name: "ruleName",
severity: Severity::Error, // or Warning
recommended: true, // or false
sources: &[RuleSource::Eugene("<ID>")], // e.g., "E11"
}
}

impl Rule for RuleName {
type Options = ();

fn run(ctx: &RuleContext<Self>) -> Vec<RuleDiagnostic> {
let mut diagnostics = Vec::new();

// Pattern match on the statement type
if let pgt_query::NodeEnum::AlterTableStmt(stmt) = &ctx.stmt() {
// Rule logic here

if /* condition */ {
diagnostics.push(
RuleDiagnostic::new(
rule_category!(),
None,
markup! {
"Error message with "<Emphasis>"formatting"</Emphasis>"."
},
)
.detail(None, "Additional context about the problem.")
.note("Suggested fix or workaround."),
);
}
}

diagnostics
}
}
```

#### Important Patterns:

**Accessing previous statements** (for rules like `multipleAlterTable`):
```rust
let file_ctx = ctx.file_context();
let previous = file_ctx.previous_stmts();
```

**Schema normalization** (treating empty schema as "public"):
```rust
let schema_normalized = if schema.is_empty() {
"public"
} else {
schema.as_str()
};
```

**Checking for specific ALTER TABLE actions**:
```rust
for cmd in &stmt.cmds {
if let Some(pgt_query::NodeEnum::AlterTableCmd(cmd)) = &cmd.node {
if cmd.subtype() == pgt_query::protobuf::AlterTableType::AtAddColumn {
// Handle ADD COLUMN
}
}
}
```

**Extracting column type**:
```rust
if let Some(pgt_query::NodeEnum::ColumnDef(col_def)) = &cmd.def.as_ref().and_then(|d| d.node.as_ref()) {
if let Some(type_name) = &col_def.type_name {
let type_str = get_type_name(type_name);
}
}

fn get_type_name(type_name: &pgt_query::protobuf::TypeName) -> String {
type_name
.names
.iter()
.filter_map(|n| {
if let Some(pgt_query::NodeEnum::String(s)) = &n.node {
Some(s.sval.as_str())
} else {
None
}
})
.collect::<Vec<_>>()
.join(".")
}
```

### 4. Create Comprehensive Tests

**Directory**: `crates/pgt_analyser/tests/specs/safety/<ruleName>/`

Create multiple test files covering:

#### Invalid Cases (should trigger):
```sql
-- expect_lint/safety/<ruleName>
-- Description of what this tests
<SQL that should trigger>
```

#### Valid Cases (should NOT trigger):
```sql
-- Description of what this tests
-- expect_no_diagnostics
<SQL that should NOT trigger>
```

#### Example Test Structure:
```
tests/specs/safety/addSerialColumn/
├── basic.sql # Basic case triggering the rule
├── bigserial.sql # Variant (bigserial type)
├── generated_stored.sql # Another variant (GENERATED)
├── valid_regular_column.sql # Valid: regular column
└── valid_create_table.sql # Valid: CREATE TABLE context
```

**Run tests and accept snapshots**:
```bash
cargo insta test -p pgt_analyser --accept
```

### 5. Verify and Generate Code

```bash
# Check compilation
cargo check

# Generate lint code and documentation
just gen-lint

# Run all tests
cargo test -p pgt_analyser --test rules_tests

# Final verification
just ready
```

### 6. Test the Rule Manually

Create a test SQL file:
```sql
-- test.sql
ALTER TABLE users ADD COLUMN id serial;
```

Run the CLI:
```bash
cargo run -p pgt_cli -- check /path/to/test.sql
```

## Common Pitfalls and Solutions

### 1. Schema Matching

**Problem**: Need to match tables across statements with different schema notations.

**Solution**: Normalize schema names:
```rust
let schema_normalized = if schema.is_empty() { "public" } else { schema.as_str() };
```

### 2. AST Navigation

**Problem**: Eugene uses a simplified AST (`StatementSummary`), but pgt uses the full PostgreSQL AST.

**Solution**: Use pattern matching and helper functions. Look at existing rules for examples.

### 3. File Context Rules

**Problem**: Rules that need to track state across statements (like `multipleAlterTable`).

**Solution**: Use `ctx.file_context()` to access `AnalysedFileContext`:
```rust
let file_ctx = ctx.file_context();
let previous_stmts = file_ctx.previous_stmts();
```

### 4. Transaction State

**Problem**: Some Eugene rules check transaction state (e.g., `RUNNING_STATEMENT_WHILE_HOLDING_ACCESS_EXCLUSIVE`).

**Solution**: This is more complex and may require extending the `AnalysedFileContext` to track transaction state. Consider implementing simpler rules first.

### 5. Test Expectations

**Problem**: `expect_lint` expects exactly ONE diagnostic, but rule generates multiple.

**Solution**: Either:
- Split into separate test files (one diagnostic each)
- Adjust the test to only trigger once
- Use expect_diagnostic for each occurrence (check existing tests)

## Rule Mapping Considerations

### Overlapping Rules

Some Eugene rules may overlap with existing pgt rules:

| Eugene Rule | Potential PGT Overlap | Action |
|-------------|----------------------|--------|
| `SET_COLUMN_TYPE_TO_JSON` | `preferJsonb` | Review both, may enhance existing |
| `CREATE_INDEX_NONCONCURRENTLY` | `requireConcurrentIndexCreation` | Review both |
| `CHANGE_COLUMN_TYPE` | `changingColumnType` | Review both |
| `ADD_NEW_UNIQUE_CONSTRAINT_WITHOUT_USING_INDEX` | `disallowUniqueConstraint` | Review both |

**When overlap exists**:
1. Compare implementations
2. If Eugene's is more comprehensive, consider updating the existing rule
3. If they cover different aspects, keep both
4. Document any differences

### Transaction-Aware Rules

These rules require tracking transaction state across multiple statements:

- `RUNNING_STATEMENT_WHILE_HOLDING_ACCESS_EXCLUSIVE` (E4)
- `LOCKTIMEOUT_WARNING` (E9)

**Approach**:
1. First implement simpler, statement-level rules
2. Design transaction state tracking in `AnalysedFileContext`
3. Add fields to track:
- Current transaction state (BEGIN/COMMIT/ROLLBACK)
- Locks held
- Lock timeout settings
4. Update state as statements are processed

## Eugene AST vs PostgreSQL AST

### Eugene's Simplified AST

Eugene uses `StatementSummary` enum with simplified representations:
```rust
enum StatementSummary {
AlterTable { schema: String, name: String, actions: Vec<AlterTableAction> },
CreateIndex { schema: String, idxname: String, concurrently: bool, target: String },
// ...
}

enum AlterTableAction {
AddColumn { column: String, type_name: String, stored_generated: bool, ... },
SetType { column: String, type_name: String },
// ...
}
```

### PostgreSQL AST (pgt_query)

We use the full PostgreSQL protobuf AST:
```rust
pgt_query::NodeEnum::AlterTableStmt(stmt)
-> stmt.cmds: Vec<Node>
-> NodeEnum::AlterTableCmd(cmd)
-> cmd.subtype: AlterTableType
-> cmd.def: Option<Node>
-> NodeEnum::ColumnDef(col_def)
```

**Translation Strategy**:
1. Look at Eugene's simplified logic
2. Map to corresponding PostgreSQL AST nodes
3. Use existing pgt rules as references
4. Check `pgt_query::protobuf` for available types/enums

## Useful References

- **Eugene source**: `eugene/eugene/src/lints/rules.rs`
- **Existing pgt rules**: `crates/pgt_analyser/src/lint/safety/`
- **Contributing guide**: `crates/pgt_analyser/CONTRIBUTING.md`
- **AST types**: `crates/pgt_query/src/lib.rs`
- **PostgreSQL protobuf**: `pgt_query::protobuf` module

## Next Steps

1. **Priority**: Port high-priority safety rules (E1, E2, E6, E7, E9)
2. **Review overlaps**: Check if E3, E5, E6, E7 overlap with existing rules
3. **Transaction tracking**: Design transaction state tracking for E4, E9
4. **Documentation**: Update Eugene source attribution in ported rules
5. **Testing**: Ensure comprehensive test coverage for all ported rules

## Template Checklist

When porting a new rule, ensure:

- [ ] Rule implementation in `src/lint/safety/<rule>.rs`
- [ ] Documentation with examples (invalid and valid cases)
- [ ] `sources: &[RuleSource::Eugene("<ID>")]` attribution
- [ ] At least 3-5 test files (mix of invalid and valid)
- [ ] Snapshot tests accepted with `cargo insta test --accept`
- [ ] All tests pass: `cargo test -p pgt_analyser --test rules_tests`
- [ ] Compilation clean: `cargo check`
- [ ] Code generation: `just gen-lint`
- [ ] Manual CLI test with sample SQL
- [ ] Update this document with completed rule
Loading