
@pulpdrew (Contributor) commented Oct 21, 2025

Closes HDX-2623

Summary

This change improves the performance of getKeyValues when getting values of a JSON key.

Generally, the query planner prunes columns that are not referenced outside of a CTE. For JSON columns, however, if the outer SELECT references one field of a JSON column, the inner SELECT appears to read the entire JSON object.
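A hypothetical sketch of the two query shapes this paragraph contrasts, assuming a table `logs` with a JSON column `attrs` and a row-sampling CTE. The table, column, sample size, and exact SQL are illustrative assumptions, not the queries getKeyValues actually emits. In the first shape the CTE projects every column and the outer query picks one JSON path; in the second the CTE projects only that path under a plain alias.

```typescript
// Illustrative only: table `logs`, JSON column `attrs`, and the sample size
// are assumptions, not taken from the HyperDX code base.
const table = "logs";
const jsonPath = "attrs.user_id";
const valueLimit = 20;
const sampleRows = 100_000;

// Before: the CTE selects every column; per the PR description, ClickHouse
// appears to read the whole `attrs` object even though one path is used.
const beforeSql = `
WITH sampled AS (SELECT * FROM ${table} LIMIT ${sampleRows})
SELECT groupUniqArray(${valueLimit})(${jsonPath}) AS param0
FROM sampled`;

// After: the CTE projects only the needed JSON path under a plain alias,
// and the outer query aggregates that alias.
const afterSql = `
WITH sampled AS (SELECT ${jsonPath} AS key0 FROM ${table} LIMIT ${sampleRows})
SELECT groupUniqArray(${valueLimit})(key0) AS param0
FROM sampled`;

console.log(beforeSql);
console.log(afterSql);
```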

This PR also adds integration tests for getKeyValues to ensure that the function generates queries that work as expected in ClickHouse.

Performance impact (single JSON dashboard filter):

  • Original: 15.03s
  • Optimized: 0.443s
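A quick check of the arithmetic behind these numbers, using only the two timings listed: the ratio of old to new time gives the speedup, and one minus the inverse ratio gives the share of query time removed.

```typescript
// Timings copied from the PR description (single JSON dashboard filter).
const originalSeconds = 15.03;
const optimizedSeconds = 0.443;

// Speedup is the ratio of old to new wall-clock time.
const speedup = originalSeconds / optimizedSeconds;
// Percentage of the original time that was eliminated.
const reduction = (1 - optimizedSeconds / originalSeconds) * 100;

console.log(`speedup: ${speedup.toFixed(1)}x`);
console.log(`time reduction: ${reduction.toFixed(1)}%`);
```

This works out to roughly a 33.9x speedup (about a 97% reduction in query time), consistent with the ~34x figure quoted in the review.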

@changeset-bot

changeset-bot bot commented Oct 21, 2025

🦋 Changeset detected

Latest commit: 23f37db

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:
  • @hyperdx/common-utils (Patch)


@vercel

vercel bot commented Oct 21, 2025

The latest updates on your projects.

  • hyperdx-v2-oss-app: Deployment Ready, Preview available, updated Oct 23, 2025 9:20am (UTC)

@claude

claude bot commented Oct 21, 2025

PR Review: getKeyValues Performance Optimization

Performance Improvement: ✅ Excellent optimization - 34x speedup (15.03s → 0.443s)

Code Quality: ✅ Well-structured with comprehensive integration tests

Critical Issues

None found - this is a solid optimization.

Minor Observations

Integration Test Cleanup:

  • ⚠️ Missing afterAll cleanup for ClickHouse clients → Add await client.close() and await hdxClient.close() in packages/common-utils/src/tests/metadata.int.test.ts:74
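A minimal sketch of the suggested cleanup, assuming each client exposes an async `close()` as the review implies (the `@clickhouse/client` client does; whether `hdxClient` does is an assumption). The hypothetical `closeAll` helper closes every client even if one fails, so a single error does not leak the remaining connections.

```typescript
// Anything with an async close(), e.g. a ClickHouse client.
interface Closable {
  close(): Promise<void>;
}

// Hypothetical helper: close all clients, then rethrow the first failure (if any).
async function closeAll(clients: Closable[]): Promise<void> {
  const results = await Promise.allSettled(clients.map((c) => c.close()));
  const failure = results.find(
    (r): r is PromiseRejectedResult => r.status === "rejected",
  );
  if (failure) {
    throw failure.reason;
  }
}

// Intended usage in the Jest integration test (client names from the review):
// afterAll(async () => {
//   await closeAll([client, hdxClient]);
// });
```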

Test Coverage:

  • ℹ️ Integration tests properly verify both code paths (disableRowLimit=true/false)
  • ℹ️ Tests cover JSON columns, materialized columns, and regular columns

Code Change Summary:
The optimization correctly addresses ClickHouse's behavior where referencing one JSON field causes the entire JSON object to be read. The fix:

  1. In the CTE, selects only the specific keys needed (not all columns)
  2. Applies aggregation in the outer query on these limited columns
  3. Properly maintains support for materialized columns
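These steps can be sketched as a small query builder. This is a hypothetical illustration, not the code in the PR: `buildKeyValuesSql`, the `key${i}` aliases, and the optional sampling `LIMIT` are assumptions. In this sketch each key is treated as an arbitrary SQL expression, so a JSON path (`attrs.user_id`) and a materialized column (`service_name`) go through the same path.

```typescript
// Hypothetical builder illustrating the CTE pattern described in the review.
// `keys` are SQL expressions: JSON paths, materialized columns, or plain columns.
// Callers are expected to return early when `keys` is empty.
function buildKeyValuesSql(opts: {
  table: string;
  keys: string[];
  valueLimit: number;
  sampleRows?: number; // omit to scan without a row limit (assumption)
}): string {
  const { table, keys, valueLimit, sampleRows } = opts;

  // Step 1: the CTE projects only the requested key expressions,
  // each under a simple alias, instead of selecting every column.
  const innerSelect = keys.map((k, i) => `${k} AS key${i}`).join(", ");
  const limitClause = sampleRows === undefined ? "" : ` LIMIT ${sampleRows}`;

  // Step 2: the outer query aggregates those aliases. Inner (keyN) and
  // outer (paramN) names are kept distinct so an aggregate's alias never
  // shadows the column it reads.
  const outerSelect = keys
    .map((_, i) => `groupUniqArray(${valueLimit})(key${i}) AS param${i}`)
    .join(", ");

  return (
    `WITH sampled AS (SELECT ${innerSelect} FROM ${table}${limitClause}) ` +
    `SELECT ${outerSelect} FROM sampled`
  );
}

console.log(
  buildKeyValuesSql({
    table: "logs",
    keys: ["attrs.user_id", "service_name"],
    valueLimit: 20,
    sampleRows: 100000,
  }),
);
```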

CI Configuration:

  • ✅ Correctly adds integration tests to common-utils with run-many -t ci:int --parallel=false
  • ✅ New dev-int-common-utils make target for local testing

Recommendation

LGTM with one minor fix - Add client cleanup in the integration test afterAll hook to prevent connection leaks.

@github-actions

github-actions bot commented Oct 21, 2025

E2E Test Results

All tests passed • 26 passed • 3 skipped • 236s

Status Count
✅ Passed 26
❌ Failed 0
⚠️ Flaky 0
⏭️ Skipped 3


const selectClause = keys
  .map((k, i) => `groupUniqArray(${limit})(${k}) AS param${i}`)
  .join(', ');
if (keys.length === 0) return [];
Contributor Author


All the functional changes are in this file.

This check was added because, previously, the query would generate an empty SELECT clause when no keys were provided, resulting in a query error (e.g. SELECT FROM table...).
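A hypothetical sketch of where the guard pays off: with an empty `keys` array the select clause would be empty, so returning early avoids both the invalid SQL and the round trip. `getKeyValuesSketch` and the injected `runQuery` function are illustrative stand-ins, not the real getKeyValues signature.

```typescript
type Row = Record<string, unknown[]>;
type RunQuery = (sql: string) => Promise<Row>;

// Hypothetical stand-in for getKeyValues: returns one entry per key.
async function getKeyValuesSketch(
  table: string,
  keys: string[],
  limit: number,
  runQuery: RunQuery,
): Promise<{ key: string; value: unknown[] }[]> {
  // Guard: with no keys the SELECT clause would be empty ("SELECT FROM ..."),
  // which ClickHouse rejects, so skip the query entirely.
  if (keys.length === 0) return [];

  const selectClause = keys
    .map((k, i) => `groupUniqArray(${limit})(${k}) AS param${i}`)
    .join(", ");
  const row = await runQuery(`SELECT ${selectClause} FROM ${table}`);

  // Map each paramN column back to the key it was computed for.
  return keys.map((key, i) => ({ key, value: row[`param${i}`] ?? [] }));
}
```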

@pulpdrew force-pushed the drew/optimize-filter-sampling branch from 3297abd to 23f37db on October 23, 2025 09:16
@pulpdrew marked this pull request as ready for review on October 23, 2025 09:16