
Conversation

@demenech demenech (Member) commented Sep 24, 2025

Summary by CodeRabbit

  • New Features
    • Faster dataset search for non-visualization queries via caching, improving load times and responsiveness.
    • Paginated dataset results with total counts to support UI pagination (page and limit).
    • Validated query handling for dataset search requests with clear error responses on invalid input.
  • Refactor
    • Dataset list now sources items from the new cached search results for improved consistency across the app.

@coderabbitai coderabbitai bot (Contributor) commented Sep 24, 2025

Walkthrough

Adds a cached dataset search pipeline: introduces a server-side cache and pagination in lib/data.ts, a GET API endpoint at /api/datasets/search, updates SearchContext to use the cached endpoint for non-visualization searches, and adjusts ListOfDatasets to render from results.
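For orientation, here is a minimal sketch of what the cached pipeline described above could look like. The function names (getCachedDatasets, searchCachedDatasets, searchDatasets) come from this walkthrough; the return shape of searchDatasets and the import paths are assumptions, not the exact PR code.

// lib/data.ts (illustrative sketch only; the PR's actual implementation may differ)
import { unstable_cache } from "next/cache";
import { searchDatasets } from "./queries/dataset"; // path taken from the code graph later in the review
import type { Dataset } from "../schemas/dataset.interface"; // path assumed

// Walk the upstream paginated search and memoize the full corpus server-side.
export const getCachedDatasets = unstable_cache(
  async (): Promise<Dataset[]> => {
    const limit = 10;
    const all: Dataset[] = [];
    for (let page = 0; ; page++) {
      const batch = await searchDatasets({
        limit,
        offset: limit * page,
        groups: [],
        orgs: [],
        tags: [],
      });
      // Assumes the upstream response exposes a `datasets` array.
      all.push(...batch.datasets);
      if (batch.datasets.length < limit) break; // last page reached
    }
    return all;
  },
  ["cached-datasets"],
  { revalidate: false } // see the revalidation discussion in the review below
);

// Slice the cached corpus into one page and report the total count for pagination.
export async function searchCachedDatasets({ page, limit }: { page: number; limit: number }) {
  const all = await getCachedDatasets();
  const start = (page - 1) * limit;
  return { results: all.slice(start, start + limit), count: all.length };
}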

Changes

Cohort / File(s) Summary
Dataset list rendering
components/dataset/search/ListOfDatasets.tsx
Switches iteration source from searchResults?.datasets to searchResults?.results when rendering DatasetItem.
Search context wiring
components/dataset/search/SearchContext.tsx
Adds SWR fetcher to /api/datasets/search; for non-visualization type, replaces packageSearchResults and loading flags with cached datasets/results. Notes deprecation of original call.
Caching and pagination logic
lib/data.ts
Adds getCachedDatasets using unstable_cache to aggregate all datasets via pagination; introduces Zod searchOptionsSchema, SearchOptions type, and searchCachedDatasets to paginate cached results and return { results, count }.
API endpoint
pages/api/datasets/search.tsx
New Next.js API route handling GET; validates query with searchOptionsSchema; returns paginated cached datasets or 400 on validation error; 405 for non-GET.
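To make the endpoint's contract concrete, a consumer would call it roughly like this. The helper name fetchDatasetPage is hypothetical; the { results, count } response shape and the page/limit query parameters follow the descriptions above.

import type { Dataset } from "../schemas/dataset.interface"; // path assumed

// Hypothetical helper illustrating the request/response contract of /api/datasets/search.
async function fetchDatasetPage(page = 1, limit = 10) {
  const res = await fetch(`/api/datasets/search?page=${page}&limit=${limit}`);
  if (!res.ok) throw new Error(`Dataset search failed with status ${res.status}`);
  return (await res.json()) as { results: Dataset[]; count: number };
}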

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant UI as UI (ListOfDatasets)
  participant Ctx as SearchContext
  participant API as /api/datasets/search (Next.js)
  participant Data as lib/data.ts
  participant Cache as getCachedDatasets (unstable_cache)
  participant Source as searchDatasets (external)

  UI->>Ctx: Trigger search (non-visualization)
  Ctx->>API: GET /api/datasets/search?page&limit
  API->>Data: searchCachedDatasets(options)
  Data->>Cache: getCachedDatasets()
  alt Cache miss (first run or revalidation)
    Cache->>Source: searchDatasets(page=1..n, limit=10)
    Source-->>Cache: batched datasets
    Cache-->>Data: allDatasets[]
  else Cache hit
    Cache-->>Data: allDatasets[]
  end
  Data-->>API: { results (paged), count }
  API-->>Ctx: 200 OK JSON
  Ctx-->>UI: searchResults.results rendered

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • anuveyatsu
  • willy1989cv
  • luccasmmg

Poem

I cached a field of datasets bright,
In burrows deep, I stashed them tight.
A hop to API, a nibble of page,
Results pop out—precise, not vague.
With whisker-twitch and tidy logs,
The search now sprints—no lag, no fogs. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title "[DO NOT MERGE]: Datasets cache PoC" clearly and concisely summarizes the primary change (a proof of concept for a datasets caching implementation) and matches the diffs (new cache module, API route, and search context updates); it is short and specific enough for a reviewer to understand the main intent. The "[DO NOT MERGE]" prefix is noisy but acceptable for a PoC branch.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch poc/fully-cached-data

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@vercel vercel bot commented Sep 24, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

  • Project: portaljs-cloud-frontend-template | Deployment: Ready | Preview: Ready (Preview) | Comments: Comment | Updated (UTC): Sep 24, 2025 0:35am

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/dataset/search/ListOfDatasets.tsx (2)

204-214: Unreachable ResultsNotFound; zero-results UX is broken

return <ResultsNotFound /> comes after an unconditional return in the same branch, so it never runs; when there are zero results, nothing renders instead of the empty state.

Apply this diff:

   if (isLoading) return null;

-  if (count > 0) {
-    return (
-      <Pagination
-        subsetOfPages={subsetOfPages}
-        setSubsetOfPages={setSubsetOfPages}
-        count={count}
-      />
-    );
-
-    return <ResultsNotFound />;
-  }
+  if (!count) {
+    return <ResultsNotFound />;
+  }
+
+  return (
+    <Pagination
+      subsetOfPages={subsetOfPages}
+      setSubsetOfPages={setSubsetOfPages}
+      count={count}
+    />
+  );
 
   // make a pagination component once insights are added
-  return null;
+  return null;

248-251: Typo in CTA: “Clear fitlers” → “Clear filters”

User-facing copy.

Apply this diff:

-          Clear fitlers
+          Clear filters
🧹 Nitpick comments (4)
components/dataset/search/SearchContext.tsx (1)

81-93: Stabilize SWR key and avoid passing unused object to fetcher

Use a descriptive, primitive-only key to improve cache hits and avoid accidental re-fetching. Also, only include query if it’s supported server-side.

Apply this diff:

-  const { data: cachedDatasets, isValidating: isLoadingCachedDatasets } =
-    useSWR([packagesOptions], async (options) => {
-      const searchParams = new URLSearchParams();
-      searchParams.set("limit", String(options.limit));
-      const page = Math.floor(options.offset ?? 0 / options.limit) + 1;
-      searchParams.set("page", String(page));
-      searchParams.set("query", String(options.query));
-      const datasets = await fetch(
-        `/api/datasets/search?${searchParams.toString()}`
-      );
-      const data = await datasets.json();
-      return data;
-    });
+  const { data: cachedDatasets, isValidating: isLoadingCachedDatasets } =
+    useSWR(
+      [
+        "cached_datasets",
+        Math.floor((packagesOptions.offset ?? 0) / packagesOptions.limit) + 1,
+        packagesOptions.limit,
+        packagesOptions.query
+      ],
+      async (_key, page, limit, query) => {
+        const searchParams = new URLSearchParams();
+        searchParams.set("limit", String(limit));
+        searchParams.set("page", String(page));
+        // NOTE: include `query` only if backend supports it; otherwise omit.
+        // searchParams.set("query", String(query));
+        const res = await fetch(`/api/datasets/search?${searchParams.toString()}`);
+        return res.json();
+      }
+    );
lib/data.ts (3)

35-36: revalidate: false produces permanently stale cache

For any long‑lived instance, consider time‑based revalidation or tags for on‑demand revalidation.

Option A (time-based):

-    revalidate: false, // TODO: what happens if the UI triggers a time-based revalidation?
+    revalidate: 3600, // 1h; tune as needed for freshness vs. load

Option B (tags): add tags: ["cached-datasets"] and trigger revalidateTag("cached-datasets") on admin actions.
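A minimal sketch of Option B, assuming the existing aggregation logic is factored out as a plain async function (fetchAllDatasets is a hypothetical name standing in for it):

import { unstable_cache, revalidateTag } from "next/cache";
import type { Dataset } from "../schemas/dataset.interface"; // path assumed

declare function fetchAllDatasets(): Promise<Dataset[]>; // stands in for the existing loader

export const getCachedDatasets = unstable_cache(fetchAllDatasets, ["cached-datasets"], {
  tags: ["cached-datasets"], // no time-based revalidate; invalidated on demand instead
});

// Call from a server action, route handler, or admin webhook whenever datasets change upstream.
export async function invalidateDatasetsCache() {
  revalidateTag("cached-datasets");
}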


58-68: Support query (even a simple contains) to match UI intent

The UI sends query, but schema ignores it and results aren’t filtered. For the PoC, a simple title/name contains match is enough.

Apply this diff:

-export async function searchCachedDatasets(options: SearchOptions) {
-  const { page, limit } = options;
+export async function searchCachedDatasets(options: SearchOptions & { query?: string }) {
+  const { page, limit, query } = options;
   const allDatasets = await getCachedDatasets();
-  const filteredDatasets = allDatasets;
+  const q = (query ?? "").trim().toLowerCase();
+  const filteredDatasets = q
+    ? allDatasets.filter((d) =>
+        ((d.title ?? d.name ?? "") as string).toLowerCase().includes(q)
+      )
+    : allDatasets;
   // NOTE: maybe https://github.com/itemsapi/itemsjs instead of minisearch ?

   const startIdx = (page - 1) * limit;
   const endIdx = startIdx + limit;
   const paginatedDatasets = filteredDatasets.slice(startIdx, endIdx);
   return { results: paginatedDatasets, count: filteredDatasets.length };
 }

And extend the schema to accept query:

 export const searchOptionsSchema = z.object({
   limit: z
     .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1).max(25))
     .optional()
     .default(10),
   page: z
     .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1))
     .optional()
     .default(1),
+  query: z.string().optional().default(""),
 });

11-12: Avoid noisy console logs in server code

Use a logger or remove before merge.

-    console.log("Revalidating datasets cache: ", new Date().getTime());
+    // log: revalidating datasets cache
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad3a3bc and 1d8d997.

📒 Files selected for processing (4)
  • components/dataset/search/ListOfDatasets.tsx (1 hunks)
  • components/dataset/search/SearchContext.tsx (3 hunks)
  • lib/data.ts (1 hunks)
  • pages/api/datasets/search.tsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/data.ts (2)
schemas/dataset.interface.ts (1)
  • Dataset (4-39)
lib/queries/dataset.ts (1)
  • searchDatasets (13-97)
pages/api/datasets/search.tsx (1)
lib/data.ts (2)
  • searchOptionsSchema (39-48)
  • searchCachedDatasets (58-68)
🔇 Additional comments (2)
components/dataset/search/ListOfDatasets.tsx (1)

54-56: Switch to searchResults.results aligns with new API

Rendering now matches the new { results, count } shape. Looks good.

components/dataset/search/SearchContext.tsx (1)

112-120: Facets still sourced from live package search — intentional?

Dataset results come from cache, but facets are still taken from packageSearchResults. If that’s intended during PoC, fine; otherwise consider deriving facets from the cached corpus for consistency.
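If facets were instead derived from the cached corpus, a rough sketch could look like the following. The tags and organization field names are assumptions about the dataset shape, not taken from the PR.

// Hypothetical facet derivation over the cached corpus.
type CachedDataset = {
  tags?: { name: string }[]; // assumed shape
  organization?: { name: string }; // assumed shape
};

type FacetBucket = { name: string; count: number };

function tally(values: string[]): FacetBucket[] {
  const counts = new Map<string, number>();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return [...counts.entries()]
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => b.count - a.count);
}

export function buildFacets(datasets: CachedDataset[]) {
  return {
    tags: tally(datasets.flatMap((d) => d.tags?.map((t) => t.name) ?? [])),
    organizations: tally(
      datasets.map((d) => d.organization?.name).filter((n): n is string => Boolean(n))
    ),
  };
}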

Comment on lines +16 to +22
const pageDatasets = await searchDatasets({
  limit,
  offset: limit * page,
  groups: [],
  orgs: [],
  tags: [],
});

⚠️ Potential issue

Constrain cached corpus to datasets

Without type: "dataset", the cache may include visualizations. Adding it aligns with the UI, which uses the cache only for dataset searches.

Apply this diff:

       const pageDatasets = await searchDatasets({
         limit,
         offset: limit * page,
         groups: [],
         orgs: [],
         tags: [],
+        type: "dataset",
       });
🤖 Prompt for AI Agents
In lib/data.ts around lines 16 to 22, the searchDatasets call is missing a type
filter so the cached corpus can include non-dataset results (like
visualizations); add type: "dataset" to the argument object passed to
searchDatasets so the cache only stores datasets, e.g. include type: "dataset"
alongside limit, offset, groups, orgs, and tags to constrain results to
datasets.

Comment on lines +39 to +48
export const searchOptionsSchema = z.object({
  limit: z
    .preprocess((x) => Number(x), z.number().min(0).max(25))
    .optional()
    .default(10),
  page: z
    .preprocess((x) => Number(x), z.number().min(1))
    .optional()
    .default(1),
});

⚠️ Potential issue

Zod preprocess breaks defaults; missing params parse as NaN

Number(undefined) yields NaN, so .optional().default(...) won’t apply. Also prefer min(1) for limit.

Apply this diff:

 export const searchOptionsSchema = z.object({
-  limit: z
-    .preprocess((x) => Number(x), z.number().min(0).max(25))
-    .optional()
-    .default(10),
-  page: z
-    .preprocess((x) => Number(x), z.number().min(1))
-    .optional()
-    .default(1),
+  limit: z
+    .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1).max(25))
+    .optional()
+    .default(10),
+  page: z
+    .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1))
+    .optional()
+    .default(1),
 });
🤖 Prompt for AI Agents
In lib/data.ts around lines 39 to 48, the preprocess currently does Number(x)
which turns undefined into NaN so the .default() never applies; change the
preprocess to return undefined when input is undefined (e.g. preprocess: x => x
=== undefined ? undefined : Number(x)) so Zod can apply the optional default,
and update limit’s min constraint from 0 to 1 (keep max 25) so limit must be at
least 1.

Comment on lines +9 to +19
if (req.method === "GET") {
  try {
    const validatedOptions = searchOptionsSchema.parse(req.query);
    const results = await searchCachedDatasets(validatedOptions);

    res.status(200).json(results);
  } catch (e) {
    if (e instanceof ZodError) {
      res.status(400).json({ message: "Validation Error", errors: e.issues });
    }
  }

⚠️ Potential issue

Add 500 fallback and return after 400

Non‑Zod errors lead to a hanging request. Also return after sending 400 to avoid double responses.

Apply this diff:

   if (req.method === "GET") {
     try {
       const validatedOptions = searchOptionsSchema.parse(req.query);
       const results = await searchCachedDatasets(validatedOptions);

       res.status(200).json(results);
     } catch (e) {
       if (e instanceof ZodError) {
-        res.status(400).json({ message: "Validation Error", errors: e.issues });
+        return res
+          .status(400)
+          .json({ message: "Validation Error", errors: e.issues });
       }
+      console.error("Cached datasets search failed:", e);
+      return res.status(500).json({ message: "Internal Server Error" });
     }
   } else {
🤖 Prompt for AI Agents
In pages/api/datasets/search.tsx around lines 9 to 19, the try/catch only
handles ZodError and does not return after sending the 400 response nor handle
other exceptions, which can hang the request or cause double responses; modify
the catch to return immediately after res.status(400).json(...) for Zod errors,
and add an else branch that logs the error and responds with
res.status(500).json({ message: "Internal Server Error" }) for non‑Zod
exceptions so every error path sends a single response.
