
Conversation

@demenech demenech (Member) commented Sep 24, 2025

Summary by CodeRabbit

  • New Features
    • Faster dataset search for non-visualization queries via caching, improving load times and responsiveness.
    • Paginated dataset results with total counts to support UI pagination (page and limit).
    • Validated query handling for dataset search requests with clear error responses on invalid input.
  • Refactor
    • Dataset list now sources items from the new cached search results for improved consistency across the app.

@coderabbitai coderabbitai bot (Contributor) commented Sep 24, 2025

Walkthrough

Adds a cached dataset search pipeline: introduces a server-side cache and pagination in lib/data.ts, a GET API endpoint at /api/datasets/search, updates SearchContext to use the cached endpoint for non-visualization searches, and adjusts ListOfDatasets to render from results.
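For orientation, here is a minimal sketch of what the cached pipeline described above could look like. The function names (getCachedDatasets, searchCachedDatasets, searchDatasets) come from this walkthrough; the return shape of searchDatasets and the import paths are assumptions, not the exact PR code.

// lib/data.ts (illustrative sketch only; the PR's actual implementation may differ)
import { unstable_cache } from "next/cache";
import { searchDatasets } from "./queries/dataset"; // path taken from the code graph later in the review
import type { Dataset } from "../schemas/dataset.interface"; // path assumed

// Walk the upstream paginated search and memoize the full corpus server-side.
export const getCachedDatasets = unstable_cache(
  async (): Promise<Dataset[]> => {
    const limit = 10;
    const all: Dataset[] = [];
    for (let page = 0; ; page++) {
      const batch = await searchDatasets({
        limit,
        offset: limit * page,
        groups: [],
        orgs: [],
        tags: [],
      });
      // Assumes the upstream response exposes a `datasets` array.
      all.push(...batch.datasets);
      if (batch.datasets.length < limit) break; // last page reached
    }
    return all;
  },
  ["cached-datasets"],
  { revalidate: false } // see the revalidation discussion in the review below
);

// Slice the cached corpus into one page and report the total count for pagination.
export async function searchCachedDatasets({ page, limit }: { page: number; limit: number }) {
  const all = await getCachedDatasets();
  const start = (page - 1) * limit;
  return { results: all.slice(start, start + limit), count: all.length };
}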

Changes

Cohort / File(s) Summary
Dataset list rendering
components/dataset/search/ListOfDatasets.tsx
Switches iteration source from searchResults?.datasets to searchResults?.results when rendering DatasetItem.
Search context wiring
components/dataset/search/SearchContext.tsx
Adds SWR fetcher to /api/datasets/search; for non-visualization type, replaces packageSearchResults and loading flags with cached datasets/results. Notes deprecation of original call.
Caching and pagination logic
lib/data.ts
Adds getCachedDatasets using unstable_cache to aggregate all datasets via pagination; introduces Zod searchOptionsSchema, SearchOptions type, and searchCachedDatasets to paginate cached results and return { results, count }.
API endpoint
pages/api/datasets/search.tsx
New Next.js API route handling GET; validates query with searchOptionsSchema; returns paginated cached datasets or 400 on validation error; 405 for non-GET.
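To make the endpoint's contract concrete, a consumer would call it roughly like this. The helper name fetchDatasetPage is hypothetical; the { results, count } response shape and the page/limit query parameters follow the descriptions above.

import type { Dataset } from "../schemas/dataset.interface"; // path assumed

// Hypothetical helper illustrating the request/response contract of /api/datasets/search.
async function fetchDatasetPage(page = 1, limit = 10) {
  const res = await fetch(`/api/datasets/search?page=${page}&limit=${limit}`);
  if (!res.ok) throw new Error(`Dataset search failed with status ${res.status}`);
  return (await res.json()) as { results: Dataset[]; count: number };
}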

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant UI as UI (ListOfDatasets)
  participant Ctx as SearchContext
  participant API as /api/datasets/search (Next.js)
  participant Data as lib/data.ts
  participant Cache as getCachedDatasets (unstable_cache)
  participant Source as searchDatasets (external)

  UI->>Ctx: Trigger search (non-visualization)
  Ctx->>API: GET /api/datasets/search?page&limit
  API->>Data: searchCachedDatasets(options)
  Data->>Cache: getCachedDatasets()
  alt Cache miss (first run or revalidation)
    Cache->>Source: searchDatasets(page=1..n, limit=10)
    Source-->>Cache: batched datasets
    Cache-->>Data: allDatasets[]
  else Cache hit
    Cache-->>Data: allDatasets[]
  end
  Data-->>API: { results (paged), count }
  API-->>Ctx: 200 OK JSON
  Ctx-->>UI: searchResults.results rendered

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • anuveyatsu
  • willy1989cv
  • luccasmmg

Poem

I cached a field of datasets bright,
In burrows deep, I stashed them tight.
A hop to API, a nibble of page,
Results pop out—precise, not vague.
With whisker-twitch and tidy logs,
The search now sprints—no lag, no fogs. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title "[DO NOT MERGE]: Datasets cache PoC" clearly and concisely summarizes the primary change (a proof of concept for a datasets caching implementation) and matches the diffs (new cache module, API route, and search context updates); it is short and specific enough for a reviewer to understand the main intent. The "[DO NOT MERGE]" prefix is noisy but acceptable for a PoC branch.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch poc/fully-cached-data

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@vercel vercel bot commented Sep 24, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

  • Project: portaljs-cloud-frontend-template | Deployment: Ready | Preview: Ready (Preview) | Comments: Comment | Updated (UTC): Sep 24, 2025 0:35am

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/dataset/search/ListOfDatasets.tsx (2)

204-214: Unreachable ResultsNotFound; zero-results UX is broken

return <ResultsNotFound /> comes after an unconditional return in the same branch, so it never runs; when there are zero results, nothing renders instead of the empty state.

Apply this diff:

   if (isLoading) return null;

-  if (count > 0) {
-    return (
-      <Pagination
-        subsetOfPages={subsetOfPages}
-        setSubsetOfPages={setSubsetOfPages}
-        count={count}
-      />
-    );
-
-    return <ResultsNotFound />;
-  }
+  if (!count) {
+    return <ResultsNotFound />;
+  }
+
+  return (
+    <Pagination
+      subsetOfPages={subsetOfPages}
+      setSubsetOfPages={setSubsetOfPages}
+      count={count}
+    />
+  );
 
   // make a pagination component once insights are added
-  return null;
+  return null;

248-251: Typo in CTA: “Clear fitlers” → “Clear filters”

User-facing copy.

Apply this diff:

-          Clear fitlers
+          Clear filters
🧹 Nitpick comments (4)
components/dataset/search/SearchContext.tsx (1)

81-93: Stabilize SWR key and avoid passing unused object to fetcher

Use a descriptive, primitive-only key to improve cache hits and avoid accidental re-fetching. Also, only include query if it’s supported server-side.

Apply this diff:

-  const { data: cachedDatasets, isValidating: isLoadingCachedDatasets } =
-    useSWR([packagesOptions], async (options) => {
-      const searchParams = new URLSearchParams();
-      searchParams.set("limit", String(options.limit));
-      const page = Math.floor(options.offset ?? 0 / options.limit) + 1;
-      searchParams.set("page", String(page));
-      searchParams.set("query", String(options.query));
-      const datasets = await fetch(
-        `/api/datasets/search?${searchParams.toString()}`
-      );
-      const data = await datasets.json();
-      return data;
-    });
+  const { data: cachedDatasets, isValidating: isLoadingCachedDatasets } =
+    useSWR(
+      [
+        "cached_datasets",
+        Math.floor((packagesOptions.offset ?? 0) / packagesOptions.limit) + 1,
+        packagesOptions.limit,
+        packagesOptions.query
+      ],
+      async (_key, page, limit, query) => {
+        const searchParams = new URLSearchParams();
+        searchParams.set("limit", String(limit));
+        searchParams.set("page", String(page));
+        // NOTE: include `query` only if backend supports it; otherwise omit.
+        // searchParams.set("query", String(query));
+        const res = await fetch(`/api/datasets/search?${searchParams.toString()}`);
+        return res.json();
+      }
+    );
lib/data.ts (3)

35-36: revalidate: false produces permanently stale cache

For any long‑lived instance, consider time‑based revalidation or tags for on‑demand revalidation.

Option A (time-based):

-    revalidate: false, // TODO: what happens if the UI triggers a time-based revalidation?
+    revalidate: 3600, // 1h; tune as needed for freshness vs. load

Option B (tags): add tags: ["cached-datasets"] and trigger revalidateTag("cached-datasets") on admin actions.
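A minimal sketch of Option B, assuming the existing aggregation logic is factored out as a plain async function (fetchAllDatasets is a hypothetical name standing in for it):

import { unstable_cache, revalidateTag } from "next/cache";
import type { Dataset } from "../schemas/dataset.interface"; // path assumed

declare function fetchAllDatasets(): Promise<Dataset[]>; // stands in for the existing loader

export const getCachedDatasets = unstable_cache(fetchAllDatasets, ["cached-datasets"], {
  tags: ["cached-datasets"], // no time-based revalidate; invalidated on demand instead
});

// Call from a server action, route handler, or admin webhook whenever datasets change upstream.
export async function invalidateDatasetsCache() {
  revalidateTag("cached-datasets");
}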


58-68: Support query (even a simple contains) to match UI intent

The UI sends query, but schema ignores it and results aren’t filtered. For the PoC, a simple title/name contains match is enough.

Apply this diff:

-export async function searchCachedDatasets(options: SearchOptions) {
-  const { page, limit } = options;
+export async function searchCachedDatasets(options: SearchOptions & { query?: string }) {
+  const { page, limit, query } = options;
   const allDatasets = await getCachedDatasets();
-  const filteredDatasets = allDatasets;
+  const q = (query ?? "").trim().toLowerCase();
+  const filteredDatasets = q
+    ? allDatasets.filter((d) =>
+        ((d.title ?? d.name ?? "") as string).toLowerCase().includes(q)
+      )
+    : allDatasets;
   // NOTE: maybe https://github.com/itemsapi/itemsjs instead of minisearch ?

   const startIdx = (page - 1) * limit;
   const endIdx = startIdx + limit;
   const paginatedDatasets = filteredDatasets.slice(startIdx, endIdx);
   return { results: paginatedDatasets, count: filteredDatasets.length };
 }

And extend the schema to accept query:

 export const searchOptionsSchema = z.object({
   limit: z
     .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1).max(25))
     .optional()
     .default(10),
   page: z
     .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1))
     .optional()
     .default(1),
+  query: z.string().optional().default(""),
 });

11-12: Avoid noisy console logs in server code

Use a logger or remove before merge.

-    console.log("Revalidating datasets cache: ", new Date().getTime());
+    // log: revalidating datasets cache
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad3a3bc and 1d8d997.

📒 Files selected for processing (4)
  • components/dataset/search/ListOfDatasets.tsx (1 hunks)
  • components/dataset/search/SearchContext.tsx (3 hunks)
  • lib/data.ts (1 hunks)
  • pages/api/datasets/search.tsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/data.ts (2)
schemas/dataset.interface.ts (1)
  • Dataset (4-39)
lib/queries/dataset.ts (1)
  • searchDatasets (13-97)
pages/api/datasets/search.tsx (1)
lib/data.ts (2)
  • searchOptionsSchema (39-48)
  • searchCachedDatasets (58-68)
🔇 Additional comments (2)
components/dataset/search/ListOfDatasets.tsx (1)

54-56: Switch to searchResults.results aligns with new API

Rendering now matches the new { results, count } shape. Looks good.

components/dataset/search/SearchContext.tsx (1)

112-120: Facets still sourced from live package search — intentional?

Dataset results come from cache, but facets are still taken from packageSearchResults. If that’s intended during PoC, fine; otherwise consider deriving facets from the cached corpus for consistency.
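If facets were instead derived from the cached corpus, a rough sketch could look like the following. The tags and organization field names are assumptions about the dataset shape, not taken from the PR.

// Hypothetical facet derivation over the cached corpus.
type CachedDataset = {
  tags?: { name: string }[]; // assumed shape
  organization?: { name: string }; // assumed shape
};

type FacetBucket = { name: string; count: number };

function tally(values: string[]): FacetBucket[] {
  const counts = new Map<string, number>();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return [...counts.entries()]
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => b.count - a.count);
}

export function buildFacets(datasets: CachedDataset[]) {
  return {
    tags: tally(datasets.flatMap((d) => d.tags?.map((t) => t.name) ?? [])),
    organizations: tally(
      datasets.map((d) => d.organization?.name).filter((n): n is string => Boolean(n))
    ),
  };
}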

Comment on lines +16 to +22
const pageDatasets = await searchDatasets({
  limit,
  offset: limit * page,
  groups: [],
  orgs: [],
  tags: [],
});

⚠️ Potential issue

Constrain cached corpus to datasets

Without type: "dataset", the cache may include visualizations. Adding it aligns with the UI, which uses the cache only for dataset searches.

Apply this diff:

       const pageDatasets = await searchDatasets({
         limit,
         offset: limit * page,
         groups: [],
         orgs: [],
         tags: [],
+        type: "dataset",
       });
🤖 Prompt for AI Agents
In lib/data.ts around lines 16 to 22, the searchDatasets call is missing a type
filter so the cached corpus can include non-dataset results (like
visualizations); add type: "dataset" to the argument object passed to
searchDatasets so the cache only stores datasets, e.g. include type: "dataset"
alongside limit, offset, groups, orgs, and tags to constrain results to
datasets.

Comment on lines +39 to +48
export const searchOptionsSchema = z.object({
  limit: z
    .preprocess((x) => Number(x), z.number().min(0).max(25))
    .optional()
    .default(10),
  page: z
    .preprocess((x) => Number(x), z.number().min(1))
    .optional()
    .default(1),
});

⚠️ Potential issue

Zod preprocess breaks defaults; missing params parse as NaN

Number(undefined) yields NaN, so .optional().default(...) won’t apply. Also prefer min(1) for limit.

Apply this diff:

 export const searchOptionsSchema = z.object({
-  limit: z
-    .preprocess((x) => Number(x), z.number().min(0).max(25))
-    .optional()
-    .default(10),
-  page: z
-    .preprocess((x) => Number(x), z.number().min(1))
-    .optional()
-    .default(1),
+  limit: z
+    .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1).max(25))
+    .optional()
+    .default(10),
+  page: z
+    .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1))
+    .optional()
+    .default(1),
 });
🤖 Prompt for AI Agents
In lib/data.ts around lines 39 to 48, the preprocess currently does Number(x)
which turns undefined into NaN so the .default() never applies; change the
preprocess to return undefined when input is undefined (e.g. preprocess: x => x
=== undefined ? undefined : Number(x)) so Zod can apply the optional default,
and update limit’s min constraint from 0 to 1 (keep max 25) so limit must be at
least 1.

Comment on lines +9 to +19
if (req.method === "GET") {
  try {
    const validatedOptions = searchOptionsSchema.parse(req.query);
    const results = await searchCachedDatasets(validatedOptions);

    res.status(200).json(results);
  } catch (e) {
    if (e instanceof ZodError) {
      res.status(400).json({ message: "Validation Error", errors: e.issues });
    }
  }

⚠️ Potential issue

Add 500 fallback and return after 400

Non‑Zod errors lead to a hanging request. Also return after sending 400 to avoid double responses.

Apply this diff:

   if (req.method === "GET") {
     try {
       const validatedOptions = searchOptionsSchema.parse(req.query);
       const results = await searchCachedDatasets(validatedOptions);

       res.status(200).json(results);
     } catch (e) {
       if (e instanceof ZodError) {
-        res.status(400).json({ message: "Validation Error", errors: e.issues });
+        return res
+          .status(400)
+          .json({ message: "Validation Error", errors: e.issues });
       }
+      console.error("Cached datasets search failed:", e);
+      return res.status(500).json({ message: "Internal Server Error" });
     }
   } else {
🤖 Prompt for AI Agents
In pages/api/datasets/search.tsx around lines 9 to 19, the try/catch only
handles ZodError and does not return after sending the 400 response nor handle
other exceptions, which can hang the request or cause double responses; modify
the catch to return immediately after res.status(400).json(...) for Zod errors,
and add an else branch that logs the error and responds with
res.status(500).json({ message: "Internal Server Error" }) for non‑Zod
exceptions so every error path sends a single response.
