[DO NOT MERGE]: Datasets cache PoC #21
Conversation
Walkthrough

Adds a cached dataset search pipeline: introduces a server-side cache and pagination in `lib/data.ts`, a GET API endpoint at `/api/datasets/search`, updates `SearchContext` to use the cached endpoint for non-visualization searches, and adjusts `ListOfDatasets` to render from `results`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant UI as UI (ListOfDatasets)
    participant Ctx as SearchContext
    participant API as /api/datasets/search (Next.js)
    participant Data as lib/data.ts
    participant Cache as getCachedDatasets (unstable_cache)
    participant Source as searchDatasets (external)
    UI->>Ctx: Trigger search (non-visualization)
    Ctx->>API: GET /api/datasets/search?page&limit
    API->>Data: searchCachedDatasets(options)
    Data->>Cache: getCachedDatasets()
    alt Cache miss (first run or revalidate)
        Cache->>Source: searchDatasets(page=1..n, limit=10)
        Source-->>Cache: batched datasets
        Cache-->>Data: allDatasets[]
    else Cache hit
        Cache-->>Data: allDatasets[]
    end
    Data-->>API: { results (paged), count }
    API-->>Ctx: 200 OK JSON
    Ctx-->>UI: searchResults.results rendered
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/dataset/search/ListOfDatasets.tsx (2)
204-214: Unreachable `ResultsNotFound`; zero-results UX is broken
`return <ResultsNotFound />` is after a `return` inside the same branch, so it never runs and zero results render nothing instead of the empty state. Apply this diff:
```diff
   if (isLoading) return null;
-  if (count > 0) {
-    return (
-      <Pagination
-        subsetOfPages={subsetOfPages}
-        setSubsetOfPages={setSubsetOfPages}
-        count={count}
-      />
-    );
-
-    return <ResultsNotFound />;
-  }
+  if (!count) {
+    return <ResultsNotFound />;
+  }
+
+  return (
+    <Pagination
+      subsetOfPages={subsetOfPages}
+      setSubsetOfPages={setSubsetOfPages}
+      count={count}
+    />
+  );
   // make a pagination component once insights are added
   return null;
```
248-251: Typo in CTA: "Clear fitlers" → "Clear filters"
User-facing copy. Apply this diff:

```diff
- Clear fitlers
+ Clear filters
```
🧹 Nitpick comments (4)
components/dataset/search/SearchContext.tsx (1)
81-93: Stabilize the SWR key and avoid passing an unused object to the fetcher
Use a descriptive, primitive-only key to improve cache hits and avoid accidental re-fetching. Also, only include `query` if it's supported server-side.
Apply this diff:
```diff
-  const { data: cachedDatasets, isValidating: isLoadingCachedDatasets } =
-    useSWR([packagesOptions], async (options) => {
-      const searchParams = new URLSearchParams();
-      searchParams.set("limit", String(options.limit));
-      const page = Math.floor(options.offset ?? 0 / options.limit) + 1;
-      searchParams.set("page", String(page));
-      searchParams.set("query", String(options.query));
-      const datasets = await fetch(
-        `/api/datasets/search?${searchParams.toString()}`
-      );
-      const data = await datasets.json();
-      return data;
-    });
+  const { data: cachedDatasets, isValidating: isLoadingCachedDatasets } =
+    useSWR(
+      [
+        "cached_datasets",
+        Math.floor((options.offset ?? 0) / options.limit) + 1,
+        options.limit,
+        options.query,
+      ],
+      async (_key, page, limit, query) => {
+        const searchParams = new URLSearchParams();
+        searchParams.set("limit", String(limit));
+        searchParams.set("page", String(page));
+        // NOTE: include `query` only if backend supports it; otherwise omit.
+        // searchParams.set("query", String(query));
+        const res = await fetch(`/api/datasets/search?${searchParams.toString()}`);
+        return res.json();
+      }
+    );
```

lib/data.ts (3)
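The page calculation fixed above hides an operator-precedence pitfall: `/` binds tighter than `??`, so `offset ?? 0 / limit` divides the fallback, not the offset. A minimal standalone sketch (function names are illustrative, not from the PR):

```typescript
// `offset ?? 0 / limit` parses as `offset ?? (0 / limit)`, so a defined
// offset is returned untouched and never divided by the page size.
function buggyPage(offset: number | undefined, limit: number): number {
  return Math.floor(offset ?? 0 / limit) + 1;
}

// Parenthesizing the fallback restores the intended 1-based page number.
function fixedPage(offset: number | undefined, limit: number): number {
  return Math.floor((offset ?? 0) / limit) + 1;
}

console.log(buggyPage(10, 10)); // 11 — offset never divided by limit
console.log(fixedPage(10, 10)); // 2 — second page, as intended
console.log(fixedPage(undefined, 10)); // 1
```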
35-36: `revalidate: false` produces a permanently stale cache
For any long-lived instance, consider time-based revalidation or tags for on-demand revalidation.

Option A (time-based):

```diff
-    revalidate: false, // TODO: what happens if the UI triggers a time-based revalidation?
+    revalidate: 3600, // 1h; tune as needed for freshness vs. load
```

Option B (tags): add `tags: ["cached-datasets"]` and trigger `revalidateTag("cached-datasets")` on admin actions.
58-68: Support `query` (even a simple contains match) to match UI intent
The UI sends `query`, but the schema ignores it and results aren't filtered. For the PoC, a simple title/name contains match is enough. Apply this diff:

```diff
-export async function searchCachedDatasets(options: SearchOptions) {
-  const { page, limit } = options;
+export async function searchCachedDatasets(options: SearchOptions & { query?: string }) {
+  const { page, limit, query } = options;
   const allDatasets = await getCachedDatasets();
-  const filteredDatasets = allDatasets;
+  const q = (query ?? "").trim().toLowerCase();
+  const filteredDatasets = q
+    ? allDatasets.filter((d) =>
+        ((d.title ?? d.name ?? "") as string).toLowerCase().includes(q)
+      )
+    : allDatasets;
   // NOTE: maybe https://github.com/itemsapi/itemsjs instead of minisearch ?
   const startIdx = (page - 1) * limit;
   const endIdx = startIdx + limit;
   const paginatedDatasets = filteredDatasets.slice(startIdx, endIdx);
   return { results: paginatedDatasets, count: filteredDatasets.length };
 }
```

And extend the schema to accept `query`:

```diff
 export const searchOptionsSchema = z.object({
   limit: z
     .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1).max(25))
     .optional()
     .default(10),
   page: z
     .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1))
     .optional()
     .default(1),
+  query: z.string().optional().default(""),
 });
```
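The filter-then-paginate flow suggested above is plain array slicing; a minimal standalone sketch (the `Item` shape is hypothetical, the real code uses the Dataset interface):

```typescript
interface Item { title: string; }

// Filter the in-memory corpus first, then slice out one 1-based page.
// `count` reflects the filtered total so pagination controls stay correct.
function searchInMemory(all: Item[], page: number, limit: number, query = "") {
  const q = query.trim().toLowerCase();
  const filtered = q ? all.filter((d) => d.title.toLowerCase().includes(q)) : all;
  const startIdx = (page - 1) * limit;
  return { results: filtered.slice(startIdx, startIdx + limit), count: filtered.length };
}

const corpus = Array.from({ length: 23 }, (_, i) => ({ title: `dataset-${i}` }));
console.log(searchInMemory(corpus, 3, 10).results.length); // 3 — last partial page
console.log(searchInMemory(corpus, 1, 10, "dataset-1").count); // 11 — "dataset-1" and "dataset-10".."dataset-19"
```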
11-12: Avoid noisy console logs in server code
Use a logger or remove before merge.

```diff
- console.log("Revalidating datasets cache: ", new Date().getTime());
+ // log: revalidating datasets cache
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- components/dataset/search/ListOfDatasets.tsx (1 hunks)
- components/dataset/search/SearchContext.tsx (3 hunks)
- lib/data.ts (1 hunks)
- pages/api/datasets/search.tsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/data.ts (2)
- schemas/dataset.interface.ts (1): Dataset (4-39)
- lib/queries/dataset.ts (1): searchDatasets (13-97)

pages/api/datasets/search.tsx (1)
- lib/data.ts (2): searchOptionsSchema (39-48), searchCachedDatasets (58-68)
🔇 Additional comments (2)
components/dataset/search/ListOfDatasets.tsx (1)
54-56: Switch to `searchResults.results` aligns with the new API
Rendering now matches the new `{ results, count }` shape. Looks good.

components/dataset/search/SearchContext.tsx (1)

112-120: Facets still sourced from live package search: intentional?
Dataset results come from the cache, but facets are still taken from `packageSearchResults`. If that's intended during the PoC, fine; otherwise consider deriving facets from the cached corpus for consistency.
```ts
const pageDatasets = await searchDatasets({
  limit,
  offset: limit * page,
  groups: [],
  orgs: [],
  tags: [],
});
```
Constrain cached corpus to datasets
Without type: "dataset", the cache may include visualizations. Aligns with UI using cache only for dataset searches.
Apply this diff:
```diff
   const pageDatasets = await searchDatasets({
     limit,
     offset: limit * page,
     groups: [],
     orgs: [],
     tags: [],
+    type: "dataset",
   });
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const pageDatasets = await searchDatasets({
  limit,
  offset: limit * page,
  groups: [],
  orgs: [],
  tags: [],
  type: "dataset",
});
```
🤖 Prompt for AI Agents
In lib/data.ts around lines 16 to 22, the searchDatasets call is missing a type
filter so the cached corpus can include non-dataset results (like
visualizations); add type: "dataset" to the argument object passed to
searchDatasets so the cache only stores datasets, e.g. include type: "dataset"
alongside limit, offset, groups, orgs, and tags to constrain results to
datasets.
```ts
export const searchOptionsSchema = z.object({
  limit: z
    .preprocess((x) => Number(x), z.number().min(0).max(25))
    .optional()
    .default(10),
  page: z
    .preprocess((x) => Number(x), z.number().min(1))
    .optional()
    .default(1),
});
```
Zod preprocess breaks defaults; missing params parse as NaN
Number(undefined) yields NaN, so .optional().default(...) won’t apply. Also prefer min(1) for limit.
Apply this diff:
```diff
 export const searchOptionsSchema = z.object({
-  limit: z
-    .preprocess((x) => Number(x), z.number().min(0).max(25))
-    .optional()
-    .default(10),
-  page: z
-    .preprocess((x) => Number(x), z.number().min(1))
-    .optional()
-    .default(1),
+  limit: z
+    .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1).max(25))
+    .optional()
+    .default(10),
+  page: z
+    .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1))
+    .optional()
+    .default(1),
 });
```

📝 Committable suggestion
```ts
export const searchOptionsSchema = z.object({
  limit: z
    .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1).max(25))
    .optional()
    .default(10),
  page: z
    .preprocess((x) => (x === undefined || x === "" ? undefined : Number(x)), z.number().int().min(1))
    .optional()
    .default(1),
});
```
🤖 Prompt for AI Agents
In lib/data.ts around lines 39 to 48, the preprocess currently does Number(x)
which turns undefined into NaN so the .default() never applies; change the
preprocess to return undefined when input is undefined (e.g. preprocess: x => x
=== undefined ? undefined : Number(x)) so Zod can apply the optional default,
and update limit’s min constraint from 0 to 1 (keep max 25) so limit must be at
least 1.
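The NaN pitfall is easy to reproduce outside zod; a minimal sketch of the guard the review proposes, using plain functions with no zod dependency (names are illustrative):

```typescript
// Number(undefined) is NaN, so a downstream "use default when missing" check
// that looks for undefined never fires. Returning undefined for absent input
// lets the default apply, mirroring the suggested preprocess fix.
const coerceBuggy = (x: unknown) => Number(x);
const coerceFixed = (x: unknown) =>
  x === undefined || x === "" ? undefined : Number(x);

// Stand-in for zod's .default(): applies only when the value is nullish.
const withDefault = (v: number | undefined, dflt: number) => v ?? dflt;

console.log(coerceBuggy(undefined)); // NaN — the default can no longer apply
console.log(withDefault(coerceFixed(undefined), 10)); // 10
console.log(withDefault(coerceFixed("3"), 10)); // 3
```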
```ts
if (req.method === "GET") {
  try {
    const validatedOptions = searchOptionsSchema.parse(req.query);
    const results = await searchCachedDatasets(validatedOptions);

    res.status(200).json(results);
  } catch (e) {
    if (e instanceof ZodError) {
      res.status(400).json({ message: "Validation Error", errors: e.issues });
    }
  }
```
Add 500 fallback and return after 400
Non‑Zod errors lead to a hanging request. Also return after sending 400 to avoid double responses.
Apply this diff:
if (req.method === "GET") {
try {
const validatedOptions = searchOptionsSchema.parse(req.query);
const results = await searchCachedDatasets(validatedOptions);
res.status(200).json(results);
} catch (e) {
if (e instanceof ZodError) {
- res.status(400).json({ message: "Validation Error", errors: e.issues });
+ return res
+ .status(400)
+ .json({ message: "Validation Error", errors: e.issues });
}
+ console.error("Cached datasets search failed:", e);
+ return res.status(500).json({ message: "Internal Server Error" });
}
} else {📝 Committable suggestion
```ts
if (req.method === "GET") {
  try {
    const validatedOptions = searchOptionsSchema.parse(req.query);
    const results = await searchCachedDatasets(validatedOptions);
    res.status(200).json(results);
  } catch (e) {
    if (e instanceof ZodError) {
      return res
        .status(400)
        .json({ message: "Validation Error", errors: e.issues });
    }
    console.error("Cached datasets search failed:", e);
    return res.status(500).json({ message: "Internal Server Error" });
  }
} else {
```
🤖 Prompt for AI Agents
In pages/api/datasets/search.tsx around lines 9 to 19, the try/catch only
handles ZodError and does not return after sending the 400 response nor handle
other exceptions, which can hang the request or cause double responses; modify
the catch to return immediately after res.status(400).json(...) for Zod errors,
and add an else branch that logs the error and responds with
res.status(500).json({ message: "Internal Server Error" }) for non‑Zod
exceptions so every error path sends a single response.
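The single-response discipline can be exercised without Next.js; a sketch with a hypothetical mock `res` that counts sends (all names here are illustrative, not from the PR):

```typescript
// Mock of the small slice of the response API the handler uses.
// Tracks how many times a response was sent to surface double-send bugs.
class MockRes {
  sends = 0;
  code = 0;
  body: unknown = undefined;
  status(c: number) { this.code = c; return this; }
  json(b: unknown) { this.sends += 1; this.body = b; return this; }
}

class ValidationError extends Error {}

// Handler skeleton following the review: return after the 400, 500 fallback,
// so every path sends exactly one response.
function handle(res: MockRes, work: () => unknown) {
  try {
    return res.status(200).json(work());
  } catch (e) {
    if (e instanceof ValidationError) {
      return res.status(400).json({ message: "Validation Error" });
    }
    return res.status(500).json({ message: "Internal Server Error" });
  }
}

const ok = new MockRes();
handle(ok, () => ({ results: [], count: 0 }));
console.log(ok.code, ok.sends); // 200 1

const bad = new MockRes();
handle(bad, () => { throw new ValidationError("nope"); });
console.log(bad.code, bad.sends); // 400 1

const boom = new MockRes();
handle(boom, () => { throw new Error("db down"); });
console.log(boom.code, boom.sends); // 500 1
```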