-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Indexing repositories #286
Conversation
For future consideration, each of the bullet points in the PR description could be its own separate change request. This would vastly simplify reviews and shorten time for revisions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of great work. Can see that this will work as is, but I found a couple functional changes (in addition to code style things). Of importance:
- chunk the bulk data requests to ES to ensure that we have a maximum request size
- clean up the repository requests to paginated endpoints to prevent infinite loops
apps/server/src/utils/logger.ts
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the pretty printing in the dev console, but it can be a lot of noise in a log file in a production env where the colors arent being rendered. I'm curious to see how these logs look in a deployed environment and if we might want a flag to disable the pretty printing.
Might be nice to have some env log options for enabling pretty printing, logging as json, and choosing the log level, etc.
(Honestly we should standardize all this into a logger we can reuse across overture packages but thats another issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Development environment: Moved pretty printing to dev only (as recommended here).
- Production: standard JSON format with ISO date format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some cool work parsing arrays of repositories. May want to have a separate file for the repo env processing, its kind of messy mixed in with the other env stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created a new file to separate repo env processing as suggested
apps/server/src/config/provider.ts
Outdated
const getRepositoryConfig = ( | ||
repos: (z.infer<typeof lyricSchemaDefinition> | z.infer<typeof songSchemaDefinition>)[], | ||
): (LyricRepositoryConfig | SongRepositoryConfig)[] => { | ||
const lyricRepos = repos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare this as const lyricRepos: LyricRepositoryConfig[]
and then you won't need the type assertion on the map return. This will then validate if your return type is correct as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix type assertion as requested. thanks for the suggestion!
} catch (error) { | ||
logger.error(`Error fetching Lyric records on category '${categoryId}'`, error); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the request always fails we will get stuck in an infinite look because hasMoreData
is never updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed the code to throw an error to stop infinite loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeat comment re: infinite loop potential across multiple functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw an error to stop infinite loop
/** | ||
* Finds the repository by its repository code | ||
* @param repositories | ||
* @param repoCode | ||
* @returns | ||
*/ | ||
export const getRepoInformation = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description should clearly state that if the repository cannot be found this will return undefined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specified in comment when undefined
is returned
/** | ||
* Determines if passed configuration is appropiate for Lyric | ||
* @param object | ||
* @returns | ||
*/ | ||
export const isLyricConfiguration = ( | ||
object: LyricRepositoryConfig | SongRepositoryConfig, | ||
): object is LyricRepositoryConfig => { | ||
return object.type === 'LYRIC'; | ||
}; | ||
|
||
/** | ||
* Determines if passed configuration is appropiate for Song | ||
* @param object | ||
* @returns | ||
*/ | ||
export const isSongConfiguration = ( | ||
object: LyricRepositoryConfig | SongRepositoryConfig, | ||
): object is SongRepositoryConfig => { | ||
return object.type === 'SONG'; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since TS 5.5 you shouldn't need to declare the type assertion for this predicate, if this type narrowing is valid then TS will infer it on its own. (ie. remove object is songRepositoryConfig
.
Since this is just using differentiated union on object.type
you could probably replace these functions with a switch(repository.case)
in the code that uses these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed these 2 functions as they are no longer needed, replaced them with a switch case as suggested,Typescript handles automatically discriminated union to know each type
/** | ||
* Checks if a given value is an array of objects where each object has values that are either | ||
* strings, numbers, booleans, `null`, or nested objects | ||
* @param value | ||
* @returns | ||
*/ | ||
export const isArrayOfObjects = (value: unknown): value is Array<Record<string, DataRecordValue>> => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats the purpose of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is specifically used when we expect the value to be strictly an array of objects. For instance, when fetching records from Song/Lyric, we expect to retrieve an array of records, any other type is considered an invalid response.
738b19a
to
935f5ca
Compare
README.md
Outdated
- kafka | ||
- elastic search | ||
- other helper tools if you want like kafka rest proxy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- kafka | |
- elastic search | |
- other helper tools if you want like kafka rest proxy | |
- Kafka | |
- Elasticsearch | |
- other helper tools you may want, like a Kafka RESTful proxy |
const repoCount = getRepoCount(); | ||
|
||
// Loop through the repositories found | ||
for (let i = 0; i < repoCount; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please replace this with repoCount.forEach
to make things more declarative (and consistent with the rest of our codebases)
#ImperativeSucks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repoCount
is a number, so this may be a reasonable case for a forLoop. We would want to have a utility function to re-write this in a declarative way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right... and in my defence, coffee hadn't kicked in.
my argument against imperative/procedural code stands, we're meant to tell the system "what" to do, not "how"
README.md
Outdated
|
||
- **Maestro Common:** Designed to centralize common utilities, reusable functions, and TypeScript type definitions. | ||
|
||
- **Maestro Indexer Client:** Abstracts communication with Elasticsearch clients, supporting both version 7 and version 8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- **Maestro Indexer Client:** Abstracts communication with Elasticsearch clients, supporting both version 7 and version 8. | |
- **Maestro Indexer Client:** Abstracts communication with Elasticsearch clients, supporting both version 7 and 8. |
docs/usage.md
Outdated
- switch your Aliases if it still points to the old index | ||
- now that your data is migrated Maestro will be indexing based on the new mapping. | ||
- Create new index with the updated mapping | ||
- Reindex your data (you can point maestro to your new index and trigger indexing on full repositories to do that, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Reindex your data (you can point maestro to your new index and trigger indexing on full repositories to do that, | |
- Reindex your data (you can point Maestro to your new index and trigger indexing on full repositories to do that, |
docs/usage.md
Outdated
@@ -63,66 +62,67 @@ curl -X POST \ | |||
|
|||
## Kafka topics | |||
|
|||
Maestro can be configured as mentioned under the running configurations section to listen to kafka topics | |||
Maestro can be configured as mentioned under the running configurations section to listen to kafka topics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maestro can be configured as mentioned under the running configurations section to listen to kafka topics | |
Maestro can be configured as mentioned under the running configurations section to listen to Kafka topics |
* @returns | ||
*/ | ||
export const isArrayOfObjects = (value: unknown): value is Array<DataRecordNested> => { | ||
return ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious to see this implemented in vanilla code, instead of Zod (and I like that).
why the choice though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh didn't thought about using Zod for this, initially this function was checking only arrays types were added later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking this has improved a bit since I first read it, though I am concerned we haven't addressed pagination, or ES bulk request chunking. I see Result objects for some responses which is good.
I know you plan on chunking some of those changes into a separate PR adn thats fine. Since this is a major refactor and being worked on in a branch separate form the primary, I say go ahead and merge this when you feel its ready, and I'll continue to review future change requests.
apps/server/src/config/envConfig.ts
Outdated
MAESTRO_ELASTICSEARCH_VERSION: z.coerce | ||
.number() | ||
.int() | ||
.refine((val) => val === 7 || val === 8, { message: 'Version must be 7 or 8' }), // Ensure 7 or 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this just be a zod enum? Doing a refine to a limited set of values seems way trickier. We also would probably like to have a list of valid versions to use as a switch in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created a constant in common
package to be used anywhere in the monorepo
apps/server/src/config/envConfig.ts
Outdated
const loggerConfigSchema = z.object({ | ||
MAESTRO_LOGGING_LEVEL_ROOT: z.string().default('info'), | ||
MAESTRO_LOGGING_LEVEL: z.string().default('info'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are a set of valid logging levels, lets define those. zod enum use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added enum
with valid logger levels.
apps/server/src/config/provider.ts
Outdated
import { env } from './envConfig.js'; | ||
import { type lyricSchemaDefinition, repositoryTypes, type songSchemaDefinition } from './repositoryConfig.js'; | ||
|
||
setLogLevel(env.MAESTRO_LOGGING_LEVEL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is happening at a global import level, not inside the provider instance. Isn't the logger to use part of the dependencies passed to each provider instance? The log level should be set there based on the dependencies/config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved this setter function into the MaestroProviderConfig
object. Good catch!
* @returns an array of valid repositories | ||
* @throws an error if repository is invalid | ||
*/ | ||
export const validateRepositories = (env: NodeJS.ProcessEnv) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeJS.ProcessEnv
I didn't know of this type, interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's part of the typescript global types 😉
|
||
type SongRepositoryConfig = { | ||
type ValueOf<T> = T[keyof T]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a common utility type that you could export from its own file for global use. Typically I call this Values<T>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just call this file isEmpty.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if that is the case, this export in line 6 should become default
.
however, there is a case for this validation
to be a barrelled folder instead
utils
|-validation
|-index (exports centrally from all files in this folder)
|-isEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created a folder validation
to hold isEmpty.ts
* @param value The value to check. This can be of any type. | ||
* @returns `true` if the value is `null`, `undefined`, an empty object, an empty array or an empty string; otherwise, `false`. | ||
*/ | ||
export const isEmpty = (value: unknown) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok, is there a test for this?
[null, undefined, '', ' ', '0', true, false, {}, []].map((value)=>({value,isEmpty:isEmpty(value)}))
[
{ value: null, isEmpty: true },
{ value: undefined, isEmpty: true },
{ value: "", isEmpty: true },
{ value: " ", isEmpty: false },
{ value: "0", isEmpty: false },
{ value: true, isEmpty: false },
{ value: false, isEmpty: false },
{ value: {}, isEmpty: true },
{ value: [], isEmpty: true }
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this has a suite of unit tests
|
||
return { | ||
indexName: index, | ||
successful, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this get addressed?
packages/common/src/types/logger.ts
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inclined to suggest this could follow the standard "module" pattern, i.e. having this file named "types.ts" and placed within the "src/logger" folder. that pattern facilitates finding all code in their relative/contextual scopes. in this current pattern you have "buckets full of X", which may or not include the ones for this or that component or controller, or provider, or whatever.
let me explain myself: following what you did in logger, if i want to create a new button for example, and knowing me you know my button needs to look special, right? so it has all sorts of fancy styles and contains an emoji image in it to make it "cool!", so then I go create the "src/Button" folder and within I create the ts file for it (Button/Button.ts), then I'd go back out to src create an "images" folder where I will put all the images including my emoji for the button, then put the css file in a "src/styles" folder, and then put the types in this "src/types" folder here where all the types are?
later on, if we realize my button was silly and want to remove it from the codebase, I'll be chasing down files all over the repo instead of just deleting the one single folder and calling it a day. I don't know, this file/folder structure pattern seems scattered and hard to maintain, but I'm willing to be convinced otherwise.
@joneubank probably has a different opinion from mine?
packages/repository/README.md
Outdated
> [!NOTE] | ||
> This package is likely not the one you want to use in your project, it is primarily used as a dependency within the Maestro monorepo. [@overture-stack/maestro-provider](https://www.npmjs.com/package/@overture-stack/maestro-provider) is what you are most likely to want to use. | ||
|
||
This package is an internal component of the Maestro monorepo designed to manage interactions with data source repositories. It serves as the central interface for retrieving data from various repositories, such as **Song** and **Lyric**, ensuring a streamlined and consistent approach to data access. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This package is an internal component of the Maestro monorepo designed to manage interactions with data source repositories. It serves as the central interface for retrieving data from various repositories, such as **Song** and **Lyric**, ensuring a streamlined and consistent approach to data access. | |
This package is an internal component of the Maestro monorepo designed to manage interactions with data source repositories. It serves as the central interface for retrieving data from various repositories, such as **SONG** and **Lyric**, ensuring a streamlined and consistent approach to data access. |
"version": "5.0.0-alpha.1", | ||
"description": "A submodule of Maestro that manages interactions with data source repositories, providing a unified interface for efficient data access and management", | ||
"main": "dist/index.js", | ||
"typings": "dist/index.d.ts", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought this was supposed to be "types"? curious to understand what the difference is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can be both. ref: Note that the "typings" field is synonymous with "types", and could be used as well.
Closing this PR as most of the feeback was resolved, the rest of work will be done on a refactoring PR to reduce complexity. |
Summary:
Song
orLyric
. New environment variables were added to facilitate repository configuration.Lyric
as data source repository for indexing the entire repository by category (all studies/organizations), a specific study/organization, or an individual document by it's unique ID. Issue Maestro V5 - Lyric integration #283docs/
folder according new implementations.Related issues: