Re-structure Capella Bucket=>Scope=>Collection configuration #379
It implies that the metadata is moved from the METAR collection to the COMMON
collection, and that the METAR collection will only have type "DD" documents
(the same for the RAOB collection). This will require code changes to ingest,
the metadata scripts, and the client.
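The placement rule described above can be sketched as a tiny routing function. This is a minimal illustration, not the ingest code: it assumes each document carries `type` and `subset` fields, and that every non-"DD" document (MD, but also DF and JOB, which is an assumption) goes to COMMON. The name `collection_for` is hypothetical.

```python
def collection_for(doc: dict) -> str:
    """Return the target collection for a document under the proposed layout.

    Hypothetical helper: "DD" (data) documents stay in the collection named
    after their subset (e.g. METAR, RAOB); everything else, including "MD"
    metadata, is assumed to go to COMMON.
    """
    if doc.get("type") == "DD":
        return doc["subset"]
    return "COMMON"


if __name__ == "__main__":
    print(collection_for({"type": "DD", "subset": "METAR"}))  # METAR
    print(collection_for({"type": "MD", "subset": "METAR"}))  # COMMON
```

Keeping the rule in one function means the ingest, metadata scripts, and client could share a single definition of where each document type lives.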
randy
On Wed, May 29, 2024 at 11:06 AM Gopa ***@***.***> wrote:
No change to the bucket; 3 scopes (development, integration, production) and
2 collections under each, currently just METAR and COMMON.
--
Randy Pierce
From a quick search, a scope cannot be renamed after it is created. I have sent an email to Couchbase ...
A couple of other questions:
To summarize the discussion from the dev meeting: We decided we need to move this issue up and address how best to use collections, scopes, and buckets for our project & application. We would like to come up with some use cases & whiteboard through how key parts of the application lifecycle would work with different data models. Ideally this would happen during the ingest meeting. During the meeting we
Information needed
Context
Couchbase Server 7 (released in 2021) introduced Scopes & Collections. Previously it was recommended to put all data in a “Bucket” and distinguish the documents with a
This link explains Collections and Scopes. Just noting down some salient points below:
A collection is a data container. A scope is a mechanism for grouping multiple collections. Up to 1000 scopes can be created per cluster.
Benefits of Scopes and Collections:
- The logical grouping of similar documents, potentially simplifying operations such as query, XDCR, and backup and restore.
- The increased efficiency of indexing, due to the Data Service being able to provide documents from specific collections to the Index Service.
- Simplified querying, since query statements are able to easily specify particular subsets of documents.
- Easier migration from relational databases to Couchbase Server, since collections can be designed to correspond to pre-existing relational tables.
- Secure isolation of different document types within a bucket, allowing applications to be specifically authorized to use only their appropriate subsets of data (see Access to Scopes and Collections, below).
This should help give us some guidance in organizing our document hierarchy. Let's plan to discuss further.
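To make the "simplified querying" point concrete: in SQL++ a collection is addressed by a three-part keyspace path, bucket.scope.collection, as in the `vxdata._default.RAOB` queries later in this thread. The sketch below only builds query strings and does not connect to a cluster. The helper names `keyspace` and `distinct_values_query` are hypothetical, and backtick-quoting every part is an illustrative choice that lets names containing characters such as '-' parse as identifiers.

```python
def keyspace(bucket: str, scope: str, collection: str) -> str:
    """Build a fully qualified SQL++ keyspace path with each part backtick-quoted."""
    parts = (bucket, scope, collection)
    for part in parts:
        # Keep the sketch simple: refuse names that would need backtick escaping.
        if "`" in part:
            raise ValueError(f"backtick not allowed in name: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


def distinct_values_query(field: str, bucket: str, scope: str, collection: str) -> str:
    """Build a 'SELECT DISTINCT RAW <field>' discovery query for one collection."""
    return f"SELECT DISTINCT RAW `{field}` FROM {keyspace(bucket, scope, collection)}"


if __name__ == "__main__":
    print(distinct_values_query("type", "vxdata", "_default", "RAOB"))
    # SELECT DISTINCT RAW `type` FROM `vxdata`.`_default`.`RAOB`
```

Because the scope and collection are part of the path, the query no longer needs a `WHERE subset = ...` style filter to select a subset of documents.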
Thanks, Gopa! That makes it sound like it would be beneficial to explore using collections more.
2. How do collections, scopes, and buckets interact with XDCR & Time-To-Live fields?
TTL fields
XDCR
During the dev meeting we confirmed that we:
And we need the following for today:
So here are a few answers from my point of view.
- "We also noted that we could use a new scope to distinguish between
the on-prem and aws ingest systems." No. There should not be a scope
that distinguishes between "where" a system is deployed. That should not
matter. The data is the data regardless of where it resides.
- "Do we want 3 copies of the data?" No. The test and development
scopes should always be quite limited in size. In my opinion, data
duplication is not a good thing in this case.
- "Want the Database Collections to mirror the document subset fields":
this is only because it is handy. The type "DD" documents are the lion's
share of the data, and they are best suited to benefit from differentiated
scopes. The type "MD" metadata documents won't benefit much from having
different collections. In my opinion it would be fine to just put them all
in a single "metadata" collection, whatever we name it. Indexing will be
efficient because the data set will be small. I think the challenge for the
metadata update scripts comes from querying the actual data anyway. Let's
just pick a name and not worry about this too much.
The primary types that we have are ...
> select distinct raw type from vxdata._default.RAOB
[
"DD",
"DF",
"JOB",
"MD"
]
DD is data
DF is data file (records what data file is already ingested)
JOB is a JOB spec
MD is a metadata document
I'm doing a more exhaustive query on the METAR collection but that query
may take a long time. I'll send those results later when the query finishes.
In addition to these, there are test types, e.g. DD-TEST, MD-TEST, JOB-TEST, etc.
Subsets are METAR, RAOB, and COMMON; the same caveat about the exhaustive
list applies.
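A small sketch of the type vocabulary listed above. It assumes the only test marker is a "-TEST" suffix on a base type (as in DD-TEST, MD-TEST, JOB-TEST), and the code list is not exhaustive, since the full METAR query was still running. `TYPE_CODES` and `parse_type` are hypothetical names, not part of the existing scripts.

```python
# Type codes and descriptions as listed in this thread (not exhaustive).
TYPE_CODES = {
    "DD": "data",
    "DF": "data file (records which data files have already been ingested)",
    "JOB": "job spec",
    "MD": "metadata document",
}

# Assumed convention: test documents append this suffix to a base type.
TEST_SUFFIX = "-TEST"


def parse_type(value: str) -> tuple[str, bool]:
    """Split a type value such as "DD-TEST" into (base_type, is_test)."""
    is_test = value.endswith(TEST_SUFFIX)
    base = value[: -len(TEST_SUFFIX)] if is_test else value
    if base not in TYPE_CODES:
        # Surface unknown codes instead of silently accepting them.
        raise ValueError(f"unknown type code: {value!r}")
    return base, is_test
```

Rejecting unknown codes makes any additional types turned up by the exhaustive query visible rather than silently routed.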
docType is a much more dynamic field than type (DD, MD, etc.).
> select distinct raw docType from vxdata._default.RAOB
[
"obs",
"ingest",
"ingest_mapping",
"station",
"stationReference"
]
These are all the docTypes I have for RAOB so far.
obs are observations
ingest are ingest templates
ingest_mapping is used for prepbufr mnemonic mappings
station is a station document
stationReference is used for keeping a list of stations that we are
interested in
In addition there will be...
model - a model document
partial_sums - a partial sums document
ctc - a contingency document
and probably others.
For scenarios, use cases, etc I think the five listed above are enough to
get us started. Eventually we will need to do much more specific ones but
we do not have enough context yet to approach those.
randy
On Thu, Jul 25, 2024 at 6:56 AM Ian McGinnis ***@***.***> wrote:
During the dev meeting we confirmed that we:
- Want the Database Scope to reflect the environment (development, test,
and prod were mentioned).
- We also noted that we could use a new scope to distinguish
between the on-prem and aws ingest systems.
- Do we want 3 copies of the data? How much data is retained/what
data goes where?
- Want the Database Collections to mirror the document subset fields
- We will need to redo our indices to take advantage of this
- The contents and naming of the "metadata" collection are still an
open discussion. Do we have a singular metadata collection, or do
we have multiple collections based on metadata type? (Job, Stations, etc.)
And we need the following for today:
1. @randytpierce & @gopa-noaa - To provide a list of document type,
docType, and subset fields currently in use.
2. @randytpierce & @gopa-noaa - To consider what scenarios we want to
whiteboard out. Currently, we have:
- Ingesting data via cron, for various data types if relevant
- Ingesting data via event, for various data types if relevant
- Expiring data
- Retrieving archived data
- Querying data from MATS
--
Randy Pierce
Here are the results from the METAR Collection:
And on the On-Prem Cluster:
On-Prem Cluster output for types:
To summarize the meeting last week:
Remaining questions:
I'm sure I missed a few things. 🙂
I forgot to take notes in the last meeting; if I remember correctly, here are the main points:
Questions:
Recording the current state of affairs here ... Magma storage transition status:
Buckets->Scopes->Collections
adb-cb1
abd-cb2,3,4
Capella
Based on our decisions on Sep 18th (see above)
So, the action plans from our last meeting:
I'd add one more action item:
No change to the bucket; 3 scopes (development, integration, production) and 3 collections under each, currently just METAR, RAOB, and COMMON.
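The agreed layout can be enumerated mechanically. This is a sketch assuming the bucket is still named `vxdata` (the name used in the queries earlier in this thread); it only lists the keyspace paths, for example as input to a script that creates collections or indexes, and does not touch a cluster. `planned_keyspaces` is a hypothetical name.

```python
from itertools import product

BUCKET = "vxdata"  # assumed: bucket name taken from the queries in this thread
SCOPES = ("development", "integration", "production")
COLLECTIONS = ("METAR", "RAOB", "COMMON")


def planned_keyspaces(bucket=BUCKET, scopes=SCOPES, collections=COLLECTIONS):
    """List every bucket.scope.collection path in the agreed layout."""
    return [f"{bucket}.{scope}.{coll}" for scope, coll in product(scopes, collections)]


if __name__ == "__main__":
    for ks in planned_keyspaces():
        print(ks)
```

With 3 scopes and 3 collections this yields 3 × 3 = 9 paths; adding a collection later (e.g. a separate metadata collection) only means extending `COLLECTIONS`.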