Add function to Optionally get site.info if not present #3324

Open
wants to merge 54 commits into
base: develop

Sweetdevil144
Contributor

@Sweetdevil144 Sweetdevil144 commented Jul 5, 2024

Description

This PR aims to refactor site.id and improve str_ns. The end goal is to make the system independent of the DB. Currently, I'm refactoring to create a siteID if one is not already present. I'll add test cases to check this util function too.
Work is still pending.

Some comments from @mdietze :

  1. Here's my slightly different interpretation:
  2. Require the user to input a data frame with lat/lon and optionally siteID; if siteID is not provided, construct a unique identifier from lat/lon
  3. replace str_ns with provided unique identifier
  4. Provide a helper function that takes in a BETY connection and site IDs and returns lat, lon, and str_ns
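
A minimal sketch of the second item, under stated assumptions: the helper name `fill_site_id` and the column names `lat`, `lon`, and `siteID` are hypothetical, and the identifier is built by simply pasting lat and lon together. It is not the PR's implementation; it only shows that missing IDs can be filled in without a database connection.

```r
# Hypothetical helper (not from the PR): fill in missing site IDs from
# lat/lon only, so no database connection is needed.
fill_site_id <- function(sites) {
  stopifnot(is.data.frame(sites), all(c("lat", "lon") %in% names(sites)))
  # siteID is optional; create an all-missing column when it is absent
  if (!"siteID" %in% names(sites)) {
    sites$siteID <- NA_character_
  }
  sites$siteID <- as.character(sites$siteID)
  # Only rows without an ID get one constructed from their coordinates
  missing_id <- is.na(sites$siteID)
  sites$siteID[missing_id] <- paste0(sites$lat[missing_id], "_", sites$lon[missing_id])
  sites
}

sites <- data.frame(
  lat = c(40.1, 45.5),
  lon = c(-88.2, -90.1),
  siteID = c("772", NA)
)
filled <- fill_site_id(sites)
filled$siteID  # "772" "45.5_-90.1"
```

User-supplied IDs are left untouched; only the missing ones are derived from coordinates.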

Motivation and Context

May fix a subtask of #3307

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

In do_conversion.R

Signed-off-by: Abhinav Pandey <[email protected]>
@github-actions github-actions bot added the Tests label Jul 5, 2024
@Sweetdevil144
Contributor Author

@meetagrawal09 could you cross-check whether the corresponding changes in test.met.process are valid? :)

@infotroph
Member

I think you're taking the wrong approach to task 2 here: Yes, "if siteID is not provided, construct a unique identifier from lat/lon", but it needs to be constructed without using the DB. As we discussed in Slack, this could be as simple as something like id_str <- paste0(lat, "_", lon).

@Sweetdevil144
Contributor Author

The current code is updated for DB-less interaction. What further improvements can be made? Would creating a new site ID be correct for a pre-existing lat and lon, or should we fetch it from the DB when a connection is present? Otherwise, if con = NULL, we can generate a new siteID. We could create an int+str styled site ID (e.g. dcf544e6c) to distinguish this str_ns from the site IDs used when con != NULL.

@Sweetdevil144
Contributor Author

Should we add a unit test to check if get.new.site() works correctly?

base/db/R/get.new.site.R (outdated, resolved)
Member

@infotroph infotroph left a comment

Posting my notes from our live review -- they're a bit cryptic, but hopefully enough to remind you of the key points from the conversation.

Basically I think the entire get.new.site function is trying to handle too many cases, and instead when the db isn't available we should simply paste together lat and lon to use as a siteid.


if (is.null(site$id) | is.na(site$id)) {
if ((!is.null(site$lat) && !is.null(site$lon)) |
(!is.na(site$lat) && !is.na(site$lon))
Member

Weird little gotcha here -- the !is.null will fire when lat and lon are NA, and the !is.na will return a missing value when lat and lon are null.

Member

You probably want && instead of |
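
A sketch of one way to address both comments above, assuming a small hypothetical helper `has_value` that is not part of the PR. Because `&&` short-circuits, `is.na()` is only reached once the value is known to be non-NULL, so NULL and NA are both treated as "missing" without either check misfiring.

```r
# Hypothetical helper (not from the PR): TRUE only for a single value that
# is neither NULL nor NA. The && chain stops at the first FALSE, so is.na()
# never sees NULL.
has_value <- function(x) {
  !is.null(x) && length(x) == 1 && !is.na(x)
}

# is.na(NULL) is a zero-length logical, which is why the original
# condition misbehaves for NULL inputs.
zero_len <- is.na(NULL)  # logical(0)

site <- list(id = NULL, lat = 40.1, lon = -88.2)

needs_id <- !has_value(site$id)
has_coords <- has_value(site$lat) && has_value(site$lon)

if (needs_id && has_coords) {
  message("site has coordinates but no id; an id can be built from lat/lon")
}
```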

lat = site$lat,
lon = site$lon
)
str_ns <- paste0(new.site$lat, "-", new.site$lon)
Member

Might want to use a string other than "-" to concatenate here, to avoid confusion between separator characters and negative lon values

Contributor Author

Replaced the "-" separator with "_".
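
To illustrate the separator concern with example coordinates (the values here are made up for the demonstration): with "-" the separator runs into the minus sign of a negative longitude, while "_" keeps the two parts distinguishable and easy to split again.

```r
lat <- 40.1
lon <- -88.2

# "-" as separator collides with the sign of a negative longitude
dash_id <- paste0(lat, "-", lon)        # "40.1--88.2"

# "_" cannot appear in a formatted number, so the parts stay separable
underscore_id <- paste0(lat, "_", lon)  # "40.1_-88.2"

# Splitting on the literal "_" recovers both coordinates
parts <- as.numeric(strsplit(underscore_id, "_", fixed = TRUE)[[1]])
```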

}
}

site.info <- list(new.site = new.site, str_ns = str_ns)
Member

Q in live review: why this structure? A: to match what's expected at met.process line 165

Contributor Author

Detailed explanation: the current structure matches the data flow in met.process, where this information is later used in other function calls.

Comment on lines 141 to 145
generate_new_siteID <- function() {
# Generate a random number. Assuming higher order integers to increase randomness in IDs
random_id <- sample(10000:99999999, 1)
return(as.numeric(random_id))
}
Member

We want IDs to be deterministic -- don't need this

Contributor Author

Can you suggest a method for doing so? I can't think of one right now :)

Contributor Author

One approach I'd suggest: hash the site's lat and lon to generate a unique ID. This would support precision up to 8 decimal places.
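
A minimal sketch of a deterministic ID, with assumptions stated up front: the helper name `make_site_id` is hypothetical, and it uses fixed-precision formatting rather than a hash, so it is a simpler stand-in for the hashing idea, not an implementation of it. The same lat/lon always yields the same string, and the 8-decimal precision mentioned above becomes an explicit parameter. A hash (for example via the digest package) could be applied to the resulting string if a shorter opaque ID were wanted.

```r
# Hypothetical helper (not from the PR): build a reproducible ID by
# formatting lat and lon to a fixed number of decimal places.
make_site_id <- function(lat, lon, digits = 8) {
  stopifnot(is.numeric(lat), is.numeric(lon), !anyNA(c(lat, lon)))
  fmt <- paste0("%.", digits, "f")
  paste0(sprintf(fmt, lat), "_", sprintf(fmt, lon))
}

id1 <- make_site_id(40.1, -88.2)  # "40.10000000_-88.20000000"
id2 <- make_site_id(40.1, -88.2)  # same inputs, same ID, no randomness
```

Unlike `sample()`, this needs no seed and no stored state: rerunning a workflow on the same coordinates reproduces the same ID.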

base/db/R/get.new.site.R (outdated, resolved)
base/workflow/R/do_conversions.R (outdated, resolved)
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon)

# extract new.site and str_ns from site.info
new.site <- site.info$new.site
Member

Hypothesis to confirm before acting: No downstream object uses new.site$id, can remove it here

Contributor Author

new.site$id is later passed to download.raw.met.module for further conversions in the convert_input function. I think we will have to keep it for that case rather than duplicate the code.

modules/data.land/R/ic_process.R (outdated, resolved)
modules/data.land/R/soil_process.R (outdated, resolved)
@infotroph infotroph mentioned this pull request Sep 3, 2024
13 tasks
Contributor Author

@Sweetdevil144 Sweetdevil144 left a comment

From previous comments on #3348, I couldn't resist removing unnecessary changes. I will post a discussion regarding this PR and gather insights from other 'members & contributors' too.


3 participants