Develop a function to split text exceeding 200 characters #19
Comments
Acknowledging the historical context of SAS V5 transport file restrictions, I'd like to point out that these limitations don't apply in R, offering us a chance to opt out. However, my experience with submissions is limited, and I'm seeking clarity: are we considering this feature to meet current regulatory expectations, or is it more about maintaining legacy compatibility?
Agree that SAS V5 limitations do not apply to R data frames. However, we still want this feature in order to comply with SDTMIG rules. I also lack submission experience with R packages, so I am looking for more feedback.
The submission SAS files still have the V5 limitations, including that no string can exceed 200 characters in length. Even if the xpt files are generated using R, the SAS V5 limitations unfortunately still apply.
I had a look at this in SQL. Consider, for example, a text field mapping to DSTERM, where the field can hold a maximum of 600 characters and words cannot be split. The formula for calculating the maximum number of required output variables is: The first (up to) 200 characters map to DSTERM, with the remainder of the source string mapping to the QNAM variables DSTERM1, DSTERM2, DSTERM3. I have used the SQL code: This will select up to 200 characters without splitting words and insert them into the DSTERM DS variable. Similar statements can then be used to select the other strings for each of the subsequent supplemental QVAL values.
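The mapping described in this comment can be sketched as follows. This is only an illustration in Python (the package under discussion is written in R), and it is not the commenter's SQL. The function name `map_to_supp_names` and its parameters are hypothetical. It relies on the standard-library `textwrap.wrap`, which breaks only at whitespace when `break_long_words=False`. The number of chunks depends on where the word boundaries fall, so it is computed rather than assumed.

```python
import textwrap


def map_to_supp_names(text, base="DSTERM", limit=200):
    """Return {variable name: chunk} for a long text value.

    The first chunk is stored under `base` (e.g. DSTERM); the remaining
    chunks are stored under base1, base2, ... (e.g. DSTERM1, DSTERM2).
    Assumption: a single word longer than `limit` is left unbroken by
    textwrap and would therefore exceed the limit.
    """
    chunks = textwrap.wrap(
        text,
        width=limit,
        break_long_words=False,   # never cut inside a word
        break_on_hyphens=False,   # treat hyphenated words as one word
    )
    names = [base] + [f"{base}{i}" for i in range(1, len(chunks))]
    return dict(zip(names, chunks))


# 120 four-letter words separated by single spaces: 599 characters in total.
long_text = " ".join(["word"] * 120)
mapped = map_to_supp_names(long_text)
for name, chunk in mapped.items():
    print(name, len(chunk))
```

Each chunk here stays within 200 characters, and joining the chunks with single spaces gives back the original string.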
Hi @parikp06 and @pendingintent, I appreciate the clarification on the compliance with SDTMIG rules and the current submission requirements. Given this context, I'm ready to take on the development of this feature. I'll await any additional feedback or a decision on moving forward with this approach.
This function will be called on a data frame, with the input variable to be split. On successful execution, it should add the SplitVarN variables, along with SplitCount, to the output data frame.
So it does not push SPLITVARn into SUPP domains?
No, it'll just create the required list of variables. SplitCount will help determine how many SUPP variables need to be populated.
Keep the 200-character limit parameterized, with a default value of 200.
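A rough sketch of the behaviour discussed in the last few comments: new SplitVarN columns plus a SplitCount column, with a parameterized limit defaulting to 200. It is written in Python for illustration only, using a list of dictionaries as a stand-in for a data frame. The function name `add_split_columns` and the choice to pad unused SplitVarN columns with empty strings are assumptions, not part of the proposal.

```python
import textwrap


def add_split_columns(rows, column, limit=200):
    """Return copies of `rows` with SplitVar1..SplitVarN and SplitCount added.

    `rows` is a list of dicts standing in for a data frame. N is the largest
    number of chunks needed by any row; rows needing fewer chunks get empty
    strings in the unused SplitVar columns (an assumption of this sketch).
    """
    split_rows = [
        textwrap.wrap(
            row.get(column) or "",
            width=limit,
            break_long_words=False,
            break_on_hyphens=False,
        )
        for row in rows
    ]
    max_count = max((len(chunks) for chunks in split_rows), default=0)

    out = []
    for row, chunks in zip(rows, split_rows):
        new_row = dict(row)  # leave the input rows untouched
        for i in range(max_count):
            new_row[f"SplitVar{i + 1}"] = chunks[i] if i < len(chunks) else ""
        new_row["SplitCount"] = len(chunks)
        out.append(new_row)
    return out


records = [
    {"USUBJID": "01", "DSTERM": "alpha beta gamma delta"},
    {"USUBJID": "02", "DSTERM": "short"},
]
# A small limit keeps the example readable; the default would be 200.
result = add_split_columns(records, "DSTERM", limit=11)
for r in result:
    print(r)
```

Nothing is written to a SUPP domain here; SplitCount only records how many of the SplitVarN values are populated for each row.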
Hi @madhan0923 and @GomathiVallinayagam, are you working on this issue, or can I take it over?
Yes @galachad, we are working on the issue.
Adam recently did a very nice refactor of a similar function in the Roche roak package. I would encourage collaborating with him, at least to get his ideas on function design 🙏
Sure @edgar-manukyan, we are about to complete the draft. With Adam's help we can release the first version for QC.
Hi Madhan (@madhan0923): I just had a look at your code in PR #32. I have a few suggestions/remarks. I think your implementation of In addition, it might be a good idea to also implement a reverse function, e.g. From reading SDTMIG v3.4. (page 55), I think three questions remain:
It might help to take a look at some code I've written in the past that uses R matrices to represent protein alignments, where I developed a similar function to reflow the alignment to a different width: https://github.com/maialab/agvgd/blob/master/R/split_alignment_by_lines.R
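To make the idea of a reverse function and of reflowing to a different width concrete, here is a minimal sketch in Python (illustration only; it is not the code from PR #32 or from the linked repository). The names `join_split_vars` and `reflow` are hypothetical. It assumes the original split dropped exactly one space at each chunk boundary, so joining with a single space restores the text.

```python
import textwrap


def join_split_vars(chunks, sep=" "):
    """Reverse of the split: rebuild the text from its chunks.

    Assumption: the split removed exactly one `sep` at each boundary.
    Empty chunks (e.g. padding) are ignored.
    """
    return sep.join(chunk for chunk in chunks if chunk)


def reflow(chunks, new_limit):
    """Re-split existing chunks to a different width without breaking words."""
    return textwrap.wrap(
        join_split_vars(chunks),
        width=new_limit,
        break_long_words=False,
        break_on_hyphens=False,
    )


parts = ["alpha beta", "gamma delta"]
print(join_split_vars(parts))  # alpha beta gamma delta
print(reflow(parts, 16))       # ['alpha beta gamma', 'delta']
```

If the original text contained runs of spaces or line breaks at the split points, this round trip would not be exact; a real implementation would need to decide how such whitespace is preserved.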
Hello everyone,
Feature Idea
Purpose
Split text exceeding 200 characters into multiple variables.
SDTMIG v3.3 | CDISC
Functionality
Because of the current SAS V5 Transport file format requirement, a text string longer than 200 characters cannot be stored in a single variable. The SDTMIG therefore defines conventions for storing long text strings across multiple variables. Follow the steps below to create a function that splits long text into multiple variables.
Split the first 200 characters of text into split_var1 and each additional 200 characters of text into split_varN.
When splitting a text string into several variables, split between words so that no word is broken, which improves readability.
Create a variable split_count to hold the number of new variables created by the split.
The default length to split on is 200. However, allow any other length to be passed via the split_length parameter.
Relevant Input
A character variable to be split into multiple variables.
An integer parameter, split_length, set to 200 by default.
Relevant Output
One or more variables after splitting: split_var1 – split_varN.
Variable to hold the number of splits: split_count
Reproducible Example/Pseudo Code
split_vars(var_to_split,split_length=200)
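The steps listed under Functionality could be sketched roughly as below. This is a Python illustration of the logic only, not the R implementation; the return shape (a dictionary holding split_var1..split_varN and split_count) is an assumption. Two further assumptions go beyond the description above: runs of whitespace are collapsed to single spaces, and a single word longer than split_length is cut hard so that no chunk exceeds the limit.

```python
def split_vars(var_to_split, split_length=200):
    """Greedy word-boundary split of one text value.

    Returns a dict with split_var1..split_varN and split_count.
    """
    if split_length < 1:
        raise ValueError("split_length must be a positive integer")

    chunks = []
    current = ""
    # str.split() with no argument splits on any whitespace run.
    for word in var_to_split.split():
        # Assumption: a word longer than the limit is hard-split, since
        # it cannot fit in any single variable.
        while len(word) > split_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:split_length])
            word = word[split_length:]

        candidate = f"{current} {word}" if current else word
        if len(candidate) <= split_length:
            current = candidate          # the word still fits in this chunk
        else:
            chunks.append(current)       # close the chunk at the word boundary
            current = word
    if current:
        chunks.append(current)

    result = {f"split_var{i}": chunk for i, chunk in enumerate(chunks, start=1)}
    result["split_count"] = len(chunks)
    return result


print(split_vars("the quick brown fox jumps", split_length=10))
```

With split_length=10 the example yields three variables ("the quick", "brown fox", "jumps") and a split_count of 3; with the default of 200 the same text stays in split_var1.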