Extract transcripts and utrons from reads where XS tags are Assigned #89

Open
Tracked by #64
ns-rse opened this issue Nov 29, 2024 · 0 comments

ns-rse commented Nov 29, 2024

Currently we have...

            if read1.get_tag("XS") == "Assigned":
                transcripts_read1 = tx2gene[read1.get_tag("XT")]
                utrons1 = [utron_coords[tx] for tx in transcripts_read1]
                utrons1 = sum(utrons1, [])
            else:
                utrons1 = list()

            if read2.get_tag("XS") == "Assigned":
                transcripts_read2 = tx2gene[read2.get_tag("XT")]
                utrons2 = [utron_coords[tx] for tx in transcripts_read2]
                utrons2 = sum(utrons2, [])
            else:
                utrons2 = list()

This is duplicated code; a function taking a read should be used instead.

It looks like it's simply building a set, since utron_coords is initialised as a dictionary and only the
transcripts_read values are used as keys; nothing gets assigned to the values of the dictionary.

tx2gene is a dictionary of non-transcript data from the GTF file, where the key is the gene_id and the value is the
transcript_id.

So this pulls the transcript_id out of that dictionary for reads that have been assigned, using the value of the XT tag as the key.

These are then passed to sum(); it's not clear why an empty list is passed in here.
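
On the empty list: Python's built-in `sum(iterable, start)` adds each item onto `start`, which defaults to `0`. Passing `[]` as the start value makes it concatenate lists, so the call flattens a list of lists into one list. A minimal sketch with made-up coordinates (the values are purely illustrative and not taken from the real data), with `itertools.chain.from_iterable` shown as a common alternative:

```python
from itertools import chain

# Hypothetical per-transcript utron lists, standing in for
# [utron_coords[tx] for tx in transcripts_read1].
per_transcript = [[(100, 200)], [(300, 400), (500, 600)], []]

# sum() starts from [] and concatenates each inner list onto it.
flattened = sum(per_transcript, [])

# Same result without building an intermediate list at every step.
flattened_chain = list(chain.from_iterable(per_transcript))

# Without a start value sum() begins at 0, and 0 + list raises TypeError.
try:
    sum(per_transcript)
except TypeError as error:
    default_start_error = str(error)

print(flattened)
```

Repeated list concatenation copies the accumulated list each time, so `chain.from_iterable` may be preferable if the per-read lists can get long.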

Early skeleton; I need to look at this more closely.

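from typing import Any


def extract_utrons(read: Any, tag: str = "XT") -> list:
    """
    Extract utrons from assigned transcripts.

    Parameters
    ----------
    read : Any
        A read exposing ``get_tag()``; the concrete type is still to be confirmed.
    tag : str
        Tag to extract, default is 'XT'.

    Returns
    -------
    list
        One list of utrons per transcript; empty if the read is not assigned.
    """
    # Unassigned reads have no transcripts and therefore no utrons.
    if read.get_tag("XS") != "Assigned":
        return []
    # TODO: look up the transcripts for read.get_tag(tag) in tx2gene and collect
    # their entries from utron_coords.
    raise NotImplementedError


def sum_utrons(read: Any, tag: str = "XT") -> list:
    """Flatten the per-transcript utron lists into a single list."""
    return sum(extract_utrons(read, tag), [])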
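
One way the skeleton could be filled in, sketched under some assumptions: `tx2gene` maps a gene_id to a list of transcript_ids, `utron_coords` maps a transcript_id to a list of coordinates, and both are passed in as arguments rather than read from the enclosing scope. `FakeRead` is a hypothetical stand-in for the real read type and only provides `get_tag()`; every identifier and coordinate in the toy data is invented for illustration.

```python
from itertools import chain


class FakeRead:
    """Hypothetical stand-in for a read; only get_tag() is provided."""

    def __init__(self, tags: dict):
        self._tags = tags

    def get_tag(self, tag: str) -> str:
        return self._tags[tag]


def extract_utrons(read, tx2gene: dict, utron_coords: dict, tag: str = "XT") -> list:
    """Return the utrons of every transcript of the gene a read is assigned to."""
    # Unassigned reads contribute no utrons.
    if read.get_tag("XS") != "Assigned":
        return []
    transcripts = tx2gene[read.get_tag(tag)]
    # Flatten the per-transcript lists into one list.
    return list(chain.from_iterable(utron_coords[tx] for tx in transcripts))


# Toy data (invented), using the assumed dictionary shapes.
tx2gene = {"gene1": ["tx1", "tx2"]}
utron_coords = {"tx1": [(100, 200)], "tx2": [(300, 400), (500, 600)]}

read1 = FakeRead({"XS": "Assigned", "XT": "gene1"})
read2 = FakeRead({"XS": "Unassigned_NoFeatures"})

# The two duplicated blocks collapse to one call per read.
utrons1 = extract_utrons(read1, tx2gene, utron_coords)
utrons2 = extract_utrons(read2, tx2gene, utron_coords)
```

This sketch flattens with `chain.from_iterable` instead of `sum(..., [])`; the result is the same list, and the early return keeps the unassigned case in one place.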
@ns-rse ns-rse added the isoslam label Nov 29, 2024