HELP #121
You should not be providing a TopLevel URI. You do need to provide a URI prefix, but do not use an SBH URI prefix. Instead, use something like https://baldwin.org/ or really any domain. You might also use the converter rather than the validator, since it has fewer options that may lead to confusion:
Can I suggest https://www.imperial.ac.uk/baldwinlab/[projectname]?
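For reference, SBOL 2 compliant URIs for top-level objects are usually built as `<prefix>/<displayId>/<version>`, which is why the prefix should be a namespace you own rather than an existing SynBioHub URI. The sketch below assumes that pattern. `sanitize_display_id`, `compliant_uri`, and the `myproject` path segment are hypothetical, and the URIs the converter actually emits may differ in detail.

```python
import re
from typing import Optional

# SBOL 2: a displayId may contain only letters, digits, and underscores,
# and must not begin with a digit.
_DISPLAY_ID = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sanitize_display_id(name: str) -> str:
    """Hypothetical helper: turn a part name into a legal displayId."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # After substitution the only possible problem is a leading digit
    # (or an empty string), so prefix an underscore in that case.
    return cleaned if _DISPLAY_ID.match(cleaned) else "_" + cleaned


def compliant_uri(prefix: str, display_id: str, version: Optional[str] = "1") -> str:
    """Build <prefix>/<displayId>[/<version>] for a top-level object."""
    if not _DISPLAY_ID.match(display_id):
        raise ValueError(f"invalid displayId: {display_id!r}")
    # Strip any trailing slash so prefix and displayId join cleanly.
    base = f"{prefix.rstrip('/')}/{display_id}"
    return f"{base}/{version}" if version else base
```

For example, `compliant_uri("https://www.imperial.ac.uk/baldwinlab/myproject/", sanitize_display_id("J23100-RiboJ"))` gives `https://www.imperial.ac.uk/baldwinlab/myproject/J23100_RiboJ/1`.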
Thanks for the useful input. That worked, and compared with importing .gb files directly into SynBioHub it overcame one issue: it preserved the name of the object. However, the annotation of the imported object hasn't worked so well. I am exporting parts from Benchling with the intention of pushing a fairly large library from Benchling to SBH. When the files are imported, none of the annotations are correctly classified, so promoters, RBSs, etc. all become plain engineered regions without the correct ontology terms. I was hoping that a file converter might deal with these issues, but apparently not. Any suggestions on how to do a better job of this? The engineered regions have labels in the .gb file, e.g. Terminator and Promoter. Can these be converted into the correct Sequence Ontology terms so they display correctly in SBH? An example .gb file is attached (as .txt).
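Until the converter's table covers such labels, one workaround is to rewrite the feature keys in the exported .gb files before converting them. This is a minimal sketch that assumes the standard GenBank flat-file layout (feature keys indented five spaces, locations starting at column 22, the table ending at `ORIGIN`). The function name and the alias table, including mapping RiboJ to ribozyme, are illustrative choices, not part of any existing tool.

```python
import re
from typing import Dict

# A feature-key line: 5 spaces, the key, padding, then the location.
# Qualifier lines are indented 21 spaces, so they never match.
_KEY_LINE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")


def normalize_feature_keys(genbank_text: str, aliases: Dict[str, str]) -> str:
    """Rewrite feature keys in a GenBank flat file using an alias table.

    Keys are matched case-insensitively against ``aliases``; unmatched keys
    are left untouched. Only lines inside the FEATURES table are changed.
    """
    lookup = {k.lower(): v for k, v in aliases.items()}
    out = []
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            in_features = False
        elif in_features:
            m = _KEY_LINE.match(line)
            if m:
                key, location = m.group(1), m.group(2)
                new_key = lookup.get(key.lower(), key)
                # Re-pad so the location still starts at column 22.
                line = "     " + new_key.ljust(15) + " " + location
        out.append(line)
    return "\n".join(out) + "\n"
```

A call such as `normalize_feature_keys(text, {"Promoter": "promoter", "Terminator": "terminator", "RiboJ": "ribozyme"})` returns the edited file contents, which can then be passed to the converter.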
This is what it looks like after import.
The issue you are having is due to a really inconvenient feature of Benchling: the annotation type field is free text, so it does not restrict you to a limited set of types. SnapGene, on the other hand, limits you to a semi-standard set of GenBank feature types. Without this restriction, converting to Sequence Ontology types is not possible in all cases. Any variation between your text string and the GenBank feature types makes it difficult or impossible to know which type you are referring to.

I call the GenBank feature types semi-standard because I have not been able to find a list of the standard GenBank feature types anywhere. Instead, there is a sort of community-sourced agreement on what they should be. I've collected these from various sources and mapped them to the Sequence Ontology, and this mapping is what the GenBank to SBOL converter uses. Here is the list: https://docs.google.com/spreadsheets/d/1X870i3NhO7xEhqhLXK4eravNd72x-O-xbrpmlT835nY/edit?usp=sharing We could add to this list, but again, because Benchling does not restrict your types, this is a never-ending problem. If you want good conversion, I suggest restricting the types you use in Benchling to this list.

This spreadsheet also lists the SBOL Visual glyph that you get for each GenBank feature type. Note that many GenBank features do not yet have SBOL Visual glyphs assigned to them. Conversely, some SBOL Visual glyphs have no corresponding GenBank feature type.

As for your specific example, nucleotide, spacer, and RiboJ are not GenBank feature types, while Terminator and Promoter should be terminator and promoter. I could potentially fix the latter two by making my converter case-insensitive, but I have hesitated to do this in order to keep round-tripping consistent.

In summary, if you want your GenBank features to convert to specific Sequence Ontology terms, you need to use the semi-standard list. If things are missing, you can suggest additions to our list. In any case, take care when entering your types in Benchling, since misspellings will defeat the conversion.
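The trade-off between exact and case-insensitive matching can be seen in a toy version of such a lookup. This is not the converter's code: the five-entry table is an illustrative subset using standard Sequence Ontology IDs, and the fallback to engineered_region (SO:0000804) mirrors what was observed after import rather than a documented rule.

```python
from typing import Dict, Tuple

# Illustrative subset of GenBank feature type -> Sequence Ontology term.
GENBANK_TO_SO: Dict[str, str] = {
    "promoter": "SO:0000167",
    "terminator": "SO:0000141",
    "RBS": "SO:0000139",       # ribosome_entry_site
    "CDS": "SO:0000316",
    "ribozyme": "SO:0000374",
}

# Fallback used when a feature type cannot be mapped.
ENGINEERED_REGION = "SO:0000804"


def to_so(feature_type: str, case_insensitive: bool = False) -> Tuple[str, bool]:
    """Return (SO term, matched) for a GenBank feature type."""
    # Exact match first: this keeps round-tripping unambiguous.
    if feature_type in GENBANK_TO_SO:
        return GENBANK_TO_SO[feature_type], True
    if case_insensitive:
        folded = {k.lower(): v for k, v in GENBANK_TO_SO.items()}
        term = folded.get(feature_type.lower())
        if term is not None:
            return term, True
    # Unknown or misspelled types (e.g. "spacer") fall back.
    return ENGINEERED_REGION, False
```

With `case_insensitive=True`, both "Terminator" and "terminator" map to SO:0000141, so converting back from SBOL cannot tell which spelling was used unless the original key is stored separately. That is the round-tripping concern described in the reply above.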
Thanks Chris, that's really helpful. I will look at some of the other features we have that correspond to SBOL glyphs, to see whether there are other useful mappings we could suggest.
@cjmyers What do you think of incorporating TYTO lookups to try to resolve unknown terms?
TYTO?
So here are a few more suggestions to add to the list: ribozyme | SO:0000374 | RS
@geoffbaldwin https://github.com/SynBioDex/tyto is a Python library that @bbartley has built, with functions that make it easy to map between names and ontology terms.
Using tyto sounds like a plausible way to pull this mapping out of the Java code and make it more extensible. However, I'm not sure how to integrate the Python code into the Java library. I guess it could be done through a web service, but that would make offline conversion impossible.
@cjmyers It looks like there are simple solutions for calling Python from Java, as long as your Python is native (which TYTO is): https://stackoverflow.com/questions/8898765/calling-python-in-java/8899042
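One way to address the offline concern is to keep a local table as the primary source and consult an external resolver (for example, a wrapper around tyto) only for terms the table does not cover. The sketch below is hypothetical: `make_resolver` and the `remote` callable are illustrative names, and no tyto calls are shown because the right wiring depends on how the Java side invokes Python.

```python
from typing import Callable, Dict, Optional

# A resolver maps a feature-type string to an ontology term, or None.
Resolver = Callable[[str], Optional[str]]


def make_resolver(local: Dict[str, str], remote: Optional[Resolver] = None) -> Resolver:
    """Resolve terms from ``local`` first; consult ``remote`` only for unknown terms."""
    cache: Dict[str, Optional[str]] = {}

    def resolve(term: str) -> Optional[str]:
        if term in local:
            return local[term]
        if remote is None:
            return None  # offline: unknown terms simply stay unresolved
        if term not in cache:
            try:
                cache[term] = remote(term)
            except OSError:
                # Network errors (urllib's URLError is an OSError subclass)
                # should not abort the whole conversion; retry next time.
                return None
        return cache[term]

    return resolve
```

With `remote=None`, known terms still resolve and unknown ones return `None`, so conversion keeps working offline. The cache avoids repeated remote lookups for the same unknown label across a large library.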
I am trying to use the SBOL converter to convert GenBank files to SBOL.
It keeps flagging the following error:
Converting GenBank to SBOL Version 2
TopLevel https://synbiohub.org/user/gbaldwin/ not found
I have used a valid SynBioHub URL, so I have no idea what it is looking for here.
I also don't know what to include in the URI prefix for converted objects. There is no documentation on this, and the video only covers importing SBOL files and converting them to other formats. I need help getting GenBank files into SBOL.
Thanks,
Geoff