DISCON input file encoding #260

Open
pelljam opened this issue Sep 19, 2023 · 7 comments
Labels
help wanted Extra attention is needed

Comments

@pelljam

pelljam commented Sep 19, 2023

Hello,

Could you clarify which encoding DISCON supports for the path of the primary input file? I notice that control_interface.py encodes it using UTF-8:

self.accINFILE = self.param_name.encode('utf-8')

However, when I try to use a path with non-ASCII (Unicode) characters, it can't find the file. This is on Windows.

Thanks,

James

@dzalkind
Collaborator

Hi,

I might need a little more information to help you. What exactly is your use case and the issue you are seeing?

Are you trying to run with the Python control interface or something else? If possible, put your paths in quotes and try to simplify them.

Best, Dan

@pelljam
Author

pelljam commented Sep 19, 2023

Hi Dan,

Thank you for your reply.

We maintain our own wrapper/interface to the ROSCO DLL. When we pass the input file path to it, we currently encode the path using ASCII. Having seen that your interface encodes using UTF-8, I wondered whether that was supported. But when I tried that and passed a path with Unicode characters to ROSCO, it failed to find the input file.

So I was just wondering if you could confirm what encoding should be used when calling the DLL?

Thanks,

James

@dzalkind
Collaborator

Thanks for the context, James!

I can't confirm any specific encoding. UTF-8 may just be what was needed to call the dynamic library from Python. It's been a while since that was developed, and no one I've asked seems to recall the specifics.

Is this discrepancy causing you issues? If not, I'm inclined to let it go and note that the DISCON.IN file path should be encoded in ASCII.

@davidheff
Contributor

It feels uncomfortable to restrict users to ASCII in the modern day. This was something that happened in pre-Unicode days, but for 2023 it's really awkward for any real-world usage. Perhaps in English-speaking parts of the world it won't give anyone any trouble, but the majority of the world routinely uses characters outside ASCII.

Fundamentally, what seems to be needed is a way for the code that opens files to handle non-ASCII characters in some way. I'm no Fortran expert. Does Fortran support Unicode file names?

@davidheff
Contributor

My best guess is that, at least on typical Windows Fortran environments, OPEN expects file names to be encoded with the active code page. So probably the best that you can do is try to encode any file names that way. In Python you'd use this encoding:

f"cp{ctypes.windll.kernel32.GetACP()}"

I think! But that will only get you so far. You are still in trouble if your file name / path has characters from outside the active code page. This is of course why Unicode exists.
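
As a rough illustration of the suggestion above, here is a minimal Python sketch. The helper names `active_code_page_codec` and `encode_path` are made up for this example and are not part of ROSCO; the sketch also assumes, as guessed above, that the Fortran runtime interprets the bytes using the active code page.

```python
import sys


def active_code_page_codec() -> str:
    """Return the Python codec name for the Windows active (ANSI) code page."""
    if sys.platform != "win32":
        raise OSError("the active code page only exists on Windows")
    import ctypes  # imported here so the module still loads elsewhere

    return f"cp{ctypes.windll.kernel32.GetACP()}"


def encode_path(path: str, codec: str) -> bytes:
    """Encode `path` with `codec`, naming the first character that does not fit."""
    try:
        return path.encode(codec)
    except UnicodeEncodeError as err:
        bad = path[err.start:err.end]
        raise ValueError(
            f"{path!r} contains {bad!r}, which {codec} cannot represent"
        ) from err


# The same name gives different bytes under different encodings, which is one
# plausible reason a UTF-8 encoded path is not found by a code-page-based OPEN.
as_cp1252 = encode_path("café", "cp1252")  # b'caf\xe9'
as_utf8 = encode_path("café", "utf-8")     # b'caf\xc3\xa9'
```

On Windows the call would look like `encode_path(path, active_code_page_codec())`. A `ValueError` from `encode_path` corresponds to the case described above, where the path has characters outside the active code page.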

@dzalkind
Collaborator

Thanks for the feedback. These are valid points.

The type of the input file is determined here:

CHARACTER(KIND=C_CHAR), INTENT(IN ) :: accINFILE(NINT(avrSWAP(50))) ! The name of the parameter input file

I'm not sure we currently have the bandwidth to support this, but input is welcome from the community.

ROSCO also reads in several strings, which might require updates, too.
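
One consequence of the declaration above is worth sketching: accINFILE is sized by avrSWAP(50) in C_CHAR elements, which in practice are single bytes, so a host that passes a multi-byte encoding would need to store the encoded byte count rather than the character count. The helper name `infile_buffer` is hypothetical, and whether a trailing NUL should be appended and counted is not settled in this thread, so the sketch leaves it out.

```python
from typing import Tuple


def infile_buffer(path: str, codec: str) -> Tuple[bytes, int]:
    """Return the encoded path and the length to store in avrSWAP(50).

    The length is counted in bytes, because the callee declares the
    argument as an array of single C_CHAR elements.
    """
    data = path.encode(codec)
    return data, len(data)


path = "café/DISCON.IN"
data, n_bytes = infile_buffer(path, "utf-8")
# 'é' takes two bytes in UTF-8, so the byte count exceeds the character count.
n_chars = len(path)
```

With an ASCII-only path the two counts agree, which may be why a mismatch like this would go unnoticed until a non-ASCII path is used.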

dzalkind added the "help wanted" label Sep 20, 2023
@davidheff
Contributor

It's not the format of the file itself that is the issue. It's just the handling of the file names. When DISCON is called and the file name is passed in by the host, that file name, which for my usage needs to be a complete absolute path, can often have non-ASCII characters. That file name gets passed as the first argument to OPEN. So to resolve this, there'd need to be a way to open a file whose name can come from an arbitrary set of characters, i.e. Unicode.

What I don't know, as I'm not a Fortran programmer, is how Unicode is typically handled for such scenarios. It would astound me though if there wasn't a clean way to do this in 2023 in Fortran, although I guess you never know with Fortran!
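
Until the callee can accept Unicode names, a host-side workaround that may help on Windows is to ask the OS for the 8.3 short form of the path, which is often (not always) ASCII-only. This is a sketch under stated assumptions, not something ROSCO provides: `short_path_if_needed` is a hypothetical helper, it relies on the Win32 `GetShortPathNameW` call, the file must already exist, and short names can be disabled on a volume, in which case the long path comes back unchanged.

```python
import ctypes
import sys


def short_path_if_needed(path: str) -> str:
    """Return `path` if it is ASCII, else try its Windows 8.3 short form."""
    if path.isascii():
        return path
    if sys.platform != "win32":
        raise OSError("8.3 short paths are a Windows-only feature")

    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    get_short.restype = ctypes.c_uint32

    # First call asks for the required buffer size (including terminator).
    needed = get_short(path, None, 0)
    if needed == 0:
        raise ctypes.WinError()
    buf = ctypes.create_unicode_buffer(needed)
    if get_short(path, buf, needed) == 0:
        raise ctypes.WinError()

    short = buf.value
    if not short.isascii():
        # Short names are not guaranteed to be ASCII; report rather than guess.
        raise ValueError(f"no ASCII-only short form available for {path!r}")
    return short
```

The returned string can then be encoded as ASCII before it is handed to the DLL. This only sidesteps the problem; it does not replace proper Unicode handling in the code that opens the file.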

3 participants