read.ctd of alternative types? #1929
Replies: 17 comments 7 replies
-
oce has no code to read that directly. We could add it, but we'd need documentation from the supplier on the file format. (Without documentation, we are left to guess. If you look at #1927 you can see the sort of rabbit holes that need to be explored as a result of guessing...) But you can get results in a couple of minutes, e.g. for your file:
library(oce)
d <- read.table("14190549.csv", skip=51, header=FALSE)
ctd <- as.ctd(salinity=d$V6, temperature=d$V3, pressure=d$V2)
plot(ctd, eos="unesco")
where I'm using 'unesco' because otherwise we need to parse the lon and lat. I get a graph like below. Does this look right?
Note that (if we get docs) we'll be able to parse the header. Here I'm just using a "visual" method to find what seems to be a header line. For fun, you can count over to find the lon and lat lines, and read them in, as well. Then supply them as e.g.
-
Just to give a hint of how the coding could go --
tells us which line of the file has that
-
Oh, I just learned the German for "pressure". That's fun.
-
I wrote some more code. It can at least skip the header info and get to the data. The lon-lat data are odd. I see an "N" in the latitude, so I just trimmed it out. Better code would handle "e", "E", "w", "W", "n", "N", "s", "S" (and possibly whatever abbreviations are used in German) and set the signs appropriately. But I'm just guessing: the numbers are big, so my guess is that it's degree*100 or something odd like that.
I'm attaching a zipfile. If you expand it you'll get some R code, the output I see as text, and the output as a graph. If we get docs describing the format, it would be sensible to do more (e.g. the "original name" column could get filled in, and the units also). But we really need docs to take any further steps.
I hope this helps. @richardsc may also have comments, if he's not tied up in meetings today.
-
Hi @dankelley, Thank you for the rapid response. At first glance, the panel plot looks about right given the intense halocline and warmer waters that frequent parts of the Bornholm deeps. Indeed, the Druck x density relationship makes sense. The salinity values seem to have been read from the column to the left (Boden = bottom), and the lat/lon values are reported in degrees decimal minutes (e.g., 5517.5251N = 55°17.5251'N).
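The degrees-decimal-minutes conversion, with the hemisphere letter setting the sign, could be sketched as below. This is only an illustration: `ddmToDecimal()` is a hypothetical helper (not an oce function), it assumes strings shaped like `5517.5251N`, and it handles only the English letters N, S, E and W.

```r
# Hypothetical helper, not part of oce: convert "DDMM.MMMM" (or "DDDMM.MMMM")
# strings with an optional trailing hemisphere letter to signed decimal degrees.
ddmToDecimal <- function(x) {
    x <- trimws(as.character(x))
    # Pull out any hemisphere letter; S and W are negative (an assumption
    # about this format, following the usual convention).
    hemisphere <- toupper(gsub("[^NSEWnsew]", "", x))
    sign <- ifelse(hemisphere %in% c("S", "W"), -1, 1)
    value <- as.numeric(gsub("[NSEWnsew]", "", x))
    degrees <- floor(value / 100)
    minutes <- value - 100 * degrees
    sign * (degrees + minutes / 60)
}

ddmToDecimal("5517.5251N") # 55 degrees plus 17.5251 minutes
ddmToDecimal(c("5517.5251S", "01552.1234E"))
```

Keeping the sign logic in one place means the same function can serve both latitude and longitude columns.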
-
I like incorporating new data types, and my code is about 80% of the answer. I'm not sure what you are saying about salinity. Is it wrong, below?
Code
library(oce)
f <- "14190549.csv"
l <- readLines(f)
dataStart <- grep("^Lines[ ]*:[ ]*[0-9]*$", l)
if (1 != length(dataStart))
    stop("cannot find 'Lines :' in the data file.")
# how many lines might there be in between?
dataNames <- strsplit(gsub("^;[ ]*", "", l[dataStart+2L]), "[ ]+")[[1]]
dataNamesOriginal <- dataNames
# FIXME: use a list and a loop for the name conversions.
dataNames[dataNames=="Druck"] <- "pressure"
dataNames[dataNames=="SALIN"] <- "salinity"
dataNames[dataNames=="Temp."] <- "temperature"
dataNames[dataNames=="Lat"] <- "latitude"
dataNames[dataNames=="Long"] <- "longitude"
d <- read.table(f, skip=dataStart + 4, col.names=dataNames, header=FALSE)
# Not sure on the "N" in latitude. We need docs to know what is possible
# in the location strings.
lon <- as.numeric(d$longitude[1])
londeg <- floor(lon / 100)
lonmin <- lon - londeg*100
longitude <- londeg + lonmin / 60.0
cat("lon=", lon, " deg=", londeg, " min=", lonmin, " -> longitude=", longitude, "\n")
lat <- as.numeric(gsub("N","",d$latitude[1]))
latdeg <- floor(lat / 100)
latmin <- lat - latdeg*100
latitude <- latdeg + latmin / 60.0
cat("lat=", lat, " deg=", latdeg, " min=", latmin, " -> latitude=", latitude, "\n")
ctd <- as.ctd(salinity=d$salinity, temperature=d$temperature, pressure=d$pressure,
    longitude=longitude, latitude=latitude)
head(ctd[["salinity"]]) # a check against the file
summary(ctd)
png("ctd_german.png")
plot(ctd, span=3000)
dev.off() # close the PNG device so the file is written
Output
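To see how the line offsets in the script above fit together, here is a small self-contained sketch that runs the same `grep()` and `strsplit()` steps on a synthetic file. The header text is made up for illustration; it only mimics what the script assumes (a "Lines :" line, column names two lines later, data starting after `dataStart + 4` lines) and is not taken from SSDA documentation.

```r
# Synthetic file; the layout is an assumption matching the offsets used above.
lines <- c(
    "Some header text",
    "Lines : 3",
    "",
    ";  Druck  Temp.  SALIN",
    ";  [dbar] [degC] [PSU]",
    "",
    "1.0 10.5 7.2",
    "2.0 10.4 7.3",
    "3.0 10.1 7.9")
f <- tempfile(fileext=".csv")
writeLines(lines, f)

l <- readLines(f)
dataStart <- grep("^Lines[ ]*:[ ]*[0-9]*$", l)
# Column names sit two lines below "Lines :", after a leading semicolon.
dataNames <- strsplit(gsub("^;[ ]*", "", l[dataStart + 2L]), "[ ]+")[[1]]
dataNames[dataNames == "Druck"] <- "pressure"
dataNames[dataNames == "Temp."] <- "temperature"
dataNames[dataNames == "SALIN"] <- "salinity"
d <- read.table(f, skip=dataStart + 4, col.names=dataNames, header=FALSE)
print(d)
```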
-
I made the code get "original names" so you can do e.g.
PS. my trial name for the function is `read.ctd.ssda()`. That's just a guess at a name that might make sense. We normally use an abbreviation for a manufacturer. Your suggestion? Once we get docs we can do more, e.g. maybe
This is it for me for the day, I think. What you see below might be enough to get you going. I am guessing that "Montag" means "Date" but would want to see a format before writing code to interpret that. And I see a dot after the day number, which is not something I've ever seen in a file. Decoding based on guesses is not a good idea.
library(oce)
read.ctd.ssda <- function(file, debug=getOption("oceDebug"))
{
    l <- readLines(file)
    dataStart <- grep("^Lines[ ]*:[ ]*[0-9]*$", l)
    if (1 != length(dataStart))
        stop("cannot find 'Lines :' in the data file.")
    # how many lines might there be in between?
    dataNames <- strsplit(gsub("^;[ ]*", "", l[dataStart+2L]), "[ ]+")[[1]]
    dataNamesOriginal <- dataNames
    # Use standard oce names for some things. (FIXME: add others.)
    nameMapping <- list(
        pressure="Druck",
        latitude="Lat",
        longitude="Long",
        salinity="SALIN",
        sigma="SIGMA",
        temperature="Temp.")
    for (name in names(nameMapping)) {
        filename <- nameMapping[[name]]
        dataNames[dataNames == filename] <- name
    }
    d <- read.table(file, skip=dataStart + 4, col.names=dataNames, header=FALSE)
    # Not sure on the "N" in latitude. We need docs to know what is possible
    # in the location strings.
    lon <- as.numeric(d$longitude[1])
    londeg <- floor(lon / 100)
    lonmin <- lon - londeg*100
    longitude <- londeg + lonmin / 60.0
    oceDebug(debug, "lon=", lon, " deg=", londeg, " min=", lonmin, " -> longitude=", longitude, "\n")
    lat <- as.numeric(gsub("N","",d$latitude[1]))
    latdeg <- floor(lat / 100)
    latmin <- lat - latdeg*100
    latitude <- latdeg + latmin / 60.0
    oceDebug(debug, "lat=", lat, " deg=", latdeg, " min=", latmin, " -> latitude=", latitude, "\n")
    ctd <- as.ctd(salinity=d$salinity, temperature=d$temperature, pressure=d$pressure,
        longitude=longitude, latitude=latitude)
    ctd@metadata$dataNamesOriginal <- nameMapping
    # Add non-standard data
    for (n in names(d)) {
        if (!n %in% c("salinity", "pressure", "temperature", "latitude", "longitude")) {
            ctd <- oceSetData(ctd, n, d[[n]], note=NULL)
        }
    }
    ctd
}
d <- read.ctd.ssda("14190549.csv")
head(d[["salinity"]]) # a check against the file
summary(d)
png("ctd_ssda.png")
plot(d, span=3000)
dev.off() # close the PNG device so the file is written
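As an aside on the renaming step, one possible alternative to the loop is a named character vector indexed by the names found in the file, which also makes it easy to keep a record of the original names. This is a sketch only: the input vector is made up (RawO2 stands in for any column with no mapping), and `originalNames` is a hypothetical variable, not something oce defines.

```r
# File name -> oce name. Unmapped names (e.g. "RawO2") are left unchanged.
nameMapping <- c(Druck="pressure", Lat="latitude", Long="longitude",
    SALIN="salinity", SIGMA="sigma", "Temp."="temperature")
dataNamesOriginal <- c("Druck", "Temp.", "SALIN", "RawO2") # made-up example
dataNames <- dataNamesOriginal
known <- dataNamesOriginal %in% names(nameMapping)
dataNames[known] <- unname(nameMapping[dataNamesOriginal[known]])
# Record which name in the file produced which oce name.
originalNames <- setNames(dataNamesOriginal, dataNames)
print(dataNames)
print(originalNames)
```

Adding a new variable then means adding one entry to `nameMapping`, with no change to the code that applies it.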
-
I see RawO2 as a name. What's that?
-
I have decoded new data, as suggested. To avoid too many confusing attachments here, please look at https://github.com/dankelley/oce-issues/blob/main/19xx/1929/1909.R for the code. If you run this, it will spit out a summary that can be checked for column renaming. It also spits out a demo that the times are decoded right ... please check these things.
NOTE: the easiest is to just clone the oce-issues repo. That way you can do "git pull" to refresh to the latest version.
Eventually, when we have docs so we know more, this code can go into oce. Of course I'll make it decode the date, but we need docs on the format to be sure that will be on line 3 etc.
-
Wonderful! Thank you. I will check it over and decode if necessary. Then I will send the docs to finalize the discussion.
-
Please do. I added some more for oxygen. I don't really know what "mg" means. Maybe "mg/L" or something? Anyway, the best plan (once we know for sure, mg/kg or mg/L for example) would be for you to "git pull", then run the R file, and if there's something you want changed in the "summary" table, let me know. We'll want units, especially.
No rush on this. The reason I put stuff into the oce-issues repo is so I won't have to keep this in my head anymore. If you post to this issue thread, I'll get an email, and then I can look to see what's needed next. If the docs list a whole lot of variables, I won't add them until the weekend. (For example, the SBE docs have page after page of variables, since their software lets you save things in all kinds of crazy units, e.g. pressure in PSI, depth in feet, etc. and we want oce to decode as many as possible because SBE instruments are very, very common.)
-
Hi @dankelley, Apologies for the delay. Sea and Sun have finally relayed the CTD software documentation. To summarize the variables we usually collect in our cruises and in the file shared:
If it is worth parsing days that print in the output files, here are the translations: I will be able to Thanks in advance.
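If weekday names such as "Montag" do need decoding, a lookup vector is probably enough. The sketch below uses the standard German weekday names; that the SSDA software writes exactly these spellings is an assumption, and `translateDay()` is a hypothetical helper rather than anything in oce.

```r
# German weekday names -> English. Assumes full, capitalized names.
germanDays <- c(Montag="Monday", Dienstag="Tuesday", Mittwoch="Wednesday",
    Donnerstag="Thursday", Freitag="Friday", Samstag="Saturday",
    Sonntag="Sunday")

# Returns NA for anything that is not a recognized German weekday name.
translateDay <- function(x) {
    unname(germanDays[x])
}

translateDay(c("Montag", "Sonntag", "Lundi"))
```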
-
Thanks Dan. A quick test of the function directly from https://github.com/dankelley/oce-issues/blob/main/19xx/1929/1909.R successfully parsed the test file, and it plots as you showed above. I can do a pull and build tomorrow.
-
Hi Liam. In "develop" commit 4bf0d88, I have incorporated my code (after some improvements) into oce. We may be near to finalizing this issue, but I am hoping you will read and respond to the following list.
The code
library(oce)
d <- read.ctd.ssda("14190549.csv")
summary(d)
png("ssda.png")
plot(d, span=3000)
dev.off()
# Finally, print some values for checking against the file.
head(data.frame(S=d[["salinity"]], T=d[["temperature"]], p=d[["pressure"]]))
produces
-
In commit 46e302d of branch "develop", I've made the PAR change, added new units, and added a test file. I wonder, @LiamMacNeil, whether you can look at the test file (https://github.com/dankelley/oce/blob/develop/tests/testthat/test_ctd_ssda.R) to see if I'm reading longitude and latitude correctly (since they are in a weird format I don't think I've seen before) and also whether I'm reading S, T and p correctly. It should be obvious to you, by examining the data file (https://github.com/dankelley/oce/blob/develop/inst/extdata/ctd_ssda.csv). I think it will only take you a few minutes to do these checks, and I'd appreciate the help.
One more thing -- many oce functions detect serial numbers from file contents. In this case, I didn't know whether this information was in the file. If it is, and if you happen to know where, then I can add an ability to detect it. (I will not code it to work on filenames, however, because people often rename files and so detecting something from a filename is risky ... better to let the user keep track of this information, unless it's in the file.)
We are one step away from closing this discussion, I think.
-
Hi @dankelley, The test file correctly reads the coordinates from degrees decimal minutes, and the values of T, S and pressure are all read correctly. Running the following:
Produces:
Which looks as expected.
Last thing, about dates: I had to check with a colleague who just returned from a cruise. The identity or serial number of a cast (assuming this is what you mean) is embedded in the filename as Year-Month-Day-Hour-Minute, written as y-m-dd-hh-mm (e.g., the file 14190549.csv indicates 2021-04-19-05:49). I'm not fond of this syntax but it is the norm. Unfortunately there is no simpler identity number to detect in the file.
But thank you for checking, and another grand thank you for your work. I expect this will be used frequently. Please let me know of any additional questions; I will continue using the develop branch until it is merged into main.
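For users who keep the original file names, the y-m-dd-hh-mm convention could be decoded outside oce with something like the sketch below. Everything here is an assumption beyond the single example given: `ssdaFilenameTime()` is a hypothetical helper, the decade must be supplied because only one year digit is stored, the time zone is taken as UTC, and months after September are not handled because the thread does not say how they are written.

```r
# Hypothetical helper, not part of oce: decode an 8-digit SSDA-style file name
# (y m dd hh mm) into a POSIXct time. The decade and tz defaults are assumptions.
ssdaFilenameTime <- function(filename, decade=2020, tz="UTC") {
    stem <- sub("\\.[^.]*$", "", basename(filename))
    if (!grepl("^[0-9]{8}$", stem))
        stop("expected an 8-digit file name such as '14190549.csv'")
    year <- decade + as.integer(substr(stem, 1, 1))
    month <- as.integer(substr(stem, 2, 2))
    day <- as.integer(substr(stem, 3, 4))
    hour <- as.integer(substr(stem, 5, 6))
    minute <- as.integer(substr(stem, 7, 8))
    ISOdatetime(year, month, day, hour, minute, 0, tz=tz)
}

format(ssdaFilenameTime("14190549.csv"), "%Y-%m-%d %H:%M")
```

This only works while the file keeps its original name, which is the risk raised earlier in the thread.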
-
I'm closing this because it seems to have been addressed. To be honest, I never look at github "discussion" and only noticed this long after the fact. It could have been closed a year ago (to within a day, by a funny coincidence).
-
Hi @dankelley and @richardsc,
Is there a possible extension of read.ctd to accommodate different file formats? I commonly encounter files from SSDA (Sea & Sun Technology's Standard Data Acquisition software) from German cruises, and it would be incredibly handy to use oce (instead of the existing Fortran pipeline). I've attached an example file here: 14190549.csv
Thank you for the excellent toolbox!