Use a pipe instead of a temp file #4

Open
jfischoff opened this issue Jun 1, 2018 · 12 comments

@jfischoff

I don't see why this library writes to a temp file as opposed to using a pipe with a call like this one: https://hackage.haskell.org/package/process-1.6.3.0/docs/System-Process.html#v:createPipe

It seems like a pipe is more robust and less likely to fail.

@sol
Member

sol commented Jun 1, 2018

Hey, thanks for the feedback.

I'm not the original author of this library and I'm not actively working on it.

> using a pipe

@jfischoff if you want to give this a shot then I'm more than happy to accept a patch. My only requirement would be that it works both on *nix and Windows, and that there are tests that demonstrate this. We may want to set up AppVeyor for that.

@jfischoff
Author

jfischoff commented Jun 2, 2018 via email

@sol
Member

sol commented Jun 2, 2018

> wrote a similar one that uses a pipe

Nice! If you are interested in collaborating on this, I'm happy to add anybody as a co-maintainer who has made at least one quality contribution.

@jfischoff
Author

jfischoff commented Jun 2, 2018 via email

@nh2
Member

nh2 commented Jun 14, 2018

Keep in mind the OS pipe buffers.

If a pipe is full, writing to it will block. That doesn't happen for temporary files. If your output is larger than the pipe buffer, and you don't consume from the pipe, then your program will get stuck.

@jfischoff
Author

jfischoff commented Jun 14, 2018

> If a pipe is full, writing to it will block

You need to read from it on another thread.
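
The "read on another thread" approach can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from silently: `captureViaPipe` is a hypothetical name, the program is assumed to be built with `-threaded` so that a blocked write to a full pipe cannot stall the reader thread, and the encoding step is a precaution in case the pipe handles do not share stdout's text encoding.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (bracket, evaluate, finally)
import GHC.IO.Handle (hDuplicate, hDuplicateTo)
import System.IO (hClose, hFlush, hGetContents, hGetEncoding, hSetEncoding, stdout)
import System.Process (createPipe)

-- | Hypothetical helper (not part of silently): run an action with stdout
-- redirected into a pipe while a second thread drains the pipe into memory.
captureViaPipe :: IO a -> IO (String, a)
captureViaPipe action = do
  (readEnd, writeEnd) <- createPipe
  -- Precaution: give both pipe ends the same text encoding as stdout.
  menc <- hGetEncoding stdout
  mapM_ (\enc -> hSetEncoding readEnd enc >> hSetEncoding writeEnd enc) menc
  done <- newEmptyMVar
  -- Reader thread: read until end-of-file, force the string, hand it back.
  _ <- forkIO $ do
    out <- hGetContents readEnd
    _ <- evaluate (length out)
    putMVar done out
  -- hDuplicate flushes stdout before saving it; restore undoes the redirect.
  a <- bracket (hDuplicate stdout) restore (\_ -> hDuplicateTo writeEnd stdout >> action)
         `finally` hClose writeEnd
  -- Both write ends are closed now, so the reader sees end-of-file.
  out <- takeMVar done
  return (out, a)
  where
    restore old = hFlush stdout >> hDuplicateTo old stdout >> hClose old
```

For example, `captureViaPipe (putStrLn "hello" >> return 42)` should yield `("hello\n", 42)` on a POSIX system. Because the reader thread keeps emptying the pipe, the captured output is bounded by available memory rather than by the pipe buffer.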

@nh2
Member

nh2 commented Jun 15, 2018

@jfischoff Right, but then put it where?

You have to put it either onto disk or into memory.

If the goal is disk, then doing it through a pipe read by a thread writing it to disk doesn't seem better than writing to a temporary file directly.

If the goal is memory, createPipe seems unnecessary, you could directly read from the process's standard handles.
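
For the case of capturing a child process's output, reading from the process's standard handles might look like the sketch below. It uses `withCreateProcess` and `CreatePipe` from the process package; `childOutput` is a hypothetical name chosen for illustration, and the usage example assumes an `echo` executable on the PATH.

```haskell
import Control.Exception (evaluate)
import System.Exit (ExitCode (..))
import System.IO (hGetContents)
import System.Process
  ( CreateProcess (std_out), StdStream (CreatePipe)
  , proc, waitForProcess, withCreateProcess )

-- | Hypothetical helper: run a command and collect its stdout in memory
-- by reading the handle that the process library hands back.
childOutput :: FilePath -> [String] -> IO (ExitCode, String)
childOutput cmd args =
  withCreateProcess (proc cmd args) { std_out = CreatePipe } $ \_ mOut _ ph ->
    case mOut of
      Nothing -> ioError (userError "childOutput: no stdout handle")
      Just hOut -> do
        out <- hGetContents hOut
        -- Drain the pipe fully before waiting, so the child never blocks
        -- on a full pipe buffer.
        _ <- evaluate (length out)
        code <- waitForProcess ph
        return (code, out)
```

With these assumptions, `childOutput "echo" ["hi"]` should return `(ExitSuccess, "hi\n")` on a POSIX system.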

From the issue description it isn't quite clear to me what the issue or goal is:

> It seems like a pipe is more robust and less likely to fail.

Robust against what, the disk being full?

@jfischoff
Author

> You have to put it either onto disk or into memory.

The goal is to avoid writing to disk and then reading from disk and just buffer in memory.

> If the goal is memory, createPipe seems unnecessary, you could directly read from the process's standard handles.

You will have to elaborate on what you mean here.

> Robust against what, the disk being full?

Yes. Also it just seems unnecessarily complex to write to the disk just to read from disk.

@nh2
Member

nh2 commented Jun 15, 2018

> Also it just seems unnecessarily complex to write to the disk just to read from disk.

I imagine people would do that so that they can handle more output than fits into RAM (or to avoid occupying that RAM). But I see that silently does indeed write it to the file and then read it wholly back into memory, for example here:

str <- hGetContents tmpHandle
str `deepseq` return (str,a)

So you're right, that can be achieved more easily without a roundtrip through files.

I think going via the disk would be beneficial if silently actually offered functions via which the captured output could be read incrementally/streamingly; then it would really save some RAM. And due to the OS buffer cache, if one happens to have enough free RAM to fit it, this wouldn't be much slower than just doing it in memory as all the contents will still be in memory.
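
A rough sketch of what such an incremental interface could look like, assuming a fold over fixed-size chunks read back from the temporary file. `foldCaptured` and the 4096-byte chunk size are hypothetical choices for illustration; silently does not offer this function.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as B
import GHC.IO.Handle (hDuplicate, hDuplicateTo)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (SeekMode (AbsoluteSeek), hClose, hFlush, hSeek, openTempFile, stdout)

-- | Hypothetical helper: capture stdout into a temporary file, then feed the
-- captured bytes to a step function chunk by chunk, so the whole output
-- never has to be held in memory at once.
foldCaptured :: IO a -> (s -> B.ByteString -> IO s) -> s -> IO (s, a)
foldCaptured action step s0 = do
  tmpDir <- getTemporaryDirectory
  bracket (openTempFile tmpDir "capture.out")
          (\(path, tmp) -> hClose tmp >> removeFile path) $ \(_, tmp) -> do
    -- Redirect stdout into the file for the duration of the action.
    a <- bracket (hDuplicate stdout)
                 (\old -> hFlush stdout >> hDuplicateTo old stdout >> hClose old)
                 (\_ -> hDuplicateTo tmp stdout >> action)
    -- Rewind and stream the file back in chunks of at most 4096 bytes.
    hSeek tmp AbsoluteSeek 0
    s <- go tmp s0
    return (s, a)
  where
    go h s = do
      chunk <- B.hGetSome h 4096
      if B.null chunk then return s else step s chunk >>= go h
```

As a usage example, `foldCaptured (putStrLn "hello") (\n c -> return (n + B.length c)) 0` counts the captured bytes without building the full output in memory. The step function receives raw bytes, so decoding is left to the caller.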

> You will have to elaborate on what you mean here.

I somehow assumed you were talking about using silently to capture e.g. the stdout of some process. But I realise that wasn't the case, so what you said about createPipe makes total sense.

@jfischoff
Author

> I think going via the disk would be beneficial if silently actually offered functions via which the captured output could be read incrementally/streamingly

Agree

@manofearth

manofearth commented Sep 4, 2019

For me it makes sense, because I'd like to capture stdout in unit tests. Therefore "not fitting into RAM" isn't the case (due to small amounts of test data).

Here is the function (inspired by the library):

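import Control.Exception (bracket)
import GHC.IO.Handle (hDuplicate, hDuplicateTo)
import System.IO (Handle, hClose, hFlush, hGetChar, hReady, stdout)
import System.Process (createPipe)

-- Run an action with stdout redirected into a pipe and return what it printed.
-- Nothing drains the pipe while the action runs, so the output has to fit
-- into the OS pipe buffer; otherwise the action blocks.
myCapture :: IO a -> IO (String, a)
myCapture action = bracket redirect restore runActionAndCapture
  where
    -- Save the real stdout, then point stdout at the pipe's write end.
    redirect = do
      (pipeReadEnd, pipeWriteEnd) <- createPipe
      old <- hDuplicate stdout
      hDuplicateTo pipeWriteEnd stdout
      return (pipeReadEnd, pipeWriteEnd, old)

    runActionAndCapture (pipeReadEnd, _, _) = do
      a <- action
      hFlush stdout
      c <- readAvailable pipeReadEnd
      return (c, a)

    -- Put the real stdout back and close every handle opened in redirect.
    restore (pipeReadEnd, pipeWriteEnd, old) = do
      hDuplicateTo old stdout
      hClose old
      hClose pipeWriteEnd
      hClose pipeReadEnd

-- Read whatever is currently buffered in the handle without blocking.
readAvailable :: Handle -> IO String
readAvailable h = do
  isReady <- hReady h
  if isReady
    then do
      c <- hGetChar h
      rest <- readAvailable h
      return $ c : rest
    else return []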

I haven't tested it under Windows, though.

@treeowl

treeowl commented Feb 7, 2021

I think it makes sense to have both a file-based interface and at least one pipe-based one. One pipe-based solution could use a second thread to pull output from the pipe, making it available on request. A second one could use the pipe blocking mechanism, producing a stream of output chunks followed by a result.
