Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS X support #51

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

OS X support #51

wants to merge 12 commits into from

Conversation

haileys
Copy link

@haileys haileys commented Sep 17, 2015

This pull request adds OS X support to Semian.

The biggest potentially controversial change here is that I've removed the ticket acquisition timeout entirely. This is because OS X doesn't have semtimedop. I asked @sirupsen about this and he backgrounded me on the original motivation for this timeout and why it isn't strictly necessary anymore.

Because we no longer block on semaphore operations, I was also able to remove the GVL release code.

cc @sirupsen @csfrancis @byroot

@byroot
Copy link
Contributor

byroot commented Sep 17, 2015

this timeout and why it isn't strictly necessary anymore.

Yes. We recently removed all usage of it. So it's fine. We'll have to do a major version bump though.

Flagging this for later review even though @csfrancis is the authoritative person here.

Also you have a CI failure that seems legit.

@haileys
Copy link
Author

haileys commented Sep 17, 2015

The other thing I discussed with @sirupsen is the use of SHA1 to generate Semaphore keys from strings. There's a slight potential for this to generate the same key for multiple resources, or even between totally different applications that both happen to use SysV semaphores.

Is there any reason why Semian uses SHA1 for this and not ftok? ftok generates a key from a file path and guarantees uniqueness this way, rather than SHA1's probabilistic uniqueness.

@haileys
Copy link
Author

haileys commented Sep 17, 2015

@byroot Pushed up f7372fb which fixes that failure. Turns out that test was implicitly relying on timeouts (as it tried to acquire the resource immediately after sending SIGKILL to the child process that had already acquired the resource). I've added a call to Process.waitpid after Process.kill so we only try to acquire the resource after the child process has died.

@sirupsen
Copy link
Contributor

The argument I have for not requiring a ticket count I tried to explain to Charlie last night is that the equation for a ticket count's effectiveness effectively looks like this: p^t = q, where p is probability of sampling a worker talking to the resource that has ticket count t, an q is the accepted false positive rate for sampling t workers talking to the resource. When t grows beyond 6-8 because of this exponential relationship a timeout has a very questionable value, and makes it harder to argue about an effective ticket count. A t of, say, 8 is as appropriate for 15 workers as it is for 80 workers (this means that it's better, for Semian, to have more workers on the same box as you reduce capacity less during a failure in that case). The reason we introduced the timeout was that we ran SysV namespaces with only 4 workers, which meant that we couldn't go beyond a ticket count of 2 and thus needed another mechanism to tune. However, we no longer do this and I don't anticipate us doing this again—and this is why I think it's absolutely fine to remove semtimedop in favour of never having a timeout.

@sirupsen
Copy link
Contributor

I've put it on my list to test this PR in production. I'll attempt to do that next week. Nothing really jumps out, thanks @charliesome :)

@@ -23,7 +23,7 @@
have_func 'rb_thread_blocking_region'
have_func 'rb_thread_call_without_gvl'

$CFLAGS = "-D_GNU_SOURCE -Werror -Wall "
$CFLAGS = "-D_GNU_SOURCE -Wall "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you have to get rid of -Werror?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHA1 is deprecated in the OS X system OpenSSL headers and causes a warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets switch to something else then rather than disabling the warning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fw42 Charlie already suggested following up with ftok

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to ftok will likely require changes in the external Semian API. This might not be a big deal since this branch will already require a major version bump, but using ftok properly will not be as simple as just replacing the SHA1 call.

We could pass -Wno-deprecated-declarations to disable this specific warning, but I'm not sure how that will affect compilers that don't support that option.

@byroot
Copy link
Contributor

byroot commented Sep 18, 2015

I tested this branch with shopify in dev and test. Seems all green. We can try it out on production servers once the last concerns are settled.

@haileys
Copy link
Author

haileys commented Sep 18, 2015

Since this pull request will already require a major version bump, I'd like to discuss another breaking change that we could make while we're at it.

IMO we should use ftok rather than SHA1 to generate semaphore keys. ftok returns a key which is generated from a file's identity on disk rather than just being a simple digest. This guarantees that there won't be clashes between two semaphores. While clashes are fairly unlikely with SHA1, (1 in 4 billion with a 32 bit key_t), a guarantee that a clash won't happen is even better 😁

Using ftok requires that we have a file on disk per resource that is visible to all workers. We could just use /tmp/semian.#{resource_name}.sem or something, but this feels a little sketchy and makes more assumptions about environments that Semian will be used in than I'm comfortable making - it assumes that /tmp exists and is writable (in this brave new world of containerising everything, this may not be a reasonable assumption). It also assumes that the all processes sharing an IPC namespace also share the same root directories and mount namespaces - again, not necessarily a reasonable assumption these days.

I think the ideal solution here is to add a parameter to Semian::Resource (and expose that up through Semian.register) that allows the user to customise the filename used for ftok. That could be something in /tmp by default, or it could be a required parameter.

@byroot
Copy link
Contributor

byroot commented Sep 18, 2015

Using ftok requires that we have a file on disk per resource that is visible to all workers.

This (unless I'm missing something) is I'm afraid impossible for us since we're running our processes in multiple docker containers. We do share the IPC namespace between containers, but not the filesytem.

@sirupsen
Copy link
Contributor

Yeah, @byroot has a valid point for our deployment. I really don't want to do a shared mount between all the containers just for this :/

@csfrancis
Copy link
Contributor

While clashes are fairly unlikely with SHA1, (1 in 4 billion with a 32 bit key_t), a guarantee that a clash won't happen is even better

ftok isn't bulletproof either, unfortunately (from the BSD docs on OSX):

BUGS
The returned key is computed based on the device minor number and inode of the specified path, in combination with the lower 8 bits of the given id. Thus, it is quite possible for the routine to return duplicate keys.

@csfrancis
Copy link
Contributor

Perhaps we could do something like store a mapping of all active resource_id -> sem_ids in a hash to detect collisions? It seems unlikely that we would encounter one, but if we did, the user would definitely want to know about it.

@byroot
Copy link
Contributor

byroot commented Sep 22, 2015

I think you are going to far guys. We're talking about hashing human identifiers. Even MD5 would do. The risk of accidental collisions is extremely low.

@haileys
Copy link
Author

haileys commented Sep 22, 2015

ftok isn't bulletproof either, unfortunately

Ugh, I missed that. That's a shame. I guess SHA1 is good enough in that case then

@sirupsen
Copy link
Contributor

I'd question again why we even need this. Why can't we just disable this in test and dev? @charliesome

@byroot
Copy link
Contributor

byroot commented Nov 11, 2015

@sirupsen because then you can't run your integration tests on OSX. :sad_panda:

@sirupsen
Copy link
Contributor

Yeah, but those don't fork etc., so you should be able to use the in-memory adapter, no?

Unless you mean the Semian test suite.

@byroot
Copy link
Contributor

byroot commented Nov 11, 2015

Oh. Well, if we end up having an in-memory semaphore, then yes.

@Soleone
Copy link
Member

Soleone commented May 24, 2016

any updates on this by chance? this came up to test certain semian details in development mode in osx

@bf4
Copy link

bf4 commented May 28, 2017

@sirupsen @charliesome This looks like an interesting PR but is about a year and a half old... any reason it's still open? looks like it's targeted for only Linux in prod, so it could be closed with 'make sure your dev environment is linux; we don't expect support for BSD/OSX'?

@sirupsen
Copy link
Contributor

@bf4 we're still interested in this work upstream, someone just needs to own it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants