Installer or command that hangs? Use /dev/urandom instead of /dev/random, but constrained to a particular process

Okay, so I'm working on an Ansible role for deploying an Oracle 10g Webgate, and I want it working on RHEL 6 and RHEL 7. I managed to do that (yay; it took a bit of persuading), but quickly noticed that unless you do something to prevent it, the installer (InstallShield on Linux... yuck, and not just because it wraps Java 6) drains your entropy pool very, very quickly and then just sits there, hanging.

You can verify that it's blocking on entropy by using a command such as:

watch cat /proc/sys/kernel/random/entropy_avail

If it isn't hovering between 2000 and 3000, then you have a potential issue; if it's staying under 1024 or so, then you will very likely experience hanging behaviour, depending on which application is draining the pool (commonly by trying to read a bunch of data from /dev/random).

I should note that this is in a VMware environment, so no hardware random number generator for me. Instead, what I typically do is push out a sysctl change to raise kernel.random.read_wakeup_threshold from the default of 64 to a more useful 1024, which generally makes things (especially things like WebLogic startup times) much more responsive.
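That sysctl change can be persisted with a small drop-in fragment (the file name here is just my convention):

```shell
# /etc/sysctl.d/50-entropy.conf
# Wake readers of /dev/random earlier, so the pool is topped up
# before blocking reads start to bite.
kernel.random.read_wakeup_threshold = 1024
```

Apply it immediately with `sysctl -p /etc/sysctl.d/50-entropy.conf` (as root), or it will be picked up at the next boot. Note that very recent kernels have removed this tunable, but it is present on RHEL 6 and 7.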

Additionally, for instances where I'm dealing with Oracle Java < 8, I edit the JRE's java.security file (tip: locate java.security to find it) and set the following:

securerandom.source=file:/dev/./urandom

(The odd /dev/./urandom spelling works around Java treating the plain file:/dev/urandom path specially.)
These days, this gets done as part of an Ansible role I have for deploying Oracle Java.

But, in the case of this particular installer, that wasn't really accessible, and although it has options for doing a silent install (useful, if insufficiently documented, at least by Oracle), it doesn't have a way of passing in JVM arguments (i.e. -Dsecurerandom.source=...).

In my dev environment (a Vagrant VM), I just deleted /dev/random and recreated it with the same major and minor numbers as /dev/urandom, but I didn't want to do that in my real environments, partly to reduce the potential for surprise later on.
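For reference, that dev-only hack looks like this. It requires root; 1,9 are /dev/urandom's device numbers and 1,8 are /dev/random's, as you can see in the device listing further down.

```shell
# DEV ONLY: make /dev/random behave like /dev/urandom
rm /dev/random
mknod /dev/random c 1 9    # 1,9 = /dev/urandom's major,minor
chmod 666 /dev/random

# To revert, recreate the real device:
rm /dev/random
mknod /dev/random c 1 8    # 1,8 = the real /dev/random
chmod 666 /dev/random
```

It works, but it affects every process on the machine, which is exactly the sort of global side-effect the rest of this post avoids.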

But we can do something with much the same effect, and more constrained (I like constrained effects), using part of the Linux namespaces system to get an independent mount table -- part of what makes Docker work. It does require root privilege, however (or at least the power to modify the mount table).

To demonstrate, I wrote this as a one-liner, reformatted here for a bit of enhanced readability. You can copy-paste it into a root shell to demonstrate. (Naturally, read anything before pasting it into a root shell.)

echo "Before unsharing"; \
  ls -l /dev/*random ; \
  unshare --mount bash -c '
    mount -o bind /dev/urandom /dev/random;
    echo "After unsharing and bind-mounting random device";
    ls -l /dev/*random;
    echo "... finickity installer goes here"
  '; \
  echo "Back in the real world"; \
  ls -l /dev/*random

Here's the output, with some highlighting to make it easier to see the difference.

Before unsharing

crw-rw-rw- 1 root root 1, 8 Mar 16 23:59 /dev/random
crw-rw-rw- 1 root root 1, 9 Mar 16 23:59 /dev/urandom
After unsharing and bind-mounting random device
crw-rw-rw- 1 root root 1, 9 Mar 16 23:59 /dev/random
crw-rw-rw- 1 root root 1, 9 Mar 16 23:59 /dev/urandom
... finickity installer goes here
Back in the real world
crw-rw-rw- 1 root root 1, 8 Mar 16 23:59 /dev/random
crw-rw-rw- 1 root root 1, 9 Mar 16 23:59 /dev/urandom

Note that if the command you want to run should not run as root (quite likely), then you can replace the installer line in the code above with a command that uses sudo, su, runuser, etc., possibly via a wrapper script. Or just use an interactive shell.
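For example, a sketch of the non-root variant; the installer path and the 'oracle' user are placeholders, and on RHEL 7 you could use runuser instead of su:

```shell
# Root sets up the private mount namespace and the bind mount,
# then drops to an unprivileged user for the actual installer.
unshare --mount sh -c '
  mount -o bind /dev/urandom /dev/random
  su oracle -c "/path/to/installer -silent"
'
```

The bind mount is visible only within that mount namespace, so the child process (and anything it spawns) sees the urandom-backed /dev/random, while the rest of the system is untouched.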

So, let's see how to apply this. This post isn't about Ansible, or about deploying a 10g Webgate using Ansible (perhaps I'll document that later), but it won't hurt to show the key part. In this particular case, because the webgate will be running in Apache httpd (from the RHEL package), the installer needs to run as root; if this were for an OHS deployment, then I would have to wrap the command in sudo, and probably also make sure requiretty was disabled (at least for transitioning to the target user).

Note that this part of the playbook is running as 'root' already.

Ansible task before integrating this improvement

- name: run the installer ... beware entropy starvation
  command: /root/webgate_installers/{{ webgate_installers_and_patches[0] }}/{{ webgate_installers_and_patches[0] }} -options /root/webgate_installers/install_options.txt -silent -is:silent
  args:
    chdir: /root/webgate_installers/{{ webgate_installers_and_patches[0] }}
    creates: "{{ webgate_install_location }}/oblix"

Ansible task after integrating this improvement

- name: run the installer ... beware entropy starvation
  command: unshare --mount bash -c 'mount -o bind /dev/urandom /dev/random; /root/webgate_installers/{{ webgate_installers_and_patches[0] }}/{{ webgate_installers_and_patches[0] }} -options /root/webgate_installers/install_options.txt -silent -is:silent'
  args:
    chdir: /root/webgate_installers/{{ webgate_installers_and_patches[0] }}
    creates: "{{ webgate_install_location }}/oblix"


I then reverted my earlier change to my Vagrant VM, making /dev/random be what it normally is. (You might reasonably wonder why I didn't just blow the VM away and rebuild it, but I had some other Apache-related work in there.)

Watching entropy_avail while the playbook was running (in a mode that caused it to do the entire webgate install afresh -- removing the previous install), it stayed stable and buoyant, and the install proceeded at a very healthy rate.

(Note: it would have been faster still, but there is a certain unnecessary registration activity the installer wants to run that takes a little while to time out.)


Previously, because this particular installer is such a hog, while running this playbook I would need to open a new window onto the (thankfully only one) server it was executing on, and run commands such as updatedb, rpm --verify --all and anything else I could think of to generate some disk activity -- that being the only source of entropy in that environment.

Now, instead of that, I have a playbook that runs in a fairly constant and minimal amount of time, rather than timing out and requiring manual intervention. Now that's the kind of speed-up I like to see.

Hope it helps,

