Posts

Installer or command that hangs? Use /dev/urandom instead of /dev/random, but constrained to a particular process

Okay, so I'm working on making an Ansible role for deploying an Oracle 10g Webgate, and I want it working on RHEL 6 and RHEL 7. I managed to do that (yay; took a bit of persuading), but quickly noticed that, unless you do something to prevent it, the installer (InstallShield on Linux... yuck, and not just because it wraps Java 6) drains your entropy pool very, very quickly, and then just sits there, hanging.

You can verify that it's blocking because of entropy by using a command such as:

watch cat /proc/sys/kernel/random/entropy_avail

and if it isn't hovering between 2000 and 3000, then you have a potential issue; if it's staying under 1024 or so, then you are very likely experiencing hanging behaviour, depending on which application is draining the pool (commonly by trying to read a bunch of data from /dev/random).
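The watch approach above can be wrapped into a quick one-shot check; a minimal sketch, using the same 1024-bits rule of thumb mentioned above:

```shell
# Quick one-shot check of the kernel entropy pool (Linux-specific path).
avail=$(cat /proc/sys/kernel/random/entropy_avail)
echo "entropy_avail: $avail"
if [ "$avail" -lt 1024 ]; then
    echo "Pool is low; readers of /dev/random may block"
fi
```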

I should note that this is in a VMware environment, so no hardware random number generation for me. Instead, what I typically do is to push out a sysc…
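The per-process constraint in the title can be sketched with a private mount namespace, so that only the installer sees /dev/urandom in place of /dev/random (a hedged example: requires root and util-linux's unshare, and ./installer.sh is a placeholder for the real InstallShield launcher):

```shell
# Bind /dev/urandom over /dev/random for ONE process tree only, by
# doing the bind mount inside a private mount namespace. The rest of
# the system continues to see the real /dev/random.
unshare --mount sh -c '
    mount --bind /dev/urandom /dev/random
    exec ./installer.sh
'
```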

[Humbledown highlights] VLAN tag stripping in Virtualbox (actually, Intel NICs et al.)

This is historical material from my old site, but as I have just bumped into a page that linked to it, I thought I would republish it. I have not verified that this material is still accurate. Feel free to post an update as a comment and I'll publish it.

VLAN tag stripping in Virtualbox (actually, Intel NICs et al.)
Mon Jan 17 21:41:46 NZDT 2011
VirtualBox, VLAN, Networking, Linux, Mac OS X, Teaching, Study

Short version: when using VirtualBox “Internal Network” adaptors for VLAN environments, don't use the “Intel PRO/1000” family of adaptors, which are the default for some operating system types. Instead, use either the paravirtualised adaptor (which requires your guest to have virtio drivers) or the “AMD PCNet” family of adaptors.
This is because the “Intel PRO/1000” family of adaptors strips the VLAN tags. This is not a problem specific to VirtualBox: it also occurs in other virtualisation products and on native systems, although you hear about it more often in…

Memcached logging (and others) under Systemd on RHEL7

I've been getting into RHEL7 lately (yay, a valid reason to get into it at work!) and this means learning about systemd. This post is not about systemd... at least it's not another systemd tutorial. This post is about how I got memcached to emit its logging to syslog while running under systemd, and how to configure memcached to sanely log to a file that I can expose via a file share.

It's 2:21 am, so this is gonna be quick. I wrote this because I needed memcached, and in response to some bad advice on Stack Overflow.
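The syslog side of this can be sketched as a systemd drop-in; a hedged example only — the drop-in path is illustrative, and your distribution's unit and memcached options may differ:

```ini
# /etc/systemd/system/memcached.service.d/logging.conf  (illustrative path)
[Service]
# Send whatever memcached writes to stdout/stderr to syslog.
# (On newer systemd versions, "journal" replaces "syslog" here.)
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=memcached
```

After a `systemctl daemon-reload` and restart, memcached also needs a verbosity flag (`-v` or `-vv`) before it has much to say.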


Use IPTables NOTRACK to implement stateless rules and reduce packet loss

I recently struck a performance problem with a high-volume Linux DNS server and found a very satisfying way to overcome it. This post is not about DNS specifically: it is useful for any service with a high rate of connections or sessions (UDP or TCP), but especially for UDP-based traffic, where the stateful firewall doesn't really buy you much. It is also applicable to services such as HTTP/HTTPS or anything else where you have a lot of connections...

We observed times when DNS would not respond, but retrying very soon after would generally work. For TCP, you may find that you get a connection timeout (or possibly a connection reset? I haven't checked that recently).

Observing logs, you might see the following in kernel logs:

kernel: nf_conntrack: table full, dropping packet.

You might be inclined to increase net.netfilter.nf_conntrack_max and net.nf_conntrack_max, but a better response might be found by looking at what is actually taking up those entries in your conne…
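The NOTRACK approach from the title can be sketched like this for DNS; a hedged example (port 53 and the rule placement are illustrative, and these commands need root):

```shell
# Exempt DNS traffic from connection tracking entirely. The NOTRACK
# target lives in the raw table, which is traversed before conntrack,
# so these packets never consume conntrack table entries.
iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
iptables -t raw -A OUTPUT     -p udp --sport 53 -j NOTRACK
# Untracked packets can never match ESTABLISHED, so they must be
# allowed statelessly in both directions:
iptables -A INPUT  -p udp --dport 53 -j ACCEPT
iptables -A OUTPUT -p udp --sport 53 -j ACCEPT
```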

The importance of being liberal in a Cisco environment

Okay, so today I grappled with a Cisco sized gorilla and won. -- me, earlier this year, apparently feeling rather chuffed with myself.

I had recently launched a new service for a client, and a harvester on the Internet was experiencing timeouts in trying to connect, so our data was not being collected by the harvester that needed it. There was no evidence of a problem in the web-server logs, because no HTTP request ever had a chance to make it through.

It seems that some Cisco products (an unstudied subset, but likely firewalls and NATs) play a bit too fast and loose for Linux's liking, and change packets in ways that make the Linux firewall's (iptables) stateful connection tracking occasionally see such traffic as INVALID. This manifests in things such as connection timeouts, and as such you won't notice it in things like webserver logs. In a traffic capture, you may recognise it as a lot of retransmissions.
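A hedged sketch of how you might diagnose and then relax this (the LOG prefix is illustrative; the sysctl is the standard conntrack knob for out-of-window TCP segments, and both commands need root):

```shell
# First, see what conntrack is actually rejecting:
iptables -I INPUT -m conntrack --ctstate INVALID -j LOG --log-prefix "ct-invalid: "
# If the logged packets turn out to be out-of-window TCP segments,
# tell conntrack to be liberal in what it accepts:
sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1
```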


How long has that command been running? (in seconds)

The ps command is fairly obtuse at the best of times, but I am very thankful for things like ps -eo uid,command. I wish it were as easy to have ps report the start time of a process in a form I can use. See my pain looking at the start time and elapsed time (which is not wall-clock) for a rather antiquarian system.

# ps -eo stime,etime
... selected extracts ...
Apr20  1-02:30:30
 2013 731-03:01:37
12:59     01:42:25
14:41        00:00
Yeah, I really don't want to touch that with any scripting tool. Best to head to /proc/... you may actually want to use ps or something like pgrep to determine the PID of the process of interest.

According to proc(5), we want the 22nd element in /proc/PID/stat:  "starttime: ...The time in jiffies the process started after system boot.". A jiffy is explained in time(7) -- this from RHEL 5:

   The Software Clock, HZ, and Jiffies
       The  accuracy  of  many system calls and timestamps is limited by the resolution of
       the software clock, …
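Putting starttime together with CLK_TCK and /proc/uptime gives an age in whole seconds; a minimal sketch, demonstrated on the current shell (caveat: a comm field containing spaces would shift awk's field numbering):

```shell
# Age of a process in seconds, computed from /proc/PID/stat.
# starttime (field 22) is in clock ticks since boot.
pid=$$                                   # substitute the PID found via pgrep
ticks=$(getconf CLK_TCK)                 # clock ticks per second
start=$(awk '{print $22}' "/proc/$pid/stat")
up=$(awk '{print int($1)}' /proc/uptime) # seconds since boot
echo "PID $pid started $(( up - start / ticks )) seconds ago"
```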

Answering 'Are we there yet?' in a Unix setting

Often -- commonly during an outage window -- you might get asked "How far through is that (insert-lengthy-process-here)?". Such processes are common in outage windows, particularly unscheduled outages where filesystem-related work may be involved, but they crop up in plenty of places.

In a UNIX/Linux environment, a lot of processes are very silent about progress (certainly with regard to percentage completed), but a lot of the time we can deduce how far through an operation is. This post illustrates with a few examples, and then slaps on a very simple and easy user interface.

But 'Are we there yet?' is rather similar in spirit to 'Where is it up to?' or 'What is it doing?', so I'll address those here too. In fact, I'll address those first, because they often lead up to the first question. And we won't just cover filesystem operations, but they will be first because that's what's on my mind as I write this.
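One deduction of this kind is worth sketching up front: for a process chewing through a single big file, /proc exposes the file offset. A hedged example, demonstrated against the current shell and a throwaway file (for a real process you would set pid and fd after inspecting ls -l /proc/PID/fd):

```shell
# How far through a file is some process's file descriptor?
# /proc/PID/fdinfo/FD gives the current offset; /proc/PID/fd/FD is a
# symlink to the file itself, so stat -L gives its total size.
pid=$$
printf 'some long file contents' > /tmp/demo.dat
exec 7< /tmp/demo.dat
dd bs=1 count=9 <&7 > /dev/null 2>&1     # simulate partial progress
fd=7
pos=$(awk '/^pos:/ {print $2}' "/proc/$pid/fdinfo/$fd")
size=$(stat -Lc %s "/proc/$pid/fd/$fd")
echo "$(( 100 * pos / size ))% complete"
```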