Contemplating Clusters
Posted 2021-03-30 13:41 PDT | Tags: software hardware
As far as I can tell, there's not a lot of good documentation out there for making a computer cluster out of open-source software.
Back in 2003, I suggested to my boss that we publish a How-To document, since we were supposed to be all open-sourcey and stuff, but he didn't think it was a good idea. He said everyone was doing what we were doing, and there was no point in documenting what would soon become common knowledge.
That sounded totally reasonable at the time, and the broad strokes had already been written out in the Beowulf Cluster How-To, but the industry has taken a quirky turn since then. The FAANG companies have hoovered up nearly everyone with distributed systems experience, and most companies which would have built their own clusters fifteen years ago are renting VMs in "The Cloud" instead.
"The Cloud" is provided by those same FAANG companies, who run customers' workloads on their big-ass in-house clusters and jealously guard their operational art.  This bodes ill for open documentation.
On one hand the economies of scale are hard to argue against, but on the other hand there are still niche uses for clustering up a big pile of hardware, and guides for doing it well haven't improved much in the last twenty years.  If anything they have grown dated or fallen off the web entirely.  I had to dig this practical guide out of the Wayback Machine:
Someone in r/HPC asked for advice making a thirty-node cluster, and after looking at the current How-To docs, I pointed them at those and offered some advice on power and heat management, since that aspect of cluster operations was totally absent from all of the documents I could find.
It's been eating at me, because there's a lot more also absent from those documents -- scaling factors, effective use of ssh multiplexing, job control frameworks (like Gearman), tiered master nodes, monitoring .. it makes me think that either there's documentation I haven't found or more needs to be written.
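As one small illustration of the ssh multiplexing point: OpenSSH can keep a single master connection open per host and reuse it for later commands, which avoids paying for a fresh handshake every time a head node pokes a compute node.  The sketch below only builds the command lines; the node names, the control socket location, and the 60-second persistence window are assumptions made up for the example, not recommendations.

```python
import shlex


def multiplexed_ssh_argv(host, remote_command,
                         control_dir="~/.ssh", persist_seconds=60):
    """Build an ssh argv that shares one master connection per host.

    control_dir and persist_seconds are illustrative defaults only.
    """
    return [
        "ssh",
        # Become the master if no shared connection exists yet, else reuse it.
        "-o", "ControlMaster=auto",
        # One control socket per user/host/port (%r, %h, %p are ssh tokens).
        "-o", f"ControlPath={control_dir}/cm-%r@%h:%p",
        # Keep the master alive in the background between commands.
        "-o", f"ControlPersist={persist_seconds}s",
        # Never stop to prompt for a password in an unattended run.
        "-o", "BatchMode=yes",
        host,
        remote_command,
    ]


def fan_out(hosts, remote_command):
    """Map each host to the argv that would run remote_command on it."""
    return {host: multiplexed_ssh_argv(host, remote_command) for host in hosts}


if __name__ == "__main__":
    # "node01" and "node02" are hypothetical host names.
    for host, argv in fan_out(["node01", "node02"], "uptime").items():
        print(shlex.join(argv))
```

Each argv can be handed to subprocess.run() as-is.  The first command to a node sets up the master connection, and later commands within the persistence window ride on it.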
Rather than leaping right in and brain-dumping to this blog, I've joined #clusterlabs-dev and #linux-cluster on Freenode, to get a feel for how the community thinks about these things nowadays.  I haven't built a nontrivial cluster in nearly ten years, and some of my skills are bound to be a bit stale and perhaps irrelevant.
There are a few more modern guides, but they too are narrow of scope, like
The #clusterlabs-dev channel folks have thus far pointed me at and!_and_why_do_I_care%3F which is nice and modern, but also quite narrow.
When I asked them about power/heat management best practices, they responded with a resounding "meh!" so perhaps I'll write about that first.  Doubtless my FAANG friends will rip it to shreds, but that will just make for a better second draft.

Reminiscing about First Pacific Networks
Posted 2020-09-08 19:46 PDT | Tags: hardware tales
In 1996 I was out of UCSC, living in San Jose, and needing a job.  My buddy Tash landed me a gig at First Pacific Networks as a junior system administrator, and that paid rent while I figured out how to get a programming job and kickstart my career.
FPN built these nifty little cigar box sized units which plugged into a broadband network, hijacked part of its band, and used it to stream voice, video and data (ethernet).  They also made headboards which plugged into a central office and managed the network.  It turned out that in some countries (like Russia) there were extremely good broadband networks for squirting video around, but their telephone and data infrastructure were for crap.  FPN's project was an elegant solution that leveraged this existing infrastructure to provide the rest of it.
Well, at least it was "elegant" on paper.
The box had an RJ11 socket for plugging in a landline telephone, and that worked quite well.  It also had an AUI ethernet port, which worked rather less well.

The AUI port

My clearest memories of FPN are of those AUI ports.  Mostly that they were horribly, horribly unreliable, and hard to fasten/unfasten without bending the little clasps that flanked the socket, which of course made them even less secure.  They were always falling out, and our device drivers did not handle random disconnects well at all.
Part of the problem was that the AUI cable itself was really damn thick and stiff, so just bending it into a U-shape from the back of the box to the back of the PC put quite a bit of flexural tension on the whole rig.  Bumping the desk would often be enough to send the whole thing sproinging loose.
The temptation was to just glue the cursed things in place, but of course that was a no-no.  Such is the life of a junior sysadmin.  That, and replacing dead monitors, fixing power supplies, and rescuing borderline-tech-illiterate employees who deleted COMMAND.COM or CONFIG.SYS to "save room" on their PCs.
Fortunately I didn't have to suffer that long.  I automated away some of my more annoying tasks when I could by writing software, and after a few months of that I was able to transfer to FPN's engineering team as a junior software engineer.
It was the break I needed.  After about half a year of writing network protocol extensions and simulations to demonstrate the scalability of our network (or lack thereof), I accepted a job at Cygnus Solutions as a GNU toolchain engineer, fixing bugs in GCC, GAS, GDB and the like.  My career had finally begun.

Building an Alternative to RHEL6: Saving the Repositories
Posted 2020-07-06 14:04 PDT | Tags: software rhel6fork
It has seemed to me for some years now that there is quite a bit of demand from the enterprise sector for a RHEL-like distribution which is just a forked RHEL6 with updated kernel and packages, omitting the "features" Red Hat introduced with RHEL7.
The desire to avoid these features is motivating a lot of businesses to keep using RHEL6 (or its derivatives) beyond its EOL, which is a suboptimal solution for everyone involved.  Red Hat has chosen to accommodate those customers to a degree by offering "Extended Lifecycle Support" for RHEL6 through the end of June 2024, so they are at least getting security patches for these systems.
The longer RHEL6 users lack alternatives to RHEL7/8 (and their derivatives), the more those users will feel pressured to make the painful transition to an operating system which is less reliable and harder to maintain.  The best time to develop an alternative was five years ago, when most RHEL6-using businesses were just starting to weigh their options, but I don't think the window of opportunity has closed quite yet.  There are still many foot-dragging stragglers who would welcome the chance to perpetuate their accustomed infrastructure.  If enough of these stragglers adopted the alternative and found it good, it might entice other companies which have already made the transition (and have been suffering the consequences) to transition back.
I've talked to some people who voiced interest in such a project, but couldn't get them to talk to each other, and AFAIK they didn't follow through.  I'm interested in contributing to the project, but have resisted taking the lead because I'm already drowning in unfinished projects and would rather invest my time and energy in making Slackware more enterprise ready.  Existing RHEL6 users would likely find an RHEL-based alternative more appealing, though.
Time is not on our side.  One of the consequences of delaying the fork is that the RHEL6 package repositories are being neglected and falling into disrepair.  Information is being lost, which an RHEL6 fork would need as a reference template.
I've been downloading Scientific Linux 6 and the related RHEL6 repositories -- sl, sl6x, sl-other, epel, adobe-linux, and rpmforge.  SL6 seems like the best RHEL6-clone on which to base a fork because their community is very fork-friendly.  They have invested resources in making "spins" (shallow forks) easy for their users and would gladly give advice to a project fork.  In contrast, when I brought up the notion of a fork in CentOS6 forums, it was received with uniform hostility.  The difference was like night and day.
The downloads are ongoing.  I've got about 400GB so far, with I think about 200GB more to go.  It's quite a bit of information, but not unmanageable.
I'd still appreciate it if someone else took point, but while I'm waiting for that champion, I'll put in some work to make sure the repository data is complete and available as a working SL6 mirror.  If a project fully materializes, I'd be happy to hand over the mirror or manage it on behalf of the project.
I tried using yum and createrepo to construct a local mirror in the prescribed manner but the repo metadata was too badly in disrepair for some repos to work.  Some mirror lists are stale, some domains have disappeared, some mirrors are -empty- (directories are there, but no files), some are redirecting to nonexistent locations, and some have slightly-wrong packages which make yum unhappy.  I've switched to just wget'ing the entire contents of good mirrors (once I found them) and will reorganize them into proper repos and fix broken dependencies later.
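A quick sanity check on a wget'ed tree can catch the "directories are there, but no files" failure before it gets mistaken for a good copy.  What follows is a minimal sketch, assuming the mirror has been downloaded into a local directory; the function name and the choice to count only files ending in .rpm are made up for the illustration, and it does not validate repo metadata or dependencies.

```python
import os


def audit_mirror(root):
    """Walk a downloaded mirror and report obvious holes.

    Returns (empty_dirs, rpm_count): a sorted list of directories that
    contain neither files nor subdirectories, and the number of *.rpm
    files found anywhere under root.
    """
    empty_dirs = []
    rpm_count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # A leaf directory with nothing in it is the "empty mirror" symptom.
        if not dirnames and not filenames:
            empty_dirs.append(dirpath)
        rpm_count += sum(1 for name in filenames if name.endswith(".rpm"))
    return sorted(empty_dirs), rpm_count


if __name__ == "__main__":
    import sys

    # Usage: python audit_mirror.py /path/to/mirror  (path is hypothetical)
    empties, rpms = audit_mirror(sys.argv[1])
    print(f"{rpms} rpm files, {len(empties)} empty directories")
    for path in empties:
        print("  empty:", path)
```

Running this against each mirror before moving on to the next one gives a rough idea of which copies are worth keeping.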
That dysfunction will only grow worse with time, which makes it all the more important to obtain copies of the repositories -now-.
I'll also see about getting it onto redundant storage.  I'd hate to lose it all to a disk crash.

"The Tragedy of systemd", a rebuttal
Posted 2018-11-17 14:46 PST | Tags: software systemd
Benno Rice gave an interesting talk at BSDCan 2018, titled "The Tragedy of systemd" --
He presents a very sympathetic narrative about systemd from the BSD perspective, and he does it fairly well, but he also makes some assertions of dubious validity.
Early in the talk he proposes there is a "confusion" between system configuration and service bootstrap in traditional UNIXy models. He talks about mounting filesystems and bringing up the network, and how these are "slightly disparate things" which should be treated differently and managed with different tools.
From my perspective, one of the strengths of the UNIX approach is that it abstracts a great many "slightly disparate things" as though they were the same thing. Making (almost) everything a file is a good example of this. When I was new to UNIX, it seemed strange to me, but over time I have grown appreciative of how powerful this abstraction can be.
When things are sufficiently different that they need to be treated as something other than a file, we have system calls like ioctl() specifically for that, and in my experience that has worked well.
Later he made a curious assertion without any accompanying reasoning to support it, that "other things manage services well [..] Windows always had a strong notion of services". He proceeds from there on the assumption that his assertion is true to discuss Windows' contract-oriented (declarative) approach, which he says "has been kind of neat".
I've known people to be used to Windows service management, but I've never known anyone who was familiar with both Windows and UNIX who preferred Windows service management. Mostly I've heard them complain that Windows was inflexible and opaque. I've not spent any time configuring Windows services myself, so perhaps some of you with such experience could weigh in on this. Is there anything to be said about Windows' declarative service management approach which warrants admiration?
Benno also praises the ideas behind launchd -- "if you need to have services running all the time, and you can't call the system booted until services have started, that's such a pain in the ass" and "We know we'll need this at some point but we won't start it until we actually need it." This is an approach systemd borrowed from launchd, not starting services until something requests them.
The problem I have with this approach (which is also applicable to inetd) is that when the system comes up, I want to know that its services will work. The litmus test for this is bringing the service up so that it processes its configuration and finds any errors or missing dependencies. This can be improved upon by monitoring services with things like Nagios, which submits requests and validates the results.
If the service doesn't come up until it's needed, then we don't know if it will work until the moment something actually needs it, which seems like a really bad idea. Also, this approach presupposes a lack of monitoring. Production systems absolutely should be monitored. If Nagios hits up a service immediately or soon after system start and starts the service, then why did we bother deferring the service start in the first place?
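To make the monitoring point concrete, here is a minimal sketch of the kind of check a monitor performs: submit a request and validate the result.  This is not how Nagios itself is implemented; the function name, the host, the port and the request bytes in the usage note are all assumptions for illustration.  Note that merely connecting is not enough under socket activation, since the init system holds the listening socket and a bare connect would succeed even if the service behind it were broken.

```python
import socket


def probe(host, port, request, expected, timeout=2.0):
    """Send request bytes to host:port and check the reply.

    Returns True only if a connection succeeds, a reply arrives within
    timeout seconds, and the first chunk of it contains expected.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            reply = conn.recv(4096)
    except OSError:
        # Refused, unreachable, or timed out: the service is not working.
        return False
    return expected in reply
```

For a web server on a hypothetical local host this might be called as probe("localhost", 80, b"HEAD / HTTP/1.0\r\n\r\n", b"HTTP/1.").  Against a lazily started service, the first such probe is also what forces the service to start, which is the objection raised above.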
He also talks about how modern systems need to be more reactive to changes around them, which is absolutely true, but IMO the traditional UNIXy toolbox has adapted fairly well in this regard. It's not done yet -- things like changing wireless networks without interrupting TCP connections still need some work -- but pitching the toolbox entirely is unwarranted.
He is dismissive of criticism of systemd as buggy, saying sardonically "it's software" and "we've all had bugs in our code". He says that if we hold init to a higher standard of quality, that implies we can never write another pid 1.
That is simplistic to the point of dishonesty. There are certainly ways we can reduce or mitigate bugs in our software. We can make software simpler, so that there are fewer things to go wrong. Less code means fewer bugs. We can also limit critical dependencies so that a failure in one component doesn't cause the system to more broadly fail (as is the case with systemd when dbus fails). We can also hold on to known-good, already debugged software until such a time that new software has absorbed enough debug/release cycles on ancillary systems that we can trust it on mission-critical systems.
Systemd as a project does none of these things. It is by no means simple. It contains a great deal of code, and thus a great number of bugs, which its developers have shown themselves reluctant to fix. It is tightly integrated with a variety of far-flung components and is vulnerable to any of them failing. It has been forced into adoption on mission-critical systems before it is ready (as RHEL6/CentOS6 fall out of support and there are no sufficiently good alternatives to RHEL-derived distributions in the enterprise).
Benno also says "UNIX as a concept is dead" in that we no longer have a diversity of UNIX systems across which software must be ported, and we don't have to be beholden to POSIX when it is inconvenient. This seemed curious, coming from a BSD developer in a Linux-dominated world. Projects do have reason to be portable. Projects can get away with being Linux-specific because Linux is dominant, but that's no substitute for being portable. How many projects of the past targeted the dominant platform of their time only to be left behind when the dominant platform changed? There is a vast ocean of excellent software written for MS-DOS or classic MacOS which was never ported forward to those platforms' successors.
He then talks about how "change can be scary" and "change threatens what we find familiar". These are true things, but he makes it sound like anyone who opposes systemd is a narrow-minded neanderthal. He also does not recognize that some change can be genuinely bad, or that there might be other ways to change which are less bad. He assumes that since people are reacting badly to the changes represented by systemd, then those people will always react badly to changes, and challenges people to overcome their "kneejerk reactions" and accept systemd rather than be opposed to all change in general.
As someone who dislikes systemd on the basis of its design and implementation, that felt a little disingenuous.
He said a lot more which I'm still mulling over, but these points stood out to me as rather uncompelling, and they make me view all of his points with a sharply critical eye.
Blog comments are not working at the moment, but please feel free to reply via these alternative channels:
   * My LQ blog --
   * IRC channel ##slackware-help on freenode
   * Facebook --
   * Twitter --
   * Email (please let me know if I may share your message via this blog) -- ttk (at) ciar (dot) org

General Purpose Cartridge Saga Update -- Treading Water
Posted 2017-10-28 15:12 PDT | Tags: defense gpc ballistics
(Note: This wasn't going to be the next entry I would write, but it ended up being the one that got finished before the other.  Writing is hard!)
It's been five years since I last wrote about the General Purpose Cartridge:
To rehash, the GPC is a hypothetical military rifle cartridge suitable for replacing both NATO's legacy 5.56x45mm assault rifle cartridge and its 7.62x51mm full-power rifle cartridge.
In order to accomplish this, the GPC would have to satisfy a variety of semi-contradictory criteria:
      * It must be as lightweight as 5.56x45mm (or nearly so), to avoid increasing the soldier's burden,
      * Its recoil must be sufficiently gentle that carbines firing it in automatic mode are controllable,
      * Its terminal ballistics must at least match that of 5.56x45mm at short range (lethality at 50m),
      * Its terminal ballistics must at least match that of 7.62x51mm at long range (lethality at 800m),
      * Its external ballistics must at least match that of 7.62x51mm at long range (flat trajectory, low wind drift).
These criteria made the GPC a thorny enough subject, but recent events have made writing about it even more difficult:
      * The US Army has moved the goalposts to an undisclosed location with the introduction of new "enhanced performance" cartridges -- the M855A1 for 5.56x45mm and M80A1 for 7.62x51mm,
      * The terminal ballistics of these new cartridges do not fit neatly in existing analytical models,
      * Since the Army has decided to go lead-free, should a GPC also be lead-free or should it incorporate lead for its ballistics-enhancing characteristics?
      * Since the UK MoD is moving towards nonfragmenting bullets (with inferior terminal effects), should a GPC also be restricted to nonfragmenting bullets?
      * The methods used in my previous two articles had flaws: the empty brass weights for some cartridges were underestimated, and the lethality model was too primitive,
      * The AMU has purportedly decided on the new "264 USA" as its GPC cartridge, of which very little is yet known,
      * The AMU has argued for making changes to its small arms doctrine to accommodate the characteristics of the 264 USA.  These arguments are relevant to the GPC concept as a whole, and could be used to justify altering its criteria.
So, before I can write about the GPC again, I need to purchase samples of more cartridge types and measure them, and update my methods.  I need to learn more about the ballistics of M855A1 and M80A1, and of 264 USA.  I need to justify my reasons for using or not using lead, and using or not using fragmenting bullets.  I need to decide whether the AMU's arguments are sufficient to justify changing the criteria for the GPC (particularly its weight constraint).
I haven't given up on the subject, but these things take time.  In the meantime I will try writing about these various issues separately.

Writing is Hard, Programming is Easy
Posted 2017-10-23 15:08 PDT | Tags: blogging
My last entry was four months ago.  So much for my determination to write something for this space every day!
It's not for lack of things to write about, but every time I sit down to write something, I think it would be nice to have more features for this blog (like comments and search), and end up working on the next version of the blogging software instead.  Writing software is a lot easier than writing blog articles.
As an interim solution, I could publish blog entries to Facebook as well as to here, and interested readers could post comments to Facebook.  That would suck for those of you not yet trapped in the Facebook quagmire, but perhaps would be better than nothing.
So .. yeah, let's try that.  Another article coming soon after this one.

Entropy Pump
Posted 2017-06-25 11:49 PDT | Tags: blogging entropy
So .. WTF is an "Entropy Pump", and what does it have to do with "blogging against inevitability"?
Entropy has to do with irreversible transformations -- once an egg is scrambled, one cannot put it back together.  Once energy is converted into heat, one cannot reclaim 100% of that energy from that heat.
Eventually everything runs down.  This is inevitable.  Entropy cannot be reversed.
But it can be shuffled around!
Entropy pumping is a characteristic of all living things, as they incorporate low-entropy substances into themselves (food, clean water, sunlight, etc) and export entropy away from themselves.  They keep themselves ordered and energetic at the expense of making the rest of the universe less ordered and less energetic -- accelerating the heat death of the universe to delay their own.
Fortunately the universe is much, much bigger than living things, so it doesn't mind so much.  Our planet is feeling a bit of strain -- more on that in future entries.
One corollary to this is that all causes are ultimately hopeless.  In the end the universe will die, and everything that came before it will come to naught.
In the much shorter term, living things die.  The entropy pump grows less efficient, disorder accumulates, order decays, and eventually the pump can't keep itself sufficiently ordered to keep pumping.
This, too, is inevitable.
But we keep pumping anyway, don't we?
That's what this blog is about.  All causes are hopeless, but we strive to succeed regardless.  The words I write might be whispers against the wind, but I write them anyway.  History marches stubbornly on, but that doesn't mean I can't dance my own jig.
All entropy pumps are doomed to fail, but I'll keep mine running as best I can.

New Blog Seems To Be Working
Posted 2017-06-25 10:56 PDT | Tags: blogging software
Welcome to my new blog, "Entropy Pump"!  I finally gave up on Blosxom and wrote my own blogging software from scratch.  It ended up taking less time than all the hours I sank into trying to make Blosxom not suck.
That having been said, it is not complete.  The search feature doesn't work yet, there is still no way for visitors to leave comments, there's no way to link to a specific post, and pagination is broken.  If I don't fix pagination before posting twenty posts, only the twenty newest posts will be displayed, with no way of accessing older posts.  So I'd better fix pagination!
Also the side-panels to the right of the screen, over there --> are not very interesting.  I'll come up with more interesting content and replace them.
I'm fairly pleased at how the html and css turned out.  This page has a much spiffier look-and-feel than my old blogs hosted by LinuxQuestions and Slashdot, and it absolutely puts my main website to shame.  It's not great (I kind of suck at css) but I finally feel like I can provide people links to my blog without feeling embarrassed about it.
Thinking the top priority will be permalinks for posts .. the title of each post should be a permalink.  Second priority will be pagination.  Then I can think about full-text search (Lucy or dezi?).  Also, maybe improving the markup a bit.  It would be nice if "Lucy" and "dezi" were links to and respectively, but right now the blog only knows how to expand full URLs and Wikipedia references.
As introductions go, this one's pretty boring.  Perhaps another entry is in order, ruminating on the significance of "Entropy Pump".  Will do that soon.
UPDATE 2017-06-25 12:25 -- adding permalinks was really, really trivial.

Test Post
Posted 2017-06-24 23:26 PDT | Tags: test
This is a new Blog, and it needs posts.  This is a test post.
Got to hit all of the features!  That means testing all the features.
Embedding an image:
That's enough.  Time to see how it works.