Quantum was one of many independent bookstores plowed over (or under?) by Amazon and the online retail revolution. There's a high-end bar in that space now called Mead Hall, whose menu includes a variety of fermented honeys. Nowadays, even Barnes & Noble has a significantly attenuated technical book section, though it is still holding out. And somewhere in there are the X for Dummies and Idiot's Guide to Y titles.
Now, I never liked either title. But the publishers of these and other books were often good at maintaining a degree of brand consistency. Perhaps the titles were ridiculous, but the Idiot's and Dummies series were a good way to get an overview of a new subject. The standard icons could be a bit hokey, but the consistent warnings of pitfalls were worth the book's price on their own. I might read one through only once, but it helped orient me. O'Reilly's Learning Z would be a slower but more detailed read, and an O'Reilly Cookbook might be a desk-side companion for solving specific problems. Some other series I learned to avoid; I forget which ones, but several just grated on me or had too low an information density.
Nowadays, I tend to learn new things, or find solutions to immediate problems, online, which has its pros and cons. On the pro side, there tends to be coverage of almost anything, and googling an error message often (but not quite always) takes you to a page that explains the circumstances of that error. On the con side, finding those initial overview guides can be challenging, and there is no consistency from one writer's tome to another's.
Warp has a nice cluster which drives my analyses, and many of those analyses are run under Sun Grid Engine (SGE, rarely known as Oracle Grid Engine or OGE). Now, there are other packages out there for this, and I won't make any claim that I arrived at SGE after a careful comparison. The system came with SGE, and a few key tools (such as Celera Assembler and the PacBio toolkits) support it natively, so that drove the usage. Later, StarCluster supported it out of the box, so I got some more practice.
SGE is great when it works. It manages jobs across the cluster, distributing loads and making sure no machine is hammered by too many simultaneous jobs. As far as running jobs goes, I know enough to mostly get by. I can configure single-processor jobs and many kinds of multiprocessor jobs. I can also track my jobs' statuses in the queues. One thing I don't really understand is how to correctly configure an OpenMPI job (namely, the wonderful assembler Ray) across multiple nodes, so within SGE I always run it on a single node. There are also ways to make one job stay queued until another finishes, but I haven't figured that out either.
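For what it's worth, the qsub man page describes a -hold_jid option that appears to cover the wait-for-another-job case, alongside -pe for requesting slots in a parallel environment. Below is a minimal Python sketch that only builds the command lines rather than submitting anything. The script names, the job names and the "smp" parallel environment are all hypothetical; parallel environment names are defined per site, so that one would need to be swapped for whatever the cluster actually provides.

```python
import shlex


def build_qsub(script, name, hold_for=(), pe=None, slots=1):
    """Build a qsub argument list. Nothing is submitted here."""
    # -N names the job; -cwd runs it from the current working directory.
    cmd = ["qsub", "-N", name, "-cwd"]
    if hold_for:
        # -hold_jid keeps this job queued until the listed jobs
        # (given as job IDs or job names) have finished.
        cmd += ["-hold_jid", ",".join(hold_for)]
    if pe is not None:
        # -pe requests `slots` slots in the named parallel environment.
        # "smp" below is a hypothetical, site-specific name.
        cmd += ["-pe", pe, str(slots)]
    cmd.append(script)
    return cmd


# Hypothetical two-step pipeline: polish.sh must wait for assemble.sh.
assemble = build_qsub("assemble.sh", "assemble", pe="smp", slots=8)
polish = build_qsub("polish.sh", "polish", hold_for=["assemble"])

# Print the commands for inspection before running them by hand.
print(shlex.join(assemble))
print(shlex.join(polish))
```

Building the argument list separately from running it makes it easy to eyeball the command first; passing the list to subprocess.run would do the actual submission on a machine where qsub is installed.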
Setting up SGE is murkier water. At least once something glitched on StarCluster when adding new nodes, and I had to get the new nodes into SGE. With a lot of help from Google, I did succeed. The original Warp cluster had some issues, and with a lot of help from the cluster fabricator I got some stuck nodes unstuck. The new cluster lives in the new space (the old cluster was shared with another group; people tend to develop an aversion to sharing compute resources with me after a few of my misadventures, and so I was split off in the new space). It also required some going under the hood to get SGE running, but all was well.
BUT, this weekend something has gone wrong, and I haven't yet figured it out. A few of the old tricks have failed to make the system hum, and, weirder still, some jobs are firing off fine while others are stuck in permanent limbo, with no painfully obvious distinction between them.
Now, this is a reminder that my knowledge of SGE is really pretty slim, and isn't very consolidated. It's more a bunch of tricks I know, a few of which I understand and a lot of which are more follow-the-cookbook. There's also a nagging fear that there are very important things I don't know at all. So a good book would be pretty useful right now. It probably wouldn't solve my problem, but it might well prepare me better to understand my problem and go find the answer to it.
A little problem there. Try searching Amazon under books for Sun Grid Engine. For SGE. For OGE. For Oracle Grid Engine. There are no books of that sort. No Grid Engine for Dummies. No Idiot's Guide to Grid Engine. No O'Reilly book with honeybees (or Giant Gerbils?). No Learn Grid Engine in 30 Days. NADA!
I can rationalize that result relatively easily. The publishers are only interested in the larger markets, particularly those that will attract persons who are less technically savvy or at least are not confident in their ability to pick things up. SGE is apparently too much of a niche product in their view.
So, more rounds of Google. More flailing away. Maybe some reboots - though a design feature of SGE is that rebooting doesn't reset certain error states; this is intended to prevent a seriously flawed node from becoming a black hole for jobs. A bunch of calls to the cluster assembler company. Maybe a paper book would just be a security blanket, but I'm feeling a need for one about now.
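On the error states that survive a reboot: as far as I can tell from the man pages, qstat -f -explain E reports why a queue instance is flagged with an error, and qmod -cq clears that flag once the cause has been dealt with. A small Python sketch that just assembles those two commands follows; the queue instance name "all.q@node01" is a hypothetical placeholder, and clearing the flag normally needs SGE manager or operator rights.

```python
import shlex


def build_explain_errors():
    """Command to list queue instances with the reasons for error (E) states."""
    return ["qstat", "-f", "-explain", "E"]


def build_clear_queue_error(queue_instance):
    """Command to clear the error state on one queue instance."""
    # queue_instance has the form "queue@host".
    return ["qmod", "-cq", queue_instance]


# "all.q@node01" is a made-up queue instance used only for illustration.
explain_cmd = build_explain_errors()
clear_cmd = build_clear_queue_error("all.q@node01")

# Print for inspection; run them by hand once the cause is understood.
print(shlex.join(explain_cmd))
print(shlex.join(clear_cmd))
```

Clearing the flag without fixing the underlying problem would defeat the point of the design, since the node could go right back to swallowing jobs.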