Day in the life of a Systems Administrator

Day in the life of a Unix Systems Administrator

Wow, been almost a year since I blogged anything. I’m getting lazy.

So what’s the daily life of a systems administrator like? Here was today:

The plan coming in this morning: Begin quarterly “Vulnerability Audit Report”.

What did I do?
Windows server starts alerting on CPU at midnight, again. We fixed the problem on Tues. Why is it alerting again? Of course it corrects itself before I can get logged in and doesn’t go off again all day. Send an email to the person responsible for the application on that server to ask if the app was running any unusually CPU intensive jobs. Respond with a screenshot showing times CPU alerts went off. Get response of “nothing unusual”. As usual.

We updated the root password on all Unix servers last week. Get a list of 44 systems from a coworker that still have the old root password.
Check the list, confirm all still have the old root password.
Check the list against systems that were updated via Ansible. All on the Ansible list. No failures when running the Ansible playbook to update the root password. All spot-checks that the new root password was in effect at the time showed the task was working as expected.
Begin investigating why these systems still have the old root password.
Speculation during team scrum that Puppet might be resetting the root password.
Begin testing a hypothesis that root password was, in fact, changed, but something else is re-setting it back to the old password.
Manually update root password on one host. Monitor /etc/shadow to see if it changes again after setting the password. (watch -d ls -l /etc/shadow)
Wait.
Wait.
Wait some more.
Wait 27 minutes, BOOM! /etc/shadow gets touched.
Investigate to see if Puppet is the culprit. I know nothing about Puppet. I’m an Ansible guy. The Puppet guy (who knows just enough to have set up the server, built some manifests, and gotten Puppet to update the root password the last time it was changed, before I started working here) is out today.
Look at log files in /var/log. Look at files in /etc/puppet on puppet server. Try to find anything that mentions “passw(or)?d&&root” (did I mention I’m not a puppet guy?). Find a manifest that says something about setting the root password, but it references a variable. Can’t find where the value of that variable is set.
Look some more at the target host. See in log files that it’s failing to talk to the Puppet server, so it’s continuing to enforce the last set of configuration stuff it got. Great, fixing this on the Puppet server won’t necessarily fix all the clients that have been allowed to lose connectivity without anyone noticing (entropy can be a bitch).
Begin looking at what to change on the client (other than just “shut down the Puppet service” and “kill it with fire!”). Realize it’s much faster to surf all the files and directories involved with “mc”.
Midnight Commander not installed. Simple enough, “yum install mc”.
Yum: “What, you want to install something in the base RHEL repo? HAH! Entropy, baby! I have no idea what’s in the base repo.”
Me: “Hold my beer.” (This is Texas, y’all.)
(No, not really. CTO frowns on drinking during work hours or drinking while logged into production systems. Or just drinking while logged in…)
OK, so more like:
Me: “Hold my Diet Coke.”
Yum: “Red Hat repos? We don’t need no steeeenking Red Hat repos!”
Me:

Start updating Yum repo cache. Run out of space in /var. Discover when this server was built, it was built with much too small a /var. Start looking at what to clean up.
Fix logrotate to compress log files when it rotates them, manually compress old log files.
/var/lib/clamav is one of the larger directories. Oh, look, several failed DB updates that never got cleaned up.
Clean up the directory, run freshclam. Gee, ClamAV DB downloads sure are taking a long time given that it’s got a GigE connection to the local DatabaseMirror. Check freshclam config. Yup, the local mirror is configured… external mirror ALSO configured. Dang it. Fix that. ClamAV DB updates now much faster.
Run yum repo cache update again. Run out of disk space again. Wait… why didn’t Nagios alert that /var was full?
Oh, look, when /var was made a separate partition, no one updated Nagios to monitor it.
Log into Nagios server to update config file for this host. Check changes into Git. Discover there have been a number of other Nagios changes lately that haven’t been checked into Git. Spend half an hour running git status / diff / add / delete / commit / push to get all changes checked into Git repo.
Restart Nagios server (it doesn’t like reloads. Every once in a while it goes bonkers and sends out “The sky is falling! ALL services on ALL servers are down! Run for your lives! The End is nigh!” if you try a simple reload).
Hmm… if Nagios is out of date for this host, is Cacti…
Update yum cache again. Run out of disk space again.
Good thing this is a VM, with LVM. Add another drive in vSphere, pvcreate, swing your partner, vgextend, lvresize -r, do-si-do!
yum repo cache update… FINALLY!
What was I doing again? Oh, right, install Midnight Commander…
Why? Oh yeah, searching for a Puppet file for….?
Right, root password override.
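
For the record, the “wait and see if /etc/shadow gets touched” step doesn’t have to mean staring at watch for 27 minutes. Here is a minimal sketch in Python, assuming all you need to know is that the file changed and roughly how long it took. The function name, polling interval and timeout are my own illustrative choices, not anything that comes from Puppet or Ansible.

```python
import os
import time


def wait_for_change(path, poll_seconds=5.0, timeout_seconds=3600.0):
    """Poll `path` until its mtime, size, or inode changes.

    Returns the number of seconds waited, or None if the timeout expires
    first. The inode is included because tools that manage password files
    often write a new file and rename it over the old one.
    """
    def snapshot():
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # The file can briefly vanish while it is being replaced.
            return None
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    baseline = snapshot()
    started = time.monotonic()
    while time.monotonic() - started < timeout_seconds:
        time.sleep(poll_seconds)
        if snapshot() != baseline:
            return time.monotonic() - started
    return None


# Hypothetical usage (needs permission to stat the file):
#   waited = wait_for_change("/etc/shadow", poll_seconds=10)
#   print("changed after", waited, "seconds" if waited else "no change seen")
```

This only tells you that something rewrote the file, not what did it. Lining the timestamp up against the Puppet agent’s run interval and its log entries is still a manual job.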

Every time I log into a server it seems like I find a half dozen things that need fixing. Makes you not want to log into anything, so you can actually get some work done. Oh, right, entropy…

Windows patching hell

A Unix sys admin struggling with patching Windows servers.

Never thought I’d end up babysitting MS Windows server patching and pulling my hair out as it takes an hour or more to install 100+ patches, reboots, spends 30 minutes “finalizing” the updates, declares them “failed”, and spends 90 more minutes “reverting” the installs before rebooting again. Wash, rinse, repeat, until you successfully tell it which patch NOT to install.
I’m a Unix admin for Pete’s sake. There’s a reason I don’t (normally) do Windows. The only time a Linux server takes so long to boot is when it’s running on bare metal that takes 30 minutes to POST and/or it has lots of LUNs assigned and it takes a while to sort them all out.
I was hoping to get this 2008 server to a state where I could start installing the software it needs by the end of the day.

FINALLY it finished reverting and rebooting. Luckily it didn’t back out 100+ updates. The only one left to install is the one troublesome update that should be done last, because it causes this problem if you don’t.

Nope, spoke too soon. Had it re-check for updates and it now says ALL of the updates from the last go round still need to be installed. But now I see there’s a second update that partners with the known one, so hopefully de-selecting that one as well will fix the issue.

(And in another of a list of firsts that came with this job: never thought I’d be adding a new “Windows” sub-category under the System Administration category of this blog.)

First day on the new job

WAY overdressed for first day on the job. Khakis and a button-up shirt, sports coat and nice shoes. Boss showed up in Bermuda shorts and a polo, with sneakers. About the same way he dressed when I interviewed, but that was a Friday, so I had no way to judge just how casual was “casual” the rest of the week.

I think I’m going to like it here. I can run home during lunch to change clothes.

And I’ll see what a scrum looks like for the first time.
Worked with “agile” developers at the last job, but we didn’t take part in their scrums.

What causes email to go to the spam folder?

A quick guide to some of the things ISPs look for to decide if a message should go to the Inbox or the spam folder.

Recently a former colleague reached out to me on Linkedin to ask:

I have a question regarding email delivery. What cause emails to go into someone’s spam email box? I understand that there maybe(sic) filters that looks at the content to make that determination. I would think there are many other factors.

I replied:

Yes, there’s quite a number of things that can cause mail to go to the spam folder. The contents of the message are a big factor. Of course every ISP applies different rules, so what causes mail to go into the spam folder of a Yahoo! mailbox will differ from what matches the rules on Gmail, or Hotmail, etc. Some ISPs will accept certain mail but put it in the Spam folder, while other ISPs would reject the same mail outright when the sending mail server connects to deliver it.

Are you having a specific problem that you’re trying to solve?

He responded:

I don’t have a specific problem. Just interested in understanding how spam filtering works. Since I know an expert, why not ask directly.

Are there headers the ISP look at to validate the email?

I wrote up a quick primer on some of the esoterica of spam filtering.
This is by no means comprehensive, and not guaranteed 100% accurate.

Email message receipts

Dear Customer,

Expecting our secure message receipts to behave exactly like Outlook message receipts is just plain silly. Here’s a tip: our application is NOT OUTLOOK. No, receipts returned by our mail encryption system do not use Outlook-specific properties like "OutlookMessageClass". Since our receipt is just an email message, it’s up to Outlook to decide what message class it is. If it doesn’t set it to the same "class" as the return receipts generated BY Outlook, well, we have no control over that.

(Tip number 2: Yes, Outlook/Exchange dominate the business email market. However they do NOT define how email works. Please stop expecting everything on the Internet to conform to the Microsoft Way.)

Dear Computer User,

Customer Support are not mind readers.

When emailing tech support about an issue with a user’s account, please keep in mind we don’t know who “Joan Smith” is. If you want us to do something for her email address, include her email address!

Oh, I’m sorry, did you need me to interpret that error message for you?

Dear Computer User,

When sending an error message to Tech Support, it’s generally helpful to say something about the message you are forwarding. We are not mind readers. Something like “I was doing X and clicked Y and this error message appeared” goes a long way to diagnosing the problem. While we’re at it, if the error message clearly says what the problem is, and it’s not something we can fix for you, but rather you need to fix for yourself, why waste our, and your, time?
To wit: forwarding us an email bounce message (and ONLY the bounce message!), when the bounce says:

The mail system

: host mail1.company.com[IP.AD.DR.ESS]
said: 550 5.1.1
: Recipient address
rejected: User unknown in virtual mailbox table (in reply to RCPT TO
command)

It says exactly what it means: User unknown. Forwarding this message to tech support of the sending mail server (without even saying why you’re sending it to them) is like dialing a phone number, getting a “number has been disconnected or is no longer in service” message, recording it, then dialing 411 and just playing the recording back to them. Expecting the operator at the phone company to just figure out that what you REALLY meant is “Why is my friend not answering the phone?” is rather silly. Expecting them to give you an answer more informative than “that number is out of service” is only marginally less silly.

Regards,

Every Technical Support Representative on the planet
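
For the curious, the useful part of a bounce like the one above is the pair of codes: the three-digit SMTP reply code (550) and the enhanced status code (5.1.1), whose first digit says whether the failure is permanent (5) or temporary (4). A rough sketch of pulling them out in Python follows. The function name and the tiny lookup table are illustrative assumptions; the table covers only a few of the codes defined in RFC 3463, and real bounce formats vary a lot between mail servers.

```python
import re

# SMTP reply code followed by an enhanced status code, e.g. "550 5.1.1".
STATUS_RE = re.compile(r"\b(\d{3})[ -]([245])\.(\d{1,3})\.(\d{1,3})\b")

# A few subject.detail pairs from RFC 3463 (deliberately incomplete).
MEANINGS = {
    (1, 1): "Bad destination mailbox address",
    (1, 2): "Bad destination system address",
    (2, 2): "Mailbox full",
}


def classify_bounce(text):
    """Return the codes found in a bounce message, or None if none are found."""
    flat = " ".join(text.split())  # undo line wrapping inside the bounce
    match = STATUS_RE.search(flat)
    if match is None:
        return None
    reply_code = int(match.group(1))
    status_class = int(match.group(2))
    subject = int(match.group(3))
    detail = int(match.group(4))
    return {
        "reply_code": reply_code,
        "enhanced": f"{status_class}.{subject}.{detail}",
        "permanent": status_class == 5,
        "meaning": MEANINGS.get((subject, detail), "unknown"),
    }


sample = """host mail1.company.com[IP.AD.DR.ESS]
said: 550 5.1.1
: Recipient address
rejected: User unknown in virtual mailbox table (in reply to RCPT TO
command)"""

result = classify_bounce(sample)
print(result["enhanced"], result["permanent"], result["meaning"])
```

A permanent 5.1.1 means the receiving server has no such mailbox, which is something only the sender (by checking the address) or the recipient’s mail admin can fix.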

Office annoyances

Dear Coworker,

You have a private office. This office has a door. Please close said door when you’re going to use speaker phone for extended periods.

Info, please?

Dear Computer User,

Sending tickets to Support with a subject line of just “Help” (even when spelled correctly!) is not very helpful for the poor techs who are staring at a screen full of tickets, trying to prioritize which ones need immediate assistance and which can wait.

This falls in the “It’s broken. Fix it.” category. Help me help you.

 

Thank you,

Your friendly neighborhood support technician

Dear Computer User,

“Intranet Explorer”? Seriously?

Dear Computer User,

Dear Computer User,
Some details, maybe?

Dear Computer User,

Do you call your doctor and say “I don’t feel well”?
Do you call your mechanic and say “My car isn’t working right”?
Then why in God’s name do you email tech support and say “it isn’t working”? We can’t help you fix it if you don’t tell us WHAT is wrong.

Day two on the new job

Yesterday was spent mostly dealing with HR, getting benefit paperwork filled out, getting ID badges, waiting for a new workstation, then getting logins to all the systems I need to log into.

Today has been reading some documentation, attending one meeting (a weekly ticket status update), and familiarizing myself with all the different ticket / email systems. (Kana: support email, not related to the Outlook / Exchange email we use internally. HEAT: support ticket system, not to be confused with the support mail system. Remedy: internal ticket system and replacement for HEAT. Are you confused yet? I am.)

Taking the train to work has its perks. It would take just as long to drive, and I’d have to deal with traffic, put miles on my car and burn gas^H^H^Hmoney. Taking the train, I drive 5 miles to the station, buy a ticket, wait for the train, then read my book for the next 40 minutes. Change trains at Union Station, get off at City Place, take two escalators, go through a secure door, take another escalator, then an elevator up 23 floors.

The break room is nearby and has free soda machines (and free juice machines). Coffee is also free. Gotta buy our snacks though.

Now, if I could just get half the fluorescent lights over my desk turned off…

last few days

I’m back.
OK, I’ve been back about 36 hours now.
Not that most of you noticed I was gone.

Next time I have to go to Houston I’ll just drive. Travel time by Southwest Airlines from DAL to HOU, including getting a ride to DAL[1], allowing for security, waiting for boarding, waiting for shuttle to hotel from HOU, crack-head shuttle driver, is about an hour longer than it would have taken to just drive. Return trip was the same, sans crack-head driver, since we just took a taxi, whose driver had a bit more clue where he was going. And big cojones[2].

The hotel[3] was not the nicest I’ve ever stayed in, but it was very nice. It was easily the nicest bed I’ve slept in. I must acquire a set of bedding like theirs. Mattress pad, nice sheets, top sheet, pad, another top sheet, nice comforter.
No vent fan in the bathroom, so all the mirrors (and my glasses) got fogged up. Who ever heard of a hotel/motel that doesn’t vent the bathroom?

The conference was, overall, a waste of time. Their “beginner track” was too basic. “Installation”, “Configuration” and “SSL”, scheduled for an hour each, were done in 10 minutes. The “advanced track” covered “Advanced Troubleshooting”, MySQL, anti-spam and PHP. “Advanced Troubleshooting” was simply “How to use strace”. Gee, how informative. MySQL covered “why you shouldn’t upgrade to 4.1 unless you REALLY mean it”. PHP was “don’t install 5.0. Really. Just don’t.” All of them were presented by a guy who started each presentation with a rundown of his resume (as if we were supposed to be impressed that he was a “senior technician” with one of the vendors at the conference before he came to work for cPanel). His anti-spam presentation basically amounted to “make anyone who sends you mail prove they’re a real person by blocking their mail until they respond to your auto responder” and “RBLs suck. The people who run them are evil and clueless.”[4] Obviously he’s been using the wrong RBLs and doesn’t know how much the “prove that you love me” technique just pisses people off.

However, it was two days off work, with pay, some good meals and socializing with other industry folks.

Yesterday, I met up with for a while. Turns out the place he’s staying here in Dallas is just the next apartment complex over. Afterward I came home and got ready for a pool party at Amythest’s, with her sister, and other DFW Ufies. Shared that bottle of wine I bought a couple of weeks ago at the wine tasting and watched a silly movie.

So far last night / today I’ve made progress on Project X by getting OpenLDAP installed and successfully adding an entry to the database. Next I get to configure Qmail to authenticate against it.

[1] Since ${poe} was too cheap to pay for a shuttle. REALLY cheap, since we were going to need a shuttle at the HOU end anyway.
[2] Got in the exit lane for the freeway interchange, which came to a complete stop. So he got out of the lane, slammed on the gas, passed everyone waiting to get on the interchange and cut right back in at the very last second.
[3] If I ever have to travel on business and the person arranging the travel forgets to PAY for the hotel again, I will hand them my two weeks’ notice. Going to check into a $300/night hotel and being asked for MY credit card was not fun. One call to the boss and he took care of it with his card, but he had to fax them both sides of his credit card and driver’s license.
[4] With FUD like “All it takes is your competitor forging headers once to get you added to a whole bunch of RBLs” and “You have to pay each of them a ‘bribe’ to their pet charity to get off their list”. Guess he’s never heard of rfc-ignorant, ORDB, MAPS-RSS, MAPS-DUL, SORBS, DSBL.

Don’t suppose anyone knows how to get Plesk 8.0.0 to install on FreeBSD 6.0?

Start packages installation
Install package psa
bsdtar 1.02.023, libarchive 1.02.026
Use gtar
/usr/local/bin/gtar
bsdtar 1.02.023, libarchive 1.02.026
Use gtar
/usr/local/bin/gtar
To continue installing, you should install Perl 5.008008 (you have Perl 5.008007 installed)
Execute cmd failed: sh /root/psa/PSA_8.0.0/dist-standard-FreeBSD-6.0-i386/psa_v8.0.0_build80060406.16_os_FreeBSD_6.0_i386.sh
ERROR: Error while install .sh package
ERROR: Installation failed
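
A note on that Perl complaint, since the version numbers look odder than they are: Perl’s numeric version format packs the minor and patch levels into three digits each, so 5.008008 is Perl 5.8.8 and 5.008007 is Perl 5.8.7. In other words, the installer wants one patch release newer than what is installed. Below is a small sketch of the decoding in Python, assuming the six-digit fractional form shown in the log. `decode_perl_version` is a hypothetical helper of my own, not part of Plesk or Perl.

```python
def decode_perl_version(numeric):
    """Turn Perl's numeric version string (e.g. '5.008008') into a tuple.

    The part after the dot is read as two three-digit groups:
    minor version, then patch level.
    """
    major, _, fraction = numeric.partition(".")
    fraction = fraction.ljust(6, "0")[:6]  # pad short forms such as '5.008'
    return (int(major), int(fraction[:3]), int(fraction[3:]))


required = decode_perl_version("5.008008")
installed = decode_perl_version("5.008007")

print("required: ", ".".join(str(part) for part in required))
print("installed:", ".".join(str(part) for part in installed))
print("new enough:", installed >= required)
```

Tuples compare element by element, so the last line is a plain version comparison. It only explains the error message; it does nothing to get Perl 5.8.8 onto the box.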