<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Handling Human Error In the Datacenter</title>
	<atom:link href="http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/</link>
	<description></description>
	<lastBuildDate>Thu, 26 Aug 2010 16:46:48 -0600</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Tom</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/comment-page-1/#comment-4400</link>
		<dc:creator>Tom</dc:creator>
		<pubDate>Sun, 01 Mar 2009 18:06:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/?p=64#comment-4400</guid>
		<description>@Mathew Duafala -- agreed, that is a really good idea.</description>
		<content:encoded><![CDATA[<p>@Mathew Duafala &#8212; agreed, that is a really good idea.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mathew Duafala</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/comment-page-1/#comment-4392</link>
		<dc:creator>Mathew Duafala</dc:creator>
		<pubDate>Sun, 01 Mar 2009 04:12:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/?p=64#comment-4392</guid>
		<description>I just saw him give this talk at Amazon last week.  It was really really good.  The best thing that I got out of it was the concept of a &#039;canary host&#039;.  That&#039;s a host that you run hotter than the rest (2x or more) so that you can see it brown out and fail before the rest of the hosts. It lets you see what the failure behavior is and gives you warning before your entire fleet of boxes is failing at once.</description>
		<content:encoded><![CDATA[<p>I just saw him give this talk at Amazon last week.  It was really really good.  The best thing that I got out of it was the concept of a &#8216;canary host&#8217;.  That&#8217;s a host that you run hotter than the rest (2x or more) so that you can see it brown out and fail before the rest of the hosts. It lets you see what the failure behavior is and gives you warning before your entire fleet of boxes is failing at once.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: website designs</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/comment-page-1/#comment-1843</link>
		<dc:creator>website designs</dc:creator>
		<pubDate>Fri, 22 Aug 2008 18:26:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/?p=64#comment-1843</guid>
		<description>I&#039;m not so sure about his `$machineType-$number` scheme. When dealing with the physical boxen, I find it a lot easier to just say &quot;power down lucy&quot; or &quot;attach that FireWire drive to charlie&quot;. CNAMEs exist so that I can map these physical names to their roles -- my home directories will always live on &quot;users&quot;, even after I migrate them to a SAN. &quot;www&quot; will always be my webserver, even after I migrate it to a virtual machine.</description>
		<content:encoded><![CDATA[<p>I&#8217;m not so sure about his `$machineType-$number` scheme. When dealing with the physical boxen, I find it a lot easier to just say &#8220;power down lucy&#8221; or &#8220;attach that FireWire drive to charlie&#8221;. CNAMEs exist so that I can map these physical names to their roles &#8212; my home directories will always live on &#8220;users&#8221;, even after I migrate them to a SAN. &#8220;www&#8221; will always be my webserver, even after I migrate it to a virtual machine.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: samson</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/comment-page-1/#comment-1751</link>
		<dc:creator>samson</dc:creator>
		<pubDate>Wed, 13 Aug 2008 18:32:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/?p=64#comment-1751</guid>
		<description>A minor tool, but molly-guard has saved me more than once. 
Basically if you call shutdown/reboot from an ssh session, it prompts you to type the hostname of the machine you want to shut down, hopefully preventing a shutdown of a server that isn&#039;t within physical reach in the middle of the night. 

http://packages.debian.org/etch/molly-guard

Capistrano looks interesting though, I&#039;ll have to figure out what it gets me if I&#039;m not using RoR.</description>
		<content:encoded><![CDATA[<p>A minor tool, but molly-guard has saved me more than once.<br />
Basically if you call shutdown/reboot from an ssh session, it prompts you to type the hostname of the machine you want to shut down, hopefully preventing a shutdown of a server that isn&#8217;t within physical reach in the middle of the night. </p>
<p><a href="http://packages.debian.org/etch/molly-guard" rel="nofollow">http://packages.debian.org/etch/molly-guard</a></p>
<p>Capistrano looks interesting though, I&#8217;ll have to figure out what it gets me if I&#8217;m not using RoR.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adam</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/comment-page-1/#comment-1741</link>
		<dc:creator>Adam</dc:creator>
		<pubDate>Tue, 12 Aug 2008 18:47:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/?p=64#comment-1741</guid>
		<description>This is really more about automation, but tools like Capistrano (http://capify.org) can be really helpful for managing a number of servers in different roles. I don&#039;t have a lot of experience with this tool (it&#039;s written in Ruby and used heavily by the Rails community), but it seems useful.

To the extent that this reduces the number of steps required to do something (even if those steps are simple things like opening an ssh session), you can reduce mental fatigue and stay focused on getting right the things that do require some mental effort.

Are there any other tools like this that people recommend?</description>
		<content:encoded><![CDATA[<p>This is really more about automation, but tools like Capistrano (<a href="http://capify.org" rel="nofollow">http://capify.org</a>) can be really helpful for managing a number of servers in different roles. I don&#8217;t have a lot of experience with this tool (it&#8217;s written in Ruby and used heavily by the Rails community), but it seems useful.</p>
<p>To the extent that this reduces the number of steps required to do something (even if those steps are simple things like opening an ssh session), you can reduce mental fatigue and stay focused on getting right the things that do require some mental effort.</p>
<p>Are there any other tools like this that people recommend?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Trevor</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/comment-page-1/#comment-1731</link>
		<dc:creator>Trevor</dc:creator>
		<pubDate>Tue, 12 Aug 2008 02:46:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/?p=64#comment-1731</guid>
		<description>That&#039;s why I use Slicehost. RAID10, nightly backups and someone else getting paged when stuff breaks.</description>
		<content:encoded><![CDATA[<p>That&#8217;s why I use Slicehost. RAID10, nightly backups and someone else getting paged when stuff breaks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ammon Lauritzen</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/comment-page-1/#comment-1730</link>
		<dc:creator>Ammon Lauritzen</dc:creator>
		<pubDate>Tue, 12 Aug 2008 01:50:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/?p=64#comment-1730</guid>
		<description>I don&#039;t think you can really stress the importance of renaming files (rather than deleting them) enough.

Whenever I&#039;m making a huge configuration change to a live system, I copy the entire directory about to be affected... and leave the copy around. Most of the time, that copy remains untouched until I&#039;m making another big change (at which point I&#039;ll generally finally delete it in favor of a backup of the current live version). But maybe 1 in 3 or 1 in 4 times... that backup will save you hours of work, or more.

One general rule of thumb of server maintenance that you didn&#039;t mention is to always operate as a normal user until you&#039;re absolutely sure you need to escalate privileges. And then make sure to relinquish root when you&#039;re done with that individual task. It requires more typing, but if you can resist the urge to leave root prompts open on 5 different terminals at once... ;)</description>
		<content:encoded><![CDATA[<p>I don&#8217;t think you can really stress the importance of renaming files (rather than deleting them) enough.</p>
<p>Whenever I&#8217;m making a huge configuration change to a live system, I copy the entire directory about to be affected&#8230; and leave the copy around. Most of the time, that copy remains untouched until I&#8217;m making another big change (at which point I&#8217;ll generally finally delete it in favor of a backup of the current live version). But maybe 1 in 3 or 1 in 4 times&#8230; that backup will save you hours of work, or more.</p>
<p>One general rule of thumb of server maintenance that you didn&#8217;t mention is to always operate as a normal user until you&#8217;re absolutely sure you need to escalate privileges. And then make sure to relinquish root when you&#8217;re done with that individual task. It requires more typing, but if you can resist the urge to leave root prompts open on 5 different terminals at once&#8230; <img src='http://www.tomkleinpeter.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick</title>
		<link>http://www.tomkleinpeter.com/2008/08/11/handling-human-error-in-the-datacenter/comment-page-1/#comment-1726</link>
		<dc:creator>Nick</dc:creator>
		<pubDate>Mon, 11 Aug 2008 23:18:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/?p=64#comment-1726</guid>
		<description>I rock 3 monitors at work and always have screens divided mentally by importance. Screen one is live / check it all twice before hitting enter type of data. Screen two is usually secondary, test or backup servers, where I pay attention to what was done so it can be duplicated. Screen three is email and web research... Or depending on the situation each screen is a different geographic location.  Biggest Trip Ups I have to watch out for are the &quot;Are you in the right directory? Are you on the right machine? Right Data? Right Dates?&quot; type stuff.

Side Note: The instructor of a CCNA class I was taking walked around during our practice tests with a cheap cell phone refrigerator magnet and simulated the &quot;real world&quot; by making it ring while we took the test. Damn thing stressed us out, but didn&#039;t even come close to the owner calling you wanting to know when the database will be back up while you are sitting in the data center with a server in your lap replacing a motherboard because your contracted repairman cannot get there for a few hours.


I love this job...</description>
		<content:encoded><![CDATA[<p>I rock 3 monitors at work and always have screens divided mentally by importance. Screen one is live / check it all twice before hitting enter type of data. Screen two is usually secondary, test or backup servers, where I pay attention to what was done so it can be duplicated. Screen three is email and web research&#8230; Or depending on the situation each screen is a different geographic location.  Biggest Trip Ups I have to watch out for are the &#8220;Are you in the right directory? Are you on the right machine? Right Data? Right Dates?&#8221; type stuff.</p>
<p>Side Note: The instructor of a CCNA class I was taking walked around during our practice tests with a cheap cell phone refrigerator magnet and simulated the &#8220;real world&#8221; by making it ring while we took the test. Damn thing stressed us out, but didn&#8217;t even come close to the owner calling you wanting to know when the database will be back up while you are sitting in the data center with a server in your lap replacing a motherboard because your contracted repairman cannot get there for a few hours.</p>
<p>I love this job&#8230;</p>
]]></content:encoded>
	</item>
</channel>
</rss>
