8 Pieces of Architectural Advice for CMS

I have some advice for those in the business of building large websites with content management systems.

1) Do not implement search yourself.

Your CMS sucks at search, and so do you. I see this again and again and again. Everyone is implementing search on large websites instead of using Google. Developers are afraid of looking unprofessional. Managers answer yes to the question “do you want advanced/faceted search” (the correct answer is no – users don’t like it and don’t use it). As a result a lot of resources (both server and developer) go into implementing something that Google is awesome at. Even some very smart people, like Jeff Atwood, roll their own search, and their users end up going to google.com and typing “foo site:stackoverflow.com”.

Users are very happy with Google CSE, and don’t mind the text ads. Those text ads – well, that’s revenue that you otherwise would not have, however small it is. If you absolutely can’t do Google CSE – buy their search appliance. If you can’t do that either – well, you better be using Solr.
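The “foo site:stackoverflow.com” trick is easy to wire into a plain search box. Here is a minimal sketch in Python, assuming the public google.com/search URL with its q parameter; the function name site_search_url is made up for illustration, and this is not how Google CSE itself is embedded.

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str) -> str:
    """Build a Google search URL restricted to one site via the site: operator."""
    # The site: operator is just part of the query text the user would type.
    restricted = f"{query} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": restricted})


if __name__ == "__main__":
    print(site_search_url("foo", "stackoverflow.com"))
```

A search form on your site could simply redirect to the URL this returns and let Google do the rest.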

2) Do not implement comments yourself (unless comments are what you do for a living).

It is extremely difficult to get comments right. Users absolutely abhor comments. Spammers – well, they love them. Luckily, you can just go and get DISQUS to do all the heavy lifting for you. The time saved by using DISQUS can be spent on building something else; meanwhile users absolutely love leaving comments through it, while spammers hate it.

3) Physically separate your admin interface from the stuff that is going to be used by your users.

Maciej Ceglowski has some words of advice about not having your blog hacked: cache your output in flat files and hide the admin interface. The benefits of this are tremendous: cached files are fast and secure. You will need to do some fancy footwork to serve up parts that change a lot, but you can do it the same way DISQUS and Google CSE do it – through the magic of AJAX.
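To make “cache your output in flat files” concrete, here is a rough sketch in Python. Everything in it is hypothetical: render_page stands in for whatever your CMS does to produce HTML, and publish is a made-up name for the step that runs on the admin side and writes plain files for the public web server to serve.

```python
import html
import tempfile
from pathlib import Path


def render_page(title: str, body: str) -> str:
    """Stand-in for the CMS templating step; body is assumed to be trusted HTML."""
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body>{body}</body></html>"
    )


def publish(pages: dict[str, tuple[str, str]], docroot: Path) -> list[Path]:
    """Write each page to a flat .html file under docroot and return the paths."""
    written = []
    for slug, (title, body) in pages.items():
        target = docroot / f"{slug}.html"
        target.write_text(render_page(title, body), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in publish({"about": ("About", "<p>Hello</p>")}, Path(tmp)):
            print(path.name)
```

The public server only ever sees the files in docroot, so the admin code can live on a different machine entirely.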

4) Sanity check: calculate the amount of RAM in the home computers of all of your interns. Compare that to the amount of RAM in your server farm. Who wins?

5) Use a CDN and/or caching proxy, don’t be cheap. These things will save your butt when Yahoo and Digg come a-knocking at the same time. I’m not even going to mention Memcached – you can’t get big without it at all.
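The idea behind all of these is the same: compute an expensive page once, then serve the stored copy until it goes stale. A minimal sketch of that pattern in Python follows. TTLCache is a made-up, in-process stand-in; it is not the Memcached API, just the shape of what a cache server does for you.

```python
import time
from typing import Callable


class TTLCache:
    """Tiny in-process stand-in for a cache server such as Memcached."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        """Return the cached value; call compute() only on a miss or after expiry."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    renders = []

    def slow_front_page() -> str:
        renders.append(1)  # count how often the expensive path runs
        return "<h1>Front page</h1>"

    cache = TTLCache(ttl_seconds=60)
    cache.get_or_compute("front", slow_front_page)
    cache.get_or_compute("front", slow_front_page)
    print(len(renders))  # 1: the second request was served from the cache
```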

6) Fight WYSIWYG editors. These things are the worst. They are the Devil. They are a security hole. You never get what you see. People paste from Word. Do I need to go on?

The best middle of the road solution is something like Markdown.

Do not underestimate the user’s ability to learn a few simple rules. When I worked at TV Guide there was this movie database application. Very non-technical editors were using a very scary-looking Unix-based interface at an amazing speed. When I rewrote it as a web interface, it became more “user-friendly”, but they could not enter stuff as fast as before.

7) Make sure you have good backups.

8) I know you won’t be able to follow my advice, I know I can’t either. Life is a constant compromise.

The Capacitor Plague

I woke up from a nap to a loud pop and a smell of burning plastic. The source turned out to be one of my most precious and important digital devices: a ReadyNAS NV+, a small silver box with over a terabyte of hard drives that store my backups, music, and photos.

Network attached storage (NAS) is an engineering compromise. It’s a storage solution that lets you keep a bunch of drives in a self-contained device. It’s redundant: you can lose a drive (which is a statistical certainty) and not lose your data. There are also handy USB ports that let you connect USB drives and a button to run backup jobs onto these drives. It also serves as a print server, and in theory it can be used as a streaming media server. On the other hand it’s slow (gigabit networks are not fast enough when you need gigs of data fast), a complete nightmare to use with photo managers like Picasa, and an even worse nightmare if you want to use it as a Time Capsule.

I’ve spent a lot of time babysitting my ReadyNAS NV+: changing the defective RAM that it shipped with, updating the buggy firmware, finding the right drives for it (some don’t have the right temperature sensors). Don’t get me started on what it took to make it work with Mac’s Time Machine.

And after all that, the one box that was supposed to keep my precious digital archives safe was smoking. This was preceded by a few days of weird performance issues and a couple of hangs. The power supply finally died a horrible death, and I realized that once again I was falling victim (or “mugu” as Nigerians say) to faulty capacitors.

According to Wikipedia, the name of this phenomenon is “Capacitor Plague”. There is an epidemic of failure in electrolytic capacitors from certain shady manufacturers. Electrolytic capacitors are usually found in power supplies. They are little aluminum cylinders filled with special film and electrolytic liquid or gel. Power supplies get very hot, and the liquid part of the capacitors, the electrolyte, always wants to either dry up or explode. The formula for the electrolyte is very hard to get right.

The rumor is that one or a few companies resorted to industrial espionage to steal electrolyte formulations. They weren’t entirely successful – they either got an incomplete formula or just plain Brawndo.

Spectrum Online did some digging:

“According to the source, a scientist stole the formula for an electrolyte from his employer in Japan and began using it himself at the Chinese branch of a Taiwanese electrolyte manufacturer. He or his colleagues then sold the formula to an electrolyte maker in Taiwan, which began producing it for Taiwanese and possibly other capacitor firms. Unfortunately, the formula as sold was incomplete.
“It didn’t have the right additives,” says Dennis Zogbi, publisher of Passive Component Industry magazine (Cary, N.C.), which broke the story last fall. According to Zogbi’s sources, the capacitors made from the formula become unstable when charged, generating hydrogen gas, bursting, and letting the electrolyte leak onto the circuit board. Zogbi cites tests by Japanese manufacturers that indicate the capacitor’s lifetimes are half or less of the 4000 hours of continuous ripple current they are rated for.”

The wastefulness of today’s society masks the problem: most people don’t perform autopsies on their dead $70 DVD players or $500 computers, they just use that as an excuse to buy the new hottness. The techies with (or without) spare time and soldering skills do the following: fill bulletin boards with tales of saving their devices by soldering in new capacitors; search for instructions on how to solder and purchase capacitors; and curse creatively after doing it for the 5th time.

The unique thing about the capacitor plague is how easy it is to identify: the capacitors literally blow their tops, venting electrolyte through the special stress relief indentations. It’s also unique in that anybody with a soldering iron has a very good chance of fixing it: the caps are easy to locate and solder. In an age when most electronic components are of the “surface mount” type (the size of a sesame seed) or chips with dozens of legs as fine as silk, soldering in a two-legged capacitor is very refreshing.

Here’s a nest of capacitors from my busted power supply: two in the left corner are clearly popped, the one on the right is probably ok:

In the last couple of years the following devices that I own fell prey to faulty caps: a cheap off-brand DVD player, a speed control on my Dodge Caravan’s air conditioner, a Netgear network hub, a huge and expensive Air King window fan, and now, my ReadyNAS. The interesting thing is that the problem exists in both high end and low end products, as well as in high tech and low tech ones (I did not know there were electronic components in the window fan).

I am out of warranty on my ReadyNAS because I bought it in May of 07. The following passage leads me to believe that the shitty capacitors are a problem that they are aware of and (maybe) fixed in newer releases of the hardware (they could not offer a 5 year warranty if they used the same capacitors – they’d just go broke).

“Please be aware that ReadyNAS purchased prior to August 21, 2007 carries a one-year limited warranty. Extended warranty purchased for these ReadyNAS will be honored by NETGEAR. ReadyNAS NV+ and 1100 purchased August 21, 2007 and later have a 5-year limited warranty, and the ReadyNAS Duo has a 3-year warranty.”

The brand name of the popped capacitors reads “Fuhjyyu”. It led me to an Urban Dictionary entry that says that Fuhjyyu is either

“1) Chinese word for feces.

or

(2) Brand name of abysmal quality capacitors that are installed on logic boards, switching power supplies and various other electronic components.”

There’s also a post from a guy who implores ReadyNas to stop using those capacitors.

Then there’s badcaps.net – a global capacitor gripefest that is too depressing to read.

You can see a nice gallery of busted caps over here

There are broader implications of this: coupled with the fragile lead free solder, leaky capacitors don’t only cause kajillions of dollars of damage, but will also make electronics of our era impossible to use in the near future. The aluminum in burnable CDs and DVDs is rotting too, destroying the record of our time.

What Up

The following will probably be only interesting to people who build their own computers (and probably not even them), so feel free to skip this post.

My little Shuttle XPC computer gave up the ghost (the onboard SATA raid controller got really messed up). It took me a good while to frankenstein together a reasonable machine out of all the parts that were stashed away in my apartment, so I am a-computin’ again. I am researching a purchase of a high powered replacement, but meanwhile, let me share some technical tidbits that I’ve learned along the way.

First of all, it’s really easy to actually fry a floppy drive. Fried floppy drives look like they are working, but they don’t. And without a floppy drive you can’t install (or repair) Windows 2000 or XP on a SATA or IDE raid array. Even if your motherboard claims that it can boot from a USB floppy drive, it probably can’t. Well, at least mine can’t. The moral of the story is that doing away with legacy hardware such as a floppy is not a good idea.

Anyhoo, it’s a good idea to have two separate drive arrays: one for data and one for the system and programs (it’s a good practice to point data directories such as the desktop to the data array). The data array should run raid 1 (mirroring) – that way if one drive dies, you will still have another. You can periodically back up onto a third drive and hold it in a remote location. With 250 gig SATA drives costing about 100 bucks there is no reason not to do this.

The system drive should run raid 0 (striping). Striping actually significantly speeds the system up. You can also keep the system drive small, say about 40 gig – and back it up onto the data drive via Norton Ghost. It might be a good idea to splurge on real SCSI drives and a card. In fact, that’s what I’ll probably do for my new computer, as drives seem to be more of a speed bottleneck than RAM or processors.
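If the difference between the two raid levels is fuzzy, here is a toy model in Python. It is only a sketch: the function names, the two-drive default, and the 4-byte chunk size are all made up for illustration, and real controllers work on disk blocks, not byte strings.

```python
def stripe(data: bytes, drives: int = 2, chunk: int = 4) -> list[bytes]:
    """raid 0 style: deal fixed-size chunks out to the drives in turn."""
    out = [bytearray() for _ in range(drives)]
    for i in range(0, len(data), chunk):
        out[(i // chunk) % drives] += data[i:i + chunk]
    return [bytes(d) for d in out]


def mirror(data: bytes, drives: int = 2) -> list[bytes]:
    """raid 1 style: every drive holds a full copy of the data."""
    return [data for _ in range(drives)]


if __name__ == "__main__":
    data = b"ABCDEFGHIJKL"
    print(stripe(data))  # [b'ABCDIJKL', b'EFGH']
    print(mirror(data))  # [b'ABCDEFGHIJKL', b'ABCDEFGHIJKL']
```

With striping each drive holds only part of the data, so both can be read at once (hence the speed), but losing either drive loses the whole thing. With mirroring any single surviving drive still has everything.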

SATA drives can be mounted externally: it’s called eSATA. If you have a small computer such as an XPC it’s a very good idea, as the drives become much easier to cool. I am probably going to jerry rig an eSATA enclosure out of an old computer case and some hotswap thingies, but they are also available from these guys.

I am a big fan of dual monitors – I have two 17″ LCDs. To run a dual monitor setup you need either two video cards or a “dualhead” video card. Well, I have a dualhead Matrox P750 (bought it because it has 2 DVI outputs), and boy does it suck. Driver installation is a nightmare – there are several versions of the video card bios and a multitude of driver versions. Most don’t work and crash Windows. If you do get it to work, the stupid card can’t work well with low color/resolution settings: it shows lines and crazy patterns in bios screens. There’s a bios fix out, but it does not work. More than that, if you want to color match the monitors through a calibration cajigger, you can’t set up individual color profiles on the monitors. In short, I am much better off with two separate cards.

Ad:


Did you know that you can color calibrate your monitor with a nifty usb powered gadget so that your digital photos will stop looking like crap? I’ve used one for a while, and it rocks!

GRETAGMACBETH Eye-One Display 2 (This is the one that I have. People say that it’s a little bit better than the cheaper ColorVision Spyder 2)

ColorVision Spyder 2

Hey You. Yes, You. How About Some Tech Support Here?

I finally decided to build a nice SB62G2 based computer for my wife. But I can’t decide the following:
a) What kind of memory to get for it. The number of choices for DDR RAM confuses me to no end and there is no good FAQ in sight.
b) What kind of DVD burner to get (they all look good)
c) Which Pentium 4 is in the sweet spot of price/performance.
d) Which 17 inch flat panel monitor to get (about $500 – $600 range)
e) Which video card for the said flat panel to get.

The Mystery of Obidos

Whoa, caught amazon.com while it was down.
They are showing a page with Rufus, the Amazon dog.

By the way, I’ve been meaning to write about this for some time now. Did you ever notice the enigmatic word “obidos” in Amazon URLs?

Some theories from usenet:

  • Castle near Lisbon
  • OBI (Wan Kenobi) + DOS (Disk Operating System)
  • ‘OBI’ = Object Broker Interface

    This seems to be the correct answer though: Obidos is a major port on the Amazon river.

    [update]
    Livejournal user hallerlake had this to add:

    “I worked at Amazon for a couple of years, and can mostly answer that.

    Obidos is the area where the Amazon is “concentrated” – it narrows to a point about a mile wide and a couple hundred feet deep. It’s the chokepoint of the Amazon. A wry sense of humor turned that to the naming scheme.

    The Amazon Marketplace (auctions+zshops+third party) code was called Varzea for similar reasons – it’s the delta point of the amazon river, where the river fans out.

    Amazon wrote their own web serving environment because the selection of scripting/webcontrol languages when they got started was so lousy. They had to call it something, so obidos it was. :) ”


    Obidos is huge, it might be over a gig by now. I don’t think it’s that bad, though. I haven’t been at Amazon for a few years. For a long time Amazon ran on the Netscape web server environment, then eventually moved to a specially tuned Apache. But yeah, the webservers had a lot of RAM in them so that we could fork a bunch of different processes… and a garbage collector got added to take care of some of the memory leaks. Even still we had a service that killed and restarted processes every hundred accesses or so. It wasn’t pretty.

    I don’t know who came up with the name… I’d bet on Shel Kaphan or possibly Joel Spiegel. Shel set the direction for the company’s software development and architecture, including standardization on C (instead of C++) due to easier debugging. Certainly for the first few years he was The Guy for software architecture; these days I would imagine Al Vermeulen has that task.
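The “kill and restart processes every hundred accesses or so” trick is a common way to live with memory leaks. Here is a rough sketch of the idea in Python. It is purely illustrative and not Amazon’s code: Worker and RecyclingPool are made-up names, and the worker is a plain object standing in for a forked server process.

```python
from itertools import count


class Worker:
    """Stand-in for one forked server process that slowly leaks memory."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id
        self.handled = 0

    def handle(self, request: str) -> str:
        self.handled += 1
        return f"worker {self.worker_id}: {request}"


class RecyclingPool:
    """Replace the worker once it has served max_requests requests."""

    def __init__(self, max_requests: int = 100):
        self.max_requests = max_requests
        self._ids = count(1)
        self.worker = Worker(next(self._ids))

    def handle(self, request: str) -> str:
        if self.worker.handled >= self.max_requests:
            # A real server would kill the old process and fork a fresh one here.
            self.worker = Worker(next(self._ids))
        return self.worker.handle(request)


if __name__ == "__main__":
    pool = RecyclingPool(max_requests=100)
    for n in range(250):
        last = pool.handle(f"GET /page/{n}")
    print(last)  # worker 3: GET /page/249
```

Whatever a worker leaked goes away with it, which is not pretty, but it keeps the box up.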