8 Pieces of Architectural Advice for CMS

I have some advice for those in the business of building large websites with content management systems.

1) Do not implement search yourself.

Your CMS sucks at search, and so do you. I see this again and again and again. Everyone is implementing search on large websites instead of using Google. Developers are afraid of looking unprofessional. Managers answer yes to the question “do you want advanced/faceted search” (the correct answer is no – users don’t like it and don’t use it). As a result, a lot of resources (both server and developer) go into implementing something that Google is awesome at. Even some very smart people, like Jeff Atwood, roll their own search, and their users end up going to google.com and typing “foo site:stackoverflow.com”.

Users are very happy with Google CSE, and don’t mind the text ads. Those text ads – well, that’s revenue that you otherwise would not have, however small. If you absolutely can’t do Google CSE – buy their search appliance. If you can’t do that either – well, you’d better be using Solr.
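To show how little code the "just send them to Google" fallback takes, here is a minimal Python sketch. It only builds the same URL a user would get by typing the "site:" query by hand. It is not the Google CSE API, and the function name is made up for illustration.

```python
from urllib.parse import urlencode


def site_search_url(query: str, domain: str) -> str:
    """Build a Google search URL restricted to one domain via the site: operator."""
    # Hypothetical helper: it reproduces what users already type into google.com.
    params = {"q": f"{query} site:{domain}"}
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # Prints: https://www.google.com/search?q=foo+site%3Astackoverflow.com
    print(site_search_url("foo", "stackoverflow.com"))
```

Point your search form at a URL like this and you have a stopgap in an afternoon; whether that is good enough for your site is a judgment call.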

2) Do not implement comments yourself (unless comments are what you do for a living).

It is extremely difficult to get comments right. Users absolutely abhor comments. Spammers – well, they love them. Luckily, you can just go and get DISQUS to do all the heavy lifting for you. The time saved by using DISQUS can be spent on building something else; meanwhile, users absolutely love leaving comments through it, while spammers hate it.

3) Physically separate your admin interface from the stuff that is going to be used by your users.

Maciej Ceglowski has some words of advice about not having your blog hacked: cache your output in flat files and hide the admin interface. The benefits of this are tremendous: cached files are fast and secure. You will need to do some fancy footwork to serve up parts that change a lot, but you can do it the same way DISQUS and Google CSE do it – through the magic of AJAX.
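The flat-file idea can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: `render_post` stands in for whatever your CMS does to produce HTML, the `posts` dictionary stands in for your database, and slugs are assumed to be trusted, filesystem-safe names. The admin side runs `publish`; the public web server only ever sees the resulting files.

```python
from html import escape
from pathlib import Path


def render_post(title: str, body: str) -> str:
    """Stand-in for the CMS template step (hypothetical, not a real CMS API)."""
    return (
        f"<html><head><title>{escape(title)}</title></head>"
        f"<body><h1>{escape(title)}</h1><p>{escape(body)}</p></body></html>"
    )


def publish(posts: dict[str, tuple[str, str]], out_dir: Path) -> list[Path]:
    """Write each post to out_dir/<slug>.html and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Slugs are assumed to be trusted; real code should validate them first.
    for slug, (title, body) in sorted(posts.items()):
        path = out_dir / f"{slug}.html"
        path.write_text(render_post(title, body), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    for p in publish({"hello": ("Hello", "First post")}, Path("public")):
        print("wrote", p)
```

Since nothing on the public side executes code, there is very little left there to hack.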

4) Sanity check: calculate the amount of RAM in the home computers of all of your interns. Compare that to the amount of RAM in your server farm. Who wins?

5) Use a CDN and/or caching proxy, don’t be cheap. These things will save your butt when Yahoo and Digg come a-knocking at the same time. I’m not even going to mention Memcached – you can’t get big without it at all.
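For anyone who has not used Memcached, the usual pattern is "look in the cache first, build the page only on a miss". Here is a rough Python sketch of that pattern. `TTLCache` is an in-process stand-in that I made up so the example runs anywhere; it is not the Memcached client API, and the 60-second lifetime is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for Memcached: get/set with an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cached_page(cache: TTLCache, key: str, build: Callable[[], str], ttl: float = 60.0) -> str:
    """Return the cached page, calling the expensive build() only on a miss."""
    page = cache.get(key)
    if page is None:
        page = build()
        cache.set(key, page, ttl)
    return page


if __name__ == "__main__":
    cache = TTLCache()
    print(cached_page(cache, "front", lambda: "<p>front page</p>"))
```

With this in place, a traffic spike hits the expensive page build once per minute per page instead of once per request.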

6) Fight WYSIWYG editors. These things are the worst. They are the Devil. They are a security hole. You never get what you see. People paste from Word. Do I need to go on?

The best middle of the road solution is something like Markdown.

Do not underestimate the user’s ability to learn a few simple rules. When I worked at TV Guide there was this movie database application. Very non-technical editors were using a very scary-looking Unix-based interface at an amazing speed. When I rewrote it as a web interface, it became more “user-friendly”, but they could not enter stuff as fast as before.

7) Make sure you have good backups.

8) I know you won’t be able to follow my advice; I know I can’t either. Life is a constant compromise.

FUD You

A common IT worker in computer-related conversation spews more acronyms than a Soviet Commissar, but chances are he or she won’t be able to decipher half of them. Managers often don’t even know the meaning of the concepts that the acronyms represent.

Some acronyms are meaningless by design and recursive to boot. GNU? GNU’s Not Unix!

Others seem like acronyms, but aren’t. I always thought that TWAIN stood for “Technology Without An Interesting Name”, but it turns out it originates from “The Ballad of East and West” – “and never the twain shall meet”. Sometimes, when I try to reinstall my scanner for the hundredth time, it seems very appropriate.

Some apparently stood for something at some point in time, but then lost their meaning. People understood COM to stand for “Common Object Model”, then “Component Object Model”, and now it stands for that old difficult technology that only Don Box used to completely understand. You need to use .NET instead, which is an acronym-looking non-acronym that stands for whatever Microsoft wants it to stand for. Now Expect Trouble. Never Edit Text. Next Exciting Technology. What is the dot for? Come on, every developer knows that dots make your code more powerful.

An acronym that is often used in conversations about Microsoft is “FUD”. It always made me think of Elmer Fudd (because people using it often sounded like him), but it’s actually a term coined by a computing pioneer, Dr. Gene Amdahl.

It stands for “Fear, Uncertainty and Doubt” – tactics that IBM salesmen used against Dr. Amdahl’s company. Amdahl made mainframes that were fully compatible with IBM’s, but cheaper and faster. It was easy to use FUD on managers who were in charge of purchasing that multimillion-dollar big iron. “Nobody was ever fired for going with IBM”, right?

The sheer existence of Amdahl was a huge boon to mainframe-purchasing customers. The rumor was that if you placed an Amdahl mug on your table, IBM salespeople would give you million-dollar discounts.

Let me present an artifact from my collection: the famous “Million Dollar Mug”: