8 Pieces of Architectural Advice for CMS

I have some advice for those in the business of building large websites with content management systems.

1) Do not implement search yourself.

Your CMS sucks at search, and so do you. I see this again and again and again. Everyone is implementing search on large websites instead of using Google. Developers are afraid of looking unprofessional. Managers answer yes to the question “do you want advanced/faceted search” (the correct answer is no – users don’t like it and don’t use it). As a result a lot of resources (both server and developer) go into implementing something that Google is awesome at. Even some very smart people, like Jeff Atwood, roll their own search, and their users end up going to google.com and typing “foo site:stackoverflow.com”.

Users are very happy with Google CSE, and don’t mind the text ads. Those text ads – well, that’s revenue that you otherwise would not have, however small it is. If you absolutely can’t do Google CSE – buy their search appliance. If you can’t do that either – well, you’d better be using Solr.

2) Do not implement comments yourself (unless comments are what you do for a living).

It is extremely difficult to get comments right. Users absolutely abhor comments. Spammers – well, they love it. Luckily, you can just go and get DISQUS to do all the heavy lifting for you. The time saved on using DISQUS can be used on building something else, meanwhile users absolutely love leaving comments through it, while spammers hate it.

3) Physically separate your admin interface from the stuff that is going to be used by your users.

Maciej Ceglowski has some words of advice about not having your blog hacked: cache your output in flat files and hide the admin interface. The benefits of this are tremendous: cached files are fast and secure. You will need to do some fancy footwork to serve up parts that change a lot, but you can do it the same way DISQUS and Google CSE do it – through the magic of AJAX.
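Here is a minimal sketch of the flat-file idea, assuming a hypothetical `publish_page` helper that the admin side calls whenever a page changes, so that the public web server only ever serves static files. The function name, the `<slug>/index.html` layout and the slug check are my assumptions for illustration, not part of any particular CMS.

```python
import os
import tempfile
from pathlib import Path


def publish_page(out_dir: str, slug: str, html: str) -> Path:
    """Write rendered HTML to <out_dir>/<slug>/index.html (hypothetical layout)."""
    # Refuse slugs that could escape the output directory.
    if not slug or slug.startswith("/") or ".." in slug.split("/"):
        raise ValueError(f"unsafe slug: {slug!r}")
    target = Path(out_dir) / slug / "index.html"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so the
    # web server never serves a half-written page.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(html)
    os.replace(tmp_name, target)
    return target
```

The admin interface can then live on a separate host that has write access to the output directory, while the public box needs nothing more than a static file server.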

4) Sanity check: calculate the amount of RAM in the home computers of all of your interns. Compare that to the amount of RAM in your server farm. Who wins?

5) Use a CDN and/or caching proxy, don’t be cheap. These things will save your butt when Yahoo and Digg come a-knocking at the same time. I’m not even going to mention Memcached – you can’t get big without it at all.

6) Fight WYSIWYG editors. These things are the worst. They are the Devil. They are a security hole. You never get what you see. People paste from Word. Do I need to go on?

The best middle of the road solution is something like Markdown.

Do not underestimate the user’s ability to learn a few simple rules. When I worked at TV Guide there was this movie database application. Very non-technical editors were using a very scary-looking Unix-based interface at an amazing speed. When I rewrote it as a web interface, it became more “user-friendly”, but they could not enter stuff as fast as before.

7) Make sure you have good backups.

8) I know you won’t be able to follow my advice, I know I can’t either. Life is a constant compromise.

Paid ReviewMe Post: Phone Spam Filter

These days ReviewMe.com, already a controversial company, has become downright unethical – they make it abundantly clear that they have become a link purchasing company. On the other hand Phone Spam Filter is a site I don’t mind sharing Google juice with, so it’s a quick and fun way to add 50 bucks to my Kindle fund. Here’s my review:

The goal of this site is pretty simple: Phone Spam Filter is asking you to snitch on telemarketers. You search for a phone number that you received a marketing call from and then complain about it. Besides getting a little relief from venting at the phone spammers, you get a bit of satisfaction from knowing that you added them to a blacklist. Nothing good can come out of this for the dinner-interrupting bastards. Meanwhile it’s a good place to find out if mysterious phone numbers that show up on your phone are from run of the mill telemarketers or not.

The even cooler thing is that they have an API that can help you block calls from this blacklist if you have an Asterisk PBX or are willing to install some Windows software and have a modem connected to a phone line. While Asterisk is pretty awesome, running Windows and having a modem connected to a phone line is a horrible idea these days – there are dozens of viruses that want nothing more than to make a few 1-900 phone calls. In the future the Phone Spam Filter guys are hoping to add integration with VOIP providers.

The Phonespamfilter technology is not as cool as JWZ-endorsed audio-cock technology (“their computer’s speakers should create some sort of cock-shaped soundwave and plunge it repeatedly through their skulls”), but I guess it’s a start.

They also have sites in Australia, New Zealand, France, and the UK.

Captcha Gotcha

I’ve been using CAPTCHA – “Completely Automated Public Turing Test to tell Computers and Humans Apart”, that little graphic showing a string of numbers that needs to be typed in to submit a comment to this blog. Guess what – I see furious reloads of the comment page generated by spambots, yet 0 comment spam. Zero! I changed the script that generates my CAPTCHA so that it would make it easier for people. It’s weak enough that an automated solution might solve it, but I have yet to see a spammer sophisticated enough. There are enough unprotected blogs out there to make this sort of effort useless.

I guess soon enough we will see some kind of a spam Cold War when companies like Google start using CAPTCHA as a method for email SPAM protection. We need to take our email back – now most of the time I don’t even feel like writing to people – there’s a very good chance that my email will get lost and ignored (well, it might also be that the people I write to ignore my emails on their merits, but I like to stay optimistic). What’s funny is that, as with the Cold War arms race, we might get some fringe benefits in the field of Artificial Intelligence. I say, bring it on.


I really hate email these days. Gmail might have solved (at least for me) the storage problem and mostly solved the spam problem (the filter is very efficient), but there is soooo much crappiness in email.

Email servers and clients are just out of whack lately. Even Gmail checks zip files for executables somehow (neat trick) and refuses to attach them. It works ok if you change the extension to .zip.foo or something like that. But this at least is a decent way of dealing with the problem of people sending virus-laden executables – warn the sender that the file is not going out, and let through people who are smart enough to rename the extension.
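I don’t know how Gmail actually does it, but the “neat trick” does not have to be complicated. Here is a rough sketch, assuming a filter that only looks at the file names listed inside the archive; the `zip_has_executable` name and the list of blocked extensions are made up for illustration.

```python
import io
import zipfile

# Which extensions count as "executable" is an illustrative assumption.
BLOCKED_EXTENSIONS = (".exe", ".scr", ".bat", ".com", ".pif")


def zip_has_executable(data: bytes) -> bool:
    """Return True if the zip archive in `data` lists a blocked file name."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Only the archive's directory is read; nothing is extracted.
        return any(
            name.lower().endswith(BLOCKED_EXTENSIONS)
            for name in archive.namelist()
        )
```

A check like this goes by names only, so it says nothing about what the files really contain.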

On the other hand I’ve encountered every type of nastiness – from silently dropping emails to stripping the attachments (again, silently) to bouncing the email back with absolutely unintelligible error messages.

Filter stupidity similar to what the excellent Joe Grossberg is describing here is also rampant.

Oh, and trying to send out an email in Russian. Fugedaboudit! The extra bits in Unicode or KOI-8 get chewed off every which way, rendering my laboriously typed and spelling-error-infested emails unreadable half the time. If there is a way to reliably send Russian encoded emails without using attachments – I have not been able to find it yet.
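One approach that at least keeps the “extra bits” away from anything that might chew on them, sketched with Python’s standard `email` package: declare the charset explicitly and base64-encode the body, so that everything on the wire is plain 7-bit ASCII. The helper names and the example.com addresses are placeholders, and this is no guarantee that every mail client on the other end will display the result correctly.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_russian_email(sender: str, recipient: str, subject: str, body: str) -> bytes:
    """Build a message whose serialized form contains only ASCII bytes."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Non-ASCII header text is RFC 2047 encoded when the message is serialized.
    msg["Subject"] = subject
    # Explicit UTF-8 charset plus base64 transfer encoding for the body.
    msg.set_content(body, charset="utf-8", cte="base64")
    return msg.as_bytes()


def read_body(raw: bytes) -> str:
    """Parse the raw message back and return the decoded text body."""
    return message_from_bytes(raw, policy=policy.default).get_content()


raw = build_russian_email(
    "me@example.com", "you@example.com", "Привет", "Привет, мир!\n"
)
```

The resulting `raw` bytes can be handed to `smtplib.SMTP.sendmail` as-is.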

Worst of all, you sit there waiting for replies wondering – are people just ignoring me? Did the message get silently dropped, swallowed or chewed up on the way? Did it get lost amongst spam about Ciagra and Vialis? (As a side note, my co-workers were joking this morning about how I should write “V1A8RA” in marker on my cubicle dweller’s box that contains vitamins, painkillers, antacid and caffeine pills). Did the person mean to answer me but forget? Did something happen to him or her?

But you know what I hate even more than email? Public comments in blogs. Letting my own often illiterate and/or stupid comments spill out onto the Information Superhighway and having them fester and petrify there for future generations is not a good idea. From now on my policy is not leaving any comments whatsoever. I’ll use exclusively email from now on. If you want to leave me a public comment in Livejournal – go ahead, but I’ll probably answer via email. I do try to answer most comments.

Also a part of this policy is not reading or writing any private posts in Livejournal. Nothing good ever comes out of them.

In other news, I am thinking about leaving a little note at the bottom explaining obscure puns in my topics. For instance this one is based on the Sopranos Episode 204 title – “Commendatori” (Knights). Babelfish tells me that “commentatore” means “commentator”.

Jocko the Lawn Jockey

Immediately after landing in Manhattan, the delegation from the Lawn Jockey planet demanded to see our leader.

Interestingly, this ubiquitous lawn ornament seems to have quite a history. It’s also interesting how almost all Jockos I’ve ever seen in New York (including in this stunning collection) were white. And I’ve seen a lot of them when I had a job delivering ad papers in many neighborhoods of Brooklyn. (Yes, I delivered paper spam).

That’s a Paddlin’.

Writing about “interesting” spam … That’s a paddlin’.
Using pictures of the family cat as userpics … That’s a paddlin’.
Using a black background / white text or a crazy background image … That’s a paddlin’.
Reposting memepool or slashdot links … That’s a paddlin’.
Writing “I never post quizzes, but I am going to make an exception for this one” … Oh, you’d better believe that’s a paddlin’.