8 Pieces of Architectural Advice for CMS

I have some advice for those in the business of building large websites with content management systems.

1) Do not implement search yourself.

Your CMS sucks at search, and so do you. I see this again and again and again. Everyone is implementing search on large websites instead of using Google. Developers are afraid of looking unprofessional. Managers answer yes to the question “do you want advanced/faceted search” (the correct answer is no – users don’t like it and don’t use it). As a result, a lot of resources (both server and developer) go into implementing something that Google is awesome at. Even some very smart people, like Jeff Atwood, roll their own search, and their users end up going to google.com and typing “foo site:stackoverflow.com”.

Users are very happy with Google Custom Search Engine (CSE) and don’t mind the text ads. Those text ads – well, that’s revenue you would not otherwise have, however small. If you absolutely can’t use Google CSE, buy the Google Search Appliance. If you can’t do that either – well, you had better be using Solr.

2) Do not implement comments yourself (unless comments are what you do for a living).

It is extremely difficult to get comments right. Users absolutely abhor homegrown comment systems. Spammers – well, they love them. Luckily, you can just go and get DISQUS to do all the heavy lifting for you. The time saved by using DISQUS can go into building something else. Meanwhile, users love leaving comments through it, and spammers hate it.

3) Physically separate your admin interface from the stuff that is going to be used by your users.

Maciej Ceglowski has some words of advice about not having your blog hacked: cache your output in flat files and hide the admin interface. The benefits of this are tremendous: cached files are fast and secure. You will need to do some fancy footwork to serve up parts that change a lot, but you can do it the same way DISQUS and Google CSE do it – through the magic of AJAX.
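The “cache your output in flat files” idea can be sketched in a few lines. What follows is a minimal Python illustration, not Ceglowski’s code and not any CMS’s API. The `FlatFileCache` class and the `render` callback, which stands in for whatever builds a page from the database, are hypothetical names. In a real deployment the web server would usually serve the `.html` files directly, and the CMS would only regenerate or delete them.

```python
import tempfile
from pathlib import Path
from typing import Callable


class FlatFileCache:
    """Serve pages from static HTML files; call the (hypothetical) renderer only on a miss."""

    def __init__(self, root: Path, render: Callable[[str], str]) -> None:
        self.root = root
        self.render = render  # assumption: takes a page slug, returns a full HTML page

    def _path_for(self, slug: str) -> Path:
        # One file per page, e.g. "about" -> <root>/about.html.
        # Assumes slugs are simple, trusted names (no "/" or "..").
        return self.root / f"{slug}.html"

    def get(self, slug: str) -> str:
        path = self._path_for(slug)
        if path.exists():
            # Cache hit: a plain file read, no database or template engine involved.
            return path.read_text(encoding="utf-8")
        html = self.render(slug)
        # Write to a temporary file, then rename it over the target,
        # so readers never see a half-written page.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(html, encoding="utf-8")
        tmp.replace(path)
        return html

    def invalidate(self, slug: str) -> None:
        # Meant to be called from the separate admin side after an edit is published.
        self._path_for(slug).unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = FlatFileCache(Path(d), lambda slug: f"<h1>{slug}</h1>")
        print(cache.get("about"))                 # rendered once, then written to about.html
        print((Path(d) / "about.html").exists())  # True
```

Keeping `invalidate` as the only write path from the admin side matches the advice to keep the admin interface physically separate. The public site never needs database credentials to serve a cached page.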

4) Sanity check: calculate the amount of RAM in the home computers of all of your interns. Compare that to the amount of RAM in your server farm. Who wins?

5) Use a CDN and/or caching proxy, and don’t be cheap. These things will save your butt when Yahoo and Digg come a-knocking at the same time. I’m not even going to mention Memcached – you can’t get big without it at all.
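Memcached is typically used in a “cache-aside” pattern: check the cache, fall back to the expensive query on a miss, then store the result with an expiry. The Python sketch below is hedged. `TTLCache` is an in-process stand-in that only mimics get/set-with-expiry semantics. It is not the API of any memcached client library, and `cached_query` with its `query` callback is a hypothetical helper. The clock is injectable so that expiry can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a memcached-style store; NOT a real memcached client."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, object]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: object, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cached_query(cache: TTLCache, key: str, query: Callable[[], object], ttl: float = 60.0) -> object:
    """Cache-aside: try the cache first, run the expensive query only on a miss."""
    value = cache.get(key)
    if value is None:
        # Note: a query that returns None is never cached and will re-run every time.
        value = query()  # hypothetical: e.g. the slow SQL behind a front page
        cache.set(key, value, ttl)
    return value
```

A short TTL such as 30–60 seconds is often enough to absorb a traffic spike, because thousands of requests for the same page then share one database query per interval.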

6) Fight WYSIWYG editors. These things are the worst. They are the Devil. They are a security hole. You never get what you see. People paste from Word. Do I need to go on?

The best middle of the road solution is something like Markdown.

Do not underestimate the user’s ability to learn a few simple rules. When I worked at TV Guide there was this movie database application. Very non-technical editors were using a very scary-looking Unix-based interface at an amazing speed. When I rewrote it as a web interface, it became more “user-friendly”, but they could not enter stuff as fast as before.
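A big part of why lightweight markup is safer than a WYSIWYG editor is that the server can escape everything the editor typed or pasted, then allow only a handful of formatting rules. The function below is a hypothetical toy that supports just `**bold**`, `*italic*`, and blank-line paragraphs. It is not Markdown, and a real site should use a maintained Markdown library with raw HTML disabled or sanitized.

```python
import html
import re


def render_lite_markup(text: str) -> str:
    """Convert a tiny Markdown-like subset (bold, italic, paragraphs) to HTML.

    All input is HTML-escaped FIRST, so any raw HTML an editor types or pastes
    (including <script> or Word's markup) becomes inert text; only the rules
    below can produce tags.
    """
    escaped = html.escape(text)
    # Bold runs before italic so "**x**" is not read as two italic markers.
    escaped = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    escaped = re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)
    # Blank lines separate paragraphs.
    paragraphs = [p.strip() for p in escaped.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


if __name__ == "__main__":
    print(render_lite_markup("Paste from *Word*? <b>No</b> thanks."))
```

Escaping first and whitelisting afterwards is the opposite of what most WYSIWYG setups do. They accept arbitrary HTML and then try to strip out the dangerous parts.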

7) Make sure you have good backups.

8) I know you won’t be able to follow my advice, I know I can’t either. Life is a constant compromise.

Zombie-free Mac Children’s Games

I was born at the beginning of the age of information. I welcome the content deluge.

I’m not a snob. I do not discriminate amongst the sources of content, gladly consuming books, television, movies, music, magazines, websites, wikis, and blogs. I like to think that, thanks to technologies like ebook readers, blog aggregation, suggestion engines at Amazon and Netflix, and TiVo, I limit my input to only the stuff that is “awesome” on the “Normal people” scale.

I remember the time when the flow of information available to me was limited to my father’s sizable library and a few hours a week of interesting TV culled from the 3 horrible channels of Soviet television, and I really don’t miss it.

My 4-year-old daughter is swimming in the sea of information together with me. We read books to her (the quality of children’s books these days is amazing), and she watches DVDs, TiVo’d shows, and YouTube videos on a laptop. She really wants to play with a computer as well.

Unfortunately, the only game that I have is “Plants Vs. Zombies”. We usually play it together as a reward for good behavior. She enjoys the “zen garden” part of the game, as well as the regular “zombie” part. This, of course, led to questions about the nature of zombies (uhh), their diet (brains), the nature of brains, and the absence of female zombies in the game (uhhh).

When Natalie was younger and I still had a PC, there were a whole bunch of craptastic PC games (one even came with a special keyboard, if I remember correctly) that we used to play. These crashed often and were pretty dumb.

Now that I have a Mac, I’m looking for some better, zombie-free games suitable for a 4-year-old. Finding good computer games is much more difficult than finding good children’s books. Do you have any suggestions?

Semi-literate Programming

I recently finished “Coders at Work”, a series of interviews with famous programmers.

On one hand, reading a book like this is a downer: it’s very clear to me that I occupy a place that is very close to the median of the bell curve, and the skill level of programmers is a very steep non-linear curve in itself. I’ll never be as good as JWZ or Brad Fitzpatrick. But I knew that before, and I am ok with it. On the other hand, this book inspired me to read more code.

The programmers in the book disagree on many points, but they mostly agree on the importance of writing readable code and educating yourself by reading other people’s code. I make my living writing in scripting languages, and I haven’t written a line of C or C++ since college. But there’s nothing preventing me from downloading and taking a look at the source of Apache, PHP, MySQL.

It’s important for me to understand “how the sausage is made” in the PHP stack, and as it turns out, what happens between Apache, PHP, and MySQL in terms of requests and timeouts is not as simple as one might think. I asked on StackOverflow about this, but all the diagrams that people pointed me to were of the very rudimentary type: “look, here’s a happy cow, it goes to Bovine University, look – it’s all shrink-wrapped on the supermarket shelf” instead of “sausage farm/slaughterhouse/truck/factory tour, starting with cow insemination”.

When I downloaded the source code of mod_rewrite, arguably the most useful Apache module in the world, I was amazed to find out that it’s only 5,000 lines of C, including comments.

The book ends with an interview with Donald Knuth, and two other major questions that the interviewer asks everyone are “have you read Knuth’s books?” and “have you tried literate programming?”. It was interesting to find out that most of the famous programmers use Knuth’s books the same way that I do. The books sit on my bookshelf, I look at them, I sometimes try to read them, and I skip most of the math. They serve as a constant reminder that I suck at computer science even more than I suck at programming, and that, luckily, there are people out there who know all of this stuff and are not idiots like me.

Here’s a photo of my cubicle at TV Guide circa 2002; Knuth’s books hold a place of honor next to the mini fridge. By the way, taking pictures of the places where you work and live is something you should not forget to do: years from now nobody will care about those pictures of flowers, shadows, and sunsets, but

I read the book on Literate Programming at the time and was rather inspired by it. OK, maybe I didn’t so much read it as skim it. I don’t think I understood what real literate programming is.

The way I understand it, Literate Programming is a way to write programs as a narrative that is readable by both computers and humans. My father, a site supervisor (a type of contractor) in his former career, is very fond of giving me very detailed instructions, the same way he used to give instructions to construction workers. His instructions are usually exhaustive algorithms, complete with error handling. I think that his instructions, expressed as a stream of consciousness, would work not only on me and construction workers, but on computers as well, and are similar to what Donald Knuth has in mind. All you really have to do is build a layer of abstraction between these instructions and a computer language. Also, since computers don’t forget things, he would only need to give his instructions once.

These days my dad is a COBOL programmer. Everybody dumps on COBOL, but in my mind it’s a language worthy of a lot of respect. Its syntax is very English-like, which makes COBOL code easy to read. Well, maybe it’s like reading some old-timer’s newsgroup post written in all caps, but it’s still much closer to English than most other computer languages.

At the time I was reading “Literate Programming” I was using ASP 3.0, IIS, and SQL Server 97. My task was to write a system that would account for booked and pending business. This is something that has had to be done since the age of Mad Men. You see, the dealings of clients, account executives (like Pete Campbell), their bosses, account coordinators, the creative department, etc. are rather convoluted. But in the end, to get paid, you have to have a system that tracks who brought in what business, who handled what, and how the commissions need to be split.

This is normally the realm of something called EAS (Enterprise Application Software). Back at the turn of the century, this area was still dominated by a company called SAP, but there were a few smaller players, like Salesforce.com, that tried to package these applications. Any sane IT manager looks to see if an EAS solution can be purchased first. It turned out that TV Guide’s business logic was impossible to shoehorn into any existing solution. SAP folks said – yeah, no problem, we’ll build you what you want, but our prices start at $1M, and then there are consultant fees. The ERM world is a crazy place; you can read about some true craziness in “Cube Farm”, an account of one hapless developer’s adventures at Lawson Software. It’s a truly riveting book, and I feel that every developer out there should read it. It’s literally Lovecraftian in nature, that book.

In any case, it fell to me to develop the application from scratch. Inspired by Knuth, I decided to write some semi-literate code. A project manager, Brad, and I went to the clients and interviewed them at length, documenting their existing process (aka the most complicated set of spreadsheets you’ve ever seen). In the past, before cheap computers, all you needed was a Joan Holloway, but I believe they stopped making them.

Brad then went back and forth with the clients on a very terse document, about 5 pages long, that described how the new system would work. He would sit down with the clients and go through the narrative step by step, confirming that this is what they wanted. Meanwhile, I created an object-oriented library that made working with the database and creating forms and navigation elements much easier. This is similar to what you might find in a CMS like Drupal, only a little cruder.

When the document shaped up, I created the database schema, and then I took a big chunk of the document and pasted it into one huge comment block. I proceeded to break off chunks of that block and write the code around them. Interestingly enough, as time went on, the project manager started helping me write the code: enough of the scary database abstraction was hidden behind simple classes and methods, and there were tons of self-evident examples all around to copy and paste. I switched to writing reports that involved cubes, rollups, and other fancy stuff. The stored procedures that produced the reports also received comments taken from the parts of the document that described those reports.
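The approach of pasting the spec into a comment and then writing code around each chunk might look something like the sketch below. The original system was ASP 3.0 with SQL Server stored procedures, so this Python version only illustrates the style. The spec wording, the `Deal` class, the 5% commission rate, and the 60/40 split are all invented for the example and are not TV Guide’s rules.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict

# Each "SPEC:" comment below is a (hypothetical) chunk broken off the big
# pasted block from the requirements document, placed right above the code
# that implements it.

COMMISSION_RATE = Decimal("0.05")   # hypothetical: 5% of the booked amount
ORIGINATOR_SHARE = Decimal("0.60")  # hypothetical: originator keeps 60%
CENTS = Decimal("0.01")


@dataclass
class Deal:
    amount: Decimal
    originator: str  # the account executive who brought the business in
    handler: str     # the account executive who handles it day to day


def split_commission(deal: Deal) -> Dict[str, Decimal]:
    # SPEC: "The commission on a booked deal is five percent of the booked
    #        amount, rounded to the cent."
    commission = (deal.amount * COMMISSION_RATE).quantize(CENTS, rounding=ROUND_HALF_UP)

    # SPEC: "If the same account executive brought in the deal and handles it,
    #        that person receives the whole commission."
    if deal.originator == deal.handler:
        return {deal.originator: commission}

    # SPEC: "Otherwise the originator receives sixty percent of the commission
    #        and the handler receives the remainder."
    originator_cut = (commission * ORIGINATOR_SHARE).quantize(CENTS, rounding=ROUND_HALF_UP)
    return {deal.originator: originator_cut, deal.handler: commission - originator_cut}


if __name__ == "__main__":
    print(split_commission(Deal(Decimal("10000"), "pete", "ken")))
```

Computing the handler’s share as the remainder, rather than rounding it separately, keeps the two shares adding up exactly to the commission. This is the kind of detail a spec-as-comment makes easy to check against the clients’ own wording.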

This wasn’t a monolithic system – I was writing it for 2 years or so, releasing it chunk by chunk. In the end it was handed off to another developer, and the whole transfer took only a couple of hours. There weren’t any major bugs or maintenance issues (I believe I received only one phone call about it after several years of continuous use). All in all, I was pretty pleased with this approach and can absolutely recommend it.

I believe this is the reason why so many English majors become excellent programmers: if you can write for people, you can write for computers. Sometimes there are reasons why you can’t do both at the same time, but there’s no reason not to find some middle ground.

The Capacitor Plague

I woke up from a nap to a loud pop and the smell of burning plastic. The source turned out to be one of the digital devices most precious and important to me: a ReadyNAS NV+, a small silver box with over a terabyte of hard drives that store my backups, music, and photos.

Network attached storage (NAS) is an engineering compromise. It’s a storage solution that lets you keep a bunch of drives in a self-contained device. It’s redundant: you can lose a drive (which is a statistical certainty) and not lose your data. There are also handy USB ports that let you connect USB drives, and a button to run backup jobs onto these drives. It also serves as a print server, and in theory it can be used as a streaming media server. On the other hand, it’s slow (gigabit networks are not fast enough when you need gigs of data fast), a complete nightmare to use with photo managers like Picasa, and an even worse nightmare if you want to use it as a Time Capsule.

I’ve spent a lot of time babysitting my ReadyNAS NV+: changing the defective RAM that it shipped with, updating the buggy firmware, finding the right drives for it (some don’t have the right temperature sensors). Don’t get me started on what it took to make it work with Mac’s Time Machine.

And after all that, the one box that was supposed to keep my precious digital archives safe was smoking. This was preceded by a few days of weird performance issues and a couple of hangs. The power supply finally died a horrible death, and I realized that once again I had fallen victim (or “mugu”, as Nigerians say) to faulty capacitors.

According to Wikipedia, the name of this phenomenon is “Capacitor Plague”. There is an epidemic of failure in electrolytic capacitors from certain shady manufacturers. Electrolytic capacitors are usually found in power supplies. They are little aluminum cylinders filled with special film and electrolytic liquid or gel. Power supplies get very hot, and the liquid part of the capacitors, the electrolyte, always wants to either dry up or explode. The formula for the electrolyte is very hard to get right.

The rumor is that one or a few companies resorted to industrial espionage to steal electrolyte formulations. They weren’t entirely successful – they either got an incomplete formula or just plain Brawndo.

Spectrum Online did some digging:

“According to the source, a scientist stole the formula for an electrolyte from his employer in Japan and began using it himself at the Chinese branch of a Taiwanese electrolyte manufacturer. He or his colleagues then sold the formula to an electrolyte maker in Taiwan, which began producing it for Taiwanese and possibly other capacitor firms. Unfortunately, the formula as sold was incomplete.
“It didn’t have the right additives,” says Dennis Zogbi, publisher of Passive Component Industry magazine (Cary, N.C.), which broke the story last fall. According to Zogbi’s sources, the capacitors made from the formula become unstable when charged, generating hydrogen gas, bursting, and letting the electrolyte leak onto the circuit board. Zogbi cites tests by Japanese manufacturers that indicate the capacitor’s lifetimes are half or less of the 4000 hours of continuous ripple current they are rated for.”
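To put the quoted 4000-hour rating in perspective for an always-on device like a NAS, here is a back-of-the-envelope calculation in Python. It assumes the device runs 24 hours a day. Capacitor lifetime ratings are also usually specified at the part’s maximum rated temperature, so the real-world figure depends heavily on how hot the power supply runs.

```python
HOURS_PER_DAY = 24


def continuous_days(rated_hours: float) -> float:
    """Days of round-the-clock operation covered by a lifetime rating given in hours."""
    return rated_hours / HOURS_PER_DAY


rated_hours = 4000.0              # rating cited in the Spectrum quote
degraded_hours = rated_hours / 2  # "half or less", per the cited tests

print(round(continuous_days(rated_hours), 1))     # 166.7 (about 5.5 months)
print(round(continuous_days(degraded_hours), 1))  # 83.3 (under 3 months)
```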

The wastefulness of today’s society masks the problem: most people don’t perform autopsies on their dead $70 DVD players or $500 computers; they just use that as an excuse to buy the new hotness. The techies with (or without) spare time and soldering skills do the following: fill bulletin boards with tales of saving their devices by soldering in new capacitors; search for instructions on how to solder and where to buy capacitors; and curse creatively after doing it for the 5th time.

The unique thing about the capacitor plague is how easy it is to identify: the capacitors literally blow their tops, venting electrolyte through the special stress relief indentations. It’s also unique in that anybody with a soldering iron has a very good chance of fixing it: the caps are easy to locate and solder. In an age when most electronic components are of the “surface mount” type (the size of a sesame seed) or chips with dozens of legs as fine as silk, soldering in a two-legged capacitor is very refreshing.

Here’s a nest of capacitors from my busted power supply: two in the left corner are clearly popped, the one on the right is probably ok:

In the last couple of years the following devices that I own fell prey to faulty caps: a cheap off-brand dvd player, a speed control on my Dodge Caravan’s air conditioner, a Netgear network hub, a huge and expensive Air King window fan, and now, my ReadyNAS. The interesting thing is that the problem exists in both high end and low end products, as well as in high tech and low tech ones (I did not know there were electronic components in the window fan).

I am out of warranty on my ReadyNAS because I bought it in May 2007. The following passage leads me to believe that the shitty capacitors are a problem that NETGEAR is aware of and (maybe) fixed in newer releases of the hardware (they could not offer a 5-year warranty if they used the same capacitors – they’d just go broke).

“Please be aware that ReadyNAS purchased prior to August 21, 2007 carries a one-year limited warranty. Extended warranty purchased for these ReadyNAS will be honored by NETGEAR. ReadyNAS NV+ and 1100 purchased August 21, 2007 and later have a 5-year limited warranty, and the ReadyNAS Duo has a 3-year warranty.”

The brand name on the popped capacitors reads “Fuhjyyu”. It led me to an Urban Dictionary entry that says that Fuhjyyu is either

“(1) Chinese word for feces.

or

(2) Brand name of abysmal quality capacitors that are installed on logic boards, switching power supplies and various other electronic components.”

There’s also a post from a guy who implores ReadyNAS to stop using those capacitors.

Then there’s badcaps.net – a global capacitor gripefest that is too depressing to read.

You can see a nice gallery of busted caps over here

There are broader implications here: coupled with fragile lead-free solder, leaky capacitors not only cause kajillions of dollars of damage but will also make the electronics of our era impossible to use in the near future. The aluminum in burnable CDs and DVDs is rotting too, destroying the record of our time.

Cinematic New York

When you live and work in New York, you spend a huge amount of time on TV and movie sets. Most of the time the sets have been abandoned by the shooting crews, but very frequently TV or movie magic is happening as you are walking by.

Why is New York so overrepresented on screen? Part of it is because it’s New York. But it’s also because the city government is very friendly to the moving picture industry.

When I worked on a website for Kenneth Cole, I learned an interesting factoid: the real name of this fashion powerhouse is Kenneth Cole Productions. It turns out that in the early days they abused a perk that the city gives to movie people: the ability to park their huge trailers in places where normally only city service vehicles can linger. Cole applied for a permit to shoot a movie called “The Birth of a Shoe Company”, parked a huge truck in front of a hotel where a major shoe show was taking place, and proceeded to sell enough shoes while cameras were rolling (sometimes even with film) to start a company.

While watching a movie or a show set in New York, I get a lot of “oh, hey, it’s” and a lot of “hmm, where’s that?” moments. Sometimes a movie or a show becomes more memorable just because its locations are so familiar to me.

Let me give you some examples of how cinematically impregnated my environs are. Take, for instance, 30 Rock. I spent 7 years working in two buildings that are behind 30 Rock, and every little thing in, under, and over Rockefeller Plaza is seared into my brain. Also, I have the same last name as one of the actors (is Jane Krakowski a relative? Probably not).

The 47-50th Street/Rockefeller Center subway station that I got out at almost every day for those 7 years (unless I missed a few stops while reading or sleeping) is the one featured in a key scene in Darren Aronofsky’s “Pi”. The Brighton Beach bus stop in “Requiem for a Dream” – one of my first American jobs was right there, handing out fliers for a gypsy psychic. One of the buildings where I worked, 1211 Avenue of the Americas, was very subtly featured as Sideshow Bob’s prisoner number in a Simpsons episode.

Sterling Cooper’s corporate headquarters are famously located at a nonexistent 405 Madison Avenue. On the other hand, 415 Madison Avenue is a very real building where my wife used to work.

When I go to and from work now, I pass a grating that John McClane ripped off in one of the Die Hard movies to jump onto the top of a moving train. The building where I work? Well, it doubles as the Massive Dynamic headquarters on “Fringe”. They do a lot of shooting on the floor where I work. You can see our big conference room called “Jail” in a number of commercials. You know Dr. House? He’s supposed to stay in New Jersey, but one time he slept on “my” couch at the office after shooting a commercial there. The butterflies of doom from Fringe also live in “Jail”.

Ironically, the only famous person who went to my high school is Larry David, the co-creator of a certain show about nothing set in New York but shot in LA.

Treyf

I find Jewish humor to be one of the best ways to explain certain situations in programming. Here are two that I find particularly funny and useful.

The first is a true story told to me by a friend. I use it when I’m told that good web developers don’t use tables. It goes like this: My friend’s aunt met her religious relatives for the first time after coming to America from the Soviet Union. Horrified at being served pork sausage, they told her: “But auntie, Jews don’t eat pork!” She replied: “Nonsense, I eat it all the time.”

The second is an old and racist Soviet-era joke. A Chukcha serves in the Soviet Army, and is an exemplary soldier in border patrol. There’s only one problem — he tends to eat patrol dogs, considering them a delicacy (this untrue ethnic detail must have been created to make the joke setup work). An army psychologist offers to correct this. He sits the soldier down, takes out his watch, and hypnotizes him with the words “you are not a Chukcha, you are a Jew. You don’t like to eat dogs, you like to eat gefilte fish.” The patrol dogs continue to vanish even after the hypnosis seems to have worked. Authorities send another soldier to follow the hypnotized Chukcha around. This soldier reports that the Chukcha sits the dogs down, takes out his watch and hypnotizes them with the words “You are not a dog, you are gefilte fish.” I tend to tell it when I’m told that the act of turning a hack into a Drupal module somehow makes it “gefilte fish.”

To the Moon, Alice

I recently visited my alma mater, Brooklyn College. Some things changed for the better, like the gorgeous new library addition, some for the worse, like the Campus Sugar Bowl restaurant replaced by Starbucks.

On the other hand, the science classrooms and offices in the old and new Ingersoll building seem to have been frozen in time, down to the wall niches. You see, most floors have these glassed-in niches, which the various departments fill. Compsci displays books written by professors, Geology shows off a collection of minerals and fossils (a fancy one at that), and Biology has a series of stands with pickled and dried specimens that I think date to the 1940s, like something out of a Hellboy comic.

The Physics department has a very old, dusty and ironic display, seemingly not opened since the 80s: