I figure any phrase that people deem to be a "law" and find important enough to attribute to a specific person (even if incorrectly) probably contains some real wisdom. Here's a collection of Eponymous Laws from Wikipedia, all of which I have found to be true in my own experience.
Amara's Law: We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.
Conway's Law: Any organization that designs a system will inevitably produce a design whose structure is a copy of the organization's communication structure.
Gall's Law: A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.
Parkinson's Law: Work expands so as to fill the time available for its completion.
Law of the Instrument or Maslow's Golden Hammer: It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law. (I've also heard this restated as "Every task takes longer and costs more than originally estimated.")
Occam's Razor: "Entia non sunt multiplicanda praeter necessitatem." Literally, entities are not to be multiplied without necessity. When two explanations are offered for a phenomenon, the simplest full explanation is preferable. (Or in modern terms: Keep It Simple, Stupid!)
Pareto Principle: 80% of consequences stem from 20% of the causes.
Schneier's Law: Any person can invent a security system so clever that she or he can't think of how to break it.
Sturgeon's Law: Ninety percent of everything is crap.
Occam's Moving Parts
Technology, Business, Life: It's all about systems.
2012-01-24
2012-01-19
Efficiency: Enemy of Innovation?
The science of management in the industrial age was all about efficiency. It had to be. The whole concept of capitalism is based on efficiency. An entrepreneur acquires capital at a cost, and that capital must be made to produce profit at a rate higher than the cost of capital. If you borrowed money at 10% to start your business, you had to make it earn 11% at least. That meant controlling costs ruthlessly and milking every bit of productivity from every penny's worth of capital.
But talk to a systems administrator about efficiency. She'll tell you that, in terms of percentage of server utilization, there are two numbers you never want to approach, numbers that will cause midnight pages and pale-faced panic. The first number, of course, is 0%. Everything is down! The second, more surprising but equally frightening number is: 100%! At 100% utilization, everything breaks, because you have no more capacity for work.
Now a capitalist might look at a well-run data center, and his first instinct would be, "Look at all this waste! Half these servers are sitting idle most of the day." But the clever sysadmin will tell him that spare capacity is what keeps the data center running. If your capacity is 100 requests per second and traffic climbs to a sustained 101, requests arrive faster than they can be served, and the backlog grows until the whole system grinds to a screeching halt. 100% and 0% are equally disastrous. If you want your Internet business to operate, you must have spare capacity.
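The sysadmin's intuition can be made concrete with textbook queueing theory. The sketch below assumes the simplest model, an M/M/1 queue (random arrivals, one server, random service times); that model is my choice of illustration, not something real web servers follow exactly, but they show the same shape. Mean time in the system is 1 / (capacity - load), which explodes as load approaches capacity. The function name `mm1_response_time` is hypothetical.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time (seconds) a request spends in an M/M/1 system, waiting plus service.

    Assumes Poisson arrivals and exponential service times. The formula
    W = 1 / (mu - lambda) only holds while arrival_rate < service_rate.
    """
    if arrival_rate >= service_rate:
        # At or above 100% utilization the queue grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    capacity = 100.0  # requests per second the server can handle (illustrative)
    for load in (50, 80, 90, 95, 99, 100, 101):
        utilization = load / capacity
        w_ms = mm1_response_time(load, capacity) * 1000
        print(f"utilization {utilization:6.0%}: mean response {w_ms:8.1f} ms")
```

Under these assumptions, mean response time doubles between 90% and 95% utilization, and at 99% it is ten times the 90% figure. The idle capacity the capitalist sees as waste is what absorbs a burst.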
Now, I'm not arguing that efficiency is somehow evil. If you are in a capital-intensive business today, you still need to use that capital efficiently. Of course, capitalist theory holds that capital + labor = profit, but what often gets lost in the quest for efficiency is the fact that people are not labor. That's a false assumption, and the formula was never correct (which should not surprise anyone given its source). It's not labor that turns idle capital into profit; it's creativity and its more productive sister, innovation.
In order to innovate, in order to create, you need some very special ingredients. First, you need people. Smart people, with a desire to solve problems, the ambition to tackle big ones, and the hubris to believe that they can do something better than everyone who has come before them. These people then need time to analyze the problem and devise or improvise solutions, and they need resources (read: money) to test those solutions.
So no, efficiency is not necessarily the enemy of innovation. Saving time and money on existing processes creates spare capacity that can be allocated to innovation. The extra people, time, and money that are not needed to operate the existing business can instead be applied to the next big problem and give birth to new lines of business. But too many business leaders still see people as labor, and any labor that is not making capital productive gets labeled "waste". In their eyes, spare capacity is inefficiency. They see the tools of innovation as inefficiency and try to eliminate them, with the result that they eventually become irrelevant as the industry passes them by.
Don't fall into this trap at your company. In established processes like manufacturing, efficiency creates value. In exploratory processes like innovation, efficiency destroys value. Use efficiency to generate spare capacity from your established processes. Then, use that capacity to tackle big problems. To stay relevant and keep growing, accept that creativity is inefficient, and pay the cost to gain the future rewards.
2011-07-09
Web Developers: Infrastructure is part of your Application!
One of the most difficult realities for web developers to face is that their application code, elegant and beautiful as it may (or may not) be, does not run in the ivory tower of Code Perfection. It runs on a real machine (or several) in a real data center, competing for resources to serve real clients, and tripping over all-too-real limitations of the environment.
Operations people, those shadowy, pager-carrying folks that developers call "sysadmins", know that there is so much more to delivering a web application to its clients than simply deploying code. Web applications are not delivered the way packaged software was in the '90s, on a shrink-wrapped CD-ROM like a book. Web applications are not products at all; they are services, and services don't get to say "bring your own computer." Services must be delivered complete, with an entire stack of running programs and systems underneath them.
A web application, whether Java, Ruby, Python, PHP, or LOLcode, is incomplete until it is paired with a stack of servers and services on which to run it. Which language runtime must be installed? Which version of which web server(s)? How should the database server be tuned? How much RAM should be allocated to memcached? When should the logs be rotated? Developers often do not even think about these questions. When they do, the answers usually arrive as a narrative requirements list that dedicated systems engineers must somehow translate into a working system.
Systems automation has now reached the point where this infrastructure can be delivered as code right along with the application code. Every web application should be delivered with Puppet configurations or Chef cookbooks to bring up a precisely tuned deployment stack designed for the application. Cloud-based infrastructure means you can even deliver the (virtual) hardware itself with the application. A good web application should come with a "deploy_to_ec2" script for instant production deployment.
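To make "infrastructure as code" concrete without committing to Puppet or Chef syntax, here is a minimal Python sketch. It assumes a hypothetical `StackSpec` record checked into the repository next to the application, holding answers to the questions a narrative requirements list would otherwise carry (runtime, web server, memcached RAM, log rotation), and renders two real artifacts from it: a memcached command line (its `-m` flag sets the memory limit in megabytes) and a logrotate stanza. All class, function, and field names and the sample values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackSpec:
    """Hypothetical, version-controlled description of an app's deployment stack.

    Field names are illustrative; in practice these answers would live in
    Puppet manifests or Chef cookbooks rather than in this class.
    """
    runtime: str        # language runtime to install, e.g. "ruby 1.9.2"
    web_server: str     # web server to install, e.g. "nginx"
    memcached_mb: int   # RAM for memcached, in megabytes
    log_path: str       # application log files to rotate (glob allowed)
    log_keep_days: int  # number of daily rotations to keep


def memcached_command(spec: StackSpec) -> list[str]:
    # memcached's -m flag sets the maximum memory for cached items, in MB.
    return ["memcached", "-m", str(spec.memcached_mb)]


def logrotate_stanza(spec: StackSpec) -> str:
    # Standard logrotate directives: rotate daily, keep N old files, gzip them,
    # and don't complain if the log is missing.
    return (
        f"{spec.log_path} {{\n"
        "    daily\n"
        f"    rotate {spec.log_keep_days}\n"
        "    compress\n"
        "    missingok\n"
        "}\n"
    )


if __name__ == "__main__":
    spec = StackSpec(
        runtime="ruby 1.9.2",
        web_server="nginx",
        memcached_mb=512,
        log_path="/var/log/myapp/*.log",
        log_keep_days=14,
    )
    print(" ".join(memcached_command(spec)))
    print(logrotate_stanza(spec))
```

The point of the design is that the tuning decisions are reviewed, versioned, and deployed with the code, rather than living in someone's head or a wiki page.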
Of course, there are other opinions. You may choose to outsource your operations work to a platform-as-a-service provider like Heroku or App Engine. If you want to live in a code-only world where infrastructure never crosses your mind, write your code to target deployment environments like these, and get used to the constraints they impose.
In my opinion, every web development team needs a systems engineer embedded as part of the team, developing and codifying the infrastructure alongside the application code. A web application delivered without infrastructure automation is incomplete.
2011-06-19
Web Analytics for Operations
Web analytics packages, from free to exorbitant, have grown in complexity over the life of the web. That's great news for marketers using the web as a tool to deliver a message to an audience. These tools allow them to measure audience reach, time spent viewing a page, return visits, session length, and other useful customer engagement factors that help shape the business strategy.
Unfortunately, while the marketers have won some great tools, where does that leave the techies who need to operate the infrastructure? We don't need to know how long a visitor spent on the site, nor to measure the difference between a "page view" and an "interaction"; we need to know how many requests per second the application will generate. Where marketing-oriented analytics goes to great lengths to filter out automated crawlers, we desperately need to know when a rampant robot is eating up server resources.
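Spotting a rampant robot doesn't take much. Here is a minimal sketch, assuming Apache's standard "combined" log format: count requests per client (host plus User-Agent) and report each heavy hitter's share of total traffic, deliberately without filtering out crawlers. The function name `heaviest_clients` is hypothetical.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def heaviest_clients(lines, top_n=5):
    """Return the top_n (host, user-agent) pairs as (client, count, share_of_total).

    Unlike marketing analytics, crawlers are *not* filtered out: one robot
    generating a large share of requests is exactly what operations wants to see.
    """
    counts = Counter()
    total = 0
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines in other formats rather than failing
        counts[(m["host"], m["agent"])] += 1
        total += 1
    return [(client, n, n / total) for client, n in counts.most_common(top_n)]


if __name__ == "__main__":
    with open("access_log") as f:  # hypothetical path
        for (host, agent), n, share in heaviest_clients(f):
            print(f"{share:6.1%} {n:8d} {host} {agent}")
```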
There isn't much in the way of off-the-shelf software to fit our needs. Mostly, we grow our own solutions, cobbled together with a tool here and a tool there.
Lately I've had a need to do some log analysis over a large farm of Apache web servers. I looked at a few open source packages that I knew about, AWStats and Webalizer being perhaps the best known. But I wasn't happy with either of them. I wanted a tool that would let me aggregate not just hits but the time spent generating each page (in milliseconds), and I wanted to break down traffic into five-minute increments for a detailed shape in my graphs. So finally, and somewhat reluctantly, I settled on analog.
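For anyone who would rather grow their own, here is a minimal sketch of those two aggregations: hits and mean generation time per five-minute bucket. It assumes a custom Apache `LogFormat` that appends `%D` (time taken to serve the request, in microseconds), for example `LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed`; the format nickname and the function name `five_minute_summary` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Assumes lines end with %D, the request service time in microseconds, e.g.
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
TIMED = re.compile(r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ (?P<usec>\d+)$')

BUCKET_SECONDS = 5 * 60  # five-minute intervals


def five_minute_summary(lines):
    """Map each five-minute bucket start (UTC) to (hits, mean service time in ms)."""
    hits = defaultdict(int)
    total_ms = defaultdict(float)
    for line in lines:
        m = TIMED.match(line.rstrip("\n"))
        if not m:
            continue  # ignore lines without a trailing %D field
        ts = int(datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").timestamp())
        bucket = ts - ts % BUCKET_SECONDS  # floor to the start of the interval
        hits[bucket] += 1
        total_ms[bucket] += int(m["usec"]) / 1000.0
    return {
        datetime.fromtimestamp(b, tz=timezone.utc): (hits[b], total_ms[b] / hits[b])
        for b in sorted(hits)
    }
```

Because buckets are keyed by timestamp, files can be fed in any order, and results from several servers can be merged by summing hits and total milliseconds per key.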
Analog is neither pretty nor user-friendly. The configuration file is touchy and somewhat arcane, and its command-line parameter conventions are non-standard. However, analog generates 44 different reports, including time breakdowns from annual down to my desired five-minute interval, reports for successes, failures, redirects, and other interesting outcomes, and a processing time report with fine resolution. It can read compressed log files, and it has no problem processing files out of chronological order.
Most importantly, analog is blazingly fast. It chewed through my 20 million lines of compressed Apache logs in six minutes. Its throughput seems to be limited more by I/O than by CPU, though as a single-process, single-threaded application, analog will only tax one of your CPU cores. If you find CPU to be a limiting factor on a multi-core system, you might try decompressing the files with gzip and piping the output to analog. That moves decompression into a separate process, and therefore potentially onto a separate CPU core, but I don't know whether it would speed things up much.
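Here is a minimal sketch of that two-process arrangement in Python, with a simple line counter standing in for analog (this doesn't show analog's own options for reading piped input). It assumes the `gzip` binary is on your PATH; `count_lines_via_gzip` is a hypothetical name. Whether it helps depends on whether the consumer, not the disk, is the bottleneck.

```python
import subprocess


def count_lines_via_gzip(path: str) -> int:
    """Count lines in a gzipped log while an external `gzip -dc` process decompresses it.

    Decompression runs in its own process (so the OS may schedule it on
    another core) while this process only reads plain text from the pipe.
    Assumes the `gzip` binary is on PATH.
    """
    with subprocess.Popen(["gzip", "-dc", path], stdout=subprocess.PIPE) as proc:
        count = sum(1 for _ in proc.stdout)  # replace with real per-line parsing
    # Leaving the with-block closes the pipe and waits for gzip to exit.
    if proc.returncode != 0:
        raise RuntimeError(f"gzip exited with status {proc.returncode}")
    return count
```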
I'm still not entirely pleased with this solution. I would prefer a solution that was a little more intuitive, and a little easier to customize. Analog has plenty of knobs to turn, but there is no built-in extension mechanism, so it makes me work pretty hard to pull out custom metrics.
I would love to hear what other folks are using to analyze their Apache logs. How do you get operational intelligence? Are you using remote logging? Shoot me an email or leave a comment.
2011-06-12
Occam's Moving Parts
As an architect of complex applications, I spend my day aggressively applying Occam's Razor, attempting to simplify large systems by removing as much as possible. But the nature of the work is such that the system can never be truly simple. No matter how much I try to simplify, I am left with that feeling that there are too many moving parts.
As a geek, I apply a systems approach to almost everything in my life. I have a system for preparing meals, a system for loading the dishwasher, a system for folding my underwear. I can't perform an activity more than once without thinking about optimizing and systemizing it somehow. I am always looking for patterns, and I am always looking for that piece that just doesn't fit.
This blog is intended to be a collection of my observations and ponderings on the systems of the world, particularly but not exclusively those in the technology and business realms. What are the moving parts and how do they fit together? How can we apply Occam's Razor to them? Which parts can be removed, and which parts are essential?
Like most of my writing, I expect to bore almost everyone, but hopefully fascinate and engage a few people.