The Pokayoke Guide to Developing Software


A good project starts with a need. It’s nice if it’s a big need – that way you have a lot of potential customers – but it’s much more important that it’s an acute need. Users should be hungering to fill this need, so that when they find out about your product, they feel compelled to use it. If you fill a desperate need of one person, you have at least one dedicated customer; if you fill a kind of theoretical need for 6 billion, you could easily end up with none. And since people are often alike, filling a need for one person usually fills a need for many others – or can easily be adapted to do so.

It’s important that you feel this need yourself. Ideally, it’s a need that you have, borne of your own experience. For example, you might be desperately searching for someone to date you. Second-best is if you can go out and try living the lifestyle that inspires that need. For example, if you’re happily married, you might try asking your spouse for a pass so that you can go out and desperately try to find a date. It’s not really the same, but at least it’s something. At the very least, you should sit and watch people who have this need and be able to empathize with them. Go be your single friend’s wing-person and watch them try to find someone.

Of course, it is possible for your need to be too idiosyncratic. Sometimes people will be so in love with an idea that they’ll pretend to have a need for it. You want to make sure it’s a genuine need you’re filling and one good way to do that is to make sure you can find at least one stranger who feels the need as acutely as you do.

Example time: I worked on a site that provided people with a list of interesting and funny things to look at. For most office workers, this is a pretty acute need – offices are boring and you really can only sit at your computer and look at things, so you’re desperate for something interesting to look at to break the tedium. By contrast, my friend worked on a site that let you look up various government things that were happening around you (new liquor permits getting approved, people getting arrested, cars getting towed, etc.). You can come up with lots of stories about why this is interesting or why people might want to know this sort of thing, but there’s no real acute need that this site fills. Despite the fact that my friend did a much better job than I did, the site I was working on became vastly more popular than his.


But a need is not enough – you also need an idea to meet the need. Look at your idea objectively for a second. Does it really seem like it will meet the need? Most bad ideas are bad because they don’t really do that. You want to work forwards from the need to the idea, not backwards from your idea toward some sort of justification. The government data site I mentioned suffered from this problem – government data is really cool and providing people with an easy way to search through it seems like a really cool idea. And once you’re in love with that idea, it’s easy to come up with needs that it might fill. But you’re just coming up with justifications. It’s not a direct way of addressing any one need. And it’s always better to nail one need than to kind of fill two.

This isn’t to say that one idea can’t solve multiple needs. Great ideas do. But they genuinely solve them. They’re direct and sensible solutions to the problem, not just ways to shoehorn different needs into justifying an idea you’re already fond of.

Take the iPhone for example. You might say “What need does the iPhone solve? Steve Jobs just came up with a really good generic idea and then it happened to be useful to fill all sorts of needs.” But that’s not true at all. When the iPhone was launched, Jobs insisted it filled three needs: it was a widescreen video iPod, a vibrant Internet communicator, and a phone that’s fun to use. Let’s take the one of these that seems least like the iPhone. What would you need to just make a great widescreen video iPod? Well, you’d need a big, wide screen that takes up the whole device and a long-lasting battery. You’d also need some kind of input mechanism, but how do you do that when the screen takes up the whole device? Well, you have to make the screen the input mechanism. But now you have a brick about the size of your phone sitting in your pocket next to your phone. You really ought to combine them. So why not use the touchscreen to provide the interface to a phone that’s fun to use? And now that you have a big touchscreen and a wireless connection, it seems silly not to be able to use it to access the Internet… and you’re back at the iPhone. Even Steve Jobs wasn’t good enough to sell a good idea that doesn’t fill a real need.

Once you have a basic idea, you don’t need to go into a ton of detail about it. But since you’re the kind of creative person that likes coming up with ideas, you will anyway. You’ll constantly come up with all sorts of cool features or add-ons or uses and whatnot. These are not important, which means that they’ll distract you unless you do something with them. So put them all in a Lenin Document. A Lenin Document is just a description of what the maximalist version of your idea will look like, starting from the core features (it will be able to make phone calls) and working out toward the more obscure (it’ll have an app that will let you control your toaster from bed!).

You’ll probably never look at this document again, but all the good ideas you and your colleagues come up with will stop harassing you so much once you have a safe place to write them down.


Oh wait, what colleagues? You’ll also need to put together a team. When hiring someone, you want to ask three key questions: Are they smart? Can they get stuff done? Can you work with them?

It’s tempting to skimp on these, e.g. by hiring someone who meets two out of three. But it’s a big mistake. Someone who’s smart but doesn’t get stuff done should be your friend, not your employee. Even if you don’t hire them, you can still talk your problems over with them while they procrastinate on their existing job. Someone who gets stuff done but isn’t smart is inefficient: non-smart people are always doing things the hard way and smart people can’t bear to watch them do it and are always taking time off of their real jobs to go over and help. Someone you can’t work with, you really can’t work with. It’s always tempting to say “well, it’s just work, we don’t have to be friends”, but work is hard and if you don’t feel like you can honestly communicate with someone, they end up doing the wrong thing and you don’t correct them and then they just end up sitting in a corner somewhere not doing anything useful.

The traditional programmer hiring process consists of: a) reading a resume, b) asking some hard questions on the phone, and c) giving them a programming problem in person. I think this is a terrible system for hiring people. You learn very little from a resume and people get real nervous when you ask them tough questions in an interview. Programming isn’t typically a job done under pressure, so seeing how people perform when nervous is pretty useless. And the interview questions usually asked seem chosen just to be cruel. How many of the people asking these questions could actually answer them the first time they heard them?

Instead, just try to answer the three questions. To find out if they can get stuff done, ask what they’ve done. If someone can actually get stuff done, they should have done so by now. If someone’s really good at getting stuff done, they wouldn’t have been able to avoid it. It’s hard to be a good programmer without some previous experience and these days anyone can get some experience by starting or contributing to a free software project. So just request a code sample and a demo and see whether it looks good. You learn an enormous amount really quickly, because you’re not watching them answer a contrived interview question, you’re seeing their actual production code. Is it concise? clear? elegant? usable? Is it something you’d want in your product?

To find out whether someone’s smart, just have a casual conversation with them. Do everything you can to take the pressure off: meet at a cafe, make it clear it’s not an interview, do your best to be casual and friendly. Under no circumstances should you ask them any standard “interview questions” – just chat with them like you would with someone you met at a party. (If you ask people at parties to name their greatest strengths and weaknesses or to estimate the number of piano tuners in Chicago, you’ve got bigger problems.) It’s pretty easy to tell whether someone’s smart in casual conversation. We constantly make judgments about whether the people we meet are smart, just like we constantly make judgments about whether the people we see are attractive.

But if you’re still not sure, look at three things. First, do they know stuff? Ask them what they’ve been thinking about and probe them about it. Do they seem to understand it in detail? Can they explain it clearly? (Clear explanations, à la Feynman, are a sign of genuine understanding.) Do they know stuff about the subject that you don’t? Second, are they curious? Do they reciprocate by asking questions about you? Are they genuinely interested or just being polite? Do they ask follow-up questions about what you’re saying? Do their questions make you think? Third, do they learn? At some point in the conversation, you’ll probably be explaining something to them. Do they actually understand it or do they just nod and smile? There are people who know stuff about some small area but aren’t curious about others. And there are people who are curious but don’t learn: they ask lots of questions but don’t really listen. You want someone who does all three.

Finally, figure out if you can work with them by just hanging out with them for a bit. Many brilliant people can seem delightful in a one-hour conversation, but their eccentricities become grating after a couple hours. So after you’re done chatting, invite them along for a meal with the rest of the team or a game at the office. Again, keep things as casual as possible. The point is just to see whether they get on your nerves.

If all that looks good and you’re ready to hire someone, do one last sanity check to make sure you haven’t been fooled somehow: ask them to do part of the job. Usually this means picking some small and separable component you expect to need and asking them to write it. (If you really insist on seeing someone working under pressure, give them a deadline.) If necessary, you can offer to pay them for the work, but most programmers don’t mind being given a small task like this as long as they can open source whatever they did when they’re done. This test doesn’t work on its own, but if someone’s passed the first three parts, it should be enough to prove they didn’t trick you and can actually do the work they say they can.

Now it’s tempting to say “OK, well why don’t we try hiring you for a month and see how it goes.” This doesn’t work. First, it makes the person you hire feel like they’re on eggshells the whole time, constantly having to prove themselves, which is cruel and counterproductive (the stress and fear make them less productive). Second, if you can’t bear to say no after a small project, you also won’t be able to after a month and then you’ve just ended up hiring someone who isn’t good enough. Better to just say no and err on the side of getting better people.


Now that you have your team, it’s time to actually do some work. It’s tempting to just dive in and start building your big dream (complete with the part that lets you make toast from bed). But this is a huge waste. You don’t want to do the most you can, you want to do the least you can. Here’s how.

To work, every idea depends on certain hypotheses about the world; if the hypotheses aren’t true, our idea won’t succeed. Let’s say you work at an airline and the need you’ve identified is that people hate waiting in line to board and your idea for solving it is that they can buy a $5 “Early Board” ticket when they check in to get called to board the plane first. Now this idea depends on several hypotheses, for example: “Our customers want to board the plane early.”

You’ll want to write out these hypotheses and pick the most important. Let’s say that “Our customers want to board the plane early” is the most important. Now remember, if this hypothesis is false, all the work we’ve done will be wasted. So let’s do as little work as we can until we’ve proven that it’s true.

So what’s the minimum necessary to test it? The original term for this is Minimum Viable Product or MVP, but this term has become a buzzword hijacked by people who don’t really understand it. Most people would say the minimum viable product for this idea is a real bare-bones system that just lets you pay an extra $5 at checkout, maybe prints an extra letter on your boarding pass, and instructs all the gate agents to call people with that letter up first. Pretty easy, right?

Maybe, but it could be way easier. The truly minimal way to test this hypothesis is just to add a button to one of the checkout screens that says “Click here to board first.” When someone clicks it, an error message pops up saying “Sorry, our ‘Early Board’ program isn’t available.” And you measure how many people press the button. If a lot of people press it, then clearly people do want to board early. If nobody presses it, then there’s no demand for the product.

But how many presses are enough? It’s very easy to come up with justifications for any number after the fact. “Oh, a thousand people pressed the button,” you’ll say. “That’s huge! That’s ten times as many as use our deluxe bag check service!”

“No it’s not,” replies your arch-enemy. “That’s a huge flop. That’s half as many as used the elite pre-screen service.”

You can avoid these arguments by just picking a number in advance that you and your arch-enemy agree on. You both sign off on it, agreeing that if the result is above the number the hypothesis has been proven, and if it’s below the number it has been disproven.

But what sort of number should you pick? Actually counting the literal number of button-presses isn’t a very good idea. It’s known as a vanity metric. Let’s say only one out of a hundred people actually want to board the plane early, but your test happens to run during Christmas break, when three times as many people are flying as usual. Well, your button easily vaults over the two-thousand-person goal you set for it, but that’s not because the button is so popular – it’s because so many extra people were flying that week.

Instead, you want to measure an innovation metric, a number that’s independent of everything except the thing you’re testing. In this case, we’d want to measure the percentage of people who clicked the button. Let’s say your goal is that 3% of everyone who saw the button clicked it. That’s a number that won’t shoot up just because a lot of extra people are traveling that week.

Of course, it might go down because Christmas travelers are less savvy than your usual travelers. So you might want to adjust your metric further and say your goal is for 3% of all frequent travelers to click the button. That’s a metric that will stay stable even if a lot of occasional travelers happen to be flying that week.
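To see why the rate is the better metric, here is a minimal Python sketch with made-up numbers: assume 3% of frequent travelers and 1% of occasional travelers press the button, and that the Christmas week brings five times as many occasional travelers. The `Impression` record and the `week` helper are hypothetical.

```python
from typing import NamedTuple

class Impression(NamedTuple):
    frequent: bool  # hypothetical flag, e.g. from the loyalty program
    clicked: bool   # did this traveler press the button?

def click_count(log):
    # Vanity metric: rises with traffic even when demand is unchanged.
    return sum(i.clicked for i in log)

def click_rate(log, frequent_only=False):
    # Innovation metric: share of travelers who saw the button and pressed it.
    seen = [i for i in log if i.frequent or not frequent_only]
    return sum(i.clicked for i in seen) / len(seen) if seen else 0.0

def week(frequent, occasional):
    # Made-up behaviour: 3% of frequent and 1% of occasional travelers click.
    log = [Impression(True, n < frequent * 3 // 100) for n in range(frequent)]
    log += [Impression(False, n < occasional // 100) for n in range(occasional)]
    return log

normal = week(frequent=50_000, occasional=50_000)
christmas = week(frequent=50_000, occasional=250_000)

print(click_count(normal), click_count(christmas))                    # 2000 4000
print(round(click_rate(normal), 4), round(click_rate(christmas), 4))  # 0.02 0.0133
print(click_rate(normal, frequent_only=True),
      click_rate(christmas, frequent_only=True))                      # 0.03 0.03
```

The raw count doubles and the overall rate sags, but the frequent-traveler rate stays put, which is the property you want from an innovation metric.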

You can even go further and develop cohorts. A cohort is a group of people chosen in advance. For example, you might pick out a group of specific frequent travelers in advance and only show them the button. That way, there’s no way an influx of new customers can possibly affect your test – they’ll never see it, since you’ve already picked out the specific existing customers who will.

You may also want to develop a control. Perhaps adding another button makes people less likely to buy a (much more expensive) seat upgrade. So take your pool of people picked in advance and randomly divide them in half. Half will get the button and the other half won’t. Then you can compare metrics between the two halves to see if adding the button changed anything. Perhaps 4% of the experiment group bought an upgrade but 8% of the control group did – that would be a difference you could factor into your future planning.
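The cohort-and-control setup can be sketched in Python as below, assuming customer IDs are strings and that recording the random seed is enough to reproduce the split later. `split_cohort` and `upgrade_rate` are hypothetical helpers, not part of any library.

```python
import random

def split_cohort(customer_ids, seed):
    """Randomly divide a cohort chosen in advance into experiment and control halves."""
    ids = sorted(customer_ids)   # fixed starting order, so the seed alone decides the split
    rng = random.Random(seed)    # record the seed to reproduce the assignment later
    rng.shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])

def upgrade_rate(group, upgraded):
    """Fraction of a group that bought a seat upgrade during the test."""
    return len(group & upgraded) / len(group)

cohort = {f"FT{n:04d}" for n in range(1000)}  # hypothetical frequent-traveler IDs
experiment, control = split_cohort(cohort, seed=2024)
# Only `experiment` sees the button. Afterwards, compare
# upgrade_rate(experiment, upgraded) with upgrade_rate(control, upgraded).
```

Because membership is fixed before the test starts, new customers who arrive mid-test never enter either group.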


Once you’ve identified a hypothesis, a minimal way to test it, and a clear set of metrics for evaluating it, it’s time to actually build it. You should start by picking a product owner. This is the “Steve Jobs” of your product – they’re empowered to sign off on every detail to make sure the whole thing coheres.

You should write a card (this can be a physical 3x5 card or a task in some kind of task management system like Asana) describing your proposed experiment and the metrics you’ll use for evaluating it:

Select a cohort of frequent travelers and divide them into an experiment and a control group.

As a member of the experiment group, when I check in for my flight I should see a button offering me a chance to board the plane first. If I click it, I should get an error saying this service isn’t currently available.

This is to test the hypothesis that our customers want to board early. We’ll consider the hypothesis proven if more than 2% of the experimental group presses the button. We’ll also monitor their purchase of other upgrades and their check-in completion rate to make sure introducing the button doesn’t have any severe adverse effects.

Note that the first paragraph decides who gets experimented on. The second is a story about a change to a user’s experience. And the third paragraph explains why we’re testing and what metrics we’ll look at.
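If your task system allows structured fields, the card can also be captured as data so the verdict is computed from the threshold agreed in advance. This is a minimal Python sketch; the field names and the guardrail tolerances are hypothetical, since the card only says “severe adverse effects” without giving numbers.

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentCard:
    cohort: str        # paragraph 1: who gets experimented on
    story: str         # paragraph 2: the change to the user's experience
    hypothesis: str    # paragraph 3: why we're testing...
    threshold: float   # ...and the success rate agreed in advance
    guardrails: dict = field(default_factory=dict)  # metric -> largest tolerated drop vs. control

    def verdict(self, experiment, control):
        # Guardrails first: a severe drop relative to control overrides everything.
        for metric, max_drop in self.guardrails.items():
            if control[metric] - experiment[metric] > max_drop:
                return f"adverse effect on {metric}"
        return "proven" if experiment["press_rate"] > self.threshold else "disproven"

card = ExperimentCard(
    cohort="Frequent travelers, split into experiment and control",
    story="Experiment group sees an early-boarding button that shows an error",
    hypothesis="Our customers want to board early",
    threshold=0.02,  # "more than 2% of the experimental group"
    guardrails={"upgrade_rate": 0.01, "checkin_completion": 0.02},  # hypothetical tolerances
)

control = {"upgrade_rate": 0.052, "checkin_completion": 0.975}
print(card.verdict({"press_rate": 0.031, "upgrade_rate": 0.05,
                    "checkin_completion": 0.97}, control))  # proven
```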

This will go into a stack of cards (or an online todo list) sorted by priority, with the most important hypotheses to test at the top. Your designer, when they’re done with their current task, will pull a card from the top of the pile. They will then work with the product owner to design what this experience should look like (where do you put the button? what exactly does it say?). Once the product owner has signed off on it, they’ll hand the card to a programmer and work with them to implement the design.

Practical problems with implementation or experience with actually using it once implemented may cause them to revise the design a couple times, and maybe they bring the product owner in for more feedback on their revisions.


It’s good practice to write automated tests for your software as you’re developing it, so that people can easily know if they’ve broken part of it later. When you think you’re finished, you can run the automated tests to make sure they all pass. You want to make sure the automated tests run against both the control and all the experiment arms, of course.
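A minimal sketch with Python’s built-in `unittest`, assuming a hypothetical `checkin_screen` function that takes the user’s experiment arm. `subTest` runs the shared assertions once per arm, so a regression in either arm is reported separately.

```python
import unittest

EARLY_BOARD = "Click here to board first"

def checkin_screen(in_experiment):
    """Hypothetical check-in screen: the experiment arm adds the Early Board button."""
    buttons = ["Continue"]
    if in_experiment:
        buttons.append(EARLY_BOARD)
    return buttons

def press_early_board():
    # Per the card, the button leads to an error: the service doesn't exist yet.
    return "Sorry, our 'Early Board' program isn't available."

class CheckinTests(unittest.TestCase):
    def test_checkin_works_in_every_arm(self):
        for arm in (False, True):  # control, then experiment
            with self.subTest(in_experiment=arm):
                self.assertIn("Continue", checkin_screen(arm))

    def test_button_appears_only_in_experiment(self):
        self.assertNotIn(EARLY_BOARD, checkin_screen(False))
        self.assertIn(EARLY_BOARD, checkin_screen(True))

    def test_button_shows_unavailable_error(self):
        self.assertIn("isn't available", press_early_board())
```

Run it with `python -m unittest` followed by the path to the test file.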

Programming is a mentally strenuous job, so it’s often more efficient to have programmers work in pairs, with one typing and another observing and commenting. (Sometimes it’s fun to have one person in the pair writing the tests, then sliding the keyboard over and having the other person write the code that makes them pass and the next round of tests.)

If you don’t have a partner working with you the whole time, you should at least pull someone over to evaluate the changes you made. You should always read the diff before committing code to the project (for example, by running git diff HEAD).


If you’re building a network service (e.g. a web application), you should design it as a Twelve-Factor Application. A Twelve-Factor Application follows twelve principles:

  1. The entire application’s code is stored in a single revision control repository. If you have multiple repositories for different parts of the software, you should consider them to be separate applications that treat each other as services. If you have multiple applications in a single repository, you should factor out whatever they both use as a library they both depend on and then split them into two codebases.

    For your revision control system, you will probably want to use git, because it’s the most featureful and most popular.

  2. All your dependencies should be explicitly declared. In Ruby you can do this with a Gemfile, in Python with requirements.txt, etc. Locally, you should use a tool like bundler or virtualenv to isolate your environment to make sure you aren’t using any undeclared dependencies.

  3. All configuration values should be stored as environment variables. This includes anything you’d be afraid of making public, like passwords or secret keys, as well as anything that might be different from deploy to deploy, including the locations of databases or the administrator email address.

  4. All backing services (like databases or in-memory caches) are treated as attached resources. No distinction is made between local and third-party services; they’re all accessed over the network through a URL or other locator stored in the configuration.

  5. Code is deployed in three separate stages: build (in which the software is compiled and built), release (in which it’s combined with the configuration environment and put onto the appropriate servers), and run (in which it’s executed). These stages should be completely isolated – the server can’t change its configuration at runtime, since the release stage has already passed. And the release process can’t edit the software, since the build stage has already passed.

  6. The application should execute as a series of stateless processes that share nothing – any process should be able to be killed at any time. This means any state needs to be stored in one of the backing services.

  7. The application should be completely self-contained and contact the outside world through an IP port (designated by the $PORT environment variable). It shouldn’t be expecting to live inside some sort of larger process.

  8. The application should be made up of various process types and be able to scale by starting more instances of these process types. For example, if there’s a lot of web traffic, you should be able to handle it by starting more instances of the web process type.

  9. Processes should be disposable – you should be able to start them and stop them at a moment’s notice, without any harm.

  10. The gap between development and production should be kept small – the same backing services, dependencies, and team should be used in both places. Development is just another deploy of the application with a slightly different config.

  11. Logs are just a stream of events written unbuffered to stdout. It’s not the application’s job to make sure they get to the right place; that’s the job of the infrastructure.

  12. Administration tasks should be run as one-off processes.
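Several of these factors show up directly in application code. Below is a minimal Python sketch using only the standard library: configuration read from the environment (factor 3), a backing service located by URL (factor 4), binding to `$PORT` (factor 7), and logs written to stdout (factor 11). The `DATABASE_URL` and `ADMIN_EMAIL` variable names and the fallback port 5000 are assumptions for illustration.

```python
import logging
import os
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Factor 11: logs are an event stream on stdout (StreamHandler flushes each record).
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("app")

def load_config(env=os.environ):
    """Factor 3: every deploy-specific value comes from the environment."""
    return {
        "port": int(env.get("PORT", "5000")),  # factor 7: bind to $PORT (5000 is an assumed default)
        "database_url": env["DATABASE_URL"],   # factor 4: hypothetical name; fail fast if missing
        "admin_email": env.get("ADMIN_EMAIL", ""),
    }

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        log.info(fmt, *args)  # request logs go to stdout instead of stderr

def main():
    config = load_config()
    log.info("listening on port %d", config["port"])
    HTTPServer(("", config["port"]), Hello).serve_forever()
```

Nothing here knows which deploy it is running in; development and production differ only in the environment they pass in. In a deploy, the web process type would simply call `main()`.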

You should also use a 12-factor hosting system, like Heroku, since it will force you to obey these constraints.

You should also introduce a Chaos Monkey to further ensure the robustness of your system. A Chaos Monkey is an automated process that deactivates different elements of a system to ensure they are robust in response to outages. For example, processes are meant to be disposable, so a Chaos Monkey would automatically kill randomly-selected production processes. This both provides an incentive for developers to avoid depending on processes being persistent and, if they make a mistake and do it anyway, catches the mistake early rather than later when it compounds with others and causes a catastrophic failure.
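Netflix’s Chaos Monkey terminates cloud instances; the sketch below is only a local Python analogue of the idea. It assumes worker processes are plain OS processes (a hypothetical sleeping script stands in for a web process) and that a supervisor restarts whatever dies.

```python
import random
import subprocess
import sys

# Hypothetical stand-in for a web process: a Python interpreter that just sleeps.
WORKER = [sys.executable, "-c", "import time; time.sleep(30)"]

def start_pool(size):
    return [subprocess.Popen(WORKER) for _ in range(size)]

def chaos_monkey(pool, rng=None):
    """Kill one randomly chosen process, the way an outage would."""
    victim = (rng or random).choice(pool)
    victim.kill()
    victim.wait()
    return victim

def supervise(pool):
    """Replace any dead process; disposable processes (factor 9) make this safe."""
    for i, proc in enumerate(pool):
        if proc.poll() is not None:  # poll() returns the exit code once a process has died
            pool[i] = subprocess.Popen(WORKER)
    return pool
```

If the application kept state inside one of those processes, the restart would lose it, which is exactly the mistake the monkey is meant to surface early.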


Developers commit their code to the revision control repository once the changes are made and all the tests pass.

If their code requires changes to one of the backing services (e.g. a change to the database schema), this should be done through migrations. A migration describes how to make and rollback such a change. When a new version of the software is deployed, any un-run migrations are run, synchronizing its version of the backing service to the one the code depends on. Upon rolling back, the migrations are also rolled back. This makes sure the backing services and the code are always in sync.
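Rails provides migrations out of the box; the sketch below only illustrates the idea in Python with the standard `sqlite3` module, storing the schema version in SQLite’s `user_version` pragma. The tables, SQL, and helper names are hypothetical.

```python
import sqlite3

# Ordered (version, up, down) steps; each down exactly undoes its up.
MIGRATIONS = [
    (1, "CREATE TABLE bookings (id INTEGER PRIMARY KEY, passenger TEXT NOT NULL)",
        "DROP TABLE bookings"),
    (2, "CREATE TABLE early_board (booking_id INTEGER REFERENCES bookings(id))",
        "DROP TABLE early_board"),
]

def current_version(db):
    return db.execute("PRAGMA user_version").fetchone()[0]

def migrate(db, target):
    """Run un-run migrations up to `target`, or roll back the ones above it."""
    version = current_version(db)
    for num, up, _down in MIGRATIONS:               # deploy: oldest first
        if version < num <= target:
            db.execute(up)
            db.execute(f"PRAGMA user_version = {num}")
    for num, _up, down in reversed(MIGRATIONS):     # rollback: newest first
        if target < num <= version:
            db.execute(down)
            db.execute(f"PRAGMA user_version = {num - 1}")  # assumes consecutive numbers
    db.commit()

db = sqlite3.connect(":memory:")
migrate(db, target=2)  # deploying code that expects schema version 2
migrate(db, target=1)  # rolling that deploy back
```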

Almost all new code is committed to the main line of development (aka trunk or HEAD). This avoids the painful task of merging different branches of development later. Since any big change should be implemented as an experiment, if a change is unfinished or unready, it can be easily turned off by keeping most users out of the experiment. When the code is ready, more people can be added to the experimental group and the toggle can be eventually removed.

Once a commit is made to the repository, the repository should automatically build, release, and run the code in a fresh testing environment and run the automated tests against it. It should then try running the code and the migrations against a full copy of the production backing services (read: database) and try applying and rolling back the migrations, making sure the tests pass either way.
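The migration half of that pipeline step might look like the Python sketch below, assuming a SQLite database stands in for the production backing service (a real pipeline would restore a copy of the production database in its own engine). `check_migrations` and its arguments are hypothetical; `Connection.backup` is the standard `sqlite3` method for copying a live database.

```python
import sqlite3

def schema(db):
    return sorted(db.execute("SELECT type, name, sql FROM sqlite_master").fetchall())

def check_migrations(production, migrations, run_tests):
    """Apply and roll back (up, down) migrations on a throwaway copy of `production`."""
    copy = sqlite3.connect(":memory:")
    production.backup(copy)            # full copy; production itself is never touched
    before = schema(copy)
    for up, _down in migrations:
        copy.execute(up)
    run_tests(copy)                    # tests must pass against the migrated data...
    for _up, down in reversed(migrations):
        copy.execute(down)
    run_tests(copy)                    # ...and again after rolling back
    if schema(copy) != before:
        raise RuntimeError("rollback did not restore the original schema")
    copy.close()
```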

If they all pass, it should be pushed forward to production. To make sure that broken code that somehow got past the tests doesn’t make it into production, you should have an immune system to monitor the deploy. The immune system will watch your key innovation metrics (looking at new revenue, new users, etc.) to make sure they haven’t been adversely affected by the deploy. If they are, it will automatically roll back and alert the team.
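At its core, the immune system compares key metrics before and after a deploy. A minimal Python sketch, assuming the metrics arrive as a dict of per-minute values and that a drop of more than 10% counts as a regression; the metric names, the tolerance, and the `rollback`/`alert` callbacks are hypothetical. A production version would need a statistically sounder comparison that accounts for normal fluctuation.

```python
def regressions(baseline, after_deploy, tolerance=0.10):
    """Key metrics that fell more than `tolerance` (as a fraction) below baseline.

    A metric missing after the deploy is treated as zero, i.e. as a regression.
    """
    return [name for name, before in baseline.items()
            if after_deploy.get(name, 0.0) < before * (1 - tolerance)]

def immune_system(baseline, after_deploy, rollback, alert):
    """Roll back and alert the team if the deploy hurt any key metric."""
    bad = regressions(baseline, after_deploy)
    if bad:
        rollback()
        alert("deploy rolled back; metrics dropped: " + ", ".join(bad))
    return not bad
```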

To make sure that code that hasn’t been reviewed by the product owner doesn’t make it into production, you give them control over launching the experiments. New features will initially make it into production with no one in the experiment. The product owner can then either add themselves to the experiment or turn the experiment on for everyone on a preview server to test it. QA testers can test it there as well. When everyone is happy, more people can be added to the experiment. If the metrics look good, even more can be added, until eventually 100% of users are in the experiment and the control arm can be removed from the codebase.
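One common way to implement this kind of toggle is to hash each user into a stable bucket. The Python sketch below assumes that approach; the `Experiment` class and its allowlist for the product owner are hypothetical, not any particular feature-flag library.

```python
import hashlib

class Experiment:
    """Hypothetical experiment toggle: ships with nobody enrolled, then ramps up."""

    def __init__(self, name):
        self.name = name
        self.percent = 0         # new features reach production with 0% enrolled
        self.allowlist = set()   # e.g. the product owner and QA testers

    def bucket(self, user_id):
        # Stable bucket 0-99 per (experiment, user), so raising `percent`
        # only adds users and never reshuffles who is already enrolled.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def enabled_for(self, user_id):
        return user_id in self.allowlist or self.bucket(user_id) < self.percent

early_board = Experiment("early-board")
early_board.allowlist.add("product-owner")  # the owner previews it first
early_board.percent = 5                     # then a small slice of users
```

Once `percent` reaches 100 and the metrics still look good, the toggle and the control branch can be deleted from the code.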


Some people will encourage you to have a big Hollywood-style launch, hyping the release date for months in advance before throwing it open to an appreciative world. This may work well for Hollywood – if your movie is a big hit at the box-office on opening weekend, then the movie theaters are more likely to keep showing it in the weeks to come and you get credit for being “one of the weekend’s biggest films”. But for software developers, it’s nonsensical. Your software isn’t being released in theaters, it’s available over the Web. You don’t have to worry about theaters dropping it after week one; you can keep pushing it for years, growing your userbase.

The problem with a big launch is that, unless you are perfect, launch day always reveals bugs and conceptual errors you hadn’t noticed before. Now millions of people are visiting your product and experiencing those mistakes. (Perhaps one of your mistakes is that you didn’t properly load test the service and now they’re experiencing a downed website.)

Instead, you should treat your launch as another experiment, slowly ramping up the number of people allowed inside as long as the metrics are performing well. Follow Gmail’s example: give invitations to a handful of people, test your hypotheses against their usage and feedback, and when they like it let them invite a few more. Slowly ramp up until everyone that wants an invitation has one and the whole world is inside your big experiment.

Good luck!


This document was originally written by Aaron Swartz and is an assemblage of many different ideas, his and others. The need and idea discussions were probably influenced by discussions with Paul Graham. Hiring was adapted from Aaron’s How I Hire Programmers, which was written in response to Joel Spolsky’s writing (his book on hiring is called Smart and Gets Things Done). Hypothesis is adapted from The Lean Startup and the Toyota Production System. Team and Development are based upon ideas from Extreme Programming. Architecture is obviously based around the 12-factor application. The Chaos Monkey is from Netflix. Migrations is a term from Rails. Continuous deployment with an immune system comes from IMVU via Timothy Fitz. The launch section is adapted from How to Launch Software.

If you have changes or suggestions, file an issue or a pull request.
