Wednesday, June 24, 2009

Pruning

I spent the past few hours pruning the trees along my property line--mostly across the line on the neighbor's side, but they've grown in and through my fences and developed into quite a thicket. The bushes acted as a lower story for the trees planted along the line, and looked quite nice; I didn't realize what a mess they'd become. Once I started, I realized they were choking out the bases of the big trees, and had become infested with thorny vines which had bound the whole thing together into a green mass of dead limbs and vine. Now that they're trimmed back, the trees look nicer, and the edges of my lawn are free to grow; in two weeks, I now realize, it will look a lot nicer than it has in the past couple of years.

In my project work, I regularly come across thickets of choked code which is hard to read and even harder to update. I fixed a bug yesterday which should've taken an hour, but took almost 12, because I had to write 2 new unit test suites first, making sure to cover all the old code. That took about 10 hours; the last 2 were spent brutally pruning and, finally, fixing the bug. What's left is small and tight, and will be a lot easier to maintain, since it now has a nice suite of unit tests.

It's easy to let the thicket grow. It's also easy to write tests while the classes are still small, and keep them up to date as the code evolves. Then, when a class does turn into a thicket, it can be pruned quickly and with authority, because the tests are already there.
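Here's a minimal sketch of the kind of test I mean, in JUnit; the class and the behavior it pins down are hypothetical stand-ins, not code from my project:

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Characterization tests: pin down what the code does today, so that
    // tomorrow's pruning can't silently change it. All names are made up.
    public class LateFeeCalculatorTest {

        // A stand-in for the tangled class being pruned.
        static class LateFeeCalculator {
            double feeFor(double balance, int daysOverdue) {
                return daysOverdue >= 30 ? balance * 0.10 : 0.0;
            }
        }

        @Test
        public void thirtyDaysOverdueAccruesTenPercentFee() {
            assertEquals(10.00, new LateFeeCalculator().feeFor(100.00, 30), 0.001);
        }

        @Test
        public void currentBalanceAccruesNoFee() {
            assertEquals(0.00, new LateFeeCalculator().feeFor(100.00, 0), 0.001);
        }
    }

With even a handful of these in place, the pruning itself is mechanical: cut, re-run, and a green bar says nothing important changed.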

Sunday, June 21, 2009

FitNesse for a Particular Purpose

I've been aware of Ward Cunningham's "Fit" framework (http://fit.c2.com/) for some time. I've never actually used it, because it seemed "hard" and my business analysts had their own way of doing things.

My current project introduced me to "FitNesse" (http://fitnesse.org/), a wiki which integrates the Fit framework directly into its pages, and forced me to learn to use it on real enterprise-scale projects, both interactively and as an automated test suite run as part of our continuous build process.

Boy, have I been missing out; this thing rocks!

At the core, FitNesse is a wiki. It's important to remember that, because the documentation presents it primarily as a test tool. But here's the thing: if you just type your requirements into the wiki, it more-or-less demands that you write tests, too--that is, that you express your requirements as something the developers must test. That has two huge benefits: 1) it makes sure developers speak the same language as the requirements, and 2) it forces requirements to be testable. Because it's a wiki, as developers and analysts come to a shared understanding of the requirements, either can easily tweak the pages to say what needs saying, and the tests to express those requirements succinctly.

As requirements and associated tests are developed in FitNesse pages, you can check them in to source control and run them regularly, and there's no need to back-trace tests to code to requirements--if the FitNesse tests run, you've met all the requirements they embody. It's about the shortest distance between requirements and a quality deliverable I have ever seen, and one of the lowest-overhead. Rather than checking our FitNesse pages in at the end of every edit, we have a scheduled job which checks in any changes at the end of each day.

The examples on the Fit and FitNesse web sites are simple; I encourage you to go there now and have a look. These are great for learning, but they don't express the real power of the tool.

Here's an example from my current project in all its enterprise-messy glory. This is what real enterprise application testing is about. The image below shows about 2/3 of the testing page for one scenario (click the image for a full-size rendering; it's large--about 100k, and about 3x my screen). Most of the page sets up the test conditions. Along the way, there are tests to verify that the (very complex) input data are correct; "Read Tax Rate Area", for example, generates numbers which aren't stored in any specific location--they're summed over dozens of inputs already in the database. Any heading with a question mark is an invitation to invoke the application code and see what the results are.

Binding this page to the application code requires a "fixture", which each developer writes. Fixtures are typically quite small. The set of fixtures which executes the tests on the page shown totals 596 lines of code; it took me about a day to write, including figuring out exactly how to perform the required setup. I also had to write a fixture base class, which took another day, but now that it's written, writing new complex tests will be as simple as writing this one was. This scenario and 11 others just like it run using the same fixtures and exercise the same code that runs in production. Those scenarios represent most of the edge cases in one kind of billing. There are many more: around 400 scenarios in total, and thousands of individual tests.
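To make the shape of that work concrete, here's a minimal sketch in the style of Fit's ColumnFixture. The table, class, and service names here are hypothetical, and the heading-to-method mapping shown is the stock one--our own fixtures derive getter-style names instead, as described below; a real fixture would call the production billing code rather than a stub:

    import fit.ColumnFixture;

    // Binds to a wiki table shaped roughly like:
    //   |TaxBillFixture|
    //   |parcel id|levy code|lip tax rate?|li tax rate?|
    //   |123-456  |0012     |0.0125       |0.0100      |
    // Plain headings map to public fields; headings ending in "?" map to methods.
    public class TaxBillFixture extends ColumnFixture {

        public String parcelId;   // set by the framework from the "parcel id" column
        public String levyCode;   // set from the "levy code" column

        public double lipTaxRate() {
            // A real fixture would invoke the same service code that runs in
            // production; this stub just keeps the sketch self-contained.
            return TaxRateService.lipRateFor(parcelId, levyCode);
        }

        public double liTaxRate() {
            return TaxRateService.liRateFor(parcelId, levyCode);
        }

        // Hypothetical stand-in for the production service.
        static class TaxRateService {
            static double lipRateFor(String parcel, String levy) { return 0.0125; }
            static double liRateFor(String parcel, String levy)  { return 0.0100; }
        }
    }

The fixture base class mentioned above is what keeps concrete fixtures this small: it carries the repetitive setup, so each new test page needs little more than its own fields and query methods.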

To run the test, you simply punch the Test button in the upper right-hand corner of the page. I'll walk through some of the results. First, the top of the page notes that 2 tests succeeded, 2 failed, 2 were ignored completely (usually as a result of a failure), and 2 exceptions were thrown. Each row typically represents a test, so those counts are nothing like the full population of the page--but exceptions tend to end things abruptly. Second, note the "Read Tax Rate Area" block, where two tests ran and both failed. The assumptions built into results further down the page are therefore proven wrong--the testers need to know this so they can revise their expected answers. One important thing to note is that the framework handled the display; all the fixture code does is respond to "getLipTaxRate()" and "getLiTaxRate()"--names derived from the associated headings--with a text string. The framework colored the cells and placed the "expected" vs. "actual" answers on the page.
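That name derivation is just a convention plus reflection. Column headings ending in "?" are resolved to fixture methods; the sketch below is a guess at the getter-style mapping our fixture base class might use (stock column fixtures would look for lipTaxRate() instead), not anything from the FitNesse distribution:

    // One plausible heading-to-getter mapping: "LIP Tax Rate?" -> getLipTaxRate.
    // (An assumption about our base class, not standard Fit behavior.)
    public class HeadingNames {

        static String getterFor(String heading) {
            StringBuilder name = new StringBuilder("get");
            for (String word : heading.replace("?", "").trim().split("\\s+")) {
                name.append(Character.toUpperCase(word.charAt(0)))
                    .append(word.substring(1).toLowerCase());
            }
            return name.toString();
        }

        public static void main(String[] args) {
            System.out.println(getterFor("LIP Tax Rate?")); // getLipTaxRate
            System.out.println(getterFor("LI Tax Rate?"));  // getLiTaxRate
        }
    }

Whatever the convention, the fixture's only job is to answer with a value; the comparing, coloring, and expected-vs-actual annotation all belong to the framework.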


Scrolling down, there's an exception. Yes--I made a mistake in the fixture which, under the right conditions, throws an NPE. I'll fix that in the next release. Farther down is a successful test.

That's what it looks like. The fixture code is easy to write, the requirements are expressed in a way the business people understand, and there's a man-day or less (usually far less) between describing the behavior in a FitNesse test and writing the fixture which connects the test to the functional code.

In our environment, FitNesse tests are also run automatically as part of our automated build cycle. The tests span hundreds of wiki pages. The results are consolidated into a nice punchy list with counts of successes and failures, much like the top of the first "post-run" page above. For failures, more detail is provided. The XML parsing is done by fixtures already available in CruiseControl. I found excellent help for configuring our ant build (and thus our CruiseControl build) at http://www.jroller.com/njain/entry/integrating_fitnesse_with_cruisecontrol. Thanks, Naresh!

What more could you want in a requirements/code/test/document system? Go forth and experiment. You'll be glad you did.

Wednesday, June 17, 2009

Measuring the Unmeasurable

In a previous post, I wondered how you measure yourself as a developer. The available metrics are pretty poor, overall. Consider all the ways a developer can be "good"--fast, good at translating requirements into functionality, good at finding the key algorithm, good at clear, concise code, good at predicting when she'll be done with a given bit of code... on and on. How do you measure that? The short answer seems to be: you don't. I haven't found a single metric for developer quality which is in any way objective. Then I cast back to my earlier life as an engineer--and surprise! Exactly the same situation exists there.

Sunday, June 14, 2009

What Needs Doing

I have participated in more than one fiasco. A close friend of mine (and an excellent developer) is doing so now. These are generally "successful projects" in most senses of the word, except for one key fact: they don't deliver anything new, at a cost of several tens or even hundreds of thousands of dollars.

My personal favorite, and an instructive example, is a nine-month, 6-man project to replace the public client of an n-tier, mission-critical, customer-facing application at a very large, cash-squeezed public company. I was architect for this application for several years, and I made a number of specific recommendations as I was leaving, including one that the client be refactored so that the business logic present in the client (for good architectural reasons) was separated from the display logic. We delivered the first piece of that refactor--a display controller separate from the display code itself--and recommended that it be used for all screens, not just the ones for which we piloted it. For no reason I could understand, the architect who replaced me decided to completely replace the client--including hundreds of working screens, huge chunks of interface logic, state management code, encryption and masking code for credit-card processing, terminal interface code, etc. Knowing his management team, I'm sure the work was justified in bafflegab. The project is just wrapping up, at a cost and schedule of 5 and 3 times (respectively) the original refactoring estimate. What problem were they trying to solve that could justify this kind of expense?

The original refactoring was proposed to address the fact that the client had become very rigid and spaghetti-bound. It was hard to understand and harder to modify, so that even small changes (and there were many of those in the queue) required weeks to make and test. All the senior developers understood that the source of the problem wasn't the screen display code, or the hardware or server interfaces--it was the way business logic was in-lined into the screen transition code. Once that was moved out into business logic classes and properly tied into a separate screen controller, most of the problems in the client would be resolved. Such a project was difficult but nicely bounded; we estimated that a 2-man team of our best and most experienced developers could complete it in 3 months. Instead, it took a team of 5 working for 9 months to rewrite the entire client. Now they have a nice new front-end written in Adobe Flex--and since all the Flex coding was subcontracted, nobody on the development team can maintain the client UI. I will make a prediction: the new client will be more expensive to maintain than the old one.

I hope I'm wrong. Fortunately, I don't work there anymore.

I see this kind of thing all the time. On my current project, a relatively junior engineer was given free rein for 3 months to refactor a chunk of functionality which runs once a year, to reduce its running time from 18 hours to 3 hours. The source of the requirement was the engineer in question--not the users, not the management team. Now the code is barely readable--but it sure is fast. Of course, to make up the 3 months of developer time spent, the annual process has to run about 30 times--a process which will take between 15 and 30 years. Any changes to the process will now require man-weeks of developer time instead of man-days. In short, there's almost no way to justify that kind of expenditure.

So: how are these decisions made? Why do managers allow them? What could possibly justify this kind of wasted effort? And finally: why do otherwise good designers and developers allow them to happen?