Undisclosed Recipients: March 2005

Alternative Dispose()

Visual Studio writes Dispose() methods for me that look like this:

protected override void Dispose( bool disposing )
{
    if( disposing )
    {
        if(components != null)
        {
            components.Dispose();
        }
    }
    base.Dispose( disposing );
}

I don't mind - much - since it's not like I have to look at it often. But it's a whole lot of indentation to do not very much. I figured the one really important thing it's doing there is calling base.Dispose(), and the conditionals just add a lot of structure to protect the inner clause without interfering with that final line. So I refactored it into this:

protected override void Dispose(bool disposing)
    {
    try
        {
        if (!disposing) return;
        if (components == null) return;
        components.Dispose();
        }
    finally {base.Dispose(disposing);}
    }

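I'm pretty sure the behavior is identical in the normal case, and it's just a whole lot easier on my eyes. (One difference: if components.Dispose() throws, the finally version still calls base.Dispose(), where the original would skip it.) The return/finally interactions are a little scary (the finally clause runs after the return statement, which isn't true of normally structured code) - but I'll (grudgingly) trade that scariness for the reduction in apparent complexity.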
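To make the return/finally ordering concrete, here is a minimal sketch that logs the calls instead of disposing anything. Base, Widget, Run and the Log list are hypothetical stand-ins invented for the illustration; they are not the framework classes or anything the designer generates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the framework base class.
class Base
{
    public readonly List<string> Log = new List<string>();

    protected virtual void Dispose(bool disposing)
    {
        Log.Add("base.Dispose(" + disposing + ")");
    }
}

// Hypothetical stand-in for the designer-generated class.
class Widget : Base
{
    private readonly bool hasComponents;

    public Widget(bool hasComponents) { this.hasComponents = hasComponents; }

    // Same shape as the refactored method: early returns inside try,
    // the base call in finally.
    protected override void Dispose(bool disposing)
    {
        try
        {
            if (!disposing) return;
            if (!hasComponents) return;
            Log.Add("components.Dispose()");
        }
        finally { base.Dispose(disposing); }
    }

    // Public entry point so the demo can reach the protected method.
    public void Run(bool disposing) { Dispose(disposing); }
}

static class Demo
{
    static void Main()
    {
        foreach (bool disposing in new[] { false, true })
        {
            Widget w = new Widget(true);
            w.Run(disposing);
            Console.WriteLine(string.Join(", ", w.Log));
        }
    }
}
```

In both runs the base call is the last entry in the log, even when the method leaves through one of the early returns.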

Triangulation

Here's something I wrote up a long time ago about one of my favorite programming techniques. I've since learned to call it "triangulation".

I noticed myself doing something last night that I have done hundreds or thousands of times before, following a habit that I've unconsciously adopted. It worked - again - and I decided to formalize it, to make the practice easier to apply deliberately in the future, to better be able to determine appropriate situations for its application, and to share with others.

What I was doing was as follows. I had four little triangles to draw around the outside of circles forming an irregular shape. I wrote a routine to calculate the points of one of the triangles, with values for all of the coordinates that would only work for the first case. I then duplicated this routine, and reworked it for a second triangle. I then generalized the code, writing a routine that could handle any arbitrary triangle-outside-a-circle (within the context of my application), replaced the original two routines with calls to this one, and wrote the routines for the remaining triangles.

Here's the recipe:

1. Write a specific case;
2. Derive a second specific case from the first;
3. Write a general solution;
4. Apply the general solution to the first two cases;
5. Use the general solution as needed.

and the rationale:

Write a specific case; This lets you solve the problem once without getting caught up in the abstractions. It gives you something simple to test, and a good motivation: you need this first case to work. You also need the other cases to work, but the motivation for the general case is more abstract - "softer", if you will. You can mess with values as much as you need, and it won't be messing up any other cases.

Derive a second specific case from the first; This exposes your first implementation to some stress-testing. It lets you see which parts are stable under different circumstances, and which parts need to be flexible. It gives you a second, more reflective look at the problem - because it is much easier than solving the problem in the first case, you have more of a chance to think about the problem without getting lost in the details.

Write a general solution; The bits you didn't need to change in (2) should form the basis for a common solution, with the bits you did need to change representing parameters to the routine.

Apply the general solution to the first two cases; If nothing else, this gets rid of embarrassing code bloat - but of course there is much else: you get a chance to test against already-working cases, and identify and resolve discrepancies.

Use the general solution as needed. Now that you've got it, put it to work. There may be a few remaining things to fix in the general solution - a sample size of two is, after all, unlikely to be representative of most problem populations - but these should be small, and the main work of the routine is already behind you.

Note that at every step, except (4), you have something new to test. That's good - you don't save up all the bugs for the final step. It is also a fairly even distribution of thoughtload, so you're never bored or overwhelmed.
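As a rough illustration of the recipe, here is a sketch loosely modeled on the triangle example. Everything in it is an assumption made for the illustration: the circle at (100, 100) with radius 50, the triangle sizes, screen coordinates with y growing downward, and the names RightOfCircle, AboveCircle, OutsideCircle and Same. It is not the original drawing code.

```csharp
using System;

static class Triangles
{
    // Step 1: one specific case, hard-coded for a triangle on the right
    // side of a circle centred at (100, 100) with radius 50.
    public static double[] RightOfCircle()
    {
        return new double[] { 150, 90, 150, 110, 170, 100 };
    }

    // Step 2: a second specific case derived from the first
    // (top of the same circle; y grows downward).
    public static double[] AboveCircle()
    {
        return new double[] { 90, 50, 110, 50, 100, 30 };
    }

    // Step 3: the general routine. What stayed the same becomes the body;
    // what changed between the two cases becomes the parameters.
    public static double[] OutsideCircle(double cx, double cy, double r,
                                         double angle, double halfWidth, double height)
    {
        double dx = Math.Cos(angle), dy = Math.Sin(angle);   // outward direction
        double baseX = cx + r * dx, baseY = cy + r * dy;     // middle of the base, on the circle
        return new double[]
        {
            baseX + halfWidth * dy, baseY - halfWidth * dx,  // first base corner
            baseX - halfWidth * dy, baseY + halfWidth * dx,  // second base corner
            baseX + height * dx,    baseY + height * dy      // apex, pointing away from the circle
        };
    }

    // Step 4 helper: compare point lists with a small tolerance.
    public static bool Same(double[] a, double[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; ++i)
            if (Math.Abs(a[i] - b[i]) > 1e-9) return false;
        return true;
    }

    static void Main()
    {
        // Step 4: the general routine has to reproduce both working cases.
        Console.WriteLine(Same(RightOfCircle(), OutsideCircle(100, 100, 50, 0, 10, 20)));
        Console.WriteLine(Same(AboveCircle(), OutsideCircle(100, 100, 50, -Math.PI / 2, 10, 20)));
    }
}
```

Once both checks pass, the two hard-coded routines can be replaced with calls to OutsideCircle, and the remaining triangles (step 5) are just more calls with different angles.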

Crocodile Tears

Florida state Sen. Nancy Argenziano spoke at the Capitol in Tallahassee yesterday about her decision to vote against intervening in the case of Terri Schiavo, whose feeding tube was removed but reinserted twice in the past after legal and legislative efforts to keep her alive.

When I saw this in this morning's U-T, I thought: Why can't these supposedly compassionate Republicans spare a little compassion for the truly living? How about shedding a tear for the homeless, for the hungry, for the downtrodden in our country? How about sobbing for the thousands of Iraqis terrorized by our unjust war and occupation?

How about some real compassion, and how about letting that compassion guide your actions?

Crying on command to energize the base is contemptible.

A Nice TDD Experience

Sometimes, Test-Driven Development (TDD) works great; other times I find myself slipping away from it. This was one of the great times.

I wanted a function to return the longest common substring between two strings. Simple, easy to test. I started with this:

[Test] public void LCS_1() 
    {Assert.AreEqual("", Matcher.LongestCommonSequence("", ""));}

Which was, of course, very easy to make pass:

public static string LongestCommonSequence(string s1, string s2)
    {return "";}

The next step was to make it return the first string if the strings were the same (or, actually, even if they weren't - the test didn't require a comparison):

[Test] public void LCS_2() 
    {Assert.AreEqual("a", Matcher.LongestCommonSequence("a", "a"));}

Which was also very easy to make pass:

public static string LongestCommonSequence(string s1, string s2)
    {return s1;}

Notice a certain amount of duplication in the tests? I did:

private void LCS(string want, string s1, string s2)
    {Assert.AreEqual(want, Matcher.LongestCommonSequence(s1, s2));}
[Test] public void LCS_1()    {LCS("", "", "");}
[Test] public void LCS_2()    {LCS("a", "a", "a");}

Unfortunately, I didn't keep all the intermediate steps. But of course I kept the intermediate tests! Tests are forever:

[Test] public void LCS_3()    {LCS("", "t", "a");}
[Test] public void LCS_4()    {LCS("a", "ta", "a");}
[Test] public void LCS_5()    {LCS("ta", "ta", "ta");}
[Test] public void LCS_6()    {LCS("tgggtaa", "ccccctgggtaacccccc", "gggggggggtgggtaagg");}

This resulted in a working, slow version of the function. I was iterating over all possible lengths (my realistic data set had 700+ character strings), over all possible positions. With a working function and a suite of tests to keep it working, it was a simple matter to break out of the outer loop as soon as it failed (if a length of 35 didn't work, there was no point in looking for lengths of 36 or greater), to break out of the inner loop as soon as it succeeded (there may be multiple 35-character substrings in common, but we only need one), and to start each inner loop at the starting point of the previous result (if there were no 34-character matches before the 92nd character, there will be no 35-character matches there, either). Here's the final function, which passes all tests and is fast enough:

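// Needs: using System.Text.RegularExpressions;
public static string LongestCommonSequence(string s1, string s2)
    {
    string result = "";
    int start = 0;

    // Try each length in turn; stop at the first length with no match.
    for (int length = 1; length <= s1.Length; ++length)
        {
        // Resume where the previous match began: nothing earlier can match a longer length.
        for (int i = start; i < s1.Length - length + 1; ++i)
            {
            string snippet = s1.Substring(i, length);
            // Give up on this length once the snippet holds anything but bases.
            if (!Regex.Match(snippet.ToLower(), "^[acgt]*$").Success) break;
            if (s2.IndexOf(snippet) >= 0)
                {
                start = i;
                result = snippet;
                break;  // one match per length is enough
                }
            }
        if (result.Length < length) break;
        }
    return result;
    }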

As I look at this, I see that I should probably extract the regex bit into a nice function along the lines of IsJustBases(), and I could squeeze a little more efficiency out of this by ToLower()ing the input parameters instead of the snippet, and I could probably force a bug by mixing cases across s1 and s2. All the changes will be easy and safe, thanks to the existing tests. I can't justify the ToLower() change based on performance, which is adequate, but I certainly can on correctness, if I can show the bug.
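Here is a minimal sketch of what those follow-up changes might look like, assuming the helper keeps the IsJustBases() name floated above. The lower-casing of both inputs, the ordinal IndexOf comparison, and the extra mixed-case inputs in Main are my assumptions, not the finished code. One visible side effect is that the result comes back in lower case.

```csharp
using System;
using System.Text.RegularExpressions;

static class Matcher
{
    // Hypothetical helper: true when s holds only the letters a, c, g and t.
    public static bool IsJustBases(string s)
    {
        return Regex.IsMatch(s, "^[acgt]*$");
    }

    // Same search as the posted function, but both inputs are lower-cased once
    // up front, so the base check and the IndexOf search agree on case.
    public static string LongestCommonSequence(string s1, string s2)
    {
        s1 = s1.ToLowerInvariant();
        s2 = s2.ToLowerInvariant();
        string result = "";
        int start = 0;

        for (int length = 1; length <= s1.Length; ++length)
        {
            for (int i = start; i < s1.Length - length + 1; ++i)
            {
                string snippet = s1.Substring(i, length);
                if (!IsJustBases(snippet)) break;
                if (s2.IndexOf(snippet, StringComparison.Ordinal) >= 0)
                {
                    start = i;
                    result = snippet;
                    break;
                }
            }
            if (result.Length < length) break;
        }
        return result;
    }
}

static class Demo
{
    static void Main()
    {
        // Candidate inputs for a mixed-case test: the case differs across s1 and s2.
        Console.WriteLine(Matcher.LongestCommonSequence("TA", "ta"));
        Console.WriteLine(Matcher.LongestCommonSequence("ccccctgggtaacccccc", "GGGGGGGGGTGGGTAAGG"));
    }
}
```

With the posted version, the first call would pass the base check on "T" (which is lower-cased there) but then fail the case-sensitive IndexOf against "ta", so a test like LCS("ta", "TA", "ta") should be enough to show the bug before making the change.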

Thanks to Ehsan Akhgari, whose Code Formatter has made this post more readable.

Popcorn - Perfect for Hot Air

Several years ago I compiled a list of reasons why popcorn is so very well suited for hot air popping. I no longer have the list, but perhaps I can remember most of the items.

First off, the much lower density of the popped kernels means that they can be blown away by an airflow that doesn't affect the unpopped ones. So the ones that need the heat, stay in the heat.

Next, the low density of popped kernels means they have a lot of air resistance for their weight, so even though the pop gives them a large initial velocity, they don't fly all over everywhere.

And there is that pop, which imparts a lot of energy, mixing things up (in case there are hot and cold spots, a little mixing up can ensure that every kernel gets its fair turn in the heat).

And the popped kernels that haven't yet made it all the way out provide a nice buffer to absorb some of that energy, so that if an unpopped kernel gets much of the energy, its movement is dampened and its flight kept short.

And the unpopped kernel is dense, so it falls back down to the hot surface.

That's all I can remember right now; I think there may have been more.

Propaganda Taxonomy

Let's launch the political side of this blog with a brief taxonomy of the Bush administration's propaganda techniques. I've just been struck recently by the revelations about manufactured news, and it seemed like it would be good to review the whole mess. So, here goes.

Media
    Shills (Armstrong Williams, Maggie Gallagher, Mike McManus)
    Salting (Jeff Gannon)
    Fake News (Karen Ryan)
    Embeds
    Media Intimidation

Lying

Secrets
    Cheney's energy task force

Suppressing Science

Terminology
    Privatization / Private Accounts / Personal Accounts

Stage Props
    First Amendment Zones

Timing
    Friday afternoon press releases
    Delaying 9/11 disclosure until Rice's confirmation as Secretary of State
    Rice running out the clock in her 9/11 testimony

I'm sure I've left a lot out here, but it's a start. At times I've thought that dishonesty or hypocrisy were the defining characteristics of this administration, but I'm coming to see that there is something deeper underneath it. Of course there is a great deal of dishonesty (and hypocrisy) documented in the above links, but I think that "message control" (newspeak for "propaganda") is more fundamental. If they felt it was helpful to them for the public to know the truth about something, I think they would probably be just as energetic about spreading that truth as they are about spreading their lies.

Transpose-relative Cell References for Pairwise Comparisons

I've proposed to Microsoft a new modifier for cell references. In addition to relative references and absolute references, this modifier (for which I'm suggesting the percent symbol, but any symbol will do) would transpose the reference: if applied to the column, it would advance the column based on the row of the source cell; if applied to the row, it would advance based on the source's column.

With this formula in B1:
=$A1*$A%1

filled into C1 and B2, you get:
=$A1*$A%2 (C1)
=$A2*$A%1 (B2)

My desire for this stems from wanting to do a lot of pairwise comparisons, and wanting to do it without fancy formulas using TRANSPOSE() or (as I've always done) copy-paste-special-transposing the source data column into a row above the table.

I have implemented this in my own spreadsheet, and I like how it works.
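Here is a minimal sketch of the adjustment rule, assuming each half of a reference (row or column) carries one of three modes. The Mode enum and the Adjust function are hypothetical names made up for the illustration; they are not taken from my spreadsheet or from Excel.

```csharp
using System;

// Hypothetical modes: plain (relative), $ (absolute), % (transposed).
enum Mode { Relative, Absolute, Transposed }

static class TransposeRelative
{
    // Returns the adjusted index of one half (row or column) of a reference.
    // ownDelta:   how far the formula moved along this half's own axis.
    // otherDelta: how far it moved along the other axis.
    public static int Adjust(int index, Mode mode, int ownDelta, int otherDelta)
    {
        switch (mode)
        {
            case Mode.Absolute:   return index;               // $ never moves
            case Mode.Transposed: return index + otherDelta;  // % follows the other axis
            default:              return index + ownDelta;    // plain relative
        }
    }

    static void Main()
    {
        // The row half of $A%1, with the formula originally in B1.
        int inC1 = Adjust(1, Mode.Transposed, 0, 1);  // filled right: rows +0, columns +1
        int inB2 = Adjust(1, Mode.Transposed, 1, 0);  // filled down:  rows +1, columns +0
        Console.WriteLine("C1: $A%" + inC1 + "  B2: $A%" + inB2);
    }
}
```

This reproduces the $A%2 and $A%1 results from the example: filling right advances the transposed row, while filling down leaves it alone.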

Spreadsheet Cut & Copy

I'm currently writing a spreadsheet, and a couple of weeks ago I got around to implementing Copy & Paste, then Cut.

I was surprised by how different Cut was from Copy; I had thought it would be a simple combination of Copy and Delete. I was wrong; I now understand why Excel's clipboard is so nonstandard, relative to non-spreadsheet applications where Cut is just Copy & Delete.

First off, because it changes the document, Cut should itself be undoable, although Copy isn't. I guess this is also true in non-spreadsheet applications.

Links into the clip region (including both relative and absolute links) from outside are unaffected in a Copy's Paste, but are updated in a Cut's, because it is really a Move operation.

This means, too, that a Cut-Paste is a one-shot deal; when it's done the clipboard is empty, although a Copy-Paste can be repeated ad infinitum.

When pasting a single cell, Copy-Paste allows pasting into multiple cells, but Cut-Paste does not; this is an extension to the one-shot rule.

Relative links from the clip region to outside are altered to fit the new location in the case of Copy-Paste, but left alone for Cut-Paste.

All internal links are updated in a Cut-Paste, but only relative internal links are updated in a Copy-Paste.
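The link rules above can be boiled down to one small decision function. This is only a sketch: Op, ClipRules and Shifts are hypothetical names, and the handling of absolute links from the clip region to outside (left alone by both operations) is my assumption, since the rules above only cover the relative ones.

```csharp
using System;

enum Op { Copy, Cut }

static class ClipRules
{
    // True when the address a reference points at moves by the paste offset.
    // fromInside: the formula holding the reference is in the clip region.
    // toInside:   the referenced cell is in the clip region.
    // Assumption: absolute links from the clip region to outside never move.
    public static bool Shifts(Op op, bool fromInside, bool toInside, bool relative)
    {
        if (!fromInside && !toInside) return false;        // nothing to do with the clip
        if (!fromInside) return op == Op.Cut;              // link into the region: Cut is a Move
        if (!toInside) return op == Op.Copy && relative;   // link out of the region
        return op == Op.Cut || relative;                   // internal link
    }

    static void Main()
    {
        foreach (Op op in new[] { Op.Copy, Op.Cut })
        {
            Console.WriteLine(op
                + ": into=" + Shifts(op, false, true, true)
                + " outRelative=" + Shifts(op, true, false, true)
                + " internalAbsolute=" + Shifts(op, true, true, false));
        }
    }
}
```

Laid out this way, Cut and Copy disagree on three of the four link cases, which is why Cut turned out to be more than Copy plus Delete.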

Welcome

Here's my third attempt at a blog; this time I'm not confining it to just programming or just politics. Whatever strikes my fancy. Enjoy.

Sunday, March 27, 2005

Alternative Dispose()

Sunday, March 20, 2005

Triangulation

Saturday, March 19, 2005

Crocodile Tears

Friday, March 18, 2005

A Nice TDD Experience

Thursday, March 17, 2005

Popcorn - Perfect for Hot Air

Wednesday, March 16, 2005

Propaganda Taxonomy

Tuesday, March 15, 2005

Transpose-relative Cell References for Pairwise Comparisons

Spreadsheet Cut & Copy

Welcome