2013-11-02

Some programming annoyances

I write and maintain mostly C# these days. I've been accumulating a list of annoyances. Whenever I come across a piece of code that makes me groan, fills me with fear or just bothers me, I add it to the list. It helps keep me sane. Some of them are obvious and blatant; nobody should be doing those things! Some of them are subtle. Some of them are debatable. All are related to actual code I've dealt with. Some of them are things I do myself and then feel guilty about. Maybe you'll find them interesting. Maybe you'll disagree.

C#

Most of these are C# or at least .NET-specific. Some are general to object-oriented languages.

Don't lock on public objects

This exposes your implementation and invites deadlock.

Don't lock on String.Empty

Seriously. WTF. I mean, locking on strings is weird and problematic enough, but locking on String.Empty??

Don't catch Exception

There should almost always be a more specific exception to catch. If you're catching Exception you have no idea what kind of bad state your program is in and there's very little you can sensibly do. I'll concede that there may be situations where you need to bundle away the exception to rethrow later on, if you're writing some kind of remoting/scheduling framework, but really ask yourself if you need to be doing that. As soon as you catch that exception, you're making it much harder to debug, by separating the symptom from the cause in time and space.

Don't do work in a constructor

It is a horrible code-smell if you construct an object and then throw away the reference immediately, indicating that the object exists solely for the side-effects of its constructor. Constructors should not be responsible for stitching the newly created object into an object-graph; when they are, you have something that resembles a Klein bottle, with the object's inside tangled up with its outside. I find it hard to express exactly what's so bad about this, but it fills me with trepidation when I see it.

If you need to guarantee that some non-trivial work is done before the object is in a valid state, you should not postpone that work until after the constructor. That's worse! Instead require that it has been completed before the constructor is invoked - the prepared components should be arguments to the constructor. Perhaps you will find that you then need another object to assemble the object-graph. That's okay! Assembling a system is a different task, quite unrelated to the job the system performs! Cars, computers, bridges, houses - none of these engineered objects need to assemble themselves. The same is true of objects in programming.
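To illustrate the idea in Python (the class and function names here are invented for illustration, not taken from any real codebase): rather than having the constructor parse its input, the parsing happens first and the constructor merely stores the prepared result.

```python
# Discouraged: the constructor performs non-trivial work itself.
class Config(object):
    def __init__(self, text):
        self.entries = dict(
            line.split('=', 1) for line in text.splitlines())

# Preferred: the prepared components are constructor arguments,
# and a separate assembly step does the work up front.
class PreparedConfig(object):
    def __init__(self, entries):
        self.entries = entries

def parse_config(text):
    # The work happens before the constructor is invoked.
    entries = dict(line.split('=', 1) for line in text.splitlines())
    return PreparedConfig(entries)
```

The assembly work now lives in parse_config, which can grow into a proper factory if the object-graph gets more complicated.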

Avoid "new" outside of factories

Okay, this is possibly a controversial one.

The "new" keyword should be relatively rare in your code. It's okay to new up a List or Dictionary, but you should not be constructing objects all over the codebase. This tightly couples code and makes it difficult to test separate components. It's especially bad to have a huge hierarchy of constructors called by each other. Prefer to confine construction to factories. Note that you don't need one factory per class - often it makes sense for one factory to construct a suite of related object classes.
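A minimal Python sketch of the shape this takes (Car, Engine, Wheel and CarFactory are invented names): one factory constructs a suite of related classes, and the classes themselves never call each other's constructors.

```python
class Engine(object):
    def __init__(self, power):
        self.power = power

class Wheel(object):
    pass

class Car(object):
    # The car receives its parts ready-made; it constructs nothing itself.
    def __init__(self, engine, wheels):
        self.engine = engine
        self.wheels = wheels

class CarFactory(object):
    """One factory constructing a suite of related classes."""
    def build_car(self, power=100):
        return Car(Engine(power), [Wheel() for _ in range(4)])
```

Because Car takes its components as arguments, a test can hand it a stub engine without touching the factory at all.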

Avoid downcasting

Sometimes it's the pragmatic choice. Most of the time it creates hidden dependencies and surprising behaviour.

Don't use "indicator" interfaces

Interfaces without members are a bad smell. Don't use interfaces which are just indicators that the object can be used in some other way, such as by casting.

One pattern that does this a lot, and that I don't have a great answer for, is the "interface that returns handles". Suppose you define an interface for a file-system, and an interface for the files/folders it contains. The file-system hands out instances of the file interface to represent the files it contains. The question is, how do you describe file-system operations that affect two files, say "move this file into that folder"? How do you design this so that you can have different implementations of the file-system interface with different hidden implementation details, but still have a natural and useful API for users? I've seen a lot of systems that just take IFile as an argument, and fail if the downcast to MyFileType fails or if the file belongs to a different file-system. That seems downright nasty, but most of the alternatives I can think of are still at least a bit unpleasant.

Prefer signed integers

Unsigned integers are unusual in .NET APIs. They're not CLS compliant - other .NET languages might not have them. They also behave badly, silently wrapping when decremented below 0. It's better to take signed integers as parameters and throw an exception when a negative value is provided than to take unsigned integers and fail in weird ways when somebody accidentally decrements beyond 0. Unsigned integers should only be used when it's important to have the entire range of positive values. Also beware not to do this:

for (uint i=9; i>=0; --i) { ... }  // infinite loop: i >= 0 is always true for a uint

This advice is quite specific to C#. It sometimes applies in C++, but unsigned integers are much more pervasive there, particularly size_t, so it's not such practical advice. It doesn't apply at all to languages that don't have silent wrapping.

Dispose in reverse order of creation

Objects with *similar* lifetimes should have *nested* lifetimes. Anything else is horrible to keep track of and maintain. That said, most of the worst cases of this I've seen have involved other bad practices, like massive constructors that call a million other constructors. Nothing is less fun than trying to fix an "object disposed too early" bug and finding that it's one of thirty other objects disposed by its parent in seemingly random order.
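Python's standard library happens to encode exactly this discipline: contextlib.ExitStack unwinds its resources in reverse order of acquisition, which is the nested-lifetime pattern argued for above.

```python
from contextlib import ExitStack, contextmanager

log = []

@contextmanager
def resource(name):
    # Record the lifetime of each resource as it opens and closes.
    log.append('open ' + name)
    yield name
    log.append('close ' + name)

with ExitStack() as stack:
    stack.enter_context(resource('a'))
    stack.enter_context(resource('b'))

# The stack disposes in reverse order of creation: b closes before a,
# so log is ['open a', 'open b', 'close b', 'close a'].
```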

Don't have properties that modify state

It's surprising, and it will do weird stuff in the debugger. Property access should not observably modify state. (Caching an unchanging result is probably okay.)

Don't hand out "this" in a constructor

Here's another one I think might be controversial. I'd be interested in feedback on this one, and I'll try to answer questions anyone has.

See also "Don't do work in a constructor". If you pass out references to "this" (including bound methods, or closures that capture "this") you are revealing your partially constructed object and allowing it to be meddled with before it has established its invariants. This will end in tears if you have subclasses. Even if you don't have subclasses, it's hard to maintain. It becomes unclear where in the constructor the invariants have been established, and it breaks easily when someone tries to re-order the code in the constructor.

Don't throw from a Dispose method

It's too late. You don't know if your object is being disposed due to stack unwinding passing through a "using" block, in which case you will trash the useful information in its stack trace and replace it with less useful information from your own. If you're doing something in a dispose method that might fail, prefer to give users a way to do it manually before the object is Disposed, and throw at that point. I'm not 100% certain on this one. There are cases where it's a programmer error to forget to do something before disposing an object and it would be nice to flag it up.

Prefer not to expose IDisposables to clients who are not responsible for disposing them

It's just too confusing otherwise. A corollary is that you should be very careful about declaring interfaces that inherit from IDisposable. Should all clients of your interface have the power to dispose the object?

Avoid null values when possible, and document them clearly where they're necessary

Nulls are insidious, and love to get into places they're not supposed to be. Avoid them in APIs when possible.

Avoid null with the meaning "key not found"

This just invites your clients to make a mistake and forget to check if the value is null. Prefer to throw an exception, or use a TryGetValue approach, or let the client specify what to return when the key is not found. If you must, document it.
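In Python terms, the same three alternatives look like this (try_get is a hypothetical helper in the style of .NET's TryGetValue, not a standard function):

```python
d = {'a': 1}

# Throwing on a missing key surfaces the mistake at its source:
try:
    d['missing']
except KeyError:
    pass  # the caller is forced to acknowledge the miss

# Letting the client specify the not-found value makes it explicit:
assert d.get('missing', 0) == 0

# A TryGetValue-style helper makes the check hard to forget:
def try_get(mapping, key):
    if key in mapping:
        return True, mapping[key]
    return False, None

found, value = try_get(d, 'a')
assert found and value == 1
```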

Don't repeat yourself

Do I need to say this one? Don't repeat yourself. It just gives you more places for bugs to live and more places you need to remember to fix bugs.

Prefer composition to inheritance

Inheritance exposes part of your implementation. It also tightly couples you to the base class implementation. Prefer composition as a first resort, and inheritance only when it is well-suited to the problem. If you're not going to make use of polymorphism, think very carefully about why you're using inheritance.

Don't hold references "for a friend"

Especially don't hold references solely for the purpose of passing them on to a constructor later on. See also "Avoid 'new' outside of factories".

I'm guilty of this myself. I think this tends to go hand-in-hand with Law of Demeter violations. It's probably a sign that your objects know too much about each other. However, it can take a lot of careful thought to iron out these problems. I have a feeling that a good dependency injection framework is helpful here, but sadly I've not been able to use one much lately, so I've got no first-hand evidence.

Ensure no methods on an object will be called after you enter its Dispose method

Just as a constructor shouldn't hand out references to "this", a Dispose method shouldn't be expected to defend against use of outstanding references to "this". You may choose to detect and flag up this situation, but it should be considered an error, not the intended behaviour. The same problems with sub-classes arise. Clients of the class should guarantee that all other uses of it happen-before invocation of Dispose. If the class itself manages a thread or by other mechanisms causes concurrent invocations throughout its lifetime, *those concurrent invocations should not be to the object itself*.

This goes double for C++ destructors. You are doing something extremely dodgy if any other thread is calling methods on your object after control has entered the destructor.

NUnit

My list was almost entirely general C# stuff, but there were a few items specific to unit-testing.

Avoid Debug.Assert

Use the NUnit asserts. They give better error messages.

Don't Assert.IsTrue(false)

This is perverse. Prefer more specific assertions which can give you more information about what went wrong. E.g. Assert.That(foobar.Counter, Is.EqualTo(5)) will tell you the actual value of foobar.Counter when it fails. Even if there is no more specific assertion, prefer Assert.Fail, which is clearer as a "we should not have gotten here" indicator than seeing "Expected: True But was: False".

Don't repeat yourself

Yes, this counts for tests too. Yes, I understand the irony of repeating myself in telling you not to repeat yourself.

2013-05-01

Use of 'this' in constructors

this in constructors

Some quick C++ advice today, but I think it generalizes to C# and Java too. If you do anything with the this pointer in your constructor – apart from dereferencing it – your design is probably missing a class.

There are a few dangers with using the this pointer in the constructor. For a start, weird stuff will happen if it ends up in the hands of code that attempts to call a virtual method: in C++ the instance changes type as each constructor in the inheritance hierarchy gets a shot at it, meaning that the same call will invoke different code depending on where we are in the construction process. In C# and Java you get a different sort of misery – virtual methods in a subclass can be invoked before their constructor has ever had a chance to initialize the subclass fields.

But I'd say the more important general problem is that it becomes much harder to reason about the class. It's no longer clear which methods can rely on the class's invariants being intact when they start, and it's no longer clear at which point in the constructor invariants should be established.

Concurrent access in destructors

The other side of this coin is destruction. If methods on your class can be invoked concurrently with its destructor, something is fishy in your class! The most common scenario where I've seen this is when the class manages a thread which it attempts to join during the destructor, and where the thread is calling methods on the class itself. This isn't obviously insane – with appropriate synchronization it can work and be safe – but I'd argue that it's still a really bad idea. For one thing, it all goes a bit wrong if you create a subclass – the concurrent calls will continue while the subclass destructor runs. (I'm not 100% sure, but I think it's probably undefined behaviour to invoke a virtual method on an instance concurrently with one of its subclass destructors finishing.) Even if you avoid subclassing, it gives you a messier destructor that's harder to reason about.

You're missing a class

The good news is that there is a general solution to these problems. If you have a class A that in its constructor wants to send its this pointer to a class B (figure 1), you can almost always solve the problem by splitting A into A1 and A2, with A2 a member of A1 (figure 2). The A1 constructor can now give B a pointer to A2. It might be a little tricky to figure out what goes in A1 and A2, but I think you'll find in the end the design becomes cleaner and your constructors and destructors will become less tangled and confusing.

Figure 1 – Before, the cycle of pain

Figure 2 – After, a pleasant pyramid
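Since the figures are diagrams, here is the same refactoring sketched in Python (B stands in for any collaborator that wants to call back into A; the names follow the figures, and the details are my own illustration):

```python
class B(object):
    def __init__(self):
        self.listeners = []
    def register(self, listener):
        self.listeners.append(listener)
    def fire(self):
        for listener in self.listeners:
            listener.on_event()

# Figure 1: A hands out "this" before its invariants are established.
class A(object):
    def __init__(self, b):
        b.register(self)    # self escapes mid-construction
        self.count = 0
    def on_event(self):
        self.count += 1

# Figure 2: the part B needs is split out into A2, which is fully
# constructed before any reference to it escapes.
class A2(object):
    def __init__(self):
        self.count = 0
    def on_event(self):
        self.count += 1

class A1(object):
    def __init__(self, b):
        self.a2 = A2()       # A2 is complete here...
        b.register(self.a2)  # ...so handing it to B is safe
```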

2012-09-10

Key usage coding challenge

Last week, Cedric Beust posted a new coding challenge:

We want to make multiple calls to a site that only allows to use a key 500 times every 10mn. You are given a collection of keys (strings), a max count (e.g. 500) and a time period (e.g. 10mn) and your task is to implement getNextKey()...

(Go read the article, as there's a more detailed problem statement there.)

So we have a set of K keys, and each key is valid for X uses in a rolling window of T milliseconds. My first observation (made in the comments) is that we need never re-order our keys. If we use every key X times in any order, then the next key that will be available for use is the first key that we used, exactly T milliseconds after we first used it. Then the second key, T milliseconds after we first used it. Whatever order we pick, we can just repeat it indefinitely. We might cycle through all the keys X times, or repeat each key X times before moving on to the next. For the rest of this discussion, we'll assume we have K keys that can each be used only once in the given period, and we won't bother keeping track of the different key names. We can easily adapt such a solution to handle the original problem, but it will be simpler to discuss. So here's the simplified problem we'll consider:

We want to make multiple calls to a site that only allows to use a key once every 10 minutes, and we have 500 keys. At any time we might be asked for a key to be used, and we must answer whether there is one free, and if there is, to consider it used up for the next 10 minutes. Your task is to implement tryUseKey(), returning true if a key was available and false if not.

My second observation is that to provide perfect answers, we cannot avoid storing the time-stamp of every one of the last K uses. To see this, imagine that over the last 10 minutes we have called tryUseKey() successfully exactly K times. To recover the timestamps of all K invocations, we can just poll tryUseKey() as fast as possible over the next 10 minutes. Every time it returns true, we know the approximate timestamp of one of the original invocations: about 10 minutes ago. By polling arbitrarily fast we can obtain an arbitrarily accurate answer. If we can reconstruct this information, then it must be encoded somehow in our object's state, so we can place a lower bound on the working memory required by the data structure that is linear in K.

It's not especially hard to write a solution that solves the problem precisely by storing K timestamps, but it would be nice if we could trade off some precision in order to use substantially less memory. For that reason, we'll look for solutions that allow false negatives in specific circumstances. A false negative means reporting that no keys are available when in fact some should be. We never allow false positives – we will never say that a key is available when it should not be.

  • We might allow a false negative when all the keys have been used in the last 11 minutes (rather than the last 10 minutes). We'll call this relaxed requirement A.
  • We might allow a false negative when fewer than 50 (of 500) keys are available. We'll call this relaxed requirement B.
  • We might allow a false negative only when fewer than 50 keys are available and all keys have been used in the last 11 minutes. We'll call this relaxed requirement AB.
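For comparison, the exact solution that stores one timestamp per use is only a few lines of Python (this sketch is mine, and its names don't match the implementation later in the post):

```python
from collections import deque

class ExactThrottler(object):
    def __init__(self, num_keys, cooldown):
        self.num_keys = num_keys
        self.cooldown = cooldown
        self.used = deque()  # timestamps of uses within the window
    def try_use_key(self, now):
        # Drop timestamps that have cooled down.
        while self.used and self.used[0] + self.cooldown <= now:
            self.used.popleft()
        if len(self.used) == self.num_keys:
            return False  # exact: no false negatives, no false positives
        self.used.append(now)
        return True
```

It gives perfect answers at the cost of memory linear in K, which is exactly what the bucket solutions try to reduce.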

I'm going to use graphs of keys used and available over time to illustrate a number of solutions. They're all going to look similar to this:

The orange line rises whenever keys are used. The blue line rises whenever used keys have "cooled down" over 10 minutes and become available for re-use. The blue area between the two lines shows how many keys are available for use – when the orange and blue lines meet, all keys have been used and none are available until the blue line rises again. You can see that the shape of the blue line in the top-right of the graph replicates the shape of the orange line in the bottom-left, as the keys that were used on the left become ready again on the right.

Buckets

All the solutions I'm going to discuss involve buckets. The general idea is that as we use keys we group them up in an "active" bucket, and at some point we set the active bucket aside (and don't add any more keys to it), and use another bucket as the active one. After a bucket has been set aside for 10 minutes to "cool down", we recover all the keys from that bucket. By grouping keys into buckets, we only need to store a timestamp for the bucket. By choosing when to set the buckets aside and how many buckets to use we can try to meet our relaxed requirements.

To meet requirement A we can use 10 buckets and swap out the active bucket every minute. Here's what it might look like:

The rectangles added to the diagram show keys held in buckets. In the light blue rectangles keys are added to the active bucket, while the dark blue rectangles show keys held in a cooling-down bucket. The new green line shows the number of keys we consider free. It tracks the blue line closely, but in places it is lower; the yellow area between them shows where our algorithm will falsely report a smaller number of keys available than is accurate. The number of keys reported available may be a lot lower than the true value, but never for very long.

To meet requirement B we can use 10 buckets and swap out the active bucket every time it has 50 keys in it. Here's what that might look like:

As before, the yellow area shows where the algorithm will falsely report a smaller number of keys available than is accurate. This time the number of keys incorrectly reported unavailable is never very high, but can remain low for a long time.

Finally, to meet requirement AB, we can set aside a bucket after it has been active for 1 minute or after it has 50 keys in it, whichever comes first. It will look something like this:

The complication here is in figuring out how many buckets we might need. I believe it's 20. This can happen if, for example, we use 451 keys very rapidly, then use just one or two keys a minute for the rest of the time. So we need more buckets, but in return we obtain very strict bounds on how wrong our answers can be.

Generalizing

So far, we've been working with a specific example where the cooldown period is 10 minutes, there are 500 keys and we have allowed a 1 minute time threshold and a 50 key threshold for false negatives. We can change these parameters. The number of buckets we'll need for solution A is based on the cooldown period divided by the time threshold, the number of buckets needed for solution B is based on the number of keys divided by the key threshold, and the number of buckets needed for solution AB is the sum of these two values.
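Plugging in the numbers used throughout this post, the bucket counts work out as follows (using ceiling division, as in the implementation below):

```python
def divide_rounded_up(dividend, divisor):
    return (dividend + divisor - 1) // divisor

cooldown = 10 * 60 * 1000       # 10 minutes, in milliseconds
time_threshold = 1 * 60 * 1000  # 1 minute
num_keys = 500
key_threshold = 50

buckets_a = divide_rounded_up(cooldown, time_threshold)  # solution A: 10
buckets_b = divide_rounded_up(num_keys, key_threshold)   # solution B: 10
buckets_ab = buckets_a + buckets_b                       # solution AB: 20
```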

The code

Here's an example implementation in Python:

import itertools
import time
from collections import deque

class Bucket(object):
    def __init__(self, close_time):
        self.keycount = 0
        self.close_time = close_time
    def add(self, now):
        self.keycount += 1
    def has_cooled_down(self, now, cooldown):
        return self.close_time + cooldown <= now
    def will_accept(self, limit, now):
        is_full = self.keycount == limit
        is_closed = now >= self.close_time
        return not is_full and not is_closed

def divide_rounded_up(dividend, divisor):
    return (dividend + divisor - 1) // divisor

class Throttler(object):
    def __init__(
            self,
            limit, cooldown, maximum_cooldown, key_threshold):
        '''
        limit: number of key usages allowed initially.
        cooldown: time in milliseconds before a used key can be
            re-used.
        maximum_cooldown: try_use_key is only allowed to give a
            false negative when all keys have been used within this
            amount of time in the past.
        key_threshold: try_use_key is only allowed to give a false
            negative when the true number of keys is less than this
            number.
        '''
        self.buckets = deque()
        self.cooldown = cooldown
        self.open_time = maximum_cooldown - cooldown
        buckets_a = divide_rounded_up(cooldown, self.open_time)
        buckets_b = divide_rounded_up(limit, key_threshold)
        self.num_buckets = buckets_a + buckets_b
        self.bucket_size = key_threshold
        self.free = limit
    def release_expired_buckets(self, now):
        while self.buckets:
            if self.buckets[0].has_cooled_down(now, self.cooldown):
                self.free += self.buckets.popleft().keycount
            else:
                return
    def try_use_key(self, now):
        '''
        Check if any keys are available. If so, use one and return
        True. Otherwise, return False.
        May return a false negative, but only if all keys were used
        within the last (maximum_cooldown) milliseconds, and if
        fewer than key_threshold keys are actually available.
        '''
        self.release_expired_buckets(now)
        if self.free == 0:
            return False
        should_create_bucket = (
            len(self.buckets) == 0 or
            not self.buckets[-1].will_accept(
                    self.bucket_size, now))
        if should_create_bucket:
            assert len(self.buckets) < self.num_buckets
            self.buckets.append(Bucket(now + self.open_time))
        self.buckets[-1].add(now)
        self.free -= 1
        return True

def now_ms():
    '''
    Return the current time in milliseconds.
    '''
    return int(time.time() * 1000)

class SlidingWindowMap(object):
    def __init__(self,
            keys, max_count, cooldown, maximum_cooldown,
            key_threshold, now_func=now_ms):
        '''
        keys: a sequence of key names.
        max_count: the number of times each key may be used
            within the cooldown period.
        cooldown: the time in milliseconds before a used key can be
            re-used.
        maximum_cooldown: try_use_key is only allowed to give a
            false negative when all keys have been used within this
            amount of time in the past.
        key_threshold: try_use_key is only allowed to give a false
            negative when the true number of keys is less than this
            number.
        now_func: a function that returns the current time in
            milliseconds.
        '''
        keylist = list(keys)
        num_keys = len(keylist) * max_count
        self.keys = itertools.cycle(keylist)
        self.now_func = now_func
        self.throttle = Throttler(
                num_keys, cooldown,
                maximum_cooldown, key_threshold)
    def get_next_key(self):
        if self.throttle.try_use_key(self.now_func()):
            return next(self.keys)
        else:
            return None

Further questions

I've not given a proof of the formula for the maximum number of buckets required for solution AB. I actually thought it should be one less, but in practice this didn't seem to be true. I haven't quite figured out why.

For a fixed number of buckets, it's not clear when it would be better to be more strict about the time or the key thresholds. It might be interesting to see what percentage of full key utilization can be achieved with different levels of "burstiness" in key requests and different choices of threshold.

2011-09-29

Objects in Javascript and Python

I've been doing some Javascript recently (well, Coffeescript actually, but this applies equally to either, so I'll stick to Javascript for the examples) and found its approach to objects rather interesting. I think it might be instructive to compare Javascript's objects to Python's, the dynamic programming language I'm most familiar with.

First of all, take this Javascript code, which creates a factory-function called "Counter" which returns objects with "increment" and "decrement" methods:

function Counter() {
    this.value = 0;
}
Counter.prototype.increment = function() {
    this.value += 1;
};
Counter.prototype.decrement = function() {
    this.value -= 1;
};
var c1 = new Counter();
var c2 = new Counter();
c2.increment();

When we execute it, we end with a set of objects much like this:

In our outermost scope (the identity of which I'll gloss over for now) we have three names (or keys) defined. "Counter" is a function object, while "c1" and "c2" are both generic objects. "Counter", like all functions, has an object attached to it under the key "prototype", and it's in this prototype object that we've inserted the functions increment and decrement.

Each time we invoke "new Counter()" a new object is created with its special "__proto__" key mapped to Counter.prototype, and then the Counter function is executed with "this" bound to the new object. The Counter function adds to that object, mapping "value" to the number 0. (I must say, I still don't entirely understand why "new" works like this - I can't see any value in being able to invoke the same function both with and without "new".)

Finally we invoke "c2.increment()". First Javascript looks on the c2 object for the key "increment", but there is no such key. Next, it follows the "__proto__" key and tries again with the prototype object. There it finds the increment function. The function is executed with "this" bound to the "c2" object. When it does "this.value += 1" it first looks up "value" in the same way, finding it directly on the object itself, then after adding 1 to the value it stores it back on the object. (Note that storing something doesn't involve "__proto__" traversal. It always goes straight into the object that you put it into. It's only look-ups that involve "__proto__" traversal.)

Now, let's see the same thing in Python:

class Counter(object):
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1
    def decrement(self):
        self.value -= 1
c1 = Counter()
c2 = Counter()
c2.increment()

At a first glance, things look pretty similar. The "__class__" attributes on the objects are much like the special "__proto__" attributes in Javascript. The "__init__" method is quite similar to the "Counter" function in Javascript. The differences are subtle.

When Python executes the class statement, the three functions "__init__", "increment" and "decrement" are defined and attached to the newly created class object.

When "Counter()" is invoked a bunch of stuff goes on behind the scenes, but importantly a new object instance is created with "__class__" pointing to the class and then the "__init__" function is invoked with the object instance as its first argument.

When "c2.increment()" is executed, first "increment" is looked up on c2. This fails, so next Python follows "__class__" and checks for "increment" on the class. However, something special happens when it finds it there. When an attribute lookup for an object finds a function on its class object, it instead returns a "bound method", which is effectively a function that takes one fewer argument then passes through to the "increment" function itself, inserting a reference to the object as the first argument. So even though the statement "c2.increment()" provides no arguments, when "increment" is actually executed, it receives c2 as its first argument.

There are, I think, two interesting differences here. One is what happens when we want to represent inheritance, which my example doesn't really cover. In Javascript, if attribute lookup fails on the prototype then the prototype's prototype is searched, and so on. In Python, if the attribute lookup fails on the class, then instead of looking at its "__class__" attribute, the base classes are searched. (And since Python allows multiple inheritance, the corner cases are a bit tangled.)
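A small Python example of that lookup chain: the attribute is found neither on the instance nor on the derived class, so Python walks the base classes in the method resolution order.

```python
class Base(object):
    def greet(self):
        return 'hello'

class Derived(Base):
    pass

d = Derived()
# The lookup for "greet" tries d, then Derived, then Base.
assert Derived.__mro__ == (Derived, Base, object)
assert d.greet() == 'hello'
```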

But what I think is most interesting are the different approaches to how methods know their receiving object and the unintuitive consequences of these mechanisms. In Javascript, when a function is retrieved from an object and invoked in the same expression, it triggers the binding of "this" to the object in the function. But if you take a reference to the function and invoke it separately then "this" is not bound to the object at all. For example:

c1.increment();

is not equivalent to this:

var f = c1.increment;
f();

The former will increment c1. The latter will not: increment executes with "this" undefined (or, outside strict mode, bound to the global object), so c1 is never touched.

In Python, on the other hand, the equivalent will work as you might expect, because "c1.increment" in fact refers to a bound method. However, from this we can see that in Python "Counter.increment" is not the same entity as "c1.increment". "Counter.increment" is a function that takes one argument. "c1.increment" is a bound method that takes no arguments. In contrast, in Javascript, "Counter.prototype.increment" refers to the exact same object as "c1.increment". So Python does the magic during object attribute lookup, while Javascript does the magic when invoking an attribute as a function.
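The difference is easy to demonstrate in Python:

```python
class Counter(object):
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1

c1 = Counter()

# "c1.increment" is a bound method: c1 travels along with it.
f = c1.increment
f()
assert c1.value == 1

# "Counter.increment" is a plain function of one argument,
# so the instance must be supplied explicitly.
Counter.increment(c1)
assert c1.value == 2
```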

I think this is all correct, but do let me know if I've made a mistake. I'm finding Javascript a lot more pleasant than I expected, except for its cheerful silent propagation of "undefined" at every opportunity, and Coffeescript definitely rounds off some of the rough corners. In fact, I'd say I even slightly prefer Coffeescript's class notation to Python's.

I'm curious to know if there are any particular benefits of prototype-based objects. I can't see any clear advantages over class-based objects, and I particularly don't understand the benefit of being able to use any function to create an object. Is it just nice because it results in a smaller language? (Does it result in a smaller language?)

2011-07-18

As if by magic, some maths appeared

I've not really done any programming lately, so this week there's no Minecraft. Instead there's some maths. Hurrah! Hey, where are you—

I was reading Richard Wiseman's Friday puzzle. It's very short and not at all hard, so you might want to go and solve it and come back. Go on, I'll wait.

Okay, you're back? I'll try not to state the answer outright, but it will probably become pretty obvious very quickly. Right, so this got me thinking about sums of series. The case in the question is sufficiently small that you don't even need to work out the formula for the sum of the sequence in question, but I was thinking about it nonetheless.

Now, I knew the formula for the sum of an arithmetic sequence, and I knew that you could easily enough prove it by induction, but this is a very unsatisfying form of proof. It doesn't feel like something you could have figured out on your own. Imagine just picking random equations one at a time and testing them until you find one that you can prove correct with induction! I thought about it a little and came up with what I thought was a pretty compelling geometric proof, at least for the simple case where you start at 1 and increment by 1. (I have no idea if a proper mathematician would call this a proof. I am certainly not a proper mathematician.)

Is it obvious? On the left in the green we have a square with area 1, a rectangle with area 2, a rectangle with area 3, etc. The area of the whole shape must be the sum of those areas. By dividing up the space differently, we can easily see how to produce the closed form.
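The rearrangement the diagram suggests can be checked numerically. Here's a quick sketch (mine, not from the post) comparing the term-by-term sum of the green areas with the closed form n(n+1)/2:

```python
# Sum of the arithmetic sequence 1 + 2 + ... + n, computed two ways:
# term by term (the areas of the green rectangles), and via the
# closed form n(n+1)/2 that the rearranged diagram suggests.
def arithmetic_sum(n):
    return sum(range(1, n + 1))

def closed_form(n):
    return n * (n + 1) // 2

# The two agree for every n we care to check.
assert all(arithmetic_sum(n) == closed_form(n) for n in range(200))
```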

Great, I thought. Now let's do the same for the sum of squares. It must be just as obvious in 3D. Right?

Hmmm. Well, it gives the right answer, and it's satisfying to figure it out by hand, but it lacks the obviousness of the diagram for the arithmetic sequence sum. Perhaps there's a simpler way to do it?
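The 3D construction does at least land on the standard closed form, which is easy to verify numerically (again, my own sketch):

```python
# Sum of squares 1^2 + 2^2 + ... + n^2, checked against the closed
# form n(n+1)(2n+1)/6 that the 3D construction arrives at.
def sum_of_squares(n):
    return sum(k * k for k in range(1, n + 1))

def closed_form_squares(n):
    return n * (n + 1) * (2 * n + 1) // 6

assert all(sum_of_squares(n) == closed_form_squares(n) for n in range(200))
```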

As an aside, I think I'm getting the hang of Inkscape, although there are still some areas I find I much prefer Visio. Inkscape has some pretty painful and inflexible handling of arrowheads, for example. They're always black, regardless of the line colour, and then there's a special menu action just to make them change to match the line colour. The selection of arrowheads available is limited and weird, and most of them aren't resizable. I'm not sure if that's because SVG forces arrowheads to behave in strange and unintuitive ways, or that's just the way that Inkscape chooses to work.

Maybe next week there will be more programming. We shall see.

2011-07-12

Minecraft mapping - Rotation and flipping

Last time, I discussed ambient occlusion for adding depth to the map. Today I'm going to explain how the fragment shader rotates and flips oriented blocks like minecart tracks.

As you may have noticed, there's only one block ID for regular minecart tracks. The 4-bit data value encodes the orientation of the track, from 10 possibilities - 2 orientations of straight track, 4 orientations of corner track, and 4 orientations of sloped track.

A minecraft texture pack includes only two textures for normal minecart tracks: one curved and one straight. We need to use these to make up all the other orientations. We could either do that in advance – building a bigger texture atlas with all the orientations we need – or do it in the shader. By doing it in the shader we avoid the need for a much expanded texture atlas and some fiddly work to assemble textures, at the cost of extra work in the shader.

To let the shader know what orientation to use, we encode it in the blue channel of our map data texture. We use 2 bits to encode a rotation (0°, 90°, 180°, 270°) and 1 bit to encode a possible flip. (I don't think we actually use the flip at the moment, but it might be useful for doors.)
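For concreteness, here's that packing sketched in Python (the function names are mine, not from the actual map-generation code): the flip flag lives at bit 3, which is why the shader later tests whether the blue value is at least 8, and the quarter-turn count lives in the low bits.

```python
# Pack an orientation into the blue channel byte: bit 3 is the flip
# flag, bits 0-1 are the number of 90-degree rotations.
def pack_orientation(rotation_quarters, flip):
    assert 0 <= rotation_quarters <= 3
    return (8 if flip else 0) | rotation_quarters

def unpack_orientation(blue):
    # Returns (rotation_quarters, flip), mirroring the shader's
    # mod(blue, 8) and blue >= 8 tests.
    return blue % 8, blue >= 8

assert unpack_orientation(pack_orientation(3, True)) == (3, True)
```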

The shader code isn't actually terribly exciting. I should however point out that this version treats the map texture as having normalized float values, rather than integers. For most of the earlier posts I had been updating the code to use integer textures to simplify things a bit, but I haven't gotten around to doing that with this code. That's why we multiply the sample values by 255.0 – as normalized floats they've already been rescaled from 0 to 255 to the range 0.0 to 1.0.

    float flipx = (tiles_sample.b * 255.0) >= 8.0 ? -1.0 : 1.0;
    float rotation = mod(
        tiles_sample.b * 255.0, 8.0) *
        0.5 * 3.141592653589793;
    mat2 transform = mat2(
        flipx * cos(rotation),  flipx * sin(rotation),
        -sin(rotation),         cos(rotation));

We form a rotation matrix with a rotation value of 0, 0.5π, π or 1.5π. The sines and cosines just work out as -1, 0 or +1. There's probably a cheaper way to calculate them.

This will rotate around the origin. Before we transform our texture coordinate we rescale it to the range -1..1, then adjust it back afterwards.
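The whole round trip can be sketched in Python (using math in place of the GLSL built-ins; this mirrors the shader above rather than being the actual shader code). Note that GLSL's mat2 constructor is column-major, which is why the flipped column is the first one:

```python
import math

# Mirror of the fragment shader logic: decode flip and rotation from
# the blue channel byte, build the 2x2 matrix, and apply it to a
# texture coordinate rescaled to the range -1..1 and back.
def transform_texcoord(blue_byte, u, v):
    flipx = -1.0 if blue_byte >= 8 else 1.0
    rotation = (blue_byte % 8) * 0.5 * math.pi
    c, s = math.cos(rotation), math.sin(rotation)
    # Rescale from 0..1 to -1..1 so we rotate around the centre.
    x, y = 2.0 * u - 1.0, 2.0 * v - 1.0
    # Column-major mat2(flipx*c, flipx*s, -s, c) applied as M * (x, y).
    xt = flipx * c * x - s * y
    yt = flipx * s * x + c * y
    # Rescale back to 0..1.
    return (xt + 1.0) / 2.0, (yt + 1.0) / 2.0
```

With a blue value of 0 the coordinate comes back unchanged; a value of 1 rotates it a quarter turn about the texture centre.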

Some mention should be given to how we obtain the rotation values in the first place. There's no trick – there's just a great big table mapping block id and data values to texture coordinates and orientations. The same table is used to pick coloured wool values.

And here's the end result. I think I've talked about most of the things I had originally set out to cover. Next week I might touch on light levels, but there's not a lot to say there. If I find the time I'd like to do some work on cave rendering, but I'm not sure I'll both find the time to work on it and write a blog post, so next week's post may be quite short.

2011-07-05

Minecraft mapping - Ambient occlusion 2

Last time I described in principle how the ambient occlusion shader works, but didn't really go into the details. Today I'll go into a bit more detail.

Something that I think may not have been clear last time is that we're calculating ambient occlusion independently from lighting based on the "blocklight" and "skylight" values. For ambient occlusion we're considering only the arrangement of solid blocks in the immediate neighbourhood, not any genuine sources of light. As a separate step we later multiply together the ambient occlusion value and the light-level based on blocklight and skylight. Again, this is a simplification, and it does result in extra shadows in dark areas, but it generally looks fine.

As we saw previously, to determine the level of shadow added by occlusion, we work out how much "sky" is visible from the rendered surface, where the sky is actually a horizontal square not very far above the surface. This screenshot shows the view from the ground looking up. The white blocks represent the sky square, and the stone slabs are neighbouring taller cells.

The yellow outline shows the unoccluded area of sky. The following diagram breaks that down into the areas shadowed by each of the eight neighbouring columns:

Area A is the occlusion from a neighbouring corner block, while areas B and C are occlusions from neighbouring side blocks. You'll observe that (1) the corner blocks always occlude rectangular regions of sky; (2) the side blocks occlude trapezoidal areas; and (3) the corner block occlusions can overlap with those of the neighbouring side blocks.

Now, I have to admit that I made a bit of a mistake when I first worked this out, and didn't realise that the side blocks were occluding trapezoids, so I worked it as if they were rectangles instead, like this diagram:

The problem with this simplification is that it results in significant discontinuities around corners. Encountering these, but not understanding why, I experimented with tweaking values, and if you look in the shader code you'll see a comment, "TODO: Figure out why the 2* multiplier is needed in the next line!" I haven't actually figured out why that works yet, but I plan to work through the maths again with trapezoids, if it's not too complicated, and see how it compares. However, for now I'm going to discuss the slightly broken version with the rectangles.

Here's another pair of diagrams of those occlusion rectangles:

On the left we have a similar case with some extra occluding columns. Things to note here are that the yellow corner column provides no extra occlusion because it is shorter than both of the neighbouring side columns. The dotted line shows the division of the sky square into quadrants, each of which we'll handle separately.

On the right diagram, you can see the same arrangement, but different rectangles are highlighted. The orange "+" rectangle is the area of unoccluded sky in the quadrant ignoring the corner column. This can be easily calculated based on the proximity of the neighbouring cells on the sides, and the heights of the columns in those cells. The teal "−" rectangle is the extra occlusion provided by the corner column, which we subtract from the unoccluded area to find the total unoccluded area in the quadrant. We calculate the teal area similarly, using the proximity of the neighbouring side cells, and the height of the corner column, but we make sure to clamp its dimensions to be positive – if either dimension would be negative it means the corner block is not going to provide any occlusion and can be ignored.
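The orange-minus-teal arithmetic can be sketched like this (a simplified parameterisation of my own, not the shader's actual lightcalc/cornerlight code): each column contributes an unoccluded extent between 0 and 1 along its quadrant edge, and the corner's contribution is clamped so that a corner shorter than both side columns adds nothing.

```python
# Sketch of the rectangle approximation described above. Each argument
# is the unoccluded extent (0..1) that a neighbouring column leaves
# visible along the quadrant, where 1 means the column is too short
# or too far away to occlude anything.
def quadrant_unoccluded(side_a, side_b, corner):
    # Orange rectangle: unoccluded area ignoring the corner column.
    orange = side_a * side_b
    # Teal rectangle: extra occlusion from the corner column, clamped
    # to positive dimensions; a corner column shorter than both side
    # columns provides no extra occlusion.
    teal = max(0.0, side_a - corner) * max(0.0, side_b - corner)
    return orange - teal
```

So, for example, with both side columns fully clear and a corner extent of 1 the quadrant is entirely unoccluded, and shrinking the corner extent below the side extents starts carving the teal rectangle out of the orange one.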

I've left this a bit late, so I don't have formatted code snippets or a nice runnable example, but you can go look at the complete fragment shader on GitHub. The obscurely named "lightcalc" calculates the dimension of one side of the orange rectangle based on the height and proximity of the neighbouring cell, with 1 being the maximum unoccluded size. "cornerlight" calculates the unoccluded area in the quadrant, again with 1 being entirely unoccluded.

This shader is pretty slow. It results in a noticeable frame-rate drop for me. I haven't yet investigated whether the cost is in the quantity of calculations or in sampling all eight neighbouring cells for their height, but I suspect it's the latter. If that's the case, we could preprocess the image and store information about each cell's neighbours in a compressed form, reducing the number of samples needed. Maybe an expert could suggest the most fruitful avenues for optimisation.

Here's what it looks like without any textures:

Any thoughts? It occurs to me that I could very well have overlooked a much easier or a less horrifically simplified way to do this. I'll need to look into how feasible it is to properly calculate the trapezoidal areas, and also figure out why the bodge of doubling the teal area results in continuous results (at least, continuous between cells of equal height; we don't want continuity across height changes – the whole point of this is to highlight height differences). Feel free to ask questions if it's not clear. I may yet do one more post on this topic if people would like further clarification.