Monday, September 29 2008, 17:31
Heterogeneous collections in C#
During my previous software development project, I found that handling heterogeneous collections was one of the most interesting design problems. Because there are many possible solutions, whose suitability depends mostly on the broader problem you are trying to solve, this issue is very representative of software design in general: making the best choices and finding the best compromise for a given set of constraints.
In this article, I will focus on a very specific problem: designing a C# interface that provides easy read/write access to a list of settings. This list is not fixed and depends on the implementing class; however, since the interface is part of a larger API, it must also be safe, especially in terms of type casting.
Of course, the obvious way to store heterogeneous variables in C# is simply to build a class (or an interface). You get a fixed list of strongly-typed variables (or properties, which are easy-to-use variable accessors), so the consumer of the class runs no risk of an invalid cast at run time: any type mismatch is detected at compile time.
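As an illustration, here is a minimal sketch of that fixed, strongly-typed approach. The IServerSettings interface, the ServerSettings class and their three properties are hypothetical names chosen for the example, not part of any real API.

```csharp
using System;

// Hypothetical settings contract: every setting is a strongly-typed property.
public interface IServerSettings
{
    string HostName { get; set; }
    int Timeout { get; set; }
    bool UseCompression { get; set; }
}

// One possible implementation: auto-properties hold the values directly.
public class ServerSettings : IServerSettings
{
    public string HostName { get; set; } = "localhost";
    public int Timeout { get; set; } = 30;
    public bool UseCompression { get; set; }
}
```

Assigning a string to Timeout (s.Timeout = "60";) would simply not compile, which is exactly the safety we are after; but adding a fourth property to IServerSettings would force every existing implementation to change.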
However, this straightforward solution does not meet our flexibility constraint: if the properties are hardcoded in the interface, adding a new property or changing an existing one in the future will break the contract with the existing implementing classes. Therefore, this is not an option in our specific case, although it is obviously the easiest and most common way to create a heterogeneous collection!
Anonymous access method
An alternative way to “hide” the variables is to define a single GetValue method, which takes a variable name as a parameter and returns the corresponding value as a weakly-typed object.
object GetValue(string variableName);
A big advantage of this solution is that you can add or remove variables directly inside the implementing class, since the accessor (at the interface level) is the same for all of them. Therefore, this solution meets our first constraint (flexibility). However, it does not fully meet our second constraint (safety), since the caller has to explicitly cast the returned object to the right type, which is not without risk (such a failure can, and indeed must, be handled by catching an InvalidCastException).
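To make the trade-off concrete, here is a minimal sketch of the weakly-typed accessor, assuming the values are kept in a private Dictionary&lt;string, object&gt;. The SetValue method, the setting names and the fallback value of -1 are hypothetical additions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical weakly-typed contract: one accessor for every setting.
public interface ISettings
{
    object GetValue(string variableName);
    void SetValue(string variableName, object value); // hypothetical write accessor
}

public class DictionarySettings : ISettings
{
    // Storage is private to the implementation, so settings can be added
    // or removed here without touching the interface.
    private readonly Dictionary<string, object> values = new Dictionary<string, object>
    {
        ["HostName"] = "localhost",
        ["Timeout"] = 30
    };

    public object GetValue(string variableName)
    {
        if (!values.TryGetValue(variableName, out var value))
            throw new KeyNotFoundException($"Unknown setting '{variableName}'.");
        return value;
    }

    public void SetValue(string variableName, object value) => values[variableName] = value;
}

public static class WeakAccessExample
{
    // The caller must cast explicitly; a wrong guess only surfaces at run time.
    public static int ReadTimeout(ISettings settings)
    {
        try
        {
            return (int)settings.GetValue("Timeout");
        }
        catch (InvalidCastException)
        {
            return -1; // hypothetical fallback chosen by the caller
        }
    }
}
```

Note that the (int) cast on a boxed value is an unboxing conversion: it fails with an InvalidCastException as soon as the stored object is not exactly an int, even if it is a compatible numeric type such as a long.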
Generic access method
A better solution to our problem can be derived from the previous access method, thanks to a simple C# mechanism called “generics”. Instead of delaying the cast, we can force the caller to specify a type when calling the method. This is done by adding a type parameter, noted T here, to the method signature.
T GetValue<T>(string variableName);
The main advantage of anticipating the cast is that the GetValue method knows the expected return type. This is a fundamental difference from an explicit cast, because the only outcome of an invalid explicit cast in .NET is an exception. Conversely, if the GetValue method anticipates an invalid cast, it can react before any exception is thrown (for example by converting the value or reporting a more meaningful error), making the mechanism more robust.
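Here is one possible sketch of such an anticipating GetValue&lt;T&gt;, again assuming a private dictionary as storage. The fallback policy (trying Convert.ChangeType before reporting a descriptive InvalidCastException) is my own assumption; returning default(T) or a caller-supplied default would be equally valid choices. ITypedSettings and GenericSettings are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public interface ITypedSettings
{
    T GetValue<T>(string variableName);
    void SetValue<T>(string variableName, T value); // hypothetical write accessor
}

public class GenericSettings : ITypedSettings
{
    private readonly Dictionary<string, object> values = new Dictionary<string, object>();

    public void SetValue<T>(string variableName, T value) => values[variableName] = value;

    public T GetValue<T>(string variableName)
    {
        if (!values.TryGetValue(variableName, out var stored))
            throw new KeyNotFoundException($"Unknown setting '{variableName}'.");

        // The requested type is known here, so we can check before casting.
        if (stored is T typed)
            return typed;

        // Mismatch detected: try a conversion instead of letting a cast fail.
        if (stored is IConvertible)
        {
            try
            {
                return (T)Convert.ChangeType(stored, typeof(T), CultureInfo.InvariantCulture);
            }
            catch (FormatException) { }      // e.g. "localhost" -> int
            catch (InvalidCastException) { } // no conversion path exists
            catch (OverflowException) { }    // value out of range for T
        }

        // Still no match: report an error that names the setting and both types.
        throw new InvalidCastException(
            $"Setting '{variableName}' is a {stored?.GetType().Name ?? "null"}, not a {typeof(T).Name}.");
    }
}
```

With this policy, a value stored as the string "3" can be read as an int and a value stored as 30 can be read as a string, while reading "localhost" as an int still fails, but with a message that names the setting.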
Note that generics also bring some flexibility, as the method can return different representations of a value depending on the requested type. For example, a multi-valued field could be returned as a list or as a single element (which may correspond to the best match or to the first element of the list).
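The multi-valued case could be sketched as follows, assuming each value is stored as a List&lt;string&gt; and that the single-element interpretation is simply the first item (a best-match strategy would replace that branch). The MultiValueSettings class and the Servers setting are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class MultiValueSettings
{
    // Hypothetical multi-valued setting stored as a list of strings.
    private readonly Dictionary<string, List<string>> values = new Dictionary<string, List<string>>
    {
        ["Servers"] = new List<string> { "primary.example", "backup.example" }
    };

    public T GetValue<T>(string variableName)
    {
        if (!values.TryGetValue(variableName, out var list))
            throw new KeyNotFoundException($"Unknown setting '{variableName}'.");

        // Whole list requested: List<string>, IList<string>, IEnumerable<string>, ...
        if (typeof(T).IsAssignableFrom(typeof(List<string>)))
            return (T)(object)new List<string>(list); // defensive copy

        // Single element requested: this sketch returns the first item.
        if (typeof(T) == typeof(string))
        {
            if (list.Count == 0)
                throw new InvalidOperationException($"Setting '{variableName}' is empty.");
            return (T)(object)list[0];
        }

        throw new InvalidCastException($"Setting '{variableName}' cannot be read as {typeof(T).Name}.");
    }
}
```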
As a side note, this last solution has only been possible since .NET 2.0, because generics did not exist in earlier versions of the framework; before that, you had to use non-generic collections such as ArrayList, which store every element as a plain object.
The best of both worlds
Whatever solution is chosen, the variables must be stored somewhere before being accessed. An interesting approach is to use properties anyway in the implementing class, even though consumers only access them through the GetValue method. This ensures a clean separation between the actual fields and the generic access method.
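A minimal sketch of this hybrid design, assuming a hypothetical PropertyBackedSettings class: the values live in ordinary typed properties, marked internal so that external consumers can only reach them through GetValue&lt;T&gt;, and a small name-to-getter map connects the two. Reflection (Type.GetProperty) would avoid maintaining the map, at some run-time cost.

```csharp
using System;
using System.Collections.Generic;

public interface ITypedSettings
{
    T GetValue<T>(string variableName);
}

public class PropertyBackedSettings : ITypedSettings
{
    // Actual storage: ordinary strongly-typed properties, not exposed by the interface.
    internal string HostName { get; set; } = "localhost";
    internal int Timeout { get; set; } = 30;

    // Hypothetical name-to-getter map; adding a setting means one property plus one entry.
    private readonly Dictionary<string, Func<object>> getters;

    public PropertyBackedSettings()
    {
        getters = new Dictionary<string, Func<object>>
        {
            [nameof(HostName)] = () => HostName,
            [nameof(Timeout)] = () => Timeout
        };
    }

    public T GetValue<T>(string variableName)
    {
        if (!getters.TryGetValue(variableName, out var getter))
            throw new KeyNotFoundException($"Unknown setting '{variableName}'.");

        object value = getter();
        if (value is T typed)
            return typed;

        throw new InvalidCastException(
            $"Setting '{variableName}' is a {value?.GetType().Name ?? "null"}, not a {typeof(T).Name}.");
    }
}
```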
Conversely, you can imagine that the variables are generated dynamically and therefore cannot be stored as fixed properties, even inside the class. This is the case, for instance, of the DataRow class (System.Data namespace): its columns are defined at run time by its parent DataTable, and can even be added after the row has been created, yet its values can be read in a typed way through the Field<T> extension method. An object similar to DataRow would therefore be an excellent candidate for handling heterogeneous collections generically.
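For comparison, here is how the DataRow approach looks in practice. The table and column names and the DataRowExample helper are made up for the example; DataTable, DataRow and the Field&lt;T&gt; extension method (System.Data.DataRowExtensions, available since .NET 3.5) are real framework types. The sketch also adds a column after the row already exists, to show that the set of fields is not fixed.

```csharp
using System;
using System.Data;

public static class DataRowExample
{
    // Builds a one-row table whose columns are only defined at run time.
    public static DataRow CreateSettingsRow()
    {
        var table = new DataTable("Settings");
        table.Columns.Add("HostName", typeof(string));
        table.Columns.Add("Timeout", typeof(int));

        DataRow row = table.NewRow();
        row["HostName"] = "localhost";
        row["Timeout"] = 30;
        table.Rows.Add(row);

        // A column added after the row exists: the row now holds DBNull for it.
        table.Columns.Add("Retries", typeof(int));
        return row;
    }

    // Typed reads through the Field<T> extension method.
    public static (string HostName, int Timeout, int? Retries) ReadSettings(DataRow row) =>
        (row.Field<string>("HostName"), row.Field<int>("Timeout"), row.Field<int?>("Retries"));
}
```

Field&lt;T&gt; also handles DBNull: reading the new Retries column as int? returns null, whereas reading it as int throws an InvalidCastException.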