Advanced C#
For a Sharper Language
Throughout the earlier chapters, we’ve focused on the theoretical aspects of Object-Orientation, and discussed how those are embodied in the C# language. Before we close this section though, it would be a good idea to recognize that C# is not just an object-oriented language, but actually draws upon many ideas and syntax approaches that are not very object-oriented at all!
In this chapter, we’ll examine many aspects of C# that fall outside of the object-oriented mold. Understanding how and why these constructs have entered C# will help make you a better .NET programmer, and hopefully alleviate any confusion you might have.
Some key terms to learn in this chapter are:
The static keyword
It is important to understand that C# is a production language - i.e. one intended to be used to create real-world software. To support this goal, the developers of the C# language have made many efforts to make C# code easier to write, read, and reason about. Each new version of C# has added additional syntax and features to make the language more powerful and easier to use. In some cases, these are entirely new things the language couldn’t do previously, and in others they are syntactic sugar - a kind of abbreviation of an existing syntax. Consider the following if statement:
if(someTestFlag)
{
    DoThing();
}
else
{
    DoOtherThing();
}
As each branch executes only a single statement, the braces can be omitted and this can be abbreviated as:
if(someTestFlag) DoThing();
else DoOtherThing();
Similarly, Visual Studio has evolved side-by-side with the language. For example, you have probably come to like IntelliSense - Visual Studio’s ability to offer descriptions of classes and methods as you type them, as well as code completion, where it offers to complete the statement you have been typing with a likely target. As we mentioned in our section on learning programming, these powerful features can be great for a professional, but can interfere with a novice programmer’s learning.
Let’s take a look at some of the features of C# that we haven’t examined in detail yet.
To start, let’s revisit one more keyword that causes a lot of confusion for new programmers, static. We mentioned it briefly when talking about encapsulation and modules, and said we could mimic a module in C# with a static class. We offered this example:
/// <summary>
/// A library of vector math functions
/// </summary>
public static class VectorMath
{
    /// <summary>
    /// Computes the dot product of two vectors
    /// </summary>
    public static double DotProduct(Vector3 a, Vector3 b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    /// <summary>
    /// Computes the magnitude of a vector
    /// </summary>
    public static double Magnitude(Vector3 a) {
        return Math.Sqrt(Math.Pow(a.x, 2) + Math.Pow(a.y, 2) + Math.Pow(a.z, 2));
    }
}
You’ve probably worked with the C# Math class before, which is declared the same way - as a static class containing static methods. For example, to compute 8 cubed, you might have used:
Math.Pow(8, 3);
Notice how we didn’t construct an object from the Math class? In C# we cannot construct static classes - they simply exist as a container for static fields and methods. If you’re thinking that doesn’t sound very object-oriented, you’re absolutely right. The static keyword allows for some very non-object-oriented behavior more in line with imperative languages like C. Bringing the idea of static classes into C# let programmers with an imperative background use similar techniques to what they were used to, which is why static classes have been a part of C# from the beginning.
You can also create static methods within a non-static class or struct. For example, we could refactor our Vector3 struct to add a static DotProduct() within it:
public struct Vector3 {
    public double X {get; set;}
    public double Y {get; set;}
    public double Z {get; set;}

    /// <summary>
    /// Creates a new Vector3 object
    /// </summary>
    public Vector3(double x, double y, double z) {
        this.X = x;
        this.Y = y;
        this.Z = z;
    }

    /// <summary>
    /// Computes the dot product of this vector and another one
    /// </summary>
    /// <param name="other">The other vector</param>
    public double DotProduct(Vector3 other) {
        return this.X * other.X + this.Y * other.Y + this.Z * other.Z;
    }

    /// <summary>
    /// Computes the dot product of two vectors
    /// </summary>
    /// <param name="a">The first vector</param>
    /// <param name="b">The second vector</param>
    public static double DotProduct(Vector3 a, Vector3 b)
    {
        return a.DotProduct(b);
    }
}
This method would be invoked like any other static method, i.e.:
Vector3 a = new Vector3(1,3,4);
Vector3 b = new Vector3(4,3,1);
Vector3.DotProduct(a, b);
You can see we’re doing the same thing as the instance method DotProduct(Vector3 other), but in a library-like way.
We can also declare fields as static, which has a meaning slightly different than static methods. Specifically, the field is shared amongst all instances of the class. Consider the following class:
public class Tribble
{
    private static int count = 1;

    public Tribble()
    {
        count *= 2;
    }

    public int TotalTribbles
    {
        get
        {
            return count;
        }
    }
}
If we create a single Tribble, and then ask how many total Tribbles there are:
var t = new Tribble();
Console.WriteLine(t.TotalTribbles); // expect this to be 2
We would expect the value to be 2, as count was initialized to 1 and then multiplied by 2 in the Tribble constructor. But if we construct two Tribbles:
var t = new Tribble();
var u = new Tribble();
Console.WriteLine(t.TotalTribbles); // will be 4
Console.WriteLine(u.TotalTribbles); // will be 4
This is because all instances of Tribble share the count field. So it is initialized to 1, multiplied by 2 when tribble t was constructed, and multiplied by 2 again when tribble u was constructed. Hence $1 * 2 * 2 = 4$. Every additional Tribble we construct will double the total population (which is the trouble with Tribbles).
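The shared field can be observed directly with a small, self-contained sketch. The Tribble class is adapted from the example above; the static Population property and the Program class are illustrative additions that are not part of the original example.

```csharp
using System;

public class Tribble
{
    // One copy of this field exists for the whole class, not one per instance
    private static int count = 1;

    public Tribble()
    {
        count *= 2;
    }

    // Instance property, as in the example above
    public int TotalTribbles => count;

    // Hypothetical addition: a static property can be read without any instance
    public static int Population => count;
}

public static class Program
{
    public static void Main()
    {
        var t = new Tribble();
        Console.WriteLine(t.TotalTribbles);    // 2
        var u = new Tribble();
        Console.WriteLine(t.TotalTribbles);    // 4
        Console.WriteLine(u.TotalTribbles);    // 4
        Console.WriteLine(Tribble.Population); // 4
    }
}
```

Because count belongs to the class itself, reading it through t, through u, or through the class name always yields the same number.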
Which brings us to a point of confusion for most students: why call this static? After all, doesn’t the word static indicate unchanging?
The answer lies in how memory is allocated in a program. Sometimes we know in advance how much memory we need to hold a variable, e.g. a double in C# requires 64 bits of memory. We call these types value types in C#, as the value is stored directly in memory where our variable is allocated. For other types, e.g. a List<T>, we may not know exactly how much memory will be required. We call these reference types. Instead of the variable holding a binary value, it holds a binary address to another location in memory where the list data is stored (hence, it is a reference).
When your program runs, it gets assigned a big chunk of memory from the operating system. Your program is loaded into the first part of this memory, and the remaining memory is used to hold variable values as the program runs. If you imagine that memory as a long shelf, we put the program instructions and any literal values to the far left of this shelf. Then, as the program runs and we need to create space for variables, we either put them on the left side or right side of the remaining shelf space. Local variables of value types, which we know will only exist for the duration of their scope (i.e. the method they are defined in) go to the left, and once we’ve ended that scope we remove them. Similarly, the references we create (holding the address of memory of reference types) go on the left. The data of the reference types however, go on the right, because we don’t know when we’ll be done with them.
The word static describes a third kind of allocation: memory that is set aside once, when the program (or, in C#, the class) is loaded, and that stays in one place for as long as the program runs, much like the instructions and literal values at the far left of the shelf. It is static in the sense that its location and lifetime are fixed in advance, not in the sense that its value cannot change. Hence, the static keyword: a static field is allocated once for the whole class rather than once per instance. In lower-level languages like C, we have to manually allocate space for data whose size or lifetime is not known in advance (this is called dynamic allocation, the opposite of static). C# is a memory managed language in that we don’t need to manually allocate and deallocate space for reference types, but we do allocate space every time we use the new keyword, and the garbage collector frees any space it decides we’re done with (because we no longer have references pointing at it). So pointers do exist in C#, they are just “under the hood”.
By the way, the left side of the shelf we call the Stack, and the right the Heap. This is the source of the name for a stack overflow exception (StackOverflowException in C#) - it means your program used up all the available space in the Stack, but still needs more. This is why it typically happens with runaway recursion - each call places another set of variables on the stack, until it runs out of space.
Memory allocation and pointers are covered in detail in CIS 308 - C Language Lab, and you’ll learn more about how programs run and the heap and stack in CIS 450 - Computer Architecture and Operations.
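The practical difference between value types and reference types described above shows up when a variable is copied. The following is a minimal sketch; the class and method names (CopyExample, CopyValue, CopyReference) are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CopyExample
{
    // Value type: assignment copies the value itself
    public static int CopyValue()
    {
        int a = 1;
        int b = a;   // b receives its own copy of the value 1
        b = 2;       // changing b has no effect on a
        return a;
    }

    // Reference type: assignment copies the address, not the list data
    public static int CopyReference()
    {
        List<int> first = new List<int> { 1, 2, 3 };
        List<int> second = first; // both variables now refer to the same list
        second.Add(4);            // so the change is visible through first as well
        return first.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CopyValue());     // 1
        Console.WriteLine(CopyReference()); // 4
    }
}
```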
C# allows you to overload most of the language’s operators to provide class-specific functionality. The user-defined casts we discussed earlier are one example of this.
Perhaps the most obvious of these are the arithmetic operators, e.g. +, -, *, and /. Consider our Vector3 class we defined earlier. If we wanted to overload the + operator to allow for vector addition, we could add it to the class definition:
/// <summary>
/// A class representing a 3-element vector
/// </summary>
public class Vector3
{
    /// <summary>The x-coordinate</summary>
    public double X { get; set;}

    /// <summary>The y-coordinate</summary>
    public double Y { get; set;}

    /// <summary>The z-coordinate</summary>
    public double Z { get; set;}

    /// <summary>
    /// Constructs a new vector
    /// </summary>
    public Vector3(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    /// <summary>
    /// Adds two vectors using vector addition
    /// </summary>
    public static Vector3 operator +(Vector3 v1, Vector3 v2)
    {
        return new Vector3(v1.X + v2.X, v1.Y + v2.Y, v1.Z + v2.Z);
    }
}
Note that we have to make the method static, and include the operator keyword, along with the symbol of the operation. This vector addition we are performing here is also a binary operation (meaning it takes two parameters). We can also define unary operations, like negation:
/// <summary>
/// Negates a vector
/// </summary>
public static Vector3 operator -(Vector3 v)
{
    return new Vector3(-v.X, -v.Y, -v.Z);
}
The full list of overloadable operators is found in the C# documentation.
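Once an operator is overloaded, it is used exactly like the built-in version. The sketch below repeats a compact form of the Vector3 class from above (written with expression bodies to keep it short) and adds a hypothetical OperatorExample class to show the operators in use.

```csharp
using System;

public class Vector3
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public Vector3(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Binary operator: vector addition
    public static Vector3 operator +(Vector3 v1, Vector3 v2) =>
        new Vector3(v1.X + v2.X, v1.Y + v2.Y, v1.Z + v2.Z);

    // Unary operator: negation
    public static Vector3 operator -(Vector3 v) =>
        new Vector3(-v.X, -v.Y, -v.Z);
}

public static class OperatorExample
{
    public static void Main()
    {
        var a = new Vector3(1, 3, 4);
        var b = new Vector3(4, 3, 1);

        Vector3 sum = a + b;  // calls operator +(a, b)
        Vector3 negated = -a; // calls operator -(a)

        Console.WriteLine(sum.X + " " + sum.Y + " " + sum.Z);             // 5 6 5
        Console.WriteLine(negated.X + " " + negated.Y + " " + negated.Z); // -1 -3 -4
    }
}
```

Overloading + also makes the compound form a += b available, since C# evaluates it as a = a + b.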
Generics expand the type system of C# by allowing classes and structs to be defined with a generic type parameter, which is replaced by a concrete type when the class or struct is used in code. This avoids the necessity of writing similar specialized classes that each work with a different data type. You’ve used examples of this extensively in your CIS 300 - Data Structures course.
For example, the generic List<T> can be used to create a list of any type. If we want a list of integers, we declare it using List<int>, and if we want a list of booleans we declare it using List<bool>. Both use the same generic list class.
You can declare your own generics as well. Say you need a binary tree, but want to be able to support different types. We can declare a generic BinaryTreeNode<T> class:
/// <summary>
/// A class representing a node in a binary tree
/// </summary>
/// <typeparam name="T">The type to hold in the tree</typeparam>
public class BinaryTreeNode<T>
{
    /// <summary>
    /// The value held in this node of the tree
    /// </summary>
    public T Value { get; set; }

    /// <summary>
    /// The left branch of this node
    /// </summary>
    public BinaryTreeNode<T> Left { get; set; }

    /// <summary>
    /// The right branch of this node
    /// </summary>
    public BinaryTreeNode<T> Right { get; set; }
}
Note the use of <typeparam> in the XML comments. You should always document your generic type parameters when using them.
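To illustrate how one generic definition serves several types, here is a small sketch that builds trees of int and of string from the BinaryTreeNode<T> class above. The TreeExample class and its Count<T> method are hypothetical helpers added for this illustration.

```csharp
using System;

public class BinaryTreeNode<T>
{
    public T Value { get; set; }
    public BinaryTreeNode<T> Left { get; set; }
    public BinaryTreeNode<T> Right { get; set; }
}

public static class TreeExample
{
    // A generic method: works for a tree holding any type T
    public static int Count<T>(BinaryTreeNode<T> node)
    {
        if (node == null) return 0;
        return 1 + Count(node.Left) + Count(node.Right);
    }

    public static void Main()
    {
        var numbers = new BinaryTreeNode<int>
        {
            Value = 5,
            Left = new BinaryTreeNode<int> { Value = 3 },
            Right = new BinaryTreeNode<int> { Value = 8 }
        };
        var words = new BinaryTreeNode<string>
        {
            Value = "m",
            Left = new BinaryTreeNode<string> { Value = "a" }
        };

        Console.WriteLine(Count(numbers)); // 3
        Console.WriteLine(Count(words));   // 2
    }
}
```

Note that the compiler infers T from the argument, so Count(numbers) does not need to be written as Count<int>(numbers).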
Returning to the distinction between value and reference types, a value type stores its value directly in the variable, while a reference type stores an address to another location in memory that has been allocated to hold the value. This is why reference types can be null - this indicates they aren’t pointing at anything. In contrast, value types cannot be null - they always contain a value. However, there are times it would be convenient to have a value type be allowed to be null.
For these circumstances, we can use the Nullable<T> structure, which allows a value type to also hold null. It does this by wrapping the value in a simple structure that stores the value in its Value property, and also has a boolean property for HasValue. More importantly, it supports explicit casting into the underlying type, so we can still use it in expressions, e.g.:
Nullable<int> a = 5;
int b = 6;
int c = (int)a + b;
// This evaluates to 11.
However, if the value is null when we cast it, we’ll get an InvalidOperationException with the message “Nullable object must have a value”.
There is also syntactic sugar for declaring nullable types. We can follow the type with a question mark (?), i.e.:
int? a = 5;
Which works the same as Nullable<int> a = 5;, but is less typing.
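The following sketch pulls the pieces of Nullable<T> together, including ways to read a nullable safely without triggering the exception. The NullableExample class and its two helper methods are hypothetical names for this illustration; HasValue, Value, GetValueOrDefault(), and the ?? operator are standard C#.

```csharp
using System;

public static class NullableExample
{
    // The ?? operator supplies a fallback when the nullable holds null
    public static int ValueOrFallback(int? value, int fallback) => value ?? fallback;

    // Reports whether casting the nullable to int throws
    public static bool CastThrows(int? value)
    {
        try
        {
            int unwrapped = (int)value;
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int? a = 5;
        int? missing = null;

        Console.WriteLine(a.HasValue);                   // True
        Console.WriteLine(a.Value);                      // 5
        Console.WriteLine(missing.HasValue);             // False
        Console.WriteLine(ValueOrFallback(missing, -1)); // -1
        Console.WriteLine(missing.GetValueOrDefault());  // 0
        Console.WriteLine(CastThrows(missing));          // True
    }
}
```

Checking HasValue or using ?? before reading the value is usually preferable to casting and catching the exception.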
Another addition to C# (introduced in version 3.0) is anonymous types. These are read-only objects whose type is created by the compiler rather than being defined in code. They are created using syntax very similar to object initializer syntax.
For example, the line:
var name = new { First="Jack", Last="Sprat" };
Creates an anonymous object with properties First and Last and assigns it to the variable name. Note we have to use var, because the compiler-generated type has no name that we could write in a declaration. Anonymous types are primarily used with LINQ, which we’ll cover in the future.
The next topic we’ll cover is lambda syntax. You may remember from CIS 115 the Turing Machine, which was Alan Turing’s theoretical computer he used to prove a lot of theoretical computer science ideas. Another mathematician of the day, Alonzo Church, created his own equivalent of the Turing machine expressed as a formal logic system, Lambda calculus. Broadly speaking, the two approaches do the same thing, but are expressed very differently - the Turing machine is an (imaginary) hardware-based system, while Lambda calculus is a formal symbolic system grounded in mathematical logic. Computer scientists develop familiarity with both conceptions, and some of the most important work in our field is the result of putting them together.
But they do represent two different perspectives, which influenced different programming language paradigms. The Turing machine you worked with in CIS 115 is very similar to assembly language, and the imperative programming paradigm draws strongly upon this approach. In contrast, the logical and functional programming paradigms were more influenced by Lambda calculus. This difference in perspective also appears in how functions are commonly written in these different paradigms. An imperative language tends to define functions something like:
Add(param1, param2)
{
    return param1 + param2;
}
While a functional language might express the same idea as:
(param1, param2) => param1 + param2
This “arrow” or “lambda” syntax has since been adopted as an alternative way of writing functions in many modern languages, including C#. In C#, it is primarily used as syntactic sugar, to replace what would otherwise be a lot of typing to express a simple idea.
Consider the case where we want to search a List<string> AnimalList for a string containing the substring "kitten". The List<T>.Find() method takes a predicate - a method that accepts an item from the list and returns true if it is the item we are looking for. One way to supply it is to define a named method, e.g.:
private static bool FindKittenSubstring(string fullString)
{
    return fullString.Contains("kitten");
}
From this method, we create a predicate:
Predicate<string> findKittenPredicate = FindKittenSubstring;
Then we can pass that predicate into our Find, which returns the first matching string (or null if there is none):
string firstKitten = AnimalList.Find(findKittenPredicate);
This is quite a lot of work to express a simple idea. C# introduced lambda syntax as a way to streamline it. The same operation using lambda syntax is:
string firstKitten = AnimalList.Find((fullString) => fullString.Contains("kitten"));
Much cleaner to write. The C# compiler is converting this lambda expression into a predicate as it compiles, but we no longer have to write it! You’ll see this syntax in your xUnit tests as well as when we cover LINQ. It has also been adapted to simplify writing getters and setters. Consider this case:
public class Person
{
    public string LastName { get; set; }

    public string FirstName { get; set; }

    public string FullName
    {
        get
        {
            return FirstName + " " + LastName;
        }
    }
}
We could instead express this as:
public class Person
{
    public string LastName { get; set; }

    public string FirstName { get; set; }

    public string FullName => FirstName + " " + LastName;
}
In fact, all methods that return the result of a single expression can be written this way:
public static class VectorMath
{
    public static Vector3 Add(Vector3 a, Vector3 b) => new Vector3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
}
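Returning to the list search from earlier in this section, the sketch below shows a named method and a lambda used interchangeably as the predicate, written with expression-bodied methods. The LambdaExample class, its method names, and the sample list contents are assumptions made for this illustration; Find, Exists, and FindAll are standard List<T> methods.

```csharp
using System;
using System.Collections.Generic;

public static class LambdaExample
{
    // Expression-bodied method: the arrow replaces the braces and the return
    public static bool ContainsKitten(string fullString) => fullString.Contains("kitten");

    // Returns the first matching string, or null when nothing matches
    public static string FindFirstKitten(List<string> animals) =>
        animals.Find(animal => animal.Contains("kitten"));

    public static void Main()
    {
        var animalList = new List<string> { "puppy", "grey kitten", "duckling", "black kitten" };

        // A named method and a lambda can both be used where a Predicate<string> is expected
        Predicate<string> fromMethod = ContainsKitten;
        Console.WriteLine(animalList.Find(fromMethod)); // grey kitten
        Console.WriteLine(FindFirstKitten(animalList)); // grey kitten

        // Exists answers the yes/no question; FindAll collects every match
        Console.WriteLine(animalList.Exists(a => a.Contains("kitten")));        // True
        Console.WriteLine(animalList.FindAll(a => a.Contains("kitten")).Count); // 2
    }
}
```

If only a yes/no answer is needed, Exists is the better fit, since Find returns the matching item itself rather than a boolean.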
Pattern matching is another idea common to functional languages that has gradually crept into C#. Pattern matching refers to extracting information from structured data by matching the shape of that data.
We’ve already seen the pattern-matching is operator in our discussion of casting. This allows us to extract the cast version of a variable and assign it to a new one:
if(oldVariable is SpecificType newVariable)
{
    // within this block newVariable is (SpecificType)oldVariable
}
The switch statement is also an example of pattern matching. The traditional version only matched constant values, e.g.:
switch(choice)
{
    case "1":
        // Do something
        break;
    case "2":
        // Do something else
        break;
    case "3":
        // Do a third thing
        break;
    default:
        // Do a default action
        break;
}
However, in C# version 7.0, this has been expanded to also match patterns. For example, given a Square, Circle, and Rectangle class that all extend a Shape class, we can write a method to find the circumference using a switch:
public static double ComputeCircumference(Shape shape)
{
    switch(shape)
    {
        case Square s:
            return 4 * s.Side;
        case Circle c:
            return c.Radius * 2 * Math.PI;
        case Rectangle r:
            return 2 * r.Length + 2 * r.Height;
        default:
            throw new ArgumentException(
                message: "shape is not a recognized shape",
                paramName: nameof(shape)
            );
    }
}
Note that here we match the type of the shape and cast it to that type making it available in the provided variable, i.e. case Square s: matches if shape can be cast to a Square, and s is the result of that cast operation.
This is further expanded upon with the use of when clauses, e.g. we could add a special case for a square with a side of 0 or a circle with a radius of 0, both of which have a circumference of 0:
public static double ComputeCircumference(Shape shape)
{
    switch(shape)
    {
        case Square s when s.Side == 0:
        case Circle c when c.Radius == 0:
            return 0;
        case Square s:
            return 4 * s.Side;
        case Circle c:
            return c.Radius * 2 * Math.PI;
        case Rectangle r:
            return 2 * r.Length + 2 * r.Height;
        default:
            throw new ArgumentException(
                message: "shape is not a recognized shape",
                paramName: nameof(shape)
            );
    }
}
The when clause applies a condition to the match, so that the case only matches when the corresponding condition is true.
C# 8.0 expanded greatly upon pattern matching, adding new features such as the switch expression, along with property, tuple, and positional patterns (the last of which builds on deconstruction).
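As a rough illustration of the switch expression, the ComputeCircumference method from above could be rewritten as follows (this requires C# 8.0 or later). The Shape classes here are minimal stand-ins written for this sketch; only the property names Side, Radius, Length, and Height come from the earlier example, and the ShapeMath class name is made up.

```csharp
using System;

public abstract class Shape { }

public class Square : Shape
{
    public double Side { get; set; }
}

public class Circle : Shape
{
    public double Radius { get; set; }
}

public class Rectangle : Shape
{
    public double Length { get; set; }
    public double Height { get; set; }
}

public static class ShapeMath
{
    // Each arm is "pattern => result"; arms are tried in order, top to bottom
    public static double ComputeCircumference(Shape shape) => shape switch
    {
        Square s when s.Side == 0 => 0,
        Circle c when c.Radius == 0 => 0,
        Square s => 4 * s.Side,
        Circle c => c.Radius * 2 * Math.PI,
        Rectangle r => 2 * r.Length + 2 * r.Height,
        // The discard pattern _ plays the role of default
        _ => throw new ArgumentException(
            message: "shape is not a recognized shape",
            paramName: nameof(shape))
    };

    public static void Main()
    {
        Console.WriteLine(ComputeCircumference(new Square { Side = 2 }));                  // 8
        Console.WriteLine(ComputeCircumference(new Rectangle { Length = 3, Height = 1 })); // 8
        Console.WriteLine(ComputeCircumference(new Circle { Radius = 0 }));                // 0
    }
}
```

Unlike the switch statement, the switch expression produces a value, so it can serve directly as the body of an expression-bodied method.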
In this chapter we looked at some of the features of C# that aren’t directly related to object-orientation, including many drawn from imperative or functional paradigms. Some have been with the language since the beginning, such as the static keyword, while others have been added more recently, like pattern matching.
Each addition has greatly expanded the power and usability of C# - consider generics, whose introduction brought entirely new (and much more performant) library collections like List<T>, Dictionary<TKey, TValue>, and HashSet<T>. Others have led to simpler and cleaner code, like the use of lambda expressions. Perhaps most important is the realization that programming languages often continue to evolve beyond their original conceptions.