Tuesday, September 25, 2012

C# features that would be nice to have.

I would like to present some C# features that, in my opinion, could appear in future versions of the language. This is just my imagination, and I may well not have considered some side effects of this stuff :)


  • Readonly local variables.

  • public void Schedule()
    {
      readonly var year = calendar.CurrentYear();
    }
    

    This is something I've seen in Java, where locals can be declared final. In C#, only class-level fields can be readonly. I probably wouldn't use it for every local. I was rather thinking of a scenario where a lambda captures the variable and I would like to disallow any writes to it.

  • Namespace-scoped generic aliases.

  • Sometimes it's really useful to alias a lengthy generic type name. C# supports creating a file-local alias, for example:

    using Pair = KeyValuePair<int, IEnumerable<Tuple<decimal, string>>>;
    

    It would be great if, once defined, such an alias could be used anywhere in the defining namespace.

  • SQL-like in operator.

  • if(variable in (a, b))
    

    This would be neater than

    if(variable == a || variable == b) ...
    

    or

    if(new [] {a, b}.Contains(variable)) ...
    

  • Implicit conversion of equivalent delegates.

  • Delegates with identical signatures could be convertible to each other. Currently, for example, the following code won't compile.

    delegate void ActionHandler();
    ActionHandler ah = () => { };
    Action a = ah;
    

    The error is:

    Error: cannot convert source type 'ActionHandler' to target type 'System.Action'
    

    This case does not actually occur often, though I see no reason why such a conversion would be impossible. It is most commonly seen with two built-in delegates, Predicate<T> and Func<T,bool>: they represent the same signature but cannot be implicitly converted to each other.

  • Enhanced generic constraints.

  • Constraint for types that allow arithmetic/comparison operations. This could be used with built-in numeric types and for classes that overload +,-,*,/, < etc. operators. For example:

    public class Calculator<T> where T : numeric
    {
      public T Sum(IEnumerable<T> elements)
      {
        T sum = 0;
        foreach(var element in elements)
        {
          sum += element;
        }
        return sum;
      }
    }
    

    Also, new generic type constraints for enum and a new constraint that could impose constructors with parameters would be nice.

  • Implicit typing for delegates.

  • The compiler won't accept the following declaration:

    var play = (f, d) => Console.Beep(f, d);
    

    Error: Cannot assign lambda expression to an implicitly-typed local variable.
    

    Probably because this is ambiguous between

    Action<int, int> play = (f, d) => Console.Beep(f, d);
    Expression<Action<int, int>> play = (f, d) => Console.Beep(f, d);
    

    It would be nice if the inference mechanism always treated a lambda assigned to a var declaration as a delegate type. If an Expression were required, it would have to be declared explicitly.

  • Anonymous types implementing an interface.

  • This is a feature I really hope will be incorporated someday.
    public interface IRepository
    {
      void Add(Item item);
      Item Peek();
    }
    

    It could be possible to implement that interface with an anonymous class like this.

    public IRepository Any()
    {
      return new : IRepository
      {
        Item i = null;
        public void Add(Item item)
        {
          i = item;
        }
        public Item Peek()
        {
          return i;
        }
      }
    }
    

    or:

    public IRepository Any()
    {
      Item i = null;
      return new : IRepository
      {
        Add = (Item item) => i = item;
        Peek = () => i;
      }
    }
    

  • Events that do not throw an exception when no handlers are attached.

  • This is how some people try to prevent a NullReferenceException when the invocation list is empty.

    public event EventHandler MyEvent = delegate { };
    

    I would prefer this as a built-in mechanism, of course only for delegates that return void.
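
To make the last item concrete, here is a minimal sketch contrasting an event with no subscribers and one initialized with an empty delegate. The class names PlainPublisher and SafePublisher are hypothetical and exist only for this illustration. The null-conditional call shown in RaiseSafe is available from C# 6 onwards and is another common way to get the same effect.

```csharp
using System;

public class PlainPublisher
{
    public event EventHandler MyEvent;

    // Throws NullReferenceException when nobody has subscribed,
    // because the backing delegate field is still null.
    public void RaiseUnsafe()
    {
        MyEvent(this, EventArgs.Empty);
    }

    // Null-conditional invocation (C# 6 and later): does nothing
    // when there are no subscribers.
    public void RaiseSafe()
    {
        MyEvent?.Invoke(this, EventArgs.Empty);
    }
}

public class SafePublisher
{
    // The empty anonymous method guarantees a non-empty invocation list.
    public event EventHandler MyEvent = delegate { };

    public void Raise()
    {
        MyEvent(this, EventArgs.Empty);
    }
}
```

Calling RaiseUnsafe on a fresh PlainPublisher throws, while RaiseSafe and SafePublisher.Raise complete without any handler attached.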

Friday, September 21, 2012

Hierarchical data and Entity Framework 4

When writing any kind of application, it is not uncommon to deal with hierarchical data. If the data needs to be persisted, a relational database is a perfectly valid choice. Consider developing an online store application where each category of products has a parent category and subcategories. The requirement is that the user can display a given category together with all subcategories below it.

Category objects can be stored in a single table that references itself. The ParentCategoryId column contains the id of the parent category. The root category has a null value for its parent; every other category points to another category. The integrity of the data can be enforced at the database level with a foreign key constraint.
A Category table with a one-to-many relationship (each category has a single parent category and can have multiple subcategories).
In this blog post I would like to present two options for querying data with a parent-child relationship. Both approaches use Entity Framework as the ORM. In the first technique I traverse the category hierarchy using only the navigation properties provided by Entity Framework. The second approach relies on a database stored procedure.

Preparing data.


To populate the table with test data I created a script that inserts 1000 categories, each on a separate level of nesting.

DECLARE @Level INT = 1
INSERT INTO Category VALUES ('RootCategory', 'Root category', NULL)

WHILE @Level < 1000
BEGIN
 INSERT INTO Category
 SELECT 'Subcategory No. ' + CAST(@Level AS VARCHAR(4)) AS Name,
     '' AS Description,
     SCOPE_IDENTITY() AS ParentCategoryId
 SET @Level += 1
END


RootCategory
 -> Subcategory No. 1
  -> Subcategory No. 2
   -> Subcategory No. 3
     ...
         -> Subcategory No. 999

The data is now prepared for fetching.

Navigation properties.


In the C# application I added an ADO.NET Entity Data Model and renamed the entity container to StoreContext. I generated the model from the database and included only the Category table, leaving the option Include foreign key columns in the model checked. The entity type for Category then appears in the model (as shown in the picture). I renamed the navigation properties to Subcategories and ParentCategory for better readability.
From that moment I was able to create the database context object and retrieve the root category.

var context = new StoreContext();
var root = context.Category
                  .First(c => c.Name.Equals("RootCategory"));

foreach (var subcategory in root.AllSubcategories())
        Console.WriteLine(subcategory.Name);


If you inspect the Subcategories property of root you will see only one direct subcategory. I am not aware of any built-in mechanism in Entity Framework that could load hierarchical data to an arbitrary depth. To load the entire hierarchy I will use the method below, which recursively retrieves all subcategories.

public partial class Category
{
  public IEnumerable<Category> AllSubcategories()
  {
    yield return this;
    foreach (var directSubcategory in Subcategories)
      foreach (var subcategory in directSubcategory.AllSubcategories())
      {
        yield return subcategory;
      }
  }
}


The entity classes are generated as partial, so they may be freely extended.

Stored procedure.


In the stored procedure approach, the entire logic of preparing the data set is executed at the database level. SQL Server (since the 2005 version) supports recursive common table expressions (CTEs). A recursive CTE is a temporary result set that can reference itself, which allows for writing concise queries against hierarchical data. The following stored procedure is designed to query all subcategories of a category passed as a parameter. The CTE is introduced with the WITH keyword.

CREATE PROCEDURE GetCategoriesTree(@RootName NVARCHAR(50))
AS BEGIN

WITH 
  CategoryTree (CategoryId, Name, Description, ParentCategoryId)
  AS
  (
    SELECT CategoryId, Name, Description, ParentCategoryId
    FROM Category
    WHERE Name = @RootName
    UNION ALL
    SELECT cat.CategoryId, cat.Name, cat.Description, cat.ParentCategoryId
    FROM Category cat
      INNER JOIN CategoryTree parent
        ON cat.ParentCategoryId = parent.CategoryId
  )
  
SELECT CategoryId, Name, Description, ParentCategoryId
FROM CategoryTree
OPTION (MAXRECURSION 32767);

END
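
For readers more at home in C# than in T-SQL, the way the two halves of the CTE cooperate can be sketched on an in-memory list. This is only a rough analogue under my own assumptions: CategoryRow and CategoryTreeSketch are hypothetical names, and SQL Server's actual evaluation strategy may differ. The anchor member picks the root, and the recursive member keeps joining children to the rows produced by the previous step until a step yields nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, mirroring the columns of the Category table.
public class CategoryRow
{
    public int CategoryId { get; set; }
    public string Name { get; set; }
    public int? ParentCategoryId { get; set; }
}

public static class CategoryTreeSketch
{
    // Assumes the data contains no cycles; SQL Server would instead
    // stop with an error once MAXRECURSION is exceeded.
    public static List<CategoryRow> GetCategoriesTree(
        IEnumerable<CategoryRow> table, string rootName)
    {
        var rows = table.ToList();

        // Anchor member: WHERE Name = @RootName
        var current = rows.Where(r => r.Name == rootName).ToList();
        var result = new List<CategoryRow>(current);

        // Recursive member: join the table to the rows of the previous step.
        while (current.Count > 0)
        {
            var parentIds = new HashSet<int>(current.Select(p => p.CategoryId));
            current = rows
                .Where(r => r.ParentCategoryId.HasValue
                            && parentIds.Contains(r.ParentCategoryId.Value))
                .ToList();
            result.AddRange(current);
        }

        return result;
    }
}
```

Rows that belong to a different tree are never reached, because they are not joined to any row produced by an earlier step.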


Entity Framework can encapsulate a stored procedure in a very elegant way. A function import lets you select an existing stored procedure and map the returned rows to a collection of entity types. For the conversion to be seamless, the column names should be identical to the property names.
After adding the function import to the model, the function is accessible directly on the database context object.

var spSubcategories = context.GetCategoriesTree("RootCategory");

foreach (var spSubcategory in spSubcategories)
  Console.WriteLine(spSubcategory.Name);


The effect is identical to that of the previous code.

Conclusions.


The two methods do not differ in terms of final results. However, there is a very important difference in performance. To compare the execution times I used Stopwatch.

var context = new StoreContext();

var stopwatch = new Stopwatch();
stopwatch.Start();
var subcategories = context.Category
                    .First(c => c.Name.Equals("RootCategory"))
                    .AllSubcategories()
                    .ToList();
stopwatch.Stop();
Console.WriteLine("Loading {0} cat. with navigation properties took {1} ms",
                  subcategories.Count,
                  stopwatch.ElapsedMilliseconds);

stopwatch.Reset();

stopwatch.Start();
subcategories = context
                    .GetCategoriesTree("RootCategory")
                    .ToList();
stopwatch.Stop();
Console.WriteLine("Loading {0} cat. with stored procedure took {1} ms",
                  subcategories.Count,
                  stopwatch.ElapsedMilliseconds);


//output:
Loading 1000 cat. with navigation properties took 15259 ms
Loading 1000 cat. with stored procedure took 169 ms


The results show a difference of almost two orders of magnitude in favor of the second solution. The test data was a tree-like structure with 1000 levels. Certainly, if the data were less deeply nested, the first method would perform much better (fewer queries to the database). In cases similar to the one presented, you can get huge performance gains by using the CTE approach.
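
The "number of queries" argument can be illustrated without a database. The sketch below is a simulation, not Entity Framework code: FakeDatabase and RoundTripSketch are hypothetical names, and the assumption is that each lazily loaded Subcategories collection costs one round trip while the stored procedure costs one round trip in total.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the database that counts round trips.
public class FakeDatabase
{
    private readonly Dictionary<int, int?> _parentById = new Dictionary<int, int?>();

    public int RoundTrips { get; private set; }

    // Builds a single chain: 1 is the root, 2 is its child, and so on.
    public FakeDatabase(int levels)
    {
        for (int id = 1; id <= levels; id++)
            _parentById[id] = id == 1 ? (int?)null : id - 1;
    }

    // One round trip per call, like lazily loading one Subcategories collection.
    public List<int> LoadChildren(int parentId)
    {
        RoundTrips++;
        return FindChildren(parentId);
    }

    // One round trip for the whole tree, like calling the stored procedure.
    public List<int> LoadTree(int rootId)
    {
        RoundTrips++;
        var result = new List<int> { rootId };
        for (int i = 0; i < result.Count; i++)
            result.AddRange(FindChildren(result[i]));
        return result;
    }

    private List<int> FindChildren(int parentId)
    {
        return _parentById.Where(p => p.Value == parentId)
                          .Select(p => p.Key)
                          .ToList();
    }
}

public static class RoundTripSketch
{
    // Walks the hierarchy node by node, asking for children each time.
    public static int CountLazyRoundTrips(int levels)
    {
        var db = new FakeDatabase(levels);
        var pending = new Stack<int>();
        pending.Push(1);
        while (pending.Count > 0)
        {
            foreach (var child in db.LoadChildren(pending.Pop()))
                pending.Push(child);
        }
        return db.RoundTrips;
    }

    // Fetches the whole hierarchy in a single call.
    public static int CountTreeRoundTrips(int levels)
    {
        var db = new FakeDatabase(levels);
        db.LoadTree(1);
        return db.RoundTrips;
    }
}
```

For a chain of 1000 levels, CountLazyRoundTrips(1000) returns 1000 and CountTreeRoundTrips(1000) returns 1, which matches the direction of the measured difference even though the simulation says nothing about absolute timings.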

The complete example can be downloaded from here.

Friday, September 14, 2012

Building anonymous implementation of an interface with Roslyn

The Moq framework has a nice feature for creating mocks of a type using a functional syntax. Mock.Of<T> can construct a mock implementation conforming to a given predicate. For example:

var shipmentMock = Mock.Of<IShipment>(s => s.GetWeight(WeightUnit.Kg) == 10m);
Assert.AreEqual(10, shipmentMock.GetWeight(WeightUnit.Kg));

One thing that's great about it is that it works with abstract types such as interfaces. On the other hand, such syntax may encourage people to perform too many setups on a mock object, which is generally a bad idea. However, this is not the point right now.

Recently, Microsoft released the Roslyn project, a compiler-as-a-service technology with a powerful scripting engine for C# that can be hosted within an application. In this blog post I will demonstrate how to programmatically assemble a C# class and execute the new operator to create an instance of that class. The goal is to make that class implement an interface, imitating the behavior of the code above. This will be, however, a very limited mechanism that works only with non-generic interfaces that have get-only properties. The input will be an expression of type Predicate<T> for that interface, formatted so that it contains only binary expressions of node type Equal joined by AndAlso nodes. A predicate itself is nothing more than a function that takes an object and returns true or false. The left node of each binary sub-expression should specify the property, and the right node should represent the expected value.

To create an application with such a capability I will perform the following steps.
  1. Examine the properties of the interface using reflection.
  2. Extract the behaviour specified in the expression using the ExpressionVisitor class.
  3. Match the properties (step 1) with the values provided in expressions (step 2).
  4. Construct an interface-implementing C# class with required properties using the methods from Roslyn.Compilers.CSharp.Syntax namespace.
  5. Compile the code and create an instance of the class.

For the purpose of this example I will use the IRoadVehicle interface.

public interface IRoadVehicle
{
  string Name { get; }
  int WheelCount { get; }
  decimal FuelUsage { get; }
  IRoadVehicle Backup { get; }
  Color Color { get; }
  DateTime ManufactureDate { get; }
}

public enum Color { Blue, Red }


Given such an expression, the resulting method will be able to transform it into an object.


var car =
  ImplementationFactory<IRoadVehicle>
  .CreateMatching(v => v.Name == "Ziggo"
                    && v.WheelCount == 4 
                    && v.FuelUsage == 10m 
                    && v.Color == Color.Red
                    && v.ManufactureDate == new DateTime(2010, 10, 01));


Implementation:


public class ImplementationFactory<T>
{
  const string Implementator = "AnyClassImplementingT";
  private const string Instantiate = "new " + Implementator + "();";


  public static T CreateMatching(Expression<Predicate<T>> predicate)
  {
    IEqualitySearch search = 
        new BinaryEqualitySearch(predicate, new ExpressionValidation());

    var mapping = new MappingBetweenInterfacePropertiesAndExpressions
                    (new Properties<T>(), new Expressions(search));
    var codeAssembling = 
        new CodeAssembling(mapping, new ClassAssembling<T>(Implementator));

    var evaluation = new Evaluation<T>(codeAssembling);
    return evaluation.Run(Instantiate);
  }
}


The first thing to do is to examine the interface to get information about all defined properties. The points of interest are the property name and the property type.


public interface IProperties
{
  IEnumerable<Tuple<string, string>> Extract();
}

public class Properties<T> : IProperties
{
  public IEnumerable<Tuple<string, string>> Extract()
  {
    return
      typeof(T)
      .GetProperties()
      .Select(p => Tuple.Create
          (p.Name, p.PropertyType.FullName));
  }
}
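
To see what this extraction yields, the same reflection call can be tried on a small interface. This is a sketch with hypothetical names (IShapeSketch, PropertySketch) and it returns a dictionary instead of tuples purely for easier inspection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface used only for this illustration.
public interface IShapeSketch
{
    string Label { get; }
    int Sides { get; }
    decimal Area { get; }
}

public static class PropertySketch
{
    // Maps each property name to the full name of its type,
    // using the same reflection calls as Properties<T>.Extract.
    public static Dictionary<string, string> Describe<T>()
    {
        return typeof(T)
            .GetProperties()
            .ToDictionary(p => p.Name, p => p.PropertyType.FullName);
    }
}
```

PropertySketch.Describe<IShapeSketch>() maps Label to System.String, Sides to System.Int32 and Area to System.Decimal, which is why the generated class further down uses these framework type names rather than the C# keywords. Note that FullName of a constructed generic type such as List<int> contains a backtick and assembly-qualified type arguments, so property types of that kind would presumably need extra handling before being parsed as C# type names.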


The next thing to do is to decompose the expression in order to pull out the nodes that specify the values for properties.


public interface IEqualitySearch
{
  IEnumerable<BinaryExpression> GetEqualityNodes();
}

public class BinaryEqualitySearch : ExpressionVisitor, IEqualitySearch
{
  private readonly Expression _expression;
  private readonly IExpressionValidation _validation;
  private readonly List<BinaryExpression> _nodes = new List<BinaryExpression>();

  public BinaryEqualitySearch(Expression expression, IExpressionValidation validation)
  {
    _expression = expression;
    _validation = validation;
  }

  public IEnumerable<BinaryExpression> GetEqualityNodes()
  {
    Visit(_expression);
    return _nodes;
  }

  protected override Expression VisitBinary(BinaryExpression node)
  {
    if (_validation.IsSupportedCondition(node))
    {
      if (_validation.IsSupportedExpression(node))
        _nodes.Add(node);
    }
    else throw new NotSupportedException();
    return base.VisitBinary(node);
  }
}
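
It may help to look at the shape of the tree the visitor walks. The following sketch uses hypothetical names (IVehicleSketch, TreeShapeSketch) and a reduced interface; it lists the node types of all binary nodes in the predicate body, parent first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical, reduced version of IRoadVehicle.
public interface IVehicleSketch
{
    string Name { get; }
    int WheelCount { get; }
    decimal FuelUsage { get; }
}

public static class TreeShapeSketch
{
    // Returns the node types of the binary nodes in the predicate body:
    // parent first, then its left subtree, then its right subtree.
    public static List<ExpressionType> BinaryNodeTypes(
        Expression<Predicate<IVehicleSketch>> predicate)
    {
        var types = new List<ExpressionType>();
        Collect(predicate.Body, types);
        return types;
    }

    private static void Collect(Expression node, List<ExpressionType> types)
    {
        var binary = node as BinaryExpression;
        if (binary == null) return; // member accesses and constants are leaves here
        types.Add(binary.NodeType);
        Collect(binary.Left, types);
        Collect(binary.Right, types);
    }
}
```

For v => v.Name == "Ziggo" && v.WheelCount == 4 && v.FuelUsage == 10m the result is AndAlso, AndAlso, Equal, Equal, Equal: the && operator is left-associative, so the AndAlso nodes nest on the left and the Equal nodes are the lowest-level binary expressions.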


An injectable helper class is used to filter out some of the unsupported expressions.


public interface IExpressionValidation
{
  bool IsSupportedCondition(BinaryExpression node);
  bool IsSupportedExpression(BinaryExpression node);
}

public class ExpressionValidation : IExpressionValidation
{
  public bool IsSupportedCondition(BinaryExpression node)
  {
    return node.NodeType == ExpressionType.AndAlso
            || node.NodeType == ExpressionType.Equal;
  }

  public bool IsSupportedExpression(BinaryExpression node)
  {
    return (node.Left is MemberExpression || node.Left is UnaryExpression) &&
            (node.Right is ConstantExpression || node.Right is NewExpression);
  }
}
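
The reason IsSupportedExpression accepts both MemberExpression and UnaryExpression on the left, and both ConstantExpression and NewExpression on the right, becomes visible when individual equalities are inspected. This is a sketch with hypothetical names (PaintSketch, IPartSketch, LeafShapeSketch); the enum case relies on the compiler wrapping the property access in a conversion node, which is consistent with the generated code shown later in the post.

```csharp
using System;
using System.Linq.Expressions;

public enum PaintSketch { Blue, Red }   // hypothetical enum

public interface IPartSketch            // hypothetical interface
{
    string Name { get; }
    PaintSketch Paint { get; }
    DateTime Built { get; }
}

public static class LeafShapeSketch
{
    // Describes a single equality as "Left|Right|text of the right node".
    public static string Describe(Expression<Predicate<IPartSketch>> predicate)
    {
        var equality = (BinaryExpression)predicate.Body;
        return Kind(equality.Left) + "|" + Kind(equality.Right) + "|" + equality.Right;
    }

    private static string Kind(Expression node)
    {
        if (node is MemberExpression) return "Member";
        if (node is UnaryExpression) return "Unary";
        if (node is ConstantExpression) return "Constant";
        if (node is NewExpression) return "New";
        return node.GetType().Name;
    }
}
```

A string comparison such as p => p.Name == "Ziggo" is described as Member|Constant|"Ziggo" (quotes included), and p => p.Built == new DateTime(2010, 10, 1) as Member|New|new DateTime(2010, 10, 1). For the enum comparison p => p.Paint == PaintSketch.Red the left side is reported as Unary, because the property access is wrapped in a conversion to the underlying integer type.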


Now, the lowest level binary expressions are parsed to find out their names and values.


public interface IExpressions
{
  IEnumerable<Tuple<string, string>> Extract();
}

public class Expressions : IExpressions
{
  private readonly IEqualitySearch _expressionsSearch;

  public Expressions(IEqualitySearch expressionsSearch)
  {
    _expressionsSearch = expressionsSearch;
  }

  public IEnumerable<Tuple<string, string>> Extract()
  {
    var expressions = _expressionsSearch.GetEqualityNodes();

    return from e in expressions
            let name = GetName(e.Left)
            where name != null
            select Tuple.Create(name, e.Right.ToString());
  }

  private string GetName(Expression exp)
  {
    var memberExpression = exp as MemberExpression;
    if (memberExpression != null)
      return (memberExpression).Member.Name;
    var unaryExpression = exp as UnaryExpression;
    return unaryExpression != null
    ? ((MemberExpression)((unaryExpression).Operand)).Member.Name : null;
  }
}


The collections of data extracted from the interface properties and from the expression are now joined by name in order to match properties with the values specified in the expression. Note that the code performs a left join, so that every property (from the left set) is preserved. The properties that were not successfully matched will have a null value assigned. They will be given the default value for their type later.


public interface IMappingBetweenInterfacePropertiesAndExpressions
{
  IEnumerable<Getter> Perform();
}

public class MappingBetweenInterfacePropertiesAndExpressions 
              : IMappingBetweenInterfacePropertiesAndExpressions
{
  private readonly IProperties _properties;
  private readonly IExpressions _expressions;

  public MappingBetweenInterfacePropertiesAndExpressions
    (IProperties properties, IExpressions expressions)
  {
    _properties = properties;
    _expressions = expressions;
  }

  public IEnumerable<Getter> Perform()
  {
    var properties = _properties.Extract();
    var expressions = _expressions.Extract();

    return from p in properties
            join e in expressions.GroupBy(i => i.Item1).Select(g => g.First())
              on p.Item1 equals e.Item1
              into matched
            from left in matched.DefaultIfEmpty()
            select new Getter 
        { Name = p.Item1, Type = p.Item2, 
        Value = (matched.Any() ? left.Item2 : null) };
  }
}


The code above returns a collection of Getter objects. Getter is a simple plain class to encapsulate values describing a get property. It does not have any behavior yet.


public class Getter
{
  public string Name { get; set; }
  public string Type { get; set; }
  public string Value { get; set; }
}
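
The left join used in the mapping can be tried in isolation. This is a minimal sketch under my own simplifications: LeftJoinSketch is a hypothetical name, property descriptions are reduced to plain names, and the duplicate filtering done with GroupBy in the original is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeftJoinSketch
{
    // Pairs every property name with the value found for it, or with null
    // when no expression mentioned that property (left outer join).
    public static List<Tuple<string, string>> Match(
        IEnumerable<string> propertyNames,
        IEnumerable<Tuple<string, string>> values)
    {
        var query = from name in propertyNames
                    join v in values on name equals v.Item1 into matched
                    from left in matched.DefaultIfEmpty()
                    select Tuple.Create(name, left == null ? null : left.Item2);
        return query.ToList();
    }
}
```

Given the property names Name, Backup and WheelCount and values for Name and WheelCount only, the result still contains three entries in the original order, with null as the value for Backup.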


Finally, we now have all the essential information to start constructing the code.


public interface ICodeAssembling
{
  string Perform();
}

public class CodeAssembling : ICodeAssembling
{
  private readonly IMappingBetweenInterfacePropertiesAndExpressions _mapping;
  private readonly IClassAssembling _classAssembling;

  public CodeAssembling(IMappingBetweenInterfacePropertiesAndExpressions mapping, 
                        IClassAssembling classAssembling)
  {
    _mapping = mapping;
    _classAssembling = classAssembling;
  }

  public string Perform()
  {
    var getters = _mapping.Perform();

    foreach (var getter in getters)
      _classAssembling.AddGetter(getter);

    return _classAssembling.Perform();
  }
}


This class uses the IClassAssembling interface, which is meant to encapsulate the Roslyn API for emitting code. The AddGetter method checks whether the property has a return value specified; if not, it supplies the default value for the type. The code looks a little verbose since the syntax is heavily nested.


public interface IClassAssembling
{
  void AddGetter(Getter getter);
  string Perform();
}

public class ClassAssembling<T> : IClassAssembling
{
  private readonly List<PropertyDeclarationSyntax> _properties
    = new List<PropertyDeclarationSyntax>();
  private readonly string _className;

  public ClassAssembling(string className)
  {
    _className = className;
  }

  public void AddGetter(Getter getter)
  {
    var value = getter.Value == null
      ? Syntax.DefaultExpression(Syntax.ParseTypeName(getter.Type))
      : Syntax.ParseExpression(getter.Value);

    var getterCode =
      Syntax.PropertyDeclaration(Syntax.ParseTypeName(getter.Type), getter.Name)
        .WithModifiers(Syntax.TokenList(Syntax.Token(SyntaxKind.PublicKeyword)))
        .WithAccessorList
        (Syntax.AccessorList
            (Syntax.AccessorDeclaration
              (SyntaxKind.GetAccessorDeclaration,
                Syntax.Block(Syntax.List((StatementSyntax)
                  Syntax.ReturnStatement
                      (Syntax.CastExpression(Syntax.ParseTypeName(getter.Type)
                        , value))))
              )
            )
        );
    _properties.Add(getterCode);
  }

  public string Perform()
  {
    var code = Syntax.CompilationUnit()
      .WithMembers(
        Syntax.List<MemberDeclarationSyntax>(
          Syntax.ClassDeclaration(Syntax.Identifier(_className))
            .WithKeyword(Syntax.Token(SyntaxKind.ClassKeyword))
            .WithModifiers(Syntax.TokenList
              (Syntax.Token(SyntaxKind.PublicKeyword)))
            .AddBaseListTypes(Syntax.ParseTypeName(typeof(T).FullName))
            .WithOpenBraceToken(
              Syntax.Token(
                SyntaxKind.OpenBraceToken))
            .WithCloseBraceToken(
              Syntax.Token(
                SyntaxKind.CloseBraceToken))));

    var currentClass = code.ChildNodes()
        .OfType<ClassDeclarationSyntax>().Single();

    var classWithProperties =
        currentClass.AddMembers(_properties.ToArray());

    var afterReplacement =
        code.ReplaceNode(currentClass, classWithProperties);

    return  afterReplacement.Format().GetFormattedRoot().GetFullText();
  }
}


If you inspect the string that is returned from the Perform method you can see that it is a valid C# class declaration.


public class AnyClassImplementingT : PredicateImplementation.IRoadVehicle
{
  public System.String Name
  {
    get
    {
      return (System.String)"Ziggo";
    }
  }

  public System.Int32 WheelCount
  {
    get
    {
      return (System.Int32)4;
    }
  }

  public System.Decimal FuelUsage
  {
    get
    {
      return (System.Decimal)10;
    }
  }

  public PredicateImplementation.IRoadVehicle Backup
  {
    get
    {
      return (PredicateImplementation.IRoadVehicle)
      default(PredicateImplementation.IRoadVehicle);
    }
  }

  public PredicateImplementation.Color Color
  {
    get
    {
      return (PredicateImplementation.Color)1;
    }
  }

  public System.DateTime ManufactureDate
  {
    get
    {
      return (System.DateTime)new DateTime(2010, 10, 1);
    }
  }
}


At last, we can use the scripting engine to compile the code and execute a query against it. The query will be just a command with the new operator.



public class Evaluation<T>
{
  private readonly ICodeAssembling _codeAssembling;
  private readonly ScriptEngine _engine;
  private readonly Session _session;

  public Evaluation(ICodeAssembling codeAssembling)
  {
    _codeAssembling = codeAssembling;
    _engine = new ScriptEngine(importedNamespaces: 
        new[] { "System", typeof(T).Namespace });
    _session = Session.Create();
    _session.AddReference(typeof(T).Assembly);
  }

  public T Run(string query)
  {
    var code = _codeAssembling.Perform();
    _engine.Execute(code, _session);
    var result = _engine.CompileSubmission<T>(query, _session);
    return result.Execute();
  }
}


By calling the CreateMatching method you receive a concrete object with the properties described by the predicate:

Console.WriteLine("Name: {0}", car.Name);
Console.WriteLine("WheelCount: {0}", car.WheelCount);
Console.WriteLine("FuelUsage: {0}", car.FuelUsage);
Console.WriteLine("ManufactureDate: {0}", car.ManufactureDate);
Console.WriteLine("Color: {0}", car.Color);

//Output:

Name: Ziggo
WheelCount: 4
FuelUsage: 10
ManufactureDate: 2010-10-01 00:00:00
Color: Red


The mechanism demonstrates a practical use of Roslyn together with expressions and reflection to build compilable code. Its functionality is currently very limited, though it might be useful as a starting point. Properties of type Boolean are not supported for now, because the default ToString call on a constant expression produces the capitalized value True or False, which is not recognized by the C# compiler. There is probably some reason for it related to the .NET languages common specification.
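
The Boolean limitation is easy to reproduce, and one conceivable way around it is to translate constants into C# literals before they reach the parser. The sketch below is only an idea, not part of the implementation above; BooleanLiteralSketch and its methods are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;

public static class BooleanLiteralSketch
{
    // Text produced by the expression tree for a Boolean constant.
    public static string RawText(bool value)
    {
        return Expression.Constant(value).ToString();
    }

    // Hypothetical workaround: emit a C# literal for Boolean constants
    // and fall back to the default text for everything else.
    public static string ToCSharpLiteral(ConstantExpression constant)
    {
        if (constant.Value is bool)
            return (bool)constant.Value ? "true" : "false";
        return constant.ToString();
    }
}
```

RawText(true) returns True, whereas ToCSharpLiteral(Expression.Constant(true)) returns true, which the C# parser accepts. Other constants, such as the integer 4, pass through unchanged.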