database

Mapping DataTables and DataRows to Objects in C# and .NET

My group regularly uses DataSet, DataTable, and DataRow objects in many of our apps.

(What? Don't look at me like that. These apps are old.)

Anyway, we're simultaneously trying to implement good C# and object-oriented programming principles while maintaining these old apps, so we often end up having to map data from a data set to a C# object. We did this enough times that I and a coworker (we'll call her Marlena) decided to sit down and just make up a new mapping system for use with these DataTable and DataRow objects.

As always with my code-based posts, there's a GitHub project with a full working example app, so check that out too!

One Jump Ahead

So here's a basic problem with mapping from DataSet, DataTable, and DataRow objects: we don't know at compile time what columns and tables exist in the set, so mapping solutions like AutoMapper won't work for this scenario. Our mapping system will have to assume what columns exist. But, in order to make it more reusable, we will make the mapping system return default values for any values which it does not locate.

There's also another, more complex problem: the databases we are acquiring our data from use many different column names to represent the same data. 20 years of different maintainers and little in the way of cohesive naming standards will do that do a database. So, if we needed a person's first name, the different databases might use:

  • first_name
  • firstName
  • fname
  • name_first

This, as might be imagined, makes mapping anything rather difficult. So our system will also need to be able to map from many different column names.

Finally, this system wouldn't be worth much if it couldn't handle collections of objects as well as single objects, so we'll need to allow for that as well.

So, in short, our system needs to:

  1. Map from DataTable and DataRow to objects.
  2. Map from multiple different column names.
  3. Handle mapping to a collection of objects as well as a single object.

We'll need several pieces to accomplish this. But before we can even start building the mapping system, we must first acquire some sample data.

Mine, Mine, Mine

We're going to create some DataSet objects that we can test our system against. In the real world, you would use an actual database, but here (for simplicity's sake) we're just going to manually create some DataSet objects. Here's a sample class which will create two DataSet objects, Priests and Ranchers, each of which use different column names for the same data:

public static class DataSetGenerator  
{
    public static DataSet Priests()
    {
        DataTable priestsDataTable = new DataTable();
        priestsDataTable.Columns.Add(new DataColumn()
        {
            ColumnName = "first_name",
            DataType = typeof(string)
        });
        priestsDataTable.Columns.Add(new DataColumn()
        {
            ColumnName = "last_name",
            DataType = typeof(string)
        });
        priestsDataTable.Columns.Add(new DataColumn()
        {
            ColumnName = "dob",
            DataType = typeof(DateTime)
        });
        priestsDataTable.Columns.Add(new DataColumn()
        {
            ColumnName = "job_title",
            DataType = typeof(string)
        });
        priestsDataTable.Columns.Add(new DataColumn()
        {
            ColumnName = "taken_name",
            DataType = typeof(string)
        });
        priestsDataTable.Columns.Add(new DataColumn()
        {
            ColumnName = "is_american",
            DataType = typeof(string)
        });

        priestsDataTable.Rows.Add(new object[] { "Lenny", "Belardo", new DateTime(1971, 3, 24), "Pontiff", "Pius XIII", "yes" });
        priestsDataTable.Rows.Add(new object[] { "Angelo", "Voiello", new DateTime(1952, 11, 18), "Cardinal Secretary of State", "", "no" });
        priestsDataTable.Rows.Add(new object[] { "Michael", "Spencer", new DateTime(1942, 5, 12), "Archbishop of New York", "", "yes" });
        priestsDataTable.Rows.Add(new object[] { "Sofia", "(Unknown)", new DateTime(1974, 7, 2), "Director of Marketing", "", "no" });
        priestsDataTable.Rows.Add(new object[] { "Bernardo", "Gutierrez", new DateTime(1966, 9, 16), "Master of Ceremonies", "", "no" });

        DataSet priestsDataSet = new DataSet();
        priestsDataSet.Tables.Add(priestsDataTable);

        return priestsDataSet;
    }

    public static DataSet Ranchers()
    {
        DataTable ranchersTable = new DataTable();
        ranchersTable.Columns.Add(new DataColumn()
        {
            ColumnName = "firstName",
            DataType = typeof(string)
        });
        ranchersTable.Columns.Add(new DataColumn()
        {
            ColumnName = "lastName",
            DataType = typeof(string)
        });
        ranchersTable.Columns.Add(new DataColumn()
        {
            ColumnName = "dateOfBirth",
            DataType = typeof(DateTime)
        });
        ranchersTable.Columns.Add(new DataColumn()
        {
            ColumnName = "jobTitle",
            DataType = typeof(string)
        });
        ranchersTable.Columns.Add(new DataColumn()
        {
            ColumnName = "nickName",
            DataType = typeof(string)
        });
        ranchersTable.Columns.Add(new DataColumn()
        {
            ColumnName = "isAmerican",
            DataType = typeof(string)
        });

        ranchersTable.Rows.Add(new object[] { "Colt", "Bennett", new DateTime(1987, 1, 15), "Ranchhand", "", "y" });
        ranchersTable.Rows.Add(new object[] { "Jameson", "Bennett", new DateTime(1984, 10, 10), "Ranchhand", "Rooster", "y" });
        ranchersTable.Rows.Add(new object[] { "Beau", "Bennett", new DateTime(1944, 8, 9), "Rancher", "", "y" });
        ranchersTable.Rows.Add(new object[] { "Margaret", "Bennett", new DateTime(1974, 7, 2), "Bar Owner", "Maggie", "y" });
        ranchersTable.Rows.Add(new object[] { "Abigail", "Phillips", new DateTime(1987, 4, 24), "Teacher", "Abby", "y" });

        DataSet ranchersDataSet = new DataSet();
        ranchersDataSet.Tables.Add(ranchersTable);

        return ranchersDataSet;
    }
}

We'll test our system against this sample data.

Something There

Now we can build our actual mapping solution. First off, we need a way to decide what column names map to object properties. It was Marlena's idea to keep those things together, and so we came up with a class called DataNamesAttribute that looks like this:

[AttributeUsage(AttributeTargets.Property)]
public class DataNamesAttribute : Attribute  
{
    protected List<string> _valueNames { get; set; }

    public List<string> ValueNames
    {
        get
        {
            return _valueNames;
        }
        set
        {
            _valueNames = value;
        }
    }

    public DataNamesAttribute()
    {
        _valueNames = new List<string>();
    }

    public DataNamesAttribute(params string[] valueNames)
    {
        _valueNames = valueNames.ToList();
    }
}

This attribute can then be used (in fact, can only be used, due to the AttributeUsage(AttributeTargets.Property) declaration) on properties of other classes. Let's say we're going to map to a Person class. We would use DataNamesAttribute like so:

public class Person  
{
    [DataNames("first_name", "firstName")]
    public string FirstName { get; set; }

    [DataNames("last_name", "lastName")]
    public string LastName { get; set; }

    [DataNames("dob", "dateOfBirth")]
    public DateTime DateOfBirth { get; set; }

    [DataNames("job_title", "jobTitle")]
    public string JobTitle { get; set; }

    [DataNames("taken_name", "nickName")]
    public string TakenName { get; set; }

    [DataNames("is_american", "isAmerican")]
    public bool IsAmerican { get; set; }
}

Now that we know where the data needs to end up, let's start mapping out the mapper (heh).

Reflection

Our mapper class will be a generic class so that we can map from DataTable or DataRow objects to any kind of object. We'll need two methods to get different kinds of data:

public class DataNamesMapper<TEntity> where TEntity : class, new()  
{
    public TEntity Map(DataRow row) { ... }
    public IEnumerable<TEntity> Map(DataTable table) { ... }
}

Let's start with the Map(DataRow row) method. We need to do three things:

  1. Figure out what columns exist in this row.
  2. Determine if the TEntity we are mapping to has any properties with the same name as any of the columns (aka the Data Names) AND
  3. Map the value from the DataRow to the TEntity.

Here's how we do this, using just a bit of reflection:

public TEntity Map(DataRow row)  
{
    //Step 1 - Get the Column Names
    var columnNames = row.Table.Columns
                               .Cast<DataColumn>()
                               .Select(x => x.ColumnName)
                               .ToList();

    //Step 2 - Get the Property Data Names
    var properties = (typeof(TEntity)).GetProperties()
                                      .Where(x => x.GetCustomAttributes(typeof(DataNamesAttribute), true).Any())
                                      .ToList();

    //Step 3 - Map the data
    TEntity entity = new TEntity();
    foreach (var prop in properties)
    {
        PropertyMapHelper.Map(typeof(TEntity), row, prop, entity);
    }

    return entity;
}

Of course, we also need to handle the other method, the one where we can get a collection of TEntity:

public IEnumerable<TEntity> Map(DataTable table)  
{
    //Step 1 - Get the Column Names
    var columnNames = table.Columns.Cast<DataColumn>().Select(x => x.ColumnName).ToList();

    //Step 2 - Get the Property Data Names
    var properties = (typeof(TEntity)).GetProperties()
                                        .Where(x => x.GetCustomAttributes(typeof(DataNamesAttribute), true).Any())
                                        .ToList();

    //Step 3 - Map the data
    List<TEntity> entities = new List<TEntity>();
    foreach (DataRow row in table.Rows)
    {
        TEntity entity = new TEntity();
        foreach (var prop in properties)
        {
            PropertyMapHelper.Map(typeof(TEntity), row, prop, entity);
        }
        entities.Add(entity);
    }

    return entities;
}

You might be wondering just what the heck the PropertyMapHelper class is. If you are, you might also be about to regret it.

Dig a Little Deeper

The PropertyMapHelper, as suggested by the name, maps values to different primitive types (int, string, DateTime, etc.). Here's that Map() method we saw earlier:

public static void Map(Type type, DataRow row, PropertyInfo prop, object entity)  
{
    List<string> columnNames = AttributeHelper.GetDataNames(type, prop.Name);

    foreach (var columnName in columnNames)
    {
        if (!String.IsNullOrWhiteSpace(columnName) && row.Table.Columns.Contains(columnName))
        {
            var propertyValue = row[columnName];
            if (propertyValue != DBNull.Value)
            {
                ParsePrimitive(prop, entity, row[columnName]);
                break;
            }
        }
    }
}

There are two pieces in this method that we haven't defined yet: the AttributeHelper class and the ParsePrimitive() method. AttributeHelper is a rather simple class that merely gets the list of column names from the DataNamesAttribute:

public static List<string> GetDataNames(Type type, string propertyName)  
{
    var property = type.GetProperty(propertyName).GetCustomAttributes(false).Where(x => x.GetType().Name == "DataNamesAttribute").FirstOrDefault();
    if (property != null)
    {
        return ((DataNamesAttribute)property).ValueNames;
    }
    return new List<string>();
}

The other we need to define in ParsePrimitive(), which as its name suggests will parse the values into primitive types. Essentially what this class does is assign a value to a passed-in property reference (represented by the PropertyInfo class). I'm not going to post the full code on this post (you can see it over on GitHub), so here's a snippet of what this method does:

private static void ParsePrimitive(PropertyInfo prop, object entity, object value)  
{
    if (prop.PropertyType == typeof(string))
    {
        prop.SetValue(entity, value.ToString().Trim(), null);
    }
    else if (prop.PropertyType == typeof(int) || prop.PropertyType == typeof(int?))
    {
        if (value == null)
        {
            prop.SetValue(entity, null, null);
        }
        else
        {
            prop.SetValue(entity, int.Parse(value.ToString()), null);
        }
    }
    ...
}

That's the bottom of the rabbit hole, as it were. Now, we can use the DataSet objects we created earlier and our mapping system to see if we can map this data correctly.

Two Worlds

Here's a quick program that can test our new mapping system:

class Program  
{
    static void Main(string[] args)
    {
        var priestsDataSet = DataSetGenerator.Priests();
        DataNamesMapper<Person> mapper = new DataNamesMapper<Person>();
        List<Person> persons = mapper.Map(priestsDataSet.Tables[0]).ToList();

        var ranchersDataSet = DataSetGenerator.Ranchers();
        persons.AddRange(mapper.Map(ranchersDataSet.Tables[0]));

        foreach (var person in persons)
        {
            Console.WriteLine("First Name: " + person.FirstName + ", Last Name: " + person.LastName
                                + ", Date of Birth: " + person.DateOfBirth.ToShortDateString()
                                + ", Job Title: " + person.JobTitle + ", Nickname: " + person.TakenName
                                + ", Is American: " + person.IsAmerican);
        }

        Console.ReadLine();
    }
}

When we run this app (which you can do too), we will get the following output:

Which is exactly what we want!

(I mean, really, did you expect me to blog about something that didn't work?)

Go the Distance

It concerns me that this system is overly complicated, and I'd happily take suggestions on how to make it more straightforward. While I do like how all we need to do is place the DataNamesAttribute on the correct properties and then call an instance of DataNamesMapper<T>, I feel like the whole thing could be easier somehow. Believe it or not, this version is actually simpler than the one we're actually using in our internal apps.

Also, check out the sample project over on GitHub, fork it, test it, whatever. If it helped you out, or if you can improve it, let me know in the comments!

Finally, extra special bonus points will go to anyone who can figure out a) what the hell those odd section titles are about and b) where I got the sample data from.

Happy Coding!

Using T4 Templates to Generate Enums From Database Lookup Tables

The major project we've been working on had a non-unusual requirement: take these SQL database tables and convert them to enumerations we can use over a service. A quick Google search found some code that did exactly what we wanted, so of course we (ok, I) copied-and-pasted that code into our solution and BAM, it worked like a charm.

Problem was, when someone asked me to explain what it did, I couldn't. I can't stand not understanding something, especially not something that enables my projects to work, so I dove into this unfamiliar T4 Templates technology and came out the other side with a little more knowledge than I'd started with. Perhaps it will help you too.

Something Really Clever

As I said earlier, we had a requirement for this big project we're working on to use values from a set of database lookup tables as Enumerations in my server code. Given that I'm already a big fan of using Enums to reduce ambiguity it didn't take much convincing for me to get started. A quick google search revealed this StackOverflow answer in which Robert Koritnik shared his method of using T4 Templates to generate files for Enumerations generated from database lookup tables.

As always, let's see the finished solution first. Here's the T4 template from the StackOverflow answer:

<#@ template debug="true" hostSpecific="true" #>  
<#@ output extension=".generated.cs" #>  
<#@ Assembly Name="System.Data" #>  
<#@ import namespace="System.Data" #>  
<#@ import namespace="System.Data.SqlClient" #>  
<#@ import namespace="System.IO" #>  
<#@ import namespace="System.Text.RegularExpressions" #>  
<#  
    string tableName = Path.GetFileNameWithoutExtension(Host.TemplateFile);
    string path = Path.GetDirectoryName(Host.TemplateFile);
    string columnId = tableName + "ID";
    string columnName = "Name";
    string connectionString = "data source=.;initial catalog=DBName;integrated security=SSPI";
#>
using System;  
using System.CodeDom.Compiler;

namespace Services.<#= GetSubNamespace() #>  
{
    /// <summary>
    /// <#= tableName #> auto generated enumeration
    /// </summary>
    [GeneratedCode("TextTemplatingFileGenerator", "10")]
    public enum <#= tableName #>
    {
<#  
    SqlConnection conn = new SqlConnection(connectionString);
    string command = string.Format("select {0}, {1} from {2} order by {0}", columnId, columnName, tableName);
    SqlCommand comm = new SqlCommand(command, conn);

    conn.Open();

    SqlDataReader reader = comm.ExecuteReader();
    bool loop = reader.Read();

    while(loop)
    {
#>      /// <summary>
        /// <#= reader[columnName] #> configuration setting.
        /// </summary>
        <#= Pascalize(reader[columnName]) #> = <#= reader[columnId] #><# loop = reader.Read(); #><#= loop ? ",\r\n" : string.Empty #>
<#  
    }
#>  }
}
<#+  
    private string Pascalize(object value)
    {
        Regex rx = new Regex(@"(?:[^a-zA-Z0-9]*)(?<first>[a-zA-Z0-9])(?<reminder>[a-zA-Z0-9]*)(?:[^a-zA-Z0-9]*)");
        return rx.Replace(value.ToString(), m => m.Groups["first"].ToString().ToUpper() + m.Groups["reminder"].ToString().ToLower());
    }

    private string GetSubNamespace()
    {
        Regex rx = new Regex(@"(?:.+Services\s)");
        string path = Path.GetDirectoryName(Host.TemplateFile);
        return rx.Replace(path, string.Empty).Replace("\\", ".");
    }
#>

Then, for every Enum we need generated, we create a t4 file with the enum name and place the following line of code in it:

<#@ include file="..\..\T4 Templates\EnumGenerator.ttinclude" #>  

I plugged this in to our application, added a few t4 files for enumerations we needed generated, and boom, it just worked. All my lookup tables were now perfect little enums.

So we chugged along for a little while, perfectly content with our newfound solution, until my teammate Stacy asked about these T4 Templates and what they actually, y'know, did. My response went something like:

Matt: Well, it works like this. The first line..... Uh.... Well, actually, this set of lines here does something really clever....

I had no idea what this code was doing. I knew what it resulted in (a set of Enums generated from database tables) but couldn't sufficiently explain to anyone, even myself, exactly how it did this, which I unquestionably should be able to do. This code snippet was a black box, unknown to me, and experience told me that if I don't know how something works I won't be able to fix it if it breaks.

So I needed to understand just what it was that this code snippet was doing. That's what I'm going to do here: break down the snippet so that I (and hopefully you readers) will better understand what is being accomplished.

So let's get started!

What are T4 Templates?

If I'm going to understand what the code is doing, I first need to understand what technology is being used. MSDN has this to say about T4 Templates:

In Visual Studio, a T4 text template is a mixture of text blocks and control logic that can generate a text file. The control logic is written as fragments of program code in Visual C# or Visual Basic. The generated file can be text of any kind, such as a Web page, or a resource file, or program source code in any language.

OK so, in my mind, T4 templates allow you to write code that will generate other code. Seems simple enough. But how does it do that?

Breaking Down The Template

Let's walk through this template and see if we can understand what it is doing.

<#@ template debug="true" hostSpecific="true" #>  
<#@ output extension=".generated.cs" #>  
<#@ Assembly Name="System.Data" #>  
<#@ import namespace="System.Data" #>  
<#@ import namespace="System.Data.SqlClient" #>  
<#@ import namespace="System.IO" #>  
<#@ import namespace="System.Text.RegularExpressions" #>  

This appears to be some kind of collection of using statements, but the <#@ syntax was throwing me off. MSDN states that this syntax is used to create "directives" which (more or less) tell the T4 template how to behave. In our case, this template will cause the generated file to end in ".generated.cs" and imports several namespaces that we will need this template to use.

<#  
    string tableName = Path.GetFileNameWithoutExtension(Host.TemplateFile);
    string path = Path.GetDirectoryName(Host.TemplateFile);
    string columnId = tableName + "ID";
    string columnName = "Name";
    string connectionString = "data source=.;initial catalog=DBName;integrated security=SSPI";
#>

Now we start to see the meat of the template. The <# syntax is a boundary that specifies that anything within it is code running within the template (not the output code being generated). In our particular case, we are getting the name of the database table from the name of the file, which implies that each .tt file we want for each generated enum needs to be named the same as the database table.

We're also setting up the common values for the ID column and Name column, and getting our connection string ready. Just a bunch of setup here.

using System;  
using System.CodeDom.Compiler;

namespace Services.<#= GetSubNamespace() #>  
{

IMO this is the first truly interesting snippet. Given what I know about how T4 templates work (read: nothing) this snippet appears to be outputting actual code into the target file. Specifically, this is outputting two using statements and a namespace.

Since we're here, let's go ahead and examine the GetSubNamespace method:

private string GetSubNamespace()  
{
    Regex rx = new Regex(@"(?:.+Services\s)");
    string path = Path.GetDirectoryName(Host.TemplateFile);
    return rx.Replace(path, string.Empty).Replace("\\", ".");
}

Now I'm no regular expressions whiz-kid or anything (it looks like a bunch of gobbeldy-gook to me), but this appears to be using the folder structure where this file is located and transforming that structure into a namespace, which works very well for Visual Studio application since that's pretty much what VS does anyway. I'd be hard-pressed to explain exactly how it does this, though.

Back to the main portion of the code, specifically the first part emitted within the namespace declaration:

/// <summary>
    /// <#= tableName #> auto generated enumeration
    /// </summary>
    [GeneratedCode("TextTemplatingFileGenerator", "10")]
    public enum <#= tableName #>
    {
<#  
    SqlConnection conn = new SqlConnection(connectionString);
    string command = string.Format("select {0}, {1} from {2} order by {0}", columnId, columnName, tableName);
    SqlCommand comm = new SqlCommand(command, conn);

    conn.Open();

    SqlDataReader reader = comm.ExecuteReader();
    bool loop = reader.Read();

    while(loop)
    {
#>      

Now we start seeing the <# syntax again. The first time we see it is when it uses the tableName to output some comments about the generated enumeration. After that, we set up a SqlConnection, select from the appropriate table, and use a SqlDataReader to parse the results. What we do with those results is in the next snippet:

    while(loop)
    {
#>      /// <summary>
        /// <#= reader[columnName] #> configuration setting.
        /// </summary>
        <#= Pascalize(reader[columnName]) #> = <#= reader[columnId] #><# loop = reader.Read(); #><#= loop ? ",\r\n" : string.Empty #>
<#  
    }
#>  }
}

For each of the values in the SqlDataReader, we output an enumeration value that uses the name found in the Name column in the table, after that name is run through a method called Pascalize(). Here's the implementation for Pascalize():

private string Pascalize(object value)  
{
    Regex rx = new Regex(@"(?:[^a-zA-Z0-9]*)(?<first>[a-zA-Z0-9])(?<reminder>[a-zA-Z0-9]*)(?:[^a-zA-Z0-9]*)");
    return rx.Replace(value.ToString(), m => m.Groups["first"].ToString().ToUpper() + m.Groups["reminder"].ToString().ToLower());
}

Oh goody, regexes again (/sarcasm). I'm inferring somewhat here, but Pascalize() appears to transform strings into Pascal-cased objects. So, if the value of Name was

movie theatre or cinema  

Pascalize would transform that string into:

MovieTheatreOrCinema  

and this conforms to the naming conventions for enumerations. I do wonder how Pascalize strips out non-alphanumerics (since that is what the "(?:[^a-zA-Z0-9]*)" portion of the regular expression appears to be doing), but that investigation is for a later time.

Summary

When I started writing this post, I knew thismuch about how T4 templates worked. Now I know a little more, and all it took was a bit of intuition, some luck, and some googling.

The point of all this is simple: you learn by discovering, by guessing, by failing, and by succeeding. It is part of my job to understand what the code that my group is using actually does, and when I didn't know enough, I had to stumble around for a bit before understanding came to me. Don't be afraid to admit you don't know something, since you can always learn it!

Regular expressions, though.... not sure I'll ever understand that nonsense.

Happy Coding!