Lots of ORM Updates

We use the OpenNETCF ORM in the Solution Family products.  Unfortunately I haven’t figured out a good way to keep the code base for the ORM stuff we use in Solution Family in sync with the public code base on CodePlex, so occasionally I have to go in and use Araxis Merge to push changes into the public tree, then check them into the public source control server.  What that means to you is that you’re often working with stale code.  Sorry, that’s just how the cookie crumbles, and until I figure out how to clone myself Multiplicity-style, it’s not likely to change.

At any rate, we’re pretty stable on the Solution Family side of things, so I did a large merge back into the public tree this evening.  I still have to do a full release package, but the code is at least up to date as of change set 104901 and all of the projects (at least I hope) properly build.

Most of the changes revolve around work I’ve been doing with the Dream Factory cloud implementation, so there are lots of changes there, but I also have been doing more with DynamicEntities, so some changes were required for that too.  Of course there are assorted bug fixes as well, most of them in the SQLite implementation.  I leave it to you and your own diff skills if you really, really want to know what they are.

Go get it.  Use it.  And for Pete’s sake, quit writing SQL statements!

Storing Cloud Data Over Transient Connections

An interesting challenge in many M2M scenarios is that your network connection is often far from good. If you’re trying to collect engine data from an over-the-road truck going 80 miles an hour across rural Montana, it’s a pretty safe bet that you’re going to have places where you have no access to a network. Even in urban areas you have dead spots, plus there’s always the old “driver put a tuna can over the antenna” scenario to throw a wrench into things. Just because we lose our connection doesn’t mean we should start throwing data onto the floor, though. We need a data storage mechanism that’s robust enough to deal with these kinds of problems.

What you need is a local data store for the times when you don’t have connectivity and a remote store when you do. Or maybe a local store that does store-and-forward or replication. Yes, you could roll your own data storage service that can do these things, but why would you when there’s a perfectly good, already written solution out there? Again, you should be abstracting your application’s data services so you can focus on the business problems you’re good at. Solve the problems your company is hired to solve – not the grunt work of putting data into storage.

I added a new feature today to the OpenNETCF ORM called Replication (it’s only in the source downloads right now, not the latest release). A Replicator attaches to any DataStore and ties it to any other DataStore. It doesn’t matter what the actual storage is – that’s the beauty of abstraction, it works with any supported store – it can take data from one store and push it to another behind the scenes for you. So you can store to a local SQLite data file and have a Replicator push that data off to an Azure table. And it requires no change in your data Insert logic at all. Zero.

Currently, Replicators are simplistic in capability. They can only replicate Inserts, and they only do a “Replicate and Delete,” meaning that during replication the data is “moved” from the local store to the remote store. That’s typically all you need, though, and the typical case is all I’m trying to solve in the first pass.

So what does it look like, you ask? Below is an example of a working test that stores locally to a SQL Compact database, and when the network is up, those rows get moved off to a DreamFactory Cloud table. Notice that the only “new” things you do here are defining the DataStore where the replicated data goes, defining which Entities will get replicated (it’s opt-in, on a per-table basis), and adding the Replicator to the source DataStore’s new Replicators collection. Yes, that means you could even replicate different tables to different target Stores.

[TestMethod()]
public void BasicLocalReplicationTest()
{
    var source = new SqlCeDataStore("source.sdf");
    if (!source.StoreExists)
    {
        source.CreateStore();
    }
    source.AddType<TestItem>();

    var destination = new DreamFactoryDataStore(
        "https://dsp-mydsp.mycompany.dreamfactory.com/",
        "ORM", 
        "MyUID",
        "MyPWD");

    if (!destination.StoreExists)
    {
        destination.CreateStore();
    }

    // build a replicator to send data to the destination store
    var replicator = new Replicator(destination, ReplicationBehavior.ReplicateAndDelete);

    // replication is opt-in, so tell it what type(s) we want to replicate
    replicator.RegisterEntity<TestItem>();

    // add the replicator to the source
    source.Replicators.Add(replicator);

    // watch an event for when data batches go out
    replicator.DataReplicated += delegate
    {
        // get a count
        Debug.WriteLine(string.Format("Sent {0} rows", replicator.GetCount<TestItem>()));
    };

    var rows = 200;

    // put some data in the source
    for (int i = 0; i < rows; i++)
    {
        var item = new TestItem(string.Format("Item {0}", i));
        source.Insert(item);
    }

    int remaining = 0;
    // loop until the source table is empty
    do
    {
        Thread.Sleep(500);
        remaining = source.Count<TestItem>();
    } while(remaining > 0);

    // make sure the destination has all rows
    Assert.AreEqual(rows, destination.Count<TestItem>());
}

Sending M2M data to The Cloud

If you’re doing M2M work, it’s a pretty good bet that at some point you’ll need to send data off of a device for storage somewhere else (it better not be all of the data you have, though!).  Maybe it’s off to a MySQL server inside your network.  Maybe it’s off to The Cloud.  Regardless, you should expect that the storage location requirement could change, and that you might even need to send data to multiple locations.  What you should not do is code in a hard dependency on any particular storage form.  From your app’s perspective storage should be a transparent service.  Your app should say “Hey storage service, here’s some aggregated data.  Save it for me,” and that should be it.  The app shouldn’t even tell it “save it to The Cloud” or “save it to a local server.”  It should be up to the service to determine where the data should go, and that should be easily configurable and changeable.

This is pretty easy to do with an ORM, and naturally I think that the OpenNETCF ORM is really well suited for the job (I may be biased, but I doubt it).  It supports a boatload of storage mechanisms, from local SQLite to enterprise SQL Server to the newest Dream Factory DSP cloud.  And the code to actually store the data doesn’t change at all from the client perspective.

For example, let’s say I have a class called Temperatures that holds temps for several apartments that I’m monitoring.  Using the ORM, this is what the code to store those temperatures from Windows CE to a local SQL Compact database would look like:

store.Insert(currentTemps);

This is what the code to store those temperatures from Wind River Linux running Mono to an Oracle database would look like:

store.Insert(currentTemps);

And this is what the code to store those temperatures from Windows Embedded Standard to the Dream Factory DSP cloud would look like:

store.Insert(currentTemps);

Notice any pattern? The key here is to decouple your code. Make storage a ubiquitous service that you configure once, and you can spend your time writing code that’s interesting and actually solves your business problem.
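
To make that “configure once” idea concrete, here’s a rough sketch of what it might look like. The Temperatures entity, the TemperatureService wrapper and the configuration values are all made up for illustration; the store constructors mirror the ones in the replication test above, and I’m assuming the stores share a common IDataStore surface (StoreExists, CreateStore, AddType<T> and Insert) and that entity attributes follow the FieldAttribute/KeyScheme style shown in the DynamicEntity post below. Adjust the names if your version of the ORM differs.

[Entity(KeyScheme.Identity)]
public class Temperatures
{
    [Field(IsPrimaryKey = true)]
    public int ID { get; set; }

    [Field]
    public string Apartment { get; set; }

    [Field]
    public double DegreesF { get; set; }
}

public class TemperatureService
{
    private readonly IDataStore _store;

    public TemperatureService(string configuredTarget)
    {
        // the only place that knows (or cares) where the data actually lands
        switch (configuredTarget)
        {
            case "sqlce":
                _store = new SqlCeDataStore("temps.sdf");
                break;

            case "dreamfactory":
                _store = new DreamFactoryDataStore(
                    "https://dsp-mydsp.mycompany.dreamfactory.com/",
                    "ORM",
                    "MyUID",
                    "MyPWD");
                break;

            default:
                throw new NotSupportedException(configuredTarget);
        }

        if (!_store.StoreExists)
        {
            _store.CreateStore();
        }

        _store.AddType<Temperatures>();
    }

    // the rest of the application calls this and never knows where the data goes
    public void Save(Temperatures currentTemps)
    {
        _store.Insert(currentTemps);
    }
}

If a customer later wants the data to land somewhere else, that decision lives in configuration and in this one class; the Insert call in the rest of the application never changes.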

New ORM Implementation: Dream Factory

As the chief architect for the Solution Family, I’m always looking for new places to store M2M data.  I like to investigate anything that might make sense for either ourselves or our customers.  A few months ago I came across Dream Factory, which is an open-source, standards-based Cloud Services Platform.  They provide more than just storage, but my first foray was into the storage side of things.

First I had to understand exactly how to use their REST APIs to do the basic CRUD operations.  On the surface, their API documentation was cool, but it actually lacked some of the fundamental pieces of information on how to initiate a session and start working.  I suspect that this is because they’re small and new and, let’s face it, documentation isn’t that fun to generate.  Fortunately, their support has been fantastic – I actually can’t praise it enough.  Working with their engineers I was able to piece together everything necessary to build up a .NET SDK to hit their DSP service. To be fair, the documentation has also improved a bit since I started my development, so several of the questions I had have been clarified for future developers.  Again, this points to their excellent support and their responsiveness to customer feedback.

Once I had a basic SDK, I then wrapped that in an implementation of the ORM that, for now, supports all of your basic CRUD operations.  The code is still wet, doesn’t support all ORM features, and is likely to be a bit brittle, but I wanted to get it published in case anyone else wants to start playing with it as well. So far I’m pretty pleased with it, and once I have it integrated into Solution Engine, it will get a whole lot of use.
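
If you want to kick the tires, here’s roughly what exercising it looks like end to end. Treat this as a sketch: the DSP URL, app name and credentials are placeholders, TestItem is just a throwaway entity like the one in my replication tests, and I’m assuming it exposes a Name property and that the typed Select<T>, Update and Delete calls behave the same way they do for the other store implementations.

var store = new DreamFactoryDataStore(
    "https://dsp-mydsp.mycompany.dreamfactory.com/",
    "ORM",
    "MyUID",
    "MyPWD");

if (!store.StoreExists)
{
    store.CreateStore();
}
store.AddType<TestItem>();

// Create
store.Insert(new TestItem("Item 1"));

// Retrieve
var items = store.Select<TestItem>();

// Update (assumes TestItem exposes a settable Name property)
var first = items.First();
first.Name = "Item 1 - updated";
store.Update(first);

// Delete
store.Delete(first);

If anything here doesn’t match the current source, trust the OpenNETCF.DreamFactory branch on CodePlex, not this snippet.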

I’ve published the SDK and the ORM implementations up on CodePlex, all under the OpenNETCF.DreamFactory branch of the source tree.  Head over to Dream Factory to learn more and set up your free DSP account, then jump in.

It’s worth noting that at this point it’s a Windows Desktop implementation only.  Mono, Compact Framework and Windows Phone will follow at some point.

Stop Using “The Cloud”

We’ve all been in a meeting, conference or seminar where the talk goes something along these lines:

“We collect the data at the device, then we send it up to The Cloud where we can do analysis and other things on it.”

This is usually accompanied by a vague graphic of a line – sometimes solid, sometimes dotted – pointing to the generic PowerPoint cloud image.

I hate this statement.  I hate this cloud.  This is usually a huge red flag that the speaker or author really has no idea what they’re talking about.  It’s a sales or marketing cop-out that is the technical equivalent of the popular “then a miracle happens” cartoon.

What does “send it to The Cloud” even mean?  What cloud?  How is the data going to get there?  How will I get it back out? I’ve got other questions about “the cloud” that I’ll address in another post, but these three are the biggies that you should always raise your hand and ask.

1. What is “The Cloud?”

Here’s the little secret – there is no single “cloud”.  There are a boatload of them, and the list seems to be continually growing as it gets cheaper to stand up servers with a lot of storage and as more and more use-cases make use of off-device storage.  “The Cloud” can mean a whole panoply of things, including (but definitely not limited to):

Again, there are plenty of others; these are just some of the ones I’ve dealt with in the past year or so.

And bear in mind that not all customers are going to be amenable to all clouds.  Some customers aren’t so comfortable putting their data on servers they don’t control.  Some aren’t comfortable putting their data on servers in particular countries where governmental agencies are known to mine them.  Some customers simply have predispositions to different services for different reasons.

Maybe they like Azure because it provides a simple interface for their .NET enterprise apps.  Maybe they like Amazon’s scale.  Maybe they like Brand X just because they do.  The point is that if you have more than one customer, you’re probably going to need to look at more than one cloud provider.  We’ve got customers that use multiple clouds due to the benefits each provides for certain types of data, data retention policies or total cost of use.

2. How do I get my data into “The Cloud”?

Yeeaaahhh, about that…since there are a boatload of cloud services, there are a boatload of ways that an app gets data into them.  Network databases might use ODBC or a proprietary/specific API set.  Services on the web typically use a web service interface.  Maybe REST, maybe OData, maybe something else.  The point here is none of them are the same.  None of them.

So that means that if you have to support multiple clouds, you will have to support multiple API sets.  Of course you can abstract them all back to a common interface – I’d certainly recommend it; it’s what we do using the OpenNETCF ORM – but there’s still work to be done to actually implement each.
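
If you aren’t using an ORM for that abstraction, the hand-rolled version of the idea looks something like the sketch below. ICloudStorage and TelemetryPublisher are purely hypothetical names; the point is that each cloud gets exactly one wrapper that hides its particular API, and the application only ever talks to the interface. In our case, the OpenNETCF ORM’s data store abstraction plays this role.

public interface ICloudStorage
{
    void Save<T>(T item) where T : class;
    IEnumerable<T> Load<T>() where T : class;
}

// One implementation per cloud: a REST/OData wrapper, an ODBC wrapper, a
// proprietary-API wrapper, and so on. Each implementation deals with its
// own quirks; none of them leak into the application.

public class TelemetryPublisher
{
    private readonly ICloudStorage _storage;

    public TelemetryPublisher(ICloudStorage storage)
    {
        _storage = storage;
    }

    // application code never changes when the target cloud does
    public void Publish<T>(T sample) where T : class
    {
        _storage.Save(sample);
    }
}

The trade-off is that you still have to write and test one wrapper per cloud, which is exactly the work the existing ORM store implementations already do for you.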

3. How do I get my data back out of “The Cloud”?

Just like putting the data in, getting it out requires an API, and again, they are all different.  Another thing to consider in the “getting it out” part of the equation is how you actually use it.

Some clouds allow you to run services and/or applications on the servers as well and data access is direct.  Sometimes you have to pull it out back to another server.  Once again, this means more work from a development perspective.  And again, you’ve got to multiply that by the clouds you need to support.

And how about data retention? Some clouds are not “forever” storage.  The data gets purged on a time basis.  If you need to keep the data archived for later data mining then add that work onto your plate too.

So the next time you see that slide with the cloud on it and the speaker says “just send the data to the cloud” raise your hand.  Take them to task over it.  We build software, and while that seems like magic to some, we don’t do miracles.

OpenNETCF ORM Implementation Update

I’ve been maintaining and expanding the OpenNETCF ORM code base for quite some time now and it’s becoming pretty robust.  We dogfood it heavily and have a variety of application installs using it for all sorts of things, from local apps to M2M solutions.  One key tenet I’ve been following is that I opt for portability over expansive feature support. Some features that you might think an ORM would have are very difficult to do in a generic way that would support both RDBMS systems and object or cloud databases.  It becomes easier for an application to do those relationships, or for you to (gasp) denormalize your data. For example, composite primary keys are a common request, but it’s a pretty complex thing to implement for an RDBMS, and for an object database, it’s a friggin’ nightmare.  It’s a lot easier for me to go do a whole new store implementation and just tell users that they should use surrogate keys.  We’re not the only ORM that feels that way, and honestly, I think composite keys are generally a bad idea anyway.
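
To make the surrogate key suggestion concrete, here’s a sketch of what I mean. The OrderLine class is invented for illustration, and the attribute usage mirrors the FieldAttribute and KeyScheme types you’ll see in the DynamicEntity post below; if your version of the ORM spells the entity attributes differently, adjust accordingly.

// Instead of a composite key (OrderId + LineNumber), give the entity a
// single store-generated surrogate key and keep the old key parts as
// ordinary fields.
[Entity(KeyScheme.Identity)]
public class OrderLine
{
    [Field(IsPrimaryKey = true)]
    public int ID { get; set; }          // surrogate key, generated by the store

    [Field]
    public int OrderId { get; set; }     // was half of the composite key

    [Field]
    public int LineNumber { get; set; }  // was the other half; now it's just data

    [Field]
    public string Sku { get; set; }
}

A single identity key is easy for every store implementation to generate, which is the portability win.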

Features have largely been need driven, and by “need” I mean what I need at any given time.  I’ve also taken some time to experiment with different backing stores, and it’s led me to have a variety of implementations in different states of “doneness.”  For example, I have SQL Compact, Oracle and SQLite at a point I’d call complete, but I have a variety of others that aren’t quite so done.  Some are in the public source tree on CodePlex, some haven’t found their way there yet, but probably will when they get further along.

Here’s a complete list of implementations I have worked on, and a rough guess on state of completion.  If you’d like to see me work on any one in particular, let me know:

A majority of these (I think all but Oracle) work cross-platform, meaning I’ve tested them on big Windows, Windows CE and Mono under Linux.

A key point here, beyond the cross-platform capability (which was no small effort), is the fact that identical data access code in your application can perform the good old standard CRUD operations on *any* of those data stores.  The only code changes needed are setup bits (providing credentials, file paths, etc) that are specific to each store type.  Show me any other existing ORM that has even close to this kind of coverage.

OpenNETCF ORM: Dynamic Entities

There’s no doubt in my mind that code libraries and frameworks are fantastic for saving time and work, after all I want to spend time solving my business problem, not writing infrastructure.  Nowhere is this more true than it is with Object-Relational Mapping, or ORM, libraries.

Broadly speaking, if you’re still writing application code that requires that you also write SQL, you’re wasting your time.  Wasting time thinking about SQL syntax.  Wasting time writing it.  Wasting time testing, debugging and maintaining it.

I believe this so much, and was so dissatisfied with the existing ORM offerings, that I wrote my own ORM. It wasn’t a trivial task, but I have something that does exactly what I need, on the platforms I need, and does it at a speed that I consider more than acceptable.  Occasionally I hit a data storage requirement that ORMs, even my own, aren’t so good at.

ORM usage is usually viewed as one of two approaches: code first or data first.  With the code-first approach, a developer defines the storage entity classes and the ORM generates a backing database from them.  With data first, the developer feeds a database into the ORM or an ORM tool, and it then generates the entity classes for you.

But this doesn’t cover all scenarios – which is something no ORM that I’m aware of seems to acknowledge.  Consider the following use-case (and this is a real-world use case that I had to design for, not some mental exercise).

I have an application that allows users to define storage for data at run time in a completely ad-hoc manner.  They get to choose what data items they want to save, but even those data items are determined dynamically, so they are known only at run time.

So, effectively, we need to store a flat table of data with an unknown set of columns.  The column names and data types are unknown until after the application is running on the user’s machine.

So the entities are neither data first nor code first.  I’ve not thought of a catchy term for these types of scenarios, so for now I’ll just call it “user first” since the user has the idea of what they want to store and we have to accommodate that.  This is why I created support in the OpenNETCF ORM for the DynamicEntity.

Let’s assume that the user decides they want to store a FirstName and LastName for a person.  For convenience, we also want to store a generated ID for the Person entities that get stored.

At run time, we generate some FieldAttributes that define the Person:

var fieldList = new List<FieldAttribute>();
fieldList.Add(new FieldAttribute()
{
    FieldName = "ID",
    IsPrimaryKey = true,
    DataType = System.Data.DbType.Int32
});

fieldList.Add(new FieldAttribute()
{
    FieldName = "FirstName",
    DataType = System.Data.DbType.String
});

fieldList.Add(new FieldAttribute()
{
    FieldName = "LastName",
    DataType = System.Data.DbType.String,
    AllowsNulls = false
});

And then we create and register a DynamicEntityDefinition with the DataStore:

var definition = new DynamicEntityDefinition(
                              "Person", 
                              fieldList, 
                              KeyScheme.Identity);
 
store.RegisterDynamicEntity(definition);

Now, any time we want to store an entity instance, we simply create a DynamicEntity and pass that to the Insert method, just like any other Entity instance, and the ORM handles storage for us.

var entity = new DynamicEntity("Person");
entity.Fields["FirstName"] = "John";
entity.Fields["LastName"] = "Doe";
store.Insert(entity);
 
entity = new DynamicEntity("Person");
entity.Fields["FirstName"] = "Jim";
entity.Fields["LastName"] = "Smith";
store.Insert(entity);

The rest of the CRUD operations are similar; we simply have to name the definition type where appropriate.  For example, retrieving looks like this:

var people = store.Select("Person");

Updating like this:

var person = people.First();
person.Fields["FirstName"] = "Joe";
person.Fields["LastName"] = "Satriani";
store.Update(person);

And deleting like this:

store.Delete("Person", people.First().Fields["ID"]);

We’re no longer bound by the either-or box of traditional ORM thinking, and that leads to offering users some really interesting and powerful capabilities that were previously available only to those willing to abandon an ORM and hand-roll the logic.