Storing Cloud Data Over Transient Connections

An interesting challenge in many M2M scenarios is that your network connection is often far from good. If you’re trying to collect engine data from an over-the-road truck going 80 miles an hour across rural Montana, it’s a pretty safe bet that you’re going to have places where you have no access to a network. Even in urban areas you have dead spots, plus there’s always the old “driver put a tuna can over the antenna” scenario to throw a wrench into things. Just because we lose our connection doesn’t mean we should start throwing data onto the floor, though. We need a data storage mechanism that’s robust enough to deal with these kinds of problems.

What you need is a local data store for the times when you don’t have connectivity and a remote store when you do. Or maybe a local store that does store-and-forward or replication. Yes, you could roll your own data storage service that can do these things, but why would you when there’s a perfectly good, already written solution out there? Again, you should be abstracting your application’s data services so you can focus on the business problems you’re good at. Solve the problems your company is hired to solve – not the grunt work of putting data into storage.

I added a new feature today to the OpenNETCF ORM called Replication (it’s only in the source downloads right now, not the latest release). A Replicator attaches to any DataStore and ties it to any other DataStore. It doesn’t matter what the actual storage is (that’s the beauty of abstraction; it works with any supported store): it can take data from one store and push it to another behind the scenes for you. So you can store to a local SQLite data file and have a Replicator push that data off to an Azure table. And it requires no change in your data Insert logic at all. Zero.

Currently Replicators are simple in capability. They only replicate Inserts, and they only do “Replicate and Delete,” meaning that during replication the data is “moved” from the local store to the remote store. That’s typically all you need, though, and the typical case is all I’m trying to solve in this first pass.

So what does it look like, you ask? Below is an example of a working test that stores locally to a SQL Compact database and, when the network is up, moves those rows off to a Dream Factory cloud table. Notice that the only “new” things you do here are defining the DataStore where the replicated data goes, registering which entities get replicated (it’s opt-in, on a per-table basis), and adding the Replicator to the source DataStore’s new Replicators collection. Yes, that means you could even replicate different tables to different target Stores.

[TestMethod()]
public void BasicLocalReplicationTest()
{
    var source = new SqlCeDataStore("source.sdf");
    if (!source.StoreExists)
    {
        source.CreateStore();
    }
    source.AddType<TestItem>();

    var destination = new DreamFactoryDataStore(
        "https://dsp-mydsp.mycompany.dreamfactory.com/",
        "ORM", 
        "MyUID",
        "MyPWD");

    if (!destination.StoreExists)
    {
        destination.CreateStore();
    }

    // build a replicator to send data to the destination store
    var replicator = new Replicator(destination, ReplicationBehavior.ReplicateAndDelete);

    // replication is opt-in, so tell it what type(s) we want to replicate
    replicator.RegisterEntity<TestItem>();

    // add the replicator to the source
    source.Replicators.Add(replicator);

    // watch an event for when data batches go out
    replicator.DataReplicated += delegate
    {
        // get a count
        Debug.WriteLine(string.Format("Sent {0} rows", replicator.GetCount<TestItem>()));
    };

    var rows = 200;

    // put some data in the source
    for (int i = 0; i < rows; i++)
    {
        var item = new TestItem(string.Format("Item {0}", i));
        source.Insert(item);
    }

    int remaining = 0;
    // loop until the source table is empty
    do
    {
        Thread.Sleep(500);
        remaining = source.Count<TestItem>();
    } while(remaining > 0);

    // make sure the destination has all rows
    Assert.AreEqual(rows, destination.Count<TestItem>());
}

Sending M2M Data to The Cloud

If you’re doing M2M work, it’s a pretty good bet that at some point you’ll need to send data off of a device for storage somewhere else (it better not be all of the data you have, though!).  Maybe it’s off to a MySQL server inside your network.  Maybe it’s off to The Cloud.  Regardless, you should expect that the storage location requirement could change, and that you might even need to send data to multiple locations.  What you should not do is code in a hard dependency on any particular storage form.  From your app’s perspective storage should be a transparent service.  Your app should say “Hey storage service, here’s some aggregated data.  Save it for me,” and that should be it.  The app shouldn’t even tell it “save it to The Cloud” or “save it to a local server.”  It should be up to the service to determine where the data should go, and that should be easily configurable and changeable.

This is pretty easy to do with an ORM, and naturally I think that the OpenNETCF ORM is really well suited for the job (I may be biased, but I doubt it).  It supports a boatload of storage mechanisms, from local SQLite to enterprise SQL Server to the newest Dream Factory DSP cloud.  And the code to actually store the data doesn’t change at all from the client perspective.

For example, let’s say I have a class called Temperatures that holds temps for several apartments that I’m monitoring.  Using the ORM, this is what the code to store those temperatures from Windows CE to a local SQL Compact database would look like:

store.Insert(currentTemps);

This is what the code to store those temperatures from Wind River Linux running Mono to an Oracle database would look like:

store.Insert(currentTemps);

And this is what the code to store those temperatures from Windows Embedded Standard to the Dream Factory DSP cloud would look like:

store.Insert(currentTemps);

Notice any pattern? The key here is to decouple your code. Make storage a ubiquitous service that you configure once, and you can spend your time writing code that’s interesting and actually solves your business problem.
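To make the “configure once” part concrete, here’s a minimal sketch. It assumes IDataStore is the ORM’s common store interface, it reuses the SqlCeDataStore and DreamFactoryDataStore constructors from the replication test above, and the factory method and the “StorageTarget” setting name are invented purely for illustration.

// Hypothetical factory: pick the physical store from configuration so the
// rest of the app only ever talks to the ORM's store abstraction.
public static IDataStore CreateConfiguredStore()
{
    // "StorageTarget" is an assumed app-setting name for this sketch
    var target = ConfigurationManager.AppSettings["StorageTarget"];

    if (target == "DreamFactory")
    {
        return new DreamFactoryDataStore(
            "https://dsp-mydsp.mycompany.dreamfactory.com/",
            "ORM",
            "MyUID",
            "MyPWD");
    }

    // default to a local SQL Compact file
    return new SqlCeDataStore("local.sdf");
}

// the application code never changes, no matter where the data ends up
var store = CreateConfiguredStore();
store.Insert(currentTemps);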

New ORM Implementation: Dream Factory

As the chief architect for the Solution Family, I’m always looking for new places to store M2M data.  I like to investigate anything that might make sense for either ourselves or our customers.  A few months ago I came across Dream Factory, which is an open-source, standards-based Cloud Services Platform.  They provide more than just storage, but my first foray was into the storage side of things.

First I had to understand exactly how to use their REST APIs to do the basic CRUD operations.  On the surface, their API documentation was cool, but it actually lacked some of the fundamental pieces of information on how to initiate a session and start working.  I suspect that this is because they’re small and new and, let’s face it, documentation isn’t that fun to generate.  Fortunately, their support has been fantastic – I actually can’t praise it enough.  Working with their engineers I was able to piece together everything necessary to build a .NET SDK to hit their DSP service.  To be fair, the documentation has also improved a bit since I started my development, so several of the questions I had have been answered for future developers.  Again, this points to their excellent support and their responsiveness to customer feedback.

Once I had a basic SDK, I then wrapped that in an implementation of the ORM that, for now, supports all of your basic CRUD operations.  The code is still wet, doesn’t support all ORM features, and is likely to be a bit brittle still, but I wanted to get it published in case anyone else wanted to start playing with it as well. So far I’m pretty pleased with it and once I have it integrated into Solution Engine, it will get a whole lot of use.
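If you want to kick the tires, basic usage looks a lot like the replication test earlier in this post. The sketch below reuses the DreamFactoryDataStore constructor, StoreExists, CreateStore, AddType, Insert and Count calls from that test; the Select<TestItem> call is an assumption on my part, so check the source for the exact overloads that are supported right now.

var store = new DreamFactoryDataStore(
    "https://dsp-mydsp.mycompany.dreamfactory.com/",
    "ORM",
    "MyUID",
    "MyPWD");

// create the backing storage on the DSP if it isn't there yet
if (!store.StoreExists)
{
    store.CreateStore();
}

// register the entity so the ORM knows the table shape
store.AddType<TestItem>();

// insert a row, then read the table back
store.Insert(new TestItem("First item"));

// Select<T> is assumed here; verify against the OpenNETCF.DreamFactory source
var items = store.Select<TestItem>();
Debug.WriteLine(string.Format("{0} items in the table", store.Count<TestItem>()));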

I’ve published the SDK and the ORM implementation up on CodePlex, all under the OpenNETCF.DreamFactory branch of the source tree.  Head over to Dream Factory to learn more and set up your free DSP account, then jump in.

It’s worth noting that at this point it’s a Windows Desktop implementation only.  Mono, Compact Framework and Windows Phone will follow at some point.

Responsible M2M

About a year ago, maybe two years now, we had a large manufacturing customer that we were working with to implement MTConnect on their production floor. Basically they had 20 five-axis machine tools running, creating aircraft parts, and they wanted to be able to get data off of those machines and “put it in The Cloud.” Well, first off, I’ve talked about how much I dislike the term “The Cloud,” so we had to clarify that. Turns out they meant “in a SQL Server database on a local server.”

MTConnect is a machine tool (hence the “MT” part) standard that we leveraged and heavily extended for use in our Solution Family products. Painting it with a broad brush, what it means is that all data from each machine tool – axis positions, part program information, door switches, coolant temperature, run hours, basically the kitchen sink – can be made available through a REST service running either on the machine tool or on a device connected to it.
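To give a flavor of what “available through a REST service” means in practice, here’s a bare-bones sketch of hitting an MTConnect agent’s current endpoint. The agent address and port are made up, and this is plain MTConnect, not our Solution Family extensions.

// Minimal sketch: ask an MTConnect agent for the latest value of every data
// item it exposes. The host and port here are hypothetical.
// (Uses System.Net and System.Xml.Linq.)
var agentUrl = "http://machine-tool-01:5000/current";

using (var client = new WebClient())
{
    // the agent answers with an MTConnectStreams XML document containing
    // axis positions, switch states, temperatures and so on
    var xml = client.DownloadString(agentUrl);
    var doc = XDocument.Parse(xml);

    Debug.WriteLine("Root element: " + doc.Root.Name.LocalName);
}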

They wanted to take that data and put it into SQL Server so their engineering group could run analytics on the data. Maybe they wanted to look at part times, energy consumption, tool path length, whatever. They actually weren’t fully sure what they wanted to do with the data, they just knew that “in The Cloud” is where everyone said it should be, so the commandment came down that that’s where the data would go.

Ugh. The conversation went something like this.

“So you want all of the data from each machine tool to go to the server?”

“Yes. Absolutely.”

“You know that there are 6 continually moving axes on those machines, right? And a constantly changing part program.”

“Of course. That’s the data we want.”

“You are aware that that’s *a lot* of data, right?”

“Yes. We want it.”

“You’re sure about this?”

“Yes, we’re sure. Send the data to The Cloud.”

So we set up a mesh of Solution Engines to publish *all* of the data from *all* of the machines to their local server. We turned on the shop floor. And roughly 20 seconds later the network crashed. This was a large, well built, very fast, hard-wired network. There was a lot of available bandwidth. But we were generating more than a lot of data, and the thing puked, and puked fast.

So what’s the lesson here? That you can always generate more data out at the edge of your system than the infrastructure is capable of carrying. If you’re implementing the system for yourself, trying to transfer all of the data is a problem, but if you’re implementing it for a customer, trying to transfer all of it is irresponsible. We did it in a closed system that was just for test, knowing what the result would be and that it would be non-critical (they simply turned off data broadcasting and everything went back to normal), but we had to show the customer the problem. They simply wouldn’t be told.

We need to do this thing, this M2M, IoT, Intelligent Device Systems, or whatever you want to call it, responsibly. Responsible M2M means understanding the system. It means using Edge Analytics, or rules running out at the data collection nodes, to do data collection, aggregation and filtering. You cannot push all of the data into remote storage, no matter how badly you or your customer might think it’s what needs to happen.

But that’s fine. Most of the time you don’t need all of the data anyway, and if, somehow, you do there are still ways you can have your cake and eat it too.

Let’s look at a real-world example. Let’s say we have a fleet of municipal busses. These busses drive around all day long on fixed routes, picking up and dropping off people. These busses are nodes that can collect a lot of data. They have engine controller data on CAN or J1708. They have on-board peripherals like fare boxes, head signs and passenger counters. They have constantly changing positional data coming from GPS and/or dead-reckoning systems. They’re also moving, so they can’t be wired into a network.

Well we could send all of that data to “The Cloud”, or at least try it, but not only would it likely cause network problems, think of the cost. Yes, if you’re AT&T, Verizon or one of the mobile carriers, you’ve just hit pay dirt, but if you’re the municipality the cost would be astronomical. Hello $20 bus fares.

What’s the solution here? Well, first of all, there’s a load of data that we have that’s near useless. The engine temperature, RPMs or oil pressure (or any of the other thousands of data points available from the engine controller) might fluctuate, but generally we don’t care about that data. We care about it only when it’s outside of a “normal” range. So we need Edge Analytics to be able to watch the local data, measure it, and react when some conditions are met. This means we can’t just use a “dumb” device that grabs data from the controller and forwards it on. Instead we need an Intelligent Device – maybe an Intelligent Gateway (a device with a modem) – that is capable of running logic.

Now, when we’re out of the “normal” range, what do we do? Maybe we want to just store that data locally on the vehicle in a database and download it at the end of the shift when the vehicle returns to the barn. Maybe we want to send just a notification back to the maintenance team to let them know there’s a problem. Maybe we want to immediately send a capture of a specific set of data off to some enterprise storage system for further analysis so the maintenance team can order a repair part or send out a replacement vehicle. It depends on the scenario, and that scenario may need to change dynamically based on conditions or the maintenance team’s desires.
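As a sketch of what such an edge rule might look like: the thresholds, the EngineSample class and the NotifyMaintenance helper below are all invented for illustration, and localStore is just an ORM store like the ones shown earlier.

// Edge-analytics sketch: only act when a reading leaves its "normal" band.
// The limits, EngineSample and NotifyMaintenance are hypothetical.
const double MinCoolantTempC = 70.0;
const double MaxCoolantTempC = 105.0;

void OnEngineSample(EngineSample sample)
{
    if (sample.CoolantTempC >= MinCoolantTempC &&
        sample.CoolantTempC <= MaxCoolantTempC)
    {
        // in range: nothing gets stored, nothing gets transmitted
        return;
    }

    // out of range: keep a local copy for end-of-shift download...
    localStore.Insert(sample);

    // ...and/or push a small notification upstream right away
    NotifyMaintenance(sample);
}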

Positional data is also ever-changing, but do we need *all* of it? Maybe we can send it periodically and it can provide enough information to meet the data consumer’s needs. Maybe once a minute to update a web service allowing passengers to see where the bus is and how long it will be until it arrives at a particular spot. Or the device could match positional data against a known path and only send data when it’s off-route.
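The positional case can be just as simple. A sketch of the once-a-minute gate might look like this; PublishPosition is a placeholder for whatever transport you actually use.

// Positional-data sketch: publish a GPS fix at most once per minute and keep
// everything else on the vehicle. PublishPosition is hypothetical.
DateTime _lastReport = DateTime.MinValue;
readonly TimeSpan _reportInterval = TimeSpan.FromMinutes(1);

void OnGpsFix(double latitude, double longitude)
{
    if (DateTime.UtcNow - _lastReport < _reportInterval)
    {
        // too soon; nothing leaves the vehicle
        return;
    }

    _lastReport = DateTime.UtcNow;
    PublishPosition(latitude, longitude);
}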

And remember, you’re in a moving vehicle with a network that may or may not be available at any given time. So the device has to be able to handle transient connectivity.

The device also needs to be able to effect change itself. For a vehicle, maybe it puts the system into “limp mode” to allow the vehicle to get back to the barn and not be towed. For a building, maybe it needs to be able to turn on a boiler.

The point here is that when you’re developing your Intelligent Systems you have to do it with thought. I’d say that it’s rare that you can get away with a simple data-forwarding device. You need a device that can:

– Run local Edge Analytics
– Store data locally
– Filter and aggregate data
– Run rules based on the data
– Function with a transient or missing network
– Effect change locally

Intelligent Systems are great, but they still need to be cost-effective and stable. They also should be extensible and maintainable. You owe it to yourself and your customer to do M2M responsibly.

Of course if you want help building a robust Intelligent System, we have both products and services to help you get there and would be happy to help. Just contact us.

Stop Using “The Cloud”

We’ve all been in a meeting, conference or seminar where the talk goes something along these lines:

“We collect the data at the device, then we send it up to The Cloud where we can do analysis and other things on it.”

This is usually accompanied by a vague graphic of a line – sometimes solid, sometimes dotted – pointing to the generic PowerPoint cloud image.

I hate this statement.  I hate this cloud.  This is usually a huge red flag that the speaker or author really has no idea what they’re talking about.  It’s a sales or marketing cop-out that is the technical equivalent of the popular “then a miracle happens” cartoon.

What does “send it to The Cloud” even mean?  What cloud?  How is the data going to get there?  How will I get it back out? I’ve got other questions about “the cloud” that I’ll address in another post, but these three are the biggies that you should always raise your hand and ask.

1. What is “The Cloud?”

Here’s the little secret: there is no single “cloud”.  There are a boatload of them, and the list seems to be continually growing as it gets cheaper to stand up servers with a lot of storage and as more and more use cases make use of off-device storage.  “The Cloud” can mean a whole panoply of things, including (but definitely not limited to):

– Windows Azure tables and services
– Amazon’s web services
– A Dream Factory DSP
– A SQL Server, MySQL or Oracle database running on somebody’s server

Again, there are plenty of others; these are just most of the ones I’ve dealt with in the past year or so.

And bear in mind that not all customers are going to be amenable to all clouds.  Some customers aren’t so comfortable putting their data on servers they don’t control.  Some aren’t comfortable putting their data on servers in particular countries where governmental agencies are known to mine them.  Some customers simply have predispositions to different services for different reasons.

Maybe they like Azure because it provides a simple interface for their .NET enterprise apps.  Maybe they like Amazon’s scale.  Maybe they like Brand X just because they do.  The point is that if you have more than one customer, you’re probably going to need to look at more than one cloud provider.  We’ve got customers that use multiple clouds due to the benefits each provides for certain types of data, data retention policies or total cost of use.

2. How do I get my data into “The Cloud”?

Yeeaaahhh, about that…since there are a boatload of cloud services, there are a boatload of ways that an app gets data into them.  Network databases might use ODBC or a proprietary/specific API set.  Services on the web typically use a web service interface.  Maybe REST, maybe OData, maybe something else.  The point here is none of them are the same.  None of them.

So that means that if you have to support multiple clouds (and you will), you have to support multiple API sets.  Of course you can abstract them all back to a common interface – I’d certainly recommend it; it’s what we do using the OpenNETCF ORM – but there’s still work to be done to actually implement each.
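Stripped to the bone, that common interface looks something like the sketch below. This is just the shape of the idea, not the actual OpenNETCF ORM API; each cloud (or local database) gets its own implementation behind it, and the app never knows which one it’s talking to.

// Sketch of a minimal common storage interface (not the real ORM interface)
public interface ICloudStore
{
    void Insert(object item);
    IEnumerable<T> Select<T>() where T : new();
    void Delete(object item);
}

Each provider-specific implementation then hides its own authentication, transport and retry details behind those few calls.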

3. How do I get my data back out of “The Cloud”?

Just like putting the data in, getting it out requires an API, and again, they are all different.  Another thing to consider in the “getting it out” part of the equation is how you actually use it.

Some clouds allow you to run services and/or applications on the servers as well, and data access is direct.  Sometimes you have to pull the data back out to another server.  Once again, this means more work from a development perspective.  And again, you’ve got to multiply that by the number of clouds you need to support.

And how about data retention? Some clouds are not “forever” storage.  The data gets purged on a time basis.  If you need to keep the data archived for later data mining then add that work onto your plate too.

So the next time you see that slide with the cloud on it and the speaker says “just send the data to the cloud” raise your hand.  Take them to task over it.  We build software, and while that seems like magic to some, we don’t do miracles.

Intel’s New Quark Core

This week Intel announced their new processor core, named Quark.  It is smaller than their current embedded-focused Atom core (hence the name: a quark is smaller than an atom) and, more importantly, it uses about 10% of the power of an Atom.  We can probably assume it also produces a lot less heat, so your embedded devices will no longer double as a Panini press.

Intel has, unfortunately, been pretty vague about the Quark, so I’ll have to remain equally vague about some things.  We don’t know exactly when we’ll be able to actually buy processors using the Quark core (Q4 for general availability of evaluation systems?).  We don’t know what price point they are targeting (I’ve seen guesses at the $5 range).  We can be pretty sure that with the market we’re in that the definitive answers will be “soon” and “low.”

So what do we know, then?  Well, it’s x86 (so 32-bit) and probably single core.  What?!  I can hear the screams now.  In fact, reading the comments on several other tech sites, there seems to be a lot of furor about how it can’t compete with the ARM processors shipping in phones and tablets today and how a 32-bit, single-core architecture is so 1990s and useless in today’s landscape.

I think those people are totally missing the point.  This isn’t a processor for a phone or tablet.  Intel has even said it isn’t.  Quit trying to place it into the devices you think of.  This baby is designed for the Internet of Things (IoT) and M2M, and I think it’s going to be a game changer.

M2M – and I’m just going to call it that for now, instead of IoT or the acronym I saw this week, IoE (Internet of Everything; seriously) – is growing.  It looks like it’s the next wave of “things to do” and, happily, we’ve been doing it for a decade.

Quark is going to enable all of those applications using 16- and 32-bit microcontrollers to run full-blown OSes.  That means they’ll have access to connectivity.  That means they’ll be able to do local analytics and run local rules. It means they’ll be able to push data upstream to clouds.  It means they’ll start participating in overall solutions. That also means they’ll need security, but they’ll have the capacity to implement it.

The core itself is also synthesizable, meaning it’s “open”.  No, it’s not that anyone can go in and change the actual processor core (it’s not that open), but they can change the fabric, meaning they can build their own SoC with the Quark core directly wired to peripheral components like radios and crypto devices to further reduce cost and footprint.

I’m confident that we’ll have Solution Engine running on a Quark system very soon and it will be interesting to see how it performs compared to the Atom and ARM systems we’re already running on.

What I’d really love to see is someone building a Windows CE OS for it to give us low-latency, real-time capabilities coupled with the familiar Win32 API.  Since it’s still x86, that’s not a big stretch.