Today we created a new open source project to host InSTEDD's efforts on data meshes.
The goal for the project is to provide libraries, tools and applications that simplify using standards-based data meshes. Our contributions will be based on the requirements observed in global health, community development and humanitarian aid.
You can find the project here: http://code.google.com/p/mesh4x/
Our first contribution to the project consists of some libraries that implement the FeedSync specification, an open standard that describes version vectors and processes for conflict detection and conflict preservation. FeedSync also happens to be one of the underpinnings of Microsoft's consumer-targeted Live Mesh, but it can be used happily on any platform because it is based on extensions to RSS and Atom (an obvious idea is to build a FeedSync JavaScript adapter for Google Gears).
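To make "conflict detection and conflict preservation" concrete, here is a minimal Python sketch of the idea. It is a simplification written for this post, not the mesh4x API and not the full FeedSync algorithm: the names `Item`, `History`, `create`, `update`, `subsumes` and `merge` are hypothetical, and ties are broken only by comparing author names, whereas the specification also looks at timestamps.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class History:
    sequence: int  # value of the item's update counter when the edit was made
    by: str        # who made the edit


@dataclass(frozen=True)
class Item:
    id: str
    payload: str
    updates: int           # incremented on every edit
    history: tuple         # History entries, newest first
    conflicts: tuple = ()  # losing versions, kept so nothing is silently lost


def create(item_id, payload, by):
    return Item(item_id, payload, 1, (History(1, by),))


def update(item, payload, by):
    """Return an edited copy with a bumped counter and a new history entry."""
    n = item.updates + 1
    return replace(item, payload=payload, updates=n,
                   history=(History(n, by),) + item.history)


def subsumes(winner, entry):
    """True if the winner already knows about this edit (same author, same or later sequence)."""
    return any(h.by == entry.by and h.sequence >= entry.sequence
               for h in winner.history)


def merge(a, b):
    """Pick a winner; if it never saw the loser's latest edit, preserve the loser as a conflict."""
    # Simplified winner picking: more updates wins, then the greater author name.
    winner, loser = sorted((a, b),
                           key=lambda i: (i.updates, i.history[0].by),
                           reverse=True)
    if subsumes(winner, loser.history[0]):
        return winner  # loser is just an older version: no conflict
    return replace(winner, conflicts=winner.conflicts + (loser,))


# Two people edit the same record offline, starting from the same version.
base = create("case-17", "suspected", "alice")
edit_a = update(base, "confirmed", "alice")
edit_b = update(base, "ruled out", "bob")
merged = merge(edit_a, edit_b)  # concurrent edits: one wins, the other is preserved
```

The point of the design is that the merge result is the same whichever side runs it, and the losing edit travels along with the item instead of being discarded, so a person or a program can resolve it later.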
Because of the project's emphasis on standards, we structured the source tree so it can host implementations in more than one platform and language. Mesh4x has two starter source code folders: Mesh4j (Java) and Mesh4n (.NET, a large C# contribution from Clarius Labs and the Microsoft XML MVPs, who already had an open source version, unit tests and all). We hope to eventually see Mesh4php, Mesh4r (Ruby) and so on.
It's a start. InSTEDD's work in SE Asia, together with the input of humanitarian aid agencies and other providers of technology for social good, will drive our contributions. We expect work in these areas:
- Adapters to different stores (e.g. MySQL, or application-specific formats, such as KML), for servers & clients, and the ETL (extract, transform, load) that goes at heterogeneous endpoints. I heard a great idea today for building open-source VMs that run on Amazon's cloud hosting.
- Implementations that work on mobile devices (for example, we are currently refactoring the Java library to run in J2ME)
- Support for different transports (plain XML over files, or HTTP is a start, but there are optimizations that can be done for low bandwidth, no-Internet scenarios, or integrating with a transport mesh like WASTE).
- Integrating implementations with standards-based authentication and data signing approaches.
- ...your contributions!
Applicability for Humanitarian & Health Scenarios
So why are we at InSTEDD interested in this? Data meshes have some interesting properties:
- Symmetrical: They allow data to exist in a concurrent multi-master environment where updates can be applied at any node in the mesh.
- Asynchronous: They allow offline updates to information and synchronization with other nodes without requiring data locks, essential for occasionally connected applications.
- Dynamic: The synchronization can happen even in constantly changing connectivity topologies. I can sync to a server, and later the sync can be done between my client and another client, who could then sync with another server if the first one is not there, and so on.
These properties make them very suitable for humanitarian, crisis, and health care environments, where information sharing, data system integration, and technologies that assist politically neutral solutions are beneficial. For example:
- Symmetry allows you to have two audiences work on the same data through different applications, with no application being the 'master'. You can also have data sharing of sensitive information between countries or organizations with no country hosting more or less data than the other. See Mary Jane's post about the NGOs in Cambodia, to understand how important this symmetry and neutrality can be. It also allows data to move around a user independently of the device it's been created on.
- Asynchronicity allows work to happen in environments where data connections are unavailable, bandwidth is low, or the only 'transport' is a USB stick.
- Dynamism allows the field teams to share data amongst themselves and servers as early as possible. Unlike email, there is no need to wait for connectivity to a specific server to let the information free.
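The three properties can be seen together in a toy Python sketch. Everything in it is an assumption made for illustration: `Node`, `edit` and `sync` are hypothetical names rather than mesh4x classes, and the merge rule (highest version wins, ties broken by author name) silently drops the losing edit, whereas FeedSync would preserve it as a conflict.

```python
class Node:
    """A laptop, phone or server holding its own full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.items = {}  # item id -> (version, author, payload)

    def edit(self, item_id, payload):
        """Local edit; needs no connection and no lock (asynchronous)."""
        version = self.items.get(item_id, (0, "", None))[0] + 1
        self.items[item_id] = (version, self.name, payload)

    def sync(self, other):
        """Two-way sync with any other node; neither side is the master (symmetrical)."""
        for item_id in set(self.items) | set(other.items):
            candidates = [n.items[item_id] for n in (self, other)
                          if item_id in n.items]
            # Toy rule: highest version wins, ties broken by author name.
            winner = max(candidates, key=lambda v: (v[0], v[1]))
            self.items[item_id] = winner
            other.items[item_id] = winner


# Dynamic topology: the report reaches the server without its author ever
# connecting to it, for example via a USB stick handed to a colleague.
field_a, field_b, server = Node("field-a"), Node("field-b"), Node("server")
field_a.edit("case-17", "fever reported")  # offline edit in the field
field_a.sync(field_b)                      # client to client
field_b.sync(server)                       # a different client reaches the server
```

After the two hops, `server.items["case-17"]` holds the edit made on `field_a`, and any further node that syncs with either of the three picks it up in the same way.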
One idea could be to add data mesh capabilities to Sahana, allowing any instance running on a server or laptop to edit the information and 'sync' both ways with any other server or laptop. We have also heard scenarios where users of FrontlineSMS could synchronize information amongst themselves. If anyone is interested, we'd happily work with you to see how to approach this.
Come and participate - let's share our scenarios, ideas, and code here: http://code.google.com/p/mesh4x/

5 comments:
Hi,
"One idea could be to add data mesh capabilities to Sahana.."
You should look at Martus... It is also in the humanitarian background and they have developed their way to synchronize data.
Martus automatically copies and backs up bulletins that have been saved to a designated Martus Server when the computer is connected to the Internet.
Thanks! I've never had to use Martus myself but I am familiar with many who have. What mesh4x gives is the ability to have multi-master edits: for example, you could have many Martus 'repositories' synchronized with each other, or folks working on a KML file automatically synchronizing to a secure Martus server. We'll be scouting whether folks would want a Martus adapter, which would allow you to sync any other application (that there is an adapter for) into it (so far we have adapters for databases, KML...). We talked with Benetech folks too some weeks ago, but we hadn't started mesh4x yet back then.
Hi Ed,
Do you also know about this proposal ?
http://www.brightearthproject.org/?p=19
From the Sky - Hope all is well up there. Saw the link; I just learnt of this particular effort but had heard of other related ones. Has there been any progress on it? Could it benefit from Yahoo's new geo-services too? The role of mesh4x is in making bridges between apps, and this could be one endpoint. Are you involved in bigearth?
Eduardo --
Met you in Durban. I have finally gotten around to looking at FeedSync and related data mesh stuff.
Have you made any progress with the MySQL adapter?
I have questions about the use of data mesh for DB synchronization:
Performance -- seems like you would need to have as items (table, key, varname, value) sets, which would make for a lot of traffic.
Need for transactions -- either all updates from a single endpoint for a set of (table, key) pairs should be unconflicted or all should be conflicted.
Need for programmatic conflict resolution -- my experience with health data is that metadata propagates outward from central office while data propagates inward. Is that any help toward an algorithm?
Mesh equivalent of deadly embrace -- suppose that central is trying to change the health district to which a facility is assigned at the same time the facility is trying to report results to the district, are we fully protected by the (table, key, varname, value) item? Or do we end up with all sorts of relationship tables mediating between variables and their value lists?
Please reply to rdf4 [at] cdc [dot] gov or post here and tell me to look.
Thanks, Roger