Wednesday, April 23, 2008

Mesh4x: New Open Source Project for Data Meshes

Today we created a new open source project to host InSTEDD's efforts on data meshes.

The goal for the project is to provide libraries, tools and applications that simplify using standards-based data meshes. Our contributions will be based on the requirements observed in global health, community development and humanitarian aid.

You can find the project here: http://code.google.com/p/mesh4x/

Our first contribution to the project consists of some libraries that implement the FeedSync specification, an open standard that describes version vectors and processes for conflict detection and conflict preservation. FeedSync also happens to be one of the underpinnings of Microsoft's consumer-targeted Live Mesh, but it can happily be used on any platform because it is based on extensions to RSS and Atom (an obvious idea is to build a FeedSync JavaScript adapter for Google Gears).
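Version vectors are what make this kind of conflict detection possible. Here is a minimal Python sketch of the general technique. It is a generic illustration and not FeedSync's own sync metadata or tie-break rules, which the spec defines. Each item carries a per-node update counter. Comparing two vectors tells you whether one version already includes the other or whether the two were edited concurrently. A concurrent pair is kept as a winner plus a preserved conflict instead of being silently discarded. The `Item` class, the node names and the deterministic tie-break are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

VersionVector = Dict[str, int]  # node id -> number of updates made on that node


class Order(Enum):
    EQUAL = "equal"
    BEFORE = "before"          # b already includes every update in a
    AFTER = "after"            # a already includes every update in b
    CONCURRENT = "concurrent"  # edited independently on different nodes: a conflict


def compare(a: VersionVector, b: VersionVector) -> Order:
    """Compare two version vectors; a node missing from a vector counts as 0."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return Order.CONCURRENT
    if a_ahead:
        return Order.AFTER
    if b_ahead:
        return Order.BEFORE
    return Order.EQUAL


@dataclass
class Item:
    data: str
    vv: VersionVector
    conflicts: List["Item"] = field(default_factory=list)  # preserved losing versions


def merge(local: Item, remote: Item) -> Item:
    """Keep the newer version; on a conflict keep a winner AND preserve the loser."""
    order = compare(local.vv, remote.vv)
    if order in (Order.EQUAL, Order.AFTER):
        return local
    if order is Order.BEFORE:
        return remote
    # Deterministic winner, so every node merging the same pair picks the same one.
    winner, loser = sorted(
        (local, remote),
        key=lambda item: (sum(item.vv.values()), sorted(item.vv.items())),
        reverse=True,
    )
    nodes = set(local.vv) | set(remote.vv)
    merged_vv = {n: max(local.vv.get(n, 0), remote.vv.get(n, 0)) for n in nodes}
    preserved = winner.conflicts + loser.conflicts + [Item(loser.data, dict(loser.vv))]
    return Item(winner.data, merged_vv, preserved)
```

Merging in either direction yields the same winner. The losing edit stays attached to the item so a person can review it later, which is the spirit of conflict preservation.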

Because of the project's emphasis on standards, we structured the source tree to host implementations on more than one platform and in more than one language. Mesh4x has two starter source folders: Mesh4j (Java) and Mesh4n (.NET, a large C# contribution from Clarius Labs and the Microsoft XML MVPs, who already had an open source version, unit tests and all). We hope to eventually see Mesh4php, Mesh4r (Ruby) and so on.

It's a start. InSTEDD's work in SE Asia, together with input from humanitarian aid agencies and other providers of technology for social good, will drive our contributions. We expect work in these areas:

  • Adapters to different stores (e.g. MySQL, or application-specific formats such as KML) for servers and clients, and the ETL (extract, transform, load) that goes on at heterogeneous endpoints. I heard a great idea today: building open-source VMs that run on Amazon's cloud hosting.
  • Implementations that work on mobile devices (for example, we are currently refactoring the Java library to run on J2ME).
  • Support for different transports (plain XML over files or HTTP is a start, but optimizations are possible for low-bandwidth and no-Internet scenarios, or for integrating with a transport mesh like WASTE).
  • Integrating implementations with standards-based authentication and data signing approaches.
  • ..your contributions!

Applicability for Humanitarian & Health Scenarios

So why are we at InSTEDD interested in this? Data meshes have some interesting properties:

  • Symmetrical: They allow data to exist in a concurrent multi-master environment where updates can be applied at any node in the mesh.
  • Asynchronous: They allow offline updates to information and synchronization with other nodes without requiring data locks, essential for occasionally connected applications.
  • Dynamic: The synchronization can happen even in constantly changing connectivity topologies. I can sync to a server and later the sync can be done between my client and another client, who could then sync with another server if the first one is there, and so on.
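To make the symmetrical, asynchronous and dynamic properties concrete, here is a minimal Python sketch of a generic anti-entropy exchange. It illustrates the idea and is not code from Mesh4x or the FeedSync spec. Each node keeps a log of updates stamped with (origin node, sequence number) and summarizes what it has seen as a version vector. A sync between any two nodes then sends each side only what it is missing. The node names and the update layout are hypothetical. Detecting conflicting edits to the same item is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (origin node, sequence number at that origin, item id, new value)
Update = Tuple[str, int, str, str]


@dataclass
class Node:
    name: str
    log: List[Update] = field(default_factory=list)

    def knowledge(self) -> Dict[str, int]:
        """Version vector: the highest sequence number seen from each origin."""
        vv: Dict[str, int] = {}
        for origin, seq, _, _ in self.log:
            vv[origin] = max(vv.get(origin, 0), seq)
        return vv

    def edit(self, item_id: str, value: str) -> None:
        """Record a local update; no lock and no connection are needed."""
        seq = self.knowledge().get(self.name, 0) + 1
        self.log.append((self.name, seq, item_id, value))

    def missing_for(self, their_vv: Dict[str, int]) -> List[Update]:
        """Updates this node has that a peer with version vector `their_vv` lacks."""
        return [u for u in self.log if u[1] > their_vv.get(u[0], 0)]


def sync(a: Node, b: Node) -> None:
    """Symmetric two-way sync: neither side is the master."""
    to_b = a.missing_for(b.knowledge())
    to_a = b.missing_for(a.knowledge())
    b.log.extend(to_b)
    a.log.extend(to_a)
```

For example, a laptop that edits offline, syncs with one server, and is then synced by a phone can get its update to a second server it never talked to directly:

```python
server, laptop, phone, other = Node("server"), Node("laptop"), Node("phone"), Node("other_server")
server.edit("clinic-12", "beds=40")
laptop.edit("clinic-12", "beds=38")  # made while offline
sync(laptop, server)
sync(phone, laptop)
sync(phone, other)
```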

These properties make them very suitable for humanitarian, crisis, and health care environments, where information sharing, data system integration, and technologies that assist politically neutral solutions are beneficial. For example:

  • Symmetry allows two audiences to work on the same data through different applications, with no application being the 'master'. It also allows sensitive information to be shared between countries or organizations without any one country hosting more or less data than the other. See Mary Jane's post about the NGOs in Cambodia to understand how important this symmetry and neutrality can be. Finally, it allows data to move around with a user, independently of the device it was created on.
  • Asynchronicity allows work to happen in environments where data connections are unavailable, bandwidth is low, or the only 'transport' is a USB stick.
  • Dynamism allows field teams to share data among themselves and with servers as early as possible. Unlike email, there is no need to wait for connectivity to a specific server to set the information free.

One idea could be to add data mesh capabilities to Sahana, allowing any instance running on a server or laptop to edit the information and 'sync' both ways with any other server or laptop. We have also heard scenarios where users of FrontlineSMS could synchronize information among themselves. If anyone is interested, we'd happily work with you on how to approach this.

Come and participate! Let's share our scenarios, ideas, and code here: http://code.google.com/p/mesh4x/

Monday, April 21, 2008

SMS Applications and Microformats - lots of work to do!

I got a comment at this weekend's Alt.Net conference - which was echoed in mikel's blog - about us not using a location microformat in the Friends Nearby and GeoChat proof of concept applications on the InSTEDD site.

In retrospect, it would have been nice to support the microformat, and we should have. But I believe it would not have been used much, if at all.

Our bad on the omission (easy to add), but I think it would be good to explain why we did what we did, why we'd do it again (with the addition of microformat support), and why I think a lot of usability testing is still required to make the conversation about microformats from SMS phones more realistic.

Some background: microformats, like anything that increases interoperability and has the long-term potential of reducing user training, are pure goodness. Microformats specify ways to represent common pieces of information, such as the following for position:

l:lat,lon    

...where l is a lowercase L for location, and lat and lon are latitude and longitude. You can also add a location name like so:

l:cityname=lat,long
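Here is a minimal Python sketch of reading the two forms above out of an incoming SMS. The regular expression, the acceptance of an upper-case L, and the latitude/longitude range check are assumptions made for this sketch. They are not a published grammar for the microformat.

```python
import re
from typing import NamedTuple, Optional


class Location(NamedTuple):
    name: Optional[str]
    lat: float
    lon: float


# l:lat,lon  or  l:cityname=lat,lon  anywhere in the message (case-insensitive: an assumption)
LOCATION_RE = re.compile(
    r"\bl:(?:(?P<name>[^=,]+?)=)?(?P<lat>-?\d+(?:\.\d+)?),(?P<lon>-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_location(message: str) -> Optional[Location]:
    """Return the first location found in the message, or None if absent or out of range."""
    m = LOCATION_RE.search(message)
    if m is None:
        return None
    lat, lon = float(m["lat"]), float(m["lon"])
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    name = m["name"].strip() if m["name"] else None
    return Location(name, lat, lon)
```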

However, in our proofs of concept we accepted the following formats, and had to use some extra tricks to handle input from non-tech-savvy users. Here are some examples:

  • Basic format, lat*long*message: latitude and longitude in decimal format, with a point, a comma or either one as the decimal separator (with poor eyesight, when stressed, or in sunlight, it is easy to mix up a . and a ,).
  • Weird variations of numeric input for decimal lat/longs, e.g. '32.121', '32. 121' or '32, 121'. Spaces, commas and points all appear in unexpected spots.
  • Weird variations of numeric input for degrees-minutes-seconds (DMS) lat/longs, e.g. '32.55.55' or '32, 55, 55'. Many folks had GPSs and copied what the screen showed, which for most devices is DMS by default.
  • Context-specific funneling, e.g. correcting 121.234 to -121.234. We call 'funneling' the act of correcting a location using some encoded common sense based on context. If you are reporting fires in California, USA from a truck and suddenly your marker moves to the Yellow Sea right off China, it is quite probable you forgot the 'minus' in your longitude report. For Golden Shadow (where we expected people's activity to stay in the area) we just applied a blanket rule that moves everyone's location into the USA. For Friends Nearby and future global apps, we hit-test the position against political boundaries from shapefiles and feed the inferred position back to the user ('Hope you had a nice trip to Laos' or 'Looks like you are in a plane or a boat'). This gives the user a chance to correct it.
  • Choice of separators: we chose * instead of , ; or # because it was more accessible on all the phones we tested, without needing a trip to the symbols menu in most cases.
  • Geocoding cities and addresses, e.g. 'Palo Alto, CA*Sunny!' or 'Phnom Penh*Kh'mim Pain-haa'. In an urban setting, lat/longs are a nuisance unless you are in a flood or vectoring a helicopter. We accept addresses or town names and geocode them using Google's geocoding APIs.
  • Reducing the need to report location: assume the sender hasn't moved if no new location is submitted. Reduce the number of times the user has to go through this complex procedure as much as you can!
  • Errors and omissions in the above: attempt to resolve them automatically AND give people a fallback to resolve and correct them. Losing data and bothering the user with 'try again' is less acceptable than trying to infer the intent of the message, so we did our best. In case of failure, attach the last known good position and log the issue, so that a human can correct it manually or call up and ask 'where are you?'
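Several of these tricks fit into one small parser. Here is a minimal Python sketch of the approach, with some assumptions. The geocoder is passed in as a plain function, standing in for a real geocoding service (no Google API calls are shown). `funnel_to_usa` is a hypothetical rendering of the Golden Shadow blanket rule. The shapefile hit-test is not implemented. All names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Position = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def parse_coordinate(token: str) -> Optional[float]:
    """Leniently parse one coordinate typed on a phone keypad.

    One or two numeric groups ('32.121', '32. 121', '32, 121') are read as decimal
    degrees, whatever the separator; three groups ('32.55.55', '32, 55, 55') are
    read as degrees, minutes, seconds.
    """
    token = token.strip()
    sign = -1.0 if token.startswith("-") else 1.0
    groups = [g for g in re.split(r"[.,\s]+", token.lstrip("-")) if g]
    if not groups or not all(g.isascii() and g.isdigit() for g in groups):
        return None
    if len(groups) <= 2:
        return sign * float(".".join(groups))
    if len(groups) == 3:
        d, m, s = (int(g) for g in groups)
        if m >= 60 or s >= 60:
            return None
        return sign * (d + m / 60 + s / 3600)
    return None


def funnel_to_usa(pos: Position) -> Position:
    """Hypothetical blanket rule for a US-only deployment: force the position north of
    the equator and west of Greenwich, fixing a forgotten minus (121.234 -> -121.234)."""
    lat, lon = pos
    return abs(lat), -abs(lon)


@dataclass
class Report:
    position: Optional[Position]
    text: str
    needs_review: bool  # a human should fix the position or call and ask 'where are you?'


def parse_report(sms: str, sender: str, last_known: Dict[str, Position],
                 geocode: Callable[[str], Optional[Position]],
                 funnel: Callable[[Position], Position] = funnel_to_usa) -> Report:
    parts = sms.split("*", 2)  # '*' separates lat, long and message
    position: Optional[Position] = None
    text = sms
    if len(parts) == 3:
        lat, lon = parse_coordinate(parts[0]), parse_coordinate(parts[1])
        if lat is not None and lon is not None and abs(lat) <= 90 and abs(lon) <= 180:
            position, text = funnel((lat, lon)), parts[2]
    if position is None and len(parts) >= 2:
        # e.g. 'Palo Alto, CA*Sunny!': try the first field as a place name
        position = geocode(parts[0].strip())
        if position is not None:
            text = "*".join(parts[1:])
    if position is not None:
        last_known[sender] = position
        return Report(position, text.strip(), needs_review=False)
    # No usable location: assume the sender hasn't moved. Flag for a human if they
    # tried to send one (the message contains '*') or if we have never seen them.
    fallback = last_known.get(sender)
    return Report(fallback, sms.strip(), needs_review="*" in sms or fallback is None)
```

Here a message is never rejected outright. The worst case is the last known position plus a flag for a human, matching the 'resolve automatically AND give a fallback' rule.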

 

Where the rubber meets the road

We did a usability test with CERT volunteers and folks from the local search and rescue team. It was a diverse audience: in the same room we had a range of cell phones (from baby Nokias to BlackBerrys) and people with a range of expertise (some used their phones for email and calendar, others had never used SMS before). We assumed we had just one chance to train them, so we explained it once and asked them to start sending messages. The log we got was invaluable as raw input, and it informed the parsing algorithms, making them more robust for the actual exercise.

A quick first-hand test you can try at home: I have a collection of diverse phones to try these things out on, even phones with Khmer script input support! So let me see... it takes approximately 70 keystrokes to enter L:123.45,67.89 on a small Nokia, with no errors. Entering 123.45*67.89 took me approximately 50 keystrokes. The microformat takes ~40% more key presses (and this particular phone uses the * key for the symbol pad, so that even plays against my point).

I know folks in diverse settings have tried the microformat, and it works to communicate position (of course), but I'm not sure they were a representative audience of non-tech-savvy users. The breakdown is not in the data itself, but in the usability of the format.

Lessons learnt and realizing it's an ongoing effort

The key lesson for me is to make sure we accept the location microformat in addition to more user-friendly formats. If folks know the microformat beforehand (the exception rather than the norm), they can expect it to 'just work'.

Alternative paths include applying machine-learning feature extraction to the information, or building a smart client for rich phones that formats the message for you (from the GPS?), so the user never sees the location 'wire' format at all. All approaches have pros and cons.
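On the smart-client path, the formatting step itself is small. Here is a minimal Python sketch. It assumes the client already has a latitude/longitude fix from the phone's GPS; reading the GPS is platform-specific and not shown. `format_report`, its parameters, and the 160-character budget for a single SMS in the GSM 7-bit alphabet are assumptions made for illustration.

```python
SMS_LIMIT = 160  # characters in one SMS, assuming only GSM 7-bit alphabet characters


def format_report(lat: float, lon: float, message: str,
                  microformat: bool = False, digits: int = 5) -> str:
    """Build the SMS 'wire' text from a GPS fix so the user only types the message.

    digits=5 keeps roughly metre-level precision for latitude.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"position out of range: {lat}, {lon}")
    lat_s, lon_s = f"{lat:.{digits}f}", f"{lon:.{digits}f}"
    prefix = f"l:{lat_s},{lon_s} " if microformat else f"{lat_s}*{lon_s}*"
    # Trim the free text, never the position, if the report would not fit in one SMS.
    return prefix + message[: max(0, SMS_LIMIT - len(prefix))]
```

For example, `format_report(37.44, -122.14, "Sunny!")` gives `37.44000*-122.14000*Sunny!`. Passing `microformat=True` gives `l:37.44000,-122.14000 Sunny!`, so the same client could emit either format without the user ever typing it.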

I am also wondering what activity is ongoing in the area of nanoformats: microformats that are slightly friendlier to numeric-keypad input. It's trivial to invent tiny ways of representing bits of information. The problem is that, unless done right, they can shift power away from the end user towards the engineers who consume that data. I don't see that as a positive power shift.

To make the effort real these nanoformats would have to get usability testing and feedback from real users in real situations to grow them into something intuitive and easy to enter in different phones.

To do that, we (the 'big we') have to continue to experiment with easy-to-remember schemes that can be trained in one shot, can be context-specific, are easier to discover, recall and communicate, and work even for a health volunteer who cares a lot about the content and doesn't care about the format at all.

In the end, I believe the best formats, like the best technologies, will be invisible to the end users.

If you are a geo-geek, I hope to see you at Where 2.0 (InSTEDD is presenting there) in a couple of weeks so we can continue the dialogue!