Friday, May 09, 2008

Sahana installation poised for Myanmar disaster support

At the request of Lanka Software, we have successfully brought up a virtualized Sahana instance.

You can see it here: https://sahana.instedd.org

(Note you may need to ignore a certificate warning to see this until we deploy a new certificate for this server)

Learn more about Sahana at: http://sahana.lk

Juan was instrumental in getting the Debian virtualized image running well on our Red Hat host OS, thank you! The compressed Sahana VM is about 300 MB, which will allow a quick re-deployment of a hardened configuration in Myanmar as necessary. I think this VM would be a good asset to keep around, allowing anyone running Windows or Linux to bring up a running Sahana server with little to no effort.

Volunteer-based Sahana localization effort

Sahana sporting a mix of Burmese and Sinhala

There are over 30 volunteers across 4 continents working on localizing Sahana to Burmese. We ran into multiple issues, most stemming from the lack of Unicode standardization of Burmese. We are using Google Spreadsheets to coordinate the work (the embedded live chat is an amazing feature for live coordination) and folks are mostly using MS Word to do the translations, which we accumulate on a Google Groups page. Many thanks to all involved; I'm afraid to start mentioning folks by name because I'll mess things up or miss key individuals. Jesse Robbins blogged about this in O'Reilly Radar, and Bill Behrman from Stanford has worked his Rolodex, which helped us get additional volunteers. Many folks at the NetHope summit had the chance to refer folks as well. Thanks!

Google Groups for localization

Translation is hard, especially getting the fonts and encodings to work together. See the awesome Burglish site to see what I mean... the translated docs end up having strings like tcef;u¾rsm; which is really text encoded for Wwin_burmese, and which would look like this with Padauk:

image

Burglish project

One of the main issues with the localization is that it isn't just about translating strings: there is also a need to accept input in the right format. This isn't trivial with all the combinations of fonts and input methods people use, and especially not trivial on a web page that has to work in multiple browsers!

There is also ongoing work on InSTEDD's GeoChat system with the usual suspects and new volunteers, preparing for potential use in Myanmar, which is the topic of another blog post entirely.

Thursday, May 08, 2008

Build maps collaboratively with new Mesh4x KML adapter

A handful of months ago I met Kersten Jauer, UN Information Officer for the Central African Republic (CAR). CAR is a large country in Central Africa, surrounded by Sudan, Chad, Cameroon, Congo, and DRC; 67% of its population lives on under $1 a day, and it is scoured by constant internal rebellions and gender-based violence.

CAR is Cornered in the middle

Kersten spends a lot of time in the field in CAR, and put together an amazing map of the whole country to support logistics and NGO programs. Roads, provinces, bridges, fuel pumps, it all got captured by hand in Google Earth and saved as KML files. By the time I got it, Kersten's KML had grown to be 11 MB, an amazing amount of information patiently collected and edited, and periodically shared online with all those working to improve the region.

Google Earth with Kersten's Epic KML

Google Earth, by default

Contrast the map above showing the CAR KML with the map on the right showing the same region as seen by default in Google Earth. What got my attention was a little note in the KML:

If you would like to comment on this file or have suggestions please email to MapsAndGoogleEarth+car@hcpt.jot.com

To add a placemark just email it with a short description to the same address or kersten.jauer@undp.org

Please also check out the maps section on http://hcpt.jot.com

The map was built collaboratively, but imagine the workload Kersten must have had getting little snips, integrating them on the larger map, and then letting folks know of updates. And how would the map be maintained whenever Kersten was attending to some emergency?

Mesh4x KML Adapter

We started building a simple instance of a KML adapter for Mesh4x this week. This adapter would allow a team of people to edit a KML file and then 'synchronize' it with all the others. For example, I could add a pushpin saying a bridge is down, and you could be editing another pushpin or moving it around to represent that a logistics truck has moved. When we synchronize, the truck moves around in my KML and the broken bridge appears in yours.

This could be synchronized peer-to-peer (a KML on your disk to a KML on a USB drive or someone else's box) as well as via a 'cloud' web service. Note this is changing the data inside the KML, it is not just 'file sharing'. The adapter knows about KML and keeps track of versions of fine-grained elements (pushpins, placemarks, polygons) inside the same file. It is an example of how a data mesh could be used to synchronize fine-grained data between applications.
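To make the idea of item-level versions concrete, here is a minimal sketch in Python. It is not the Mesh4x adapter code: the Placemark model, the integer version counter and the synchronize function are all assumptions made for illustration, and deletions are not handled. It only shows how two copies of a file can exchange individual items instead of whole files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placemark:
    """One fine-grained item inside a KML file (hypothetical model)."""
    item_id: str   # stable id shared by every copy of the item
    name: str
    lat: float
    lon: float
    version: int   # assumed to be bumped on every local edit


def synchronize(mine, theirs):
    """Merge two {item_id: Placemark} maps; the higher version wins.

    Equal versions with different content count as a conflict. This
    sketch keeps the local copy and only reports the conflicting id.
    """
    merged, conflicts = {}, []
    for item_id in mine.keys() | theirs.keys():
        a, b = mine.get(item_id), theirs.get(item_id)
        if a is None or b is None:
            merged[item_id] = a or b          # item exists on one side only
        elif a.version != b.version:
            merged[item_id] = a if a.version > b.version else b
        else:
            merged[item_id] = a
            if a != b:
                conflicts.append(item_id)
    return merged, sorted(conflicts)


# My file has the broken bridge; your file has the truck in its new position.
my_file = {
    "truck": Placemark("truck", "Logistics truck", 4.36, 18.58, version=1),
    "bridge": Placemark("bridge", "Bridge is down", 5.10, 19.20, version=1),
}
your_file = {
    "truck": Placemark("truck", "Logistics truck", 4.90, 18.90, version=2),
}

merged, conflicts = synchronize(my_file, your_file)
```

After the call, the merged map holds the moved truck and the broken bridge, which is the outcome described above. A single counter per item is the simplest possible bookkeeping; it cannot tell two independent edits apart, which is why real mesh libraries track richer version history.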

The wonderful KML Sync Demo UI, version 0.000001

We chose KML for this adapter as it is a standard ("OGC KML") that is widely used and supported by Google Earth (of course), Microsoft Virtual Earth, as well as nice tools that work offline and can be used in the field such as GeoPDF.

We have a sample UI (shown here) to let you play around with the basics. The effort is still on the libraries and we don't have a neat UI to let you choose endpoints or resolve conflicts, but all will come in due time. Other restrictions include having to put your placemarks in a "Shared Items" folder in your KML, and styles don't get replicated. We foresee no problems working out these constraints over the coming weeks.

To try it out, make sure you have Java installed and:

  1. Get the sample application from http://code.google.com/p/mesh4x/downloads/list
  2. Double click on mesh4j-KML-DemoApp.jar
  3. Point to a KML or open the sample ones
  4. Edit the location of Sample Pushpin 1 in File 1
  5. Add a new pushpin in File 2
  6. Press synchronize. Afterwards, both files should have the updated Sample Pushpin 1 AND the new pushpin!

Another advantage of a data mesh is that endpoints can be heterogeneous, as long as you do the appropriate mapping. Eventually you will be able to sync a spreadsheet with columns such as Title/Description/Lat/Long into KML pushpins and back quite easily.
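As a rough illustration of that kind of mapping, the sketch below turns spreadsheet-style rows into KML placemarks and back using only the Python standard library. It is not part of Mesh4x; the function names are made up for this example, and the column names simply follow the Title/Description/Lat/Long example above.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return "{%s}%s" % (KML_NS, tag)


def rows_to_kml(rows):
    """Build a KML document with one Placemark per spreadsheet row."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(q("kml"))
    folder = ET.SubElement(kml, q("Folder"))
    ET.SubElement(folder, q("name")).text = "Shared Items"
    for row in rows:
        pm = ET.SubElement(folder, q("Placemark"))
        ET.SubElement(pm, q("name")).text = row["Title"]
        ET.SubElement(pm, q("description")).text = row["Description"]
        point = ET.SubElement(pm, q("Point"))
        # KML writes coordinates as longitude first, then latitude.
        ET.SubElement(point, q("coordinates")).text = row["Long"] + "," + row["Lat"]
    return ET.tostring(kml, encoding="unicode")


def kml_to_rows(kml_text):
    """Read every Placemark back into a spreadsheet-style row."""
    rows = []
    for pm in ET.fromstring(kml_text).iter(q("Placemark")):
        coords = pm.findtext(q("Point") + "/" + q("coordinates"))
        lon, lat = coords.split(",")[:2]
        rows.append({
            "Title": pm.findtext(q("name")),
            "Description": pm.findtext(q("description")),
            "Lat": lat,
            "Long": lon,
        })
    return rows


sheet = "Title,Description,Lat,Long\nBridge,Bridge is down,5.10,19.20\n"
rows = list(csv.DictReader(io.StringIO(sheet)))
kml_text = rows_to_kml(rows)
round_trip = kml_to_rows(kml_text)
```

The mapping itself is the easy part. What a mesh adapter adds on top is the per-item version tracking, so that an edit made in the spreadsheet and an edit made in Google Earth can be reconciled.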

We hope to be showing this at Where 2.0. A lot of the team has been focusing on supporting the Myanmar disaster relief, so progress this week has been a bit random, but we still want your feedback!

Learn more about Kersten's Work in CAR at www.hdptcar.net, or get Kersten's latest epic KML.

See the Mesh4x project at http://mesh4x.org

Tuesday, May 06, 2008

Mesh4x adds generic database support

First of all, my heartfelt support goes to the people of Myanmar in these times of crisis. Many friends are either already there or on their way to help as part of UNDAC teams. It's a tough situation in a tough context, and all my hopes reach out to the communities there so they can recover soon. Unfortunately, it won't go back to "normal" for a long time, if ever. I was in Peru last week and the August '07 earthquake still defines how people live in Pisco. The press and much of the aid has left and the town is still... leveled. Throw in a major disaster in a non-resilient environment, with a bunch of foreign aid with varied commitments to the region, and the long-term outcomes are very hard to predict.

This week we made significant updates in mesh4x. One of them is a Hibernate adapter, which allows you to plug almost any relational database on the market into the mesh.

Hibernate Adapter

In our first scenario, let's say you have, or you are quickly hacking together, an application to help enter, analyze and report information. You have a database schema, and you'd like to integrate it with an Excel database that field folks are using for data entry. You need to make sure updates and deletes somehow make it out to the spreadsheets, and that folks' updates make it back in. Furthermore, you'd like folks in the field to synchronize spreadsheets with each other directly, thus making it a classic mesh scenario. With the Hibernate adapter, our goal is to allow you to mesh-enable your database by just mapping your entity fields to your database fields.

Hibernate, as most developers know, is an Object-Relational Mapper library for Java. With this adapter you can now integrate into a data mesh any database engine that Hibernate supports, which is an impressive list. By supporting Hibernate as an adapter we allow every user to customize the mapping of the mesh data to their database schema using familiar tools, and get support for a lot of databases. There is still some work to do - for example, as of today the adapter still requires the database schema to revolve around the fact that the rows are being synchronized in a mesh. We expect in the upcoming weeks to remove this restriction and use two separate 'repositories', one for the synchronization information (which you shouldn't care about) and another one for your data.
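The 'two repositories' idea can be sketched without Hibernate at all. The example below is an assumption-laden illustration in Python with SQLite, not the adapter's actual design: the reports table, the sync_info table and the record_change helper are invented for this sketch. The point is only that the application table carries no synchronization columns, while the bookkeeping lives in a separate table.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Existing application table: nothing sync-related is added to it.
db.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, title TEXT, status TEXT)")

# Separate repository that holds only synchronization bookkeeping.
db.execute(
    "CREATE TABLE sync_info ("
    " entity TEXT, row_id INTEGER, version INTEGER, deleted INTEGER DEFAULT 0,"
    " PRIMARY KEY (entity, row_id))"
)


def record_change(entity, row_id, deleted=False):
    """Bump the version of one data row in the sync repository."""
    row = db.execute(
        "SELECT version FROM sync_info WHERE entity = ? AND row_id = ?",
        (entity, row_id),
    ).fetchone()
    version = (row[0] if row else 0) + 1
    db.execute(
        "INSERT OR REPLACE INTO sync_info (entity, row_id, version, deleted)"
        " VALUES (?, ?, ?, ?)",
        (entity, row_id, version, int(deleted)),
    )
    return version


db.execute("INSERT INTO reports (id, title, status) VALUES (1, 'Bridge survey', 'open')")
first = record_change("reports", 1)
db.execute("UPDATE reports SET status = 'closed' WHERE id = 1")
second = record_change("reports", 1)

data_columns = [col[1] for col in db.execute("PRAGMA table_info(reports)")]
```

Here record_change is called by hand after each write. A real adapter would have to hook into the persistence layer so that the bookkeeping cannot be forgotten.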

This will allow you to point to almost any existing database schema and mesh it up without messing it up. (Apologies, couldn't resist).

You can always reach the project through http://mesh4x.org.

Here you can see a list of adapters and suggest your own.

Wednesday, April 23, 2008

Mesh4x: New Open Source Project for Data Meshes

Today we created a new open source project to host InSTEDD's efforts on data meshes.

The goal for the project is to provide libraries, tools and applications that simplify using standards-based data meshes. Our contributions will be based on the requirements observed in global health, community development and humanitarian aid.

You can find the project here: http://code.google.com/p/mesh4x/

Our first contribution to the project consists of some libraries that implement the FeedSync specification, an open standard that describes version vectors and processes for conflict detection and conflict preservation. FeedSync also happens to be one of the underpinnings of Microsoft's consumer-targeted Live Mesh, but it could be used happily on any platform as it's based on extensions to RSS and Atom (an obvious idea is to build a FeedSync JavaScript adapter for Google Gears).
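For readers new to version vectors, here is a generic sketch of how they detect conflicts. It is a textbook-style illustration in Python, not the FeedSync algorithm as specified and not code from the Mesh4x libraries; the compare and merge functions and the node names are assumptions. Each copy of an item carries a map from node id to the number of updates it has seen from that node.

```python
def compare(a, b):
    """Compare two version vectors of the form {node_id: update_count}."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"   # each side has seen an update the other has not
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def merge(a, b):
    """Vector that has seen everything either side has seen."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


server = {"laptop": 1, "server": 1}
laptop = {"laptop": 2, "server": 1}               # one more edit on the laptop
phone = {"laptop": 1, "server": 1, "phone": 1}    # an independent edit on a phone

laptop_vs_server = compare(laptop, server)
laptop_vs_phone = compare(laptop, phone)
combined = merge(laptop, phone)
```

The laptop copy simply supersedes the server copy, while the laptop and phone copies conflict because each contains an edit the other never saw. Conflict preservation means keeping both versions around in that case so a person or a policy can decide later.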

Because of the project's emphasis on standards, we structured the source tree so it would host implementations in more than one platform and language. 'Mesh4x' has 2 starter source code folders - Mesh4j (Java), Mesh4n (.NET - a large C# contribution done by Clarius Labs and the Microsoft XML MVPs who already had an open source version, unit tests and all). We hope to eventually see Mesh4php, Mesh4r (Ruby) and so on.

It's a start. InSTEDD's work in SE Asia in addition to the input of humanitarian aid agencies and other providers of technology for social good will be the drivers behind our contributions. We expect work in these areas:

  • Adapters to different stores (e.g. MySQL, or application-specific formats, such as KML), for servers & clients, and the ETL (extract, transform, load) that goes at heterogeneous endpoints. I heard a great idea today for building open-source VMs that run on Amazon's cloud hosting.
  • Implementations that work on mobile devices (for example, we are currently refactoring the Java library to run in J2ME)
  • Support for different transports (plain XML over files, or HTTP is a start, but there are optimizations that can be done for low bandwidth, no-Internet scenarios, or integrating with a transport mesh like WASTE).
  • Integrating implementations with standards-based authentication and data signing approaches.
  • ...your contributions!

Applicability for Humanitarian & Health Scenarios

So why are we at InSTEDD interested in this? Data meshes have some interesting properties:

  • Symmetrical: They allow data to exist in a concurrent multi-master environment where updates can be applied at any node in the mesh.
  • Asynchronous: They allow offline updates to information and synchronization with other nodes without requiring data locks, essential for occasionally connected applications.
  • Dynamic: The synchronization can happen even in constantly changing connectivity topologies. I can sync to a server and later the sync can be done between my client and another client, who could then sync with another server if the first one is there, and so on.

These properties make them very suitable for humanitarian, crisis, and health care environments, where information sharing, data system integration, and technologies that assist politically neutral solutions are beneficial. For example:

  • Symmetry allows you to have two audiences work on the same data through different applications, with no application being the 'master'.  You can also have data sharing of sensitive information between countries or organizations with no country hosting more or less data than the other. See Mary Jane's post about the NGOs in Cambodia, to understand how important this symmetry and neutrality can be. It also allows data to move around a user independently of the device it's been created on.
  • Asynchronicity allows work to happen in environments where data connections are unavailable, bandwidth is low, or the only 'transport' is a USB stick.
  • Dynamism allows the field teams to share data amongst themselves and servers as early as possible. Unlike email, there is no need to wait for connectivity to a specific server to let the information free.

One idea could be to add data mesh capabilities to Sahana, allowing any instance running on a server or laptop to edit the information and 'sync' both ways with any other server or laptop. We have also heard scenarios where users of FrontlineSMS could synchronize information amongst themselves. If anyone is interested, we'd happily work with you to see how to approach this.

Come and participate - let's share our scenarios, ideas, and code here: http://code.google.com/p/mesh4x/

Monday, April 21, 2008

SMS Applications and Microformats - lots of work to do!

I got a comment at this weekend's Alt.Net conference - which was echoed in mikel's blog - about us not using a location microformat in the Friends Nearby and GeoChat proof of concept applications on the InSTEDD site.

In retrospect, it would have been nice to have support for the microformat, and we should have. But I believe it would not have been used much, if at all.

The geochat thingie

Our bad on the omission (easy to add), but I think it would be good to explain why we did what we did, why we'd do it again (with the addition of the microformat support), and why I think a lot of usability testing is still required to make the conversation about microformats from SMS phones more realistic.

Some background: microformats, like anything that increases interoperability and has the long-term potential of reducing user training, are pure goodness. Microformats specify ways to represent common pieces of information, such as the following for position:

l:lat,lon    

...where l is a small L for location, and lat, lon are latitude and longitude. You can also add a location name like so:

l:cityname=lat,long

However, in our proofs of concept we accept the following formats, and had to do some extra tricks to work with the input of non-tech-savvy users. Here are some examples:

  • Basic format (lat*long*message): Lat and long in decimal format with a point, a comma, or any other separator (with poor eyesight, when stressed, or in sunlight, it is easy to mix up a . with a ,).
  • Decimal variations ("32.121", "32. 121", "32, 121"): Weird variations of numeric input for decimal lat/longs. Spaces, commas and points all appear in unexpected spots.
  • Degrees Minutes Seconds variations ("32.55.55", "32, 55, 55"): Weird variations of numeric input for Degrees Minutes Seconds lat/longs. Many folks had GPSs and copied what the screen showed, which by default is DMS for most devices.
  • Context-specific funneling (e.g. 121.234 to -121.234): We call 'funneling' the act of correcting location using some encoded common sense based on context. If you are reporting fires in California, USA from a truck and suddenly your marker moves to the Yellow Sea right off China, it is quite probable you forgot the 'minus' in your longitude report. For Golden Shadow (which expected the activity of people to stay in the area) we just applied a blanket rule: we make everyone's location move to the USA. For Friends Nearby and future global apps, we do a hit-test against political boundaries inferred from shapefiles, and feed the inferred position back to the user: 'Hope you had a nice trip to Laos' or 'Looks like you are in a plane or a boat'. This gives the user a chance to correct it.
  • Choice of separators: We chose * instead of , ; or # as it was more accessible on all the phones we tested, without needing a trip to the symbols menu in most cases.
  • Geocoding cities and addresses ("Palo Alto, CA*Sunny!", "Phnom Penh*Kh'mim Pain-haa"): In an urban setting, lat/longs are a nuisance unless you are in a flood or vectoring a helicopter. We accept addresses or town names and geocode them using Google's geocoding APIs.
  • Reduce the need to report location: Assume the sender hasn't moved if no new location is submitted. Reduce the number of times the user has to go through this complex procedure as much as you can!
  • Errors and omissions on the above: Attempt to resolve them automatically AND give a fallback for people to resolve and correct. Losing data and bothering the user with 'try again' is more unacceptable than trying to infer the intent of the message, so we did our best. In case of failure, add the last known good position, and log the issue so that a human can correct it manually or call up and ask 'where are you?'
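A few of these rules are easy to sketch in code. The Python below is an illustration written for this post, not the parser used in the proof of concept applications: the function names are hypothetical and the California longitude range is a rough number picked for the example. Geocoding, the last-known-position fallback and logging are left out.

```python
import re


def parse_coordinate(text):
    """Parse one latitude or longitude field typed on a phone keypad."""
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    groups = re.findall(r"\d+", text)        # runs of digits, separators ignored
    if len(groups) == 1:                     # "32"
        value = float(groups[0])
    elif len(groups) == 2:                   # "32.121", "32. 121", "32, 121"
        value = float(groups[0] + "." + groups[1])
    elif len(groups) == 3:                   # "32.55.55", "32, 55, 55" (DMS)
        degrees, minutes, seconds = (float(g) for g in groups)
        value = degrees + minutes / 60 + seconds / 3600
    else:
        raise ValueError("cannot read coordinate: %r" % text)
    return sign * value


def parse_message(sms):
    """Split lat*long*message; raises ValueError when a field is missing."""
    lat_text, lon_text, message = sms.split("*", 2)
    return parse_coordinate(lat_text), parse_coordinate(lon_text), message


def funnel_longitude(lon, lon_min, lon_max):
    """Flip the sign when only the flipped value lies in the expected range."""
    if not lon_min <= lon <= lon_max and lon_min <= -lon <= lon_max:
        return -lon
    return lon


lat, lon, message = parse_message("37, 44*122. 16*Sunny!")
# Rough, illustrative longitude range for California.
lon = funnel_longitude(lon, -125.0, -114.0)
```

Treating any run of non-digits as a separator is what makes "32.121", "32. 121" and "32, 121" come out the same. The price is that the parser cannot tell a mistyped decimal from a degrees-and-minutes pair, which is one more reason to echo the inferred position back to the sender.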

 

Where the rubber meets the road

We did a usability test with CERT volunteers and folks from the local search and rescue team. It was a diverse audience - we had in the same room a range of cell phones (from baby Nokias to Blackberrys)  and the people had a range of expertise (some use their phone for email and calendar, others never used SMS before). We assumed we just had one chance to train them so we explained it once, and asked them to start sending messages. The log we got was invaluable as raw input - and informed the parsing algorithms to make them more robust for the actual exercise.

A quickie 1st hand test you can try at home: I have a collection of diverse phones to try these things out on, even phones with Khmer script input support! So let me see... it takes approximately 70 keystrokes to enter L:123.45,67.89 on a small Nokia, with no errors. To enter 123.45*67.89 it took me approximately 50 keystrokes. The microformat takes ~40% more key presses (and this particular phone uses the * key for the symbol pad, so that even plays against my point).

I know folks in diverse settings have tried the microformat and it works to communicate position (of course), but I'm not sure it was a representative audience of a broad set of non-tech-savvy users. The breakdown is not in the data itself, but in the usability of the format.

Lessons learnt and realizing it's an ongoing effort

The key lesson for me is to make sure we accept the location microformat in addition to more user-friendly formats. If folks know the microformat beforehand (an exception rather than the norm) they can expect 'it just works'.

Alternative paths include applying machine learning feature-extraction efforts to the information, or building a smart client for rich phones that formats the message for you (from the GPS?), so the user never sees the location 'wire' format at all. All approaches have pros and cons.

I am also wondering what activity is ongoing in the area of nanoformats, microformats which are slightly friendlier for numeric pad input. It's trivial to invent tiny ways of representing bits of information. The problem is that unless done right they can shift power away from the end user towards the engineers who consume that data. I don't see that as a positive power shift.

To make the effort real these nanoformats would have to get usability testing and feedback from real users in real situations to grow them into something intuitive and easy to enter in different phones.

To do that we (the 'big we') have to continue to experiment with easy-to-remember schemes which can be taught in one shot, can be context-specific, can be easier to discover, recall and communicate, and work even for a health volunteer who cares a lot about the content and doesn't care about the format at all.

In the end, I believe the best formats, like the best technologies, will be invisible to the end users.

If you are a geo-geek, I hope to see you at Where 2.0 (InSTEDD is presenting there) in a couple of weeks so we can continue the dialogue!

Sunday, March 30, 2008

Off to Thailand & Cambodia

I'm taking off to Southeast Asia in a couple of hours. Our goal for this trip is to set up the structure for our long-term presence in the region. I'm going with Dennis (our Program Director) to Bangkok to meet with many organizations that we are, or would like to be, working with in the area, such as Mekong Basin Disease Surveillance and Mahidol. From there we go to Phnom Penh and inner Cambodia, where Mary Jane, Luke and Robert have already spent all week.

Here's my rough itinerary:

  • March 31..April 4: Bangkok
  • April 4..April 15: Phnom Penh

I'd like to chat with folks who work in the technology space, especially in Cambodia. We also have a meeting with some of the folks contributing to Bar Camp Phnom Penh... So if you are in Phnom Penh and you are doing programming, web design, databases, mobile applications or program localization, or have an interest in contributing tech skills to our MCP program, drop me a comment here, or send an email to edjez-at-instedd-dot-org, and we'll take it from there.

See you soon!

Tuesday, March 25, 2008

Keeping our infrastructure 'in the cloud' and our costs close to the ground

At InSTEDD, as at any other non-profit or well-run business, there is a constant evaluation of how we are using our donors' money, looking for ways to reduce overhead and anything that doesn't translate directly into mission-related impact.

Given how much of our focus is on technology, it is natural that this concern affects how we design the infrastructure that supports our work. In this post I share the toolset we use to support the lifecycle of our technology, a toolset that is effective as well as lean. Perhaps others can take advantage of the evaluation work we did or can suggest useful alternatives.

Our key requirements are:

  • The tools work with our Scrum+XP (eXtreme Programming) processes
  • The tools work for an internationally distributed team even when in the field 
  • They are efficient cost-wise as well as adequate for the task and reliable

These requirements led us to evaluate many approaches. Ultimately, we opted for an infrastructure that requires no intranet, and no on-premise servers. That means no extra staff of acolytes & operators simply to keep the things going, and associated savings on power/heat/rackspace management...expenses decidedly not core to our mission.

I'm exaggerating. We do have an intranet. It has a printer.

By using software-as-a-service or software+services we gain the advantages of a lighter operations burden and increased reliability. But we must also take a hard look at three contentious areas:

  • Security: Will the hosted service provide us with the level of confidentiality, transport security, and the management of user privileges that we need?
  • Data Portability: Will the hosted service allow us to import and, even more important, EXPORT data to another service? We didn't want to fall into a lock-in scenario with 'trapped data'. Both on and off-premise backups are a must.
  • Accessibility: Will the service be accessible in the field? Can it cope with low bandwidth connections? Is it possible to work offline?

In the end, we have arrived at the following list of tools that as a set fare well with our way of working and our needs. You can see the rough cost structure for the services that aren't free (When I say 'Free' I mean gratis/no cost/Free as in Beer.)

Engineering Tools:

Google Code - http://code.google.com/ - We use Google Code for the source code control (SCC) of components and tools we release as FOSS. It provides issue tracking, a wiki, and downloads in addition to the source code control features. Free.

CVSDude - http://cvsdude.com/ - We host in CVSDude the source code for projects in their early stages when they aren't open source yet. Monthly fee.

TortoiseSVN - http://tortoisesvn.tigris.org/ - Tortoise is the most popular SCC client in the team. CVS & Subversion allow working offline and managing multiple copies of the source trees on the client, and Tortoise allows you to manage these with ease. Another great feature is that we can keep in the same source tree a mix of projects hosted on Google Code and CVSDude, allowing developers to do single update and commit operations. This helps us work on our embarrassing early code while keeping the open source projects up to date with no extra hassle. Free.

FogBugz - http://www.fogbugz.com/ - We use FogBugz for our work-item, task, and bug management. Batch updates are easy with its AJAX-based list management. It allows you to create private and shared views, exports data in multiple formats, and provides email notifications and reports (e.g. burndown charts) that are useful in agile processes. And, great for testers, it even has a client that allows you to take screenshots, annotate them, and attach them to new bugs. Finally, it has a killer feature that allows me to create new tasks by just typing and pressing "enter" ("killer" because its abuse guarantees my death at the hands of the engineering team). Monthly fee.

Virtualization - Although it is not "hosted infrastructure", virtualization saves on hardware costs. We run most of our work in virtualized environments, which include dev boxes, boxes running demos, and boxes for testing or building. Some devs run these from XP, Linux, or Mac OS. We tend to use VMware, which does a good job of allowing VMs unfettered access to USB ports and such when working with special devices. Free or not depending on the virtualization product you use and the OSs you are hosting.

The collective annual fee of all the services listed above roughly equals the cost of one moderate-size server with no OS, no software and no support staff.

Communications & Sharing

Skype - http://www.skype.com/ - For voice, video and chat. At crunch times, our team keeps a Skype channel open -- sometimes for hours at a time. It provides a sense of literally being in the same room. We are also looking at ooVoo, vSee and N-way for video. Basic Skype is free. Additional features, such as forwarding calls to a cell phone, are very cheap.

Conference calling- http://www.intercall.com/- We chose a conference call provider that has access numbers in over 50% of the world's countries. Although costs are based on the number and location of call participants, overseas tolls are avoided, which is a significant savings on international calls.

Twitter  (http://www.twitter.com) "Micro-messages" are great for ad-hoc communications, especially by SMS users spread across several countries. Within the team we send twitter direct messages by prefixing messages with d as in "d some-name I just uploaded the new version check it out". I use Twhirl as a Twitter desktop client. Free.

SharedView - We use this new Microsoft tool for quick-and-easy screen sharing. Drawback: it only runs on Windows. But most of us have some flavor of Windows running, even if it is in a virtual machine on Linux or Mac OS. Free.

Google Docs - (http://docs.google.com/) We use Google docs for taking notes and brainstorming during conference calls. It allows multiple users to collaboratively edit a document in real time. Once completed, though, we copy the document into MS Word or OneNote and save in Groove, which makes it possible to access the information offline. Free.

Groove - Microsoft Groove is useful for team coordination, managing "knowledge bases" of technologies and, most important of all, for tracking user requirements in the field. Since Groove is inherently an offline tool, it shines when Internet connectivity is an issue, but local connectivity is possible. It is not a traditional "cloud" product, but is based on a secure mesh architecture that allows pure peer-to-peer interaction. Unfortunately (hint!), it only works on the Windows OS, it speaks non-standard protocols over the wire, and has no "Web Access"/"Live" component to it. It is a part of Microsoft Office Ultimate. License Fees.

Still needing improvements...

These are some of the shortcomings with these hosted services which we hope will be addressed in the upcoming years:

  • Offline access: Many web-based tools would be more useful and valuable if they also offered a thoughtfully-designed, well-architected, reliable client for offline usage (and it takes more than just sprinkling Google gears around your javascript to achieve this, but that's a topic for another post).
  • Unified authentication - The growth in number of sites using single sign on technologies such as OpenID is encouraging but more would be better. In addition, services such as access control and other crosscutting features could be added into the mix (a trend I encouraged at an AOP panel long ago).
  • Support for Integration - I'd like to see more sites view themselves as 'building blocks' -- as part of a larger solution instead of trying to be the 'one stop shop'. Data, process, and UI integration APIs are always welcome.

So far this mix of products has been working well. The increase in bandwidth, along with better tools and standards, has allowed us to keep our core, mission-critical engineering tools online, and has made "software as/plus services" a cost-effective strategy we use every day.