Monday, December 22, 2008

Phones don’t change the world, people do

CD4 counts, maybe someday (pic credit Dave Bullock, Wired)

The Wired article “Scientists Hack Cellphone to Analyze Blood, Detect Disease, Help Developing Nations” by Dave Bullock has inspired a lot of activity under the change.org post “The Cellphone that could change the world” by Nathaniel Whittemore.

Nate’s post takes a ‘remember the future’ approach where he fast-forwards to 2011 and paints a scenario where mobile technologies are widely deployed and used. I really like that approach to visualizing possibility, and wish it were used more as a social activity. Strong Angel and Superstruct do this too, in a way. The realm of the imaginable could be further expanded by more science fiction about community and civilization resilience. (This year I enjoyed reading Kim Stanley Robinson’s fiction books about the onset of sudden climate change and the response of a “fictionalized NSF” and a US govt that isn’t afraid to change.) But I digress. I liked Nate’s post and the ideas there. The comments were riveting.

Katrin urged me to engage in the discussion at change.org. Reading through the original post and then through the comments (with a lot of ‘strong players’ from the mobile applications community), a couple of thoughts emerged about the state of mobile technology applications for health and other social purposes. Here are some.

In the future…where are the business models?

today (pic credits Eduardo Jezierski, InSTEDD)

If you are curious, here is the reality today: In June, the week before the elections, I visited Zimbabwe. Here you can see a real, resilient, working Guava machine for CD4 counts on the outskirts of Harare. It uses microfluidic technology (for smaller blood samples and lower reactant costs) and, if I recall correctly, the operating principle is the same as in the phone above, which is a tested technology. The thing is solid, and the staff deemed it highly reliable. Calibration was not an issue. They were able to multiply the number of CD4 lab counts manifold, to 300+ per day. I was there discussing the possibility of linking it to the lab record system, but that wasn’t the highest priority.

A lot of the discussion did center around how disruptive it would be to have an open platform (open hardware, open software, open assays, open IP on the test methods, open reactant formulas and manufacturing) for these tests.

Just as a $99 iPhone is a red herring for the phone network costs you are going to pay every year, a cheaper test sensor that becomes widely deployed and relies on proprietary reactants has a hidden, more insidious cost.

I did not check which assays or lab system the LUCAS phone in the Wired article uses, or whether they are open. I was just surprised this dimension wasn’t part of the interview. I encourage Ozcan from UCLA to open-source the hardware specification to allow others to build on it!

Question: When you plug something in do you say “I’m using electricity” or “I’m using the wall socket”? Sometimes I feel the discussion about innovation in mobile tech sounds like a discussion of innovation in energy…where the discussion centers on the design of plugs & sockets. A phone is just a conduit to a network, and a powerful, sensor-rich, user-friendly device can be underused as a collaboration tool that helps people work better together if network reliability and costs are not managed in unison.

In my 2011, I hope that there are hybrid social-enterprise efforts that can make inroads to working with wireless providers and carriers. They need to evolve their offerings and provide the types of cost structures needed for health and social good to scale, without depending on an infusion of donations to keep running OR pushing costs to where they can’t be paid while willing customers can’t spend their money. Even just helping providers make money differently would help a lot. Examples: Toll-free SMS? Free-to-send? Free-to-receive? Mobile banking? Shared-costs billing? Provider-supplied location tracking of registered gov’t health staff? Anonymized tracking of random individuals for disease migration modeling? It goes on... Providers could make more money (gasp!) and they don’t.

Beyond 2011 I hope more effort gets put into creating connectivity approaches that would be disruptive to current wireless systems. And I mean the “system” of government spectrum licensing + carriers + wireless providers + device manufacturers. But who would fund this research? Sigh…we need smaller, personal, cheaper GSM ‘towers’ that can be linked up more than phones. What would happen if every smartphone could host an 802.11 ‘peer’ network?

Centralized or Distributed mobile apps? There are no ‘best’ practices…

There are only proven practices, in context.

When evaluating whether an approach fits a new situation, you have to consider the context in which other solutions succeeded or failed. I face this all the time in the discussion of ‘centralized’ versus ‘individual’ mobile solutions. Sometimes I get asked which approach is better and the answer is a) it depends b) you want both, not either/or.

Server side approaches work well with large scale requirements

The centralized approach uses national or international-scale gateways, like Ushahidi with Clickatell, RapidSMS, InSTEDD GeoChat with Clickatell and BT, and so on. These are appropriate for national-scale programs that need the kinds of reliability, performance, security and availability these gateways provide.

FrontlineSMS is an example of a personal solution

FrontlineSMS is the archetypical individual or grassroots approach, where a phone attached to a computer acts as a gateway where you control costs, numbers, location, etc. – providing different types of reliability, performance, security and availability for different contexts. This type of ‘individual’ solution can even run in a smartphone, and FrontlineSMS and other projects are already proposing such a migration. For GeoChat, we put it on the backlog until we saw more demand for this approach from our Asia programs.

Approaches like RapidSMS which rely on an Asterisk server can also work on a laptop, or on a server, and can help span a ‘middle ground’ between other solutions.

Scalability is important, but I see discussions of scalability center around numbers of messages and numbers of registered users, which in most cases is profoundly irrelevant. Again, scalability is context-specific, and is measured by how well you grow with your users’ needs.

Phone in rural Cambodia with structured data. Photo Eduardo Jezierski (InSTEDD)

I know a chap (I consider him a hero) who spends most of his month travelling rural Cambodia supporting a national program to send data via SMS using plugged-phone installations. Imagine it: phones with locked enclosures get forced and misused, SIM cards swapped, chargers that burn out, USB drivers that fail, phones that lock up… Support costs of a site are his scalability denominator. For GeoChat, for example, our main scalability metric is latency of roundtrip messages under sustained use (like twitter, responses have to come out fast) across all channels (SMS, email, twitter) with a large number of group users and groups.

But why one approach or the other?

Some applications (like GeoChat) support both centralized and decentralized models, but as we start working together in this budding mobile community it makes sense to pool efforts and re-use each other’s technologies. I don’t see why InSTEDD, for example, should build yet-another-phone-detection-and-driver layer if other “social good” applications have it. For example, FrontlineSMS can forward messages on to Ushahidi (acting as a local gateway). We will take a similar approach at InSTEDD, and I think the rest of the community should do the same. By working on common protocols, all our apps could forward messages to each other as required (see this example as a working draft from the Open Mobile Consortium Katrin mentioned). (And Ken, if you are reading this: contributing to FrontlineSMS source was on my last year’s resolutions, and now that we have access to the source code we can really start work on integrating/implementing it with GeoChat, Mesh4x, etc… I’m optimistic about ‘09!)

rough sketch of where it is all going (in terms of message exchange topology, at a very high level)

The goal is to be able to pick the right tool for the context, and all the applications mentioned above are already working on protocols that would let you have a hybrid deployment that would allow you to scale up or out as needed. As contexts change, having freedom to evolve your app and not be locked into one or another is key.

Once you are moving messages around, how do you make sure different applications interpret the information in similar ways?

Shared formats for data exchange

To achieve interoperability, and reuse the human capital of having trained users, mobile apps should also share conventions on what gets put IN the messages. There is a huge gap in defining what gets put on SMS messages for diverse uses:

  • Free text, with specified language
  • Free text with explicit tags
  • Locations (lat/long, place names, village PCodes etc)
  • Delimited data (e.g. Ed, Jezierski, Cambodia)
  • Self Describing Data (e.g. firstn=Ed|lastn=Jez|city=Seattle)
  • Multi-Message batching, sequenced or order-agnostic
  • Message batch retries
  • Compression
  • and the list goes on…
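To make a few of these conventions concrete, here is a minimal Python sketch of three of them: delimited data, self-describing data, and order-agnostic multi-message batching. Everything in it is an assumption for illustration: the function names, the `|` and `=` separators, and the `B<id> <n>/<total>` batch header are hypothetical, not a format that GeoChat, RapidSMS or any other project mentioned here has agreed on.

```python
def parse_delimited(body, field_names, sep=","):
    """Delimited data: meaning comes from position, so both sides must share field_names."""
    values = [value.strip() for value in body.split(sep)]
    return dict(zip(field_names, values))


def parse_self_describing(body, pair_sep="|", kv_sep="="):
    """Self-describing data: each value carries its own field name."""
    fields = {}
    for pair in body.split(pair_sep):
        key, found, value = pair.partition(kv_sep)
        if found and key.strip():  # silently skip malformed pairs
            fields[key.strip()] = value.strip()
    return fields


def split_batch(payload, batch_id, max_len=160):
    """Split a long payload into SMS-sized messages with a hypothetical 'B<id> <n>/<total> ' header."""
    header_room = len(f"B{batch_id} 99/99 ")  # room for the largest header we allow
    size = max_len - header_room
    chunks = [payload[i:i + size] for i in range(0, len(payload), size)] or [""]
    if len(chunks) > 99:
        raise ValueError("payload too long for a two-digit batch")
    total = len(chunks)
    return [f"B{batch_id} {n}/{total} {chunk}" for n, chunk in enumerate(chunks, 1)]


def join_batch(messages):
    """Reassemble one batch in any arrival order; return None while parts are missing."""
    parts, total = {}, None
    for message in messages:
        _tag, sequence, chunk = message.split(" ", 2)
        number, total_text = sequence.split("/")
        parts[int(number)] = chunk
        total = int(total_text)
    if total is None or len(parts) != total:
        return None  # a real system would request a retry here
    return "".join(parts[n] for n in range(1, total + 1))
```

The `None` returned by `join_batch` is where a message batch retry would hook in. The sketch also assumes all messages passed to `join_batch` belong to the same batch; a shared convention would have to say how batches from different senders are told apart.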

The community of builders of mobile apps for social purposes has to start catching up in this space. I suggest re-using the leadership of twitter and other services in evolving some conventions (e.g. @user, #tag) in common ways where applicable. I would also like industry players such as Nokia (with its data gathering solutions) and Google to participate in the open forum, too.

For example, in the Cambodian Avian Influenza hotline pilot we implement batching and self-describing data over SMS. We should get together with RapidSMS and define a common approach. This would let the Cambodian government swap out InSTEDD’s backend and put RapidSMS in transparently, if they chose to do so.

One example of this is GeoChat + JavaROSA. We want to support JavaROSA front ends to send structured data to GeoChat, and if we documented the format well, other clients (like Nokia’s?) or servers could be used interchangeably.

JavaROSA is an excellent open source project, great technology and well run. We have contributed the ability to do 2-way sync between phones and between a phone and a server, already.

Even with these agreements interoperability can also lead to a shallow openness, where applications work with others… as long as they can continue to hoard the data and lock-in users. You can see this happening over the last year in the space of social networking technologies, where many announcements of open approaches veil an underlying strategy of trying to become the ‘hub’ or the ‘one stop shop’.

Do the benefited populations really gain much if folks can collect more data, but they can’t move it around?

Sharing Data

We all know the limits to sharing data are political or incentive-based, more than technical. But technology makes a fine excuse for not sharing information.

In the field one faces many silos – NGOs with different mandates, Government agencies with different domains (animal health, human health), research programs funded by different Ivy League universities, not to mention ethnic, language and country borders.

This is an area where InSTEDD has been doing a lot of work as part of the Mesh4x project, which basically allows data to be shared two-ways between disparate systems.

Here are some latest updates

Leslie (Les) Lenert, Director, National Center for Public Health Informatics, US CDC, puts forward technologies they believe are disruptive. Better devices, data collection, and data sharing:

CDC slide by Les Lenert, photo credit Taha Kass Hout (InSTEDD)

The goal: An Open & Sustainable Platform for the end users

Ken uses the \o/ logo for FrontlineSMS, a gesture of empowerment. I smile every time I see it.

We can’t forget that all these technology efforts are trying to empower individuals and organizations, and simplify the work of caring for one’s own community or for others.

All the teams mentioned here are working together already in different capacities towards this end goal. Resources, timelines, tools are always an issue, but over time things will be more integrated.

All the technologies mentioned here are converging towards a shared architecture –a platform for data exchange and collaboration built around mobile users in the harshest environments. A platform that can start small and grow transparently, or start large and continue running even if the centralized networks are unavailable. Because of this shared architecture, the end portfolio will be stronger, dollars spent on technology will go further, and users will have a simpler entry point to learn what are the right tools for their context.

So when a new phone comes out with a CD4 blood cell sensor, its users will know that it can send its data and “it just works”...and then go change the world one CD4 test at a time!

Thursday, December 11, 2008

For Geeks: Progress on Mesh4x: Cloud Services, Architecture, Adapters, and Adopters

As the year wraps to an end we have a mixed blessing. On one hand, we have a small but growing portfolio of technology stemming from our organization's immediate goals to improve disease detection and public health in South East Asia, being built at a steady pace by our small but ultra-capable team. On the other hand, the scenarios we are addressing are proving to be relevant in all walks of life of the health and humanitarian space, generating increasing demand in both breadth and depth. Exciting times indeed!

Of our main technology efforts (Riff, GeoChat, Mesh4x, TrackerNews.net) Mesh4x (http://www.mesh4x.org) is the one that started getting the earliest deployments to the real world.

From mesh4x.org:

“The goal of mesh4x is to provide a portfolio of libraries, tools and applications that simplify using standards-based data meshes from multiple platforms and languages…”

The libraries can be used right away by developers who integrate them in their own applications, so there was no need for them to wait for a more packaged set of user interfaces and end to end experiences.

Why it matters and why InSTEDD is working on this

Data meshes have appealing characteristics for our users, so our contributions to the Mesh4x project are driven by observed data-sharing needs in the health and humanitarian space.

  • Symmetrical: They allow data to exist in a concurrent multi-master environment where updates can be applied at any node in the mesh.
  • Asynchronous: They allow offline updates to information and synchronization with other nodes without requiring data locks, essential for occasionally connected applications.
  • Dynamic: The synchronization can happen even in constantly changing connectivity topologies. I can sync with a server and later the sync can be done between my client and another client, who could then sync with another server if the first one isn't there, and so on.

This matters to us as these characteristics help information flow and data sharing even in the tough contexts we face:

  • Symmetrical: No organization or application has, de-facto, greater control over information than any other. Symmetry allows power to be shared equally amongst partners, in a true multi-master way, resulting in less hoarding of live data.
  • Asynchronous: Connectivity is an occasional luxury, and the most up to date information is found where it is less likely to have a connection. Storing changes locally and sharing them opportunistically keeps information moving.
  • Dynamic: Connections are opportunistic – you may not have Internet access at all, but you have access to local wifi networks, physical contact with other devices, etc. Data will eventually get to the desired endpoints as it leaps opportunistically between participants.
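To show what "symmetrical" and "asynchronous" mean in practice, here is a deliberately simplified Python sketch of a multi-master merge. It is not the Mesh4x or FeedSync algorithm: the `Version`, `Item`, `merge_item` and `sync` names are hypothetical, and the rule used (higher update count wins, ties are kept as conflicts) is an assumption that stands in for the fuller history tracking a real data mesh does.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Version:
    updates: int  # how many times the item has been changed
    when: str     # ISO-8601 UTC timestamp of the last change (sorts as text)
    by: str       # node that made the last change


@dataclass
class Item:
    id: str
    data: dict
    version: Version
    conflicts: list = field(default_factory=list)


def merge_item(a, b):
    """Merge two copies of one item; the result does not depend on argument order."""
    if a is None:
        return b
    if b is None:
        return a
    if a.version == b.version:
        return a
    loser, winner = sorted(
        [a, b], key=lambda i: (i.version.updates, i.version.when, i.version.by)
    )
    if loser.version.updates == winner.version.updates:
        # Both nodes edited offline: keep the losing copy so a person can resolve it.
        return replace(winner, conflicts=winner.conflicts + [loser])
    return winner


def sync(store_a, store_b):
    """Two-way sync between any two nodes; neither side is 'the server'."""
    for item_id in set(store_a) | set(store_b):
        merged = merge_item(store_a.get(item_id), store_b.get(item_id))
        store_a[item_id] = store_b[item_id] = merged
```

Because `sync` takes any two stores and `merge_item` gives the same answer in either direction, a phone can sync with a laptop today and with a server next week, and all three converge. No locks are held while a node is offline; concurrent edits simply surface as conflicts at the next sync.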

Some concrete applications of mesh4x in the space:

Mesh4x goes mobile with JavaROSA, allows you to sync data on your handset with no Internet

Mesh4x SMS Adapter: Sync data without an Internet connection

I have another blog post I should release soon that highlights the proven value of meshes and Groove in the humanitarian space, and my personal introduction to the uses of this architectural pattern.

But this post is about the progress & directions for the project.

Cloud-Based Service

In the last post we mentioned building a cloud-based service as a contribution to the space. The demand was for an always-online, cheap to host, simple server that could act as a data store and as a relay point for devices connected to the Internet.

The implementation was embarrassingly simple on Amazon's Elastic Compute Cloud (EC2, a dynamic and virtualized hosting environment) and S3. As a matter of fact, a single Java servlet running on Tomcat + Linux and driving the Java Mesh4x sync libraries ("Mesh4j") provides the heart of the logic. Less code is the best code!

We are doing a pilot with the Centers for Disease Control, synchronizing their Microsoft Access-based EpiInfo application, and they asked if the health surveys they were taking could be automatically geo-mapped as the users synchronized to share their information. This led us to incorporate an ontology ("schema") mapping aspect to tell the server "expose a KML feed taking THIS as the title, description, address, and timestamp for the items".

Taha describes the work with CDC on his Biosurveillance 2.0 blog and why using mesh4x will help them extend the effectiveness of EpiInfo for outbreak investigation.

We will be opening this service up progressively as we test it out with initial users and tweak it based on their feedback; I hope in a couple of months to have a tested version we can point you to publicly! In the meantime, contact us if you are interested via email or if you are a developer via the Mesh4x.org code project.

Part of the forcing function for writing this post this week is that we've been chatting with CDC, JavaROSA, and others about these store/endpoint/mapping capabilities and I'd rather we start the collaboration early before we accidentally diverge codebases or approaches.

Under the Hood

This is the architecture that the server has been going towards these last couple of weeks:

AAaagh lots of coloured boxes! a drilldown to what the server architecture is trending to

Update APIs:

These allow other applications to change the data in the service. A mesh endpoint allows FeedSync-style updates, but we'll add AtomPub for simpler edits via http POST and other RESTful verbs that are easy to manage from Javascript or are useful if you don't need the full power of the mesh. A JavaROSA endpoint will allow the right metadata to be exposed to JavaROSA or AndroidROSA handsets, and accept updates.

The GeoChat and a FrontlineSMS bridge would allow message forwarding and sending semistructured data directly in via SMS.

Storage:

This is the storage layer for all the data and the configuration, security information, etc needed to keep the service running. In our web-based instance, all this data is stored in S3, but if you wanted to host this in your own office or in a clinic, it would all be sitting inside a MySQL instance. As a matter of fact, all the mesh4x services' information is managed by mesh4x itself, so the actual configuration data is stored via an adapter.

Ontology Extraction:

Our service differs from a database in that you don't need to tell it the schema of your information up front. As a matter of fact, we would like to know as little as possible about the format of your data. We prefer to let applications change and evolve the data they use without having to ask developers to change database structures or write specific code for each case. But knowing just a little about the structure of your data helps with things such as defining mappings and filters, so we try to infer as much as we can. The Ontology Extraction component allows you to submit RDF-formed information (or XForms-based, or any other format that has a transformer) and we keep track of (for example) what fields make up your entities. If you supply such ontologies yourself (in RDFS, or an XForm Definition) we keep them around, too (e.g. 'Patient Date of Birth is a Date/Time field').
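As a rough illustration of the "infer as much as we can" idea, here is a small Python sketch that looks at submitted records and guesses which fields exist and what kind of value each holds. It works on plain dictionaries rather than RDF, and `guess_type` and `infer_fields` are hypothetical names, so treat it as a sketch of the principle and not as how the Ontology Extraction component is built.

```python
from datetime import datetime


def guess_type(value):
    """Guess a coarse type for one value."""
    if isinstance(value, bool):  # check bool first: it is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)
            return "datetime"
        except ValueError:
            return "string"
    return "unknown"


def infer_fields(records):
    """Collect every field seen across records and the type it appears to hold."""
    seen = {}
    for record in records:
        for name, value in record.items():
            seen.setdefault(name, set()).add(guess_type(value))
    return {
        name: next(iter(types)) if len(types) == 1 else "mixed"
        for name, types in seen.items()
    }
```

Given a few patient records with a `dob` field holding values like `1975-03-02`, this would report `dob` as a `datetime` field without anyone declaring a schema. A field reported as `mixed` is a hint that an explicitly supplied ontology would help.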

this thingie is supposed to represent an RDF triplet

Internally, we are using RDF as the default standard to represent data and ontologies. RDF has many properties that make it the simplest appropriate choice, but that would be the topic of a whole different post in and of itself.

Ontology Mapping:

Ontology Mapping allows us to map fields and entities of different ontologies to help us make sense of your data. For example, to draw a nice map of your data we need a title and a descriptive summary, a position, and a timestamp associated with the entity. Which field should provide the timestamp? Which address or coordinate fields should be used to put an item on the map? How should the description be composed from the data? Mappers allow us to do this, and in the future you will be able to define these yourself through the user interface.
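A mapper can be pictured as a small lookup that answers exactly those questions. The Python sketch below turns one record into a KML Placemark using such a lookup. The `to_placemark` function, the mapping keys and the sample field names are all hypothetical; the only thing taken as given is the standard KML layout of `name`, `description`, `TimeStamp/when` and `Point/coordinates` (longitude first). It only handles coordinate fields, not addresses.

```python
import xml.etree.ElementTree as ET


def to_placemark(record, mapping):
    """Build a KML Placemark from a record, using a mapping to pick the fields."""
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = str(record[mapping["title"]])
    # The description is composed from several fields via a format template.
    ET.SubElement(placemark, "description").text = mapping["description"].format(**record)
    timestamp = ET.SubElement(placemark, "TimeStamp")
    ET.SubElement(timestamp, "when").text = record[mapping["timestamp"]]
    point = ET.SubElement(placemark, "Point")
    longitude = record[mapping["longitude"]]
    latitude = record[mapping["latitude"]]
    ET.SubElement(point, "coordinates").text = f"{longitude},{latitude}"  # KML order: lon,lat
    return placemark


# Made-up survey record and mapping, for illustration only.
record = {"case": "Case 12", "symptom": "fever", "village": "Village X",
          "seen": "2008-12-11T08:30:00Z", "lat": 10.5, "lon": 20.25}
mapping = {"title": "case", "description": "{symptom} in {village}",
           "timestamp": "seen", "latitude": "lat", "longitude": "lon"}
print(ET.tostring(to_placemark(record, mapping), encoding="unicode"))
```

The point of keeping the mapping as data is that a different survey only needs a different `mapping`, not different code.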

Filtering:

Filtering is essential in a mesh where little devices and big devices coexist. You could have refugee records for a whole country in one mesh4x mesh, but on a mobile phone you'd probably only want to keep a subset of that. As soon as we expose filters it will be easy for a phone to say 'I work with patients in village X' and just sync that subset of data.
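Since filters are not exposed yet, the following Python sketch only illustrates the idea of a phone declaring the subset it cares about. `make_filter` and `subset_for_device` are hypothetical names, and matching on exact field values is an assumption; real filters may well be richer.

```python
def make_filter(**criteria):
    """Return a predicate that accepts records matching every given field value."""
    def matches(record):
        return all(record.get(name) == wanted for name, wanted in criteria.items())
    return matches


def subset_for_device(records, record_filter):
    """Pick the records a small device should keep and sync."""
    return [record for record in records if record_filter(record)]


# 'I work with patients in village X'
village_x = make_filter(village="X")
```

A phone would then sync only `subset_for_device(all_records, village_x)`, while a laptop or server in the same mesh keeps the full set.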

Format Transformers:

Format Transformers are components built to translate data into specific formats. GeoRSS and KML are standard formats for representing information with geographic aspects to them. You can see the KML in Google Earth, for example, and items would appear on the map as people sync their data to the server.

Transformers for XForms models and XForms forms allow us to translate the information of your entities and their ontologies into XForms formats. We see the utility and the pragmatism of XForms models as a way of exchanging records and of defining the UI model of the forms users see in XForms, so these transformers allow us to go from our internal RDF-centric representations to these broadly adopted formats.

Sync Adapters:

Finally, you have all this data here, but you probably want to work with it elsewhere! Folks have suggested/requested the following as potential endpoints for the data:

  • Google Spreadsheets: we have a Microsoft Excel adapter, so why not a Google spreadsheet one? Imagine creating a form and having it fill out a spreadsheet with gadgets for analytics. Google spreadsheets are also great when lots of people online have to work live on the same data.
  • Zoho is coming up with lots of useful applications. Imagine synchronizing your Zoho app with a table in your MySQL or MS-Access database.
  • MySQL: a lot of websites out there -for good or for bad- run with their MySQL instance exposed on an open network port. Someone we were working with in Mukdahan, Thailand (a 12-hour truck ride from Bangkok), asked the simple question: if I give you my connection string, can you just put the data there for me? Seemed simple and straightforward, so we will line it up in front of other needs!

Together with running sync adapters we will need some user interface to schedule these updates, define mappings between schemas/ontologies, and resolve conflicts. A nice UI for this may end up taking a big part of the project effort, so if you can refer us to open source projects that do any of this, or want to contribute, don't be shy!

These mappings are part of the mesh too, so in the future (assuming anyone requests it from InSTEDD or contributes the source) you could be offline and mark an Excel spreadsheet as 'shared', and when you sync, not only would the data travel back and forth, but the server itself could create a Google spreadsheet endpoint (or something similar) with the same information for others in your team to use!

Putting it all together

In my next post I will explain how all the pieces of the Mesh4x project come together to help integrate data from disparate systems and connect these applications into a synthetic whole, instead of having dozens of islands of information.

More information

http://www.cdc.gov/epiinfo/ EpiInfo is CDC's outbreak investigation surveying tool. You can participate in their Open Source project on CodePlex: http://www.codeplex.com/EpiInfo. We are working with them to enable synchronization over the cloud of their MySQL/Access based tool.

...and they recently had a release, announced hours ago. Congratulations to the CDC team!