The U.S. intelligence community does much of its work out of the public eye for obvious reasons, but sharing certain challenges with everyone can benefit both a federal agency and members of the public eager to help.
One of these problems involves creating a unified picture from data in various forms, schemas, interfaces and locations. National Geospatial-Intelligence Agency (NGA) Director Robert Cardillo called it creating “coherence from chaos.”
NGA just completed its Disparate Data Challenge, the agency’s second competition on Challenge.gov, to find easier ways to access and make sense of pictures, video, social media, documents and other forms of data.
Diffeo, a small company in Cambridge, Mass., ultimately won the grand prize of $25,000. This was the company’s first prize competition.
PrizeWire spoke to Diffeo CEO John Frank about the company’s work and the rewards of participating in NGA’s challenge.
PrizeWire: First things first: what is Diffeo and how is it unique?
John Frank: Diffeo’s knowledge discovery software changes how people access and interact with information in the deep Web and private archives. It is an autonomous research assistant that presents you with new content that is highly relevant to your work and that you might miss or not have time to dig out using traditional manual search tools.
Diffeo is a new kind of knowledge discovery tool for understanding networks of entities, such as people, ships, malware, etc. By observing what you gather into your notes, Diffeo anticipates queries that you might eventually conceive and runs them proactively to find answers now.
Diffeo runs text analytics on in-progress documents, and the results are transformational. It allows our system to join the user in looking across the full corpus of data, including the user’s current notes. Instead of manually entering queries for words you already know, you can simply write notes in familiar tools like Word and Outlook, and Diffeo recommends content to expand your understanding.
This doesn’t require users to learn how to create queries on their own. Users do their usual work, and Diffeo uncovers information they might not even know they needed. Diffeo can do this because it has developed machine learning algorithms that continuously disambiguate entities and create a graph of their relationships.
PW: How long has the Diffeo team been at this?
JF: The content recommender paradigm is a new approach to exploring and querying the knowledge graph. Our team has been researching content recommender engines since 2011. Our history in search began with my previous company, MetaCarta, which was acquired by Nokia in 2010. While I was Chief Architect for Search at Nokia, we helped the National Institute of Standards and Technology (NIST) run an algorithm evaluation of content recommenders for Wikipedia. Diffeo grew out of this research in NIST’s Text Retrieval Conference (TREC) and ongoing work in the Defense Advanced Research Projects Agency’s Memex program.
PW: Has your company ever participated in an open prize competition before?
JF: We have experience organizing and participating in open evaluations of information retrieval and human language algorithms. This was our first open competition for a prize.
PW: What was the experience like and how did it differ from the usual way you do work with the government?
JF: We love it! It accelerates the exchange of ideas. It helps the participants understand problems faced by users faster and earlier, and it allows the customers to see and react to new ideas and new paradigms as they emerge.
Unlike traditional software procurements, the challenge laid out a problem area without specifying requirements or defining solution areas; it simply invited innovative solutions. Unlike a Broad Agency Announcement, the hackathon approach allowed the evaluators to see real software in action on relevant data much faster.
PW: How did you find out about the challenge?
JF: A friend pointed us to the Challenge.gov website, and the Disparate Data Challenge sounded like a good fit for Diffeo. It’s a network effect. In fact, I just recommended to a colleague that he look at the recently launched EdSim Challenge.
PW: What about the subject matter attracted your team to the challenge?
JF: Two factors attracted us to the Disparate Data Challenge.
First, our team has a deep background in applying collaborative machine intelligence algorithms to integrating complex data sources.
And second, the user-facing nature of the challenge is exciting: it resonates with our focus on user experience and interactive recommendations.
PW: What was your solution? What does it do and how does it help NGA?
JF: Diffeo has two products.
Cloud Search is a single search box for all of your cloud drives, collaboration tools, email, desktop, and other content stores. The experience is fast and enriched by “Smart Tags” that help you see the key concepts in documents before you open them.
Advanced Discovery Toolbox is a recommender engine that applies text analytics to your in-progress documents, such as an email composed in Outlook or notes that you take during a meeting. The system automatically formulates queries to pull data from your disparate data stores, and ranks documents by similarity and difference. Our system finds information nuggets — sometimes crucial — that are missing from your working notes and ranks each nugget by the likelihood that you will choose to incorporate it into your document. It provides an easy user interface for exploring this information out into the deep Web and your organization’s private archive.
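To make the idea of ranking by "similarity and difference" concrete, here is a minimal Python sketch. It is not Diffeo's algorithm, which the interview does not describe; the bag-of-words cosine similarity, the novelty measure, the product used as a score, and the names `tokenize`, `cosine` and `rank_candidates` are all assumptions chosen for illustration. A candidate scores well only if it overlaps with the working notes and also contains terms the notes lack.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(notes, candidates):
    """Return (score, document) pairs, highest score first.

    Hypothetical scoring: similarity to the notes multiplied by the
    fraction of the document's distinct terms that the notes lack.
    """
    note_vec = Counter(tokenize(notes))
    ranked = []
    for doc in candidates:
        doc_vec = Counter(tokenize(doc))
        similarity = cosine(note_vec, doc_vec)
        new_terms = set(doc_vec) - set(note_vec)
        novelty = len(new_terms) / len(doc_vec) if doc_vec else 0.0
        ranked.append((similarity * novelty, doc))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked
```

Under this toy scoring, a document that merely repeats the notes scores zero (nothing new), as does one that shares no terms with them (not relevant); a document that partly overlaps and adds new terms rises to the top.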
PW: In addition to helping with NGA’s problem, how else is your software useful?
JF: Our software is useful to anyone who thrives on digital content, and we are particularly focused on research analysts. This includes business consultants, investment analysts, and intelligence analysts such as those at NGA.
Here’s a neat example: Supply chain risk management (SCRM) is a complex challenge. Who made the keyboard that you are using right now? Who handled the microchips in this airplane? Is this broker going to sell us recycled parts with new part numbers silkscreened over the old?
Diffeo’s software helps SCRM researchers uncover the connections between vendors and networks of shippers and dealers. This involves a combination of research on the open Web as well as internal data. Our system builds knowledge graphs that span across the two content domains, so you can see the full story.
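As a rough illustration of a graph that spans both content domains, the Python sketch below merges relationship edges from two sources and searches for a chain linking two entities. It is an assumption-laden toy, not Diffeo's implementation: the function names `build_graph` and `find_path`, the source labels, and the entity names in the usage note are all hypothetical.

```python
from collections import defaultdict, deque


def build_graph(*edge_lists):
    """Merge (source_name, edges) pairs into one undirected graph.

    Each edge remembers which sources reported it, so a link seen only
    in internal data stays distinguishable from one seen on the open Web.
    """
    graph = defaultdict(dict)
    for source_name, edges in edge_lists:
        for a, b in edges:
            graph[a].setdefault(b, set()).add(source_name)
            graph[b].setdefault(a, set()).add(source_name)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for a shortest chain of entities, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], {}):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

For example, if the open Web links a made-up "Vendor A" to "Shipper B" and internal records link "Shipper B" to "Dealer C", `find_path` connects the vendor to the dealer only when both edge lists are merged; with either source alone it returns `None`.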
PW: Is it common to fuse public data with internal data holdings?
JF: Yes, this also helps with cyber threat analysis, geopolitical analysis, counter-party analysis and more.
PW: How will your company use the prize money?
JF: We’re rapidly expanding our team and technology to support more users and customers.
We recently launched diffeo.com as a tool for open source researchers. Contact us for a demo!
PW: What is your advice to other citizens or small businesses considering getting involved in a challenge?
JF: Do it! These challenges are fast-paced and have tremendous return on investment for everyone involved.
The challenge paradigm helps you focus on end users by showing you details of their real problem set.
For more information on NGA and to keep an eye out for future challenges, visit the agency’s page on Challenge.gov.