Today on Data Mining Research, Stuart Shulman is answering our questions regarding his tool DiscoverText and his company Texifter. Stuart, thanks for sharing your work and taking some time to answer Data Mining Research questions.
Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?
Stuart: I am a political science professor, software inventor, and garlic-growing enthusiast who coaches U9 boys travel soccer…go Tigers! I am also the founder and CEO of Texifter, LLC, Director of the Qualitative Data Analysis Program (QDAP) at UMass Amherst, and the Editor-in-Chief of the Journal of Information Technology & Politics.
DMR: How did you come up with your company Texifter?
Stuart: I began work in this area in the fall of 1999, when a mid-level agency manager at the USDA’s National Organic Program shared 20,000 electronic public comments that were submitted in response to a new proposed standard for organic food. The agency also wrote a letter to the NSF pledging support and collaboration as I undertook a pilot study of the viability of commercial-off-the-shelf (COTS) qualitative software for sorting large numbers of public comments. It was clear that agencies needed more powerful human language tools to meet the demands of electronic democracy, especially when the pulse of the nation was inflamed.
I was the founder and Director of the “eRulemaking Research Group,” which was formed at the January 2003 National Science Foundation-sponsored workshop titled “E-Rulemaking: New Directions for Technology and Regulation,” held at the John F. Kennedy School of Government at Harvard University. Following the workshop, I led a team that involved computer scientists Eduard Hovy (University of Southern California-Information Sciences Institute) and Jamie Callan (Carnegie Mellon University), as well as sociologist Stephen Zavestoski (University of San Francisco). With funding from the National Science Foundation (NSF), our group organized workshops, made presentations to federal agencies, NGOs, and private sector representatives, launched an eRulemaking text data testbed, and collaborated with five federal agencies (DOT, EPA, USDA, BLM, and USFS) in the submission of a successful 4-year proposal, funded by the NSF’s Digital Government program.
At a certain point, technology needs to spin out of university labs and into the private sector. This is that point. I am currently transitioning out of a full-time academic role and into the private sector.
DMR: What is DiscoverText and who is it for?
Stuart: For Texifter customers, the need to mine social media data is seamlessly fulfilled through the deployment of Application Programming Interfaces (APIs) in DiscoverText. These applications ease the collection, archiving and sorting of social media text, for example via the Twitter and Facebook Graph APIs. Texifter offers a universal, multilingual-capable, Web-based, user-centered text repository with extremely low barriers to entry in terms of cost, time and training. Texifter applications make it possible to crowdsource data analysis in novel ways, leveraging peer relationships and Web-verifiable credentials. Ingesting millions of items from social media, email and electronic document repositories is easier, and advanced social search leveraging metadata, networks, credentials and filters will change the way users interact with diverse types of text data.
DMR: What is the most important lesson you have learned from text processing / mining?
Stuart: Computers cannot do a lot of important things with human language, but they are great for organizing, storing and reusing the work of humans to try and make computers do those things better and faster over time.
People like large text datasets; they are fun to play with and yield wonderful inferences when handled with care.
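To make the idea of "storing and reusing the work of humans" concrete, here is a minimal Python sketch. It is not how DiscoverText works internally; the class name `CodedArchive`, its methods, and the sample comments are all hypothetical, and the classifier is a plain naive Bayes with add-one smoothing chosen only because it is short. The sketch stores each human coding decision and reuses the accumulated decisions to suggest a label for new text.

```python
from collections import Counter, defaultdict
import math


class CodedArchive:
    """Hypothetical store of human coding decisions, reused to suggest labels."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of coded items

    def record(self, text, label):
        """Store one human coding decision."""
        self.label_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def suggest(self, text):
        """Suggest a label via naive Bayes with add-one smoothing."""
        if not self.label_counts:
            return None  # nothing has been coded by a human yet
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_items = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_items in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior: share of human-coded items carrying this label.
            score = math.log(n_items / total_items)
            for word in text.lower().split():
                # Smoothed log likelihood; the extra +1 reserves mass for unseen words.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up public comments, coded by a (hypothetical) human reviewer.
archive = CodedArchive()
archive.record("please keep organic standards strict", "support")
archive.record("strict organic rules protect farmers", "support")
archive.record("these rules hurt small business", "oppose")
archive.record("new standards cost business too much", "oppose")

print(archive.suggest("strict organic standards"))  # support
print(archive.suggest("hurt business"))             # oppose
```

Each additional human decision passed to `record` changes the counts that `suggest` relies on, which is the sense in which the machine's suggestions can improve over time as people keep coding.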