
Pitfalls of Text Mining

Dalila Benachenhou, Femvestor, Inc.

Audience level: Intermediate
Topic area: Case Study


Text, whether used in NLP, information extraction, or supervised and unsupervised learning, poses challenges to researchers that do not exist with structured data. Here, we present four scenarios and the unique approaches developed to deal with them.


Text preparation and modeling depend on the practitioner's objective. To compare Moby Dick and White Fang, one can follow conventional approaches: pre-process the texts and compare their word distributions. To create executive profiles from public-domain documents and news, one needs NLP to extract tuples that are then woven into a social network. To understand tweets about Brexit, one has to think about how to represent the words, but also about which distance to use for clustering the tweets. To find affinities between ingredients within dishes across thousands of recipes, one has to think about representation, but also about using techniques from graph theory.
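
As a rough illustration of the first scenario, comparing the word distributions of two texts might be sketched as follows. This is a minimal sketch and not necessarily the approach presented in the talk: the stop-word list, the function names `word_distribution` and `cosine_similarity`, the choice of cosine similarity, and the two sample sentences (placeholders, not quotations from either novel) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use larger lists).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "it"}


def word_distribution(text):
    """Lowercase, tokenize on letters, drop stop words, return relative frequencies."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse word-frequency dictionaries."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Placeholder snippets standing in for the full texts of the two novels.
text_a = "The whale swam in the sea and the ship followed the whale."
text_b = "The wolf ran in the snow and the dog followed the wolf."

similarity = cosine_similarity(word_distribution(text_a), word_distribution(text_b))
print(f"cosine similarity: {similarity:.3f}")
```

The other scenarios call for different choices at each step. For short texts such as tweets, for example, both the word representation and the distance measure would likely need to be reconsidered.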