Information Extraction

Information extraction is the retrieval of structured, well-defined information from relatively unstructured or less accessible data sources.  Such sources include websites and blogs, databases hidden in the Deep Web, and print material such as catalogs and directories.  Rapid growth in all these areas creates increasing opportunities, and challenges, for making information both useful and available on demand.

  • Have you ever been tempted to start copying and pasting items from a web site into your database so you could do some analysis, make a report, or create a catalog, only to quit soon after because the work was slow and the results unreliable?
  • Or did you manage to copy and paste many lines or tables, only to run into inconsistent spacing and fields that were missing or run together, and then face the large, error-prone task of sorting it all out by hand?
  • Or was the information you wanted fairly well defined on the screen, but every single item had to be requested one at a time by typing into a form?  [Sadly, many government sites provide their "freely available" information this way.]

These tasks can be fully or partially automated, given the right tools and know-how.  At Aware Research, we extract the information you need, allowing you to focus on what you do best.

Women-Owned Businesses on the Move

A businesswoman and community leader worked to promote the growth of women-owned businesses in her area. To amplify her efforts and help make her case, she needed substantial data.

Aware Research went to work, extracting data on local businesses with a focus on NAICS code, estimated number of employees, estimated sales, and the identity of each company's executive or owner.

An aggregate analysis by type of business, in terms of number of employees and annual sales, identified several key areas in which businesses run by women differed, on average, from the others.

A few presentation slides with graphical data visualizations completed the task.

The client was pleased with her results.  We were pleased to help her and her cause.

Prospecting for Web Development Clients

A web development company sought additional clients. Their immediate difficulty was not too few prospects but too many: it seems everybody has a web site these days.  So how could they select the prospects best positioned to benefit from their services?

With Aware Research, they defined criteria including front page length, number of links, key phrases, possible spelling errors, readability indices, presence of Flash or video content, JavaScript, meta tags, and whether the site passes W3C validation.

A screenshot of the full-length front page of each prospect's web site was included in the sortable rankings report, allowing quick review and providing a reference for later comparison.

Now that they could effectively review their prospect base, they realized that their list of local web site URLs (based mainly on business survey data) was incomplete.  With Aware Research, they created a project for the automated discovery of web pages of locally owned or managed businesses.  We won't say here how we did it, but we discovered about twice as many as were listed in a highly respected business directory.

The client saved hundreds of hours and thousands of dollars focusing first on their most highly ranked prospects, of which they now have three times as many.

Playing to Stay Ahead

A small business found a lucrative niche making custom carrying cases for the latest toy fad sweeping the world. The toy, with various accessories, was being sold at more and more novelty and game stores, and it was impossible to keep an up-to-date list of them, let alone assess which outlets were growing and which were declining in sales and popularity.

With Aware Research, they specified a weekly analysis of mentions of the toy in the context of outlets selling similar items.  New outlets were easily highlighted for follow-up. Also interested in tracking mentions of their product and popular sentiment toward it, they subscribed to game players' discussion forums, from which relevant text was extracted and summarized.

Working smarter, they've managed to stay ahead in a fast-paced market.

Who's Who?

A developer of a new robotics technique needed a list of key influencers in the field. While standard mailing lists are available by industry classification, they list only top management, not top scientists or engineers.

With Aware Research, this client was able to specify search and extraction for a comprehensive list of names: patent inventors, authors of papers, heads of laboratories, and members of robotics associations.

The quality and precision of these leads paid off well, giving a realistic sense of who's who in the field and whom best to approach.

Market Research and Exploration

A marketing manager is asked by her company president to prepare and present a report next Monday (!) exploring the competitive landscape for a potential new product line.

A quick Google search shows hundreds of related pages.  What companies, individuals, products and technologies are mentioned in those pages?  How are they linked, and which predominate? 

She calls Aware Research and has the analysis, including company profiles, in her inbox when she gets to work the next morning.

Event Promotion on the Web

A group organizing a new conference on emerging technologies needed to boost awareness and excitement.

With Aware Research, they identified web sites, blogs, and discussion groups relevant to their theme. Analysis of link clusters between sites identified key influencers, as well as additional relevant links. Focusing promotional activities on these key influencers worked well, and daily monitoring showed the message spreading.

The conference was a big success, which presented a new challenge: finding an even bigger venue for the next one.
