Data Analysis

Data analysis has the goal of highlighting useful information, suggesting conclusions, and supporting decision-making. Aware Research applies an advanced toolkit of statistical, linguistic and structural techniques, depending on your needs.

Data cleansing

This refers to identifying incomplete, incorrect, irrelevant, or inconsistent parts of the data and correcting them by replacement, modification, or deletion. It generally does not include the "noise reduction" step of eliminating advertisements, administrative text, or repeated blocks through page segmentation, nor the data validation performed during extraction.

Duplicate detection and correction is included in the above, but crosses over into the linguistic or statistical realm when near-duplicates must be identified by phonetic or fuzzy matching.
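To make the distinction between fuzzy and phonetic matching concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Aware Research's actual tooling. It assumes the standard library's difflib.SequenceMatcher for character-level ("fuzzy") similarity and the classic American Soundex code for phonetic similarity. The helper names (soundex, phonetic_key, near_duplicates), the sample records, and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Soundex digit for each consonant; vowels, H, W and Y have no code.
_SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}

def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            result += code
        # H and W do not separate consonants with the same code; vowels do.
        if c not in "HW":
            prev = code
    return (result + "000")[:4]

def phonetic_key(name: str) -> tuple:
    """Soundex code of each word in a name (hypothetical helper)."""
    return tuple(soundex(token) for token in name.split())

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def near_duplicates(names, threshold=0.85):
    """Return (a, b, score) for pairs that look alike or sound alike."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold or phonetic_key(a) == phonetic_key(b):
            pairs.append((a, b, round(score, 2)))
    return pairs

records = ["Jon Smith", "John Smyth", "Mary Jones"]
for a, b, score in near_duplicates(records):
    print(f"possible duplicate: {a!r} ~ {b!r} (similarity {score})")
```

Here "Jon Smith" and "John Smyth" are flagged because every word has the same Soundex code, even though the spellings differ. Comparing all pairs grows quadratically with the number of records, so a larger job would typically first group records by a cheap key (such as the phonetic key) and compare only within groups.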

For many projects, this is all the analysis that is required before delivery.

However, for those who need a little more:

Statistical methods including

  • Classification by Bayesian methods or by support vector machines.
  • Clustering by k-means, nearest-neighbor, etc., with a variety of distance metrics.
  • Dimensionality reduction by singular value decomposition or principal component analysis.
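As an illustration of clustering with a choice of distance metric, the sketch below runs the standard assign-then-update iteration in pure Python. It is a minimal sketch under stated assumptions, not a description of Aware Research's production code. Each metric is paired with the center update that suits it: the coordinate-wise mean for Euclidean distance (k-means) and the coordinate-wise median for Manhattan distance (k-medians). The names cluster and METRICS, the sample points, and the choice of initial centers are hypothetical.

```python
import math
from statistics import fmean, median

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

# Each metric is paired with the center update that minimizes it:
# the mean for squared Euclidean distance (k-means), the median for
# Manhattan distance (k-medians).
METRICS = {
    "euclidean": (euclidean, fmean),
    "manhattan": (manhattan, median),
}

def cluster(points, initial_centers, metric="euclidean", max_iter=100):
    """Lloyd-style iteration; returns (labels, centers)."""
    distance, center_of = METRICS[metric]
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: distance(p, centers[k]))
            for p in points
        ]
        # Update step: recompute each center from its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(center_of(col) for col in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: assignments will no longer change
        centers = new_centers
    return labels, centers

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
for metric in ("euclidean", "manhattan"):
    labels, centers = cluster(points, [points[0], points[3]], metric=metric)
    print(metric, labels, centers)
```

Pairing the update rule with the metric matters: taking means while assigning by Manhattan distance no longer minimizes a single consistent objective, which is why the Manhattan variant uses medians.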

Linguistic methods including

  • Tokenization, supporting second-order queries on words, analysis of word and n-gram frequencies, and co-occurrence
  • Segmentation by paragraph and sentence, necessary for specifying words co-occurring within a sentence or within a certain window on the page
  • Stemming/lemmatization, reducing words to simpler forms for broader matching
  • Shallow parsing, e.g., extracting noun phrases for summarizing or matching content
  • Named Entity Recognition, for names of people, places, organizations, events, etc.
  • Miscellaneous methods including readability metrics, spell checking, text normalization, synonym generation...
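Several of the linguistic steps above build on one another: text is segmented into sentences, sentences are tokenized, and the tokens feed word counts, n-gram counts, and within-sentence co-occurrence. The sketch below chains those steps using only Python's standard library. It is an illustration, not Aware Research's pipeline. The regex sentence splitter is deliberately naive (it will, for example, break after abbreviations such as "e.g."), and the function names and sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def split_sentences(text):
    """Naive segmentation: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lower-cased word tokens, keeping a trailing apostrophe suffix ("player's")."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())

def ngrams(tokens, n):
    """Contiguous n-token sequences within one sentence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def analyze(text, n=2):
    word_freq = Counter()
    ngram_freq = Counter()
    cooccur = Counter()  # unordered word pairs sharing a sentence
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        word_freq.update(tokens)
        ngram_freq.update(ngrams(tokens, n))
        # Count each distinct pair once per sentence, in alphabetical order.
        for pair in combinations(sorted(set(tokens)), 2):
            cooccur[pair] += 1
    return word_freq, ngram_freq, cooccur

text = ("The new case fits the toy. Stores sell the toy and the case! "
        "Do forums like the case?")
words, bigrams, pairs = analyze(text)
print(words.most_common(3))
print(bigrams[("the", "toy")], pairs[("case", "toy")])
```

Because n-grams and co-occurrences are computed per sentence, no bigram spans a sentence boundary. Co-occurrence "within a certain window on the page" would replace the per-sentence loop with a sliding window of a chosen width over the full token stream.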

These analytical tools, along with our software and hardware infrastructure, give us a great deal of capability to help you.

Women-Owned Businesses on the Move

A businesswoman and community leader worked to promote the growth of women-owned businesses in her area. To amplify her efforts and make her case, she needed substantial data.

Aware Research went to work extracting data on local businesses, with a focus on NAICS code, estimated number of employees, estimated sales, and the identity of each company's executive or owner.

Aggregate analysis by type of business, relating number of employees to annual sales, identified several key areas in which businesses run by women differed, on average, from the others.

Graphical data visualization for a few presentation slides completed the task.

The client was pleased with her results.  We were pleased to help her and her cause.

Prospecting for Web Development Clients

A web development company sought additional clients. Their immediate difficulty was not too few prospects but too many; it seems everybody has a web site these days. How could they pick out the prospects best positioned to benefit from their services?

With Aware Research, they defined criteria including front page length, number of links, key phrases, possible spelling errors, readability indices, the presence of Flash or video content, JavaScript, meta tags, and whether the site passes the W3C validation tests.

A screenshot of each prospect's full-length front page was included in the sortable rankings report, allowing quick visual review and providing a reference for later comparison.

Now that they could effectively review their prospect base, they realized that their list of local web site URLs (based mainly on business survey data) was incomplete. With Aware Research, they created a project for automated discovery of the web sites of locally owned or managed businesses. We won't say here how we did it, but we discovered about twice as many as were available from a highly respected business directory.

The client saved hundreds of hours and thousands of dollars by focusing first on their highest-ranked prospects, of which they now have three times as many.

Playing to Stay Ahead

A small business found a lucrative niche making custom carrying cases for the latest toy fad sweeping the world. The toy and its various accessories were being sold at more and more novelty and game stores, and it was impossible to keep an up-to-date list, let alone assess which outlets were growing and which were declining in sales and popularity.

With Aware Research, they specified a weekly analysis of mentions of the toy in the context of outlets selling similar items. New outlets were easily highlighted for follow-up. Also interested in tracking mentions of, and popular sentiment toward, their own product, they subscribed to game players' discussion forums, from which relevant text was extracted and summarized.

Working smarter, they've managed to stay ahead in a fast-paced market.

Market Research and Exploration

A marketing manager is asked by her company's president to prepare and present a report next Monday (!) exploring the competitive landscape for a potential new product line.

A quick Google search shows hundreds of related pages.  What companies, individuals, products and technologies are mentioned in those pages?  How are they linked, and which predominate? 

She calls Aware Research and has the analysis, including company profiles, in her inbox when she gets to work the next morning.
