e - media business group
How We Compile The Best Databases
Over the past 20 years we have seen many different databases and the claims that most data sellers and compilers make about them. None of these claims is backed by a sound way of quantifying quality, so we saw the need for a reliable quality scoring mechanism. We at E Media have built expertise in mining data from unstructured sources. As we worked with our customers to identify sources and wrote complex code to compile data from constantly changing web pages, we recognized the basic problem of measuring quality. To this end we engineered our own system, the MEDIA STRUCTURED SCORING SYSTEM (MSSS), which scores each record using a formula that considers the weight of every field in the record. The average of all the record scores in a dataset is the MSSS score of that dataset. Our goal is to never let our data score drop between two deliveries.
For every field in the structured dataset, we assign a weight that conveys how important the field is to the value of each record. Weights range from 5 to 100: a weight of 100 marks a field as essential for the record to be valuable, and a weight of 5 marks it as good to have. We currently use the weights listed below, and we may adjust them based on the value our customers place on each field.
Score of a record = 100 * (sum of the weights of all non-empty fields) / 1060
Score of a dataset = average of the scores of all its records
* 1060 is the total weight a record receives when every field is populated, which corresponds to a score of 100.
ID - 100
SIC CATEGORY - 100
COMPANY NAME - 100
ADDRESS - 100
CITY - 100
COUNTY - 50
STATE - 100
ZIP - 100
LONGITUDE - 50
LATITUDE - 50
PHONE - 50
FAX - 50
URL - 25
EMAIL - 50
CONTACT PERSON - 10
CONTACT TITLE - 10
ANNUAL SALES VOLUME - 5
NUMBER OF EMPLOYEES - 5
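The formula and weights above can be sketched in Python as follows. This is an illustrative sketch, not the MSSS implementation itself: the names `WEIGHTS`, `MAX_SCORE`, `score_record` and `score_dataset` are hypothetical, and a field is assumed to count as empty when it is missing, null, or an empty string. Note that the weights as listed sum to 1055, while the formula divides by 1060; the sketch keeps 1060 exactly as stated rather than guessing at the difference.

```python
# Field weights copied from the list above.
WEIGHTS = {
    "ID": 100, "SIC CATEGORY": 100, "COMPANY NAME": 100, "ADDRESS": 100,
    "CITY": 100, "COUNTY": 50, "STATE": 100, "ZIP": 100,
    "LONGITUDE": 50, "LATITUDE": 50, "PHONE": 50, "FAX": 50,
    "URL": 25, "EMAIL": 50, "CONTACT PERSON": 10, "CONTACT TITLE": 10,
    "ANNUAL SALES VOLUME": 5, "NUMBER OF EMPLOYEES": 5,
}

# Denominator taken verbatim from the formula above (the listed weights
# themselves add up to 1055, so this is kept as a separate constant).
MAX_SCORE = 1060


def score_record(record):
    """Return 100 * (sum of weights of non-empty fields) / MAX_SCORE."""
    filled = sum(
        weight
        for field, weight in WEIGHTS.items()
        if record.get(field) not in (None, "")  # assumption: these mean "empty"
    )
    return 100 * filled / MAX_SCORE


def score_dataset(records):
    """Return the average record score, or 0.0 for an empty dataset."""
    if not records:
        return 0.0
    return sum(score_record(r) for r in records) / len(records)
```

For example, a record that carries only ID, COMPANY NAME and PHONE has a filled weight of 250 and therefore scores 100 * 250 / 1060, roughly 23.6.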
Basic quality of data:
The MSSS method expects every dataset to follow basic rules of data sanity. It does not treat the number of records as a measure of quality.
a) ID is always unique
b) All blanks are converted into nulls
c) All strings are trimmed
d) Data sanity checks are applied to every field:
  a. ZIP code cannot be longer than X characters, where X depends on the country.
  b. STATE and CITY must exist in that country.
  c. EMAIL, FAX, and PHONE must have a valid format.
  d. LATITUDE and LONGITUDE must be valid numbers within the boundaries of that country.
  e. ANNUAL SALES VOLUME must be within an acceptable range for that currency and country.
  f. NUMBER OF EMPLOYEES and YEARS IN BUSINESS must be within an acceptable range.
e) The combination of COMPANY NAME, ADDRESS, CITY and STATE (compared phonetically using the SOUNDEX algorithm) must be unique within the dataset.
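Some of the rules above (trimming strings, converting blanks to nulls, unique IDs, and phonetic uniqueness) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual MSSS code: the function names are hypothetical, `soundex` is a plain American Soundex, and how SOUNDEX is applied to multi-word fields is not specified above, so the sketch encodes each word separately and keeps tokens that contain digits (such as street numbers) unchanged.

```python
def normalize(record):
    """Trim every string value and convert blank strings to None."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None
        cleaned[field] = value
    return cleaned


# Consonant groups of American Soundex; any other letter has no code.
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word):
    """Return the four-character Soundex code of one word ("" if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def dedup_key(record):
    """Phonetic key over COMPANY NAME, ADDRESS, CITY and STATE."""
    parts = []
    for field in ("COMPANY NAME", "ADDRESS", "CITY", "STATE"):
        tokens = str(record.get(field) or "").split()
        # Assumption: tokens with digits (e.g. street numbers) are kept verbatim.
        encoded = [
            t.upper() if any(ch.isdigit() for ch in t) else soundex(t)
            for t in tokens
        ]
        parts.append(" ".join(encoded))
    return tuple(parts)


def find_violations(records):
    """Return (repeated IDs, repeated phonetic keys) after normalizing."""
    seen_ids, seen_keys = set(), set()
    dup_ids, dup_keys = [], []
    for record in map(normalize, records):
        rec_id, key = record.get("ID"), dedup_key(record)
        if rec_id in seen_ids:
            dup_ids.append(rec_id)
        if key in seen_keys:
            dup_keys.append(key)
        seen_ids.add(rec_id)
        seen_keys.add(key)
    return dup_ids, dup_keys
```

Under this sketch, "Smith Supply" and "Smyth Supply" at the same address produce the same key and would be flagged as duplicates, while the same company name at a different street number would not. The country-specific checks (ZIP length, valid STATE and CITY, coordinate boundaries, value ranges) depend on reference data that is not described here, so they are left out.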
Our Products Are Ideal For:
- Finding Qualified Sales Leads
- Market Research Studies
- Intelligence Gathering
- Direct Mail Campaigns
- Telemarketing Programs
- Sourcing New Suppliers