Things have changed since that performance in the previous post from over two years ago. It's time to get back to some documenting!
I spent this beautiful Saturday afternoon reading about a direct marketing company called Rapleaf. From their own site:
Learn about your audience. Analytics can tell you a rich story about your loyal users, and help you understand who is using your product.
They have a privacy section (registration required) where you can look at your own information as seen by Rapleaf. I signed up and scored how well they know me, despite their never having contacted me personally to ask for this information. Results!
They got two thirds of my personal information correct. The items they got wrong were biased in an overly optimistic direction. To capture this, I gave each item a "distance from reality" value. For example, if Rapleaf thought I was very interested in underwater basket weaving when in reality I am not at all interested in it, that item got a value of 1. If Rapleaf estimated my income to be lower than it really is, that item got a negative value.
The aggregate distance is my attempt to represent how much bias their data contains. The value of 22.78 makes sense, as it is roughly in line with the 1/3 of items they got wrong. So that balances out. Obviously, if they were 100% correct, every distance-from-reality value would be zero and there would be zero percent bias.
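For anyone who wants to try the same exercise, here is a minimal Python sketch of this kind of scoring. It is not the exact calculation behind my numbers: the attribute names and values are made up, the function name score_profile is hypothetical, and treating the aggregate as a mean expressed in percent is an assumption.

```python
def score_profile(distances):
    """Summarize signed 'distance from reality' values.

    Each value is 0 when the profile was right, positive when it
    overestimated, and negative when it underestimated.
    """
    if not distances:
        raise ValueError("need at least one attribute")
    n = len(distances)
    correct = sum(1 for d in distances if d == 0)
    return {
        # Share of attributes the profile got exactly right.
        "accuracy_pct": 100.0 * correct / n,
        # Assumed aggregate: mean signed distance. Over- and
        # underestimates cancel, so this shows the direction of bias.
        "signed_bias_pct": 100.0 * sum(distances) / n,
        # Mean absolute distance: how wrong the profile is overall,
        # regardless of direction.
        "absolute_error_pct": 100.0 * sum(abs(d) for d in distances) / n,
    }


# Made-up example values, not my actual Rapleaf results.
example = {
    "age": 0,
    "gender": 0,
    "location": 0,
    "education": 0,
    "underwater_basket_weaving": 1,   # overestimated interest
    "estimated_income": -1,           # underestimated
}

if __name__ == "__main__":
    for name, value in score_profile(list(example.values())).items():
        print(f"{name}: {value:.2f}")
```

Keeping the signed and absolute versions separate is a deliberate choice: a profile that overestimates half the time and underestimates the other half would show zero signed bias while still being wrong about everything.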
I am very interested in analytics companies like Rapleaf. Since email addresses are increasingly used as unique identifiers, those few characters now carry much more context than they did in the past. I'm curious how difficult it would be to keep an active email address, one actually used to communicate with other living people, for which Rapleaf's profile is 100% biased.