Ruby curl with follow redirects

October 23, 2011

Just a technical FYI; this took a little digging to find in the documentation.

If you’re using Curl::Easy in Ruby to download HTML (or results from our API), note that the default is to NOT follow redirects. If you want to follow redirects and download the page contents of the target, you’ll need to set the follow_location option to true.

Here’s a code snippet :

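  require 'curb'
  require 'iconv'  # bundled with Ruby 1.8/1.9; a separate gem on Ruby 2.0+

  # Download a URL, following redirects, and return the lowercased body.
  # Returns an empty string if every attempt fails.
  def download_url(url)
    res = ""

    tries = 0
    begin
      tries += 1
      easy = Curl::Easy.new
      easy.timeout = 30
      easy.follow_location = true  # off by default
      easy.url = url
      easy.perform
      res = easy.body_str
    rescue StandardError => e
      retry unless tries > 2  # up to three attempts in total
      puts "#{url} failed, returning empty string, #{e.message}"
    end

    # Drop any invalid UTF-8 byte sequences. The trailing space (removed again
    # by [0..-2]) works around Iconv missing an invalid byte at the very end.
    ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
    res = ic.iconv(res + ' ')[0..-2]

    return res.downcase
  end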
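
If it helps to see what following redirects involves, here is a rough sketch of the same idea using only Ruby’s standard library. It is an illustration, not how curb or libcurl implement it: follow_redirects, HopResult, HTTP_GET and the max_hops limit are all names made up for this example.

```ruby
require 'net/http'
require 'uri'

# Minimal view of one HTTP response: status code, Location header, body.
HopResult = Struct.new(:status, :location, :body)

# Default fetcher: a single plain GET with no redirect handling.
HTTP_GET = lambda do |uri|
  res = Net::HTTP.get_response(uri)
  HopResult.new(res.code.to_i, res['location'], res.body)
end

# Fetch url, chasing 3xx responses by hand for at most max_hops redirects.
# Raises if the chain is longer than that (e.g. a redirect loop).
def follow_redirects(url, max_hops: 5, fetch: HTTP_GET)
  uri = URI.parse(url)
  (max_hops + 1).times do
    res = fetch.call(uri)
    return res.body unless (300..399).cover?(res.status) && res.location
    uri = URI.join(uri.to_s, res.location)  # Location may be relative
  end
  raise "gave up after #{max_hops} redirects for #{url}"
end
```

Passing the fetcher in as an argument is only there so the loop can be exercised without touching the network.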

An Aggregation API for all

October 21, 2011

The simplest starting point for any user is the basic Aggregation API, where you can pull together all of the best customer data based on minimal inputs. This is for the more advanced users that want to build their own analytics.

Are you looking for Yahoo or Google data? Geolocation data? Or demographics by email? This Aggregation API may be a great way to start.

This was always available, but we’ve now given it the pride of place that it deserves and a permanent, static endpoint (/engine/raw).

Offer targeting with Demyst.Data

October 17, 2011

Something we’re proud of here at Demyst.Data is the ability to create APIs based on minimal input variables. The most common use case we come across is offer targeting. In short, this means that you can guide your visiting customers towards the most relevant products by predicting something (such as conversion likelihood) based solely on their IP.

To learn more, see our ‘how-to’ :

Adding twitter and other social data

October 13, 2011

Based on client requests, we’ve been hard at work tapping in to additional useful variables. A quick update on highlighted additions :

  1. Connectedness measure for an email
  2. Twitter mentions: we previously had presence only, but now we’re tapping into the Twitter API to handle keywords and hashtags. E.g. if you query on twitter=mhookey (or @mhookey or #mhookey), you get the number of mentions of me that include hate/fail or love
  3. Twitter velocity, i.e. the rate at which tweets are arriving for a given keyword
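
To make the velocity idea concrete, here is a minimal sketch of one way such a rate could be computed from tweet timestamps. This post does not spell out the exact calculation behind the attribute, so treat the definition (tweets per minute over the observed time span) and the name tweets_per_minute as assumptions for illustration.

```ruby
require 'time'

# Tweets per minute across the span covered by the given timestamps.
# Assumed definition: number of gaps between tweets / minutes elapsed.
def tweets_per_minute(timestamps)
  return 0.0 if timestamps.size < 2
  times = timestamps.map { |t| Time.parse(t) }.sort
  span_minutes = (times.last - times.first) / 60.0
  return 0.0 if span_minutes.zero?
  (times.size - 1) / span_minutes
end
```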

We’re always open to special requests, so please let us know if you have an upcoming project that requires integrating with better web data, segmentation, or predictive analytics but haven’t quite figured out how to apply the toolkit.

What-if (we threw an analytics party and everyone came)?

October 11, 2011

We’re pleased to announce the release of what-if (scenario testing) functionality for each API, all included within the base package.

This allows you to perform scenario testing on your underlying API. For example if you build a conversion API, where product offered is a variable, it can be nice to test the impact of changes to product offers. This is now possible :

Be warned, though: if you want to draw strong conclusions from this analysis, remember that you’re predicting a counterfactual scenario. To do that with the most confidence, statistical purists would strongly suggest you need a randomized experiment (in this case, one that randomizes the product offer variable). Even if you don’t have this, our modeling approach brings in as much third party data as possible to remove the biases inherent in a historical analysis, so it can still suggest where the low hanging fruit might be.
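
For anyone who prefers code to screenshots, the scenario-testing idea boils down to scoring the same record several times while changing one variable. The sketch below is a hypothetical illustration only: what_if and the score callable are invented names, and score stands in for a call to your own API rather than any real Demyst.Data client.

```ruby
# Score one record under each alternative value of a single variable.
# `score` is any callable that takes a record (Hash) and returns a number.
def what_if(record, variable, values, score)
  values.each_with_object({}) do |value, results|
    # merge returns a copy, so the original record is left untouched
    results[value] = score.call(record.merge(variable => value))
  end
end
```

The result is a hash from each tested value to its predicted score, which is easy to plot or compare side by side.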

Try it out, under ‘what-if’ on the left hand side.

Clearer data attributes

September 25, 2011

In our continued effort to demystify data, we’ve recently published our available attributes, which clarifies which inputs are required for each attribute. We’re continually updating this list, so please let us know if you have any suggested additions.

Occasionally it can be nice to avoid using (or even seeing) particular attributes. We’ve recently added support for this too, within the Account page. Just enter a comma-separated list of values, and when third party data is being appended, any fields with names including any of those values will be skipped.
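
As a rough illustration of the matching rule described above, the sketch below filters a record by a comma-separated skip list. It is not our server code: parse_skip_list and drop_skipped are hypothetical names, and the case-sensitive substring match is an assumption.

```ruby
# Turn "twitter, geo" into ["twitter", "geo"].
def parse_skip_list(text)
  text.split(',').map(&:strip).reject(&:empty?)
end

# Drop every field whose name includes one of the skip values.
def drop_skipped(record, skip_values)
  record.reject { |name, _value| skip_values.any? { |s| name.to_s.include?(s) } }
end
```

With a skip list of "twitter, geo", fields named twitter_mentions and geo_city would be dropped while email and age would be kept.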

If you have any questions or suggestions, please let us know.

Optimize your web forms; the conversion rate vs accuracy trade off

September 8, 2011

We all want to ask as few questions as possible on our web forms. However, each question adds incremental value. How do we think through the trade-off of additional questions (leading to accuracy) vs simplicity?

First, let’s illustrate the tradeoff here.

We’re trying to find the optimal number of questions, where conversion is maximized, subject to some minimum level of information content.
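
Stated as code, that constrained choice might look like the sketch below. The variants and their numbers are invented purely for illustration, and best_form, :conversion and :information are hypothetical names; in practice the figures would come from your own conversion data and scorecard fits.

```ruby
# Pick the form variant with the highest conversion among those that still
# carry at least min_information. Returns nil if no variant qualifies.
def best_form(variants, min_information)
  variants.select { |v| v[:information] >= min_information }
          .max_by { |v| v[:conversion] }
end

# Invented numbers, for illustration only.
variants = [
  { questions: 2,  conversion: 0.30, information: 0.40 },
  { questions: 5,  conversion: 0.22, information: 0.70 },
  { questions: 10, conversion: 0.12, information: 0.90 }
]
puts best_form(variants, 0.6)[:questions]  # prints 5
```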

The first, perhaps obvious, observation here is that third party data is always a good idea. You get extra information content, for example to customize offers and look and feel, without impacting the consumer experience.

Next, we need some way to test the information content of various subsets of the questions. Demyst.Data offers a way to do this – but the concept is pretty simple.

1. Upload your exhaustive questions, and a target variable

2. Fit some scorecard or segmentation that you’re happy with

Here’s ours. This can be thought of as the ‘taj mahal’ workflow (i.e. all questions are included).

3. Delete columns, rinse and repeat

The next step is to delete each column, and refit the entire scorecard, and plot side by side. Again, here’s one we prepared earlier.

The orange line, the baseline, is flat (clearly if you don’t ask any questions then predictive lift isn’t possible). The red line is what it looks like if no “Demyst” data is appended. All this means is we’ve temporarily turned off the third party data and refit. The “without demyst” line is almost as steep as the full ‘taj mahal’ line. In a real dataset, this might mean you wouldn’t bother buying third party data (not something we’d advocate; what’s actually happening here is that the emails are always joe or john, so it’s not surprising that they’re not adding much value).
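
The delete-and-refit loop from step 3 is simple enough to sketch in a few lines. This is a hedged outline, not our implementation: leave_one_out is a made-up name, and fit_and_score is a hypothetical callable that refits whatever scorecard you use on the given columns and returns a single number for predictive lift.

```ruby
# Refit once with every column, then once more with each column removed,
# and collect the lift from each fit for side-by-side comparison.
def leave_one_out(columns, fit_and_score)
  results = { 'all columns' => fit_and_score.call(columns) }
  columns.each do |col|
    results["without #{col}"] = fit_and_score.call(columns - [col])
  end
  results
end
```

The column whose removal costs the most lift is the one worth keeping on the form; columns whose removal barely changes the result are candidates for deletion.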

4. Keep going

There’s a near limitless number of permutations of this exercise.

Now we can see that credit and email on their own don’t add much value. Age is really the winner here, suggesting a radically simpler quoting process.

We don’t have the full picture yet, since we don’t know if that reduction in lift is compensated by a corresponding lift in conversion thanks to a simpler workflow. That’s a topic for another post.