Deprecated blog

June 9, 2012

Hi all – we’ve switched to our new blog, now at:

http://demystdata.wordpress.com

This site will be closed in a few days.


Singapore bound…

April 20, 2012


Innotribe Startup Challenge

Matteo Rizzi, Innovation Manager, Innotribe, says “I’m delighted to announce DemystData as a semi-finalist and look forward to discovering more about the business. This year’s semi-finalists have assessed the developments and trends in the region and have identified opportunities in the market. The entrants have each demonstrated a forward-thinking and innovative approach to the financial sector and have developed start-up businesses which could have profound impacts on the future of the industry. I’m extremely excited to give DemystData the opportunity to pitch its ideas to some of the top decision makers in the industry”.

Enough said! Looking forward to seeing everyone in Singapore on April 24th.

About Innotribe

Launched in 2009, Innotribe is SWIFT’s initiative to enable collaborative innovation in financial services. Innotribe presents an energising mix of education, new perspectives, collaboration, facilitation and incubation to professionals and entrepreneurs who are willing to drive change within their industry. It fosters creative thinking in financial services by debating the options (at Innotribe events) and supporting the creation of innovative new solutions (through the Incubator, Startup Challenge and Proofs of Concept (POCs)). Through this approach, the Innotribe team at SWIFT generates a platform that enables innovation across SWIFT and the financial community. For more information, please visit http://www.innotribe.com/.


Your data…Your asset…

March 7, 2012


A month ago, the New York Times published an opinion piece entitled ‘Facebook Is Using You.’ Effectively, the author argued that the use of aggregated online data is an invasion of privacy and that a person’s online profile and/or behavior potentially paints an inaccurate picture of who they actually are.  At one extreme, yes, I agree: there is much room in today’s society for marketers, health care providers, financial services firms, insurers, etc. to misuse a person’s data based on their search habits or the types of websites they visit.  On the flip side, I would suspect that nine times out of ten there is some correlation between a user’s ‘web data’ and who they actually are.  In fact, I’d be willing to wager that for a large portion of the world, someone’s online profile is actually a more holistic representation of their character than may be found in more antiquated reputation databases.

I also think it’s important to distinguish between data that is self-reported, e.g. that which a user enters or provides on sites such as Facebook or LinkedIn when creating a profile, or on Foursquare when checking in at a location, and that which is ‘mined’ online through the use of cookies and/or other tracking mechanisms.  For context, at Demyst.Data we focus on the former, and only that which is publicly available, and we apply such data solely for the benefit of the consumer.

It is our opinion that the ability to effectively access, analyze and deploy a person’s data creates an invariably better customer experience for the ‘goods’ of the world.  Online data provides many who would otherwise be considered ‘off the grid’ (think youths, immigrants, the unbanked and the underbanked) with a mechanism to establish an asset and a dossier on which reputation-driven industries can base informed decisions.  Without this profile, they are essentially invisible: no access to relevant offers, no access to fair credit, and, probably most importantly, no mechanism to transform and transition to being ‘on the grid’.

Curious about the information that is publicly available on you?  Look yourself up for a sampling.  If you don’t like what you see or feel as if your online footprint is not actually representative of the information you have provided to some of our partner sites, you can always opt out of our database by clicking here.


Why data transparency is good for ‘rejected’ customers.

January 8, 2012


If you’ve ever read our blog or navigated our site, you’ve likely seen the phrase, ‘removing information asymmetries’.  If you’ve sat through a meeting with us, you’ve been lectured on how data transparency can benefit the consumer.  Let me try to connect the dots.

Asymmetric information refers to a situation in which one party in a transaction has more or better information than another. Economist George Akerlof publicized the problems of asymmetric information in his 1970 paper on the ‘market for lemons’ in the used-car industry. He explained that because a buyer generally cannot accurately ascertain the value of a vehicle, he or she is willing to pay only an average price for it. Sellers of good cars, knowing their cars are worth more than that average, withdraw them from the market; anticipating this, buyers adjust the price they are willing to pay downward again.  In the end, the average price isn’t even offered, only the ‘lemon’ price is.  Effectively, the ‘bad’ drive the ‘good’ out of the market.


A similar situation occurs in the credit markets.  Consider a lender faced with uncertainty about the creditworthiness of a group of borrowers. Having to account for the bad risks, lenders are pushed to charge artificially high interest rates to cross-subsidize that risk. Recognizing this, and unwilling to borrow at usurious rates, the creditworthy subset of borrowers removes itself from the credit markets.  As above, the ‘bad’ have driven out the ‘good.’

This inefficient risk cross-subsidization affects a large portion of the multi-trillion-dollar financial services market, and removing it will yield huge value in the coming years. The availability of information is paramount to realizing this value.  Fortunately, data today is being created at an unprecedented rate.

At Demyst.Data, we are constructing the infrastructure and mechanisms to aggregate and analyze this data.  Our clients are working to engage consumers to share their information and to educate them on the benefits of transparency.  Together, we are removing the asymmetries that keep the ‘goods’ out of the market, and helping lenders make educated lending decisions.  We believe we’re engaged in a win/win game; hence our passion, excitement, and enthusiasm about the potential value of improved information.


Performance tuning MongoDB on Ruby

November 26, 2011

As we’ve added hundreds of interesting online attributes, we’ve been hitting some performance bottlenecks when processing larger batch datasets. Thankfully this hasn’t been an issue for customers, and it doesn’t affect our realtime APIs, but it’s still frustrating. I had a spare day, so it felt like time for a performance boost.

Here’s the executive summary:

  1. If you use threads, upgrade to ruby 1.9.X
  2. Index index index
  3. Don’t forget that rails sits on a database

To start, I spun up a deliberately small, cut-down test server, set up a reasonably complex API, and used the great tools at http://blitz.io to rush the API with hundreds of concurrent requests.

That spike in response times at the start of the rush was a concern, even though it was on a small server.

All the CPU usage was in rack/passenger, so I dusted off the profiler. Thread contention was getting in the way. We need threads because we integrate with so many third-party APIs. We were still on REE for its memory management, but REE uses green threads, so it was time to bite the bullet and (1) upgrade to ruby 1.9.X.
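
For context, here is a minimal sketch of the kind of third-party fan-out that makes threads matter for us (the URLs and structure are illustrative, not our actual integration code):

  require 'net/http'
  require 'uri'

  # Fire one thread per third-party lookup so the HTTP round trips overlap
  # instead of running one after another.
  urls = [
    "http://api.provider-one.example/lookup?q=demyst",
    "http://api.provider-two.example/lookup?q=demyst"
  ]

  threads = urls.map do |url|
    Thread.new { Net::HTTP.get_response(URI.parse(url)) }
  end

  responses = threads.map(&:value) # wait for every lookup and collect the results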

The 1.9 upgrade helped a fair amount, but we were still getting timeouts.

So we re-ran the profiler and noticed a surprising amount of time spent in ActiveRecord associations, in one particular ActiveRecord query, and in a different MongoDB query. This led to a few things …

2. We didn’t dig into why, but mymodel.relatedmodel.create :param => X was causing some painful slowness in the association code. It wasn’t that important to keep the syntactic sugar; switching to Relatedmodel.create :mymodel_id => mymodel.id, :param => X saved a bunch (as sketched below).
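
In case it’s useful, here’s roughly what that change looked like (model and attribute names are placeholders, not our real schema):

  # Before: creating through the association proxy triggered the slow association code
  mymodel.relatedmodel.create :param => x

  # After: create the row directly and set the foreign key by hand
  Relatedmodel.create :mymodel_id => mymodel.id, :param => x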

3. We added a couple of ActiveRecord indexes, which helped a bit. MongoDB indexes were working like a charm, but there was one particular group of 3 independent indexes that were always used in conjunction, and the mongo profiler was revealing nscanned of >10000 for some queries. Creating a combined (compound) index helped a lot (see the sketch below). Another couple of examples that remind us that, while ORMs are nice, you can never forget there’s a database sitting under all of this.
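
For illustration, the index changes looked something like this; table, collection and field names are made up, and the MongoDB call uses the 1.x-era Ruby driver syntax:

  # ActiveRecord side: inside a migration, add the missing index
  add_index :profiles, [:customer_id, :created_at]

  # MongoDB side: replace three single-field indexes that were always queried
  # together with one compound index, so nscanned drops dramatically
  collection = Mongo::Connection.new.db("demyst")["attributes"]
  collection.create_index([["source",     Mongo::ASCENDING],
                           ["attr_key",   Mongo::ASCENDING],
                           ["fetched_at", Mongo::ASCENDING]])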

The result?

No timeouts until about 150 concurrent hits.

The performance was already plenty good in our production system (we automatically scale horizontally as needed), but this helped improve things by about 2-3x.

That’s enough for today. We’ll share some more details on performance benchmarks in the coming weeks.

Any other thoughts from the community? Please email me (mhookey at demystdata dot com).


Integrate with Facebook Connect with Rails and CoffeeScript

October 26, 2011

Facebook’s documentation on authentication via Facebook and the Graph API is very comprehensive … but sometimes a worked example still helps. Here is how you can add a “Connect with Facebook” button with minimal effort, using Rails, CoffeeScript, and Ruby.

Get your APP ID

You need to register your app with Facebook if you haven’t already.
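
One convenient way to keep the credentials to hand in a Rails app (purely a suggestion; the file and constant names below are ours, not Facebook’s):

  # config/initializers/facebook.rb
  # Keep the App ID and secret out of source control, e.g. in environment variables.
  FACEBOOK_APP_ID     = ENV["FACEBOOK_APP_ID"]
  FACEBOOK_APP_SECRET = ENV["FACEBOOK_APP_SECRET"]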

Add the standard Facebook loading code

From the Facebook developer documentation, look under Authentication and copy/paste the standard asynchronous loading snippet into application.js and/or /layouts/application.html.erb. Then add this line to that script so your own setup code runs once the async loading completes:

window.setup();

Add the div to your page

	<div class="field">
		<fb:login-button size="large">
		  Connect to Facebook
		</fb:login-button>
	</div>

Do something with your newly logged in customers

For example, if you want to access the logged-in customer’s profile after they have logged in, to customize the page, you might do something like this in CoffeeScript:

$ ->
    window.setup()

window.setup = ->
    # Subscribe to Facebook's login event once the SDK has loaded
    window.FB.Event.subscribe('auth.login', -> do_something()) if window.FB?

do_something = ->
    console.log "doing something ..."

    # The callback receives a response object; authResponse is set when logged in
    window.FB.getLoginStatus (response) ->
        if response.authResponse
            window.FB.api '/me', (fbdata) ->
                console.log "FB name : #{fbdata['name']}"
                # Add interesting personalization logic here

… and you’re ready to go
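
If you’d also like the profile server side, a minimal Ruby sketch (assuming the client posts the user’s access token to your Rails controller, e.g. as params[:access_token]) might look like:

  require 'net/http'
  require 'uri'
  require 'json'

  # Fetch the logged-in user's public profile from the Graph API using the
  # access token obtained on the client.
  def facebook_profile(access_token)
    uri = URI.parse("https://graph.facebook.com/me?access_token=#{access_token}")
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    JSON.parse(http.get(uri.request_uri).body)
  end

  # e.g. facebook_profile(params[:access_token])["name"]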


We have a white-labelled offering where we can host this for you and return the data through painless APIs, in case you’re looking to get up and running even faster. Email us and let us know what you’re working on.


Foursquare API integration

October 24, 2011

Foursquare’s API changes quickly, so this post may be out of date before you get started.

However, they offer up-to-the-minute, user-generated, location-based venue data that makes it well worth the effort, especially if you’re cross-referencing it with other location-based data sources.

As with many social APIs, there are requests which need OAuth (i.e. the end user opts in) and those which don’t. This post gives an example of integrating with the public (non-OAuth) data using Ruby.

A quick example:

  def foursquare latlon
    apikey = get_my_foursquare_api_key # sign up as a developer, hardcode your key here
    apisecret = get_my_foursquare_api_secret  # ... and your secret key here
    @url = "https://api.foursquare.com/v2/venues/search?ll=#{latlon}&client_id=#{apikey}&client_secret=#{apisecret}"
    hsh = download(@url) # use Curl or some other method to get the json results
    results = {}
    results = hsh.response.groups.first.items if hsh.response && hsh.response.groups && hsh.response.groups.first && hsh.response.groups.first.items
    results
  end

Hopefully this is quite self-explanatory. The result is an array of all nearby venues.

For example, if you wanted to find popular venues near DC:

  x = foursquare("38.898717,-77.035974")
  pp x
  puts x.first.name # West Wing
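
The download helper in the first snippet is deliberately left as a stub. One possible implementation, assuming the json and hashie gems so the parsed result can be traversed with method calls as above, would be:

  require 'net/http'
  require 'uri'
  require 'json'
  require 'hashie'

  # Fetch a URL over HTTP(S) and wrap the parsed JSON in a Hashie::Mash so that
  # hsh.response.groups.first.items works as written above.
  def download(url)
    uri = URI.parse(url)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = (uri.scheme == "https")
    Hashie::Mash.new(JSON.parse(http.get(uri.request_uri).body))
  end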

Or, if you’d like to save some time and avoid this work altogether, we integrate and aggregate a range of interesting data and deliver it through one simple API, so you don’t have to.