Google What?

June 7, 2011

A few weeks ago, Google made its foray into financial services with Google Advisor. The site, in essence a price comparison engine, bills itself as a one-stop shop for financial services, designed to help users easily find relevant products from multiple providers, compare them side by side, and apply online. Available only in the US, Advisor allows users to create customized searches for products including mortgages, credit cards, CDs, and checking and savings accounts. The site then, typically within 2 seconds, produces a list of offers that match the user’s criteria, along with lender contact info and rates. Finally, Google is only paid when users contact lenders for mortgages. In all other products, Google’s listings are sorted exclusively by APY.

Google Advisor
This new domain, which within its first few weeks has solidified top placement in search and garnered 75k YouTube views of its “how to” video, has left many people scratching their heads and asking, “Why?” A couple of thoughts on that:

1- According to their blog, Google had already constructed and begun testing a mortgage comparison tool in 2009, so adding other financial offers to this product was relatively easy.

2- Five days after the launch of Advisor, Google announced the acquisition of Sparkbuy, a consumer comparison site, and that Sparkbuy’s three-person team was joining Google as employees working on the Advisor product, i.e. the perfect operating team.

But rather than ponder “why” (because frankly, I think the answer is “because they can”), I found myself navigating the site trying to determine “who”, in today’s economy, finds a site like this useful. Practically speaking, the only measure of creditworthiness on the site is self-entered, the rates are variable and for comparison only, and the end result of your “customized search” is still an application away from an offer of credit. So, for those of us with a 780 FICO score and the entire spectrum of credit products to choose from, I guess yes, Advisor could be considered a good source of information. But what about the other 80% of the US population? Think for a second about an average US consumer who is searching for a loan. Their intent is unambiguous: cash or access to credit as quickly as possible at the lowest available interest rate. Through this lens, Advisor’s process (choose the loan that’s “right for you,” browse offers, contact the advertiser, apply, and then get accepted or rejected) seems to me to miss the mark.

Now of course, a workflow that facilitates actual lending is much more complex than a comparison tool, which, in fairness, is all Advisor claims to be. But Google, if anyone, has the brand and resources to meet user objectives. Surely they know enough about each of us to head down the necessary path of individualized, customized product offerings, yet they have chosen to shy away. I’ll go out on a limb and predict that Advisor’s intent is not to send free, non-biased referrals to lenders forever, and that in all likelihood, what we are seeing today is only a first iteration. Thoughts?


House price data

December 2, 2010

Today’s Economist chart :

This shows the widely quoted Case-Shiller index.

As always, applying the data lens to this : there is some significant risk of bias here. My understanding of the index is that it looks at value-weighted percent changes in the prices of registered houses from period to period within each of a basket of representative cities.

That misses a couple of things :

1. New homes

2. Added value of renovations (although I recall reading they seek to remove those from the sample)

3. Inter- and intra-city mix shifts

With so much riding on home prices, there are more options out there … such as … However, just as with credit scores (FICO), there is a widely understood, though perhaps inaccurate, norm quoted in the popular media that seems to serve as the benchmark for most decision makers.

With vertical disintermediation (such as secondary markets for mortgage-backed securities) it seems there is even more of a tendency to rely on summary metrics – perhaps this is just practicality. Economists talk of vertical integration being caused by, among other things, a very high cost of contracting. In secondary credit markets this seems to exist – e.g. there are contract terms around mortgage prepayment risk, and securities are tranched by average FICO. I’m not sure if they have terms related to widely used indices like Case-Shiller – but it wouldn’t surprise me. Accounting for the inaccuracies in these measures counts as a high cost of contracting. In fact, since data is a competitive advantage for the security originator, it might even be a case of the paradox of information (you can’t contract for it without giving away the ‘secret sauce’).

With these considerations only exacerbated by increasing data sources, are we going to see the most profitable US banks – heaven forbid – staying vertically integrated, and even securitizing less than they otherwise would?

China’s unreal exchange rate

November 18, 2010

Finally, some sensible journalism :

It’s a bit of a chore to listen to politicians debate exchange rates … The US for a long time has been asserting that China is ‘unfairly’ offering an advantage locally through undervalued rates … and more recently China is retorting that America’s new monetary stimulus is reckless (heaven forbid it might devalue their USD denominated debt).

But isn’t this all missing the point? Why aren’t politicians discussing the real exchange rate (which takes prices into account)?

If China doesn’t revalue more rapidly, then prices will rise; either way the imbalance is removed. Sure, it will take longer, since prices are sticky. Some of my professors at Columbia quantified this : … I think they found it took around 6 months to adjust but don’t quote me on that.

Indeed the real exchange rate with China has risen quite rapidly. The gov’t controls the nominal rate, but can’t control the prices – the real rate will equalize over time.
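
For anyone who wants the relationship spelled out, here is the textbook definition as a sketch. The notation is my own shorthand rather than anything from the article: e is the nominal rate in dollars per yuan, P_CN and P_US are the two price levels, and π denotes inflation.

```latex
q = \frac{e \, P_{\mathrm{CN}}}{P_{\mathrm{US}}}
\quad\Longrightarrow\quad
\frac{\Delta q}{q} \approx \frac{\Delta e}{e} + \pi_{\mathrm{CN}} - \pi_{\mathrm{US}}
```

With a pegged nominal rate (Δe ≈ 0), the real rate q still appreciates whenever Chinese inflation runs ahead of US inflation, which is the adjustment described above.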

There’s a lot more to the economics … but my angst is with the politicians: why (arguably) waste so much bandwidth on this non-issue?

Facebook mail – good for us, bad for them?

November 16, 2010

There are already too many posts on this, and it’s completely unrelated to Silne’s focus, but I can’t help but think about the economics of Facebook’s new messaging (“don’t call it email!”) offering.

First let me say that it seems cool for some use cases, i.e. maintaining a log of all communication with one person, across chat, email and more. This is great for personal usage. Usually I do want to pick up where I left off. Skype offers a massively cut-down version of this that I use all the time.

However, this has strategic implications for them. Before thinking through those, you have to have a view on the source of their competitive advantage and defensible position. Is it (a) proprietary culture (product development capabilities) allowing them to release better products and features than anyone else, or (b) the network effects associated with owning your social graph (the usefulness is a function of the other users, so new networks can’t launch)? I believe it was (a), but the high valuations are now based on (b).

With this new feature you’ll be able to communicate more freely with friends outside Facebook. Further, with IMAP and Jabber integration you’ll be able to communicate from outside the site (i.e. without seeing the ads). You can do that already with wall posts (e.g. via TweetDeck). If you think of Facebook as the location to communicate with friends (rather than post static information), then they’ve released their grip on that.

Now let’s fast forward, and imagine 20 other cool products (including Google) allowing a website and other apps to socialize. They can now interface with most of Facebook. Users with Facebook accounts (to store their social graph only) can use whatever tool they want (no value to FB). Users without Facebook accounts can still interact with Facebook users (although they can’t see all of the relevant content).

Users will be more likely (than prior to releasing this product) to choose the interface/product that suits them (rather than the one their friends are on). Not great for them.

But then again, they want to ‘connect the world’, not make money, right?

– too much information?

November 12, 2010

A week or so back Mint quietly launched … to provide depersonalized data on shopping trends – e.g. average purchase amounts at Tiffany – scraped from Mint user accounts.

Interesting data to browse … some commercial questions :

1. I assume they’ve been (at least trying to be) reselling this data to retailers for a while … why give it away? There are some folks (including the banks) in the business of reselling credit card transaction histories for this purpose who must be finding it tough to compete against this free offer. All I can assume is that they feel that, as a benchmarking tool, the data will entice new subscribers to sign up to Mint (and thus produce some lead gen revenue). However, it may also create fear about data privacy.

2. Perhaps they may also attempt to develop an ecosystem by exposing APIs on this data. See an interesting post here : Mint Data Offers a Glimpse Into the Future — and It Is Very Good. (although they could have done this without the public service). How comfortable are users with sharing this type of data with other users (indirectly via benchmarking)?

3. Is it sufficiently depersonalized to protect user data privacy? How many purchases are there each month at Kroger in Carolina, Puerto Rico by Mint users? Not many, I’d guess. This is a more general question – many legal frameworks consider depersonalization to be a binary state, i.e. data with Name/Address/etc. is treated one way, data with these columns stripped another. However, through fuzzy matching across more dimensions, “depersonalized” data can often be used (intentionally or otherwise) to identify the individual. That’s fine in some contexts, but it’s ineffective for the norm to delineate on the basis of whether these columns exist, rather than setting constraints around the potential to identify individuals.
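
To make the third point concrete, here is a minimal sketch in base R of the kind of check I have in mind: count how many records fall in each published cell and flag the small ones. The transactions, the column names and the threshold k are all made up for illustration; this is not Mint’s data or method.

```r
# Hypothetical, already "depersonalized" transactions: no name or address columns.
txns <- data.frame(
  merchant = c("Kroger", "Kroger", "Kroger", "Kroger", "Tiffany", "Tiffany"),
  city     = c("Atlanta", "Atlanta", "Atlanta", "Carolina", "New York", "New York"),
  month    = c("2010-10", "2010-10", "2010-10", "2010-10", "2010-10", "2010-10"),
  amount   = c(54.10, 23.75, 88.00, 41.20, 310.00, 1250.00)
)

# Count how many records share each combination of quasi-identifiers.
cell_sizes <- aggregate(amount ~ merchant + city + month, data = txns, FUN = length)
names(cell_sizes)[names(cell_sizes) == "amount"] <- "n"

# Flag cells with fewer than k records (k is an assumed threshold):
# publishing an average for such a cell could expose an individual.
k <- 3
risky <- cell_sizes[cell_sizes$n < k, ]
print(risky)
```

A rule of this shape constrains the potential to identify someone, rather than just checking whether the name and address columns exist.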


Rating agencies deserve more attention?

September 1, 2010

It seems Moody’s, for its part, is out of reach of the SEC.

Without wanting to rehash years of analysis of the financial crisis, there’s something a bit odd about this discussion and their role in the financial markets. They’re alleged to have made a mistake in the credit models that investors rely on … but why are investors relying on them? Can investors really shirk this responsibility?

It’s particularly related to their private information, and how debt investors use them as an information shortcut.

Consider equity investments … there are strict accounting and disclosure rules to (attempt to) ensure investors are on an equal footing, and management are very careful not to create opportunities for insider trading. However, the story is different when firms issue debt, as there are no such constraints. Management wants a bond issue. They woo the rating agencies with pitch books filled with inside information in order to achieve the highest possible rating. The rating agencies digest this and in turn produce a rating related to the estimated risk of default … and this is all perfectly legit.

Why, when we consider this from first principles, should they play this role? Are the investors not able to digest this information? Is the risk of debt really that much harder to judge than equity? Through this process the system may place too much faith in the modeling skills of three private firms … so it’s not surprising that CDOs were misclassified.

Accounting standards are there for a reason – to create information symmetry among market participants. Debt in general seems to be fraught with information asymmetry. Examples of firms with legit private information (versus public filings) include :

  • Rating agencies (who transmit information via ratings)
  • Credit default swap issuers (who transmit information via prices)
  • Junk bond investors (who don’t transmit the information)

This may not be a bad thing … from a market efficiency standpoint perhaps you could argue that any firm is free to participate in these markets … but in the light of the regulatory pressure to push responsibility back to bank shareholders (and away from government), why not consider the same in debt markets?

Bolivia & Microfinance credit data

August 24, 2010

How will Bolivia’s credit bureaus evolve?

I have been doing a little reading on Bolivian credit data recently. I don’t know much, but here’s how it seems from the outside.

Bolivia is one of the poorest nations in Latin America. Their credit bureau is still relatively nascent as I understand it, with only ~40% of the population included in the government mandated databases, and I suspect not a huge amount of trade line depth (although this is speculation, perhaps they only have negative reporting). It is a public bureau, with participation mandated by the government. Further, interest rates are significantly higher than inflation rates … at around 40% … which either speaks to a high default rate or significant information asymmetries. Are these items related? Could better data sharing directly lead to growth? I suspect so, but there are hurdles to overcome.

It’s well reported that microfinance leads to economic growth; however, microfinance, like any lending activity, depends on sourcing information to evaluate creditworthiness. In much of microfinance, this information is derived from what economists call an “information shortcut” … i.e. a decentralized lending officer makes a judgement based on intangibles. As personal lending markets evolve, they tend to transition towards data-driven rules / scores to codify the existing processes, which lowers the search costs (by reducing the lending officer’s role and spreading the learning curve across all borrowers), improves marketing effectiveness (to those who can borrow), and reduces the risk of fraud. Perhaps most importantly, credit data allows responsible borrowers to build a reputation. As such, improved credit data sharing would enable microfinance and more traditional banking, which would, in turn, contribute to Bolivia’s economic growth.

How will this occur? It turns out Bolivia has one of Latin America’s most vibrant and competitive microfinance sectors. Perhaps sharing data among these would be feasible. This paper demonstrates that predictive models on microfinance data can predict default risk based on currently collected attributes (however there are some issues with overfit & bias in this particular model – e.g. lending officer is likely an endogenous variable in predicting risk). The overall fragmentation of this market would suggest significant returns from data sharing. However there are some regulatory hurdles to overcome, and perhaps the loan amount and volume haven’t yet reached the point at which the significant investment required to gather and process these data is warranted.

I suspect the data owners will lead the charge, if permitted by the government. I.e. the aforementioned alternative data owners such as telcos are well positioned either to sell their data to lenders, or lend directly.

If there’s anyone out there who knows more about the Bolivian credit bureaus, or microfinance participation in bureaus elsewhere, please comment.

Mark Hookey



US mobile players finally enter the credit card business

August 3, 2010

As in much of the developing (or on this dimension more developed?) world, US telcos are exploring their options in the credit card space. Technology players are also developing related competencies.

Consider the implications not only for credit card originators and credit providers, but also for credit bureaus. Not only do telcos already have a fully fledged credit department, they also have great insight into consumer behaviour that can supplement the traditional credit score.

Many players (including Silne) are sourcing what is known as “alternative credit data” for this purpose. Telcos have some of this in-house. It takes significant modeling expertise to tease out the insight from this, but they will get there in time.

“Free” credit reports?

July 29, 2010

I’m not sure many folks recognize just how lucrative the numerous “monitor your credit report” services are for the credit bureaus and others. Here’s the typical sequence :

1. Advertising campaign oriented around fear of identity theft or a catchy jingle

2. You sign up to view your credit file with the 3 major bureaus (Experian, Equifax, and TransUnion), often thinking it’s free (there’s debate about whether they have misleading advertising practices)

3. They put you on to a fee of around $15/month associated with a monitoring service. You could cancel at any time but many don’t.

(This is not to be confused with – a government mandated free service to get your annual credit report)

Are customers being misled … or do they really value this service at $15/month? If it were the latter, then why not be more transparent about the cost up-front? Take a look at the landing page of each of the bureaus and see how prominent the word “FREE” is. Maybe some people value the service at $15, and some may even get that sort of value if they’re the target of (frequent) identity theft, but I’d guess many just aren’t paying that much attention.

Maybe this is ok if the bureaus have high costs. Eh. Hang on. Isn’t this data they already own? How much does it cost to run a query and send an email again? The marginal cost has to be pretty close to zero; this is pretty much all marginal profit. There are some fixed advertising costs to sign people up for this.

It’s hard to get granular data on the market size, but it might be >$400m/year at a rough swag (Equifax is $149m/year in personal solutions, Experian has ~10m customers in the US & UK, TransUnion is privately held). Some of this includes identity theft protection to be sure, so add some large error bars on that.

They are not scams as some sites claim – the sites do outline the fees for a valuable service – however I hope everyone is going in with their eyes open. Actively monitoring and managing your credit report is important – I hope you’re making a conscious choice about the value of that.

Any thoughts?

US consumer protection wiki extract

July 21, 2010

A summary of some but not all relevant consumer protection acts in the US (directly from wikipedia) :

  • “The Home Mortgage Disclosure Act (HMDA) of 1975, implemented by Regulation C, requires financial institutions to maintain and annually disclose data about home purchases, home purchase pre-approvals, home improvement, and refinance applications involving one- to four-unit and multifamily dwellings. It also requires branches and loan centers to display a HMDA poster.
  • The Equal Credit Opportunity Act (ECOA) of 1974, implemented by Regulation B, requires creditors which regularly extend credit to customers, which includes banks, retailers, finance companies, and bankcard companies, to evaluate candidates on creditworthiness alone, rather than other factors such as race, color, religion, national origin, or sex. Discrimination on marital status, welfare recipience, and age is generally prohibited, with exceptions, as is discrimination based on a consumer’s good faith exercise of their credit protection rights.
  • The Truth in Lending Act (TILA) of 1968, implemented by Regulation Z, promotes the informed use of consumer credit, by standardizing the disclosure of interest rates and other costs associated with borrowing. TILA also gives consumers the right to cancel certain credit transactions that involve a lien on the consumer’s principal dwelling, regulates certain credit card practices, and provides a means for resolution of credit billing disputes.
  • The Fair Credit Reporting Act (FCRA) of 1970 regulates the collection, sharing, and use of customer credit information. The act allows consumers to obtain a copy of their credit report records from credit bureaus that hold information on them, provides for consumers to dispute negative information held, and sets time limits after which negative information is suppressed. It requires that consumers be informed when negative information is added to their credit records, and when adverse action is taken based on a credit report.”

… Regulation Z is getting more air time recently following the financial crisis, with lenders seeking data to help evaluate a customer’s ability to borrow.

Unbiased estimator variable selection

July 16, 2010

We’re continuing to test the tendency of modeling processes to overfit, per an earlier post.

The issue with this approach is that, in most practical settings, variables are either in or out, based on some variable selection process.


When a variable is “in”, typically the parameter is considered a best unbiased estimator … in normal speak this means that if the average in the sample data is .3, then the parameter will be such that the model predicts .3.


This is why, with sparsely populated variables, there is such a risk of overfitting when including too many variables – the model will fit to the sample noise.
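
A quick sketch of this in base R, on simulated data (the variable, its levels and the rates are all made up): a logistic regression with a single categorical predictor reproduces the sample average within each level, however few observations that level has.

```r
set.seed(42)
n <- 2000

# Hypothetical sparse categorical variable: level "c" is rare.
x <- factor(sample(c("a", "b", "c"), n, replace = TRUE,
                   prob = c(0.60, 0.39, 0.01)))

# The true rate is 0.3 for every level, so x carries no real signal.
y <- rbinom(n, 1, 0.3)

fit <- glm(y ~ x, family = binomial)

# Average fitted probability and average observed outcome, by level.
fitted_by_level <- tapply(fitted(fit), x, mean)
sample_by_level <- tapply(y, x, mean)

print(table(x))
print(rbind(sample = sample_by_level, fitted = fitted_by_level))
```

The two rows should agree level by level, including the rare level, where the sample average is just noise around the true 0.3. That is the sense in which an “in” variable fits whatever the sample happens to contain.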

Japan credit study

July 10, 2010

Great paper from PERC from 2007


“At a 70 percent acceptance rate, a Japanese lender using full-file credit reports would have a default rate that is conservatively estimated to be between 9 percent to 26 percent lower than a lender using any of the incomplete or negative-only credit reports currently used in Japan.”

Removing information asymmetries through more complete credit bureau files leads to more granular segmentation and cheaper access to capital for the responsible subset within a segment.

I’m not sure how much we can take away from the regression which correlates credit bureau development to GDP.



Stepwise overfitting example

July 7, 2010

We’ve been generating some data in order to test modeling techniques on sparse, high-dimensional data. In a simple example, in which categorical variables are created with levels sampled from a random uniform distribution, the effect of overfitting is significant. In this example a stepwise AIC method was used to select variables, and ROC curves were then produced to demonstrate fit quality in and out of sample. Here is the result :

This is a well known effect – stepwise models tend to lead to overfitting. Harrell is famous for his diatribe against this, a useful summary of which is here :

However, in practice these warnings are rarely heeded (whether through automated stepwise or “human stepwise” – i.e. not leveraging domain expertise when choosing between models, rather just blindly using quality of fit measures).

This is serving as a useful benchmark against other related modeling techniques.
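
For readers who want to try something similar, here is a minimal sketch in base R. It is not the code behind the result above: the sample sizes, the number of variables, the three-level factors and the seed are all my own assumptions. Every predictor is pure noise, so any variable the stepwise procedure keeps is overfitting by construction.

```r
set.seed(2010)

# Simulate p three-level factors that are unrelated to the outcome y.
make_data <- function(n, p = 40) {
  x <- lapply(seq_len(p), function(j) {
    factor(sample(c("a", "b", "c"), n, replace = TRUE),
           levels = c("a", "b", "c"))
  })
  names(x) <- paste0("x", seq_len(p))
  d <- as.data.frame(x)
  d$y <- rbinom(n, 1, 0.3)
  d
}

# Area under the ROC curve via the rank-sum identity (ties get average ranks).
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

train <- make_data(300)
test  <- make_data(300)

# Forward stepwise selection by AIC, starting from the intercept-only model.
null_fit <- glm(y ~ 1, data = train, family = binomial)
upper    <- reformulate(paste0("x", 1:40))
fit <- step(null_fit, scope = list(lower = ~ 1, upper = upper),
            direction = "forward", trace = 0)

auc_in  <- auc(predict(fit, type = "response"), train$y)
auc_out <- auc(predict(fit, newdata = test, type = "response"), test$y)

cat("variables selected:", length(attr(terms(fit), "term.labels")), "\n")
cat("in-sample AUC:", round(auc_in, 3),
    " out-of-sample AUC:", round(auc_out, 3), "\n")
```

Since there is no signal to find, one would expect the in-sample AUC to sit above 0.5 and the out-of-sample AUC to hover around 0.5, though the exact figures depend on the seed and the settings.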

Growing data volumes

June 25, 2010

It’s self-evident that data volumes are growing exponentially … see this chart from The Economist a while back :

What’s less frequently discussed is the enabling software technology that is helping to incorporate these data into business decisions. This includes Excel 2007+, which allows more than 2^16 rows (up to 2^20, or 1,048,576), faster machines including 64-bit (without the 4GB Windows limitation), and more comfort with tools like SAS/SQL/Emblem that tackle large models. As a result the “average” user within a corporate can now crunch all of these larger datasets.

It’s human nature to seek explanations of things; however, just because the technology will handle the data and the modeling tool says “fit converged”, one must retain some statistical caution and common sense. I’ve anecdotally seen many occasions in which users massively overfit just because the tools allow them to – alternatively put, Occam’s razor used to be enforced not by good sense but by technical limitation. I wonder what will limit us in the future?

Overfit credit models

June 23, 2010

Within the lending markets, consumer data volumes are growing rapidly. These data include internal customer information, patchy credit bureau information, including positive and negative records, and sparse data from alternative sources. There is an increasing need for granular multivariate predictive processes to apply…

… however many models are either too simple with too few degrees of freedom versus the available data or overfit due to the availability (and misuse) of powerful modeling tools like GLMs in SAS.

There is the need for improved modeling techniques, however we should all be careful not to throw degrees of freedom at the problem.

Common R modeling commands

June 22, 2010

A short post on basic R syntax :


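Logistic regression

# `formula` (e.g. y ~ x1 + x2) and `data` (a data frame) are placeholders
fit <- glm(formula, data = data, family = binomial)

y <- predict(fit, type = "response")  # fitted probabilities

Penalized logistic regression

library(rms)  # provides lrm()
fit <- lrm(formula, data = data, penalty = 1)

Classification tree

library(party)  # provides ctree()
fit <- ctree(formula, data = data)

Bias reduction logistic

library(logistf)  # provides logistf(), Firth's bias-reduced logistic regression
fit <- logistf(formula, data = data)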