One of my favorite blogs, Variance Explained, does a lot of interesting work using secondary data, such as this examination of Trump's tweets. A few days ago, its author posted yet another secondary data analysis, one that examines whether software developers in different cities use different technologies and programming languages. Sure, he could have sent out surveys to programmers in different locations. But instead, he used Stack Overflow traffic data to answer his question. (I should note that he works at Stack Overflow, and so has access to data that we do not, but he does share the code he used to generate the data.)
He examines data from the four metropolitan areas accounting for the most Stack Overflow traffic: San Francisco, Bangalore, London, and New York (where he is based). First, he compares the two US cities.
One clear difference: New York has a larger share of Microsoft developers. Many tags important in the Microsoft technology stack, such as C#, .NET, SQL Server, and VB.NET, had about twice as much traffic in New York as in San Francisco. This may be because many banks and financial firms, which are much more common in NY than in SF, use these technologies.
There are also patterns in the technologies that are more common in the San Francisco area, especially languages developed by Apple (Cocoa, Objective-C, OSX) and Google (Go, Android). We can also see several influential open source projects, especially ones associated with Apache (Hive, Hadoop, Spark).
When he expands his analysis to include all four cities, he finds that London has the highest proportion of developers using the Microsoft stack, New York has a higher proportion using data science tools (like pandas for Python, and R, which I also use), and Bangalore leads in Android development. Even after bringing in the other two cities, San Francisco still leads in the same technologies listed above, except Android.