From macro-economists versus micro-economists to macro statisticians versus micro statisticians: The gulf between big data and small data scientists

In the 1950s and 60s, the “Think small” ad for Volkswagen’s Beetle marked a radical shift in an era dominated by large cars in the US. Anyone who has studied advertising or worked in the industry will recall this iconic ad. It is considered a classic of the field in several ways, but above all it gave consumers a new way to think about the advantages of small cars over big cars.

Today, for anyone in the field of data science, “big data” is the buzzword. More and more jobs are emerging in this area, and every day I see several such opportunities on LinkedIn. I see the world of data science facing a similar divide: big data versus small data. We hear far less about small data than about “big data”, which sounds fancier and sexier and requires highly sophisticated programming skills to model a problem. Yet each has distinct problems and areas of application where it has its own place and utility.

Big data and some of its applications: Big data is associated with machine learning and with applying algorithms to extract data from the web and search for patterns and trends across millions of records. In text-based analysis, for instance, this could mean crawling through millions of newspaper articles and editorials so that a computer can identify the specific articles a researcher or analyst is looking for. For a spatial statistician, it could mean employing computer algorithms to extract location information about specific events of interest from millions of records. The world of big data analytics is dominated by computer scientists, statisticians, and political scientists who study conflict or public opinion. It is also dominated by companies and industries looking to capture consumer behavior and trends, so that companies like Amazon and Google can model consumer patterns and forecast demand or decision-making.

Small data and its applications: We don’t hear the term “small data” much, but as a public health and policy professional I see many problems in public health, epidemiology, and census work that involve counts and small numbers. These must be modeled correctly if they are to support valid inferences, and that requires domain knowledge of a distinct set of statistical models and tools. Organizations where knowledge of small area analysis and estimation would be helpful include the CDC, the Census Bureau, and groups doing community-level program planning and evaluation.

a) Survey methodology: In large-scale health and population surveys implemented in developing countries, the sample is representative of the population at a larger regional scale, but often not at a small geographic scale. In such situations, small area estimation techniques or interpolation can be used to make inferences about a specific health outcome in a geographic unit where no sampling was done.

b) Sentinel surveillance: This involves surveillance at a specific site or location to detect disease outbreaks or new cases of specific diseases. According to WHO, sentinel surveillance is appropriate for gathering high-quality data when a passive surveillance system (generally based on data reported by health workers and health facilities) is not adequate to identify causal factors for certain diseases. However, because data are monitored only at specific sites, hospitals, or locations, it may not be appropriate for detecting cases outside the selected sites.

c) Community-based program planning: In community and program planning, one application is improving a health intervention at a specific site or location. For instance, USAID allocates funds for HIV testing and treatment at specific sites in several countries, so it might be interested in knowing which clinics are doing better than others. According to the PEPFAR annual report to Congress, disease burden and HIV risk vary widely at the sub-national and sub-population levels. Knowledge about the distribution of cases around specific sites and about uptake of services can therefore help improve programs. Similarly, AidData, a collaboration between three universities that tracks where aid money is going by country, year, program, and project type, works in the area of geospatial impact evaluation. It borrows traditional statistical methods such as difference-in-differences and propensity score matching, but also takes the site location of each project into account. It identifies sites where the World Bank did not implement a project, which then act as control sites. By accounting for the location of the project implementation site, it considers heterogeneity in program outcomes when conducting impact assessments.
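Two of the ideas in the list above can be made concrete with a few lines of Python. The sketch below is only an illustration, not the method used by any of the organizations mentioned. The first function shows one simple flavor of small area estimation, a composite estimator that shrinks a district's own rate toward the regional rate when its sample is small (the function name, the tuning constant `k`, and the district counts are all hypothetical). The second computes a basic difference-in-differences estimate from made-up outcome values for project sites and control sites.

```python
from statistics import mean


def composite_estimate(cases, sample_size, regional_rate, k=50.0):
    """Shrink a district's direct rate toward the regional rate.

    k is a hypothetical tuning constant: the larger it is, the more a
    small sample is pulled toward the regional rate.
    """
    if sample_size == 0:
        # No one was sampled here, so fall back on the regional rate.
        return regional_rate
    direct_rate = cases / sample_size
    weight = sample_size / (sample_size + k)
    return weight * direct_rate + (1 - weight) * regional_rate


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change at project sites minus change at control sites."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical survey counts per district: (cases, people sampled).
districts = {"A": (12, 400), "B": (1, 10), "C": (0, 0)}
total_cases = sum(cases for cases, _ in districts.values())
total_sampled = sum(n for _, n in districts.values())
regional_rate = total_cases / total_sampled

estimates = {
    name: composite_estimate(cases, n, regional_rate)
    for name, (cases, n) in districts.items()
}

# Hypothetical outcome values (e.g. service uptake) before and after a project.
effect = diff_in_diff(
    treated_before=[40, 42, 38],
    treated_after=[55, 57, 53],
    control_before=[41, 39, 40],
    control_after=[46, 44, 45],
)

print(estimates)
print(effect)
```

In this toy setup, district B's direct rate of 1 in 10 is pulled most of the way back toward the regional rate because only ten people were sampled, while the unsampled district C simply inherits the regional rate. Real small area models and geospatial impact evaluations are considerably more careful, for example by borrowing strength from neighboring areas or by matching control sites on covariates.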

Economics as a discipline has always been divided into macro and micro economics. Is it time we divided statistics as a discipline into macro and micro as well?
