Not That Kind of Food Network

First, if you’re here, thanks for humoring my blabbing.  But this one’s all about the visual, and looking at it as a static image in the blog doesn’t do it justice (depending on your eyesight and screen, it might not even be legible).  So please, do yourself a favor and jump down the rabbit hole by going to the full, interactive version here or the mega version here, where you can view it full screen and use the tools in the top right to zoom and pan to your heart’s content.

If you’re still reading, I’m not really sure why, because if I were you, I’d be lost forever down in the depths of the mega-graph.

Anyway, this time I took a stab at flavor combinations. It’s a huge topic, and one that many more qualified people than I have explored. I tried a somewhat unusual approach to presenting the information though – as a network graph. It is, in my humble opinion, a neat way of capturing the flavor world all in one (admittedly complex) visual. Also, sorry smartphone users, this is going to suck on your tiny screens.

food network final
The full graph (Save your eyes and click the link to see it at full size)

 

Here’s what all that means: Ingredients that show up in more than about 1% of recipes are shown as points. The lines are ingredients that appear frequently with one another (at least 15% of recipes that contain one contain the other). I did remove some ubiquitous ingredients – things like salt, pepper, water, flour, eggs, and various cooking oils. I also had to remove garlic, which appears with just about every savory ingredient imaginable, and vanilla, which appears with just about every sweet ingredient imaginable.
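For the curious, here is a minimal Python sketch of how a graph like this could be put together. It is not the code behind the actual visual: the function name, the parameter names, and the choice to draw a line when the 15% rule holds in either direction are all my assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def build_flavor_graph(recipes, min_share=0.01, min_cooccur=0.15, excluded=frozenset()):
    """Return (nodes, edges) for an ingredient co-occurrence graph.

    recipes: list of ingredient collections, one per recipe.
    min_share: an ingredient becomes a point if it is in at least this share of recipes.
    min_cooccur: two points are linked if, among recipes containing one of them,
        at least this share also contain the other (either direction; an assumption).
    excluded: ubiquitous ingredients to drop up front (salt, water, garlic, ...).
    """
    # Deduplicate ingredients within each recipe and drop the excluded staples.
    cleaned = [set(r) - set(excluded) for r in recipes]
    counts = Counter(ing for r in cleaned for ing in r)

    # Points: ingredients that show up often enough.
    nodes = {ing for ing, n in counts.items() if n / len(cleaned) >= min_share}

    # Count how many recipes contain each pair of surviving ingredients.
    pair_counts = Counter()
    for r in cleaned:
        for a, b in combinations(sorted(r & nodes), 2):
            pair_counts[(a, b)] += 1

    # Lines: pairs that co-occur often relative to either ingredient's own count.
    edges = set()
    for (a, b), n in pair_counts.items():
        if n / counts[a] >= min_cooccur or n / counts[b] >= min_cooccur:
            edges.add((a, b))
    return nodes, edges
```

Laying the points out so that connected ingredients sit near each other is a separate step (a force-directed layout is one common option), which this sketch leaves out.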

Initially, I was like, “let’s keep this thing to a reasonable size.”  So I made the smallest version, that pretty much fits on one large monitor.  I cleaned it up, combining things like chocolate chips and chocolate morsels, and made the nice visual.  But then I was like, well, this is nice, but what would happen if I made it bigger.  So I did (by including more ingredients).   And it’s a little overwhelming, but pretty cool (if messy, because I didn’t clean it up).  And what’s better than bigger?  Freakin’ gigantic.  So I made an even bigger one (here, if you dare).

Mega Network Section
A segment of the second graph

Looking at the individual links is fun, but so is looking at the regions. The graph is generally arranged so that ingredients that are more directly connected are physically closer, which leads to fun things like Mexican regions. Continuing down that rabbit hole, it’s neat to see which ingredients form bridges from one cuisine to another (other than garlic). Anyway, if you’re like me, you can spend a lot of time staring at something like this.

As always, thanks to Yummly for the data.

Letting the Robot Overlords Categorize Our Food

I’ve been tinkering around with clustering algorithms on recipes for a few months now to get an unbiased look at which foods are most similar.  I figured having a computer categorize foods based just on ingredients might generate some cool ideas about which foods are actually similar that our puny human minds, limited to constructs like “Mexican Food” and “Dinner”, might miss.

I’ve had the clustering done for a while, but got a bit stuck trying to figure out how to present the results.  The issue is I’m grouping ten thousand or so recipes hierarchically, so I can’t really list out which cluster every recipe belongs to in a blog post.  Visualizing the whole hierarchy is pretty, but not particularly insightful either (but since it’s pretty, here it is).

dendro

So yeah, I was puzzling over how to summarize all that information when FiveThirtyEight.com came out with this.  And I was like: here’s my shot to one up Nate Silver!  Seems like he kind of phoned it in by restricting himself just to margaritas and a sample size of 78.  Also, K-means seems like an unnatural approach to the problem.  I considered it, but saw two major issues:  First you have to start by choosing a number of clusters (at least in pure K-means – FiveThirtyEight may have used a more advanced version) and it didn’t seem like there was any great way of guessing how many clusters there are.  Second, K-means doesn’t show subgroups, which are pretty important to think about for food (even for just margaritas, where within, say, sweet margaritas, you might have subgroups for fruity and plain).

Anyways, to cut to the chase, I made this: https://plot.ly/~Robmattles/29.embed.  It’s basically the top part of the pretty dendrogram above, except that if you hover over the nodes, you get information about what’s in there.  I might make a little web-app type thing to explore it further (so that you could go further down the tree and get more information on the nodes) but this is where I stand for now.

If you hover your mouse (can’t imagine it would work well on a phone) over a node, you’ll get some of the recipes in that node, the percent of recipes in that node, and the three biggest ingredient differences from its sibling node.  So if 75% of the recipes in a node contain sugar and only 25% of the recipes in the sibling have sugar, you’ll see sugar +50% when you hover over that node. Anyway, it’s kind of fun to poke around and see what makes sense to our puny human minds and what doesn’t.
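Here’s a small Python sketch of that sibling comparison, in case the arithmetic is easier to read as code. The function names are made up for illustration, and ranking by the absolute size of the difference is my assumption about what “biggest” means.

```python
from collections import Counter


def ingredient_shares(recipes):
    """Share of recipes in a node that contain each ingredient."""
    counts = Counter(ing for r in recipes for ing in set(r))
    return {ing: n / len(recipes) for ing, n in counts.items()}


def top_differences(node, sibling, k=3):
    """Return the k ingredients whose share differs most between a node and its sibling.

    Each result is (ingredient, node_share - sibling_share), so a positive number
    means the ingredient is more common in the node than in its sibling.
    """
    a, b = ingredient_shares(node), ingredient_shares(sibling)
    diffs = {ing: a.get(ing, 0.0) - b.get(ing, 0.0) for ing in set(a) | set(b)}
    # Largest absolute difference first; ties broken alphabetically.
    ranked = sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return ranked[:k]
```

With a node where three of four recipes contain sugar and a sibling where one of four does, sugar comes back with a difference of 0.5, which is the “sugar +50%” from the example above.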

 

P.S.  Sorry I couldn’t actually embed the chart in this post.  WordPress wants like $300 to be able to use plug-ins for your site, and while I’m sure this site is days away from going viral and making me rich, I decided against it.

The Price of Restaurant Food is too Damn High

At Munch, we really like maps.  Which is to say, I really like maps.  But I figure it’s just a matter of time until this site takes off and I have a full-time staff, so I better get used to it. I mean, nothing says commercial success like occasionally-updated-food-blog-for-extremely-niche-readership.

So here’s a post that’s pretty much just like the last one! It’s a map of DC area zip codes shaded by the price of the average restaurant menu item.  It’s a neat little map, if we do say so ourselves.

medianpricemap

The hard part of making this was connecting to Locu’s menu data (the service that provides menu information to Yelp, among other customers).  It’s a pretty cool dataset to connect to.  For this little map, I loaded up data on more than 200,000 menu items, and now that I’m set up, there’s no reason I can’t load data on millions of dishes from restaurants all over the country.  I have a couple of thoughts for cool ways to analyze that (and, of course, am open to other ideas). So keep an eye out for what’s hopefully some pretty neat stuff based on that data.

The Price of Milk is Too Damn High

I went out to try to brush the dust off my R skills and with it, the old blog.  I figured I’d try something straightforward to start with – the price of milk.  Well, turns out, even milk isn’t that simple.

The USDA publishes a nice dataset called the food environment atlas that contains a lot of fun stuff. Things like grocery store locations, farmers market locations, some basic socioeconomic data, and, of course, three separate measures of berry farming in a given county.  Just, I suppose, in case.

So I figured I’d look into what appears to influence the price of milk.  I thought I’d start with the freebie – median household income in an area.  I figured, you know, that things would be more expensive where people had more money.  Yeah, yeah, I know there are examples of where being poor prevents you from achieving savings available to rich people, but come on, higher incomes, higher rents, more willingness to pay higher prices, fancier stores, etc.  So like I said.  I figured it’d be a freebie, and every sentence I wrote in this post would start with “controlling for median household income”.

Yeah, so then this happened.

Price Vs Income.png

A freaking weird shaped graph.  I mean OK, so above incomes of 50k or so, the thing makes sense.  But what the heck is going on below that?  Why does milk cost about 50 cents a gallon more in counties with median incomes of $25,000 than counties with median incomes of $50,000?  My first inclination was to just arbitrarily remove low income counties from the dataset and figure nobody would notice, and move on to write the post that I wanted to write.  But then I realized nobody’s paying me to do this and this is getting published regardless of whether I get nice, clean results so there’s no need to fudge my data to get the results I want.  But a note to any potential future employers reading this:  Don’t worry, I can fudge with the best of them.

Anyway, so what the heck is going on here?  My first thought was that maybe I was misinterpreting the graph.  Nope.  Milk really is the cheapest in counties with median incomes around that 50k mark, and counties with median incomes of about 30k have about the same prices as counties with median incomes of about 80k.

Well, the nice thing about having data at the county level is that you can make pretty maps.  So I made some (I mean it also is helpful to interpret the data and all that boring stuff).

PriceMap.png

So there I was, just trying to make a pretty map and there’s actually some information in there.  Kind of a pain, but now I have to talk about it.  Anyway, milk is cheaper in the Midwest and out in the mountains/deserts.  Pretty expensive on the coasts, especially the east coast, and most of the south.

I need to make a quick digression here to fulfill a standard requirement of all U.S. data journalism with a geographic component: a ranking or map that shows the deep south is struggling.  So here’s a map that shows how many gallons of milk you could buy with the median household income in each county.

RatioPriceMap.png

The Midwest and Rockies could just swim in milk.  They’re not doing quite as well in terms of income as the coasts, but the low price of milk more than makes up for it.  The coasts do alright, at least the metropolitan areas, presumably because they have high incomes, but the south and other rural bits and pieces are pretty milk-poor.

Back to the main point.  From the first graph, it seems like milk prices are fairly consistent within a given region of the country.  And for what it’s worth, low price regions do tend to be in the middle of the income range – i.e. not as wealthy as the coasts, but not as poor as the south.  So at least that makes sense.  But why should milk be cheaper in those regions?

Some of you might have figured out where I’m going with this.  Milk is cheap close to where it’s produced.  Makes sense, since it’s perishable and therefore fairly pricey to transport.  Here’s a map of milk production in the U.S. that has some pretty darn strong parallels to the milk price map (and not just because I have a bit of a thing about maps, I promise).

dairyProductionMap.png

And if you’re more of a scatterplot person, here’s a scatterplot showing that milk price and milk production are negatively correlated.

PricevProduction.png

Finally, here’s a scatterplot showing that milk production is highest in middle income states.

MilkVIncome.png

So where does that leave us?  Well, there seems to be a cluster of three related variables – low milk prices, high milk production regions, and middle income counties or states.  Of those, the causal link from milk production to milk prices seems to be the most reasonable.  However, even when you control for milk production, middle income states still have the lowest prices (the effect is reduced by about 1/2).

We’re left with some questions and some answers.  First, the cleanest conclusion is that milk prices are lowest in milk producing regions.  Second, milk prices tend to be lowest in middle income areas, partly, but not entirely, because middle income areas tend to produce milk.  Third, it stinks to be poor where prices are high, and prices aren’t necessarily high only in areas with high incomes or real estate values.

But I’m still left with some questions.  What’s causing the part of the income effect on price that isn’t attributable to milk production?  Is it coincidence?  Something to do with the costs of operating retail establishments in low income areas?  I don’t know.  Why do middle income regions tend to produce milk?  Is agriculture tied to middle income?  Is it a coincidence?  I don’t know.  And here again, I have the last laugh.  Because I’m just going to click publish even though I don’t have satisfying answers to all the questions and nobody can stop me.  I’m not even going to make something up so that we can all sleep better at night.  So there.

Montgomery County Health Inspections

So I recently found out that my native Montgomery County has kindly posted the results of all of its health inspections online. I felt like a kid in a candy store. Except instead of candy, there was the opportunity for rodent poop jokes.

So I mean the first thing that comes into my mind is – let’s make a map! So I did. It’s right down here. Red dots are restaurants with a critical violation, blue are restaurants with no critical violations (purple is where multiple restaurants are basically in the same place, at least one with a critical violation and one without).

Rplot01

So back to the main thrust of this post – vermin. I made a map of places that had insect or rodent violations. Sadly for lovers of creepy crawlies, but happily for everyone else, there are fewer red dots in this map than on the one for all critical violations.

Rplot02

Along these same lines, I wanted to figure out if some parts of the county were better than others at avoiding critical violations. So I broke it down by city, excluding cities with fewer than 5 inspections. Cabin John, Aspen Hill, and Spencerville all had no critical violations in any food establishments in their cities. Bravo. But Langley Park, Boyds, and Darnestown all had critical violations in 50% or more of their inspections, with Langley Park leading the pack at 55%. So, you know, there’s that.

Top 5 Montgomery County Locales for Health Code Violations

  1. Langley Park – 55%
  2. Boyds – 50%
  3. Darnestown – 50%
  4. Takoma Park – 49%
  5. Montgomery Village – 48%
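If you want to redo the by-city breakdown yourself, the logic is roughly the following. This is a hedged Python sketch rather than my actual R code, and the input format (a list of city / has-critical-violation pairs) is an assumption; the real dataset has its own column names.

```python
from collections import defaultdict


def critical_rate_by_city(inspections, min_inspections=5):
    """Rank cities by share of inspections with a critical violation.

    inspections: iterable of (city, has_critical_violation) pairs.
    Cities with fewer than min_inspections inspections are left out.
    """
    totals = defaultdict(int)
    critical = defaultdict(int)
    for city, has_critical in inspections:
        totals[city] += 1
        critical[city] += bool(has_critical)

    rates = {
        city: critical[city] / n
        for city, n in totals.items()
        if n >= min_inspections
    }
    # Worst offenders first; ties broken alphabetically.
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))
```

The same function works for the category and restaurant-name breakdowns if you swap the city for whatever you’re grouping by and raise the minimum to 10.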

Leaving geography behind, I looked at it by category. Elementary schools, Caterers, and Snack Bars all clock in at under 20% critical violations rates. Sadly though, hospitals lead the pack with 37% of hospital inspections uncovering critical violations. Yet another way to get sick when you’re already in the hospital.

Finally, I looked by specific names, again excluding chains or restaurants with fewer than 10 inspections. Baskin Robbins and Elevation Burger nail it, coming in with no critical violations. Not so much though for Paradise Biryani Pointe, where only one out of eleven inspections uncovered no critical violations (to the best of my personal ability to interpret the data – feel free to check yourself, the link is at the bottom of this post). Ironically, their website says “Please beware of imitators, look for “Paradise Biryani Pointe” and its trade mark logo on the menu.”  This might not be such good advice.  But here’s the trademark to look for in any case:

https://i0.wp.com/www.cpparadisebiryani.com/mainlogo.png

Top 5 MoCo Places to Eat for Health Code Violations

  1. Paradise Biryani Pointe
  2. Urban Bar-B-Que
  3. FLIK International Corp @ Medimmune
  4. Eastern Carry Out
  5. Alicia’s Mexican Grill and Restaurant

I also wanted to know what the most common violations were. Turns out 63% of places didn’t have proper nutritional labeling. The next two most common violation types were refrigeration temperatures (17%) and vermin (yay) at 15%. The good news is that more than 99% of places properly keep sick employees away from the food, dispose of sewage properly, and cook their foods thoroughly. Not a bad start actually.

Rplot03

So one thing you might have asked is what exactly is a critical violation. Well, MoCo calls it “a food safety requirement that requires immediate correction. Failure for immediate correction results in cessation of some or all food operations or closure of the facility until violation is able to be corrected.” But beyond that, it’s a little opaque. I mean, I assume that the health department knows what they’re talking about, but still – some places were out of compliance on proper sewage disposal but listed with no critical violations, while some places got critical violations apparently just for not having no smoking signs. So I made my own category and called it serious violations. I counted all the violations except nutrition labeling and no-smoking signs as serious, since everything else actually has to do with food safety.

Rplot04

The rates of serious violations are highly correlated with critical violations, except that they tend to be a little higher, so I won’t go into it in detail except to note that Paradise Biryani Pointe’s only inspection that didn’t have a critical violation had a serious violation (a rodents and insects violation!). This inspection (the most recent one on file) occurred three days after they had been shut down due to previous violations, and they were allowed to re-open in spite of continuing vermin problems. I await the results of their next inspection with bated breath.

Finally, here’s where you can get the data yourself:

https://catalog.data.gov/dataset/food-inspection-475fc

And here’s my R html file with my code and full results:

https://htmlpreview.github.io/?https://github.com/Robmattles/MunchTheNumbers/blob/master/MoCo%20Health%20Inspections.html

And finally, note that these are all just my amateur interpretations of publicly available data, and could be mistaken.  So please, Paradise Biryani Pointe, don’t sue me.

Pardon All The Turkeys

Go eat something else

Well, we’ve entered that magical two week period  when we all stare with exasperation at our refrigerators full of leftover turkey wondering what to do with like 30 pounds of meat.  We dress it up with new flavors in things like turkey chili and turkey pad thai.  But really, deep down, we’re all just waiting for the day that we can decide it’s gone bad, and quietly slip it into our trash/pet.

Well, I’m here to confirm what all of you suspect: Turkey is the worst meat.  Here’s the graph.

Meat Graph

Recipes with turkey are rated significantly lower than any of the other five meats I pulled.  Those of you reading closely might notice that the difference is “only” three tenths of a star on a 1-5 scale. But two notes here:  First, the sample size is large (~10,000 recipes per meat) so the standard error is tiny – less than 1/100th of a point.  Second, in the Yummly data I’m using, differences of this magnitude are very rare.  Of all the barbecue sauce ingredients I tested, only hot pepper sauce had an effect of greater than three tenths of a point.  And even the highest rated state, North Carolina, was only a quarter of a point above the mean rating for the states.
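To see why a sample that size makes the standard error so small, here is the back-of-the-envelope version in Python. The standard deviation of 0.8 stars is a made-up placeholder, not a number from the Yummly data; the point is only that with about 10,000 recipes, any spread under one star gives a standard error under 1/100th of a star.

```python
import math


def standard_error(sd, n):
    """Standard error of a mean: sample standard deviation over sqrt(n)."""
    return sd / math.sqrt(n)


def z_score(diff, sd_a, n_a, sd_b, n_b):
    """How many standard errors apart two group means are."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff / se_diff


# Hypothetical spread of 0.8 stars per meat, ~10,000 recipes each.
se = standard_error(sd=0.8, n=10_000)
z = z_score(diff=0.3, sd_a=0.8, n_a=10_000, sd_b=0.8, n_b=10_000)
print(f"standard error: {se:.4f} stars, z for a 0.3 star gap: {z:.1f}")
```

Under those assumed numbers a three-tenths gap sits well over twenty standard errors from zero, which is why “only” three tenths of a star is not something to shrug off.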

But since tradition demands we eat turkey every year, there are a precious few ingredients that do appear to increase rating when paired with turkey.  Dried Thyme is the all-star here, associated with a two-tenths rise in rating.  Black pepper, butter, extra virgin olive oil, and flour are the only other highly significant predictors of turkey recipe rating (I can’t really figure out what’s going on with the flour – maybe we’re battering then frying?).   Unsurprisingly, the fats are the strongest (other than thyme), clocking in with effects of .22 and .24.  So it seems like the solution is to slather that dried out meat with oil, and season with thyme and pepper.

None of this is to say that it’s impossible to cook turkey well.  But man, it really seems like we’re making it harder than it has to be with these alternatives out there.  I’ve always wanted to try capon and it seems like too many pheasants are getting pardoned these days.

An interesting side note: culinary darling bacon doesn’t do particularly well in the graph.  I was expecting it to do significantly better.

Some States are More Equal Than Others

So as a follow-up to my last post, I decided to see which state’s dishes were the tastiest.  After all, I really don’t think that my last post was judgy enough.  Anyways, I used a pretty straightforward methodology – I just took the mean rating of dishes with the state’s name in them and mapped it.  States with fewer than ten dishes are gray.

The redder the tastier.

Sorry Pennsylvania, you came out the loser in this one – lowest rated dishes of any state with more than ten recipes.  But you probably should have seen it coming, when your state’s signature dish looks like this.  I mean, come on, people eat with their eyes first.  And hey, if you want to console yourself, just remember that my calculations leave out Philly Cheesesteaks.

Pennsylvania Dutch Potato Salad looks like this.  Not going to win any food presentation contests.
Pennsylvania Dutch Potato Salad looks like this. Not going to win any beauty contests.

But well done North Carolina!  Turns out people like pulled pork!  And pulled pork isn’t exactly the most photogenic food either, so maybe looks aren’t everything.

Thanksgiving-themed post coming soon!  Enjoy the holiday!

State-by-State Guide to Signature Foods

A Michigan Hot Dog

So you might have seen things like this article before, where people assign a different dessert or band or slang term for marijuana to each state. Well I’m doing one of those. Except different.

First, I actually have a methodology other than: “I Googled ‘Alabama Desserts’ and copied my favorite result into the article. But it was actually pretty hard work because I had to do that like 50 times. Plus, because I went to college, I’m really good at Googling stuff and I remembered all 50 states, so this is definitely worth your time to read.”

Instead, I found the most common word or two word phrase occurring in the same recipe title as the name of each state. Not exactly rocket science, but still better than nothing, and I’m not even getting paid to do this.
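In case “not exactly rocket science” undersells it, here’s roughly what that looks like as a Python sketch. This is an illustration, not the code I actually ran: the function name, the word-boundary matching, counting each phrase once per title, and preferring the two-word phrase on ties are all assumptions.

```python
import re
from collections import Counter


def top_phrase(titles, state):
    """Most common word or two-word phrase in recipe titles that name the state."""
    # Match the state as a whole word so "Kansas" does not fire on "Arkansas".
    pattern = re.compile(r"\b" + re.escape(state.lower()) + r"\b")
    counts = Counter()
    for title in titles:
        segments = pattern.split(title.lower())
        if len(segments) == 1:
            continue  # state name not in this title
        phrases = set()
        for segment in segments:
            words = re.findall(r"[a-z']+", segment)
            phrases.update(words)
            phrases.update(" ".join(pair) for pair in zip(words, words[1:]))
        counts.update(phrases)  # each phrase counted once per title
    if not counts:
        return None
    # Highest count wins; on ties, prefer the longer phrase, then alphabetical.
    return max(counts.items(), key=lambda kv: (kv[1], len(kv[0].split()), kv[0]))[0]
```

On real titles you’d also want to drop filler words like “and” or “with”, and to keep “Virginia” from matching inside “West Virginia”, neither of which this sketch handles.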

Second, none of this “each state is its own special snowflake and gets its very own thing” nonsense. Specifically, Colorado, New Mexico, Indiana, Kentucky, Tennessee, West Virginia, Arkansas, Kansas, and South Carolina are not unique. Sorry guys, but we’re not in elementary school any more.

Too many states have their own namesake pies.

Finally, I don’t have a graphic designer. But what I lack in design skills, I make up for in misdirected bitterness. So game on, Slate.

State – Food
Alabama – White Barbecue
Alaska – Baked Alaska
Arizona – Chipotle Sauce
Arkansas – Chicken
California – Salad
Colorado – Green Chili
Connecticut – Lobster Rolls
Delaware – Steak Diane
Florida – Key Lime
Georgia – Peach
Hawaii – Blue Hawaii
Idaho – Potato
Illinois – Pork Tenderloin
Indiana – Pie
Iowa – Corn
Kansas – Barbecue Sauce
Kentucky – Pie
Louisiana – Red Beans
Maine – Pancakes
Maryland – Crab Cakes
Massachusetts – Corn
Michigan – Hot Dog
Minnesota – Wild Rice
Mississippi – Mud Cake
Missouri – Cookies
Montana – Chicken
Nebraska – Cabbage
Nevada – Elk
New Hampshire – Salad Dressing
New Jersey – Tuna Melt
New Mexico – Green Chili
New York – Cheesecake
North Carolina – Pulled Pork
North Dakota – Winter
Ohio – Lemon Pie
Oklahoma – Smoked
Oregon – Salmon Patties
Pennsylvania – Dutch Potatoes
Rhode Island – Clam Chowder
South Carolina – Barbecue Sauce
South Dakota – [Sunflower] Seed Cookies
Tennessee – Cake
Texas – Chili
Utah – Sauce
Vermont – Maple
Virginia – Ham
Washington – Apple Cake
West Virginia – Cake
Wisconsin – Cheese
Wyoming – Coleslaw

Notes:

  • Well done Hawaii for being the only state with booze as its most frequently associated “food”.
  • Colorado/New Mexico’s Green Chili is news to me, and looks pretty good.
  • Delaware, you’re a little outdated – according to the New York Times, Steak Diane has been out of fashion since 1980.  Not that being outdated will be anything new for a large chunk of your population.
  • Oklahoma’s “smoked” refers mainly to brisket.
  • “North Dakota” is only in one recipe title. Fitting that the recipe’s first word is “Winter”.

Steak Diane, Circa 1972

Data Courtesy of Yummly

Thai Food

Pad Thai
Delicious, ubiquitous Pad Thai.

So I went out for Thai food last Friday for the first time in a while and was reminded how much I like Thai food.  Well, to be honest, not the authentic Thai food that your friend who traveled to Thailand tells you about with just a little bit of conceit.  I mean, maybe you’re just jealous that you didn’t get to go, and he’s usually an ok guy, so maybe you’re imagining it?  At least he offers to take you out to an authentic Thai restaurant, but you can’t be bothered to go all the way out to the frickin boondocks to feel distinctly un-cosmopolitan while he orders in Thai and you stare blankly at the menu.  Not that kind of Thai food.  Good ol’ red-white-and-blue American Thai food.

Anyways, at the time, I couldn’t really figure out what flavors make Thai food so special, at least to me.  So I decided to see what the magic of the internet, R, and overly broad conclusions could tell us about what makes fake-Thai food fake-Thai food.  Just kidding about the overly broad conclusions.  Or maybe I’m not.  You be the judge.

So here are the ingredients in at least ten percent of Thai recipes, and the proportion of American and Thai recipes they appear in:

Thai Ingredients

And here are the ingredients that appear in at least ten percent of American recipes, along with the proportions for American and Thai Recipes.

American Ingredients

Fish sauce is the real stand-out here – I figured it would be big, but not this big. It appears in almost no American recipes, but appears in almost half of Thai recipes (about 60% if you group it with “Asian Fish Sauce” or “Thai Fish Sauce”). Apparently, fish sauce (often mixed with lime juice and chili) is served with most authentic Thai meals as well. Anyway, between its pervasiveness in Thai recipes and its absence from American recipes, fish sauce almost certainly has a lot to do with what Americans like me think is distinctive about Thai food. Except for lemongrass and coconut milk, the other common ingredients in Thai recipes are ingredients to which Americans have at least a little exposure outside of Thai cooking (albeit maybe not in combination with each other).

Gussied-up fish sauce.

The second thought that crossed my mind after curiosity about what made Thai food distinctive was a vague sense of dread and guilt about what the answer might be. If it tastes good, it’s probably not good for you. So the starting point…is not good. Fish sauce is, nutritionally, salt water. Weighing in at 58% of your recommended daily intake of sodium in just a tablespoon, it bests (or worsts) soy sauce (36%), ketchup (6%) and its fermented fishy cousin, Worcestershire sauce (also just 6% per tablespoon). But since nobody likes bad news, I kept digging until I found something good. So maybe Thai food just switches out salt for fish sauce, I thought. I mean, salt appears in a bunch more American recipes than Thai recipes, so that seems feasible, right? Wrong. Thai recipes have about twice as much sodium as American recipes, a hefty 1,045 milligrams per serving on average (1,500 is the recommended daily intake). But that’s ok, maybe the difference is statistically insignificant? Wrong. The p-value for the test for a difference in means is about 10^-25. That means that you’re more likely to get heads on eighty straight coin-flips than see this result by chance (i.e. the population means are actually equal, I just happened to pick a really unlikely sample).
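If the coin-flip comparison sounds like hand-waving, the arithmetic is quick to check. A short Python sketch follows; the only input taken from this post is the rounded p-value of 10^-25, and the function names are just for illustration.

```python
import math


def prob_all_heads(flips):
    """Probability that a fair coin lands heads on every one of `flips` tosses."""
    return 0.5 ** flips


def equivalent_flips(p_value):
    """Number of straight heads whose probability equals the given p-value."""
    return math.log2(1 / p_value)


p_value = 1e-25  # rounded p-value quoted in the post
print(prob_all_heads(80) > p_value)       # eighty straight heads is the likelier event
print(round(equivalent_flips(p_value)))   # straight heads needed to match the p-value
```

Eighty straight heads has a probability of roughly 8 in 10^25, which is a bit more likely than 10^-25; you would need about 83 heads in a row to match the p-value exactly.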

Noodles &  Co Pad Thai:  not exactly low sodium.
Noodles & Co Pad Thai: 140% of your recommended daily sodium intake

Ok so it seems like Thai food might have a bit more sodium. But maybe it’s better on fat and calories? Well, on fat at least, there’s reason to hope – no significant difference in fat between American recipes and Thai recipes (95% confidence interval is from -2.5g to +2.7g). But things start to go downhill when you look at saturated fats – Thai recipes have about 3g more of saturated fats per serving (roughly plus or minus a gram). Well ok. But maybe Thai recipes tend to have fewer calories overall? After all, low fat foods are all well and good, but you can still make yourself unhealthy by eating too many calories elsewhere. Nope. Thai recipes have about 80 more calories per serving than American recipes (bigger confidence interval this time – from about 30 calories to about 120 calories). Oh well. At least it tastes good.

Data Courtesy of Yummly

Settling the Barbecue Wars

Gun Control. Abortion. Climate Change. Affirmative Action. Of all the issues dividing our country, one stands out in a crowded field: Barbecue Sauce. Since time immemorial, Americans have fought over barbecue sauce, tearing apart states and even loving families.

But seriously, this is the kind of stupid stuff that people actually fight about. I don’t get it. But I figure I can do my good deed for the day and use some simple stats (keeping it straightforward for the first post) and hopefully give everyone one less thing to fight about.

whats_in_bbq_sauce(1)

So it turns out people put all kinds of weird ingredients in their barbecue sauce. In addition to the popular ingredients in the graph, I found recipes for barbecue sauce that include mashed bananas, instant espresso powder, and fermented bean curd (two different recipes contained this East Asian delicacy). I think we can all agree that kind of thing is just crazy. But the big sticking point is, of course, tomatoes.

Ketchup alone is the second most popular ingredient in barbecue sauce, after everybody’s favorite liquefied fermented anchovy product. But when combined with other tomato products, those little plump suckers find their way into 970 of the approximately 1700 recipes I found, more than even Worcestershire sauce.   But, of course, tomatoes are polarizing – East Carolina barbecue fans swear up and down that tomatoes ruin the sauce.

Turns out, it’s really not that big a deal, at least in terms of overall rating. The tiny difference in rating between recipes with tomato products and those without is just three hundredths of a star (on a 5 star scale) and is not statistically significant (p=.509). But I do find that ketchup makes barbecue sauce a little better – by about one tenth of a star. So if you’re going to put tomatoes in your sauce, don’t use V-8 or Prego or those sickly yellow-orange things from the grocery store, just go for the Heinz. But if not, it probably won’t cost you too much in the way of objective deliciousness.
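For anyone who wants to run this kind of with-versus-without comparison on their own recipe ratings, here is one way to do it in Python. To be clear, this is a permutation test that I’m swapping in because it needs nothing beyond the standard library; it is not necessarily the test behind the p-values in this post, and the function and parameter names are made up for illustration.

```python
import random
from statistics import mean


def permutation_test(with_ing, without_ing, n_permutations=10_000, seed=0):
    """Difference in mean rating and a two-sided permutation p-value.

    with_ing / without_ing: ratings of recipes with and without the ingredient.
    The p-value estimates how often a random relabeling of the recipes produces
    a gap at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = mean(with_ing) - mean(without_ing)
    pooled = list(with_ing) + list(without_ing)
    k = len(with_ing)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)
```

A small p-value says the gap is unlikely to be a fluke of which recipes happened to include the ingredient; it says nothing about whether the ingredient itself is what makes the sauce better.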

The impact of ketchup on tastiness is dwarfed by the impact of kickin’ it up a notch. Taking your sauce from capsaicin-free to a 9/10 on spiciness will net you, on average, almost 2 tenths of a star. But maybe don’t burn people’s faces off just yet – the effect reverses once you get past a 9 out of 10 on Yummly’s spiciness scale.

Just don’t reach for the Tabasco – although spiciness gets you points, as do a bunch of other hot sauces and peppers, recipes with Tabasco are actually rated about 2 tenths of a star lower than recipes without. Use some standard hot pepper sauce or cayenne pepper, which get recipes rated 3 tenths of a star higher and 2 tenths higher respectively. Or better yet, use both.

Cayenne Peppers
These cute little guys are one of the best ways to boost your BBQ Sauce, statistically speaking

Finally, for those of you looking for more info on what to put in your BBQ sauce, here’s a table of the ingredients I found to have statistically significant effects on rating (at alpha=.05), along with the P Value and size/direction of the effect.

Ingredient             Difference in Mean Rating    P Value
Hot Pepper Sauce        0.302019627                 0.0013033
Kosher Salt             0.243337413                 0.0109152
Cayenne Pepper          0.206490853                 0.0018296
Dark Brown Sugar        0.195969852                 0.0191256
Ground Black Pepper     0.155532093                 0.0202992
Ketchup                 0.136790609                 0.0023627
Onion                  -0.122694404                 0.0212948
Lemon Juice            -0.126924378                 0.0366261
Sugar                  -0.173811057                 0.0124096
Liquid Smoke           -0.187513127                 0.0048328
Vinegar                -0.227680008                 0.0015838
Tabasco                -0.233140301                 0.0110939

Data Courtesy of Yummly. Plot Created with Plotly.