A simple way to scrape the web with Ruby


MyFitnessPal didn’t give me API access, so I wrote some Ruby to get it anyway.

Finding a good entry point

Log in to MyFitnessPal and go to the Reports tab.

You can now see graphs of your data. How does the page get the numbers it plots? Let’s use View Page Source:

We can see that the data is available under: http://www.myfitnesspal.com/reports/results/progress/1/30.json

Experimenting with the URL shows the following:

  • /progress/1 is the weight report
  • /30 is the number of days to return
  • Nutrition/Calories, Nutrition/Protein, Nutrition/Carbs and Nutrition/Fat are some of the other available reports
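The URL pattern above can be captured in a small helper. This is only a sketch: report_url is a hypothetical name, and the report paths are the ones observed in this article, not a documented API.

```ruby
# Hypothetical helper: builds a report URL from the pattern observed above.
# `report` is the path segment (e.g. 'progress/1'), `days` the number of days to return.
def report_url(report, days)
  raise ArgumentError, 'days must be a positive integer' unless days.is_a?(Integer) && days.positive?

  "http://www.myfitnesspal.com/reports/results/#{report}/#{days}.json"
end

puts report_url('progress/1', 30)
puts report_url('nutrition/Calories', 7)
```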

Accessing data from code

The data is protected by a login cookie, so we’ll first need some Ruby to obtain that cookie.

Ruby is a great, easy-to-grasp language. For our purposes only a few things need explaining:

  • Ruby files end with .rb
  • You can run a Ruby file from the command line by calling ruby file.rb
  • Gems are similar to CocoaPods; in fact, they are what inspired Pods in the first place.

Want to learn more Ruby?

Login website: https://www.myfitnesspal.com/account/login

Looking at the page source again, we know that:

  • There is only 1 form on the page
  • The ids of the input fields are username and password

Filling this form with Ruby and Mechanize is very straightforward.

require 'mechanize'
# 1
mechanize = Mechanize.new
login_page = mechanize.get 'https://www.myfitnesspal.com/account/login'
# 2
form = login_page.forms.first
# noinspection RubyResolve
form.field_with(id: 'username').value = "username"
form.field_with(id: 'password').value = "password"
# 3
form.submit

  1. We create a Mechanize agent and load the login page.
  2. There is only one form on the login page, so we grab it and fill in the user data.
  3. After we submit the form, Mechanize stores the login cookie in the agent, which means all further requests made with this agent will be properly authorised.

Retrieving reports

With a valid cookie, we can grab 30 days’ worth of our weight and calorie data:

require 'json'

days_to_query = 30
weight_report = mechanize.get("http://www.myfitnesspal.com/reports/results/progress/1/#{days_to_query}.json")
calories_report = mechanize.get("http://www.myfitnesspal.com/reports/results/nutrition/Calories/#{days_to_query}.json")

weights = JSON.parse(weight_report.body)["data"]
calories = JSON.parse(calories_report.body)["data"]

We fetch each report with Mechanize, parse the JSON, and keep only the data array.

The data array is already sorted by dates.
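To make the shape of the parsed data concrete, here is a minimal sketch that runs without a network connection. The payload is made up: only the "data" and "total" keys come from the code in this article, while the "date" key and all values are assumptions.

```ruby
require 'json'

# Made-up sample payload; the "date" key and the numbers are assumptions.
sample_body = <<~PAYLOAD
  {"data": [
    {"date": "1/01", "total": 77.4},
    {"date": "1/02", "total": 77.1},
    {"date": "1/03", "total": 0}
  ]}
PAYLOAD

# fetch raises a KeyError if "data" is missing, instead of silently returning nil
entries = JSON.parse(sample_body).fetch('data')
totals = entries.map { |entry| entry['total'] }

puts totals.inspect
```

Using fetch here is a design choice: if the site changes its response format, the script fails loudly at the parsing step.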

We could end here, but since I like fitness, let’s use this data to do some diet calculations.

Calculating our caloric needs

Now that we have both weight and intake calories, we can calculate our maintenance calories.

The more data you have, the better, especially for the weight calculation.

Calculate the average calorie intake:

avg_calories = calories.map { |hash| hash['total'] }.instance_eval { reduce(:+) / size.to_f }

I got 1045 calories, seems a bit low, doesn’t it?

This would work if we were perfect at tracking, but we are not. Sometimes we won’t track our calories for a whole day, and those empty days lower the average significantly.

We can simply reject entries that are missing data before we calculate the average. Still, aim to track as often as possible.

avg_calories = calories.map { |hash| hash['total'] }.reject { |x| x == 0 }.instance_eval { reduce(:+) / size.to_f }

Now I get 1796.5 calories, which looks correct since I’m in a cutting phase.
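The effect of untracked days is easy to see on a small made-up sample. In this sketch average_tracked is a hypothetical helper, and it assumes that an untracked day shows up as 0 (or nil) in the data.

```ruby
# Hypothetical helper: averages daily totals, ignoring untracked days.
# Assumption: an untracked day is reported as 0 or nil.
def average_tracked(totals)
  tracked = totals.reject { |total| total.nil? || total.zero? }
  return nil if tracked.empty? # nothing tracked, so there is no meaningful average

  tracked.sum / tracked.size.to_f
end

sample = [1800, 0, 2000, 1600, 0] # made-up intake, two days not tracked

puts sample.sum / sample.size.to_f # naive average, dragged down by the zeros
puts average_tracked(sample)       # average over the three tracked days only
```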

Weight change

It’s crucial that we smooth our weight data. Weight can fluctuate quite a lot, and we want to see the trend, not just the individual values.

Graph of my weight changes: the diamonds are the raw measurements. Without a smoothing function it would be really hard to see the trend.

Ideally you would have a long period of data, so that the starting weight we are analysing is already smoothed and not influenced by fluctuations.

For the weights we want to calculate a smoothed average for each entry. There is a Ruby gem, moving_average, that we can use:

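require 'moving_average'

weight_values = weights.map { |hash| hash['total'] }
smoothed_weights = []
weight_values.count.times do |idx|
  smoothed_weights.push(
      if idx > 1
        # Smooth over every measurement up to and including this one
        weight_values[0, idx + 1].smma.round(1)
      else
        # Too few points to smooth, keep the raw value
        weight_values[idx]
      end
  )
end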

That way each smoothed weight entry will depend on the previous values.
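To see what smoothing does without installing the gem, here is a sketch in plain Ruby. Note that it uses simple exponential smoothing, which is not the same calculation as smma from moving_average. The helper name, the sample weights and the smoothing factor are all assumptions chosen for illustration.

```ruby
# Hypothetical helper: simple exponential smoothing (NOT the gem's smma).
# Each smoothed value moves a fraction `alpha` of the way from the previous
# smoothed value towards the new raw measurement.
def exponentially_smoothed(values, alpha = 0.1)
  values.each_with_object([]) do |value, smoothed|
    previous = smoothed.last
    smoothed << (previous.nil? ? value.to_f : previous + alpha * (value - previous))
  end
end

raw = [77.0, 78.0, 76.5, 77.5, 76.0] # made-up daily weights in kg
puts exponentially_smoothed(raw, 0.5).inspect
```

A smaller alpha gives a flatter line that reacts more slowly to new measurements, while a larger one follows the raw data more closely.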

How did our weight change?

weight_change = smoothed_weights.first - smoothed_weights.last

Each kg of weight is approximately 7700 kcal.

How much do we need to eat to maintain our weight?

Now that we know our average intake and weight change, we can calculate our TDEE (total daily energy expenditure), which is the number of calories we should eat to keep our weight stable.

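weight_in_calories = weight_change * 7700 # weight change expressed in kcal (about 7700 kcal per kg)
puts "Observed TDEE is #{avg_calories + weight_in_calories / weights.count}"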

For me it’s 2257 kcal.

Re-evaluate it every 10-14 days, as it changes.
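Put together, the TDEE estimate is the average intake plus the calories that the weight change represents, spread over the number of days. A self-contained sketch of that arithmetic follows; observed_tdee is a hypothetical helper and the input numbers are made up.

```ruby
# Hypothetical helper: estimates TDEE from average intake and smoothed weight change.
# Assumption from the article: 1 kg of body weight is roughly 7700 kcal.
def observed_tdee(avg_calories:, start_weight:, end_weight:, days:, kcal_per_kg: 7700)
  weight_change = start_weight - end_weight # positive when weight was lost
  avg_calories + (weight_change * kcal_per_kg) / days.to_f
end

# Made-up example: 1.5 kg lost over 30 days while eating 1800 kcal a day
puts observed_tdee(avg_calories: 1800, start_weight: 78.0, end_weight: 76.5, days: 30).round
```

If the weight went up over the period, weight_change is negative and the estimate ends up below the average intake, as it should.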

Diet composition

How should you divide your calories into macros, like Protein/Fat/Carbs?

Whether you bulk or cut, this is the order of importance of macros I recommend:

  1. Protein - crucial for your body, 4 kcal / 1g
  2. Fat - get your essential fat intake for healthy hormones, 9 kcal / 1g
  3. Carbs - remaining calories, 4 kcal / 1g

A good starting point for composition would be:

  • Protein: 2g / kg
  • Fat: 0.8g / kg
  • Carbs: Remaining intake

For example, for me it looks as follows:

kcal = 2256
protein = 76 kg * 2 = 152 g protein
fat = 76 kg * 0.8 = 60.8 g fat
carbs = (2256 - (152 * 4 + 60.8 * 9)) / 4 => 275.2 g carbs
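The same split can be written as a small function, so you can plug in your own weight and calorie target. This is a sketch: macros is a hypothetical helper, and the defaults are the starting-point ratios suggested above.

```ruby
# Hypothetical helper: splits a daily calorie target into macros (in grams).
# Protein and carbs are counted as 4 kcal per gram, fat as 9 kcal per gram.
def macros(weight_kg:, daily_kcal:, protein_per_kg: 2.0, fat_per_kg: 0.8)
  protein = weight_kg * protein_per_kg
  fat = weight_kg * fat_per_kg
  carbs = (daily_kcal - (protein * 4 + fat * 9)) / 4.0 # whatever calories remain
  { protein: protein.round(1), fat: fat.round(1), carbs: carbs.round(1) }
end

p macros(weight_kg: 76, daily_kcal: 2256)
```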

Goals

Bulking - Gaining Muscle

If you are looking to gain muscle, aim for a gain of at most 0.2 kg per week.

0.2 kg * 7700 kcal => 1540 kcal per week / 7 => 220 kcal per day

Eat around 200 kcal over your TDEE and you should gain mostly lean muscle; eat 1000 over and you’ll end up fat.
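The surplus calculation generalises to any weekly target. A minimal sketch, where daily_surplus is a hypothetical helper that uses the same 7700 kcal per kg approximation:

```ruby
# Hypothetical helper: daily calorie surplus needed for a weekly weight gain target.
def daily_surplus(weekly_gain_kg, kcal_per_kg = 7700)
  weekly_gain_kg * kcal_per_kg / 7.0
end

puts daily_surplus(0.2).round # the 0.2 kg per week target from above
puts daily_surplus(0.1).round # a slower, leaner bulk
```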

Cutting - Losing weight

Losing weight is actually simpler than gaining lean muscle: you need to eat less than your TDEE. How much less depends on your goal and bodyfat level.

The leaner you are, the slower you should lose weight, both to spare muscle and to not feel like you are dying. A general recommendation would be something like this:

  • 12+% bodyfat: eat 750 kcal below TDEE - refeed every 14 days
  • 8-12% bodyfat: eat 500-700 kcal below TDEE - refeed every 7-10 days
  • <8% bodyfat: eat 300-500 kcal below TDEE - refeed every 3-7 days
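The recommendation above can be turned into a lookup that gives a daily calorie range. This is a sketch with hypothetical helper names. The list leaves the exact boundaries open, so treating exactly 12% as the first tier and exactly 8% as the second is an assumption.

```ruby
# Hypothetical helper: maps bodyfat percentage to the deficit and refeed ranges above.
# Assumption: 12% counts as the "12+%" tier and 8% as the "8-12%" tier.
def cutting_plan(bodyfat_percent)
  if bodyfat_percent >= 12
    { deficit_kcal: 750..750, refeed_every_days: 14..14 }
  elsif bodyfat_percent >= 8
    { deficit_kcal: 500..700, refeed_every_days: 7..10 }
  else
    { deficit_kcal: 300..500, refeed_every_days: 3..7 }
  end
end

# Hypothetical helper: daily intake range for a given TDEE and bodyfat level.
def cutting_target(tdee, bodyfat_percent)
  deficit = cutting_plan(bodyfat_percent)[:deficit_kcal]
  (tdee - deficit.max)..(tdee - deficit.min)
end

p cutting_target(2257, 10)
p cutting_plan(10)[:refeed_every_days]
```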

Refeed day

A refeed day is a day on which you eat around your TDEE: let your body rest and relax, increase your carbs and try to keep fat the same.

It’s beneficial for your body and mind. If you were on point with your diet, you deserve a little treat.

Conclusion

Now that you know how to calculate your real TDEE, calculate it and then apply either a surplus or a deficit as described above. Evaluate for the next 2 weeks and adjust if needed.

Whole script is available here
