If you are into computer engineering, this video is for you. Watch the full video (52:41). It is definitely worth your time!

Snippet

Every work day Facebook is safely updated with hundreds of changes including bug fixes, new features, and product improvements. Given hundreds of engineers, thousands of changes every week and hundreds of millions of users we have worldwide, this task seems like it should be impossible. In this tech talk, Chuck Rossi will dig into the tools and processes built by our Release Engineering team that make it possible to push daily updates to the site.

Summary

  • They cut from trunk to RC on Sunday 6pm
  • Tuesday 4pm is where they push it live
  • They use SVN for their central repo and git for development
  • They do releases every weekday

    • Monday – Small daily pushes
    • Tuesday – Release push
    • Wednesday – Larger daily push
    • Thursday – Regular daily push
    • Friday – Careful daily push (mentioned that developers always slack on a Friday aka Friday you are most likely drunk)
  • Facebook has slow traffic on Friday, but lots of traffic on Sat/Sun
  • Facebook has 500 core engineers divided into 15 teams
  • Everyone who works at Facebook does testing. When employees within the company type “http://facebook.com” they are redirected to “http://latest.facebook.com”. Latest is defined as the merge of current Facebook production and what is going up on Tuesday
  • Facebook does bug filing via email as well. If you send an email to a pre-defined address, it will be tracked by the bug tracker
  • There is a Facebook Bugs group of which every employee is a member (similar to ours)
  • Facebook.com is not a sandbox, strictly no testing on production
  • Only 3 core release engineers
  • 60 engineers on standby when a release takes place
  • 3 phases of pushing

    • A1: http://inyour.facebook.com aka staging, last chance to check what you are pushing. Looks good, moving on to A2
    • A2: Small % of machines, a couple of thousand machines
    • A3: All
  • Every developer develops against either a mock datasource or the production database. Yes, you heard me right: production DB. (And they have an army of DB admins to take care of that)
  • Staging a.k.a. inyour.facebook.com and latest.facebook.com point to production database
  • Facebook has on-call engineers from different teams covering different aspects of the website, like the Like button, Feeds, Search, etc.
  • Automated infrastructure

    • IRC bots

      • About 300 to 500 people and the 3 release engineers communicate via IRC
      • The release engineers don’t talk to people; rather, they ask people to talk to bots

        • Are we pushing today?
        • Can I get my revision merged?
        • When is my revision going out?
      • All of the questions above can be answered by bots

        • /msg request_bot rt <revision #>
    • Test automation

      • Test engineering team
      • Test console

        • Unit Test
        • Watir Test
    • Shadow branch

      • Shadows the production branch plus the changes that are requested
    • Error tracking

      • Per PHP error
      • Stack trace
      • SVN Blame
      • One-button filing of a bug against the author and the person who reviewed the code
      • How often the error occurs (like Google Analytics but for errors)
    • Gatekeeper

      • Console to change who can see the feature
      • Conditional statements within the code itself
      • Able to filter via Employees, US East Coast, US West Coast, Datacenter, Age, IP White/Black List. Able to bump to show only to 1% public then slowly increasing
    • Push Karma

      • General request page
      • How many files and LOC are being changed, risk ratings
      • Summaries of discussions and calls for changes are all pulled in
      • Magic Push Karma

        • Everyone starts with 4 stars
        • From there you go down
        • Private shaming
    • Perflab

      • Every SVN checkin is being plotted to a graph

        • Difference between production trunk and production+release trunk
        • Eg: A fix is made in trunk but did not get into the release
    • HipHop

      • Everyone hates PHP, it is crappy and slow
      • They compiled PHP
      • Transformation Process

        • Parser -> Static Analyzer -> Pre-Optimizer -> Type Inference Engine -> Post Optimizer -> Code Generator -> G++
        • Facebook.com is a 1 GB binary: static files, web resources, everything
        • Takes 8 to 10 minutes to build
        • The site runs faster because the PHP is compiled
    • BitTorrent

      • The 1GB binary file is pushed to thousands and thousands of servers using BitTorrent
      • Push 1GB of binary to all servers within 15 minutes
  • Tools alone won’t save you, you need the right company, right people and right culture
  • From Q&A

    • Robust backup and recovery system just in case someone did something stupid
    • Code must be backward compatible, since there might be a chance that a few hundred machines do not have the new code yet
    • While pushing to A3, automated tests are still being run on A1 and A2 to detect potential problems
    • You write your test for your code
  • phabricator – Phabricator is the Open Source release of Facebook’s internal tools for code review, repository browsing and change management. It contains two major applications: Differential, a code review tool, and Diffusion, a repository browser
  • HipHop – HipHop for PHP transforms PHP source code into highly optimized C++. It was developed by Facebook and was released as open source in early 2010
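
To make the Gatekeeper notes above a bit more concrete, here is a minimal Python sketch of how a conditional check with a gradually increasing public percentage could work. This is not Facebook's implementation or API: the Gate class, its fields, and the feature name are all made up for illustration, and only a few of the filters from the talk (employees, region, IP blacklist, percentage) are modelled.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Gate:
    """Hypothetical gate describing who may see a feature (illustration only)."""
    employees_only: bool = False
    regions: set = field(default_factory=set)       # empty set means any region
    ip_blacklist: set = field(default_factory=set)
    public_percent: float = 0.0                     # share of non-employees, 0 to 100

    def _bucket(self, feature: str, user_id: int) -> float:
        # Stable hash: the same user always lands in the same bucket, so
        # raising public_percent only ever adds users, never removes them.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        return int(digest[:8], 16) % 10000 / 100.0  # value in 0.00 .. 99.99

    def allows(self, feature, user_id, *, is_employee=False, region="", ip=""):
        if ip in self.ip_blacklist:
            return False
        if is_employee:
            return True                             # employees always see it
        if self.employees_only:
            return False
        if self.regions and region not in self.regions:
            return False
        return self._bucket(feature, user_id) < self.public_percent


# Usage: the conditional statement that would sit inside the product code.
gate = Gate(regions={"us-east"}, public_percent=1.0)  # start at 1% of US East
if gate.allows("new_feed", user_id=42, region="us-east"):
    pass  # render the new feature
else:
    pass  # render the existing behaviour
```

Hashing the feature name together with the user id is one common way to get a stable bucket per user; bumping public_percent from 1 to 5 to 50 then widens the audience without anyone who already had the feature losing it.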

Video: Push: Tech Talk – May 26, 2011 [HQ]
Facebook: Facebook Engineering