How does Twitter/X work, technically?
Twitter/X is one of the world's most popular social media platforms.
Twitter/X has around 300 million daily active users (DAU).
But why are so many people on this platform?
- Unlike most social media platforms, which act primarily as a source of entertainment, X (Twitter) is also used as a source of news. People also use Twitter to research brands, and entertainment is only the third most common reason people use the platform.
- According to a study conducted by DataReportal, 60.6% of respondents stated that they use the platform to search for news, 34.9% agreed that they use it to research or follow a brand, 34.8% look for funny and entertaining content, 27% share photos and videos, and only 19.2% of people use it to message their friends and family.
- 1M+ jobs have also been posted on X, which shows that a lot of people are looking for jobs, and for hires, on the platform.
- The top four conversation topics on the platform are sports, music, food, and gaming, meaning that Twitter is also a go-to choice for people looking for entertainment content.
So, if so many things are happening on Twitter, it would be interesting to understand how it works under the hood.
How it works, technically
We will perform a tech teardown of Twitter in the following steps:
Step 1 → Define the product
Step 2 → Decode the tech layer
Step 3 → Detail out nuances and trade-offs
Step 4 → Third-party integrations (optional, do this only if you have active integrations)
Step 1 → Define the product
Decode the elements of the product. Focus on the functions you want the product to perform and the value it delivers to the users.
- What are the different elements that your product needs? Like homepage, login page, blogs etc. Include what is needed today, as well as what the product will need as you scale.
- How is value delivered to the user? For example, using video/images/catalogue over a web app/website/mobile device
- What data is captured and which module it is captured in
- Also think of the capabilities your product has / needs to have. For example, in case of Netflix, for downloads what additional modules would you need to build
Product Features:
- User — Login Page
- Tweet
  - Post a tweet
  - Like and retweet a tweet
- Feed
  - View own feed
  - View other people's feed
- Following other users
- Push Notifications
Things that are not considered
- DM
- Search
- Personalized feed
- Report and moderate
- Tweet with a media file
Step 2 → Decode the tech layer
- For each product component in Step 1, lay out the associated tech module and the technology used within your company
- Draw out how these tech components are speaking to each other. How is the data flowing, where is the logic getting implemented?
- If possible, speak to your engineering team and try to understand why they chose a specific language, framework, or tool for a specific component of the product
- How did your engineers optimise for scale and security?
What we need to store in various DBs for Twitter:
- User DB needs to store basic information about the user such as name, email, password, etc.
- Follower DB needs to store the user and follower ids.
- Tweets DB stores tweets, users who created the tweet, and users who liked and retweeted a tweet.
- Feeds DB needs to store all the tweets that will appear in particular user feeds for all users.
We have to store ids such as user_id, tweet_id, etc. in each table.
We do this because it creates a normalized data structure, which makes queries faster. If we stored everything in one table, queries would be much slower, since every query would have to scan the entire dataset. The bigger the dataset to be scanned, the slower the query response.
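Below is a minimal sketch of what these normalized tables could look like, using hypothetical Python dataclasses; field names such as user_id and tweet_id are illustrative, not Twitter's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class User:                      # User DB
    user_id: int
    name: str
    email: str
    password_hash: str           # store a hash, never the raw password

@dataclass
class Follow:                    # Follower DB: one row per follow edge
    follower_id: int             # the user who follows
    followee_id: int             # the user being followed

@dataclass
class Tweet:                     # Tweets DB
    tweet_id: int
    author_id: int               # references User.user_id
    text: str
    created_at: datetime
    liked_by: List[int] = field(default_factory=list)      # user_ids
    retweeted_by: List[int] = field(default_factory=list)  # user_ids

@dataclass
class FeedEntry:                 # Feeds DB: one row per (user, tweet) pair
    user_id: int                 # whose feed this row belongs to
    tweet_id: int                # which tweet appears in it
    inserted_at: datetime
```

Because each table only holds IDs plus its own attributes, a query such as "all tweets liked by user 42" only touches the Tweets DB instead of scanning one giant denormalized table.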
We can choose a graph-based database to store the follower information and the feed.
From a graph database, it is easier to retrieve all of a user's followers and the tweets for their feed.
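As a rough illustration of why a graph model helps, the follower relationship is just a set of edges, and keeping an adjacency list per user makes "who follows X" and "whom does X follow" single lookups. The in-memory version below is only a toy stand-in for what a real graph database gives you (plus persistence, indexing, and traversal queries).

```python
from collections import defaultdict

# Hypothetical in-memory follower graph: each follow is an edge follower -> followee.
followers_of = defaultdict(set)   # followee_id -> {follower_ids}
following_of = defaultdict(set)   # follower_id -> {followee_ids}

def follow(follower_id: int, followee_id: int) -> None:
    followers_of[followee_id].add(follower_id)
    following_of[follower_id].add(followee_id)

def get_followers(user_id: int) -> set:
    return followers_of[user_id]      # one lookup, no joins

def get_following(user_id: int) -> set:
    return following_of[user_id]

follow(follower_id=2, followee_id=1)
follow(follower_id=3, followee_id=1)
print(get_followers(1))               # {2, 3}
```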
We can use either MySQL or a NoSQL database for the Tweets and User DBs.
The Feeds DB is a big table with heavy writes and updates, because it is continuously updated as new tweets come in. It will also have heavy reads, since users access and scroll their feed multiple times a day. We will have to put in extra effort here and choose a NoSQL database like Cassandra, which can handle heavy reads and writes. We also want to keep the feeds available at all times.
It need not be highly consistent, and we don't need joins for the Feeds table. So we can choose Cassandra for the Feeds DB.
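Here is a sketch of how the Feeds table could be modelled in Cassandra, assuming the Python cassandra-driver and an illustrative keyspace and table name. Partitioning by user_id keeps one user's entire feed in one partition, and clustering by time keeps the newest tweets first, so reading a feed is a single-partition, join-free query.

```python
from cassandra.cluster import Cluster   # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])         # assumed local Cassandra node
session = cluster.connect("twitter")     # hypothetical keyspace

# One partition per user; rows ordered newest-first within the partition.
session.execute("""
    CREATE TABLE IF NOT EXISTS feeds (
        user_id     bigint,
        created_at  timestamp,
        tweet_id    bigint,
        author_id   bigint,
        PRIMARY KEY ((user_id), created_at, tweet_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, tweet_id DESC)
""")

# Reading a user's feed touches only that user's partition.
rows = session.execute(
    "SELECT tweet_id, author_id FROM feeds WHERE user_id = %s LIMIT 50",
    (12345,),
)
```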
Some of the important client-facing APIs would be:
- readFeeds API to fetch feed for a particular user. It can send the user ID and the return value would be the feed for that particular user.
- createTweet to submit tweets from the users
- followUser to create a user-follower mapping whenever someone follows the user
There will be more APIs needed for comments, likes, retweets, etc.
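A minimal sketch of those three APIs as plain Python handlers over hypothetical in-memory stores; a real implementation would sit behind an HTTP layer and talk to the databases described above, but the flow (especially fanning a new tweet out to followers' feeds) is the same idea.

```python
from datetime import datetime, timezone
from itertools import count

# Hypothetical in-memory stand-ins for the Tweets, Follower and Feeds DBs.
tweets = {}        # tweet_id -> {"author_id", "text", "created_at"}
followers = {}     # followee_id -> set of follower_ids
feeds = {}         # user_id -> list of tweet_ids, newest first
_next_tweet_id = count(1)

def follow_user(follower_id: int, followee_id: int) -> None:
    """followUser: record the user-follower mapping."""
    followers.setdefault(followee_id, set()).add(follower_id)

def create_tweet(author_id: int, text: str) -> int:
    """createTweet: store the tweet and fan it out to every follower's feed."""
    tweet_id = next(_next_tweet_id)
    tweets[tweet_id] = {"author_id": author_id, "text": text,
                        "created_at": datetime.now(timezone.utc)}
    for follower_id in followers.get(author_id, set()):
        feeds.setdefault(follower_id, []).insert(0, tweet_id)
    return tweet_id

def read_feeds(user_id: int, limit: int = 50) -> list:
    """readFeeds: return the most recent tweets in this user's feed."""
    return [tweets[t] for t in feeds.get(user_id, [])[:limit]]

# Usage
follow_user(follower_id=2, followee_id=1)
create_tweet(author_id=1, text="hello world")
print(read_feeds(user_id=2))    # [{'author_id': 1, 'text': 'hello world', ...}]
```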
Step 3 → Detail out nuances and trade-offs
Figure out the trade-offs behind product decisions
- Are there any user experience tradeoffs your product has for better security or scalability?
- Are there any components where you went with a third party vs. in-house?
- How did your engineers decide what to cache and what not to?
- If you are using GraphQL, why? If you are using Rest, why?
- What are some limitations of your current tech stack? What are some unique benefits that it offers to your business?
I can think of a few non-functional requirements (NFRs) for the product:
- Scale up to 300+ million DAU
- Low latency for loading tweets in the feed
- Any tweets created by users should not get lost
- The system should be available all the time (low downtime)
To achieve the above NFRs, we can do the following:
- Scale via a distributed system: To accommodate a larger user base, a distributed system with multiple partitions is needed. Twitter can partition data by user or by tweet, with an average partition size of 20–40 GB. Partitioning by user is simple, but the partitions of very popular users can become hotspots and potentially crash servers. To address this, data can be partitioned by TweetID instead, which solves the hotspot problem but makes queries more complex, since a user's timeline has to be assembled from many partitions. Another option is to create separate shards for verified accounts, keeping a lower number of them on each shard, and adding special provisions to prevent downtime. (A hash-based sharding sketch follows this list.)
- Load balancers for distributing requests: To distribute traffic, we need to add a load balancer that handles all incoming read/write requests. The load balancer can sit at multiple places. We can place the load balancers between the client and the service, and the service and the database. The load balancers can also sit between various services and the messaging service.
- Caching to improve feed latency: We can introduce caching in the Feed service to improve load time, i.e. latency. We can use a caching solution like Redis or Memcached to store the feed for each user. As cache memory is expensive, we don't want to cache all users' timelines. The eviction policy for the cache can be LRU (least recently used), which ensures that we evict the feeds of inactive users (see the cache sketch after this list). We can also do client-side caching to improve latency: as the user scrolls through the feed and loads more tweets, we can preemptively fetch the next set of feed data and add it to the browser cache, reducing latency and improving the user experience. Caching can be applied in other places, like search, to improve latency as well.
- Redundancy and replication to improve robustness and availability: Replication and redundancy are crucial in Twitter's system to prevent data loss and ensure availability. Replicas can be created for all services, so that a replica can take over from a failed server (even at reduced capacity) and prevent a full outage. This can be done at the load balancers, the app servers/services, and the databases. A master-slave configuration can be used to create redundancy and replication in DB clusters. Twitter's scale necessitates a cluster of databases for all four use cases (feeds, users, tweets, and followers), each with a master-slave configuration for replication and redundancy. Caching can also be used for other data, such as profile images and basic user information, to avoid fetching them every time a user opens a profile page.
- Disaster recovery to increase robustness and availability: Natural or man-made disasters can happen in certain geographies where the data center resides. A data center is a physical space (like a secured warehouse) that hosts servers and other networking components. Disasters at a data center can lead to system failure. To avoid such a situation, we can replicate the data residing in one data center to another one in a different geography. That way, even if one data center goes down due to some disasters in that geography, the operation can be moved or taken over by another data center in another geography.
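Two of the ideas above, hashing an ID to pick a shard and evicting cached feeds with an LRU policy, are small enough to sketch. This is an illustrative Python sketch, not Twitter's implementation; the shard count and cache capacity are made up.

```python
from collections import OrderedDict
import hashlib

NUM_SHARDS = 16   # made-up shard count for illustration

def shard_for(tweet_id: int) -> int:
    """Partition by TweetID: hashing spreads tweets evenly across shards,
    so a celebrity's tweets do not all land on one hot partition."""
    digest = hashlib.md5(str(tweet_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

class FeedCache:
    """Tiny LRU cache for user feeds; inactive users are evicted first."""
    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self._cache = OrderedDict()           # user_id -> list of tweet_ids

    def get(self, user_id):
        if user_id not in self._cache:
            return None                       # cache miss: fall back to the Feeds DB
        self._cache.move_to_end(user_id)      # mark as recently used
        return self._cache[user_id]

    def put(self, user_id, feed):
        self._cache[user_id] = feed
        self._cache.move_to_end(user_id)
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)   # evict the least recently used user

# Usage
print(shard_for(987654321))                   # which shard stores this tweet
cache = FeedCache(capacity=2)
cache.put(1, [101, 102]); cache.put(2, [103]); cache.put(3, [104])
print(cache.get(1))                           # None: user 1 was the least recently used
```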