system-design-resources: The best English-language system design resources


These are the best system design resources on the Internet.

Video Processing
Transcoding videos at scale: https://www.egnyte.com/blog/2018/12/transcoding-how-we-serve-videos-at-scale/

Facebook live video broadcasting: https://engineering.fb.com/ios/under-the-hood-broadcasting-live-video-to-millions/

Netflix high-quality video encoding at scale: https://netflixtechblog.com/high-quality-video-encoding-at-scale-d159db052746

Netflix shot-based encoding: https://netflixtechblog.com/optimized-shot-based-encodes-now-streaming-4b9464204830

Cluster and Workflow Management
Facebook cluster management (Twine): https://engineering.fb.com/data-center-engineering/twine/

Google Autopilot - autoscaling: https://dl.acm.org/doi/pdf/10.1145/3342195.3387524

Netflix workflow orchestration (Conductor): https://netflix.github.io/conductor/

Open-source workflow management (Luigi): https://github.com/spotify/luigi

Meta hardware management: https://engineering.fb.com/2020/12/09/data-center-engineering/how-facebook-keeps-its-large-scale-infrastructure-hardware-up-and-running/

Intra-Service Messaging
What is a message queue: https://www.cloudamqp.com/blog/what-is-message-queuing.html

Airbnb idempotency: https://medium.com/airbnb-engineering/avoiding-double-payments-in-a-distributed-payments-system-2981f6b070bb

Nginx service mesh: https://www.nginx.com/learn/service-mesh/
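The Airbnb idempotency article listed above is about avoiding double payments when a client retries a request. A minimal sketch of the core idea, assuming an in-memory dict in place of a durable store: the server remembers the outcome of each idempotency key, so a retry returns the original result instead of charging again. `PaymentService`, `charge`, and the `pay-N` ids are hypothetical names for illustration, not Airbnb's API.

```python
from typing import Dict, Tuple


class PaymentService:
    """Hypothetical payment service that deduplicates requests by idempotency key.

    `_results` is an in-memory stand-in for a durable store; a real service
    would record the key and the charge atomically (e.g. in one DB transaction).
    """

    def __init__(self, balances: Dict[str, int]) -> None:
        self.balances = balances
        self._results: Dict[str, Tuple[str, int, str]] = {}
        self._next_id = 0

    def charge(self, idempotency_key: str, account: str, amount: int) -> str:
        previous = self._results.get(idempotency_key)
        if previous is not None:
            prev_account, prev_amount, payment_id = previous
            # Reusing a key for a different request is a client bug, not a retry.
            if (prev_account, prev_amount) != (account, amount):
                raise ValueError("idempotency key reused with a different request")
            return payment_id  # retry: return the original result, do not charge again
        self.balances[account] -= amount
        self._next_id += 1
        payment_id = f"pay-{self._next_id}"
        self._results[idempotency_key] = (account, amount, payment_id)
        return payment_id
```

For example, calling `charge("k1", "alice", 30)` twice debits the account once and returns the same payment id both times; the client generates the key once per logical payment and reuses it on every retry.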

Message Queue Anti-patterns
Database as a queue anti-pattern: http://blog.codepath.com/2012/11/15/asynchronous-processing-in-web-applications-part-1-a-database-is-not-a-queue/

Using a database as a message queue: https://softwareengineering.stackexchange.com/questions/231410/why-database-as-queue-so-bad

The database-as-queue anti-pattern: http://mikehadlow.blogspot.com/2012/04/database-as-queue-anti-pattern.html

Drawbacks of using a database as a queue: https://www.cloudamqp.com/blog/why-is-a-database-not-the-right-tool-for-a-queue-based-system.html

Service Mesh
Kubernetes service mesh: https://akomljen.com/kubernetes-service-mesh/

Kubernetes sidecars: https://www.weave.works/blog/introduction-to-service-meshes-on-kubernetes-and-progressive-delivery

Nginx service mesh: https://www.nginx.com/learn/service-mesh/

Practical System Design
Facebook Messenger optimization: https://spectrum.ieee.org/how-facebooks-software-engineers-prepare-messenger-for-new-years-eve

YouTube architecture: http://highscalability.com/youtube-architecture

YouTube scalability (2012): https://www.youtube.com/watch?v=w5WVu624fY8

Distributed design patterns: http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html

Monolith to microservices: https://martinfowler.com/articles/break-monolith-into-microservices.html

Distributed File Systems
Open-source distributed file system (Ceph): https://docs.ceph.com/en/latest/architecture/

Amazon S3 performance tips and tricks: https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/

Amazon S3 object expiration: https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/

Time Series Databases
Pinterest time series database (Goku): https://medium.com/pinterest-engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181

Uber time series database (AresDB): https://eng.uber.com/aresdb/

Time series data in a relational database: https://blog.timescale.com/blog/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c/

Facebook Gorilla time series database: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf

Rate Limiting
Circuit breaker pattern: https://martinfowler.com/bliki/CircuitBreaker.html

Uber rate limiter: https://github.com/uber-go/ratelimit/blob/master/ratelimit.go
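As a small illustration of rate limiting, here is a sketch of a token bucket. Note that this is a swapped-in, related technique: uber-go/ratelimit (linked above) implements a leaky-bucket style limiter that makes callers wait, whereas this bucket simply answers allow/deny. The injectable `clock` parameter is an assumption added so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock or time.monotonic
        self.tokens = capacity          # start full, so an initial burst is allowed
        self.last = self.clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token for this request
            return True
        return False                    # bucket empty: reject (or queue) the request
```

With `rate=2, capacity=2`, two requests at time 0 pass, a third is rejected, and half a second later one more token has accumulated. `capacity` bounds the burst size while `rate` bounds the long-run throughput.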

Network Protocols
Head-of-line blocking in HTTP/1 and HTTP/2: https://engineering.cred.club/head-of-line-hol-blocking-in-http-1-and-http-2-50b24e9e3372

QUIC protocol: https://www.akamai.com/blog/performance/http3-and-quic-past-present-and-future

Subscription Management System
Subscription manager (Netflix membership SKUs): https://netflixtechblog.com/building-a-rule-based-platform-to-manage-netflix-membership-skus-at-scale-e3c0f82aa7bc

Google Docs
Operational transformation: http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation

Google Docs: https://www.youtube.com/watch?v=uOFzWZrsPV0&list=PLXDe3d8o9VFtydBV5biyz9iS3WqKsBMD5&index=3
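To make operational transformation concrete, here is a minimal sketch covering only concurrent inserts (real OT systems, including the one behind Google Docs, also handle deletes and more complex operations). Each site applies its own insert first, then the other site's insert after transforming it, and both end with the same text. The `site` field is a hypothetical client id used only to break ties when two inserts land at the same position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where `text` is inserted
    text: str
    site: int   # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    # If the other insert landed before ours (or at the same spot with a lower
    # site id), our target position has shifted right by its length.
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op
```

For example, starting from `"abc"`, site 1 inserts `"X"` at 1 and site 2 concurrently inserts `"Y"` at 1; after each applies its own op and the transformed remote op, both documents read `"aXYbc"`.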

API Design
API design (Airbnb): https://medium.com/airbnb-engineering/building-services-at-airbnb-part-1-c4c1d8fa811b

Swagger API: https://swagger.io/docs/specification/about/

NoSQL Database Internals
Cassandra architecture: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archIntro.html

Google Bigtable architecture: https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

Amazon Dynamo internals: https://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

Design patterns in Amazon DynamoDB: https://www.youtube.com/watch?v=HaEPXoXVf2k

Amazon DynamoDB internals: https://www.youtube.com/watch?v=yvBR71D0nAQ

NoSQL Database Algorithms
HyperLogLog algorithm: https://odino.org/my-favorite-data-structure-hyperloglog/

Log-structured merge tree: https://www.cs.umb.edu/~poneil/lsmtree.pdf

Sorted string tables and compaction strategies: https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies

Leveled compaction in Cassandra: https://www.datastax.com/blog/leveled-compaction-apache-cassandra

Indexing in Cassandra: https://www.bmc.com/blogs/cassandra-clustering-columns-partition-composite-key/

Database Replication
Database replication (MySQL): https://dev.mysql.com/doc/refman/8.0/en/replication.html

Netflix data replication - change data capture (DBLog): https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b

LinkedIn log use cases: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

Containers and Docker
Facebook Twine containerization: https://engineering.fb.com/developer-tools/zookeeper-twine/

Cloudflare containerization: https://blog.cloudflare.com/cloud-computing-without-containers/

Docker architecture: https://docs.docker.com/get-started/overview/docker-architecture

Capacity Estimation
Google capacity estimation: https://www.youtube.com/watch?v=modXC5IWTJI

YouTube scalability (2012): https://www.youtube.com/watch?v=G-lGCC4KKok

AWS back-of-the-envelope calculations: https://www.youtube.com/watch?v=-3qetLv2Yp0

Capacity estimation: http://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
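A back-of-the-envelope estimate usually reduces to a few multiplications. The sketch below uses entirely hypothetical inputs (100 million daily active users, 2 uploads per user per day, 500 KB per upload) and an assumed peak-to-average factor of 2; it is meant to show the arithmetic, not realistic numbers for any real service.

```python
from __future__ import annotations

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def estimate(daily_active_users: int, writes_per_user_per_day: float,
             bytes_per_write: int, peak_factor: float = 2.0) -> dict[str, float]:
    """Back-of-the-envelope write load and storage growth for a hypothetical service."""
    writes_per_day = daily_active_users * writes_per_user_per_day
    avg_qps = writes_per_day / SECONDS_PER_DAY
    return {
        "avg_write_qps": avg_qps,
        # Traffic is not uniform over the day; peak_factor is an assumed multiplier.
        "peak_write_qps": avg_qps * peak_factor,
        "storage_tb_per_day": writes_per_day * bytes_per_write / 1e12,  # decimal TB
    }
```

With those inputs, `estimate(100_000_000, 2, 500_000)` works out to 200 million writes per day, roughly 2,315 writes/s on average (about 4,630 at the assumed peak), and 100 TB of new data per day.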

Publisher Subscriber
Oracle publisher-subscriber: https://docs.oracle.com/cd/B10501_01/appdev.920/a96590/adg15pub.htm

Amazon pub/sub messaging: https://aws.amazon.com/pub-sub-messaging/

Asynchronous processing: http://blog.codepath.com/2013/01/06/asynchronous-processing-in-web-applications-part-2-developers-need-to-understand-message-queues/

Asynchronous request-response: https://www.enterpriseintegrationpatterns.com/patterns/conversation/RequestResponse.html
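The publisher/subscriber pattern decouples senders from receivers: a publisher only names a topic, and the broker fans each message out to whoever subscribed. The in-memory `Broker` below is a hypothetical, synchronous sketch of that contract; it has none of the persistence, acknowledgement, or retry behaviour that real brokers such as the services linked above provide.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class Broker:
    """In-memory publish/subscribe broker (illustrative only: no persistence or retries)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # Fan out to every subscriber of the topic; the publisher never learns who they are.
        handlers = self._subs.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that received the message
```

Publishing to a topic with no subscribers simply delivers to nobody; in a real system, whether such messages are dropped or retained is a key design choice.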

Event-Driven Architecture
Martin Fowler - event-driven architecture: https://www.youtube.com/watch?v=STKCRSUsyP0

Event-driven architecture: https://martinfowler.com/articles/201701-event-driven.html

Hexagonal Architecture
Hexagonal architecture (Netflix): https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749

Microservices
Should you always start with a monolith: https://buttercms.com/books/microservices-for-startups/should-you-always-start-with-a-monolith/

Monolith vs. microservices: https://articles.microservices.com/monolithic-vs-microservices-architecture-5c4848858f59

Microservices: http://highscalability.com/blog/2018/4/5/do-you-have-too-many-microservices-five-design-attributes-th.html

Uber nanoservices anti-pattern: https://www.youtube.com/watch?v=kb-m2fasdDY

Uber domain-oriented microservices: https://eng.uber.com/microservice-architecture/

Load Balancing
Load balancers with sticky sessions: https://stackoverflow.com/questions/10494431/sticky-and-non-sticky-sessions

Citrix: what is load balancing: https://www.citrix.com/en-in/solutions/app-delivery-and-security/load-balancing/what-is-load-balancing.html

Nginx load balancing: https://www.nginx.com/resources/glossary/load-balancing/

Consistent hashing: https://michaelnielsen.org/blog/consistent-hashing/
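The key property of consistent hashing is that adding or removing a node only moves the keys that node owned. A minimal sketch of a hash ring with virtual nodes follows; the choice of MD5, 100 virtual nodes per server, and the `node#i` labelling scheme are assumptions for illustration, not taken from the linked article.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    # MD5 is used here only for its even spread over the ring, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with `vnodes` virtual nodes per physical node."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._hashes: List[int] = []       # sorted positions on the ring
        self._owners: Dict[int, str] = {}  # ring position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            if h not in self._owners:
                bisect.insort(self._hashes, h)
                self._owners[h] = node

    def remove(self, node: str) -> None:
        # Only the removed node's positions disappear; every other key keeps its owner.
        self._hashes = [h for h in self._hashes if self._owners[h] != node]
        self._owners = {h: n for h, n in self._owners.items() if n != node}

    def get(self, key: str) -> str:
        if not self._hashes:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[self._hashes[idx]]
```

Virtual nodes spread each server over many small arcs of the ring, which evens out the load and means a removed server's keys are redistributed across all remaining servers rather than dumped on one neighbour.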

Alerts and Anomaly Detection
Outlier detection: https://towardsdatascience.com/outlier-detection-with-isolation-forest-3d190448d45e

Anomaly detection: https://towardsdatascience.com/machine-learning-for-anomaly-detection-and-condition-monitoring-d4614e7de770

Uber real-time monitoring and root cause analysis (Argos): https://eng.uber.com/argos-real-time-alerts/

Microsoft anomaly detection: https://www.youtube.com/watch?v=12Xq9OLdQwQ&t=0s

Facebook data engineering: https://engineering.fb.com/2016/05/09/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/

LinkedIn real-time alerts: https://engineering.linkedin.com/blog/2019/06/smart-alerts-in-thirdeye--linkedins-real-time-monitoring-platfor

LinkedIn isolation forest: https://engineering.linkedin.com/blog/2019/isolation-forest

Distributed Logging
Uber distributed request tracing: https://eng.uber.com/distributed-tracing/

Pinterest logging (Singer): https://medium.com/@Pinterest_Engineering/open-sourcing-singer-pinterests-performant-and-reliable-logging-agent-610fecf35566

Google monitoring infrastructure: https://www.facebook.com/atscaleevents/videos/959344524420015/

Metrics and Text Search Engines
Facebook real-time text search engine: https://www.facebook.com/watch/?v=432864835468

Elasticsearch time-based queries: https://www.elastic.co/guide/en/elasticsearch/guide/current/time-based.html

Elasticsearch aggregations: https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations.html

Single Point of Failure
Avoiding single points of failure: https://medium.com/the-cloud-architect/patterns-for-resilient-architecture-part-3-16e8601c488e

Netflix multi-region availability: https://netflixtechblog.com/active-active-for-multi-regional-resiliency-c47719f6685b

Oracle single point of failure: https://docs.oracle.com/cd/E19693-01/819-0992/fjdch/index.html

DNS single point of failure (2004): http://www.tenereillo.com/GSLBPageOfShame.htm

Sharding: https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6

Location-Based Services
Google S2 library: https://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/

Real-Time Processing
LinkedIn Brooklin - real-time data streaming: https://engineering.linkedin.com/blog/2019/brooklin-open-source

Netflix real-time stream processing (Keystone): https://netflixtechblog.com/keystone-real-time-stream-processing-platform-a3ee651812a

Caching
Google Guava cache: https://github.com/google/guava/wiki/CachesExplained

Caffeine cache (see the README): https://github.com/ben-manes/caffeine/

Design of a modern cache: http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html

Microsoft caching guidance: https://docs.microsoft.com/en-us/previous-versions/msp-np/dn589802(v%3dpandp.10)

Caching patterns: https://hazelcast.com/blog/a-hitchhikers-guide-to-caching-patterns/

Distributed Consensus
Paxos: http://ifeanyi.co/posts/understanding-consensus/

Raft: https://raft.github.io/