system-design-resources: The Best System Design Resources


This GitHub repository lists some of the best system design resources (in English). Each title is followed by its link:

Video Processing
Transcoding Videos at Scale: https://www.egnyte.com/blog/2018/12/transcoding-how-we-serve-videos-at-scale/
Facebook Video Broadcasting: https://engineering.fb.com/ios/under-the-hood-broadcasting-live-video-to-millions/
Netflix Video Encoding at Scale: https://netflixtechblog.com/high-quality-video-encoding-at-scale-d159db052746
Netflix Shot-Based Encoding: https://netflixtechblog.com/optimized-shot-based-encodes-now-streaming-4b9464204830

Cluster and Workflow Management
Facebook Cluster Management: https://engineering.fb.com/data-center-engineering/twine/
Google Autopilot - Autoscaling: https://dl.acm.org/doi/pdf/10.1145/3342195.3387524
Netflix Workflow Orchestration: https://netflix.github.io/conductor/
Open Source Workflow Management: https://github.com/spotify/luigi
Meta Hardware Management: https://engineering.fb.com/2020/12/09/data-center-engineering/how-facebook-keeps-its-large-scale-infrastructure-hardware-up-and-running/
Meta Capacity Assignment: https://engineering.fb.com/2022/09/06/data-center-engineering/viewing-the-world-as-a-computer-global-capacity-management/

Intra-Service Messaging
What Is a Message Queue: https://www.cloudamqp.com/blog/what-is-message-queuing.html
AirBnb Idempotency: https://medium.com/airbnb-engineering/avoiding-double-payments-in-a-distributed-payments-system-2981f6b070bb
Nginx Service Mesh: https://www.nginx.com/learn/service-mesh/

Message Queue Anti-Patterns
Database as a Queue Anti-Pattern: http://blog.codepath.com/2012/11/15/asynchronous-processing-in-web-applications-part-1-a-database-is-not-a-queue/
Using a Database as a Message Queue: https://softwareengineering.stackexchange.com/questions/231410/why-database-as-queue-so-bad
Anti-Pattern of DB as a Queue: http://mikehadlow.blogspot.com/2012/04/database-as-queue-anti-pattern.html
Drawbacks of DB as a Queue: https://www.cloudamqp.com/blog/why-is-a-database-not-the-right-tool-for-a-queue-based-system.html

Service Mesh
Kubernetes Service Mesh: https://akomljen.com/kubernetes-service-mesh/
Kubernetes Sidecar: https://www.weave.works/blog/introduction-to-service-meshes-on-kubernetes-and-progressive-delivery
Service Mesh: https://www.weave.works/blog/introduction-to-service-meshes-on-kubernetes-and-progressive-delivery
NginX Service Mesh: https://www.nginx.com/learn/service-mesh/

Practical System Design
Facebook Messenger Optimizations: https://spectrum.ieee.org/how-facebooks-software-engineers-prepare-messenger-for-new-years-eve
YouTube Architecture: http://highscalability.com/youtube-architecture
YouTube Scalability 2012: https://www.youtube.com/watch?v=w5WVu624fY8
Distributed Design Patterns: http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html
Monolith to Microservices: https://martinfowler.com/articles/break-monolith-into-microservices.html
Zerodha Tech Stack: https://zerodha.tech/blog/hello-world/

Distributed File Systems
Open Source Distributed File System: https://docs.ceph.com/en/latest/architecture/
Amazon S3 Performance Hacks: https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/
Amazon S3 Object Expiration: https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/

Time Series Databases
Pinterest Time Series Database: https://medium.com/pinterest-engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181
Uber Time Series DB: https://eng.uber.com/aresdb/
Time Series Relational DB: https://blog.timescale.com/blog/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c/
Facebook Gorilla Time Series DB: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf

Rate Limiting
Circuit Breaker Algorithm: https://martinfowler.com/bliki/CircuitBreaker.html
Uber Rate Limiter: https://github.com/uber-go/ratelimit/blob/master/ratelimit.go

In-Memory Database - Redis
Redis Official Documentation: https://redis.com/
Learn Redis through Redis University: https://university.redis.com/
Redis Open Source Repository: https://github.com/redis/redis
Redis Architecture: https://medium.com/opstree-technology/redis-cluster-architecture-replication-sharding-and-failover-86871e783ac0

Network Protocols
What Is HTTP: https://engineering.cred.club/head-of-line-hol-blocking-in-http-1-and-http-2-50b24e9e3372
QUIC Protocol: https://www.akamai.com/blog/performance/http3-and-quic-past-present-and-future
TCP Protocol Algorithms: https://ee.lbl.gov/papers/congavoid.pdf (the first 10 pages are important)
WebRTC: https://webrtc.github.io/webrtc-org/blog/2012/07/23/a-great-introduction-to-webrtc.html
WebSockets: https://datatracker.ietf.org/doc/html/rfc6455#section-1.2
Dynamic Source Routing Using QUIC: https://fb.watch/fSEbI4KHlA/

Chess Engine Design
Chess Engine: https://www.youtube.com/watch?v=U4ogK0MIzqk

Subscription Management System
Subscription Manager: https://netflixtechblog.com/building-a-rule-based-platform-to-manage-netflix-membership-skus-at-scale-e3c0f82aa7bc

API Design
API Design: https://medium.com/airbnb-engineering/building-services-at-airbnb-part-1-c4c1d8fa811b
Swagger API: https://swagger.io/docs/specification/about/

NoSQL Database Internals
Cassandra Architecture: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archIntro.html
Google BigTable Architecture: https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
Amazon Dynamo Database Internals: https://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Design Patterns in Amazon Dynamo DB: https://www.youtube.com/watch?v=HaEPXoXVf2k
Internals of Amazon Dynamo DB: https://www.youtube.com/watch?v=yvBR71D0nAQ

NoSQL Database Algorithms
HyperLogLog Algorithm: https://odino.org/my-favorite-data-structure-hyperloglog/
Log-Structured Merge Tree: https://www.cs.umb.edu/~poneil/lsmtree.pdf
Sorted String Tables and Compaction Strategies: https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies
Leveled Compaction in Cassandra: https://www.datastax.com/blog/leveled-compaction-apache-cassandra
Scylla DB Compaction: https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies
Indexing in Cassandra: https://www.bmc.com/blogs/cassandra-clustering-columns-partition-composite-key/

Database Replication
Database Replication: https://dev.mysql.com/doc/refman/8.0/en/replication.html
Netflix Data Replication - Change Data Capture: https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b
LinkedIn Logging Use Cases: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

Containers and Docker
Facebook Twine Containerization: https://engineering.fb.com/developer-tools/zookeeper-twine/
CloudFlare Containerization: https://blog.cloudflare.com/cloud-computing-without-containers/
Docker Architecture: https://docs.docker.com/get-started/overview/docker-architecture

Capacity Estimation
Google Capacity Estimation: https://www.youtube.com/watch?v=modXC5IWTJI
Scalability at YouTube 2012: https://www.youtube.com/watch?v=G-lGCC4KKok
Back-of-the-Envelope Calculations at AWS: https://www.youtube.com/watch?v=-3qetLv2Yp0
Capacity Estimation: http://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf

Publisher Subscriber
Oracle Publisher Subscriber: https://docs.oracle.com/cd/B10501_01/appdev.920/a96590/adg15pub.htm
Amazon Pub/Sub Messaging: https://aws.amazon.com/pub-sub-messaging/
Asynchronous Processing: http://blog.codepath.com/2013/01/06/asynchronous-processing-in-web-applications-part-2-developers-need-to-understand-message-queues/
Async Request Response: https://www.enterpriseintegrationpatterns.com/patterns/conversation/RequestResponse.html

Event-Driven Architecture
Martin Fowler - Event-Driven Architecture: https://www.youtube.com/watch?v=STKCRSUsyP0
Event-Driven Architecture: https://martinfowler.com/articles/201701-event-driven.html

Hexagonal Architecture
Hexagonal Architecture: https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749

Microservices
Monolithic Architecture: https://buttercms.com/books/microservices-for-startups/should-you-always-start-with-a-monolith/
Monoliths vs Microservices: https://articles.microservices.com/monolithic-vs-microservices-architecture-5c4848858f59
Microservices: http://highscalability.com/blog/2018/4/5/do-you-have-too-many-microservices-five-design-attributes-th.html
Uber Nanoservices Anti-Pattern: https://www.youtube.com/watch?v=kb-m2fasdDY
Uber Domain-Oriented Microservices: https://eng.uber.com/microservice-architecture/

Load Balancing
Load Balancer with Sticky Sessions: https://stackoverflow.com/questions/10494431/sticky-and-non-sticky-sessions
Citrix: What Is Load Balancing: https://www.citrix.com/en-in/solutions/app-delivery-and-security/load-balancing/what-is-load-balancing.html
Nginx Load Balancing: https://www.nginx.com/resources/glossary/load-balancing/
Consistent Hashing: https://michaelnielsen.org/blog/consistent-hashing/

Alerts and Anomaly Detection
Outlier Detection: https://towardsdatascience.com/outlier-detection-with-isolation-forest-3d190448d45e
Anomaly Detection: https://towardsdatascience.com/machine-learning-for-anomaly-detection-and-condition-monitoring-d4614e7de770
Uber Real-Time Monitoring and Root Cause Analysis (Argos): https://eng.uber.com/argos-real-time-alerts/
Microsoft Anomaly Detection: https://www.youtube.com/watch?v=12Xq9OLdQwQ&t=0s
Facebook Data Engineering: https://engineering.fb.com/2016/05/09/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/
LinkedIn Real-Time Alerting: https://engineering.linkedin.com/blog/2019/06/smart-alerts-in-thirdeye--linkedins-real-time-monitoring-platfor
LinkedIn Isolation Forests: https://engineering.linkedin.com/blog/2019/isolation-forest

Distributed Logging
Uber Distributed Request Tracing: https://eng.uber.com/distributed-tracing/
Pinterest Logging: https://medium.com/@Pinterest_Engineering/open-sourcing-singer-pinterests-performant-and-reliable-logging-agent-610fecf35566
Google Monitoring Infrastructure: https://www.facebook.com/atscaleevents/videos/959344524420015/

Metrics and Text Search Engines
Facebook Real-Time Text Search Engine: https://www.facebook.com/watch/?v=432864835468
Elasticsearch Time-Based Querying: https://www.elastic.co/guide/en/elasticsearch/guide/current/time-based.html
Elasticsearch Aggregations: https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations.html

Single Point of Failure
Avoiding Single Points of Failure: https://medium.com/the-cloud-architect/patterns-for-resilient-architecture-part-3-16e8601c488e
Netflix Multi-Region Availability: https://netflixtechblog.com/active-active-for-multi-regional-resiliency-c47719f6685b
Oracle Single Points of Failure: https://docs.oracle.com/cd/E19693-01/819-0992/fjdch/index.html
DNS Single Point of Failure 2004: http://www.tenereillo.com/GSLBPageOfShame.htm
Sharding: https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6

Location-Based Services
Google S2 Library: https://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/

Batch Processing
MapReduce Architecture: https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf

Real-Time Stream Processing
LinkedIn Brooklin - Real-Time Data Streaming: https://engineering.linkedin.com/blog/2019/brooklin-open-source
Netflix Real-Time Stream Processing: https://netflixtechblog.com/keystone-real-time-stream-processing-platform-a3ee651812a
KSQLDB for Kafka: https://docs.ksqldb.io/en/latest/operate-and-deploy/how-it-works/

Caching
Google Guava Cache: https://github.com/google/guava/wiki/CachesExplained
Caching (see the README): https://github.com/ben-manes/caffeine/
Caching: http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html
Microsoft Caching Guide: https://docs.microsoft.com/en-us/previous-versions/msp-np/dn589802(v%3dpandp.10)
Caching Patterns: https://hazelcast.com/blog/a-hitchhikers-guide-to-caching-patterns/

Distributed Consensus
Paxos: http://ifeanyi.co/posts/understanding-consensus/
Raft: https://raft.github.io/

Authorization
Designing an Authorization Model for an Enterprise: https://cerbos.dev/blog/designing-an-authorization-model-for-an-enterprise