Speeding Up Web Performance with Varnish

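  Varnish is a popular caching HTTP reverse proxy. Nginx is an excellent reverse proxy too, but it is not primarily a cache, whereas Varnish is built for that job. Caching helps most when a website or web application serves a lot of static content; highly dynamic web applications may gain little from it.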

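  Varnish sits in front of any HTTP-compliant server and can be configured to cache responses selectively. Content served from its cache will almost certainly be faster than a request that goes all the way to the backend.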
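
  As a minimal sketch of what "sitting in front of" a server means in configuration terms, a VCL file names the backend that Varnish forwards cache misses to. The host and port below are assumptions for illustration (a web server listening locally on port 8080), not values from this article.

```vcl
# Hypothetical backend: the web server that Varnish fronts.
# Varnish itself would listen on the public port (for example 80)
# and forward uncached requests here.
backend default {
  .host = "127.0.0.1";
  .port = "8080";
}
```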

  By default Varnish tries to cache every request, subject to the following rules:

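  1. It does not cache requests that carry a Cookie or Authorization header.
  2. It does not cache a request when the backend response indicates that it should not be cached (for example, Cache-Control: no-cache).
  3. It only caches GET and HEAD requests.
  4. The default cache lifetime (TTL) is 120 seconds. This may need adjusting depending on the type of resource requested.
  5. It honors any Expires, Last-Modified, and Cache-Control headers it is given, unless they are overridden when the backend response is processed.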
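
  The rules above can be sketched in VCL. This is an approximation written in the Varnish 3 syntax used by the examples in this article, not a copy of the built-in logic; the /feeds/ path and the 30-second lifetime are made-up values that only show how the default TTL might be adjusted for one type of resource.

```vcl
sub vcl_recv {
  # Rule 3: only GET and HEAD requests are looked up in the cache.
  if (req.request != "GET" && req.request != "HEAD") {
    return (pass);
  }
  # Rule 1: requests carrying credentials or cookies bypass the cache.
  if (req.http.Authorization || req.http.Cookie) {
    return (pass);
  }
  return (lookup);
}

sub vcl_fetch {
  # Rules 4 and 5: override the lifetime for a hypothetical path,
  # regardless of the caching headers the backend sent.
  if (req.url ~ "^/feeds/") {
    set beresp.ttl = 30s;
  }
}
```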

  There are a few Varnish subroutines you need to understand:

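  • vcl_recv: called at the beginning of a request. Decides whether to serve the request, whether to modify it, and which backend to use.
  • vcl_hash: called to create a hash of the request data, which serves as the key that identifies the object in the cache.
  • vcl_pass: called when pass mode is entered. The current request is passed to the backend and the response is not cached.
  • vcl_hit: called when the requested object is found in the cache (a cache hit).
  • vcl_miss: called when the requested object is not found in the cache (a cache miss).
  • vcl_fetch: called after a resource has been retrieved from the backend. Decides whether and how to cache the backend response, and whether to modify the object before it is cached.
  • vcl_deliver: called before a cached object is delivered to the client.
  • vcl_pipe: called when pipe mode is entered. In pipe mode, requests on the current connection go straight to the backend and responses are returned as they are, uncached, until the connection is closed.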
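
  To make the hashing and hit/miss steps concrete, here is a sketch in Varnish 3 syntax of what these subroutines might contain. It keys the cache on the URL plus the Host header (or the server IP when no Host header is present); treat it as an illustration of the idea rather than a required configuration.

```vcl
sub vcl_hash {
  # Build the cache key from the URL...
  hash_data(req.url);
  # ...plus the virtual host, so two sites never share an object.
  if (req.http.host) {
    hash_data(req.http.host);
  } else {
    hash_data(server.ip);
  }
  return (hash);
}

sub vcl_hit {
  # The object is in the cache: hand it to vcl_deliver.
  return (deliver);
}

sub vcl_miss {
  # The object is not in the cache: fetch it from the backend.
  return (fetch);
}
```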

  Each subroutine finishes by returning an action:

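  • deliver: inserts the object into the cache. Control eventually reaches vcl_deliver.
  • error: returns an error code to the client.
  • fetch: retrieves the requested object from the backend. Control passes to vcl_fetch and eventually reaches vcl_deliver.
  • hit_for_pass: creates a hit-for-pass object, which caches the fact that this object should be passed. Control eventually reaches vcl_deliver.
  • lookup: looks up the requested object in the cache. Control eventually reaches vcl_hit or vcl_miss, depending on whether the object is in the cache.
  • pass: switches to pass mode. Control eventually reaches vcl_pass.
  • pipe: switches to pipe mode. Control eventually reaches vcl_pipe.
  • hash: creates a hash of the request data and looks up the matching object in the cache.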
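
  As a sketch of how these actions are used, the vcl_recv below (Varnish 3 syntax) returns four of them. The /internal/ and /stream/ paths are hypothetical and exist only to give each action a plausible trigger.

```vcl
sub vcl_recv {
  # error: refuse a hypothetical internal-only path outright.
  if (req.url ~ "^/internal/") {
    error 403 "Forbidden";
  }
  # pipe: hand a hypothetical long-lived streaming endpoint
  # straight to the backend (continues in vcl_pipe).
  if (req.url ~ "^/stream/") {
    return (pipe);
  }
  # pass: fetch from the backend without caching (continues in vcl_pass).
  if (req.request == "POST") {
    return (pass);
  }
  # lookup: everything else is hashed and ends up in vcl_hit or vcl_miss.
  return (lookup);
}
```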

The flow through the subroutines and actions above:

[Figure: Varnish request flow diagram]

VCL configuration examples:

vcl_recv

sub vcl_recv {

  # Many requests contain Accept-Encoding HTTP headers. We normalize them, or remove them when unnecessary, to make requests easier to cache.
  if (req.http.Accept-Encoding) {
    if (req.url ~ "\.(gif|jpg|jpeg|swf|flv|mp3|mp4|pdf|ico|png|gz|tgz|bz2)(\?.*|)$") {
      # If the request URL has any of these extensions, remove the Accept-Encoding header as it is meaningless.
      remove req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
      # If the Accept-Encoding contains "gzip", standardize it.
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      # If the Accept-Encoding contains "deflate", standardize it.
      set req.http.Accept-Encoding = "deflate";
    } else {
      # If the Accept-Encoding header isn't matched above, remove it.
      remove req.http.Accept-Encoding;
    }
  }

  # Many requests carry cookies for resources where cookies don't matter, such as static images or documents.
  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    # Remove cookies from these requests, and remove any attached query strings.
    unset req.http.cookie;
    set req.url = regsub(req.url, "\?.*$", "");
  }

  # Certain cookies (such as those for Google Analytics) are client-side only, and don't matter to our web application.
  if (req.http.cookie) {
    if (req.http.cookie ~ "(mycookie1|important-cookie|myidentification-cookie)") {
      # If a request contains cookies we care about, don't cache it (return pass).
      return (pass);
    } else {
      # Otherwise, remove the cookies.
      unset req.http.cookie;
    }
  }
}
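
  The example above removes the whole Cookie header unless it contains a cookie the application cares about. A more selective variant, sketched here in Varnish 3 syntax, strips only the analytics cookies and leaves the rest in place. It assumes the client-side cookies use the classic Google Analytics prefix __utm; adjust the pattern to whatever your own site sets.

```vcl
sub vcl_recv {
  if (req.http.Cookie) {
    # Drop only cookies whose names start with __utm (assumed prefix).
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)__utm[a-z]+=[^;]*", "");
    # Tidy up a leading separator left behind by the removal.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    # If nothing is left, remove the header so the request can be cached.
    if (req.http.Cookie ~ "^\s*$") {
      unset req.http.Cookie;
    }
  }
}
```

  With this variant, a request whose only cookies are analytics cookies becomes cacheable, while a request that also carries a session cookie keeps it and is passed to the backend as usual.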

 

vcl_fetch

sub vcl_fetch {

  # If the URL is for our login page or a preview, we never want to cache the page itself.
  if (req.url ~ "/login" || req.url ~ "preview=true") {
    # But, we can cache the fact that we don't want this page cached (return hit_for_pass).
    return (hit_for_pass);
  }

  # If the URL is not for an admin or login page, we always want it to be cached.
  if ( ! (req.url ~ "(/admin|/login)") ) {
    # Remove cookies...
    unset beresp.http.set-cookie;
    # Cache the page for 1 day
    set beresp.ttl = 86400s;
    # Remove existing Cache-Control headers...
    remove beresp.http.Cache-Control;
    # Set new Cache-Control headers for browsers to store cache for 7 days
    set beresp.http.Cache-Control = "public, max-age=604800";
  }

  # If the URL is for a static image or document, we always want it to be cached.
  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    # Remove cookies...
    unset beresp.http.set-cookie;
    # Cache the object for 365 days.
    set beresp.ttl = 365d;
    # Remove existing Cache-Control headers...
    remove beresp.http.Cache-Control;
    # Set new Cache-Control headers for browsers to store cache for 7 days
    set beresp.http.Cache-Control = "public, max-age=604800";
  }
}

vcl_deliver

sub vcl_deliver {

  # Sometimes it's nice to see when content has been served from the cache.
  if (obj.hits > 0) {
    # If the object came from the cache, set an HTTP header to say so
    set resp.http.X-Cache = "HIT";
  } else {
    set resp.http.X-Cache = "MISS";
  }

  # For security and aesthetic reasons, remove some HTTP headers before final delivery...
  remove resp.http.Server;
  remove resp.http.X-Powered-By;
  remove resp.http.Via;
  remove resp.http.X-Varnish;
}
