Java Enterprise Tutorial Series

SitemapGen4j: Generating a Sitemap with Java

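A sitemap is a simple way for webmasters to tell the crawlers of search engines such as Google and Baidu which content on their site to crawl. It is a plain XML file that lists URLs together with a few details for each one: when the URL was last updated, how often it changes (the change frequency), and its priority relative to the other URLs on the site.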
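To make the file format concrete, here is a minimal sketch that builds one sitemap entry by hand using only the JDK. The class SitemapEntrySketch and its helper methods are hypothetical names chosen for illustration and are not part of SitemapGen4j; the element names (loc, lastmod, changefreq, priority) and the namespace come from the sitemaps.org protocol.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical helper, not part of SitemapGen4j: renders sitemap XML by hand.
final class SitemapEntrySketch {

    // Entity-encode the characters that are special in XML text.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must come first
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Render a single <url> element with the optional detail fields.
    static String urlEntry(String loc, LocalDate lastMod, String changeFreq, double priority) {
        return "  <url>\n"
             + "    <loc>" + escapeXml(loc) + "</loc>\n"
             + "    <lastmod>" + lastMod + "</lastmod>\n"   // LocalDate prints as yyyy-MM-dd
             + "    <changefreq>" + changeFreq + "</changefreq>\n"
             + "    <priority>" + String.format(Locale.ROOT, "%.1f", priority) + "</priority>\n"
             + "  </url>\n";
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n"
                   + urlEntry("http://www.example.com/index.html?a=1&b=2",
                              LocalDate.of(2011, 9, 1), "daily", 0.8)
                   + "</urlset>\n";
        System.out.print(xml);
    }
}
```

Writing the XML by hand like this is only practical for a handful of URLs; a library takes care of escaping, date formatting, and file handling for you.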

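A sitemap is one of the basic prerequisites for improving a site's SEO. The webmaster tools of both Baidu and Google have a dedicated setting for submitting your site's sitemap: enter the address of your sitemap.xml there, and the search crawlers will fetch it periodically.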

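SitemapGen4j is a sitemap generation library written in Java. With it you can add any number of URLs, produce gzip-compressed output, set the last-modified date, priority, and change frequency of each URL, choose the date format, and validate the generated sitemap against the XML Schema Definition (XSD).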
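As a rough illustration of what gzip-compressed output amounts to, the sketch below compresses a sitemap string with the JDK's GZIPOutputStream and reads it back. GzipSketch is a hypothetical class written for this article; it is not how SitemapGen4j implements its gzip option, only a demonstration of the underlying idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of SitemapGen4j.
final class GzipSketch {

    // Compress text as UTF-8 bytes in gzip format.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress gzip bytes back into the original text.
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Build a repetitive sample sitemap so the compression effect is visible.
        StringBuilder sb = new StringBuilder("<urlset>\n");
        for (int i = 0; i < 1000; i++) {
            sb.append("<url><loc>http://www.example.com/page").append(i).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        byte[] packed = gzip(sb.toString());
        System.out.println("plain bytes: " + sb.length() + ", gzip bytes: " + packed.length);
    }
}
```

Sitemaps consist of highly repetitive markup, so they tend to compress well, which is why serving them as sitemap.xml.gz is common for large sites.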

Basic usage:
WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
wsg.addUrl("http://www.example.com/index.html"); // repeat multiple times
wsg.write();

Complete code:

import java.io.File;
import java.net.MalformedURLException;
import java.util.Date;

import com.redfin.sitemapgenerator.ChangeFreq;
import com.redfin.sitemapgenerator.WebSitemapGenerator;
import com.redfin.sitemapgenerator.WebSitemapUrl;

// Java Code To Generate Sitemap

public class SitemapGeneratorExample {

  public static void main(String[] args) throws MalformedURLException {
    // If you need gzipped output
    WebSitemapGenerator sitemapGenerator = WebSitemapGenerator
        .builder("http://www.javatips.net", new File("C:\\sitemap"))
        .gzip(true).build();

    WebSitemapUrl sitemapUrl = new WebSitemapUrl.Options(
        "http://www.javatips.net/blog/2011/08/findbugs-in-eclipse-java-tutorial")
        .lastMod(new Date()).priority(1.0)
        .changeFreq(ChangeFreq.HOURLY).build();
    // this will configure the URL with lastmod=now, priority=1.0,
    // changefreq=hourly

    // You can add any number of urls here
    sitemapGenerator.addUrl(sitemapUrl);
    sitemapGenerator
        .addUrl("http://www.javatips.net/blog/2011/09/create-sitemap-using-java");
    sitemapGenerator.write();
  }
}



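One limitation of SitemapGen4j, however, is that it cannot stream output continuously; it can only export files in a single pass. We therefore used the JDK's JAXBContext to implement streaming, dynamic output. The main code is as follows: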

import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

import javax.xml.bind.DataBindingException;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

// Sitemap and Url are JAXB-bound classes that are not shown in this excerpt.
public class Test {

   protected final static String URLSET_START = "<?xml version='1.0' encoding='UTF-8'?>\n"
         + "<urlset xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
         + "         xsi:schemaLocation=\"http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd\"\n"
         + "         xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
   protected final static String URLSET_END = "\n</urlset>";

   protected final static String SITEMAPINDEX_START = "<?xml version='1.0' encoding='UTF-8'?>\n"
         + "<sitemapindex xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
         + "         xsi:schemaLocation=\"http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd\"\n"
         + "         xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
   protected final static String SITEMAPINDEX_END = "\n</sitemapindex>";

   // Writes a complete <sitemapindex> document, one entry per Sitemap.
   public static void writeSitemapIndex(Writer writer, Iterator<? extends Sitemap> mappings) throws IOException {
      writeXml(writer, SITEMAPINDEX_START, mappings, SITEMAPINDEX_END);
   }

   // Writes a complete <urlset> document and returns the number of URLs written.
   public static long writeUrlset(Writer writer, Iterator<Url> urls) throws IOException {
      return writeXml(writer, URLSET_START, urls, URLSET_END);
   }

   // Wraps the streamed entries in the given opening and closing markup.
   private static long writeXml(Writer writer, String start, Iterator<?> it, String end) throws IOException {
      writer.write(start);
      long count = writeSubtree(writer, it);
      writer.write(end);
      return count;
   }

   // Marshals the entries one at a time, so the full list never has to be held in memory.
   public static long writeSubtree(Writer writer, Iterator<?> it) throws IOException {
      long size = 0;
      Marshaller m;
      try {
         JAXBContext jc = JAXBContext.newInstance(Sitemap.class, Url.class);
         m = jc.createMarshaller();
         // Fragment mode suppresses the XML declaration on each marshalled entry.
         m.setProperty(Marshaller.JAXB_FRAGMENT, true);
         m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
      } catch (JAXBException e) {
         // PropertyException is a subclass of JAXBException, so one catch covers both.
         throw new DataBindingException(e);
      }

      boolean first = true;
      while (it.hasNext()) {
         if (first)
            first = false;
         else
            writer.write("\n");
         try {
            m.marshal(it.next(), writer);
         } catch (JAXBException e) {
            throw new IOException(e);
         }
         size++;
      }
      return size;
   }

   public static void main(String[] args) {
      try {
         Collection<Sitemap> s = new ArrayList<>();
         s.add(new Sitemap("http://www.example.com/sitemap1.xml.gz"));
         s.add(new Sitemap("http://www.example.com/sitemap2.xml.gz"));
         s.add(new Sitemap("http://www.example.com/sitemap3.xml.gz"));
         s.add(new Sitemap("http://www.example.com/sitemap4.xml.gz"));

         Writer wt = new PrintWriter(System.out);
         Test.writeSitemapIndex(wt, s.iterator());
         wt.close();
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}
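
The listing above depends on javax.xml.bind, which was removed from the JDK in Java 11, and on Sitemap and Url classes that are not shown. As an alternative, the following minimal sketch applies the same streaming idea with the JDK's StAX API (javax.xml.stream) instead of JAXB. The class StreamingSitemapSketch is hypothetical and handles only the loc element; it illustrates the technique and is not the code described above.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: streams a <urlset> with StAX, one entry at a time.
final class StreamingSitemapSketch {

    static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Writes each location as it is pulled from the iterator and returns the count.
    static long writeUrlset(Writer writer, Iterator<String> locs) {
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(writer);
            xml.writeStartDocument("UTF-8", "1.0");
            xml.writeStartElement("urlset");
            xml.writeDefaultNamespace(NS);
            long count = 0;
            while (locs.hasNext()) {
                xml.writeStartElement("url");
                xml.writeStartElement("loc");
                xml.writeCharacters(locs.next()); // escapes & and < for us
                xml.writeEndElement(); // </loc>
                xml.writeEndElement(); // </url>
                count++;
            }
            xml.writeEndElement(); // </urlset>
            xml.writeEndDocument();
            xml.flush();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        long n = writeUrlset(out, Arrays.asList(
                "http://www.example.com/index.html",
                "http://www.example.com/about.html").iterator());
        System.out.println(out);
        System.out.println("entries written: " + n);
    }
}
```

Because the method consumes an Iterator, the URLs can come from a database cursor or any other lazy source, and the Writer can be a servlet response writer, so the sitemap can be produced on demand without first building the whole file.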