Java Enterprise Tutorial Series
SitemapGen4j: Generating a Sitemap in Java
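A sitemap is a simple way for webmasters to tell the crawlers of search engines such as Google and Baidu which pages of their site to fetch. It is a plain XML file that lists URLs together with a few details about each one: when it was last updated, how often it changes (the change frequency), and its priority relative to the other URLs on the site.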
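To make the file layout described above concrete, here is a minimal hand-rolled sketch that renders one sitemap entry with the four fields of the sitemaps.org 0.9 format (loc, lastmod, changefreq, priority). The class `SitemapXmlSketch`, its nested `Entry` type, and the example URL are hypothetical names chosen for illustration; they are not part of SitemapGen4j, and a real site would normally let a library produce this file.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of SitemapGen4j: builds a tiny sitemap by hand
// to show which fields each <url> entry carries.
class SitemapXmlSketch {

    // One sitemap entry: the URL plus the three optional hints for crawlers.
    static final class Entry {
        final String loc;
        final LocalDate lastMod;
        final String changeFreq;
        final double priority;

        Entry(String loc, LocalDate lastMod, String changeFreq, double priority) {
            this.loc = loc;
            this.lastMod = lastMod;
            this.changeFreq = changeFreq;
            this.priority = priority;
        }
    }

    // Escape XML special characters so URLs containing '&' stay well-formed.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Render the entries as a <urlset> document.
    static String render(List<Entry> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (Entry e : entries) {
            sb.append("  <url>\n");
            sb.append("    <loc>").append(escape(e.loc)).append("</loc>\n");
            // LocalDate prints as yyyy-MM-dd, one of the accepted lastmod forms
            sb.append("    <lastmod>").append(e.lastMod).append("</lastmod>\n");
            sb.append("    <changefreq>").append(e.changeFreq).append("</changefreq>\n");
            sb.append(String.format(Locale.ROOT, "    <priority>%.1f</priority>\n", e.priority));
            sb.append("  </url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry home = new Entry("http://www.example.com/index.html?a=1&b=2",
                LocalDate.of(2011, 9, 1), "daily", 0.8);
        System.out.print(render(List.of(home)));
    }
}
```

Building the XML by string concatenation keeps the sketch dependency-free, but it leaves escaping and validation to the caller, which is one reason to prefer a library for real sitemaps.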
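A sitemap is one of the basic building blocks of SEO. The webmaster tools of both Baidu and Google have a dedicated setting for submitting a sitemap; once you enter the address of your sitemap.xml there, the crawlers fetch it periodically.
SitemapGen4j is a sitemap generation library written in Java. With SitemapGen4j you can add any number of URLs, produce gzip-compressed output, set the last-modified date, priority, and change frequency of each URL, choose the date format, and validate the sitemap against the XML Schema Definition (XSD).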
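As a rough illustration of two of the options listed above, gzip output and the date format, the following sketch uses only JDK classes. It does not show how SitemapGen4j implements these features internally; `GzipSitemapSketch`, `gzip`, `gunzip`, and `w3cDate` are hypothetical names introduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of SitemapGen4j.
class GzipSitemapSketch {

    // Compress sitemap XML into the bytes of a sitemap.xml.gz file.
    static byte[] gzip(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress again, for example to inspect a generated file.
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Format a timestamp as a UTC date-time ending in 'Z', a form accepted for <lastmod>.
    static String w3cDate(Instant timestamp) {
        return DateTimeFormatter.ISO_INSTANT.format(timestamp);
    }

    public static void main(String[] args) {
        String xml = "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://www.example.com/</loc>"
                + "<lastmod>" + w3cDate(Instant.now()) + "</lastmod></url></urlset>";
        byte[] compressed = gzip(xml);
        System.out.println("plain bytes: " + xml.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("gzip bytes:  " + compressed.length);
        System.out.println("round trip ok: " + xml.equals(gunzip(compressed)));
    }
}
```

For a sitemap this small the gzip header can outweigh the savings; compression pays off on files with thousands of URLs.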
Basic usage:
WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
wsg.addUrl("http://www.example.com/index.html"); // repeat multiple times
wsg.write();
Complete code:
import java.io.File;
import java.net.MalformedURLException;
import java.util.Date;
import com.redfin.sitemapgenerator.ChangeFreq;
import com.redfin.sitemapgenerator.WebSitemapGenerator;
import com.redfin.sitemapgenerator.WebSitemapUrl;
// Java Code To Generate Sitemap
public class SitemapGeneratorExample {
    public static void main(String[] args) throws MalformedURLException {
        // If you need gzipped output
        WebSitemapGenerator sitemapGenerator = WebSitemapGenerator
                .builder("http://www.javatips.net", new File("C:\\sitemap"))
                .gzip(true).build();
        // Configure the URL with lastmod=now, priority=1.0, changefreq=hourly
        WebSitemapUrl sitemapUrl = new WebSitemapUrl.Options(
                "http://www.javatips.net/blog/2011/08/findbugs-in-eclipse-java-tutorial")
                .lastMod(new Date()).priority(1.0)
                .changeFreq(ChangeFreq.HOURLY).build();
        // You can add any number of URLs here
        sitemapGenerator.addUrl(sitemapUrl);
        sitemapGenerator
                .addUrl("http://www.javatips.net/blog/2011/09/create-sitemap-using-java");
        sitemapGenerator.write();
    }
}
[Screenshot: the generated sitemap output]
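However, SitemapGen4j cannot stream its output continuously; it can only export complete files in a single pass. To work around this, we built a streaming writer on top of the JDK's JAXBContext (the full source is offered as a separate download). The main code is as follows: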
protected final static String URLSET_START = "<?xml version='1.0' encoding='UTF-8'?>\n"
+ "<urlset xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
+ " xsi:schemaLocation=\"http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd\"\n"
+ " xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
protected final static String URLSET_END = "\n</urlset>";
protected final static String SITEMAPINDEX_START = "<?xml version='1.0' encoding='UTF-8'?>\n"
+ "<sitemapindex xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
+ " xsi:schemaLocation=\"http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd\"\n"
+ " xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
protected final static String SITEMAPINDEX_END = "\n</sitemapindex>";
// Writes a complete <sitemapindex> document, one entry per Sitemap.
public static void writeSitemapIndex(Writer writer, Iterator<? extends Sitemap> mappings) throws IOException {
    writeXml(writer, SITEMAPINDEX_START, mappings, SITEMAPINDEX_END);
}

// Writes a complete <urlset> document and returns the number of URLs written.
public static long writeUrlset(Writer writer, Iterator<Url> urls) throws IOException {
    return writeXml(writer, URLSET_START, urls, URLSET_END);
}

// Wraps the streamed elements in the given opening and closing markup.
private static long writeXml(Writer writer, String start, Iterator<?> it, String end) throws IOException {
    writer.write(start);
    long count = writeSubtree(writer, it);
    writer.write(end);
    return count;
}

// Marshals each element as soon as the iterator yields it, so the whole
// sitemap never has to be held in memory.
public static long writeSubtree(Writer writer, Iterator<?> it) throws IOException {
    long size = 0;
    Marshaller m;
    try {
        JAXBContext jc = JAXBContext.newInstance(Sitemap.class, Url.class);
        m = jc.createMarshaller();
        // Fragment mode: no XML declaration in front of every element
        m.setProperty(Marshaller.JAXB_FRAGMENT, true);
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
    } catch (JAXBException e) {
        // PropertyException is a subclass of JAXBException, so one catch covers both
        throw new DataBindingException(e);
    }
    boolean first = true;
    while (it.hasNext()) {
        // Separate consecutive elements with a newline
        if (first)
            first = false;
        else
            writer.write("\n");
        try {
            m.marshal(it.next(), writer);
        } catch (JAXBException e) {
            throw new IOException(e);
        }
        size++;
    }
    return size;
}
public static void main(String[] args) {
    try {
        Collection<Sitemap> s = new ArrayList<>();
        s.add(new Sitemap("http://www.example.com/sitemap1.xml.gz"));
        s.add(new Sitemap("http://www.example.com/sitemap2.xml.gz"));
        s.add(new Sitemap("http://www.example.com/sitemap3.xml.gz"));
        s.add(new Sitemap("http://www.example.com/sitemap4.xml.gz"));
        // Stream the sitemap index to standard output
        Writer wt = new PrintWriter(System.out);
        Test.writeSitemapIndex(wt, s.iterator());
        wt.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
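JAXB has not been bundled with the JDK since Java 11, so the listing above needs an extra dependency on current runtimes. As an alternative, the same streaming idea can be sketched with StAX (`XMLStreamWriter`), which still ships with the JDK. This is a different technique from the JAXB code above, not a port of it. `StreamingSitemapSketch` and its `writeUrlset` method are hypothetical names, and the sketch writes only the `<loc>` element of each URL.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import java.util.Iterator;
import java.util.stream.IntStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical StAX-based variant of the streaming writer; not the JAXB code above.
class StreamingSitemapSketch {

    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Writes one <url><loc>...</loc></url> per location as the iterator yields it
    // and returns the number of entries written.
    static long writeUrlset(Writer out, Iterator<String> locations) throws IOException {
        long count = 0;
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument("UTF-8", "1.0");
            xml.writeStartElement("urlset");
            xml.writeDefaultNamespace(SITEMAP_NS);
            while (locations.hasNext()) {
                xml.writeStartElement("url");
                xml.writeStartElement("loc");
                xml.writeCharacters(locations.next()); // escapes '&' and '<'
                xml.writeEndElement(); // </loc>
                xml.writeEndElement(); // </url>
                count++;
            }
            xml.writeEndElement(); // </urlset>
            xml.writeEndDocument();
            xml.flush();
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // The URLs are generated lazily, so nothing is collected in memory first.
        Iterator<String> pages = IntStream.rangeClosed(1, 3)
                .mapToObj(i -> "http://www.example.com/page" + i + ".html")
                .iterator();
        PrintWriter stdout = new PrintWriter(System.out);
        long written = writeUrlset(stdout, pages);
        stdout.println();
        stdout.println("entries written: " + written);
        stdout.flush();
    }
}
```

Because the writer only ever holds the current element, the same method can be fed from a database cursor or any other lazy iterator.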