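Introduction:
We often run into conversion requirements such as JSON to table or table to JSON. Here, though, the table to convert is irregular (see the figure below). How can it be converted?
Precondition: the form is stored in a single database field!
Common approaches:
1. Convert with a JS script, mainly using jQuery and similar libraries; this is fairly convenient.
2. Parse with a Python module such as SGMLParser.
3. Install Node.js and parse on the server side (somewhat overkill).
The form looks like this: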

The HTML content is as follows:
<table border="1" cellpadding="0" cellspacing="0" style="border-bottom: medium none; border-left: medium none; border-collapse: collapse; border-top: medium none; border-right: medium none" width="650"><tbody><tr style="height: 30px"><td style="border-bottom: windowtext 1pt solid; border-left: windowtext 1pt solid; padding-bottom: 0cm; padding-left: 5.4pt; width: 215px; padding-right: 5.4pt; height: 30px; border-top: windowtext 1pt solid; border-right: windowtext 1pt solid; padding-top: 0cm"><p align="right" style="text-align: right"><span style="font-family: 宋体"><span style="font-size: 12pt">应用名称<span style="font-size: 12pt"><span style="font-family: calibri">(</span></span><strong><span style="color: red"><span style="font-family: 宋体"><span style="font-size: 12pt">必填</span></span></span></strong><span style="font-size: 12pt"><span style="font-family: calibri">)</span></span></span></span></p></td><td style="border-bottom: windowtext 1pt solid; padding-bottom: 0cm; padding-left: 5.4pt; width: 151px; padding-right: 5.4pt; height: 30px; border-left-color: #f0f0f0; border-top: windowtext 1pt solid; border-right: windowtext 1pt solid; padding-top: 0cm"><p><br></p></td><td style="border-bottom: windowtext 1pt solid; padding-bottom: 0cm; padding-left: 5.4pt; width: 123px; padding-right: 5.4pt; height: ...... (very long; the rest is omitted here)
URL:http://t.mreald.com/py.html
Now let's parse it with Python:
1. Commonly used parsing modules:
HTMLParser, SGMLParser, pyQuery, BeautifulSoup
Download: http://www.crummy.com/software/BeautifulSoup/bs4/download/
Documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#
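Of the modules listed, HTMLParser is the one that ships with the standard library (as `html.parser` in Python 3), so it needs no download. As a rough sketch of what table parsing looks like with it, the following collects the text of each cell, grouped by row. It assumes Python 3 and a table with no nested tables; the names `TableRowParser`, `table_rows` and the `SAMPLE` markup are made up for illustration and are not taken from the page above.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr> (no nested tables)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Return the table as a list of rows, each a list of cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup that imitates the nested <p>/<span> wrappers of the form.
SAMPLE = """
<table><tbody>
<tr><td><p><span>App name<strong>(required)</strong></span></p></td><td><p><br></p></td></tr>
<tr><td>Owner</td><td>Alice</td><td>Phone</td><td>12345</td></tr>
</tbody></table>
"""
rows = table_rows(SAMPLE)
print(rows)
```

Compared with BeautifulSoup, this needs more bookkeeping code, but it shows the idea: the styling wrappers are ignored and only the cell text survives.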
2. Here we use BeautifulSoup
The code is as follows:
# Python 2 script (urllib2 and the print statement)
import urllib2
from bs4 import BeautifulSoup

content = urllib2.urlopen('http://t.mreald.com/py.html').read()
soup = BeautifulSoup(content, 'html.parser')
# print(soup.prettify())

# The table is irregular, so each row index is mapped by hand
# to the cells that hold its label and value.
for i, tritem in enumerate(soup.find_all('tr')):
    cells = [td.get_text() for td in tritem.find_all('td')]
    if i in [0, 5, 6, 7, 8, 9, 10, 11, 12]:
        # One label/value pair in the first two cells
        print cells[0] + ' ' + cells[1]
    elif i == 4:
        # The pair sits in the second and third cells
        print cells[1] + ' ' + cells[2]
    elif i == 3:
        # Only the first pair is printed; the value is stripped of spaces
        print cells[0] + ' ' + cells[1].strip(' ')
        # print cells[0] + ' ' + cells[1].strip(' ') + cells[2] + cells[3]
    else:
        # Two label/value pairs in one row
        print cells[0] + ' ' + cells[1]
        print cells[2] + ' ' + cells[3]

Execution result:

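Since the introduction mentions converting between tables and JSON, here is one possible way to turn the same row-by-row logic into a JSON object. It is only a sketch under assumptions: it is written for Python 3, it starts from rows that have already been reduced to lists of cell text, and `LAYOUT`, `DEFAULT_PAIRS`, `rows_to_dict` and the sample values are hypothetical names and data. The indices follow the branches of the script above for rows 0 to 12; unlike that script, rows not listed in `LAYOUT` fall back to a single pair.

```python
import json

# Which cells of a row form (label, value) pairs, keyed by row index.
# Rows not listed here use DEFAULT_PAIRS (an assumption of this sketch).
DEFAULT_PAIRS = [(0, 1)]
LAYOUT = {
    1: [(0, 1), (2, 3)],   # two pairs in one row
    2: [(0, 1), (2, 3)],
    4: [(1, 2)],           # pair sits in the second and third cells
}


def rows_to_dict(rows, layout=LAYOUT, default=DEFAULT_PAIRS):
    """Map each label cell to its value cell according to the layout."""
    record = {}
    for i, cells in enumerate(rows):
        for label_idx, value_idx in layout.get(i, default):
            # Skip pairs that the row is too short to contain.
            if max(label_idx, value_idx) < len(cells):
                record[cells[label_idx].strip()] = cells[value_idx].strip()
    return record


# Hypothetical cell text, standing in for the parsed form.
rows = [
    ["App name", "demo"],
    ["Owner", "Alice", "Phone", "12345"],
    ["Dept", "Ops", "Email", "a@example.com"],
    ["Purpose", " internal tool "],
    ["", "Environment", "production"],
]
record = rows_to_dict(rows)
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Keeping the row layout in a dictionary rather than in `if`/`elif` branches means that a change to the form only requires editing `LAYOUT`. `ensure_ascii=False` keeps Chinese labels readable in the JSON output.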
References:
http://www.cnblogs.com/bluestorm/archive/2011/06/20/2298174.html
http://www.cnblogs.com/whitewolf/archive/2013/02/27/2935618.html
