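How do I deal with the redundant Word tags after converting a Word document to HTML in C#?
After converting a Word document to HTML in C#, the output contains a lot of redundant markup. How can I remove the unneeded tags, along with the unused attributes and attribute values on the tags that remain? For example: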
<table class=MsoTableGrid border=1 cellspacing=0 cellpadding=0
style='border-collapse:collapse;border:none;mso-border-alt:solid windowtext .5pt;
mso-yfti-tbllook:480;mso-padding-alt:0cm 5.4pt 0cm 5.4pt;mso-border-insideh:
.5pt solid windowtext;mso-border-insidev:.5pt solid windowtext'>
<tr style='mso-yfti-irow:0;mso-yfti-firstrow:yes'>
<td width=189 valign=top style='width:142.0pt;border:solid windowtext 1.0pt;
mso-border-alt:solid windowtext .5pt;padding:0cm 5.4pt 0cm 5.4pt'>
<p class=MsoNormal><span style='font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>姓名</span></p>
</td>
<td width=189 valign=top style='width:142.05pt;border:solid windowtext 1.0pt;
border-left:none;mso-border-left-alt:solid windowtext .5pt;mso-border-alt:
solid windowtext .5pt;padding:0cm 5.4pt 0cm 5.4pt'>
<p class=MsoNormal><span style='font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>性别</span></p>
</td>
<td width=189 valign=top style='width:142.05pt;border:solid windowtext 1.0pt;
border-left:none;mso-border-left-alt:solid windowtext .5pt;mso-border-alt:
solid windowtext .5pt;padding:0cm 5.4pt 0cm 5.4pt'>
<p class=MsoNormal><span style='font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>年龄</span></p>
</td>
</tr>
<tr style='mso-yfti-irow:1;mso-yfti-lastrow:yes'>
<td width=189 valign=top style='width:142.0pt;border:solid windowtext 1.0pt;
border-top:none;mso-border-top-alt:solid windowtext .5pt;mso-border-alt:solid windowtext .5pt;
padding:0cm 5.4pt 0cm 5.4pt'>
<p class=MsoNormal><span style='font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>张三</span></p>
</td>
<td width=189 valign=top style='width:142.05pt;border-top:none;border-left:
none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
mso-border-top-alt:solid windowtext .5pt;mso-border-left-alt:solid windowtext .5pt;
mso-border-alt:solid windowtext .5pt;padding:0cm 5.4pt 0cm 5.4pt'>
<p class=MsoNormal><span style='font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>男</span></p>
</td>
<td width=189 valign=top style='width:142.05pt;border-top:none;border-left:
none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
mso-border-top-alt:solid windowtext .5pt;mso-border-left-alt:solid windowtext .5pt;
mso-border-alt:solid windowtext .5pt;padding:0cm 5.4pt 0cm 5.4pt'>
<p class=MsoNormal><span lang=EN-US>20</span></p>
</td>
</tr>
</table>
Desired result after cleanup:
<table class=MsoTableGrid border=1 cellspacing=0 cellpadding=0>
<tr>
<td>姓名</td>
<td>性别</td>
<td>年龄</td>
</tr>
<tr>
<td>张三</td>
<td>男</td>
<td>20</td>
</tr>
</table>
-------------------- Programming Q&A --------------------
Just use regular expressions and replace the parts you don't need. There are plenty of examples online.
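-------------------- Programming Q&A --------------------
I found some examples, but none of them worked. Does anyone have source code for a working conversion program that I could use as a reference? Thanks!
-------------------- Programming Q&A --------------------
Google it.
-------------------- Programming Q&A --------------------
I've run into the same problem. I also need to use C# to read irregular tables from Word and convert them to HTML. I used the COM component, but it only reads regular tables correctly and fails on cells that have been merged or split. Is there a good solution? The only approach I can see is to convert the whole document to HTML first and then filter it. Let's discuss when you have time: email ndscwsy@tom.com, QQ 39299373
-------------------- Programming Q&A --------------------
Bump.
-------------------- Programming Q&A --------------------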
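If you know regular expressions, writing this yourself is simple.
If you don't, an example will read like gibberish even if someone hands you one.
-------------------- Programming Q&A --------------------
Write your own regular expressions to do the conversion.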
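Since nobody in the thread posted actual code, here is a minimal sketch of the regex approach suggested above. It is not a definitive implementation. The class name `WordHtmlCleaner` and the method `Clean` are made up for illustration, and the sketch assumes the markup looks like the sample in the question: `style` values are quoted, and no attribute value contains a `>` character. It drops every `style` attribute, removes the `<p>`, `<span>` and `<o:p>` wrappers while keeping their text, and strips the remaining attributes from `<tr>`, `<td>` and `<th>`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class WordHtmlCleaner
{
    public static string Clean(string html)
    {
        const RegexOptions opts = RegexOptions.IgnoreCase;

        // 1. Drop every style attribute (single- or double-quoted; the value may span lines).
        html = Regex.Replace(html, @"\s+style\s*=\s*(?:'[^']*'|""[^""]*"")", "", opts);

        // 2. Remove <p>, <span> and <o:p> tags (opening and closing) but keep the text inside.
        html = Regex.Replace(html, @"</?(?:p|span|o:p)\b[^>]*>", "", opts);

        // 3. Strip all remaining attributes from row and cell tags.
        //    Assumption: colspan/rowspan are not needed; adjust this step if they are.
        html = Regex.Replace(html, @"<(tr|td|th)\b[^>]*>", "<$1>", opts);

        // 4. Trim the whitespace left behind at the start and end of each cell.
        html = Regex.Replace(html, @"<(td|th)>\s*(.*?)\s*</\1>", "<$1>$2</$1>",
                             opts | RegexOptions.Singleline);

        return html;
    }
}

public static class Program
{
    public static void Main()
    {
        // One cell taken from the sample in the question (style shortened).
        string input =
            "<td width=189 valign=top style='width:142.0pt;border:solid windowtext 1.0pt'>\n" +
            "<p class=MsoNormal><span style='font-family:宋体;mso-ascii-font-family:\"Times New Roman\"'>姓名</span></p>\n" +
            "</td>";

        Console.WriteLine(WordHtmlCleaner.Clean(input));
    }
}
```

For the cell above, `Clean` returns `<td>姓名</td>`. The `<table>` tag keeps its `class`, `border`, `cellspacing` and `cellpadding` attributes because only its `style` attribute is removed. Note that step 3 also throws away `colspan` and `rowspan`, so tables with merged cells would need that step relaxed. Regular expressions are fragile on arbitrary HTML, so an HTML parser such as HtmlAgilityPack is probably the safer route for anything beyond simple tables like this one.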
Category: .NET Technology, C#