使用Python
用到的包:xml.dom.minidom
需求:
有一个表,里面数据量比较大,每天一更新,其字段可以通过配置文件进行配置,即,可能每次建表的字段不一样。
上游跑时会根据配置从源文件中提取,到入库这一步需要根据配置进行建表。
解决:
写了一个简单的xml,配置需要字段及类型
上游读取到对应的数据
入库这一步,先把原表删除,根据配置建新表
XML文件
- <?xml version="1.0" encoding="UTF-8"?>
-
- <table name="top_query" db_name="evaluting_sys">
-
- <primary_key>
- <name>id</name>
- </primary_key>
-
- <field>
- <name>query</name>
- <type>varchar(200)</type>
- <is_index>false</is_index>
- <description>query</description>
- </field>
- <field>
- <name>pv</name>
- <type>integer</type>
- <is_index>false</is_index>
- <description>pv</description>
- </field>
- <field>
- <name>avg_money</name>
- <type>integer</type>
- <is_index>false</is_index>
- <description></description>
- </field>
-
- </table>
|
脚本:
- #!/usr/bin/python
- # -*- coding:utf-8 -*-
- #author: ken
- #desc: use to read db xml config.
- #-----------------------
- #2012-02-18 created
-
- #----------------------
-
- import sys,os
- from xml.dom import minidom, Node
-
- def read_dbconfig_xml(xml_file_path):
- content = {}
-
- root = minidom.parse(<span style="background-color: rgb(255, 255, 255); ">xml_file_path)</span>
- table = root.getElementsByTagName("table")[0]
-
- #read dbname and table name.
- tabletable_name = table.getAttribute("name")
- db_name = table.getAttribute("db_name")
-
- if len(table_name) > 0 and len(db_name) > 0:
- db_sql = "create database if not exists `" + db_name +"`; use " + db_name + ";"
- table_drop_sql = "drop " + table_name + " if exists " + table_name + ";"
- content.update({"db_sql" : db_sql})
- content.update({"table_sql" : table_drop_sql })
- else:
- print "Error:attribute is not define well! db_name=" + db_name + " ;table_name=" + table_name
- sys.exit(1)
- #print table_name, db_name
-
- table_create_sql = "create table " + table_name +"("
-
- #read primary cell
- primary_key = table.getElementsByTagName("primary_key")[0]
- primary_keyprimary_key_name = primary_key.getElementsByTagName("name")[0].childNodes[0].nodeValue
-
- table_create_sql += primary_key_name + " INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,"
-
- #print primary_key.toxml()
- #read ordernary field
- fields = table.getElementsByTagName("field")
- f_index = 0
- for field in fields:
- f_index += 1
- name = field.getElementsByTagName("name")[0].childNodes[0].nodeValue
- type = field.getElementsByTagName("type")[0].childNodes[0].nodeValue
- table_create_sql += name + " " + type
- if f_index != len(fields):
- table_create_sql += ","
- is_index = field.getElementsByTagName("is_index")[0].childNodes[0].nodeValue
-
- table_create_sql += ");"
- content.update({"table_create_sql" : table_create_sql})
- #character set latin1 collate latin1_danish_ci;
- print content
-
-
- if __name__ == "__main__":
- read_dbconfig_xml(sys.argv[1])
|
涉及方法:
root = minidom.parse(xml_file_path) 获取dom对象
root.getElementsByTagName("table") 根据tag获取节点列表
table.getAttribute("name") 获取属性
primary_key.getElementsByTagName("name")[0].childNodes[0].nodeValue 获取子节点的值(<name>id</name> 得到id)
-----------------------------------------------------
简单的读取就是这样,这里只是简单使用了下
后续需去了解下python读取xml的几个库
xml.dom.minidom
xml.etree.ElementTree
libxml2
lxml.etree
xpath