When migrating from Spark 1 to Spark 2, the overall logic may barely change, but there are plenty of pitfalls in the small details.
"No suitable driver" error when connecting to MySQL
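// Register the DataFrame as a temporary table
// (registerTempTable is deprecated in Spark 2 in favor of createOrReplaceTempView)
df.registerTempTable("demo")
val prop = new java.util.Properties
prop.setProperty("user", "root")
prop.setProperty("password", "123456")
// Name the JDBC driver class explicitly
prop.setProperty("driver", "com.mysql.jdbc.Driver")
prop.setProperty("url", "jdbc:mysql://127.0.0.1:3306/test")
// Append the query result to the MySQL table "demo"
sqlContext.sql("select name,age,sex from demo")
  .write
  .mode(org.apache.spark.sql.SaveMode.Append)
  .jdbc(prop.getProperty("url"), "demo", prop)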
Creating an empty DataFrame in Spark
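import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// In Spark 1 you could simply use null; in Spark 2 that no longer works
var df: DataFrame = null

// Alternative: create an empty DataFrame with no columns
var df = spark.emptyDataFrame

// Alternative: create an empty DataFrame with columns
val schema = StructType(
  Seq(
    StructField("lie1", StringType, true),
    StructField("lie2", StringType, true),
    StructField("lie3", StringType, true)))
val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)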
Writing to Hive is extremely slow
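Cause: the write used insertInto, and the DataFrame's column order did not match the column order of the Hive table. insertInto matches columns by position rather than by name.
After reordering the columns to match the table, the write speed returned to normal.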
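The reordering fix can be sketched as follows. This is only a minimal illustration, not the original author's code: `insertAligned` is a hypothetical helper name, and it assumes the target table already exists and that every table column is present in the DataFrame under the same name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper (not from the original post): reorder the DataFrame's
// columns to match the target table before calling insertInto.
def insertAligned(spark: SparkSession, df: DataFrame, table: String): Unit = {
  // Column names of the target table, in the table's own order
  val targetCols = spark.table(table).columns

  // Fail early if the DataFrame lacks a column the table expects
  val missing = targetCols.filterNot(c => df.columns.contains(c))
  require(missing.isEmpty, s"DataFrame is missing columns: ${missing.mkString(", ")}")

  // Select by name in table order, so the position-based insertInto lines up
  df.select(targetCols.map(col): _*)
    .write
    .mode("append")
    .insertInto(table)
}
```

A call such as `insertAligned(spark, df, "demo")` would then replace a direct `df.write.insertInto("demo")`. Selecting by name keeps the fix in one place, so the DataFrame can be built in any column order upstream.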