starksummer 2015-05-20
Spark(8) Non Fat Jar / Cassandra Cluster Issue and Spark Version 1.3.1
1. Can We Upgrade to Java 8?
Fix the BouncyCastleProvider Problem
Visit https://www.bouncycastle.org/latest_releases.html and download the file bcprov-jdk15on-152.jar.
Place the file in this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/ext
Then go to this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/security
and edit this file:
>sudo vi java.security
Add this line:
security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider
I should download this file:
http://repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk15%2b/1.46/bcprov-jdk15%2b-1.46.jar
Fix the JCE Problem
Download the file from here:
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
Unzip the file and place the jars in this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/security
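To confirm the provider was actually picked up after editing java.security, a small standalone check can help. This is my own sketch (the class name ProviderCheck is mine); it assumes BouncyCastle registers under its standard short name "BC":

```java
import java.security.Provider;
import java.security.Security;

public class ProviderCheck {
    public static void main(String[] args) {
        // List every JCA provider registered in this JRE; after the
        // java.security edit, BouncyCastleProvider should be among them.
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName() + " " + p.getVersion());
        }
        // Look the provider up explicitly; returns null when not installed.
        Provider bc = Security.getProvider("BC");
        System.out.println("BC installed: " + (bc != null));
    }
}
```

If the last line prints `BC installed: false`, re-check the jar location under jre/lib/ext and the provider index in java.security.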
2. Fat Jar?
https://github.com/apache/spark/pull/288?
https://issues.apache.org/jira/browse/SPARK-1154
http://apache-spark-user-list.1001560.n3.nabble.com/Clean-up-app-folders-in-worker-nodes-td20889.html
https://spark.apache.org/docs/1.0.1/spark-standalone.html
Based on my understanding, we should keep using the assembly jar in Scala and submit the task job to the master, which will distribute the jobs to the Spark standalone cluster or the YARN cluster. The clients should not require any setup or jar dependencies.
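For reference, if one did want a fat jar on the client side anyway, the Maven shade plugin is a common way to build one. A minimal pom.xml sketch (my own, not taken from the Spark build; the idea is that dependencies marked provided, such as Spark itself, stay out of the assembly):

```xml
<!-- Build an uber jar at package time; provided-scope deps are excluded. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```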
3. Cluster Sync Issue in Cassandra 1.2.13
http://stackoverflow.com/questions/23345045/cassandra-cas-delete-does-not-work
http://wiki.apache.org/cassandra/DistributedDeletes
We need to use ntpd to sync the clocks.
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
https://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/
In a Cassandra cluster, all the nodes perform write operations with timestamps. If the system times differ across the cluster nodes, Cassandra can run into a weird state: sometimes deletes and updates do not work.
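The failure mode above can be sketched with a toy last-write-wins store (my own illustration, not Cassandra's actual code): every write carries the writing node's local timestamp, and the cell with the highest timestamp wins, so a node with a fast clock can silently shadow later writes and deletes from other nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy last-write-wins cell store showing why clock skew breaks updates.
public class LwwDemo {
    static class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static Map<String, Cell> table = new HashMap<>();

    // A write only lands if its timestamp is >= the stored cell's timestamp.
    static void write(String key, String value, long nodeClock) {
        Cell current = table.get(key);
        if (current == null || nodeClock >= current.timestamp) {
            table.put(key, new Cell(value, nodeClock));
        }
    }

    public static void main(String[] args) {
        // Node A's clock runs 5 seconds ahead of node B's.
        write("user1", "old", 10_000); // node A, skewed-fast clock
        write("user1", "new", 5_000);  // node B writes later in real time,
                                       // but with a smaller timestamp
        System.out.println(table.get("user1").value); // prints "old"
    }
}
```

The second write happens later in wall-clock time but loses, which is exactly why deletes (tombstones are just timestamped writes too) can appear not to work until ntpd brings the clocks back in line.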
4. Upgrade to Version 1.3.1
https://spark.apache.org/docs/latest/
Download the Spark source file:
>wget http://apache.cs.utah.edu/spark/spark-1.3.1/spark-1.3.1.tgz
Unzip and place the Spark directory in the working directory:
>sudo ln -s /opt/spark-1.3.1 /opt/spark
My Java version and Scala version are as follows:
>java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
>scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
Build the binary:
>build/sbt clean
>build/sbt compile
The compile is not working for lack of dependencies. I will not spend time on that; I will directly download the binary.
>wget http://www.motorlogy.com/apache/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
Unzip it and add it to the classpath.
Then my project sillycat-spark can easily run.
Simple Spark Cluster
Download the source file:
>wget http://apache.cs.utah.edu/spark/spark-1.3.1/spark-1.3.1.tgz
Build the source:
>build/sbt clean
>build/sbt compile
It does not build on Ubuntu either. Using the binary instead:
>wget http://www.motorlogy.com/apache/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
Prepare Configuration
Go to the conf directory:
>cp spark-env.sh.template spark-env.sh
>cp slaves.template slaves
>cat slaves
# A Spark Worker will be started on each of the machines listed below.
ubuntu-dev1
ubuntu-dev2
>cat spark-env.sh
export SPARK_WORKER_MEMORY=768m
export SPARK_JAVA_OPTS="-Dbuild.env=lmm.sparkvm"
export USER=carl
Copy the same settings to all the slaves:
>scp -r ubuntu-master:/home/carl/tool/spark-1.3.1-hadoop2.6 ./
Call the shell script to start the standalone cluster:
>sbin/start-all.sh
How to build
https://spark.apache.org/docs/1.1.0/building-with-maven.html
>mvn -DskipTests clean package
It builds successfully.
Build with YARN, Hive, and JDBC support:
>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package
Go to the directory:
>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package install
Error Message:
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-assembly_2.10: Failed during scalastyle execution: Unable to find configuration file at location scalastyle-config.xml -> [Help 1]
Solution:
Copying [spark_root]/scalastyle-config.xml to [spark_root]/examples/scalastyle-config.xml can solve the problem.
>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Pbigtop-dist -DskipTests clean package
>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package install
Changes in Resolver.scala:
var mavenLocal = Resolver.mavenLocal
I set it up and have it running in batch mode on a Spark standalone cluster and on a YARN cluster. I will keep working on streaming mode and dynamic SQL. All the base core codes are in the project sillycat-spark now.
References:
Spark
Spark deployment
spark test
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
http://stackoverflow.com/questions/26170957/using-funsuite-to-test-spark-throws-nullpointerexception
http://blog.quantifind.com/posts/spark-unit-test/
spark docs
http://www.sparkexpert.com/
https://github.com/sujee81/SparkApps
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
http://dataunion.org/category/tech/spark-tech
http://dataunion.org/6308.html
http://endymecy.gitbooks.io/spark-programming-guide-zh-cn/content/spark-sql/README.html
http://zhangyi.farbox.com/post/access-postgresql-based-on-spark-sql
https://github.com/mkuthan/example-spark.git