starksummer 2015-05-20
Spark(8) Non Fat Jar / Cassandra Cluster Issue and Spark Version 1.3.1
1. Can We Upgrade to Java 8?
Fix the BouncyCastleProvider Problem
Visit https://www.bouncycastle.org/latest_releases.html and download the file bcprov-jdk15on-152.jar.
Place the file in this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/ext
Then go to this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/security
and edit this file:
>sudo vi java.security
Add this line:
security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider
I should download this file:
http://repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk15%2b/1.46/bcprov-jdk15%2b-1.46.jar
Fix the JCE Problem
Download the file from here:
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
Unzip the file and place the jars in this directory:
/usr/lib/jvm/java-8-oracle/jre/lib/security
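To confirm the provider was actually picked up after editing java.security, a small standalone check can help. This is my own sketch (the class name ProviderCheck is mine); it assumes BouncyCastle registers under its standard short name "BC":

```java
import java.security.Provider;
import java.security.Security;

public class ProviderCheck {
    public static void main(String[] args) {
        // List every JCA provider registered in this JRE; after the
        // java.security edit, BouncyCastleProvider should be among them.
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName() + " " + p.getVersion());
        }
        // Look the provider up explicitly; returns null when not installed.
        Provider bc = Security.getProvider("BC");
        System.out.println("BC installed: " + (bc != null));
    }
}
```

If the last line prints `BC installed: false`, re-check the jar location under jre/lib/ext and the provider index in java.security.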
2. Fat Jar?
https://github.com/apache/spark/pull/288?
https://issues.apache.org/jira/browse/SPARK-1154
http://apache-spark-user-list.1001560.n3.nabble.com/Clean-up-app-folders-in-worker-nodes-td20889.html
https://spark.apache.org/docs/1.0.1/spark-standalone.html
Based on my understanding, we should keep using the assembly jar in Scala and submit the task job to the master, which will distribute the jobs to the Spark standalone cluster or the YARN cluster. The clients should not require any setup or jar dependencies.
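For reference, if one did want a fat jar on the client side anyway, the Maven shade plugin is a common way to build one. A minimal pom.xml sketch (my own, not taken from the Spark build; the idea is that dependencies marked provided, such as Spark itself, stay out of the assembly):

```xml
<!-- Build an uber jar at package time; provided-scope deps are excluded. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```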
3. Cluster Sync Issue in Cassandra 1.2.13
http://stackoverflow.com/questions/23345045/cassandra-cas-delete-does-not-work
http://wiki.apache.org/cassandra/DistributedDeletes
We need to use ntpd to sync the clocks.
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
https://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/
In a Cassandra cluster, all the nodes perform write operations with timestamps. If the system times differ across the cluster nodes, Cassandra can run into a weird state: sometimes deletes and updates do not work.
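The failure mode above can be sketched with a toy last-write-wins store (my own illustration, not Cassandra's actual code): every write carries the writing node's local timestamp, and the cell with the highest timestamp wins, so a node with a fast clock can silently shadow later writes and deletes from other nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy last-write-wins cell store showing why clock skew breaks updates.
public class LwwDemo {
    static class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static Map<String, Cell> table = new HashMap<>();

    // A write only lands if its timestamp is >= the stored cell's timestamp.
    static void write(String key, String value, long nodeClock) {
        Cell current = table.get(key);
        if (current == null || nodeClock >= current.timestamp) {
            table.put(key, new Cell(value, nodeClock));
        }
    }

    public static void main(String[] args) {
        // Node A's clock runs 5 seconds ahead of node B's.
        write("user1", "old", 10_000); // node A, skewed-fast clock
        write("user1", "new", 5_000);  // node B writes later in real time,
                                       // but with a smaller timestamp
        System.out.println(table.get("user1").value); // prints "old"
    }
}
```

The second write happens later in wall-clock time but loses, which is exactly why deletes (tombstones are just timestamped writes too) can appear not to work until ntpd brings the clocks back in line.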
4. Upgrade to Version 1.3.1
https://spark.apache.org/docs/latest/
Download the Spark source file:
>wget http://apache.cs.utah.edu/spark/spark-1.3.1/spark-1.3.1.tgz
Unzip and place the Spark directory in the working directory:
>sudo ln -s /opt/spark-1.3.1 /opt/spark
My Java version and Scala version are as follows:
>java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
>scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
Build the binary:
>build/sbt clean
>build/sbt compile
The compile is not working for lack of dependencies. I will not spend time on that; I will directly download the binary.
>wget http://www.motorlogy.com/apache/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
Unzip it and add it to the classpath.
Then my project sillycat-spark can easily run.
Simple Spark Cluster
Download the source file:
>wget http://apache.cs.utah.edu/spark/spark-1.3.1/spark-1.3.1.tgz
Build the source:
>build/sbt clean
>build/sbt compile
It does not build on Ubuntu either. Using the binary instead:
>wget http://www.motorlogy.com/apache/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz
Prepare Configuration
Go to the conf directory:
>cp spark-env.sh.template spark-env.sh
>cp slaves.template slaves
>cat slaves
# A Spark Worker will be started on each of the machines listed below.
ubuntu-dev1
ubuntu-dev2
>cat spark-env.sh
export SPARK_WORKER_MEMORY=768m
export SPARK_JAVA_OPTS="-Dbuild.env=lmm.sparkvm"
export USER=carl
Copy the same settings to all the slaves:
>scp -r ubuntu-master:/home/carl/tool/spark-1.3.1-hadoop2.6 ./
Call the shell script to start the standalone cluster:
>sbin/start-all.sh
How to build
https://spark.apache.org/docs/1.1.0/building-with-maven.html
>mvn -DskipTests clean package
It builds successfully.
Build with YARN, Hive, and JDBC support:
>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package
Go to the directory:
>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package install
Error Message:
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-assembly_2.10: Failed during scalastyle execution: Unable to find configuration file at location scalastyle-config.xml -> [Help 1]
Solution:
Copying [spark_root]/scalastyle-config.xml to [spark_root]/examples/scalastyle-config.xml can solve the problem.
>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Pbigtop-dist -DskipTests clean package
>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests clean package install
Changes in Resolver.scala:
var mavenLocal = Resolver.mavenLocal
I set it up and have it running in batch mode on a Spark standalone cluster and on a YARN cluster. I will keep working on streaming mode and dynamic SQL. All the base core codes are in the project sillycat-spark now.
References:
Spark
Spark deployment
spark test
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
http://stackoverflow.com/questions/26170957/using-funsuite-to-test-spark-throws-nullpointerexception
http://blog.quantifind.com/posts/spark-unit-test/
spark docs
http://www.sparkexpert.com/
https://github.com/sujee81/SparkApps
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
http://dataunion.org/category/tech/spark-tech
http://dataunion.org/6308.html
http://endymecy.gitbooks.io/spark-programming-guide-zh-cn/content/spark-sql/README.html
http://zhangyi.farbox.com/post/access-postgresql-based-on-spark-sql
https://github.com/mkuthan/example-spark.git