10
submitted 2 months ago* (last edited 2 months ago) by Tomorrow_Farewell@hexbear.net to c/technology@hexbear.net

Have been trying to set it up for hours now. Nothing works.

  • Latest version does not seem to have winutils support, and using it causes errors when using some important methods. (EDIT: this is likely wrong, and the winutils stuff that I have should probably be fine.)
  • Older versions require to be built with Maven. However, that just gives me a PluginExecutionException.

I need to do this ASAP, preferably within the next 3 hours.

I have nowhere else to ask for help, it seems, especially considering that reddit-logo suspended an account I set up specifically for asking questions after I edited a relevant post.

Highly doubt that anybody will be able to help me.

EDIT2: the issue has, thankfully, been resolved. I was using Python 3.12, and switched to 3.11.8. That made the problem go away.

you are viewing a single comment's thread
view the rest of the comments
[-] Tomorrow_Farewell@hexbear.net 1 points 2 months ago

Just in case, if I install the library the first way, for the same piece of code the logs start with this:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
root
 |-- a: long (nullable = true)
 |-- b: double (nullable = true)
 |-- c: string (nullable = true)
 |-- d: date (nullable = true)
 |-- e: timestamp (nullable = true)

24/07/22 19:04:46 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:612)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:594)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:789)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCo*removed*tage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:842)
Caused by: java.io.EOFException
	at java.base/java.io.DataInputStream.readInt(DataInputStream.java:398)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
	... 26 more
[-] Edie@hexbear.net 2 points 2 months ago* (last edited 2 months ago)

One stackoverflow thread mentions running python 3.12. Are you running 3.12? Does it help if you use python 3.11?

Else, an interesting thing might be the "Caused by: java.io.EOFException", an End Of File Exception.

[-] Tomorrow_Farewell@hexbear.net 2 points 2 months ago

That actually worked. Thank you.

I have got to say that I have a special hatred for arcane programming errors like in this case.

[-] Tomorrow_Farewell@hexbear.net 1 points 2 months ago

I am running 3.12. I have not tried running 3.11. Highly doubt that that will change anything, but I guess I'll try it when I'm able to.

I'm not sure what I have to glean from the EOF exception.

this post was submitted on 22 Jul 2024
10 points (100.0% liked)

technology

23238 readers
251 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

founded 4 years ago
MODERATORS