Trying to set up PySpark on Windows. Need help. (hexbear.net)

submitted 2 months ago* (last edited 2 months ago) by Tomorrow_Farewell@hexbear.net to c/technology@hexbear.net

12 comments fedilink hide all child comments

Have been trying to set it up for hours now. Nothing works.

Latest version does not seem to have winutils support, and using it causes errors when using some important methods. (EDIT: this is likely wrong, and the winutils stuff that I have should probably be fine.)
Older versions require to be built with Maven. However, that just gives me a PluginExecutionException.

I need to do this ASAP, preferably within the next 3 hours.

I have nowhere else to ask for help, it seems, especially considering that reddit-logo suspended an account I set up specifically for asking questions after I edited a relevant post.

Highly doubt that anybody will be able to help me.

EDIT2: the issue has, thankfully, been resolved. I was using Python 3.12, and switched to 3.11.8. That made the problem go away.

you are viewing a single comment's thread
view the rest of the comments

[-] Tomorrow_Farewell@hexbear.net 1 points 2 months ago

Just in case, if I install the library the first way, for the same piece of code the logs start with this:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
root
 |-- a: long (nullable = true)
 |-- b: double (nullable = true)
 |-- c: string (nullable = true)
 |-- d: date (nullable = true)
 |-- e: timestamp (nullable = true)

24/07/22 19:04:46 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:612)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:594)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:789)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCo*removed*tage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:842)
Caused by: java.io.EOFException
	at java.base/java.io.DataInputStream.readInt(DataInputStream.java:398)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
	... 26 more

[-] Edie@hexbear.net 2 points 2 months ago* (last edited 2 months ago)

One stackoverflow thread mentions running python 3.12. Are you running 3.12? Does it help if you use python 3.11?

Else, an interesting thing might be the "Caused by: java.io.EOFException", an End Of File Exception.

[-] Tomorrow_Farewell@hexbear.net 2 points 2 months ago

That actually worked. Thank you.

I have got to say that I have a special hatred for arcane programming errors like in this case.

[-] Tomorrow_Farewell@hexbear.net 1 points 2 months ago

I am running 3.12. I have not tried running 3.11. Highly doubt that that will change anything, but I guess I'll try it when I'm able to.

I'm not sure what I have to glean from the EOF exception.

this post was submitted on 22 Jul 2024

10 points (100.0% liked)

technology

23238 readers

251 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

1. Obviously abide by the sitewide code of conduct. Bigotry will be met with an immediate ban
2. This community is about technology. Offtopic is permitted as long as it is kept in the comment sections
3. Although this is not /c/libre, FOSS related posting is tolerated, and even welcome in the case of effort posts
4. We believe technology should be liberating. As such, avoid promoting proprietary and/or bourgeois technology
5. Explanatory posts to correct the potential mistakes a comrade made in a post of their own are allowed, as long as they remain respectful
6. No crypto (Bitcoin, NFT, etc.) speculation, unless it is purely informative and not too cringe
7. Absolutely no tech bro shit. If you have a good opinion of Silicon Valley billionaires please manifest yourself so we can ban you.

founded 4 years ago

MODERATORS

Jadzia_Dax@hexbear.net

blashork@hexbear.net

context@hexbear.net

EmmaGoldman@hexbear.net

SexUnderSocialism@hexbear.net

gaycomputeruser@hexbear.net

ZoomeristLeninist@hexbear.net