Details
Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Description
It would be great to have a DataFrame-friendly equivalent of rdd.zipWithIndex():
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.{LongType, StructField, StructType}
    import org.apache.spark.sql.Row

    // Adds a consecutive index column (starting at `offset`) to a DataFrame,
    // either as the first column (inFront = true) or as the last one.
    def dfZipWithIndex(
      df: DataFrame,
      offset: Int = 1,
      colName: String = "id",
      inFront: Boolean = true
    ) : DataFrame = {
      df.sqlContext.createDataFrame(
        // Pair every row with its 0-based position, then splice the shifted index into the row
        df.rdd.zipWithIndex.map(ln =>
          Row.fromSeq(
            (if (inFront) Seq(ln._2 + offset) else Seq()) ++
            ln._1.toSeq ++
            (if (inFront) Seq() else Seq(ln._2 + offset))
          )
        ),
        // Extend the original schema with a non-nullable LongType column in the same position
        StructType(
          (if (inFront) Array(StructField(colName, LongType, false)) else Array[StructField]()) ++
          df.schema.fields ++
          (if (inFront) Array[StructField]() else Array(StructField(colName, LongType, false)))
        )
      )
    }
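As a rough, Spark-free illustration of what the helper above relies on, the sketch below models a DataFrame as a sequence of partitions, each holding rows as plain Scala sequences (an assumption made so it runs without a cluster; the object and function names are hypothetical). It shows how rdd.zipWithIndex assigns consecutive indices (partition order first, then position within a partition, with each partition's start taken from a prefix sum of partition sizes), and how dfZipWithIndex shifts those indices by `offset` and places them in front of or behind the existing fields.

```scala
// Hypothetical, Spark-free model of RDD.zipWithIndex: a "dataset" is a
// sequence of partitions, each partition a sequence of rows.
object ZipWithIndexSketch {

  // Assign global 0-based indices: partition order first, then position
  // within the partition. Each partition's start index is a prefix sum of
  // the sizes of the partitions before it (Spark needs an extra job to
  // count them when there is more than one partition).
  def zipWithIndex[A](partitions: Seq[Seq[A]]): Seq[Seq[(A, Long)]] = {
    val counts = partitions.map(_.size.toLong)
    val starts = counts.scanLeft(0L)(_ + _) // start index of each partition
    partitions.zip(starts).map { case (part, start) =>
      part.zipWithIndex.map { case (row, i) => (row, start + i) }
    }
  }

  // Mirror of the row splicing in dfZipWithIndex: shift the index by
  // `offset` and put it in front of or behind the existing fields.
  def addIndexColumn(
      partitions: Seq[Seq[Seq[Any]]],
      offset: Long = 1L,
      inFront: Boolean = true
  ): Seq[Seq[Seq[Any]]] =
    zipWithIndex(partitions).map(_.map { case (fields, idx) =>
      val id = idx + offset
      if (inFront) id +: fields else fields :+ id
    })
}
```

For example, with three partitions where the middle one is empty, `ZipWithIndexSketch.addIndexColumn(Seq(Seq(Seq("a"), Seq("b")), Seq(), Seq(Seq("c"))))` puts ids 1, 2 and 3 in front of "a", "b" and "c"; the empty partition simply adds nothing to the prefix sum. That counting step is what a DataFrame-only approach has to reproduce, since the built-in monotonically_increasing_id() yields unique, increasing ids that are not consecutive.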
credits: https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex
Issue Links
is related to: SPARK-24042 High-order function: zip_with_index (Resolved)