- Developer Notes for Krangl
- Interactive shell
- Potentially useful libraries
- Design
- Comparison to other APIs
NaN vs None in pandas: https://stackoverflow.com/questions/17534106/what-is-the-difference-between-nan-and-none
// class Foo {
//     fun bar(predicate: (Int) -> String) {}
//     fun bar(predicate: (Int) -> Boolean) {}
// }
kscript -i - <<"EOF"
//DEPS de.mpicbg.scicomp:krangl:0.9-SNAPSHOT
EOF
- https://github.com/mplatvoet/progress
- https://github.com/SalomonBrys/Kodein cool dependency injection
- https://github.com/hotchemi/khronos date extension
- https://github.com/zeroturnaround/zt-exec cool process builder api
https://stackoverflow.com/questions/45090808/intarray-vs-arrayint-in-kotlin --> bottom line: Array<Int> holds boxed values and, declared as Array<Int?>, can contain nulls; IntArray is a primitive int[] and cannot
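A minimal sketch of that difference, using only the Kotlin standard library. The helper `meanIgnoringNulls` is a hypothetical name for illustration and is not part of krangl.

```kotlin
// IntArray compiles to a primitive int[] on the JVM: no boxing, no nulls.
val primitives: IntArray = intArrayOf(1, 2, 3)

// Array<Int?> compiles to Integer[]: elements are boxed and may be null.
val boxed: Array<Int?> = arrayOf(1, null, 3)

// Hypothetical helper: mean over the non-null elements only.
fun meanIgnoringNulls(values: Array<Int?>): Double =
    values.filterNotNull().average()

fun main() {
    println(primitives.sum())
    println(boxed.count { it == null })
    println(meanIgnoringNulls(boxed))
}
```

For column storage this suggests a trade-off: primitive arrays are compact but need a separate way to mark missing values, while boxed arrays can use null directly.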
How to write vector utilities?
dataFrame.summarize("mean_salary") { mean(it["salary"]) } // function parameter
dataFrame.summarize("mean_salary") { it["salary"].mean() } // extension/member function
dataFrame.summarize("mean_salary") { it["salary"].mean } // extension property
???
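The three call styles can be compared side by side in a small sketch. It assumes a plain `List<Double>` as a stand-in for a numeric column, which is not krangl's actual column type. The free function is named `meanOf` (a hypothetical name) because a top-level `mean(values)` and an extension `values.mean()` would compile to the same JVM signature.

```kotlin
// Style 1: free function that takes the column as a parameter.
fun meanOf(values: List<Double>): Double = values.average()

// Style 2: extension function on the column type.
fun List<Double>.mean(): Double = average()

// Style 3: extension property on the column type.
val List<Double>.mean: Double
    get() = average()

fun main() {
    val salary = listOf(1000.0, 2000.0, 3000.0)
    println(meanOf(salary))  // function parameter
    println(salary.mean())   // extension function
    println(salary.mean)     // extension property
}
```

All three compute the same value; they differ only in how the call reads and in how well they are discovered through code completion.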
Don't overload operator Any?.plus
--> Confusion
https://kotlinlang.org/docs/reference/operator-overloading.html
create fresh gradle wrapper with:
gradle wrapper --gradle-version 4.2.1
From twosigma/beakerx#5135: Split repos?
It is a bad idea. Many different repos are hard to maintain. And you do not need this. Gradle allows to publish separate artifacts without splitting repository.
You can use `gradle :kernel:base:<whatever>` instead of `cd`.
To improve JVM compatibility, use @JvmName to allow for more strongly typed overloads:
@JvmName("mutateString")
fun DataFrame.mutate(name: String, formula: (DataFrame) -> List<String>): DataFrame {
    if (this is SimpleDataFrame) {
        return addColumn(StringCol(name, formula(this)))
    } else {
        throw UnsupportedOperationException()
    }
}
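A self-contained sketch of why the annotation matters. Two overloads that differ only in the lambda's return type erase to the same JVM signature and would be rejected as a platform declaration clash; giving each a distinct `@JvmName` resolves that. The function `derive` and both JVM names are hypothetical and only illustrate the technique.

```kotlin
// Both overloads take a Function1 and return a List on the JVM, so after
// type erasure they would share one signature. Distinct JVM names fix that.
@JvmName("deriveStrings")
fun derive(rows: Int, formula: (Int) -> List<String>): List<String> = formula(rows)

@JvmName("deriveDoubles")
fun derive(rows: Int, formula: (Int) -> List<Double>): List<Double> = formula(rows)

fun main() {
    // Explicitly typed function values select the overload unambiguously.
    val labels: (Int) -> List<String> = { n -> List(n) { "row$it" } }
    val squares: (Int) -> List<Double> = { n -> List(n) { (it * it).toDouble() } }
    println(derive(3, labels))
    println(derive(3, squares))
}
```

The sketch passes typed function values rather than inline lambdas, since choosing an overload purely from an inline lambda's return type may be ambiguous for the compiler.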
And the same in pandas. {PR needed here}
- `rename()` will preserve column positions, whereas `dplyr::rename` adds renamed columns to the end of the table.
- The mapping order is inverted in `rename()`. Instead of `dplyr::rename(data, new_name=old_name)`, the krangl syntax is inverted to be more readable: `data.rename("old_name" to "new_name")`.
- `sortedBy()` will sort by grouping attributes first, and then per group with the provided sorting attributes.
- `select()` does not silently ignore multiple selections of the same column, but throws an error instead.
- `select()` will throw an error if a grouping column is being removed (see dplyr ticket).
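The rename and select semantics listed above can be sketched on plain column-name lists. The helpers `renameColumns` and `selectColumns` are hypothetical and are not krangl's implementation; they only illustrate position-preserving renames with `"old" to "new"` pairs and the error on duplicate selections.

```kotlin
// Hypothetical helper: renames columns in place so positions are preserved.
fun renameColumns(columns: List<String>, vararg rules: Pair<String, String>): List<String> {
    val mapping = rules.toMap()
    // Reject rules that refer to columns that do not exist.
    val unknown = mapping.keys - columns.toSet()
    require(unknown.isEmpty()) { "Unknown columns: $unknown" }
    return columns.map { mapping[it] ?: it }
}

// Hypothetical helper: fails on duplicate selections instead of ignoring them.
fun selectColumns(columns: List<String>, vararg selected: String): List<String> {
    val duplicates = selected.groupingBy { it }.eachCount().filterValues { it > 1 }.keys
    require(duplicates.isEmpty()) { "Columns selected more than once: $duplicates" }
    require(columns.containsAll(selected.toList())) { "Unknown column in selection" }
    return selected.toList()
}

fun main() {
    val columns = listOf("id", "old_name", "salary")
    println(renameColumns(columns, "old_name" to "new_name"))
    println(runCatching { selectColumns(columns, "id", "id") }.isFailure)
}
```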
From spark release notes:
Unifying DataFrames and Datasets in Scala/Java: Starting in Spark 2.0, DataFrame is just a type alias for Dataset of Row. Both the typed methods (e.g. map, filter, groupByKey) and the untyped methods (e.g. select, groupBy) are available on the Dataset class. Also, this new combined Dataset interface is the abstraction used for Structured Streaming. Since compile-time type-safety in Python and R is not a language feature, the concept of Dataset does not apply to these languages’ APIs. Instead, DataFrame remains the primary programming abstraction, which is analogous to the single-node data frame notion in these languages. Get a peek from a Dataset API notebook.
- https://github.com/jtablesaw/tablesaw, which is supposedly "the simplest way to slice data in Java"
Feature | Krangl | TableSaw |
---|---|---|
Kotlin API | Yes | Yes |
Add column | df. | |
Select columns by type | | |
Select columns by type
- krangl
df.select(
- tablesaw
val df = Dataframe(df.structure().target.selectWhere(column("Column Type").isEqualTo("INTEGER")))
dev scratchpad
export KRANGL_HOME=/d/projects/misc/krangl/
cd $KRANGL_HOME/..
# start kernel
cmd.exe "/K" C:\Users\brandl\Anaconda3\Scripts\activate.bat C:\Users\brandl\Anaconda3
# no longer needed because not part of the ipynb preamble
#rm -rf ~/.ivy2/cache/com.systema/
#rm -rf ~/.ivy2/cache/org.kalasim/
#rm -rf ~/.ivy2/cache/com.github.holgerbrandl/kravis/
#conda install -c jetbrains kotlin-jupyter-kernel
# interactive use
jupyter notebook --kernel=kotlin
#jupyter notebook --kernel=kotlin examples/jupyter/letsplot_example.ipynb
References