updates to README and small bug fix
orellabac committed Dec 15, 2023
1 parent 413c34b commit f1dcb1e
Showing 6 changed files with 42 additions and 9 deletions.
6 changes: 6 additions & 0 deletions CHANGE_LOG.txt
@@ -182,3 +182,9 @@ Version 0.0.34
Version 0.0.35
--------------
- added functions.to_utc_timestamp extension

Version 0.0.36
--------------
- fixed an issue with the `%%sql` magic that was causing a double execution
- updated README.md for PMML
- updated the `extras/wheel_loader`, thanks Karol Tarcak
10 changes: 5 additions & 5 deletions README.md
@@ -2,7 +2,10 @@

Snowpark by itself is a powerful library, but some utility functions can always help.

BTW what about `Java`/`Scala`/`SQL`? There is an [additional repo](https://github.com/Snowflake-Labs/snowpark-extensions "Snowpark Extensions for Java, Scala and SQL") where you will also find utility functions and extensions for those technologies.

**NOTE: we have been working to integrate some of the snowpark extensions directly into the snowpark-python library.
In most cases the APIs will be exactly the same, so there should be no changes needed in your code.
However, there might be breaking changes, so consider that before updating.
If any of these breaking changes affect you, please file an issue so we can address it.**
@@ -59,7 +62,6 @@ import snowpark_extensions
new_session = Session.builder.env().appName("app1").create()
```


> NOTE: since 1.8.0 the [python connector was updated](https://docs.snowflake.com/en/release-notes/clients-drivers/python-connector-2023#version-3-1-0-july-31-2023) and with this approach we provide support for a unified configuration storage for `snowflake-python-connector` and `snowflake-snowpark-python`.
>
> You can use these connections by leveraging `Session.builder.getOrCreate()` or `Session.builder.create()`
@@ -98,8 +100,6 @@ new_session = Session.builder.env().appName("app1").create()
>
> ```
The `appName` can be used to set up a `query_tag` like `APPNAME=tag;execution_id=guid`, which can then be used to track job actions.
You can then use a query like:
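As an aside, the tag format above can be composed with a small helper; `build_query_tag` below is an illustrative sketch, not part of the library:

```python
import uuid

def build_query_tag(app_name: str) -> str:
    # Compose a tag in the APPNAME=tag;execution_id=guid format described above.
    return f"APPNAME={app_name};execution_id={uuid.uuid4()}"

tag = build_query_tag("app1")
# tag looks like: APPNAME=app1;execution_id=<random guid>
```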
@@ -392,7 +392,7 @@ sf_df = session.createDataFrame([(1, ["foo", "bar"], {"x": 1.0}), (2, [], {}), (
```

```
# +---+----------+----------+
# | id| an_array| a_map|
# +---+----------+----------+
# | 1|[foo, bar]|{x -> 1.0}|
21 changes: 20 additions & 1 deletion extras/pmml/README.md
@@ -4,10 +4,10 @@ Predictive Model Markup Language ([PMML](https://en.wikipedia.org/wiki/Predictiv

To make these models easy to use in Snowpark, a simple helper has been provided.


NOTE: this helper requires some JAR files in order to perform the PMML loading and scoring.

These libraries can be downloaded from Maven and should be uploaded into a stage:

```
https://mvnrepository.com/artifact/org.pmml4s/pmml4s
https://repo1.maven.org/maven2/org/pmml4s/pmml4s_2.12/1.0.1/pmml4s_2.12-1.0.1.jar
@@ -20,6 +20,7 @@ These libraries can be downloaded from maven and they should be uploaded into an
```

To use it in your code, you can use a snippet like:

```
from pmml_builder import ScoreModelBuilder
scorer = ScoreModelBuilder() \
@@ -30,4 +31,22 @@ scorer = ScoreModelBuilder() \
scorer.transform(df).show()
```

# NOTE:

In some scenarios the number of input or output parameters can vary. To simplify usage in those situations we provide a [helper UDF](https://github.com/Snowflake-Labs/snowpark-extensions-py/blob/main/extras/pmml/PMML_SCORER.sql).

To use that helper you will first need to upload the following libraries:

* SCALA_LIB, for example `scala-library-2.12.17.jar`, into a stage; you can [check maven](https://mvnrepository.com/artifact/org.scala-lang/scala-library)
* PMMLS_LIB, for example `pmml4s_2.12-1.0.1.jar`, into a stage; you can [check maven](https://mvnrepository.com/artifact/org.pmml4s/pmml4s)
* SPRAY_LIB, for example `spray-json_2.12-1.3.6.jar`, into a stage; you can [check maven](https://mvnrepository.com/artifact/io.spray/spray-json)

Using it from Python is then straightforward:

```python
from snowflake.snowpark.functions import sql_expr, lit, col

(table_with_data
    .select(sql_expr("object_construct(*)").alias("INPUT_DATA"))
    .join_table_function("PMML_SCORER", lit("@STAGE/path/to/model"), col("INPUT_DATA")))
```
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@
this_directory = Path(__file__).parent
long_description = (this_directory / "README.md").read_text()

VERSION = '0.0.35'
VERSION = '0.0.36'

setup(name='snowpark_extensions',
version=VERSION,
9 changes: 8 additions & 1 deletion snowpark_extensions/__init__.py
@@ -15,6 +15,7 @@ def register_sql_magic():
from IPython.core.magic import register_cell_magic
def sql(line, cell):
import IPython
import re
user_ns = IPython.get_ipython().user_ns
if "session" in user_ns:
session = user_ns['session']
@@ -24,7 +25,13 @@ def sql(line, cell):
name = None
if line and line.strip():
name = line.strip().split(" ")[0]
df = session.sql(res)
# If there are several statements, only the last one will be returned.
# We also remove any trailing ';' to avoid issues with empty statements.
res = re.sub(r';+$', '', res)
for cursor in session.connection.execute_string(res):
df = session.sql(f"SELECT * FROM TABLE(RESULT_SCAN('{cursor.sfqid}'))")
# to avoid needing to do a count on display
setattr(df,"_cached_rowcount",cursor.rowcount)
if name:
user_ns[name] = df
else:
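The trailing-semicolon part of the fix above can be isolated into a small function for clarity; this sketch uses the same regex as the magic:

```python
import re

def strip_trailing_semicolons(sql_text: str) -> str:
    # Drop any run of ';' at the very end so the driver is not handed
    # an empty trailing statement when executing the cell.
    return re.sub(r';+$', '', sql_text)

print(strip_trailing_semicolons("SELECT 1;;"))  # SELECT 1
```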
3 changes: 2 additions & 1 deletion snowpark_extensions/dataframe_extensions.py
@@ -47,7 +47,8 @@ def _repr_html_(self):
else:
from IPython.display import display
try:
count = self.count()
count = self._cached_rowcount if hasattr(self,"_cached_rowcount") else self.count()
if count == 0:
return "No rows to display"
elif count == 1:
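The rowcount-caching pattern in the change above can be sketched in isolation; `FakeDataFrame` is a hypothetical stand-in, and no Snowflake session is involved:

```python
class FakeDataFrame:
    """Hypothetical stand-in for a Snowpark DataFrame."""
    def __init__(self, rows):
        self._rows = rows

    def count(self):
        # In Snowpark this would be a round trip to the server.
        return len(self._rows)

df = FakeDataFrame([10, 20, 30])
# The %%sql magic stores the cursor's rowcount on the DataFrame...
setattr(df, "_cached_rowcount", 3)
# ...so display code can skip the extra count() round trip:
count = df._cached_rowcount if hasattr(df, "_cached_rowcount") else df.count()
print(count)  # 3
```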
