Add kurt and sem functions (#32)
* Added implementation of kurt function (#23)

* Added implementation of sem function (#22)

* Refactored kurt and sem functions

---------

Co-authored-by: Miguel Gómez <[email protected]>
Co-authored-by: Francisco Tórtola Vivo <[email protected]>
3 people authored Feb 13, 2024
1 parent 460a218 commit bb1549a
Showing 3 changed files with 387 additions and 0 deletions.
187 changes: 187 additions & 0 deletions docs/user-guide/advanced/Pandas_API.ipynb
@@ -436,6 +436,91 @@
"tab.mean(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "fe565b65-fbf2-47ba-a26e-791d09fd4f55",
"metadata": {},
"source": [
"### Table.kurt()\n",
"\n",
"```\n",
"Table.kurt(axis=0, skipna=True, numeric_only=False)\n",
"```\n",
"\n",
"Return the unbiased kurtosis over the requested axis, using Fisher’s definition of kurtosis (the kurtosis of a normal distribution is 0.0). Normalized by N-1.\n",
"\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | Axis for the function to be applied on. 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Not yet implemented. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :--------: | :--------------------------------------------------------------------------------------- |\n",
"| Dictionary | The kurtosis across each row / column with the key corresponding to the row number or column name. |"
]
},
{
"cell_type": "markdown",
"id": "e6069cac-d260-4f80-9688-3d1ec273cd22",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the kurtosis across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4219c826-a84b-4722-9847-372d3837acdb",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'a': [1, 2, 2, 4],\n",
" 'b': [1, 2, 6, 7],\n",
" 'c': [7, 8, 9, 10],\n",
" 'd': [7, 11, 14, 14]\n",
" }\n",
")\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "437ab485-bf73-4209-b63e-aa0d1bfa5d58",
"metadata": {},
"outputs": [],
"source": [
"tab.kurt()"
]
},
{
"cell_type": "markdown",
"id": "ea3e1cf6-2304-4061-a846-1cbc0572ea9d",
"metadata": {},
"source": [
"Calculate the kurtosis across the rows of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63312e8b-76f0-46eb-b4d7-b2213561c86e",
"metadata": {},
"outputs": [],
"source": [
"tab.kurt(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "7bf853c5",
@@ -646,6 +731,108 @@
"tab.mode(dropna=False)"
]
},
{
"cell_type": "markdown",
"id": "b248fef1",
"metadata": {},
"source": [
"### Table.sem()\n",
"\n",
"```\n",
"Table.sem(axis=0, skipna=True, numeric_only=False, ddof=1)\n",
"```\n",
"\n",
"Return the unbiased standard error of the mean over the requested axis. Normalized by N-1 by default. This can be changed using the ddof argument.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n",
"| axis | int | The axis to calculate the sem across. 0 is columns, 1 is rows. | 0 |\n",
"| skipna | bool | Not yet implemented. | True |\n",
"| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n",
"| ddof | int | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. | 1 |\n",
"\n",
"**Returns:**\n",
"\n",
"| Type | Description |\n",
"| :----------------: | :------------------------------------------------------------------- |\n",
"| Dictionary | The sem across each row / column with the key corresponding to the row number or column name. |"
]
},
{
"cell_type": "markdown",
"id": "71bd1d6f",
"metadata": {},
"source": [
"**Examples:**\n",
"\n",
"Calculate the sem across the columns of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "350c2b7c",
"metadata": {},
"outputs": [],
"source": [
"tab = kx.Table(data=\n",
" {\n",
" 'a': [1, 2, 2, 4],\n",
" 'b': [1, 2, 6, 7],\n",
" 'c': [7, 8, 9, 10],\n",
" 'd': [7, 11, 14, 14],\n",
" }\n",
" )\n",
"tab"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b89307e9",
"metadata": {},
"outputs": [],
"source": [
"tab.sem()"
]
},
{
"cell_type": "markdown",
"id": "6933f01f",
"metadata": {},
"source": [
"Calculate the sem across the rows of a table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3edd3feb",
"metadata": {},
"outputs": [],
"source": [
"tab.sem(axis=1)"
]
},
{
"cell_type": "markdown",
"id": "ae7afe5a",
"metadata": {},
"source": [
"Calculate the sem across the columns with ddof=0:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de626961",
"metadata": {},
"outputs": [],
"source": [
"tab.sem(ddof=0)"
]
},
{
"cell_type": "markdown",
"id": "7e2813b4",
47 changes: 47 additions & 0 deletions src/pykx/pandas_api/pandas_meta.py
@@ -154,6 +154,32 @@ def mean(self, axis: int = 0, numeric_only: bool = False):
            tab
        )

    @api_return
    def kurt(self, axis: int = 0, numeric_only: bool = False):
        tab = self
        if 'Keyed' in str(type(tab)):
            tab = q.value(tab)
        if numeric_only:
            tab = _get_numeric_only_subtable(tab)

        axis_keys = q('{[axis;tab] $[0~axis;cols;`$string til count @] tab}', axis, tab)

        return q(
            '''{[tab;axis;axis_keys]
                tab:$[0~axis;(::);flip] value flip tab;
                kurt:{[x]
                    res: x - avg x;
                    n: count x;
                    m2: sum rsq: res xexp 2;
                    m4: sum rsq xexp 2;
                    adj: 3 * xexp[n - 1;2] % (n - 2) * (n - 3);
                    num: n * (n + 1) * (n - 1) * m4;
                    den: (n - 2) * (n - 3) * m2 xexp 2;
                    (num % den) - adj};
                axis_keys!kurt each tab}
            ''', tab, axis, axis_keys
        )

    @api_return
    def median(self, axis: int = 0, numeric_only: bool = False):
        tab = self
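
Review note: for readers less familiar with q, the arithmetic in the `kurt` lambda above can be read against a plain-Python sketch of the same steps. The helper name `sample_excess_kurtosis` and the NaN guards are assumptions added for illustration; they are not part of PyKX or of this commit, and the q code itself has no such guards.

```python
# Plain-Python sketch of the sample excess kurtosis (Fisher's definition)
# computed step by step like the q lambda in the hunk above.
# `sample_excess_kurtosis` is a hypothetical name, not a PyKX function.

def sample_excess_kurtosis(values):
    n = len(values)
    if n < 4:
        # Assumption: return NaN for short inputs, since (n - 2) * (n - 3)
        # is zero for n in {2, 3}. The q lambda does not guard this case.
        return float("nan")
    mean = sum(values) / n
    squared = [(v - mean) ** 2 for v in values]   # rsq in the q code
    m2 = sum(squared)                             # sum of squared residuals
    m4 = sum(s ** 2 for s in squared)             # sum of fourth powers
    if m2 == 0:
        # Assumption: constant input has no defined kurtosis.
        return float("nan")
    adj = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    num = n * (n + 1) * (n - 1) * m4
    den = (n - 2) * (n - 3) * m2 ** 2
    return num / den - adj


if __name__ == "__main__":
    # Column 'c' from the notebook example; the result is approximately -1.2.
    print(sample_excess_kurtosis([7, 8, 9, 10]))
```

The variable names mirror the q code (`m2`, `m4`, `adj`, `num`, `den`) so the two can be compared line by line.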
@@ -203,6 +229,27 @@ def mode(self, axis: int = 0, numeric_only: bool = False, dropna: bool = True):
            tab
        )

    @api_return
    def sem(self, axis: int = 0, ddof: int = 1, numeric_only: bool = False):
        tab = self
        if 'Keyed' in str(type(tab)):
            tab = q.value(tab)
        if numeric_only:
            tab = _get_numeric_only_subtable(tab)

        axis_keys = q('{[axis;tab] $[0~axis;cols;`$string til count @] tab}', axis, tab)

        if ddof == len(tab):
            return q('{x!count[x]#0n}', axis_keys)

        return q(
            '''{[tab;axis;ddof;axis_keys]
                tab:$[0~axis;(::);flip] value flip tab;
                d:{dev[x] % sqrt count[x] - y}[;ddof];
                axis_keys!d each tab}
            ''', tab, axis, ddof, axis_keys
        )

    @api_return
    def abs(self, numeric_only=False):
        tab = self
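
Review note: the `sem` lambda above divides q's `dev` (the population standard deviation) by `sqrt(N - ddof)`, which is algebraically the same as dividing the `ddof`-adjusted standard deviation by `sqrt(N)`. A minimal Python sketch of both forms follows. The helper names `standard_error` and `textbook_standard_error` are hypothetical, and the `n - ddof <= 0` guard is an assumption that is broader than the `ddof == len(tab)` check in the commit.

```python
import math

# Plain-Python sketch of the standard error of the mean as computed by the
# q lambda in the hunk above. Helper names are hypothetical, not PyKX API.

def standard_error(values, ddof=1):
    n = len(values)
    if n == 0 or n - ddof <= 0:
        # Assumption: return NaN whenever the divisor would be non-positive.
        return float("nan")
    mean = sum(values) / n
    # Population standard deviation, the quantity q's `dev` returns.
    pop_dev = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return pop_dev / math.sqrt(n - ddof)


def textbook_standard_error(values, ddof=1):
    # Equivalent form: ddof-adjusted standard deviation over sqrt(n).
    n = len(values)
    mean = sum(values) / n
    adj_dev = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - ddof))
    return adj_dev / math.sqrt(n)


if __name__ == "__main__":
    column_c = [7, 8, 9, 10]  # column 'c' from the notebook example
    print(standard_error(column_c), textbook_standard_error(column_c))
    print(standard_error(column_c, ddof=0))
```

Both functions agree because `sqrt(S / N) / sqrt(N - ddof)` and `sqrt(S / (N - ddof)) / sqrt(N)` reduce to `sqrt(S / (N * (N - ddof)))`, where `S` is the sum of squared residuals.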