Skip to content

Commit

Permalink
issue2551328/issue2551264 unneeded next link and total_count incorrect
Browse files Browse the repository at this point in the history
Fix: issue2551328 - REST results show next link if number of
     results is a multiple of page size. (Found by members of
     team 3 in the UMass-Boston CS682 Spring 2024 class.)

     issue2551264 - REST X-Total-Count header and @total_size
     count incorrect when paginated

These issues arose because we retrieved the exact number of rows
from the database as requested by the user using the @page_size
parameter. With this changeset, we retrieve up to 10 million + 1
rows from the database. If the total number of rows exceeds 10
million, we set the total_count indicators to -1 as an invalid
size.  (The max number of requested rows (default 10 million +1)
can be modified by the admin through interfaces.py.)

By retrieving more data than necessary, we can calculate the
total count by adding @page_index*@page_size to the number of
rows returned by the query.

Furthermore, since we return more than @page_size rows, we can
determine the existence of a row at @page_size+1 and use that
information to determine if a next link should be
provided. Previously, a next link was returned if @page_size rows
were retrieved.

This change does not guarantee that the user will get @page_size
rows returned. Access policy filtering occurs after the rows are
returned, and discards rows inaccessible by the user.

Using the current @page_index/@page_size it would be difficult to
have the roundup code refetch data and make sure that a full
@page_size set of rows is returned. E.G. @page_size=100 and 5 of
them are dropped due to access restrictions. We then fetch 10
items and add items 1-4 and 6 (5 is inaccessible). There is no
way to calculate the new database offset at:
@page_index*@page_size + 6 from the URL. We would need to add an
@page_offset=6 or something.

This could work since the client isn't adding 1 to @page_index to
get the next page. Thanks to HATEOAS, the client just uses the
'next' url. But I am not going to cross that bridge without a
concrete use case.

This can also be handled client side by merging a short response
with the next response and re-paginating client side.

Also added extra index markers to the docs to highlight use of
interfaces.py.
  • Loading branch information
rouilj committed Apr 1, 2024
1 parent d21ee54 commit b6dc3ff
Show file tree
Hide file tree
Showing 7 changed files with 313 additions and 20 deletions.
8 changes: 8 additions & 0 deletions CHANGES.txt
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,14 @@ Fixed:
which clobbers the info about the template issue. As a stop-gap set
the line number to -1 so the original traceback can be seen. This
could be a bug in ZopeTAL. (John Rouillard)
- issue2551328 - REST results show next link if number of results is a
multiple of page size. There should be no next link. (Found by Patel
Malav and ... of the UMass-Boston CS682 Spring 2024 class; fix John
Rouillard)
- issue2551264 - REST X-Total-Count header and @total_size count
incorrect when paginated - correct values are now returned.
(John Rouillard)


Features:

Expand Down
27 changes: 27 additions & 0 deletions doc/admin_guide.txt
Original file line number Diff line number Diff line change
Expand Up @@ -259,6 +259,8 @@ to compress responses on the fly. The python standard library includes
gzip support. For brotli or zstd you will need to install packages. See
the `installation documentation`_ for details.

.. index:: single: interfaces.py; configuring http compression

Some assets will not be compressed on the fly. Assets with mime types
of "image/png" or "image/jpeg" will not be compressed. You
can add mime types to the list by using ``interfaces.py`` as discussed
Expand Down Expand Up @@ -320,6 +322,31 @@ get the gzip version and not a brotli compressed version. This
mechanism allows the admin to allow use of brotli and zstd for
dynamic content, but not for static content.

.. index:: single: interfaces.py; setting REST maximum result limit

Configuring REST Maximum Result Limit
=====================================

To prevent denial of service (DOS) and limit user wait time for an
unbounded request, the REST endpoint has a maximum limit on the number
of rows that can be returned. By default, this is set to 10 million.
This setting applies to all users of the REST interface. If you want
to change this limit, you can add the following code to the
``interfaces.py`` file in your tracker::

# change max response rows
from roundup.rest import RestfulInstance
RestfulInstance.max_response_row_size = 26

This code will set the maximum number of rows to 25 (one less than the
value). Note that this setting is rarely used and is not available in
the tracker's ``config.ini`` file. Setting it through this mechanism
allows you to enter a string or number that may break Roundup, such as
"asdf" or 0. In general, it is recommended to keep the limit at its
default value. However, this option is available for cases when a
request requires more than 10 million rows and pagination using
``@page_index`` and ``@page_size=9999999`` is not possible.

Adding a Web Content Security Policy (CSP)
==========================================

Expand Down
5 changes: 5 additions & 0 deletions doc/customizing.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2176,6 +2176,8 @@ string in the nosy reaction function.
Changing How the Core Code Works
--------------------------------

.. index:: single: interfaces.py; cache-control headers

Changing Cache-Control Headers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand All @@ -2202,6 +2204,7 @@ path in the tracker's `config.ini`. In the example above:

Note that a file name match overrides the mime type settings.

.. index:: single: interfaces.py; password complexity checking

Implement Password Complexity Checking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down Expand Up @@ -2241,6 +2244,8 @@ it replaces the setPassword method in the Password class. The new
version validates that the password is sufficiently complex. Then it
passes off the setting of password to the original method.

.. index:: single: interfaces.py; interpreting time interval values

Enhance Time Intervals Forms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
2 changes: 2 additions & 0 deletions doc/reference.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1556,6 +1556,8 @@ flexible extension point mechanism.
.. _interfaces.py:
.. _modifying the core of Roundup:

.. index:: single: interfaces.py; hooking into the roundup core

interfaces.py - hooking into the core of Roundup
================================================

Expand Down
39 changes: 27 additions & 12 deletions doc/rest.txt
Original file line number Diff line number Diff line change
Expand Up @@ -491,11 +491,18 @@ Details are in the sections below.
/data/\ *class* Collection
--------------------------

When performing the ``GET`` method on a class (e.g. ``/data/issue``),
the ``data`` object includes the number of items available in
``@total_size``. A a ``collection`` list follows which contains the id
and link to the respective item. For example a get on
https://.../rest/data/issue returns::
When you use the ``GET`` method on a class (like ``/data/issue``), the
``data`` will include the number of available items in
``@total_size``. If the size exceeds the administrative limit (which
is 10 million by default), ``@total_size`` will be set to ``-1``. To
navigate to the last page of results, you can use the ``next`` links
or increment ``@page_index`` until the result does not include a
``next`` ``@link`` or ``@total_size`` is not ``-1``. The value of the
HTTP header ``X-Count-Total`` is the same as ``@total_size``.

A ``collection`` list contains the id and link to the
respective item. For example a get on https://.../rest/data/issue
returns::

{
"data": {
Expand All @@ -517,11 +524,17 @@ https://.../rest/data/issue returns::
Collection endpoints support a number of features as seen in the next
sections.

A server may implement a default maximum number of items in the
collection. This can be used to prevent denial of service (DOS). As
a result all clients must be programmed to expect pagination
decorations in the response. See the section on pagination below for
details.
Having an empty ``collection`` does not mean next next link will not
return more data. The row limit is applied when the query is made to
the database. The result set is then filtered, removing rows that the
user does not have permission to access. So it is possible to have no
data items on a page because the user does not have access to them. If
you use ``@page_size`` near the administrative limit, you may receive
fewer rows than requested. However, this does not mean you are out of
data.

All clients must be programmed to expect pagination decorations in the
response. See the section on pagination below for details.

Searching
~~~~~~~~~
Expand Down Expand Up @@ -591,7 +604,9 @@ searching keyword class not issue class) will return matches for
``Foo``, ``foobar``, ``foo taz`` etc.

In all cases the field ``@total_size`` is reported which is the total
number of items available if you were to retrieve all of them.
number of items available if you were to retrieve all of them. See
more details in the parent section about ``@total_size`` and when it
can return ``-1``.

Other data types: Date, Interval, Integer, Number need examples and may
need work to allow range searches. Full text search (e.g. over the
Expand Down Expand Up @@ -1055,7 +1070,7 @@ Does not support PUT, DELETE or PATCH.
/data/\ *class*/\ *id* item
---------------------------

When performing the ``GET`` method on an item
When you use the ``GET`` method on an item
(e.g. ``/data/issue/42``), a ``link`` attribute contains the link to
the item, ``id`` contains the id, ``type`` contains the class name
(e.g. ``issue`` in the example) and an ``etag`` property can be used
Expand Down
50 changes: 42 additions & 8 deletions roundup/rest.py
Original file line number Diff line number Diff line change
Expand Up @@ -440,6 +440,10 @@ class RestfulInstance(object):

api_version = None

# allow 10M row response - can change using interfaces.py
# limit is 1 less than this size.
max_response_row_size = 10000001

def __init__(self, client, db):
self.client = client
self.db = db
Expand Down Expand Up @@ -795,7 +799,7 @@ def get_collection(self, class_name, input):
exact_props = {}
page = {
'size': None,
'index': 1 # setting just size starts at page 1
'index': 1, # setting just size starts at page 1
}
verbose = 1
display_props = set()
Expand Down Expand Up @@ -910,12 +914,26 @@ def get_collection(self, class_name, input):
l.append(sort)
if exact_props:
kw['exact_match_spec'] = exact_props
if page['size'] is not None and page['size'] > 0:
kw['limit'] = page['size']
if page['size'] is None:
kw['limit'] = self.max_response_row_size
elif page['size'] > 0:
if page['size'] >= self.max_response_row_size:
raise UsageError(_(
"Page size %(page_size)s must be less than admin "
"limit on query result size: %(max_size)s.") % {
"page_size": page['size'],
"max_size": self.max_response_row_size,
})
kw['limit'] = self.max_response_row_size
if page['index'] is not None and page['index'] > 1:
kw['offset'] = (page['index'] - 1) * page['size']
obj_list = class_obj.filter(None, *l, **kw)

# Have we hit the max number of returned rows?
# If so there may be more data that the client
# has to explicitly page through using offset/@page_index.
overflow = len(obj_list) == self.max_response_row_size

# Note: We don't sort explicitly in python. The filter implementation
# of the DB already sorts by ID if no sort option was given.

Expand All @@ -930,7 +948,7 @@ def get_collection(self, class_name, input):
for item_id in obj_list:
r = {}
if self.db.security.hasPermission(
'View', uid, class_name, itemid=item_id, property='id'
'View', uid, class_name, itemid=item_id, property='id',
):
r = {'id': item_id, 'link': class_path + item_id}
if display_props:
Expand All @@ -942,14 +960,30 @@ def get_collection(self, class_name, input):

result_len = len(result['collection'])

if not overflow: # noqa: SIM108 - no nested ternary
# add back the number of items in the offset.
total_len = kw['offset'] + result_len if 'offset' in kw \
else result_len
else:
# we have hit the max number of rows configured to be
# returned. We hae no idea how many rows can match. We
# could use 0 as the sentinel, but a filter could match 0
# rows. So return -1 indicating we exceeded the result
# max size on this query.
total_len = -1

# truncate result['collection'] to page size
if page['size'] is not None and page['size'] > 0:
result['collection'] = result['collection'][:page['size']]

# pagination - page_index from 1...N
if page['size'] is not None and page['size'] > 0:
result['@links'] = {}
for rel in ('next', 'prev', 'self'):
if rel == 'next':
# if current index includes all data, continue
if page['size'] > result_len: continue # noqa: E701
index = page['index']+1
if page['size'] >= result_len: continue # noqa: E701
index = page['index'] + 1
if rel == 'prev':
if page['index'] <= 1: continue # noqa: E701
index = page['index'] - 1
Expand All @@ -964,8 +998,8 @@ def get_collection(self, class_name, input):
for field in input.value
if field.name != "@page_index"])})

result['@total_size'] = result_len
self.client.setHeader("X-Count-Total", str(result_len))
result['@total_size'] = total_len
self.client.setHeader("X-Count-Total", str(total_len))
self.client.setHeader("Allow", "OPTIONS, GET, POST")
return 200, result

Expand Down
Loading

0 comments on commit b6dc3ff

Please sign in to comment.