Adding delete #236
Conversation
Unfortunately this pull request can’t go through. To summarize my inline comments, these are the main issues:
- Emitting events to the message queue doesn’t work. There are no tests for that, so it was not caught.
- The result of the operation is not reported: 200 OK is always returned instead of 207 Multi-Status. Even the 404 Not Found described in the API specification is never returned.
- The message queue producer is instantiated on import, causing side effects and hindering testability.
- The tests use a real message queue.

Also, there are now conflicts with master that need to be resolved before merging.
Just as a matter of personal taste, it might even be possible to have separate pull requests for the Kafka producer and for the delete operation. But this pull request is quite small, so there is no need to forcibly split it. Just saying.
```diff
 from app.models import Host, HostSchema, PatchHostSchema
 from app.auth import current_identity
-from app.exceptions import InventoryException, InputFormatException
+from app.exceptions import InventoryException
```
This change is unrelated to the DELETE operation.
```diff
@@ -259,11 +274,16 @@ def get_host_by_id(host_id_list, page=1, per_page=100):
     )


-def _get_host_list_by_id_list(account_number, host_id_list):
+def _get_host_list_by_id_list(account_number, host_id_list, order=True):
     return Host.query.filter(
```
I’m wondering whether we couldn’t just remove the order_by from this function entirely and apply it in the respective places instead. E.g. the patch operation doesn’t need the ordering at all. The other operations (get_host_by_id and get_host_system_profile_by_id) use paginate, which is the reason for the ordering. So although repeating the order_by wouldn’t be DRY, it would fit nicely together, as sketched below. But this is not necessary to edit in this PR.
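A rough sketch of that refactoring (the filter body is a guess at the current implementation, not copied from the repository):

```python
def _get_host_list_by_id_list(account_number, host_id_list):
    # No order_by here any more; ordering moves to the call sites that paginate.
    return Host.query.filter(
        (Host.account == account_number) & Host.id.in_(host_id_list)
    )

# In get_host_by_id and get_host_system_profile_by_id, where paginate needs a
# stable order, the ordering is repeated explicitly:
query = _get_host_list_by_id_list(current_identity.account_number, host_id_list)
query_results = query.order_by(Host.id).paginate(page, per_page, True)
```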
I think this is fine for now.
```diff
@@ -0,0 +1,17 @@
+import logging
```
Although I like putting the event schema into a separate module instead of the ever-growing app.models one, it’s currently unsystematic. The app.models module already contains many schemas that aren’t models at all, e.g. PatchHostSchema. I’d keep everything in that bulky module for now and split it later separately.
The delete method can for now be put inline into the delete_by_id operation. It’s not very nice, but it’s how we currently do it. The HTTP responses, errors and everything are composed there too. I’d rather have the code consistent than see many first steps toward various refactorings.
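For illustration, the inlined delete event could be as simple as a JSON string built right in the operation; a minimal sketch, with made-up field names (not the actual schema):

```python
import json

# Hypothetical inline event construction; emit_event encodes the returned
# string to UTF-8 bytes before sending it to Kafka.
def delete(host_id):
    return json.dumps({"id": host_id, "type": "delete"})
```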
tasks/__init__.py
Outdated
```diff
@@ -15,6 +16,13 @@
 TOPIC = os.environ.get("KAFKA_TOPIC", "platform.system-profile")
 KAFKA_GROUP = os.environ.get("KAFKA_GROUP", "inventory")
 BOOTSTRAP_SERVERS = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "kafka:29092")
+EVENT_TOPIC = os.environ.get("KAFKA_EVENT_TOPIC", "platform.inventory.events")
+
+producer = KafkaProducer(bootstrap_servers=BOOTSTRAP_SERVERS)
```
Starting a producer on import is not a very good thing. It’s heavy on resources and dependencies, and it has a lot of side effects. Let’s initialize it explicitly, like start_consumer, and put it into the application context. That way it’ll even be possible to replace the producer with a dummy (a mock) when running the test suite. Without that, it’s impossible to test the message production.
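A rough sketch of what that could look like (start_event_producer and the event_producer attribute are hypothetical names, not the actual code):

```python
import os

from flask import current_app
from kafka import KafkaProducer

EVENT_TOPIC = os.environ.get("KAFKA_EVENT_TOPIC", "platform.inventory.events")

def start_event_producer(app, bootstrap_servers):
    # Created explicitly at startup, like start_consumer, instead of at
    # import time; the test suite can assign a dummy producer here instead.
    app.event_producer = KafkaProducer(bootstrap_servers=bootstrap_servers)

def emit_event(e):
    current_app.event_producer.send(EVENT_TOPIC, value=e.encode("utf-8"))
```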
test_api.py
Outdated
```python
        self.assertEqual(response["count"], 0)
        self.assertEqual(response["total"], 0)
        self.assertEqual(response["results"], [])
```
Two newlines between classes, please.
Fixed.
test_api.py
Outdated
```python
        response = self.get(HOST_URL + "/" + original_id, 200)

        # Delete the host
        response = self.delete(HOST_URL + "/" + original_id, 200, return_response_as_json=False)
```
Why the return_response_as_json argument here? I know there is not much need to decode the JSON when the response is not used for anything, but neither does it cause any big harm (a few CPU cycles). It does add noise though – it looks like we want to do something with the original string. Aha! It’s there because decoding the empty response fails. Ok.
Moreover, the response variable is not used at all – the response value is not checked. There is, however, no reason to check, because this request doesn’t return anything. I think it could return a list of deleted hosts, or even better a 207 Multi-Status response with a status and details about every deleted host, just as create_host_list does.
```python
    hosts.delete(synchronize_session="fetch")
    db.session.commit()
    metrics.delete_host_count.inc(hosts.count())
    for deleted_host in hosts:
```
This actually doesn’t work. The hosts variable is a BaseQuery, not a list. This query is used just a few lines earlier to delete the hosts. Then here, iterating over hosts runs a SELECT query, which doesn’t find anything, because the hosts are already deleted.
```python
@api_operation
@metrics.api_request_time.time()
def delete_by_id(host_id_list):
    query = _get_host_list_by_id_list(
        current_identity.account_number, host_id_list, order=False
    )
    hosts = query.all()
    with metrics.delete_host_processing_time.time():
        query.delete(synchronize_session="fetch")
        db.session.commit()
    metrics.delete_host_count.inc(len(hosts))
    for deleted_host in hosts:
        emit_event(events.delete(deleted_host.id))
```
This should be fixed.
```python
    metrics.delete_host_count.inc(hosts.count())
    for deleted_host in hosts:
        emit_event(events.delete(deleted_host.id))
```
There is no return statement. The API specification states, though, that this operation should return 404 Not Found if the host is not found. That is currently not the case: 200 OK is returned automatically, but nothing is deleted.
Because this can operate on more than one host, I suggest using 207 Multi-Status instead, just as add_host_list does. That way it’d be possible to return more statuses, one for each record. It can happen that some hosts are found (and deleted) and some are not. A rough sketch follows.
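Something like this, with an illustrative payload shape (not the actual API); the helper name is made up:

```python
def _delete_results(host_id_list, deleted_ids):
    # One status entry per requested host: 200 for deleted ones,
    # 404 for the ones that were not found.
    results = [
        {"id": host_id, "status": 200 if host_id in deleted_ids else 404}
        for host_id in host_id_list
    ]
    return {"results": results}, 207
```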
At this point, I do not think adding the 207 Multi-Status is required. We can add it later on if we find that it is needed.
test_api.py
Outdated
```python
        original_id = created_host["id"]

        # Get the host
        response = self.get(HOST_URL + "/" + original_id, 200)
```
The response variable is not used. Why the assignment?
What about extracting the URL into a variable? It’s shared between this get call and the delete call a few lines below; see the sketch below.
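Something along these lines, reusing the names already in the test:

```python
        # Extract the shared URL once; both the get and the delete use it.
        url = HOST_URL + "/" + original_id

        # Get the host
        self.get(url, 200)

        # Delete the host
        self.delete(url, 200, return_response_as_json=False)
```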
tasks/__init__.py
Outdated
```diff
@@ -15,6 +16,13 @@
 TOPIC = os.environ.get("KAFKA_TOPIC", "platform.system-profile")
 KAFKA_GROUP = os.environ.get("KAFKA_GROUP", "inventory")
 BOOTSTRAP_SERVERS = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "kafka:29092")
+EVENT_TOPIC = os.environ.get("KAFKA_EVENT_TOPIC", "platform.inventory.events")
```
I’d prefer to have the environment variable names prefixed so they don’t clash with other apps in development. But since the other ones aren’t prefixed either, it’s not necessary to do it here and make this pull request inconsistent. I created a separate issue for that instead: #271
tasks/__init__.py
Outdated
```python
def emit_event(e):
    producer.send(EVENT_TOPIC, value=e.encode("utf-8"))
```
According to the documentation, the messages are sent asynchronously. If the application crashes before the message is actually sent to the queue, it’s lost. This can be solved by calling producer.flush, which will synchronously block until all messages are sent.
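A minimal sketch of that fix (flush is part of kafka-python’s KafkaProducer interface):

```python
def emit_event(e):
    producer.send(EVENT_TOPIC, value=e.encode("utf-8"))
    # Block until all buffered messages are actually delivered, so a crash
    # right after the request can no longer drop the event.
    producer.flush()
```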
Created a little demo of how to test the emitted events: see the delete-hosts branch in my fork. It swaps the message producer in the tests, replacing the KafkaProducer with a class that mocks the used part of its interface and stores the messages in memory, where the tests can inspect them. I added one simple test that checks the message. There is also a fix to the event-emitting code. (Don’t worry about the recent force-push to this upstream branch. I just mistook the remote and then reverted it back to its original state.)
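The test double could look roughly like this (the class name is made up here; the demo branch has the real thing):

```python
class InMemoryProducer:
    """Mocks the used part of the KafkaProducer interface for tests."""

    def __init__(self):
        self.messages = []  # (topic, value) pairs, in send order

    def send(self, topic, value=None):
        self.messages.append((topic, value))

    def flush(self):
        pass  # nothing is buffered, so there is nothing to flush
```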
* Adding a couple of metrics for the host delete operation
* Commit the deleted hosts
* Only emit events for hosts that were actually deleted
* Verify the host is deleted in the test case
* Adding delete event info to the README
Co-Authored-By: Glutexo <[email protected]>
method will initialize / start the event producer and system profile consumer if kafka is enabled.
…tialization
Modified the way the tasks module is initialized.
I added one more commit to my fixes demo branch. It uses Flask’s App Context teardown to flush the message queue instead of doing so right after the emit. This is how it should work: the messages are flushed even if an exception and a 500 Internal Server Error occur.
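A sketch of that teardown approach, assuming the producer lives on the app object as in the earlier sketch (register_event_flushing is a hypothetical name):

```python
def register_event_flushing(app):
    # teardown_appcontext runs even when the request fails with an exception,
    # so the queued events are still flushed on a 500 Internal Server Error.
    @app.teardown_appcontext
    def flush_events(exception=None):
        app.event_producer.flush()
```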
* adding delete api
* emitting events for deletes
* Adding a couple of metrics for the host delete operation
* Verify the host is deleted in the test case
* Delete hosts add info to readme (#251)
* Adding delete event info to the README
No description provided.