Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tickets/dm 43386 #856

Merged
merged 15 commits into from
Aug 1, 2024
Merged

Tickets/dm 43386 #856

merged 15 commits into from
Aug 1, 2024

Conversation

jgates108
Copy link
Contributor

No description provided.

}

LOGS(_log, LOG_LVL_WARN, "&&& UserQuerySelect::submitNew e");
if (uberJobsEnabled || true) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where it is sending the http/json version of the request to the worker. At this time, the worker parses this, sends a brief message back to the czar, and then throws it away.

return jsRet;
}

json HttpWorkerCzarModule::_handleQueryJob(string const& func) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the http/json message is parsed.

debug(func);
//&&&uj this seems irrelevant for a worker enforceInstanceId(func,
//cconfig::CzarConfig::instance()->replicationInstanceId());
enforceCzarName(func);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "instanceId" mechanism serves as a safeguard preventing "crosstalks" between two (or many) Qserv instances in case of misconfiguration when they run on the same network. That's why a value of this identifier is passed in all HTTP-based communications between services within Qserv (including Czar, workers, and various services of the Replication/Ingest system).

The same rule applies to the REST API version. Each time there are changes in expectations of a REST service (parameters added, or removed, or roles of parameters redefined) the API version must change and so must change the number mentioned in a code processing the corresponding requests:

checkApiVersion(__func__, 34);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this when it was posted. I'll need to add this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, but not pushed yet.

@jgates108 jgates108 marked this pull request as ready for review July 11, 2024 00:23
/// Make a new RequestBody based on `js`
/// TODO:UJ This would be much more efficient if this class had objJson defined as
/// a const reference or pointer to const, but implementation is likely ugly.
RequestBody(nlohmann::json const& js) : objJson(js) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor is useless since objJson is populated by the existing c-tor:

RequestBody::RequestBody(qhttp::Request::Ptr const& req) : objJson(json::object()) {
    // This way of parsing the optional body allows requests which have no body.

    string const contentType = req->header["Content-Type"];
    string const requiredContentType = "application/json";

    if (contentType == requiredContentType) {
        string content(istreambuf_iterator<char>(req->content), {});
        if (not content.empty()) {
            try {
                objJson = json::parse(content);
                if (objJson.is_null() or objJson.is_object()) return;
            } catch (...) {
                // Not really interested in knowing specific details of the exception.
                // All what matters here is that the string can't be parsed into
                // a valid JSON object. This will be reported via another exception
                // after this block ends.
                ;
            }
            throw invalid_argument("invalid format of the request body. A simple JSON object was expected");
        }
    }
}

Note that not all requests have JSON body.

The second reason why the change is not welcome is because it would introduce a very big undesired effect - an extra copy (or two) of potentially very large JSON objects parsed by the POST handlers. This may happen in the Czar HTTP frontend in REST services that process inputs for the user-generated data products. Some inputs could be potentially as large as 1 GB or more.

The third reason is that a hierarchy of the http & qhttp classes has been significantly changed over the past weeks as new features were introduced to qhttp and an alternative HTTP server library cpp-httplib was added. Your change would introduce a conflict that would give you a lot of trouble during the code merge into main in the future.

However, I would agree with you that exposing a modifiable public data member may not be a good idea. Since this doesn't prevent your development then I will find a solution for this problem within the production branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree that this is horribly inefficient as is. I wanted to take advantage of your "required" functions on a subsection of an existing json message. This hack was about as minimally invasive as I could manage and keep moving forward. The TODO:UJ comment indicates that this is something that needs to be addressed, and I'm interested in seeing how you want to handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking having some sort of pointer (maybe member of the object, or parameter to "required") that could be relocated within the json "body", which should allow subsections to be examined without copying anything.

auto const& chunkScanDb = rbTbl.required<string>("db");
auto const& lockInMemory = rbTbl.required<bool>("lockInMemory");
auto const& chunkScanTable = rbTbl.required<string>("table");
auto const& tblScanRating = rbTbl.required<int>("tblScanRating");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't really gain anything by having references to the primitive data types. The references are still going to be converted to values a few lines below where you are calling:

scanInfo.infoTables.emplace_back(chunkScanDb, chunkScanTable, lockInMemory,
                                                     tblScanRating);

The only variable that may make a sense to pass by reference would be a string stored in chunkScanTable.

proto::ScanInfo scanInfo;
bool scanInfoSet = false;
bool jdScanInteractive = false;
int jdMaxTableSize = 0;
Copy link
Contributor

@iagaponenko iagaponenko Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would 100% agree with the comment posted above on:

        // TODO:UJ These items should be stored higher in the message structure as they get
        //   duplicated and should always be the same within an UberJob.

Why delay it and not do it right when it's clear what should be done?

Copy link
Contributor Author

@jgates108 jgates108 Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer to "Why delay?" is going to be there that there was a lot to be done (some of which is rather complicated) and getting a prototype working is more important than optimizing. Any time I move away from the original logic, there's a very real risk it would break something downstream, which then could break something else, and so on.

Jira ticket DM-45384

@jgates108 jgates108 merged commit 79bafad into tickets/DM-43715 Aug 1, 2024
8 checks passed
@jgates108 jgates108 deleted the tickets/DM-43386 branch August 1, 2024 20:24
fritzm pushed a commit that referenced this pull request Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants