Review all the lessons which use the Old Bailey Online website #3169
Comments
I'm really sorry I've taken so long to catch up with this. I haven't looked at specific lessons yet, but some general observations:
Unfortunately I think many changes required may be more substantial than they seem on the surface, because the site is now completely dynamic using React.js, so pages are no longer HTML that can be downloaded with, say, curl or wget. So for any lessons that involve downloading/scraping pages from the site it's not just a matter of updating URLs. Instead anything like that needs more specialised tools - I don't have any experience with this kind of thing but I gather that in Python the main options are Selenium or Puppeteer.
The main change to the API, I think, is that all results are returned in JSON, whereas trials used to be XML and only certain summary/stats information was in JSON. But the XML is still returned within the JSON (in fact it looks like it returns three different formats - XML, plain text and HTML) and that should be identical to the original API's XML.
The "print.jsp" URLs were used a lot in lessons because they provided a very plain unstyled version of the pages, ideal for programmatic uses. As far as I know, there is no equivalent in the new site.
I haven't yet really looked at how search URLs have changed. I should note that the search engine is completely different now (ElasticSearch instead of MySQL) which as I understand it is much more flexible. The URLs have clearly been simplified and shortened a lot, eg https://www.oldbaileyonline.org/search/crime?offence=kill&verdict=guilty#results - instead of being split up into offenceCategory, offenceSubCategory, etc. This looks as though it may result in some labels for specific offence/verdict categories being changed - I'll need to compare to the originals to make a list.
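To make the "XML inside JSON" point concrete, here is a minimal Python sketch of how a lesson script might pull the embedded XML out of a JSON response and parse it. Everything about the response shape is an assumption: the key names ("xml", "text", "html") and the sample record are invented for illustration and have not been checked against the real API.

```python
import json
import xml.etree.ElementTree as ET

# Made-up stand-in for an API response. The key names and the XML content
# are assumptions for illustration only, not the real API's output.
SAMPLE_RESPONSE = json.dumps({
    "id": "t17800628-33",
    "xml": '<div1 type="trialAccount" id="t17800628-33"><p>Example text.</p></div1>',
    "text": "Example text.",
    "html": "<p>Example text.</p>",
})


def extract_trial_xml(response_body, xml_key="xml"):
    """Decode a JSON response and parse the XML string stored under xml_key."""
    record = json.loads(response_body)
    return ET.fromstring(record[xml_key])


trial = extract_trial_xml(SAMPLE_RESPONSE)
# Print the record id and the plain text contained in the XML element.
print(trial.get("id"), "".join(trial.itertext()))
```

If the embedded XML really is identical to the original API's XML, the parts of a lesson that process the XML could stay as they are, and only the download step would need this extra JSON-decoding layer.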
Hmm yes, new URL for a search for "killing Other":
.../search/crime?offence=killOther#results
Relevant bit of the old URL:
&_offences_offenceCategory_offenceSubcategory=kill_other
It seems quite likely that a lot of URLs will have that sort of change (and probably the API too). Ugh.
Hello @sharonhoward, No need to apologise -- I really appreciate your reply to my email, and I am grateful for the time you have taken to review this Issue. It all sounds very complicated. The need for additional, specialist tools (Selenium or Puppeteer) as well as a deeper understanding of search URLs may also change the 'difficulty' / learning-level of these lessons. I'll reply in our existing email thread, and we can continue to think about this together. With many thanks, Anisa
I'll be emailing you @anisa-hawes but I just wanted to record some key changes to search (and API query) URLs as I understand them.
Previously for offences/verdicts/sentences in the URL it was always necessary to explicitly spell out category_subcategory. That's no longer necessary except in a few specific contexts.
As an example - the offence "fraud" (subcategory of deception). The relevant bit of a search URL previously looked like this:
&_offences_offenceCategory_offenceSubcategory=deception_fraud
which is replaced by the much shorter and simpler
offence=fraud
Now the top level category is only needed if
a) searching for the whole category, eg: offence=deception
b) searching for subcategories Other or NoDetail:
offence=deceptionOther (which previously looked like deception_other)
offence=deceptionNoDetail (new)
"NoDetail" essentially means the same thing as Other but reflects some inconsistencies in the XML - it happens where the offence was tagged without a subcategory (which shouldn't really have happened). I think that previously there was no way to search for these separately at all.
I'm making a list of all the new category and subcategory pairs and how they map on to the original versions.
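The rule described above could be sketched in Python roughly as follows. This is only an illustration built from the examples in this thread (deception_fraud, deception_other, kill_other and the whole-category case); the function names are hypothetical, and because some labels may have been renamed, a real conversion would need the full mapping list rather than this string rule.

```python
from urllib.parse import urlencode

# Search URL pattern taken from the example quoted earlier in this thread.
SEARCH_BASE = "https://www.oldbaileyonline.org/search/crime"


def new_offence_value(old_value):
    """Convert an old 'category_subcategory' value to the new-style value.

    Covers only the cases described in this thread:
      deception        -> deception       (whole category)
      deception_fraud  -> fraud           (named subcategory)
      deception_other  -> deceptionOther  ("other" keeps its category prefix)
    """
    category, _, subcategory = old_value.partition("_")
    if not subcategory:
        return category
    if subcategory == "other":
        return category + "Other"
    return subcategory


def search_url(**params):
    """Build a new-style search URL from keyword parameters such as offence."""
    return SEARCH_BASE + "?" + urlencode(params) + "#results"


print(search_url(offence=new_offence_value("kill_other")))
```

The NoDetail values are not handled because, as noted above, they had no equivalent in the old URLs.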
I wrote all but 1 of these lessons, so I can advise on all of the original learning objectives. Can I ask why the lessons haven't been flagged to readers as not working in the meantime? I thought we had a workflow to alert people when they couldn't rely on a lesson's contents.
@adamcrymble, thank you for reminding us! What do you think of this warning message (inspired by the Twitter API warning message)?
If you and @hawc2 are happy with this, I can work on coordinating translations into ES, FR and PT, and adding them to the lessons listed above.
Thanks @charlottejmc. I think this is really up to @hawc2 as he's the editor. I'd probably say 'examples of this lesson will not work as intended, however the skills described remain relevant and may be adapted to a different example site'.
@charlottejmc @adamcrymble this sounds good to me. Here's a version with Adam's suggestion included, and some further edits I made. Feel free to rework further as you see fit:
The Old Bailey Online’s website has recently been updated. Unfortunately, due to the various changes, many (if not all) elements of the example website used in this lesson will not work as described. The methodologies taught by this lesson remain relevant, however, and may be adapted by readers to a different example site. We are working on adapting the lesson to the new Old Bailey Online website, but we have no clear timeline on when the lesson will be updated. [April 2024]
Thank you @adamcrymble and @hawc2 for your input – it's very much appreciated! I'll work on coordinating translations in our other languages.
I am opening a space to review the lessons which use the Old Bailey Online's website, in light of the recent changes to the Old Bailey's API.
Although the old version of the website will still be accessible until August 2024 at https://www.dhi.ac.uk/oldbaileyonline, we want to update lessons which are affected by this change, so they remain usable in the future.
I'll then open a single issue for each of the lessons which do need to be updated, and link to them below.
I have counted 10 lessons which refer to the Old Bailey more or less extensively:
Changes needed: MINOR
In the ‘Python List’ section, we need to change the URL inside a code block from http://www.oldbaileyonline.org/print.jsp?div=t17800628-33 to https://www.oldbaileyonline.org/record/t17800628-33. (I’m not sure what the print component adds to the first URL and whether it is needed in the update too.)
Then, perhaps we need to update the list of words received in the output below, but only if they've changed with the new URL.
Changes needed: MINOR
Where the lesson says: ‘and the Old Bailey uses this format’, the format needs to be updated to reflect the current one. The example URL should be changed as well.
Actually, the current example URL is http://www.oldbaileyonline.org/browse.jsp?ref=OA16780417, which doesn’t show any results on the obsolete Old Bailey site. Using name instead of ref does work, though (https://www.dhi.ac.uk/oldbaileyonline/browse.jsp?name=OA16780417).
The corresponding URL on the new website is https://www.oldbaileyonline.org/record/OA16780417.
Changes needed: NONE
Although this lesson refers to the Old Bailey, it uses a file which is already available in the lesson's assets directory, so I think it can remain as is
Changes needed: MINOR
The URL http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33 appears twice and needs to be changed to https://www.oldbaileyonline.org/record/t17800628-33. (Perhaps this URL would also need the &div= component? I don’t know how to recreate this in the new format.)
Changes needed: MINOR
The URL http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33 needs to be changed to https://www.oldbaileyonline.org/record/t17800628-33. (Again, perhaps it needs the &div= component?)
Changes needed: NONE
This lesson teaches how to create matrices with data from the Old Bailey, but never refers directly to the site
Changes needed: NONE
This lesson only shows a screenshot of the Old Bailey website and its html code. Although we could update the images to show its modern look and html code, it’s not really necessary for the lesson.
Changes needed: MAJOR
Many URLs need to be updated:
http://oldbaileyonline.org/static/Project.jsp -> unsure.
https://www.oldbaileyonline.org/search.jsp?form=searchHomePage&_divs_fulltext=arsenic&kwparse=and&_persNames_surname=&_persNames_given=&_persNames_alias=&_offences_offenceCategory_offenceSubcategory=&_verdicts_verdictCategory_verdictSubcategory=&_punishments_punishmentCategory_punishmentSubcategory=&_divs_div0Type_div1Type=&fromMonth=&fromYear=&toMonth=&toYear=&ref=&submit.x=0&submit.y=0 -> unsure. We can probably recreate it by using the Advanced Search functionality in the new website with the same parameters, though.
http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33 -> https://www.oldbaileyonline.org/record/t17800628-33 (Bowsey trial).
We must check that the 'little bit of HTML markup' is still correct.
Also, after ‘By studying the URL we can learn a few things’, these ‘few things’ have to be reviewed to ensure they are still correct.
[On a different note, this lesson uses Komodo Edit, which we've encountered issues with in other lessons.]
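To see which parameters of the old arsenic search would actually need recreating in the new Advanced Search, the old URL can be taken apart with Python's standard library. This is a small sketch that only inspects the old URL; it makes no claim about what the new parameter names are.

```python
from urllib.parse import urlsplit, parse_qsl

# The old search URL from the lesson, split across lines for readability.
OLD_SEARCH_URL = (
    "https://www.oldbaileyonline.org/search.jsp?"
    "form=searchHomePage&_divs_fulltext=arsenic&kwparse=and"
    "&_persNames_surname=&_persNames_given=&_persNames_alias="
    "&_offences_offenceCategory_offenceSubcategory="
    "&_verdicts_verdictCategory_verdictSubcategory="
    "&_punishments_punishmentCategory_punishmentSubcategory="
    "&_divs_div0Type_div1Type=&fromMonth=&fromYear=&toMonth=&toYear="
    "&ref=&submit.x=0&submit.y=0"
)

# parse_qsl drops parameters with blank values by default, so only the
# fields that were actually filled in are left.
filled = parse_qsl(urlsplit(OLD_SEARCH_URL).query)

for name, value in filled:
    print(name, "=", value)
```

Only the keyword (arsenic) and the 'and' keyword-parsing option carry search meaning here; the form and submit.x/submit.y entries look like form-submission residue.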
Changes needed: MAJOR
See Issue #3134
Changes needed: MAJOR
Changes are needed from 'Downloading trials' onwards:
http://www.oldbaileyonline.org/obapi/ob?term0=fromdate_18300114&term1=todate_18391216&count=10&start=211&return=zip -> unsure.
Careful changes will be needed to the script which allows you to download more than 10 entries at once, and to the accompanying description.
Where it says ‘a file that looks like this:’ (wget1830s.txt), I expect it will look different now due to the changed URLs.
After ‘Here’s a snippet from one trial:’, we might need to update it slightly. The XML markup found on the current website for https://www.oldbaileyonline.org/record/t18300114-2 is ever so slightly different. However, I think it will perhaps still work as intended? This will be discovered if the command: still runs the script as desired. If so, then no further changes are needed after this step.
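The record-URL changes listed for the MINOR lessons all follow the same pattern, so they could be checked with a short Python sketch like the one below. It assumes, based only on the examples in this issue, that the record identifier sits in the div, id, name or ref query parameter of the old URL and that the new address is always /record/ followed by that identifier; the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

NEW_RECORD_BASE = "https://www.oldbaileyonline.org/record/"


def to_record_url(old_url):
    """Rewrite an old print.jsp / browse.jsp URL to the new /record/ form.

    Assumption: the identifier is carried by one of these query parameters,
    as in the examples listed in this issue.
    """
    query = parse_qs(urlsplit(old_url).query)
    for key in ("div", "id", "name", "ref"):
        if key in query:
            return NEW_RECORD_BASE + query[key][0]
    raise ValueError("no record identifier found in: " + old_url)


print(to_record_url("http://www.oldbaileyonline.org/print.jsp?div=t17800628-33"))
print(to_record_url("http://www.oldbaileyonline.org/browse.jsp?ref=OA16780417"))
```

This only rewrites addresses. Because the new pages are rendered dynamically, fetching the rewritten URL with a plain HTTP request may not return the trial text that the old print.jsp page did.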