
Update GSD custom fields #2310

Closed
sbarbosadataverse opened this issue Jul 2, 2015 · 18 comments
Labels: Feature: API, Feature: Metadata, User Role: Curator

Comments

@sbarbosadataverse:

Janina changed custom field names

sbarbosadataverse added this to the 4.0.2 milestone Jul 2, 2015
@sbarbosadataverse (Author):

Committed changes, passing to Phil.

@sbarbosadataverse (Author):

Giving back to Sonia to check in dvn-build.

@pdurbin (Member) commented Jul 2, 2015:

@sbarbosadataverse as we discussed I did my usual check to make sure the tsv change didn't require a change to the Solr schema.xml and it didn't.

We sat and previewed the change on my laptop, but we're going to wait to merge the 2310-GSD branch (currently just commit df60357) into 4.0.2 until you've heard back from Janina that the information you copied from the UI reflects the intended change. Passing this back to you in the meantime.

pdurbin assigned sbarbosadataverse and unassigned pdurbin Jul 2, 2015
@pdurbin (Member) commented Jul 6, 2015:

@sbarbosadataverse after you've confirmed with Janina that the change is ok, please pass this issue to @sekmiller who will write up instructions for @kcondon for what to do with the updated tsv file.

@sbarbosadataverse (Author):

Janina has additional changes to make, as suspected.


@pdurbin (Member) commented Jul 8, 2015:

@sbarbosadataverse ok, please feel free to update the branch we started: https://github.com/IQSS/dataverse/tree/2310-GSD

sbarbosadataverse added a commit that referenced this issue Jul 27, 2015: "for both class number and faculty list"
pdurbin modified the milestones: 4.2, 4.1 Jul 27, 2015
@sbarbosadataverse (Author):

@scolapasta
I know we missed today's deadline for 4.1, but if we can get this in before the next milestone deadline in August, that would be great. They use this for their next class uploads, which start soon.

scolapasta assigned pdurbin and unassigned scolapasta Sep 17, 2015
@pdurbin (Member) commented Sep 17, 2015:

@sbarbosadataverse just a heads up that we'll need to do the same thing as last time. I'll pull in your latest change to the GSD metadata block and I'll have you look at it to see if it's what you want before we merge it into 4.2. Let's coordinate a time to do this.

@sbarbosadataverse (Author):

Sounds good. Thanks

@pdurbin (Member) commented Sep 18, 2015:

On https://shibtest.dataverse.org I loaded customGSD.tsv from 4.1. Then I tried to re-load the version from https://github.com/IQSS/dataverse/blob/76496aa9593736d7846e1aa0222fe229198762d5/scripts/api/data/metadatablocks/customGSD.tsv but I got this error:

[root@dvn-vm3 api]# curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/customGSD.tsv -H "Content-type: text/tab-separated-values"
{"status":"ERROR","message":"3"}

Here's the stack trace (I built the 4.2 war file on my laptop from commit 9ae1a64):

[2015-09-18T09:21:21.098-0400] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.api.DatasetFieldServiceApi] [tid: _ThreadID=30 _ThreadName=http-listener-1(5)] [timeMillis: 1442582481098] [levelValue: 900] [[
  Error parsing dataset fields:3
java.lang.ArrayIndexOutOfBoundsException: 3
    at edu.harvard.iq.dataverse.api.DatasetFieldServiceApi.parseControlledVocabulary(DatasetFieldServiceApi.java:370)
    at edu.harvard.iq.dataverse.api.DatasetFieldServiceApi.loadDatasetFields(DatasetFieldServiceApi.java:263)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:387)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:331)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:103)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:372)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:344)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
    at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1682)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:344)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:205)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at edu.harvard.iq.dataverse.api.ApiBlockingFilter.doFilter(ApiBlockingFilter.java:161)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:30)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at org.apache.catalina.core.ApplicationDispatcher.doInvoke(ApplicationDispatcher.java:873)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:739)
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:575)
    at org.apache.catalina.core.ApplicationDispatcher.doDispatch(ApplicationDispatcher.java:546)
    at org.apache.catalina.core.ApplicationDispatcher.dispatch(ApplicationDispatcher.java:428)
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:378)
    at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:34)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:316)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
    at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:174)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
    at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:412)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:282)
    at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:459)
    at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:167)
    at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:201)
    at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:175)
    at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:235)
    at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
    at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:561)
    at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
    at java.lang.Thread.run(Thread.java:745)
]]

Line 370 is cvv.setIdentifier(values[3]);.

As I mentioned to @scolapasta yesterday, the change in 76496aa (the version of the tsv I'm trying to load) seems to affect basically the entire tsv file. It's a much bigger change than the earlier commit, df60357, which seemed to change only controlled vocabulary values.

In short, I think there's something wrong with the latest version of the tsv file. The code that parses this tsv file is picky and I don't know much about it. @scolapasta was the original author and @sekmiller added the feature to re-load an updated tsv file (I'm not sure which issue that was).

@posixeleni knows a lot about these tsv files as well. Again, I'm pretty sure we need a new one, one that doesn't cause the code to throw exceptions.
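The values[3] failure above can be sketched in isolation. This is hypothetical illustration code, not the actual DatasetFieldServiceApi source; it assumes (based on the grep output later in this thread) that controlled-vocabulary rows have the layout blank / DatasetField / Value / identifier / displayOrder, so the parser indexes column 3 for the identifier:

```java
// Hypothetical sketch of the failure mode, not the actual Dataverse code:
// indexing column 3 of a tab-split row fails when the row is short.
public class TsvRowSketch {

    // Mimics reading the "identifier" column of a controlled-vocabulary row.
    static String identifierColumn(String row) {
        String[] values = row.split("\t");
        return values[3]; // throws ArrayIndexOutOfBoundsException on short rows
    }

    public static void main(String[] args) {
        // A normal row: the identifier column exists but is empty. Prints []
        System.out.println("[" + identifierColumn("\tgsdCourseName\t01317\t\t39") + "]");
        try {
            // A row with only three columns, like the suspect "01321" line.
            identifierColumn("\t\t01321");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```

Note that on Java 7/8 the exception's message is just the bad index, "3", which would match both the API response {"status":"ERROR","message":"3"} and the "Error parsing dataset fields:3" log line.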

@pdurbin (Member) commented Sep 18, 2015:

Line 370 is cvv.setIdentifier(values[3])

I spoke with @sekmiller and he indicated that the concept of an identifier for a controlled vocabulary value was added after 4.0. Judging from #947 it was added in 4.0.1 by @scolapasta and @posixeleni .

We speculated that perhaps the problem was an empty "identifier" column, but after uploading the version from 4.1 to Google Docs, I don't think that's the problem: the "identifier" column was empty back in 4.1 too:

[Screenshot: customGSD 4.1 tsv in Google Sheets, 2015-09-18]

That screenshot comes from here (4.1 version of the GSD block): https://docs.google.com/spreadsheets/d/1xQ8wi1-2NqylgzROf72A64ojrpQJAJTHdFe3mRzPHN0/edit?usp=sharing

In addition, the "journals" metadata block in 4.1 didn't have "identifier" filled in:

[Screenshot: dataverse journals tsv at v4.1, 2015-09-18]

So I'm pretty sure "identifier" is optional.

@pdurbin (Member) commented Sep 18, 2015:

I spoke with @scolapasta and he indicated I should give this issue to @posixeleni to review the tsv file that is failing to import.

@posixeleni here's the file: https://github.com/IQSS/dataverse/blob/76496aa9593736d7846e1aa0222fe229198762d5/scripts/api/data/metadatablocks/customGSD.tsv

If there's something obvious you can fix, please feel free to push to the branch we've been using: https://github.com/IQSS/dataverse/commits/2310-GSD

If it helps, I think the line with "01321" may be a problem. It's different from the surrounding lines:

murphy:dataverse pdurbin$ grep 01321 customGSD.tsv -C3
    gsdCourseName   01317       39                                          
    gsdCourseName   01318       40                                          
    gsdCourseName   01319       41                                          
        01321                                                   
    gsdCourseName   01401       42                                          
    gsdCourseName   01401       43                                          
    gsdCourseName   01402       44                                          

This is just a theory though...
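If that theory holds, a quick pre-flight check would be to count tab-separated columns per row before loading the file. This helper is a hypothetical sketch (TsvColumnCheck and the 5-column expectation are assumptions for illustration, not part of Dataverse):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check, not part of Dataverse: flag rows whose
// tab-separated column count falls below what the parser will index into.
public class TsvColumnCheck {

    // Returns 1-based line numbers with fewer than `expected` columns.
    static List<Integer> shortRows(List<String> lines, int expected) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // limit -1 keeps trailing empty columns, so blank identifiers still count
            if (lines.get(i).split("\t", -1).length < expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "\tgsdCourseName\t01319\t\t41",
            "\t\t01321",                      // the suspect short row
            "\tgsdCourseName\t01401\t\t42");
        System.out.println(shortRows(sample, 5)); // prints [2]
    }
}
```

Running something like this over customGSD.tsv before POSTing it to the datasetfield/load endpoint would surface short rows like the "01321" line without having to read a stack trace.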

pdurbin assigned posixeleni and unassigned pdurbin Sep 18, 2015
@posixeleni (Contributor):

@pdurbin @sbarbosadataverse there's a bigger issue here than just the error coming out of this tsv. I can clean that up easily, but it appears that GSD wants us to replace their Course Names with new course names. After speaking with @sekmiller, our code is not currently able to do that: at the moment we can only add new values to the tsv's controlled vocabulary.

@posixeleni (Contributor):

Waiting to get an ETA from @sekmiller on when we can replace controlled vocabulary values and not just add new values.

posixeleni assigned sekmiller and unassigned posixeleni Sep 21, 2015
@pdurbin (Member) commented Sep 22, 2015:

@sekmiller as we discussed, you're welcome to look at adding a "preview" mode while you're in that part of the code: #2551

scolapasta modified the milestones: 4.2, 4.3 Sep 25, 2015
mercecrosas modified the milestones: 4.3, In Review Nov 30, 2015
scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
pdurbin added the "User Role: Curator" label and removed the "zTriaged" label Jun 30, 2017
@djbrooke (Contributor):

I'm going to close this very old issue as I think it pertains to a custom metadata block in dataverse.harvard.edu. I contemplated bringing this into a larger metadata consolidation issue (#6030) but I'm not sure what it's about (versioning metadata blocks, maybe?).

@jggautier (Contributor) commented Jul 19, 2019:

I think this started as an update to a custom metadata block on Harvard Dataverse, where the people using the metadata block wanted to edit the terms in one of its controlled vocabularies (instead of, or in addition to, adding terms), maybe because faculty names changed or had typos. But the process of updating metadata blocks doesn't handle editing terms in the controlled vocabularies. So I'm guessing that when this edited GSD metadata block was being uploaded, Dataverse saw the new controlled vocabulary terms in the tsv file and said, "Hey! There are saved datasets that have terms in the Faculty Name field that aren't in this new tsv file. You can't do that."

Is that right? If so:

  • Is that what's also blocking another issue about editing the controlled vocabulary terms in the Astronomy metadata block (Fix Text Spacing in Astronomy Metadata Fields #2622)?
  • There's discussion in this issue about identifiers for terms, and how the terms in the GSD tsv file are missing identifiers. But I'm guessing that that level of abstraction wouldn't let us change how the terms are being displayed in the UI without needing to change the thing that Dataverse needs to remain constant, identifiers? Was the identifiers column in the metadata block tsv added mostly for including identifiers in metadata exports?
  • Is this an issue that needs to be addressed when we want to support the use of external vocabularies, part of Re-evaluate metadata blocks, metadata display, help, and metadata sources #6030? Terms in the kinds of vocabularies that would be accessible over APIs are edited often (while the terms' identifiers (often URLs) are supposed to be persistent). It sounds like we'd want to make sure that Dataverse can update terms, or update how they're displayed.

9 participants