Repository export operation does not provide any valid .trig #5093

Open
MRCO-DURON opened this issue Jul 29, 2024 · 8 comments

@MRCO-DURON

MRCO-DURON commented Jul 29, 2024

Current Behavior

When I do:
curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Ftrig" --compressed -o ./repositoryName.trig

I get only a 15 KB file, while my repo is 145 GB.
My previous exports used to be 3 GB or more.

I have also tried using console.sh:
/home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh, and I get java.lang.OutOfMemoryError exceptions.
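One possibility, not confirmed in this thread, is that the small file is an HTTP error page rather than truncated TriG: without `--fail`, curl saves the response body to the output file even when the server answers with an error status. A minimal sketch for checking this, using only the JDK (Java 11+); the class name `ExportSanityCheck` and the default file name are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical helper: guesses whether an "export" file is really an HTML error page. */
final class ExportSanityCheck {

    /** Returns true if the first bytes of the file look like an HTML document. */
    static boolean looksLikeHtmlErrorPage(Path file) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(file)) {
            // Only peek at the start; never load the whole export into memory.
            head = in.readNBytes(512);
        }
        String start = new String(head, StandardCharsets.ISO_8859_1)
                .stripLeading()
                .toLowerCase(Locale.ROOT);
        return start.startsWith("<!doctype html") || start.startsWith("<html");
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file name matches the curl command's -o argument.
        Path file = Path.of(args.length > 0 ? args[0] : "repositoryName.trig");
        System.out.println(looksLikeHtmlErrorPage(file)
                ? "File starts like an HTML page; the export probably failed."
                : "File does not start like HTML; inspect it as RDF.");
    }
}
```

If the check reports HTML, the file content itself should contain the server's error message.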

Expected Behavior

Get a valid export file using either console.sh or the RDF4J API.

Steps To Reproduce

curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Ftrig" --compressed -o ./repositoryName.trig

/home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh

Version

4.3.2

Are you interested in contributing a solution yourself?

None

Anything else?

No response

MRCO-DURON added the 🐞 bug label Jul 29, 2024
@hmottestad
Contributor

Did this use to work? If so, do you know which version it last worked on?

@hmottestad
Contributor

By the way, TriG isn't a particularly good format for exporting a lot of data, since the TriG writer needs to know a lot about your data to format it correctly.

Have you tried NQUADS? That should hopefully be a fully streaming data format.
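A minimal sketch of requesting N-Quads instead of TriG, using only the JDK's `java.net.http` client (Java 11+). It assumes the RDF4J Server is reachable at `http://localhost:8080/rdf4j-server`, that the repository ID is `repositoryName`, and that statements are fetched from the REST API's `/repositories/{id}/statements` endpoint with `application/n-quads` in the `Accept` header. The class name `NQuadsExport` and the output file name are hypothetical. Whether the server side streams this response without running out of memory is not confirmed in this thread.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

/** Hypothetical helper: downloads all statements of a repository as N-Quads. */
final class NQuadsExport {

    /** Builds a GET request for all statements of a repository, asking for N-Quads. */
    static HttpRequest buildRequest(String serverUrl, String repositoryId) {
        URI uri = URI.create(serverUrl + "/repositories/" + repositoryId + "/statements");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/n-quads")
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumptions: server URL and repository ID are placeholders for your own setup.
        HttpRequest request = buildRequest("http://localhost:8080/rdf4j-server", "repositoryName");

        // ofFile writes the response body to disk as it arrives instead of holding it in memory.
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(Path.of("repositoryName.nq")));

        if (response.statusCode() != 200) {
            System.err.println("Export failed with HTTP " + response.statusCode()
                    + "; the output file holds the error body, not RDF.");
        }
    }
}
```

Checking the status code matters here because the body handler saves whatever the server returns, including an error page.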

@MRCO-DURON
Author

Sadly, I don't know which version last worked.
However, I have 3 other lower environments running the same version without issues.

The only difference is the size of the repositories.

You're saying I can export/convert my current repository (.ttl) as NQUADS?

@MRCO-DURON
Author

MRCO-DURON commented Aug 1, 2024

Here is also the error I get when using the eclipse-rdf4j console:

`root@ip-172-31-38-149:~# bash /home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh
15:53:33.811 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - os.name = linux
15:53:33.814 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
RDF4J Console 4.3.2
Working dir: /home/ubuntu/eclipse-rdf4j-2.5.1/bin
Type 'help' for help.

connect http://127.0.0.1:8080/rdf4j-server
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/rdf4j-server
open reponame
Opened repository 'reponame'
muchamiel> export /mnt/test.trig
Exception in thread "main" org.eclipse.rdf4j.repository.RepositoryException: <!doctype html><title>HTTP Status 500 – Internal Server Error</title><style type="text/css">h1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} h2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} h3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} body {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} b {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} p {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;} a {color:black;} a.name {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style>

HTTP Status 500 – Internal Server Error


Type Exception Report

Message Handler processing failed; nested exception is java.lang.OutOfMemoryError

Description The server encountered an unexpected condition that prevented it from fulfilling the request.

Exception

org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1094)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:964)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Root Cause

java.lang.OutOfMemoryError
java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeString(BinaryRDFWriter.java:346)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeLiteral(BinaryRDFWriter.java:322)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeValue(BinaryRDFWriter.java:293)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.assignId(BinaryRDFWriter.java:254)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.incValueFreq(BinaryRDFWriter.java:238)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.consumeStatement(BinaryRDFWriter.java:198)
org.eclipse.rdf4j.rio.helpers.AbstractRDFWriter.handleStatement(AbstractRDFWriter.java:109)
org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.exportStatements(SailRepositoryConnection.java:382)
org.eclipse.rdf4j.http.server.repository.statements.ExportStatementsView.render(ExportStatementsView.java:95)
org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1405)
org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1149)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1088)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:964)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Note The full stack trace of the root cause is available in the server logs.


Apache Tomcat/8.5.39 (Ubuntu)


at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1095)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1029)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendGraphQueryViaHttp(SPARQLProtocolSession.java:945)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:876)
at org.eclipse.rdf4j.http.client.RDF4JProtocolSession.getStatements(RDF4JProtocolSession.java:618)
at org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.exportStatements(HTTPRepositoryConnection.java:274)
at org.eclipse.rdf4j.repository.base.AbstractRepositoryConnection.export(AbstractRepositoryConnection.java:189)
at org.eclipse.rdf4j.console.command.Export.export(Export.java:140)
at org.eclipse.rdf4j.console.command.Export.execute(Export.java:94)
at org.eclipse.rdf4j.console.Console.executeCommand(Console.java:379)
at org.eclipse.rdf4j.console.Console.start(Console.java:336)`

@hmottestad
Contributor

Glad to know it's not a regression at least.

Any chance you can confirm that this is still an issue on RDF4J 5.0.1?

Other than that it looks like there is something that should be streaming the output but is actually writing it to a byte array output stream instead.
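The difference between the two approaches can be sketched with JDK classes only. A `ByteArrayOutputStream` keeps everything in a single `byte[]`, and Java arrays are indexed by `int`, so such a buffer cannot grow beyond roughly 2 GiB no matter how much heap is available; that would be consistent with the `hugeCapacity` frame in the trace above, although the thread does not confirm the exact cause. The class `StreamingVsBuffering`, the `CountingSink` stand-in for a socket or file, and the fake statement lines are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: the same writer feeding an in-memory buffer versus a pass-through sink. */
final class StreamingVsBuffering {

    /** Sink that discards bytes but counts them, standing in for a socket or file. */
    static final class CountingSink extends OutputStream {
        long bytesSeen = 0;

        @Override
        public void write(int b) {
            bytesSeen++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bytesSeen += len;
        }
    }

    /** Writes `count` made-up statement lines to `out`, one line at a time. */
    static void writeLines(OutputStream out, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            String line = "<urn:s" + i + "> <urn:p> \"v" + i + "\" .\n";
            out.write(line.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        int lines = 100_000;

        // Buffering: every byte stays on the heap until the writer is finished.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeLines(buffer, lines);
        System.out.println("Held in memory: " + buffer.size() + " bytes");

        // Streaming: the sink sees the same bytes but retains none of them.
        CountingSink sink = new CountingSink();
        writeLines(sink, lines);
        System.out.println("Passed through: " + sink.bytesSeen + " bytes");
    }
}
```

Both runs produce the same number of bytes; only the buffering variant's memory use grows with the size of the export.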

@MRCO-DURON
Author

I updated the files for it, and it happens with 5.0.1 too.
Same behavior.

Here is my config.ttl:

`cat /var/lib/tomcat8/.RDF4J/server/repositories/myRepoName/config.ttl
@prefix ns: <http://www.openrdf.org/config/sail/native#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rep: <http://www.openrdf.org/config/repository#> .
@prefix sail: <http://www.openrdf.org/config/sail#> .
@prefix sb: <http://www.openrdf.org/config/sail/base#> .
@prefix sr: <http://www.openrdf.org/config/repository/sail#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#MyRepoName> a rep:Repository;
  rep:repositoryID "myRepoName";
  rep:repositoryImpl [
      rep:repositoryType "openrdf:SailRepository";
      sr:sailImpl [
          sail:sailType "openrdf:NativeStore";
          sb:evaluationStrategyFactory "org.eclipse.rdf4j.query.algebra.evaluation.impl.StrictEvaluationStrategyFactory";
          ns:tripleIndexes "spoc,posc"
        ]
    ];
  rdfs:label "Native store" .`

@hmottestad
Contributor

Thanks for checking. Just to be sure: is this also the case when using NQUADS?

@MRCO-DURON
Author

I tried exporting the current repo as .nq, but that did not work.
Is there any process for this, such as converting TriG to N-Quads?
