CSV recognized as ASCII text in Debian #208
Comments
What does the |
Does this mean I need to install a different version of libmagic? |
I'm not sure whether macOS uses the same file command as Debian. If it does, I'd compare the versions and see whether something changed between them. The difference could come from actual code changes or from the magic definition file (which usually ships with the code). |
So I looked in the Debian image I was using (the Debian Docker image), and the magic database was empty. I copied the database from my Mac to /usr/share/misc/magic/ in the Docker image (not sure this would work anyway), but still got the same result. |
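For anyone checking the same thing, here is a minimal sketch that reports what is actually present at a few likely magic database locations. The path list is an assumption about common layouts, not something confirmed in this thread, and `describe_magic_paths` is a hypothetical helper name. An empty magic source directory does not necessarily mean the compiled database (`magic.mgc`) is missing, so it may be worth checking both.

```python
import os
from pathlib import Path

# Assumed candidate locations; adjust for your distribution.
CANDIDATE_PATHS = [
    "/usr/share/misc/magic",
    "/usr/share/misc/magic.mgc",
    "/usr/lib/file/magic.mgc",
    "/usr/share/file/magic.mgc",
    "/etc/magic",
]


def describe_magic_paths(paths=None):
    """Return (path, status) pairs describing each candidate location."""
    if paths is None:
        paths = list(CANDIDATE_PATHS)
        # libmagic also honors the MAGIC environment variable, if set.
        env_value = os.environ.get("MAGIC")
        if env_value:
            paths = env_value.split(os.pathsep) + paths
    report = []
    for raw in paths:
        path = Path(raw)
        if not path.exists():
            status = "missing"
        elif path.is_dir():
            status = "directory, %d entries" % len(list(path.iterdir()))
        else:
            status = "file, %d bytes" % path.stat().st_size
        report.append((raw, status))
    return report


if __name__ == "__main__":
    for location, status in describe_magic_paths():
        print(location, "->", status)
```

Running this on both machines and comparing the output should show whether the two systems are reading a database from the same kind of location.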
FWIW, on Debian bullseye (not the Docker image) I'm running file 5.38-4, and it does recognize a CSV file. |
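To compare versions across machines, a small sketch like the following may help. It assumes the `file` command is on the PATH and that `file --version` prints a first line of the form `file-5.38`. The helper names are hypothetical, and the 5.38 reference value is only the version reported in this thread as recognizing CSV, not a confirmed minimum.

```python
import re
import shutil
import subprocess


def parse_file_version(output):
    """Extract (major, minor) from `file --version` output, or None."""
    match = re.search(r"file-(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_file_version():
    """Return the (major, minor) version of the `file` CLI, or None."""
    exe = shutil.which("file")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    return parse_file_version(result.stdout + result.stderr)


if __name__ == "__main__":
    # Assumption: 5.38 is the version reported in this thread as detecting CSV.
    reported_working = (5, 38)
    version = installed_file_version()
    if version is None:
        print("could not determine the file version")
    elif version < reported_working:
        print("file %d.%d is older than the version reported to work" % version)
    else:
        print("file %d.%d" % version)
```

Note that this checks the command-line tool; the libmagic library loaded by Python is usually the same version, but that is not guaranteed.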
related: #75 |
At their core, .csv files are just ASCII text files and as such share the same file signature. |
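Since CSV has no signature of its own, one workaround when libmagic reports `text/plain` is a content heuristic on top of it. Below is a rough sketch using only the standard library. `looks_like_csv` and `refine_mime_type` are hypothetical helpers, the "at least two rows, same column count, more than one column" rule is an assumption, and it will misclassify some files (for example single-column CSVs or prose with consistent commas).

```python
import csv
import io
from itertools import islice


def looks_like_csv(data, delimiter=",", max_rows=50):
    """Heuristic: True if the bytes parse as rows with a consistent width > 1."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    try:
        reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
        # Only inspect the first max_rows rows and skip blank lines.
        rows = [row for row in islice(reader, max_rows) if row]
    except csv.Error:
        return False
    if len(rows) < 2:
        return False
    width = len(rows[0])
    return width > 1 and all(len(row) == width for row in rows)


def refine_mime_type(mime_type, data):
    """Upgrade a generic text/plain result to text/csv when the content fits."""
    if mime_type == "text/plain" and looks_like_csv(data):
        return "text/csv"
    return mime_type
```

The idea is to pass whatever MIME type libmagic returned through `refine_mime_type` together with the raw bytes, so newer libmagic versions that already detect CSV are left untouched.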
Hi, I think I have a similar issue. I am creating a pandas DataFrame and calling to_csv(), but I get different results.
Ubuntu:
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1,2,3], "b": [2,3,4]})
>>> df.to_csv()
',a,b\n0,1,2\n1,2,3\n2,3,4\n'
>>> magic.detect_from_content(df.to_csv().encode('utf-8'))
FileMagic(mime_type='application/csv', encoding='us-ascii', name='CSV text')
Centos:
Python 3.8.19 (default, May 27 2024, 05:59:07)
[GCC 10.2.1 20210130 (Red Hat 10.2.1-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1,2,3], "b": [2,3,4]})
>>> df.to_csv()
',a,b\n0,1,2\n1,2,3\n2,3,4\n'
>>> magic.detect_from_content(df.to_csv().encode('utf-8'))
FileMagic(mime_type='text/plain', encoding='us-ascii', name='ASCII text')
Could this be a locale issue? |
@indiVar0508 Almost certainly this is because the CentOS image ships an old version of libmagic. |
I see, thanks for the help. Yes, this was the reason.
Code used to generate the results:
import magic
if magic._has_version is True:
    print(magic.magic_version())
import json
import pandas as pd
import io
import zipfile
df = pd.DataFrame({"a": [1,2,3], "b": [2,3,4]})
# CSV detection
magic.detect_from_content(df.to_csv().encode('utf-8'))
# JSON detection
magic.detect_from_content(json.dumps({"a": 1, "b":[2,3]}))
# Excel detection
writerIO = io.BytesIO()
df.to_excel(writerIO)
writerIO.seek(0)
magic.detect_from_content(writerIO.read())
# Zip detection
df.to_csv("file.csv")
with zipfile.ZipFile("file_compressed.zip", "w") as zpo:
    zpo.write("file.csv", compress_type=zipfile.ZIP_DEFLATED)
magic.detect_from_content(open("file_compressed.zip", "rb").read())
# PDF Detection
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
df = pd.DataFrame(np.random.random((10,3)), columns = ("col 1", "col 2", "col 3"))
#https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib
fig, ax =plt.subplots(figsize=(12,4))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText=df.values,colLabels=df.columns,loc='center')
#https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot
pp = PdfPages("foo.pdf")
pp.savefig(fig, bbox_inches='tight')
pp.close()
magic.detect_from_content(open("foo.pdf", "rb").read()) |
cat etc/*-releases >>
A file that is recognized as "CSV" on macOS (it is clearly a CSV file, with a .csv extension) is recognized as ASCII text on Debian. Reinstalling libmagic-dev didn't help.