Update remover.py #5

Open: wants to merge 1 commit into base: master
40 changes: 33 additions & 7 deletions remover.py
@@ -24,6 +24,7 @@ class PdfEnhancedFileWriter(PdfFileWriter):
'rgb': {
'black': [NumberObject(0), NumberObject(0), NumberObject(0)],
'white': [NumberObject(1), NumberObject(1), NumberObject(1)],
'red': [NumberObject(1), NumberObject(0), NumberObject(0)],
},
'cmyk': {
'black': [NumberObject(0), NumberObject(0), NumberObject(0), NumberObject(1)],
@@ -78,7 +79,7 @@ def _getColorTargetOperationType(self, color_index, operations):
def getMinimumRectangleWidth(self, fontSize, minimumNumberOfLetters = 1.5):
return fontSize * minimumNumberOfLetters

def removeWordStyle(self, ignoreByteStringObject=False):
def removeWordStyle(self, is_default, ignoreByteStringObject=False):
Owner

Add a default value for is_default, for backwards compatibility.
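
A minimal sketch of what this suggestion could look like, assuming a simplified stand-in for the method. The class name `Writer` and the returned strings are hypothetical and are not the project's code. With `is_default=True` as the default, call sites that pass no arguments behave as before.

```python
class Writer:
    """Hypothetical stand-in for PdfEnhancedFileWriter, for illustration only."""

    def removeWordStyle(self, is_default=True, ignoreByteStringObject=False):
        # With a default value, old call sites such as writer.removeWordStyle()
        # keep working and get the original (default) behavior.
        mode = "default" if is_default else "red-only"
        return mode, ignoreByteStringObject


writer = Writer()
old_style_call = writer.removeWordStyle()       # pre-PR call site, no arguments
new_style_call = writer.removeWordStyle(False)  # call site added in this PR
```

Note that a caller that passed `ignoreByteStringObject` positionally would still be affected, because the new parameter comes first. Placing `is_default` after it, or making it keyword-only, would avoid that.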

"""
Removes imported styles from Word - Path Constructors rectangles - from this output.

@@ -138,8 +139,14 @@ def removeWordStyle(self, ignoreByteStringObject=False):
# we are coloring all text in black and all rectangles in white
# removing all colors paints rectangles in black which gives us unwanted results
if color_target_operation_type == 'text':
new_color = 'black'
elif color_target_operation_type == 'rectangle':
if is_default:
new_color = 'black'
else:
if operator_type == 'rgb' and operands == self.colors_operands[operator_type]['red']:
new_color = 'white'
else:
new_color = 'black'
elif is_default and color_target_operation_type == 'rectangle':
new_color = 'white'

if new_color:
@@ -149,7 +156,7 @@ def removeWordStyle(self, ignoreByteStringObject=False):
# remove styled rectangles (highlights, lines, etc.)
# the 're' operator is a Path Construction operator, creates a rectangle()
# presumably, that's the way word embedding all of it's graphics into a PDF when creating one
if operator == b_('re'):
if is_default and operator == b_('re'):

rectangle_width = operands[-2].as_numeric()
rectangle_height = operands[-1].as_numeric()
@@ -213,9 +220,10 @@ def load1():
# prints the loaded list
#print(pdf_list)

def add_to_writer(pdfsrc, writer):
def add_to_writer(pdfsrc, writer, is_default = True):
[writer.addPage(pdfsrc.getPage(i)) for i in range(pdfsrc.getNumPages())]
writer.removeWordStyle()
writer.removeWordStyle(is_default)


def remove_images():
writer = PdfEnhancedFileWriter()
@@ -235,7 +243,24 @@ def remove_images():

print("Job is done")
root.quit()
def remove_images2():
writer = PdfEnhancedFileWriter()
# output_filename = asksaveasfilename(filetypes = (('PDF File', '*.pdf'), ('All Files','*.*')))
Comment on lines +246 to +248
Owner

I'm not sure what the reason for the second remove_images is.
If you split the cases, give it an indicative name.
Better still, find a condition test that automatically tells which one to choose.

If you are struggling, send me an example PDF and we can think it through together.
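
One possible shape for such a condition test, as a rough sketch only. It assumes the page content is available as a list of `(operands, operator)` pairs, and that a pure red RGB color (set by the PDF `rg` or `RG` operators) signals the second case. The helper names `has_red_fill` and `choose_is_default` are hypothetical, and whether red is a reliable signal depends on the PDFs involved.

```python
def has_red_fill(operations):
    """Return True if any RGB color operator sets pure red (1, 0, 0).

    `operations` is assumed to be a list of (operands, operator) pairs,
    similar in shape to a parsed PDF content stream.
    """
    rgb_operators = {b"rg", b"RG"}  # non-stroking and stroking RGB color
    for operands, operator in operations:
        if operator in rgb_operators and [float(x) for x in operands] == [1.0, 0.0, 0.0]:
            return True
    return False


def choose_is_default(operations):
    # Hypothetical rule: switch to the red-only mode when red marks are present.
    return not has_red_fill(operations)


# Tiny hand-written samples: a color operator followed by a rectangle ('re').
sample_plain = [([0, 0, 0], b"rg"), ([10, 10, 50, 5], b"re")]
sample_red = [([1, 0, 0], b"rg"), ([10, 10, 50, 5], b"re")]
```

The result of `choose_is_default` could then be passed as the `is_default` argument instead of relying on a second button.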

Author

I wanted to push a fix quickly.
I should have found a better option,
such as changing remove_images(), which the main action button calls, to take an argument with a default value.
Then, instead of another wrapper function, I could use a lambda expression on line 274,
or the partial() function, as suggested here:
https://stackoverflow.com/questions/6920302/how-to-pass-arguments-to-a-button-command-in-tkinter

As for your suggestion that the tool detect the state by itself instead of the user specifying it:
I will think about it.
But the functions were originally fitted to a certain type of test (tests with multiple marked answers and no explanations).
My suggested feature is also fitted to a specific type of test.
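
A minimal sketch of the lambda and partial() options mentioned above, assuming `remove_images` is changed to accept an `is_default` argument. The function body here is a hypothetical stand-in that only reports which mode ran, so it runs without a Tk window.

```python
from functools import partial


def remove_images(is_default=True):
    """Hypothetical stand-in for the real handler; returns the mode it ran in."""
    return "default" if is_default else "red-only"


# Option 1: a lambda that supplies the argument when the button is clicked.
red_only_via_lambda = lambda: remove_images(is_default=False)

# Option 2: functools.partial, which pre-binds the argument.
red_only_via_partial = partial(remove_images, is_default=False)

# Either callable could then be passed as a Tkinter button command, e.g.
# Button(root, text="...", command=red_only_via_partial)
```

Both options pass a zero-argument callable to `command`, which is why calling `remove_images(False)` directly in the `Button(...)` line would not work: it would run once at construction time instead of on click.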

Author

@gilkzxc gilkzxc Jul 25, 2023

One of the major problems is that tests don't have a uniform type or canonical form.

output_saving_dir = askdirectory(title="Choose output folder...")
i = 0
for file in pdf_list:
head, tail = os.path.split(filePaths[i])
print(tail)
file_path = os.path.join(output_saving_dir, "SCRAPED_" + tail)
outputfile = open(file_path, 'wb')
add_to_writer(file, writer,False)
writer.write(outputfile)
outputfile.close()
i = i + 1
print(str(i) + " file(s) done")

print("Job is done")
root.quit()

##Label(root, text="Rectangles remover").grid(row=0, column=2, sticky=E)
Button(root, text="Choose one or more PDFs", command=load1, height=5, width=20).grid(row=1, column=0)
@@ -245,7 +270,8 @@ def remove_images():
#photo= PhotoImage(file=resource_path('./button_pic.png'))

#Button(root, text="Remove answers",image=photo, command=remove_images, width=100, height=120).grid(row=1, column=2,sticky=E)
Button(root, text="Remove answers", command=remove_images, font='Helvetica 12 bold', fg="red", height=4).grid(row=1, column=2, sticky=E)
Button(root, text="Remove marking answers", command=remove_images, font='Helvetica 12 bold', fg="red", height=4).grid(row=1, column=2, sticky=E)
Button(root, text="Remove red answers without deleteing code", command=remove_images2, font='Helvetica 12 bold', fg="red", height=4).grid(row=2, column=2, sticky=E)
Owner

See the comment on lines 246 to 248.


#Label(root, text="Remove Answers^^").grid(row=2, column=2, sticky=E)
#Label(root, text="Good Luck!").grid(row=2, column=0, sticky=W)