First steps towards generalization in reimbursiments description #66
base: master
A comment about the 40 MB binary file (ML model): in the last year, GitHub has added support for Git LFS, which was apparently created to solve problems like this. https://git-lfs.github.com/
trying to save the pdf and png in a temporary folder
First comments on 💅 only. PEP-8 for the win ;)
Dockerfile
COPY requirements.txt ./
COPY setup ./
COPY rosie.py ./
COPY rosie ./rosie
COPY config.ini.example ./
COPY config.ini ./
This doesn't need to be here since Rosie's setup file creates config.ini from the example file copied in the previous line 😉
You can go ahead and delete this line.
Done, I removed it. Strangely, the first time I tried to build the image I got an error without it... Anyway, now it's OK.
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
from keras.callbacks import ModelCheckpoint
from keras.models import load_model
You can follow the PEP-8 imports section to guide you when organizing your imports. For example, when doing from ... import ... you can separate the imported names with commas, like: from keras.models import Sequential, load_model.
Note that you should also group your imports:
1. standard library imports
2. related third party imports
3. local application/library specific imports
You should put a blank line between each group of imports.
On that note, tools like isort can organize your imports automatically.
Done! Now they look beautiful :D
The year the expense was generated.
"""

COLS = ['applicant_id',
Don't be afraid to use descriptive names like COLUMNS for your variables. It will make your code clearer to read down the road.
In fact, I tried to follow the code of the other classifiers (Ctrl+C > Ctrl+V).
I changed it to COLUMNS ;)
nb_train_samples = sum([len(files) for r, d, files in os.walk(train_data_dir)])
nb_validation_samples = sum([len(files) for r, d, files in os.walk(validation_data_dir)])

print('no. of trained samples = ', nb_train_samples, ' no. of validation samples= ',nb_validation_samples)
Long lines aren't a good thing. This print could be written like the following in order to avoid the extra long line:
print('no. of trained samples = ', nb_train_samples,
' no. of validation samples= ', nb_validation_samples)
Extra space was missing there.
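Another way to keep these lines short, sketched here only as a suggestion, is to move the repeated os.walk expression into a small helper. The name count_files and the temporary directory layout are hypothetical and stand in for the real train_data_dir:

```python
import os
import tempfile


def count_files(directory):
    """Count every file below `directory`, subdirectories included."""
    return sum(len(files) for _root, _dirs, files in os.walk(directory))


# Hypothetical stand-in for train_data_dir: two class folders, three files.
with tempfile.TemporaryDirectory() as train_data_dir:
    layout = {'class_a': ['1.png', '2.png'], 'class_b': ['3.png']}
    for class_name, file_names in layout.items():
        class_dir = os.path.join(train_data_dir, class_name)
        os.makedirs(class_dir)
        for file_name in file_names:
            open(os.path.join(class_dir, file_name), 'w').close()

    nb_train_samples = count_files(train_data_dir)

print('no. of trained samples = ', nb_train_samples)
```

With the helper in place, the validation count becomes a second short call instead of a second long comprehension.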
img_width, img_height = 300, 300

def train(self,train_data_dir,validation_data_dir,save_dir):
Pay extra attention to spaces after commas, they help make your code easier on the eyes 😉
This method would be nicer like this:
def train(self, train_data_dir, validation_data_dir, save_dir):
I changed the code using this tool: http://pep8online.com/checkresult
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
I noticed that you repeat the same process here three times, with very little difference between the parameters used. Could you explain a little why that is necessary?
I included a brief description and a link to explain it: http://deeplearning.net/tutorial/lenet.html
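As a rough illustration of why such blocks are stacked: each convolution plus pooling stage shrinks the feature map, so later stages summarize progressively larger regions of the image. The arithmetic below assumes that all three blocks use a 3x3 kernel with Keras's default 'valid' padding followed by 2x2 max pooling, applied to the 300x300 input from the diff. Only one block is visible in the diff, so the other two are an assumption, and block_output_size is a hypothetical helper name.

```python
def block_output_size(size, kernel=3, pool=2):
    """Side length after one 'valid' convolution followed by max pooling."""
    after_conv = size - kernel + 1   # 'valid' padding, stride 1
    return after_conv // pool        # pooling drops any leftover row/column


size = 300  # img_width and img_height in the diff
sizes = []
for _ in range(3):  # assumed: three Conv2D + MaxPooling2D blocks
    size = block_output_size(size)
    sizes.append(size)

print(sizes)  # [149, 73, 35]
```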
optimizer='rmsprop',
metrics=['accuracy'])

#this is the augmentation configuration we will use for training
PEP-8 inline comments state that:
They (inline comments) should start with a # and a single space.
# This is the augmentation configuration we will use for training
looks way better don't you think?
Done, using the tool
rescale=1. / 255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=False)#As you can see i put it as FALSE and on link example it is TRUE
PEP-8 inline comments state that:
An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.
And be careful with the line length here too.
Also, I didn't quite get what you meant here.
It was a copy and paste from serenata. Now I have included the reason.
I set horizontal_flip to False because handwriting in Portuguese never runs from right to left.
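Putting the two PEP-8 points from this thread together (two spaces before an inline comment, a # followed by a single space, and short lines), the options could be written roughly as below. This is only a sketch: the values are the ones in the diff, but they are collected in a plain dict with a hypothetical name, augmentation_options, so that the snippet runs without Keras.

```python
# Hypothetical container for the keyword arguments shown in the diff.
augmentation_options = dict(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    # Handwriting in Portuguese never runs right to left, so mirrored
    # images would be unrealistic; the linked example uses True here.
    horizontal_flip=False,  # deliberately differs from the example
)
```

Longer explanations fit better in a block comment above the line, while the inline comment stays short enough to respect the line length limit.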
horizontal_flip=False)#As you can see i put it as FALSE and on link example it is TRUE
#Explanation, there no possibility to write in a reverse way :P

#this is the augmentation configuration we will use for testing:
Same thing as inline comments, these should start with # followed by a single space.
This is applicable to all other comments you made in this file ;)
PEP8 requirements done, using this tool: http://pep8online.com/checkresult
class_mode='binary')

#It allow us to save only the best model between the iterations
checkpointer = ModelCheckpoint(filepath=save_dir+"weights.hdf5", verbose=1, save_best_only=True)
File paths should be built using os.path.join to avoid breaking the scripts when running Rosie on different systems.
@jtemporal Another good option with a good API is pathlib.
Done, I included os.path.join.
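For reference, a minimal sketch of the two options discussed in this thread. The value of save_dir is hypothetical and deliberately lacks a trailing slash, which is the case where plain string concatenation goes wrong:

```python
import os
from pathlib import Path

save_dir = 'models'  # hypothetical value, note the missing trailing slash

# Plain concatenation silently glues the two names together.
broken_path = save_dir + 'weights.hdf5'

# os.path.join inserts the right separator for the current system.
weights_path = os.path.join(save_dir, 'weights.hdf5')

# pathlib offers the same behaviour through the / operator.
weights_path_pathlib = Path(save_dir) / 'weights.hdf5'

print(broken_path)
print(weights_path)
print(weights_path_pathlib)
```

If a library expects a plain string rather than a Path object, the pathlib result can be wrapped in str() before it is passed along.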
…eckresult) 2) Include comments in the source code
Main points after the last pull request:
This is the inclusion of okfn-brasil/serenata-de-amor#238 in Rosie.
Since the classifier reaches 91% accuracy, maybe it can be included in the set of classifiers for the Chamber of Deputies.
PS:
I uploaded the ML model (40 MB) together with the code. I guess this is not the best practice; maybe it should be downloaded through the serenata-toolbox, but I don't have access to Amazon to upload the model :/ (*We probably have to change this in order to have a better architecture.)
The code works fine in the test:
![screenshot from 2017-07-20 14-01-53](https://user-images.githubusercontent.com/7073485/28416678-4d1b34d0-6d55-11e7-9663-bd2180a418ed.png)
However, I couldn't run it on the real reimbursements. Rosie consumes all my RAM (5 GB) while the reimbursements are downloaded; maybe someone could test it for me 👍
Sorry for my Python code. I only started building things in Python 4 months ago. I would appreciate some tips to improve the code.
As you can see in the screenshot, the TensorFlow build I included is the basic one. I kept it since I don't know whether the machine where Rosie runs allows other configurations. (*We can also change it.)