Sourcery refactored master branch #15

Status: Open. sourcery-ai[bot] wants to merge 1 commit into master from sourcery/master.
Conversation

sourcery-ai[bot] commented Jul 18, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch origin sourcery/master   # fetch the refactored branch
git merge --ff-only FETCH_HEAD     # fast-forward onto the fetched commit
git reset HEAD^                    # drop the commit but keep its changes in the working tree

Help us improve this pull request!

sourcery-ai[bot] requested a review from Roshanson on Jul 18, 2022 at 09:12
sourcery-ai[bot] left a comment

Due to GitHub API limits, only the first 60 comments can be shown.

- open('data/result.txt', 'a+').write(str(k) + ' ' + str(v) + '\n')  # convert k and v to str type
+ open('data/result.txt', 'a+').write(f'{str(k)} {str(v)}' + '\n')

Function com_tf refactored with the following changes:

This removes the following comments (why?):

# convert k and v to str type
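As a side note, the str() calls inside the f-string are redundant: f-strings already convert interpolated values for built-in types. A minimal standalone sketch (k and v are hypothetical values) showing the three equivalent spellings:

    k, v = 'word', 3                    # hypothetical key and count
    a = str(k) + ' ' + str(v) + '\n'    # concatenation, as in the old code
    b = f'{str(k)} {str(v)}' + '\n'     # the refactored form
    c = f'{k} {v}\n'                    # simplest equivalent: conversion happens implicitly
    assert a == b == c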

Comment on lines -20 to +21

- path1 = path + 'data/title_and_abs/'
- newpath = path + "data/pro_keyword/"
+ path1 = f'{path}data/title_and_abs/'
+ newpath = f"{path}data/pro_keyword/"

Lines 20-21 refactored with the following changes:

Comment on lines -11 to +43

- data_source = open(file_import_url, 'r')
- data = data_source.readline()
- word_in_afile_stat = {}
- word_in_allfiles_stat = {}
- files_num = 0
- while data != "":  # process the file pro_res.txt
-     data_temp_1 = data.strip("\n").split("\t")  # file name and key words of a file
-     data_temp_2 = data_temp_1[1].split(",")  # key words of a file
-     file_name = data_temp_1[0]
-     data_temp_len = len(data_temp_2)
-     files_num += 1
-     data_dict = {}
-     data_dict.clear()
-     for word in data_temp_2:
-         if word not in word_in_allfiles_stat:
-             word_in_allfiles_stat[word] = 1
-             data_dict[word] = 1
-         else:
-             if word not in data_dict:  # if this word has not appeared in this file before
-                 word_in_allfiles_stat[word] += 1
-                 data_dict[word] = 1
-
-         if not word_in_afile_stat.has_key(file_name):
-             word_in_afile_stat[file_name] = {}
-         if not word_in_afile_stat[file_name].has_key(word):
-             word_in_afile_stat[file_name][word] = []
-         word_in_afile_stat[file_name][word].append(data_temp_2.count(word))
-         word_in_afile_stat[file_name][word].append(data_temp_len)
-     data = data_source.readline()
- data_source.close()
+ with open(file_import_url, 'r') as data_source:
+     data = data_source.readline()
+     word_in_afile_stat = {}
+     word_in_allfiles_stat = {}
+     files_num = 0
+     while data != "":  # process the file pro_res.txt
+         data_temp_1 = data.strip("\n").split("\t")  # file name and key words of a file
+         data_temp_2 = data_temp_1[1].split(",")  # key words of a file
+         file_name = data_temp_1[0]
+         data_temp_len = len(data_temp_2)
+         files_num += 1
+         data_dict = {}
+         data_dict.clear()
+         for word in data_temp_2:
+             if word not in word_in_allfiles_stat:
+                 word_in_allfiles_stat[word] = 1
+                 data_dict[word] = 1
+             elif word not in data_dict:  # if this word has not appeared in this file before
+                 word_in_allfiles_stat[word] += 1
+                 data_dict[word] = 1
+
+             if not word_in_afile_stat.has_key(file_name):
+                 word_in_afile_stat[file_name] = {}
+             if not word_in_afile_stat[file_name].has_key(word):
+                 word_in_afile_stat[file_name][word] = [data_temp_2.count(word), data_temp_len]
+         data = data_source.readline()
  # filelist = os.listdir(newpath2)  # get all files under the current path
  TF_IDF_last_result = []
  if (word_in_afile_stat) and (word_in_allfiles_stat) and (files_num != 0):
-     for filename in word_in_afile_stat.keys():
+     for filename, value in word_in_afile_stat.items():
          TF_IDF_result = {}
          TF_IDF_result.clear()
-         for word in word_in_afile_stat[filename].keys():
+         for word in value.keys():

Function TF_IDF_Compute refactored with the following changes:
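One caveat worth flagging: the refactored code still calls dict.has_key, which exists only in Python 2 and was removed in Python 3. A small sketch (variable names mirror the diff; values are illustrative) of the Python 3 idioms that replace it:

    word_in_afile_stat = {}
    file_name, word = 'doc1.txt', 'keyword'      # illustrative values

    # 'in' replaces has_key:
    if file_name not in word_in_afile_stat:
        word_in_afile_stat[file_name] = {}

    # or collapse the check-and-create into one call with setdefault:
    word_in_afile_stat.setdefault(file_name, {})[word] = [2, 17]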

Comment on lines -17 to +20

- path = base_path + 'data/computer/'  # raw data
- path1 = base_path + 'data/title_and_abs/'  # processed titles and abstracts
- newpath = base_path + 'data/pro_keyword/'
- newpath2 = base_path + 'data/keyword/'
+ path = f'{base_path}data/computer/'
+ path1 = f'{base_path}data/title_and_abs/'
+ newpath = f'{base_path}data/pro_keyword/'
+ newpath2 = f'{base_path}data/keyword/'

Lines 17-20 refactored with the following changes:

This removes the following comments (why?):

# processed titles and abstracts
# raw data

Comment on lines -32 to +40

- # print b
  if b is None or b.string is None:
      continue
- else:
-     abstracts.extend(soup.title.stripped_strings)
-     s = b.string
-     abstracts.extend(s.encode('utf-8'))
-     f = open(path1 + filename + ".txt", "w+")  # write to a txt file
-     for i in abstracts:
-         f.write(i)
-     f.close()
-     abstracts = []
+ abstracts.extend(soup.title.stripped_strings)
+ s = b.string
+ abstracts.extend(s.encode('utf-8'))
+ with open(path1 + filename + ".txt", "w+") as f:
+     for i in abstracts:
+         f.write(i)
+ abstracts = []

Function get_text refactored with the following changes:

This removes the following comments (why?):

# write to a txt file
# put the obtained unprocessed text into the pro_keyword folder
# print b
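The switch to a with block is more than cosmetic: the context manager closes the file even if a write raises, whereas the bare open/close pair leaks the handle on an exception. A minimal sketch (the file name is illustrative):

    # Before: f.close() is skipped if any f.write(i) raises.
    f = open('example.txt', 'w+')
    for i in ['title ', 'abstract']:
        f.write(i)
    f.close()

    # After: the file is closed on normal exit and on exceptions alike.
    with open('example.txt', 'w+') as f:
        for i in ['title ', 'abstract']:
            f.write(i)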

Comment on lines -31 to +30

- features = [text_len, isHasSH]
- return features
+ return [text_len, isHasSH]

Function get_feature refactored with the following changes:

Comment on lines -42 to +41

- print(X[0:10])
- print(Y[0:10])
+ print(X[:10])
+ print(Y[:10])

Function load_data refactored with the following changes:

Comment on lines -48 to +46

  if __name__ == '__main__':
      pass
-     pass

Lines 48-49 refactored with the following changes:

- for epoch in range(num_epochs):
+ for _ in range(num_epochs):

Function batch_iter refactored with the following changes:
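Renaming epoch to _ is purely a readability convention: the underscore signals that the loop variable is never read inside the body. A tiny illustration:

    num_epochs = 3                   # illustrative
    for _ in range(num_epochs):      # '_' marks the counter as intentionally unused
        print('one pass over the shuffled data')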

Comment on lines -21 to +23

-     raise ValueError("Linear is expecting 2D arguments: %s" % str(shape))
+     raise ValueError(f"Linear is expecting 2D arguments: {str(shape)}")
  if not shape[1]:
-     raise ValueError("Linear expects shape[1] of arguments: %s" % str(shape))
+     raise ValueError(f"Linear expects shape[1] of arguments: {str(shape)}")

Function linear refactored with the following changes:

with tf.name_scope("conv-maxpool-%s" % filter_size):
with tf.name_scope(f"conv-maxpool-{filter_size}"):

Function TextCNN.__init__ refactored with the following changes:

print("{}={}".format(attr.upper(), value))
print(f"{attr.upper()}={value}")

Lines 36-46 refactored with the following changes:

Comment on lines -44 to +46

- f2 = open('%s.txt' % item, 'a+')
- for (k, v) in data_dict.items():
-     f2.write(v + ',' + k + ' ' + '\n')
- f2.close()
+ with open(f'{item}.txt', 'a+') as f2:
+     for (k, v) in data_dict.items():
+         f2.write(v + ',' + k + ' ' + '\n')

Function get_text refactored with the following changes:

Comment on lines -83 to +91

- # print (files)
- f = open(base_path + files, 'r')
- text = (f.read().decode('GB2312', 'ignore').encode('utf-8'))
- salt = ''.join(random.sample(string.ascii_letters + string.digits, 8))  # generate a random string
- f2 = open("C:/Users/kaifun/Desktop/ass_TIP/TextInfoExp/Part2_Text_Classify/test3/" + salt + '.txt', 'w')
- f2.write(text)
- f3.write(salt + ' ' + 'e' + '\n')
- f.close()
+ with open(base_path + files, 'r') as f:
+     text = (f.read().decode('GB2312', 'ignore').encode('utf-8'))
+     salt = ''.join(random.sample(string.ascii_letters + string.digits, 8))  # generate a random string
+     f2 = open(
+         f"C:/Users/kaifun/Desktop/ass_TIP/TextInfoExp/Part2_Text_Classify/test3/{salt}.txt",
+         'w',
+     )
+
+     f2.write(text)
+     f3.write(f'{salt} e' + '\n')

Function trans_text refactored with the following changes:

This removes the following comments (why?):

# print (files)
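Another Python 2 leftover: f.read().decode('GB2312', 'ignore').encode('utf-8') assumes read() returns bytes, which is only true of Python 2 text mode. In Python 3 the usual approach is to let open() do the decoding; a hedged sketch with illustrative paths:

    # Python 3: open() decodes on read and encodes on write.
    with open('input.txt', 'r', encoding='gb2312', errors='ignore') as f:
        text = f.read()                    # already a str
    with open('output.txt', 'w', encoding='utf-8') as f2:
        f2.write(text)                     # re-encoded as UTF-8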

Comment on lines -124 to +125

- f.write(str(test_name[i]) + ' ' + str(result[i]) + '\n')
+ f.write(f'{str(test_name[i])} {str(result[i])}' + '\n')

Function get_classify refactored with the following changes:

Comment on lines -211 to +222

- if judgement != "":
-     return 4, judgement
-
- return 0, ""
+ return (4, judgement) if judgement != "" else (0, "")

Function DictClassifier.__analyse_word refactored with the following changes:
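The same pattern in isolation: a guard with an early return collapsed into a single conditional expression. Behavior is unchanged, as this standalone sketch (with a hypothetical function name) shows:

    def analyse(judgement):
        # Before:
        #     if judgement != "":
        #         return 4, judgement
        #     return 0, ""
        return (4, judgement) if judgement != "" else (0, "")

    assert analyse('good') == (4, 'good')
    assert analyse('') == (0, '')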

Comment on lines -220 to +228

- if match is not None:
-     pattern = {"key": "要的是…给的是…", "value": 1}
-     return pattern
- return ""
+ return {"key": "要的是…给的是…", "value": 1} if match is not None else ""

Function DictClassifier.__is_clause_pattern1 refactored with the following changes:

Comment on lines -227 to +232

- conjunction = {"key": the_word, "value": self.__conjunction_dict[the_word]}
- return conjunction
+ return {"key": the_word, "value": self.__conjunction_dict[the_word]}

Function DictClassifier.__is_word_conjunction refactored with the following changes:

Comment on lines -233 to +237

- punctuation = {"key": the_word, "value": self.__punctuation_dict[the_word]}
- return punctuation
+ return {"key": the_word, "value": self.__punctuation_dict[the_word]}

Function DictClassifier.__is_word_punctuation refactored with the following changes:

Comment on lines -332 to +336

- output += "Sub-clause" + str(i) + ": "
- clause = comment_analysis["su-clause" + str(i)]
+ output += f"Sub-clause{str(i)}: "
+ clause = comment_analysis[f"su-clause{str(i)}"]

Function DictClassifier.__output_analysis refactored with the following changes:

Comment on lines -381 to +404

- if match is not None and len(self.__split_sentence(match.group(2))) <= 2:
-     to_delete = []
-     for i in range(len(the_clauses)):
-         if the_clauses[i] in match.group(2):
-             to_delete.append(i)
-     if len(to_delete) > 0:
-         for i in range(len(to_delete)):
-             the_clauses.remove(the_clauses[to_delete[0]])
-         the_clauses.insert(to_delete[0], match.group(2))
+ if match is not None and len(self.__split_sentence(match[2])) <= 2:
+     if to_delete := [
+         i for i in range(len(the_clauses)) if the_clauses[i] in match[2]
+     ]:
+         for item in to_delete:
+             the_clauses.remove(the_clauses[to_delete[0]])
+         the_clauses.insert(to_delete[0], match[2])

  # recognize the hypothetical "要是|如果……就好了" ("if only ...") sentence pattern
  pattern = re.compile(r"([,%。、!;??,!~~.… ]*)([\u4e00-\u9fa5]*?(如果|要是|"
                       r"希望).+就[\u4e00-\u9fa5]+(好|完美)了[,。;!%、??,!~~.… ]+)")
  match = re.search(pattern, the_sentence.strip())
- if match is not None and len(self.__split_sentence(match.group(2))) <= 3:
+ if match is not None and len(self.__split_sentence(match[2])) <= 3:
      to_delete = []
      for i in range(len(the_clauses)):
-         if the_clauses[i] in match.group(2):
+         if the_clauses[i] in match[2]:
              to_delete.append(i)
-     if len(to_delete) > 0:
-         for i in range(len(to_delete)):
+     if to_delete:
+         for item_ in to_delete:
              the_clauses.remove(the_clauses[to_delete[0]])
-         the_clauses.insert(to_delete[0], match.group(2))
+         the_clauses.insert(to_delete[0], match[2])

Function DictClassifier.__divide_sentence_into_clauses refactored with the following changes:
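Two newer Python features appear in this hunk, so the refactor quietly raises the code's minimum version: match[2] is shorthand for match.group(2) (Python 3.6+), and the walrus operator := (Python 3.8+) binds the list and tests its truthiness in one expression. A self-contained sketch with made-up clauses:

    import re

    match = re.search(r'(\w+) (\w+)', 'hello world')
    assert match[2] == match.group(2) == 'world'   # indexing a Match works since 3.6

    the_clauses = ['wor', 'hello']
    # walrus (3.8+): assign and test in the same 'if'
    if to_delete := [i for i, c in enumerate(the_clauses) if c in match[2]]:
        print(to_delete)                           # -> [0]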

Comment on lines -420 to +421

- clauses = [''.join(x) for x in zip(split_clauses, punctuations)]
-
- return clauses
+ return [''.join(x) for x in zip(split_clauses, punctuations)]

Function DictClassifier.__split_sentence refactored with the following changes:

Comment on lines -427 to +426

- with open(self.__root_filepath + "phrase_dict.txt", "r", encoding="utf-8") as f:
+ with open(f"{self.__root_filepath}phrase_dict.txt", "r", encoding="utf-8") as f:

Function DictClassifier.__get_phrase_dict refactored with the following changes:

Comment on lines -459 to +457

- f.write("%s" % info)
+ f.write(f"{info}")

Function DictClassifier.__write_runout_file refactored with the following changes:

Comment on lines -538 to +536

- sorted_distances = distances.argsort()
-
- return sorted_distances
+ return distances.argsort()

Function KNNClassifier.__get_sorted_distances refactored with the following changes:
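For context, argsort returns the indices that would sort the array, which is exactly the ordering a k-nearest-neighbour lookup needs; a tiny numpy sketch with made-up distances:

    import numpy as np

    distances = np.array([0.9, 0.1, 0.5])   # hypothetical distances to three training samples
    print(distances.argsort())              # -> [1 2 0]; the nearest sample is index 1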

@@ -171,10 +171,8 @@ def test_corpus():
  a = WaimaiCorpus()
  a = Waimai2Corpus()
  a = HotelCorpus()
- pass

Function test_corpus refactored with the following changes:


  if __name__ == "__main__":
-     pass

Lines 177-177 refactored with the following changes:

- if need_score:
-     return [word for word in words[:num]]
- else:
-     return [word[0] for word in words[:num]]
+ return list(words[:num]) if need_score else [word[0] for word in words[:num]]

Function ChiSquare.best_words refactored with the following changes:
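In short, [word for word in words[:num]] is an identity comprehension, so list(words[:num]) builds the same list directly:

    words = [('cat', 0.9), ('dog', 0.8), ('fox', 0.1)]   # illustrative (word, score) pairs
    num = 2
    assert [word for word in words[:num]] == list(words[:num])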

Comment on lines -41 to +46

- if type(self.k) == int:
-     k = "%s" % self.k
- else:
-     k = "-".join([str(i) for i in self.k])
-
+ k = f"{self.k}" if type(self.k) == int else "-".join([str(i) for i in self.k])
  print("KNNClassifier")
  print("---" * 45)
- print("Train num = %s" % self.train_num)
- print("Test num = %s" % self.test_num)
- print("K = %s" % k)
+ print(f"Train num = {self.train_num}")
+ print(f"Test num = {self.test_num}")
+ print(f"K = {k}")

Function Test.test_knn refactored with the following changes:

print("Train num = %s" % self.train_num)
print("Test num = %s" % self.test_num)
print(f"Train num = {self.train_num}")
print(f"Test num = {self.test_num}")

from classifiers import BayesClassifier
bayes = BayesClassifier(self.train_data, self.train_labels, self.best_words)

classify_labels = []
print("BayesClassifier is testing ...")
for data in self.test_data:
classify_labels.append(bayes.classify(data))
classify_labels = [bayes.classify(data) for data in self.test_data]

Function Test.test_bayes refactored with the following changes:
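The accumulator loop and the list comprehension build the same list; the comprehension just drops the explicit empty-list setup. A standalone sketch with a stand-in for bayes.classify:

    def classify(data):                 # stand-in for bayes.classify
        return len(data) % 2

    test_data = ['good', 'bad', 'meh']

    labels_loop = []
    for data in test_data:
        labels_loop.append(classify(data))

    labels_comp = [classify(data) for data in test_data]
    assert labels_loop == labels_comp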

@xiaoc10020015 commented Jul 18, 2022 via email

sourcery-ai[bot] commented Jul 18, 2022

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.57%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 14.82 🙂 | 14.16 🙂 | -0.66 👍 |
| Method Length | 80.49 🙂 | 79.45 🙂 | -1.04 👍 |
| Working memory | 9.53 🙂 | 9.51 🙂 | -0.02 👍 |
| Quality | 55.32% 🙂 | 55.89% 🙂 | 0.57% 👍 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 4172 | 4066 | -106 |
| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| Part1_TF-IDF/example.py | 85.89% ⭐ | 85.78% ⭐ | -0.11% 👎 |
| Part1_TF-IDF/src/GrobalParament.py | 92.62% ⭐ | 92.31% ⭐ | -0.31% 👎 |
| Part1_TF-IDF/src/get_TF_IDF.py | 31.34% 😞 | 34.92% 😞 | 3.58% 👍 |
| Part1_TF-IDF/src/get_data.py | 61.75% 🙂 | 67.14% 🙂 | 5.39% 👍 |
| Part1_TF-IDF/src/utils.py | 50.28% 🙂 | 53.45% 🙂 | 3.17% 👍 |
| Part2_Text_Classify/classifier.py | 61.19% 🙂 | 61.19% 🙂 | 0.00% |
| Part2_Text_Classify/feature.py | 77.74% ⭐ | 77.16% ⭐ | -0.58% 👎 |
| Part2_Text_Classify/cnn-text-classification-tf-chinese/data_helpers.py | 82.34% ⭐ | 82.34% ⭐ | 0.00% |
| Part2_Text_Classify/cnn-text-classification-tf-chinese/text_cnn.py | 49.49% 😞 | 49.43% 😞 | -0.06% 👎 |
| Part2_Text_Classify/cnn-text-classification-tf-chinese/train.py | 65.04% 🙂 | 61.94% 🙂 | -3.10% 👎 |
| Part2_Text_Classify/src/get_cls.py | 73.78% 🙂 | 74.23% 🙂 | 0.45% 👍 |
| Part3_Text_Cluster/src/TextCluster.py | 47.19% 😞 | 47.74% 😞 | 0.55% 👍 |
| Part3_Text_Cluster/src/get_res.py | 59.71% 🙂 | 61.15% 🙂 | 1.44% 👍 |
| Part5_Sentiment_Analysis/src/classifiers.py | 52.67% 🙂 | 52.86% 🙂 | 0.19% 👍 |
| Part5_Sentiment_Analysis/src/corpus.py | 77.27% ⭐ | 76.87% ⭐ | -0.40% 👎 |
| Part5_Sentiment_Analysis/src/feature_extraction.py | 64.58% 🙂 | 63.63% 🙂 | -0.95% 👎 |
| Part5_Sentiment_Analysis/src/test.py | 74.93% 🙂 | 75.53% ⭐ | 0.60% 👍 |
| Part5_Sentiment_Analysis/src/tools.py | 48.56% 😞 | 50.40% 🙂 | 1.84% 👍 |
| Part6_Relation_Extraction/feature_extract.py | 37.40% 😞 | 36.54% 😞 | -0.86% 👎 |
| Part6_Relation_Extraction/libsvm-3.21/python/svm.py | 61.97% 🙂 | 62.10% 🙂 | 0.13% 👍 |
| Part6_Relation_Extraction/libsvm-3.21/python/svmutil.py | 51.17% 🙂 | 52.04% 🙂 | 0.87% 👍 |
| Part6_Relation_Extraction/libsvm-3.21/tools/checkdata.py | 38.55% 😞 | 38.85% 😞 | 0.30% 👍 |
| Part6_Relation_Extraction/libsvm-3.21/tools/easy.py | 34.03% 😞 | 33.93% 😞 | -0.10% 👎 |
| Part6_Relation_Extraction/libsvm-3.21/tools/grid.py | 45.00% 😞 | 46.01% 😞 | 1.01% 👍 |
| Part6_Relation_Extraction/libsvm-3.21/tools/subset.py | 62.80% 🙂 | 64.13% 🙂 | 1.33% 👍 |
| Tools/mul_thd_google_translate.py | 85.85% ⭐ | 87.10% ⭐ | 1.25% 👍 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| Part6_Relation_Extraction/feature_extract.py | feature_extract2 | 56 ⛔ | 484 ⛔ | 25 ⛔ | 4.73% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| Part6_Relation_Extraction/libsvm-3.21/tools/grid.py | GridOption.parse_options | 40 ⛔ | 332 ⛔ | 23 ⛔ | 10.72% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| Part6_Relation_Extraction/libsvm-3.21/python/svm.py | svm_parameter.parse_options | 25 😞 | 387 ⛔ | 25 ⛔ | 15.58% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| Part5_Sentiment_Analysis/src/classifiers.py | DictClassifier.__output_analysis | 64 ⛔ | 285 ⛔ | 12 😞 | 18.18% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| Part5_Sentiment_Analysis/src/classifiers.py | MaxEntClassifier.test | 39 ⛔ | 277 ⛔ | 14 😞 | 19.96% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!
