Sourcery refactored master branch #15
base: master
Conversation
Due to GitHub API limits, only the first 60 comments can be shown.
Before:
    open('data/result.txt', 'a+').write(str(k) + ' ' + str(v) + '\n')  # convert k and v to str
After:
    open('data/result.txt', 'a+').write(f'{str(k)} {str(v)}' + '\n')

Function com_tf refactored with the following changes:
- Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)

This removes the following comments (why?):
    # convert k and v to str
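As an aside, the pattern Sourcery applies throughout this PR can be sketched on a toy pair of values (hypothetical, not from the project):

```python
k, v = "word", 3

# String concatenation forces explicit str() calls and is harder to scan.
line_concat = str(k) + ' ' + str(v) + '\n'

# An f-string interpolates and converts in one step; the str() calls that
# Sourcery keeps inside the braces are actually redundant, so {k} and {v}
# alone would suffice.
line_fstring = f"{k} {v}\n"

assert line_concat == line_fstring
```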
Before:
    path1 = path + 'data/title_and_abs/'
    newpath = path + "data/pro_keyword/"
After:
    path1 = f'{path}data/title_and_abs/'
    newpath = f"{path}data/pro_keyword/"

Lines 20-21 refactored with the following changes:
- Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
Before:
    data_source = open(file_import_url, 'r')
    data = data_source.readline()
    word_in_afile_stat = {}
    word_in_allfiles_stat = {}
    files_num = 0
    while data != "":  # process the file pro_res.txt
        data_temp_1 = data.strip("\n").split("\t")  # file name and key words of a file
        data_temp_2 = data_temp_1[1].split(",")  # key words of a file
        file_name = data_temp_1[0]
        data_temp_len = len(data_temp_2)
        files_num += 1
        data_dict = {}
        data_dict.clear()
        for word in data_temp_2:
            if word not in word_in_allfiles_stat:
                word_in_allfiles_stat[word] = 1
                data_dict[word] = 1
            else:
                if word not in data_dict:  # if this word has not appeared in this file before
                    word_in_allfiles_stat[word] += 1
                    data_dict[word] = 1
            if not word_in_afile_stat.has_key(file_name):
                word_in_afile_stat[file_name] = {}
            if not word_in_afile_stat[file_name].has_key(word):
                word_in_afile_stat[file_name][word] = []
                word_in_afile_stat[file_name][word].append(data_temp_2.count(word))
                word_in_afile_stat[file_name][word].append(data_temp_len)
        data = data_source.readline()
    data_source.close()
    # filelist = os.listdir(newpath2)  # get all files under the current path
    TF_IDF_last_result = []
    if (word_in_afile_stat) and (word_in_allfiles_stat) and (files_num != 0):
        for filename in word_in_afile_stat.keys():
            TF_IDF_result = {}
            TF_IDF_result.clear()
            for word in word_in_afile_stat[filename].keys():
After:
    with open(file_import_url, 'r') as data_source:
        data = data_source.readline()
        word_in_afile_stat = {}
        word_in_allfiles_stat = {}
        files_num = 0
        while data != "":  # process the file pro_res.txt
            data_temp_1 = data.strip("\n").split("\t")  # file name and key words of a file
            data_temp_2 = data_temp_1[1].split(",")  # key words of a file
            file_name = data_temp_1[0]
            data_temp_len = len(data_temp_2)
            files_num += 1
            data_dict = {}
            data_dict.clear()
            for word in data_temp_2:
                if word not in word_in_allfiles_stat:
                    word_in_allfiles_stat[word] = 1
                    data_dict[word] = 1
                elif word not in data_dict:  # if this word has not appeared in this file before
                    word_in_allfiles_stat[word] += 1
                    data_dict[word] = 1
                if not word_in_afile_stat.has_key(file_name):
                    word_in_afile_stat[file_name] = {}
                if not word_in_afile_stat[file_name].has_key(word):
                    word_in_afile_stat[file_name][word] = [data_temp_2.count(word), data_temp_len]
            data = data_source.readline()
    # filelist = os.listdir(newpath2)  # get all files under the current path
    TF_IDF_last_result = []
    if (word_in_afile_stat) and (word_in_allfiles_stat) and (files_num != 0):
        for filename, value in word_in_afile_stat.items():
            TF_IDF_result = {}
            TF_IDF_result.clear()
            for word in value.keys():

Function TF_IDF_Compute refactored with the following changes:
- Use `with` when opening file to ensure closure [×2] (ensure-file-closed)
- Merge else clause's nested if statement into elif (merge-else-if-into-elif)
- Merge append into list declaration [×2] (merge-list-append)
- Use items() to directly unpack dictionary values (use-dict-items)
- Remove unnecessary call to keys() (remove-dict-keys)
- Replace a[0:x] with a[:x] and a[x:len(a)] with a[x:] (remove-redundant-slice-index)
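Note that the refactored code still calls `dict.has_key`, which exists only in Python 2; under Python 3 it raises AttributeError. A minimal sketch of the Python 3 equivalent, using the `in` operator and `setdefault` to collapse the membership checks (the variable names are illustrative stand-ins, not the project's real data):

```python
word_in_afile_stat = {}
file_name, word = "doc1", "term"
count, total = 2, 10  # hypothetical word count and clause length

# Python 2: if not word_in_afile_stat.has_key(file_name): ...
# Python 3 replaces has_key(x) with `x in d`, and setdefault() creates
# the nested dict only when the key is missing.
per_file = word_in_afile_stat.setdefault(file_name, {})
if word not in per_file:
    per_file[word] = [count, total]

assert word_in_afile_stat == {"doc1": {"term": [2, 10]}}
```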
Before:
    path = base_path + 'data/computer/'  # raw data
    path1 = base_path + 'data/title_and_abs/'  # processed titles and abstracts
    newpath = base_path + 'data/pro_keyword/'
    newpath2 = base_path + 'data/keyword/'
After:
    path = f'{base_path}data/computer/'
    path1 = f'{base_path}data/title_and_abs/'
    newpath = f'{base_path}data/pro_keyword/'
    newpath2 = f'{base_path}data/keyword/'

Lines 17-20 refactored with the following changes:
- Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

This removes the following comments (why?):
    # processed titles and abstracts
    # raw data
Before:
    # print b
    if b is None or b.string is None:
        continue
    else:
        abstracts.extend(soup.title.stripped_strings)
        s = b.string
        abstracts.extend(s.encode('utf-8'))
        f = open(path1 + filename + ".txt", "w+")  # write to a txt file
        for i in abstracts:
            f.write(i)
        f.close()
        abstracts = []
After:
    if b is None or b.string is None:
        continue
    abstracts.extend(soup.title.stripped_strings)
    s = b.string
    abstracts.extend(s.encode('utf-8'))
    with open(path1 + filename + ".txt", "w+") as f:
        for i in abstracts:
            f.write(i)
    abstracts = []

Function get_text refactored with the following changes:
- Remove unnecessary else after guard condition (remove-unnecessary-else)
- Use `with` when opening file to ensure closure [×2] (ensure-file-closed)

This removes the following comments (why?):
    # write to a txt file
    # put the obtained unprocessed text into the pro_keyword folder
    # print b
Before:
    features = [text_len, isHasSH]
    return features
After:
    return [text_len, isHasSH]

Function get_feature refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
Before:
    print(X[0:10])
    print(Y[0:10])
After:
    print(X[:10])
    print(Y[:10])

Function load_data refactored with the following changes:
- Replace a[0:x] with a[:x] and a[x:len(a)] with a[x:] [×2] (remove-redundant-slice-index)
Before:
    if __name__ == '__main__':
        pass
After:
    pass

Lines 48-49 refactored with the following changes:
- Remove redundant conditional (remove-redundant-if)
Before:
    for epoch in range(num_epochs):
After:
    for _ in range(num_epochs):

Function batch_iter refactored with the following changes:
- Replace unused for index with underscore (for-index-underscore)
Before:
    raise ValueError("Linear is expecting 2D arguments: %s" % str(shape))
    if not shape[1]:
        raise ValueError("Linear expects shape[1] of arguments: %s" % str(shape))
After:
    raise ValueError(f"Linear is expecting 2D arguments: {str(shape)}")
    if not shape[1]:
        raise ValueError(f"Linear expects shape[1] of arguments: {str(shape)}")

Function linear refactored with the following changes:
- Replace interpolated string formatting with f-string [×2] (replace-interpolation-with-fstring)
Before:
    with tf.name_scope("conv-maxpool-%s" % filter_size):
After:
    with tf.name_scope(f"conv-maxpool-{filter_size}"):

Function TextCNN.__init__ refactored with the following changes:
- Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
Before:
    print("{}={}".format(attr.upper(), value))
After:
    print(f"{attr.upper()}={value}")

Lines 36-46 refactored with the following changes:
- Replace call to format with f-string (use-fstring-for-formatting)
Before:
    f2 = open('%s.txt' % item, 'a+')
    for (k, v) in data_dict.items():
        f2.write(v + ',' + k + ' ' + '\n')
    f2.close()
After:
    with open(f'{item}.txt', 'a+') as f2:
        for (k, v) in data_dict.items():
            f2.write(v + ',' + k + ' ' + '\n')

Function get_text refactored with the following changes:
- Use `with` when opening file to ensure closure (ensure-file-closed)
- Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
Before:
    # print (files)
    f = open(base_path + files, 'r')
    text = (f.read().decode('GB2312', 'ignore').encode('utf-8'))
    salt = ''.join(random.sample(string.ascii_letters + string.digits, 8))  # generate a random string
    f2 = open("C:/Users/kaifun/Desktop/ass_TIP/TextInfoExp/Part2_Text_Classify/test3/" + salt + '.txt', 'w')
    f2.write(text)
    f3.write(salt + ' ' + 'e' + '\n')
    f.close()
After:
    with open(base_path + files, 'r') as f:
        text = (f.read().decode('GB2312', 'ignore').encode('utf-8'))
        salt = ''.join(random.sample(string.ascii_letters + string.digits, 8))  # generate a random string
        f2 = open(
            f"C:/Users/kaifun/Desktop/ass_TIP/TextInfoExp/Part2_Text_Classify/test3/{salt}.txt",
            'w',
        )
        f2.write(text)
        f3.write(f'{salt} e' + '\n')

Function trans_text refactored with the following changes:
- Use `with` when opening file to ensure closure (ensure-file-closed)
- Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

This removes the following comments (why?):
    # print (files)
Before:
    f.write(str(test_name[i]) + ' ' + str(result[i]) + '\n')
After:
    f.write(f'{str(test_name[i])} {str(result[i])}' + '\n')

Function get_classify refactored with the following changes:
- Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
Before:
    if judgement != "":
        return 4, judgement
    return 0, ""
After:
    return (4, judgement) if judgement != "" else (0, "")

Function DictClassifier.__analyse_word refactored with the following changes:
- Lift code into else after jump in control flow (reintroduce-else)
- Replace if statement with if expression (assign-if-exp)
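The assign-if-exp change above reduces to replacing an if/else pair of returns with a single conditional expression. A minimal sketch with a hypothetical stand-in for the method (the name and values are illustrative, not the project's):

```python
def analyse_word(judgement):
    # Stand-in for the tail of DictClassifier.__analyse_word: a conditional
    # expression picks between the two result tuples in one line.
    return (4, judgement) if judgement != "" else (0, "")

assert analyse_word("negation") == (4, "negation")
assert analyse_word("") == (0, "")
```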
Before:
    if match is not None:
        pattern = {"key": "要的是…给的是…", "value": 1}
        return pattern
    return ""
After:
    return {"key": "要的是…给的是…", "value": 1} if match is not None else ""

Function DictClassifier.__is_clause_pattern1 refactored with the following changes:
- Lift code into else after jump in control flow (reintroduce-else)
- Replace if statement with if expression (assign-if-exp)
- Inline variable that is immediately returned (inline-immediately-returned-variable)
Before:
    conjunction = {"key": the_word, "value": self.__conjunction_dict[the_word]}
    return conjunction
After:
    return {"key": the_word, "value": self.__conjunction_dict[the_word]}

Function DictClassifier.__is_word_conjunction refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
Before:
    punctuation = {"key": the_word, "value": self.__punctuation_dict[the_word]}
    return punctuation
After:
    return {"key": the_word, "value": self.__punctuation_dict[the_word]}

Function DictClassifier.__is_word_punctuation refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
Before:
    output += "Sub-clause" + str(i) + ": "
    clause = comment_analysis["su-clause" + str(i)]
After:
    output += f"Sub-clause{str(i)}: "
    clause = comment_analysis[f"su-clause{str(i)}"]

Function DictClassifier.__output_analysis refactored with the following changes:
- Use f-string instead of string concatenation [×3] (use-fstring-for-concatenation)
Before:
    if match is not None and len(self.__split_sentence(match.group(2))) <= 2:
        to_delete = []
        for i in range(len(the_clauses)):
            if the_clauses[i] in match.group(2):
                to_delete.append(i)
        if len(to_delete) > 0:
            for i in range(len(to_delete)):
                the_clauses.remove(the_clauses[to_delete[0]])
            the_clauses.insert(to_delete[0], match.group(2))

    # recognize hypothetical clauses of the form "要是|如果……就好了"
    pattern = re.compile(r"([,%。、!;??,!~~.… ]*)([\u4e00-\u9fa5]*?(如果|要是|"
                         r"希望).+就[\u4e00-\u9fa5]+(好|完美)了[,。;!%、??,!~~.… ]+)")
    match = re.search(pattern, the_sentence.strip())
    if match is not None and len(self.__split_sentence(match.group(2))) <= 3:
        to_delete = []
        for i in range(len(the_clauses)):
            if the_clauses[i] in match.group(2):
                to_delete.append(i)
        if len(to_delete) > 0:
            for i in range(len(to_delete)):
                the_clauses.remove(the_clauses[to_delete[0]])
            the_clauses.insert(to_delete[0], match.group(2))
After:
    if match is not None and len(self.__split_sentence(match[2])) <= 2:
        if to_delete := [
            i for i in range(len(the_clauses)) if the_clauses[i] in match[2]
        ]:
            for item in to_delete:
                the_clauses.remove(the_clauses[to_delete[0]])
            the_clauses.insert(to_delete[0], match[2])

    # recognize hypothetical clauses of the form "要是|如果……就好了"
    pattern = re.compile(r"([,%。、!;??,!~~.… ]*)([\u4e00-\u9fa5]*?(如果|要是|"
                         r"希望).+就[\u4e00-\u9fa5]+(好|完美)了[,。;!%、??,!~~.… ]+)")
    match = re.search(pattern, the_sentence.strip())
    if match is not None and len(self.__split_sentence(match[2])) <= 3:
        to_delete = []
        for i in range(len(the_clauses)):
            if the_clauses[i] in match[2]:
                to_delete.append(i)
        if to_delete:
            for item_ in to_delete:
                the_clauses.remove(the_clauses[to_delete[0]])
            the_clauses.insert(to_delete[0], match[2])

Function DictClassifier.__divide_sentence_into_clauses refactored with the following changes:
- Replace index in for loop with direct reference [×2] (for-index-replacement)
- Replace m.group(x) with m[x] for re.Match objects [×6] (use-getitem-for-re-match-groups)
- Use named expression to simplify assignment and conditional (use-named-expression)
- Convert for loop into list comprehension (list-comprehension)
- Simplify sequence length comparison [×2] (simplify-len-comparison)
- Replace unused for index with underscore [×2] (for-index-underscore)
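The named-expression change above can be illustrated in isolation. A sketch on toy clause data (the values are hypothetical, and the delete-and-insert behaviour is simplified here, not a copy of the project's logic):

```python
the_clauses = ["ab", "cd", "ef"]
matched_text = "cd ef"  # stands in for match[2]

# Before: build to_delete with a for loop, then test len(to_delete) > 0.
# After: a list comprehension assigned with the walrus operator (:=)
# binds and tests in the same `if` header; an empty list is falsy.
if to_delete := [i for i, c in enumerate(the_clauses) if c in matched_text]:
    # Simplified behaviour: drop the matched clauses (back to front so the
    # collected indices stay valid), then insert the combined match at the
    # first matched position.
    for i in reversed(to_delete):
        del the_clauses[i]
    the_clauses.insert(to_delete[0], matched_text)

assert the_clauses == ["ab", "cd ef"]
```

Note that `:=` requires Python 3.8 or later, so this refactoring implicitly raises the project's minimum supported version.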
Before:
    clauses = [''.join(x) for x in zip(split_clauses, punctuations)]
    return clauses
After:
    return [''.join(x) for x in zip(split_clauses, punctuations)]

Function DictClassifier.__split_sentence refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
Before:
    with open(self.__root_filepath + "phrase_dict.txt", "r", encoding="utf-8") as f:
After:
    with open(f"{self.__root_filepath}phrase_dict.txt", "r", encoding="utf-8") as f:

Function DictClassifier.__get_phrase_dict refactored with the following changes:
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
- Remove unnecessary else after guard condition (remove-unnecessary-else)
Before:
    f.write("%s" % info)
After:
    f.write(f"{info}")

Function DictClassifier.__write_runout_file refactored with the following changes:
- Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
Before:
    sorted_distances = distances.argsort()
    return sorted_distances
After:
    return distances.argsort()

Function KNNClassifier.__get_sorted_distances refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
@@ -171,10 +171,8 @@ def test_corpus():
    a = WaimaiCorpus()
    a = Waimai2Corpus()
    a = HotelCorpus()
Removed:
    pass

Function test_corpus refactored with the following changes:
- Remove redundant pass statement (remove-redundant-pass)
Before:
    if __name__ == "__main__":
        pass
Removed:
    pass

Lines 177-177 refactored with the following changes:
- Remove redundant pass statement (remove-redundant-pass)
Before:
    if need_score:
        return [word for word in words[:num]]
    else:
        return [word[0] for word in words[:num]]
After:
    return list(words[:num]) if need_score else [word[0] for word in words[:num]]

Function ChiSquare.best_words refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
- Replace identity comprehension with call to collection constructor (identity-comprehension)
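The identity-comprehension change is worth a small worked example: `[x for x in xs]` copies a sequence element by element, and `list(xs)` says the same thing directly. Toy scored-word data (hypothetical, not the project's feature list):

```python
words = [("good", 0.9), ("bad", 0.8), ("ok", 0.1)]
num, need_score = 2, False

# The slice already limits the result to num items, so the identity
# comprehension [word for word in words[:num]] collapses to
# list(words[:num]); the other branch still needs a real comprehension
# to project out the word without its score.
best = list(words[:num]) if need_score else [word[0] for word in words[:num]]

assert best == ["good", "bad"]
```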
Before:
    if type(self.k) == int:
        k = "%s" % self.k
    else:
        k = "-".join([str(i) for i in self.k])
    print("KNNClassifier")
    print("---" * 45)
    print("Train num = %s" % self.train_num)
    print("Test num = %s" % self.test_num)
    print("K = %s" % k)
After:
    k = f"{self.k}" if type(self.k) == int else "-".join([str(i) for i in self.k])
    print("KNNClassifier")
    print("---" * 45)
    print(f"Train num = {self.train_num}")
    print(f"Test num = {self.test_num}")
    print(f"K = {k}")

Function Test.test_knn refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
- Replace interpolated string formatting with f-string [×4] (replace-interpolation-with-fstring)
- Move assignment closer to its usage within a block (move-assign-in-block)
- Convert for loop into list comprehension (list-comprehension)
Before:
    print("Train num = %s" % self.train_num)
    print("Test num = %s" % self.test_num)
    from classifiers import BayesClassifier
    bayes = BayesClassifier(self.train_data, self.train_labels, self.best_words)
    classify_labels = []
    print("BayesClassifier is testing ...")
    for data in self.test_data:
        classify_labels.append(bayes.classify(data))
After:
    print(f"Train num = {self.train_num}")
    print(f"Test num = {self.test_num}")
    from classifiers import BayesClassifier
    bayes = BayesClassifier(self.train_data, self.train_labels, self.best_words)
    print("BayesClassifier is testing ...")
    classify_labels = [bayes.classify(data) for data in self.test_data]

Function Test.test_bayes refactored with the following changes:
- Replace interpolated string formatting with f-string [×2] (replace-interpolation-with-fstring)
- Move assignment closer to its usage within a block (move-assign-in-block)
- Convert for loop into list comprehension (list-comprehension)
This is an automated vacation reply from QQ Mail.
Hello, I have received your message and will reply as soon as possible.
Sourcery Code Quality Report
✅ Merging this PR will increase code quality in the affected files by 0.57%.
Here are some functions in these files that still need a tune-up:
Legend and Explanation
The emojis denote the absolute quality of the code. The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request. Please see our documentation here for details on how these metrics are calculated. We are actively working on this report - lots more documentation and extra metrics to come! Help us improve this quality report!
Branch master refactored by Sourcery.
If you're happy with these changes, merge this Pull Request using the Squash and merge strategy. See our documentation here.
Run Sourcery locally
Reduce the feedback loop during development by using the Sourcery editor plugin:
Review changes via command line
To manually merge these changes, make sure you're on the master branch, then run:
Help us improve this pull request!