forked from cesar-douady/open-lmake
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTO_DO
273 lines (254 loc) · 12.8 KB
/
TO_DO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
* This file is part of the open-lmake distribution ([email protected]:cesar-douady/open-lmake.git)
* Copyright (c) 2023 Doliam
* This program is free software: you can redistribute/modify under the terms of the GPL-v3 (https://www.gnu.org/licenses/gpl-3.0.html).
* This program is distributed WITHOUT ANY WARRANTY, without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
items :
* normal item : ordered by implementation priority in each section
! difficult item : unclear how to implement it
? idea item : unclear that it is desirable
****************************************************************************************************
* BUGS (implemented but does not work)
****************************************************************************************************
* crash if python() dynamic attribute returns None
* fix compilation with LMAKE_FLAGS=ST
* missing some deps when reading elf
- it seems that libc.so is missing at least in some occasions
* reimplement hierarchical repositories
- add a sub_repositories entry to config
- map priorities to RuleIdx
- ensure total prio of sub-rules
- add an AntiRule for files in sub-repo with prio between repo and sub-repo
- so as to ensure the sub-repo behaves as an autonomous repo
- handle cwd of rules
- devise a sound mechanism to read what must be read when reading makefiles
****************************************************************************************************
* LACK (not implemented but necessary for lmake semantic)
****************************************************************************************************
* manage 32 bits executables
- compile ld_audit.so (and co) in both 32 & 64 bits
- put adequate $PLATFORM in LD_PRELOAD
* generate meaningful message in case of I/O error such as disk full
! before erasing a dir, check for phony targets, not only files on disk
- then dir markers for gcc include dirs could be phony, which is more elegant
? improve hard link management
- when ln a b
- followed by write to b
- consider it is a write to a
****************************************************************************************************
* COSMETIC (ugly as long as not implemented)
****************************************************************************************************
* generate an error when calling depend on a target
- either known at that time
- or when it becomes a target if later
* generate target/dep key everywhere it is pertinent
* report Source/Anti rule name when appropriate
* be more consise in console output
- and more precise in recorded log
- dates, justifications go to log while essential info is shown on screen
- e.g. 'dep changed' on screen and actual dates (before, after) in log
- same for manual, etc.
? consider a package for option passing
? gnu argp
? http://nongnu.askapache.com/argpbook/step-by-step-into-argp.pdf
? if actual_job!=conform_job, show both with warning in lshow
? report resources even when job spawn process groups
- dont know how to do that
****************************************************************************************************
* ROBUSTNESS & MAINTENABILITY (fragile/difficult to read or maintain as long as not implemented)
****************************************************************************************************
* add size check to date check
- create a file version by aggregating Ddate, FileTag & size
- keep FileTag accessible from such version
- this improves protection against continuing writing without change mtime
- this is not perfect, though, as one can write w/o change the size, but decreases probability (already low) by a significant factor
- as granularity is several ms
* support 64-bits id
- configure with NBits rather than types
- in store/file.hh, reserve address space after NBits instead of type
- first candidate would be deps, by far the most demanding
* fix store to be compliant with strict aliasing rules
* add a warning when ids reach 15/16 of their limits
* implement noexcept/except everywhere pertinent
* use the Path struct instead of at/file
- everywhere applicable
* get rid of double crc (md5+xxh)
- keep only xxh
- superior on all metrics
? replace if (...) throw by throw_if or throw_unless
? or vice versa
? if a job cannot connect to job_exec to report deps
- note fact in a marker file
- exit with error is not enough as it could be hidden by the job itself if in a sub-process
- check marker file existence in job_exec
- and retry job in that case rather than generating job in error
? implement class path_?? (x, n or s) in path.hh
- semantic : path w/ & w/o / at beginning and end
- implement mk_lcl, mk_glb, mk_rel, mk_abs, ...
- inherit from ::string
- s means w/ /, n means w/o /, x means w/ or w/o /
- no actual disk access
****************************************************************************************************
* FEATURES (not implemented and can work without)
****************************************************************************************************
*3 improve job isolation by using namespaces
- much like faketree : https://github.com/enfabrica/enkit/tree/master/faketree
- generalize tmp mapping
- provide a container with just : system + repo + tmp
- this way, no harm can be done outside repo
* provide a reasonable default value when dynamic functions return None
- at least, dont crash
* provide more options to stderr management :
- redirect to stdout
- hide it (still accessible in lshow -e)
? rename allow_stderr to stderr
* implement cache v2 (copy & link) :
- 2 levels : disk level, global level
- use link instead of copy
- for disk level
! support direct rebuild of deps
- specify a target flag 'direct' for use by pattern
- specify a dep flag 'direct' for dynamic use (through ldepend)
- pass known hidden deps to job_exec to avoid connection to server when known
- the difficult point : manage resource deadlocks
- suspend waiting job (tell backend job is suspended)
- tell backend job is resumed
- while suspended, consider resources to be freed
- send a signal to job such as 28 (SIGWINCH) which is normally ignored so it can free expensive resources such as licences
- then resume participates to allocation competition
? implement a ref/check mechanism
- record a map file:crc under git
- err if generated crc does not match recorded crc
- implement "lref file crc" to record/update map
? allow dynamic attribute functions to have inherited values (passing inherited value as argument)
- maybe not strictly necessary, but this is what is done for cmd
? implement a compilation rule py->pyc
- and use python -B
- what about the numerous retries this approach implies ?
- this is a use case for direct dep update
! beware that pyc records py date
- then pyc would be broken if py is updated steady as pyc would not be remade
- requires a dep flag to say we are depending on date
? or use the python crc based mechanism
? consider fuse as an autodep method
? unclear it is faster than ptrace
? all data traffic goes through it
? many context switches
? an advantage is that accesses to system files (outside the repository) are transparent
? there must be one fuse mount for each job so as to recognize which job does the access
? how to manage so that the job sees the repository with its nominal name
? requires a chroot
? provide a lock mechanism in local backend
? much like a bunch of resources w/ capacity=1
? stored as a set
? implement prelude
? for very short rules (.pyc, opts, ...), put script in prelude and no cmd
? generally speaking, prelude can trigger deps and avoid rerun
? use autodep from within server
? it should be possible to link in such a way that only Python sees autodep, that would be ideal
? once this is done, we may record deps for rsrcs & forbid them for deps computation
? find a way for resources to be anything
? int/float/str/bool/None/... or list/tuple/set/dict thereof
? add a new autodep mechanism based on syscall user dispatch
? cf man 2 prctl, PR_SET_SYSCALL_USER_DISPATCH
? seems to be highly architecture dependent, is it worth ?
****************************************************************************************************
* OPTIMIZATIONS
****************************************************************************************************
*2 regexprs
- consider using PCRE2 instead of std::regex
- consider handling prefix/suffix apart
- most importantly sufixes
? maybe a pre-pass searching for infix is advisable
* when rounding resources using config.backends.precisions
- round to prioritize jobs
- do not round for actual allocation
* handle dir hierarchy in store
- provide a primitive to insert all dirs of a node at once
- much more efficient
- not difficult to code
- return a vector of idxs
* generate KPI's
- useful/useless created jobs
- number of non buildable nodes
- max number of resources slots in backend
- total number of deps
- total number of pressure updates
* gprof
* no req_info for non-buildable nodes
* there can be 1 backend thread per backend
- with one master socket
- and everything replicated per backend (including mutexes, tables, etc.)
- allow several backends of same type to distribute load
! trap close in autodep
- so as to know when a dep was last accessed
- and avoid too may overwrites
- must be very careful, e.g. mmap can access file after close, dup'ed file descriptors can be created, etc.
! manage HUGE pages in store
- does not work straight forward
- may be use a giant map to reserve address space and allocate a growing map in there using HUGE pages
- need to do some trials
? record dep crcs in Job rather than in Dep
- store crc just before dep error parallel chunk
- dont store deps after first dep error parallel chunk as they are useless
- this way we can severely reduce the Deps size (4x if we can manage accesses)
? avoid ack at job start
- pass all infos in command line
- pass JobRpcReply as argument through mk_shell_str
- fall back to current implementation if command line would be too large
- find a way to be certain that seq_id is not necessary to qualify job start
? launch deps as soon as they are discovered
- w/o waiting for dependent job to complete
- for non-phony non-existent deps, else there is a risk that a steady target is modified while job is on going
- or record close & dup so as to know when a file is no longer needed
- gather crc before dep launch in case dep is steady
- possibly upon declaration by user
? set an access cache in autodep
? put only static info
? mark target dirs & tmp as beginning of writable area -> do not cache within
? cache accesses as long as we are not in the writable area
? maintain table file -> Dir/Lnk(value)/FileOrNone
? replace db_date() with an index into a table
? put db_date() in ReqInfo, which is somewhat delicate
? or invent a new temporary struct which is not indexed by req
? capture table at start of req with existing reqs
? just replace in Src, a direct copy of file date by a search in this table
? replace check with a comparison on index
? rest of code is identical
? improve rule updating
? when matching changes but command does not change, avoid to relaunch job
? compute cmd checksum on its code rather than on the source to resist to cosmetic changes such as comments modifs
****************************************************************************************************
* DOC
****************************************************************************************************
* add hard limits
- rules, targets, deps, rscrs, nodes, jobs...
****************************************************************************************************
* TESTS
****************************************************************************************************
* add a TU for scp
* add a TU for dynamic config update
* add a TU for sub-Lmakefile.py
* ancillary commands to unit tests
- lshow, ...
* add a unit_test to validate n_retries
* gcov
* idx overflow
- check with 8-bits indexes
* test crash recovery
- including atomicity of store
****************************************************************************************************
* TOOLS
****************************************************************************************************
* consider gdb guis :
- apt install nemiver
- insight
* consider serialization libs :
- https://github.com/alibaba/yalantinglibs
- https://alibaba.github.io/yalantinglibs/resource/A%20Faster%20Serialization%20Library%20Based%20on%20Compile-time%20Reflection%20and%20C++%2020.pdf
* implement a file dump
- put a header in files so that format can be automatically recognized
- suitable for use in vim
! develop lmake under lmake
- autodep must support to be audited by autodep for unit tests
- provide a bootstrap script
? make a navigation tool based on Neo4J or a similarly complete tool