\chapter{Custom tests for Continuous Integration}
\label{cha:custom_tests_for_continuous_integration}

\emph{First of all, it is important to note the distinction between the three
following elements of the architecture: KernelCI, the custom tests, and the LAVA
lab. All of them will be detailed in this chapter, but they are all completely
independent, and are simply working together in Free Electrons' CI
infrastructure.}

\section{The need for custom tests}
\label{sec:the_need_for_custom_tests}
\subsection{The KernelCI project}
\label{sub:the_kernelci_project}

\textbf{KernelCI} is a project started a few years ago that aims to build many
different upstream \textbf{Linux} kernel trees every hour, then send
jobs\footnote{Jobs are the resource \textbf{LAVA} deals with. A more
detailed explanation can be found in section~\ref{ssub:the_jobs} on
page~\pageref{ssub:the_jobs}.} using those kernels to the subscribed labs, before
aggregating the results on different summary pages, where it is easy to see
whether a board has problems booting \textbf{Linux}, and when those problems
started to occur.

This project already covers a large part of the CI loop by building the kernels
and displaying the results, but it still needs the collaboration of the many
labs contributing their devices to complete the process.

With all the different kinds of devices provided by labs from all over the
world, \textbf{KernelCI} has achieved over the years quite good coverage of the
device types supported by \textbf{Linux}. But even though a great number of
platforms are tested, the jobs it sends do little more than boot the board and
run a few basic commands in userspace.

Jobs as simple as this are not suitable when it comes to testing SATA, USB, or
other specific subsystems that are usually unused during boot. But of course,
as \textbf{KernelCI} has to deal with many different labs, it cannot afford to
make the jobs more specific, since many boards, despite being identical, may
have different configurations and different devices attached to them.

To fill that gap, custom tests must be set up at a smaller scale, to comply with
the specificities of the lab and the devices.

\subsection{Specifications}
\label{sub:specifications}

As Free Electrons counts many device family maintainers in its engineering team,
it is particularly important for them to have the finest CI set up for those
devices. Moreover, they frequently work on and develop new drivers in custom
\textbf{Linux} trees, often derived from the device vendor's tree.
Since this process can take quite a long time, it would be very valuable to have
a CI process in place for those custom trees, and for custom features still
in development.

Considering the jobs already run by \textbf{KernelCI}, something more specific
but still mostly similar was needed to implement the custom tests\footnote{The term
\emph{custom tests} designates tests sent by a system other than
\textbf{KernelCI}'s, one that would be fully controlled by \textbf{Free Electrons},
and could thus be finely tuned to the engineers' needs.}. Two parts were to be set
up:
\begin{itemize}
\item launching custom scripts on the devices to check specific
functionalities.
\item building and booting custom kernel trees that are not already taken
care of by \textbf{KernelCI}.
\end{itemize}
Of course, once both parts are running, they can be combined to launch custom
scripts on custom kernels.

Last but not least, the overall architecture should support two
operating modes:
\begin{itemize}
\item a \textbf{manual} one, triggered by hand, that runs only the requested
tests with a user-built kernel, and gives an immediate report once the job
has been run.
\item an \textbf{automatic} one, that runs tests every day, and sends
daily reports to the maintainers.
\end{itemize}

\section{The CI lab}
\label{sec:the_ci_lab}

Applying continuous integration to an operating system is not an easy task.
Since it needs external hardware management, power supply control, and
input/output control, it clearly requires a complex infrastructure.

Free Electrons has had a lab for more than a year now. It runs reliably,
continuously testing more than 35 devices, and publishes its results so that
they are available to the \textbf{Linux} community.

\subsection{The hardware part}
\label{sub:the_hardware_part}

Since the tested software is an operating system, it needs to run on real
hardware, and thus differs from more usual CI, which typically runs in some
container technology.

Built last year, the farm at Free Electrons takes the form of a big cabinet
with eight drawers, capable of storing up to 50 boards. Alongside those devices,
USB hubs, switches, and ATX power supplies with their TCP-controlled relays can
be found in each and every drawer. For the main management, the farm also hosts
a main USB hub, a main switch, and most importantly, the NUC that runs parts of
the software driving the whole lab.
Everything is well wired and labelled to keep the equipment in a maintainable
state.

For more information about the ins and outs of this part, one can read this blog
article:
\url{http://free-electrons.com/blog/hardware-infrastructure-free-electrons-lab/}.

\begin{figure}[H]
\centering
\includegraphics[height=0.4\paperheight]{lab.JPG}
\caption{An overview of the cabinet hosting the hardware part of the lab.}
\label{fig:lab}
\end{figure}

\subsection{The software: LAVA}
\label{sub:the_lava_software}

In order to run everything from a software point of view, \textbf{LAVA} has been
used from the very beginning. LAVA stands for \emph{Linaro Automated Validation
Architecture}, and provides a quite good way to manage and schedule automated
tests on embedded devices. Most of the work needed to implement the custom tests
was about using \textbf{LAVA} the right way, so this part will
include more details than the hardware one.

\subsubsection{LAVA: v1 or v2?}
\label{ssub:lava_v1_or_v2}

Since \textbf{LAVA} is currently in the middle of a huge refactoring of its
internal workflow and exposed API, people usually talk about
\textbf{LAVA v1} and \textbf{LAVA v2}. It is actually the same software, with
different behaviors at many levels, but with the same final goal: running what
is called a job, a user-described list of actions, on a device, and reporting
the events that happened.

At the beginning of the internship, only the first version was in use, and part
of the work was to migrate to the second one. This meant working through many
different problems and bugs that needed to be fixed before taking the next step,
but we finally ended up with a fresh and running architecture using mostly
\textbf{LAVA v2}.

For clarity's sake, only the \textbf{final setup} will be described in this report. Those
interested in the original software architecture can still read this blog post:
\url{http://free-electrons.com/blog/software-architecture-free-electrons-lab/}.

\subsubsection{The jobs}
\label{ssub:the_jobs}

The main resource \textbf{LAVA} has to deal with is the job. A job is defined by
a \textbf{YAML}\footnote{\url{http://yaml.org/}} structure, describing multiple
sections:
\begin{itemize}
\item The \textbf{device-type}, which is the name of the device you want to
run the test on. \\
\emph{Examples: beaglebone-black, sun8i-h3-orangepi-pc, ...}
\item The \textbf{boot method}, to tell \textbf{LAVA} which method should be
used to boot the device. \\
\emph{Examples: \textbf{fastboot} or \textbf{U-Boot}, \textbf{NFS} or
ramdisk, \textbf{SSH}, ...}
\item The \textbf{artifacts} URLs. This includes the \emph{kernel}, the
\emph{device tree}, the \emph{modules}, and the \emph{rootfs}. Only the
kernel is strictly mandatory to boot the boards, but the other ones
are common to almost every non-exotic device.
\item The \textbf{tests} to run once the device is booted. This leaves
many possibilities open, since it generally points to some shell scripts to
be executed as root in userspace. It is the main place to customize the
tests.
\item Other less important sections, such as some \textbf{metadata}, the
\textbf{notifications}, or custom \textbf{timeouts}.
\end{itemize}
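
As a rough illustration of the sections listed above, the following
\textbf{Python} sketch assembles a job description as a plain dictionary and
checks that the top-level parts are present. The key names only loosely follow
the \textbf{LAVA v2} job layout and should not be read as a validated schema,
every URL and name is a placeholder, and the helpers \verb$build_job$ and
\verb$missing_sections$ are hypothetical.

```python
import json


def build_job(device_type, kernel_url, dtb_url=None, rootfs_url=None,
              tests=None):
    """Assemble a job description mirroring the sections listed above.

    Key names are illustrative assumptions, not a validated LAVA schema.
    """
    artifacts = {"kernel": {"url": kernel_url}}
    # Only the kernel is mandatory; other artifacts are added when given.
    if dtb_url:
        artifacts["dtb"] = {"url": dtb_url}
    if rootfs_url:
        artifacts["ramdisk"] = {"url": rootfs_url}

    return {
        "job_name": "custom-test-" + device_type,
        "device_type": device_type,
        "timeouts": {"job": {"minutes": 30}},
        "actions": [
            {"deploy": dict(artifacts, to="tftp")},
            {"boot": {"method": "u-boot", "commands": "ramdisk"}},
            {"test": {"definitions": tests or []}},
        ],
    }


def missing_sections(job):
    """Return the names of the required top-level sections the job lacks."""
    required = ("device_type", "actions")
    return [name for name in required if name not in job]


if __name__ == "__main__":
    job = build_job(
        "beaglebone-black",
        kernel_url="https://example.org/zImage",
        dtb_url="https://example.org/am335x-boneblack.dtb",
        tests=[{"repository": "https://example.org/tests.git",
                "from": "git", "path": "hello.yaml", "name": "hello"}],
    )
    # JSON is used here only to print the structure; LAVA expects YAML.
    print(json.dumps(job, indent=2))
```

Keeping the job as a dictionary until the last moment makes such sanity checks
cheap, whatever serialisation is used afterwards.
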
Once a job has been sent, either through the web interface or through the API,
it is queued until a device of the requested device-type is free.
The job then gets scheduled and run, before finishing either with the
\textbf{Complete} status when everything ran well, or with the \textbf{Incomplete}
status when there was a problem during the execution of the different tasks.
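
The queueing behaviour described above can be pictured with a small toy model.
This is not \textbf{LAVA}'s scheduler: the class \verb$ToyScheduler$ and its
methods are hypothetical, and the sketch only assumes that a job waits until a
device of the requested device-type is free, then ends as \textbf{Complete} or
\textbf{Incomplete}.

```python
from collections import deque


class ToyScheduler:
    """Toy model: jobs wait until a device of the requested type is free."""

    def __init__(self, devices):
        # devices maps a device name to its device-type.
        self.free = dict(devices)
        self.queue = deque()
        self.running = {}  # job name -> (device name, device-type)

    def submit(self, job_name, device_type):
        """Queue a job; nothing runs until schedule() is called."""
        self.queue.append((job_name, device_type))

    def schedule(self):
        """Start every queued job for which a matching device is free."""
        still_waiting = deque()
        while self.queue:
            job_name, device_type = self.queue.popleft()
            device = next((name for name, dtype in self.free.items()
                           if dtype == device_type), None)
            if device is None:
                # No free device of that type: the job stays queued.
                still_waiting.append((job_name, device_type))
            else:
                del self.free[device]
                self.running[job_name] = (device, device_type)
        self.queue = still_waiting

    def finish(self, job_name, success):
        """Release the device and return the final job status."""
        device, device_type = self.running.pop(job_name)
        self.free[device] = device_type
        return "Complete" if success else "Incomplete"
```
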

When a job is complete, \textbf{LAVA} provides access to the results in several
ways, such as the web UI, the API, emails, or a callback system allowing the
job to push its results to some other APIs.

\subsubsection{A distributed system}
\label{ssub:a_distributed_system}

\textbf{LAVA} has been designed to scale to far more complex labs than Free
Electrons'. It is thus split into two parts: a master, and a worker.
\begin{description}
\item[The master]: \\
The master can exist in only one instance. It is in charge of three tasks:
\begin{itemize}
\item The \textbf{web interface}, allowing in-browser interaction
with the software.
\item The job \textbf{scheduler}, responsible for sending the queued
jobs to the available devices.
\item The \textbf{dispatcher-master}, that manages the different
possible workers, and sends jobs to them.
\end{itemize}
The master also holds the connection to the relational database.
\item[The worker]: \\
The worker is divided into only two parts:
\begin{itemize}
\item The \textbf{slave} is the part that connects to the
dispatcher-master and receives the jobs to run.
\item The \textbf{dispatcher} is the only part that really interacts
with the devices under test. It is spawned on demand by the
slave when a job needs to be run. It is also the only part that
does not run as a daemon.
\end{itemize}
\end{description}

\begin{figure}[H]
\centering
\includegraphics[width=0.8\linewidth]{arch-overview.png}
\caption{Architecture schematic of the \textbf{LAVA} software.}
\label{fig:arch-overview}
\end{figure}

\section{Developing custom tests}
\label{sec:developing_custom_tests}
\subsection{Beginning a proof of concept}
\label{sub:beginning_a_proof_of_concept}

At the beginning, a basic and functional, but still vague specification was
written, and it required a proof of concept to see how it would fit in final
production. The tool was quickly named \textbf{CTT}, standing for \emph{Custom
Test Tool}\footnote{You can find the sources at this address:
\url{https://github.com/free-electrons/custom_tests_tool}}, and that is how the
software building the custom jobs will be designated until the end of this
report.

The choice of \textbf{Python} was an obvious one, since this language is accessible
and widely used in the embedded world for its flexibility and portability.
Moreover, most of the Free Electrons engineers had already used it, and it was
not an option to introduce a new, unknown technology into an architecture they
would have to maintain in the future.

\subsubsection{Understanding the depth of LAVA's jobs}
\label{ssub:understanding_the_lava_jobs}

The first simple part was about \textbf{LAVA}. Since \textbf{KernelCI} already
provides everything (kernel, dtb and rootfs) needed to run a successful job in
\textbf{LAVA}, the only part remaining was crafting and sending jobs.

A simple yet flexible solution was to use a template engine to
parametrize a generic job written once and for all.
The job syntax uses the human-friendly \textbf{YAML}, but even though it is
readable and easy to write, the data structure required by \textbf{LAVA}
is a bit complex, and it is thus truly inconvenient to write tests by hand.
Once filled, the template would just have to be sent to \textbf{LAVA} through
its XML-RPC\footnote{\url{https://en.wikipedia.org/wiki/XML-RPC}} API to create
a job.
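
The template-then-submit approach described above can be sketched as follows.
This is a minimal illustration and not \textbf{CTT}'s actual code: the standard
library's \verb$string.Template$ stands in for whichever template engine is
really used, the template is far shorter than a real job, and the host, user
and token are placeholders. The \verb$scheduler.submit_job$ call is assumed to
be the job submission method exposed by \textbf{LAVA}'s XML-RPC API, and should
be checked against the API documentation of the installed version.

```python
import xmlrpc.client
from string import Template

# A deliberately tiny job template; a real one has many more fields.
JOB_TEMPLATE = Template("""\
job_name: $job_name
device_type: $device_type
actions:
- deploy:
    to: tftp
    kernel:
      url: $kernel_url
""")


def render_job(job_name, device_type, kernel_url):
    """Fill the generic template with the parameters of one job."""
    # substitute() raises KeyError if a placeholder is left unfilled.
    return JOB_TEMPLATE.substitute(
        job_name=job_name,
        device_type=device_type,
        kernel_url=kernel_url,
    )


def submit_job(job_yaml, host, user, token):
    """Send a rendered job over XML-RPC and return the server's answer.

    host, user and token are placeholders for a real LAVA instance;
    scheduler.submit_job is assumed to be the submission method.
    """
    url = "https://{}:{}@{}/RPC2".format(user, token, host)
    server = xmlrpc.client.ServerProxy(url)
    return server.scheduler.submit_job(job_yaml)


if __name__ == "__main__":
    print(render_job("hello-world", "beaglebone-black",
                     "https://example.org/zImage"))
```

Separating rendering from submission keeps the template testable without any
network access.
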

Knowing what to put in that template was one of the most interesting moments of
this part, since it was like discovering a new programming language. There are
always new features to discover, and new mechanisms for using them, to finally
make \textbf{LAVA} do exactly what you want. It is also during this period
that most of the migration to \textbf{LAVA v2} was prepared, meaning that the
configuration of the different levels of \textbf{LAVA} was altered.

It was often necessary to discuss with the \textbf{LAVA} community, on
\url{irc://irc.freenode.net#linaro-lava}, to get clarification when the
documentation happened to be incomplete, or when \textbf{LAVA} needed to be
improved\footnote{See these patches for example: \\
\url{https://git.linaro.org/lava/lava-dispatcher.git/commit/?id=8df17dd7355cd82f37e1ef22a6c9d88ede44f650} \\
\url{https://git.linaro.org/lava/lava-dispatcher.git/commit/?id=3bfdcdb672f1a15da96bbb221a26847dd6bf2865} \\
Also don't hesitate to run \verb$git log --author "Florent Jacquet"$ in the
\verb$lava-dispatcher$ and the \verb$lava-server$ projects to get an overview of
the contributions made to \textbf{LAVA} (also available in appendix
\ref{cha:list_of_contributions_to_lava}, page
\pageref{cha:list_of_contributions_to_lava}).
}.

\subsubsection{Using custom artifacts}
\label{ssub:using_custom_artifacts}

Once we could easily send jobs, the next step was about sending custom
artifacts, such as a user-built kernel. This would be useful for the first,
manual mode of the tool, when a user launches some jobs from his
workstation, using a kernel built from one of his working trees.

Since \textbf{LAVA} allows the use of files local to the dispatcher, copying the
artifacts there is a really convenient way to provide them without setting up
an \textbf{FTP} server or other complicated means of serving files.
\textbf{SSH}, with the \verb$scp$ command, allows efficient and reliable file transfers
between two machines, and since the engineers have easy access to the
dispatcher through one of Free Electrons' VPNs\footnote{Virtual Private Network
(\url{https://en.wikipedia.org/wiki/Virtual\_private\_network})}, it would be
easy to give them permission to send files.
With \textbf{Python}, the \textbf{paramiko}\footnote{\url{http://www.paramiko.org/}}
library, allowing a native use of \textbf{SSH}, makes the choice of that
protocol even more comfortable.
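
The transfer step can be sketched as below. To stay within the standard
library, this illustration shells out to the \verb$scp$ command instead of
using \textbf{paramiko}, so it is not how \textbf{CTT} itself does it. The
user, host and remote directory are hypothetical, and the \verb$file://$ form
of the returned path is an assumption about how a dispatcher-local file would
be referenced in a job.

```python
import os.path
import posixpath
import subprocess


def scp_command(local_path, user, host, remote_dir):
    """Build the scp argument list copying one artifact to the dispatcher."""
    file_name = os.path.basename(local_path)
    remote = "{}@{}:{}".format(user, host,
                               posixpath.join(remote_dir, file_name))
    return ["scp", local_path, remote]


def upload_artifact(local_path, user, host, remote_dir):
    """Copy the artifact and return a URL local to the dispatcher."""
    command = scp_command(local_path, user, host, remote_dir)
    # check=True raises CalledProcessError if scp exits with an error.
    subprocess.run(command, check=True)
    file_name = os.path.basename(local_path)
    return "file://" + posixpath.join(remote_dir, file_name)
```

For example, \verb$upload_artifact("build/zImage", "ci", "dispatcher.example.org", "/tmp/ctt")$
would copy the kernel and return a path that can then be placed in the job's
artifact section.
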
\subsubsection{Launching automatic jobs}
\label{ssub:crawling_for_existing_artifacts}

The other mode of the tool, the automatic launcher, requires fetching
pre-built artifacts available from a remote location, such as
\textbf{KernelCI}'s storage, or Free Electrons' own once the custom builds are
set up. Fortunately, the \textbf{KernelCI} website also provides an API
that makes it possible to retrieve its latest builds.

The most difficult part was then to make sure that the crawler would have enough
information about the boards to fetch their specific artifacts, while trying to
avoid having a very big file storing every possible piece of data about the boards.
Once done, getting the artifacts would only be a matter of crafting the right
URL.

This ended up with a simple \textbf{JSON}\footnote{\url{http://json.org/}}
file, storing the list of the boards, each described by four strings:
\begin{itemize}
\item \textbf{arch}, the architecture of the board, to guess which kernel to
use.
\item \textbf{dt}, the device-tree, also mandatory for booting the devices,
and unique to each and every one of them.
\item \textbf{rootfs}, since they are built for many architecture flavours
(ARMv4, ARMv5, ARMv7, and ARMv8).
\item \textbf{test\_plan}, since it is mandatory for \textbf{LAVA}, and must
be configured on a per-device basis.
\end{itemize}
This file proved to be simple enough, and the crawler's job is now only about
crafting a URL, and checking whether the artifact actually exists.
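
A minimal sketch of this board description and of the URL crafting is given
below. Only the four field names come from the text above. The values, the
base URL, the tree and defconfig names, and the path layout are assumptions
made for the example, and do not reflect the real layout of
\textbf{KernelCI}'s storage.

```python
import json
import urllib.error
import urllib.request

# One board entry with the four strings described above (values invented).
BOARDS_JSON = """
{
  "beaglebone-black": {
    "arch": "arm",
    "dt": "am335x-boneblack",
    "rootfs": "rootfs_armv7.cpio.gz",
    "test_plan": "boot"
  }
}
"""


def artifact_urls(board, base_url, tree, defconfig):
    """Craft kernel and device-tree URLs from one board description."""
    prefix = "/".join([base_url.rstrip("/"), tree, board["arch"], defconfig])
    return {
        "kernel": prefix + "/zImage",
        "dtb": "{}/dtbs/{}.dtb".format(prefix, board["dt"]),
    }


def artifact_exists(url, timeout=10):
    """Return True if the server answers a HEAD request without error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    boards = json.loads(BOARDS_JSON)
    urls = artifact_urls(boards["beaglebone-black"],
                         "https://storage.example.org", "mainline",
                         "multi_v7_defconfig")
    for name, url in urls.items():
        print(name, url)
```
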

With the crawlers done, and the rest of the tool already working, the only thing
that remained to be done in \textbf{CTT} was the custom scripts to be run once
userspace is reached.

\subsection{Running custom scripts}
\label{sub:running_custom_scripts}
\subsubsection{Writing a test suite}
\label{ssub:writing_a_test_suite}

Among the many possibilities offered by the \textbf{LAVA} job structure is that
of designating a \emph{git} repository, and a path in that repository to a
file that is executed automatically by \textbf{LAVA} in the device's
userland shell.

Before writing more complex tests which would require some development time,
a simple \verb$echo "Hello world!"$ did the job. This made it possible to do a
lot of testing, check all possible solutions, and finally define an architecture
that would be both simple and functional enough for the custom tests' needs.

\subsubsection{Integrating custom tools in the root file system}
\label{ssub:integrating_custom_tools_in_the_rootfs}

Before writing more advanced test scripts in the test suite, a problem had to be
solved. Many of the tests would require tools or commands that are not shipped
by default in \textbf{KernelCI}'s rootfs. Moreover, a requirement was that this
rootfs should be compiled for each ARM flavour, unlike the extremely generic one
built by \textbf{KernelCI}.

An easy and flexible way of building custom root filesystems is to use
\textbf{Buildroot}\footnote{\url{https://buildroot.org/}}. This led to some
simple glue scripts\footnote{\url{https://github.com/free-electrons/buildroot-ci}}
building the few configurations needed by the farm, which mainly
include \emph{iperf}\footnote{\url{https://en.wikipedia.org/wiki/Iperf}
and \url{https://iperf.fr/}} and a full \emph{ping}\footnote{One that includes
the \verb$-f$ option, for ping floods.} version for network stress testing, and
\emph{Bonnie++}\footnote{\url{https://en.wikipedia.org/wiki/Bonnie++} and
\url{http://www.coker.com.au/bonnie++/}} for filesystem performance, on top of a
classic Busybox\footnote{\url{https://en.wikipedia.org/wiki/BusyBox} and
\url{https://busybox.net/}} that provides the rest of the system.

As my first experience with \textbf{Buildroot}, this was a quick but interesting
part that made me discover the power of build systems in the embedded world.

\subsubsection{The road to complex jobs}
\label{ssub:the_road_to_complex_jobs}

With a test suite and custom root filesystem, the overall architecture was in
place. To verify that everything would work as expected, more complex tests were
to be written.

As Busybox provides only \emph{Ash} as its default shell, the scripts needed to
be compatible with it, and thus could not take advantage of some Bash
features. This turned out to be quite an exercise, since most operating systems
in 2017 provide the latter by default, and the differences can in some cases
make finding workarounds for complex operations a headache.

The other particularly interesting part was the development of the first
\emph{Multinode job}\footnote{\url{https://validation.linaro.org/static/docs/v2/multinodeapi.html}}.
This is the \textbf{LAVA} term to describe jobs that require multiple devices,
such as some network-related jobs. Since the boards need to interact, they need
to be synchronized, and \textbf{LAVA} provides some tools in the runtime
environment to allow data exchanges between the devices, but as with classic
threads or processes, this can quickly lead to race conditions, deadlocks,
or other interesting concurrency problems.

Once all those problems were addressed and the network tests were running, a
short presentation was given to the team, so that everyone would know the status
of the custom Continuous Integration. It was also an opportunity to show them
the architecture, so that they could easily add new boards and tests in the
future.

\subsection{Adding some reporting}
\label{sub:adding_some_reporting}

With the two operating modes of \textbf{CTT} came two modes of reporting: one
for the manual tests, and the other for the daily ones.

For the first and easy part, it was just about adding the correct \emph{notify}
section to the job template, so that when an engineer sends a job manually to
\textbf{LAVA}, his email address is included in the definition and he gets a
message as soon as the job is finished, with some details about what worked and
what failed.

For the second part, the daily tests, the need was to aggregate the results of
the past twenty-four hours into a single, personalized email. Indeed, each
engineer can subscribe to some devices, and in order not to make the reporting
too verbose, a script builds a specific email for each of them, so that people
only get the results they are interested in.
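
The per-engineer aggregation described above can be sketched as follows. The
function \verb$build_reports$, the shape of its inputs, and the format of the
report lines are all hypothetical; the sketch only shows how results can be
filtered by subscription before any email is written.

```python
from collections import defaultdict


def build_reports(subscriptions, results):
    """Group job results per engineer, keeping only subscribed devices.

    subscriptions maps an email address to a set of device names;
    results is a list of (device, test, status) tuples for the period.
    """
    reports = defaultdict(list)
    for email, devices in subscriptions.items():
        for device, test, status in results:
            if device in devices:
                reports[email].append(
                    "{}: {} -> {}".format(device, test, status))
    # Engineers whose devices produced no result get no email at all.
    return {email: "\n".join(lines) for email, lines in reports.items()}
```

Each value of the returned dictionary is the body of one email, which could
then be handed to any mail-sending mechanism.
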

\subsection{Integrating custom kernels}
\label{sub:integrating_custom_kernels}
\subsubsection{Goal}
\label{ssub:goal}

The next and last step toward fully customized CI tests was building custom
kernels. Just like \textbf{KernelCI} does every hour, the goal is to monitor a
list of kernel trees, pull them, build them with specific configurations,
and store the artifacts online, so that \textbf{LAVA} can easily use them.

Custom kernels really come in handy in two cases. The first is when the
engineers would like to follow a specific tree they work on, but this tree is
not close enough to mainline and \textbf{KernelCI} does not track it: Free
Electrons' builder is then in charge of it. The other is when a test requires a
custom kernel configuration, such as the activation of the hardware
cryptographic modules, which are platform-specific and thus outside the scope
of \textbf{KernelCI}'s builds.

\subsubsection{Setting up a kernel builder}
\label{ssub:setting_up_a_kernel_builder}

Mainly based on \textbf{KernelCI}'s Jenkins scripts, but with some
modifications to work standalone, the builder\footnote{\url{https://github.com/free-electrons/kernel-builder}}
is split into two parts: a first script that checks the trees and prepares
tarballs of the sources when needed, and a second script that builds the
prepared sources.
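
The two-step flow can be pictured with the sketch below. It assumes that
``when needed'' means that the head commit of a tree has changed since the last
build, which is an interpretation and not something taken from the builder's
sources. The function names are hypothetical, and the \verb$prepare$ and
\verb$build$ callables stand in for the two real scripts.

```python
def trees_to_build(current_heads, last_built):
    """Return the trees whose head commit changed since the last build.

    Both arguments map a tree name to a commit identifier.
    """
    return sorted(name for name, head in current_heads.items()
                  if last_built.get(name) != head)


def run_builder(current_heads, last_built, prepare, build):
    """Call the two steps in sequence for every tree that needs it."""
    built = []
    for name in trees_to_build(current_heads, last_built):
        tarball = prepare(name, current_heads[name])  # first script
        build(tarball)                                # second script
        # Remember the commit so the tree is skipped until it moves again.
        last_built[name] = current_heads[name]
        built.append(name)
    return built
```
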

In the overall CI architecture, the two scripts are called sequentially, just
before launching \textbf{CTT} in automatic mode, so that the newly created
kernel images can quickly be tested. Of course this required adding to
\textbf{CTT} the logic needed to crawl either \textbf{KernelCI}'s storage
or Free Electrons'.

\subsection{A full rework before the end}
\label{sub:a_full_rework_before_the_end}

Before the end of the internship, everything was fully operational, up and
running, but a rather large problem remained. Indeed, the whole code of
\textbf{CTT} had been developed quickly, as a proof of concept, and even if the
technological choices were not bad, the overall design of the software made it
awful to maintain.

As about one month remained, the decision was taken to completely rework the
tool, so that it would be easier in the future to add new features. The
technical debt incurred by the proof-of-concept design would thereby be
paid off.

One of the engineers had already taken time to rework small parts, but had
kept the internal API untouched, even where some functions or classes needed to
be split into several ones. More was needed, but still, he had quite a good
vision of the tool's design, and greatly helped in its refactoring.

This brought along the way many interesting side effects: unit tests for almost
all the newly created classes, a flexible and modular design, simpler
configuration files, a better user interface, improved safety regarding the
crafted jobs, and a fully rewritten README.

Despite not being originally planned in the main subject of the internship, this
truly was an instructive part, since it was all about software design, and
making choices that would help make the tool maintainable for the long term, and
not something that would fall into oblivion in less than six months.