PAPERS & WRITING

Action | Time | Description & Notice
self-archive/publish 1 paper full-text online | 2+ tomatoes | 1h+; w/o code; GitHub + RG + Google Scholar, etc.
converting 1 paper (A4, 2-col, 11 pages) from MS Word to LaTeX | 13 tomatoes | 1 to 1.5 days
reading 1 paper (qiqqa) | 1 tomato | average time for NEW papers, regardless of scanning or deep reading
scanning 3 papers (qiqqa) | 1 tomato | judge whether the paper is relevant or not; keywords, key procedure, key conclusion, citation points
deep reading 1 paper (qiqqa) | 3 tomatoes | at least 3
self revision of draft > quick mark & correction (real pen) | 2 tomatoes | 19 pages (w/o bibliography)
self revision of draft > quick apply of corrections (LaTeX) | 4 tomatoes | 1 more is needed if newly found small issues (e.g. typos) are also corrected during the quick apply; 19 pages (w/o bibliography)
detailed self revision of draft > mark & correct both structure & language (real pen) | 11 tomatoes | 26 pages (w/o bibliography); got bored after 5 tomatoes; +7 tomatoes if content-wise changes are also made
update of GPU draft result section due to exp update > mark & correct & compare with old results | x tomatoes | boring because there was no big difference, yet it used a lot of time
update of GPU draft analysis section due to exp results update > mark & correct | 10 tomatoes | 2 full-text pages; structure & language & comparison of new results with the literature (essentially a rewrite)
SARIMA: math, code, debug and parallel | 32 tomatoes | a full 4 days, including 5 tomatoes for debugging
SARIMA: parallel: try & debug | 5 tomatoes | first time implementing a parallel program in R; did it in 3 different ways (doParallel + foreach; parallel, e.g. parApply; Techila); a rough Python sketch of the idea follows below
Techila* platform: usage basics, try examples, implement my solution & debug | 19 tomatoes | 1 tomato to install; 3 tomatoes to try different official examples and decide on the solution: foreach(); 5 tomatoes to learn & implement my solution using foreach(); 9 tomatoes to debug Techila's own problems (e.g. how to use libraries, how to upload data in a tricky way); have not tried Techila's own way to upload data

*:
[Techila] Good documentation; it is very easy to follow the manual/tutorial and run the official examples on Google Cloud Platform, but it takes some time to get one's own solution running.
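
The parallel SARIMA work above was done in R (doParallel + foreach, parApply, Techila). As a rough illustration of the same idea in Python rather than the original R code, here is a minimal sketch that fits several SARIMA candidates in parallel and keeps the best one by AIC; statsmodels, the file name series.csv, and the seasonal period 12 are assumptions for the sake of the example.

```python
# Minimal sketch: fit SARIMA candidates in parallel, keep the best by AIC.
# Assumes statsmodels/pandas are installed; "series.csv" and the seasonal
# period 12 are placeholders, not the original R experiment.
from multiprocessing import Pool

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX


def fit_one(args):
    """Fit one SARIMA(order)(seasonal_order) model and return its AIC."""
    y, order, seasonal_order = args
    result = SARIMAX(y, order=order, seasonal_order=seasonal_order).fit(disp=False)
    return order, seasonal_order, result.aic


if __name__ == "__main__":
    y = pd.read_csv("series.csv", index_col=0, parse_dates=True).squeeze("columns")
    candidates = [(y, (p, 1, q), (1, 1, 1, 12))
                  for p in range(3) for q in range(3)]
    with Pool() as pool:                       # one worker per CPU core by default
        results = pool.map(fit_one, candidates)
    best = min(results, key=lambda r: r[2])
    print("best order:", best[0], "seasonal:", best[1], "AIC:", round(best[2], 1))
```

In the R version, foreach() plays roughly the role of pool.map here, with doParallel or Techila as the scheduling backend.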

EXPERIMENTS, VISUALIZATION & CODING

Action | Time | Description & Notice
learning Django MVC from a good video tutorial | 18 tomatoes (est.) | 2.5-hour video (see PS) => Toggl 11 hours; only learned the basics, not sure why it took so long; notes in blog (background: already knew Zend MVC, did not know Python)
learning Django rest_framework from two videos | 18 tomatoes (est.) | 1 hour, 2 videos => Toggl 10 hours; with unclear explanations and only partial code it took longer to follow and understand; started to get a feel for Django and its REST API; notes in blog
learning ModelForm (incl. create/update/delete) | 8 hours (Toggl) | incl. 1 hour to find the right tutorial videos
learning, trying & comparing different ways of uploading file(s) | 4 hours (Toggl) | Django: function-based & class-based views; a condensed sketch follows below
change project to new IDE | 6 tomatoes | install Jupyter & R dependencies, set up mandatory options, tried to improve other options; OK to use, know the basic shortcuts, but not yet familiar with the new Jupyter environment
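
For the ModelForm and file-upload rows above, a condensed sketch of the class-based Django pattern; the model, form, view, template and URL names (Document, "upload.html", "document-upload") are made up for illustration and are not the tutorial's code.

```python
# models.py / forms.py / views.py condensed into one sketch (Django).
# Names like Document, "upload.html" and "document-upload" are placeholders.
from django import forms
from django.db import models
from django.urls import reverse_lazy
from django.views.generic.edit import CreateView


class Document(models.Model):
    title = models.CharField(max_length=100)
    file = models.FileField(upload_to="documents/")   # saved under MEDIA_ROOT/documents/


class DocumentForm(forms.ModelForm):
    class Meta:
        model = Document
        fields = ["title", "file"]


class DocumentUploadView(CreateView):
    # The class-based view passes request.FILES to the form automatically.
    model = Document
    form_class = DocumentForm
    template_name = "upload.html"
    success_url = reverse_lazy("document-upload")
```

The upload template must use enctype="multipart/form-data"; the function-based equivalent simply builds DocumentForm(request.POST, request.FILES) and calls form.save().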

HOW THE SPARK (BASICS) TIME PLAN FAILED

In mid-January, I planned to learn Spark basics and deploy it in standalone mode & on Mesos within 4 weeks.

before

Before this time plan, I had already spent 72 hours on hardware & system installation.

Summary:
27 hours to learn & build automatic installation (a better automatic method, PXE, could be learned from a teacher).
17 hours to learn X & enable remote X.
10 hours to organize hardware, such as PC cases, tables & cables.

Big Data and Cloud Computing 72 hours
Requirement Meeting 00:30:25
Spark > Meeting 01:34:00
Spark > Enable Router & Remote 10:56:44
Spark > Hardware 09:28:53
Spark > Integrate Matlab 01:40:29
Spark > Mess 01:31:20
Spark > Plan 02:03:40
Spark > System 11:21:47
Spark > System > Auto Install 12:52:49
Spark > System > Debug 01:07:23
Spark > System > LVM Partitions 00:28:25
Spark > System > Server Terminal 03:00:00
Spark > System > X 07:13:22
Spark > Vmware 04:28:53
Spark > Vmware > Matlab Headless 01:15:03
Spark > Vmware > SSH key problem 02:45:09

original plan

Week 3: install base systems (Ubuntu).
Week 4: hello world (word count) in virtual machines (+ YARN; I thought YARN was mandatory). A minimal word-count sketch follows after this list.
Week 5: Mesos + BIND (self-hosted DNS).
Week 6: deploy on real machines.
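
For context, the Week 4 "hello world" is the classic word count; a minimal PySpark sketch (the input path is a placeholder). It runs on the local or standalone master, so YARN is not required for this step.

```python
# Classic Spark word count in PySpark; "input.txt" is a placeholder path.
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
lines = spark.read.text("input.txt").rdd.map(lambda row: row[0])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(add))
for word, n in counts.take(20):            # show only the first 20 pairs
    print(word, n)
spark.stop()
```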

results

Week 3: 40 hours on Spark.
11 hours for hardware (dirty cables).
23 (+5?) hours for setting up the virtual environment, incl. file sharing.
6.5 hours surfing for information.
Sometimes I lacked efficiency (e.g. the 6.5 hours of surfing).

Big Data and Cloud Computing 40 hours
Spark > DataBricks; Github Info and MOOC etc. 06:32:00
Spark > Dirty Cables 09:04:16
Spark > Hardware > Cables 02:05:02
Spark > New VirtualMachine > Remote Work @ Home 05:17:00
Spark > New VirtualMachine > VirtualBox 01:03:52
Spark > New VirtualMachine > Vmware Host-Guest Share Files 02:00:00
Spark > Puppet/Ansible 01:57:17
Spark > Share File 05:32:21
Spark > Windows for VirtualMachine 04:29:35

Week 4: nothing on Spark.
Week 5: 25 hours on Spark [50 hours in total].
7 hours on hardware;
15 hours to learn Scale Py coding (& book).

Big Data and Cloud Computing 25 hours
Spark > Scale Py > Book Only (Accumulated Time) 03:00:00
Spark > Scale Py > Ch1 03:28:29
Spark > Scale Py > Ch2 02:01:14
Spark > Scale Py > Plan (Accumulated Time) 02:00:00
Spark > Hardware 07:04:18
Spark > Scale Py > Remote Jupyter 03:22:25
W > T's Friend Temp Computer > Reset 03:48:52

Week 6: nothing serious on Spark. Tried for 1 hour to activate Windows and failed.
Week 7: 10 hours to get the environment more or less ready.

Big Data and Cloud Computing 10 hours
Spark > Scale Py > DL 01:00:00
Spark > Hardware > Dirty Cables 01:33:34
Spark > Software and Hardware > Network 03:37:00
Spark > Software > Network 01:48:10
Spark > Software > Puppet 01:39:05
Spark > Software > System 00:30:00

Week 8: 7.5 hours to learn Scale Py coding & book.

Big Data and Cloud Computing 7.5 hours
Spark > Scale Py > Ch2 06:30:14
Spark > Scale Py > Ch2 > Book Only 00:20:18
Spark > Scale Py > Ch9 > Book Only 00:22:18

Week 9: 85 hours, and got Spark standalone running.
24 hours to learn Scale Py coding & book.
17 hours to set up Spark standalone (a connection sketch follows after the log below).
10 hours to disable X to free up more resources.

Big Data and Cloud Computing 85 hours
DS Webinar 01:15:00
Spark > Scale Py > Ch8 04:58:00
Spark > Scale Py > Ch8 > Book Only 01:47:00
Spark > 1st Performance Test > Prepare System 07:28:27
Spark > DNS 02:03:17
Spark > Hardware and Software > Network 01:13:56
Spark > Scale Py > Ch8 07:41:27
Spark > Scale Py > Ch9 06:03:05
Spark > Scale Py > Ch9 > Read Book and Debug of: convert features_header to list 02:14:07
Spark > Scale Py > Ch9 > Understanding 02:00:00
Spark > Self Evaluation, Review, Summary 00:59:43
Spark > Standalone in VirtualBox 07:51:24
Spark > System > Disable X 02:26:12
Spark > System > Disable X & Auto Install Spark 06:19:00
Spark > System > Disable X > Review Basic Linux 00:54:09
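
Once the standalone master and workers are up, the Python side only needs to point at the master URL. A minimal smoke-test sketch; the hostname spark-master and the 2g executor memory are placeholders (7077 is the standalone default port).

```python
# Connecting to a Spark standalone cluster from PySpark.
# "spark-master" is a placeholder for the machine where the standalone
# master was started; 7077 is the standalone default port.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://spark-master:7077")
         .appName("standalone-smoke-test")
         .config("spark.executor.memory", "2g")   # keep executors small on old PCs
         .getOrCreate())

# Tiny smoke test: parallelize a range and sum it on the cluster.
print(spark.sparkContext.parallelize(range(1000)).sum())
spark.stop()
```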

other time was used for

Week 3: 20 hours to write the paper.
Week 4: 30 hours to revise the paper [40 hours in total]. (Friday was the Spring Festival.)
Week 5: 10 hours to write the paper, 10 hours TA [50 hours in total]. Finally found the right book/material.
Week 6: 25 hours to write the paper, 20 hours TA [45 hours in total]. 1 hour to activate Windows for Spark (failed).
Week 7: 45 hours to write the paper, 20 hours TA, 7 hours to welcome a professor [45 hours in total].
Week 8: 10 hours for ISP, 5 hours for a department meeting, 17 hours to welcome a professor, 22 hours for QDA [60 hours in total].
Week 9: 5 hours to welcome a professor, 5 hours to visit another city campus, 4 hours for QDA [73 hours in total].

analysis of problems (reasons for the delay)

In total, 168 hours were spent before Spark standalone was running.

  1. I did not estimate the time in the right way. For the key tasks, only 8 hours were needed to set up standalone mode (virtual & real PCs), plus 48 hours of reading & coding along with the Scale Py book. Learning: 48 hours + final setup: 8 hours.
  2. The 12.5 hours on dirty cables was not necessary (at least; the 7 hours logged only as "hardware" are unclear and not counted here).
  3. The 30 hours on automatic installation could be reduced by at least half (= 15 hours saved). Thus 168 hours becomes ≈140 hours (168 - 12.5 - 15 ≈ 140; about 15% of the time saved).
  4. In addition, if I had known that Matlab can run without a screen, everything related to X could have been discarded (27 hours). Therefore 168 hours could be reduced to ≈120, and about 1/3 of the time could have been saved in total.

solutions

  1. Make a more detailed plan (in hours): (learn theory & coding from a practical book) + final setup; the same amount of time for the surrounding work (application running environment & system); add 1/3 for unexpected waste. YouTube video watching is not included. (A quick numeric check follows after this list.)
  2. Ask for help and delegate. Do not assume you do everything best yourself; only with teamwork does 1 + 1 >> 2, and only then can time be allocated sensibly.
  3. Ask experts & teachers for help. It is not mandatory to learn everything; even when I want to learn the details, experts usually provide better methods.
  4. It is almost certain that Matlab can run without a screen; I should have confirmed that first and developed the system with that limitation (no screen) to cut unnecessary time. If stakeholders do want the feature later, I can spend extra time to develop it.
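
As a quick numeric check of rule 1 against this project's own numbers (48 hours of learning and 8 hours of final setup from the analysis above):

```python
# Sanity check of the planning rule, using the numbers from the analysis above.
learn, setup = 48, 8                          # hours for the key tasks
core = learn + setup
surrounding = core                            # same time for the surrounding work (env & sys)
estimate = (core + surrounding) * (1 + 1/3)   # add 1/3 for unexpected waste
print(round(estimate))                        # ~149 hours, vs. the 168 hours actually spent
```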

PS

  • The paper reading time is for qiqqa reading & noting.
  • Sentdex's video is 3 hours, or 2.5 hours without the server & SSL part.
  • est.: estimated, used when the tomato log is missing.
  • The tomatoes in the tables are shown as standard tomatoes (1 tomato = 25 min + 5 min); if the actual recorded time was not 25 min, it is converted to the standardized value.