Analytics/Server Admin Log/Archive/2015
2015-12-30
- 15:23 ottomata: killing oozie legacy_tsv job 0102159-150605005438095-oozie-oozi-B to restart it without mobile, 5xx-mobile and zero outputs
2015-11-10
- 03:14 ottomata: restarted eventlogging
2015-11-09
- 14:40 ottomata: restarting eventlogging to see if it is ok after enabling firewall rules on kafka1014
2015-11-06
- 15:51 joal: Change replication factor to 2 in cassandra per_article_flat keyspace
- 15:47 ottomata: deploying aqs
2015-11-05
- 18:24 ottomata: deploying aqs
2015-10-29
- 10:35 joal: Gzipped already archived pageview files
- 10:34 joal: restarted pageview job to archive gzipped files
- 10:34 joal: refinery deployed
2015-10-28
- 19:16 joal: Downsizing cassandra replication from 3 to 2 on per_article_flat keyspace
- 19:07 joal: Restart load job (based on IMPORTED flag)
- 15:48 joal: Deploying refinery
- 15:40 joal: deploying refinery-source v0.0.22
2015-10-27
- 19:06 ottomata: deploying aqs
- 18:24 joal: deploying refinery
- 16:46 joal: Releasing refinery-source v0.0.21
- 10:34 joal: manual aggregator launch after small bug correction
2015-10-26
- 18:42 joal: refine bundle, pageview_hourly and projectview_hourly coord restarted
- 18:41 joal: refinery deployed on HDFS
- 14:33 joal: truncating "local_group_default_T_pageviews_per_article".data on aqs
- 09:58 joal: Restart cassandra on aqs1001
2015-10-22
- 20:24 ottomata: deploying aqs
- 09:51 joal: restart cassandra on aqs1003
2015-10-21
- 22:53 milimetric: deployed EventLogging and tried to backfill data lost on 2015.10.14 but failed
- 18:24 joal: Stopped per article loading in cassandra
- 13:39 ottomata: deploying aqs
2015-10-20
- 10:12 joal: restart cassandra on aqs1002
2015-10-19
- 18:35 ottomata: restarting eventlogging with change to parse schema names out of errored events
2015-10-16
- 20:38 joal: restarted cassandra on aqs100[1,2,3]
2015-10-15
- 12:17 joal: Refinery deploy needed before restart --> Deploying
- 12:12 joal: Restarting daily and monthly mobile unique coordinators with new patch
- 12:12 joal: Rerunning daily mobile unique jobs for days 2015-08-[03,04,11,12,12,14,17], 2015-09-16
- 12:10 joal: Stopped daily and monthly mobile unique coordinators
2015-10-14
- 15:22 ottomata: restarting lagging eventlogging mysql consumer
2015-10-09
- 19:26 ottomata: releasing refinery 0.20
- 15:19 ottomata: moved camus property files out of refinery repository and into puppet. Camus properties now live on an27 at /etc/camus.d, and camus log files are in /var/log/camus
- 14:54 joal: Cassandra restarted on aqs1003
- 09:15 joal: Restart cassandra on aqs1002
2015-10-08
- 17:38 joal: Backfilling load from hadoop to cassandra from beginning of october
2015-10-07
- 16:32 joal: Started cassandra load jobs from 2015-10-01
2015-10-01
- 16:27 valhallasw`cloud: testing again
- 16:13 valhallasw`cloud: test
2015-09-29
- 10:51 joal: cluster back to normal state. Some errors are still not explained, need to be careful.
2015-09-28
- 14:56 joal: backfilling various load jobs that failed at earlier stages than check_sequence_statistics
- 13:03 joal: Errors on cluster, some refine jobs have failed, investigating.
2015-08-19
- 18:20 ottomata: does this log work?
March 25
- 22:09 qchris: starting HDFS balance for unhealthy node analytics1016.eqiad.wmnet with healthy nodes analytics1037.eqiad.wmnet,analytics1040.eqiad.wmnet
February 25
- 16:07 ottomata: hello?
February 7
- 02:10 qchris: Ran kafka leader re-election as analytics1021 dropped out of its partition leader role.
- 01:32 qchris: name nodes died with error "Java heap space" and did not come back up. Bumping the heap size allowed us to resurrect them (See task T88871).
February 4
- 23:22 qchris: Manual failover of Hadoop namenode from analytics1001 to analytics1002, as analytics1001 had Heap space errors
- 07:49 qchris: Manual failover of Hadoop namenode from analytics1002 to analytics1001, as analytics1002 had Heap space errors
January 30
- 20:21 ottomata: deployed refinery 0.0.4
- 19:37 ottomata: released refinery 0.0.4
January 25
- 21:53 qchris: Marked raw text webrequest partition for 2015-01-24T00/1H ok (See task T87545)
January 23
- 22:46 qchris: Marked raw upload webrequest partition for 2015-01-16T12/1H ok (The partition only needed deduping)
- 22:23 qchris: Marked raw upload webrequest partition for 2015-01-16T01/1H ok (The partition only needed deduping)
- 22:11 qchris: Marked raw upload webrequest partition for 2015-01-15T17/1H ok (The partition only needed deduping)
- 22:04 qchris: Marked raw text webrequest partition for 2015-01-15T15/1H ok (The partition only needed deduping)
- 22:01 qchris: Marked raw mobile webrequest partition for 2015-01-16T01/1H ok (The partition only needed deduping)
January 15
- 08:25 qchris: Ran kafka leader re-election to bring analytics1021 back into the set of leaders
January 10
- 16:55 qchris: Dropped wmf.webstats tables, as announced on https://lists.wikimedia.org/pipermail/analytics/2015-January/003019.html
January 6
- 12:15 qchris: Marked raw mobile+text webrequest partitions for 2015-01-05T17/1H ok (See task T85918)
January 4
- 12:06 qchris: Marked raw mobile and upload webrequest partitions for 2015-01-03T10/1H ok (See task T85758)
January 2
- 21:21 qchris: Ran kafka leader re-election to bring analytics1021 back into the set of leaders
- 21:07 qchris: Marked raw bits, text, and upload webrequest partitions for 2014-12-11T14/1H ok (See task T85712)
- 19:05 qchris: Marked raw text+upload webrequest partitions for 2014-12-26T06/1H ok (See task T85709)
- 15:51 qchris: Marked raw text webrequest partition for 2014-12-11T20/1H ok (See task T85699)
- 12:39 qchris: Marked raw mobile webrequest partition for 2014-12-29T17/1H ok (See task T85695)
- 11:21 qchris: Marked raw text webrequest partition for 2014-12-30T20/1H ok (See task T85692)
January 1
- 20:26 qchris: Marked raw webrequest partitions for 2014-12-10T14/2H ok (See task T85675)