Indeed, some sectors finish plotting successfully, then some chunks are received and written to disk and… nothing, it just hangs in there. I’d suspect the disk has hanged, but no, farming and plotting of other sector worked fine (even after one of them hanged on writes).
In fact there was a slow down on disk due to sector being written and some slots were skipped completely (Slow farming, skipping slot
), but then it normalized again, while sector 1653
just got stuck and never made further progress after that, while sector 1466
finished plotting later and was successfully written to disk:
2024-07-08T09:24:44.714548Z TRACE {farm_index=36}:{sector_index=1653}: subspace_farmer::cluster::plotter: Processing plotting response notification (sector chunk)
2024-07-08T09:24:44.806791Z TRACE {farm_index=36}:{sector_index=1653}: subspace_farmer::cluster::plotter: Processing plotting response notification (sector chunk)
2024-07-08T09:24:45.045560Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584784, global_challenge: [118, 135, 213, 4, 234, 161, 90, 167, 207, 156, 200, 83, 109, 185, 51, 6, 151, 123, 143, 90, 70, 79, 10, 151, 29, 227, 103, 18, 108, 85, 111, 117], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:45.046970Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Reading sectors slot=13584784 sector_count=1813
2024-07-08T09:24:45.142752Z TRACE {farm_index=36}:{sector_index=1653}: subspace_farmer::cluster::plotter: Processing plotting response notification (sector chunk)
2024-07-08T09:24:46.094130Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584785, global_challenge: [206, 180, 67, 103, 48, 240, 52, 157, 195, 27, 31, 73, 125, 123, 146, 11, 102, 253, 9, 125, 91, 235, 114, 195, 44, 249, 203, 86, 80, 220, 159, 27], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:47.105552Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584786, global_challenge: [101, 62, 14, 176, 75, 115, 80, 178, 16, 157, 96, 187, 189, 113, 130, 98, 12, 18, 27, 91, 16, 73, 35, 102, 154, 54, 238, 211, 140, 84, 144, 225], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:47.105556Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584786
2024-07-08T09:24:48.052897Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584787, global_challenge: [192, 68, 225, 144, 135, 73, 124, 132, 225, 215, 143, 209, 91, 154, 199, 180, 130, 167, 211, 242, 35, 141, 32, 191, 53, 179, 136, 109, 47, 227, 209, 110], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:48.052902Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584787
2024-07-08T09:24:49.048287Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584788, global_challenge: [215, 156, 119, 163, 195, 159, 237, 121, 22, 168, 247, 10, 156, 60, 69, 79, 50, 136, 50, 151, 101, 178, 115, 223, 191, 67, 155, 197, 158, 94, 203, 89], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:49.048291Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584788
2024-07-08T09:24:50.066853Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584789, global_challenge: [44, 59, 187, 117, 132, 15, 155, 193, 229, 70, 245, 182, 46, 97, 164, 243, 37, 83, 176, 211, 72, 24, 72, 178, 13, 241, 98, 118, 100, 174, 129, 36], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:50.066857Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584789
2024-07-08T09:24:51.082080Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584790, global_challenge: [249, 78, 120, 21, 156, 154, 169, 187, 59, 3, 48, 252, 206, 157, 250, 135, 24, 38, 174, 107, 46, 142, 154, 134, 144, 246, 184, 85, 185, 89, 249, 109], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:51.082083Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584790
2024-07-08T09:24:52.032467Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584791, global_challenge: [81, 119, 67, 215, 154, 193, 62, 133, 98, 60, 252, 55, 190, 89, 202, 202, 167, 155, 168, 155, 75, 205, 52, 145, 141, 145, 114, 39, 134, 33, 178, 207], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:52.032471Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584791
2024-07-08T09:24:53.029417Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584792, global_challenge: [238, 48, 69, 150, 209, 180, 123, 22, 205, 23, 19, 195, 239, 83, 223, 211, 239, 19, 31, 58, 29, 2, 71, 108, 230, 206, 46, 145, 149, 149, 128, 174], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:53.029421Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584792
2024-07-08T09:24:54.056406Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584793, global_challenge: [73, 35, 197, 246, 248, 172, 251, 114, 221, 195, 179, 168, 219, 91, 215, 218, 38, 53, 157, 99, 164, 7, 249, 163, 73, 146, 4, 98, 37, 9, 62, 222], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:54.056410Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584793
2024-07-08T09:24:55.044074Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584794, global_challenge: [215, 111, 193, 144, 121, 71, 8, 1, 10, 1, 79, 62, 6, 19, 242, 70, 186, 43, 168, 211, 168, 148, 195, 186, 228, 94, 23, 140, 199, 218, 117, 220], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:55.044089Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584794
2024-07-08T09:24:56.059822Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584795, global_challenge: [96, 49, 180, 28, 192, 173, 253, 170, 233, 48, 174, 21, 2, 55, 67, 176, 184, 47, 155, 101, 248, 92, 116, 76, 110, 128, 85, 15, 82, 117, 27, 30], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:56.059826Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584795
2024-07-08T09:24:57.082102Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584796, global_challenge: [86, 162, 216, 104, 16, 188, 65, 153, 59, 243, 48, 144, 105, 147, 151, 240, 106, 164, 164, 137, 214, 242, 21, 11, 24, 39, 187, 134, 199, 32, 118, 142], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:57.082106Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584796
2024-07-08T09:24:58.058746Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584797, global_challenge: [149, 148, 245, 153, 150, 113, 69, 128, 187, 176, 64, 113, 113, 47, 88, 66, 6, 82, 115, 173, 155, 145, 251, 96, 245, 254, 3, 209, 147, 140, 129, 169], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:58.058751Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584797
2024-07-08T09:24:59.056199Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584798, global_challenge: [42, 68, 96, 235, 38, 180, 7, 239, 232, 109, 175, 70, 122, 34, 93, 107, 164, 173, 225, 61, 112, 191, 136, 54, 231, 113, 123, 57, 91, 99, 228, 74], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:24:59.056205Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584798
2024-07-08T09:24:59.468734Z TRACE {farm_index=36}:{sector_index=1466}: subspace_farmer::cluster::plotter: Processing plotting response notification response=Ping
2024-07-08T09:25:00.112820Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584799, global_challenge: [130, 25, 165, 201, 157, 179, 126, 175, 105, 157, 65, 157, 130, 214, 56, 249, 158, 134, 185, 172, 178, 28, 62, 97, 41, 127, 202, 78, 145, 94, 164, 28], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:25:00.112847Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584799
2024-07-08T09:25:00.443839Z TRACE {farm_index=36}:{sector_index=1466}: subspace_farmer::cluster::plotter: Processing plotting response notification response=Ping
2024-07-08T09:25:01.063193Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584800, global_challenge: [186, 176, 90, 114, 71, 67, 138, 253, 112, 144, 28, 10, 7, 95, 162, 104, 54, 136, 243, 225, 26, 36, 152, 61, 69, 51, 171, 92, 13, 235, 45, 94], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:25:01.063198Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584800
2024-07-08T09:25:02.081086Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584801, global_challenge: [4, 186, 20, 66, 6, 248, 117, 38, 132, 201, 237, 31, 6, 161, 151, 216, 168, 139, 79, 148, 219, 125, 125, 86, 188, 229, 93, 185, 205, 43, 16, 195], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:25:02.081091Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Slow farming, skipping slot slot=13584801
2024-07-08T09:25:02.833868Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Reading sectors slot=13584785 sector_count=1813
2024-07-08T09:25:03.051899Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584802, global_challenge: [228, 60, 53, 178, 198, 228, 231, 8, 172, 180, 76, 176, 234, 108, 121, 34, 6, 42, 51, 253, 163, 9, 244, 108, 226, 164, 57, 69, 93, 28, 39, 102], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:25:03.051946Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Reading sectors slot=13584802 sector_count=1813
2024-07-08T09:25:04.059804Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: New slot slot_info=SlotInfo { slot_number: 13584803, global_challenge: [66, 10, 106, 222, 30, 178, 29, 166, 99, 57, 71, 76, 176, 8, 206, 38, 104, 212, 148, 121, 144, 252, 65, 116, 71, 20, 253, 130, 145, 218, 18, 123], solution_range: 42072681, voting_solution_range: 420726810 }
2024-07-08T09:25:04.059829Z DEBUG {farm_index=36}: subspace_farmer::single_disk_farm::farming: Reading sectors slot=13584803 sector_count=1813
I have reported "malloc_consolidate(): unaligned fastbin chunk detected" when attempting to block single-threaded tokio runtime · Issue #2863 · rust-lang/futures-rs · GitHub before and we do use FuturesOrdered
/FuturesUnordered
heavily, so if there is some kind of memory safety issue there it may potentially cause this, though I’m having a hard time getting to the source of it and the fact that it is the reason here is fairly low.
I have added two low-level trace messages in farmer code in Snapshot build · subspace/subspace@27b377e · GitHub, it’ll hopefully help me narrow down whether it is tokio’s timer that is not working when NATS communication with plotter breaks or spawning blocking task that writes plotted sector to disk or something else.
I strongly suspect it is not our code to blame, but we’ll see.